Skip to content

Conversation

thierry-martinez
Copy link
Contributor

This commit adds a GraphML serializer:

def write_graphml(
    graphs: list[PyGraph | PyDiGraph],
    keys: list[tuple[str, Domain, str, Type, Any]],
    path: str,
    /,
    compression: str | None = ...,
) -> None: ...

keys is a list of tuples: id, domain, name of the key, type, and default value. This commit also introduces the
read_graphml_with_keys function, which returns the key definitions in the same format, along with the list of parsed graphs.

The implementation preserves the ids of graphs, nodes, and edges when possible. If some ids conflict, fresh ids are generated in the written GraphML file. The read_graphml function has also been updated to store the graph id in the graph attributes, just like node and edge ids are stored in the corresponding attributes.

The write_graphml function supports gzip compression, as does read_graphml.

Note that the JSON node-link serializer (the other part of #840) was already implemented in #1091.

Compared to #1462:

  • Keys are passed explicitly instead of being inferred (which allows to use the types float and int, and to use default values);

  • Attributes for graphs, nodes, and edges are taken from the weight of elements, instead of relying on callbacks. This allows write_graphml to act as a proper reciprocal of read_graphml. Round-trip tests have been added.

  • IDs are taken from attributes when possible, instead of being generated from indices.

  • Multiple graphs can be written to the same file.

  • Gzip compression is supported.

  • Tests have been added.

Regarding @IvanIsCoding's
comment (#1462 (comment)), about using https://github.com/jonasbb/petgraph-graphml:

  • Rustworkx's graphml.rs introduces an internal Graph data structure, which is used for read_graphml. It is natural to have write_graphml rely on the same data structure.

  • petgraph-graphml generates ids from indices, which prevents us from preserving ids accross the read_graphml/write_graphml round trip.

This commit adds a GraphML serializer:

```python
def write_graphml(
    graphs: list[PyGraph | PyDiGraph],
    keys: list[tuple[str, Domain, str, Type, Any]],
    path: str,
    /,
    compression: str | None = ...,
) -> None: ...
```

`keys` is a list of tuples: id, domain, name of the key, type, and
default value. This commit also introduces the
`read_graphml_with_keys` function, which returns the key definitions
in the same format, along with the list of parsed graphs.

The implementation preserves the ids of graphs, nodes, and edges when
possible. If some ids conflict, fresh ids are generated in the written
GraphML file. The `read_graphml` function has also been updated to
store the graph id in the graph attributes, just like node and edge
ids are stored in the corresponding attributes.

The `write_graphml` function supports gzip compression, as does
`read_graphml`.

Note that the JSON node-link serializer (the other part of Qiskit#840) was
already implemented in Qiskit#1091.

Compared to Qiskit#1462:

- Keys are passed explicitly instead of being inferred (which allows
to use the types `float` and `int`, and to use default values);

- Attributes for graphs, nodes, and edges are taken from the weight of elements, instead of relying on callbacks. This allows write_graphml to act as a proper reciprocal of read_graphml. Round-trip tests have been added.

- IDs are taken from attributes when possible, instead of being generated from indices.

- Multiple graphs can be written to the same file.

- Gzip compression is supported.

- Tests have been added.

Regarding @IvanIsCoding's
comment (Qiskit#1462 (comment)),
about using https://github.com/jonasbb/petgraph-graphml:

- Rustworkx's `graphml.rs` introduces an internal `Graph` data
structure, which is used for `read_graphml`. It is natural to
have `write_graphml` rely on the same data structure.

- `petgraph-graphml` generates ids from indices, which prevents us
from preserving ids accross the `read_graphml`/`write_graphml` round
trip.
@CLAassistant
Copy link

CLAassistant commented Jun 9, 2025

CLA assistant check
All committers have signed the CLA.

Copy link
Collaborator

@IvanIsCoding IvanIsCoding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add more comments later, but I agree that reusing the existing code from reading XML is the best approach

src/graphml.rs Outdated
@@ -428,6 +716,24 @@ impl GraphML {
Ok(())
}

fn get_keys(&self, domain: Domain) -> &IndexMap<String, Key> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use DictMap instead of IndexMap

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function returns the key_for_* fields of struct GraphML, which are defined as IndexMap with the default std::hash::RandomState. If I change it here to DictMap, I would also need to change the type in the definition of struct GraphML, which is not introduced in this PR.

Should I go ahead and change it here, or would it be better to open an issue to update the definition of struct GraphML in a separate PR?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DictMap is just IndexMap with a faster hasher. It should be safe to replace

@1ucian0 1ucian0 linked an issue Jun 10, 2025 that may be closed by this pull request
@coveralls
Copy link

coveralls commented Jun 11, 2025

Pull Request Test Coverage Report for Build 15707571026

Details

  • 447 of 487 (91.79%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.07%) to 95.211%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/graphml.rs 442 482 91.7%
Totals Coverage Status
Change from base Build 15670521294: -0.07%
Covered Lines: 19385
Relevant Lines: 20360

💛 - Coveralls

Copy link
Collaborator

@IvanIsCoding IvanIsCoding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this is pretty close to being merged but some comments on my opinion of supporting GraphML.

  • For read_graphml, we don't know which library wrote it so we should do our best to support most things specified in the standard.
  • For write_graphml, I think it's ok to have a simplified version. As long as we write a subset that is valid by the standard and is understood by other libraries, we should be fine. I don't think we should support all the expressiviness of GraphML like serializing multiple graphs.

My vision for this PR is to have code where rustworkx.write_graphml(graph, "graph.xml") just works.

b"edge" => Ok(Domain::Edge),
b"graph" => Ok(Domain::Graph),
b"all" => Ok(Domain::All),
_ => Err(()),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to include a return message explaining why a call to this failed though? Right now that is not possible

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention was for the failure cause to be known and owned by the caller, who can inject the error context that makes sense for their usage. As shown in this usage, the function returns a Result<..., ()>, and the responsibility of interpreting or enriching the error lies with the calling code. Typing ensures that this is done properly, since there is no From<()> defined for the error types we use.

src/graphml.rs Outdated
pygraph: &StablePyGraph<Ty>,
attrs: &PyObject,
) -> PyResult<Self> {
let mut attrs: Option<std::collections::HashMap<String, Value>> = attrs.extract(py).ok();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use HashMap from https://crates.io/crates/hashbrown or our own DictMap (which is just an alias for IndexMap with a fast hasher)

src/graphml.rs Outdated
@@ -428,6 +716,24 @@ impl GraphML {
Ok(())
}

fn get_keys(&self, domain: Domain) -> &IndexMap<String, Key> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DictMap is just IndexMap with a faster hasher. It should be safe to replace

@IvanIsCoding
Copy link
Collaborator

Another comment for DictMap vs IndexMap vs HashMap:

  • HashMap has a non-deterministic iteration order
  • IndexMap preservers insertion order when iterating
  • DictMap is just an alias we have for IndexMap with a faster hasher

Why am I mentioning this? For this PR it doesn't matter and it's something we can fix later. But ideally, if we called rustworkx.write_graphml twice for the same graph, we'd want to get an identical file. With HashMap, that might not be true as the order within the XML could be swapped! Keep that in mind.

@thierry-martinez
Copy link
Contributor Author

I have replaced all occurrences of IndexMap and HashMap with DictMap, as you suggested. Thank you!

Copy link
Collaborator

@IvanIsCoding IvanIsCoding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. There are some minor details like adding a docstring, making the function in rustworkx/__init__.py consistent with others, etc. I will send a PR to fix those later.

For all intents and purposes, this completes 99% of the requirements for serialiizing graphs to XML. It is a great addition and it should be awarded the bounty for #840.

@IvanIsCoding IvanIsCoding enabled auto-merge June 17, 2025 11:54
@IvanIsCoding IvanIsCoding added this pull request to the merge queue Jun 17, 2025
Merged via the queue into Qiskit:main with commit fb530db Jun 17, 2025
31 checks passed
SILIZ4 pushed a commit to SILIZ4/rustworkx that referenced this pull request Jul 4, 2025
* Fix Qiskit#840: Add GraphML serializer

This commit adds a GraphML serializer:

```python
def write_graphml(
    graphs: list[PyGraph | PyDiGraph],
    keys: list[tuple[str, Domain, str, Type, Any]],
    path: str,
    /,
    compression: str | None = ...,
) -> None: ...
```

`keys` is a list of tuples: id, domain, name of the key, type, and
default value. This commit also introduces the
`read_graphml_with_keys` function, which returns the key definitions
in the same format, along with the list of parsed graphs.

The implementation preserves the ids of graphs, nodes, and edges when
possible. If some ids conflict, fresh ids are generated in the written
GraphML file. The `read_graphml` function has also been updated to
store the graph id in the graph attributes, just like node and edge
ids are stored in the corresponding attributes.

The `write_graphml` function supports gzip compression, as does
`read_graphml`.

Note that the JSON node-link serializer (the other part of Qiskit#840) was
already implemented in Qiskit#1091.

Compared to Qiskit#1462:

- Keys are passed explicitly instead of being inferred (which allows
to use the types `float` and `int`, and to use default values);

- Attributes for graphs, nodes, and edges are taken from the weight of elements, instead of relying on callbacks. This allows write_graphml to act as a proper reciprocal of read_graphml. Round-trip tests have been added.

- IDs are taken from attributes when possible, instead of being generated from indices.

- Multiple graphs can be written to the same file.

- Gzip compression is supported.

- Tests have been added.

Regarding @IvanIsCoding's
comment (Qiskit#1462 (comment)),
about using https://github.com/jonasbb/petgraph-graphml:

- Rustworkx's `graphml.rs` introduces an internal `Graph` data
structure, which is used for `read_graphml`. It is natural to
have `write_graphml` rely on the same data structure.

- `petgraph-graphml` generates ids from indices, which prevents us
from preserving ids accross the `read_graphml`/`write_graphml` round
trip.

* Fix clippy comments

* Prefix types with GraphML

Suggested by @IvanIsCoding: Qiskit#1464 (comment)

* Black

* Add release notes

* Fix stubs error

* Add documentation

* Remove read_graphml_with_keys / write_graphml for single graph only

* Use `DictMap` everywhere

Suggested by @IvanIsCoding: Qiskit#1464 (comment)

* rustfmt and clippy

* Remove unused math module (ruff check)

* Use `from __future__ import annotations` for Python <3.10

* Add stub for `write_graphml`

* Remove `read_graphml_with_keys` from documentation

* Apply suggestions from code review

* Update rustworkx/__init__.py

* Apply suggestions from code review

---------

Co-authored-by: Ivan Carvalho <8753214+IvanIsCoding@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add reverse direction serialization
4 participants