-
Notifications
You must be signed in to change notification settings - Fork 193
Fix #840: Add GraphML serializer #1464
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This commit adds a GraphML serializer: ```python def write_graphml( graphs: list[PyGraph | PyDiGraph], keys: list[tuple[str, Domain, str, Type, Any]], path: str, /, compression: str | None = ..., ) -> None: ... ``` `keys` is a list of tuples: id, domain, name of the key, type, and default value. This commit also introduces the `read_graphml_with_keys` function, which returns the key definitions in the same format, along with the list of parsed graphs. The implementation preserves the ids of graphs, nodes, and edges when possible. If some ids conflict, fresh ids are generated in the written GraphML file. The `read_graphml` function has also been updated to store the graph id in the graph attributes, just like node and edge ids are stored in the corresponding attributes. The `write_graphml` function supports gzip compression, as does `read_graphml`. Note that the JSON node-link serializer (the other part of Qiskit#840) was already implemented in Qiskit#1091. Compared to Qiskit#1462: - Keys are passed explicitly instead of being inferred (which allows to use the types `float` and `int`, and to use default values); - Attributes for graphs, nodes, and edges are taken from the weight of elements, instead of relying on callbacks. This allows write_graphml to act as a proper reciprocal of read_graphml. Round-trip tests have been added. - IDs are taken from attributes when possible, instead of being generated from indices. - Multiple graphs can be written to the same file. - Gzip compression is supported. - Tests have been added. Regarding @IvanIsCoding's comment (Qiskit#1462 (comment)), about using https://github.com/jonasbb/petgraph-graphml: - Rustworkx's `graphml.rs` introduces an internal `Graph` data structure, which is used for `read_graphml`. It is natural to have `write_graphml` rely on the same data structure. - `petgraph-graphml` generates ids from indices, which prevents us from preserving ids accross the `read_graphml`/`write_graphml` round trip.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add more comments later, but I agree that reusing the existing code from reading XML is the best approach
src/graphml.rs
Outdated
@@ -428,6 +716,24 @@ impl GraphML { | |||
Ok(()) | |||
} | |||
|
|||
fn get_keys(&self, domain: Domain) -> &IndexMap<String, Key> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use DictMap
instead of IndexMap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function returns the key_for_*
fields of struct GraphML
, which are defined as IndexMap
with the default std::hash::RandomState
. If I change it here to DictMap
, I would also need to change the type in the definition of struct GraphML
, which is not introduced in this PR.
Should I go ahead and change it here, or would it be better to open an issue to update the definition of struct GraphML
in a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DictMap is just IndexMap with a faster hasher. It should be safe to replace
Pull Request Test Coverage Report for Build 15707571026Details
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this is pretty close to being merged but some comments on my opinion of supporting GraphML.
- For
read_graphml
, we don't know which library wrote it so we should do our best to support most things specified in the standard. - For
write_graphml
, I think it's ok to have a simplified version. As long as we write a subset that is valid by the standard and is understood by other libraries, we should be fine. I don't think we should support all the expressiviness of GraphML like serializing multiple graphs.
My vision for this PR is to have code where rustworkx.write_graphml(graph, "graph.xml")
just works.
b"edge" => Ok(Domain::Edge), | ||
b"graph" => Ok(Domain::Graph), | ||
b"all" => Ok(Domain::All), | ||
_ => Err(()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to include a return message explaining why a call to this failed though? Right now that is not possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention was for the failure cause to be known and owned by the caller, who can inject the error context that makes sense for their usage. As shown in this usage, the function returns a Result<..., ()>
, and the responsibility of interpreting or enriching the error lies with the calling code. Typing ensures that this is done properly, since there is no From<()>
defined for the error types we use.
src/graphml.rs
Outdated
pygraph: &StablePyGraph<Ty>, | ||
attrs: &PyObject, | ||
) -> PyResult<Self> { | ||
let mut attrs: Option<std::collections::HashMap<String, Value>> = attrs.extract(py).ok(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use HashMap from https://crates.io/crates/hashbrown or our own DictMap
(which is just an alias for IndexMap
with a fast hasher)
src/graphml.rs
Outdated
@@ -428,6 +716,24 @@ impl GraphML { | |||
Ok(()) | |||
} | |||
|
|||
fn get_keys(&self, domain: Domain) -> &IndexMap<String, Key> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DictMap is just IndexMap with a faster hasher. It should be safe to replace
Another comment for
Why am I mentioning this? For this PR it doesn't matter and it's something we can fix later. But ideally, if we called |
I have replaced all occurrences of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. There are some minor details like adding a docstring, making the function in rustworkx/__init__.py
consistent with others, etc. I will send a PR to fix those later.
For all intents and purposes, this completes 99% of the requirements for serialiizing graphs to XML. It is a great addition and it should be awarded the bounty for #840.
* Fix Qiskit#840: Add GraphML serializer This commit adds a GraphML serializer: ```python def write_graphml( graphs: list[PyGraph | PyDiGraph], keys: list[tuple[str, Domain, str, Type, Any]], path: str, /, compression: str | None = ..., ) -> None: ... ``` `keys` is a list of tuples: id, domain, name of the key, type, and default value. This commit also introduces the `read_graphml_with_keys` function, which returns the key definitions in the same format, along with the list of parsed graphs. The implementation preserves the ids of graphs, nodes, and edges when possible. If some ids conflict, fresh ids are generated in the written GraphML file. The `read_graphml` function has also been updated to store the graph id in the graph attributes, just like node and edge ids are stored in the corresponding attributes. The `write_graphml` function supports gzip compression, as does `read_graphml`. Note that the JSON node-link serializer (the other part of Qiskit#840) was already implemented in Qiskit#1091. Compared to Qiskit#1462: - Keys are passed explicitly instead of being inferred (which allows to use the types `float` and `int`, and to use default values); - Attributes for graphs, nodes, and edges are taken from the weight of elements, instead of relying on callbacks. This allows write_graphml to act as a proper reciprocal of read_graphml. Round-trip tests have been added. - IDs are taken from attributes when possible, instead of being generated from indices. - Multiple graphs can be written to the same file. - Gzip compression is supported. - Tests have been added. Regarding @IvanIsCoding's comment (Qiskit#1462 (comment)), about using https://github.com/jonasbb/petgraph-graphml: - Rustworkx's `graphml.rs` introduces an internal `Graph` data structure, which is used for `read_graphml`. It is natural to have `write_graphml` rely on the same data structure. - `petgraph-graphml` generates ids from indices, which prevents us from preserving ids accross the `read_graphml`/`write_graphml` round trip. * Fix clippy comments * Prefix types with GraphML Suggested by @IvanIsCoding: Qiskit#1464 (comment) * Black * Add release notes * Fix stubs error * Add documentation * Remove read_graphml_with_keys / write_graphml for single graph only * Use `DictMap` everywhere Suggested by @IvanIsCoding: Qiskit#1464 (comment) * rustfmt and clippy * Remove unused math module (ruff check) * Use `from __future__ import annotations` for Python <3.10 * Add stub for `write_graphml` * Remove `read_graphml_with_keys` from documentation * Apply suggestions from code review * Update rustworkx/__init__.py * Apply suggestions from code review --------- Co-authored-by: Ivan Carvalho <8753214+IvanIsCoding@users.noreply.github.com>
This commit adds a GraphML serializer:
keys
is a list of tuples: id, domain, name of the key, type, and default value. This commit also introduces theread_graphml_with_keys
function, which returns the key definitions in the same format, along with the list of parsed graphs.The implementation preserves the ids of graphs, nodes, and edges when possible. If some ids conflict, fresh ids are generated in the written GraphML file. The
read_graphml
function has also been updated to store the graph id in the graph attributes, just like node and edge ids are stored in the corresponding attributes.The
write_graphml
function supports gzip compression, as doesread_graphml
.Note that the JSON node-link serializer (the other part of #840) was already implemented in #1091.
Compared to #1462:
Keys are passed explicitly instead of being inferred (which allows to use the types
float
andint
, and to use default values);Attributes for graphs, nodes, and edges are taken from the weight of elements, instead of relying on callbacks. This allows write_graphml to act as a proper reciprocal of read_graphml. Round-trip tests have been added.
IDs are taken from attributes when possible, instead of being generated from indices.
Multiple graphs can be written to the same file.
Gzip compression is supported.
Tests have been added.
Regarding @IvanIsCoding's
comment (#1462 (comment)), about using https://github.com/jonasbb/petgraph-graphml:
Rustworkx's
graphml.rs
introduces an internalGraph
data structure, which is used forread_graphml
. It is natural to havewrite_graphml
rely on the same data structure.petgraph-graphml
generates ids from indices, which prevents us from preserving ids accross theread_graphml
/write_graphml
round trip.