Pandas 2.0.0 Compatibility #419
Closed
Pandas 2.0.0 was released today. Among other changes, it adds a new `dtype_backend` argument to `read_csv()` and changes the default handling of null values: `"None"` is now part of the default `na_values` list. When the respective `master.csv` files are read with pandas 2.0.0, the string value `"None"` is therefore parsed as `NaN` by default.

This is problematic wherever the `meta_info` dictionary's `additional edge files` and `additional node files` properties are assumed to be strings, for example in `ogb/linkproppred/dataset.py`: the `.split(",")` method called on these throws `AttributeError: 'float' object has no attribute 'split'`. There are more instances that can cause this exception beyond the two above.
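A minimal reproduction of the parsing change, assuming a toy CSV laid out like `master.csv` (properties as rows, dataset names as columns). The `ogbl-example` column, its values, and the `read_meta` helper are hypothetical and exist only for illustration. The sketch reads the same text once with pandas defaults and once with `keep_default_na=False`, the keyword argument considered as one of the fixes.

```python
import io

import pandas as pd

# Hypothetical master.csv-style excerpt: properties as rows, datasets as columns.
CSV_TEXT = """,ogbl-example
additional node files,None
additional edge files,edge_weight
"""


def read_meta(keep_default_na: bool = True) -> pd.Series:
    """Read the toy CSV and return the column for the hypothetical dataset."""
    master = pd.read_csv(
        io.StringIO(CSV_TEXT), index_col=0, keep_default_na=keep_default_na
    )
    return master["ogbl-example"]


default_meta = read_meta()  # pandas default NA handling
strict_meta = read_meta(keep_default_na=False)  # default NA strings are not applied

# On pandas >= 2.0.0 the literal "None" becomes a float NaN here,
# so calling .split(",") on it raises AttributeError.
value = default_meta["additional node files"]
try:
    print(value.split(","))
except AttributeError as exc:
    print(f"AttributeError: {exc}")

# With keep_default_na=False the string survives unchanged.
print(strict_meta["additional node files"].split(","))
```

On pandas 1.x the first `print` shows `['None']`; on 2.0.0 and later it reports the same `AttributeError` seen in `dataset.py`.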
I considered two options: either edit the scripts and the corresponding `master.csv` files to contain the empty string instead of `'None'`, which is parsed as `""` instead of `NaN`, or add the keyword argument `keep_default_na=False` to the `pd.read_csv` calls where this could be an issue. This keyword argument prevents the `"None"` strings from being parsed as `NaN`. Since the latter option would touch more call sites and require a larger diff, I opted for the former approach. This involved editing the `make_master_file.py` files in their respective directories.

I may have discovered a small inconsistency in `ogb/linkproppred/make_master_file.py` for the `has_edge_attr` property of the ogbl-vessel dataset. In `make_master_file.py` this property is set to `False`, but the committed `master.csv` in the latest release has it set to `True`.
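A discrepancy like this can be caught mechanically by comparing the properties a script sets against the committed CSV. The sketch below uses hypothetical inputs: `script_props` stands in for the values `make_master_file.py` assigns, `CSV_TEXT` is a made-up excerpt of `master.csv` that reproduces the `has_edge_attr` mismatch described above, and `find_mismatches` is an illustrative helper, not part of ogb.

```python
import io

import pandas as pd

# Hypothetical properties as a make_master_file.py-style script might set them.
script_props = {
    "ogbl-vessel": {"has_edge_attr": False, "additional node files": "None"},
}

# Hypothetical master.csv excerpt reproducing the mismatch.
CSV_TEXT = """,ogbl-vessel
has_edge_attr,True
additional node files,None
"""


def find_mismatches(props: dict, csv_text: str) -> list:
    """Return (dataset, property, script_value, csv_value) tuples that disagree."""
    # keep_default_na=False so the literal "None" is not turned into NaN.
    master = pd.read_csv(io.StringIO(csv_text), index_col=0, keep_default_na=False)
    mismatches = []
    for dataset, fields in props.items():
        for key, expected in fields.items():
            actual = master.at[key, dataset]
            # Compare as strings, since booleans round-trip through CSV as text.
            if str(actual) != str(expected):
                mismatches.append((dataset, key, expected, actual))
    return mismatches


print(find_mismatches(script_props, CSV_TEXT))
```

Comparing via `str()` keeps the check independent of whether pandas infers a column as strings or booleans.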