@rjurney rjurney commented Feb 16, 2025

This is a sub-PR of #473. It is the code that corresponds to #511.

These changes are necessary to:

  1. Allow users to download the Stack Exchange data dump (graphframes.tutorials.download)
  2. Convert the XML to a Parquet file (graphframes.tutorials.stackexchange)
  3. Build a test knowledge graph out of the data dump (graphframes.tutorials.stackexchange)
  4. Run some motifs on that knowledge graph (graphframes.tutorials.motif). This module is here to make the tutorials unit-testable in the future and to serve as a GitHub-browsable reference that matches the Motif Finding tutorial in #511 (3 of 3: Documentation cleanup and update. Added a motif finding tutorial.).

In addition:

  1. In a near-future PR, I can wire the Stack Exchange knowledge graph this PR creates into the unit tests for a more realistic setting.

Why are the changes needed?

These changes are needed to make the docs in #511 work. Otherwise that PR's new Motif Finding Tutorial won't work. Merge me first :)

@rjurney rjurney requested a review from WeichenXu123 February 16, 2025 05:38

This is here within the package, as is common in Java projects, to wholly encapsulate the tutorial within the project in a standard way. I don't want to rely on the user having this or that operating system or filesystem.

This is an executable Click CLI that downloads any stackexchange.com site. I think stackoverflow.com would need a small change; the rest work.
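
For readers without the diff open, here is a minimal sketch of the URL handling a downloader like this might need. It is not the code in this PR: the archive.org base path, the `.7z` suffix, and the function name `dump_url` are assumptions for illustration, and the real script wraps its logic in a Click command rather than a bare function.

```python
from urllib.parse import quote

# Assumption: the dumps are mirrored under this archive.org path as <site>.7z
ARCHIVE_BASE = "https://archive.org/download/stackexchange"


def dump_url(site: str) -> str:
    """Build the (assumed) archive URL for one stackexchange.com site."""
    site = site.strip().lower()
    if not site.endswith(".stackexchange.com"):
        # stackoverflow.com would need its own handling, so reject it here
        raise ValueError(f"unsupported site: {site!r}")
    return f"{ARCHIVE_BASE}/{quote(site)}.7z"


if __name__ == "__main__":
    print(dump_url("stats.meta.stackexchange.com"))
```

Rejecting anything outside `*.stackexchange.com` keeps the sketch in line with the caveat above about stackoverflow.com.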

This file provides a working motif tutorial on GitHub, where the motif finding tutorial I wrote in Markdown doesn't display correctly.

@rjurney rjurney Feb 16, 2025

This file builds knowledge graph node and edge files out of the Stack Exchange data dump XML files. I plan to create a knowledge graph construction tutorial, and this file is a GitHub-readable version of that future Markdown tutorial.

I want to use the files this creates for extended unit tests with a more realistic workload.
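
To illustrate the idea without Spark, here is a rough sketch of turning `Posts.xml`-style rows into node and edge records. It is not the module in this PR: the `build_graph` helper, the `post-`/`user-` ID prefixes, and the relationship names are hypothetical, and the sample XML is made up. The `id`, `src`, and `dst` keys follow the column names GraphFrames expects for vertices and edges.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the shape of a Stack Exchange Posts.xml file
SAMPLE_POSTS = """<posts>
  <row Id="1" PostTypeId="1" OwnerUserId="10" Title="What is a motif?" />
  <row Id="2" PostTypeId="2" ParentId="1" OwnerUserId="11" />
</posts>"""

# PostTypeId 1 is a question and 2 is an answer; other types are skipped
POST_TYPES = {"1": "Question", "2": "Answer"}


def build_graph(posts_xml: str):
    """Return (nodes, edges) as lists of dicts built from <row> elements."""
    nodes, edges = {}, []
    for row in ET.fromstring(posts_xml).iter("row"):
        attrs = row.attrib
        kind = POST_TYPES.get(attrs.get("PostTypeId"))
        if kind is None:
            continue
        post_id = f"post-{attrs['Id']}"
        nodes[post_id] = {"id": post_id, "type": kind}
        owner = attrs.get("OwnerUserId")
        if owner is not None:
            user_id = f"user-{owner}"
            nodes[user_id] = {"id": user_id, "type": "User"}
            edges.append({"src": user_id, "dst": post_id, "relationship": "Posts"})
        parent = attrs.get("ParentId")
        if kind == "Answer" and parent is not None:
            # An answer points at the question it answers
            edges.append({"src": post_id, "dst": f"post-{parent}", "relationship": "Answers"})
    return list(nodes.values()), edges


if __name__ == "__main__":
    nodes, edges = build_graph(SAMPLE_POSTS)
    print(len(nodes), "nodes;", len(edges), "edges")
```

Keying nodes by ID in a dict deduplicates users who own several posts, which a flat list would not.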

These are utilities that would distract from the code in the other files.

rjurney commented Feb 17, 2025

Closed in favor of a minimized PR #518.

rjurney commented Feb 17, 2025

Closing in favor of minimized PR.

@rjurney rjurney closed this Feb 17, 2025
@rjurney rjurney deleted the rjurney/motif-tutorial-code branch April 15, 2025 00:33