Skip to content

Conversation

galvana
Copy link
Contributor

@galvana galvana commented Jul 17, 2025

Closes ENG-992

Description Of Changes

Misc performance improvements to dataset traversal

Code Changes

  • Filtering out collections with skip_processing=True when initializing a DatasetGraph
  • Adding edges_by_node dict for O(1) edge by node address lookups
  • Using string values for set lookups to avoid expensive hash operations on objects
  • Adding a deleted_edges_tracker dict to avoid the expensive remaining_edges: Set[Edge] = self.edges.copy()

Steps to Confirm

This one is tricky to verify manually. I'm testing directly with tests/ops/graph/test_graph_traversal.py and implicitly with the rest of our test suite. Dataset traversal is core to a lot of workflows so if there were any issues it would cause other tests to fail.

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a db-migration This indicates that a change includes a database migration label to the entry if your change includes a DB migration
    • Add a high-risk This issue suggests changes that have a high-probability of breaking existing code label to the entry if your change includes a high-risk change (i.e. potential for performance impact or unexpected regression) that should be flagged
    • Updates unreleased work already in Changelog, no new entry necessary
  • Followup issues:
    • Followup issues created
    • No followup issues
  • Database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!
    • No migrations
  • Documentation:
    • Documentation complete, PR opened in fidesdocs
    • Documentation issue created in fidesdocs
    • If there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
    • No documentation updates required

Copy link

vercel bot commented Jul 17, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

2 Skipped Deployments
Name Status Preview Comments Updated (UTC)
fides-plus-nightly ⬜️ Ignored (Inspect) Visit Preview Jul 30, 2025 4:29am
fides-privacy-center ⬜️ Ignored (Inspect) Jul 30, 2025 4:29am

@galvana galvana added the run unsafe ci checks Runs fides-related CI checks that require sensitive credentials label Jul 17, 2025
@galvana galvana marked this pull request as ready for review July 17, 2025 22:24
Copy link

codecov bot commented Jul 18, 2025

Codecov Report

❌ Patch coverage is 86.66667% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.97%. Comparing base (70834d3) to head (9ed4704).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/fides/api/graph/traversal.py 90.56% 4 Missing and 1 partial ⚠️
src/fides/api/graph/graph.py 57.14% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6353      +/-   ##
==========================================
- Coverage   86.98%   86.97%   -0.01%     
==========================================
  Files         454      454              
  Lines       28909    28961      +52     
  Branches     3211     3228      +17     
==========================================
+ Hits        25146    25190      +44     
- Misses       3046     3052       +6     
- Partials      717      719       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Base automatically changed from ENG-908-optimized-bigquery-delete to main July 22, 2025 23:13
@galvana galvana changed the title Optimize dataset traversal ENG-992 Optimize dataset traversal Jul 23, 2025
Copy link
Contributor

@thabofletcher thabofletcher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 🚀 🚀

@galvana galvana merged commit 6238608 into main Jul 30, 2025
17 checks passed
@galvana galvana deleted the ENG-992-optimize-dataset-traversal branch July 30, 2025 04:39
Copy link

cypress bot commented Jul 30, 2025

fides    Run #13179

Run Properties:  status check passed Passed #13179  •  git commit 6238608beb: ENG-992 Optimize dataset traversal (#6353)
Project fides
Branch Review main
Run status status check passed Passed #13179
Run duration 00m 51s
Commit git commit 6238608beb: ENG-992 Optimize dataset traversal (#6353)
Committer Adrian Galvan
View all properties for this run ↗︎

Test results
Tests that failed  Failures 0
Tests that were flaky  Flaky 0
Tests that did not run due to a developer annotating a test with .skip  Pending 0
Tests that did not run due to a failure in a mocha hook  Skipped 0
Tests that passed  Passing 5
⚠️ You've recorded test results over your free plan limit.
Upgrade your plan to view test results.
View all changes introduced in this branch ↗︎

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
run unsafe ci checks Runs fides-related CI checks that require sensitive credentials
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants