Adding missing table detection for erasures #6414
Conversation
Greptile Summary
This PR extends the missing table detection functionality from PR #6397 to erasure operations, adding the same error handling pattern to `SQLConnector.mask_data` that was previously only available in `retrieve_data`. The changes ensure consistent behavior between access and erasure operations when encountering missing tables in SQL databases, which is particularly important for BigQuery scenarios.
The core implementation wraps the SQL execution in `mask_data` in a try/except block that checks table existence when an exception occurs. For collections without dependencies (leaf collections), it raises `TableNotFound` to allow graceful skipping. For collections with dependencies, it raises `ConnectionException` to prevent data integrity issues by failing the operation entirely.
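The branching described above can be sketched as follows. This is a minimal illustration, not the actual Fides implementation: `mask_data`, `TableNotFound`, and `ConnectionException` are named in this PR, while `table_exists` and `has_outgoing_dependencies` are hypothetical stand-ins for the real helpers.

```python
# Minimal sketch of the error-handling pattern described in the summary.
# The exception names mirror the PR; the helper methods are assumptions.

class TableNotFound(Exception):
    """Raised so the execution engine can skip a leaf collection."""

class ConnectionException(Exception):
    """Raised to hard-fail when dependent collections need this data."""

def mask_data(connection, collection, update_stmts):
    try:
        for stmt in update_stmts:
            connection.execute(stmt)
    except Exception as exc:
        qualified_name = collection.qualified_table_name
        if not connection.table_exists(qualified_name):
            if collection.has_outgoing_dependencies():
                # Downstream collections rely on this one: fail the whole
                # operation to protect data integrity.
                raise ConnectionException(
                    f"Table {qualified_name} not found"
                ) from exc
            # Leaf collection: let the engine skip it gracefully.
            raise TableNotFound(qualified_name) from exc
        # Table exists, so this is an unrelated error: re-raise as-is.
        raise
```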
The PR also updates the BigQuery QA scenario to properly handle `erase_after` dependency references when creating test datasets with missing tables, and adds test coverage for both leaf and dependency collection scenarios during erasure operations. This ensures the privacy request execution engine maintains proper error handling and data integrity across both access and erasure phases.
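The QA-scenario fix can be illustrated with a small sketch: when a collection is renamed to simulate a missing table, any `erase_after` references pointing at the old name must be rewritten as well. The dictionary shape below (a `collections` list with `fides_meta.erase_after` entries of the form `"<dataset>.<collection>"`) is assumed from the Fides dataset schema, not copied from the PR.

```python
# Hypothetical helper: rename a collection in a dataset dict and keep
# erase_after references to it consistent.

def rename_collection(dataset: dict, old: str, new: str) -> dict:
    for collection in dataset.get("collections", []):
        if collection["name"] == old:
            collection["name"] = new
        meta = collection.get("fides_meta") or {}
        refs = meta.get("erase_after")
        if refs:
            # erase_after entries look like "<dataset_key>.<collection_name>"
            meta["erase_after"] = [
                ref.replace(f".{old}", f".{new}") if ref.endswith(f".{old}") else ref
                for ref in refs
            ]
    return dataset
```

Without the reference rewrite, the renamed dataset would contain a dangling `erase_after` pointer and fail validation before the missing-table behavior could even be exercised.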
Important Files Changed

| Filename | Score | Overview |
|---|---|---|
| `src/fides/api/service/connectors/sql_connector.py` | 4/5 | Adds missing table detection to `mask_data` with try/except error handling |
| `tests/ops/service/connectors/test_sql_connector.py` | 4/5 | Test coverage for `mask_data` missing table scenarios |
| `qa/scenarios/bigquery_table_not_found.py` | 4/5 | Updates QA scenario to handle `erase_after` references in dataset modifications |
| `tests/ops/service/privacy_request/test_bigquery_privacy_requests.py` | 4/5 | Integration tests for BigQuery erasure operations with missing tables |
Confidence score: 4/5
- This PR extends existing functionality with well-tested patterns and maintains consistency across access and erasure operations
- Score reflects solid implementation following established patterns, but complexity of error handling logic warrants careful review
- Pay close attention to the error handling logic in sql_connector.py to ensure dependency checking works correctly
Sequence Diagram
```mermaid
sequenceDiagram
    participant User
    participant QAScenario as "QA Scenario"
    participant FidesAPI as "Fides API"
    participant SQLConnector as "SQL Connector"
    participant BigQueryConnector as "BigQuery Connector"
    participant Database as "BigQuery Database"
    User->>QAScenario: "Run QA scenario with table parameter"
    QAScenario->>QAScenario: "Create modified dataset with renamed table"
    QAScenario->>FidesAPI: "Create dataset in Fides"
    QAScenario->>FidesAPI: "Create system and connection"
    QAScenario->>FidesAPI: "Link dataset to connection"
    User->>FidesAPI: "Submit erasure request for customer-1@example.com"
    FidesAPI->>SQLConnector: "Execute mask_data for each collection"
    loop "For each row to mask"
        SQLConnector->>SQLConnector: "Generate update statement"
        alt "SQL dry run disabled"
            SQLConnector->>Database: "Execute update statement"
            Database-->>SQLConnector: "Exception: Table not found"
            SQLConnector->>SQLConnector: "Get qualified table name"
            SQLConnector->>BigQueryConnector: "Check if table exists"
            BigQueryConnector->>Database: "Query table metadata"
            Database-->>BigQueryConnector: "Table does not exist"
            BigQueryConnector-->>SQLConnector: "Return false"
            alt "Collection has outgoing dependencies"
                SQLConnector-->>FidesAPI: "Raise ConnectionException (hard error)"
                FidesAPI-->>User: "Request fails with error"
            else "Collection is leaf node"
                SQLConnector-->>FidesAPI: "Raise TableNotFound (skip collection)"
                FidesAPI->>FidesAPI: "Skip collection and continue"
                FidesAPI-->>User: "Request completes successfully"
            end
        else "SQL dry run enabled"
            SQLConnector->>SQLConnector: "Log SQL statement instead of executing"
        end
    end
```
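The skip-versus-fail branch in the diagram can also be viewed from the execution engine's side. The sketch below uses assumed names and shapes, not the actual Fides traversal code: it shows how an engine could consume `TableNotFound` to skip a leaf collection while letting a `ConnectionException` propagate and fail the request.

```python
# Sketch of an execution engine consuming the two exception types.

class TableNotFound(Exception):
    """Missing table on a leaf collection: safe to skip."""

def run_erasure(collections, mask_fn):
    """Apply mask_fn to each collection, skipping missing leaf tables."""
    results = {}
    for collection in collections:
        try:
            results[collection] = mask_fn(collection)
        except TableNotFound:
            # Record 0 masked rows and keep going, matching the diagram's
            # "Skip collection and continue" branch.
            results[collection] = 0
        # A ConnectionException from a collection with downstream
        # dependencies is deliberately not caught here: it propagates
        # and fails the whole privacy request.
    return results
```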
4 files reviewed, no comments
Looks good! Tested, including the recent change to verify `BIGQUERY_DATASET`.
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main    #6414      +/-   ##
==========================================
+ Coverage   87.09%   87.11%   +0.01%
==========================================
  Files         455      455
  Lines       29248    29254       +6
  Branches     3242     3243       +1
==========================================
+ Hits        25475    25485      +10
+ Misses       3048     3044       -4
  Partials      725      725
==========================================
```

View full report in Codecov by Sentry.
Cypress run

| Project | Branch | Run duration | Committer |
|---|---|---|---|
| fides | main | 01m 03s | Adrian Galvan |

Test results: 0 failed · 0 flaky · 0 pending · 0 skipped · 5 passing
Description Of Changes
Continuation of #6397.
Code Changes
- Added missing table detection to `SQLConnector.mask_data`
Steps to Confirm
1. Run the `python qa bigquery_table_not_found setup` QA scenario
2. Submit a privacy request for `customer-1@example.com`; the `customer_profile` collection should be skipped for the access steps and complete (with 0 results) for the erasure steps
3. Run the `python qa bigquery_table_not_found setup --table customer` QA scenario (note the `customer` param)
4. Submit a privacy request for `customer-1@example.com`
Pre-Merge Checklist
- `CHANGELOG.md` updated and rebased against `main`
- `downgrade()` migration is correct and works