Skip to content

Conversation

galvana
Copy link
Contributor

@galvana galvana commented Aug 5, 2025

Closes ENG-1073

Description Of Changes

  • Introduces first-class DB models to support custom taxonomies and cross-taxonomy tagging:
    • Taxonomy: defines a taxonomy type and its allowed targets via applies_to
    • TaxonomyAllowedUsage: declares which target types/taxonomies a taxonomy can apply to
    • TaxonomyElement: hierarchical elements within a taxonomy (supports parent/child)
    • TaxonomyUsage: records applications of one taxonomy’s element to another element
  • Adds Alembic migration to create new tables, indexes, FKs, and seed system-managed legacy taxonomies (data_categories, data_uses, data_subjects, system_groups).
  • Adds a TaxonomyService with centralized validation and hierarchy activation/deactivation behavior, and a base handler abstraction with a legacy handler implementation.
  • Routes legacy taxonomy endpoints (data_category, data_use, data_subject) through the new TaxonomyService to unify validation and reactivation behavior.
  • Comprehensive tests for the new models, service behavior, and constraints.

Potential caveats:

  • Removing an applies_to target that’s currently referenced by taxonomy_usage will fail due to FK constraints; usages must be removed first.
  • Deleting a parent TaxonomyElement with existing children is restricted at the DB level; delete or move children first.
  • The service layer currently supports legacy taxonomies only for API operations; CRUD for fully custom taxonomies/elements/usages is not yet exposed via public endpoints.

Code Changes

  • New models:
    • src/fides/api/models/taxonomy.py (adds Taxonomy, TaxonomyAllowedUsage, TaxonomyElement, TaxonomyUsage)
  • Migration:
    • src/fides/api/alembic/migrations/versions/3baf42d251a6_add_generic_taxonomy_models.py
  • API changes:
    • src/fides/api/api/v1/endpoints/generic_overrides.py (create/update for data uses/categories/subjects now use TaxonomyService; legacy inline validation removed)
  • DI and model registration:
    • src/fides/api/service/deps.py (new get_taxonomy_service)
    • src/fides/api/db/base.py (imports new models)
  • New service layer:
    • src/fides/service/taxonomy/__init__.py
    • src/fides/service/taxonomy/handlers/base.py
    • src/fides/service/taxonomy/handlers/legacy_handler.py
    • src/fides/service/taxonomy/taxonomy_service.py
    • src/fides/service/taxonomy/utils.py
  • Dataset updates:
    • .fides/db_dataset.yml (alignments for taxonomy-related filtering)
  • Tests:
    • tests/ops/models/test_taxonomy.py
    • tests/service/test_taxonomy_service.py

Steps to Confirm

  1. Run targeted tests for models and service:
    • docker exec fides bash -c "pytest -q tests/ops/models/test_taxonomy.py"
    • docker exec fides bash -c "pytest -q tests/service/test_taxonomy_service.py"
  2. Exercise legacy taxonomy endpoints (requires backend running and auth):
    • POST /api/v1/data_category with a new category; expect 201 and generated/validated key behavior.
    • PUT /api/v1/data_category to toggle active and verify parent/child cascade semantics via subsequent GETs.
    • Repeat for /api/v1/data_use and /api/v1/data_subject.
  3. Database checks after migration:
    • Verify tables exist: taxonomy, taxonomy_allowed_usage, taxonomy_element, taxonomy_usage.
    • Confirm seeded rows exist for the legacy taxonomies in taxonomy.
  4. Confirm FK behavior:
    • Attempt to remove an applies_to target that’s in use (should fail).
    • Attempt to delete a TaxonomyElement with children (should be restricted).

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a db-migration label to the entry
    • Add a high-risk label if you believe the FK constraints or activation/deactivation logic could have performance/regression risk
    • Updates unreleased work already in Changelog, no new entry necessary
  • Followup issues:
    • Followup issues created
    • No followup issues
      • Suggested follow-ups:
        • Expose CRUD endpoints for custom taxonomies/elements/usages
        • Document new tables, relationships, and service behaviors
        • UI support for managing custom taxonomies
  • Database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • Note: Downgrade drops all four new tables and seeded data
    • No migrations
  • Documentation:

Copy link

vercel bot commented Aug 5, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments
Project Deployment Preview Comments Updated (UTC)
fides-plus-nightly Ignored Ignored Preview Aug 19, 2025 9:45pm
fides-privacy-center Ignored Ignored Aug 19, 2025 9:45pm

greptile-apps[bot]

This comment was marked as outdated.

Copy link

codecov bot commented Aug 5, 2025

Codecov Report

❌ Patch coverage is 93.40278% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.65%. Comparing base (5a6c9fd) to head (daa6282).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...rc/fides/api/api/v1/endpoints/generic_overrides.py 75.00% 4 Missing and 3 partials ⚠️
src/fides/service/taxonomy/handlers/base.py 76.92% 6 Missing ⚠️
src/fides/service/taxonomy/utils.py 90.90% 2 Missing and 3 partials ⚠️
src/fides/api/models/taxonomy.py 98.70% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6421      +/-   ##
==========================================
+ Coverage   87.61%   87.65%   +0.04%     
==========================================
  Files         459      465       +6     
  Lines       29494    29723     +229     
  Branches     3280     3303      +23     
==========================================
+ Hits        25841    26054     +213     
- Misses       2948     2960      +12     
- Partials      705      709       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@galvana
Copy link
Contributor Author

galvana commented Aug 7, 2025

@greptileai

greptile-apps[bot]

This comment was marked as outdated.

@galvana
Copy link
Contributor Author

galvana commented Aug 9, 2025

@greptileai

greptile-apps[bot]

This comment was marked as outdated.

@galvana galvana requested a review from erosselli August 12, 2025 00:09
@@ -354,133 +346,7 @@ def clean_datasets(
)


def activate_taxonomy_parents(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for moving all these to a service class! 😌

return validate_and_create_taxonomy(db, model, validation_schema, data)

return validate_and_create_taxonomy(db, model, validation_schema, data)
# Legacy validation functions removed - functionality moved to TaxonomyService
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this comment -- it reads like cursor explaining its changes lol

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You right 😄



class Taxonomy(Base, FidesBase):
"""The SQL model for taxonomy resources."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we maybe add some info to the docstring about how the out-of-the-box taxonomies relate to this model? so that new devs (or us in 2 years lol) running into this code don't have to search around for an explanation

Comment on lines 41 to 43
id = Column(
String(255), nullable=False, index=False, default=FidesBase.generate_uuid
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usually we inherit the id column from FidesBase right? is there no issue with re-defining the id column?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also don't we need a name column?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine for the id column, I'm just overriding the base definition so it's not treated as a primary key. As for name, it's inherited from FidesBase.

Copy link
Contributor

@erosselli erosselli Aug 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay I totally missed that on my first read. So what's the primary key of this table?

Comment on lines 151 to 154
__table_args__ = (
# Simple two-column index for queries
Index("ix_allowed_usage_lookup", "source_taxonomy_key", "target_type"),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need this index given that you've defined (source_taxonomy_key, target_type) as a composite PK with index=True ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! I got rid of the extra index definition

Comment on lines 17 to 20
@abstractmethod
def get_model(self, taxonomy_type: str) -> Type:
pass

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be able to type this more strictly; maybe we can define a TaxonomyType union type or a base class that other types inherit from?

Actually what I'd really like to know is if we can actually use overloads to do something like the following,(ab?)using the fact that a string literal is actually a type:

@overloads
def get_model(self, taxonomy_type: Literal['data_category']) -> DataCategory:
@overloads
def get_model(self, taxonomy_type: Literal['data_use'])->DataUse:
.... other overloads

so if you call get_model('data_category') the type checker will actually know the return type is DataCategory

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idk if this would be done here or below in the concrete class

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice! I didn't know about @overloads in Python

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I too didn't know they existed but whenever I find myself thinking "In Typescript I could do this way cleaner" I go and look up if there's a way to do it in Python 😆 and there was!

Copy link
Contributor Author

@galvana galvana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the recommendations, I think it's a lot cleaner now!

Comment on lines 41 to 43
id = Column(
String(255), nullable=False, index=False, default=FidesBase.generate_uuid
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine for the id column, I'm just overriding the base definition so it's not treated as a primary key. As for name, it's inherited from FidesBase.


# Create TaxonomyAllowedUsage objects for each applies_to value
allowed_usages = [
TaxonomyAllowedUsage( # type: ignore
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was getting this error from mypy

src/fides/api/models/taxonomy.py:111: error: Unexpected column "source_taxonomy_key" for model
"TaxonomyAllowedUsage"  [misc]
                        TaxonomyAllowedUsage(
                        ^
src/fides/api/models/taxonomy.py:111: error: Unexpected column "target_type" for model
"TaxonomyAllowedUsage"  [misc]
                        TaxonomyAllowedUsage(
                        ^

But I just found out you could do this instead (I was missing the Column[str] annotations)

source_taxonomy_key: Column[str] = Column(
    String,
    ForeignKey("taxonomy.fides_key", ondelete="CASCADE"),
    primary_key=True,
    index=True,
)
target_type: Column[str] = Column(
    String, primary_key=True
)  # Can be "system", "dataset", OR a taxonomy key like "data_categories"

Comment on lines 151 to 154
__table_args__ = (
# Simple two-column index for queries
Index("ix_allowed_usage_lookup", "source_taxonomy_key", "target_type"),
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! I got rid of the extra index definition

Comment on lines 17 to 20
@abstractmethod
def get_model(self, taxonomy_type: str) -> Type:
pass

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice! I didn't know about @overloads in Python

return validate_and_create_taxonomy(db, model, validation_schema, data)

return validate_and_create_taxonomy(db, model, validation_schema, data)
# Legacy validation functions removed - functionality moved to TaxonomyService
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You right 😄

@galvana galvana requested a review from erosselli August 18, 2025 05:27
@galvana galvana merged commit 047dc43 into main Aug 19, 2025
17 checks passed
@galvana galvana deleted the ENG-1073-db-models-for-custom-taxonomies branch August 19, 2025 21:51
Copy link

cypress bot commented Aug 19, 2025

fides    Run #13252

Run Properties:  status check passed Passed #13252  •  git commit 047dc431c1: DB models for custom taxonomies (#6421)
Project fides
Branch Review main
Run status status check passed Passed #13252
Run duration 00m 54s
Commit git commit 047dc431c1: DB models for custom taxonomies (#6421)
Committer Adrian Galvan
View all properties for this run ↗︎

Test results
Tests that failed  Failures 0
Tests that were flaky  Flaky 0
Tests that did not run due to a developer annotating a test with .skip  Pending 0
Tests that did not run due to a failure in a mocha hook  Skipped 0
Tests that passed  Passing 5
View all changes introduced in this branch ↗︎

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants