Skip to content

.extract() doesn't set foreign key when extracted columns contain NULL value #423

@jlieth

Description

@jlieth

I've run into an issue with extract and I don't believe this is the intended behaviour.

I'm working with a database with music listening information. Currently it has one large table listens that contains all information. I'm trying to normalize the database by extracting relevant columns to separate tables (artists, tracks, albums). Not every track has an album.

A simplified demonstration with just track_title and album_title columns:

In [1]: import sqlite_utils

In [2]: db = sqlite_utils.Database(memory=True)

In [3]: db["listens"].insert_all([
   ...:     {"id": 1, "track_title": "foo", "album_title": "bar"},
   ...:     {"id": 2, "track_title": "baz", "album_title": None}
   ...: ], pk="id")
Out[3]: <Table listens (id, track_title, album_title)>

The track in the first row has an album, the second track doesn't. Now I extract album information into a separate column:

In [4]: db["listens"].extract(columns=["album_title"], table="albums", fk_column="album_id")
Out[4]: <Table listens (id, track_title, album_id)>

In [5]: list(db["albums"].rows)
Out[5]: [{'id': 1, 'album_title': 'bar'}, {'id': 2, 'album_title': None}]

In [6]: list(db["listens"].rows)
Out[6]: 
[{'id': 1, 'track_title': 'foo', 'album_id': 1},
 {'id': 2, 'track_title': 'baz', 'album_id': None}]

This behaves as expected -- the album table contains entries for both the existing album and the NULL album. The listens table has a foreign key only for the first row (since the album in the second row was empty).

Now I want to extract the track information as well. Album information belongs to the track so I want to extract both columns to a new table.

In [7]: db["listens"].extract(columns=["track_title", "album_id"], table="tracks", fk_column="track_id")
Out[7]: <Table listens (id, track_id)>

In [8]: list(db["tracks"].rows)
Out[8]: 
[{'id': 1, 'track_title': 'foo', 'album_id': 1},
 {'id': 2, 'track_title': 'baz', 'album_id': None}]

In [9]: list(db["listens"].rows)
Out[9]: [{'id': 1, 'track_id': 1}, {'id': 2, 'track_id': None}]

Extracting to the tracks table worked fine (both tracks are present with correct columns). However, the listens table only has a foreign key to the newly created tracks for the first row, the foreign key in the second row is NULL.

Changing the order of extracts doesn't help.

I poked around in the source a bit and I believe this line (essentially comparing NULL = NULL) is the problem, but I don't know enough about SQL to create a reliable fix myself.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions