-
-
Notifications
You must be signed in to change notification settings - Fork 121
Description
I've run into an issue with extract
and I don't believe this is the intended behaviour.
I'm working with a database with music listening information. Currently it has one large table listens
that contains all information. I'm trying to normalize the database by extracting relevant columns to separate tables (artists
, tracks
, albums
). Not every track has an album.
A simplified demonstration with just track_title
and album_title
columns:
In [1]: import sqlite_utils
In [2]: db = sqlite_utils.Database(memory=True)
In [3]: db["listens"].insert_all([
...: {"id": 1, "track_title": "foo", "album_title": "bar"},
...: {"id": 2, "track_title": "baz", "album_title": None}
...: ], pk="id")
Out[3]: <Table listens (id, track_title, album_title)>
The track in the first row has an album, the second track doesn't. Now I extract album information into a separate column:
In [4]: db["listens"].extract(columns=["album_title"], table="albums", fk_column="album_id")
Out[4]: <Table listens (id, track_title, album_id)>
In [5]: list(db["albums"].rows)
Out[5]: [{'id': 1, 'album_title': 'bar'}, {'id': 2, 'album_title': None}]
In [6]: list(db["listens"].rows)
Out[6]:
[{'id': 1, 'track_title': 'foo', 'album_id': 1},
{'id': 2, 'track_title': 'baz', 'album_id': None}]
This behaves as expected -- the album
table contains entries for both the existing album and the NULL album. The listens
table has a foreign key only for the first row (since the album in the second row was empty).
Now I want to extract the track information as well. Album information belongs to the track so I want to extract both columns to a new table.
In [7]: db["listens"].extract(columns=["track_title", "album_id"], table="tracks", fk_column="track_id")
Out[7]: <Table listens (id, track_id)>
In [8]: list(db["tracks"].rows)
Out[8]:
[{'id': 1, 'track_title': 'foo', 'album_id': 1},
{'id': 2, 'track_title': 'baz', 'album_id': None}]
In [9]: list(db["listens"].rows)
Out[9]: [{'id': 1, 'track_id': 1}, {'id': 2, 'track_id': None}]
Extracting to the tracks
table worked fine (both tracks are present with correct columns). However, the listens
table only has a foreign key to the newly created tracks for the first row, the foreign key in the second row is NULL.
Changing the order of extracts doesn't help.
I poked around in the source a bit and I believe this line (essentially comparing NULL = NULL
) is the problem, but I don't know enough about SQL to create a reliable fix myself.