This repository was archived by the owner on Apr 26, 2024. It is now read-only.

psycopg2.errors.UniqueViolation: could not create unique index "receipts_graph_unique_index" when upgrading from <1.68.0 to >=1.70.0 #14406

@squahtx

Description

Seen as #14123

When upgrading Synapse from a version older than 1.68.0 to 1.70.0 or newer, the receipts_graph_unique_index background update may fail with:

2022-10-10 16:50:14,874 - synapse.storage.background_updates - 428 - INFO - background_updates-0- Starting update batch on background update 'receipts_graph_unique_index'
2022-10-10 16:50:14,877 - synapse.storage.background_updates - 620 - INFO - background_updates-0- Adding index receipts_graph_unique_index to receipts_graph
2022-10-10 16:50:14,949 - synapse.storage.background_updates - 299 - ERROR - background_updates-0- Error doing update
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/synapse/storage/background_updates.py", line 291, in run_background_updates
    result = await self.do_next_background_update(sleep)
  File "/usr/local/lib/python3.9/site-packages/synapse/storage/background_updates.py", line 421, in do_next_background_update
    await self._do_background_update(desired_duration_ms)
  File "/usr/local/lib/python3.9/site-packages/synapse/storage/background_updates.py", line 464, in _do_background_update
    items_updated = await update_handler(progress, batch_size)
  File "/usr/local/lib/python3.9/site-packages/synapse/storage/background_updates.py", line 621, in updater
    await self.db_pool.runWithConnection(runner)
  File "/usr/local/lib/python3.9/site-packages/synapse/storage/database.py", line 976, in runWithConnection
    return await make_deferred_yieldable(
  File "/usr/local/lib/python3.9/site-packages/twisted/python/threadpool.py", line 244, in inContext
    result = inContext.theWork()  # type: ignore[attr-defined]
  File "/usr/local/lib/python3.9/site-packages/twisted/python/threadpool.py", line 260, in <lambda>
    inContext.theWork = lambda: context.call(  # type: ignore[attr-defined]
  File "/usr/local/lib/python3.9/site-packages/twisted/python/context.py", line 117, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/local/lib/python3.9/site-packages/twisted/python/context.py", line 82, in callWithContext
    return func(*args, **kw)
  File "/usr/local/lib/python3.9/site-packages/twisted/enterprise/adbapi.py", line 282, in _runWithConnection
    result = func(conn, *args, **kw)
  File "/usr/local/lib/python3.9/site-packages/synapse/storage/database.py", line 969, in inner_func
    return func(db_conn, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/synapse/storage/background_updates.py", line 572, in create_index_psql
    c.execute(sql)
  File "/usr/local/lib/python3.9/site-packages/synapse/storage/database.py", line 389, in execute
    self._do_execute(self.txn.execute, sql, *args)
  File "/usr/local/lib/python3.9/site-packages/synapse/storage/database.py", line 432, in _do_execute
    return func(sql, *args, **kwargs)
psycopg2.errors.UniqueViolation: could not create unique index "receipts_graph_unique_index"
DETAIL:  Key (room_id, receipt_type, user_id)=(!watercooler-v9:maunium.net, m.read, @cat:feline.support) is duplicated.

The series of schema deltas involved are:

21/receipts.sql

  • Creates the receipts_graph(room_id, receipt_type, user_id, event_ids, data) table with CONSTRAINT receipts_graph_uniqueness UNIQUE (room_id, receipt_type, user_id)
    i.e. there is one receipt for each user for each room.

72/07thread_receipts.sql.postgres (1.68.0)

  • Adds a thread_id column to receipts_graph
  • Adds a per-thread unique constraint: CONSTRAINT receipts_graph_uniqueness_thread UNIQUE (room_id, receipt_type, user_id, thread_id);
    i.e. there is one receipt for each user for each thread in a room.
  • At this point, per-thread receipts don't work yet, because the original (room_id, receipt_type, user_id) constraint is too restrictive.

72/08thread_receipts.sql (1.68.0)

  • Queues up the receipts_graph_unique_index background update, which adds a (room_id, receipt_type, user_id) constraint WHERE thread_id IS NULL.
    i.e. there is one non-thread receipt for each user for each room.

73/08thread_receipts_non_null.sql.postgres (1.70.0)

  • Drops the original, non-thread-aware, receipts_graph_uniqueness constraint, allowing thread receipts to work.

SQLite takes a similar, equally confusing path.

The window where there is no unique constraint

Since background updates are run asynchronously, the receipts_graph_unique_index background update may run after the last schema delta, leaving a window where there is no unique constraint on (room_id, receipt_type, user_id) for NULL thread_ids.
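Both halves of the problem are easy to reproduce in isolation. The sketch below uses a hypothetical, cut-down schema (not Synapse's real DDL) and Python's sqlite3 module, which also supports partial unique indexes: without the index, duplicate non-thread receipt rows are silently accepted, and once duplicates exist, creating the index fails exactly as in the traceback above.

```python
# Hypothetical cut-down schema, not Synapse's real DDL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE receipts_graph ("
    " room_id TEXT, receipt_type TEXT, user_id TEXT,"
    " thread_id TEXT, event_ids TEXT, data TEXT)"
)

# With no unique index covering thread_id IS NULL rows, a duplicate
# non-thread receipt is silently accepted...
row = ("!room:example.org", "m.read", "@alice:example.org", None, "[]", "{}")
conn.execute("INSERT INTO receipts_graph VALUES (?, ?, ?, ?, ?, ?)", row)
conn.execute("INSERT INTO receipts_graph VALUES (?, ?, ?, ?, ?, ?)", row)

# ...and once duplicates exist, creating the partial unique index fails,
# mirroring the UniqueViolation seen in the traceback.
try:
    conn.execute(
        "CREATE UNIQUE INDEX receipts_graph_unique_index"
        " ON receipts_graph (room_id, receipt_type, user_id)"
        " WHERE thread_id IS NULL"
    )
except sqlite3.IntegrityError as e:
    print("index creation failed:", e)
```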

Unsafe upserts

But that isn't the bug. We have logic to deal with this window. See:

UNIQUE_INDEX_BACKGROUND_UPDATES = {
    "user_ips": "user_ips_device_unique_index",
    "device_lists_remote_extremeties": "device_lists_remote_extremeties_unique_idx",
    "device_lists_remote_cache": "device_lists_remote_cache_unique_idx",
    "event_search": "event_search_event_id_idx",
    "local_media_repository_thumbnails": "local_media_repository_thumbnails_method_idx",
    "remote_media_cache_thumbnails": "remote_media_repository_thumbnails_method_idx",
    "event_push_summary": "event_push_summary_unique_index2",
    "receipts_linearized": "receipts_linearized_unique_index",
    "receipts_graph": "receipts_graph_unique_index",
}
When one of these background updates is in progress, all our simple_upsert* operations are done manually without relying on unique constraints.
And we don't upsert into receipts_graph with handwritten SQL anywhere.

Emulated upsert internals

The emulated upsert first tries an UPDATE, then an INSERT if the UPDATE modified 0 rows.
The default isolation level in Synapse is REPEATABLE READ, which does not prevent the race where two upserts try to insert the same row at the same time.
But we've already thought of this and lock the entire table when doing the emulated upsert:

if lock:
    self.engine.lock_table(txn, table)
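The emulated path has roughly the following shape (a simplified illustrative sketch of the UPDATE-then-INSERT pattern described above, not the real simple_upsert_txn; parameter handling is reduced to the essentials):

```python
# Simplified sketch of an emulated upsert; not Synapse's actual
# simple_upsert_txn, just the UPDATE-then-INSERT shape it uses.
def emulated_upsert(txn, table, keyvalues, values):
    set_clause = ", ".join(f"{k} = ?" for k in values)
    where_clause = " AND ".join(f"{k} = ?" for k in keyvalues)
    txn.execute(
        f"UPDATE {table} SET {set_clause} WHERE {where_clause}",
        list(values.values()) + list(keyvalues.values()),
    )
    if txn.rowcount > 0:
        # An existing row was updated; nothing more to do.
        return
    # No row matched. Between this point and the INSERT below, a second
    # transaction can run the same UPDATE, also see 0 rows, and also
    # INSERT: without a unique constraint (or a table lock), both
    # inserts succeed and the row is duplicated.
    columns = list(keyvalues) + list(values)
    params = list(keyvalues.values()) + list(values.values())
    txn.execute(
        f"INSERT INTO {table} ({', '.join(columns)})"
        f" VALUES ({', '.join('?' for _ in params)})",
        params,
    )
```

Run twice with the same keyvalues on a single connection, the second call takes the UPDATE branch; the bug needs two concurrent transactions racing through the INSERT branch.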

Except the locking is controlled by a parameter... and we've left it as False:

self.db_pool.simple_upsert_txn(
    txn,
    table="receipts_graph",
    keyvalues=keyvalues,
    values={
        "event_ids": json_encoder.encode(event_ids),
        "data": json_encoder.encode(data),
    },
    where_clause=where_clause,
    # receipts_graph has a unique constraint on
    # (user_id, room_id, receipt_type), so no need to lock
    lock=False,
)


In summary: during the upgrade there's a window where receipts_graph has no unique constraint covering non-thread receipts, and a race where two emulated upserts both insert a new row at the same time for the same (room_id, receipt_type, user_id).

The same probably applies to receipts_linearized.

Labels

  • A-Background-Updates: Filling in database columns, making the database eventually up-to-date
  • A-Database: DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db
  • O-Occasional: Affects or can be seen by some users regularly or most users rarely
  • S-Minor: Blocks non-critical functionality, workarounds exist.
  • T-Defect: Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
  • X-Release-Blocker: Must be resolved before making a release
