[BUG] PostgreQueueDAO.popMessage may pop messages from incorrect queues and return more messages than expected #478
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Pull Request type
NOTE: Please remember to run
./gradlew spotlessApply
to fix any format violations.Changes in this PR
This PR fixes a bug in the PostgreQueueDAO.popMessage() method, where the SQL query does not limit the update to the provided queue_name, which can result in unintentionally popping messages from other queues and returning too many rows.
The issue stems from the SQL used: it updates all messages matching the message_ids returned from a subquery, but does not check the associated queue_name. Since the table uses a composite primary key of (queue_name, message_id), both fields must be provided to uniquely identify a message.
To reproduce the bug:
Note
The
queue_name
name in theRETURNING
is not part of the original query it was added in the sample here to show the issue.This query may return rows like:
"queue_name" "message_id" "priority"
"_callbackFinalizeQueue" "98936225-c0d7-4963-a140-550d0ee4d57f" 0
"_callbackFailureQueue" "856b1b6f-e6ea-4dc7-bd91-6248e5ef4ae9" 0
"_callbackFinalizeQueue" "856b1b6f-e6ea-4dc7-bd91-6248e5ef4ae9" 0
"_callbackFailureQueue" "98936225-c0d7-4963-a140-550d0ee4d57f" 0
We intended to pop messages only from _callbackFailureQueue, but ended up updating rows from multiple queues due to ambiguous use of message_id.
Fixed query (with CTE):
Note
The
queue_name
name in theRETURNING
is not part of the original query it was added in the sample here to show the issue is fixed.This query correctly returns:
"queue_name" "message_id" "priority"
"_callbackFailureQueue" "856b1b6f-e6ea-4dc7-bd91-6248e5ef4ae9" 0
"_callbackFailureQueue" "98936225-c0d7-4963-a140-550d0ee4d57f" 0
Only the intended messages in the specified queue are updated.
Another advantage of using the CTE is that it allows PostgreSQL to optimize the query plan more effectively as the dataset grows.
This new CTE-based approach also ensures that a message hasn't been popped by another process while the CTE was running, preventing unnecessary updates in high-concurrency scenarios.
Issue #369