Backfill process never completes during periods of high INSERT query rates #583

@santhosh-programmer

Description

Hello team,

For testing, I ran the example 14_add_reviews_table.json to create a reviews table and then 17_add_rating_column.json to add a rating column.

I initially inserted 50k dummy rows into this table.

Then, to test the alter-column operation, I started the 18_change_column_type.json migration and ran pgbench in parallel as follows:

```
pgbench -f insert_script.sql -c 1 -T 2147483647 postgres
```

insert_script.sql:

```sql
INSERT INTO public.reviews (username, product, review, rating)
SELECT
    'user_' || generate_series(1, 10000),
    'Product_' || generate_series(1, 10000),
    'This is a dummy review for Product_' || generate_series(1, 10000),
    (random() * 5)::integer::text;
```
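For context: each execution of insert_script.sql inserts 10,000 rows in a single statement (the three generate_series calls expand together), so with pgbench running continuously the table grows by millions of rows within a few minutes.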

Because pgbench keeps the table under a high INSERT rate, the backfill procedure never finishes. Although the table initially contained only 50k rows, the backfill kept processing all the newer rows inserted by pgbench.

Since the trigger is created before the backfill starts, I think it is redundant to backfill rows inserted after the trigger was created: those rows are already handled by the trigger itself. Below is a console screenshot where the backfill has continued past 6 million+ rows created by pgbench and is effectively running forever.

(screenshot: backfill progress output showing 6 million+ rows processed and still running)
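
To make the suggestion concrete, here is a rough sketch of what I have in mind. This is purely illustrative and not pgroll's actual implementation; I'm assuming the backfill walks the table in primary-key batches, that reviews has a monotonically increasing integer id, and that the backfill works by touching rows so the conversion trigger rewrites them:

```sql
-- Purely illustrative, not pgroll's actual code.
-- Assumes: a monotonically increasing integer primary key "id", and that the
-- conversion trigger on public.reviews is already installed at this point.

-- Capture the highest key that existed when the trigger was created.
-- Anything inserted after this is already converted by the trigger itself.
SELECT max(id) FROM public.reviews;          -- say this returns 50000

-- Each backfill batch then stays within that snapshot, e.g. for a batch
-- starting at 40000 with a batch size of 10000:
UPDATE public.reviews
SET    id = id                               -- no-op update so the trigger fires
WHERE  id > 40000
  AND  id <= LEAST(40000 + 10000, 50000);    -- never go past the captured bound
```

With the upper bound frozen up front, the backfill finishes after roughly initial-row-count / batch-size batches, no matter how fast pgbench keeps inserting.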

In a large-scale production environment we can expect a high volume of INSERT queries, because many connections send INSERTs in parallel. In that case, what should be done to avoid this never-ending backfill? Should we block INSERT queries by manually acquiring some lock until the backfill is completed? If so, doesn't that violate the idea of zero downtime?
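
By "manually acquiring some lock" I mean something along these lines (a hypothetical workaround, not something I actually want to ship):

```sql
-- Hypothetical workaround only:
BEGIN;
LOCK TABLE public.reviews IN SHARE MODE;     -- conflicts with INSERT/UPDATE/DELETE
-- ... wait here until the backfill reports completion ...
COMMIT;
-- Besides pausing every writer (which defeats zero downtime), I suspect this
-- would also block the backfill's own UPDATEs if it runs in another session.
```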
