Description
Hello team,
For testing, I followed the example 14_add_reviews_table.json to create a reviews table and then 17_add_rating_column.json to add a rating column. I initially inserted 50k dummy rows into this table.

To test the alter-column operation, I then started the 18_change_column_type.json migration and ran pgbench in parallel as follows:

```
pgbench -f insert_script.sql -c 1 -T 2147483647 postgres
```
insert_script.sql:

```sql
-- Each run inserts 10,000 rows; rating is a random value from 0 to 5, cast to text.
INSERT INTO public.reviews (username, product, review, rating)
SELECT
    'user_' || generate_series(1, 10000),
    'Product_' || generate_series(1, 10000),
    'This is a dummy review for Product_' || generate_series(1, 10000),
    (random() * 5)::integer::text;
```
Since the table is receiving a high rate of INSERT queries from pgbench, the backfill procedure never finishes. Although only 50k rows were present initially, the backfill kept going over all of the newer rows inserted by pgbench. Since the trigger is created before the backfill starts, I think it is redundant to backfill rows inserted after the trigger was created, because those rows are already handled by the trigger. Below is a console screenshot where the backfill had processed 6 million+ rows created by pgbench and was still running indefinitely.
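To illustrate what I have in mind, here is a minimal sketch of a bounded backfill. It assumes an auto-incrementing id primary key and a hypothetical shadow column _new_rating that the trigger keeps up to date for new rows; it is not the tool's actual implementation, just the idea of capturing an upper bound before backfilling:

```sql
-- Capture the highest id once, before the backfill starts.
-- Rows inserted after this point are already handled by the trigger.
SELECT max(id) AS backfill_upper_bound FROM public.reviews;  -- e.g. 50000

-- Backfill in batches, but never past the captured bound,
-- so the loop terminates even while pgbench keeps inserting.
UPDATE public.reviews
SET    _new_rating = rating::integer   -- hypothetical shadow column
WHERE  id BETWEEN 1 AND 10000          -- current batch window
AND    id <= 50000;                    -- captured upper bound
```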
In a large-scale production environment, we can expect a high volume of INSERT queries, because multiple connections send INSERTs in parallel. In such a case, what should be done to avoid this unbounded backfill problem? Should we block INSERT queries by manually acquiring some LOCK until the backfill is completed (see the sketch below)? If so, does that violate the concept of zero downtime?
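For concreteness, this is roughly what I mean by manually acquiring a lock (just an illustration, not something I am proposing): SHARE ROW EXCLUSIVE mode lets reads through but blocks concurrent INSERT/UPDATE/DELETE for as long as the transaction holding it stays open, which seems to be exactly the write downtime the migration is trying to avoid.

```sql
BEGIN;
-- Blocks concurrent writes (INSERT/UPDATE/DELETE) while allowing reads.
LOCK TABLE public.reviews IN SHARE ROW EXCLUSIVE MODE;
-- ... run the backfill here ...
COMMIT;  -- writes from other sessions resume only after commit
```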