Backfill process never completes during periods of high INSERT query rates #583

@santhosh-programmer

Description

Hello team,

For testing, I ran the example 14_add_reviews_table.json to create a reviews table and then 17_add_rating_column.json to add a rating column.

I initially inserted 50k dummy rows into this table.

Then, to test the alter-column operation, I started the 18_change_column_type.json migration and ran pgbench in parallel as follows:

```
pgbench -f insert_script.sql -c 1 -T 2147483647 postgres
```

insert_script.sql:

```sql
INSERT INTO public.reviews (username, product, review, rating)
SELECT
    'user_' || generate_series(1, 10000),
    'Product_' || generate_series(1, 10000),
    'This is a dummy review for Product_' || generate_series(1, 10000),
    (random() * 5)::integer::text;
```
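For context: each execution of insert_script.sql inserts 10,000 rows in a single statement (the three generate_series calls expand together), so with pgbench running continuously the table grows by millions of rows within a few minutes.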

Because pgbench keeps the table under a high INSERT rate, the backfill procedure never finishes. Although the table initially contained only 50k rows, the backfill kept processing all the newer rows inserted by pgbench.

Since the trigger is created before the backfill starts, I think it is redundant to backfill rows inserted after the trigger was created: those rows are already handled by the trigger itself. Below is a console screenshot where the backfill has continued past 6 million+ rows created by pgbench and is effectively running forever.

(screenshot: backfill progress output showing 6 million+ rows processed and still running)
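
To make the suggestion concrete, here is a rough sketch of what I have in mind. This is purely illustrative and not pgroll's actual implementation; I'm assuming the backfill walks the table in primary-key batches, that reviews has a monotonically increasing integer id, and that the backfill works by touching rows so the conversion trigger rewrites them:

```sql
-- Purely illustrative, not pgroll's actual code.
-- Assumes: a monotonically increasing integer primary key "id", and that the
-- conversion trigger on public.reviews is already installed at this point.

-- Capture the highest key that existed when the trigger was created.
-- Anything inserted after this is already converted by the trigger itself.
SELECT max(id) FROM public.reviews;          -- say this returns 50000

-- Each backfill batch then stays within that snapshot, e.g. for a batch
-- starting at 40000 with a batch size of 10000:
UPDATE public.reviews
SET    id = id                               -- no-op update so the trigger fires
WHERE  id > 40000
  AND  id <= LEAST(40000 + 10000, 50000);    -- never go past the captured bound
```

With the upper bound frozen up front, the backfill finishes after roughly initial-row-count / batch-size batches, no matter how fast pgbench keeps inserting.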

In a large-scale production environment we can expect a high volume of INSERT queries, because many connections send INSERTs in parallel. In that case, what should be done to avoid this never-ending backfill? Should we block INSERT queries by manually acquiring some lock until the backfill is completed? If so, doesn't that violate the idea of zero downtime?
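
By "manually acquiring some lock" I mean something along these lines (a hypothetical workaround, not something I actually want to ship):

```sql
-- Hypothetical workaround only:
BEGIN;
LOCK TABLE public.reviews IN SHARE MODE;     -- conflicts with INSERT/UPDATE/DELETE
-- ... wait here until the backfill reports completion ...
COMMIT;
-- Besides pausing every writer (which defeats zero downtime), I suspect this
-- would also block the backfill's own UPDATEs if it runs in another session.
```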
