Skip to content

performance of Binary COPY is not so good #5513

@pantonis

Description

@pantonis

Steps to reproduce

private static async Task BinaryImportAsync<T>(DbContext dbContext,
                                               List<T> data,
                                               string tableName) where T : class
{
    var connection = dbContext.Database.GetDbConnection() as NpgsqlConnection;
    if (connection == null)
        throw new InvalidOperationException("Connection must be an NpgsqlConnection");

    if (connection.State != System.Data.ConnectionState.Open)
        await connection.OpenAsync();

    var entityType = dbContext.Model.FindEntityType(typeof(T));

    var columns = entityType.GetProperties()
        .OrderBy(p => p.GetColumnName())
        .ToList();

    var columnNames = columns.Select(p => $"\"{p.GetColumnName()}\"").ToList();
    string columnsQuery = string.Join(", ", columnNames);

    await using (var writer = await connection.BeginBinaryImportAsync($"COPY {tableName} " +
                                 $"({columnsQuery}) FROM STDIN (FORMAT BINARY)"))
    {
        if (writer == null)
            throw new InvalidOperationException("Connection must be an NpgsqlConnection");

        foreach (var item in data)
        {
            await writer.StartRowAsync();
            foreach (var prop in columns)
            {
                var value = prop.PropertyInfo.GetValue(item);

                // Handle nulls
                if (value == null)
                {
                    await writer.WriteNullAsync();
                }
                else
                {
                    // Handle enums (write as string or integer)
                    if (prop.PropertyInfo.PropertyType.IsEnum)
                        value = (int)value;

                    await writer.WriteAsync(value);
                }
            }
        }

        await writer.CompleteAsync();
    }
}

The issue

Writing 15000 rows in an empty table with 15 columns takes about 8 seconds to save in PostgreSQL. Since I'm new in PostgreSQL ecosystem I was wondering if this is a normal time or it is considered slow? (DB Hardware 6 CPUs with 16GB of RAM)

Further technical details

Npgsql version: 7.0.6
PostgreSQL version: "PostgreSQL 16.1 (Ubuntu 16.1-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit"
Operating system: ubuntu 20.04.2

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions