
I Benchmarked 9 Ways to Insert Data Into PostgreSQL From Java. COPY BINARY Won by a Landslide.


TL;DR: We benchmarked 9 different methods of inserting data into PostgreSQL from Java, from naive individual INSERTs to DuckDB-via-Arrow pipelines. COPY BINARY wins decisively at scale: 712K rows/sec for 1M rows — 2x faster than COPY CSV, 4.4x faster than JDBC batch, and 18x faster than individual INSERTs. The sweet spot for most applications is reWriteBatchedInserts=true (one URL parameter, ~1.8x speedup over naive batching). For high-volume ingestion, COPY BINARY via PgBulkInsert is unbeatable.

We built a benchmark that simulates an IoT sensor ingestion pipeline — a table with 9 columns covering the interesting PostgreSQL types: UUID, TIMESTAMPTZ, DOUBLE PRECISION, NUMERIC, SMALLINT, BOOLEAN, TEXT, and TEXT[]. One million rows of randomized but deterministic sensor readings (temperature, humidity, pressure, battery level, anomaly flags, location tags).

Each method was warmed up (2 runs), then measured (5 runs, median taken). After every insert, we verified both row count and spot-checked the first and last 10 rows against source data. PostgreSQL 18 via TestContainers. Java 25.

All benchmark code is open source: examples/java-postgres-benchmark.

Here’s what we tested, roughly in order of complexity:

1. Individual INSERT

One executeUpdate() per row, wrapped in a single transaction. This is what most beginners write, and what many ORMs produce. (code)
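A minimal sketch of this pattern, assuming a simplified three-column readings table (id uuid, ts timestamptz, temperature double precision) rather than the benchmark's full 9-column schema:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.UUID;

// Method 1: one executeUpdate() per row inside a single transaction.
public class IndividualInsert {
    static final String SQL = "INSERT INTO readings (id, ts, temperature) VALUES (?, ?, ?)";

    static void insertAll(Connection conn, int rows) throws SQLException {
        conn.setAutoCommit(false); // one transaction around all rows
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            for (int i = 0; i < rows; i++) {
                ps.setObject(1, UUID.randomUUID());
                ps.setObject(2, java.time.OffsetDateTime.now());
                ps.setDouble(3, 20.0 + i % 10);
                ps.executeUpdate(); // one network round trip per row
            }
        }
        conn.commit();
    }
}
```

Every row costs a full client-server round trip, which is why this method trails everything else at every batch size.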

2. Batch INSERT

Standard JDBC batching. Accumulate rows with addBatch(), flush with executeBatch(). The driver sends them as individual INSERT statements but pipelines the network calls. (code)
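Sketched against the same hypothetical simplified table, with an assumed flush size of 1,000 rows:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.UUID;

// Method 2: plain JDBC batching with periodic flushes.
public class BatchInsert {
    static final String SQL = "INSERT INTO readings (id, ts, temperature) VALUES (?, ?, ?)";
    static final int BATCH_SIZE = 1_000;

    static void insertAll(Connection conn, int rows) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            for (int i = 0; i < rows; i++) {
                ps.setObject(1, UUID.randomUUID());
                ps.setObject(2, java.time.OffsetDateTime.now());
                ps.setDouble(3, 20.0 + i % 10);
                ps.addBatch();               // accumulate locally
                if ((i + 1) % BATCH_SIZE == 0) {
                    ps.executeBatch();       // flush pipelined statements
                }
            }
            ps.executeBatch();               // flush the remainder
        }
        conn.commit();
    }
}
```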

3. Batch INSERT with reWriteBatchedInserts=true


Same code as #2, but with one URL parameter added: ?reWriteBatchedInserts=true. The PostgreSQL JDBC driver automatically rewrites individual batched INSERTs into multi-value statements: INSERT INTO t VALUES (...),(...),.... This is the lowest-effort optimization you can make. (code)

4. Multi-value INSERT

Manually build INSERT INTO t VALUES (...),(...),(...) ... with chunks of 1,000 rows. The hand-rolled version of what reWriteBatchedInserts does automatically. (code)
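The chunked statement can be built with a small helper; buildSql here is a hypothetical illustration for the simplified three-column table, not the benchmark's actual code:

```java
// Method 4: hand-rolled multi-value INSERT.
public class MultiValueInsert {
    // 3 params/row keeps a 1,000-row chunk well under the 65,535-parameter cap.
    static final int CHUNK = 1_000;

    // Builds "INSERT INTO readings (...) VALUES (?,?,?),(?,?,?),..." for one chunk.
    static String buildSql(int rowsInChunk) {
        StringBuilder sb = new StringBuilder("INSERT INTO readings (id, ts, temperature) VALUES ");
        for (int r = 0; r < rowsInChunk; r++) {
            if (r > 0) sb.append(',');
            sb.append("(?,?,?)");
        }
        return sb.toString();
    }
}
```

Each chunk is then executed with one PreparedStatement whose parameters are bound in row-major order (3 x 1,000 = 3,000 parameters per statement).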

5. UNNEST

INSERT INTO t SELECT * FROM unnest($1::uuid[], $2::timestamptz[], ...)

Pass one array per column instead of one parameter per cell. Only 9 parameters regardless of row count, so it avoids PostgreSQL’s 65K parameter limit. Reduces query planning overhead. (code, generated)
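A sketch of the array-per-column binding via java.sql.Array, assuming the simplified three-column table (the real benchmark passes 9 arrays):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.UUID;

// Method 5: UNNEST insert. One array parameter per column, not one per cell.
public class UnnestInsert {
    static final String SQL =
        "INSERT INTO readings (id, ts, temperature) " +
        "SELECT * FROM unnest(?::uuid[], ?::timestamptz[], ?::float8[])";

    static void insertAll(Connection conn, UUID[] ids,
                          java.time.OffsetDateTime[] ts, Double[] temps) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setArray(1, conn.createArrayOf("uuid", ids));
            ps.setArray(2, conn.createArrayOf("timestamptz", ts));
            ps.setArray(3, conn.createArrayOf("float8", temps));
            ps.executeUpdate(); // 3 parameters total, regardless of row count
        }
    }
}
```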

6. COPY CSV

PostgreSQL’s COPY ... FROM STDIN WITH (FORMAT CSV) via the JDBC driver’s CopyManager. Build a CSV string in memory, stream it to PostgreSQL. This is the standard “fast path” that most PostgreSQL performance guides recommend. (code)
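A sketch using pgjdbc's CopyManager, again assuming the simplified three-column table; the real benchmark streams all 9 columns:

```java
import java.io.StringReader;
import java.sql.Connection;
import java.util.UUID;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

// Method 6: COPY CSV. Build the CSV payload in memory, stream it in one COPY call.
public class CopyCsvInsert {
    static void insertAll(Connection conn, int rows) throws Exception {
        StringBuilder csv = new StringBuilder();
        for (int i = 0; i < rows; i++) {
            csv.append(UUID.randomUUID()).append(',')
               .append(java.time.OffsetDateTime.now()).append(',')
               .append(20.0 + i % 10).append('\n');
        }
        // CopyManager is pgjdbc's entry point for the COPY protocol.
        CopyManager copy = conn.unwrap(PGConnection.class).getCopyAPI();
        copy.copyIn("COPY readings (id, ts, temperature) FROM STDIN WITH (FORMAT CSV)",
                    new StringReader(csv.toString()));
    }
}
```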

7. COPY BINARY

PostgreSQL’s COPY ... FROM STDIN (FORMAT BINARY) via PgBulkInsert. Instead of text CSV, it writes PostgreSQL’s native binary wire format — no parsing overhead on the server side. (code, generated)
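A sketch of the PgBulkInsert wiring, with a hypothetical Reading record and a reduced column set. The mapper names (mapUUID, mapText, mapDouble) come from PgBulkInsert's AbstractMapping; the benchmark's actual mapping is generated by SQG:

```java
import java.sql.Connection;
import java.util.UUID;
import java.util.stream.Stream;
import de.bytefish.pgbulkinsert.PgBulkInsert;
import de.bytefish.pgbulkinsert.mapping.AbstractMapping;
import de.bytefish.pgbulkinsert.util.PostgreSqlUtils;

// Method 7: COPY BINARY via PgBulkInsert.
public class CopyBinaryInsert {
    record Reading(UUID id, String sensor, double temperature) {}

    // Maps each record field to a column using PostgreSQL's binary wire format.
    static class ReadingMapping extends AbstractMapping<Reading> {
        ReadingMapping() {
            super("public", "readings");
            mapUUID("id", Reading::id);
            mapText("sensor", Reading::sensor);
            mapDouble("temperature", Reading::temperature);
        }
    }

    static void insertAll(Connection conn, Stream<Reading> rows) throws Exception {
        PgBulkInsert<Reading> bulk = new PgBulkInsert<>(new ReadingMapping());
        bulk.saveAll(PostgreSqlUtils.getPGConnection(conn), rows);
    }
}
```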

8. DuckDB -> PostgreSQL

Load data into an in-memory DuckDB table via DuckDB’s Appender API (extremely fast — ~850K rows/sec into DuckDB), then push to PostgreSQL via DuckDB’s postgres extension: INSERT INTO pg.table SELECT * FROM local_table. DuckDB uses COPY BINARY under the hood for the transfer. (code, generated)
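The staging-and-push flow looks roughly like this with DuckDB's JDBC driver; the connection details and two-column staging table are illustrative assumptions:

```java
import java.sql.DriverManager;
import java.sql.Statement;
import org.duckdb.DuckDBAppender;
import org.duckdb.DuckDBConnection;

// Method 8: stage rows in in-memory DuckDB, then push via the postgres extension.
public class DuckDbInsert {
    static void insertAll(int rows) throws Exception {
        try (DuckDBConnection duck =
                 (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:")) {
            try (Statement st = duck.createStatement()) {
                st.execute("CREATE TABLE readings (id VARCHAR, temperature DOUBLE)");
            }
            // Appender: DuckDB's fast columnar ingestion path.
            try (DuckDBAppender app =
                     duck.createAppender(DuckDBConnection.DEFAULT_SCHEMA, "readings")) {
                for (int i = 0; i < rows; i++) {
                    app.beginRow();
                    app.append(java.util.UUID.randomUUID().toString());
                    app.append(20.0 + i % 10);
                    app.endRow();
                }
            }
            // Cross-engine transfer: DuckDB issues COPY BINARY to PostgreSQL.
            try (Statement st = duck.createStatement()) {
                st.execute("INSTALL postgres");
                st.execute("LOAD postgres");
                st.execute("ATTACH 'dbname=bench host=localhost user=postgres' AS pg (TYPE postgres)");
                st.execute("INSERT INTO pg.readings SELECT * FROM readings");
            }
        }
    }
}
```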

9. Arrow -> DuckDB -> PG

Build Apache Arrow columnar vectors in Java, register them with DuckDB as a virtual table (zero-copy), then push to PostgreSQL via the postgres extension. Skips the DuckDB staging table entirely. (code)
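The Arrow-building half can be sketched with arrow-vector; the DuckDB registration step (exposing the vectors as a zero-copy virtual table) depends on duckdb-jdbc's Arrow APIs and is elided here — see the linked code for the full pipeline:

```java
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.Float8Vector;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;

// Method 9 (first half): populate Arrow columnar vectors in Java.
// Two columns shown as an assumption; the benchmark builds all 9.
public class ArrowStaging {
    static VectorSchemaRoot build(RootAllocator alloc, int rows) {
        VarCharVector ids = new VarCharVector("id", alloc);
        Float8Vector temps = new Float8Vector("temperature", alloc);
        ids.allocateNew(rows);
        temps.allocateNew(rows);
        for (int i = 0; i < rows; i++) {
            ids.setSafe(i, java.util.UUID.randomUUID().toString().getBytes());
            temps.setSafe(i, 20.0 + i % 10);
        }
        ids.setValueCount(rows);
        temps.setValueCount(rows);
        return VectorSchemaRoot.of(ids, temps); // DuckDB then scans this memory directly
    }
}
```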

The results below cover all batch sizes from 100 to 1,000,000 rows:

PostgreSQL Insert Benchmark: comparing 9 insert methods across batch sizes from 100 to 1,000,000 rows. (The original interactive report also includes a throughput-vs-total-rows chart.)

100 rows

| Method | Time (ms) | Rows/sec | vs Best |
|---|---:|---:|---|
| Individual INSERT | 9.0 | 11,057 | 10.0x slower |
| Batch INSERT | 1.4 | 72,420 | 1.5x slower |
| Batch (rewrite=true) | 1.4 | 69,433 | 1.6x slower |
| Multi-value INSERT | 1.4 | 71,185 | 1.6x slower |
| UNNEST | 1.3 | 74,810 | 1.5x slower |
| COPY CSV | 0.9 | 110,549 | fastest |
| COPY BINARY | 2.0 | 51,234 | 2.2x slower |
| DuckDB -> PostgreSQL | 3.0 | 33,531 | 3.3x slower |
| Arrow -> DuckDB -> PG | 4.2 | 23,937 | 4.6x slower |

1,000 rows

| Method | Time (ms) | Rows/sec | vs Best |
|---|---:|---:|---|
| Individual INSERT | 39.0 | 25,612 | 9.7x slower |
| Batch INSERT | 6.8 | 147,246 | 1.7x slower |
| Batch (rewrite=true) | 5.1 | 196,576 | 1.3x slower |
| Multi-value INSERT | 7.9 | 127,238 | 2.0x slower |
| UNNEST | 6.4 | 155,451 | 1.6x slower |
| COPY CSV | 4.0 | 249,664 | fastest |
| COPY BINARY | 4.9 | 205,923 | 1.2x slower |
| DuckDB -> PostgreSQL | 7.5 | 133,547 | 1.9x slower |
| Arrow -> DuckDB -> PG | 10.9 | 91,756 | 2.7x slower |

10,000 rows

| Method | Time (ms) | Rows/sec | vs Best |
|---|---:|---:|---|
| Individual INSERT | 266.0 | 37,589 | 16.5x slower |
| Batch INSERT | 64.3 | 155,544 | 4.0x slower |
| Batch (rewrite=true) | 36.1 | 276,876 | 2.2x slower |
| Multi-value INSERT | 39.8 | 251,090 | 2.5x slower |
| UNNEST | 40.5 | 247,087 | 2.5x slower |
| COPY CSV | 32.2 | 310,479 | 2.0x slower |
| COPY BINARY | 16.1 | 621,596 | fastest |
| DuckDB -> PostgreSQL | 40.4 | 247,228 | 2.5x slower |
| Arrow -> DuckDB -> PG | 39.3 | 254,744 | 2.4x slower |

100,000 rows

| Method | Time (ms) | Rows/sec | vs Best |
|---|---:|---:|---|
| Individual INSERT | 2,650.2 | 37,733 | 17.8x slower |
| Batch INSERT | 633.3 | 157,907 | 4.2x slower |
| Batch (rewrite=true) | 375.6 | 266,216 | 2.5x slower |
| Multi-value INSERT | 386.2 | 258,964 | 2.6x slower |
| UNNEST | 392.0 | 255,124 | 2.6x slower |
| COPY CSV | 278.0 | 359,671 | 1.9x slower |
| COPY BINARY | 149.1 | 670,830 | fastest |
| DuckDB -> PostgreSQL | 354.2 | 282,339 | 2.4x slower |
| Arrow -> DuckDB -> PG | 290.4 | 344,363 | 1.9x slower |

1,000,000 rows

| Method | Time (ms) | Rows/sec | vs Best |
|---|---:|---:|---|
| Individual INSERT | 25,494.3 | 39,225 | 18.2x slower |
| Batch INSERT | 6,173.2 | 161,990 | 4.4x slower |
| Batch (rewrite=true) | 3,423.6 | 292,091 | 2.4x slower |
| Multi-value INSERT | 3,960.3 | 252,504 | 2.8x slower |
| UNNEST | 4,124.8 | 242,437 | 2.9x slower |
| COPY CSV | 2,774.5 | 360,420 | 2.0x slower |
| COPY BINARY | 1,403.7 | 712,418 | fastest |
| DuckDB -> PostgreSQL | 3,274.8 | 305,362 | 2.3x slower |
| Arrow -> DuckDB -> PG | 2,830.8 | 353,263 | 2.0x slower |

System Information

| Item | Value |
|---|---|
| Java | Java 25 (Temurin 25+36-LTS) |
| OS | Linux 6.8.0-100-generic |
| Arch | amd64 |
| CPUs | 24 |
| Max Memory | 7,956 MB |
| PostgreSQL | 18 (TestContainers) |
| Warmup Runs | 2 |
| Measurement Runs | 5 |

The performance characteristics change with data volume. At small sizes (100-1,000 rows), the overhead of COPY setup makes simpler methods competitive; COPY CSV actually wins below ~5K rows. But once you cross 10K rows, COPY BINARY pulls away and holds a roughly 2x lead over the next-best method.

At 1M rows, COPY BINARY delivers 712K rows/sec — 2x faster than COPY CSV, 2.4x faster than reWriteBatchedInserts, and 18x faster than individual INSERTs.

The DuckDB approach was the most fun to benchmark. DuckDB is known for being heavily optimized, and we were curious whether routing data through its postgres extension could beat a direct COPY. The idea: spin up an in-memory DuckDB instance as a temporary buffer — no files, no persistence, just a fast columnar engine to stage data before pushing it to PostgreSQL.

DuckDB Appender path (method 8): DuckDB’s appender loads 1M rows into an in-memory table in ~1.1s (~935K rows/sec). But then the DuckDB postgres extension takes ~2.3s to push that data to PostgreSQL. Total: 3.3s. The bottleneck is the cross-engine transfer.

Arrow path (method 9): Building Arrow vectors directly and registering them as a virtual table with DuckDB (zero-copy, no staging table needed) is faster than the appender path (~635ms vs ~1.1s for 1M rows). DuckDB scans the Arrow memory directly. But the copy-to-PG step takes the same ~2.2s. Total: 2.8s.

Key insight: DuckDB’s postgres extension already uses COPY BINARY under the hood (we confirmed this by reading the source). So the DuckDB → PG transfer is doing the same thing as PgBulkInsert, but with additional overhead from the cross-engine bridge. Direct PgBulkInsert cuts out the middleman.

Where DuckDB shines is if your data is already in DuckDB (from a Parquet file, a CSV, a data pipeline). In that case, ATTACH + INSERT INTO is the most natural path and gives you COPY BINARY for free without any extra libraries. For data that starts in Java, going direct is faster.

For most applications: reWriteBatchedInserts=true


If you’re using JDBC (directly or through Spring/Hibernate), just add ?reWriteBatchedInserts=true to your connection URL. One parameter change, zero code changes, and roughly 1.8x the throughput of naive batching at scale. This is the best ROI optimization.

jdbc:postgresql://host:5432/db?reWriteBatchedInserts=true

For bulk ingestion: COPY BINARY via PgBulkInsert


If you’re ingesting thousands or millions of rows — ETL pipelines, IoT data, analytics ingestion, data migrations — use PgBulkInsert by Philipp Wagner. It’s a well-maintained Java library that handles the binary wire format, supports all PostgreSQL types (including arrays, UUIDs, JSONB, timestamps with timezone), and delivers ~700K rows/sec.

If you’re already using COPY CSV (a common recommendation), know that switching to COPY BINARY roughly doubles your throughput. The difference is server-side parsing: CSV requires text → type conversion for every value, while binary sends data in PostgreSQL’s native format.

UNNEST (INSERT INTO t SELECT * FROM unnest(...)) is often recommended as “faster than multi-value INSERT” because it reduces query planning overhead. In our benchmark with 9 columns, it was roughly equivalent to multi-value INSERT at most sizes. The advantage may be more pronounced with wider tables or simpler column types. Its real benefit is avoiding the 65K parameter limit — with 9 columns, multi-value INSERT maxes out at ~7,200 rows per statement.

The benchmark was built using SQG, a SQL-first code generator. SQG reads annotated .sql files and generates type-safe database access code for Java, TypeScript, and Python — queries, inserts, COPY BINARY appenders, DuckDB bulk insert appenders, all from a single SQL file.

In this benchmark, SQG generated the COPY BINARY appender (with PgBulkInsert mapper and row record), the DuckDB appender, the UNNEST insert, the individual INSERT, and the verification queries — all from queries.sql. The only hand-written insert methods are Batch INSERT and COPY CSV, which use raw JDBC.

  • PostgreSQL 18 (alpine) via TestContainers 2.0.4
  • Java 25 (Temurin 25+36-LTS)
  • DuckDB 1.5.1, Apache Arrow 19.0.0, PgBulkInsert 9.0.0
  • Each method: 2 warmup runs + 5 measurement runs, median reported
  • Data verified after every insert (row count + first/last row spot checks)
  • All methods use a single connection (no parallel streams)
  • TestContainers runs PostgreSQL in Docker with default settings (no tuning)
```shell
git clone https://github.com/sqg-dev/sqg
cd sqg/examples/java-postgres-benchmark
just run
```

This generates the code from SQL files, runs the full benchmark, and writes an HTML report with interactive charts.


Have questions or want to challenge the methodology? Open an issue on GitHub.