# I Benchmarked 9 Ways to Insert Data Into PostgreSQL From Java. COPY BINARY Won by a Landslide.
TL;DR: We benchmarked 9 different methods of inserting data into PostgreSQL from Java, from naive individual INSERTs to DuckDB-via-Arrow pipelines. COPY BINARY wins decisively at scale: 712K rows/sec for 1M rows — 2x faster than COPY CSV, 4.4x faster than JDBC batch, and 18x faster than individual INSERTs. The sweet spot for most applications is `reWriteBatchedInserts=true` (one URL parameter, ~1.8x speedup over naive batching). For high-volume ingestion, COPY BINARY via PgBulkInsert is unbeatable.
## The Setup

We built a benchmark that simulates an IoT sensor ingestion pipeline — a table with 9 columns covering the interesting PostgreSQL types: UUID, TIMESTAMPTZ, DOUBLE PRECISION, NUMERIC, SMALLINT, BOOLEAN, TEXT, and TEXT[]. One million rows of randomized but deterministic sensor readings (temperature, humidity, pressure, battery level, anomaly flags, location tags).
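For reference, here's a sketch of the schema. The column names are our own stand-ins, not necessarily the repo's exact DDL:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

void main() throws Exception {
    try (Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost:5432/bench", "postgres", "postgres");
         Statement stmt = conn.createStatement()) {
        // 9 columns covering the types listed above; names are placeholders.
        stmt.execute("""
            CREATE TABLE sensor_readings (
                id          UUID PRIMARY KEY,
                recorded_at TIMESTAMPTZ NOT NULL,
                sensor_name TEXT,
                temperature DOUBLE PRECISION,
                humidity    DOUBLE PRECISION,
                pressure    NUMERIC(10, 4),
                battery     SMALLINT,
                anomaly     BOOLEAN,
                tags        TEXT[]
            )""");
    }
}
```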
Each method was warmed up (2 runs), then measured (5 runs, median taken). After every insert, we verified the row count and spot-checked the first and last 10 rows against the source data. PostgreSQL 18 via TestContainers. Java 25.
All benchmark code is open source: examples/java-postgres-benchmark.
## The Methods

Here’s what we tested, roughly in order of complexity:
### 1. Individual INSERT (the naive approach)

One executeUpdate() per row, wrapped in a single transaction. This is what most beginners write, and what many ORMs produce. (code)
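In sketch form — abridged to three of the nine columns, with the stand-in names from the schema sketch above:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.time.OffsetDateTime;
import java.util.UUID;

void main() throws Exception {
    try (Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost:5432/bench", "postgres", "postgres")) {
        conn.setAutoCommit(false);                  // one transaction for all rows
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO sensor_readings (id, recorded_at, temperature) VALUES (?, ?, ?)")) {
            for (int i = 0; i < 1_000; i++) {
                ps.setObject(1, UUID.randomUUID());
                ps.setObject(2, OffsetDateTime.now());
                ps.setDouble(3, 20.0 + i * 0.01);
                ps.executeUpdate();                 // one server round trip per row
            }
        }
        conn.commit();
    }
}
```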
### 2. Batch INSERT (addBatch / executeBatch)

Standard JDBC batching. Accumulate rows with addBatch(), flush with executeBatch(). The driver sends them as individual INSERT statements but pipelines the network calls. (code)
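The same loop with batching, again as a three-column sketch:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.time.OffsetDateTime;
import java.util.UUID;

void main() throws Exception {
    try (Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost:5432/bench", "postgres", "postgres")) {
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO sensor_readings (id, recorded_at, temperature) VALUES (?, ?, ?)")) {
            for (int i = 0; i < 10_000; i++) {
                ps.setObject(1, UUID.randomUUID());
                ps.setObject(2, OffsetDateTime.now());
                ps.setDouble(3, 20.0 + i * 0.01);
                ps.addBatch();                      // queue client-side
                if (i % 1_000 == 999) {
                    ps.executeBatch();              // flush a chunk of 1,000
                }
            }
            ps.executeBatch();                      // flush any remainder
        }
        conn.commit();
    }
}
```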
### 3. Batch INSERT with reWriteBatchedInserts=true

Same code as #2, but with one URL parameter added: ?reWriteBatchedInserts=true. The PostgreSQL JDBC driver automatically rewrites individual batched INSERTs into multi-value statements: INSERT INTO t VALUES (...),(...),.... This is the lowest-effort optimization you can make. (code)
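The only delta from method 2 is the connection URL:

```java
// One flag, zero code changes: the driver rewrites queued batch entries into
// multi-value INSERTs before sending them over the wire.
String url = "jdbc:postgresql://localhost:5432/bench?reWriteBatchedInserts=true";
try (Connection conn = DriverManager.getConnection(url, "postgres", "postgres")) {
    // ... the exact addBatch()/executeBatch() code from method 2 ...
}
```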
### 4. Multi-value INSERT (hand-crafted)

Manually build INSERT INTO t VALUES (...),(...),(...) ... with chunks of 1,000 rows. The hand-rolled version of what reWriteBatchedInserts does automatically. (code)
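A minimal sketch of the chunked statement build (three columns again):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.time.OffsetDateTime;
import java.util.UUID;

void main() throws Exception {
    int chunk = 1_000;                              // rows per statement
    String sql = "INSERT INTO sensor_readings (id, recorded_at, temperature) VALUES "
            + "(?, ?, ?),".repeat(chunk);
    sql = sql.substring(0, sql.length() - 1);       // trim trailing comma
    try (Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost:5432/bench", "postgres", "postgres");
         PreparedStatement ps = conn.prepareStatement(sql)) {
        int p = 1;
        for (int i = 0; i < chunk; i++) {
            ps.setObject(p++, UUID.randomUUID());
            ps.setObject(p++, OffsetDateTime.now());
            ps.setDouble(p++, 20.0 + i * 0.01);
        }
        ps.executeUpdate();                         // one statement, 1,000 rows
    }
}
```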
### 5. UNNEST

`INSERT INTO t SELECT * FROM unnest($1::uuid[], $2::timestamptz[], ...)` — pass one array per column instead of one parameter per cell. Only 9 parameters regardless of row count, so it avoids PostgreSQL’s 65K parameter limit. Reduces query planning overhead. (code, generated)
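A dependency-free sketch of the shape. The generated code may bind typed arrays directly; here we pass text[] parameters and let PostgreSQL cast them server-side:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.time.OffsetDateTime;
import java.util.UUID;

void main() throws Exception {
    int n = 1_000;
    String[] ids = new String[n];
    String[] times = new String[n];
    Double[] temps = new Double[n];
    for (int i = 0; i < n; i++) {
        ids[i] = UUID.randomUUID().toString();
        times[i] = OffsetDateTime.now().toString();
        temps[i] = 20.0 + i * 0.01;
    }
    try (Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost:5432/bench", "postgres", "postgres");
         PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO sensor_readings (id, recorded_at, temperature) "
            + "SELECT * FROM unnest(?::uuid[], ?::timestamptz[], ?::float8[])")) {
        ps.setArray(1, conn.createArrayOf("text", ids));    // 3 parameters total,
        ps.setArray(2, conn.createArrayOf("text", times));  // no matter how many rows
        ps.setArray(3, conn.createArrayOf("float8", temps));
        ps.executeUpdate();
    }
}
```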
### 6. COPY CSV

PostgreSQL’s COPY ... FROM STDIN WITH (FORMAT CSV) via the JDBC driver’s CopyManager. Build a CSV string in memory, stream it to PostgreSQL. This is the standard “fast path” that most PostgreSQL performance guides recommend. (code)
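In sketch form. Note the CSV below is unquoted, which is safe for these three columns; real TEXT values need proper CSV escaping:

```java
import java.io.StringReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.time.OffsetDateTime;
import java.util.UUID;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

void main() throws Exception {
    StringBuilder csv = new StringBuilder();
    for (int i = 0; i < 1_000; i++) {
        csv.append(UUID.randomUUID()).append(',')
           .append(OffsetDateTime.now()).append(',')
           .append(20.0 + i * 0.01).append('\n');
    }
    try (Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost:5432/bench", "postgres", "postgres")) {
        CopyManager copy = conn.unwrap(PGConnection.class).getCopyAPI();
        long rows = copy.copyIn(
            "COPY sensor_readings (id, recorded_at, temperature) FROM STDIN WITH (FORMAT CSV)",
            new StringReader(csv.toString()));
        System.out.println(rows + " rows copied");
    }
}
```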
### 7. COPY BINARY

PostgreSQL’s COPY ... FROM STDIN (FORMAT BINARY) via PgBulkInsert. Instead of text CSV, it writes PostgreSQL’s native binary wire format — no parsing overhead on the server side. (code, generated)
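Usage looks roughly like this. In the benchmark the mapping class is generated by SQG; this hand-rolled sketch maps three columns with stand-in names:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.List;
import java.util.UUID;
import de.bytefish.pgbulkinsert.PgBulkInsert;
import de.bytefish.pgbulkinsert.mapping.AbstractMapping;
import org.postgresql.PGConnection;

record Reading(UUID id, String sensorName, double temperature) {}

// Maps record accessors to columns; PgBulkInsert writes the binary wire format.
class ReadingMapping extends AbstractMapping<Reading> {
    public ReadingMapping() {
        super("public", "sensor_readings");
        mapUUID("id", Reading::id);
        mapText("sensor_name", Reading::sensorName);
        mapDouble("temperature", Reading::temperature);
    }
}

void main() throws Exception {
    List<Reading> rows = List.of(
        new Reading(UUID.randomUUID(), "sensor-1", 21.5),
        new Reading(UUID.randomUUID(), "sensor-2", 19.8));
    try (Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost:5432/bench", "postgres", "postgres")) {
        new PgBulkInsert<>(new ReadingMapping())
            .saveAll(conn.unwrap(PGConnection.class), rows.stream());
    }
}
```

saveAll streams every row through a single COPY ... FROM STDIN (FORMAT BINARY) operation, so there is one bulk transfer rather than per-row statements.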
### 8. DuckDB → PostgreSQL

Load data into an in-memory DuckDB table via DuckDB’s Appender API (extremely fast — ~850K rows/sec into DuckDB), then push to PostgreSQL via DuckDB’s postgres extension: INSERT INTO pg.table SELECT * FROM local_table. DuckDB uses COPY BINARY under the hood for the transfer. (code, generated)
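A sketch of the full detour. To keep the appender calls simple we stage VARCHAR columns and cast at push time; the benchmark appends native types. Connection strings and names are placeholders:

```java
import java.sql.DriverManager;
import java.sql.Statement;
import java.time.OffsetDateTime;
import java.util.UUID;
import org.duckdb.DuckDBAppender;
import org.duckdb.DuckDBConnection;

void main() throws Exception {
    try (DuckDBConnection duck = (DuckDBConnection)
            DriverManager.getConnection("jdbc:duckdb:")) {   // in-memory, no file
        try (Statement s = duck.createStatement()) {
            s.execute("CREATE TABLE staging (id VARCHAR, recorded_at VARCHAR, temperature DOUBLE)");
        }
        // Appender API: DuckDB's fast path for programmatic ingestion.
        try (DuckDBAppender app =
                 duck.createAppender(DuckDBConnection.DEFAULT_SCHEMA, "staging")) {
            for (int i = 0; i < 1_000; i++) {
                app.beginRow();
                app.append(UUID.randomUUID().toString());
                app.append(OffsetDateTime.now().toString());
                app.append(20.0 + i * 0.01);
                app.endRow();
            }
        }
        // Push to PostgreSQL via the postgres extension (COPY BINARY under the hood).
        try (Statement s = duck.createStatement()) {
            s.execute("INSTALL postgres");
            s.execute("LOAD postgres");
            s.execute("ATTACH 'host=localhost dbname=bench user=postgres' AS pg (TYPE postgres)");
            s.execute("INSERT INTO pg.sensor_readings (id, recorded_at, temperature) "
                    + "SELECT id::UUID, recorded_at::TIMESTAMPTZ, temperature FROM staging");
        }
    }
}
```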
### 9. Arrow → DuckDB → PostgreSQL

Build Apache Arrow columnar vectors in Java, register them with DuckDB as a virtual table (zero-copy), then push to PostgreSQL via the postgres extension. Skips the DuckDB staging table entirely. (code)
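A sketch of the registration trick. For brevity this version round-trips the vectors through an in-memory Arrow IPC stream to obtain the ArrowReader that DuckDB needs; the benchmark's zero-copy path would wrap the vectors in a reader directly. Names and connection strings are placeholders:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.channels.Channels;
import java.nio.charset.StandardCharsets;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.UUID;
import org.apache.arrow.c.ArrowArrayStream;
import org.apache.arrow.c.Data;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.Float8Vector;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowStreamReader;
import org.apache.arrow.vector.ipc.ArrowStreamWriter;
import org.duckdb.DuckDBConnection;

void main() throws Exception {
    try (RootAllocator alloc = new RootAllocator()) {
        // Build two Arrow vectors in Java (id as text, temperature as float8).
        VarCharVector id = new VarCharVector("id", alloc);
        Float8Vector temp = new Float8Vector("temperature", alloc);
        int n = 1_000;
        for (int i = 0; i < n; i++) {
            id.setSafe(i, UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8));
            temp.setSafe(i, 20.0 + i * 0.01);
        }
        id.setValueCount(n);
        temp.setValueCount(n);
        try (VectorSchemaRoot root = VectorSchemaRoot.of(id, temp)) {
            // Serialize to an in-memory IPC stream so we can hand DuckDB an ArrowReader.
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ArrowStreamWriter writer =
                     new ArrowStreamWriter(root, null, Channels.newChannel(buf))) {
                writer.start();
                writer.writeBatch();
                writer.end();
            }
            ArrowStreamReader reader = new ArrowStreamReader(
                new ByteArrayInputStream(buf.toByteArray()), alloc);
            try (DuckDBConnection duck = (DuckDBConnection)
                     DriverManager.getConnection("jdbc:duckdb:");
                 ArrowArrayStream stream = ArrowArrayStream.allocateNew(alloc)) {
                Data.exportArrayStream(alloc, reader, stream);   // stream now owns reader
                duck.registerArrowStream("arrow_rows", stream);  // queryable like a view
                try (Statement s = duck.createStatement()) {
                    s.execute("INSTALL postgres");
                    s.execute("LOAD postgres");
                    s.execute("ATTACH 'host=localhost dbname=bench user=postgres' AS pg (TYPE postgres)");
                    s.execute("INSERT INTO pg.sensor_readings (id, temperature) "
                            + "SELECT id::UUID, temperature FROM arrow_rows");
                }
            }
        }
    }
}
```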
## The Results

The report below shows all batch sizes from 100 to 1,000,000 rows:
*[Interactive chart from the report — PostgreSQL Insert Benchmark: throughput vs total rows, comparing 9 insert methods across batch sizes from 100 to 1,000,000 rows.]*
#### 100 rows
| Method | Time (ms) | Rows/sec | vs Best |
|---|---|---|---|
| Individual INSERT | 9.0 | 11,057 | 10.0x slower |
| Batch INSERT | 1.4 | 72,420 | 1.5x slower |
| Batch (rewrite=true) | 1.4 | 69,433 | 1.6x slower |
| Multi-value INSERT | 1.4 | 71,185 | 1.6x slower |
| UNNEST | 1.3 | 74,810 | 1.5x slower |
| COPY CSV | 0.9 | 110,549 | fastest |
| COPY BINARY | 2.0 | 51,234 | 2.2x slower |
| DuckDB -> PostgreSQL | 3.0 | 33,531 | 3.3x slower |
| Arrow -> DuckDB -> PG | 4.2 | 23,937 | 4.6x slower |
#### 1,000 rows
| Method | Time (ms) | Rows/sec | vs Best |
|---|---|---|---|
| Individual INSERT | 39.0 | 25,612 | 9.7x slower |
| Batch INSERT | 6.8 | 147,246 | 1.7x slower |
| Batch (rewrite=true) | 5.1 | 196,576 | 1.3x slower |
| Multi-value INSERT | 7.9 | 127,238 | 2.0x slower |
| UNNEST | 6.4 | 155,451 | 1.6x slower |
| COPY CSV | 4.0 | 249,664 | fastest |
| COPY BINARY | 4.9 | 205,923 | 1.2x slower |
| DuckDB -> PostgreSQL | 7.5 | 133,547 | 1.9x slower |
| Arrow -> DuckDB -> PG | 10.9 | 91,756 | 2.7x slower |
#### 10,000 rows
| Method | Time (ms) | Rows/sec | vs Best |
|---|---|---|---|
| Individual INSERT | 266.0 | 37,589 | 16.5x slower |
| Batch INSERT | 64.3 | 155,544 | 4.0x slower |
| Batch (rewrite=true) | 36.1 | 276,876 | 2.2x slower |
| Multi-value INSERT | 39.8 | 251,090 | 2.5x slower |
| UNNEST | 40.5 | 247,087 | 2.5x slower |
| COPY CSV | 32.2 | 310,479 | 2.0x slower |
| COPY BINARY | 16.1 | 621,596 | fastest |
| DuckDB -> PostgreSQL | 40.4 | 247,228 | 2.5x slower |
| Arrow -> DuckDB -> PG | 39.3 | 254,744 | 2.4x slower |
#### 100,000 rows
| Method | Time (ms) | Rows/sec | vs Best |
|---|---|---|---|
| Individual INSERT | 2,650.2 | 37,733 | 17.8x slower |
| Batch INSERT | 633.3 | 157,907 | 4.2x slower |
| Batch (rewrite=true) | 375.6 | 266,216 | 2.5x slower |
| Multi-value INSERT | 386.2 | 258,964 | 2.6x slower |
| UNNEST | 392.0 | 255,124 | 2.6x slower |
| COPY CSV | 278.0 | 359,671 | 1.9x slower |
| COPY BINARY | 149.1 | 670,830 | fastest |
| DuckDB -> PostgreSQL | 354.2 | 282,339 | 2.4x slower |
| Arrow -> DuckDB -> PG | 290.4 | 344,363 | 1.9x slower |
#### 1,000,000 rows
| Method | Time (ms) | Rows/sec | vs Best |
|---|---|---|---|
| Individual INSERT | 25,494.3 | 39,225 | 18.2x slower |
| Batch INSERT | 6,173.2 | 161,990 | 4.4x slower |
| Batch (rewrite=true) | 3,423.6 | 292,091 | 2.4x slower |
| Multi-value INSERT | 3,960.3 | 252,504 | 2.8x slower |
| UNNEST | 4,124.8 | 242,437 | 2.9x slower |
| COPY CSV | 2,774.5 | 360,420 | 2.0x slower |
| COPY BINARY | 1,403.7 | 712,418 | fastest |
| DuckDB -> PostgreSQL | 3,274.8 | 305,362 | 2.3x slower |
| Arrow -> DuckDB -> PG | 2,830.8 | 353,263 | 2.0x slower |
#### System Information

| Property | Value |
|---|---|
| Java | Java 25 (Temurin 25+36-LTS) |
| OS | Linux 6.8.0-100-generic |
| Arch | amd64 |
| CPUs | 24 |
| Max Memory | 7,956 MB |
| PostgreSQL | 18 (TestContainers) |
| Warmup Runs | 2 |
| Measurement Runs | 5 |
## How They Scale

The performance characteristics change with data volume. At small sizes (100-1,000 rows), the overhead of COPY setup makes simpler methods competitive — COPY CSV actually wins below ~5K rows. But once you cross 10K rows, COPY BINARY pulls away and holds a roughly 2x lead over the next-fastest method.
At 1M rows, COPY BINARY delivers 712K rows/sec — 2x faster than COPY CSV, 2.4x faster than reWriteBatchedInserts, and 18x faster than individual INSERTs.
## The DuckDB Detour: What We Learned

The DuckDB approach was the most fun to benchmark. DuckDB is known for being heavily optimized, and we were curious whether routing data through its postgres extension could beat a direct COPY. The idea: spin up an in-memory DuckDB instance as a temporary buffer — no files, no persistence, just a fast columnar engine to stage data before pushing it to PostgreSQL.
DuckDB Appender path (method 8): DuckDB’s appender loads 1M rows into an in-memory table in ~1.1s (~935K rows/sec). But then the DuckDB postgres extension takes ~2.3s to push that data to PostgreSQL. Total: 3.3s. The bottleneck is the cross-engine transfer.
Arrow path (method 9): Building Arrow vectors directly and registering them as a virtual table with DuckDB (zero-copy, no staging table needed) is faster than the appender path (~635ms vs ~1.1s for 1M rows). DuckDB scans the Arrow memory directly. But the copy-to-PG step takes the same ~2.2s. Total: 2.8s.
Key insight: DuckDB’s postgres extension already uses COPY BINARY under the hood (we confirmed this by reading the source). So the DuckDB → PG transfer is doing the same thing as PgBulkInsert, but with additional overhead from the cross-engine bridge. Direct PgBulkInsert cuts out the middleman.
DuckDB shines when your data is already in DuckDB (from a Parquet file, a CSV, a data pipeline). In that case, ATTACH + INSERT INTO is the most natural path and gives you COPY BINARY for free, without any extra libraries. For data that starts in Java, going direct is faster.
## Recommendations

### For most applications: reWriteBatchedInserts=true

If you’re using JDBC (directly or through Spring/Hibernate), just add ?reWriteBatchedInserts=true to your connection URL. One parameter change, zero code changes, ~1.8x throughput improvement over naive batching at 1M rows. This is the best ROI optimization.
```
jdbc:postgresql://host:5432/db?reWriteBatchedInserts=true
```

### For bulk ingestion: COPY BINARY via PgBulkInsert

If you’re ingesting thousands or millions of rows — ETL pipelines, IoT data, analytics ingestion, data migrations — use PgBulkInsert by Philipp Wagner. It’s a well-maintained Java library that handles the binary wire format, supports all PostgreSQL types (including arrays, UUIDs, JSONB, timestamps with timezone), and delivers ~700K rows/sec.
### For COPY CSV users

If you’re already using COPY CSV (a common recommendation), know that switching to COPY BINARY roughly doubles your throughput. The difference is server-side parsing: CSV requires text → type conversion for every value, while binary sends data in PostgreSQL’s native format.
### What about UNNEST?

UNNEST (`INSERT INTO t SELECT * FROM unnest(...)`) is often recommended as “faster than multi-value INSERT” because it reduces query planning overhead. In our benchmark with 9 columns, it was roughly equivalent to multi-value INSERT at most sizes. The advantage may be more pronounced with wider tables or simpler column types. Its real benefit is avoiding the 65K parameter limit — with 9 columns, multi-value INSERT maxes out at ~7,200 rows per statement.
## How We Built This

The benchmark was built using SQG, a SQL-first code generator. SQG reads annotated .sql files and generates type-safe database access code for Java, TypeScript, and Python — queries, inserts, COPY BINARY appenders, DuckDB bulk insert appenders, all from a single SQL file.
In this benchmark, SQG generated the COPY BINARY appender (with PgBulkInsert mapper and row record), the DuckDB appender, the UNNEST insert, the individual INSERT, and the verification queries — all from queries.sql. The only hand-written insert methods are Batch INSERT and COPY CSV, which use raw JDBC.
## Methodology

- PostgreSQL 18 (alpine) via TestContainers 2.0.4
- Java 25 (Temurin 25+36-LTS)
- DuckDB 1.5.1, Apache Arrow 19.0.0, PgBulkInsert 9.0.0
- Each method: 2 warmup runs + 5 measurement runs, median reported
- Data verified after every insert (row count + first/last row spot checks)
- All methods use a single connection (no parallel streams)
- TestContainers runs PostgreSQL in Docker with default settings (no tuning)
## Try It Yourself

```
git clone https://github.com/sqg-dev/sqg
cd sqg/examples/java-postgres-benchmark
just run
```

This generates the code from SQL files, runs the full benchmark, and writes an HTML report with interactive charts.
## Credits

- PgBulkInsert by Philipp Wagner — the library that makes COPY BINARY accessible from Java
- DuckDB and its postgres extension
- TestContainers
Have questions or want to challenge the methodology? Open an issue on GitHub.