
I Benchmarked 9 Ways to Insert Data Into PostgreSQL From Java. COPY BINARY Won by a Landslide.


TL;DR: We benchmarked 9 different methods of inserting data into PostgreSQL from Java, from naive individual INSERTs to DuckDB-via-Arrow pipelines. COPY BINARY wins decisively at scale: 712K rows/sec for 1M rows — 2x faster than COPY CSV, 4.4x faster than JDBC batch, and 18x faster than individual INSERTs. The sweet spot for most applications is reWriteBatchedInserts=true (one URL parameter, 1.8x speedup over naive batching). For high-volume ingestion, COPY BINARY via PgBulkInsert is unbeatable.

We built a benchmark that simulates an IoT sensor ingestion pipeline — a table with 9 columns covering the interesting PostgreSQL types: UUID, TIMESTAMPTZ, DOUBLE PRECISION, NUMERIC, SMALLINT, BOOLEAN, TEXT, and TEXT[]. One million rows of randomized but deterministic sensor readings (temperature, humidity, pressure, battery level, anomaly flags, location tags).

Each method was warmed up (2 runs), then measured (5 runs, median taken). After every insert, we verified both row count and spot-checked the first and last 10 rows against source data. PostgreSQL 18 via TestContainers. Java 25.
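For reference, the median-of-runs aggregation can be sketched as follows (a hypothetical helper, not the benchmark's actual code):

```java
import java.util.Arrays;

public class Median {
    // Returns the median of the measured run times (ms).
    // For an even count, averages the two middle values.
    static double median(double[] runsMs) {
        double[] sorted = runsMs.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        // Five measurement runs for one method/batch size, in milliseconds
        System.out.println(median(new double[]{5.1, 4.9, 5.4, 5.0, 5.2})); // 5.1
    }
}
```

Taking the median rather than the mean keeps a single GC pause or cold cache from skewing a method's result.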

All benchmark code is open source: examples/java-postgres-benchmark.

Here’s what we tested, roughly in order of complexity:

1. Individual INSERT

One executeUpdate() per row, wrapped in a single transaction. This is what most beginners write, and what many ORMs produce. (code)

2. Batch INSERT

Standard JDBC batching. Accumulate rows with addBatch(), flush with executeBatch(). The driver sends them as individual INSERT statements but pipelines the network calls. (code)

3. Batch INSERT with reWriteBatchedInserts=true


Same code as #2, but with one URL parameter added: ?reWriteBatchedInserts=true. The PostgreSQL JDBC driver automatically rewrites individual batched INSERTs into multi-value statements: INSERT INTO t VALUES (...),(...),.... This is the lowest-effort optimization you can make. (code)

4. Multi-value INSERT

Manually build INSERT INTO t VALUES (...),(...),(...) ... with chunks of 1,000 rows. The hand-rolled version of what reWriteBatchedInserts does automatically. (code)
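As a sketch, a hypothetical helper (names invented here, not the benchmark's actual code) that builds the placeholder list for one chunk; parameters are then bound row by row on a single PreparedStatement:

```java
public class MultiValueSql {
    // Builds "INSERT INTO <table> VALUES (?,...),(?,...),..." for one chunk.
    static String build(String table, int columns, int rows) {
        String row = "(" + "?,".repeat(columns - 1) + "?)";
        StringBuilder sql = new StringBuilder("INSERT INTO " + table + " VALUES ");
        for (int i = 0; i < rows; i++) {
            if (i > 0) sql.append(',');
            sql.append(row);
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // A 1,000-row chunk of a 9-column table carries 9,000 placeholders
        String sql = build("readings", 9, 1000);
        System.out.println(sql.chars().filter(c -> c == '?').count()); // 9000
    }
}
```

The chunk size matters: 1,000 rows of 9 columns is 9,000 bound parameters, comfortably below the protocol's 65,535-parameter limit.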

5. UNNEST

INSERT INTO t SELECT * FROM unnest($1::uuid[], $2::timestamptz[], ...)

Pass one array per column instead of one parameter per cell. Only 9 parameters regardless of row count, so it avoids PostgreSQL’s 65K parameter limit. Reduces query planning overhead. (code, generated)
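The statement text can be derived from the column type list alone. A hypothetical sketch (helper name and column types are illustrative):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class UnnestSql {
    // One typed array parameter per column: $1::uuid[], $2::timestamptz[], ...
    static String build(String table, List<String> columnTypes) {
        String args = IntStream.range(0, columnTypes.size())
                .mapToObj(i -> "$" + (i + 1) + "::" + columnTypes.get(i) + "[]")
                .collect(Collectors.joining(", "));
        return "INSERT INTO " + table + " SELECT * FROM unnest(" + args + ")";
    }

    public static void main(String[] args) {
        // 9 columns -> 9 parameters, regardless of row count
        System.out.println(build("readings",
                List.of("uuid", "timestamptz", "double precision", "numeric",
                        "smallint", "boolean", "text", "text", "text")));
    }
}
```

Binding then uses the standard JDBC Connection.createArrayOf(...) to produce one java.sql.Array per column.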

6. COPY CSV

PostgreSQL’s COPY ... FROM STDIN WITH (FORMAT CSV) via the JDBC driver’s CopyManager. Build a CSV string in memory, stream it to PostgreSQL. This is the standard “fast path” that most PostgreSQL performance guides recommend. (code)
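One subtlety of this path is CSV quoting. A hypothetical escaping helper following PostgreSQL's default CSV rules (fields containing commas, quotes, or newlines are quoted; embedded quotes are doubled; NULL is an empty unquoted field):

```java
public class CopyCsv {
    // Escapes one field for COPY ... WITH (FORMAT CSV).
    // null -> empty unquoted field (PostgreSQL's default NULL marker);
    // empty string -> "" so it is not read back as NULL.
    static String field(String value) {
        if (value == null) return "";
        if (value.isEmpty()) return "\"\"";
        if (value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(field("sensor \"A\", floor 3")); // "sensor ""A"", floor 3"
    }
}
```

All of this text shuffling is exactly the overhead that the binary format below avoids.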

7. COPY BINARY

PostgreSQL’s COPY ... FROM STDIN (FORMAT BINARY) via PgBulkInsert. Instead of text CSV, it writes PostgreSQL’s native binary wire format — no parsing overhead on the server side. (code, generated)
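The binary format itself is simple framing, documented in the PostgreSQL COPY reference: an 11-byte signature, two int32 header fields, then per row a field count followed by length-prefixed big-endian values, and a -1 trailer. A minimal sketch of that framing (illustrative, not PgBulkInsert's code):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class CopyBinarySketch {
    // PostgreSQL binary COPY signature: "PGCOPY\n\377\r\n\0"
    static final byte[] SIGNATURE = {'P', 'G', 'C', 'O', 'P', 'Y', '\n', (byte) 0xFF, '\r', '\n', 0};

    // Emits a complete one-row stream holding a single int4 field, to show the
    // framing that every typed value gets: a 4-byte length, then raw big-endian bytes.
    static byte[] oneIntRow(int value) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.write(SIGNATURE);
            out.writeInt(0);      // flags
            out.writeInt(0);      // header extension length
            out.writeShort(1);    // field count for this row
            out.writeInt(4);      // field length in bytes (int4)
            out.writeInt(value);  // the value itself, big-endian
            out.writeShort(-1);   // end-of-data trailer
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(oneIntRow(42).length); // 31 bytes: 11 + 8 + 2 + 8 + 2
    }
}
```

The server can memcpy these values into place instead of parsing text, which is where the 2x win over CSV comes from.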

8. DuckDB → PostgreSQL

Load data into an in-memory DuckDB table via DuckDB’s Appender API (extremely fast — ~935K rows/sec into DuckDB), then push to PostgreSQL via DuckDB’s postgres extension: INSERT INTO pg.table SELECT * FROM local_table. DuckDB uses COPY BINARY under the hood for the transfer. (code, generated)

9. Arrow → DuckDB → PostgreSQL

Build Apache Arrow columnar vectors in Java, register them with DuckDB as a virtual table (zero-copy), then push to PostgreSQL via the postgres extension. Skips the DuckDB staging table entirely. (code)

The interactive report below shows all batch sizes from 100 to 1,000,000 rows:

PostgreSQL Insert Benchmark

Comparing 9 insert methods across batch sizes from 100 to 1,000,000 rows

Throughput vs Total Rows

100 rows

| Method | Time (ms) | Rows/sec | vs Best |
| --- | ---: | ---: | --- |
| Individual INSERT | 9.0 | 11,057 | 10.0x slower |
| Batch INSERT | 1.4 | 72,420 | 1.5x slower |
| Batch (rewrite=true) | 1.4 | 69,433 | 1.6x slower |
| Multi-value INSERT | 1.4 | 71,185 | 1.6x slower |
| UNNEST | 1.3 | 74,810 | 1.5x slower |
| COPY CSV | 0.9 | 110,549 | fastest |
| COPY BINARY | 2.0 | 51,234 | 2.2x slower |
| DuckDB -> PostgreSQL | 3.0 | 33,531 | 3.3x slower |
| Arrow -> DuckDB -> PG | 4.2 | 23,937 | 4.6x slower |

1,000 rows

| Method | Time (ms) | Rows/sec | vs Best |
| --- | ---: | ---: | --- |
| Individual INSERT | 39.0 | 25,612 | 9.7x slower |
| Batch INSERT | 6.8 | 147,246 | 1.7x slower |
| Batch (rewrite=true) | 5.1 | 196,576 | 1.3x slower |
| Multi-value INSERT | 7.9 | 127,238 | 2.0x slower |
| UNNEST | 6.4 | 155,451 | 1.6x slower |
| COPY CSV | 4.0 | 249,664 | fastest |
| COPY BINARY | 4.9 | 205,923 | 1.2x slower |
| DuckDB -> PostgreSQL | 7.5 | 133,547 | 1.9x slower |
| Arrow -> DuckDB -> PG | 10.9 | 91,756 | 2.7x slower |

10,000 rows

| Method | Time (ms) | Rows/sec | vs Best |
| --- | ---: | ---: | --- |
| Individual INSERT | 266.0 | 37,589 | 16.5x slower |
| Batch INSERT | 64.3 | 155,544 | 4.0x slower |
| Batch (rewrite=true) | 36.1 | 276,876 | 2.2x slower |
| Multi-value INSERT | 39.8 | 251,090 | 2.5x slower |
| UNNEST | 40.5 | 247,087 | 2.5x slower |
| COPY CSV | 32.2 | 310,479 | 2.0x slower |
| COPY BINARY | 16.1 | 621,596 | fastest |
| DuckDB -> PostgreSQL | 40.4 | 247,228 | 2.5x slower |
| Arrow -> DuckDB -> PG | 39.3 | 254,744 | 2.4x slower |

100,000 rows

| Method | Time (ms) | Rows/sec | vs Best |
| --- | ---: | ---: | --- |
| Individual INSERT | 2,650.2 | 37,733 | 17.8x slower |
| Batch INSERT | 633.3 | 157,907 | 4.2x slower |
| Batch (rewrite=true) | 375.6 | 266,216 | 2.5x slower |
| Multi-value INSERT | 386.2 | 258,964 | 2.6x slower |
| UNNEST | 392.0 | 255,124 | 2.6x slower |
| COPY CSV | 278.0 | 359,671 | 1.9x slower |
| COPY BINARY | 149.1 | 670,830 | fastest |
| DuckDB -> PostgreSQL | 354.2 | 282,339 | 2.4x slower |
| Arrow -> DuckDB -> PG | 290.4 | 344,363 | 1.9x slower |

1,000,000 rows

| Method | Time (ms) | Rows/sec | vs Best |
| --- | ---: | ---: | --- |
| Individual INSERT | 25,494.3 | 39,225 | 18.2x slower |
| Batch INSERT | 6,173.2 | 161,990 | 4.4x slower |
| Batch (rewrite=true) | 3,423.6 | 292,091 | 2.4x slower |
| Multi-value INSERT | 3,960.3 | 252,504 | 2.8x slower |
| UNNEST | 4,124.8 | 242,437 | 2.9x slower |
| COPY CSV | 2,774.5 | 360,420 | 2.0x slower |
| COPY BINARY | 1,403.7 | 712,418 | fastest |
| DuckDB -> PostgreSQL | 3,274.8 | 305,362 | 2.3x slower |
| Arrow -> DuckDB -> PG | 2,830.8 | 353,263 | 2.0x slower |

System Information

  • Java: Java 25 (Temurin 25+36-LTS)
  • OS: Linux 6.8.0-100-generic
  • Arch: amd64
  • CPUs: 24
  • Max Memory: 7,956 MB
  • PostgreSQL: 18 (TestContainers)
  • Warmup Runs: 2
  • Measurement Runs: 5

The performance characteristics change with data volume. At small sizes (100-1,000 rows), the overhead of COPY setup makes simpler methods competitive. COPY CSV actually wins below ~5K rows. But once you cross 10K rows, COPY BINARY pulls away and the gap widens linearly.

At 1M rows, COPY BINARY delivers 712K rows/sec — 2x faster than COPY CSV, 2.4x faster than reWriteBatchedInserts, and 18x faster than individual INSERTs.

The DuckDB approach was the most fun to benchmark. DuckDB is known for being heavily optimized, and we were curious whether routing data through its postgres extension could beat a direct COPY. The idea: spin up an in-memory DuckDB instance as a temporary buffer — no files, no persistence, just a fast columnar engine to stage data before pushing it to PostgreSQL.

DuckDB Appender path (method 8): DuckDB’s appender loads 1M rows into an in-memory table in ~1.1s (~935K rows/sec). But then the DuckDB postgres extension takes ~2.3s to push that data to PostgreSQL. Total: 3.3s. The bottleneck is the cross-engine transfer.

Arrow path (method 9): Building Arrow vectors directly and registering them as a virtual table with DuckDB (zero-copy, no staging table needed) is faster than the appender path (~635ms vs ~1.1s for 1M rows). DuckDB scans the Arrow memory directly. But the copy-to-PG step takes the same ~2.2s. Total: 2.8s.

Key insight: DuckDB’s postgres extension already uses COPY BINARY under the hood (we confirmed this by reading the source). So the DuckDB → PG transfer is doing the same thing as PgBulkInsert, but with additional overhead from the cross-engine bridge. Direct PgBulkInsert cuts out the middleman.

Where DuckDB shines is if your data is already in DuckDB (from a Parquet file, a CSV, a data pipeline). In that case, ATTACH + INSERT INTO is the most natural path and gives you COPY BINARY for free without any extra libraries. For data that starts in Java, going direct is faster.

For most applications: reWriteBatchedInserts=true


If you’re using JDBC (directly or through Spring/Hibernate), just add ?reWriteBatchedInserts=true to your connection URL. One parameter change, zero code changes, ~1.8x throughput improvement over naive batching. This is the best ROI optimization.

jdbc:postgresql://host:5432/db?reWriteBatchedInserts=true

For bulk ingestion: COPY BINARY via PgBulkInsert


If you’re ingesting thousands or millions of rows — ETL pipelines, IoT data, analytics ingestion, data migrations — use PgBulkInsert by Philipp Wagner. It’s a well-maintained Java library that handles the binary wire format, supports all PostgreSQL types (including arrays, UUIDs, JSONB, timestamps with timezone), and delivers ~700K rows/sec.

If you’re already using COPY CSV (a common recommendation), know that switching to COPY BINARY roughly doubles your throughput. The difference is server-side parsing: CSV requires text → type conversion for every value, while binary sends data in PostgreSQL’s native format.

UNNEST (INSERT INTO t SELECT * FROM unnest(...)) is often recommended as “faster than multi-value INSERT” because it reduces query planning overhead. In our benchmark with 9 columns, it was roughly equivalent to multi-value INSERT at most sizes. The advantage may be more pronounced with wider tables or simpler column types. Its real benefit is avoiding the 65K parameter limit — with 9 columns, multi-value INSERT maxes out at ~7,200 rows per statement.
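That ceiling follows from the wire protocol: the extended-query Bind message carries the parameter count as a 16-bit integer. A quick sketch of the arithmetic:

```java
public class ParamLimit {
    // The Bind message encodes the bind-parameter count as int16,
    // so one prepared statement can carry at most 65,535 parameters.
    static final int MAX_PARAMS = 65535;

    static int maxRowsPerStatement(int columns) {
        return MAX_PARAMS / columns;
    }

    public static void main(String[] args) {
        System.out.println(maxRowsPerStatement(9)); // 7281 rows for the 9-column table
    }
}
```

With UNNEST the parameter count is fixed at one array per column, so the same statement handles 100 rows or 1,000,000.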

The benchmark was built using SQG, a SQL-first code generator. SQG reads annotated .sql files and generates type-safe database access code for Java, TypeScript, and Python — queries, inserts, COPY BINARY appenders, DuckDB bulk insert appenders, all from a single SQL file.

In this benchmark, SQG generated the COPY BINARY appender (with PgBulkInsert mapper and row record), the DuckDB appender, the UNNEST insert, the individual INSERT, and the verification queries — all from queries.sql. The only hand-written insert methods are Batch INSERT and COPY CSV, which use raw JDBC.

  • PostgreSQL 18 (alpine) via TestContainers 2.0.4
  • Java 25 (Temurin 25+36-LTS)
  • DuckDB 1.5.1, Apache Arrow 19.0.0, PgBulkInsert 9.0.0
  • Each method: 2 warmup runs + 5 measurement runs, median reported
  • Data verified after every insert (row count + first/last row spot checks)
  • All methods use a single connection (no parallel streams)
  • TestContainers runs PostgreSQL in Docker with default settings (no tuning)
git clone https://github.com/sqg-dev/sqg
cd sqg/examples/java-postgres-benchmark
just run

This generates the code from SQL files, runs the full benchmark, and writes an HTML report with interactive charts.


Have questions or want to challenge the methodology? Open an issue on GitHub.

SQG v0.10.0: Java Streams & List Type Support

SQG is a type-safe SQL code generator — you write .sql files with annotated queries, and it generates strongly-typed database access code for TypeScript and Java by introspecting your queries against real database engines at build time.

Here’s what’s new in v0.10.0.

Generated Java code now includes methods that return Stream<T> in addition to List<T>. This gives you lazy evaluation, easier composition with the standard library, and avoids materializing large result sets into memory when you don’t need to.

try (Stream<User> users = queries.getAllUsersStream()) {
    users.forEach(user -> process(user));
}

The stream holds a reference to the underlying ResultSet, so it needs to be closed after use — hence the try-with-resources.
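The pattern is standard Stream.onClose plumbing, and can be illustrated without any database (names here are hypothetical stand-ins for the generated code):

```java
import java.util.stream.Stream;

public class StreamClose {
    static boolean closedFlag = false;

    // Consumes the stream inside try-with-resources and reports whether the
    // onClose handler ran — the hook where a JDBC-backed stream would close
    // its underlying ResultSet and Statement.
    static boolean demo() {
        closedFlag = false;
        try (Stream<String> users = Stream.of("ada", "grace")
                .onClose(() -> closedFlag = true)) {
            users.forEach(u -> { /* process(u) */ });
        }
        return closedFlag;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true: try-with-resources ran the close handler
    }
}
```

Forgetting the try-with-resources would leak the handler's resource, which is why the generated stream methods should always be consumed this way.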

Array columns like TEXT[] or INTEGER[] are now handled correctly in generated Java code. Previously these types could produce incorrect mappings — they now resolve to proper List<String>, List<Integer>, etc.

The DuckDB appender now supports list/array column types. If your table has a column like tags VARCHAR[], the generated appender method accepts the corresponding list type and writes it correctly using DuckDB’s bulk insert API.


Upgrade with npm install -g @sqg/sqg@0.10.0 or update your project’s dependency. Full source on GitHub. Try it out in the playground.

Discuss this on Hacker News.

PostgreSQL Support & Built-in Migration Tracking

SQG v0.8.0 brings two major features: improved PostgreSQL support for Java and a built-in migration runner that tracks which migrations have been applied.

Until now, SQG generated a getMigrations() method that returned raw SQL strings — you were responsible for tracking which ones had been applied. That meant writing your own migration runner or integrating an external tool.

With v0.8.0, you can now enable a built-in applyMigrations() method:

sqg.yaml
version: 1
name: my-app
sql:
  - files:
      - queries.sql
gen:
  - generator: typescript/sqlite
    output: ./src/generated/
    config:
      migrations: true # enable built-in migration tracking

Then in your application:

import Database from 'better-sqlite3';
import { MyApp } from './generated/my-app';
const db = new Database('app.db');
// One line — creates tracking table, checks what's applied, runs the rest
MyApp.applyMigrations(db);
const queries = new MyApp(db);

The generated applyMigrations() method:

  1. Creates a _sqg_migrations table if it doesn’t exist
  2. Checks which migrations have already been applied for this project
  3. Applies new migrations in order
  4. Records each migration with a timestamp
  5. Wraps everything in a transaction for safety
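Steps 2 and 3 boil down to an order-preserving set difference. A hypothetical sketch of that selection logic (not SQG's actual implementation):

```java
import java.util.List;
import java.util.Set;

public class MigrationFilter {
    // Given all migrations (in declaration order) and the ids already recorded
    // in the tracking table, returns the ones still to apply, order preserved.
    static List<String> pending(List<String> all, Set<String> applied) {
        return all.stream().filter(id -> !applied.contains(id)).toList();
    }

    public static void main(String[] args) {
        System.out.println(pending(List.of("1", "2", "3"), Set.of("1"))); // [2, 3]
    }
}
```

Preserving declaration order matters: migration 3 may depend on tables created by migration 2.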

The tracking table uses a composite primary key of (project, migration_id), which means multiple SQG projects can share the same database without conflicts. The project name comes from the name field in your sqg.yaml.

// Default project name from sqg.yaml
MyApp.applyMigrations(db);
// Override for multi-tenant scenarios
MyApp.applyMigrations(db, 'tenant-123');

The migration runner works with every SQG generator:

| Generator | Method |
| --- | --- |
| TypeScript/SQLite (better-sqlite3) | MyApp.applyMigrations(db) |
| TypeScript/SQLite (node:sqlite) | MyApp.applyMigrations(db) |
| TypeScript/SQLite (libSQL) | await MyApp.applyMigrations(client) |
| TypeScript/DuckDB | await MyApp.applyMigrations(conn) |
| Java/JDBC (any engine) | MyApp.applyMigrations(connection) |
| Java/DuckDB Arrow | Analytics.applyMigrations(connection) |

Each implementation uses the appropriate transaction mechanism for its engine — BEGIN IMMEDIATE for SQLite, setAutoCommit(false) for JDBC, and so on.

The feature is entirely opt-in. Without config.migrations: true, SQG generates the same getMigrations() method as before. You can continue using external migration tools like Flyway, Liquibase, or your own scripts.

v0.8.0 significantly improves PostgreSQL support for the java/postgres generator.

SQG now introspects PostgreSQL’s pg_type system catalog to resolve user-defined types. This means ENUMs work out of the box:

-- MIGRATE 1
CREATE TYPE task_status AS ENUM ('pending', 'active', 'completed', 'cancelled');
CREATE TABLE tasks (
  id SERIAL PRIMARY KEY,
  title TEXT NOT NULL,
  status task_status DEFAULT 'pending'
);

-- QUERY get_tasks_by_status
@set status = 'active'
SELECT id, title, status FROM tasks WHERE status = ${status}::task_status;

SQG resolves the ENUM OID from pg_type and generates a type-safe Java enum class with getValue() and fromValue() methods. Query parameters and results use the generated enum type directly instead of raw strings. See the Java + JDBC PostgreSQL documentation for details.
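The generated enum presumably looks roughly like this hand-written approximation (illustrative, not SQG's literal output):

```java
public enum TaskStatus {
    PENDING("pending"), ACTIVE("active"), COMPLETED("completed"), CANCELLED("cancelled");

    private final String value;

    TaskStatus(String value) { this.value = value; }

    // Wire value sent to / read from PostgreSQL
    public String getValue() { return value; }

    public static TaskStatus fromValue(String value) {
        for (TaskStatus s : values()) {
            if (s.value.equals(value)) return s;
        }
        throw new IllegalArgumentException("Unknown task_status: " + value);
    }

    public static void main(String[] args) {
        System.out.println(fromValue("active")); // ACTIVE
    }
}
```

Keeping the database label separate from the Java constant name lets the generator emit valid identifiers even for labels like 'in-progress'.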

PostgreSQL array columns like TEXT[] and INTEGER[] are now properly mapped to List<T> in Java:

CREATE TABLE tasks (
  id SERIAL PRIMARY KEY,
  title TEXT NOT NULL,
  tags TEXT[],
  priority_scores INTEGER[]
);

// Generated record
record GetAllTasksResult(Integer id, String title, List<String> tags, List<Integer> priorityScores) {}

for (var task : queries.getAllTasks()) {
    System.out.println(task.tags()); // [urgent, backend]
    System.out.println(task.priorityScores()); // [10, 20, 30]
}

TIMESTAMPTZ columns are now correctly mapped to OffsetDateTime (instead of LocalDateTime), preserving timezone information:

// TIMESTAMPTZ -> OffsetDateTime with UTC offset
record EventResult(Integer id, OffsetDateTime createdAt) {}

If the SQG_POSTGRES_URL environment variable is not set, SQG now automatically starts a PostgreSQL container using Testcontainers. This makes it easy to get started without installing PostgreSQL locally — just have Docker running:

# No env var needed — SQG starts a container automatically
sqg sqg.yaml

For CI/CD or production builds, set the environment variable to point to your PostgreSQL server:

export SQG_POSTGRES_URL="postgresql://user:password@localhost:5432/mydb"
sqg sqg.yaml

Install or update SQG:

pnpm add -g @sqg/sqg@latest

Enable migration tracking by adding config.migrations: true to your generator config. Check out the updated SQL Syntax Reference and generator documentation for full details.


Have questions or feedback? Open an issue on GitHub.

SQLite Driver Benchmark: Comparing better-sqlite3, node:sqlite, libSQL, Turso

Choosing the right SQLite driver for Node.js can impact your application’s performance.

We ran performance tests comparing better-sqlite3, node:sqlite, libSQL, and Turso across common database operations. Here’s what we found.

better-sqlite3

Battle-tested in production for years. Its synchronous API keeps per-query overhead low, making it both fast and easy to use for quick queries.

GitHub better-sqlite3

node:sqlite

Node.js 22+ includes a built-in SQLite module (still experimental). It provides zero-dependency SQLite access with a synchronous API similar to better-sqlite3, plus an asynchronous API.

Node.js SQLite documentation

libSQL

An open-source fork of SQLite created by the Turso team. It provides an async API and supports both local database files and remote libSQL/Turso servers. It is fully compatible with the original SQLite (file format, API).

libSQL TypeScript

Turso

Turso is a SQLite-compatible database written in Rust. It is currently still in beta.

Turso for Javascript

We tested all four drivers with identical queries against a simple schema with users and posts tables.

  • 10,000 users and 500,000 posts (50 per user)
  • Optimized pragma settings (WAL mode, 64MB cache, memory-mapped I/O)
  • Various query patterns: simple selects, indexed lookups, JOINs, aggregates, inserts, updates

Note: better-sqlite3 and node:sqlite use synchronous APIs, while libSQL and Turso use async/await.

All databases used these optimized settings:

PRAGMA journal_mode = WAL; -- Write-Ahead Logging
PRAGMA synchronous = NORMAL; -- Balance safety/speed
PRAGMA cache_size = -64000; -- 64MB cache
PRAGMA temp_store = MEMORY; -- Temp tables in memory
PRAGMA mmap_size = 268435456; -- 256MB memory-mapped I/O

SQLite Driver Benchmark Report

Comparing better-sqlite3 vs node:sqlite vs libsql vs turso (baseline: node:sqlite)

System Information

  • Node.js Version: v25.3.0
  • Platform: linux
  • Architecture: x64
  • CPU: 12th Gen Intel(R) Core(TM) i9-12900K
  • CPU Cores: 24
  • Total Memory: 31.07 GB

Summary

| Operation | better-sqlite3 | node:sqlite | libsql | turso | Winner | vs node:sqlite |
| --- | ---: | ---: | ---: | ---: | --- | ---: |
| getAllUsers | 360 ops/s | 268 ops/s | 50 ops/s | 104 ops/s | better-sqlite3 | 1.34x |
| getUserById | 1,223,260 ops/s | 1,073,001 ops/s | 61,093 ops/s | 707,859 ops/s | better-sqlite3 | 1.14x |
| getUserByEmail | 557,631 ops/s | 457,659 ops/s | 49,510 ops/s | 233,913 ops/s | better-sqlite3 | 1.22x |
| countUsers (pluck) | 538,031 ops/s | 398,431 ops/s | 108,632 ops/s | 5,593 ops/s | better-sqlite3 | 1.35x |
| getPostsByUser | 1,090,293 ops/s | 980,550 ops/s | 47,304 ops/s | 414,672 ops/s | better-sqlite3 | 1.11x |
| getPublishedPosts (JOIN) | 27 ops/s | 27 ops/s | 54 ops/s | 7 ops/s | libsql | 1.98x |
| getPostWithAuthor (JOIN :one) | 477,271 ops/s | 379,911 ops/s | 32,433 ops/s | 236,297 ops/s | better-sqlite3 | 1.26x |
| countPostsByUser (pluck) | 1,151,783 ops/s | 689,478 ops/s | 111,824 ops/s | 377,235 ops/s | better-sqlite3 | 1.67x |
| insertUser | 53,693 ops/s | 41,291 ops/s | 28,385 ops/s | 63,017 ops/s | turso | 1.53x |
| updatePostViews | 136,399 ops/s | 97,956 ops/s | 53,598 ops/s | 59,273 ops/s | better-sqlite3 | 1.39x |
  • better-sqlite3 is the fastest for most operations, with node:sqlite second
  • Turso is surprisingly slow on countUsers (better-sqlite3 is almost 100x faster there). We did not investigate why; it may be that Turso, like some other databases (e.g. PostgreSQL), scans the full table to count rows and has no fast path for this special case.

The benchmark is implemented using SQG, a SQL-to-code generator.

One advantage of using a code generator like SQG is that you can switch between SQLite drivers without rewriting your queries. SQG generates type-safe code for all four drivers from the same SQL file:

# sqg.yaml - generate code for multiple drivers
sql:
  - files:
      - queries.sql
gen:
  - generator: typescript/sqlite/better-sqlite3
    output: ./src/db-better-sqlite3.ts
  - generator: typescript/sqlite/node
    output: ./src/db-node-sqlite.ts
  - generator: typescript/sqlite/libsql
    output: ./src/db-libsql.ts
  - generator: typescript/sqlite/turso
    output: ./src/db-turso.ts

This makes it easy to benchmark with your actual queries and switch drivers by changing imports.

The benchmark code is available in our examples repository:

git clone https://github.com/sqg-dev/sqg
cd sqg/examples/typescript-sqlite-benchmark
pnpm install
pnpm generate
pnpm bench

This will generate an HTML report with the results table shown above.

For most applications, better-sqlite3 remains the best choice: it wins 8 of the 10 operations, with the zero-dependency node:sqlite a close second.


Benchmarks generated using SQG. Have questions? Open an issue on GitHub.

Introducing SQG

We’re excited to announce SQG (SQL Query Generator), a tool that generates type-safe database access code from your SQL queries.

Writing database code means maintaining SQL queries and matching TypeScript/Java types. When schemas change, you update both—and hope nothing breaks. ORMs abstract SQL away, but what if you want to write SQL directly while keeping type safety?

SQG reads annotated .sql files, executes queries against real databases to introspect column types, and generates fully-typed wrapper code.

Write your SQL with simple annotations:

-- MIGRATE createUsersTable
CREATE TABLE users (
  id INTEGER PRIMARY KEY,
  name TEXT NOT NULL,
  email TEXT
);

-- QUERY getUserById :one
@set id = 1
SELECT id, name, email FROM users WHERE id = ${id};

-- QUERY getUsers
SELECT id, name, email FROM users;

-- EXEC insertUser
@set name = 'John'
@set email = 'john@example.com'
INSERT INTO users (name, email) VALUES (${name}, ${email});

SQG generates type-safe code:

export class Queries {
  getUserById(id: number): { id: number; name: string; email: string | null } | undefined
  getUsers(): { id: number; name: string; email: string | null }[]
  insertUser(name: string, email: string): RunResult
}
  • Type-safe by design - Column types inferred from your actual database
  • Multiple databases - SQLite, DuckDB, and PostgreSQL
  • Multiple languages - Generate TypeScript or Java from the same SQL
  • DBeaver compatible - Develop queries in DBeaver, generate code from the same file
  • Zero runtime overhead - Generated code is plain function calls

Install SQG:

pnpm add -g @sqg/sqg
pnpm approve-builds -g # needed for sqlite dependency

Create a sqg.yaml config and your SQL file, then run sqg sqg.yaml. Check out the Getting Started guide for details.

We’d love your feedback! File issues and feature requests on GitHub.