
Benchmarking DuckDB From Java: Fast INSERT, UPDATE, and DELETE

DuckDB benchmark comparing 7 methods for INSERT, UPDATE, and DELETE from Java

DuckDB’s Java documentation recommends the Appender as the efficient way to insert data. But two questions come up in practice: how do you efficiently UPDATE and DELETE? And is the Appender really the fastest option for INSERT, or are there faster alternatives?

We benchmarked 7 different methods — from naive row-by-row JDBC to the Appender, Arrow streams, and DuckDB’s Java table function API — across INSERT, UPDATE, and DELETE, scaling from 1K to 1M rows.

We used a sensor-readings table with 9 columns covering DuckDB’s core types: UUID, TIMESTAMPTZ, DOUBLE, DECIMAL(10,2), SMALLINT, BOOLEAN, VARCHAR, and VARCHAR[] (same schema as our PostgreSQL benchmark).

Each method was warmed up, then measured multiple times (median taken). After every operation, we verified correctness: row counts, spot-checked values, and for UPDATE specifically verified the target column was actually modified. DuckDB runs on disk with a single connection.

All benchmark code is open source: examples/java-duckdb-benchmark.

We tested 6 insert methods:

1. Individual INSERT — One executeUpdate() per row in a transaction. Simple but slow: ~3.3K rows/sec at all sizes. (code)

2. JDBC Batch — Standard JDBC addBatch() / executeBatch() via the DuckDB JDBC driver. On PostgreSQL this is a solid optimization; on DuckDB it barely helps — ~5K rows/sec. (code)

3. Multi-value INSERT — Build INSERT INTO t VALUES (...),(...),(...) ... with up to 10K rows per statement. ~14K rows/sec. (code)
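Assembling the statement text for one chunk is plain string building. A minimal sketch of that assembly (not the benchmark's exact code; the class and method names are illustrative, and binding the actual values via PreparedStatement is omitted):

```java
class MultiValueSql {
    /**
     * Build "INSERT INTO <table> VALUES (?,?,...),(?,?,...),..." with
     * `rows` tuples of `columns` placeholders each. For the 9-column
     * sensor-readings schema, call with columns = 9 and rows <= 10_000.
     */
    static String multiValueInsert(String table, int columns, int rows) {
        String oneRow = "(" + "?,".repeat(columns - 1) + "?)";
        StringBuilder sql = new StringBuilder("INSERT INTO ")
                .append(table).append(" VALUES ");
        for (int i = 0; i < rows; i++) {
            if (i > 0) sql.append(',');
            sql.append(oneRow);
        }
        return sql.toString();
    }
}
```

The per-row placeholder string is built once and reused, so statement assembly stays linear in the chunk size.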

4. Appender — DuckDB’s recommended bulk insert API. Bypasses the SQL parser — you call beginRow(), append() for each column, and endRow(): (code)

try (var appender = ((DuckDBConnection) conn)
        .createAppender(DuckDBConnection.DEFAULT_SCHEMA, "sensor_readings")) {
    for (var r : rows) {
        appender.beginRow();
        appender.append(r.deviceId());
        appender.append(r.timestamp());
        appender.append(r.temperature());
        // ... remaining columns
        appender.endRow();
    }
}

5. Arrow Stream — Build Apache Arrow columnar vectors in Java, register them with DuckDB via registerArrowStream, then INSERT INTO ... SELECT FROM stream. Since Arrow vectors must be fully built before registering, the data is processed in chunks of 10K rows — each chunk requires a separate INSERT ... SELECT statement. (code)
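The chunking loop itself is independent of Arrow. A sketch of just the slicing logic, with the Arrow-specific vector building and the per-chunk INSERT ... SELECT hidden behind an illustrative sink callback:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

class Chunker {
    /**
     * Drain `source` in chunks of at most `chunkSize`, handing each chunk
     * to `sink`. In the Arrow method, the sink builds the columnar vectors,
     * registers the stream, and runs one INSERT ... SELECT per chunk.
     * Returns the number of chunks processed.
     */
    static <T> int forEachChunk(Iterator<T> source, int chunkSize,
                                Consumer<List<T>> sink) {
        int chunks = 0;
        List<T> chunk = new ArrayList<>(chunkSize);
        while (source.hasNext()) {
            chunk.add(source.next());
            if (chunk.size() == chunkSize) {
                sink.accept(chunk);
                chunk = new ArrayList<>(chunkSize);
                chunks++;
            }
        }
        if (!chunk.isEmpty()) {
            sink.accept(chunk); // final partial chunk
            chunks++;
        }
        return chunks;
    }
}
```

Because only one chunk is materialized at a time, the method never needs the full dataset in memory — but each chunk still costs one SQL statement.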

6. Table Function (UDF) — Register a Java table function via DuckDB’s DuckDBFunctions.tableFunction() API. DuckDB pulls rows directly from a Java Iterator in chunks of 2,048. Unlike Arrow, this executes a single SQL statement regardless of data size — DuckDB calls the apply() callback repeatedly until the iterator is exhausted: (code)

DuckDBFunctions.tableFunction()
    .withName("_bench_insert_rows")
    .withFunction(new DuckDBTableFunction<>() {
        public long apply(DuckDBTableFunctionCallInfo info,
                          DuckDBDataChunkWriter output) {
            // Fill columnar vectors, up to output.capacity() rows.
            // DuckDB calls this repeatedly until we return 0.
        }
    })
    .register(conn);
// Then: INSERT INTO sensor_readings SELECT ... FROM _bench_insert_rows()
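The pull contract can be illustrated without the driver. In this sketch the `String[]` output buffer is a stand-in for the real `DuckDBDataChunkWriter`, and the class and method names are illustrative — only the fill-until-zero protocol matches the API described above:

```java
import java.util.Iterator;

class PullLoop {
    static final int CAPACITY = 2_048; // DuckDB's standard vector size

    /**
     * One apply() call: copy up to `out.length` rows from the iterator
     * into the output chunk and return how many were written.
     * Returning 0 signals "iterator exhausted".
     */
    static int fillChunk(Iterator<String> rows, String[] out) {
        int n = 0;
        while (n < out.length && rows.hasNext()) {
            out[n++] = rows.next(); // real code writes typed columnar vectors
        }
        return n;
    }

    /** DuckDB's side of the protocol: call fillChunk until it returns 0. */
    static int totalRows(Iterator<String> rows) {
        String[] chunk = new String[CAPACITY];
        int total = 0, n;
        while ((n = fillChunk(rows, chunk)) > 0) total += n;
        return total;
    }
}
```

The whole exchange happens inside one SQL statement, which is why the table function avoids the per-chunk statement overhead the Arrow method pays.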

For UPDATE and DELETE, Individual and JDBC Batch work the same way as for INSERT — one statement per row or batch. (update code, delete code)

The interesting methods are the set-based approaches:

UNNEST list params — Pass all keys as parallel array parameters: (code)

DELETE FROM sensor_readings
WHERE (device_id, timestamp) IN (
    SELECT UNNEST(?::UUID[]), UNNEST(?::TIMESTAMPTZ[])
);
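On the Java side this means splitting the key pairs into two parallel arrays before binding them as the ?::UUID[] and ?::TIMESTAMPTZ[] parameters. A sketch — the Key record and the string-based UUID representation are illustrative, not the benchmark's exact code, and the commented binding calls assume the driver supports java.sql.Connection.createArrayOf:

```java
import java.time.OffsetDateTime;
import java.util.List;
import java.util.UUID;

class UnnestKeys {
    record Key(UUID deviceId, OffsetDateTime timestamp) {}

    /** Split keys into the two parallel arrays the UNNEST query expects. */
    static Object[][] toParallelArrays(List<Key> keys) {
        Object[] ids = new Object[keys.size()];
        Object[] times = new Object[keys.size()];
        for (int i = 0; i < keys.size(); i++) {
            ids[i] = keys.get(i).deviceId().toString();
            times[i] = keys.get(i).timestamp();
        }
        // Then bind, e.g. (assumption — exact array binding varies by driver):
        // ps.setObject(1, conn.createArrayOf("UUID", ids));
        // ps.setObject(2, conn.createArrayOf("TIMESTAMPTZ", times));
        return new Object[][] { ids, times };
    }
}
```

The element order must match across both arrays, since UNNEST zips them back into (device_id, timestamp) pairs positionally.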

Arrow stream join — Stage keys as Arrow vectors, then join against the main table. Same registerArrowStream technique but with only the key columns. (update code, delete code)

DELETE FROM sensor_readings s
USING arrow_stream k
WHERE s.device_id = k.device_id::UUID AND s.timestamp = k.timestamp

Temp table + Appender — Create a temp table, bulk-load keys with the Appender, then execute a set-based join. This turns DuckDB’s INSERT-only Appender into a tool for bulk UPDATE and DELETE: (update code, delete code)

// 1. Create staging table
stmt.execute("CREATE TEMP TABLE _keys (device_id UUID, timestamp TIMESTAMPTZ)");

// 2. Bulk-load keys via Appender
try (var appender = ((DuckDBConnection) conn)
        .createAppender(DuckDBConnection.DEFAULT_SCHEMA, "_keys")) {
    for (var r : rows) {
        appender.beginRow();
        appender.append(r.deviceId());
        appender.append(r.timestamp());
        appender.endRow();
    }
}

// 3. Set-based delete
stmt.execute("""
    DELETE FROM sensor_readings s
    USING _keys k
    WHERE s.device_id = k.device_id AND s.timestamp = k.timestamp""");
stmt.execute("DROP TABLE _keys");

The same pattern works for UPDATE — add the new values to the staging table:

CREATE TEMP TABLE _update_keys (device_id UUID, timestamp TIMESTAMPTZ, temperature DOUBLE);
-- Appender-load keys + new values
UPDATE sensor_readings s
SET temperature = k.temperature
FROM _update_keys k
WHERE s.device_id = k.device_id AND s.timestamp = k.timestamp;

Table function join — Same as the insert table function, but DuckDB uses it as a key source in a DELETE ... USING or UPDATE ... FROM join. (update code, delete code)

DuckDB Insert / Update / Delete Benchmark

Comparing JDBC :batch, Appender, Arrow, UNNEST list params, and temp-table join

[Charts: INSERT, UPDATE, and DELETE throughput vs. total rows]

INSERT results

1,000 rows

| Method | Time (ms) | Rows/sec | vs Best |
|---|---|---|---|
| Individual INSERT | 345.9 | 2,891 | 29.3x slower |
| Batch (:batch) | 216.8 | 4,613 | 18.3x slower |
| Multi-value INSERT | 53.8 | 18,582 | 4.6x slower |
| Appender (:appender) | 11.8 | 84,630 | fastest |
| Arrow stream | 19.2 | 51,977 | 1.6x slower |
| Table function | 13.6 | 73,622 | 1.1x slower |

5,000 rows

| Method | Time (ms) | Rows/sec | vs Best |
|---|---|---|---|
| Individual INSERT | 1,582.3 | 3,160 | 64.4x slower |
| Batch (:batch) | 1,009.2 | 4,954 | 41.1x slower |
| Multi-value INSERT | 334.5 | 14,948 | 13.6x slower |
| Appender (:appender) | 35.0 | 142,985 | 1.4x slower |
| Arrow stream | 26.1 | 191,677 | 1.1x slower |
| Table function | 24.6 | 203,537 | fastest |

10,000 rows

| Method | Time (ms) | Rows/sec | vs Best |
|---|---|---|---|
| Individual INSERT | 3,016.6 | 3,315 | 111.4x slower |
| Batch (:batch) | 1,950.2 | 5,128 | 72.0x slower |
| Multi-value INSERT | 750.5 | 13,325 | 27.7x slower |
| Appender (:appender) | 62.3 | 160,472 | 2.3x slower |
| Arrow stream | 27.1 | 369,291 | fastest |
| Table function | 39.2 | 254,808 | 1.4x slower |

50,000 rows

| Method | Time (ms) | Rows/sec | vs Best |
|---|---|---|---|
| Individual INSERT | 15,062.9 | 3,319 | 106.5x slower |
| Batch (:batch) | 9,805.0 | 5,099 | 69.3x slower |
| Multi-value INSERT | 3,722.1 | 13,433 | 26.3x slower |
| Appender (:appender) | 347.8 | 143,751 | 2.5x slower |
| Arrow stream | 188.1 | 265,791 | 1.3x slower |
| Table function | 141.5 | 353,377 | fastest |

100,000 rows

| Method | Time (ms) | Rows/sec | vs Best |
|---|---|---|---|
| Individual INSERT | 30,299.7 | 3,300 | 128.7x slower |
| Batch (:batch) | 19,854.5 | 5,037 | 84.3x slower |
| Multi-value INSERT | 7,300.8 | 13,697 | 31.0x slower |
| Appender (:appender) | 629.1 | 158,945 | 2.7x slower |
| Arrow stream | 235.4 | 424,734 | fastest |
| Table function | 250.2 | 399,683 | 1.1x slower |

500,000 rows

| Method | Time (ms) | Rows/sec | vs Best |
|---|---|---|---|
| Individual INSERT | - | - | skipped |
| Batch (:batch) | - | - | skipped |
| Multi-value INSERT | - | - | skipped |
| Appender (:appender) | 3,120.3 | 160,243 | 2.6x slower |
| Arrow stream | 1,288.7 | 387,994 | 1.1x slower |
| Table function | 1,179.9 | 423,747 | fastest |

1,000,000 rows

| Method | Time (ms) | Rows/sec | vs Best |
|---|---|---|---|
| Individual INSERT | - | - | skipped |
| Batch (:batch) | - | - | skipped |
| Multi-value INSERT | - | - | skipped |
| Appender (:appender) | 6,544.0 | 152,812 | 2.5x slower |
| Arrow stream | 2,922.8 | 342,139 | 1.1x slower |
| Table function | 2,608.4 | 383,377 | fastest |

UPDATE results

1,000 rows

| Method | Time (ms) | Rows/sec | vs Best |
|---|---|---|---|
| Individual UPDATE | 226.2 | 4,421 | 20.8x slower |
| Batch (:batch) | 192.0 | 5,208 | 17.7x slower |
| UNNEST list params | 12.2 | 81,756 | 1.1x slower |
| Arrow stream join | 11.6 | 85,900 | 1.1x slower |
| Temp table join | 10.9 | 91,933 | fastest |
| Table function | 11.0 | 90,759 | ~same |

5,000 rows

| Method | Time (ms) | Rows/sec | vs Best |
|---|---|---|---|
| Individual UPDATE | 1,179.6 | 4,239 | 95.1x slower |
| Batch (:batch) | 1,052.6 | 4,750 | 84.8x slower |
| UNNEST list params | 17.0 | 294,131 | 1.4x slower |
| Arrow stream join | 15.2 | 329,765 | 1.2x slower |
| Temp table join | 12.4 | 402,927 | fastest |
| Table function | 12.6 | 396,662 | ~same |

10,000 rows

| Method | Time (ms) | Rows/sec | vs Best |
|---|---|---|---|
| Individual UPDATE | 2,863.7 | 3,492 | 211.6x slower |
| Batch (:batch) | 2,298.9 | 4,350 | 169.9x slower |
| UNNEST list params | 24.0 | 417,146 | 1.8x slower |
| Arrow stream join | 16.7 | 597,372 | 1.2x slower |
| Temp table join | 13.5 | 739,041 | fastest |
| Table function | 15.1 | 660,487 | 1.1x slower |

50,000 rows

| Method | Time (ms) | Rows/sec | vs Best |
|---|---|---|---|
| Individual UPDATE | 14,630.4 | 3,418 | 688.1x slower |
| Batch (:batch) | 13,578.4 | 3,682 | 638.6x slower |
| UNNEST list params | 119.9 | 417,188 | 5.6x slower |
| Arrow stream join | 69.0 | 724,504 | 3.2x slower |
| Temp table join | 21.3 | 2,351,675 | fastest |
| Table function | 24.6 | 2,033,870 | 1.2x slower |

100,000 rows

| Method | Time (ms) | Rows/sec | vs Best |
|---|---|---|---|
| Individual UPDATE | 36,936.2 | 2,707 | 1,199.7x slower |
| Batch (:batch) | 31,714.7 | 3,153 | 1,030.1x slower |
| UNNEST list params | 232.0 | 431,070 | 7.5x slower |
| Arrow stream join | 138.0 | 724,798 | 4.5x slower |
| Temp table join | 30.8 | 3,248,036 | fastest |
| Table function | 36.9 | 2,707,682 | 1.2x slower |

500,000 rows

| Method | Time (ms) | Rows/sec | vs Best |
|---|---|---|---|
| Individual UPDATE | - | - | skipped |
| Batch (:batch) | - | - | skipped |
| UNNEST list params | 1,223.8 | 408,575 | 7.6x slower |
| Arrow stream join | 669.7 | 746,591 | 4.2x slower |
| Temp table join | 160.9 | 3,106,897 | fastest |
| Table function | 198.9 | 2,513,792 | 1.2x slower |

1,000,000 rows

| Method | Time (ms) | Rows/sec | vs Best |
|---|---|---|---|
| Individual UPDATE | - | - | skipped |
| Batch (:batch) | - | - | skipped |
| UNNEST list params | 2,340.3 | 427,299 | 9.1x slower |
| Arrow stream join | 1,321.6 | 756,654 | 5.1x slower |
| Temp table join | 257.3 | 3,887,168 | fastest |
| Table function | 330.8 | 3,023,035 | 1.3x slower |

DELETE results

1,000 rows

| Method | Time (ms) | Rows/sec | vs Best |
|---|---|---|---|
| Individual DELETE | 199.4 | 5,016 | 20.1x slower |
| Batch (:batch) | 166.8 | 5,997 | 16.8x slower |
| UNNEST list params | 10.9 | 91,474 | 1.1x slower |
| Arrow stream join | 12.8 | 78,130 | 1.3x slower |
| Temp table join | 10.6 | 94,494 | 1.1x slower |
| Table function | 9.9 | 100,844 | fastest |

5,000 rows

| Method | Time (ms) | Rows/sec | vs Best |
|---|---|---|---|
| Individual DELETE | 916.2 | 5,457 | 81.0x slower |
| Batch (:batch) | 808.0 | 6,188 | 71.5x slower |
| UNNEST list params | 15.6 | 320,613 | 1.4x slower |
| Arrow stream join | 14.5 | 343,747 | 1.3x slower |
| Temp table join | 11.8 | 423,315 | ~same |
| Table function | 11.3 | 442,306 | fastest |

10,000 rows

| Method | Time (ms) | Rows/sec | vs Best |
|---|---|---|---|
| Individual DELETE | 1,820.0 | 5,495 | 142.6x slower |
| Batch (:batch) | 1,784.6 | 5,603 | 139.8x slower |
| UNNEST list params | 22.5 | 444,311 | 1.8x slower |
| Arrow stream join | 16.5 | 606,296 | 1.3x slower |
| Temp table join | 12.8 | 783,441 | fastest |
| Table function | 14.5 | 689,856 | 1.1x slower |

50,000 rows

| Method | Time (ms) | Rows/sec | vs Best |
|---|---|---|---|
| Individual DELETE | 10,988.2 | 4,550 | 584.2x slower |
| Batch (:batch) | 8,356.6 | 5,983 | 444.3x slower |
| UNNEST list params | 94.6 | 528,407 | 5.0x slower |
| Arrow stream join | 64.5 | 774,737 | 3.4x slower |
| Temp table join | 18.8 | 2,658,112 | fastest |
| Table function | 25.1 | 1,993,260 | 1.3x slower |

100,000 rows

| Method | Time (ms) | Rows/sec | vs Best |
|---|---|---|---|
| Individual DELETE | 23,546.4 | 4,247 | 836.7x slower |
| Batch (:batch) | 21,117.7 | 4,735 | 750.4x slower |
| UNNEST list params | 192.2 | 520,213 | 6.8x slower |
| Arrow stream join | 130.4 | 766,958 | 4.6x slower |
| Temp table join | 28.1 | 3,553,297 | fastest |
| Table function | 40.0 | 2,500,696 | 1.4x slower |

500,000 rows

| Method | Time (ms) | Rows/sec | vs Best |
|---|---|---|---|
| Individual DELETE | - | - | skipped |
| Batch (:batch) | - | - | skipped |
| UNNEST list params | 956.8 | 522,595 | 9.4x slower |
| Arrow stream join | 642.3 | 778,404 | 6.3x slower |
| Temp table join | 101.6 | 4,922,569 | fastest |
| Table function | 148.8 | 3,359,951 | 1.5x slower |

1,000,000 rows

| Method | Time (ms) | Rows/sec | vs Best |
|---|---|---|---|
| Individual DELETE | - | - | skipped |
| Batch (:batch) | - | - | skipped |
| UNNEST list params | 1,891.2 | 528,756 | 10.3x slower |
| Arrow stream join | 1,248.1 | 801,192 | 6.8x slower |
| Temp table join | 184.2 | 5,429,384 | fastest |
| Table function | 302.2 | 3,308,981 | 1.6x slower |

System Information

Java: 25 (Eclipse Adoptium)
OS: Linux 6.8.0-107-generic
Arch: amd64
CPUs: 24
Max Memory: 7,956 MB
DuckDB: v1.5.2
Warmup Runs: 1
Measurement Runs: 3

The Appender is the right starting point for most Java applications. It’s simple, has no extra dependencies, and at 160K rows/sec it’s 30× faster than JDBC batch. For batches under 5K rows, it’s actually the fastest method.

Table Functions and Arrow: For Higher Throughput


Above 10K rows, Table Functions (383K rows/sec at 1M, peaking at 424K) and Arrow (342K rows/sec at 1M) both deliver 2.2–2.6× the Appender’s throughput:

  • Table Functions need no extra dependencies — just the DuckDB JDBC driver. You implement a bind()/init()/apply() callback that writes columnar vectors. The key advantage: the entire operation runs as a single SQL statement, regardless of data size. DuckDB streams through the callback internally. This is why Table Functions overtake Arrow at large sizes (500K+). For UPDATE/DELETE they place second behind temp-table-join at 3.0–3.4M rows/sec.
  • Arrow requires Apache Arrow JARs and more code (schema definitions, vector building, stream registration, off-heap memory management). Arrow vectors must be fully built in memory before registering, so data is processed in chunks — each chunk is a separate SQL statement. At 1M rows with 10K-row chunks, that’s 100 statements vs Table Function’s 1. Arrow wins at medium sizes (10K–100K) but falls behind Table Functions at 1M. It’s the right choice when your data is already columnar or when you need interop with other Arrow-based systems.

The temp-table-join pattern delivers 5.4M deletes/sec and 3.9M updates/sec at 1M rows — over 1,000× faster than JDBC batch. Instead of executing 1M individual statements, DuckDB executes a single set-based join. UNNEST (the other set-based approach) is 9–10× slower at 1M.

On DuckDB, addBatch() / executeBatch() provides almost no speedup over individual statements (~5K vs ~3.3K rows/sec). DuckDB doesn’t rewrite batched statements the way PostgreSQL does with reWriteBatchedInserts. JDBC Batch is still the simplest option and works across both DuckDB and PostgreSQL, making it useful for portable code or small batches.

  • DuckDB JDBC (in-process, on-disk database)
  • Java 25 (Temurin)
  • Apache Arrow 19.0.0
  • Each method: warmup + measurement runs, median reported
  • Data verified after every operation (row count + value checks)
  • All methods use a single connection
  • All methods accept Iterable (not List) — no method assumes the full dataset fits in memory
  • Methods that need batching (Arrow, UNNEST, Multi-value, JDBC Batch) chunk at 10,000 rows internally
  • The Table Function streams naturally via its Iterator-based callback at 2,048 rows per chunk
git clone https://github.com/sqg-dev/sqg
cd sqg/examples/java-duckdb-benchmark
just run # normal (skips slow methods at large sizes)
just run --all # include Individual/Batch at every size

The benchmark uses SQG, a type-safe SQL code generator. You write annotated .sql files with your queries, and SQG introspects them against a real database at build time to generate type-safe access code for Java, TypeScript, or Python. No ORM, no runtime reflection — just plain SQL in, type-safe code out.

For example, this annotation:

-- TABLE sensor_readings :appender

generates a complete type-safe SensorReadingsAppender class with append(UUID deviceId, OffsetDateTime timestamp, ...) — column types inferred directly from DuckDB.

From the single queries.sql file, SQG generated most of the code used in this benchmark.

The Arrow and Table Function methods are the only hand-written ones — everything else is generated from SQL.

SQG can currently generate DuckDB appender code and also produce Arrow code for reading results (SELECT). We are planning to add more support for generating code for UDFs and methods which use Arrow as input.


Questions or feedback? Join the discussion on GitHub.