Java + DuckDB Arrow

Generate Java code for DuckDB using the high-performance Apache Arrow API. This generator provides zero-copy data access and is optimized for analytical workloads with large result sets.

Overview

Property	Value
Generator	`java/duckdb/arrow`
Runtime	JVM (Java 17+)
API Style	Synchronous with Arrow vectors
Driver	DuckDB Java + Arrow

When to Use

Choose java/duckdb/arrow over java/duckdb when:

Processing large result sets (millions of rows)
Need maximum performance for data processing
Integrating with Arrow-based tools (Pandas, Spark)

Use java/duckdb instead when:

Working with small result sets

Installation

Gradle

dependencies {
    implementation 'org.duckdb:duckdb_jdbc:1.4.2.0'
    implementation 'org.apache.arrow:arrow-vector:18.3.0'
    implementation 'org.apache.arrow:arrow-c-data:18.3.0'
    implementation 'org.apache.arrow:arrow-memory-netty:18.3.0'
}

Maven

<dependencies>
    <dependency>
        <groupId>org.duckdb</groupId>
        <artifactId>duckdb_jdbc</artifactId>
        <version>1.4.2.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.arrow</groupId>
        <artifactId>arrow-vector</artifactId>
        <version>18.3.0</version>
    </dependency>
        <dependency>
        <groupId>org.apache.arrow</groupId>
        <artifactId>arrow-c-data</artifactId>
        <version>18.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.arrow</groupId>
        <artifactId>arrow-memory-netty</artifactId>
        <version>18.3.0</version>
    </dependency>
</dependencies>

Configuration

version: 1
name: analytics

sql:
  - files:
      - queries.sql
    gen:
      - generator: java/duckdb/arrow
        output: ./src/main/java/com/myapp/analytics/
        config:
          package: com.myapp.analytics
          migrations: true  # generates applyMigrations() method

Quick Start

Write your SQL

Java DuckDB Arrow Complete Example
Full example for Java DuckDB Arrow generator with complex types

Try in Playground

-- MIGRATE 1
CREATE TABLE events (
  id INTEGER,
  user_id INTEGER,
  event_type VARCHAR,
  properties STRUCT(page VARCHAR, referrer VARCHAR),
  tags VARCHAR[],
  timestamp TIMESTAMP
);

-- TESTDATA seed
INSERT INTO events VALUES
  (1, 100, 'pageview', {'page': '/home', 'referrer': 'google.com'}, ['web'], NOW()),
  (2, 100, 'click', {'page': '/products', 'referrer': '/home'}, ['web', 'conversion'], NOW());

-- QUERY events_by_user
@set user_id = 100
SELECT * FROM events WHERE user_id = ${user_id} ORDER BY timestamp DESC;

-- QUERY event_counts
SELECT event_type, COUNT(*) as count
FROM events
GROUP BY event_type
ORDER BY count DESC;

-- QUERY events_with_tag
@set tag = 'web'
SELECT * FROM events WHERE list_contains(tags, ${tag});

Generate code
Terminal window
```
sqg sqg.yaml
```

Use the generated code

Generated Java (Arrow)

From: Java DuckDB Arrow Complete Example

See full code in Playground

// ...

    public record EventCountsResult(
        PreparedStatement statement,
        RootAllocator allocator,
        ArrowReader reader,
        VarCharVector event_type,
        BigIntVector count
    ) implements AutoCloseable {
        public boolean loadNextBatch() throws IOException {
            return reader.loadNextBatch();
        }

        public int getRowCount() throws IOException {
            return reader.getVectorSchemaRoot().getRowCount();
        }

        public void close() throws IOException, SQLException {
            reader.close();
            allocator.close();
            statement.close();
        }
    }

    public EventCountsResult eventCounts() throws SQLException, IOException {
        var stmt = connection.prepareStatement(
            """
            SELECT event_type, COUNT(*) as count
            FROM events
            GROUP BY event_type
            ORDER BY count DESC;"""
        );
        var rs = (DuckDBResultSet) stmt.executeQuery();
        var allocator = new RootAllocator();
        var reader = (ArrowReader) rs.arrowExportStream(allocator, 65536);
        var root = reader.getVectorSchemaRoot();

        return new EventCountsResult(
            stmt,
            allocator,
            reader,
            (VarCharVector) root.getVector("event_type"),
            (BigIntVector) root.getVector("count")
        );
    }


// ...

Example usage:

import com.myapp.analytics.Analytics;
import org.duckdb.DuckDBConnection;
import java.sql.DriverManager;

public class Main {
    public static void main(String[] args) throws Exception {
        // Connect to DuckDB
        DuckDBConnection conn = (DuckDBConnection) DriverManager
            .getConnection("jdbc:duckdb:");

        // Apply migrations (with built-in tracking)
        Analytics.applyMigrations(conn);

        // Create query instance
        Analytics queries = new Analytics(conn);

        // Query with Arrow result
        for (var event : queries.eventsByUser(100)) {
            System.out.println(event.eventType());
            System.out.println(event.properties().page());  // Struct access
            System.out.println(event.tags());  // List access
        }

        // Aggregate query
        for (var count : queries.eventCounts()) {
            System.out.printf("%s: %d%n", count.eventType(), count.count());
        }

        conn.close();
    }
}

Complex Types

The Arrow generator provides native support for DuckDB’s complex types:

Structs

CREATE TABLE products (
  id INTEGER,
  details STRUCT(name VARCHAR, price DOUBLE, available BOOLEAN)
);

var product = queries.getProduct(1);
System.out.println(product.details().name());
System.out.println(product.details().price());
System.out.println(product.details().available());

Lists

CREATE TABLE posts (
  id INTEGER,
  tags VARCHAR[],
  scores INTEGER[]
);

var post = queries.getPost(1);
List<String> tags = post.tags();  // Direct List access
List<Integer> scores = post.scores();

Nested Types

CREATE TABLE reports (
  id INTEGER,
  data STRUCT(
    summary VARCHAR,
    metrics STRUCT(views INTEGER, clicks INTEGER)[]
  )
);

var report = queries.getReport(1);
System.out.println(report.data().summary());
for (var metric : report.data().metrics()) {
    System.out.printf("Views: %d, Clicks: %d%n",
        metric.views(), metric.clicks());
}

Type Mapping

DuckDB Type	Java Type
`INTEGER`	`Integer`
`BIGINT`	`Long`
`DOUBLE`	`Double`
`VARCHAR`	`String`
`BOOLEAN`	`Boolean`
`TIMESTAMP`	`Instant`
`DATE`	`LocalDate`
`TIME`	`LocalTime`
`DECIMAL`	`BigDecimal`
`UUID`	`UUID`
`BLOB`	`byte[]`
`STRUCT(...)`	Generated record type
`T[]`	`List<T>`
`MAP(K,V)`	`HashMap<K,V>`

Batch Processing

Arrow enables efficient batch processing:

// Process large results in batches
try (var result = queries.largeQuery()) {
    while (result.hasNext()) {
        var batch = result.nextBatch();
        processBatch(batch);
    }
}

Use Cases

Real-time analytics dashboards
Log analysis and aggregation
Data pipeline processing
Machine learning feature extraction
Business intelligence applications

Example: Analytics Dashboard

Analytics Dashboard
Example analytics dashboard queries with aggregations and date filtering