Skip to content

Java + DuckDB Arrow

Generate Java code for DuckDB using the high-performance Apache Arrow API. This generator provides zero-copy data access and is optimized for analytical workloads with large result sets.

PropertyValue
Generatorjava/duckdb/arrow
RuntimeJVM (Java 17+)
API StyleSynchronous with Arrow vectors
DriverDuckDB Java + Arrow

Choose java/duckdb/arrow over java/duckdb when:

  • Processing large result sets (millions of rows)
  • Need maximum performance for data processing
  • Integrating with Arrow-based tools (Pandas, Spark)

Use java/duckdb instead when:

  • Working with small result sets
build.gradle
dependencies {
implementation 'org.duckdb:duckdb_jdbc:1.4.2.0'
implementation 'org.apache.arrow:arrow-vector:18.3.0'
implementation 'org.apache.arrow:arrow-c-data:18.3.0'
implementation 'org.apache.arrow:arrow-memory-netty:18.3.0'
}
pom.xml
<dependencies>
<dependency>
<groupId>org.duckdb</groupId>
<artifactId>duckdb_jdbc</artifactId>
<version>1.4.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.arrow</groupId>
<artifactId>arrow-vector</artifactId>
<version>18.3.0</version>
</dependency>
<dependency>
<groupId>org.apache.arrow</groupId>
<artifactId>arrow-c-data</artifactId>
<version>18.3.0</version>
</dependency>
<dependency>
<groupId>org.apache.arrow</groupId>
<artifactId>arrow-memory-netty</artifactId>
<version>18.3.0</version>
</dependency>
</dependencies>
sqg.yaml
version: 1
name: analytics
sql:
- files:
- queries.sql
gen:
- generator: java/duckdb/arrow
output: ./src/main/java/com/myapp/analytics/
config:
package: com.myapp.analytics
  1. Write your SQL

    Java DuckDB Arrow Complete Example
    Full example for Java DuckDB Arrow generator with complex types
    Try in Playground
    -- MIGRATE 1
    CREATE TABLE events (
    id INTEGER,
    user_id INTEGER,
    event_type VARCHAR,
    properties STRUCT(page VARCHAR, referrer VARCHAR),
    tags VARCHAR[],
    timestamp TIMESTAMP
    );
    -- TESTDATA seed
    INSERT INTO events VALUES
    (1, 100, 'pageview', {'page': '/home', 'referrer': 'google.com'}, ['web'], NOW()),
    (2, 100, 'click', {'page': '/products', 'referrer': '/home'}, ['web', 'conversion'], NOW());
    -- QUERY events_by_user
    @set user_id = 100
    SELECT * FROM events WHERE user_id = ${user_id} ORDER BY timestamp DESC;
    -- QUERY event_counts
    SELECT event_type, COUNT(*) as count
    FROM events
    GROUP BY event_type
    ORDER BY count DESC;
    -- QUERY events_with_tag
    @set tag = 'web'
    SELECT * FROM events WHERE list_contains(tags, ${tag});
  2. Generate code

    Terminal window
    sqg sqg.yaml
  3. Use the generated code

    Generated Java (Arrow)
    From: Java DuckDB Arrow Complete Example
    See full code in Playground
    // ...
    public record EventCountsResult(
    PreparedStatement statement,
    RootAllocator allocator,
    ArrowReader reader,
    VarCharVector event_type,
    BigIntVector count
    ) implements AutoCloseable {
    public boolean loadNextBatch() throws IOException {
    return reader.loadNextBatch();
    }
    public int getRowCount() throws IOException {
    return reader.getVectorSchemaRoot().getRowCount();
    }
    public void close() throws IOException, SQLException {
    reader.close();
    allocator.close();
    statement.close();
    }
    }
    public EventCountsResult eventCounts() throws SQLException, IOException {
    var stmt = connection.prepareStatement(
    """
    SELECT event_type, COUNT(*) as count
    FROM events
    GROUP BY event_type
    ORDER BY count DESC;"""
    );
    var rs = (DuckDBResultSet) stmt.executeQuery();
    var allocator = new RootAllocator();
    var reader = (ArrowReader) rs.arrowExportStream(allocator, 65536);
    var root = reader.getVectorSchemaRoot();
    return new EventCountsResult(
    stmt,
    allocator,
    reader,
    (VarCharVector) root.getVector("event_type"),
    (BigIntVector) root.getVector("count")
    );
    }
    // ...

    Example usage:

    import com.myapp.analytics.Analytics;
    import org.duckdb.DuckDBConnection;
    import java.sql.DriverManager;
    public class Main {
    public static void main(String[] args) throws Exception {
    // Connect to DuckDB
    DuckDBConnection conn = (DuckDBConnection) DriverManager
    .getConnection("jdbc:duckdb:");
    // Run migrations
    for (String migration : Analytics.getMigrations()) {
    conn.createStatement().execute(migration);
    }
    // Create query instance
    Analytics queries = new Analytics(conn);
    // Query with Arrow result
    for (var event : queries.eventsByUser(100)) {
    System.out.println(event.eventType());
    System.out.println(event.properties().page()); // Struct access
    System.out.println(event.tags()); // List access
    }
    // Aggregate query
    for (var count : queries.eventCounts()) {
    System.out.printf("%s: %d%n", count.eventType(), count.count());
    }
    conn.close();
    }
    }

The Arrow generator provides native support for DuckDB’s complex types:

CREATE TABLE products (
id INTEGER,
details STRUCT(name VARCHAR, price DOUBLE, available BOOLEAN)
);
var product = queries.getProduct(1);
System.out.println(product.details().name());
System.out.println(product.details().price());
System.out.println(product.details().available());
CREATE TABLE posts (
id INTEGER,
tags VARCHAR[],
scores INTEGER[]
);
var post = queries.getPost(1);
List<String> tags = post.tags(); // Direct List access
List<Integer> scores = post.scores();
CREATE TABLE reports (
id INTEGER,
data STRUCT(
summary VARCHAR,
metrics STRUCT(views INTEGER, clicks INTEGER)[]
)
);
var report = queries.getReport(1);
System.out.println(report.data().summary());
for (var metric : report.data().metrics()) {
System.out.printf("Views: %d, Clicks: %d%n",
metric.views(), metric.clicks());
}
DuckDB TypeJava Type
INTEGERInteger
BIGINTLong
DOUBLEDouble
VARCHARString
BOOLEANBoolean
TIMESTAMPInstant
DATELocalDate
TIMELocalTime
DECIMALBigDecimal
UUIDUUID
BLOBbyte[]
STRUCT(...)Generated record type
T[]List<T>
MAP(K,V)HashMap<K,V>

Arrow enables efficient batch processing:

// Process large results in batches
try (var result = queries.largeQuery()) {
while (result.hasNext()) {
var batch = result.nextBatch();
processBatch(batch);
}
}
  • Real-time analytics dashboards
  • Log analysis and aggregation
  • Data pipeline processing
  • Machine learning feature extraction
  • Business intelligence applications
Analytics Dashboard
Example analytics dashboard queries with aggregations and date filtering
Try in Playground
-- MIGRATE 1
CREATE TABLE events (
id INTEGER PRIMARY KEY,
user_id INTEGER NOT NULL,
event_type VARCHAR NOT NULL,
properties STRUCT(page VARCHAR, referrer VARCHAR),
timestamp TIMESTAMP NOT NULL
);
-- TESTDATA seed
INSERT INTO events (id, user_id, event_type, properties, timestamp) VALUES
(1, 100, 'pageview', {'page': '/home', 'referrer': 'google'}, TIMESTAMP '2024-01-15 10:00:00'),
(2, 101, 'pageview', {'page': '/products', 'referrer': 'direct'}, TIMESTAMP '2024-01-15 11:00:00'),
(3, 100, 'conversion', {'page': '/checkout', 'referrer': 'internal'}, TIMESTAMP '2024-01-15 12:00:00'),
(4, 102, 'pageview', {'page': '/home', 'referrer': 'google'}, TIMESTAMP '2024-01-16 09:00:00'),
(5, 101, 'conversion', {'page': '/checkout', 'referrer': 'internal'}, TIMESTAMP '2024-01-16 14:00:00');
-- QUERY daily_metrics
@set start_date = 2024-01-01
@set end_date = 2024-01-31
SELECT
DATE_TRUNC('day', timestamp) as day,
COUNT(*) as events,
COUNT(DISTINCT user_id) as unique_users,
COUNT(*) FILTER (WHERE event_type = 'conversion') as conversions
FROM events
WHERE timestamp BETWEEN ${start_date} AND ${end_date}
GROUP BY 1
ORDER BY 1;
-- QUERY top_pages
@set limit_count = 10
SELECT
properties.page as page,
COUNT(*) as views,
COUNT(DISTINCT user_id) as unique_visitors
FROM events
WHERE event_type = 'pageview'
GROUP BY 1
ORDER BY views DESC
LIMIT ${limit_count};