Java + DuckDB Arrow
Generate Java code for DuckDB using the high-performance Apache Arrow API. This generator provides zero-copy data access and is optimized for analytical workloads with large result sets.
Overview
Section titled “Overview”| Property | Value |
|---|---|
| Generator | java/duckdb/arrow |
| Runtime | JVM (Java 17+) |
| API Style | Synchronous with Arrow vectors |
| Driver | DuckDB Java + Arrow |
When to Use
Section titled “When to Use”Choose java/duckdb/arrow over java/duckdb when:
- Processing large result sets (millions of rows)
- Need maximum performance for data processing
- Integrating with Arrow-based tools (Pandas, Spark)
Use java/duckdb instead when:
- Working with small result sets
Installation
Section titled “Installation”Gradle
Section titled “Gradle”dependencies { implementation 'org.duckdb:duckdb_jdbc:1.4.2.0' implementation 'org.apache.arrow:arrow-vector:18.3.0' implementation 'org.apache.arrow:arrow-c-data:18.3.0' implementation 'org.apache.arrow:arrow-memory-netty:18.3.0'}<dependencies> <dependency> <groupId>org.duckdb</groupId> <artifactId>duckdb_jdbc</artifactId> <version>1.4.2.0</version> </dependency> <dependency> <groupId>org.apache.arrow</groupId> <artifactId>arrow-vector</artifactId> <version>18.3.0</version> </dependency> <dependency> <groupId>org.apache.arrow</groupId> <artifactId>arrow-c-data</artifactId> <version>18.3.0</version> </dependency> <dependency> <groupId>org.apache.arrow</groupId> <artifactId>arrow-memory-netty</artifactId> <version>18.3.0</version> </dependency></dependencies>Configuration
Section titled “Configuration”version: 1name: analytics
sql: - files: - queries.sql gen: - generator: java/duckdb/arrow output: ./src/main/java/com/myapp/analytics/ config: package: com.myapp.analyticsQuick Start
Section titled “Quick Start”-
Write your SQL
-- MIGRATE 1CREATE TABLE events (id INTEGER,user_id INTEGER,event_type VARCHAR,properties STRUCT(page VARCHAR, referrer VARCHAR),tags VARCHAR[],timestamp TIMESTAMP);-- TESTDATA seedINSERT INTO events VALUES(1, 100, 'pageview', {'page': '/home', 'referrer': 'google.com'}, ['web'], NOW()),(2, 100, 'click', {'page': '/products', 'referrer': '/home'}, ['web', 'conversion'], NOW());-- QUERY events_by_user@set user_id = 100SELECT * FROM events WHERE user_id = ${user_id} ORDER BY timestamp DESC;-- QUERY event_countsSELECT event_type, COUNT(*) as countFROM eventsGROUP BY event_typeORDER BY count DESC;-- QUERY events_with_tag@set tag = 'web'SELECT * FROM events WHERE list_contains(tags, ${tag}); -
Generate code
Terminal window sqg sqg.yaml -
Use the generated code
// ...public record EventCountsResult(PreparedStatement statement,RootAllocator allocator,ArrowReader reader,VarCharVector event_type,BigIntVector count) implements AutoCloseable {public boolean loadNextBatch() throws IOException {return reader.loadNextBatch();}public int getRowCount() throws IOException {return reader.getVectorSchemaRoot().getRowCount();}public void close() throws IOException, SQLException {reader.close();allocator.close();statement.close();}}public EventCountsResult eventCounts() throws SQLException, IOException {var stmt = connection.prepareStatement("""SELECT event_type, COUNT(*) as countFROM eventsGROUP BY event_typeORDER BY count DESC;""");var rs = (DuckDBResultSet) stmt.executeQuery();var allocator = new RootAllocator();var reader = (ArrowReader) rs.arrowExportStream(allocator, 65536);var root = reader.getVectorSchemaRoot();return new EventCountsResult(stmt,allocator,reader,(VarCharVector) root.getVector("event_type"),(BigIntVector) root.getVector("count"));}// ...Example usage:
import com.myapp.analytics.Analytics;import org.duckdb.DuckDBConnection;import java.sql.DriverManager;public class Main {public static void main(String[] args) throws Exception {// Connect to DuckDBDuckDBConnection conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:");// Run migrationsfor (String migration : Analytics.getMigrations()) {conn.createStatement().execute(migration);}// Create query instanceAnalytics queries = new Analytics(conn);// Query with Arrow resultfor (var event : queries.eventsByUser(100)) {System.out.println(event.eventType());System.out.println(event.properties().page()); // Struct accessSystem.out.println(event.tags()); // List access}// Aggregate queryfor (var count : queries.eventCounts()) {System.out.printf("%s: %d%n", count.eventType(), count.count());}conn.close();}}
Complex Types
Section titled “Complex Types”The Arrow generator provides native support for DuckDB’s complex types:
Structs
Section titled “Structs”CREATE TABLE products ( id INTEGER, details STRUCT(name VARCHAR, price DOUBLE, available BOOLEAN));var product = queries.getProduct(1);System.out.println(product.details().name());System.out.println(product.details().price());System.out.println(product.details().available());CREATE TABLE posts ( id INTEGER, tags VARCHAR[], scores INTEGER[]);var post = queries.getPost(1);List<String> tags = post.tags(); // Direct List accessList<Integer> scores = post.scores();Nested Types
Section titled “Nested Types”CREATE TABLE reports ( id INTEGER, data STRUCT( summary VARCHAR, metrics STRUCT(views INTEGER, clicks INTEGER)[] ));var report = queries.getReport(1);System.out.println(report.data().summary());for (var metric : report.data().metrics()) { System.out.printf("Views: %d, Clicks: %d%n", metric.views(), metric.clicks());}Type Mapping
Section titled “Type Mapping”| DuckDB Type | Java Type |
|---|---|
INTEGER | Integer |
BIGINT | Long |
DOUBLE | Double |
VARCHAR | String |
BOOLEAN | Boolean |
TIMESTAMP | Instant |
DATE | LocalDate |
TIME | LocalTime |
DECIMAL | BigDecimal |
UUID | UUID |
BLOB | byte[] |
STRUCT(...) | Generated record type |
T[] | List<T> |
MAP(K,V) | HashMap<K,V> |
Batch Processing
Section titled “Batch Processing”Arrow enables efficient batch processing:
// Process large results in batchestry (var result = queries.largeQuery()) { while (result.hasNext()) { var batch = result.nextBatch(); processBatch(batch); }}Use Cases
Section titled “Use Cases”- Real-time analytics dashboards
- Log analysis and aggregation
- Data pipeline processing
- Machine learning feature extraction
- Business intelligence applications
Example: Analytics Dashboard
Section titled “Example: Analytics Dashboard”-- MIGRATE 1CREATE TABLE events ( id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, event_type VARCHAR NOT NULL, properties STRUCT(page VARCHAR, referrer VARCHAR), timestamp TIMESTAMP NOT NULL);
-- TESTDATA seedINSERT INTO events (id, user_id, event_type, properties, timestamp) VALUES (1, 100, 'pageview', {'page': '/home', 'referrer': 'google'}, TIMESTAMP '2024-01-15 10:00:00'), (2, 101, 'pageview', {'page': '/products', 'referrer': 'direct'}, TIMESTAMP '2024-01-15 11:00:00'), (3, 100, 'conversion', {'page': '/checkout', 'referrer': 'internal'}, TIMESTAMP '2024-01-15 12:00:00'), (4, 102, 'pageview', {'page': '/home', 'referrer': 'google'}, TIMESTAMP '2024-01-16 09:00:00'), (5, 101, 'conversion', {'page': '/checkout', 'referrer': 'internal'}, TIMESTAMP '2024-01-16 14:00:00');
-- QUERY daily_metrics@set start_date = 2024-01-01@set end_date = 2024-01-31SELECT DATE_TRUNC('day', timestamp) as day, COUNT(*) as events, COUNT(DISTINCT user_id) as unique_users, COUNT(*) FILTER (WHERE event_type = 'conversion') as conversionsFROM eventsWHERE timestamp BETWEEN ${start_date} AND ${end_date}GROUP BY 1ORDER BY 1;
-- QUERY top_pages@set limit_count = 10SELECT properties.page as page, COUNT(*) as views, COUNT(DISTINCT user_id) as unique_visitorsFROM eventsWHERE event_type = 'pageview'GROUP BY 1ORDER BY views DESCLIMIT ${limit_count};