a pipe for quackheads
🐤 quackpype
QuackPy is a serverless OLAP API built on top of DuckDB exposing HTTP/S and Arrow Flight SQL interfaces
Important
- Arrow Flight API for modern data clients (DuckDB Airport)
- Easy HTTP API with multiple formats (JSON,CSV,Parquet)
- Unlocked Concurrent inserts and querying on DuckDB
- Persistent storage using w/ multiuser authentication
- Native access to any DuckDB Extension & Format
- Embedded SQL Query Interface for instant usage
Run using docker or build from source
docker pull ghcr.io/quackscience/quackpy:latest
docker run -ti --rm -p 8123:8123 -p 8815:8815 ghcr.io/quackscience/quackpy:latest
See the Examples directory for quick starters
Note
Quackpipe executes queries in :memory:
unless authentication details are provided for data persistence
Execute DuckDB queries using the HTTP POST/GET API (compatible with the ClickHouse HTTP API)
curl -X POST "http://user:persistence@localhost:8123" \
-H "Content-Type: application/json" \
-d 'SELECT version()'
Execute DuckDB queries using the experimental Flight GRPC API and Airport
Note
Quackpipe executes queries in :memory:
unless an authorization
header is provided for data persistence
CREATE SECRET airport_flight (
· type airport,
‣ auth_token 'user:password',
· scope 'grpc://localhost:8815'
· );
D select flight_descriptor, endpoint from airport_list_flights('grpc://127.0.0.1:8815', null);
┌─────────────────────────────────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ flight_descriptor │ endpoint │
│ union(cmd blob, path varchar[]) │ struct(ticket blob, "location" varchar[], expiration_time timestamp, app_metadata blob)[] │
├─────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ show_databases │ [{'ticket': SHOW DATABASES, 'location': [grpc://localhost:8815], 'expiration_time': NULL, 'app_metadata': }] │
│ show_tables │ [{'ticket': SHOW TABLES, 'location': [grpc://localhost:8815], 'expiration_time': NULL, 'app_metadata': }] │
│ show_version │ [{'ticket': SELECT version(), 'location': [grpc://localhost:8815], 'expiration_time': NULL, 'app_metadata': }] │
└─────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
D select * from airport_take_flight('grpc://localhost:8815/', ['show_version']);
┌─────────────┐
│ "version"() │
│ varchar │
├─────────────┤
│ v1.1.3 │
└─────────────┘
D --- Attach to Flight Server
D ATTACH 'deltalake' (TYPE AIRPORT, location 'grpc://localhost:8815/');
D --- Create Schema + Table
D CREATE SCHEMA deltalake.test1;
D CREATE TABLE deltalake.test1.people (
name VARCHAR,
love_of_duckdb INT,
tags VARCHAR[]
);
D --- Insert into Flight Table
D INSERT INTO deltalake.test1.people values
('rusty', 5, ['airport', 'datasketches']);
D --- Select from Flight Table
D SELECT * FROM deltalake.test1.people;
┌─────────┬────────────────┬─────────────────────────┐
│ name │ love_of_duckdb │ tags │
│ varchar │ int32 │ varchar[] │
├─────────┼────────────────┼─────────────────────────┤
│ rusty │ 5 │ [airport, datasketches] │
├─────────┴────────────────┴─────────────────────────┤
│ 1 row. 3 columns │
└────────────────────────────────────────────────────┘
Flight Tables can be accessed via HTTP API using the schema name
USE test1; SELECT * FROM people;
D SELECT * FROM airport_take_flight('grpc://localhost:8815', 'SELECT 1', headers := MAP{'authorization':'user:password'} );
┌───────┐
│ 1 │
│ int32 │
├───────┤
│ 1 │
└───────┘
from pyarrow.flight import FlightClient, Ticket, FlightCallOptions
import json
import pandas
import tabulate
sql="""SELECT version()"""
flight_ticket = Ticket(sql)
token = (b"authorization", bytes(f"user:persistence".encode('utf-8')))
options = FlightCallOptions(headers=[token])
client = FlightClient(f"grpc://localhost:8815")
reader = client.do_get(flight_ticket, options)
arrow_table = reader.read_all()
# Use pyarrow and pandas to view and analyze data
data_frame = arrow_table.to_pandas()
print(data_frame.to_markdown())
| | "version"() |
|---:|:--------------|
| 0 | v1.1.3 |
quackpy ships with the DuckDB SQL quack user-interface based on ch-ui
![](https://private-user-images.githubusercontent.com/1423657/400143127-902a6336-c4f4-4a4e-85d5-78dd62cb7602.gif?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyODQ1NDgsIm5iZiI6MTczOTI4NDI0OCwicGF0aCI6Ii8xNDIzNjU3LzQwMDE0MzEyNy05MDJhNjMzNi1jNGY0LTRhNGUtODVkNS03OGRkNjJjYjc2MDIuZ2lmP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxMSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTFUMTQzMDQ4WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9MGIxZDM2MGQ2ZDRhODlhM2E0MDA0Yjk5NzFmNzc1MDU0Y2M1NWE3ZWFkMWMzMDhlZjZiZjBmMzUxNGE3ODI5ZiZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.TwcuonjqLIffK_olb4emwpXwwnZ5-FnhFV1hOuw99HU)
sequenceDiagram
participant Client
participant QuackPy
participant DuckDB
Client ->> QuackPy: ListFlights
QuackPy ->> Client: Return Flights Table
Client ->> QuackPy: GetFlightInfo
QuackPy ->> DuckDB: DuckDB Execute
DuckDB ->> QuackPy: Arrow Results Stream
QuackPy ->> Client: FlightInfo(ticket)
Client ->> QuackPy: do_get(ticket)
QuackPy ->> Client: Stream of Results
Footnotes
-
DuckDB ® is a trademark of DuckDB Foundation. All rights reserved by their respective owners. 1 ↩
-
ClickHouse ® is a trademark of ClickHouse Inc. No direct affiliation or endorsement. 2 ↩
-
Released under the MIT license. See LICENSE for details. All rights reserved by their respective owners. 3 ↩
-
Flight implementation inspired by Duck Takes Flight 4 ↩