Skip to content

Latest commit

 

History

History
195 lines (148 loc) · 9.19 KB

README.md

File metadata and controls

195 lines (148 loc) · 9.19 KB

a pipe for quackheads

🐤 quackpype

QuackPy is a serverless OLAP API built on top of DuckDB exposing HTTP/S and Arrow Flight SQL interfaces

Important

  • Arrow Flight API for modern data clients (DuckDB Airport)
  • Easy HTTP API with multiple formats (JSON,CSV,Parquet)
  • Unlocked Concurrent inserts and querying on DuckDB
  • Persistent storage using w/ multiuser authentication
  • Native access to any DuckDB Extension & Format
  • Embedded SQL Query Interface for instant usage

🌱 Get Started

Run using docker or build from source

docker pull ghcr.io/quackscience/quackpy:latest
docker run -ti --rm -p 8123:8123 -p 8815:8815 ghcr.io/quackscience/quackpy:latest

👉 Usage

See the Examples directory for quick starters

Note

Quackpipe executes queries in :memory: unless authentication details are provided for data persistence

🕸️ HTTP API

Execute DuckDB queries using the HTTP POST/GET API (compatible with the ClickHouse HTTP API)

curl -X POST "http://user:persistence@localhost:8123" \
   -H "Content-Type: application/json" \
   -d 'SELECT version()'  

✈️ FLIGHT API

Execute DuckDB queries using the experimental Flight GRPC API and Airport

Note

Quackpipe executes queries in :memory: unless an authorization header is provided for data persistence

🎫 Pass Airport Security
CREATE SECRET airport_flight (
·       type airport,
‣       auth_token 'user:password',
·       scope 'grpc://localhost:8815'
· );
🎫 Take Airport Flights
D select flight_descriptor, endpoint from airport_list_flights('grpc://127.0.0.1:8815', null);
┌─────────────────────────────────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│        flight_descriptor        │                                                    endpoint                                                    │
│ union(cmd blob, path varchar[]) │           struct(ticket blob, "location" varchar[], expiration_time timestamp, app_metadata blob)[]            │
├─────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ show_databases                  │ [{'ticket': SHOW DATABASES, 'location': [grpc://localhost:8815], 'expiration_time': NULL, 'app_metadata': }]   │
│ show_tables                     │ [{'ticket': SHOW TABLES, 'location': [grpc://localhost:8815], 'expiration_time': NULL, 'app_metadata': }]      │
│ show_version                    │ [{'ticket': SELECT version(), 'location': [grpc://localhost:8815], 'expiration_time': NULL, 'app_metadata': }] │
└─────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

D select * from airport_take_flight('grpc://localhost:8815/', ['show_version']);
┌─────────────┐
│ "version"() │
│   varchar   │
├─────────────┤
│ v1.1.3      │
└─────────────┘
🎫 ATTACH Flights Tables
D --- Attach to Flight Server
D ATTACH 'deltalake' (TYPE AIRPORT, location 'grpc://localhost:8815/'); 

D --- Create Schema + Table
D CREATE SCHEMA deltalake.test1; 
D CREATE TABLE deltalake.test1.people (
     name VARCHAR,
     love_of_duckdb INT,
     tags VARCHAR[]
    );

D --- Insert into Flight Table
D INSERT INTO deltalake.test1.people values
  ('rusty', 5, ['airport', 'datasketches']);

D --- Select from Flight Table
D SELECT * FROM deltalake.test1.people;
┌─────────┬────────────────┬─────────────────────────┐
│  name   │ love_of_duckdb │          tags           │
│ varchar │     int32      │        varchar[]        │
├─────────┼────────────────┼─────────────────────────┤
│ rusty   │              5 │ [airport, datasketches] │
├─────────┴────────────────┴─────────────────────────┤
│ 1 row.                                   3 columns │
└────────────────────────────────────────────────────┘

Flight Tables can be accessed via HTTP API using the schema name

USE test1; SELECT * FROM people;

image

🎫 Take Custom Flights w/ Custom Headers + Ticket
D SELECT * FROM airport_take_flight('grpc://localhost:8815', 'SELECT 1', headers := MAP{'authorization':'user:password'} );
┌───────┐
│   1   │
│ int32 │
├───────┤
│   1   │
└───────┘
🎫 Take Python Flights
from pyarrow.flight import FlightClient, Ticket, FlightCallOptions 
import json
import pandas
import tabulate

sql="""SELECT version()"""
  
flight_ticket = Ticket(sql)

token = (b"authorization", bytes(f"user:persistence".encode('utf-8')))
options = FlightCallOptions(headers=[token])
client = FlightClient(f"grpc://localhost:8815")

reader = client.do_get(flight_ticket, options)
arrow_table = reader.read_all()
# Use pyarrow and pandas to view and analyze data
data_frame = arrow_table.to_pandas()
print(data_frame.to_markdown())
|    | "version"()   |
|---:|:--------------|
|  0 | v1.1.3        |

📺 SQL User-Interface

quackpy ships with the DuckDB SQL quack user-interface based on ch-ui

sequenceDiagram
    participant Client
    participant QuackPy
    participant DuckDB


    Client ->> QuackPy: ListFlights
    QuackPy ->> Client: Return Flights Table
    Client ->> QuackPy: GetFlightInfo
    QuackPy ->> DuckDB: DuckDB Execute
    DuckDB ->> QuackPy: Arrow Results Stream
    QuackPy ->> Client: FlightInfo(ticket)
    Client ->> QuackPy: do_get(ticket)
    QuackPy ->> Client: Stream of Results

Loading

🃏 Disclaimers

Footnotes

  1. DuckDB ® is a trademark of DuckDB Foundation. All rights reserved by their respective owners. 1

  2. ClickHouse ® is a trademark of ClickHouse Inc. No direct affiliation or endorsement. 2

  3. Released under the MIT license. See LICENSE for details. All rights reserved by their respective owners. 3

  4. Flight implementation inspired by Duck Takes Flight 4