3.2
Highlights
- (new) ETL offload: support for running custom extract-transform-load workloads on (and by) storage cluster;
- (new) TensorFlow integration to support existing training clients that use S3 API - done via tar2tf ETL offload that handles on-the-fly TFRecord/tf.Example conversion;
- List objects v2: optimized list-objects to greatly reduce response times;
- (new) Query objects: extends list-objects with advanced filtering capabilities;
- (new) Downloader: an option to keep AIS bucket in-sync with a (downloaded) destination;
- (new) Information Center (IC), to improve visibility and manageability of the asynchronous batch operations (such as global rebalance, n-way mirroring, erasure coding, ETL, and more);
- (new) role-based authentication;
- Distributed Shuffle (dSort) - performance improvements;
- multi-checksumming, with per-dataset configurable checksum and (new) support for cryptographic checksums.
And also:
- performance optimizations, CLI usability improvements, erasure coding optimizations, automated no-downtime rebalancing for erasure-coded buckets, refactoring, cleanup, and stability fixes across the board.
Downloader
Skip already downloaded/existing objects, limit download speed, support Azure Cloud, option to synchronize Cloud into AIS bucket, numerous CLI improvements.
- New API (and CLI) option to keep Cloud bucket and AIS bucket in-sync - #760, !2322
- Throttle download - #726
- Download an entire bucket (an option that specifies a range or list of objects to download can now be omitted) - #759
- Store 3rd party Cloud metadata (version, md5) as part of the AIS object's own metadata; use Cloud metadata for multi-versioning (latest version) and data protection - #701
- Progress Bar when downloading from Cloud - #773
- Downloader to support Azure Cloud - #763
- CLI: download prefix-ed objects - !2204
- Fix re-downloading a cloud bucket (skip downloading when an identical local replica exists) - !2221, !2236
- Downloading a Cloud bucket can now be done only to an AIS bucket that has an associated cloud backend - !2241
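The skip-if-identical behavior listed above can be pictured with a minimal sketch. It assumes the stored Cloud metadata is just a version string and an MD5 digest; the `CloudMeta` type and `needs_download` helper are hypothetical names used for illustration and are not part of the AIS API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CloudMeta:
    """Hypothetical container for the Cloud metadata kept with an object."""
    version: str
    md5: str


def needs_download(remote: CloudMeta, local: Optional[CloudMeta]) -> bool:
    """Return True when the local replica is missing or differs from the Cloud copy."""
    if local is None:
        return True  # nothing stored locally yet
    if remote.md5 and local.md5:
        return remote.md5 != local.md5  # prefer the content digest when both sides have one
    return remote.version != local.version  # otherwise fall back to the version string
```

Comparing digests first is a design choice of this sketch: a digest identifies content, whereas a version string only identifies a revision.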
Distributed Shuffle (dSort)
Reduce/optimize CPU and memory usage. Refactor and stabilize.
- CLI usability and improvements - #768
- Reduce memory usage - !2197
- Number of workers per mountpath to optimize disk utilization - !2263
- CLI: Add support for alternative output shard name formats - !2205
- Use MessagePack instead of JSON for intra-cluster communications - !2262
Authentication server (AuthN)
Replace old basic authentication with a role-based one. Allow a single AuthN server to manage any number of AIS clusters. Add support for both HTTP and HTTPS AIS clusters. More API endpoints require a token issued by AuthN when AuthN is enabled (previously, all GET requests worked without any authentication).
- Use BuntDB to persist all authentication data (instead of previously used separate JSON files) - !2146, !2178
- Remove (obsolete) user Cloud credentials management - !2146
- Support multiple AIS clusters with automatic HTTP/HTTPS selection - !2153
- CLI: new AuthN management commands: add/remove/show user/show cluster - !2153
- Introduce user roles (admin/cluster owner/bucket owner/read-only) - !2213
- When AuthN is deployed, the majority of requests to an AIS cluster must carry a valid AuthN token (previously only PUT operations) - !2284
List and Query objects
Revised and fast list-objects. Reduce memory usage. Use MessagePack. Employ bigger pages to speed up listing operations.
Experimental support for caching: a list-objects result can now be used across multiple users/requests.
- Massive speed-up via streamable listing - #850, #856, #862, #851, !2494
- list-objects API is now always paged; remove -fast option as obsolete - !2539
- Use MessagePack for intra-cluster communications; optionally, employ MessagePack for client <=> cluster requests as well - !2568
- Additional options to control list-objects content: only-cached, include-misplaced - !2613
- Rename page marker as continuation token and fix the paging semantics accordingly - !2592
- Use bigger pages (10,000 by default) for AIS buckets; use 10K-size pages for Cloud buckets for only-cached option - !2645
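The always-paged listing with a continuation token can be sketched as a simple client loop. This is only an illustration of the paging pattern, not the AIS client API: `list_all`, `make_fake_bucket`, and the rule that the token equals the last returned name are assumptions made so that the example is self-contained.

```python
import bisect
from typing import Callable, Iterator, List, Tuple

# Hypothetical page fetcher: takes (continuation_token, page_size) and returns
# (names in this page, token for the next page or "" when the listing is done).
FetchPage = Callable[[str, int], Tuple[List[str], str]]


def list_all(fetch_page: FetchPage, page_size: int = 10_000) -> Iterator[str]:
    """Yield every object name by following continuation tokens page by page."""
    token = ""
    while True:
        names, token = fetch_page(token, page_size)
        yield from names
        if not token:  # an empty token marks the last page
            break


def make_fake_bucket(names: List[str]) -> FetchPage:
    """In-memory stand-in for a cluster; the token is the last name already returned."""
    ordered = sorted(names)

    def fetch_page(token: str, page_size: int) -> Tuple[List[str], str]:
        # Resume strictly after the token, or from the start when there is none.
        start = bisect.bisect_right(ordered, token) if token else 0
        page = ordered[start:start + page_size]
        has_more = start + page_size < len(ordered)
        return page, (page[-1] if has_more and page else "")

    return fetch_page
```

A caller would iterate `list_all(fetch_page)` without ever seeing page boundaries; the default page size of 10,000 mirrors the figure quoted above.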
Query objects
New API that extends list-objects with added support for filtering and selection (a so-called inner and outer SELECT).
- Add init and next API - #754, !2399
- Use MessagePack instead of JSON (client side) - !2672
- Add support for querying Cloud buckets - !2521
Data protection
No more hardcoded xxhash as AIS checksum for objects: any checksum can be selected from a list that currently also includes MD5, SHA, CRC, and can be easily extended.
- Multiple per-bucket configurable checksums - #722, !2154, !2187
- SHA-256 and SHA-512 - !2190
- Self-healing: automatic restore of a corrupted object from EC slices and/or mirrored replicas - !2196
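As a rough illustration of per-bucket configurable checksums, the sketch below selects a checksum function by name. It uses only the Python standard library, so xxhash is left out, and `zlib.crc32` (plain CRC-32) stands in for whichever CRC variant AIS actually uses. The `CHECKSUMS` table, the bucket names, and the helper functions are hypothetical.

```python
import hashlib
import zlib
from typing import Callable, Dict


def _crc32_hex(data: bytes) -> str:
    # Plain CRC-32 from zlib; an assumption, the real CRC variant may differ.
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")


# Hypothetical registry: checksum name -> function returning a hex digest.
CHECKSUMS: Dict[str, Callable[[bytes], str]] = {
    "md5": lambda d: hashlib.md5(d).hexdigest(),
    "sha256": lambda d: hashlib.sha256(d).hexdigest(),
    "sha512": lambda d: hashlib.sha512(d).hexdigest(),
    "crc32": _crc32_hex,
}

# Hypothetical per-bucket configuration.
BUCKET_CHECKSUM: Dict[str, str] = {"images": "sha256", "logs": "crc32"}


def compute_checksum(kind: str, data: bytes) -> str:
    """Compute the hex digest of data using the named checksum type."""
    try:
        return CHECKSUMS[kind](data)
    except KeyError:
        raise ValueError(f"unsupported checksum type: {kind}") from None


def verify(bucket: str, data: bytes, expected: str) -> bool:
    """Check data against an expected digest using the bucket's configured type."""
    return compute_checksum(BUCKET_CHECKSUM[bucket], data) == expected.lower()
```

Adding a new checksum type in this sketch means adding one entry to the registry, which is one way to read "can be easily extended".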
CLI
Numerous improvements and bug fixes. In particular, new command-line options, shorter commands, better readable output, improved TAB-TAB support.
- Show target uptime in show cluster - #744
- PUT object from stdin (ais put object bck/obj -) - #748
- s3:// and gs:// are aliases for aws:// and gcp:// - !1789
- Rename register as join (as in: join new cluster node) - !1988
- TAB-TAB and output improvements - #649, #772, !1888, !1857
- User-provided checksum and end-to-end data protection - #779
- Improve show cluster to display a single JSON output - #810
- Add --chunk-size option for PUT object - !2164
- Improve show object command - !2185
- Add search command - !2400
- All ais start xaction <name> are now ais start <name> - !2448
- Run LRU on a list of specified buckets - allow user to temporarily override bucket's own LRU configuration - !2493
- Improve set props command to show what's actually changed - !2479
Erasure Coding (EC)
- Fix sending calculated slices on PUT objects - !2419
- CLI: improve EC stats output - #823
- Improve user experience on PUT - !2366
- CLI: added options --parity-slices and --data-slices for ais ec-encode command - !2387
- Automatically enable EC when user starts erasure-coding of a given bucket (via start xaction or set props CLI, for instance) - !2377
Information Center (IC)
To efficiently monitor asynchronous operations (jobs), AIStore employs what we call Information Center (IC) - a group of gateways that “own” all the currently running (as well as already finished) jobs in the cluster. Those jobs, codenamed eXtended actions, or xactions, include global rebalance, n-way mirroring, erasure coding, ETL-type distributed workload, and more. IC continuously monitors all async operations by coordinating with other clustered nodes.
- Cluster-wide ID for cluster-wide xactions - !2294, !2551
- Intra-cluster notifications for xactions - !2304, !2326, !2321, !2334, !2378, !2355, !2346
- 3 (three) IC members by default - !2561
- Support list- and query-objects caching - !2570
- Always keep IC members in-sync with respect to currently-running and finished async ops - !2639, !2648
Extract-Transform-Load (ETL) locally
- In-cluster ETL v1.0 - #842, !2659, !2660, !2651
- Target and ETL affinity - !2451
- CLI: add support for ETL - !2453
- List all transformations - !2498
- aisloader: add support for ETL (for benchmarking) - !2573
AIS loader (aisloader)
Support TAR generating and reading. Support ETL benchmarking via included echo (at https://hub.docker.com/repository/docker/aistore/transformer_echo), md5, and tar2tf ETL containers.
- Add TAR reader - !2585
- Add support for standard AIS_ENDPOINT environment variable (options --port and --ip are still supported) - !2642
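One plausible way to combine AIS_ENDPOINT with the older --ip and --port options is sketched below. The precedence (explicit options win over the environment variable), the `http` scheme for explicit options, and the `http://localhost:8080` fallback are all assumptions of this sketch and are not taken from aisloader itself; `resolve_endpoint` is a hypothetical helper.

```python
from typing import Mapping, Optional


def resolve_endpoint(
    env: Mapping[str, str],
    ip: Optional[str] = None,
    port: Optional[int] = None,
    default: str = "http://localhost:8080",  # hypothetical fallback
) -> str:
    """Pick the cluster endpoint from explicit options, then AIS_ENDPOINT, then a default."""
    if ip is not None or port is not None:
        # Assumed precedence: explicit --ip/--port override the environment.
        host = ip or "localhost"
        return f"http://{host}:{port or 8080}"
    endpoint = env.get("AIS_ENDPOINT", "").strip()
    if endpoint:
        return endpoint.rstrip("/")
    return default
```

In practice the function would be called with `os.environ` as `env`; taking the mapping as a parameter keeps the sketch easy to test.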
Local Playground + Kubernetes (for developers)
- Add minikube based Kubernetes development environment - !2456, !2558, !2508
- Enable Kubernetes-based testing on GitLab CI - !2510, !2562
- Enable Kubernetes based tests on Jenkins - !2609, !2685
Build & Release
- Scripts for automating release management; in particular, scripts to upload released AIS binaries - !2597
- An option to build aisnode (AIS target and AIS proxy) Alpine Linux-based minimal-footprint docker image - !2709
Miscellaneous
Make names of used environment variables consistent. Introduce $trash directory to keep deleted buckets for a while. Safer and better node startup: assorted APIs are now accessible only after the node is up and running.
Extend Local Playground for developers: add K8s minikube.
- Rename a bunch of environment variables used by ais/aisloader/cli for consistency - !2133
- Extend create bucket API (allow setting props) - #782, !2266
- Added special $trash directory to hold deleted buckets - !2351
- Add minikube dev deployment - !2456
- Node startup vs availability of assorted APIs - !2601, !2624