3.3
Highlights
- ETL - inline and offline dataset transformations, custom user-defined transformations via both user-provided containers and Python scripts, simplified ETL initialization, ETL directly to and from Cloud buckets;
- Multi-Cloud capability supporting co-existence and management of datasets originating from (or hosted by) different Cloud storages - !2736, !2737, !2748, !2792, !2793;
- Maintenance and decommission - the capability to put a clustered node in maintenance mode and/or safely and permanently remove it from the cluster - #947, !2935, !2957, !2983, !2990, !3094;
- Volume metadata (`VMD`) - persistent information that describes each clustered node's storage configuration (including data drives, local filesystems, and mountpaths), further used to reinforce data integrity and protection - #939, #941, !3118, !3198;
- New protocol prefix `ht://` - uniform access to "vanilla" HTTP(S) based datasets - #882, #889;
- Terraform integration - easy and automated deployment via Terraform; there's a separate repository (of scripts, charts, and documentation) that we use for production deployments;
- Intra-cluster communications - a major upgrade of the transport we use to rebalance user data, transfer erasure-coded slices, and copy and transform datasets - !2860, !2895, !2984, !3053, !3055, !3066, !3084, !3085, !3097, !3112, !3181, !3183, !3184, !3187, !3189, !3201, !3265, !3268, !3274, !3286, !3303, !3356, !3357, !3396, !3403, !3409, !3415, !3417.
And also:
- performance optimizations, CLI usability improvements, refactoring, cleanup, and stability fixes across the board.
Multi-Cloud
A new protocol prefix `ht://` (in addition to `s3://`, `gs://`, and `azure://`) for seamless integration and uniform access to "vanilla" HTTP(S) based datasets.
Multi-Cloud via a single deployed runtime. Improved access to public Cloud buckets (from different Cloud providers). Bucket copying and transformations (see ETL below) are extended to support Cloud buckets.
- New HTTP provider (`ht://`) - #882, #889
- Multi-Cloud - added runtime support for bucket management of multiple Cloud providers - !2736, !2737
- Support multiple regions for AWS buckets - #778, !2804
- Improve Google provider error handling - !2792, !2793
- Public GCP buckets can be used without setting `PROJECT_ID` - !2723
- Remove the default Cloud provider option (the provider now must be set explicitly) - !2748
- Support Cloud-based source/destination in a bucket copy operation - !2975
- Prefetch performance improvement: keep cached object properties longer - #969
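Buckets from different backends are distinguished by their protocol prefix (`s3://`, `gs://`, `azure://`, and the new `ht://`). The Go sketch below is a hypothetical illustration, not the AIStore parser: the provider names in `providerOf` and the `parseBucketURI` helper are assumptions. It simply treats the first path segment after the prefix as the bucket and the rest as the object name.

```go
package main

import (
	"fmt"
	"strings"
)

// providerOf maps a URI scheme to a backend name. The schemes are the ones
// named in these notes; the provider names are hypothetical.
var providerOf = map[string]string{
	"s3":    "aws",
	"gs":    "gcp",
	"azure": "azure",
	"ht":    "http",
}

// parseBucketURI (hypothetical) splits "scheme://bucket/object" into provider,
// bucket, and object name.
func parseBucketURI(uri string) (provider, bucket, object string, err error) {
	scheme, rest, ok := strings.Cut(uri, "://")
	if !ok {
		return "", "", "", fmt.Errorf("missing scheme in %q", uri)
	}
	provider, known := providerOf[scheme]
	if !known {
		return "", "", "", fmt.Errorf("unknown scheme %q", scheme)
	}
	bucket, object, _ = strings.Cut(rest, "/")
	if bucket == "" {
		return "", "", "", fmt.Errorf("missing bucket in %q", uri)
	}
	return provider, bucket, object, nil
}

func main() {
	p, b, o, err := parseBucketURI("ht://example.com/data/shard-0001.tar")
	fmt.Println(p, b, o, err)
}
```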
Core
Improve cluster stability in the presence of exceptional events, optimize cluster operation under heavy workloads, introduce maintenance mode, support permanent decommissioning of nodes from the cluster, improve the reliability of the bucket destroy operation, and optimize and further stabilize cluster rebalancing logic.
- Node `maintenance` feature - #947, !2935, !2990, !3094
- Improved out-of-space (out of capacity) handling - #822
- Backend buckets vs bucket initialization - !2841
- Improve cluster stability while it is in transition (when the primary changes) - #945, #968, #960
- If the cluster restarts during rebalancing, the rebalance now resumes - #913
- Optimize `copy-bucket` and other bucket-traversing workloads - #917
- Make promote consistent with other object operations - !2763, !2765
- Add transfer statistics for `resilvering` - !2926
- Configuration option `Rebalance.Enabled` now affects only automatic rebalance (a manual one can always be started) - !2915
- Reduce resource usage by the `StatsD` (Grafana, Graphite) client - !3240
- New CLI option `--daemon-id` to join a node with a user-predefined ID - !3255
- Fix the `object rename` operation to work across different mountpaths - !3329
- Make the `destroy bucket` operation transactional - !3315
- Volume metadata (`VMD`) - persistent information about a node and its storage configuration, used on startup when running node integrity checks - #939, #941, !3118, !3198
- No `metasync` when shutting down - !2844
- Do not ignore errors when listing multiple Cloud providers - !2845
- Refactor `reb` (rebalance) package - !2857
- Refactor target handlers and fix transactions' housekeeping logic - !2869
- Refactor `copy-object` interface - !2879
- Revise and refactor `PROMOTE` (command and API) - !2880
- Refactor target `copy-object` and `put-remote` interfaces - !2881
- Use data mover to copy buckets - !2893
- `LOM`: fix `CopyObject` - !2908
- `cmn.JoinWords` and friends - !2913
- Always allow manual rebalance (even if the automatic one is disabled) - !2915
- Mountpath resilvering now counts moved objects and their total size - !2926
- Copy-bucket now returns the correct total size of copied content - !2919
- Revise and optimize intra-cluster broadcasting - !2943
- Improve `HrwTargetList` performance - !2945
- Fix zero-size objects scenario - !3531
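`HrwTargetList` refers to highest random weight (rendezvous) hashing, which AIStore uses to decide which target stores a given object. A minimal Go sketch of the technique follows; the FNV-1a hash, the target IDs, and the `hrwScore`/`hrwTarget` helpers are assumptions for illustration, not the actual implementation. Each target gets a pseudo-random score per object, and the highest score wins.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hrwScore (hypothetical) computes a per-(target, object) weight.
// FNV-1a is an arbitrary choice for this sketch.
func hrwScore(targetID, objName string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(targetID))
	h.Write([]byte{0}) // separator, so "ab"+"c" hashes differently from "a"+"bc"
	h.Write([]byte(objName))
	return h.Sum64()
}

// hrwTarget (hypothetical) returns the target with the highest score for objName,
// or "" if there are no targets.
func hrwTarget(targets []string, objName string) string {
	var best string
	var bestScore uint64
	for i, t := range targets {
		if s := hrwScore(t, objName); i == 0 || s > bestScore {
			best, bestScore = t, s
		}
	}
	return best
}

func main() {
	targets := []string{"t1", "t2", "t3", "t4"}
	fmt.Println(hrwTarget(targets, "bucket/obj-42"))
}
```

A useful property of this scheme is that removing a target that does not own an object leaves that object's placement unchanged. Only objects that lived on the removed target need to move during rebalance.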
ETL
Multiple improvements and enhancements to the capability (first introduced in v3.2) to easily run user-defined custom dataset transformations and to scale their performance linearly with each added storage server. This release adds offline (dataset-to-dataset) transformation.
For ETL documentation (which now also includes animated presentations), please refer to docs/etl.md and etl/README.md.
- Add offline, local and cloud, bucket transformation - !2827, !2854, !2898, !3445
- ETL for objects in the Cloud - !3399
- ETL `build` operation - easy initialization based on the function definition - !2873, !2884, !2918, !3369
- Remove `kubectl` (shell) calls, use K8s `client-go` instead - !2896, !2907
- Support retrieving ETL logs - !2947
- Stability and performance improvements, bug fixes - !2955, !2977, !3330, !3369, !3374, !3411
- Add and improve labels in Pods and Services - !3445
- Improve waiting for the Pod/Service to be ready - !3332, !3397
- Add extension, prefix, and suffix flags for offline ETL - !2846
- Support aborting offline ETL - !2850
- Add dry run option for offline ETL - !2854
- Simplify flow to initialize ETL - !2853
- Consistent naming of API constants - !2861
- ETL build: remove unnecessary annotations - !2871
- Update skeleton docker images used to run custom Python-based transforms - !2870
- Install dependencies in `initContainer` - !2873
- POD spec: add volume mount - !2883
- Unify offline ETL with `copy-bucket` - !2898, !2933
- Improve waiting for POD-ready - !2912
- Add `dry-run` capability - !2939
- K8s client: pod namespace & refactoring - !2948
- The capability to throttle ETL (transforms) depending on disk utilization - !2998
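The exact throttling policy is not described in these notes. The Go sketch below assumes a simple linear back-off between two hypothetical utilization thresholds (`lowUtil`, `highUtil`) to show the general shape of throttling transforms by disk utilization. The thresholds, `maxDelay`, and `throttleDelay` are illustrative, not AIStore configuration.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical thresholds (percent disk utilization), not AIStore's config.
const (
	lowUtil  = 60
	highUtil = 90
	maxDelay = 100 * time.Millisecond
)

// throttleDelay (hypothetical) returns how long an ETL worker could pause before
// the next object. The delay grows linearly from 0 at lowUtil to maxDelay at highUtil.
func throttleDelay(util int) time.Duration {
	switch {
	case util <= lowUtil:
		return 0
	case util >= highUtil:
		return maxDelay
	default:
		return maxDelay * time.Duration(util-lowUtil) / time.Duration(highUtil-lowUtil)
	}
}

func main() {
	for _, u := range []int{40, 75, 95} {
		fmt.Printf("util=%d%% delay=%v\n", u, throttleDelay(u))
	}
}
```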
Terraform integration
Dramatically simplified deployment of an AIStore cluster on the Cloud via Terraform. This release supports GKE but can be easily extended to support any Cloud that provides Kubernetes (as a service). It is now possible to start a fully functional AIStore cluster with a single command; for details, please refer to the AIStore Kubernetes repository.
- Add scripts for easy deployment and shutdown of the AIStore cluster on the cloud - !16, !56-!68, #14, #17
- Add `admin` container image - !3079, !3195, !3359
- Remove requirement for `K8S_HOST_NAME` environment variable - !3451
Information Center (IC)
More reliable extended action (`xaction`) status management and reporting, automatic cluster-wide `xaction` abort, and (new) `xaction` progress notifications. In AIS, an `xaction` is a long-lived asynchronous operation, a job.
- Notify all participating nodes when any one of them aborts an `xaction` - !2928
- Improve `IC` status reporting by polling `xaction` status from targets that have not reported it yet - !2953
- Fix `xaction` registration for newly added targets - !2924
- Support both transactional and non-transactional `xactions` - !2734
- Replace target polling with notifications when waiting for an `xaction` to complete - !2868
- Make `xactions` return user-friendly status - !2865
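The cluster-wide abort can be pictured with an in-process analogy. Each goroutine stands in for a node participating in an `xaction`, and a shared `context.Context` plays the role of the abort notification. This is a hypothetical sketch (`runXaction` and `simulatedWork` are illustrative names) and does not show how AIStore nodes actually communicate over the network.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// simulatedWork is how long a hypothetical worker runs if nobody aborts.
const simulatedWork = 200 * time.Millisecond

// runXaction (hypothetical) starts one goroutine per "node". If the node named
// failing aborts, cancel() notifies every other participant via the shared
// context. It returns the nodes that observed the abort and the first error.
func runXaction(nodes []string, failing string) ([]string, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		aborted  []string
		firstErr error
	)
	for _, n := range nodes {
		wg.Add(1)
		go func(node string) {
			defer wg.Done()
			if node == failing {
				mu.Lock()
				if firstErr == nil {
					firstErr = fmt.Errorf("%s: xaction aborted", node)
				}
				mu.Unlock()
				cancel() // broadcast the abort to all participants
				return
			}
			select {
			case <-ctx.Done(): // notified that another node aborted
				mu.Lock()
				aborted = append(aborted, node)
				mu.Unlock()
			case <-time.After(simulatedWork): // finished normally
			}
		}(n)
	}
	wg.Wait()
	return aborted, firstErr
}

func main() {
	aborted, err := runXaction([]string{"t1", "t2", "t3"}, "t2")
	fmt.Println(len(aborted), "nodes notified;", err)
}
```

Notification (cancel on abort) avoids the latency and overhead of having a coordinator poll each participant, which mirrors the motivation for replacing target polling with notifications.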
Downloader
Integration with `IC`, more robust downloader job handling.
- Downloader naming; fix `mountpath` register/unregister - !2842
- Better job aborting; improved completion mechanisms - #902, !2960
- Progress Bar: report periodic status and stats to `IC` (see above) - !2911
Distributed Shuffle (dSort)
Performance improvements, resource usage optimizations.
- Performance: decrease resource usage - #938
- Better data transport streams handling - #936, !3307
Erasure Coding (EC)
Resource usage optimizations, better slice checksum handling.
- Fix checksum when sending constructed slices to other targets - !3073, !3132
- Improve operation over data transport streams - #916, !3311
- Fix receiving object slices when the bucket is being destroyed - #887
- Add support for nodes in maintenance mode - !3404
Intra-cluster communications
The transport that we use to rebalance user data (e.g., when adding/removing nodes), transfer erasure-coded slices, and copy and transform datasets has undergone a major upgrade:
- Add data mover layer - !2860, !2895, !2899
- Support for short messages and message streams - !2984, !3055, !3066, !3084, !3085, !3097, !3112, !3181, !3183, !3184, !3187, !3189, !3201, !3265, !3268, !3274, !3303
- Revise and optimize transport stream multiplexing - !3141
- When done transmitting, wait for data mover quiescence - !2903
- Support streaming unsized objects (objects of unknown size), which is particularly useful when ETL-transforming objects on the fly (that is, inline) - !3356, !3357, !3396, !3403, !3409, !3415, !3417
- Optimize memory management and debug unlikely races: !3053, !3189, !3286, !3298, !3309, !3314, !3319
- Data mover: is-open vs quiescent - !2941
CLI (tool)
New command `ais show mountpath`, new option `--keep` for the `PROMOTE` operation, the ability to run certain commands without accessing a cluster, a redesigned `ais rm node` command, an automatic progress indicator for long `ais ls <bucket>` operations, and many fixes for various `show` commands.
- Display EC `xaction` extra information for the `ais show xaction` command - #823
- Improve user experience: commands that do not need a cluster no longer require the cluster to be running - #878, !2914
- Listing bucket objects with the `--all` flag displays all objects (including temporarily misplaced ones) - #964
- Command `ais cat` now prints only the object content; the trailing object-size line is removed - !2729
- A Cloud bucket can be downloaded without setting a backend bucket - !2803
- Added progress indicator when listing a huge bucket - #884, !2786
- Unify `--all` sub-option for all commands - !2843, !3264
- New option for the `PROMOTE` command: `--keep` original files after promoting them to objects - !2880
- New command `ais show mountpath` to display target `mountpath` info - !2900, !3387
- Fix displaying rebalance statistics - !3264
- Fix `ais show xaction rebalance` to show the last `xaction` - !3250
- Fix `ais show cluster smap` - !3243
- Revise `ais rm node` command: add mandatory option `--mode` (to choose between node decommission and putting the node in maintenance), and optional `--no-rebalance` (to skip rebalance and execute removal immediately) - !2965
- An option to remove all finished download jobs - !2849
- Wait option (flag) - !2876
- New command `ais show mountpath` - !2900
- Fix 'show rebalance' showing rebalance stats - !2954
- Refactor CLI `cat`/`get` top-level commands - !2972
Other
- `aisloader` (benchmark): add progress indicator when listing very large buckets - !2821
- `aisfs`: `APPEND` operation is now checksum-protected - #780
- build: use custom image for faster CI, enable more linters, switch to Go 1.15, add memory and CPU profiling options via `make`, upgrade third-party packages - !3235, !3121, !2949, !2916, !2993, !3050
- CI/CD: fix k8s development scripts, run many more tests in `minikube` CI, add terraform GCP playground - !2851, !2858, !2980
- S3 compatibility: support AIS buckets with Cloud backend - !3532, #67, #68