3.3

alex-aizman released this 10 Dec 04:03

Highlights

  • ETL - inline and offline dataset transformations, custom user-defined transformations via both user-provided containers and Python scripts, simplified ETL initialization, ETL directly to and from Cloud buckets;

  • Multi-Cloud capability supporting co-existence and management of datasets originating from (or hosted by) different Cloud storages - !2736, !2737, !2748, !2792, !2793;
  • Maintenance and decommission - the capability to put a clustered node in maintenance mode and/or safely and permanently remove it from the cluster - #947, !2935, !2957, !2983, !2990, !3094;
  • Volume metadata (VMD) - persistent information that describes each clustered node's storage configuration (including data drives, local filesystems, mountpaths) further used to reinforce data integrity and protection - #939, #941, !3118, !3198;
  • New protocol prefix ht:// - uniform access to "vanilla" HTTP(S) based datasets - #882, #889;
  • Terraform integration - easy and automated deployment via Terraform - there's a separate repository (of scripts, charts, and documentation) that we use for production deployments;
  • Intra-cluster communications - the transport we use to rebalance user data, transfer erasure-coded slices, copy and transform datasets - a major upgrade - !2860, !2895, !2984, !3053, !3055, !3066, !3084, !3085, !3097, !3112, !3181, !3183, !3184, !3187, !3189, !3201, !3265, !3268, !3274, !3286, !3303, !3356, !3357, !3396, !3403, !3409, !3415, !3417.

And also:

  • performance optimizations, CLI usability improvements, refactoring, cleanup, and stability fixes across the board.

Multi-Cloud

A new protocol prefix ht:// (in addition to s3://, gs://, and azure://) for seamless integration and uniform access to "vanilla" HTTP(S) based datasets.

Multi-Cloud via a single deployed runtime. Improved access to public Cloud buckets (from different Cloud providers). Bucket copying and transformations (see ETL below) extended to support Cloud buckets.

  • New HTTP provider (ht://) - #882, #889
  • Multi-Cloud - added runtime support for bucket management of multiple Cloud providers - !2736, !2737
  • Support multiple regions for AWS buckets - #778, !2804
  • Improve Google provider error handling - !2792, !2793
  • Public GCP buckets can be used without setting PROJECT_ID - !2723
  • Remove default Cloud provider option (provider now must be set explicitly) - !2748
  • Support Cloud-based source/destination in a bucket copy operation - !2975
  • Prefetch performance improvement: keep cached object properties longer - #969
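To illustrate how the provider prefixes listed above (s3://, gs://, azure://, and the new ht://) identify where a dataset lives, here is a minimal sketch in Python. It is not AIStore code: `parse_bucket_uri`, `BucketURI`, and `PROVIDERS` are hypothetical names introduced for illustration, and treating the first path segment as the bucket name is a simplifying assumption (the actual handling of ht:// URLs in AIStore may differ).

```python
from typing import NamedTuple

# Provider prefixes named in these release notes; everything else in this
# sketch is hypothetical and not part of the AIStore API.
PROVIDERS = ("s3", "gs", "azure", "ht")


class BucketURI(NamedTuple):
    provider: str
    bucket: str
    object_name: str


def parse_bucket_uri(uri: str) -> BucketURI:
    """Split '<provider>://<bucket>[/<object>]' into its three parts."""
    provider, sep, rest = uri.partition("://")
    if not sep:
        # Mirrors the note above: there is no default provider to fall back on.
        raise ValueError(f"provider must be set explicitly: {uri!r}")
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider prefix: {provider!r}")
    bucket, _, object_name = rest.partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {uri!r}")
    return BucketURI(provider, bucket, object_name)


if __name__ == "__main__":
    print(parse_bucket_uri("s3://imagenet/train/0001.tar"))
    print(parse_bucket_uri("gs://public-data"))
```

A bare name without a prefix is rejected here on purpose, which reflects the removal of the default Cloud provider option in this release.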

Core

Improve cluster stability in the presence of exceptional events, optimize cluster operation under heavy workloads, introduce maintenance mode, support permanent decommissioning of nodes from the cluster, improve the reliability of bucket destroy operation, optimize and further stabilize cluster rebalancing logic.

  • Node maintenance feature - #947, !2935, !2990, !3094
  • Improved out-of-space (out of capacity) handling - #822
  • Backend buckets vs bucket initialization - !2841
  • Improve cluster stability while it is in transition (when the primary changes) - #945, #968, #960
  • If the cluster restarts during rebalancing, the rebalance is now resumed - #913
  • Optimize copy-bucket and other bucket-traversing workloads - #917
  • Make promote consistent with other object operations - !2763, !2765
  • Add transfer statistics for resilvering - !2926
  • Configuration option Rebalance.Enabled now affects only automatic rebalance (a manual one can always be started) - !2915
  • Reduce resource usage by StatsD (Grafana, Graphite) client - !3240
  • New CLI option --daemon-id to join a node with user-predefined ID - !3255
  • Fix object rename operation to work across different mountpaths - !3329
  • Make destroy bucket operation transactional - !3315
  • Volume metadata (VMD) - persistent information about a node and its storage configuration, used on startup when running node integrity checks - #939, #941, !3118, !3198
  • No metasync when shutting down - !2844
  • Not ignoring errors when listing multiple Cloud providers - !2845
  • Refactor reb (rebalance) package - !2857
  • Refactor target handlers and fix transactions' housekeeping logic - !2869
  • Refactor copy-object interface - !2879
  • Revise and refactor PROMOTE (command and API) - !2880
  • Refactor target copy-object and put-remote interfaces - !2881
  • Use data mover to copy buckets - !2893
  • LOM: fix CopyObject - !2908
  • cmn.JoinWords and friends - !2913
  • Always allow manual rebalance (even if automatic one is disabled) - !2915
  • Mountpath resilvering now counts moved objects and their total size - !2926
  • Copy buckets to return correct total size of copied content - !2919
  • Revise and optimize intra-cluster broadcasting - !2943
  • Improve HrwTargetList performance - !2945
  • Fix zero-size objects scenario - !3531

ETL

Multiple improvements and enhancements to the capability (first introduced in v3.2) to easily run user-defined custom dataset transformations and scale their performance linearly with each added storage server. This release adds offline (dataset-to-dataset) transformation.

For ETL documentation (that now also includes animated presentations), please refer to docs/etl.md and etl/README.md

  • Add offline, local and cloud, bucket transformation - !2827, !2854, !2898, !3445
  • ETL for objects in the Cloud - !3399
  • ETL build operation - easy initialization based on the function definition - !2873, !2884, !2918, !3369
  • Remove kubectl (shell) calls, use K8s client-go instead - !2896, !2907
  • Support retrieving ETL logs - !2947
  • Stability and performance improvements, bug fixes - !2955, !2977, !3330, !3369, !3374, !3411
  • Add and improve labels in Pods and Services - !3445
  • Improve waiting for the Pod/Service to be ready - !3332, !3397
  • Add extension, prefix, and suffix flags for offline ETL - !2846
  • Support aborting offline ETL - !2850
  • Add dry run option for offline ETL - !2854
  • Simplify flow to initialize ETL - !2853
  • Consistent naming of API constants - !2861
  • ETL build: remove unnecessary annotations - !2871
  • Update skeleton docker images used to run custom Python-based transforms - !2870
  • Install dependencies in initContainer - !2873
  • POD spec: add volume mount - !2883
  • Unify offline ETL with copy-bucket - !2898, !2933
  • Improve waiting for POD-ready - !2912
  • Add dry-run capability - !2939
  • K8s client: pod namespace & refactoring - !2948
  • The capability to throttle ETL (transforms) depending on disk utilization - !2998
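As a rough illustration of what a Python-based user-defined transform and an offline (dataset-to-dataset) run with a dry-run option might look like, consider the sketch below. It is plain Python over in-memory dictionaries, not the AIStore API: the `transform(input_bytes)` signature, the `offline_etl` helper, and the exact meaning of "dry run" (here: report what would be processed without writing anything) are assumptions made for illustration. Please refer to docs/etl.md for the actual interface.

```python
import hashlib
from typing import Callable, Dict, Tuple


def transform(input_bytes: bytes) -> bytes:
    """Example user-defined transform: replace an object with its MD5 hex digest."""
    return hashlib.md5(input_bytes).hexdigest().encode()


def offline_etl(
    src: Dict[str, bytes],
    fn: Callable[[bytes], bytes],
    dry_run: bool = False,
) -> Tuple[Dict[str, bytes], int, int]:
    """Hypothetical offline ETL over an in-memory 'bucket' (name -> content).

    Returns (destination bucket, object count, total source bytes).
    With dry_run=True nothing is transformed or written; only the
    counters are computed.
    """
    dst: Dict[str, bytes] = {}
    num_objects, num_bytes = 0, 0
    for name, data in src.items():
        num_objects += 1
        num_bytes += len(data)
        if not dry_run:
            dst[name] = fn(data)
    return dst, num_objects, num_bytes


if __name__ == "__main__":
    bucket = {"a.txt": b"hello", "b.txt": b""}
    print(offline_etl(bucket, transform, dry_run=True))
    print(offline_etl(bucket, transform))
```

In the real system each storage target would apply the transform to its own local objects in parallel, which is where the linear scaling mentioned above comes from; the single loop here only shows the per-object logic.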

Terraform integration

Dramatically simplified deployment of an AIStore cluster in the Cloud via Terraform. This release delivers GKE support, which can easily be extended to any Cloud that provides a Kubernetes service. It is now possible to start a fully functional AIStore cluster with a single command - for details, please refer to the AIStore Kubernetes repository.

  • Add scripts for easy deployment and shutdown of the AIStore cluster on the cloud - !16, !56-!68, #14, #17
  • Add admin container image - !3079, !3195, !3359
  • Remove requirement for K8S_HOST_NAME environment variable - !3451

Information Center (IC)

More reliable extended action (xaction) status management and reporting, automatic cluster-wide xaction abort, xaction progress notifications (new). In AIS, xaction is a long-lived asynchronous operation, a job.

  • Notify all participating nodes when any one of them aborts xaction - !2928
  • Improve IC status reporting by polling xaction status from targets that have not reported xaction status yet - !2953
  • Fix xaction registration for newly added targets - !2924
  • Support both transactional and non-transactional xactions - !2734
  • Replace target polling with notifications when waiting for xaction to complete - !2868
  • xactions to return user-friendly status - !2865
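The combination of notifications, fallback polling, and cluster-wide abort described above can be sketched as follows. This is a simplified, single-process model in Python, not the IC implementation: the `XactionListener` class, its method names, and the status strings are all hypothetical.

```python
from typing import Dict, Iterable, List


class XactionListener:
    """Hypothetical tracker for one xaction across its participating targets."""

    def __init__(self, xaction_id: str, targets: Iterable[str]) -> None:
        self.xaction_id = xaction_id
        # Every participating target starts out as "running".
        self._status: Dict[str, str] = {t: "running" for t in targets}

    def notify(self, target_id: str, status: str) -> None:
        """Record a notification ('finished' or 'aborted') sent by a target."""
        if target_id not in self._status:
            raise KeyError(f"unknown target: {target_id}")
        self._status[target_id] = status

    def pending(self) -> List[str]:
        """Targets that have not reported yet; only these would need polling."""
        return sorted(t for t, s in self._status.items() if s == "running")

    def aborted(self) -> bool:
        return any(s == "aborted" for s in self._status.values())

    def finished(self) -> bool:
        # One abort ends the xaction cluster-wide; otherwise all must report.
        return self.aborted() or not self.pending()


if __name__ == "__main__":
    listener = XactionListener("rebalance-1", ["t1", "t2", "t3"])
    listener.notify("t1", "finished")
    print(listener.pending(), listener.finished())
```

The point of the design is that waiting for completion becomes event-driven, and polling is limited to the targets returned by `pending()`.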

Downloader

Integration with IC, more robust downloader job handling.

  • Downloader naming; fix mountpath register/unregister - !2842
  • Better job aborting; improved completion mechanisms - #902, !2960
  • Progress Bar: report periodic status and stats to IC (see above) - !2911

Distributed Shuffle (dSort)

Performance improvements, resource usage optimizations.

  • Performance: decrease resource usage - #938
  • Better data transport streams handling - #936, !3307

Erasure Coding (EC)

Resource usage optimizations, better slice checksum handling.

  • Fix checksum when sending constructed slices to other targets - !3073, !3132
  • Improve operation over data transport streams - #916, !3311
  • Fix receiving object slices when the bucket is being destroyed - #887
  • Add support for nodes in maintenance mode - !3404

Intra-cluster communications

The transport that we use to rebalance user data (e.g., when adding/removing nodes), transfer erasure-coded slices, copy and transform datasets has undergone a major upgrade:

  • Add data mover layer - !2860, !2895, !2899
  • Support for short messages and message streams - !2984, !3055, !3066, !3084, !3085, !3097, !3112, !3181, !3183, !3184, !3187, !3189, !3201, !3265, !3268, !3274, !3303
  • Revise and optimize transport stream multiplexing - !3141
  • When done transmitting, wait for data mover quiescence - !2903
  • Support streaming unsized objects (objects of unknown size) - functionality that is particularly useful when ETL-transforming objects on the fly (that is, inline) - !3356, !3357, !3396, !3403, !3409, !3415, !3417
  • Optimize memory management and debug unlikely races - !3053, !3189, !3286, !3298, !3309, !3314, !3319
  • Data mover: is-open vs quiescent - !2941

CLI (tool)

New command ais show mountpath, new option --keep for PROMOTE operation, allow running certain commands without accessing a cluster, redesigned ais rm node command, automatic progress indicator for long ais ls <bucket> operations, many fixes for various show commands.

  • Display EC xaction extra information for ais show xaction command - #823
  • Improve user experience: commands that do not need a cluster no longer require a running cluster - #878, !2914
  • Listing bucket objects with the flag --all displays all objects (including temporarily misplaced ones) - #964
  • Command ais cat now prints only object content; the trailing object size information line is removed - !2729
  • Cloud bucket can be downloaded without setting backend bucket - !2803
  • Added progress indicator when listing a huge bucket - #884, !2786
  • Unify --all sub-option for all commands - !2843, !3264
  • New option for PROMOTE command: --keep original files after promoting them to objects - !2880
  • New command ais show mountpath to display target mountpath info - !2900, !3387
  • Fix displaying rebalance statistics - !3264
  • Fix ais show xaction rebalance to show the last xaction - !3250
  • Fix ais show cluster smap - !3243
  • Revise ais rm node command: add mandatory option --mode (to choose between node decommission and putting node in maintenance), and optional --no-rebalance (to skip rebalance and execute removal immediately) - !2965
  • An option to remove all finished download jobs - !2849
  • Wait option (flag) - !2876
  • New command ais show mountpath - !2900
  • Fix 'show rebalance' showing rebalance stats - !2954
  • Refactor CLI cat/get top-level commands - !2972

Other

  • aisloader (benchmark): add progress indicator when listing very large buckets - !2821
  • aisfs: APPEND operation is now checksum-protected - #780
  • build: use custom image for faster CI, enable more linters, switch to Go 1.15, add memory and CPU profiling options via make, upgrade third-party packages - !3235, !3121, !2949, !2916, !2993, !3050
  • CI/CD: fix k8s development scripts, run many more tests in minikube CI, add terraform GCP playground - !2851, !2858, !2980.
  • S3 compatibility: support AIS buckets with Cloud backend - !3532, #67, #68