docs: update bucket.md, out_of_band.md, and CLI docs
* multi-object operations
* out-of-band updates and versioning
* inline help and usage examples

Signed-off-by: Alex Aizman <[email protected]>
alex-aizman committed Jan 4, 2024
1 parent 8cf4c57 commit 857604c
Showing 8 changed files with 383 additions and 72 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -184,7 +184,7 @@ With a little effort, they all could be extracted and used outside.
- [Downloader](/docs/downloader.md)
- [On-disk layout](/docs/on_disk_layout.md)
- [Buckets: definition, operations, properties](https://github.com/NVIDIA/aistore/blob/main/docs/bucket.md#bucket)
- [Out of band updates](/docs/validate_warm_get.md)
- [Out of band updates](/docs/out_of_band.md)

## License

59 changes: 58 additions & 1 deletion docs/bucket.md
@@ -24,6 +24,7 @@ redirect_from:
- [Public HTTP(S) Datasets](#public-https-dataset)
- [Prefetch/Evict Objects](#prefetchevict-objects)
- [Evict Remote Bucket](#evict-remote-bucket)
- [Out of band updates](/docs/out_of_band.md)
- [Backend Bucket](#backend-bucket)
- [AIS bucket as a reference](#ais-bucket-as-a-reference)
- [Bucket Properties](#bucket-properties)
@@ -468,9 +469,61 @@ To use a [range operation](batch.md#range) to evict the 1000th to 2000th objects
$ ais bucket evict aws://abc --template "__tst/test-{1000..2000}"
```

### See also

* [Operations on Lists and Ranges](/docs/cli/object.md#operations-on-lists-and-ranges)

## Evict Remote Bucket

Before a remote bucket is accessed through AIS, the cluster has no awareness of the bucket.
The command is `ais bucket evict`, but most of the time we'll use its alias, `ais evict`:

```console
$ ais evict --help
NAME:
ais evict - (alias for "bucket evict") evict one remote bucket, multiple remote buckets, or
selected objects in a given remote bucket or buckets, e.g.:
- 'evict gs://abc' - evict entire bucket (all gs://abc objects in aistore);
- 'evict gs:' - evict all GCP buckets from the cluster;
- 'evict gs://abc --template images/' - evict all objects from the virtual subdirectory "images";
- 'evict gs://abc/images/' - same as above;
- 'evict gs://abc --template "shard-{0000..9999}.tar.lz4"' - evict the matching range (prefix + brace expansion);
- 'evict "gs://abc/shard-{0000..9999}.tar.lz4"' - same as above (notice double quotes)

USAGE:
ais evict [command options] BUCKET[/OBJECT_NAME_or_TEMPLATE] [BUCKET[/OBJECT_NAME_or_TEMPLATE] ...]

OPTIONS:
--list value comma-separated list of object or file names, e.g.:
--list 'o1,o2,o3'
--list "abc/1.tar, abc/1.cls, abc/1.jpeg"
or, when listing files and/or directories:
--list "/home/docs, /home/abc/1.tar, /home/abc/1.jpeg"
--template value template to match object or file names; may contain prefix (that could be empty) with zero or more ranges
(with optional steps and gaps), e.g.:
--template "" # (an empty or '*' template matches everything)
--template 'dir/subdir/'
--template 'shard-{1000..9999}.tar'
--template "prefix-{0010..0013..2}-gap-{1..2}-suffix"
and similarly, when specifying files and directories:
--template '/home/dir/subdir/'
--template "/abc/prefix-{0010..9999..2}-suffix"
--wait wait for an asynchronous operation to finish (optionally, use '--timeout' to limit the waiting time)
--timeout value maximum time to wait for a job to finish; if omitted: wait forever or until Ctrl-C;
valid time units: ns, us (or µs), ms, s (default), m, h
--progress show progress bar(s) and progress of execution in real time
--refresh value interval for continuous monitoring;
valid time units: ns, us (or µs), ms, s (default), m, h
--keep-md keep bucket metadata
--prefix value select objects that have names starting with the specified prefix, e.g.:
'--prefix a/b/c' - matches names 'a/b/c/d', 'a/b/cdef', and similar;
'--prefix a/b/c/' - only matches objects from the virtual directory a/b/c/
--dry-run preview the results without really running the action
--verbose, -v verbose output
--non-verbose, --nv non-verbose (quiet) output, minimized reporting
--help, -h show help
```

Note the usage examples above. You can always run a command with the `--help` option to see the most up-to-date inline help.
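
To make the `--template` notation concrete, here is a minimal Python sketch of how a template with brace ranges could be expanded into object names. It is an illustration only, not aistore's implementation: the helper name `expand_template` is hypothetical, and the step handling assumes the same semantics as Bash brace expansion (inclusive bounds, zero-padding taken from the start value).

```python
import itertools
import re

# Matches one brace range such as {0010..0013} or {0010..0013..2}.
RANGE_RE = re.compile(r"\{(\d+)\.\.(\d+)(?:\.\.(\d+))?\}")


def expand_template(template: str) -> list[str]:
    """Expand every {start..end[..step]} range in a template (illustrative only)."""
    literals = []  # literal text preceding each range
    choices = []   # one list of formatted numbers per range
    pos = 0
    for m in RANGE_RE.finditer(template):
        literals.append(template[pos:m.start()])
        start, end = int(m.group(1)), int(m.group(2))
        step = int(m.group(3)) if m.group(3) else 1
        width = len(m.group(1))  # assumption: padding follows the start value
        choices.append([f"{n:0{width}d}" for n in range(start, end + 1, step)])
        pos = m.end()
    tail = template[pos:]
    names = []
    # The Cartesian product covers templates with more than one range ("gaps").
    for combo in itertools.product(*choices):
        head = "".join(lit + num for lit, num in zip(literals, combo))
        names.append(head + tail)
    return names


for name in expand_template("prefix-{0010..0013..2}-gap-{1..2}-suffix"):
    print(name)
```

A template without any range, such as `dir/subdir/`, comes back unchanged, which matches its use as a plain prefix.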

Once there is a request to access the bucket, or a request to change the bucket's properties (see `set bucket props` in [REST API](http_api.md)), then the AIS cluster starts keeping track of the bucket.

@@ -485,6 +538,10 @@ $ ais bucket evict aws://abc
Note: When an HDFS bucket is evicted, AIS will only delete objects stored in the cluster. AIS will retain the bucket's metadata to allow the bucket to re-register later.
This behavior can be applied to other remote buckets by using the `--keep-md` flag with `ais bucket evict`.

### See also

* [Operations on Lists and Ranges](/docs/cli/object.md#operations-on-lists-and-ranges)

# Backend Bucket

So far, we have covered AIS and remote buckets. These abstractions are sufficient for almost all use cases. But there are times when we would like to download objects from an existing remote bucket and then make use of the features available only for AIS buckets.
107 changes: 84 additions & 23 deletions docs/cli/bucket.md
@@ -36,6 +36,7 @@ rmb bucket rm
- [Evict remote bucket](#evict-remote-bucket)
- [Move or Rename a bucket](#move-or-rename-a-bucket)
- [Copy bucket](#copy-bucket)
- [Copy multiple objects](#copy-multiple-objects)
- [Show bucket summary](#show-bucket-summary)
- [Start N-way Mirroring](#start-n-way-mirroring)
- [Start Erasure Coding](#start-erasure-coding)
@@ -598,15 +599,13 @@ To check the status, run: ais show job xaction mvlb ais://new_bucket_name

## Copy bucket

`ais cp SRC_BUCKET DST_BUCKET`
`ais cp [command options] SRC_BUCKET[/OBJECT_NAME_or_TEMPLATE] DST_BUCKET`

Copy source bucket (`SRC_BUCKET`) to destination bucket (`DST_BUCKET`).
Source bucket must exist. When the destination bucket is remote (e.g. in the Cloud) it must also exist and be writeable.

Source bucket must exist. When the destination bucket is in the Cloud it must also exist (and be writeable).
> **NOTE:** there's _no_ requirement that either of the buckets is _present_ in aistore.
There's _no_ requirement that either of the buckets is _present_ in aistore.

> Note: do not confuse in-cluster presence with existence.
> **NOTE:** do not confuse in-cluster _presence_ with existence. A remote object may exist (remotely) without being present in the cluster.
Moreover, when the destination is AIS (`ais://`) or remote AIS (`ais://@remote-alias`) bucket, the existence is optional: the destination will be created on the fly, with bucket properties copied from the source (`SRC_BUCKET`).
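
The existence rules above can be summarized in a few lines of Python. This is a sketch under stated assumptions: `plan_copy` is a hypothetical helper, not part of aistore or its CLI, and it only restates the rules in this section (the source must exist; a Cloud destination must exist; an `ais://` destination, including remote AIS, is created on the fly).

```python
def plan_copy(src_exists: bool, dst: str, dst_exists: bool) -> str:
    """Return what a copy would do under the rules described above (hypothetical helper)."""
    if not src_exists:
        return "error: source bucket must exist"
    if dst_exists:
        return "copy"
    # The provider is the part before "://", e.g. "ais", "s3", "gs".
    provider = dst.split("://", 1)[0]
    if provider == "ais":
        # Covers both ais://bucket and remote AIS (ais://@remote-alias).
        return "create destination, then copy"
    return "error: remote destination must exist"


print(plan_copy(True, "ais://nnn", dst_exists=False))
print(plan_copy(True, "s3://abc", dst_exists=False))
```

Note that the check is about existence only; in-cluster presence of either bucket plays no role.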

@@ -647,22 +646,12 @@ Listed: 2,290 names
```console
$ ais cp --help
NAME:
ais cp - (alias for "bucket cp") copy entire bucket or selected objects (to select, use '--list' or '--template')
ais cp - (alias for "bucket cp") copy entire bucket or selected objects (to select multiple, use '--list' or '--template')

USAGE:
ais cp [command options] SRC_BUCKET DST_BUCKET
ais cp [command options] SRC_BUCKET[/OBJECT_NAME_or_TEMPLATE] DST_BUCKET

OPTIONS:
--all copy all objects from a remote bucket including those that are not present (not "cached") in the cluster
--cont-on-err keep running archiving xaction (job) in presence of errors in any given multi-object transaction
--force, -f force an action
--dry-run show total size of new objects without really creating them
--prepend value prefix to prepend to every copied object name, e.g.:
--prepend=abc - prefix all copied object names with "abc"
--prepend=abc/ - copy objects into a virtual directory "abc" (note trailing filepath separator)
--prefix value copy objects that start with the specified prefix, e.g.:
'--prefix a/b/c' - copy virtual directory a/b/c and/or objects from the virtual directory
a/b that have their names (relative to this directory) starting with the letter c
--list value comma-separated list of object or file names, e.g.:
--list 'o1,o2,o3'
--list "abc/1.tar, abc/1.cls, abc/1.jpeg"
@@ -677,17 +666,40 @@ OPTIONS:
and similarly, when specifying files and directories:
--template '/home/dir/subdir/'
--template "/abc/prefix-{0010..9999..2}-suffix"
--prefix value select objects that have names starting with the specified prefix, e.g.:
'--prefix a/b/c' - matches names 'a/b/c/d', 'a/b/cdef', and similar;
'--prefix a/b/c/' - only matches objects from the virtual directory a/b/c/
--all copy all objects from a remote bucket including those that are not present (not "cached") in the cluster
--cont-on-err keep running archiving xaction (job) in presence of errors in any given multi-object transaction
--force, -f force an action
--dry-run show total size of new objects without really creating them
--prepend value prefix to prepend to every copied object name, e.g.:
--prepend=abc - prefix all copied object names with "abc"
--prepend=abc/ - copy objects into a virtual directory "abc" (note trailing filepath separator)
--progress show progress bar(s) and progress of execution in real time
--refresh value interval for continuous monitoring;
valid time units: ns, us (or µs), ms, s (default), m, h
--wait wait for an asynchronous operation to finish (optionally, use '--timeout' to limit the waiting time)
--timeout value maximum time to wait for a job to finish; if omitted: wait forever or until Ctrl-C;
valid time units: ns, us (or µs), ms, s (default), m, h
--sync synchronize destination bucket with its remote (e.g., Cloud) source;
in particular, the option may entail removing of the objects that no longer exist remotely
--help, -h show help
```
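
The `--prefix` and `--prepend` descriptions in the help text amount to plain string operations on object names. The following Python sketch illustrates that reading; the function names are hypothetical and the object names are made up for the example.

```python
def select_by_prefix(names: list[str], prefix: str) -> list[str]:
    """Keep only the names that start with the given prefix."""
    return [n for n in names if n.startswith(prefix)]


def prepend_to_names(names: list[str], prepend: str) -> list[str]:
    """Prefix every destination name, as '--prepend' is described to do."""
    return [prepend + n for n in names]


names = ["a/b/c/d", "a/b/cdef", "a/b/x"]

# Without a trailing slash, the prefix also matches 'a/b/cdef'.
print(select_by_prefix(names, "a/b/c"))
# With a trailing slash, only the virtual directory a/b/c/ matches.
print(select_by_prefix(names, "a/b/c/"))
# A trailing slash in '--prepend' places the copies in a virtual directory.
print(prepend_to_names(select_by_prefix(names, "a/b/c/"), "abc/"))
```

The trailing separator is what distinguishes "names starting with these characters" from "objects inside this virtual directory", for both options.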

### Examples

#### Copy _non-existing_ remote bucket to a non-existing in-cluster destination

```console
$ ais ls s3
No "s3://" buckets in the cluster. Use '--all' option to list matching remote buckets, if any.

$ ais cp s3://abc ais://nnn --all
Warning: destination ais://nnn doesn't exist and will be created with configuration copied from the source (s3://abc))
Copying s3://abc => ais://nnn. To monitor the progress, run 'ais show job tco-JcTKbhvFy'
```

#### Copy AIS bucket

Copy AIS bucket `src_bucket` to AIS bucket `dst_bucket`.
@@ -698,7 +710,7 @@ Copying bucket "ais://bucket_name" to "ais://dst_bucket" in progress.
To check the status, run: ais show job xaction copy-bck ais://dst_bucket
```

#### Copy AIS bucket and wait until finish
#### Copy AIS bucket and wait until the job finishes

The same as above, but wait until copying is finished.

@@ -721,25 +733,74 @@ Copying bucket "aws://src_bucket" to "aws://dst_bucket" in progress.
To check the status, run: ais show job xaction copy-bck aws://dst_bucket
```

#### Copy only selected objects
## Copy multiple objects

Copy objects `obj1.tar` and `obj1.info` from bucket `ais://bck1` to `ais://bck2`, and wait until the operation finishes.
The same `ais cp` command can also copy multiple selected objects. Here's the corresponding excerpt from the inline help:

```console
$ ais cp --help
NAME:
ais cp - (alias for "bucket cp") copy entire bucket or selected objects (to select multiple, use '--list' or '--template')

USAGE:
ais cp [command options] SRC_BUCKET[/OBJECT_NAME_or_TEMPLATE] DST_BUCKET

OPTIONS:
--list value comma-separated list of object or file names, e.g.:
--list 'o1,o2,o3'
--list "abc/1.tar, abc/1.cls, abc/1.jpeg"
or, when listing files and/or directories:
--list "/home/docs, /home/abc/1.tar, /home/abc/1.jpeg"
--template value template to match object or file names; may contain prefix (that could be empty) with zero or more ranges
(with optional steps and gaps), e.g.:
--template "" # (an empty or '*' template matches everything)
--template 'dir/subdir/'
--template 'shard-{1000..9999}.tar'
--template "prefix-{0010..0013..2}-gap-{1..2}-suffix"
and similarly, when specifying files and directories:
--template '/home/dir/subdir/'
--template "/abc/prefix-{0010..9999..2}-suffix"
--prefix value select objects that have names starting with the specified prefix, e.g.:
'--prefix a/b/c' - matches names 'a/b/c/d', 'a/b/cdef', and similar;
'--prefix a/b/c/' - only matches objects from the virtual directory a/b/c/
--all copy all objects from a remote bucket including those that are not present (not "cached") in the cluster
...
...
```

### Examples

**1.** Copy objects `obj1.tar` and `obj1.info` from bucket `ais://bck1` to `ais://bck2`, and wait until the operation finishes

```console
$ ais cp ais://bck1 ais://bck2 --list obj1.tar,obj1.info --wait
copying objects operation ("ais://bck1" => "ais://bck2") is in progress...
copying objects operation succeeded.
```

Copy object with pattern matching: copy `obj2`, `obj3`, and `obj4` from `ais://bck1` to `ais://bck2`.
Do not wait for the operation to finish.
**2.** Copy objects matching the Bash brace expansion `obj{2..4}`; do not wait for the operation to finish.

```console
$ ais cp ais://bck1 ais://bck2 --template "obj{2..4}"
copying objects operation ("ais://bck1" => "ais://bck2") is in progress...
To check the status, run: ais show job xaction copy-bck ais://bck2
```

**3.** Use `--sync` option to copy remote virtual subdirectory

```console
$ ais cp gs://coco-dataset --sync --prefix d-tokens
Copying objects gs://coco-dataset. To monitor the progress, run 'ais show job tco-kJPUtYJld'
```

In the example, `--sync` synchronizes the destination bucket with its remote (e.g., Cloud) source.

In particular, the option makes sure that aistore has the **latest** versions of remote objects _and_ may also entail **removing** objects that no longer exist remotely.
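
Conceptually, the synchronization described above boils down to comparing two listings. The Python sketch below is an assumption-laden illustration, not aistore's algorithm: `sync_plan` is a hypothetical helper, and versions are modeled as opaque strings compared for equality.

```python
def sync_plan(remote: dict[str, str], in_cluster: dict[str, str]) -> tuple[list[str], list[str]]:
    """Given name -> version maps, return (objects to refresh, objects to remove)."""
    # Refresh anything missing in the cluster or present with a different version.
    to_refresh = sorted(n for n, v in remote.items() if in_cluster.get(n) != v)
    # Remove in-cluster objects that no longer exist remotely.
    to_remove = sorted(n for n in in_cluster if n not in remote)
    return to_refresh, to_remove


remote = {"d-tokens/a": "v2", "d-tokens/b": "v1"}
in_cluster = {"d-tokens/a": "v1", "d-tokens/b": "v1", "d-tokens/c": "v1"}
print(sync_plan(remote, in_cluster))
```

Here `d-tokens/a` would be refreshed because its remote version changed, and `d-tokens/c` would be removed because it no longer exists remotely.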

### See also

* [Out of band updates](/docs/out_of_band.md)

## Show bucket summary

`ais storage summary [command options] PROVIDER:[//BUCKET_NAME]` - show bucket sizes and the respective percentages of used capacity on a per-bucket basis