With the AWS-interoperable API, we can create a multi-shard upload from a source [bucket, object] or from a source memory buffer. In the GCS (google cloud cpp) API, the only function for multi-shard uploads is ParallelUploadFile, and it requires the input source to be a local file name. That rules out uploading data from a memory buffer or from a source object. Can you provide some hints on how this can be achieved with the latest release of Google Cloud CPP? Thanks.
Replies: 1 comment
You would need to do something like this:

    #include "google/cloud/storage/client.h"
    #include <cstddef>
    #include <future>
    #include <string>
    #include <vector>

    namespace gcs = ::google::cloud::storage;

    google::cloud::StatusOr<gcs::ObjectMetadata> MultiShardUpload(
        gcs::Client client, std::string const& bucket_name,
        std::string const& object_name, std::string const& scratch_area_prefix,
        int shard_count, std::string const& buffer) {
      // Assumes shard_count > 0 and buffer.size() >= shard_count; maybe adjust
      // shard_count and shard_size if the shards are too small.
      auto const count = static_cast<std::size_t>(shard_count);
      auto const shard_size = buffer.size() / count;
      // Upload each shard to a temporary object, one task per shard.
      std::vector<std::future<google::cloud::StatusOr<gcs::ObjectMetadata>>> tasks;
      for (std::size_t i = 0; i != count; ++i) {
        auto const o = i * shard_size;
        // The last shard also takes the remainder of the buffer.
        auto const len = (i + 1 == count) ? buffer.size() - o : shard_size;
        // Copy `client`, `o`, and `len` into the task: each thread needs its own
        // client copy, and the loop variables are gone by the time the task runs.
        tasks.push_back(std::async(std::launch::async, [&, client, o, len]() mutable {
          return client.InsertObject(
              bucket_name, scratch_area_prefix + "/shard@" + std::to_string(o),
              buffer.substr(o, len));
        }));
      }
      // Collect the shard names and generations, in order.
      std::vector<gcs::ComposeSourceObject> shards;
      for (auto& f : tasks) {
        auto metadata = f.get().value();  // note the lack of error handling
        shards.push_back(gcs::ComposeSourceObject{
            /*object_name=*/metadata.name(), /*generation=*/metadata.generation()});
      }
      // Compose the shards into the final object, then delete the shards.
      auto composed = gcs::ComposeMany(client, bucket_name, shards,
                                       scratch_area_prefix, object_name, true);
      for (auto const& s : shards) {
        client.DeleteObject(bucket_name, s.object_name, gcs::Generation(*s.generation));
      }
      return composed;
    }

Note that I have not compiled or tested that code, and it omits a lot of error handling. Note also that there is some unfortunate data copying in that code (each buffer.substr() call copies its shard); we could fix that with some changes to the library.
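As an illustration of the shard arithmetic and of one way to sidestep the copies, here is a minimal sketch that splits a buffer into non-owning views using only the standard library. The `Shard` struct and the `SplitIntoShards` function are hypothetical names made up for this example; they are not part of google-cloud-cpp. Whether the views can be handed to `InsertObject` without a copy depends on the overloads in your release, so treat that as an assumption to verify.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper type, not part of google-cloud-cpp.
struct Shard {
  std::size_t offset;      // where the shard starts in the buffer
  std::string_view bytes;  // non-owning view of the shard's contents
};

// Split `buffer` into `shard_count` contiguous shards without copying.
// The last shard absorbs the remainder when the size does not divide evenly.
// The views are only valid while the underlying buffer is alive and unmodified.
std::vector<Shard> SplitIntoShards(std::string_view buffer,
                                   std::size_t shard_count) {
  assert(shard_count > 0);
  auto const shard_size = buffer.size() / shard_count;
  std::vector<Shard> shards;
  shards.reserve(shard_count);
  for (std::size_t i = 0; i != shard_count; ++i) {
    auto const offset = i * shard_size;
    auto const len =
        (i + 1 == shard_count) ? buffer.size() - offset : shard_size;
    shards.push_back(Shard{offset, buffer.substr(offset, len)});
  }
  return shards;
}
```

Each upload task could then take its `Shard` by value, since copying a view does not copy the bytes it refers to. The caller has to keep the original buffer alive until every task has finished.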