research: OCI artifacts #1209
Comments
Summary
ormb is a tool used to package a model into an OCI artifact. However, it becomes very slow when the model size is up to such as In this research, we will investigate the cons of
Review of S3
graph LR
1[calculate md5 local] --> 2[upload file and md5]
2[upload file and md5] --> 3[calculate md5 remote]
3[calculate md5 remote] --> 4[validate]
graph LR
1[upload file without md5] --> 2[not validate]
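The two S3 flows above differ only in whether a checksum travels with the upload. Below is a minimal Python sketch of the validated flow, assuming the usual Content-MD5 convention (base64 of the raw MD5 digest). `content_md5` and `validate_upload` are hypothetical helper names, and the remote side is simulated locally rather than calling S3.

```python
import base64
import hashlib
import io


def content_md5(stream, chunk_size=1024 * 1024):
    """Return the base64-encoded MD5 digest of a binary stream, read in chunks."""
    md5 = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        md5.update(chunk)
    return base64.b64encode(md5.digest()).decode("ascii")


def validate_upload(received_body, claimed_md5):
    """Hypothetical remote-side check: recompute the digest and compare."""
    return content_md5(io.BytesIO(received_body)) == claimed_md5


payload = b"model weights" * 1000

# Step 1: calculate md5 locally; step 2 would send payload and digest together.
local_md5 = content_md5(io.BytesIO(payload))

# Steps 3 and 4: the receiver recomputes the digest and validates it.
ok = validate_upload(payload, local_md5)

# A body altered in transit fails the same check.
corrupted = validate_upload(payload + b"x", local_md5)
```

In the second flow (upload without md5) none of this work happens, which is why it is faster but gives no integrity check.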
Improve ormb tool
1 - no compression
2 - hash once
graph LR
1[calculate sha256 local at ormb] --> 2[calling oras with sha256]
2[calling oras with sha256] --> 3[calculate sha256 local at oras]
3[calculate sha256 local at oras] --> 4[validate]
4[validate] --> 5[Commit]
graph LR
1[calling oras without sha256] --> 2[calculate sha256 local at oras]
2[calculate sha256 local at oras] --> 3[Commit]
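The difference between the two graphs above can be sketched in Python as follows. This is an illustration only: `push_double_hash` and `push_hash_once` are hypothetical names, not ormb or oras APIs, and the "commit" is reduced to building a descriptor-like dict.

```python
import hashlib
import io


def sha256_digest(stream, chunk_size=1024 * 1024):
    """Stream the blob once; return (OCI-style digest string, size in bytes)."""
    h = hashlib.sha256()
    size = 0
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
        size += len(chunk)
    return "sha256:" + h.hexdigest(), size


def push_double_hash(blob):
    """First flow: the caller hashes, the callee hashes again and validates."""
    caller_digest, _ = sha256_digest(io.BytesIO(blob))     # pass 1 (caller side)
    callee_digest, size = sha256_digest(io.BytesIO(blob))  # pass 2 (callee side)
    if caller_digest != callee_digest:
        raise ValueError("digest mismatch")
    return {"digest": callee_digest, "size": size}, 2


def push_hash_once(blob):
    """Second flow: only the callee hashes, then commits directly."""
    digest, size = sha256_digest(io.BytesIO(blob))         # single pass
    return {"digest": digest, "size": size}, 1


blob = b"\x00" * (4 * 1024 * 1024)
desc_twice, passes_twice = push_double_hash(blob)
desc_once, passes_once = push_hash_once(blob)
```

Both flows produce the same descriptor, but the second reads and hashes the blob only once. For a blob where hashing dominates the cost, that removes roughly one full pass over the data.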
Thus, time cost of
3 - new hash algorithm
From Speed Hashing, we could see
The OCI image-spec points out that an image may use any unregistered algorithm for its digest, and that a digest with an unrecognized algorithm will pass validation. However, the open source registries django-oci and distribution/distribution (the core library for many registry operators, including Docker Hub, GitHub Container Registry, GitLab Container Registry and DigitalOcean Container Registry) reject any unsupported algorithm. For this reason, we could not pick a faster algorithm such as xxHash. Though opencontainer group proposed a new hash algorithm
Conclusion
In the above discussions, we concluded that most of the time spent on an OCI upload goes to calculating sha256, while S3 uses contentMd5 to validate uploaded files. Moreover, Though It is impossible to accelerate
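The gap between what the image-spec permits and what registries accept, as described in the discussion of new hash algorithms, can be sketched as below. This is a simplified illustration, not the actual code of django-oci or distribution/distribution. It assumes the image-spec digest grammar (`algorithm:encoded`) and treats the two registered algorithms, sha256 and sha512, as the allowlist. `check_digest` and its `strict` flag are hypothetical.

```python
import re

# Registered algorithms and the expected length of their lowercase hex encoding.
SUPPORTED = {"sha256": 64, "sha512": 128}

# Digest grammar: algorithm components joined by [+._-], then ":" and the encoded part.
DIGEST_RE = re.compile(r"([a-z0-9]+(?:[+._-][a-z0-9]+)*):([a-zA-Z0-9=_-]+)")


def check_digest(digest, strict=True):
    """Return True if the digest is acceptable.

    strict=True mimics a registry that rejects unsupported algorithms;
    strict=False mimics the lenient behaviour the spec allows, where a
    well-formed digest with an unrecognized algorithm passes.
    """
    match = DIGEST_RE.fullmatch(digest)
    if match is None:
        return False
    algorithm, encoded = match.groups()
    if algorithm not in SUPPORTED:
        return not strict
    return (len(encoded) == SUPPORTED[algorithm]
            and re.fullmatch(r"[a-f0-9]+", encoded) is not None)


sha_ok = check_digest("sha256:" + "a" * 64)
xxh_strict = check_digest("xxh64:0123456789abcdef")
xxh_lenient = check_digest("xxh64:0123456789abcdef", strict=False)
```

With `strict=True`, a well-formed digest that uses a faster but unregistered algorithm is rejected, which is the behaviour that rules out xxHash in practice.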
It requires a cryptographic hash algorithm, thus you cannot use something like
For the last two months, I have been developing an LLM. If you have any questions about LLMs over 200GB, I am happy to give you feedback.
Description
Using oci artifact standard to store the artifacts/models when developing ML models
Message from the maintainers:
Love this enhancement proposal? Give it a 👍. We prioritise the proposals with the most 👍.