Adds neuron support #486
PR Summary
This PR adds AWS Neuron support to the Infinity project, enabling deployment on AWS Inferentia hardware through Docker containers and ECS tasks with optimized model inference.
- Added NeuronOptimumEmbedder in /libs/infinity_emb/infinity_emb/transformer/embedder/neuron.py with dynamic batch size support and proper core detection
- Added AWS Neuron base Dockerfile /infra/aws_neuron/Dockerfile.base with neuronx packages and runtime configuration
- Added deployment instructions in /infra/aws_neuron/README.md for EC2 and ECS with Huggingface AMI
- Added proper device mounting and IPC configuration in ECS task definition for Neuron accelerator access
- Added version-pinned dependencies in reqs_frozen.txt for reproducible Neuron builds
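For readers unfamiliar with why "dynamic batch size support" matters on accelerators that compile for a fixed input shape: one common approach is to split incoming requests into chunks of the compiled batch size and pad the last chunk. The sketch below illustrates that general idea only. It is not taken from this PR, and the names `pad_to_compiled_batch`, `compiled_batch_size`, and `pad_value` are hypothetical.

```python
from typing import List, Sequence, Tuple


def pad_to_compiled_batch(
    sentences: Sequence[str],
    compiled_batch_size: int,
    pad_value: str = "",
) -> List[Tuple[List[str], int]]:
    """Split `sentences` into chunks of exactly `compiled_batch_size` items.

    Returns (chunk, n_real) pairs, where `n_real` is the number of genuine
    inputs in the chunk; the remaining entries are padding whose outputs
    the caller should discard.
    """
    if compiled_batch_size < 1:
        raise ValueError("compiled_batch_size must be >= 1")
    chunks: List[Tuple[List[str], int]] = []
    for start in range(0, len(sentences), compiled_batch_size):
        real = list(sentences[start:start + compiled_batch_size])
        padding = [pad_value] * (compiled_batch_size - len(real))
        chunks.append((real + padding, len(real)))
    return chunks


# Five inputs against a (hypothetical) compiled batch size of 4.
batches = pad_to_compiled_batch(["a", "b", "c", "d", "e"], compiled_batch_size=4)
for chunk, n_real in batches:
    print(chunk, n_real)
```

After running the model on each padded chunk, a caller would keep only the first `n_real` outputs of that chunk.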
7 file(s) reviewed, 19 comment(s)
# RUN pip3 install \
# neuronx-cc==2.15.143.0 \
# torch-neuronx==2.1.2.2.3.2 \
# transformers-neuronx==0.12.313 \
# libneuronxla==2.0.5347.0 \
# --extra-index-url=https://pip.repos.neuron.amazonaws.com
style: Remove the commented-out code block; it is obsolete and could cause confusion
RUN pip3 install --upgrade \
neuronx-cc==2.* \
libneuronxla==2.0.5347.0 \
torch-neuronx==2.1.2.2.2.0 \
logic: torch-neuronx version 2.1.2.2.2.0 is older than the commented-out version 2.1.2.2.3.2 above. Verify that this downgrade was intentional.
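The ordering claim in this comment can be checked mechanically. A minimal sketch, assuming both version strings are purely numeric dot-separated components (true for the two strings quoted here); `version_key` is a hypothetical helper, and for general PEP 440 versions a dedicated parser such as `packaging.version.Version` would be the safer choice.

```python
def version_key(version: str) -> tuple:
    """Turn '2.1.2.2.3.2' into (2, 1, 2, 2, 3, 2) so tuples compare numerically."""
    return tuple(int(part) for part in version.split("."))


pinned = "2.1.2.2.2.0"          # version installed by the active RUN line
commented_out = "2.1.2.2.3.2"   # version in the commented-out block

is_downgrade = version_key(pinned) < version_key(commented_out)
print(is_downgrade)  # True
```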
# COPY reqs_frozen.txt reqs_frozen.txt
# RUN pip3 install -r reqs_frozen.txt
# Install optimum-neuron
#14 19.70 Successfully installed aiohappyeyeballs-2.4.4 aiohttp-3.11.9 aiosignal-1.3.1 async-timeout-5.0.1 attrs-24.2.0 coloredlogs-15.0.1 datasets-3.1.0 dill-0.3.8 frozenlist-1.5.0 fsspec-2024.9.0 humanfriendly-10.0 multidict-6.1.0 multiprocess-0.70.16 optimum-1.18.0 optimum-neuron-0.0.1 pandas-2.2.3 propcache-0.2.1 pyarrow-18.1.0 pytz-2024.2 requests-2.32.3 sentencepiece-0.2.0 tokenizers-0.15.2 transformers-4.39.3 tzdata-2024.2 xxhash-3.5.0 yarl-1.18.3
# RUN pip3 install optimum[neuronx] --extra-index-url=https://pip.repos.neuron.amazonaws.com
style: Remove commented installation commands and build output logs
ENV PATH="/opt/bin/:/opt/aws/neuron/bin:${PATH}"

FROM neuron AS infinity
RUN apt-get update -y && apt-get install -y nano
style: Installing the nano editor increases image size unnecessarily. Remove it if it is not needed in production.
# Is an mirror of
# 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference-neuronx:2.1.2-transformers4.43.2-neuronx-py310-sdk2.20.0-ubuntu20.04
syntax: typo in comment: 'Is an mirror' should be 'Is a mirror'
optimum[neuronx]==1.22.0
orjson==3.10.7
overrides==7.7.0
packaging
logic: the packaging package has no version pin, which could cause dependency conflicts
hf-transfer
httptools==0.6.4 ; python_version >= "3.9" and python_version < "4"
huggingface-hub
logic: the hf-transfer and huggingface-hub packages have no version constraints. Add specific versions to ensure reproducible builds.
mpmath==1.3.0 ; python_version >= "3.9" and python_version < "4"
multidict==6.1.0 ; python_version >= "3.9" and python_version < "4"
multiprocess==0.70.15 ; python_version >= "3.9" and python_version < "4"
numpy
logic: the numpy package has no version constraint. It should be pinned to ensure compatibility with other dependencies.
scikit-learn
scipy
logic: the scikit-learn and scipy packages have no version constraints. They should be pinned for reproducibility.
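The unpinned entries flagged in the comments above could be caught automatically before review. A minimal sketch, assuming a simple requirements format of one `name==version` entry per line with optional `; marker` suffixes, as in the snippets quoted here; `find_unpinned` is a hypothetical helper and does not handle every form pip accepts (URLs, `-r` includes, hash options).

```python
import re
from typing import List

# Package name with an optional extras suffix, e.g. "optimum[neuronx]".
_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?")


def find_unpinned(requirements_text: str) -> List[str]:
    """Return the names of requirements that lack an exact '==' pin."""
    unpinned = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        spec = line.split(";", 1)[0].strip()  # drop environment markers
        if "==" not in spec:
            match = _NAME.match(spec)
            unpinned.append(match.group(0) if match else spec)
    return unpinned


sample = """\
optimum[neuronx]==1.22.0
orjson==3.10.7
packaging
hf-transfer
httptools==0.6.4 ; python_version >= "3.9" and python_version < "4"
huggingface-hub
numpy
scikit-learn
scipy
"""

print(find_unpinned(sample))
```

Run against the lines quoted in this review, the helper reports the same six packages the comments call out.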
aiohappyeyeballs==2.4.3 ; python_version >= "3.9" and python_version < "4"
aiohttp==3.10.10 ; python_version >= "3.9" and python_version < "4"
aiosignal==1.3.1 ; python_version >= "3.9" and python_version < "4"
async-timeout==4.0.3 ; python_version >= "3.9" and python_version < "3.11"
style: async-timeout is restricted to Python < 3.11, while the other packages allow versions up to Python 4. This may cause issues in Python 3.11+ environments.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #486      +/-   ##
==========================================
- Coverage   79.54%   79.48%   -0.06%
==========================================
  Files          41       41
  Lines        3422     3422
==========================================
- Hits         2722     2720       -2
- Misses        700      702       +2
View full report in Codecov by Sentry.