
Releases: NVIDIA/NeMo

NVIDIA Neural Modules 2.2.0rc2

17 Feb 17:04 · 798b676 · Pre-release

Prerelease: NVIDIA Neural Modules 2.2.0rc2 (2025-02-17)

NVIDIA Neural Modules 2.2.0rc1

04 Feb 08:02 · 18e2bd8 · Pre-release

Prerelease: NVIDIA Neural Modules 2.2.0rc1 (2025-02-04)

NVIDIA Neural Modules 2.2.0rc0

02 Feb 23:30 · 2f66ada · Pre-release

Prerelease: NVIDIA Neural Modules 2.2.0rc0 (2025-02-02)

NVIDIA Neural Modules 2.1.0

03 Jan 10:31 · 633cb60

Highlights

  • Training
    • Fault Tolerance
      • Straggler Detection
      • Auto Relaunch
  • LLM & MM
    • MM models
      • Llava-next
      • Llama 3.2
    • Sequence Model Parallel for NeVA
    • Enable Energon
    • SigLIP (NeMo 1.0 only)
    • LLM 2.0 migration
      • Starcoder2
      • Gemma 2
      • T5
      • Baichuan
      • BERT
      • Mamba
      • ChatGLM
    • DoRA support
  • Export
    • NeMo 2.0 base model export path for NIM
    • PTQ in NeMo 2.0
  • ASR
    • Timestamps with TDT decoder
    • Timestamps option with .transcribe() (see the sketch after this list)
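
A minimal sketch of the timestamps option named above, assuming the `timestamps` flag on `.transcribe()`; the model name and the exact shape of the returned hypotheses are assumptions:

```python
# Hedged sketch: requesting word-level timestamps from an ASR model.
# Model name and output layout are illustrative assumptions.
import nemo.collections.asr as nemo_asr

model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-1.1b")

# timestamps=True asks the decoder to attach offsets to each hypothesis.
hyps = model.transcribe(["audio.wav"], timestamps=True)

# Assumed layout: each hypothesis carries a timestamp dict with word entries.
for word in hyps[0].timestamp["word"]:
    print(word["word"], word["start"], word["end"])
```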

Detailed Changelogs: ASR, TTS, NLP / NMT, Text Normalization / Inverse Text Normalization, Export, Bugfixes, Uncategorized

NVIDIA Neural Modules 2.1.0rc2

21 Dec 18:54 · 49ef560 · Pre-release

Prerelease: NVIDIA Neural Modules 2.1.0rc2 (2024-12-21)

NVIDIA Neural Modules 2.1.0rc1

20 Dec 08:48 · 526a525 · Pre-release

Prerelease: NVIDIA Neural Modules 2.1.0rc1 (2024-12-20)

NVIDIA Neural Modules 2.1.0rc0

11 Dec 23:16 · ceeafa4 · Pre-release
[🤠]: Howdy folks, let's release NeMo `r2.1.0`! (#11556)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <[email protected]>

NVIDIA Neural Modules 2.0.0

14 Nov 18:57 · e938df3

Highlights

Large language models & multimodal

  • Training
    • Long context recipe
    • PyTorch Native FSDP 1
  • Models
    • Llama 3
    • Mixtral
    • Nemotron
  • NeMo 1.0
    • SDXL (text-to-image)
    • Model Opt
      • Depth Pruning (docs)
      • Logit based Knowledge Distillation (docs)

Export

  • TensorRT-LLM v0.12 integration (see the export sketch after this list)
  • LoRA support for vLLM
  • FP8 checkpoint
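
A rough sketch of the export path these items exercise, assuming the `TensorRTLLM` exporter in `nemo.export`; paths, the model type, and the prompt are illustrative:

```python
# Hedged sketch: exporting a .nemo checkpoint to a TensorRT-LLM engine and
# smoke-testing it. Class and argument names are assumptions based on the
# nemo.export exporter; adjust to the installed NeMo version.
from nemo.export import TensorRTLLM

exporter = TensorRTLLM(model_dir="/tmp/trt_llm_engine")  # engine output dir

exporter.export(
    nemo_checkpoint_path="/models/llama3-8b.nemo",  # illustrative checkpoint
    model_type="llama",
)

# forward() runs generation against the freshly built engine.
print(exporter.forward(["What is TensorRT-LLM?"]))
```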

ASR

  • Parakeet large (ASR with punctuation and capitalization)
  • Added Uzbek offline and Georgian streaming models
  • Efficient bucketing optimization to improve batch-size utilization on GPUs

Detailed Changelogs: ASR, TTS, NLP / NMT

NVIDIA Neural Modules 2.0.0rc1

15 Aug 21:55 · 579983f

Highlights

Large language models

  • PEFT: QLoRA support, LoRA/QLoRA for Mixture-of-Experts (MoE) dense layer (see the PEFT sketch after this list)
  • State Space Models & Hybrid Architecture support (Mamba2 and NV-Mamba2-hybrid)
  • Support Nemotron, Minitron, Gemma2, Qwen, RAG
  • Custom Tokenizer training in NeMo
  • Updated the Auto-Configurator for expert parallelism (EP), context parallelism (CP), and FSDP
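
As a rough illustration of the PEFT item above, a sketch against the NeMo 2.0 `llm` collection; the adapter arguments and the elided model/data/trainer objects are assumptions, not the exact API of this release:

```python
# Hedged sketch: configuring a LoRA adapter for PEFT fine-tuning with the
# NeMo 2.0 llm collection. Target modules and rank are illustrative.
from nemo.collections import llm

lora = llm.peft.LoRA(
    target_modules=["linear_qkv", "linear_proj"],  # attention projections
    dim=16,                                        # low-rank dimension
)

# The adapter is passed to the fine-tuning entry point alongside model, data,
# and trainer objects (elided here); peft=... is the only PEFT-specific knob.
# llm.finetune(model=model, data=data, trainer=trainer, peft=lora)
```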

Multimodal

  • NeVA: Add SOTA LLM backbone support (Mixtral/LLaMA3) and suite of model parallelism support (PP/EP)
  • Support Language Instructed Temporal-Localization Assistant (LITA) on top of video NeVA

ASR

  • SpeechLM and SALM
  • Adapters for Canary Customization
  • PyTorch CUDA allocator in PyTorch 2.2 improves training speed by up to 30% for all ASR models
  • CUDA graphs for transducer inference
  • Replaced WebDataset with Lhotse - gives up to a 2x speedup
  • Transcription Improvements - Speedup and QoL Changes
  • ASR Prompt Formatter for multimodal Canary

Export & Deploy

  • In-framework PyTriton deployment with PyTorch, vLLM, and TRT-LLM (updated to 0.10) backends (see the sketch after this list)
  • TRT-LLM C++ runtime
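
A minimal sketch of the in-framework PyTriton path, assuming the `DeployPyTriton` wrapper in `nemo.deploy` and reusing the exporter shape from the export notes above; names and ports are illustrative:

```python
# Hedged sketch: serving an exported model over Triton via PyTriton.
# DeployPyTriton and its arguments are assumptions based on nemo.deploy.
from nemo.deploy import DeployPyTriton
from nemo.export import TensorRTLLM

exporter = TensorRTLLM(model_dir="/tmp/trt_llm_engine")  # previously built engine

server = DeployPyTriton(
    model=exporter,             # a NeMo deployable (TRT-LLM backend here)
    triton_model_name="llama",  # illustrative Triton model name
    port=8000,
)
server.deploy()  # register the model with Triton
server.serve()   # block and serve inference requests
```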

Detailed Changelogs: ASR, TTS, LLM/Multimodal

NVIDIA Neural Modules 2.0.0rc0

06 Jun 05:46

Highlights

LLM and MM

Models

  • Megatron Core RETRO

    • Pre-training
    • Zero-shot Evaluation
  • Pretraining, conversion, evaluation, SFT, and PEFT for:

    • Mixtral 8X22B
    • Llama 3
    • SpaceGemma
  • Embedding Models Fine Tuning

    • Mistral
    • BERT
  • BERT models

    • Context Parallel
    • Distributed checkpoint
  • Video capabilities with NeVA

Performance

  • Distributed Checkpointing

    • Torch native backend
    • Parallel read/write
    • Async write
  • Multimodal LLM (LLAVA/NeVA)

    • Pipeline Parallelism support
    • Sequence packing support

Export

  • Integration of Export & Deploy Modules into NeMo Framework container
    • Upgrade to TRT-LLM 0.9

Speech (ASR & TTS)

Models

  • AED Multi Task Models (Canary) - Multi-Task Multi-Lingual Speech Recognition / Speech Translation model
  • Multimodal Domain - Speech LLM supporting SALM Model
  • Parakeet-tdt_ctc-1.1b Model - RTFx of > 1500 (can transcribe 1500 seconds of audio in 1 second)
  • Audio Codec 16kHz Small - NeMo Neural Audio Codec for discretizing speech for use in LLMs
    • mel_codec_22khz_medium
    • mel_codec_44khz_medium

Perf Improvements

  • Transcribe() upgrade - enables one-line transcription with files, tensors, and data loaders (see the sketch after this list)
  • Frame-looping algorithm for faster RNN-T decoding - improves Real Time Factor (RTF) by 2-3x
  • CUDA graphs + label-looping algorithm for RNN-T and TDT decoding - transducer greedy decoding at over 1500x RTFx, on par with CTC non-autoregressive models
  • Semi-sorted batching support - external user contribution that speeds up training by 15-30%
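
A sketch of the one-line transcription path named above; inputs can be file paths or in-memory audio, with the model name and accepted input types assumed:

```python
# Hedged sketch: the upgraded transcribe() taking file lists and raw audio.
# Model name and the mixed-input behavior are illustrative assumptions.
import numpy as np
import nemo.collections.asr as nemo_asr

model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt_ctc-1.1b")

# File paths: one call, batched internally.
texts = model.transcribe(["a.wav", "b.wav"], batch_size=8)

# In-memory audio: mono float samples at the model's rate (16 kHz assumed).
audio = np.zeros(16000, dtype=np.float32)  # one second of silence, illustrative
texts = model.transcribe([audio])
```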

Customization

  • Context biasing for CTC word stamping - improves accuracy for custom vocabulary and pronunciation
  • Longform inference support for AED models
  • Transcription of multi-channel audio for AED models

Misc

  • Upgraded WebDataset - unified container for Speech and LLM / Multimodal

Detailed Changelogs

ASR

Changelog
  • Enable using hybrid asr models in CTC Segmentation tool by @erastorgueva-nv :: PR: #8828
  • TDT confidence fix by @GNroy :: PR: #8982
  • Fix union type annotations for autodoc+mock-import rendering by @pzelasko :: PR: #8956
  • NeMo dev doc restructure by @yaoyu-33 :: PR: #8896
  • Improved random seed configuration for Lhotse dataloaders with docs by @pzelasko :: PR: #9001
  • Fix #8948, allow preprocessor to be stream captured to a cuda graph when doing per_feature normalization by @galv :: PR: #8964
  • [ASR] Support for transcription of multi-channel audio for AED models by @anteju :: PR: #9007
  • Add ASR latest news by @titu1994 :: PR: #9073
  • Fix docs errors and most warnings by @erastorgueva-nv :: PR: #9006
  • PyTorch CUDA allocator optimization for dynamic batch shape dataloading in ASR by @pzelasko :: PR: #9061
  • RNN-T and TDT inference: use CUDA graphs by default by @artbataev :: PR: #8972
  • Fix #8891 by supporting GPU-side batched CTC Greedy Decoding by @galv :: PR: #9100
  • Update branch for notebooks and ci in release by @ericharper :: PR: #9189
  • Enable CUDA graphs by default only for transcription by @artbataev :: PR: #9196
  • rename paths2audiofiles to audio by @nithinraok :: PR: #9209
  • Fix ASR_Context_Biasing.ipynb contains FileNotFoundError by @andrusenkoau :: PR: #9233
  • Cherrypick: Support dataloader as input to audio for transcription (#9201) by @titu1994 :: PR: #9235
  • Update Online_Offline_Microphone_VAD_Demo.ipynb by @stevehuang52 :: PR: #9252
  • Dgalvez/fix greedy batch strategy name r2.0.0rc0 by @galv :: PR: #9243
  • Accept None as an argument to decoder_lengths in GreedyBatchedCTCInfer::forward by @galv :: PR: #9246
  • Fix loading github raw images on notebook by @nithinraok :: PR: #9282
  • typos by @nithinraok :: PR: #9314
  • Re-enable cuda graphs in training modes. by @galv :: PR: #9338
  • add large model stable training fix and contrastive loss update for variable seq by @nithinraok :: PR: #9259
  • Fix conv1d package in r2.0.0rc0 by @pablo-garay :: PR: #9369
  • Fix GreedyBatchedCTCInfer regression from GreedyCTCInfer. (#9347) by @titu1994 :: PR: #9350
  • Make a backward compatibility for old MSDD configs in label models by @tango4j :: PR: #9377
  • Force diarizer to use CUDA if cuda is available and if device=None. by @tango4j :: PR: #9380

Further changelogs: TTS, LLM and MM, Export, General Improvements