NGPU-LM (N-Gram LM on GPU) + Transducer greedy decoding (RNN-T, TDT) #10989

artbataev · 2024-10-22T12:54:08Z

What does this PR do?

Implements NGPU-LM (N-Gram LM on GPU) and transducer (RNN-T, TDT) greedy decoding with NGPU-LM.
Paper: "NGPU-LM: GPU-Accelerated N-Gram Language Model for Context-Biasing in Greedy ASR Decoding" https://arxiv.org/abs/2505.22857
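
A minimal sketch of what greedy decoding with an n-gram LM can look like for one decoding step, assuming shallow fusion. Each non-blank label is scored as the transducer log-probability plus `alpha` times the LM log-probability, and the label with the highest fused score is emitted. The helper name `fused_greedy_step`, the plain-list layout with blank as the last index, and leaving blank unscored by the LM are illustrative assumptions. This is not the code in this PR, which works on batched GPU tensors.

```python
import math
from typing import List, Sequence


def fused_greedy_step(
    am_log_probs: Sequence[Sequence[float]],  # [batch][vocab + 1]; assumption: last index is blank
    lm_log_probs: Sequence[Sequence[float]],  # [batch][vocab]; LM scores for non-blank labels only
    alpha: float,  # LM weight (cf. ngram_lm_alpha in the Usage section)
) -> List[int]:
    """Pick one label per utterance with shallow fusion (hypothetical helper)."""
    choices = []
    for am_row, lm_row in zip(am_log_probs, lm_log_probs):
        blank_id = len(am_row) - 1
        # Non-blank labels: acoustic score plus weighted LM score.
        fused = [am_row[v] + alpha * lm_row[v] for v in range(blank_id)]
        # Assumption for this sketch: blank is scored by the transducer alone.
        fused.append(am_row[blank_id])
        choices.append(max(range(len(fused)), key=fused.__getitem__))
    return choices


# Toy batch of one utterance: 2 labels + blank.
am = [[math.log(0.4), math.log(0.35), math.log(0.25)]]
lm = [[math.log(0.1), math.log(0.9)]]
print(fused_greedy_step(am, lm, alpha=0.0))  # [0]: acoustic scores alone
print(fused_greedy_step(am, lm, alpha=0.8))  # [1]: the LM flips the decision
```

With `alpha=0.0` the step reduces to plain greedy decoding, so the LM weight directly controls how strongly the domain LM biases the output.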

Results for nvidia/parakeet-rnnt-1.1b and nvidia/parakeet-tdt-1.1b on SLURP (out-of-domain):

| Model | Decoding | LM weight* | SLURP test WER, % ↓ | RTFx ↑ |
|---|---|---|---|---|
| RNN-T | greedy | - | 17.92 | 529 |
| RNN-T | greedy | 0.8 | 16.40 | 520 |
| TDT | greedy | - | 17.47 | 553 |
| TDT | greedy | 0.7 | 15.96 | 545 |

*LM weight: the optimal value on the SLURP dev set, searched with a step of 0.1.
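
The weight search in the footnote can be sketched as a simple grid search. `dev_wer` is a hypothetical callable that decodes the dev set with a given weight and returns WER in percent; the 0.0 to 1.0 range is an assumption, since only the step size is stated. The toy quadratic in the usage line is made-up data for illustration, not a real WER curve.

```python
from typing import Callable, Tuple


def tune_lm_weight(
    dev_wer: Callable[[float], float],  # hypothetical: decode dev set with alpha, return WER %
    max_alpha: float = 1.0,  # assumption: search range upper bound
    step: float = 0.1,
) -> Tuple[float, float]:
    """Grid search for the LM weight that minimizes dev-set WER."""
    num_steps = int(round(max_alpha / step))
    best_alpha, best_wer = 0.0, float("inf")
    for i in range(num_steps + 1):
        alpha = round(i * step, 10)  # avoid float drift such as 0.30000000000000004
        wer = dev_wer(alpha)
        if wer < best_wer:  # strict "<" keeps the smallest alpha on ties
            best_alpha, best_wer = alpha, wer
    return best_alpha, best_wer


# Toy WER curve (not real data) with its minimum at alpha = 0.7.
print(tune_lm_weight(lambda a: (a - 0.7) ** 2 + 15.0))  # (0.7, 15.0)
```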

Setup: batch size 32, high fp32 matmul precision, sorted manifest, A6000 GPU.
The 6-gram LM is built on SLURP train texts.

Overhead of using the LM: less than 2% (with CUDA graphs).
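
Assuming RTFx is the usual inverse real-time factor (audio duration divided by processing time), the sub-2% overhead can be checked directly against the RTFx column of the results table:

```python
# RTFx values copied from the results table.
rtfx = {
    "RNN-T": {"no_lm": 529, "lm": 520},
    "TDT": {"no_lm": 553, "lm": 545},
}


def lm_overhead_percent(rtfx_no_lm: float, rtfx_lm: float) -> float:
    """Extra decoding time caused by the LM, in percent.

    With RTFx = audio duration / processing time, the processing time for a
    fixed amount of audio is proportional to 1 / RTFx.
    """
    return (rtfx_no_lm / rtfx_lm - 1.0) * 100.0


for model, r in rtfx.items():
    print(f"{model}: {lm_overhead_percent(r['no_lm'], r['lm']):.2f}%")
# RNN-T: 1.73%
# TDT: 1.47%
```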

Collection: [ASR]

Changelog

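* `FastNGramLM`: N-Gram LM on GPU (GPULM)
* `KenLMBatchedWrapper` for reference and testing purposes
* Integrate `FastNGramLM` into greedy RNN-T and TDT decoding (label-looping)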

Usage

Step 1: Build LM for ASR Model (tokenizer-dependent) on domain texts

python nemo/scripts/asr_language_modeling/ngram_lm/train_kenlm.py \
    nemo_model_file="nvidia/parakeet-rnnt-1.1b" \
    train_paths=["<train_manifest>"] \
    kenlm_bin_path=$KENLM_BIN_PATH \
    kenlm_model_file=parakeet-rnnt-1.1b_lm-o6.arpa \
    ngram_length=6 \
    preserve_arpa=true \
    save_nemo=true
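
Step 1 writes an ARPA file (and, with `save_nemo=true`, a `.nemo` file) whose n-grams are built over the ASR model's tokens rather than words, which is why the LM is tokenizer-dependent. For readers new to the format, the sketch below shows the standard ARPA backoff rule: use the stored log10 probability if the full n-gram exists, otherwise add the history's backoff weight and retry with a shorter history. The dictionaries and token strings are toy assumptions; this is a reference-style CPU lookup, not how NGPU-LM lays out the model on the GPU.

```python
from typing import Dict, Tuple

Ngram = Tuple[str, ...]


def score_token(
    token: str,
    history: Ngram,
    log10_probs: Dict[Ngram, float],  # n-gram -> log10 P(last token | preceding tokens)
    log10_backoffs: Dict[Ngram, float],  # context -> log10 backoff weight
) -> float:
    """Return log10 P(token | history) using standard ARPA backoff."""
    ngram = history + (token,)
    if ngram in log10_probs:
        return log10_probs[ngram]
    if not history:
        raise KeyError(f"unknown unigram {token!r}")  # real LMs usually map this to <unk>
    # Missing n-gram: pay the history's backoff weight (0.0 if none is stored),
    # then drop the oldest token from the history and try again.
    return log10_backoffs.get(history, 0.0) + score_token(
        token, history[1:], log10_probs, log10_backoffs
    )


# Toy model over two tokens.
probs = {("a",): -0.5, ("b",): -0.7, ("a", "b"): -0.2}
backoffs = {("a",): -0.3, ("b",): -0.4}
print(score_token("b", ("a",), probs, backoffs))  # bigram found: -0.2
print(score_token("a", ("b",), probs, backoffs))  # backoff(b) + log10 P(a), about -0.9
```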

Step 2: Run Decoding with LM

python examples/asr/speech_to_text_eval.py \
    pretrained_name="nvidia/parakeet-rnnt-1.1b" \
    dataset_manifest="<dataset_manifest>" \
    batch_size=32 \
    output_filename=decoded.jsonl \
    rnnt_decoding.strategy="greedy_batch" \
    rnnt_decoding.greedy.ngram_lm_model="parakeet-rnnt-1.1b_lm-o6.arpa.nemo" \
    rnnt_decoding.greedy.ngram_lm_alpha=0.5
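
Step 2 uses one fixed `ngram_lm_alpha`. To search for the weight on a dev manifest, one option is to generate the same command for several weights. The helper below only assembles argument lists from the overrides shown in Step 2 and does not import NeMo; the function name `eval_command`, the output-file naming, the `dev_manifest.json` path, and the 0.0 to 1.0 range are hypothetical.

```python
import shlex
from typing import List


def eval_command(manifest: str, lm_path: str, alpha: float) -> List[str]:
    """Build the Step 2 evaluation command for one LM weight (hypothetical helper)."""
    return [
        "python", "examples/asr/speech_to_text_eval.py",
        "pretrained_name=nvidia/parakeet-rnnt-1.1b",
        f"dataset_manifest={manifest}",
        "batch_size=32",
        f"output_filename=decoded_alpha{alpha:.1f}.jsonl",
        "rnnt_decoding.strategy=greedy_batch",
        f"rnnt_decoding.greedy.ngram_lm_model={lm_path}",
        f"rnnt_decoding.greedy.ngram_lm_alpha={alpha:.1f}",
    ]


for i in range(11):  # alpha = 0.0, 0.1, ..., 1.0
    cmd = eval_command("dev_manifest.json", "parakeet-rnnt-1.1b_lm-o6.arpa.nemo", i / 10)
    print(shlex.join(cmd))  # to execute instead: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting issues with the `<...>` placeholders and bracketed Hydra values.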

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

Make sure you have read and followed the Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

Related to # (issue)

nemo/collections/asr/parts/ngram_lm.py

nemo/collections/asr/parts/submodules/rnnt_loop_labels_computer.py

nemo/collections/asr/parts/submodules/tdt_loop_labels_computer.py

github-advanced-security

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

Signed-off-by: artbataev <artbataev@users.noreply.github.com>

tests/collections/asr/test_ngram_lm.py

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

nemo/collections/asr/parts/submodules/ngram_lm/ngram_lm_batched.py

lilithgrigoryan

LGTM! Thanks!

andrusenkoau

Looks good, thank you!

* `FastNGramLM` - N-Gram LM on GPU (GPULM)
* `KenLMBatchedWrapper` for reference and testing purposes
* Integrate `FastNGramLM` into Greedy RNN-T and TDT decoding (label-looping)

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>

github-actions bot added the ASR label Oct 22, 2024

github-advanced-security bot found potential problems Oct 22, 2024

artbataev force-pushed the gpu_lm_decoding branch from 7283fa1 to 442081f on October 22, 2024 16:28

github-actions bot added core Changes to NeMo Core TTS NLP CI common Multi Modal labels Oct 22, 2024

github-advanced-security bot found potential problems Oct 22, 2024

artbataev added 2 commits October 22, 2024 20:48

GPU LM Decoding

32aad14

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

Merge branch 'main' into gpu_lm_decoding

0e24533

artbataev force-pushed the gpu_lm_decoding branch from 442081f to 0e24533 on October 22, 2024 16:49

github-actions bot removed core Changes to NeMo Core TTS NLP CI common Multi Modal labels Oct 22, 2024

Merge branch 'main' into gpu_lm_decoding

520ec2f

github-actions bot added the stale label Nov 6, 2024

artbataev removed the stale label Nov 6, 2024

artbataev and others added 4 commits November 11, 2024 19:29

Merge branch 'main' into gpu_lm_decoding

5324114

Merge branch 'main' into gpu_lm_decoding

b303ad4

Add copyright

d2c1643

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

Update baseline

5b67a60

Signed-off-by: artbataev <artbataev@users.noreply.github.com>

github-actions bot added the stale label Nov 29, 2024

github-actions bot closed this Dec 6, 2024

artbataev removed the stale label Dec 13, 2024

artbataev reopened this Dec 13, 2024

artbataev added Run CICD and removed Run CICD labels Mar 18, 2025

artbataev requested a review from lilithgrigoryan March 18, 2025 09:57

andrusenkoau reviewed Mar 18, 2025

tests/collections/asr/test_ngram_lm.py (resolved)

tests/collections/asr/test_ngram_lm.py (resolved)

Add empty sentence to tests

615e3e7

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

ko3n1g added Run CICD and removed Run CICD labels Mar 18, 2025

lilithgrigoryan reviewed Mar 18, 2025

nemo/collections/asr/parts/submodules/ngram_lm/ngram_lm_batched.py (resolved)

nemo/collections/asr/parts/submodules/ngram_lm/ngram_lm_batched.py (outdated, resolved)

Add from_file constructor to KenLMBatchedWrapper for consistency with `FastNGramLM`

10cdb49

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

ko3n1g added Run CICD and removed Run CICD labels Mar 18, 2025

lilithgrigoryan approved these changes Mar 18, 2025

andrusenkoau approved these changes Mar 18, 2025

artbataev added Run CICD and removed Run CICD labels Mar 18, 2025

artbataev enabled auto-merge (squash) March 18, 2025 13:41

Merge branch 'main' into gpu_lm_decoding

d272295

ko3n1g added Run CICD and removed Run CICD labels Mar 18, 2025

artbataev merged commit 8960707 into NVIDIA:main Mar 18, 2025
213 checks passed

artbataev deleted the gpu_lm_decoding branch March 19, 2025 06:53

This was referenced Mar 21, 2025

AED Decoding with NGPU-LM (N-Gram LM on GPU) #12730

Merged

Rename FastNGramLM -> NGramGPULanguageModel #12755

Merged

artbataev mentioned this pull request Apr 18, 2025

Batched beam search for transducers (RNN-T and TDT) #12729

Merged

lilithgrigoryan mentioned this pull request May 14, 2025

add CTC batched beam search #13337

Merged

artbataev mentioned this pull request May 22, 2025

New streaming RNN-T and TDT inference #9106

Merged

artbataev changed the title ~~N-Gram LM on GPU + Transducer greedy decoding (RNN-T, TDT)~~ NGPU-LM (N-Gram LM on GPU) + Transducer greedy decoding (RNN-T, TDT) May 30, 2025
