NGPU-LM (N-Gram LM on GPU) + Transducer greedy decoding (RNN-T, TDT) #10989

Merged
artbataev merged 106 commits into NVIDIA:main from gpu_lm_decoding
Mar 18, 2025

@artbataev artbataev commented Oct 22, 2024

What does this PR do?

Implements NGPU-LM (N-Gram LM on GPU) and transducer (RNN-T, TDT) greedy decoding with NGPU-LM.
Paper: "NGPU-LM: GPU-Accelerated N-Gram Language Model for Context-Biasing in Greedy ASR Decoding" https://arxiv.org/abs/2505.22857

Results for nvidia/parakeet-rnnt-1.1b and nvidia/parakeet-tdt-1.1b on SLURP (out-of-domain):

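| Model | Decoding | LM weight* | SLURP Test, WER % ↓ | RTFx ↑ |
|-------|----------|------------|---------------------|--------|
| RNN-T | greedy   | -          | 17.92               | 529    |
| RNN-T | greedy   | 0.8        | 16.40               | 520    |
| TDT   | greedy   | -          | 17.47               | 553    |
| TDT   | greedy   | 0.7        | 15.96               | 545    |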

*LM weight: the value that gives the best result on the SLURP dev set, searched with a step of 0.1.

Setup: batch size 32, "high" fp32 matmul precision, sorted manifest, A6000 GPU.
The 6-gram LM is built on the SLURP training texts.

Using the LM adds less than 2% overhead (with CUDA graphs).
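The figures in the results table can be turned into relative numbers with a few lines of arithmetic. The sketch below only re-derives values from the table; the overhead is estimated from the RTFx ratio, which is an assumption about how the "less than 2%" figure relates to the table. With these inputs the relative WER reduction comes out at roughly 8.5% for both models.

```python
# Figures copied from the results table above.
results = {
    "RNN-T": {"wer_base": 17.92, "wer_lm": 16.40, "rtfx_base": 529, "rtfx_lm": 520},
    "TDT": {"wer_base": 17.47, "wer_lm": 15.96, "rtfx_base": 553, "rtfx_lm": 545},
}

summary = {}
for model, r in results.items():
    # Relative WER reduction, in percent of the baseline WER.
    wer_reduction_pct = 100.0 * (r["wer_base"] - r["wer_lm"]) / r["wer_base"]
    # RTFx is audio duration divided by processing time, so the extra
    # processing time with the LM is the ratio of the two RTFx values minus 1.
    time_overhead_pct = 100.0 * (r["rtfx_base"] / r["rtfx_lm"] - 1.0)
    summary[model] = {
        "wer_reduction_pct": wer_reduction_pct,
        "time_overhead_pct": time_overhead_pct,
    }
    print(f"{model}: WER reduction {wer_reduction_pct:.2f}% relative, "
          f"time overhead {time_overhead_pct:.2f}%")
```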

Collection: [ASR]

Changelog

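  • Add `FastNGramLM`: N-Gram LM on GPU (GPULM)
  • Add `KenLMBatchedWrapper` for reference and testing purposes
  • Integrate `FastNGramLM` into greedy RNN-T and TDT decoding (label-looping)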

Usage

Step 1: Build LM for ASR Model (tokenizer-dependent) on domain texts

# Replace <train_manifest> with the path to the training manifest.
python nemo/scripts/asr_language_modeling/ngram_lm/train_kenlm.py \
    nemo_model_file="nvidia/parakeet-rnnt-1.1b" \
    train_paths='["<train_manifest>"]' \
    kenlm_bin_path="$KENLM_BIN_PATH" \
    kenlm_model_file=parakeet-rnnt-1.1b_lm-o6.arpa \
    ngram_length=6 \
    preserve_arpa=true \
    save_nemo=true
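For readers who only have raw in-domain sentences, a training manifest can be prepared with a short script. This is a minimal sketch, assuming the LM training script accepts a NeMo-style JSON-lines manifest and reads only the "text" field of each line; the sentences and the output file name are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_text_manifest(texts, path):
    """Write one JSON object per line, each holding a single "text" field."""
    with open(path, "w", encoding="utf-8") as f:
        for text in texts:
            f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")


# Hypothetical in-domain sentences; replace with real training texts.
texts = ["turn on the kitchen lights", "set an alarm for seven am"]

# Hypothetical output location, used here only so the sketch is runnable.
manifest_path = Path(tempfile.gettempdir()) / "train_manifest.json"
write_text_manifest(texts, manifest_path)
print(manifest_path)
```

The resulting path is what would be passed in place of `<train_manifest>`.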

Step 2: Run Decoding with LM

# Replace <dataset_manifest> with the path to the evaluation manifest.
python examples/asr/speech_to_text_eval.py \
    pretrained_name="nvidia/parakeet-rnnt-1.1b" \
    dataset_manifest="<dataset_manifest>" \
    batch_size=32 \
    output_filename=decoded.jsonl \
    rnnt_decoding.strategy="greedy_batch" \
    rnnt_decoding.greedy.ngram_lm_model="parakeet-rnnt-1.1b_lm-o6.arpa.nemo" \
    rnnt_decoding.greedy.ngram_lm_alpha=0.5
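The role of the LM weight (`ngram_lm_alpha`) can be illustrated with a toy, single-step version of LM-fused greedy selection. This is a conceptual sketch in plain Python, not the batched GPU code from this PR: the function name is hypothetical, the scores are made up, and leaving the blank label unscored by the LM is an assumption made for illustration.

```python
import math


def fused_greedy_step(asr_log_probs, lm_log_probs, lm_weight, blank_id):
    """Return the label with the best ASR score plus weighted LM score.

    The blank label gets no LM score (an assumption for this sketch).
    """
    best_id, best_score = blank_id, -math.inf
    for token_id, asr_score in enumerate(asr_log_probs):
        score = asr_score
        if token_id != blank_id:
            score += lm_weight * lm_log_probs[token_id]
        if score > best_score:
            best_id, best_score = token_id, score
    return best_id


# Toy scores: two real tokens (ids 0 and 1) and blank as the last id (2).
asr = [math.log(0.40), math.log(0.45), math.log(0.15)]
lm = [math.log(0.9), math.log(0.1)]  # the LM strongly prefers token 0

without_lm = fused_greedy_step(asr, lm, lm_weight=0.0, blank_id=2)
with_lm = fused_greedy_step(asr, lm, lm_weight=0.5, blank_id=2)
print(without_lm, with_lm)
```

With a weight of 0 the acoustic model's slight preference for token 1 wins; with a weight of 0.5 the in-domain LM tips the choice to token 0.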

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove the label and add it again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

@github-actions github-actions bot added the ASR label Oct 22, 2024

@github-advanced-security github-advanced-security bot left a comment

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
@github-actions github-actions bot added the stale label Nov 6, 2024
@artbataev artbataev removed the stale label Nov 6, 2024
artbataev and others added 4 commits November 11, 2024 19:29
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
@github-actions github-actions bot added the stale label Nov 29, 2024
@github-actions github-actions bot closed this Dec 6, 2024
@artbataev artbataev removed the stale label Dec 13, 2024
@artbataev artbataev reopened this Dec 13, 2024
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
@ko3n1g ko3n1g added Run CICD and removed Run CICD labels Mar 18, 2025
…with `FastNGramLM`

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
@ko3n1g ko3n1g added Run CICD and removed Run CICD labels Mar 18, 2025
@lilithgrigoryan lilithgrigoryan left a comment

LGTM! Thanks!

@andrusenkoau andrusenkoau left a comment

Looks good, thank you!

@artbataev artbataev enabled auto-merge (squash) March 18, 2025 13:41
@ko3n1g ko3n1g added Run CICD and removed Run CICD labels Mar 18, 2025
@artbataev artbataev merged commit 8960707 into NVIDIA:main Mar 18, 2025
213 checks passed
@artbataev artbataev deleted the gpu_lm_decoding branch March 19, 2025 06:53
cspades pushed a commit that referenced this pull request Mar 24, 2025
* `FastNGramLM` - N-Gram LM on GPU (GPULM)
* `KenLMBatchedWrapper` for reference and testing purposes
* Integrate `FastNGramLM` into Greedy RNN-T and TDT decoding (label-looping)
---------

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
nitin9252 pushed a commit to nitin9252/NeMo that referenced this pull request Apr 7, 2025
nitin9252 pushed a commit to nitin9252/NeMo that referenced this pull request Apr 7, 2025
Signed-off-by: Nitin <28332485+nitin9252@users.noreply.github.com>
@artbataev artbataev changed the title N-Gram LM on GPU + Transducer greedy decoding (RNN-T, TDT) NGPU-LM (N-Gram LM on GPU) + Transducer greedy decoding (RNN-T, TDT) May 30, 2025
Labels: ASR, core (Changes to NeMo Core), Run CICD
4 participants