Issue with Language Specific Transcription Using txtai and Whisper #593

@Nondzu

Environment

  • txtai version: 6.2.0
  • whisper version:
  • Python version: 3.11.5
  • Operating System:
    Description: Linux Mint 21.2
    Release: 21.2
    Codename: victoria

Description

I'm trying to transcribe Polish audio with the Whisper model in txtai. Transcription works, but the output is in English rather than Polish, the language of the audio.

Here's a snippet of the code I'm using:

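from txtai.pipeline import Transcription

# Audio files to transcribe (placeholder path)
files = ["audio.wav"]

# Load Whisper through txtai's transcription pipeline
transcribe = Transcription("openai/whisper-large-v2")
for text in transcribe(files):
    print(text)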

Questions

  1. Does txtai's transcription feature automatically translate the text to English, or is it supposed to return text in the language of the audio?
  2. How can I disable any automatic translation feature or specify the language of the audio to ensure that the transcription is in Polish?
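For reference on question 2, here is a minimal sketch of how the language might be pinned when calling Whisper through the Hugging Face transformers speech recognition pipeline directly, rather than through txtai. It assumes the pipeline accepts generate_kwargs with Whisper's language and task options; whether txtai 6.2.0 forwards such options is not something this sketch establishes. The helper names (whisper_generate_kwargs, transcribe_in_language, load_whisper) are hypothetical and exist only for illustration.

```python
def whisper_generate_kwargs(language, task="transcribe"):
    """Build generation options that pin Whisper's language and task."""
    # task="transcribe" keeps the spoken language; task="translate" produces English.
    return {"language": language, "task": task}


def transcribe_in_language(asr, audio, language):
    """Run a speech recognition pipeline callable on one input with a fixed language."""
    # Assumption: asr accepts generate_kwargs and returns a dict with a "text" key,
    # as the transformers automatic-speech-recognition pipeline does.
    result = asr(audio, generate_kwargs=whisper_generate_kwargs(language))
    return result["text"]


def load_whisper(model="openai/whisper-large-v2"):
    """Create the transformers pipeline (downloads the model on first use)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model)
```

With transformers and its audio dependencies installed, this could be tried as transcribe_in_language(load_whisper(), "audio.wav", "polish"), where "audio.wav" is a placeholder path. If the output then comes back in Polish, the English text is probably coming from the default generation settings rather than from txtai itself.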

Any guidance or suggestions on this matter would be greatly appreciated.

Thank you!
