
Transcribe any audio to text, translate, and edit subtitles 100% locally with a web UI. Powered by Whisper models!


pluja/whishper


[AnySub banner]

❇️ Open-source, 100% local audio transcription and subtitling suite with a full-featured web UI ❇️


🚧 WORK IN PROGRESS...

Warning

This rewrite is under active development. The initial stages focus on improving the quality and reliability of the APIs, with the goal of easier scalability, broader compatibility, and better overall performance. Once the APIs are stable and reliable, the focus will shift to implementing a better web UI.

Tip

The WhisperX API, which powers Anysub, is available for testing. For instructions on how to run it, refer to the README.


βœ… Currently working

  • πŸ—£οΈ Transcribe any media to text: audio, video, etc.
    • Upload a file to transcribe.
    • Speaker detection and diarization.
    • WhisperX alignment.
    • Better segment splitting.
  • 🌐 Translate transcriptions to any language supported by Libretranslate
  • 🏠 100% Local: transcription, translation, and subtitle editing happen 100% on your machine (can even work offline!).
  • πŸš€ Fast: uses WhisperX as the Whisper backend for much faster transcription times, even on CPU!
  • πŸ“₯ Download transcriptions in:
    • VTT - Speakers colorized
    • ASS - Speakers colorized
    • JSON
    • TXT
  • 🐎 CPU: Anysub is fully optimized to run efficiently on CPU-only systems
  • πŸ”₯ GPU Acceleration: Leverage NVIDIA GPUs to achieve significantly faster transcription times
  • 🦾 Backend workers
    • Anysub can seamlessly orchestrate multiple whisperx-api workers, balancing the job queue across all available resources. Uses asynq.
  • 🐧 User authentication. You can now register multiple users with separate workspaces.
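To illustrate the colorized-speaker subtitle exports listed above, here is a minimal sketch of rendering diarized segments as WebVTT using `<v>` voice tags (which players can style per speaker). The segment field names (`start`, `end`, `speaker`, `text`) are assumptions for illustration, not Anysub's actual export code.

```python
# Sketch: convert diarized transcription segments to a minimal VTT file.
# Segment field names are hypothetical, not taken from Anysub's code.

def fmt_ts(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

def to_vtt(segments) -> str:
    """Render segments as WebVTT, tagging each cue with its speaker."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{fmt_ts(seg['start'])} --> {fmt_ts(seg['end'])}")
        # <v Speaker> voice tags let players colorize speakers differently.
        lines.append(f"<v {seg['speaker']}>{seg['text']}")
        lines.append("")
    return "\n".join(lines)

segments = [
    {"start": 0.0, "end": 2.5, "speaker": "SPEAKER_00", "text": "Hello there."},
    {"start": 2.5, "end": 4.0, "speaker": "SPEAKER_01", "text": "Hi!"},
]
print(to_vtt(segments))
```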

🏁 Todos before release

  • Web UI
    • Create
    • Translate
    • Download subtitles
    • Summarize
    • Subtitle editor
  • Transcribe from URLs (any source supported by yt-dlp)
  • Subtitle editor
    • Transcription highlighting based on media position
    • CPS (Characters per second) warnings
    • Segment splitting
    • Segment insertion
    • Subtitle language selection
  • Quick and easy setup: use the quick start script, or run through a few steps
  • AI summarization of transcriptions: either using OpenAI or Ollama
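The CPS warnings planned for the subtitle editor boil down to a simple check: characters divided by cue duration against a readability threshold. A minimal sketch of that check follows; the 17 CPS limit is a common subtitling guideline, not a value taken from Anysub.

```python
# Sketch: flag subtitle segments whose reading speed exceeds a CPS
# (characters per second) threshold. The limit below is a common
# subtitling guideline, assumed here for illustration.

CPS_LIMIT = 17.0

def cps(text: str, start: float, end: float) -> float:
    """Characters per second for one subtitle cue."""
    duration = max(end - start, 1e-6)  # guard against zero-length cues
    return len(text) / duration

def cps_warnings(segments, limit=CPS_LIMIT):
    """Return the indices of segments that read faster than `limit`."""
    return [i for i, s in enumerate(segments)
            if cps(s["text"], s["start"], s["end"]) > limit]

demo = [
    {"text": "short", "start": 0.0, "end": 2.0},
    {"text": "x" * 50, "start": 0.0, "end": 1.0},
]
print(cps_warnings(demo))
```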

✨ What's New

  • No longer uses MongoDB; uses a MariaDB backend instead.
  • Uses WhisperX backend: better accuracy, speaker diarization, alignment...
  • Anysub isn't limited to a single machine! With the worker system, you can set up multiple whisperx-api workers on different servers (or on the same one). Anysub will then handle the tasks, making the best use of all available resources.
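Anysub's actual job distribution is handled by asynq, a Redis-backed Go task queue; the sketch below is only a conceptual stand-in showing the balancing idea, cycling jobs across a pool of worker endpoints. The worker URLs and `Dispatcher` class are hypothetical.

```python
# Conceptual sketch of spreading jobs across several worker endpoints.
# Anysub itself uses asynq (Go); this round-robin dispatcher only
# illustrates the idea of balancing work over available resources.

from itertools import cycle

class Dispatcher:
    def __init__(self, workers):
        self._workers = cycle(workers)       # endless round-robin iterator
        self.assignments = []                # (job, worker) pairs, for inspection

    def submit(self, job):
        """Assign `job` to the next worker in the rotation."""
        worker = next(self._workers)
        self.assignments.append((job, worker))
        return worker

# Hypothetical worker endpoints, e.g. whisperx-api instances on two hosts.
workers = ["http://worker-a:8000", "http://worker-b:8000"]
d = Dispatcher(workers)
for job in ["ep1.mp3", "ep2.wav", "ep3.mkv"]:
    d.submit(job)
print(d.assignments)
```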

πŸ§ͺ Testing

At present, there is no testing documentation. Comprehensive testing guidelines will be provided once the todos before release are completed.

The WhisperX API is available for standalone testing; see its README for running instructions.

Development environment

You will need Go, templ, Docker, npm, and optionally gow.

  1. Run docker compose up.
  2. Run npm run dev to start the development environment.
  3. Visit http://localhost:1337.

πŸ—ΊοΈ Post-release Roadmap

  • Local folder as media input.
  • Full-text search all transcriptions.
  • Audio recording from the browser.

🧱 Tech Stack