Whispering Tiger – Detailed Features

At a Glance

Whispering Tiger unifies Speech Recognition, Real-time & Batch Translation, Text-to-Speech, Image / Screen OCR, In-Game Text Monitoring, and a rich Plugin & Automation system – all running locally for privacy and performance.

Supported Languages

General Coverage

Depending on the selected models, Whispering Tiger can understand or translate between roughly 100 and more than 200 languages. Coverage varies per model family (see sections below). Many pipelines support automatic language detection.

ASR (Speech-to-Text): Typical Whisper-family models cover ~99 languages and can translate speech into English. Other models also translate directly into further languages.
Text Translation: Multilingual models (e.g. NLLB-200, M2M100) cover up to 200 languages. Local large language models (LLMs) are also available and can offer high translation accuracy.
TTS: Many engines are available. Some offer dozens of voices across 100+ languages, others focus on a single language, and some support voice cloning from a short audio sample.
OCR: Backends (e.g. Easy OCR, GOT OCR 2.0, Phi-4, or Tesseract via plugin) support a wide range of languages and scripts.

Speech Recognition (ASR)

Supported Model Families

Whisper original (multiple sizes: tiny → large-v3)
Transformer-Whisper (supports Flash-Attention)
Faster-Whisper / CTranslate2-optimized variants, with support for custom user-provided models
Seamless M4T
MMS
Speech T5
Wav2Vec Bert 2.0
NeMo Canary / Parakeet
Phi-4 (LLM)
Voxtral (LLM)
+ Cloud Models like OpenAI, Google, etc. via Plugins

The UI lets you choose the model size (trading speed for accuracy), precision / quantization, the hardware device, and VAD (voice activity detection) options.

Features

Real-time streaming or batch file transcription
Automatic language detection
Timestamped segments with optional word-level timing (when backend supports it)
Customizable VAD sensitivity and chunk parameters
Segment post-processing (casing, punctuation, number formatting)
On-the-fly translation (transcribe + translate to target)
Profile-based hardware tuning (threads, GPU, fp16/int8/4-bit quantization)
Subtitle generation and translation (SRT, VTT, etc.)
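
To illustrate how timestamped segments map onto a subtitle file, here is a minimal sketch of SRT rendering. The Segment class and both function names are hypothetical and are not taken from Whispering Tiger's code; only the SRT cue layout (index, "HH:MM:SS,mmm --> HH:MM:SS,mmm", text, blank line) follows the format itself.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed segment; times are in seconds (hypothetical structure)."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)
```

WebVTT output would differ mainly in the "WEBVTT" header line and in using a period instead of a comma before the milliseconds.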

Text & Speech Translation

Model Backends

NLLB-200, with CTranslate2-optimized variants (200 languages)
M2M100 (100 languages)
Seamless M4T (101 languages)
Phi-4 (23 languages)
Voxtral (13 languages)
+ Cloud Models like DeepL, OpenAI, Google, etc. via Plugins

Capabilities

Translation of text files or clipboard contents
Real-time speech → text → translated text workflow
Chained pipelines: ASR → MT → TTS for spoken output
Automatic source language detection & fallback strategies
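
A chained ASR → MT → TTS pipeline with a language-detection fallback can be sketched as follows. This is an illustration only: the three stage callables, their signatures, the fallback_lang default and the min_confidence threshold are assumptions made for the example, not Whispering Tiger's actual interfaces or settings.

```python
from typing import Callable, Optional

# Hypothetical stage signatures standing in for real model calls.
Transcriber = Callable[[bytes], tuple[str, Optional[str], float]]  # -> text, language, confidence
Translator = Callable[[str, str, str], str]                        # text, source, target -> text
Synthesizer = Callable[[str, str], bytes]                          # text, language -> audio


def run_pipeline(
    audio: bytes,
    transcribe: Transcriber,
    translate: Translator,
    synthesize: Synthesizer,
    target_lang: str,
    fallback_lang: str = "en",
    min_confidence: float = 0.5,
) -> tuple[str, bytes]:
    """Transcribe audio, translate the text, and synthesize the translation."""
    text, detected, confidence = transcribe(audio)
    # Fallback strategy: distrust a missing or low-confidence detection.
    if detected and confidence >= min_confidence:
        source = detected
    else:
        source = fallback_lang
    # Skip translation when source and target already match.
    translated = text if source == target_lang else translate(text, source, target_lang)
    return translated, synthesize(translated, target_lang)
```

Passing the stages in as callables keeps the sketch independent of any particular model backend, so each stage can be swapped without touching the chaining logic.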

Text-to-Speech (TTS) & Voice

Engines / Approaches

Silero (fast, CPU-friendly, multilingual)
Chatterbox (fast, high-quality, multilingual, expressive TTS with voice-cloning)
Kokoro (fast, high-quality, multilingual)
F5/E2 (high-quality, expressive TTS with voice-cloning)
Zonos (high-quality, expressive TTS with voice-cloning and many languages)
Orpheus (high-quality, expressive TTS with voice-cloning)
+ Plugins for more TTS options
- Bark with expressive generation
- Coqui TTS (neural multi-speaker models)
- Voice Conversion / Cloning plugins (RVCv2)
- Cloud Models like OpenAI, Google, etc.

Features

Selectable voice, speed, pitch (engine dependent)
Text normalization & sentence splitting
Streamed audio generation for faster playback
Selectable audio output devices, including output to multiple devices without needing VoiceMeeter
Output into voice-chat applications such as games requires a virtual audio device
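
Text normalization and sentence splitting matter for streamed playback because the first sentence can be synthesized while later ones are still queued. The following is a deliberately simple sketch of that idea, not the application's own implementation. It collapses whitespace and splits after ".", "!" or "?", so abbreviations such as "e.g." will be split incorrectly.

```python
import re

_WHITESPACE = re.compile(r"\s+")
# Split at whitespace that directly follows sentence-ending punctuation.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def normalize(text: str) -> str:
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return _WHITESPACE.sub(" ", text).strip()


def split_sentences(text: str) -> list[str]:
    """Return the normalized text as a list of sentences."""
    normalized = normalize(text)
    if not normalized:
        return []
    return _SENTENCE_BOUNDARY.split(normalized)
```

Each returned sentence could then be handed to a TTS engine in order, with playback starting as soon as the first one is ready.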

Image / Screen OCR

OCR Backends

Easy OCR (fast & accurate for many scripts)
GOT OCR 2.0 (accurate for many scripts and complicated layouts)
Phi-4 (LLM-based; accurate for many scripts and complicated layouts)
Tesseract (if plugin installed)
Game / screen region capture overlays

Capabilities

Configurable screen area polling
Automatic text translation of OCR results
Overlay and subtitle-mode output
Text-to-Speech playback of translated lines
Filtering & deduplication of repeated frames
Logged transcripts for later review
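
When a screen region is polled repeatedly, consecutive captures often yield the same text, sometimes with small OCR differences. One way to filter such repeats is fuzzy comparison against the last accepted text, sketched below with Python's difflib. The class name, the helper function and the 0.9 similarity threshold are assumptions for illustration; the source does not describe how Whispering Tiger implements this.

```python
from difflib import SequenceMatcher
from typing import Optional


class FrameDeduplicator:
    """Drops OCR results that are empty or near-identical to the last kept one."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold        # similarity at or above this counts as a repeat
        self._last: Optional[str] = None  # last text that was accepted

    def accept(self, text: str) -> Optional[str]:
        """Return the cleaned text if it is new, otherwise None."""
        cleaned = " ".join(text.split())
        if not cleaned:
            return None
        if self._last is not None:
            similarity = SequenceMatcher(None, self._last, cleaned).ratio()
            if similarity >= self.threshold:
                return None
        self._last = cleaned
        return cleaned


def filter_frames(frames: list[str], threshold: float = 0.9) -> list[str]:
    """Keep only the frames that differ enough from the previously kept frame."""
    dedup = FrameDeduplicator(threshold)
    kept = []
    for frame in frames:
        result = dedup.accept(frame)
        if result is not None:
            kept.append(result)
    return kept
```

Only accepted frames would then be passed on to translation or TTS, which avoids re-translating and re-speaking a line that is still on screen.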

Automation & Plugins

Plugin System

Extend core capabilities without modifying the base application.

Install / update / remove inside the UI
Easy Plugin API (see plugin documentation)
Extensive Plugin Configuration UI
Events (stt, tts, sts, plus further custom plugin events such as on_plugin_tts, on_audio_processor, etc.)
Access to application, model & pipeline configuration
Fast audio-processing functions usable by plugins (resampling, normalization, etc.)
Community-driven enhancements (see plugin list)
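
To show what audio helpers such as normalization and resampling do, here is a small pure-Python sketch. It does not reproduce the application's own functions, which are presumably backed by optimized libraries; the function names and the 0.9 target peak are hypothetical. The resampler uses plain linear interpolation with no anti-aliasing filter, so it is suitable for illustration only.

```python
def peak_normalize(samples: list[float], target_peak: float = 0.9) -> list[float]:
    """Scale samples so the loudest one reaches target_peak (silence is left as is)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


def resample_linear(samples: list[float], src_rate: int, dst_rate: int) -> list[float]:
    """Resample by linear interpolation between neighbouring input samples."""
    if not samples or src_rate == dst_rate:
        return list(samples)
    out_len = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # input samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out
```

A plugin receiving 48 kHz audio could, for instance, resample it to the 16 kHz that many speech models expect before normalizing it.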

Automation Features

Profiles for different tasks (gaming, meetings, game translation, etc.)
Custom hotkeys (start/stop, push-to-talk)
Auto model download
Update management for installed plugins

Performance & Optimization

Acceleration & Efficiency

Hardware selection (CUDA GPU acceleration or CPU, with possible ROCm support in the future)
Mixed precision / fp16 inference
Quantized model support (int8 / 4-bit) for low VRAM systems
Thread & batch tuning per model
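
The effect of precision and quantization on memory can be estimated with simple arithmetic: the number of parameters times the bytes stored per weight. The sketch below covers model weights only. Activations, caches and framework overhead come on top, and real quantized formats carry some extra metadata, so treat the results as rough lower bounds. The function name and the precision labels are chosen for this example.

```python
# Approximate storage per weight for common precisions.
BYTES_PER_WEIGHT = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "4bit": 0.5}


def weight_memory_gib(num_parameters: int, precision: str) -> float:
    """Estimate the memory needed for the model weights alone, in GiB."""
    return num_parameters * BYTES_PER_WEIGHT[precision] / 1024**3


if __name__ == "__main__":
    for precision in BYTES_PER_WEIGHT:
        print(f"{precision}: {weight_memory_gib(1_000_000_000, precision):.2f} GiB")
```

For a model with one billion parameters this gives about 3.73 GiB at fp32, 1.86 GiB at fp16, 0.93 GiB at int8 and 0.47 GiB at 4-bit, which is why quantized variants fit on low-VRAM systems.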

Resource Management

On-demand model loading
Disk cache for reused weights
Offline operation as the main focus, with optional plugins for online functionality

User Workflow Features

Interface & Usability

Tabbed UI for distinct tasks (ASR, Translation, TTS, OCR)
Central YAML Configuration file for easy sharing & collaboration (make sure to exclude sensitive information when sharing)
Dark theme focused on readability
Status indicators
Websocket support for integration into other applications like Streaming software or Virtual Reality overlays
VRChat Support via OSC

Output & Export

Subtitle formats (SRT / VTT) (when plugin enabled)
Plain text transcripts
Timestamp annotation
HTML files with integration for overlays and other use cases

Privacy & Local-First Design

Key Principles

All processing can run fully offline
No audio/text sent to external servers unless a plugin explicitly does so (clearly indicated)
User-controlled model and cache directories
Transparent logs; easy to audit behavior
Optional error reporting to help improve the application, with the user choosing which information to attach

FAQ & Notes

Is this list exhaustive?

No. New plugins, models and features appear frequently. For the freshest additions check the GitHub repositories and plugin index.

Does performance vary?

Yes. Model size, hardware (CPU/GPU), quantization, and concurrent tasks all influence throughput and latency. Profiles help tune these.

Where to report issues?

Open an issue on the main UI repository on GitHub, join the Discord server, or send a report using the in-app feedback tool.

Model names & capabilities are provided for orientation only. Refer to each model's own license & repository for authoritative details.