218 lines
6.8 KiB
Markdown
218 lines
6.8 KiB
Markdown
# YouTube Auto Dub
|
|
|
|
YouTube Auto Dub is a Python pipeline that downloads a YouTube video, transcribes its speech with Whisper, translates the subtitle text through a local LM Studio server, and renders a subtitled output video.
|
|
|
|
## What Changed
|
|
|
|
- Translation now uses an OpenAI-compatible `/v1/chat/completions` endpoint.
|
|
- Google Translate scraping has been removed from the active runtime path.
|
|
- OpenAI compatible backend is now the default with no option for Google Translate.
|
|
- Translation settings can be configured with environment variables or CLI flags.
|
|
|
|
## Requirements
|
|
|
|
- Python 3.10+
|
|
- [uv](https://docs.astral.sh/uv/)
|
|
- FFmpeg and FFprobe available on `PATH`
|
|
- An OpenAI-compatible server
|
|
|
|
## Setup
|
|
|
|
Create a UV-managed virtual environment in a repo subfolder and install dependencies:
|
|
|
|
```powershell
|
|
uv venv --python "C:\pinokio\bin\miniconda\python.exe" .venv
|
|
uv pip install --python .venv\Scripts\python.exe -r requirements.txt
|
|
```
|
|
|
|
Verify the local toolchain:
|
|
|
|
```powershell
|
|
.venv\Scripts\python.exe --version
|
|
ffmpeg -version
|
|
ffprobe -version
|
|
.venv\Scripts\python.exe main.py --help
|
|
```
|
|
|
|
## LM Studio Configuration
|
|
|
|
Start LM Studio's local server and load a translation-capable model. The default model name in this repo is:
|
|
|
|
```text
|
|
gemma-3-4b-it
|
|
```
|
|
|
|
If your local LM Studio model name differs, set it with an environment variable or `--lmstudio-model`.
|
|
|
|
### Environment Variables
|
|
|
|
```powershell
|
|
$env:LM_STUDIO_BASE_URL="http://127.0.0.1:1234/v1"
|
|
$env:LM_STUDIO_API_KEY="lm-studio"
|
|
$env:LM_STUDIO_MODEL="gemma-3-4b-it"
|
|
```
|
|
|
|
Defaults if unset:
|
|
|
|
- `LM_STUDIO_BASE_URL=http://127.0.0.1:1234/v1`
|
|
- `LM_STUDIO_API_KEY=lm-studio`
|
|
- `LM_STUDIO_MODEL=gemma-3-4b-it`
|
|
|
|
## Usage
|
|
|
|
Basic example:
|
|
|
|
```powershell
|
|
.venv\Scripts\python.exe main.py "https://youtube.com/watch?v=VIDEO_ID" --lang es
|
|
```
|
|
|
|
### Gradio Web UI
|
|
|
|
Gradio provides a local browser UI for starting dub jobs, watching progress, and downloading finished videos:
|
|
|
|
```powershell
|
|
.venv\Scripts\python.exe web_app.py
|
|
```
|
|
|
|
Open `http://127.0.0.1:7860` and submit a YouTube URL. Jobs run through the same `main.py` pipeline, so the CLI options and environment variables still apply.
|
|
|
|
The OpenAI-compatible translation endpoint, API key, and model can be changed in the UI under **OpenAI-Compatible Settings**. Click **Save Settings** to persist them to `.cache/web_settings.json` for future web jobs. Unsaved values in the fields are still used for the next job you start.
|
|
|
|
You can also upload a local `.mp4` instead of entering a YouTube URL. Uploaded videos are staged under `.cache/uploads` and processed with the same transcription, translation, dubbing, and render pipeline. Restricted YouTube videos can use the **Upload Cookies File** control instead of typing a local cookies path.
|
|
|
|
The web UI automatically refreshes job status, progress, steps, and output choices every few seconds while it is open. The manual **Refresh** button is still available.
|
|
|
|
Translations and raw TTS clips are cached under `.cache/translations` and `.cache/tts`. This lets reruns skip work that already succeeded, which is especially useful after transient TTS failures. Set `TRANSLATION_CACHE_ENABLED=0` or `TTS_CACHE_ENABLED=0` to disable those caches.
|
|
|
|
### Docker
|
|
|
|
Build and run the Gradio UI in a container:
|
|
|
|
```powershell
|
|
docker build -t youtube-auto-dub:gradio .
|
|
docker run --rm -p 7860:7860 `
|
|
-e LM_STUDIO_BASE_URL=http://host.docker.internal:1234/v1 `
|
|
-e LM_STUDIO_API_KEY=lm-studio `
|
|
-e LM_STUDIO_MODEL=gemma-3-4b-it `
|
|
-v ${PWD}\.cache:/app/.cache `
|
|
-v ${PWD}\output:/app/output `
|
|
-v ${PWD}\logs:/app/logs `
|
|
-v ${PWD}\temp:/app/temp `
|
|
youtube-auto-dub:gradio
|
|
```
|
|
|
|
Or use Compose:
|
|
|
|
```powershell
|
|
docker compose up --build
|
|
```
|
|
|
|
When LM Studio runs on the host machine, use `http://host.docker.internal:1234/v1` from inside Docker instead of `http://127.0.0.1:1234/v1`.
|
|
|
|
Override the LM Studio endpoint or model from the CLI:
|
|
|
|
```powershell
|
|
.venv\Scripts\python.exe main.py "https://youtube.com/watch?v=VIDEO_ID" `
|
|
--lang fr `
|
|
--translation-backend lmstudio `
|
|
--lmstudio-base-url http://127.0.0.1:1234/v1 `
|
|
--lmstudio-model gemma-3-4b-it
|
|
```
|
|
|
|
Authentication options for restricted videos still work as before:
|
|
|
|
```powershell
|
|
.venv\Scripts\python.exe main.py "https://youtube.com/watch?v=VIDEO_ID" --lang ja --browser chrome
|
|
.venv\Scripts\python.exe main.py "https://youtube.com/watch?v=VIDEO_ID" --lang de --cookies cookies.txt
|
|
```
|
|
|
|
Process a local MP4:
|
|
|
|
```powershell
|
|
.venv\Scripts\python.exe main.py --input-file "C:\path\to\video.mp4" --lang es
|
|
```
|
|
|
|
## CLI Options
|
|
|
|
| Option | Description |
|
|
| --- | --- |
|
|
| `url` | YouTube video URL to process |
|
|
| `--input-file` | Local MP4 file to process instead of a YouTube URL |
|
|
| `--lang`, `-l` | Target language code |
|
|
| `--browser`, `-b` | Browser name for cookie extraction |
|
|
| `--cookies`, `-c` | Path to exported cookies file |
|
|
| `--gpu` | Prefer GPU acceleration when CUDA is available |
|
|
| `--whisper_model`, `-wm` | Override Whisper model |
|
|
| `--translation-backend` | Translation backend, currently `lmstudio` |
|
|
| `--lmstudio-base-url` | Override LM Studio base URL |
|
|
| `--lmstudio-model` | Override LM Studio model name |
|
|
|
|
## Translation Behavior
|
|
|
|
The LM Studio translator is tuned for subtitle-like text:
|
|
|
|
- preserves meaning, tone, and intent
|
|
- keeps punctuation natural
|
|
- returns translation text only
|
|
- preserves line and segment boundaries
|
|
- leaves names, brands, URLs, emails, code, and proper nouns unchanged unless transliteration is clearly needed
|
|
- avoids commentary, summarization, and censorship
|
|
|
|
Translation is currently performed segment-by-segment to keep subtitle ordering deterministic and reduce the risk of malformed batched output corrupting timing alignment.
|
|
|
|
## Testing
|
|
|
|
Run the focused validation suite:
|
|
|
|
```powershell
|
|
.venv\Scripts\python.exe -m pytest
|
|
.venv\Scripts\python.exe main.py --help
|
|
```
|
|
|
|
The tests cover:
|
|
|
|
- LM Studio request payload construction
|
|
- response parsing
|
|
- retry handling for transient HTTP failures
|
|
- empty or malformed response handling
|
|
- CLI and environment config precedence
|
|
|
|
## Troubleshooting
|
|
|
|
### LM Studio connection errors
|
|
|
|
- Make sure LM Studio's local server is running.
|
|
- Confirm the base URL ends in `/v1`.
|
|
- Check that the loaded model name matches `LM_STUDIO_MODEL` or `--lmstudio-model`.
|
|
|
|
### Empty or malformed translations
|
|
|
|
- Try a stronger local instruction-tuned model if your current model ignores formatting.
|
|
- Keep LM Studio in non-streaming OpenAI-compatible mode.
|
|
- Review the server logs for model-side failures.
|
|
|
|
### FFmpeg missing
|
|
|
|
If startup reports missing `ffmpeg` or `ffprobe`, install FFmpeg and add it to your system `PATH`.
|
|
|
|
## Project Layout
|
|
|
|
```text
|
|
youtube-auto-dub/
|
|
|-- main.py
|
|
|-- requirements.txt
|
|
|-- language_map.json
|
|
|-- README.md
|
|
|-- LM_STUDIO_MIGRATION.md
|
|
|-- src/
|
|
| |-- core_utils.py
|
|
| |-- engines.py
|
|
| |-- media.py
|
|
| |-- translation.py
|
|
| `-- youtube.py
|
|
`-- tests/
|
|
|-- conftest.py
|
|
|-- test_main_cli.py
|
|
`-- test_translation.py
|
|
```
|