Task 46: Whisper STT for Episodes 01-04
完了 ← タスク一覧
Task 046: Whisper STT for Episodes 01-04
Status: DONE
Motivation
Task 036 built Whisper STT infrastructure and successfully transcribed EP05 (Niconico-only, no YouTube subtitles). Episodes 01-04 have YouTube auto-generated VTT subtitles, but these are known to be unreliable for VOICEROID/software-talk content.
Running Whisper on EP01-04 provides a second, independent transcription source with higher accuracy.
Results
Whisper Quality (small model, CPU)
| EP | Segments | Reliable | Avg LogProb | Whisper Lines | VTT Lines | Whisper Chars | VTT Chars |
|---|---|---|---|---|---|---|---|
| 1 | 440 | 430 (98%) | -0.144 | 419 | 87 | 5,258 | 4,305 |
| 2 | 426 | 415 (97%) | -0.142 | 408 | 80 | 5,056 | 3,874 |
| 3 | 448 | 432 (96%) | -0.145 | 420 | 88 | 4,821 | 4,141 |
| 4 | 162 | 159 (98%) | -0.215 | 159 | 85 | 5,255 | 4,209 |
Key findings:
- Whisper captures 15-30% more text content than YouTube VTT
- Quality is excellent across all episodes (avg_logprob > -0.25, >95% reliable)
- EP04 has fewer, longer segments (dramatic pacing with longer pauses)
- All Whisper output stored in raw_data/whisper/ (gitignored)
New Scripts
process-whisper: Post-process raw Whisper JSON into subtitle.json + quality.jsonextract-dialogue-whisper: Whisper-specific Phase 1 extraction (minimal merging since Whisper segments are already sentence-level)compare-transcriptions: Quality comparison table (Whisper vs VTT)
Whisper-Specific Extraction Strategy
Standard VTT extraction uses aggressive merging (no_terminal_punctuation + small_gap → merge) because VTT auto-generated subtitles split mid-sentence. Whisper segments are already semantically complete utterances, so we use minimal merging: only merge fragments (<3 chars) with zero-gap adjacency.
Files Modified
ts/src/extract-dialogue-from-whisper.ts— NEW: Whisper-specific Phase 1 extractionts/src/process-whisper-output.ts— NEW: Post-process raw Whisper JSONts/src/compare-transcriptions.ts— NEW: Quality comparison scriptts/package.json— 3 new npm scripts
Files Created (gitignored, in raw_data/)
raw_data/audio/ep01_CQ_OkDjEwRk.wavthroughep04_1cTmWjYSlTM.wavraw_data/whisper/{videoId}_whisper.json— Raw Whisper JSON (4 files)raw_data/whisper/{videoId}_subtitle.json— RawSubtitleFile (4 files)raw_data/whisper/{videoId}_quality.json— Quality reports (4 files)raw_data/whisper/ep{01-04}_lines.json— Phase 1 extracted dialogue (4 files)
Depends on
- Task 036 (Whisper STT infrastructure) — DONE
- Task 004 (subtitle pipeline) — DONE
- Task 009 (dialogue extraction) — DONE