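⚠ Spoiler warning: This site analyzes the SF anime "SOLAR LINE" in detail. Please take care if you have not watched it yet.
📝 AI-generated content: Most of this analysis was generated by AI (Claude Code and similar tools). Please check the original work and the cited sources for accuracy.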

Task 56: Speaker Diarization Investigation



Status: DONE

Motivation

Human directive: Using speaker diarization techniques might improve accuracy.

Scope

  1. Research speaker diarization tools (pyannote-audio, NeMo, etc.)
  2. Test on EP01 audio to evaluate VOICEROID voice separation quality
  3. If viable, integrate into dialogue pipeline to aid Phase 2 attribution
  4. Track model conditions in subtitle metadata

Findings

Tools Evaluated

Results on EP01 (19.3 min, 151 dialogue entries, 6 speakers)

| Method | きりたん-ケイ Accuracy | Notes |
|---|---|---|
| Resemblyzer nearest-centroid | 80.3% | Centroid cosine similarity: 0.983 (near-identical) |
| F0 threshold | 56.2% | F0 difference only 28.8 Hz with ~100 Hz std |
| Random Forest (multi-feature) | 67.9% ± 3.2% | 5-fold CV, F0 + energy + voiced ratio features |
| Resemblyzer spectral clustering (6 speakers) | 65.3% | Most clusters mapped to kestrel-ai |
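The nearest-centroid row can be illustrated with a minimal sketch. It assumes each utterance has already been turned into a fixed-length speaker embedding (for example by Resemblyzer), and that a few labelled utterances per speaker are available. The three-dimensional vectors and the names `speaker_a` / `speaker_b` are hypothetical toy data, not values from the EP01 experiment.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def nearest_centroid(embedding, centroids):
    """Return the speaker whose centroid is most similar to the embedding."""
    return max(
        centroids,
        key=lambda name: cosine_similarity(embedding, centroids[name]),
    )


# Hypothetical toy embeddings standing in for real utterance embeddings.
labelled = {
    "speaker_a": [[0.90, 0.10, 0.42], [0.88, 0.14, 0.45]],
    "speaker_b": [[0.86, 0.20, 0.47], [0.84, 0.24, 0.49]],
}
centroids = {name: centroid(vecs) for name, vecs in labelled.items()}

# How close the two speaker centroids are to each other.
similarity = cosine_similarity(centroids["speaker_a"], centroids["speaker_b"])

# Classify one unlabelled utterance embedding.
predicted = nearest_centroid([0.89, 0.12, 0.43], centroids)
```

When the two centroids are nearly parallel, as the 0.983 similarity in the table indicates, small per-utterance variation is enough to flip the nearest centroid, which is one plausible reading of why this method tops out near 80%.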

Cross-Episode Validation (EP05)

Key Insight

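VOICEROID synthetic voices are too acoustically similar for general-purpose speaker diarization. In the EP01 results, the きりたん and ケイ embedding centroids are near-identical (cosine similarity 0.983), and their F0 gap is small relative to its spread.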
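As a rough sanity check on the F0 numbers reported above, the following sketch estimates the best accuracy any single F0 threshold could reach. It assumes, purely for illustration, that both speakers' F0 values are Gaussian with equal standard deviation and that both speakers are equally frequent; neither assumption comes from the experiment. The function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def best_threshold_accuracy(mean_gap_hz, std_hz):
    """Accuracy of the midpoint threshold between two classes.

    Assumes two equal-variance Gaussians with equal priors, in which
    case the optimal threshold sits halfway between the two means.
    """
    d_prime = mean_gap_hz / std_hz  # class separation in std units
    return normal_cdf(d_prime / 2.0)


# Reported values: 28.8 Hz mean F0 difference, ~100 Hz standard deviation.
accuracy = best_threshold_accuracy(28.8, 100.0)
print(f"Estimated ceiling for an F0 threshold: {accuracy:.1%}")
```

Under these assumptions the ceiling comes out at roughly 56%, in line with the measured 56.2% for the F0 threshold. This suggests the poor result reflects how little F0 separates the two voices rather than a badly chosen threshold.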

Conclusion

Speaker diarization is NOT viable as a primary tool for VOICEROID content with current general-purpose models. Even the best result, 80.3% on the きりたん-ケイ pair, leaves roughly one line in five mislabeled, which introduces more noise than signal compared to context-based Phase 2 attribution.

Future opportunities:

Artifacts

Notes