SPUR — Spatial Audio Understanding

Upload stereo audio and ask spatial questions about direction, distance, timing, and room acoustics.
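Since the input is expected to be stereo, it can help to check the channel count before uploading. The following is a minimal sketch using only Python's standard library; the helper names (channel_count, make_silent_wav) are hypothetical and not part of SPUR, and it assumes an uncompressed PCM WAV file.

```python
import io
import wave


def channel_count(wav_bytes: bytes) -> int:
    """Return the number of channels in a PCM WAV file given as bytes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnchannels()


def make_silent_wav(channels: int, seconds: float = 0.1, rate: int = 16000) -> bytes:
    """Build a short silent 16-bit PCM WAV in memory (demonstration only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * int(seconds * rate))
    return buf.getvalue()


# A clip is suitable for spatial questions only if it has two channels.
clip = make_silent_wav(channels=2)
is_stereo = channel_count(clip) == 2
```

To check a file on disk, read its bytes and pass them to channel_count.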

Model: Qwen2-Audio-7B + LoRA rank-32 + Dual-Stream Spatial Encoder (SPUR V3)

Training: Instruction SFT (val=0.199), then Temporal SFT at step 600 (loss=0.178, val=0.182). Data: 50K synthetic clips, 686K QA pairs. LoRA active (576 tensors).

The first request takes ~60s while the model loads onto the GPU. Subsequent requests take ~5s.
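This slow-first, fast-afterwards pattern is typical of lazy model loading. The app's actual loading code is not shown here; the sketch below only illustrates the general idea, with a placeholder loader (get_model) and a short sleep standing in for the real load.

```python
import functools
import time


@functools.lru_cache(maxsize=1)
def get_model() -> dict:
    """Load the model once; later calls return the cached object."""
    time.sleep(0.05)  # stand-in for the slow load onto the GPU
    return {"name": "placeholder-model"}  # hypothetical model handle


def answer(question: str) -> str:
    """Handle one request; only the first call pays the loading cost."""
    model = get_model()
    return f"[{model['name']}] {question}"


first = answer("Where is the sound coming from?")   # triggers the load
second = answer("How far away is the source?")      # reuses the cached model
```

Caching a single instance keeps one copy of the model in memory for the lifetime of the process.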

Trained on synthetic binaural clips generated with pyroomacoustics. Results on real recordings may vary.