SPUR — Spatial Audio Understanding
Upload stereo audio and ask spatial questions about direction, distance, timing, and room acoustics.
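Since the demo expects stereo input, it can help to check a file's channel count before uploading. The snippet below is a minimal sketch using only Python's standard `wave` module; the helper name `is_stereo_wav` is hypothetical, and it assumes the file is an uncompressed PCM WAV (other formats would need a different reader).

```python
import wave


def is_stereo_wav(path):
    """Return True if the WAV file at `path` has exactly two channels.

    Hypothetical helper for illustration; not part of the SPUR demo itself.
    """
    with wave.open(path, "rb") as wav:
        return wav.getnchannels() == 2
```

A mono file would return `False` here; whether the demo itself rejects or upmixes mono input is not stated on this page.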
Model: Qwen2-Audio-7B + LoRA rank-32 + Dual-Stream Spatial Encoder (SPUR V3)
Training: Instruction SFT (val=0.199, 28% better than V2) + Temporal SFT | 50K synthetic clips, 686K QA pairs | LoRA active (576 tensors)
The first request takes ~60s while the model loads onto the GPU. Subsequent requests take ~5s.
Trained on synthetic binaural clips generated with pyroomacoustics. Results on real recordings may vary.
Examples