SPUR — Spatial Audio Understanding

Upload stereo audio and ask spatial questions about direction, distance, timing, and room acoustics.
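Because the spatial questions depend on two channels, it can help to confirm a file is actually stereo before uploading. The sketch below uses only Python's standard `wave` module and assumes an uncompressed PCM WAV file. The helper names `describe_wav` and `is_stereo` are hypothetical and not part of SPUR; how the demo itself validates input is not stated here.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_s) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        frames = wav.getnframes()
    return channels, rate, frames / rate


def is_stereo(path):
    """True when the file has exactly two channels (left and right)."""
    return describe_wav(path)[0] == 2
```

A mono file would return `False` from `is_stereo`; such a clip carries no inter-channel cues, so converting it to two identical channels would not add spatial information.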

Model: Qwen2-Audio-7B + LoRA rank-32 + Dual-Stream Spatial Encoder (SPUR V3)
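To give a sense of what "LoRA rank-32" means in parameter terms, here is a small arithmetic sketch. A LoRA adapter on a weight matrix of shape d_out x d_in adds two low-rank factors, A (rank x d_in) and B (d_out x rank). The 4096 x 4096 layer shape below is a hypothetical illustration, not a confirmed dimension of this model, and `lora_param_count` is a made-up helper name.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable parameters added by one LoRA adapter.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


RANK = 32            # from the model description
D_IN = D_OUT = 4096  # hypothetical layer shape, for illustration only

added = lora_param_count(D_IN, D_OUT, RANK)
full = D_IN * D_OUT
print(f"adapter params: {added}, full matrix: {full}, ratio: {added / full:.4%}")
```

Under these assumed dimensions, one adapter trains roughly 1.6% as many parameters as the full matrix it modifies.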

Training: Instruction SFT (val=0.199, 28% better than V2) + Temporal SFT | 50K synthetic clips, 686K QA pairs | LoRA active (576 tensors)

The first request takes ~60 s while the model loads onto the GPU. Subsequent requests take ~5 s.

Trained on synthetic binaural clips generated with pyroomacoustics. Performance on real recordings may vary.

Examples