Spinning the Golden Thread: Benchmarking Long-Form Generation in Language Models (Paper 2409.02076, published 17 days ago)