LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset Paper • 2309.11998 • Published Sep 21, 2023 • 25
LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench Paper • 2409.13373 • Published Sep 20, 2024 • 3
A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise Paper • 2312.12436 • Published Dec 19, 2023 • 13
Llamas Know What GPTs Don't Show: Surrogate Models for Confidence Estimation Paper • 2311.08877 • Published Nov 15, 2023 • 6
Technical Report: Large Language Models can Strategically Deceive their Users when Put Under Pressure Paper • 2311.07590 • Published Nov 9, 2023 • 16