Sudoku
Well done on the impressive work!
I was very impressed with Dream7B's incredible performance on Sudoku. In the blog, you mention, "The intuition behind is that diffusion language models are more effective for solving problems with multiple constraints or for achieving specific objectives." I realise authors don't like to speculate too much in their papers, but I'm curious about your best guess at the reason for the disparity in Sudoku performance between diffusion and autoregressive models. In addition, have you identified any other type of task where diffusion dominates as much as it does in Sudoku?
Thanks
Hi, we elaborate on this intuition in our previous paper, "Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning". Most of the planning tasks discussed can be abstracted into a simple path-finding task introduced in the paper.