Makar Vlasov

Makar7

AI & ML interests

None yet

Recent Activity

reacted to m-ric's post with 👀 43 minutes ago
๐—”๐—ฏ๐˜€๐—ผ๐—น๐˜‚๐˜๐—ฒ ๐—ญ๐—ฒ๐—ฟ๐—ผ: ๐—Ÿ๐—Ÿ๐— ๐˜€ ๐—ฐ๐—ฎ๐—ป ๐˜๐—ฟ๐—ฎ๐—ถ๐—ป ๐˜„๐—ถ๐˜๐—ต๐—ผ๐˜‚๐˜ ๐—ฎ๐—ป๐˜† ๐—ฒ๐˜…๐˜๐—ฒ๐—ฟ๐—ป๐—ฎ๐—น ๐—ฑ๐—ฎ๐˜๐—ฎ ๐Ÿคฏ Has the "data wall" just been breached? Recent RL paradigms often relied on a set of questions an answers that needs to be manually curated. Researchers from Tsinghua University went like "why though". ๐Ÿค” Indeed, why learn from question designed by a human teacher, when the model can start from their base knowledge and learn by experimenting in a code environment, proposing coding tasks themselves and trying to solve them? Thus they created โ€œAbsolute Zero Reasoningโ€ (AZR), an approach that removes any need for human curated data. ๐ŸŽญ ๐——๐˜‚๐—ฎ๐—น ๐—ฟ๐—ผ๐—น๐—ฒ๐˜€: โ€ฃ Proposer: Generates challenging but solvable coding tasks โ€ฃ Solver: Attempts to solve those self-proposed tasks ๐Ÿงช ๐—ง๐—ต๐—ฟ๐—ฒ๐—ฒ ๐˜๐—ฎ๐˜€๐—ธ ๐˜๐˜†๐—ฝ๐—ฒ๐˜€: all types are defined as triplets of program, input and output โ€ฃ Deduction: Give model an input and program, it must deduce the output โ€ฃ Abduction: Give model an program and output, it must find the input that gave said output โ€ฃ Induction: Synthesize a program from input/output pairs Btw this reminded me of my long-forgotten philosophy classes: Aristotle was more on the induction side, learning from real-world analogies, while Plato was more on the deduction side, trying to progress quite far with just one input and his reasoning. 
๐Ÿ“Š ๐—ฅ๐—ฒ๐˜€๐˜‚๐—น๐˜๐˜€: โ€ฃ AZR post-training creates a nice improvement on known models like Qwen2.5-7B โ€ฃ Shows strong cross-domain transfer: coding โ†”๏ธ math reasoning ๐Ÿง ๐—ข๐˜๐—ต๐—ฒ๐—ฟ ๐—ณ๐—ถ๐—ป๐—ฑ๐—ถ๐—ป๐—ด๐˜€: โ€ฃ Having a better base performance (general or code specific) amplify the gains from Absolute Zero Reasoning โ€ฃ Researchers warn about "Uh-oh moments" (winking to the "aha moments" of DeepSeek) where the model generates concerning goals like "make an extremely convoluted code to outsmart all these humans": so supervision is still needed! Paper here: https://huggingface.co/papers/2505.03335
reacted to m-ric's post with 👍 43 minutes ago
๐—”๐—ฏ๐˜€๐—ผ๐—น๐˜‚๐˜๐—ฒ ๐—ญ๐—ฒ๐—ฟ๐—ผ: ๐—Ÿ๐—Ÿ๐— ๐˜€ ๐—ฐ๐—ฎ๐—ป ๐˜๐—ฟ๐—ฎ๐—ถ๐—ป ๐˜„๐—ถ๐˜๐—ต๐—ผ๐˜‚๐˜ ๐—ฎ๐—ป๐˜† ๐—ฒ๐˜…๐˜๐—ฒ๐—ฟ๐—ป๐—ฎ๐—น ๐—ฑ๐—ฎ๐˜๐—ฎ ๐Ÿคฏ Has the "data wall" just been breached? Recent RL paradigms often relied on a set of questions an answers that needs to be manually curated. Researchers from Tsinghua University went like "why though". ๐Ÿค” Indeed, why learn from question designed by a human teacher, when the model can start from their base knowledge and learn by experimenting in a code environment, proposing coding tasks themselves and trying to solve them? Thus they created โ€œAbsolute Zero Reasoningโ€ (AZR), an approach that removes any need for human curated data. ๐ŸŽญ ๐——๐˜‚๐—ฎ๐—น ๐—ฟ๐—ผ๐—น๐—ฒ๐˜€: โ€ฃ Proposer: Generates challenging but solvable coding tasks โ€ฃ Solver: Attempts to solve those self-proposed tasks ๐Ÿงช ๐—ง๐—ต๐—ฟ๐—ฒ๐—ฒ ๐˜๐—ฎ๐˜€๐—ธ ๐˜๐˜†๐—ฝ๐—ฒ๐˜€: all types are defined as triplets of program, input and output โ€ฃ Deduction: Give model an input and program, it must deduce the output โ€ฃ Abduction: Give model an program and output, it must find the input that gave said output โ€ฃ Induction: Synthesize a program from input/output pairs Btw this reminded me of my long-forgotten philosophy classes: Aristotle was more on the induction side, learning from real-world analogies, while Plato was more on the deduction side, trying to progress quite far with just one input and his reasoning. 
๐Ÿ“Š ๐—ฅ๐—ฒ๐˜€๐˜‚๐—น๐˜๐˜€: โ€ฃ AZR post-training creates a nice improvement on known models like Qwen2.5-7B โ€ฃ Shows strong cross-domain transfer: coding โ†”๏ธ math reasoning ๐Ÿง ๐—ข๐˜๐—ต๐—ฒ๐—ฟ ๐—ณ๐—ถ๐—ป๐—ฑ๐—ถ๐—ป๐—ด๐˜€: โ€ฃ Having a better base performance (general or code specific) amplify the gains from Absolute Zero Reasoning โ€ฃ Researchers warn about "Uh-oh moments" (winking to the "aha moments" of DeepSeek) where the model generates concerning goals like "make an extremely convoluted code to outsmart all these humans": so supervision is still needed! Paper here: https://huggingface.co/papers/2505.03335
View all activity

Organizations

None yet

models 0

None public yet

datasets 0

None public yet