Ksenia Se

Kseniase

AI & ML interests

None yet

Recent Activity

replied to their post 4 days ago
9 Multimodal Chain-of-Thought methods

How can Chain-of-Thought (CoT) prompting unlock models' full potential across images, video, audio and more? The answer lies in techniques designed specifically for multimodal CoT. Here are 9 methods of Multimodal Chain-of-Thought (MCoT). Most of them are open-source:

1. KAM-CoT -> https://huggingface.co/papers/2401.12863
This lightweight framework combines CoT prompting with knowledge graphs (KGs) and achieves an average accuracy of 93.87% on the ScienceQA dataset.

2. Multimodal Visualization-of-Thought (MVoT) -> https://huggingface.co/papers/2501.07542
Lets models generate visual reasoning traces, using a token discrepancy loss to improve visual quality.

3. Compositional CoT (CCoT) -> https://huggingface.co/papers/2311.17076
Uses scene graph (SG) representations generated by the LMM itself to improve performance on compositional and general multimodal benchmarks.

4. URSA -> https://huggingface.co/papers/2501.04686
Brings System 2-style thinking to multimodal math reasoning, using a 3-module CoT data synthesis process: CoT distillation, trajectory-format rewriting and format unification.

5. MM-Verify -> https://huggingface.co/papers/2502.13383
Introduces a verification mechanism built from two components, MM-Verifier and MM-Reasoner, which rely on synthesized high-quality CoT data for multimodal reasoning.

6. Duty-Distinct CoT (DDCoT) -> https://huggingface.co/papers/2310.16436
Divides the reasoning responsibilities between LMs and visual models, integrating visual recognition capabilities into the joint reasoning process.

7. Multimodal-CoT from Amazon Web Services -> https://huggingface.co/papers/2302.00923
A two-stage framework that separates rationale generation from answer prediction, allowing the model to reason more effectively over multimodal inputs.

8. Graph-of-Thought (GoT) -> https://huggingface.co/papers/2305.16582
This two-stage framework models reasoning as a graph of interconnected ideas, improving performance on text-only and multimodal tasks.

More in the comments👇
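To make the two-stage idea behind Multimodal-CoT (method 7) more concrete, here is a minimal Python sketch. It only illustrates the control flow of "generate a rationale first, then predict the answer from the question plus that rationale". Everything in it is an assumption made for illustration: the `Example` type, the `two_stage_cot` function and the toy models are hypothetical stand-ins, not the paper's code or any library's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Example:
    question: str
    image_features: List[float]  # hypothetical stand-in for real vision features


# Hypothetical model interface: (text, image_features) -> generated text.
Model = Callable[[str, List[float]], str]


def two_stage_cot(example: Example, rationale_model: Model, answer_model: Model) -> Tuple[str, str]:
    """Run rationale generation, then answer prediction conditioned on the rationale."""
    # Stage 1: generate a rationale from the question and the visual input.
    rationale = rationale_model(example.question, example.image_features)

    # Stage 2: append the rationale to the question and predict the answer,
    # again with access to the visual input.
    augmented_text = f"{example.question}\nRationale: {rationale}"
    answer = answer_model(augmented_text, example.image_features)
    return rationale, answer


# Toy stand-ins so the sketch runs end to end (not real models).
def toy_rationale_model(text: str, feats: List[float]) -> str:
    if feats[0] > feats[1]:
        return "The image shows more red than blue."
    return "The image shows more blue than red."


def toy_answer_model(text: str, feats: List[float]) -> str:
    return "red" if "more red" in text else "blue"


rationale, answer = two_stage_cot(
    Example("Which color dominates?", [0.9, 0.1]),
    toy_rationale_model,
    toy_answer_model,
)
print(rationale)
print(answer)
```

In a real system both stages would be learned models that fuse text and vision features. The point of the sketch is only that the answer stage sees the generated rationale as part of its input.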

Organizations

Turing Post, Journalists on Hugging Face, Social Post Explorers, Hugging Face Discord Community, Sandbox

Kseniase's activity