Ksenia Se
Kseniase
7 Open-source Methods to Improve Video Generation and Understanding
The AI community is making great strides toward achieving the full potential of multimodality in video generation and understanding. Last week's studies showed that working with video is now one of the main focuses for improving AI models. Another highlight of the week is that open source, once again, proves its value. For those who were impressed by DeepSeek-R1, we’re with you!
Today, we’re combining these two key focuses and bringing you a list of open-source methods for better video generation and understanding:
1. VideoLLaMA 3 excels at a variety of video and image tasks thanks to its vision-centric training approach. https://huggingface.co/papers/2501.13106
2. The FilmAgent framework assigns roles to multiple AI agents, like a director, screenwriter, actor, and cinematographer, to automate the filmmaking process in 3D virtual environments. https://huggingface.co/papers/2501.12909
3. "Improving Video Generation with Human Feedback" proposes a new VideoReward model and an approach that uses human feedback to refine video generation models. https://huggingface.co/papers/2501.13918
4. DiffuEraser, a video inpainting model based on Stable Diffusion, is designed to fill in missing areas with detailed, realistic content and to keep structures consistent across frames. https://huggingface.co/papers/2501.10018
5. MAGI is a hybrid video generation model that combines masked and causal modeling. Its key innovation, Complete Teacher Forcing (CTF), conditions masked frames on fully visible frames. https://huggingface.co/papers/2501.12389
6. Go-with-the-Flow proposes motion control, allowing users to guide how objects or the camera move in generated videos. Its noise warping algorithm replaces random noise in videos with structured noise based on motion information. https://huggingface.co/papers/2501.08331
7. Video Depth Anything estimates depth consistently in super-long videos (several minutes or more) without sacrificing quality or speed. https://huggingface.co/papers/2501.12375
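
To make the Complete Teacher Forcing idea from item 5 a bit more concrete, here is a toy sketch of how training pairs could be laid out: earlier frames stay fully visible, and only the frame being predicted is partly masked. This is not MAGI's implementation. The token lists, the `MASK` placeholder, the 50% masking ratio, and the function names are all assumptions made for illustration.

```python
import random

MASK = None  # hypothetical placeholder for a hidden token


def mask_frame(frame, ratio, rng):
    """Return a copy of `frame` with roughly `ratio` of its tokens hidden."""
    n_hidden = max(1, round(len(frame) * ratio))
    hidden = set(rng.sample(range(len(frame)), n_hidden))
    return [MASK if i in hidden else tok for i, tok in enumerate(frame)]


def ctf_training_pairs(frames, ratio=0.5, seed=0):
    """Pair each frame after the first with a fully visible history.

    The context holds the earlier frames with nothing hidden; the target
    is the current frame with some tokens masked out.
    """
    rng = random.Random(seed)
    pairs = []
    for t in range(1, len(frames)):
        context = [list(f) for f in frames[:t]]      # earlier frames, unmasked
        target = mask_frame(frames[t], ratio, rng)   # current frame, partly masked
        pairs.append((context, target))
    return pairs


# A toy "video": three frames of four integer tokens each.
video = [[10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33]]
pairs = ctf_training_pairs(video)
```

In this sketch a model would be asked to fill in the masked tokens of each target frame while seeing clean copies of everything that came before it.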
10 Recent Advancements in Math Reasoning
Over the last few weeks, we have witnessed a surge in AI models' math reasoning capabilities. Top companies like Microsoft, NVIDIA, and Alibaba Qwen have already joined this race to make models "smarter" in mathematics. But why is this shift happening now?
Complex math calculations require advanced multi-step reasoning, making mathematics an ideal domain for demonstrating a model's strong "thinking" capabilities. Additionally, as AI continues to evolve and is applied in math-intensive fields such as machine learning and quantum computing (which is predicted to see significant growth in 2025), it must meet the demands of complex reasoning. Moreover, AI models can be integrated with external tools like symbolic solvers or computational engines to tackle large-scale math problems, which also needs high-quality math reasoning.
So here’s a list of 10 recent advancements in math reasoning of AI models:
1. NVIDIA: AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling (2412.15084)
2. Qwen, Alibaba: Qwen2.5-Math-PRM, The Lessons of Developing Process Reward Models in Mathematical Reasoning (2501.07301), and PROCESSBENCH evaluation, ProcessBench: Identifying Process Errors in Mathematical Reasoning (2412.06559)
3. Microsoft Research: rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking (2501.04519)
4. BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning (2501.03226)
5. URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics (2501.04686)
6. U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs (2412.03205)
7. Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs (2501.06430)
8. End-to-End Bangla AI for Solving Math Olympiad Problem Benchmark: Leveraging Large Language Model Using Integrated Approach (2501.04425)
9. Quantization Meets Reasoning: Exploring LLM Low-Bit Quantization Degradation for Mathematical Reasoning (2501.03035)
10. System-2 Mathematical Reasoning via Enriched Instruction Tuning (2412.16964)
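
The point about pairing models with external tools can be illustrated with a very small sketch: a model proposes an answer, and an exact-arithmetic checker decides whether it is actually correct. Python's standard `fractions` module stands in here for a real symbolic solver or computational engine. The equation, the candidate answers, and the function name are hypothetical and are not taken from any of the papers listed.

```python
from fractions import Fraction


def check_linear_solution(a, b, c, proposed_x):
    """Check exactly whether proposed_x solves a*x + b = c."""
    a, b, c, x = (Fraction(v) for v in (a, b, c, proposed_x))
    return a * x + b == c


# Hypothetical model outputs for the problem "solve 3x + 2 = 9".
candidates = ["7/3", "2.33"]
results = {ans: check_linear_solution(3, 2, 9, ans) for ans in candidates}
```

Here the exact answer `7/3` passes, while the rounded answer `2.33` is rejected, which is the kind of check a floating-point comparison with a loose tolerance could miss.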