AI & ML interests

None defined yet.

Recent Activity

blog-explorers's activity

AdinaY 
posted an update 1 day ago
Wan2.1 🔥📹 new OPEN video model by Alibaba Wan team!

Model: Wan-AI/Wan2.1-T2V-14B
Demo: Wan-AI/Wan2.1

✨Apache 2.0
✨8.19GB VRAM, runs on most GPUs
✨Multi-Tasking: T2V, I2V, Video Editing, T2I, V2A
✨Visual text generation in video: supports Chinese & English
✨Powerful Video VAE: Encode/decode 1080P w/ temporal precision
AdinaY 
posted an update 2 days ago
Try QwQ-Max-Preview, Qwen's reasoning model here👉 https://chat.qwen.ai
Can't wait for the model weights to drop on the Hugging Face Hub 🔥
AdinaY 
posted an update 2 days ago
Two AI startups, DeepSeek & Moonshot AI, keep moving in perfect sync 👇

✨ Last December: DeepSeek & Moonshot AI released their reasoning models on the SAME DAY.
DeepSeek: deepseek-ai/DeepSeek-R1
Moonshot: https://github.com/MoonshotAI/Kimi-k1.5

✨ Last week: Both teams published papers on modifying attention mechanisms on the SAME DAY AGAIN.
DeepSeek: Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (2502.11089)
Moonshot: MoBA: Mixture of Block Attention for Long-Context LLMs (2502.13189)
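
As a rough sketch of the block-attention idea both papers explore (everything below is illustrative, not code from either paper): each query scores contiguous key blocks by a pooled summary and runs softmax attention only inside its top-k blocks, skipping the rest of the sequence.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def moba_attend(query, keys, values, block_size=2, top_k=1):
    # Partition keys into contiguous blocks and score each block by the
    # query's dot product with the block's mean-pooled key.
    n_blocks = len(keys) // block_size
    blocks = [list(range(b * block_size, (b + 1) * block_size))
              for b in range(n_blocks)]

    def block_score(idx):
        mean = [sum(keys[i][d] for i in idx) / len(idx)
                for d in range(len(query))]
        return dot(query, mean)

    # Keep only the top-k highest-scoring blocks, then run ordinary
    # scaled softmax attention over the keys inside those blocks.
    chosen = sorted(blocks, key=block_score)[-top_k:]
    idx = [i for block in chosen for i in block]
    logits = [dot(query, keys[i]) / math.sqrt(len(query)) for i in idx]
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * values[i][d] for w, i in zip(weights, idx))
            for d in range(len(values[0]))]
```

The point of both papers is that this kind of block-level selection makes attention cost scale with the number of selected blocks rather than the full sequence length.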

✨ TODAY:
DeepSeek unveiled FlashMLA: an efficient MLA decoding kernel for NVIDIA Hopper GPUs, optimized for variable-length sequences.
https://github.com/deepseek-ai/FlashMLA

Moonshot AI introduced Moonlight: a 16B MoE (3B active parameters) trained on 5.7T tokens using Muon, pushing the Pareto frontier with fewer FLOPs.
moonshotai/Moonlight-16B-A3B

What's next? 👀
KnutJaegersberg 
posted an update 4 days ago
AdinaY 
posted an update 6 days ago
KnutJaegersberg 
posted an update 6 days ago
Mimicking Consciousness in LLMs: Ascending the Dimensions of Thought with Recurrent Processing

This blog post explores how **recurrent processing** can transform Large Language Models (LLMs) to mimic aspects of human thought by engaging in iterative feedback loops. Inspired by string theory, the post describes how LLMs can "ascend dimensions" of cognition, progressing through foundational cognitive loops—such as basic cognition, executive functions, and meta-cognition—before advancing into **world simulation**. In this stage, LLMs explore higher dimensions, perceiving non-linear time, simulating branching possibilities, and integrating multiple realities. The interaction between the **Generator** and **Reflective Compass** allows AI systems to refine their outputs iteratively, moving toward a **point attractor** where ideas become coherent and polished. While this process doesn't bestow true consciousness, it offers a compelling imitation of reflective and adaptive thinking, leading to smarter dialogue, enhanced creativity, and more robust problem-solving.

https://huggingface.co/blog/KnutJaegersberg/oscillatory-recurrence-for-llms
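
The Generator / Reflective Compass loop the post describes can be caricatured as a propose-critique-accept cycle that stops once improvements die out, i.e. it settles into the "point attractor". `generate` and `critique` below are stand-ins, not anything from the blog post:

```python
def refine(draft, generate, critique, max_iters=50, tol=1e-6):
    # `generate` proposes a revision of the current draft (the Generator);
    # `critique` scores a draft, higher is better (the Reflective Compass).
    # Keep revisions that improve the score, and stop once gains fall
    # below `tol` -- the loop has settled into a fixed point.
    score = critique(draft)
    for _ in range(max_iters):
        candidate = generate(draft)
        new_score = critique(candidate)
        if new_score <= score + tol:
            break
        draft, score = candidate, new_score
    return draft, score
```

With a numeric toy (nudge a guess halfway toward a target, score by negative distance) the loop converges to the target, which is the "polished output" of the metaphor.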
AdinaY 
posted an update 8 days ago
🚀 StepFun (阶跃星辰) is making BIG open moves!

Last year, their GOT-OCR 2.0 took the community by storm 🔥 but many didn’t know they were also building some amazing models. Now, they’ve just dropped something huge on the Hub!

📺 Step-Video-T2V: a 30B bilingual open video model that generates 204 frames (8-10s) at 540P resolution with high information density & consistency.
stepfun-ai/stepvideo-t2v

🔊 Step-Audio-TTS-3B: a TTS model trained with the LLM-Chat paradigm on a large synthetic dataset, capable of generating RAP & humming
stepfun-ai/step-audio-67b33accf45735bb21131b0b
AdinaY 
posted an update 8 days ago
ZennyKenny 
posted an update 8 days ago
Really excited to start contributing to the SWE Arena project: https://swe-arena.com/

The project, led by IBM PhD fellow @terryyz, aims to advance research in code generation and app development by frontier LLMs.

ZennyKenny 
posted an update 10 days ago
Okay, this is pretty crazy. Snowflake has Cortex AI and Uber is already teasing QueryGPT, both of which prominently feature plain-text-to-SQL querying of your database.

I decided to see how hard it would be to put together something similar using 🤗 smolagents. Turns out, it was pretty straightforward. I managed to get it done in London Luton airport this afternoon.

ZennyKenny/sqlAgent
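
For flavor, here is a minimal read-only SQL tool of the kind such an agent would wrap. This is a sketch, not code from ZennyKenny/sqlAgent, and the LLM step that turns plain text into SQL is deliberately omitted:

```python
import sqlite3

def run_sql(conn, sql):
    # Execute a single SELECT against an open sqlite3 connection and
    # return all rows.  Refusing anything but SELECT is a cheap guard
    # against an agent emitting destructive statements.
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    return conn.execute(sql).fetchall()
```

In a smolagents setup, a function like this would be registered as a tool so the agent's generated SQL is executed through a narrow, auditable interface rather than arbitrary code.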
nroggendorff 
posted an update 11 days ago
hello, dev mode explorers!
mrzjy 
posted an update 12 days ago
A very small project:

Introducing CreativeTinyZero:
mrzjy/Qwen2.5-1.5B-GRPO-Creative-Ad-Generation

Unlike the impressive DeepSeek-R1(-Zero), this project focuses on a pure reinforcement learning (RL) experiment applied to an open-domain task: creative advertisement generation.

Objective:

- To investigate the feasibility of applying R1-like methods to an open-domain task without a verifiable ground-truth reward, while at least demonstrating its potential.
- To explore whether <think> and <answer> rewards can be explicitly designed to provide strong guidance through RL based on human prior knowledge.

Note:
- Our goal is not to induce self-reflective thinking, but to align with human thought processes purely through RL, without any supervised fine-tuning (SFT) on any constructed dataset.

Despite its small size, the resulting 1.5B-GRPO model demonstrates intriguing generative capabilities—though it's still far from perfect.
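
One way to picture the explicitly designed `<think>`/`<answer>` rewards: a rule-based scorer that grants partial credit for a well-formed trace, since an open-domain task has no verifiable ground truth. The weights below are illustrative, not the project's actual reward function:

```python
import re

def format_reward(completion):
    # Grant 0.5 for a non-empty <think>...</think> span and 0.5 for a
    # non-empty <answer>...</answer> span, so RL pressure pushes the
    # model toward the desired trace structure even without a
    # verifiable task reward.
    reward = 0.0
    think = re.search(r"<think>(.*?)</think>", completion, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if think and think.group(1).strip():
        reward += 0.5
    if answer and answer.group(1).strip():
        reward += 0.5
    return reward
```

In GRPO, scores like this are computed per sampled completion and normalized within the group, so even coarse rule-based signals can shape behavior.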
AdinaY 
posted an update 13 days ago
Ovis2 🔥 a multimodal LLM released by the Alibaba AIDC team.
AIDC-AI/ovis2-67ab36c7e497429034874464
✨1B/2B/4B/8B/16B/34B
✨Strong CoT for deeper problem solving
✨Multilingual OCR – Expanded beyond English & Chinese, with better data extraction
AdinaY 
posted an update 13 days ago
InspireMusic 🎵🔥 an open music generation framework by Alibaba FunAudio Lab
Model: FunAudioLLM/InspireMusic-1.5B-Long
Demo: FunAudioLLM/InspireMusic
✨ Music, songs, audio - ALL IN ONE
✨ High quality audio: 24kHz & 48kHz sampling rates
✨ Long-Form Generation: enables extended audio creation
✨ Efficient Fine-Tuning: multiple precisions (BF16, FP16, FP32) with user-friendly scripts
ZennyKenny 
posted an update 15 days ago
I've completed the first unit of the just-launched Hugging Face Agents Course. I would highly recommend it, even for experienced builders, because it is a great walkthrough of the smolagents library and toolkit.
nroggendorff 
posted an update 16 days ago
Dearest None-yet Team,

I couldn't help but notice that our productivity has room for improvement. To address this, we will be engaging in a company-wide morale-building activity designed to boost teamwork, enthusiasm, and *most importantly* results.

I know you're all as excited as I am for this fun and absolutely required initiative. Participation is not just encouraged, it's mandatory. Think of it as a team-bonding experience you never signed up for but will absolutely tolerate.

More details to follow, but for now, mark your calendars and prepare for an engaging experience that will definitely make us all better, stronger, and more synchronized, or at least give us something to talk about later.

Looking forward to seeing you all there!

Best,
Me
KnutJaegersberg 
posted an update 18 days ago
A Brief Survey of Associations Between Meta-Learning and General AI

The paper titled "A Brief Survey of Associations Between Meta-Learning and General AI" explores how meta-learning techniques can contribute to the development of Artificial General Intelligence (AGI). Here are the key points summarized:

1. General AI (AGI) and Meta-Learning:
- AGI aims to develop algorithms that can handle a wide variety of tasks, similar to human intelligence. Current AI systems excel at specific tasks but struggle with generalization to unseen tasks.
- Meta-learning or "learning to learn" improves model adaptation and generalization, allowing AI systems to tackle new tasks efficiently using prior experiences.

2. Neural Network Design in Meta-Learning:
- Techniques like Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks enable self-improvement and adaptability for deep models, supporting generalization across tasks.
- Highway networks and ResNet-style models use shortcuts for efficient backpropagation, allowing deeper models that can be used in meta-learning frameworks.

3. Coevolution:
- Coevolution involves the mutual evolution of multiple components, such as learners or task-solvers, to improve overall performance.
- Coevolution between learners enhances collaboration and competition within AI systems, while coevolution between tasks and solvers (e.g., POWERPLAY and AI-GA frameworks) pushes solvers to adapt to increasingly complex tasks.

4. Curiosity in Meta-Learning:
- Curiosity-based exploration encourages AI systems to discover new, diverse features of the environment, avoiding local optima.
- Curiosity-based objectives can be combined with performance-based objectives to ensure efficient exploration and adaptation in complex tasks.

5. Forgetting Mechanisms:
- Forgetting is crucial to avoid memory overload in AI systems.

https://arxiv.org/abs/2101.04283
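
The shortcut connections in point 2 reduce to one-liners; both forms below are generic textbook sketches, not code from the survey:

```python
def residual_block(x, transform):
    # ResNet-style shortcut: output = x + T(x).  The identity path lets
    # gradients flow unchanged through very deep stacks.
    return [xi + ti for xi, ti in zip(x, transform(x))]

def highway_block(x, transform, gate):
    # Highway-style shortcut: a learned gate g in [0, 1] interpolates
    # between the transformed signal and the carried-through input:
    # output = g * T(x) + (1 - g) * x.
    return [g * t + (1 - g) * xi
            for xi, t, g in zip(x, transform(x), gate(x))]
```

With the gate closed (g = 0) the highway block is a pure carry, which is exactly what makes these architectures trainable at the depths meta-learning frameworks need.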
eienmojiki 
posted an update 19 days ago
KnutJaegersberg 
posted an update 19 days ago
Artificial general intelligence through recursive data compression and grounded reasoning: a position paper


This paper proposes a system to achieve AGI through general data compression and grounded reasoning.

General Data Compression involves creating a flexible algorithm that adapts to input data to simplify and compress it recursively, identifying simple, orthogonal features to avoid redundancy. The algorithm measures AGI progress by solving problems based on increasing complexity, and it expands its search space according to the data itself. Compression is applied not only to data but also to model parameters, and sequences are segmented based on compressibility.

Grounded Reasoning refers to forming representations with various granularities, crucial for commonsense reasoning and AGI. The system simulates the real world as its model, switching between representations and maximizing resourcefulness. Key ideas include the world as its own model for reasoning and actions aimed at maximizing entropy to test hypotheses.

The paper emphasizes simplicity, data-dependent bias, recursion, orthogonality, resourcefulness, and grounding in real-world contexts as fundamental principles in building an AGI system.

https://arxiv.org/abs/1506.04366
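
A toy reading of "sequences are segmented based on compressibility": slide a window over the data and label each chunk by its compression ratio. The window size and threshold are arbitrary choices here, and `zlib` merely stands in for the paper's adaptive compressor:

```python
import zlib

def compress_ratio(data: bytes) -> float:
    # Compressed size over raw size; lower means more regular/simpler.
    return len(zlib.compress(data)) / max(len(data), 1)

def segment_by_compressibility(data: bytes, window=64, threshold=0.5):
    # Label each fixed-size chunk "simple" or "complex" by whether its
    # compression ratio falls below the threshold, yielding
    # (label, start, end) segments.
    return [
        ("simple" if compress_ratio(data[i:i + window]) < threshold
         else "complex",
         i, min(i + window, len(data)))
        for i in range(0, len(data), window)
    ]
```

Highly repetitive spans compress well and get flagged as simple, while high-entropy spans do not, which is the kind of signal the paper proposes for ordering problems by complexity.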