Ksenia Se

Kseniase

AI & ML interests

None yet

Recent Activity

replied to their post 4 days ago
posted an update 4 days ago

Organizations

Turing Post, Journalists on Hugging Face, Social Post Explorers, Hugging Face Discord Community, Sandbox

Kseniase's activity

replied to their post 4 days ago

These are graph-centric types of RAG:

  1. NodeRAG -> https://huggingface.co/papers/2504.11544
    Uses well-designed heterogeneous graph structures and focuses on graph design to ensure smooth integration of graph algorithms. It outperforms GraphRAG and LightRAG on multi-hop and open-ended QA benchmarks

  2. HeteRAG -> https://huggingface.co/papers/2504.10529
    This heterogeneous RAG framework decouples knowledge chunk representations. It uses multi-granular views for retrieval and concise chunks for generation, along with adaptive prompt tuning

  3. Hyper-RAG -> https://huggingface.co/papers/2504.08758
    A hypergraph-based RAG method. By capturing both pairwise and complex relationships in domain-specific knowledge, it improves factual accuracy and reduces hallucinations, especially in high-stakes fields like medicine, surpassing GraphRAG and LightRAG. Its lightweight version also doubles retrieval speed
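
To make the hypergraph idea above concrete, here is a minimal, illustrative sketch (hypothetical data and helper names, not the Hyper-RAG implementation): knowledge is stored as hyperedges that link any number of entities to a fact, and retrieval ranks hyperedges by their entity overlap with the query.

```python
from dataclasses import dataclass

@dataclass
class HyperEdge:
    entities: frozenset   # a hyperedge can join any number of entities, not just pairs
    fact: str             # the text snippet attached to this relation

KNOWLEDGE = [
    HyperEdge(frozenset({"metformin", "type 2 diabetes", "kidney function"}),
              "Metformin dosing in type 2 diabetes should account for kidney function."),
    HyperEdge(frozenset({"metformin", "vitamin B12"}),
              "Long-term metformin use can lower vitamin B12 levels."),
]

def retrieve(query_entities: set[str], k: int = 2) -> list[str]:
    """Rank hyperedges by how many query entities they connect (illustrative only)."""
    scored = sorted(KNOWLEDGE,
                    key=lambda e: len(e.entities & query_entities),
                    reverse=True)
    return [e.fact for e in scored[:k] if e.entities & query_entities]

print(retrieve({"metformin", "kidney function"}))
```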

posted an update 4 days ago
11 new types of RAG

RAG is evolving fast, keeping pace with cutting-edge AI trends. Today it is becoming more agentic and smarter at navigating complex structures like hypergraphs.

Here are the 11 latest RAG types:

1. InstructRAG -> InstructRAG: Leveraging Retrieval-Augmented Generation on Instruction Graphs for LLM-Based Task Planning (2504.13032)
Combines RAG with a multi-agent framework, using a graph-based structure, an RL agent to expand task coverage, and a meta-learning agent for better generalization

2. CoRAG (Collaborative RAG) -> CoRAG: Collaborative Retrieval-Augmented Generation (2504.01883)
A collaborative framework that extends RAG to settings where clients train a shared model using a joint passage store

3. ReaRAG -> ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation (2503.21729)
It uses a Thought-Action-Observation loop to decide at each step whether to retrieve information or finalize an answer, reducing unnecessary reasoning and errors (a minimal sketch of this loop follows the list)

4. MCTS-RAG -> MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search (2503.20757)
Combines RAG with Monte Carlo Tree Search (MCTS) to help small LMs handle complex, knowledge-heavy tasks

5. Typed-RAG -> Typed-RAG: Type-aware Multi-Aspect Decomposition for Non-Factoid Question Answering (2503.15879)
Improves answers to open-ended questions by identifying the question type (a debate, personal experience, or comparison) and breaking it down into simpler parts

6. MADAM-RAG -> Retrieval-Augmented Generation with Conflicting Evidence (2504.13079)
A multi-agent system where models debate answers over multiple rounds and an aggregator filters noise and misinformation

7. HM-RAG -> HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation (2504.12330)
A hierarchical multi-agent RAG framework that uses 3 agents: one to split queries, one to retrieve across multiple data types (text, graphs and web), and one to merge and refine answers

8. CDF-RAG -> CDF-RAG: Causal Dynamic Feedback for Adaptive Retrieval-Augmented Generation (2504.12560)
Works with causal graphs and enables multi-hop causal reasoning, refining queries. It validates responses against causal pathways
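
To illustrate the agentic retrieve-or-finalize pattern that ReaRAG (no. 3) describes, here is a minimal sketch; the `llm` and `search` callables are hypothetical stand-ins for a language model and a retriever, not the paper's code.

```python
from typing import Callable

def answer(question: str,
           llm: Callable[[str], str],      # hypothetical LLM call
           search: Callable[[str], str],   # hypothetical retriever call
           max_steps: int = 5) -> str:
    """Thought-Action-Observation loop: at each step the model either
    searches for more evidence or finalizes the answer (illustrative sketch)."""
    context = ""
    for _ in range(max_steps):
        thought = llm(f"Question: {question}\nContext: {context}\n"
                      "Reply with SEARCH:<query> or FINISH:<answer>")
        if thought.startswith("FINISH:"):
            return thought[len("FINISH:"):].strip()
        if thought.startswith("SEARCH:"):
            observation = search(thought[len("SEARCH:"):].strip())
            context += "\n" + observation   # feed the observation back into the loop
    return llm(f"Question: {question}\nContext: {context}\nGive the final answer.")
```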

To explore what Causal AI is, read our article: https://www.turingpost.com/p/causalai

Subscribe to the Turing Post: https://www.turingpost.com/subscribe

Read further 👇
reacted to fdaudens's post with 🔥 8 days ago
Just tested something this morning that feels kind of game-changing for how we publish, discover, and consume news with AI: connecting Claude directly to the New York Times through MCP.

Picture this: You ask Claude about a topic, and it instantly pulls verified and trusted NYT content – no more guessing if the info is accurate.

The cool part? Publishers stay in control of what they share via API, and users get fast, reliable access through the AI tools they already use. Instead of scraping random stuff off the web, we get a future where publishers actively shape how their journalism shows up in AI.

It’s still a bit technical to set up right now, but this could get super simple soon – like installing apps on your phone, but for your chatbot. And you keep the brand connection, too.

Not saying it solves everything, but it’s definitely a new way to distribute content – and maybe even find some fresh value in the middle of this whole news + AI shakeup. Early movers will have a head start.

Curious what folks think – could MCPs be a real opportunity for journalism?
reacted to their post with 👍 10 days ago
replied to their post 11 days ago
  1. Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing -> https://huggingface.co/papers/2503.19385
    An effective test-time scaling method for flow models with SDE-based generation for particle sampling, interpolant conversion to enhance diversity, and Rollover Budget Forcing (RBF) for adaptive compute allocation

  2. Dedicated Feedback and Edit Models Empower Inference-Time Scaling for Open-Ended General-Domain Tasks -> https://huggingface.co/papers/2503.04378
    Introduces a Feedback-Edit model setup that improves inference-time scaling, particularly for open-ended tasks, by using 3 different models for drafting, feedback and editing

  3. m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning with Large Language Models -> https://huggingface.co/papers/2504.00869
    A simple m1 method improves medical performance at inference, with models under 10B outperforming previous benchmarks and a 32B model matching 70B models

  4. ToolACE-R: Tool Learning with Adaptive Self-Refinement -> https://huggingface.co/papers/2504.01400
    ToolACE-R enables adaptive self-refinement of tool use through model-aware iterative training. It refines tool calls without external feedback and scales inference compute efficiently

  5. Scaling Test-Time Inference with Policy-Optimized, Dynamic Retrieval-Augmented Generation via KV Caching and Decoding -> https://huggingface.co/papers/2504.01281
    Introduces a lightweight RAG framework that uses PORAG for better content use, ATLAS for adaptive retrieval timing, and CRITIC for efficient memory use. Together with optimized decoding strategies and adaptive reasoning depth, it allows the model to scale its inference steps effectively.

  6. Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute -> https://huggingface.co/papers/2504.00762
    ModelSwitch is a sampling-then-voting strategy that uses multiple models (including weaker ones) to leverage diverse strengths, where a consistency signal guides dynamic model switching. It highlights the potential of multi-model generation-verification.
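
As an illustration of the sampling-then-voting idea behind ModelSwitch (no. 6 above), here is a minimal sketch; the generator callables are hypothetical placeholders, not the paper's code. Sample several answers from the current model, and only move on to the next model when the answers are not consistent enough.

```python
from collections import Counter
from typing import Callable, Sequence

def model_switch(question: str,
                 models: Sequence[Callable[[str], str]],  # hypothetical generators
                 samples_per_model: int = 5,
                 agreement: float = 0.8) -> str:
    """Sampling-then-voting sketch: switch models only when the current one
    is not self-consistent enough (illustrative, heavily simplified)."""
    votes: Counter = Counter()
    for generate in models:
        answers = [generate(question) for _ in range(samples_per_model)]
        votes.update(answers)
        top_answer, top_count = Counter(answers).most_common(1)[0]
        if top_count / samples_per_model >= agreement:
            return top_answer          # consistent enough: stop early
    return votes.most_common(1)[0][0]  # otherwise vote across all models
```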

3 comprehensive surveys on inference-time scaling:

  1. Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead -> https://huggingface.co/papers/2504.00294

  2. What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models -> https://huggingface.co/papers/2503.24235

  3. Efficient Inference for Large Reasoning Models: A Survey -> https://huggingface.co/papers/2503.23077

posted an update 11 days ago
16 new studies on inference-time scaling:

Over the last couple of weeks, a large number of studies on inference-time scaling have emerged. And it's so cool, because each new paper adds a trick to the toolbox, making LLMs more capable without needing to scale the models' parameter count.

So here are 13 new methods + 3 comprehensive studies on test-time scaling:

1. Inference-Time Scaling for Generalist Reward Modeling (2504.02495)
Probably the most popular study. It proposes boosting inference-time scalability by improving reward modeling. To enhance performance, DeepSeek-GRM uses adaptive critiques, parallel sampling, a pointwise generative RM, and Self-Principled Critique Tuning (SPCT); a generic best-of-N sketch of the underlying sample-and-score idea follows the list

2. T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models (2504.04718)
Allows small models to use external tools, like code interpreters and calculators, to enhance self-verification

3. Z1: Efficient Test-time Scaling with Code (2504.00810)
Proposes to train LLMs on code-based reasoning paths to make test-time scaling more efficient, limiting unnecessary tokens with a special dataset and a Shifted Thinking Window

4. GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning (2504.00891)
Introduces GenPRM, a generative PRM that uses CoT reasoning and code verification for step-by-step judgment. With only 23K training examples, GenPRM outperforms prior PRMs and larger models

5. Can Test-Time Scaling Improve World Foundation Model? (2503.24320)
SWIFT test-time scaling framework improves World Models' performance without retraining, using strategies like fast tokenization, Top-K pruning, and efficient beam search

6. Relevance Isn't All You Need: Scaling RAG Systems With Inference-Time Compute Via Multi-Criteria Reranking (2504.07104)
Proposes REBEL for RAG systems scaling, which uses multi-criteria optimization with CoT prompting for better performance-speed tradeoffs as inference compute increases

7. $φ$-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation (2503.13288)
Proposes a φ-Decoding strategy that uses foresight sampling, clustering and adaptive pruning to estimate and select optimal reasoning steps
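
Most of these methods build on the same basic recipe: spend extra inference compute by sampling several candidates and keeping the one a scorer prefers. Here is a minimal best-of-N sketch of that recipe; the `generate` and `score` callables (e.g. a reward model or verifier) are hypothetical placeholders.

```python
from typing import Callable

def best_of_n(prompt: str,
              generate: Callable[[str], str],        # hypothetical sampler (temperature > 0)
              score: Callable[[str, str], float],    # hypothetical reward model / verifier
              n: int = 8) -> str:
    """Sample n candidates and return the highest-scoring one (illustrative sketch)."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))
```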

Read further below 👇

Also, subscribe to the Turing Post https://www.turingpost.com/subscribe
replied to their post 18 days ago
  1. Edge inference -> https://arxiv.org/pdf/2112.00616
    Refers to running AI models locally on edge devices (mobile phones, IoT devices, embedded hardware) or on servers at the network edge.

  2. Cloud inference -> https://huggingface.co/papers/2210.05889
    Input data is sent from users/devices to the cloud, where large-scale compute (CPUs, GPUs, TPUs) runs the AI model and returns the results.
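
A minimal sketch of the difference between the two: the same request can be answered by a small model that lives on the device (edge) or by shipping the input to a remote endpoint (cloud). The endpoint URL and the local model callable below are hypothetical placeholders.

```python
import json
import urllib.request
from typing import Callable

def cloud_infer(text: str, endpoint: str = "https://example.com/v1/predict") -> str:
    """Cloud inference sketch: send the input to a remote server (hypothetical endpoint)."""
    payload = json.dumps({"input": text}).encode()
    req = urllib.request.Request(endpoint, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["output"]

def edge_infer(text: str, local_model: Callable[[str], str]) -> str:
    """Edge inference sketch: run a model that is already loaded on the device."""
    return local_model(text)
```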

Explore other important aspects of AI inference, including how it works and the current trends, in our article: https://www.turingpost.com/p/inference-805f

If you like it, also subscribe to the Turing Post -> https://www.turingpost.com/subscribe

posted an update 18 days ago
9 Types of AI inference

AI inference is the process by which AI models generate predictions, classifications, or decisions from input data using pre-trained models. It encompasses a wide range of approaches with different computational methods and deployment contexts.

Firstly, here are 5 inference types, based on how the model reasons:

1. Probabilistic inference -> https://arxiv.org/pdf/2502.05244
Uses probability theory to reason under uncertainty. The system maintains degrees of belief over hypotheses and updates them as evidence comes in (a worked Bayes-update sketch follows this list).

2. Rule-based inference -> Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference (2407.00075)
Draws conclusions by applying explicit if-then rules encoded in a knowledge base. Mostly used in neurosymbolic AI.

3. Logical inference -> https://arxiv.org/abs/2009.03393
Uses formal logic to draw conclusions that are guaranteed true if the premises are. It supports theorem proving, logic programming, and tasks needing correctness, like software verification.

4. Abductive inference -> Can ChatGPT Make Explanatory Inferences? Benchmarks for Abductive Reasoning (2404.18982)
Involves forming hypotheses that would best explain a given set of observations - among multiple possible explanations, the goal is to choose the most plausible. Abduction is inherently creative and uncertain.

5. Fuzzy inference -> DCNFIS: Deep Convolutional Neuro-Fuzzy Inference System (2308.06378)
Applies fuzzy logic – reasoning with degrees of truth rather than binary true/false. Inputs are mapped to fuzzy sets with membership grades between 0 and 1.
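
As a worked example of the probabilistic style (no. 1 above), here is a single Bayesian update: a prior belief over hypotheses is reweighted by how likely each hypothesis makes the observed evidence. The numbers are made up purely for illustration.

```python
def bayes_update(prior: dict[str, float],
                 likelihood: dict[str, float]) -> dict[str, float]:
    """Posterior is proportional to prior times likelihood, normalized to sum to 1."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnormalized.values())
    return {h: p / z for h, p in unnormalized.items()}

# Hypothetical diagnosis example: prior belief P(flu)=0.3, P(cold)=0.7;
# observing "fever" is far likelier under flu than under cold.
prior = {"flu": 0.3, "cold": 0.7}
likelihood_of_fever = {"flu": 0.9, "cold": 0.2}
print(bayes_update(prior, likelihood_of_fever))   # flu ≈ 0.66, cold ≈ 0.34
```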

Secondly, here are 4 inference types based on their execution context:

1. Batch inference -> BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching (2412.03594)
Involves generating model predictions on large sets of data in bulk, often on a scheduled basis or as needed for analysis rather than immediate use.

2. Real-time inference -> Real-time Inference and Extrapolation via a Diffusion-inspired Temporal Transformer Operator (DiTTO) (2307.09072)
Produces outputs on-demand with minimal latency, so results are available immediately when needed.

Read further in the comments 👇
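
A minimal sketch of the batch vs. real-time distinction from the two execution-context types above: the same (hypothetical) model call is either applied to a stored dataset in fixed-size chunks, or invoked per request as input arrives.

```python
from typing import Callable, Iterable, Iterator

def batch_inference(inputs: Iterable[str],
                    predict: Callable[[list[str]], list[str]],  # hypothetical batched model call
                    batch_size: int = 32) -> Iterator[str]:
    """Batch inference sketch: score a whole dataset in fixed-size chunks."""
    buffer: list[str] = []
    for x in inputs:
        buffer.append(x)
        if len(buffer) == batch_size:
            yield from predict(buffer)
            buffer = []
    if buffer:
        yield from predict(buffer)

def realtime_inference(request: str,
                       predict: Callable[[list[str]], list[str]]) -> str:
    """Real-time inference sketch: one request in, one low-latency result out."""
    return predict([request])[0]
```
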
replied to their post 25 days ago
posted an update 25 days ago
9 Multimodal Chain-of-Thought methods

How can Chain-of-Thought (CoT) prompting unlock models' full potential across images, video, audio and more? Specialized multimodal CoT techniques are the answer.

Here are 9 methods of Multimodal Chain-of-Thought (MCoT). Most of them are open-source:

1. KAM-CoT -> KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning (2401.12863)
This lightweight framework combines CoT prompting with knowledge graphs (KGs) and achieves 93.87% accuracy

2. Multimodal Visualization-of-Thought (MVoT) -> Imagine while Reasoning in Space: Multimodal Visualization-of-Thought (2501.07542)
Lets models generate visual reasoning traces, using a token discrepancy loss to improve visual quality

3. Compositional CoT (CCoT) -> Compositional Chain-of-Thought Prompting for Large Multimodal Models (2311.17076)
Uses scene graph (SG) representations generated by the LMM itself to improve performance on compositional and general multimodal benchmarks

4. URSA -> URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics (2501.04686)
Brings System 2-style thinking to multimodal math reasoning, using a 3-module CoT data synthesis process with CoT distillation, trajectory-format rewriting and format unification

5. MM-Verify -> MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification (2502.13383)
Introduces a verification mechanism with MM-Verifier and MM-Reasoner that uses synthesized high-quality CoT data for multimodal reasoning

6. Duty-Distinct CoT (DDCoT) -> DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models (2310.16436)
Divides the reasoning responsibilities between LMs and visual models, integrating the visual recognition capabilities into the joint reasoning process

7. Multimodal-CoT from Amazon Web Services -> Multimodal Chain-of-Thought Reasoning in Language Models (2302.00923)
A two-stage framework that separates rationale generation from answer prediction, allowing the model to reason more effectively over multimodal inputs (a minimal sketch of the two stages follows the list)

8. Graph-of-Thought (GoT) -> Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models (2305.16582)
This two-stage framework models reasoning as a graph of interconnected ideas, improving performance on text-only and multimodal tasks
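
To show what the two-stage recipe from Multimodal-CoT (no. 7) looks like in code, here is a minimal, hedged sketch; `vlm` stands in for any vision-language model call and is a hypothetical placeholder, not the paper's implementation.

```python
from typing import Callable

def multimodal_cot(image: bytes,
                   question: str,
                   vlm: Callable[[bytes, str], str]) -> str:  # hypothetical VLM call
    """Stage 1: generate a rationale from image + question.
    Stage 2: predict the answer from question + rationale (illustrative sketch)."""
    rationale = vlm(image, f"Question: {question}\n"
                           "Explain the relevant visual evidence step by step.")
    return vlm(image, f"Question: {question}\nRationale: {rationale}\n"
                      "Give the final answer only.")
```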

More in the comments 👇
reacted to their post with 🚀❤️👀 30 days ago
replied to their post about 1 month ago
posted an update about 1 month ago
8 types of RoPE

Since we use Transformers all the time, it's helpful to understand RoPE (Rotary Position Embedding). Token order matters, so RoPE encodes it by rotating token embeddings based on their position, letting the model know which token comes first, second, and so on.

Here are 8 types of RoPE that can be implemented in different cases:

1. Original RoPE -> RoFormer: Enhanced Transformer with Rotary Position Embedding (2104.09864)
Encodes token positions by rotating token embeddings in the complex plane via a position-based rotation matrix, thereby providing the self-attention mechanism with relative positional info (a minimal sketch of this rotation follows the list).

2. LongRoPE -> LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens (2402.13753)
Extends the context window of pre-trained LLMs to 2048k tokens, leveraging non-uniformities in positional interpolation with an efficient search.

3. LongRoPE2 -> LongRoPE2: Near-Lossless LLM Context Window Scaling (2502.20082)
Extends the effective context window of pre-trained LLMs to the target length, rescaling RoPE guided by "needle-driven" perplexity.

4. Multimodal RoPE (MRoPE) -> Qwen2.5-VL Technical Report (2502.13923)
Decomposes positional embedding into 3 components: temporal, height and width, so that positional features are aligned across modalities: text, images and videos.

5. Directional RoPE (DRoPE) -> DRoPE: Directional Rotary Position Embedding for Efficient Agent Interaction Modeling (2503.15029)
Adds an identity scalar, improving how angles are handled without extra complexity. It helps balance accuracy, speed, and memory usage.

6. VideoRoPE -> VideoRoPE: What Makes for Good Video Rotary Position Embedding? (2502.05173)
Adapts RoPE for video, featuring 3D structure, low-frequency temporal allocation, diagonal layout, and adjustable spacing.

7. VRoPE -> VRoPE: Rotary Position Embedding for Video Large Language Models (2502.11664)
Another RoPE variant for video, which restructures positional indices and balances encoding for uniform spatial focus.

8. XPos (Extrapolatable Position Embedding) -> https://huggingface.co/papers/2212.10
Introduces an exponential decay factor into the rotation matrix, improving stability on long sequences.
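
A minimal NumPy sketch of the original RoPE rotation (no. 1): each pair of embedding dimensions is rotated by an angle proportional to the token's position, using the standard base^(-2i/d) frequencies. This is an illustrative implementation, not any specific library's code.

```python
import numpy as np

def rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embedding to x of shape (seq_len, dim), dim even."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) * 2.0 / dim)    # theta_i = base^(-2i/dim)
    angles = np.outer(np.arange(seq_len), freqs)      # (seq_len, half): position * frequency
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]                 # pair up dimensions
    # Rotate each (x1, x2) pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

q = np.random.randn(8, 64)   # 8 tokens, 64-dim queries
q_rot = rope(q)              # q_rot @ rope(k).T now depends on relative positions
```
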
replied to their post about 1 month ago
  1. Adaptive attention -> https://huggingface.co/papers/1612.01887
    Dynamically adjusts its attention behavior – when or whether to use attention, or how broad the attention should be.

  2. Scaled Dot-Product attention -> https://huggingface.co/papers/2404.16629
    Attention scores are computed as the dot product between a query vector and a key vector, then divided by the square root of the key dimension before applying softmax (a minimal sketch follows this list).

  3. Additive attention -> https://huggingface.co/papers/1409.0473
    Computes attention scores using a small feed-forward network that combines the query and key vectors.

  4. Global attention -> https://huggingface.co/papers/1508.04025
    Is a form of soft attention that considers all possible positions in the input sequence.

  5. Local attention -> https://huggingface.co/papers/1508.04025
    It's a compromise between hard and soft attention. The model only attends to a restricted subset of inputs at a given step.

  6. Sparse attention -> https://huggingface.co/papers/1602.02068
    Applies patterns that limit what each word can focus on.

  7. Hierarchical attention -> https://www.cs.cmu.edu/~./hovy/papers/16HLT-hierarchical-attention-networks.pdf
    The model first applies attention at the word level and produces a sentence representation. Then it applies another attention at the sentence level to determine which sentences are important for the document representation.

  8. Temporal attention -> https://huggingface.co/papers/1502.08029
    Deals with time-series or sequential data, allowing a model to focus on particular time steps or time segments.
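
Here is a minimal NumPy sketch of scaled dot-product attention (no. 2 above), the building block most of the other mechanisms modify; it is illustrative and not tied to any particular library.

```python
import numpy as np

def scaled_dot_product_attention(Q: np.ndarray,   # (n_queries, d_k)
                                 K: np.ndarray,   # (n_keys, d_k)
                                 V: np.ndarray    # (n_keys, d_v)
                                 ) -> np.ndarray:
    """softmax(Q K^T / sqrt(d_k)) V; the attention weights sum to 1 over the keys."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # numerically stable softmax
    return weights @ V                                 # weighted sum of the values

# Self-attention is the special case where Q, K and V all come from the same sequence.
x = np.random.randn(5, 16)
out = scaled_dot_product_attention(x, x, x)
```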

posted an update about 1 month ago
15 types of attention mechanisms

Attention mechanisms allow models to dynamically focus on specific parts of their input when performing tasks. In our recent article, we discussed Multi-Head Latent Attention (MLA) in detail and now it's time to summarize other existing types of attention.

Here is a list of 15 types of attention mechanisms used in AI models:

1. Soft attention (Deterministic attention) -> Neural Machine Translation by Jointly Learning to Align and Translate (1409.0473)
Assigns a continuous weight distribution over all parts of the input. It produces a weighted sum of the input using attention weights that sum to 1.

2. Hard attention (Stochastic attention) -> Effective Approaches to Attention-based Neural Machine Translation (1508.04025)
Makes a discrete selection of some part of the input to focus on at each step, rather than attending to everything.

3. Self-attention -> Attention Is All You Need (1706.03762)
Each element in the sequence "looks" at other elements and "decides" how much to borrow from each of them for its new representation.

4. Cross-Attention (Encoder-Decoder attention) -> Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation (2104.08771)
The queries come from one sequence and the keys/values come from another sequence. It allows a model to combine information from two different sources.

5. Multi-Head Attention (MHA) -> Attention Is All You Need (1706.03762)
Multiple attention “heads” are run in parallel. The model computes several attention distributions (heads), each with its own set of learned projections of queries, keys, and values (a minimal sketch follows the list).

6. Multi-Head Latent Attention (MLA) -> DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model (2405.04434)
Extends MHA by compressing keys and values into a low-rank latent representation, shrinking the KV cache while preserving modeling quality.

7. Memory-Based attention -> End-To-End Memory Networks (1503.08895)
Involves an external memory and uses attention to read from and write to this memory.
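
Building on the scaled dot-product block, here is a compact NumPy sketch of multi-head attention (no. 5): the input is projected into several smaller query/key/value spaces, each head attends independently, and the results are concatenated and projected back. The random weights are purely for illustration.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(x: np.ndarray, n_heads: int, seed: int = 0) -> np.ndarray:
    """Illustrative MHA over x of shape (seq_len, d_model) with random projection weights."""
    rng = np.random.default_rng(seed)
    seq_len, d_model = x.shape
    d_head = d_model // n_heads
    W_q, W_k, W_v, W_o = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                          for _ in range(4))

    def split(w):  # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return (x @ w).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split(W_q), split(W_k), split(W_v)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)   # (n_heads, seq_len, seq_len)
    heads = softmax(scores) @ V                            # each head attends independently
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o                                    # final output projection

out = multi_head_attention(np.random.randn(6, 32), n_heads=4)
```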

See other types in the comments 👇
reacted to their post with 🔥🧠 about 1 month ago
5 New implementations of Diffusion Models

Diffusion models are widely used for image and video generation but remain underexplored in text generation, where autoregressive models (ARMs) dominate. Unlike ARMs, which produce tokens sequentially, diffusion models iteratively refine noise through denoising steps, offering greater flexibility and speed.
Recent advancements show a shift toward using diffusion models in place of, or alongside, ARMs. Researchers also combine strengths from both methods and integrate autoregressive concepts into diffusion.

Here are 5 new implementations of diffusion models:

1. Mercury family of diffusion LLMs (dLLMs) by Inception Labs -> https://www.inceptionlabs.ai/news
It applies diffusion to text and code data, enabling sequence generation 10x faster than today's top LLMs. Mercury Coder, now available, can run at over 1,000 tokens/sec on NVIDIA H100s.

2. Diffusion of Thoughts (DoT) -> Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models (2402.07754)
Integrates diffusion models with Chain-of-Thought. DoT allows reasoning steps to diffuse gradually over time. This flexibility enables balancing between reasoning quality and computational cost.

3. LLaDA -> Large Language Diffusion Models (2502.09992)
Shows diffusion models' potential to replace ARMs. Trained with pre-training and SFT, LLaDA masks tokens, predicts them via a Transformer, and optimizes a likelihood bound. LLaDA matches key LLM skills and surpasses GPT-4o in reversal poetry (a minimal sketch of the mask-and-predict loop follows the list).

4. LanDiff -> The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation (2503.04606)
This hybrid text-to-video model combines autoregressive and diffusion paradigms, introducing a semantic tokenizer, an LM for token generation, and a streaming diffusion model. LanDiff outperforms models like Sora.

5. General Interpolating Discrete Diffusion (GIDD) -> Generalized Interpolating Discrete Diffusion (2503.04482)
A flexible noising process with a novel diffusion ELBO enables combining masking and uniform noise, allowing diffusion models to correct mistakes, where ARMs struggle.
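
To make the masked-diffusion idea behind LLaDA (no. 3) concrete, here is a minimal, hedged sketch of iterative unmasking at generation time: start from a fully masked sequence, let a model propose a token for every masked slot, keep the most confident proposals, and re-mask the rest for the next step. The `predict` callable is a hypothetical placeholder, not the paper's model.

```python
from typing import Callable, List, Tuple

MASK = "<mask>"

def masked_diffusion_generate(length: int,
                              predict: Callable[[List[str]], List[Tuple[str, float]]],
                              steps: int = 4) -> List[str]:
    """`predict` returns a (token, confidence) guess per position; each step we
    commit the most confident guesses and re-mask the rest (illustrative sketch)."""
    tokens = [MASK] * length
    for step in range(steps, 0, -1):
        guesses = predict(tokens)
        masked = sorted((i for i, t in enumerate(tokens) if t == MASK),
                        key=lambda i: guesses[i][1], reverse=True)
        keep = max(1, len(masked) // step)   # unmask a growing fraction each step
        for i in masked[:keep]:
            tokens[i] = guesses[i][0]
    return tokens
```
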
reacted to clem's post with 👍 about 2 months ago
I was chatting with @peakji, one of the cofounders of Manus AI, who told me he was on Hugging Face (very cool!).

He shared an interesting insight which is that agentic capabilities might be more of an alignment problem rather than a foundational capability issue. Similar to the difference between GPT-3 and InstructGPT, some open-source foundation models are simply trained to 'answer everything in one response regardless of the complexity of the question' - after all, that's the user preference in chatbot use cases. Just a bit of post-training on agentic trajectories can make an immediate and dramatic difference.

As a thank you to the community, he shared 100 invite codes, first-come first-served; just use “HUGGINGFACE” to get access!