Ksenia Se

Kseniase

AI & ML interests

None yet


Organizations

Turing Post, Journalists on Hugging Face, Social Post Explorers, Hugging Face Discord Community, Sandbox

Kseniase's activity

replied to their post 6 days ago

These are graph-centric types of RAG:

  9. NodeRAG -> https://huggingface.co/papers/2504.11544
    Focuses on graph design, using heterogeneous graph structures built so that graph algorithms integrate smoothly. It outperforms GraphRAG and LightRAG on multi-hop and open-ended QA benchmarks

  10. HeteRAG -> https://huggingface.co/papers/2504.10529
    This heterogeneous RAG framework decouples knowledge chunk representations. It uses multi-granular views for retrieval and concise chunks for generation, along with adaptive prompt tuning

  11. Hyper-RAG -> https://huggingface.co/papers/2504.08758
    A hypergraph-based RAG method. By capturing both pairwise and beyond-pairwise (higher-order) relationships in domain-specific knowledge, it improves factual accuracy and reduces hallucinations, especially in high-stakes fields like medicine, surpassing GraphRAG and LightRAG. Its lightweight version also doubles retrieval speed
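To make the hypergraph idea concrete: an ordinary graph edge links exactly two entities, while a hyperedge can link any number, so a fact involving three entities stays a single unit instead of being split into pairs. The toy retriever below is a minimal sketch under our own assumptions; the `Hyperedge` class, the overlap-count scoring, and the tie-breaking rule are hypothetical and are not Hyper-RAG's actual algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperedge:
    """Hypothetical hyperedge: one text chunk linked to every entity it mentions."""
    entities: frozenset
    text: str


def retrieve(hyperedges, query_entities, k=2):
    """Rank hyperedges by how many query entities they connect; return the top-k chunks."""
    scored = []
    for edge in hyperedges:
        overlap = len(edge.entities & query_entities)
        if overlap:
            scored.append((overlap, len(edge.entities), edge.text))
    # Most shared entities first; ties go to smaller, more specific hyperedges.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [text for _, _, text in scored[:k]]


kb = [
    Hyperedge(frozenset({"aspirin", "warfarin", "bleeding"}),
              "Aspirin taken with warfarin raises bleeding risk."),
    Hyperedge(frozenset({"aspirin", "fever"}), "Aspirin can reduce fever."),
    Hyperedge(frozenset({"warfarin", "vitamin K"}), "Vitamin K counteracts warfarin."),
]
print(retrieve(kb, {"aspirin", "warfarin"}))
```

The three-entity hyperedge ranks first because it connects both query entities at once, which a pairwise graph would only recover by joining separate edges.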

posted an update 6 days ago
11 new types of RAG

RAG is evolving fast, keeping pace with cutting-edge AI trends. It is becoming more agentic and better at navigating complex structures like hypergraphs.

Here are the 11 latest RAG types:

1. InstructRAG -> InstructRAG: Leveraging Retrieval-Augmented Generation on Instruction Graphs for LLM-Based Task Planning (2504.13032)
Combines RAG with a multi-agent framework, using a graph-based structure, an RL agent to expand task coverage, and a meta-learning agent for better generalization

2. CoRAG (Collaborative RAG) -> CoRAG: Collaborative Retrieval-Augmented Generation (2504.01883)
A collaborative framework that extends RAG to settings where clients train a shared model using a joint passage store

3. ReaRAG -> ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation (2503.21729)
Uses a Thought-Action-Observation loop to decide at each step whether to retrieve information or finalize an answer, reducing unnecessary reasoning and errors

4. MCTS-RAG -> MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search (2503.20757)
Combines RAG with Monte Carlo Tree Search (MCTS) to help small LMs handle complex, knowledge-heavy tasks

5. Typed-RAG -> Typed-RAG: Type-aware Multi-Aspect Decomposition for Non-Factoid Question Answering (2503.15879)
Improves answers to open-ended questions by identifying the question type (such as debate, experience, or comparison) and breaking each question down into simpler aspects

6. MADAM-RAG -> Retrieval-Augmented Generation with Conflicting Evidence (2504.13079)
A multi-agent system where models debate answers over multiple rounds and an aggregator filters noise and misinformation

7. HM-RAG -> HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation (2504.12330)
A hierarchical multi-agent RAG framework that uses 3 agents: one to split queries, one to retrieve across multiple data types (text, graphs and web), and one to merge and refine answers

8. CDF-RAG -> CDF-RAG: Causal Dynamic Feedback for Adaptive Retrieval-Augmented Generation (2504.12560)
Works with causal graphs to refine queries and enable multi-hop causal reasoning, then validates responses against causal pathways

To explore what Causal AI is, read our article: https://www.turingpost.com/p/causalai
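The Thought-Action-Observation loop described for ReaRAG can be sketched as a simple control loop: at every step a policy either requests a retrieval or commits to an answer. This is an illustration under stated assumptions, not ReaRAG's implementation; `policy` and `search` are hypothetical stand-ins for the reasoning model and the retriever, and `max_steps` is an assumed budget that caps the chain.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces. A step is (thought, action, argument); the trajectory
# is the list of past steps paired with the observation each one produced.
Step = Tuple[str, str, str]
Trajectory = List[Tuple[Step, str]]


def think_act_observe(question: str,
                      policy: Callable[[str, Trajectory], Step],
                      search: Callable[[str], str],
                      max_steps: int = 5) -> Optional[str]:
    """Loop until the policy chooses 'finish' or the step budget runs out."""
    trajectory: Trajectory = []
    for _ in range(max_steps):
        thought, action, arg = policy(question, trajectory)
        if action == "finish":
            return arg                        # final answer
        if action == "search":
            observation = search(arg)         # retrieve only when asked to
        else:
            observation = f"invalid action: {action}"
        trajectory.append(((thought, action, arg), observation))
    return None                               # budget exhausted, no answer


# Toy usage: a scripted policy that searches once, then answers.
docs = {"capital of France": "Paris is the capital of France."}


def toy_policy(question: str, trajectory: Trajectory) -> Step:
    if not trajectory:
        return ("I need a fact first.", "search", "capital of France")
    last_observation = trajectory[-1][1]
    return ("The observation answers it.", "finish", last_observation.split()[0])


answer = think_act_observe("What is the capital of France?", toy_policy,
                           lambda q: docs.get(q, "no result"))
print(answer)  # Paris
```

The step cap matters: without it, a policy that never chooses "finish" would retrieve forever.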

Subscribe to the Turing Post: https://www.turingpost.com/subscribe

Read further 👇
reacted to fdaudens's post with 🔥 9 days ago
Just tested something this morning that feels kind of game-changing for how we publish, discover, and consume news with AI: connecting Claude directly to the New York Times through MCP.

Picture this: You ask Claude about a topic, and it instantly pulls verified and trusted NYT content — no more guessing if the info is accurate.

The cool part? Publishers stay in control of what they share via API, and users get fast, reliable access through the AI tools they already use. Instead of scraping random stuff off the web, we get a future where publishers actively shape how their journalism shows up in AI.

It’s still a bit technical to set up right now, but this could get super simple soon — like installing apps on your phone, but for your chatbot. And you keep the brand connection, too.

Not saying it solves everything, but it’s definitely a new way to distribute content — and maybe even find some fresh value in the middle of this whole news + AI shakeup. Early movers will have a head start.

Curious what folks think — could MCPs be a real opportunity for journalism?
reacted to their post with 👍 12 days ago
16 new research papers on inference-time scaling:

Over the last couple of weeks, a large number of studies on inference-time scaling have emerged. And it's so cool, because each new paper adds a trick to the toolbox, making LLMs more capable without scaling up their parameter count.

So here are 13 new methods + 3 comprehensive studies on test-time scaling:

1. Inference-Time Scaling for Generalist Reward Modeling (2504.02495)
Probably the most popular study. It boosts inference-time scalability by improving reward modeling. To enhance performance, DeepSeek-GRM uses adaptive critiques, parallel sampling, pointwise generative RM, and Self-Principled Critique Tuning (SPCT)

2. T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models (2504.04718)
Allows small models to use external tools, like code interpreters and calculators, to enhance self-verification

3. Z1: Efficient Test-time Scaling with Code (2504.00810)
Proposes training LLMs on code-based reasoning paths to make test-time scaling more efficient, limiting unnecessary tokens with a special dataset and a Shifted Thinking Window

4. GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning (2504.00891)
Introduces GenPRM, a generative PRM that uses CoT reasoning and code verification for step-by-step judgment. With only 23K training examples, GenPRM outperforms prior PRMs and larger models

5. Can Test-Time Scaling Improve World Foundation Model? (2503.24320)
The SWIFT test-time scaling framework improves world foundation models' performance without retraining, using strategies like fast tokenization, Top-K pruning, and efficient beam search

6. Relevance Isn't All You Need: Scaling RAG Systems With Inference-Time Compute Via Multi-Criteria Reranking (2504.07104)
Proposes REBEL for scaling RAG systems, which uses multi-criteria optimization with CoT prompting for better performance-speed tradeoffs as inference compute increases

7. $φ$-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation (2503.13288)
Proposes a φ-Decoding strategy that uses foresight sampling, clustering and adaptive pruning to estimate and select optimal reasoning steps
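Several methods above build on the same basic recipe: spend extra compute at inference by sampling several candidates, then select among them. A minimal, generic sketch of the two most common selection rules (best-of-N with a reward model, and majority voting) follows. The toy solver and verifier are hypothetical stand-ins for an LLM and a reward model; this does not reproduce any specific paper's method.

```python
import random
from collections import Counter
from typing import Callable


def best_of_n(generate: Callable[[random.Random], str],
              reward: Callable[[str], float], n: int, seed: int = 0) -> str:
    """Parallel sampling + selection: draw n candidates, keep the highest-reward one."""
    rng = random.Random(seed)
    candidates = [generate(rng) for _ in range(n)]
    return max(candidates, key=reward)


def majority_vote(generate: Callable[[random.Random], str], n: int, seed: int = 0) -> str:
    """Self-consistency: draw n candidates, return the most frequent answer."""
    rng = random.Random(seed)
    counts = Counter(generate(rng) for _ in range(n))
    return counts.most_common(1)[0][0]


def noisy_solver(rng: random.Random) -> str:
    """Toy 'model' for 6 * 7: right 60% of the time, otherwise a random guess."""
    return "42" if rng.random() < 0.6 else str(rng.randint(0, 99))


def verifier(answer: str) -> float:
    """Toy reward model: 1.0 for the correct product, 0.0 otherwise."""
    return 1.0 if answer == str(6 * 7) else 0.0


print(best_of_n(noisy_solver, verifier, n=8), majority_vote(noisy_solver, n=15))
```

Best-of-N is only as good as the reward signal that ranks the candidates, which is why several of the papers above focus on better reward or verification models.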

Read further below 👇

Also, subscribe to the Turing Post https://www.turingpost.com/subscribe
replied to their post 13 days ago
  8. Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing -> https://huggingface.co/papers/2503.19385
    An effective test-time scaling method for flow models with SDE-based generation for particle sampling, interpolant conversion to enhance diversity, and Rollover Budget Forcing (RBF) for adaptive compute allocation

  9. Dedicated Feedback and Edit Models Empower Inference-Time Scaling for Open-Ended General-Domain Tasks -> https://huggingface.co/papers/2503.04378
    Introduces a Feedback-Edit model setup that improves inference-time scaling, particularly for open-ended tasks, by using 3 different models for drafting, feedback, and editing

  10. m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning with Large Language Models -> https://huggingface.co/papers/2504.00869
    The simple m1 method improves medical reasoning performance at inference time: models under 10B outperform previous state-of-the-art results, and a 32B model matches 70B-scale models

  11. ToolACE-R: Tool Learning with Adaptive Self-Refinement -> https://huggingface.co/papers/2504.01400
    ToolACE-R enables adaptive self-refinement of tool use through model-aware iterative training. It refines tool calls without external feedback and scales inference compute efficiently

  12. Scaling Test-Time Inference with Policy-Optimized, Dynamic Retrieval-Augmented Generation via KV Caching and Decoding -> https://huggingface.co/papers/2504.01281
    Introduces a lightweight RAG framework that uses PORAG for better content use, ATLAS for adaptive retrieval timing, and CRITIC for efficient memory use. Together with optimized decoding strategies and adaptive reasoning depth, it allows the model to scale its inference steps effectively.

  13. Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute -> https://huggingface.co/papers/2504.00762
    ModelSwitch is a sampling-then-voting strategy that uses multiple models (including weaker ones) to leverage diverse strengths, where a consistency signal guides dynamic model switching. It highlights the potential of multi-model generation-verification.
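The ModelSwitch idea (sample from one model, and move on to another only when its answers disagree) can be sketched in a few lines. This is a simplified reading under our own assumptions: each model is a hypothetical callable returning an answer string, "consistent" means the top answer's share reaches `threshold`, and the fallback uses plain rather than weighted voting.

```python
from collections import Counter
from typing import Callable, List, Sequence


def model_switch(models: Sequence[Callable[[int], str]], k: int,
                 threshold: float = 1.0) -> str:
    """Simplified ModelSwitch-style sampling: try models in order, stop at the first
    one whose k samples are consistent enough, else vote over all samples.

    Each model is a hypothetical callable mapping a sample index to an answer.
    """
    all_answers: List[str] = []
    for model in models:
        answers = [model(i) for i in range(k)]
        all_answers.extend(answers)
        top, count = Counter(answers).most_common(1)[0]
        if count / k >= threshold:      # consistency signal: trust this model
            return top
    # No model was consistent enough: fall back to plain voting over everything.
    return Counter(all_answers).most_common(1)[0][0]


# Toy usage: the first model disagrees with itself, so we switch to the second.
def weak(i: int) -> str:
    return ["A", "B", "A"][i]


def strong(i: int) -> str:
    return "C"


print(model_switch([weak, strong], k=3))  # C
```

Stopping early on a consistent model is where the compute saving comes from: later models are only queried when the cheaper signal is ambiguous.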

3 comprehensive surveys on inference-time scaling:

  1. Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead -> https://huggingface.co/papers/2504.00294

  2. What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models -> https://huggingface.co/papers/2503.24235

  3. Efficient Inference for Large Reasoning Models: A Survey -> https://huggingface.co/papers/2503.23077

posted an update 13 days ago
updated a Space 19 days ago
replied to their post 20 days ago
  3. Edge inference -> https://arxiv.org/pdf/2112.00616
    Refers to running AI models locally on edge devices (mobile phones, IoT devices, embedded hardware) or on servers at the network edge.

  4. Cloud inference -> https://huggingface.co/papers/2210.05889
    Input data is sent from users/devices to the cloud, where large-scale compute (CPUs, GPUs, TPUs) runs the AI model and returns the results.

Explore other important aspects of AI inference, including how it works and what the current trends are, in our article: https://www.turingpost.com/p/inference-805f

If you like it, also subscribe to the Turing Post -> https://www.turingpost.com/subscribe

posted an update 20 days ago
9 Types of AI inference

AI inference is the process by which trained AI models generate predictions, classifications, or decisions from input data. It encompasses a wide range of approaches that differ in computational method and deployment setting.

First, here are 5 inference types based on how the model reasons:

1. Probabilistic inference -> https://arxiv.org/pdf/2502.05244
Uses probability theory to reason under uncertainty. The system maintains degrees of belief over hypotheses and updates them as evidence comes in.

2. Rule-based inference -> Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference (2407.00075)
Draws conclusions by applying explicit if-then rules encoded in a knowledge base. Mostly used in neurosymbolic AI.

3. Logical inference -> https://arxiv.org/abs/2009.03393
Uses formal logic to draw conclusions that are guaranteed to be true if the premises are true. It supports theorem proving, logic programming, and tasks that require correctness, like software verification.

4. Abductive inference -> Can ChatGPT Make Explanatory Inferences? Benchmarks for Abductive Reasoning (2404.18982)
Involves forming hypotheses that would best explain a given set of observations - among multiple possible explanations, the goal is to choose the most plausible. Abduction is inherently creative and uncertain.

5. Fuzzy inference -> DCNFIS: Deep Convolutional Neuro-Fuzzy Inference System (2308.06378)
Applies fuzzy logic – reasoning with degrees of truth rather than binary true/false. Inputs are mapped to fuzzy sets with membership grades between 0 and 1.
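The belief-updating step of probabilistic inference is Bayes' rule: the posterior is proportional to prior × likelihood, renormalized over all hypotheses. Here is a minimal sketch over a discrete set of hypotheses; the weather hypotheses and all probabilities are made-up illustrative numbers.

```python
from typing import Dict


def bayes_update(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Return P(h | evidence) for each hypothesis h: prior * likelihood, renormalized."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())   # P(evidence)
    if total == 0:
        raise ValueError("evidence is impossible under every hypothesis")
    return {h: p / total for h, p in unnormalized.items()}


belief = {"rain": 0.3, "no_rain": 0.7}                        # prior (made up)
belief = bayes_update(belief, {"rain": 0.9, "no_rain": 0.2})   # evidence: dark clouds
belief = bayes_update(belief, {"rain": 0.95, "no_rain": 0.3})  # evidence: wet ground
print(round(belief["rain"], 3))  # 0.859
```

Each call feeds the previous posterior back in as the new prior, which is exactly the "updates them as evidence comes in" behavior described above.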

Second, here are 4 inference types based on the execution context:

1. Batch inference -> BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching (2412.03594)
Involves generating model predictions on large sets of data in bulk, often on a scheduled basis or as needed for analysis rather than immediate use.

2. Real-time inference -> Real-time Inference and Extrapolation via a Diffusion-inspired Temporal Transformer Operator (DiTTO) (2307.09072)
Produces outputs on-demand with minimal latency, so results are available immediately when needed.
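The practical difference between these two execution contexts is when inputs reach the model and how many arrive per call. Below is a minimal sketch, assuming a hypothetical `model` callable that maps a list of inputs to a list of outputs; real serving systems add scheduling and dynamic batching on top of this.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def batch_inference(model: Callable[[Sequence[T]], List[R]],
                    data: Iterable[T], size: int) -> List[R]:
    """Bulk mode: one model call per batch; results are collected for later use."""
    results: List[R] = []
    for batch in batched(data, size):
        results.extend(model(batch))
    return results


def realtime_inference(model: Callable[[Sequence[T]], List[R]], request: T) -> R:
    """On-demand mode: a single request is processed immediately as a batch of one."""
    return model([request])[0]


calls: List[int] = []


def toy_model(xs: Sequence[int]) -> List[int]:
    """Hypothetical model: records each batch size and doubles every input."""
    calls.append(len(xs))
    return [2 * x for x in xs]


print(batch_inference(toy_model, range(10), size=4), calls)
# [0, 2, 4, 6, 8, 10, 12, 14, 16, 18] [4, 4, 2]
```

Batching trades latency for throughput: ten inputs cost three model calls here, but no result is available until its whole batch finishes.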

Read further in the comments 👇
upvoted an article 21 days ago
Topic 33: Slim Attention, KArAt, XAttention and Multi-Token Attention Explained – What’s Really Changing in Transformers?

By Kseniase and 1 other
published an article 21 days ago
Topic 33: Slim Attention, KArAt, XAttention and Multi-Token Attention Explained – What’s Really Changing in Transformers?

By Kseniase and 1 other
replied to their post 27 days ago
posted an update 27 days ago
9 Multimodal Chain-of-Thought methods

How can Chain-of-Thought (CoT) prompting unlock models' full potential across images, video, audio and more? The answer lies in specialized multimodal CoT techniques.

Here are 9 methods of Multimodal Chain-of-Thought (MCoT). Most of them are open-source:

1. KAM-CoT -> KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning (2401.12863)
This lightweight framework combines CoT prompting with knowledge graphs (KGs) and achieves 93.87% accuracy on ScienceQA

2. Multimodal Visualization-of-Thought (MVoT) -> Imagine while Reasoning in Space: Multimodal Visualization-of-Thought (2501.07542)
Lets models generate visual reasoning traces, using a token discrepancy loss to improve visual quality

3. Compositional CoT (CCoT) -> Compositional Chain-of-Thought Prompting for Large Multimodal Models (2311.17076)
Uses scene graph (SG) representations generated by the LMM itself to improve performance on compositional and general multimodal benchmarks

4. URSA -> URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics (2501.04686)
Brings System 2-style thinking to multimodal math reasoning, using a 3-module CoT data synthesis process with CoT distillation, trajectory-format rewriting and format unification

5. MM-Verify -> MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification (2502.13383)
Introduces a verification mechanism built from MM-Verifier and MM-Reasoner, which use synthesized high-quality CoT data for multimodal reasoning

6. Duty-Distinct CoT (DDCoT) -> DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models (2310.16436)
Divides the reasoning responsibilities between LMs and visual models, integrating the visual recognition capabilities into the joint reasoning process

7. Multimodal-CoT from Amazon Web Services -> Multimodal Chain-of-Thought Reasoning in Language Models (2302.00923)
A two-stage framework that separates rationale generation from answer prediction, allowing the model to reason more effectively using multimodal inputs

8. Graph-of-Thought (GoT) -> Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models (2305.16582)
This two-stage framework models reasoning as a graph of interconnected ideas, improving performance on text-only and multimodal tasks
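The two-stage split used by Multimodal-CoT (write a rationale first, then answer conditioned on it) can be sketched as a small pipeline. This is an assumption-heavy illustration: the text caption stands in for the visual features the paper actually fuses into the model, and the two model arguments are hypothetical text-in, text-out callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

TextModel = Callable[[str], str]  # hypothetical text-in, text-out model interface


@dataclass
class Example:
    question: str
    image_caption: str  # stand-in for visual features (an assumption)


def two_stage_mcot(ex: Example, rationale_model: TextModel,
                   answer_model: TextModel) -> Tuple[str, str]:
    """Stage 1 writes a rationale; stage 2 answers conditioned on that rationale."""
    context = f"Image: {ex.image_caption}\nQuestion: {ex.question}"
    rationale = rationale_model(f"{context}\nRationale:")
    answer = answer_model(f"{context}\nRationale: {rationale}\nAnswer:")
    return rationale, answer


# Toy stand-ins so the pipeline runs end to end.
prompts: List[str] = []


def toy_rationale_model(prompt: str) -> str:
    prompts.append(prompt)
    return "The facing poles are both north, and like poles repel."


def toy_answer_model(prompt: str) -> str:
    prompts.append(prompt)
    return "repel" if "like poles repel" in prompt else "unknown"


ex = Example("Will these magnets attract or repel?",
             "two bar magnets with their north poles facing each other")
print(two_stage_mcot(ex, toy_rationale_model, toy_answer_model))
```

Keeping the stages separate means the answer step sees an explicit rationale in its input, rather than having to produce reasoning and answer in one pass.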

More in the comments 👇
reacted to their post with 🚀❤️ about 1 month ago
8 types of RoPE

Since Transformers are used almost everywhere, it's helpful to understand RoPE (Rotary Position Embedding). Token order matters, and RoPE encodes it by rotating the query and key vectors according to each token's position, so attention scores depend on how far apart two tokens are and the model can tell which token comes first, second, and so on.
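Here is a minimal pure-Python sketch of that rotation, following the RoFormer formulation in which each pair of dimensions (x[2i], x[2i+1]) is rotated by pos × θ_i with θ_i = 10000^(-2i/d). The vectors are arbitrary illustrative values. The final line shows the key property: the query-key dot product depends only on the offset between positions.

```python
import math
from typing import List


def rope(x: List[float], pos: int, base: float = 10000.0) -> List[float]:
    """Rotate each pair (x[2i], x[2i+1]) by the angle pos * theta_i,
    where theta_i = base ** (-2i / d), following the RoFormer formulation."""
    d = len(x)
    if d % 2:
        raise ValueError("RoPE needs an even dimension")
    out = [0.0] * d
    for i in range(d // 2):
        angle = pos * base ** (-2 * i / d)
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s       # standard 2-D rotation matrix
        out[2 * i + 1] = a * s + b * c
    return out


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


q = [0.1, -0.4, 0.7, 0.2]   # illustrative query vector
k = [0.5, 0.3, -0.2, 0.9]   # illustrative key vector

# Same offset (m - n = 2) at different absolute positions gives the same score.
print(dot(rope(q, 5), rope(k, 3)), dot(rope(q, 12), rope(k, 10)))
```

Some implementations (for example, the Llama code in Hugging Face Transformers) pair dimension i with i + d/2 instead of adjacent dimensions; that is a permutation of the same scheme.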

Here are 8 types of RoPE suited to different use cases:

1. Original RoPE -> RoFormer: Enhanced Transformer with Rotary Position Embedding (2104.09864)
Encodes token positions by rotating query and key vectors in the complex plane via a position-based rotation matrix, thereby providing the self-attention mechanism with relative positional information.

2. LongRoPE -> LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens (2402.13753)
Extends the context window of pre-trained LLMs to 2048k tokens, leveraging non-uniformities in positional interpolation with an efficient search.

3. LongRoPE2 -> LongRoPE2: Near-Lossless LLM Context Window Scaling (2502.20082)
Extends the effective context window of pre-trained LLMs to the target length, rescaling RoPE guided by “needle-driven” perplexity.

4. Multimodal RoPE (MRoPE) -> Qwen2.5-VL Technical Report (2502.13923)
Decomposes positional embedding into 3 components: temporal, height and width, so that positional features are aligned across modalities: text, images and videos.

5. Directional RoPE (DRoPE) -> DRoPE: Directional Rotary Position Embedding for Efficient Agent Interaction Modeling (2503.15029)
Adds an identity scalar, improving how angles are handled without extra complexity. It helps balance accuracy, speed, and memory usage.

6. VideoRoPE -> VideoRoPE: What Makes for Good Video Rotary Position Embedding? (2502.05173)
Adapts RoPE for video, featuring 3D structure, low-frequency temporal allocation, diagonal layout, and adjustable spacing.

7. VRoPE -> VRoPE: Rotary Position Embedding for Video Large Language Models (2502.11664)
Another RoPE variant for video, which restructures positional indices and balances the encoding for uniform spatial focus.

8. XPos (Extrapolatable Position Embedding) -> https://huggingface.co/papers/2212.10
Introduces an exponential decay factor into the rotation matrix, improving stability on long sequences.
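LongRoPE and LongRoPE2 both extend context by rescaling RoPE's rotation angles so that positions beyond the training length map back into a range the model has seen. The sketch below shows only that mechanism: uniform rescaling (plain linear position interpolation) and a per-dimension, non-uniform variant. The rescale factors are made-up placeholders; LongRoPE finds its factors with a search procedure that is not reproduced here.

```python
import math
from typing import List, Optional


def rope_angles(pos: float, d: int, base: float = 10000.0,
                scale: Optional[List[float]] = None) -> List[float]:
    """Rotation angle of each of the d/2 dimension pairs at position `pos`.

    scale[i] > 1 slows pair i down, so a far position is mapped to the angle
    of a nearer one. All-equal factors give linear position interpolation.
    """
    factors = scale if scale is not None else [1.0] * (d // 2)
    return [pos * base ** (-2 * i / d) / factors[i] for i in range(d // 2)]


d, trained_len, target_len = 8, 2048, 8192
s = target_len / trained_len  # 4.0

# Uniform rescaling: position 8192 now rotates exactly like position 2048 did.
uniform = rope_angles(target_len, d, scale=[s] * (d // 2))
print(all(math.isclose(a, b) for a, b in zip(uniform, rope_angles(trained_len, d))))  # True

# Non-uniform rescaling: a different factor per dimension pair (made-up values).
non_uniform = rope_angles(target_len, d, scale=[1.0, 1.5, 3.0, 4.0])
```

Uniform interpolation compresses every frequency equally; the non-uniform variant lets each dimension pair be stretched by a different amount, which is the degree of freedom LongRoPE searches over.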