Tongyao (tyzhu) · PRO
AI & ML interests: Natural Language Processing
Organizations: None yet
knowledge
- How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?
  Paper • 2502.14502 • Published • 91
- From RAG to Memory: Non-Parametric Continual Learning for Large Language Models
  Paper • 2502.14802 • Published • 13
- When an LLM is apprehensive about its answers -- and when its uncertainty is justified
  Paper • 2503.01688 • Published • 21
IR
multilingual
long-context
pretraining
- Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models
  Paper • 2502.15499 • Published • 14
- MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs
  Paper • 2502.17422 • Published • 7
- The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
  Paper • 2502.17535 • Published • 8
- Scaling LLM Pre-training with Vocabulary Curriculum
  Paper • 2502.17910 • Published • 1
reasoning
- Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
  Paper • 2502.19361 • Published • 28
- Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning
  Paper • 2502.17407 • Published • 26
- Small Models Struggle to Learn from Strong Reasoners
  Paper • 2502.12143 • Published • 39
- Language Models can Self-Improve at State-Value Estimation for Better Search
  Paper • 2503.02878 • Published • 10
daily-papers
- RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
  Paper • 2409.10516 • Published • 44
- Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse
  Paper • 2409.11242 • Published • 7
- Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models
  Paper • 2409.11136 • Published • 25
- On the Diagram of Thought
  Paper • 2409.10038 • Published • 14
multimodal