1 15 11

zijie tian

zijie-tian

https://zijie-tian.github.io

Zijie-Tian

AI & ML interests

Storage for AI

Recent Activity

upvoted an article 12 days ago

Unlocking Longer Generation with Key-Value Cache Quantization

liked a model about 1 month ago

Qwen/Qwen3-235B-A22B

upvoted a paper 3 months ago

Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

View all activity

Organizations

upvoted an article 12 days ago

Article

Unlocking Longer Generation with Key-Value Cache Quantization

•

May 16, 2024

• 50

liked a model about 1 month ago

Qwen/Qwen3-235B-A22B

Text Generation • 235B • Updated Jul 26 • 134k • • 1.03k

upvoted 2 papers 3 months ago

Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

Paper • 2505.24726 • Published May 30 • 270

Chain-of-Model Learning for Language Model

Paper • 2505.11820 • Published May 17 • 122

New activity in deepseek-ai/DeepSeek-R1-0528 3 months ago

求一个swe bench verified 跑分

➕ 1

#38 opened 3 months ago by

terryguo616

upvoted a paper 3 months ago

PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs

Paper • 2410.05265 • Published Oct 7, 2024 • 33

upvoted 2 papers 5 months ago

LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation

Paper • 2503.19950 • Published Mar 25 • 12

Task-KV: Task-aware KV Cache Optimization via Semantic Differentiation of Attention Heads

Paper • 2501.15113 • Published Jan 25 • 1

liked 2 models 6 months ago

ChenMnZ/Llama-2-7b-EfficientQAT-w2g64-GPTQ

Text Generation • 0.8B • Updated Jul 22, 2024 • 3 • 1

Qwen/QwQ-32B

Text Generation • 33B • Updated Mar 11 • 144k • • 2.83k

upvoted an article 8 months ago

Article

Fast, High-Fidelity LLM Decoding with Regex Constraints

•

Feb 23, 2024

• 9

upvoted a paper 8 months ago

TinyLlama: An Open-Source Small Language Model

Paper • 2401.02385 • Published Jan 4, 2024 • 95

liked a dataset 9 months ago

mit-han-lab/pile-val-backup

Viewer • Updated Aug 21, 2023 • 215k • 27.2k • 24

upvoted a paper 10 months ago

InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference

Paper • 2409.04992 • Published Sep 8, 2024 • 2

liked a Space 10 months ago

220

paper-central

⚡

Browse and chat about research papers

upvoted 2 papers 10 months ago

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

Paper • 2409.10516 • Published Sep 16, 2024 • 44

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27, 2024 • 624

upvoted a paper 11 months ago

Selective Attention Improves Transformer

Paper • 2410.02703 • Published Oct 3, 2024 • 24

liked a model 11 months ago

openbmb/MiniCPM3-4B

Text Generation • Updated Feb 27 • 18.7k • 417

upvoted an article 11 months ago

Article

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

and 5 others •

Sep 18, 2024

• 265

zijie tian

AI & ML interests

Recent Activity

Organizations

zijie-tian's activity

Unlocking Longer Generation with Key-Value Cache Quantization

求一个swe bench verified 跑分

Fast, High-Fidelity LLM Decoding with Regex Constraints

paper-central

Fine-tuning LLMs to 1.58bit: extreme quantization made easy