Article • Fine-tuning LLMs to 1.58bit: extreme quantization made easy • By medmekk and 5 others • Published Sep 18, 2024
Paper • Hogwild! Inference: Parallel LLM Generation via Concurrent Attention • arXiv:2504.06261 • Published Apr 8, 2025
Paper • VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks • arXiv:2504.05118 • Published Apr 7, 2025
Collection • Qwen2.5-Coder: code-specific model series based on Qwen2.5 • 40 items • Updated Apr 28
Paper • BASS: Batched Attention-optimized Speculative Sampling • arXiv:2404.15778 • Published Apr 24, 2024