TransMamba: Flexibly Switching between Transformer and Mamba Paper • 2503.24067 • Published Mar 2025 • 13
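The premise in the title, a single block that can serve tokens through either an attention path or a Mamba-style SSM path, can be sketched as below. This is only an illustrative toy under our own assumptions: the paper's actual weight-sharing and switching scheme is more involved and is not shown, and the recurrent branch here is a plain gated linear scan, not a real Mamba mixer.

```python
import torch

class SwitchBlock(torch.nn.Module):
    """Toy hybrid block: attention for short inputs, a linear scan for long ones."""
    def __init__(self, d_model: int, n_heads: int = 8, switch_len: int = 2048):
        super().__init__()
        self.switch_len = switch_len
        self.attn = torch.nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Stand-in for a Mamba mixer: a gated linear recurrence (assumption).
        self.decay = torch.nn.Parameter(torch.zeros(d_model))
        self.proj = torch.nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.size(1) <= self.switch_len:      # short sequences: quadratic attention
            out, _ = self.attn(x, x, x, need_weights=False)
            return out
        a = torch.sigmoid(self.decay)         # long sequences: linear-time scan
        h, outs = torch.zeros_like(x[:, 0]), []
        for t in range(x.size(1)):
            h = a * h + x[:, t]               # h_t = a * h_{t-1} + x_t
            outs.append(h)
        return self.proj(torch.stack(outs, dim=1))
```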
MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use Paper • 2502.15872 • Published Feb 21 • 5
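Reading the title literally: plans are searched over rather than executed, and each plan step is grounded in symbols from the target repository. A minimal data-model sketch of that framing follows; all structures and names are our assumptions, not taken from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class PlanStep:
    intent: str            # natural-language description of the step
    symbols: list[str]     # repository functions/classes that could realize it

@dataclass
class Plan:
    steps: list[PlanStep] = field(default_factory=list)
    score: float = 0.0     # ranker score; no code is ever executed

def expand(plan: Plan, candidates: list[PlanStep]) -> list[Plan]:
    """Successor function for a best-first search over partial plans."""
    return [Plan(plan.steps + [c], plan.score) for c in candidates]
```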
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam Paper • 2502.17055 • Published Feb 24 • 17
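Stable-SPAM extends the SPAM line of optimizer stabilizers (spike-aware clipping plus momentum reset). Below is a hedged sketch of one such ingredient, clipping gradient elements that spike far above a bias-corrected running second moment; the threshold and decay values are assumptions, and the 4-bit quantization the paper actually targets is omitted.

```python
import torch

class SpikeClipper:
    """Clip gradient elements that spike above their running second moment."""
    def __init__(self, params, theta: float = 50.0, beta: float = 0.999):
        self.params = list(params)
        self.theta, self.beta, self.t = theta, beta, 0
        self.v = [torch.zeros_like(p) for p in self.params]

    @torch.no_grad()
    def clip(self):
        self.t += 1
        bias = 1 - self.beta ** self.t               # bias correction, as in Adam
        for p, v in zip(self.params, self.v):
            if p.grad is None:
                continue
            v.mul_(self.beta).addcmul_(p.grad, p.grad, value=1 - self.beta)
            limit = (self.theta * v / bias).sqrt()   # per-element spike threshold
            p.grad.copy_(torch.minimum(torch.maximum(p.grad, -limit), limit))
```

In a training loop one would call `clip()` just before `optimizer.step()`; the momentum-reset half of the recipe amounts to clearing the optimizer's state every few thousand steps.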
Slam Collection All resources for SpeechLMs from "Slamming: Training a Speech Language Model on One GPU in a Day". We provide the tokenizer, LM, and datasets • 6 items • Updated Feb 25 • 13
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization Paper • 2502.19261 • Published Feb 26 • 7
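The recipe in the title can be sketched directly: each expert of the sparse MoE is seeded with the pretrained dense FFN's weights, and a fraction of each expert's hidden units is re-initialized so the experts can diversify. The ratio, the init scale, and the choice of which units to reset below are illustrative assumptions, not the paper's exact scheme.

```python
import copy
import torch

def drop_upcycle(dense_ffn: torch.nn.Sequential, num_experts: int,
                 reinit_ratio: float = 0.5) -> torch.nn.ModuleList:
    """Seed each expert from the dense FFN, re-initializing a slice of it."""
    d_hidden = dense_ffn[0].out_features
    experts = torch.nn.ModuleList()
    for _ in range(num_experts):
        expert = copy.deepcopy(dense_ffn)            # inherit the dense weights
        idx = torch.randperm(d_hidden)[: int(reinit_ratio * d_hidden)]
        with torch.no_grad():                        # re-init the chosen units
            expert[0].weight[idx] = torch.randn(len(idx), expert[0].in_features) * 0.02
            expert[2].weight[:, idx] = torch.randn(expert[2].out_features, len(idx)) * 0.02
        experts.append(expert)
    return experts

# Example: upcycle an (up-proj, GELU, down-proj) FFN into 8 experts.
ffn = torch.nn.Sequential(torch.nn.Linear(512, 2048), torch.nn.GELU(), torch.nn.Linear(2048, 512))
experts = drop_upcycle(ffn, num_experts=8)
```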
Continual Quantization-Aware Pre-Training: When to transition from 16-bit to 1.58-bit pre-training for BitNet language models? Paper • 2502.11895 • Published Feb 17 • 1
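For context on what "1.58-bit" means here: BitNet-style layers constrain weights to the ternary set {-1, 0, +1} (log2 3 ≈ 1.58 bits), typically via absmean quantization with a straight-through estimator. A minimal sketch of that quantization step follows; the wrapper class and names are ours, not the paper's code.

```python
import torch

def absmean_ternary(w: torch.Tensor) -> torch.Tensor:
    """Quantize weights to {-1, 0, +1} scaled by their mean absolute value."""
    scale = w.abs().mean().clamp(min=1e-8)
    w_q = (w / scale).round().clamp(-1, 1) * scale
    # Straight-through estimator: forward uses quantized weights, backward
    # passes gradients through as if the quantizer were the identity.
    return w + (w_q - w).detach()

class TernaryLinear(torch.nn.Linear):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.linear(x, absmean_ternary(self.weight), self.bias)
```

Transitioning from 16-bit to 1.58-bit pre-training then amounts to swapping `nn.Linear` layers for quantization-aware ones and continuing training; the paper's question is when in the schedule to make that swap.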
Hamanasu Collection A new series of models from yours truly, designed for intelligence, creativity, and roleplay • 31 items • Updated 2 days ago • 8
ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization Paper • 2502.02631 • Published Feb 4 • 3
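A scaling-law study across bit-widths needs a quantizer parameterized by bit count so that runs at different precisions are comparable. Below is a generic symmetric fake-quantizer with a straight-through estimator as an assumed stand-in; ParetoQ's own quantization functions and training recipe differ, and the ternary (1.58-bit) case is the absmean quantizer sketched above.

```python
import torch

def fake_quant(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric k-bit fake quantization with a straight-through estimator."""
    if bits == 1:                          # binary: {-s, +s}
        w_q = w.sign() * w.abs().mean()
    else:                                  # uniform grid with 2^(k-1) - 1 levels per side
        qmax = 2 ** (bits - 1) - 1
        scale = w.abs().max().clamp(min=1e-8) / qmax
        w_q = (w / scale).round().clamp(-qmax, qmax) * scale
    return w + (w_q - w).detach()          # STE backward

# Sweeping bits over {1, 2, 3, 4} while varying model size trades accuracy
# against memory, tracing the Pareto frontier the title refers to.
```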
Unlocking Efficient Large Inference Models: One-Bit Unrolling Tips the Scales Paper • 2502.01908 • Published Feb 4 • 1
QuEST: Stable Training of LLMs with 1-Bit Weights and Activations Paper • 2502.05003 • Published Feb 7 • 44
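The naive baseline for this setting is quantization-aware training with a straight-through estimator applied to both weights and activations, sketched below. QuEST's actual method refines both the forward quantizer and the gradient estimator; only the generic 1-bit baseline the title implies is shown here, and all names are illustrative.

```python
import torch

def binarize(t: torch.Tensor) -> torch.Tensor:
    """Map a tensor to {-s, +s}, with s preserving the mean magnitude."""
    scale = t.abs().mean().clamp(min=1e-8)
    t_q = t.sign() * scale
    return t + (t_q - t).detach()  # STE backward

class BinaryLinear(torch.nn.Linear):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Both activations and weights are binarized in the forward pass.
        return torch.nn.functional.linear(binarize(x), binarize(self.weight), self.bias)
```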
Why Does the Effective Context Length of LLMs Fall Short? Paper • 2410.18745 • Published Oct 24, 2024 • 18
Unbounded: A Generative Infinite Game of Character Life Simulation Paper • 2410.18975 • Published Oct 24, 2024 • 38