- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 26
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 13
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 42
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 22

Collections including paper arxiv:2502.07864

- CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation
  Paper • 2502.08639 • Published • 36
- TransMLA: Multi-head Latent Attention Is All You Need
  Paper • 2502.07864 • Published • 44
- Next Block Prediction: Video Generation via Semi-Autoregressive Modeling
  Paper • 2502.07737 • Published • 9
- Enhance-A-Video: Better Generated Video for Free
  Paper • 2502.07508 • Published • 18

- InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU
  Paper • 2502.08910 • Published • 141
- TransMLA: Multi-head Latent Attention Is All You Need
  Paper • 2502.07864 • Published • 44
- LM2: Large Memory Models
  Paper • 2502.06049 • Published • 29
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
  Paper • 2502.11089 • Published • 134

- MLLM-as-a-Judge for Image Safety without Human Labeling
  Paper • 2501.00192 • Published • 25
- 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
  Paper • 2501.00958 • Published • 99
- Xmodel-2 Technical Report
  Paper • 2412.19638 • Published • 26
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
  Paper • 2412.18925 • Published • 97

- deepseek-ai/DeepSeek-V3-Base
  Updated • 488k • 1.58k
- TransMLA: Multi-head Latent Attention Is All You Need
  Paper • 2502.07864 • Published • 44
- Qwen2.5 Bakeneko 32b Instruct Awq
  ⚡ Generate text-based responses for chat interactions • 2
- Deepseek R1 Distill Qwen2.5 Bakeneko 32b Awq
  ⚡ Generate detailed responses based on user queries • 2

- Byte Latent Transformer: Patches Scale Better Than Tokens
  Paper • 2412.09871 • Published • 92
- Causal Diffusion Transformers for Generative Modeling
  Paper • 2412.12095 • Published • 23
- Tensor Product Attention Is All You Need
  Paper • 2501.06425 • Published • 84
- TransMLA: Multi-head Latent Attention Is All You Need
  Paper • 2502.07864 • Published • 44

- Selective Attention Improves Transformer
  Paper • 2410.02703 • Published • 24
- Differential Transformer
  Paper • 2410.05258 • Published • 171
- TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
  Paper • 2410.05076 • Published • 8
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
  Paper • 2410.13276 • Published • 27