- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 26
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 13
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 42
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 22

Collections including paper arxiv:2502.07864

- CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation
  Paper • 2502.08639 • Published • 36
- TransMLA: Multi-head Latent Attention Is All You Need
  Paper • 2502.07864 • Published • 44
- Next Block Prediction: Video Generation via Semi-Autoregressive Modeling
  Paper • 2502.07737 • Published • 9
- Enhance-A-Video: Better Generated Video for Free
  Paper • 2502.07508 • Published • 18

- InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU
  Paper • 2502.08910 • Published • 141
- TransMLA: Multi-head Latent Attention Is All You Need
  Paper • 2502.07864 • Published • 44
- LM2: Large Memory Models
  Paper • 2502.06049 • Published • 29
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
  Paper • 2502.11089 • Published • 134

- MLLM-as-a-Judge for Image Safety without Human Labeling
  Paper • 2501.00192 • Published • 25
- 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
  Paper • 2501.00958 • Published • 99
- Xmodel-2 Technical Report
  Paper • 2412.19638 • Published • 26
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
  Paper • 2412.18925 • Published • 97

- deepseek-ai/DeepSeek-V3-Base
  Updated • 488k • 1.58k
- TransMLA: Multi-head Latent Attention Is All You Need
  Paper • 2502.07864 • Published • 44
- Qwen2.5 Bakeneko 32b Instruct Awq
  ⚡ Generate text-based responses for chat interactions • 2
- Deepseek R1 Distill Qwen2.5 Bakeneko 32b Awq
  ⚡ Generate detailed responses based on user queries • 2

- Byte Latent Transformer: Patches Scale Better Than Tokens
  Paper • 2412.09871 • Published • 92
- Causal Diffusion Transformers for Generative Modeling
  Paper • 2412.12095 • Published • 23
- Tensor Product Attention Is All You Need
  Paper • 2501.06425 • Published • 84
- TransMLA: Multi-head Latent Attention Is All You Need
  Paper • 2502.07864 • Published • 44

- Selective Attention Improves Transformer
  Paper • 2410.02703 • Published • 24
- Differential Transformer
  Paper • 2410.05258 • Published • 171
- TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
  Paper • 2410.05076 • Published • 8
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
  Paper • 2410.13276 • Published • 27