- No More Adam: Learning Rate Scaling at Initialization is All You Need
  Paper • 2412.11768 • Published • 44
- TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
  Paper • 2412.14161 • Published • 52
- HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments
  Paper • 2408.10945 • Published • 11
- PDFTriage: Question Answering over Long, Structured Documents
  Paper • 2309.08872 • Published • 54

Collections including paper arxiv:2404.07143

- STaR: Bootstrapping Reasoning With Reasoning
  Paper • 2203.14465 • Published • 8
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 55
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
  Paper • 2405.04434 • Published • 21
- Prompt Cache: Modular Attention Reuse for Low-Latency Inference
  Paper • 2311.04934 • Published • 33

- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 143
- Elucidating the Design Space of Diffusion-Based Generative Models
  Paper • 2206.00364 • Published • 16
- GLU Variants Improve Transformer
  Paper • 2002.05202 • Published • 3
- StarCoder 2 and The Stack v2: The Next Generation
  Paper • 2402.19173 • Published • 144

- LLoCO: Learning Long Contexts Offline
  Paper • 2404.07979 • Published • 23
- LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
  Paper • 2402.13753 • Published • 116
- LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
  Paper • 2402.11550 • Published • 18
- LongAlign: A Recipe for Long Context Alignment of Large Language Models
  Paper • 2401.18058 • Published • 23

- Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
  Paper • 2404.08801 • Published • 68
- TransformerFAM: Feedback attention is working memory
  Paper • 2404.09173 • Published • 44
- Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
  Paper • 2404.07143 • Published • 110
- Block Transformer: Global-to-Local Language Modeling for Fast Inference
  Paper • 2406.02657 • Published • 41

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 615
- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 101
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 106
- TransformerFAM: Feedback attention is working memory
  Paper • 2404.09173 • Published • 44