admarcosai's Collections
Model Architectures
togethercomputer/StripedHyena-Hessian-7B
Text Generation • Updated • 74 • 65
Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention
Paper • 2312.08618 • Published • 12
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
Paper • 2312.07987 • Published • 41
LLM360: Towards Fully Transparent Open-Source LLMs
Paper • 2312.06550 • Published • 58
Cached Transformers: Improving Transformers with Differentiable Memory Cache
Paper • 2312.12742 • Published • 13
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Paper • 2312.15166 • Published • 57
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
Paper • 2401.04658 • Published • 27
Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens
Paper • 2401.17377 • Published • 36
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey
Paper • 2311.12351 • Published • 3
H2O-Danube-1.8B Technical Report
Paper • 2401.16818 • Published • 18
TinyLlama: An Open-Source Small Language Model
Paper • 2401.02385 • Published • 91
Learning and Leveraging World Models in Visual Representation Learning
Paper • 2403.00504 • Published • 32