Chain Of Thought 📈 Generate detailed, step-by-step responses to queries using AI • 19
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters Paper • 2408.04093 • Published Aug 7, 2024 • 4
DAiSEE: Towards User Engagement Recognition in the Wild Paper • 1609.01885 • Published Sep 7, 2016
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence Paper • 2404.05892 • Published Apr 8, 2024 • 34
Comparative Study of Large Language Model Architectures on Frontier Paper • 2402.00691 • Published Feb 1, 2024
Simple and Scalable Strategies to Continually Pre-train Large Language Models Paper • 2403.08763 • Published Mar 13, 2024 • 50
RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network Paper • 2206.14098 • Published Jun 28, 2022
SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models Paper • 2303.10464 • Published Mar 18, 2023 • 1
Sparse Iso-FLOP Transformations for Maximizing Training Efficiency Paper • 2303.11525 • Published Mar 21, 2023 • 1
Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment Paper • 2405.03594 • Published May 6, 2024 • 7