uhlo's Collections
interesting stuff
Chain-of-Verification Reduces Hallucination in Large Language Models
Paper
•
2309.11495
•
Published
•
38
Adapting Large Language Models via Reading Comprehension
Paper
•
2309.09530
•
Published
•
77
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
Paper
•
2309.09400
•
Published
•
82
Language Modeling Is Compression
Paper
•
2309.10668
•
Published
•
82
Contrastive Decoding Improves Reasoning in Large Language Models
Paper
•
2309.09117
•
Published
•
37
Exploring Large Language Models' Cognitive Moral Development through Defining Issues Test
Paper
•
2309.13356
•
Published
•
36
Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
Paper
•
2309.15098
•
Published
•
7
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
Paper
•
2309.14717
•
Published
•
44
Paper
•
2309.16609
•
Published
•
34
Effective Long-Context Scaling of Foundation Models
Paper
•
2309.16039
•
Published
•
30
Large Language Models Cannot Self-Correct Reasoning Yet
Paper
•
2310.01798
•
Published
•
33
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
Paper
•
2310.03714
•
Published
•
30
Table-GPT: Table-tuned GPT for Diverse Table Tasks
Paper
•
2310.09263
•
Published
•
39
BitNet: Scaling 1-bit Transformers for Large Language Models
Paper
•
2310.11453
•
Published
•
96
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Paper
•
2310.11511
•
Published
•
74
H2O Open Ecosystem for State-of-the-art Large Language Models
Paper
•
2310.13012
•
Published
•
7
LLM-FP4: 4-Bit Floating-Point Quantized Transformers
Paper
•
2310.16836
•
Published
•
13
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
Paper
•
2310.16795
•
Published
•
26
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
Paper
•
2310.17157
•
Published
•
11
CodeFusion: A Pre-trained Diffusion Model for Code Generation
Paper
•
2310.17680
•
Published
•
69
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
Paper
•
2310.19102
•
Published
•
10
LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery
Paper
•
2310.18356
•
Published
•
22
Does GPT-4 Pass the Turing Test?
Paper
•
2310.20216
•
Published
•
17
CodePlan: Repository-level Coding using LLMs and Planning
Paper
•
2309.12499
•
Published
•
73
The Generative AI Paradox: "What It Can Create, It May Not Understand"
Paper
•
2311.00059
•
Published
•
18
E3 TTS: Easy End-to-End Diffusion-based Text to Speech
Paper
•
2311.00945
•
Published
•
14
Unveiling Safety Vulnerabilities of Large Language Models
Paper
•
2311.04124
•
Published
•
6
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks
Paper
•
2311.07463
•
Published
•
13
Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers
Paper
•
2311.10642
•
Published
•
23
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
Paper
•
2311.13600
•
Published
•
42
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
Paper
•
2311.03099
•
Published
•
28
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis
Paper
•
2312.03491
•
Published
•
34
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
Paper
•
2312.04474
•
Published
•
29
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Paper
•
2312.03818
•
Published
•
32
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
Paper
•
2401.02994
•
Published
•
47
Paper
•
2401.04088
•
Published
•
157
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
Paper
•
2401.04658
•
Published
•
24
The Impact of Reasoning Step Length on Large Language Models
Paper
•
2401.04925
•
Published
•
15
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models
Paper
•
2401.06951
•
Published
•
24
Extending LLMs' Context Window with 100 Samples
Paper
•
2401.07004
•
Published
•
14
Self-Rewarding Language Models
Paper
•
2401.10020
•
Published
•
143
Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation
Paper
•
2401.10838
•
Published
•
8
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text
Paper
•
2401.12070
•
Published
•
42
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
Paper
•
2401.14196
•
Published
•
46
jinaai/jina-embeddings-v2-base-de
Feature Extraction
•
Updated
•
27.4k
•
69
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Paper
•
2401.15024
•
Published
•
68
Weaver: Foundation Models for Creative Writing
Paper
•
2401.17268
•
Published
•
42
TrustLLM: Trustworthiness in Large Language Models
Paper
•
2401.05561
•
Published
•
64
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
Paper
•
2402.01739
•
Published
•
26
Shortened LLaMA: A Simple Depth Pruning for Large Language Models
Paper
•
2402.02834
•
Published
•
14
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Paper
•
2402.03620
•
Published
•
109
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
Paper
•
2402.04291
•
Published
•
48
Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning
Paper
•
2402.06619
•
Published
•
54
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
Paper
•
2402.07827
•
Published
•
45
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
Paper
•
2402.07456
•
Published
•
41
Scaling Laws for Fine-Grained Mixture of Experts
Paper
•
2402.07871
•
Published
•
11
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Paper
•
2401.01335
•
Published
•
64
Computing Power and the Governance of Artificial Intelligence
Paper
•
2402.08797
•
Published
•
11
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
Paper
•
2402.09727
•
Published
•
35
BitDelta: Your Fine-Tune May Only Be Worth One Bit
Paper
•
2402.10193
•
Published
•
17
Chain-of-Thought Reasoning Without Prompting
Paper
•
2402.10200
•
Published
•
99
How to Train Data-Efficient LLMs
Paper
•
2402.09668
•
Published
•
38
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
Paper
•
2402.10379
•
Published
•
29
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Paper
•
2402.12226
•
Published
•
40
OneBit: Towards Extremely Low-bit Large Language Models
Paper
•
2402.11295
•
Published
•
22
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper
•
2402.13249
•
Published
•
10
Coercing LLMs to do and reveal (almost) anything
Paper
•
2402.14020
•
Published
•
12
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Paper
•
2402.13753
•
Published
•
111
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Paper
•
2402.14905
•
Published
•
126
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper
•
2402.17764
•
Published
•
602
GPTVQ: The Blessing of Dimensionality for LLM Quantization
Paper
•
2402.15319
•
Published
•
19
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
Paper
•
2403.03853
•
Published
•
62
MoAI: Mixture of All Intelligence for Large Language and Vision Models
Paper
•
2403.07508
•
Published
•
75
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
Paper
•
2401.15391
•
Published
•
6
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
Paper
•
2403.12895
•
Published
•
30
RAFT: Adapting Language Model to Domain Specific RAG
Paper
•
2403.10131
•
Published
•
67
LLM Agent Operating System
Paper
•
2403.16971
•
Published
•
65
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
Paper
•
2403.15447
•
Published
•
16
SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions
Paper
•
2403.16627
•
Published
•
20
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation
Paper
•
2403.05313
•
Published
•
9
The Llama 3 Herd of Models
Paper
•
2407.21783
•
Published
•
105
SAM 2: Segment Anything in Images and Videos
Paper
•
2408.00714
•
Published
•
107
Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model
Paper
•
2408.00754
•
Published
•
21
Gemma 2: Improving Open Language Models at a Practical Size
Paper
•
2408.00118
•
Published
•
73
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
Paper
•
2408.07055
•
Published
•
65
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper
•
2408.11796
•
Published
•
53
Automated Design of Agentic Systems
Paper
•
2408.08435
•
Published
•
38
ColPali: Efficient Document Retrieval with Vision Language Models
Paper
•
2407.01449
•
Published
•
41
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models
Paper
•
2408.15518
•
Published
•
42
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Paper
•
2408.15998
•
Published
•
83
Configurable Foundation Models: Building LLMs from a Modular Perspective
Paper
•
2409.02877
•
Published
•
27
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Paper
•
2409.17146
•
Published
•
99
Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models
Paper
•
2409.18943
•
Published
•
26
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models
Paper
•
2409.17066
•
Published
•
27
Paper
•
2410.05258
•
Published
•
165
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
Paper
•
2410.02707
•
Published
•
48
Paper
•
2410.07073
•
Published
•
59
Falcon Mamba: The First Competitive Attention-free 7B Language Model
Paper
•
2410.05355
•
Published
•
27
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free
Paper
•
2410.10814
•
Published
•
48
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
Paper
•
2410.13848
•
Published
•
27
Why Does the Effective Context Length of LLMs Fall Short?
Paper
•
2410.18745
•
Published
•
16
Continuous Speech Synthesis using per-token Latent Diffusion
Paper
•
2410.16048
•
Published
•
28
ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting
Paper
•
2410.17856
•
Published
•
48
Paper
•
2410.21276
•
Published
•
76
Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction
Paper
•
2410.21169
•
Published
•
29
Stealing User Prompts from Mixture of Experts
Paper
•
2410.22884
•
Published
•
13
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Paper
•
2410.23168
•
Published
•
19