Eldar Kurtić

ekurtic

AI & ML interests

Efficient inference

Recent Activity

new activity about 1 month ago

mistralai/Mistral-Large-3-675B-Instruct-2512: Fix broken links for eagle

updated a model about 2 months ago

daslab-testing/Llama-3.1-70B-Instruct-spinquantR1R2R4-nvfp4a16

updated a model about 2 months ago

daslab-testing/Llama-3.1-8B-Instruct-spinquantR1R2R4-nvfp4a16

authored 5 papers about 1 year ago

Error Feedback Can Accurately Compress Preconditioners

Paper • 2306.06098 • Published Jun 9, 2023

MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence

Paper • 2405.15593 • Published May 24, 2024 • 1

Panza: A Personalized Text Writing Assistant via Data Playback and Local Fine-Tuning

Paper • 2407.10994 • Published Jun 24, 2024 • 2

"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization

Paper • 2411.02355 • Published Nov 4, 2024 • 51

EvoPress: Towards Optimal Dynamic Model Compression via Evolutionary Search

Paper • 2410.14649 • Published Oct 18, 2024 • 8

authored 2 papers over 1 year ago

Accurate Neural Network Pruning Requires Rethinking Sparse Optimization

Paper • 2308.02060 • Published Aug 3, 2023 • 1

Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment

Paper • 2405.03594 • Published May 6, 2024 • 7

authored 7 papers about 2 years ago

The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models

Paper • 2203.07259 • Published Mar 14, 2022 • 4

ZipLM: Hardware-Aware Structured Pruning of Language Models

Paper • 2302.04089 • Published Feb 7, 2023 • 1

CrAM: A Compression-Aware Minimizer

Paper • 2207.14200 • Published Jul 28, 2022 • 1

SparseProp: Efficient Sparse Backpropagation for Faster Training of Neural Networks

Paper • 2302.04852 • Published Feb 9, 2023

GMP*: Well-Tuned Gradual Magnitude Pruning Can Outperform Most BERT-Pruning Methods

Paper • 2210.06384 • Published Oct 12, 2022 • 1

M-FAC: Efficient Matrix-Free Approximations of Second-Order Information

Paper • 2107.03356 • Published Jul 7, 2021

Sparse Finetuning for Inference Acceleration of Large Language Models

Paper • 2310.06927 • Published Oct 10, 2023 • 15