Sebastian Gabarain

Locutusque

SebastianG74019

AI & ML interests

Pushing performance in small language models

Recent Activity

updated a Space about 1 month ago

Locutusque/Locutusque-Models

updated a model about 1 month ago

Locutusque/Liberalis-Cogitator-Mistral-3-8B

published a model about 1 month ago

Locutusque/Liberalis-Cogitator-Mistral-3-8B

Organizations

upvoted a changelog about 1 month ago

Changelog

Duplicate Datasets

Dec 3, 2025 • 91

upvoted a collection about 1 month ago

Ministral 3

Collection

A collection of edge models, with Base, Instruct and Reasoning variants, in 3 different sizes: 3B, 8B and 14B. All with vision capabilities. • 9 items • Updated Dec 2, 2025 • 139

upvoted a paper about 2 months ago

Higher-order Linear Attention

Paper • 2510.27258 • Published Oct 31, 2025 • 14

upvoted a paper 2 months ago

Kimi Linear: An Expressive, Efficient Attention Architecture

Paper • 2510.26692 • Published Oct 30, 2025 • 119

upvoted 4 papers 3 months ago

Sparse Query Attention (SQA): A Computationally Efficient Attention Mechanism with Query Heads Reduction

Paper • 2510.01817 • Published Oct 2, 2025 • 15

upvoted a paper 8 months ago

Achieving Tokenizer Flexibility in Language Models through Heuristic Adaptation and Supertoken Learning

Paper • 2505.09738 • Published May 14, 2025 • 10

upvoted a paper about 1 year ago

Cut Your Losses in Large-Vocabulary Language Models

Paper • 2411.09009 • Published Nov 13, 2024 • 49

upvoted 3 papers over 1 year ago

InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning

Paper • 2409.12568 • Published Sep 19, 2024 • 50

Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On

Paper • 2407.08348 • Published Jul 11, 2024 • 52

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

Paper • 2309.03883 • Published Sep 7, 2023 • 35

upvoted an article over 1 year ago

Article

Uncensor any LLM with abliteration

Jun 13, 2024 • 757

upvoted 3 papers over 1 year ago

MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series

Paper • 2405.19327 • Published May 29, 2024 • 48

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16, 2024 • 132

LoRA Learns Less and Forgets Less

Paper • 2405.09673 • Published May 15, 2024 • 90

upvoted a collection over 1 year ago

Yi-1.5 (2024/05)

Collection

10 items • Updated May 20, 2024 • 92

upvoted 2 articles over 1 year ago

Article

Can we create pedagogically valuable multi-turn synthetic datasets from Cosmopedia?

May 7, 2024

Article

Introducing the Open Chain of Thought Leaderboard

Apr 23, 2024
