Asankhaya Sharma (codelion) · PRO
98 followers · 14 following · http://asankhaya.github.io/
AI & ML interests
Creator of OptiLLM, OpenEvolve, Adaptive Classifier, and PTS. Pioneering a new category in AI infrastructure: inference-time compute for LLMs.
Recent Activity
liked a model 11 minutes ago: google/gemma-3-1b-it
reacted to their post with ❤️ 3 days ago:
New Research: Theoretical Foundations for In-Context Learning in Transformers

I'm excited to share our latest theoretical work that formally proves an interesting property of large language models: base transformer models can approximate fine-tuned capabilities using only inference-time techniques like in-context learning.

The core question we investigated: Can specialized behaviors typically acquired through expensive supervised fine-tuning be elicited from base models without any parameter updates?

Our theoretical contribution: We provide a formal proof, grounded in the Turing completeness of transformers, showing that this is indeed possible under certain assumptions. The work establishes mathematical bounds on the minimal dataset sizes needed for approximation.

Key theoretical results:
- For text generation tasks: O(mV/ε²) examples suffice (where m = number of contexts, V = vocabulary size, ε = error tolerance)
- For linear classification: O(d/ε) examples (where d = input dimension)
- Extensions to finite context scenarios with practical bounds

This work helps explain why techniques like few-shot prompting, retrieval-augmented generation, and in-context learning work so effectively in practice. It bridges formal computer science theory with empirical observations about modern language models.

While the assumptions are idealized (unbounded computational resources, full dataset access), the results provide mathematical foundations for understanding inference-time adaptation strategies that are increasingly important in AI deployment.

Paper: https://huggingface.co/papers/2506.08060
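For rough scale intuition, here is a minimal sketch that plugs illustrative values into the two bounds quoted above, treating the hidden big-O constants as 1; the values of m, V, d, and ε below are hypothetical examples, not numbers taken from the paper.

```python
# Scale intuition for the sample-complexity bounds quoted above.
# The big-O constants are unknown and assumed to be 1 purely for
# illustration; all concrete values are hypothetical examples,
# not numbers from the paper.

def text_generation_examples(m: int, V: int, eps: float) -> float:
    """O(m * V / eps**2): examples to approximate fine-tuned text generation."""
    return m * V / eps**2

def linear_classification_examples(d: int, eps: float) -> float:
    """O(d / eps): examples to approximate a fine-tuned linear classifier."""
    return d / eps

if __name__ == "__main__":
    # Hypothetical setting: 10 contexts, 50,000-token vocabulary, 5% error tolerance.
    print(f"text generation:       ~{text_generation_examples(10, 50_000, 0.05):,.0f} examples")
    # Hypothetical setting: 768-dimensional inputs, 5% error tolerance.
    print(f"linear classification: ~{linear_classification_examples(768, 0.05):,.0f} examples")
```

Note the quadratic dependence on 1/ε in the text-generation bound: in this idealized setting, tightening the error tolerance grows the required example count faster than increasing the vocabulary size or the number of contexts.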
reacted to their post with ➕ 3 days ago (same post as above)
codelion's activity
liked a model 11 minutes ago: google/gemma-3-1b-it (Text Generation · Updated Apr 4 · 2.13M downloads · 470 likes)
liked a model 5 days ago: unsloth/Magistral-Small-2506-unsloth-bnb-4bit (Text2Text Generation · Updated 5 days ago · 1.16k downloads · 1 like)
liked a model 7 days ago: mlx-community/Qwen3-0.6B-bf16 (Text Generation · Updated Apr 28 · 2.56k downloads · 5 likes)
liked a model 11 days ago: google/gemma-3-12b-it-qat-q4_0-gguf (Image-Text-to-Text · Updated Apr 11 · 96.8k downloads · 142 likes)
liked a model 19 days ago: ByteDance-Seed/BAGEL-7B-MoT (Any-to-Any · Updated 24 days ago · 11.6k downloads · 1.03k likes)
liked a dataset 22 days ago: MathArena/usamo_2025 (Viewer · Updated Apr 16 · 6 rows · 155 downloads · 1 like)
liked a model 28 days ago: black-forest-labs/FLUX.1-Fill-dev (Updated Nov 25, 2024 · 368k downloads · 769 likes)
liked a model about 1 month ago: codelion/DeepSeek-R1-Distill-Qwen-1.5B-PTS-DPO (Text Generation · Updated May 13 · 19 downloads · 2 likes)
liked 4 datasets about 1 month ago:
- codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts-dpo-pairs (Preview · Updated May 13 · 28 downloads · 1 like)
- codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts (Preview · Updated May 13 · 16 downloads · 1 like)
- codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts-steering-vectors (Preview · Updated May 13 · 293 downloads · 1 like)
- codelion/distilled-QwQ-32B-fineweb-edu (Preview · Updated Apr 13 · 10 downloads · 1 like)
liked a Space about 1 month ago: Videoanalysis 🏃 (Running) — Upload and analyze MP4 video to extract key frames and summary
liked 3 models about 1 month ago:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B (Text Generation · Updated Feb 24 · 1.24M downloads · 1.23k likes)
- codelion/Qwen3-0.6B-PTS-DPO (Text Generation · Updated May 12 · 12 downloads · 1 like)
- codelion/Qwen3-0.6B-PTS-DPO-LoRA (Updated May 7 · 1 like)
liked 3 datasets about 1 month ago:
- codelion/Qwen3-0.6B-pts-dpo-pairs (Viewer · Updated 27 days ago · 681 rows · 58 downloads · 2 likes)
- codelion/Qwen3-0.6B-pts-steering-vectors (Viewer · Updated 27 days ago · 1.38k rows · 199 downloads · 4 likes)
- codelion/Qwen3-0.6B-pts (Viewer · Updated 27 days ago · 1.38k rows · 56 downloads · 2 likes)
liked a model about 1 month ago: Qwen/Qwen3-0.6B (Text Generation · Updated 25 days ago · 942k downloads · 361 likes)