Asankhaya Sharma PRO

codelion

AI & ML interests

Creator of OptiLLM, OpenEvolve, Adaptive Classifier, and PTS. Pioneering a new category in AI infrastructure: inference-time compute for LLMs.

Organizations

meraGPT, Lambda Security, National University of Singapore, Patched, ZeroGPU Explorers, MLX Community, Social Post Explorers, Hugging Face Discord Community, Adaptive Classifier, Reasoning datasets competition, Cerebras Hugging Face Hackathon, Agents-MCP-Hackathon

codelion's activity

reacted to their post with ❤️🚀🔥 3 days ago
New Research: Theoretical Foundations for In-Context Learning in Transformers

I'm excited to share our latest theoretical work that formally proves an interesting property of large language models: base transformer models can approximate fine-tuned capabilities using only inference-time techniques like in-context learning.

The core question we investigated: Can specialized behaviors typically acquired through expensive supervised fine-tuning be elicited from base models without any parameter updates?

Our theoretical contribution: We provide a formal proof, grounded in the Turing completeness of transformers, showing that this is indeed possible under certain assumptions. The work establishes mathematical bounds on the minimal dataset sizes needed for approximation.

Key theoretical results:

- For text generation tasks: O(mV/ε²) examples suffice (where m = number of contexts, V = vocabulary size, ε = error tolerance)
- For linear classification: O(d/ε) examples (where d = input dimension)
- Extensions to finite context scenarios with practical bounds
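
To get a feel for how these bounds scale, here is a minimal Python sketch that plugs numbers into the two expressions above. Big-O notation hides constant factors, so the constant `c` (set to 1 by default) is purely an assumption for illustration, and the function names are hypothetical. The outputs show relative growth, not the actual number of examples the paper would prescribe.

```python
import math


def text_generation_examples(m: int, V: int, eps: float, c: float = 1.0) -> int:
    """Illustrative value of the O(m * V / eps^2) bound.

    m   = number of contexts
    V   = vocabulary size
    eps = error tolerance
    c   = hidden constant factor (ASSUMED to be 1; not given in the post)
    """
    return math.ceil(c * m * V / eps**2)


def linear_classification_examples(d: int, eps: float, c: float = 1.0) -> int:
    """Illustrative value of the O(d / eps) bound.

    d   = input dimension
    eps = error tolerance
    c   = hidden constant factor (ASSUMED to be 1; not given in the post)
    """
    return math.ceil(c * d / eps)


if __name__ == "__main__":
    # Halving eps quadruples the text-generation bound (1/eps^2 dependence)
    print(text_generation_examples(m=10, V=1000, eps=0.5))    # 40000
    print(text_generation_examples(m=10, V=1000, eps=0.25))   # 160000

    # Halving eps only doubles the linear-classification bound (1/eps dependence)
    print(linear_classification_examples(d=100, eps=0.5))     # 200
    print(linear_classification_examples(d=100, eps=0.25))    # 400
```

The point of the comparison is the dependence on ε: tightening the error tolerance is quadratically more expensive for text generation than for linear classification under these bounds.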

This work helps explain why techniques like few-shot prompting, retrieval-augmented generation, and in-context learning work so effectively in practice. It bridges formal computer science theory with empirical observations about modern language models.
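
As a concrete picture of what "inference-time adaptation" looks like, here is a minimal sketch of few-shot prompt construction: labeled examples are placed in the context window and the base model is asked to continue the pattern, with no parameter updates. The helper name and the `Input:`/`Output:` template are hypothetical choices for illustration, not a format taken from the paper.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Assemble an in-context learning prompt from (input, output) pairs.

    The "Input:/Output:" template is an ASSUMED format for illustration;
    any consistent template could be used instead.
    """
    # Each demonstration shows the model one solved instance of the task
    blocks = [f"Input: {inp}\nOutput: {out}" for inp, out in examples]
    # The final block leaves the output empty for the model to complete
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demos = [("great movie", "positive"), ("waste of time", "negative")]
    print(build_few_shot_prompt(demos, "loved every minute"))
```

The resulting string would be sent to a base model as-is. In the post's framing, the number of demonstrations needed is what the bounds above try to characterize.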

While the assumptions are idealized (unbounded computational resources, full dataset access), the results provide mathematical foundations for understanding inference-time adaptation strategies that are increasingly important in AI deployment.

Paper: Eliciting Fine-Tuned Transformer Capabilities via Inference-Time Techniques (2506.08060), https://huggingface.co/papers/2506.08060