AI & ML interests

The SmolTuners group is a community dedicated to the development of small-scale Large Language Models (LLMs) using consumer-grade GPUs.

TonicΒ 
posted an update 2 days ago
TonicΒ 
posted an update 15 days ago
view post
Post
678
πŸ‘‹ Hey there folks,

just submitted my plugin idea to the G-Assist Plugin Hackathon by @nvidia . Check it out, it's a great way to use a local SLA model on a windows machine to easily and locally get things done ! https://github.com/NVIDIA/G-Assist
TonicΒ 
posted an update 17 days ago
view post
Post
527
πŸ™‹πŸ»β€β™‚οΈ Hey there folks ,

Yesterday , Nvidia released a reasoning model that beats o3 on science, math and coding !

Today you can try it out here : Tonic/Nvidia-OpenReasoning

hope you like it !
TonicΒ 
posted an update 23 days ago
view post
Post
3268
πŸ™‹πŸ»β€β™‚οΈ Normalize adding compute & runtime traces to your model cards
  • 2 replies
Β·
TonicΒ 
posted an update 29 days ago
view post
Post
488
Who's going to Raise Summit in Paris Tomorrow ?

If you're around , I would love to meet you :-)
NymboΒ 
posted an update about 1 month ago
view post
Post
2332
Anyone know how to reset Claude web's MCP config? I connected mine when the HF MCP first released with just the default example spaces added. I added lots of other MCP spaces but Claude.ai doesn't update the available tools... "Disconnecting" the HF integration does nothing, deleting it and adding it again does nothing.

Refreshing tools works fine in VS Code because I can manually restart it in mcp.json, but claude.ai has no such option. Anyone got any ideas?
Β·
TonicΒ 
posted an update 2 months ago
view post
Post
682
πŸ™‹πŸ»β€β™‚οΈ hey there folks ,

So every bio/med/chem meeting i go to i always the same questions "why are you sharing a gdrive link with me for this?" and "Do you have any plans to publish your model weights and datasets on huggingface?" and finally i got a good answer today which explains everything :

basically there is some kind of government censorship on this (usa, but i'm sure others too) and they are told they are not allowed as it is considered a "dataleak" which is illegal !!!!

this is terrible ! but the good news is that we can do something about it !

so there is this "call for opinions and comments" here from the NIH (usa) , and here we can make our opinion on this topic known : https://osp.od.nih.gov/comment-form-responsibly-developing-and-sharing-generative-artificial-intelligence-tools-using-nih-controlled-access-data/

kindly consider dropping your opinion and thoughts about this censorship of science , and share this post , link or thoughts widely .

Together maybe we can start to share data and model weights appropriately and openly in a good way πŸ™πŸ»πŸš€

cc. @cyrilzakka

TonicΒ 
posted an update 2 months ago
view post
Post
2536
πŸ™‹πŸ»β€β™‚οΈ Hey there folks ,

Yesterday the world's first "Learn to Vibe Code" application was released .

As vibe coding is the mainstream paradigm , so now the first educational app is there to support it .

You can try it out already :

https://vibe.takara.ai

and of course it's entirely open source, so i already made my issue and feature branch :-) πŸš€
NymboΒ 
posted an update 3 months ago
view post
Post
3823
Haven't seen this posted anywhere - Llama-3.3-8B-Instruct is available on the new Llama API. Is this a new model or did someone mislabel Llama-3.1-8B?
  • 1 reply
Β·
NymboΒ 
posted an update 3 months ago
view post
Post
2752
PSA for anyone using Nymbo/Nymbo_Theme or Nymbo/Nymbo_Theme_5 in a Gradio space ~

Both of these themes have been updated to fix some of the long-standing inconsistencies ever since the transition to Gradio v5. Textboxes are no longer bright green and in-line code is readable now! Both themes are now visually identical across versions.

If your space is already using one of these themes, you just need to restart your space to get the latest version. No code changes needed.
KnutJaegersbergΒ 
posted an update 3 months ago
view post
Post
1047
Mining LLM Pretraining Data: Topics, Skills, and Cognitive Patterns

Summary
The technical blog post details an analysis of pretraining data from various Large Language Models (LLMs) like GPT-2, Falcon, and Gemma2. Using text mining techniques including embeddings, clustering, and LLM-based annotation on datasets like OpenWebText, The Pile, and C4, the study identified key patterns.

Findings show the data is dominated by topics like Technology, Politics, Health, Business, and Culture, originating from diverse sources including web scrapes, academic papers, code repositories, and news media. The data reflects the work of professionals primarily in Journalism/Media, Content Creation, Analysis/Research, Academia, and Tech/Engineering. Consequently, LLMs learn corresponding skills (e.g., Research, Critical Thinking, Communication, Domain Expertise) and task representations (e.g., Analysis, Content Creation, Compliance).

The analysis also uncovered distinct writing styles, underlying cognitive frameworks (beliefs, frames, schemas, memes), and common cognitive biases (like Confirmation Bias) embedded in the data. LLM capability progression appears linked to data scale and task frequency, following a power law. The study concludes that LLMs are powerful data-driven simulators whose capabilities and limitations are shaped by the composition and inherent biases of their pretraining corpora, highlighting the importance of data understanding and curation.



https://huggingface.co/blog/KnutJaegersberg/mining-llm-pretraining-data
KnutJaegersbergΒ 
posted an update 3 months ago
view post
Post
2746
The Intelligence Curse

The document warns of the "intelligence curse," a potential consequence of advanced AI (AGI) where powerful entities lose their incentive to invest in people as AI automates work[cite: 13, 297]. This could lead to job displacement, reduced social mobility, and a concentration of power and wealth based on AI ownership, similar to the "resource curse" in resource-rich states[cite: 17, 18, 31, 329, 353]. To counter this, the authors propose averting AI catastrophes to prevent centralization, diffusing AI widely to keep humans economically relevant, and democratizing institutions to remain anchored to human needs[cite: 22, 23, 25, 35, 36, 37, 566].


https://intelligence-curse.ai/intelligence-curse.pdf
TonicΒ 
posted an update 5 months ago
view post
Post
1611
πŸ™‹πŸ»β€β™‚οΈHey there folks,

Did you know that you can use ModernBERT to detect model hallucinations ?

Check out the Demo : Tonic/hallucination-test

See here for Medical Context Demo : MultiTransformer/tonic-discharge-guard

check out the model from KRLabs : KRLabsOrg/lettucedect-large-modernbert-en-v1

and the library they kindly open sourced for it : https://github.com/KRLabsOrg/LettuceDetect

πŸ‘†πŸ»if you like this topic please contribute code upstream πŸš€

  • 2 replies
Β·
TonicΒ 
posted an update 5 months ago
view post
Post
857
Powered by KRLabsOrg/lettucedect-large-modernbert-en-v1 from KRLabsOrg.

Detect hallucinations in answers based on context and questions using ModernBERT with 8192-token context support!

### Model Details
- **Model Name**: [lettucedect-large-modernbert-en-v1]( KRLabsOrg/lettucedect-large-modernbert-en-v1)
- **Organization**: [KRLabsOrg]( KRLabsOrg )
- **Github**: [https://github.com/KRLabsOrg/LettuceDetect](https://github.com/KRLabsOrg/LettuceDetect)
- **Architecture**: ModernBERT (Large) with extended context support up to 8192 tokens
- **Task**: Token Classification / Hallucination Detection
- **Training Dataset**: [RagTruth]( wandb/RAGTruth-processed)
- **Language**: English
- **Capabilities**: Detects hallucinated spans in answers, provides confidence scores, and calculates average confidence across detected spans.

LettuceDetect excels at processing long documents to determine if an answer aligns with the provided context, making it a powerful tool for ensuring factual accuracy.
KnutJaegersbergΒ 
posted an update 5 months ago
KnutJaegersbergΒ 
posted an update 6 months ago
view post
Post
746
Mimicking Consciousness in LLMs: Ascending the Dimensions of Thought with Recurrent Processing

This blog post explores how **recurrent processing** can transform Large Language Models (LLMs) to mimic aspects of human thought by engaging in iterative feedback loops. Inspired by string theory, the post describes how LLMs can "ascend dimensions" of cognition, progressing through foundational cognitive loopsβ€”such as basic cognition, executive functions, and meta-cognitionβ€”before advancing into **world simulation**. In this stage, LLMs explore higher dimensions, perceiving non-linear time, simulating branching possibilities, and integrating multiple realities. The interaction between the **Generator** and **Reflective Compass** allows AI systems to refine their outputs iteratively, moving toward a **point attractor** where ideas become coherent and polished. While this process doesn't bestow true consciousness, it offers a compelling imitation of reflective and adaptive thinking, leading to smarter dialogue, enhanced creativity, and more robust problem-solving.

https://huggingface.co/blog/KnutJaegersberg/oscillatory-recurrence-for-llms
KnutJaegersbergΒ 
posted an update 6 months ago
view post
Post
2749
A Brief Survey of Associations Between Meta-Learning and General AI

The paper titled "A Brief Survey of Associations Between Meta-Learning and General AI" explores how meta-learning techniques can contribute to the development of Artificial General Intelligence (AGI). Here are the key points summarized:

1. General AI (AGI) and Meta-Learning:
- AGI aims to develop algorithms that can handle a wide variety of tasks, similar to human intelligence. Current AI systems excel at specific tasks but struggle with generalization to unseen tasks.
- Meta-learning or "learning to learn" improves model adaptation and generalization, allowing AI systems to tackle new tasks efficiently using prior experiences.

2. Neural Network Design in Meta-Learning:
- Techniques like Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks enable self-improvement and adaptability for deep models, supporting generalization across tasks.
- Highway networks and ResNet-style models use shortcuts for efficient backpropagation, allowing deeper models that can be used in meta-learning frameworks.

3. Coevolution:
- Coevolution involves the mutual evolution of multiple components, such as learners or task-solvers, to improve overall performance.
- Coevolution between learners enhances collaboration and competition within AI systems, while coevolution between tasks and solvers (e.g., POWERPLAY and AI-GA frameworks) pushes solvers to adapt to increasingly complex tasks.

4. Curiosity in Meta-Learning:
- Curiosity-based exploration encourages AI systems to discover new, diverse features of the environment, avoiding local optima.
- Curiosity-based objectives can be combined with performance-based objectives to ensure efficient exploration and adaptation in complex tasks.

5. Forgetting Mechanisms:
- Forgetting is crucial to avoid memory overload in AI systems

https://arxiv.org/abs/2101.04283
KnutJaegersbergΒ 
posted an update 6 months ago
view post
Post
1678
Artificial general intelligence through recursive data compression and grounded reasoning: a position paper


This paper proposes a system to achieve AGI through general data compression and grounded reasoning.

General Data Compression involves creating a flexible algorithm that adapts to input data to simplify and compress it recursively, identifying simple, orthogonal features to avoid redundancy. The algorithm measures AGI progress by solving problems based on increasing complexity, and it expands its search space according to the data itself. Compression is applied not only to data but also to model parameters, and sequences are segmented based on compressibility.

Grounded Reasoning refers to forming representations with various granularities, crucial for commonsense reasoning and AGI. The system simulates the real world as its model, switching between representations and maximizing resourcefulness. Key ideas include the world as its own model for reasoning and actions aimed at maximizing entropy to test hypotheses.

The paper emphasizes simplicity, data-dependent bias, recursion, orthogonality, resourcefulness, and grounding in real-world contexts as fundamental principles in building an AGI system.

https://arxiv.org/abs/1506.04366
  • 1 reply
Β·
rubenroyΒ 
posted an update 6 months ago
view post
Post
2667
πŸ”₯πŸš€ Hey everyone! I'm excited to share my latest LLM release: Gilgamesh 72B, a model built on Qwen 2.5-72B Instruct. Gilgamesh was trained on a couple of my GammaCorpus datasets, specifically:

- rubenroy/GammaCorpus-CoT-Math-170k
- rubenroy/GammaCorpus-v2-5m
- rubenroy/GammaCorpus-Fact-QA-450k

I've submitted GGM 72B to the Open LLM Leaderboard for benchmarking, I'll send an update post once the results are in!

You can try it out and share your feedback, check out the model page and see what it can do:
πŸ‘‰ rubenroy/Gilgamesh-72B

Would love to hear your thoughts!
TonicΒ 
posted an update 6 months ago
view post
Post
2441
πŸ™‹πŸ»β€β™‚οΈhey there folks ,

Goedel's Theorem Prover is now being demo'ed on huggingface : Tonic/Math

give it a try !