t.d.a.g. PRO

sequelbox

AI & ML interests

open source, infinite games. (they/them)

Organizations

Valiant Labs

sequelbox's activity

posted an update 1 day ago
EARLY SNEAK PREVIEW: get a first look at the Celestia 3 science-reasoning dataset, built with DeepSeek's newest R1-0528 reasoning model! Subjects include physics, chemistry, biology, computer science, Earth science, astronomy, and information theory.

This early look contains the first 14k rows, all synthetic responses generated with deepseek-ai/DeepSeek-R1-0528.

SEE IT HERE: sequelbox/Celestia3-DeepSeek-R1-0528-PREVIEW
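
If you want a quick look at the data, here's a minimal sketch using the datasets library (hedged: the split name and column layout are assumptions - check the dataset card for the actual schema):

```python
# Sketch: inspect the Celestia 3 preview with Hugging Face datasets.
# Assumes the repo exposes a default "train" split; column names depend
# on the actual dataset card.
from datasets import load_dataset

ds = load_dataset("sequelbox/Celestia3-DeepSeek-R1-0528-PREVIEW", split="train")
print(ds)     # should report ~14k rows, per the post
print(ds[0])  # one synthetic science-reasoning example
```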

Support our releases: sequelbox/SupportOpenSource

Coming up, we'll have more dataset releases, including some novel reasoning and analysis methods. We think an important role for open-source researchers is experimenting with new response styles on top of the increasingly excellent base models available to finetune.

more to come soon!
allegra
posted an update 6 days ago
NEW RELEASE: we've brought Esper 3 to the new deepseek-ai/DeepSeek-R1-0528-Qwen3-8B model!

- A full-stack software assistant: a reasoning finetune focused on coding, architecture, and DevOps using the Titanium and Tachibana datasets!
- Improved general and creative reasoning skills, powered by the Raiden dataset.

Get the newest Esper 3: ValiantLabs/DeepSeek-R1-0528-Qwen3-8B-Esper3
Support our releases: sequelbox/SupportOpenSource
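
If you want to try it right away, here's a minimal loading sketch with transformers (assuming the standard causal-LM setup; the prompt and generation settings are just placeholders):

```python
# Minimal sketch: load the Esper 3 checkpoint with transformers.
# Assumes a standard causal-LM repo layout; adjust dtype/device_map
# for your hardware (device_map="auto" needs the accelerate package).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ValiantLabs/DeepSeek-R1-0528-Qwen3-8B-Esper3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Explain blue-green deployments."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```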

more on the way next week!

celestially yours ;)
allegra
replied to their post 7 days ago

we'll be expanding Qwen sizes in both directions :) thanks for your review!

posted an update 8 days ago
Updates for the week:
- released some new merge models using ValiantLabs/Qwen3-14B-Esper3 and other Qwen 3 14b finetunes - these merges include math, Web3, uncensored, and general mixes. Depending on your use case for Esper 3, these may be helpful to you! Find them at @sequelbox
- coming up we'll have more model sizes for Esper 3 and Cobalt 2, releasing soon!
- also super excited for more dataset releases with the newly released deepseek-ai/DeepSeek-R1-0528

Support the above efforts and others: sequelbox/SupportOpenSource

back to building :)
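
If you'd rather find the merges programmatically, here's a small sketch with huggingface_hub (the author filter is standard API; the substring filter is just a guess at how the merge repos are named):

```python
# Sketch: list sequelbox's public model repos to find the new merges.
from huggingface_hub import list_models

for m in list_models(author="sequelbox"):
    if "14B" in m.id:  # hypothetical naming filter for the Qwen3-14B merges
        print(m.id)
```
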
reacted to lukmanaj's post with 👍 8 days ago
I am so happy to share with you all that I've just completed the first unit of the new MCP course on Hugging Face and earned my certificate! The AI acceleration track is intense and fast-paced, but I'm doing my best to keep up. Excited for what's ahead!
posted an update 17 days ago
NEW RELEASE: Cobalt 2 for Qwen 3 14b!

- A math-reasoning finetune, focused on high-difficulty math questions with the zwhe99/DeepMath-103K dataset!
- Improved general and creative reasoning skills, powered by the Raiden dataset.

GET IT NOW: ValiantLabs/Qwen3-14B-Cobalt2
HELP US RELEASE FASTER: sequelbox/SupportOpenSource

we've got more releases to come soon - excited to share with everyone!

love,
allegra
posted an update 21 days ago
Esper 3 is now available for Qwen 3 14b!

- A full-stack software assistant: a reasoning finetune focused on coding, architecture, and DevOps using the Titanium and Tachibana datasets!
- Improved general and creative reasoning skills, powered by the Raiden dataset.

GET IT NOW: ValiantLabs/Qwen3-14B-Esper3
HELP US RELEASE FASTER: sequelbox/SupportOpenSource

more to come :)
allegra
posted an update about 1 month ago
NEW RELEASE: Esper 3 for Qwen 3!

- A full-stack software assistant: a reasoning finetune focused on coding, architecture, and DevOps using the Titanium and Tachibana datasets!
- Improved general and creative reasoning skills, powered by the Raiden dataset.

4B model: ValiantLabs/Qwen3-4B-Esper3
8B model: ValiantLabs/Qwen3-8B-Esper3

We'll also be bringing Esper 3 to larger Qwen 3 models as soon as we can - if you want these, consider helping us out: sequelbox/SupportOpenSource

More models and datasets to come soon!

with my love and enthusiasm,
allegra
posted an update about 2 months ago
TITANIUM 2 DeepSeek-R1 dataset is here! Open-source synthetic architecture and DevOps dataset: sequelbox/Titanium2-DeepSeek-R1

Esper 3 will be coming out soon for multiple base models, trained on Titanium, Raiden, and more :)

with my love,
allegra
reacted to mkurman's post with ❤️ 3 months ago
Introducing a new architecture, MedIT One – a single-token transformer with LSTM-like recurrence.

It is extremely fast in training and inference, but we lack funding for large-scale training. Enjoy!

https://github.com/MedITSolutionsKurman/medit-one

reacted to singhsidhukuldeep's post with 👍 3 months ago
Exciting New Tool for Knowledge Graph Extraction from Plain Text!

I just came across a groundbreaking new tool called KGGen that's solving a major challenge in the AI world - the scarcity of high-quality knowledge graph data.

KGGen is an open-source Python package that leverages language models to extract knowledge graphs (KGs) from plain text. What makes it special is its innovative approach to clustering related entities, which significantly reduces sparsity in the extracted KGs.

The technical approach is fascinating:

1. KGGen uses a multi-stage process involving an LLM (GPT-4o in their implementation) to extract entities and relations from source text
2. It aggregates graphs across sources to reduce redundancy
3. Most importantly, it applies iterative LM-based clustering to refine the raw graph

The clustering stage is particularly innovative - it identifies which nodes and edges refer to the same underlying entities or concepts. This normalizes variations in tense, plurality, stemming, and capitalization (e.g., "labors" clustered with "labor").

The researchers from Stanford and University of Toronto also introduced MINE (Measure of Information in Nodes and Edges), the first benchmark for evaluating KG extractors. When tested against existing methods like OpenIE and GraphRAG, KGGen outperformed them by up to 18%.

For anyone working with knowledge graphs, RAG systems, or KG embeddings, this tool addresses the fundamental challenge of data scarcity that's been holding back progress in graph-based foundation models.

The package is available via pip install kg-gen, making it accessible to everyone. This could be a game-changer for knowledge graph applications!
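
For intuition, here's a toy sketch of that extract -> aggregate -> cluster flow (the helpers are hypothetical stand-ins, not the actual kg-gen API - in KGGen, the extraction and clustering stages are done by LMs):

```python
# Toy sketch of KGGen's three-stage flow described above. These helpers
# are illustrative stand-ins, NOT the real kg-gen API.
from collections import defaultdict

def extract_triples(text: str) -> list[tuple[str, str, str]]:
    # Stage 1 stand-in: in KGGen, an LLM (GPT-4o) extracts
    # (subject, relation, object) triples from the source text.
    return [("Labors", "performed by", "workers"),
            ("labor", "organized into", "unions")]  # hard-coded toy output

def canonicalize(entity: str) -> str:
    # Stage 3 stand-in: KGGen uses iterative LM-based clustering to merge
    # surface variants (tense, plurality, capitalization); we fake it with
    # lowercasing plus crude de-pluralization to show where it slots in.
    e = entity.lower().strip()
    return e[:-1] if e.endswith("s") else e

def build_graph(sources: list[str]) -> dict[tuple[str, str], set[str]]:
    # Stage 2: aggregate triples across sources to reduce redundancy.
    edges: dict[tuple[str, str], set[str]] = defaultdict(set)
    for text in sources:
        for subj, rel, obj in extract_triples(text):
            edges[(canonicalize(subj), canonicalize(obj))].add(rel)
    return edges

print(build_graph(["doc 1", "doc 2"]))  # "Labors" and "labor" collapse into one node
```
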
posted an update 4 months ago
reacted to rubenroy's post with 🚀 4 months ago
🔥🚀 Hey everyone! I'm excited to share my latest LLM release: Gilgamesh 72B, a model built on Qwen 2.5-72B Instruct. Gilgamesh was trained on a couple of my GammaCorpus datasets, specifically:

- rubenroy/GammaCorpus-CoT-Math-170k
- rubenroy/GammaCorpus-v2-5m
- rubenroy/GammaCorpus-Fact-QA-450k

I've submitted GGM 72B to the Open LLM Leaderboard for benchmarking; I'll send an update post once the results are in!

You can try it out and share your feedback, check out the model page and see what it can do:
👉 rubenroy/Gilgamesh-72B

Would love to hear your thoughts!
posted an update 4 months ago
reacted to victor's post with 🚀 4 months ago
Finally, an open-source AI that turns your lyrics into full songs is here: meet YuE! Unlike other tools that only create short clips, YuE can make entire songs (up to 5 minutes) with vocals, melody, and instruments all working together. Letsss go!

m-a-p/YuE-s1-7B-anneal-en-cot
posted an update 5 months ago
A general FYI that Valiant Labs no longer has an X account. This is a business decision. Many other businesses seem to be making the same decision right now.

You can follow my account on Bluesky for updates on Shining Valiant 3, other Valiant Labs models, my open-source datasets, etc: https://bsky.app/profile/sequelbox.bsky.social

back to building :)
posted an update 5 months ago
NEW RELEASE: the sequelbox/Tachibana-QVQ dataset is here! Code-reasoning and code-instruct data generated with Qwen/QVQ-72B-Preview.

Come check out QVQ's coding skills - for everyone to use!

more QVQ and Llama 3.1 405b datasets coming soon :)
reacted to DawnC's post with ❤️ 5 months ago
🌟 PawMatchAI: Making Breed Selection More Intuitive! 🐕
Excited to share the latest update to this AI-powered companion for finding your perfect furry friend! I've made significant architectural improvements to enhance breed recognition accuracy and feature detection.

✨ What's New?
Enhanced breed recognition through advanced morphological feature analysis:
- Implemented a sophisticated feature extraction system that analyzes specific characteristics like body proportions, head features, tail structure, fur texture, and color patterns
- Added an intelligent attention mechanism that dynamically focuses on the most relevant features for each image (a generic sketch of this idea follows below)
- Improved multi-dog detection capabilities through enhanced spatial feature analysis
- Achieved better precision in distinguishing subtle breed characteristics
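
As a generic illustration of that attention idea (purely a sketch under assumed shapes, not PawMatchAI's actual architecture or code), attention pooling over per-trait feature embeddings might look like:

```python
# Generic sketch of attention pooling over morphological feature
# embeddings - an illustration of the idea above, NOT PawMatchAI's code.
import torch
import torch.nn as nn

class FeatureAttentionPool(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # one relevance score per feature

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, n_features, dim), e.g. embeddings for body
        # proportions, head, tail, fur texture, and color patterns
        weights = torch.softmax(self.score(feats), dim=1)
        return (weights * feats).sum(dim=1)  # weighted (batch, dim) summary

pool = FeatureAttentionPool(dim=256)
x = torch.randn(2, 5, 256)  # 2 images x 5 hypothetical trait regions
print(pool(x).shape)        # torch.Size([2, 256])
```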

🎯 Key Features:
- Smart breed recognition powered by advanced AI architecture
- Visual matching scores with intuitive color indicators
- Detailed breed comparisons with interactive tooltips
- Lifestyle-based recommendations tailored to your needs

💭 Project Vision
Combining my passion for AI and pets, this project represents another step toward creating meaningful AI applications. Each update aims to make the breed selection process more accessible while improving the underlying technology.

👉 Try it now: DawnC/PawMatchAI

Your likes ❤️ on this space fuel this project's growth!

#AI #MachineLearning #DeepLearning #Pytorch #ComputerVision #TechForLife
posted an update 5 months ago
reacted to m-ric's post with 👀 6 months ago
Hugging Face releases Picotron, a microscopic lib that solves LLM training 4D parallelization 🥳

๐Ÿ•ฐ๏ธ Llama-3.1-405B took 39 million GPU-hours to train, i.e. about 4.5 thousand years.

๐Ÿ‘ด๐Ÿป If they had needed all this time, we would have GPU stories from the time of Pharaoh ๐“‚€: "Alas, Lord of Two Lands, the shipment of counting-stones arriving from Cathay was lost to pirates, this shall delay the building of your computing temple by many moons "

๐Ÿ› ๏ธ But instead, they just parallelized the training on 24k H100s, which made it take just a few months.
This required parallelizing across 4 dimensions: data, tensor, context, pipeline.
And it is infamously hard to do, making for bloated code repos that hold together only by magic.

🤏 But now we don't need huge repos anymore! Instead of building mega-training codes, Hugging Face colleagues cooked in the other direction, towards tiny 4D parallelism libs. A team has built Nanotron, already widely used in industry.
And now a team releases Picotron, a radical approach to code 4D Parallelism in just a few hundred lines of code, a real engineering prowess, making it much easier to understand what's actually happening!

⚡ It's tiny, yet powerful:
Measured in MFU (Model FLOPs Utilization - how much of the hardware's compute potential the training actually uses), this lib reaches ~50% on a SmolLM-1.7B model with 8 H100 GPUs, which is really close to what the huge libs reach. (Caution: the team is running further benchmarks to verify this.)

Go take a look 👉 https://github.com/huggingface/picotron/tree/main/picotron
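
For intuition on that number: MFU is just achieved training FLOPs divided by the hardware's peak FLOPs. A back-of-the-envelope sketch (every concrete figure below is an illustrative assumption, not a Picotron measurement):

```python
# Back-of-the-envelope MFU (Model FLOPs Utilization) estimate.
# All concrete numbers are illustrative assumptions, not measured
# Picotron results.
n_params = 1.7e9             # SmolLM-1.7B parameters
n_gpus = 8                   # H100s, as in the post's setup
peak_flops_per_gpu = 989e12  # assumed H100 dense BF16 peak, FLOP/s

tokens_per_sec = 3.9e5       # hypothetical training throughput

# Common approximation for dense transformers: ~6 FLOPs per parameter
# per token for a combined forward + backward pass.
achieved_flops = 6 * n_params * tokens_per_sec
mfu = achieved_flops / (n_gpus * peak_flops_per_gpu)
print(f"MFU ~ {mfu:.0%}")    # ~50% with these made-up numbers
```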