Let's see JEPA in action: a simplified image-based implementation that trains on a CPU, with live preview support - very satisfying to watch :)
I-JEPA is the image-based version of JEPA (Joint-Embedding Predictive Architecture, an alternative to autoregressive LLM architectures) pioneered by Professor Yann LeCun.
At a high level, I-JEPA predicts the representations of image segments (Target) from the representations of other segments within the same image (Context). It consists of three key components: a context encoder, a target encoder, and a predictor.
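To make those moving parts concrete, here is a minimal PyTorch sketch of one I-JEPA-style training step. It is not the actual implementation: the encoders, patching, masking, and hyperparameters are placeholder assumptions, kept only to show how the context encoder, the EMA-updated target encoder, and the predictor interact.

```python
# Minimal I-JEPA-style training step in PyTorch. The encoders, patching, and
# masking here are simplified placeholders, not the official architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMBED_DIM = 128
PATCH_DIM = 16 * 16 * 3  # flattened 16x16 RGB patches

def make_encoder():
    # Stand-in for a ViT-style encoder: maps flattened patches to embeddings.
    return nn.Sequential(nn.Linear(PATCH_DIM, EMBED_DIM), nn.GELU(),
                         nn.Linear(EMBED_DIM, EMBED_DIM))

context_encoder = make_encoder()
target_encoder = make_encoder()  # never trained by backprop, only EMA-updated
target_encoder.load_state_dict(context_encoder.state_dict())
for p in target_encoder.parameters():
    p.requires_grad_(False)

predictor = nn.Sequential(nn.Linear(EMBED_DIM, EMBED_DIM), nn.GELU(),
                          nn.Linear(EMBED_DIM, EMBED_DIM))

opt = torch.optim.AdamW(
    list(context_encoder.parameters()) + list(predictor.parameters()), lr=1e-3)

def training_step(patches, context_idx, target_idx, ema=0.996):
    """patches: (batch, num_patches, PATCH_DIM); idx lists pick context/target blocks."""
    ctx = context_encoder(patches[:, context_idx])        # encode visible context
    with torch.no_grad():
        tgt = target_encoder(patches[:, target_idx])      # encode masked-out targets
    # Predict each target representation from a pooled context representation.
    pred = predictor(ctx.mean(dim=1, keepdim=True)).expand_as(tgt)
    loss = F.smooth_l1_loss(pred, tgt)                    # loss lives in embedding space
    opt.zero_grad(); loss.backward(); opt.step()
    # Slowly move the target encoder toward the context encoder (EMA update).
    with torch.no_grad():
        for tp, cp in zip(target_encoder.parameters(), context_encoder.parameters()):
            tp.mul_(ema).add_(cp, alpha=1 - ema)
    return loss.item()

# Example call with random data, just to show the expected shapes.
patches = torch.randn(4, 64, PATCH_DIM)                   # 4 images, 64 patches each
loss = training_step(patches, context_idx=list(range(0, 48)),
                     target_idx=list(range(48, 64)))
```

The point to notice is that the loss is computed between representations rather than pixels, and that the target encoder only follows the context encoder through the EMA update, which is what keeps the prediction targets stable.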
Introducing Fineweb-Edu-Fortified: an enhanced Fineweb-Edu dataset.
This dataset is tailored for NLP tasks and streamlines model training by offering a more refined, deduplicated corpus. It is a good fit for startups and researchers looking for high-quality educational content to train, evaluate, or fine-tune AI models. The dataset is based on the Fineweb-Edu subset of the larger Fineweb dataset and includes:
- Exact-match deduplication across all crawls
- Embeddings for each row using the TaylorAI/bge-micro model
- A count column indicating duplication frequency
- Data from 95 Common Crawl crawls (2013-2024)
- Row count reduced from 1.279B to 0.324B after deduplication
- ~375B tokens (down from 1,320B in Fineweb-Edu)
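If you want to poke at it, here is a hedged loading sketch using the datasets library. The Hub dataset ID, the config handling, and the column names are assumptions based on the description above, so check the dataset card before relying on them.

```python
# Sketch: stream a few rows of Fineweb-Edu-Fortified without downloading everything.
# Dataset ID and column names are assumptions; verify them on the dataset card.
from datasets import load_dataset

ds = load_dataset(
    "airtrain-ai/fineweb-edu-fortified",  # assumed Hub ID; a crawl-specific config may be required
    split="train",
    streaming=True,                       # avoid materializing ~375B tokens locally
)

for row in ds.take(3):
    # Expected fields per the list above: text, an embedding vector, and a count column.
    print(row.get("count"), str(row.get("text", ""))[:100])
```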
Many thanks to the amazing @josh-sematic for his work on this project, the Fineweb/Fineweb-Edu team at Hugging Face for producing the original datasets and for their support during our work on Fineweb-Edu-Fortified, and also thanks to @underspirit for pointing out the reduction in dataset size that could be achieved via deduplication.
After a few attempts, I found that combining the information in this dataset with a good model (like meta-llama/Meta-Llama-3-8B-Instruct) opens the door to a myriad of chat adventures.
Stack:
- Haystack for orchestration
- llamafile to run our model locally
Check out the notebook: https://t.ly/y6jrZ (includes a bonus Mystery Character Quiz)
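For a rough idea of how the two pieces fit together (this is a sketch, not the notebook's exact code): llamafile serves an OpenAI-compatible endpoint locally, and Haystack's OpenAIGenerator can be pointed at it. The endpoint URL, model name, and placeholder API key below are assumptions about a default llamafile launch.

```python
# Rough sketch: point Haystack's OpenAIGenerator at a locally running llamafile,
# which exposes an OpenAI-compatible server (assumed default: http://localhost:8080).
from haystack.components.generators import OpenAIGenerator
from haystack.utils import Secret

generator = OpenAIGenerator(
    api_key=Secret.from_token("sk-no-key-required"),  # llamafile ignores the key
    model="Meta-Llama-3-8B-Instruct",                 # whatever model your llamafile wraps
    api_base_url="http://localhost:8080/v1",
)

result = generator.run(prompt="In one sentence, who is Sherlock Holmes?")
print(result["replies"][0])
```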
Thrilled to introduce Adam-mini, an optimizer that achieves on-par or better performance than AdamW with a 45% to 50% smaller memory footprint. Adam-mini can also achieve 49.5% higher throughput than AdamW on Llama2-7B pre-training.
The design of Adam-mini is inspired by certain Hessian structures we observed in Transformers.
Feel free to try it out! Switch to Adam-mini with the same hyperparameters as AdamW, and it should run with only about half the memory. We hope Adam-mini can help save time, cost, and energy in your tasks!
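As a hypothetical illustration of the drop-in swap (the import path and constructor arguments below are assumptions; check the Adam-mini repository for the exact API):

```python
# Hypothetical sketch of swapping AdamW for Adam-mini on a toy Transformer layer.
# Import path and constructor arguments are assumptions; consult the Adam-mini
# repository for the exact signature before using.
import torch
import torch.nn as nn
from adam_mini import Adam_mini  # assumed package / class name

model = nn.TransformerEncoderLayer(d_model=64, nhead=4)  # toy stand-in for a Transformer

# Before: optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4,
#                                       betas=(0.9, 0.95), weight_decay=0.1)
optimizer = Adam_mini(
    named_parameters=model.named_parameters(),  # Adam-mini partitions params per block
    lr=1e-4,                                    # reuse the same hyperparameters as AdamW
    betas=(0.9, 0.95),
    weight_decay=0.1,
    dim=64,                                     # hidden size of the toy model
    n_heads=4,                                  # attention heads of the toy model
)

# Training-loop usage is unchanged: loss.backward(); optimizer.step(); optimizer.zero_grad()
```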