Caleb Fahlgren's picture

Caleb Fahlgren PRO

cfahlgren1

AI & ML interests

None yet

Recent Activity

updated a dataset about 3 hours ago
cfahlgren1/hub-stats
liked a model about 5 hours ago
deepseek-ai/DeepSeek-V3-Base
View all activity

Articles

Organizations

Hugging Face's profile picture Datasets Maintainers's profile picture Hugging Face OSS Metrics's profile picture Hugging Face TB Research's profile picture ChatDB's profile picture Cognitive Computations's profile picture nltpt-q's profile picture DuckDB Text-2-SQL Bench's profile picture open/ acc's profile picture Bluesky Community's profile picture

cfahlgren1's activity

reacted to lorraine2's post with πŸš€ 9 days ago
view post
Post
1966
πŸ¦™New NVIDIA paper: LLaMA-Mesh πŸ¦™

We enable large language models to generate and understand 3D meshes by representing them as text and fine-tuning. This unifies the 3D and text modalities in a single model and preserves language abilities, unlocking conversational 3D creation with mesh understanding.

πŸ”Ž Project Page: https://research.nvidia.com/labs/toronto-ai/LLaMA-Mesh/
πŸ•ΉοΈ Interactive Demo: Zhengyi/LLaMA-Mesh (courtesy of HuggingFace and Gradio)
πŸ“– Full Paper: https://arxiv.org/abs/2411.09595
πŸ‘¨β€πŸ’»Code: https://github.com/nv-tlabs/LLaMa-Mesh
πŸ’Ύ Model Checkpoint: Zhengyi/LLaMA-Mesh
🧩 Blender Addon: https://github.com/huggingface/meshgen (courtesy of Dylan Ebert)
πŸŽ₯ 5-min Overview Video: https://youtu.be/eZNazN-1lPo?si=-idQa5aaceVw0Bbj (courtesy of AI Papers Academy)
reacted to julien-c's post with πŸ‘πŸ€—β€οΈπŸ”₯ 15 days ago
view post
Post
7611
After some heated discussion πŸ”₯, we clarify our intent re. storage limits on the Hub

TL;DR:
- public storage is free, and (unless blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1TB if you have a paid account, 100GB otherwise)

docs: https://huggingface.co/docs/hub/storage-limits

We optimize our infrastructure continuously to scale our storage for the coming years of growth in Machine learning, to the benefit of the community πŸ”₯

cc: @reach-vb @pierric @victor and the HF team
Β·
posted an update 22 days ago
view post
Post
1810
You can just ask things πŸ—£οΈ

"show me messages in the coding category that are in the top 10% of reward model scores"

Download really high quality instructions from the Llama3.1 405B synthetic dataset πŸ”₯

argilla/magpie-ultra-v1.0

posted an update 24 days ago
view post
Post
2991
We just dropped an LLM inside the SQL Console 🀯

The amazing, new Qwen/Qwen2.5-Coder-32B-Instruct model can now write SQL for any Hugging Face dataset ✨

It's 2025, you shouldn't be hand writing SQL! This is a big step in making it where anyone can do in depth analysis on a dataset. Let us know what you think πŸ€—
posted an update about 1 month ago
view post
Post
910
observers πŸ”­ - automatically log all OpenAI compatible requests to a datasetπŸ’½

β€’ supports any OpenAI compatible endpoint πŸ’ͺ
β€’ supports DuckDB, Hugging Face Datasets, and Argilla as stores

> pip install observers

No complex framework. Just a few lines of code to start sending your traces somewhere. Let us know what you think! @davidberenstein1957 and I will continue iterating!

Here's an example dataset that was logged to Hugging Face from Ollama: cfahlgren1/llama-3.1-awesome-chatgpt-prompts
replied to their post about 1 month ago
posted an update about 1 month ago
view post
Post
870
You can create charts, leaderboards, and filters on top of any Hugging Face dataset in less than a minute

β€’ ASCII Bar Charts πŸ“Š
β€’ Powered by DuckDB WASM ⚑
β€’ Download results to Parquet πŸ’½
β€’ Embed and Share results with friends πŸ“¬

Do you have any interesting queries?
reacted to davanstrien's post with ❀️ about 1 month ago
replied to their post about 1 month ago
view reply

Heavy is the head that wears the crown

reacted to fracapuano's post with ❀️ about 1 month ago
view post
Post
1020
Sharing what we have built over the course of the weekend at the @llamameta hackathon, by Cerebral Valley in London πŸ‡¬πŸ‡§ πŸ‘‡

@gabrycina @calebgcc and I competed with 200+ participants and 50+ teams for a 24-hrs sprint centered around hacking for impact! We focused on applications of robotics to those in need of assisted living, moving our focus to enable greater autonomy and accessibility of robotics in everyday life.

complete list of assets πŸ‘‡
πŸ€— trained robotics policies
v1:
- fracapuano/moss-pills
- fracapuano/moss-cup
v2:
- fracapuano/meta-grasp

πŸ€— datasets
v1:
- fracapuano/pills
- fracapuano/cup
v2:
- fracapuano/cupim


You can find a live demo of our submission at: https://x.com/_fracapuano/status/1858102728691458554

If you want to know more about how we collected 100GB+ of data, trained multiple RL-policies using @lerobot and used Llama-3.2 models to handle user interactions and switch between tasks, go ahead and have a look! Also, don't be a stranger, and reach out 🦾

Our project is fully open-source, for the community (and ourselves, πŸ‘¨β€πŸ³) to build! A huge thank you to @cadene for the help (and the robot 🀭) - truly feeling these hugs-vibes πŸ€— , and to @thomwolf and @clem for sharing our work across

Little extra:
➑️ Our 🧠EEG waves🧠-based control of the 🦾robotic arm🦾
reacted to LukeNeumann's post with 🀯 about 1 month ago
view post
Post
1219
Nine years ago, I uploaded the first 8K resolution video to YouTube and I've been stockpiling 8K footage ever since: https://www.youtube.com/watch?v=sLprVF6d7Ug&t

Should @Overlaiapp release the first open-source 8K video dataset?

Could anyone even fine tune a model with this?πŸ˜…
Β·
replied to LukeNeumann's post about 1 month ago
view reply

Would be massive! Let us know if you need any help πŸ€—

reacted to dvilasuero's post with πŸš€πŸ€— about 1 month ago
posted an update about 1 month ago
reacted to nyuuzyou's post with πŸ”₯ about 1 month ago
view post
Post
949
πŸ–ΌοΈ Introducing Public Domain Pictures Dataset - nyuuzyou/publicdomainpictures

Dataset highlights:
- 644,412 public domain images with comprehensive metadata from publicdomainpictures.net
- English language metadata including titles, descriptions, and keywords
- Each entry contains rich metadata including:
- Unique image ID and full-size image URLs
- Detailed titles and descriptions
- Keyword/tag collections
- Creator attribution
- Released to the public domain under Creative Commons Zero (CC0) license
  • 2 replies
Β·
posted an update about 1 month ago
view post
Post
3091
You can clean and format datasets entirely in the browser with a few lines of SQL.

In this post, I replicate the process @mlabonne used to clean the new microsoft/orca-agentinstruct-1M-v1 dataset.

The cleaning process consists of:
- Joining the separate splits together / add split column
- Converting string messages into list of structs
- Removing empty system prompts

https://huggingface.co/blog/cfahlgren1/the-beginners-guide-to-cleaning-a-dataset

Here's his new cleaned dataset: mlabonne/orca-agentinstruct-1M-v1-cleaned
  • 1 reply
Β·