Core ML Projects

AI & ML interests

Take the Hub to iOS and macOS

Recent Activity

Xenova posted an update 7 days ago
Introducing Moonshine Web: real-time speech recognition running 100% locally in your browser!
🚀 Faster and more accurate than Whisper
🔒 Privacy-focused (no data leaves your device)
⚡️ WebGPU accelerated (w/ WASM fallback)
🔥 Powered by ONNX Runtime Web and Transformers.js

Demo: webml-community/moonshine-web
Source code: https://github.com/huggingface/transformers.js-examples/tree/main/moonshine-web
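
For a sense of how little code this takes, here's a minimal sketch of an ASR pipeline with Transformers.js; the Moonshine model id and options below are my assumptions for illustration, not copied from the demo source:

```js
import { pipeline } from "@huggingface/transformers";

// Prefer WebGPU when the browser supports it, otherwise fall back to WASM.
const device = navigator.gpu ? "webgpu" : "wasm";

// Hypothetical ONNX model id; check the demo source for the real setup.
const transcriber = await pipeline(
  "automatic-speech-recognition",
  "onnx-community/moonshine-tiny-ONNX",
  { device }
);

// audio: a Float32Array of 16 kHz mono samples (e.g. from the microphone).
const { text } = await transcriber(audio);
console.log(text);
```
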
Xenova posted an update 17 days ago
Introducing TTS WebGPU: the first-ever text-to-speech web app built with WebGPU acceleration! 🔥 High-quality, natural speech generation that runs 100% locally in your browser, powered by OuteTTS and Transformers.js. 🤗 Try it out yourself!

Demo: webml-community/text-to-speech-webgpu
Source code: https://github.com/huggingface/transformers.js-examples/tree/main/text-to-speech-webgpu
Model: onnx-community/OuteTTS-0.2-500M (ONNX), OuteAI/OuteTTS-0.2-500M (PyTorch)
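
The demo wraps OuteTTS in its own interface, but for a rough idea of browser TTS with Transformers.js, the generic text-to-speech pipeline looks something like this (the model id and speaker-embeddings URL are from the library's docs, not from this demo):

```js
import { pipeline } from "@huggingface/transformers";

// Generic text-to-speech pipeline; the OuteTTS demo uses a dedicated wrapper.
const synthesizer = await pipeline("text-to-speech", "Xenova/speecht5_tts");

// SpeechT5 needs speaker embeddings to define the voice.
const speaker_embeddings =
  "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/speaker_embeddings.bin";

const result = await synthesizer("Hello, my dog is cute", { speaker_embeddings });
// result.audio is a Float32Array of samples; result.sampling_rate is its rate.
console.log(result.sampling_rate, result.audio.length);
```
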
reach-vb posted an update 19 days ago
VLMs are going through quite an open revolution, and at on-device-friendly sizes:

1. Google DeepMind w/ PaliGemma2 - 3B, 10B & 28B: google/paligemma-2-release-67500e1e1dbfdd4dee27ba48

2. OpenGVLab w/ InternVL 2.5 - 1B, 2B, 4B, 8B, 26B, 38B & 78B: https://huggingface.co/collections/OpenGVLab/internvl-25-673e1019b66e2218f68d7c1c

3. Qwen w/ Qwen 2 VL - 2B, 7B & 72B: Qwen/qwen2-vl-66cee7455501d7126940800d

4. Microsoft w/ Florence-VL - 3B & 8B: https://huggingface.co/jiuhai

5. Moondream2 w/ 0.5B: https://huggingface.co/vikhyatk/

What a time to be alive! 🔥
pagezyhf posted an update 23 days ago
It’s the 2nd of December; here’s your Cyber Monday present 🎁!

We’re cutting prices on Hugging Face Inference Endpoints and Spaces!

Our folks at Google Cloud are treating us to a 40% price cut on GCP NVIDIA A100 GPUs for the next 3️⃣ months. We also have reductions of 20% to 50% across all other instances.

Sounds like the time to give Inference Endpoints a try! Get started today and find the full pricing details in our documentation.
https://ui.endpoints.huggingface.co/
https://huggingface.co/pricing
victor posted an update 27 days ago
Qwen/QwQ-32B-Preview shows us the future (and it's going to be exciting)...

I tested it against some really challenging reasoning prompts and the results are amazing 🤯.

Check this dataset for the results: victor/qwq-misguided-attention
Xenova posted an update 28 days ago
We just released Transformers.js v3.1 and you're not going to believe what's now possible in the browser w/ WebGPU! 🤯 Let's take a look:
🔀 Janus from DeepSeek for unified multimodal understanding and generation (Text-to-Image and Image-Text-to-Text)
👁️ Qwen2-VL from Qwen for dynamic-resolution image understanding
🔒 JinaCLIP from Jina AI for general-purpose multilingual multimodal embeddings
🌋 LLaVA-OneVision from ByteDance for Image-Text-to-Text generation
🤸‍♀️ ViTPose for pose estimation
📄 MGP-STR for optical character recognition (OCR)
📈 PatchTST & PatchTSMixer for time series forecasting

That's right, everything running 100% locally in your browser (no data sent to a server)! 🔥 Huge for privacy!

Check out the release notes for more information. 👇
https://github.com/huggingface/transformers.js/releases/tag/3.1.0

Demo link (+ source code): webml-community/Janus-1.3B-WebGPU
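
To give a flavour of the new APIs, here is a sketch of Qwen2-VL image understanding, adapted from memory of the release-notes example; the exact model id and decoding details may differ:

```js
import {
  AutoProcessor,
  Qwen2VLForConditionalGeneration,
  RawImage,
} from "@huggingface/transformers";

// Model id assumed from the onnx-community conversions; verify before use.
const model_id = "onnx-community/Qwen2-VL-2B-Instruct";
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await Qwen2VLForConditionalGeneration.from_pretrained(model_id);

// Chat-style prompt with an image placeholder.
const conversation = [
  {
    role: "user",
    content: [{ type: "image" }, { type: "text", text: "Describe this image." }],
  },
];
const prompt = processor.apply_chat_template(conversation, {
  add_generation_prompt: true,
});

// Preprocess the image and text, then generate.
const image = await RawImage.read("https://example.com/photo.jpg");
const inputs = await processor(prompt, image);
const outputs = await model.generate({ ...inputs, max_new_tokens: 128 });
console.log(processor.batch_decode(outputs, { skip_special_tokens: true }));
```
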
pagezyhf posted an update 29 days ago
Hello Hugging Face Community,

If you use Google Kubernetes Engine to host your ML workloads, I think this series of videos is a great way to kickstart your journey of deploying LLMs, in less than 10 minutes! Thank you @wietse-venema-demo!

To watch in this order:
1. Learn what Hugging Face Deep Learning Containers are
https://youtu.be/aWMp_hUUa0c?si=t-LPRkRNfD3DDNfr

2. Learn how to deploy an LLM with our Deep Learning Container using Text Generation Inference (see the query sketch after this list)
https://youtu.be/Q3oyTOU1TMc?si=V6Dv-U1jt1SR97fj

3. Learn how to scale your inference endpoint based on traffic
https://youtu.be/QjLZ5eteDds?si=nDIAirh1r6h2dQMD
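
Once the TGI container from video 2 is running, querying it is a single HTTP call. A minimal sketch (the service URL is a placeholder for your cluster's endpoint; /generate is TGI's standard route):

```js
// Query a Text Generation Inference server deployed on GKE.
const response = await fetch("http://<your-service-ip>/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    inputs: "What is Kubernetes?",
    parameters: { max_new_tokens: 128 },
  }),
});
const { generated_text } = await response.json();
console.log(generated_text);
```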

If you want more of these small tutorials and have any theme in mind, let me know!
victor posted an update about 1 month ago
Want a perfect example of why Qwen/Qwen2.5-Coder-32B-Instruct is insane?

Introducing: AI Video Composer 🔥
huggingface-projects/ai-video-composer

Drag and drop your assets (images/videos/audios) to create any video you want using natural language!

It works by asking the model to output a valid FFmpeg command; this can be quite complex, but most of the time Qwen2.5-Coder-32B gets it right (that thing is a beast). It's an update of an old project made with GPT-4, and it was almost impossible to make it work with open models back then (~1.5 years ago), but not anymore. Let's go open weights 🚀.
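
The core trick is easy to sketch: have the model translate a natural-language request into an FFmpeg command over the uploaded assets. A hedged outline using the Hugging Face inference client; the system prompt below is mine, and the Space's actual prompting is more elaborate:

```js
import { HfInference } from "@huggingface/inference";

const hf = new HfInference(process.env.HF_TOKEN);

// Ask the coder model for a single FFmpeg command over known assets.
const { choices } = await hf.chatCompletion({
  model: "Qwen/Qwen2.5-Coder-32B-Instruct",
  messages: [
    {
      role: "system",
      content:
        "Output one valid ffmpeg command and nothing else. " +
        "Available files: intro.png, clip.mp4, music.mp3.",
    },
    {
      role: "user",
      content: "Make a video from clip.mp4 with music.mp3 as the soundtrack.",
    },
  ],
  max_tokens: 256,
});

// Review the generated command before executing it on the assets.
console.log(choices[0].message.content);
```
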
reach-vb posted an update about 1 month ago
Massive week for open AI/ML:

Mistral Pixtral & Instruct Large - ~123B, 128K context, multilingual, JSON + function calling & open weights
mistralai/Pixtral-Large-Instruct-2411
mistralai/Mistral-Large-Instruct-2411

Allen AI Tülu 3 70B & 8B - competitive with Claude 3.5 Haiku, beats all major open models like Llama 3.1 70B, Qwen 2.5, and Nemotron
allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5
allenai/tulu-3-datasets-673b8df14442393f7213f372

LLaVA-o1 - VLM capable of spontaneous, systematic reasoning, similar to OpenAI's o1; the 11B model outperforms Gemini 1.5 Pro, GPT-4o mini, and Llama 3.2 90B Vision
Xkev/Llama-3.2V-11B-cot

Black Forest Labs FLUX.1 Tools - four new state-of-the-art model checkpoints & 2 adapters for Fill, Depth, Canny & Redux, open weights
reach-vb/black-forest-labs-flux1-6743847bde9997dd26609817

Jina AI Jina CLIP v2 - general-purpose multilingual and multimodal (text & image) embedding model, 900M params, 512 x 512 resolution, Matryoshka representations (1024 down to 64; sketch at the end of this post)
jinaai/jina-clip-v2

Apple AIMv2 & Core ML MobileCLIP - large-scale vision encoders that outperform CLIP and SigLIP, plus Core ML-optimised MobileCLIP models
apple/aimv2-6720fe1558d94c7805f7688c
apple/coreml-mobileclip

A lot more got released, like OpenScholar (OpenScholar/openscholar-v1-67376a89f6a80f448da411a6), SmolTalk (HuggingFaceTB/smoltalk), Hymba (nvidia/hymba-673c35516c12c4b98b5e845f), the Open ASR Leaderboard (hf-audio/open_asr_leaderboard), and much more..

Can't wait for the next week! 🤗
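
About those Matryoshka representations mentioned for Jina CLIP v2: the leading dimensions of the embedding carry most of the signal, so you can truncate to a smaller size and re-normalise. A toy illustration of the trick, not Jina's code:

```js
// Truncate a Matryoshka-style embedding to its first k dimensions
// and L2-normalise again, trading a little accuracy for much less
// storage and compute.
function truncateEmbedding(embedding, k) {
  const head = Array.from(embedding.slice(0, k));
  const norm = Math.hypot(...head);
  return head.map((x) => x / norm);
}

// e.g. shrink a 1024-dim vector down to 64 dims.
const small = truncateEmbedding(fullEmbedding, 64); // fullEmbedding: number[1024]
```
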
victor posted an update about 1 month ago
Qwen2.5-72B is now the default HuggingChat model.
This model is so good that you must try it! I often get better results on rephrasing with it than with Sonnet or GPT-4!
Xenova posted an update about 1 month ago
Have you tried out 🤗 Transformers.js v3? Here are the new features:
⚡ WebGPU support (up to 100x faster than WASM)
🔒 New quantization formats (dtypes)
🐛 120 supported architectures in total
📂 25 new example projects and templates
🤖 Over 1200 pre-converted models
🌐 Node.js (ESM + CJS), Deno, and Bun compatibility
🏡 A new home on GitHub and NPM

Get started with npm i @huggingface/transformers.

Learn more in our blog post: https://huggingface.co/blog/transformersjs-v3
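
The WebGPU and dtype options compose directly in the pipeline API; here is a minimal sketch along the lines of the blog post's examples (the model id is one of the pre-converted ones):

```js
import { pipeline } from "@huggingface/transformers";

// Run an embedding model on WebGPU with 4-bit quantized weights.
const extractor = await pipeline(
  "feature-extraction",
  "Xenova/all-MiniLM-L6-v2",
  { device: "webgpu", dtype: "q4" }
);

const embeddings = await extractor("Hello world!", {
  pooling: "mean",
  normalize: true,
});
console.log(embeddings.tolist());
```
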
reach-vb posted an update about 1 month ago
What a brilliant week for Open Source AI!

Qwen 2.5 Coder by Alibaba - 0.5B / 1.5B / 3B / 7B / 14B / 32B (Base + Instruct) code generation LLMs, with the 32B tackling giants like Gemini 1.5 Pro and Claude Sonnet
Qwen/qwen25-coder-66eaa22e6f99801bf65b0c2f

LLM2CLIP from Microsoft - Leverage LLMs to train ultra-powerful CLIP models! Boosts performance over the previous SOTA by ~17%
microsoft/llm2clip-672323a266173cfa40b32d4c

Athene v2 Chat & Agent by NexusFlow - SoTA general LLM fine-tuned from Qwen 2.5 72B that excels at Chat + Function Calling / JSON / Agents
Nexusflow/athene-v2-6735b85e505981a794fb02cc

Orca Agent Instruct by Microsoft - 1 million instruct pairs covering text editing, creative writing, coding, reading comprehension, etc - permissively licensed
microsoft/orca-agentinstruct-1M-v1

Ultravox by FixieAI - 70B / 8B models approaching GPT-4o level; pick any LLM, train an adapter with Whisper as the audio encoder
reach-vb/ultravox-audio-language-model-release-67373b602af0a52b2a88ae71

JanusFlow 1.3 by DeepSeek - next iteration of their unified multimodal LLM Janus, now with rectified flow
deepseek-ai/JanusFlow-1.3B

Common Corpus by PleIAs - 2,003,039,184,047 multilingual, commercially permissive, and high-quality tokens!
PleIAs/common_corpus

I'm sure I missed a lot, can't wait for the next week!

Put down in comments what I missed! 🤗
pagezyhf posted an update about 1 month ago
Hello Hugging Face Community,

I'd like to share a bit more about the Deep Learning Containers (DLCs) we built with Google Cloud to transform the way you build AI with open models on this platform!

With pre-configured, optimized environments for PyTorch Training (GPU) and Inference (CPU/GPU), Text Generation Inference (GPU), and Text Embeddings Inference (CPU/GPU), the Hugging Face DLCs offer:

⚡ Optimized performance on Google Cloud's infrastructure, with TGI, TEI, and PyTorch acceleration.
🛠️ Hassle-free environment setup, no more dependency issues.
🔄 Seamless updates to the latest stable versions.
💼 Streamlined workflow, reducing dev and maintenance overheads.
🔒 Robust security features of Google Cloud.
☁️ Fine-tuned for optimal performance, integrated with GKE and Vertex AI.
📦 Community examples for easy experimentation and implementation.
🔜 TPU support for PyTorch Training/Inference and Text Generation Inference is coming soon!

Find the documentation at https://huggingface.co/docs/google-cloud/en/index
If you need support, open a conversation on the forum: https://discuss.huggingface.co/c/google-cloud/69
reach-vb posted an update about 2 months ago
Smol TTS models are here! OuteTTS-0.1-350M - zero-shot voice cloning, built on the LLaMA architecture, CC-BY licensed! 🔥

> Pure language modeling approach to TTS
> Zero-shot voice cloning
> LLaMa architecture w/ Audio tokens (WavTokenizer)
> BONUS: Works on-device w/ llama.cpp ⚡

Three-step approach to TTS:

> Audio tokenization using WavTokenizer (75 tok per second)
> CTC forced alignment for word-to-audio token mapping
> Structured prompt creation w/ transcription, duration, audio tokens
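
To make step 3 concrete, here's a purely conceptual sketch of assembling such a prompt; the tag format below is invented for illustration, and the real format is defined by OuteTTS:

```js
// Conceptual only: build an LM prompt from words, their durations,
// and their WavTokenizer codes (~75 audio tokens per second).
function buildTtsPrompt(words) {
  return words
    .map(
      ({ text, duration, audioTokens }) =>
        `<word text="${text}" dur="${duration.toFixed(2)}">` +
        audioTokens.map((t) => `<a${t}>`).join("") +
        "</word>"
    )
    .join("");
}

// One word lasting 0.32 s, mapped to a few (made-up) audio token ids.
console.log(buildTtsPrompt([{ text: "hello", duration: 0.32, audioTokens: [12, 87, 903] }]));
```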

The model is extremely impressive for 350M parameters! Kudos to the OuteAI team on such a brilliant feat - I'd love to see this applied to larger datasets and smarter backbones like SmolLM 🤗

Check out the models here: OuteAI/outetts-6728aa71a53a076e4ba4817c
reach-vb posted an update about 2 months ago
Smol models FTW! AMD released AMD OLMo 1B - beats OpenELM and TinyLlama on MT-Bench and AlpacaEval - Apache 2.0 licensed 🔥

> Trained on 1.3 trillion tokens (Dolma 1.7) across 16 nodes, each with 4 MI250 GPUs

> Three checkpoints:

- AMD OLMo 1B: Pre-trained model
- AMD OLMo 1B SFT: Supervised fine-tuned on Tulu V2, OpenHermes-2.5, WebInstructSub, and Code-Feedback datasets
- AMD OLMo 1B SFT DPO: Aligned with human preferences using Direct Preference Optimization (DPO) on UltraFeedback dataset

Key Insights:
> Pre-trained with less than half the tokens of OLMo-1B
> Post-training steps include two-phase SFT and DPO alignment
> Data for SFT:
- Phase 1: Tulu V2
- Phase 2: OpenHermes-2.5, WebInstructSub, and Code-Feedback

> Model checkpoints on the Hub & integrated with Transformers ⚡️

Congratulations & kudos to AMD on a brilliant smol model release! 🤗

amd/amd-olmo-6723e7d04a49116d8ec95070
reach-vb posted an update 2 months ago
What a great day for Open Science! @AIatMeta released models, datasets, and code for many of its research artefacts! 🔥

1. Meta Segment Anything Model 2.1: An updated checkpoint with improved results on visually similar objects, small objects and occlusion handling. A new developer suite will be added to make it easier for developers to build with SAM 2.

Model checkpoints: reach-vb/sam-21-6702d40defe7611a8bafa881

2. Layer Skip: Inference code and fine-tuned checkpoints demonstrating a new method for enhancing LLM performance.

Model checkpoints: facebook/layerskip-666b25c50c8ae90e1965727a

3. SALSA: New code enables researchers to benchmark AI-based attacks to validate security for post-quantum cryptography.

Repo: https://github.com/facebookresearch/LWE-benchmarking

4. Meta Lingua: A lightweight and self-contained codebase designed to train language models at scale.

Repo: https://github.com/facebookresearch/lingua

5. Meta Open Materials: New open source models and the largest dataset to accelerate AI-driven discovery of new inorganic materials.

Model checkpoints: fairchem/OMAT24

6. MEXMA: A new research paper and code for our novel pre-trained cross-lingual sentence encoder covering 80 languages.

Model checkpoint: facebook/MEXMA

7. Self-Taught Evaluator: a new method for generating synthetic preference data to train reward models without relying on human annotations.

Model checkpoint: facebook/Self-taught-evaluator-llama3.1-70B

8. Meta Spirit LM: An open-source language model for seamless speech and text integration.

Repo: https://github.com/facebookresearch/spiritlm
victor posted an update 2 months ago