Moritz Laurer's picture

Moritz Laurer

MoritzLaurer

AI & ML interests

None yet

Recent Activity

Articles

Organizations

Hugging Face's profile picture Amazon SageMaker Community's profile picture  Zero Shot NLI 's profile picture Hugging Test Lab's profile picture Deutsche Gesellschaft fรผr internationale Zusammenarbeit's profile picture HuggingFaceM4's profile picture Aledade Inc's profile picture classroom-test-room's profile picture Prezi's profile picture Blog-explorers's profile picture Enterprise Explorers's profile picture ZeroGPU Explorers's profile picture Spectral's profile picture C&A's profile picture Social Post Explorers's profile picture Triple's profile picture Dev Mode Explorers's profile picture moritz-test-organization-changed-2's profile picture Hugging Face Discord Community's profile picture Moritz Test Org's profile picture

MoritzLaurer's activity

posted an update 6 days ago
view post
Post
2283
Quite excited by the ModernBERT release! 0.15/0.4B small, 2T modern pre-training data and tokenizer with code, 8k context window, great efficient model for embeddings & classification!

This will probably be the basis for many future SOTA encoders! And I can finally stop using DeBERTav3 from 2021 :D

Congrats @answerdotai , @LightOnIO and collaborators like @tomaarsen !

Paper and models here ๐Ÿ‘‡https://huggingface.co/collections/answerdotai/modernbert-67627ad707a4acbf33c41deb
replied to their post 8 days ago
view reply

Hey @borowis , I don't think there is a plan to add embedding models to the NIM API. Embedding models are quite small which makes them easier to run on accessible hardware (vs. the H100 GPUs running the large LLMs on the NIM API). I'd recommend using a cheap GPU (or even a CPU) via the HF dedicated endpoints for deploying embedding models: https://huggingface.co/inference-endpoints/dedicated And you can use the autoscaling/scale-to-zero feature to avoid unnecessary costs
(The smaller BGE models from the MTEB leaderboard are always a good place to start)

posted an update 9 days ago
posted an update 14 days ago
view post
Post
1264
I've been building a small library for working with prompt templates on the HF hub: pip install prompt-templates. Motivation:

The community currently shares prompt templates in a wide variety of formats: in datasets, in model cards, as strings in .py files, as .txt/.yaml/.json/.jinja2 files etc. This makes sharing and working with prompt templates unnecessarily complicated.

Prompt templates are currently the main hyperparameter that people tune when building complex LLM systems or agents. If we don't have a common standard for sharing them, we cannot systematically test and improve our systems. After comparing different community approaches, I think that working with modular .yaml or .json files is the best approach.

The prompt-templates library :
- proposes a standard for sharing prompts (entirely locally or on the HF hub)
- provides some utilities that are interoperable with the broader ecosystem

Try it:
# !pip install prompt-templates
from prompt_templates import PromptTemplateLoader 
prompt_template = PromptTemplateLoader.from_hub(repo_id="MoritzLaurer/closed_system_prompts", filename="claude-3-5-artifacts-leak-210624.yaml")


The library is in early stages, feedback is welcome!

More details in the docs: https://github.com/MoritzLaurer/prompt_templates/
  • 1 reply
ยท
reacted to julien-c's post with โค๏ธ 15 days ago
view post
Post
7612
After some heated discussion ๐Ÿ”ฅ, we clarify our intent re. storage limits on the Hub

TL;DR:
- public storage is free, and (unless blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1TB if you have a paid account, 100GB otherwise)

docs: https://huggingface.co/docs/hub/storage-limits

We optimize our infrastructure continuously to scale our storage for the coming years of growth in Machine learning, to the benefit of the community ๐Ÿ”ฅ

cc: @reach-vb @pierric @victor and the HF team
ยท
replied to their post 23 days ago
replied to their post 23 days ago
view reply

It works for me with this code actually. The small change I made is a minimal change in the model identifier (removing the "meta"). The issue is that Meta renamed their models on HF recently.
It works for me with this code. Can you try again with this code?

#!pip install "huggingface_hub>=0.25.0"
from huggingface_hub import InferenceClient


client = InferenceClient(
    base_url="https://huggingface.co/api/integrations/dgx/v1",
    api_key=os.getenv("HF_TOKEN_ENTERPRISE")  # see docs: https://huggingface.co/blog/inference-dgx-cloud#create-a-fine-grained-token
)

output = client.chat.completions.create(
    model="meta-llama/Llama-3.1-405B-Instruct-FP8",  #"meta-llama/Meta-Llama-3.1-405B-Instruct-FP8",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count to 10"},
    ],
    max_tokens=1024,
)

print(output)
replied to their post 23 days ago
view reply

ok, I could reproduce the issue and I get the same error, I'm reporting it to Nvidia, thanks for reporting

replied to their post 23 days ago
replied to their post 23 days ago
reacted to clem's post with ๐Ÿ”ฅ 2 months ago
view post
Post
4436
This is no Woodstock AI but will be fun nonetheless haha. Iโ€™ll be hosting a live workshop with team members next week about the Enterprise Hugging Face hub.

1,000 spots available first-come first serve with some surprises during the stream!

You can register and add to your calendar here: https://streamyard.com/watch/JS2jHsUP3NDM
ยท
reacted to m-ric's post with ๐Ÿ”ฅ 3 months ago
view post
Post
1503
Transformers v4.45.0 released: includes a lightning-fast method to build tools! โšก๏ธ

During user research with colleagues @MoritzLaurer and @Jofthomas , we discovered that the class definition currently in used to define a Tool in
transformers.agents is a bit tedious to use, because it goes in great detail.

โžก๏ธ So Iโ€™ve made an easier way to build tools: just make a function with type hints + a docstring, and add a @tool decorator in front.

โœ…ย Voilร , youโ€™re good to go!

Read all about it in the new doc here: https://huggingface.co/docs/transformers/main/en/agents#create-a-new-tool

And donโ€™t hesitate to give feedback, Iโ€™m all ears! ๐Ÿค—
posted an update 3 months ago
view post
Post
4511
#phdone - I defended my PhD yesterday! A key lesson: it is amazing how open science and open source can empower beginners with limited resources:

I first learned about instruction-based classifiers like BERT-NLI 3-4 years ago, through the @HuggingFace ZeroShotClassificationPipeline. Digging deeper into this, it was surprisingly easy to find new datasets, newer base models, and reusable fine-tuning scripts on the HF Hub to create my own zeroshot models - although I didn't know much about fine-tuning at the time.

Thanks to the community effect of the Hub, my models were downloaded hundreds of thousands of times after a few months. Seeing my research being useful for people motivated me to improve and upload newer models. Leaving my contact details in the model cards led to academic cooperation and consulting contracts (and eventually my job at HF).

That's the power of open science & open source: learning, sharing, improving, collaborating.

I mean every word in my thesis acknowledgments (screenshot). I'm very grateful to my supervisors @vanatteveldt @CasAndreu @KasperWelbers for their guidance; to @profAndreaRenda and @CEPS_thinktank for enabling me to work part-time during the first year; to @huggingface for creating awesome tools and an awesome platform; and to many others who are not active on social media.

Links to the full thesis and the collection of my most recent models are below.

PS: If someone happens to speak Latin, let me know if my diploma contains some hidden Illuminati code or something :D
ยท
replied to their post 3 months ago
view reply
#!pip install "huggingface_hub>=0.25.0"
from huggingface_hub import InferenceClient

client = InferenceClient(
    base_url="https://huggingface.co/api/integrations/dgx/v1",
    api_key="MY_FINEGRAINED_ENTERPRISE_ORG_TOKEN"  # see docs: https://huggingface.co/blog/inference-dgx-cloud#create-a-fine-grained-token
)

output = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-405B-Instruct-FP8",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count to 10"},
    ],
    max_tokens=1024,
)

print(output)
posted an update 3 months ago
view post
Post
2303
The new NIM Serverless API by HF and Nvidia is a great option if you want a reliable API for open-weight LLMs like Llama-3.1-405B that are too expensive to run on your own hardware.

- It's pay-as-you-go, so it doesn't have rate limits like the standard HF Serverless API and you don't need to commit to hardware like for a dedicated endpoint.
- It works out-of-the box with the new v0.25 release of our huggingface_hub.InferenceClient
- It's specifically tailored to a small collection of popular open-weight models. For a broader selection of open models, we recommend using the standard HF Serverless API.
- Note that you need a token from an Enterprise Hub organization to use it.

Details in this blog post: https://huggingface.co/blog/inference-dgx-cloud
Compatible models in this HF collection: nvidia/nim-serverless-inference-api-66a3c6fcdcb5bbc6e975b508
Release notes with many more features of huggingface_hub==0.25.0: https://github.com/huggingface/huggingface_hub/releases/tag/v0.25.0

Copy-pasteable code in the first comment:
ยท
posted an update 3 months ago
view post
Post
1625
Why would you fine-tune a model if you can just prompt an LLM? The new paper "What is the Role of Small Models in the LLM Era: A Survey" provides a nice pro/con overview. My go-to approach combines both:

1. Start testing an idea by prompting an LLM/VLM behind an API. It's fast and easy and I avoid wasting time on tuning a model on a task that might not make it into production anyways.

2. The LLM/VLM then needs to be manually validated. Anyone seriously considering putting AI into production has to do at least some manual validation. Setting up a good validation pipeline with a tool like Argilla is crucial and it can be reused for any future experiments. Note: you can use LLM-as-a-judge to automate some evals, but you always also need to validate the judge!

3. Based on this validation I can then (a) either just continue using the prompted LLM if it is accurate enough and it makes sense financially given my load; or (b) if the LLM is not accurate enough or too expensive to run in the long-run, I reuse the existing validation pipeline to annotate some additional data for fine-tuning a smaller model. This can be sped up by reusing & correcting synthetic data from the LLM (or just pure distillation).

Paper: https://arxiv.org/pdf/2409.06857
Argilla docs: https://docs.argilla.io/latest/
Argilla is also very easy to deploy with Hugging Face Spaces (or locally): https://huggingface.co/new-space?template=argilla%2Fargilla-template-space
reacted to jeffboudier's post with ๐Ÿ”ฅ 3 months ago
view post
Post
4032
Pro Tip - if you're a Firefox user, you can set up Hugging Chat as integrated AI Assistant, with contextual links to summarize or simplify any text - handy!

In this short video I show how to set it up
  • 2 replies
ยท
reacted to m-ric's post with โž• 4 months ago
view post
Post
3398
๐—š๐—ผ๐—ผ๐—ด๐—น๐—ฒ ๐—ฝ๐—ฎ๐—ฝ๐—ฒ๐—ฟ : ๐˜€๐—ฐ๐—ฎ๐—น๐—ถ๐—ป๐—ด ๐˜‚๐—ฝ ๐—ถ๐—ป๐—ณ๐—ฒ๐—ฟ๐—ฒ๐—ป๐—ฐ๐—ฒ ๐—ฐ๐—ผ๐—บ๐—ฝ๐˜‚๐˜๐—ฒ ๐—ฏ๐—ฒ๐—ฎ๐˜๐˜€ ๐Ÿญ๐Ÿฐ๐˜… ๐—น๐—ฎ๐—ฟ๐—ด๐—ฒ๐—ฟ ๐—บ๐—ผ๐—ฑ๐—ฒ๐—น๐˜€ ๐Ÿš€

Remember scaling laws? These are empirical laws that say "the bigger your model, the better it gets". More precisely, "as your compute increases exponentially, loss decreases in a linear fashion". They have wild implications, suggesting that spending 100x more training compute would make you super-LLMs. That's why companies are racing to build the biggest AI superclusters ever, and Meta bought 350k H100 GPUs, which probably cost in the order of $1B.

But think of this : we're building huge reasoning machines, but only ask them to do one pass through the model to get one token of the final answer : i.e., we expend a minimal effort on inference. That's like building a Caterpillar truck and making it run on a lawnmower's motor. ๐Ÿšš๐Ÿ›ต Couldn't we optimize this? ๐Ÿค”

๐Ÿ’ก So instead of scaling up on training by training even bigger models on many more trillions of tokens, Google researchers explored this under-explored avenue : scaling up inference compute.

They combine two methods to use more compute : either a reviser that iterated to adapt the model distribution, or generate N different completions (for instance through Beam Search) and select only the best one using an additional verifier model.

They use a Palm-2 model (released in May 23) on the MATH dataset : Palm-2 has the advantage of getting a low performance on MATH, but not zero, so that improvements will be noticeable.

And the results show that for the same fixed amount of inference compute:
๐Ÿ’ฅ a smaller model with more effort on decoding beats a x14 bigger model using naive greedy sampling.

That means that you can divide your training costs by 14 and still get the same perf for the same inference cost!

Take that, scaling laws. Mark Zuckerberg, you're welcome, hope I can get some of these H100s.

Read the paper here ๐Ÿ‘‰ Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (2408.03314)
  • 1 reply
ยท
reacted to victor's post with โค๏ธ 4 months ago
view post
Post
5562
๐Ÿ™‹ Calling all Hugging Face users! We want to hear from YOU!

What feature or improvement would make the biggest impact on Hugging Face?

Whether it's the Hub, better documentation, new integrations, or something completely different โ€“ we're all ears!

Your feedback shapes the future of Hugging Face. Drop your ideas in the comments below! ๐Ÿ‘‡
ยท