
Sylvain Filoni

fffiloni

AI & ML interests

ML for Animation • Alumni Arts Déco Paris • PSL

Recent Activity

updated a Space about 19 hours ago
fffiloni/KEEP-docker
liked a Space 2 days ago
rf-inversion/RF-inversion
updated a collection 2 days ago
Style Transfer

Articles

Organizations

Notebooks-explorers • AI FILMS • Prodia Labs • Hugging Face Fellows • Nanomenta ML • temp-org • Blog-explorers • huggingPartyParis • The Collectionists • ZeroGPU Explorers • Workshop ENSAD DG • Gradio Templates • ENSAD DO • Social Post Explorers • Top Contributors: Space Likes • Top Contributors: Profile Followers • Telescope Optique Unterlinden • Dev Mode Explorers • Nerdy Face

fffiloni's activity

reacted to thomwolf's post with 🔥 16 days ago
We are proud to announce HuggingFaceFW/fineweb-2: A sparkling update to HuggingFaceFW/fineweb with 1000s of 🗣️ languages.

We applied the same data-driven approach that led to SOTA English performance in 🍷 FineWeb to thousands of languages.

🥂 FineWeb2 has 8TB of compressed text data and outperforms other multilingual datasets in our experiments.

The dataset is released under the permissive 📜 ODC-By 1.0 license, and the 💻 code to reproduce it and our evaluations is public.
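
If you just want to peek at the data before committing to the full 8TB, here is a minimal streaming sketch with the datasets library. The config name "fra_Latn" and the "text" column are assumptions based on FineWeb's naming conventions; check the dataset card for the exact values.

```python
from datasets import load_dataset

# Stream a single language config instead of downloading the whole corpus.
# "fra_Latn" is an assumed config name; see the FineWeb2 dataset card for the real list.
ds = load_dataset("HuggingFaceFW/fineweb-2", name="fra_Latn", split="train", streaming=True)

for i, row in enumerate(ds):
    print(row["text"][:200])  # assumed column name, as in the original FineWeb
    if i == 2:
        break
```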

We will very soon announce a big community project, and are working on a 📝 blog post walking you through the entire dataset creation process. Stay tuned!

In the meantime, come ask us questions on our discussion page: HuggingFaceFW/discussion

H/t @guipenedo @hynky @lvwerra as well as @vsabolcec Bettina Messmer @negar-foroutan and @mjaggi
reacted to MonsterMMORPG's post with 👀 20 days ago
FLUX Tools Complete Tutorial with SwarmUI (as easy as Automatic1111 or Forge): Outpainting, Inpainting, Redux Style Transfer + Re-Imagine + Combine Multiple Images, Depth and Canny - More info in the oldest comment - No paywall: https://youtu.be/hewDdVJEqOQ

FLUX.1 Tools by BlackForestLabs changed the #AI field forever. They became the number 1 Open Source community provider after this massive release. In this tutorial, I will show you step by step how to use the FLUX.1 Fill model (an inpainting model) to do perfect outpainting (yes, this model is used for outpainting) and inpainting. Moreover, I will show all the features of the FLUX Redux model for style transfer and re-imagining one or more combined images. Furthermore, I will show you step by step how to convert an input image into Depth or Canny maps and then how to use them with the #FLUX Depth and Canny models, covering both the LoRA and the full checkpoints.
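
The tutorial itself uses SwarmUI, but for readers who prefer scripting, a rough equivalent of the Fill (inpainting/outpainting) step with diffusers might look like the sketch below. It assumes a recent diffusers release that ships FluxFillPipeline and access to the gated black-forest-labs/FLUX.1-Fill-dev weights; the file names and prompt are placeholders.

```python
import torch
from diffusers import FluxFillPipeline
from diffusers.utils import load_image

# Assumes a diffusers version that includes FluxFillPipeline and a GPU with enough VRAM.
pipe = FluxFillPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev", torch_dtype=torch.bfloat16
).to("cuda")

image = load_image("input.png")  # source image (placeholder path)
mask = load_image("mask.png")    # white = region to repaint; for outpainting, pad the image and mask the padding

result = pipe(
    prompt="a scenic mountain landscape at sunset",  # placeholder prompt
    image=image,
    mask_image=mask,
    guidance_scale=30.0,
    num_inference_steps=50,
).images[0]
result.save("filled.png")
```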

🔗 Full Instructions, Configs, Installers, Information and Links Shared Post (the one used in the tutorial - Public - no paywall) ⤵️
▶️ https://www.patreon.com/posts/tutorial-instructions-links-public-post-106135985

Preparing this tutorial took more than a week, and it should be the easiest one to follow since it is made with the famous #SwarmUI. SwarmUI is as easy and as advanced as the Automatic1111 SD Web UI. The biggest advantage of SwarmUI is that it uses ComfyUI as a back end, so it is extremely fast, VRAM-optimized, and supports all of the newest SOTA models as soon as they are published.

So in this tutorial I will show you how to set up SwarmUI and the FLUX Dev tools on your Windows computer, Massed Compute, RunPod, and Kaggle. I will explain everything step by step and show you every tip and trick you need to properly do style transfer, re-imagining, inpainting, outpainting, depth, and canny with FLUX.

Video chapters are listed in the first image.



reacted to m-ric's post with ❤️ about 1 month ago
๐—ง๐—ต๐—ฒ ๐—ป๐—ฒ๐˜…๐˜ ๐—ฏ๐—ถ๐—ด ๐˜€๐—ผ๐—ฐ๐—ถ๐—ฎ๐—น ๐—ป๐—ฒ๐˜๐˜„๐—ผ๐—ฟ๐—ธ ๐—ถ๐˜€ ๐—ป๐—ผ๐˜ ๐Ÿฆ‹, ๐—ถ๐˜'๐˜€ ๐—›๐˜‚๐—ฏ ๐—ฃ๐—ผ๐˜€๐˜๐˜€! [INSERT STONKS MEME WITH LASER EYES]

See below: I got 105k impressions since regularly posting Hub Posts, coming close to my 275k on Twitter!

โš™๏ธ Computed with the great dataset maxiw/hf-posts
โš™๏ธ Thanks to Qwen2.5-Coder-32B for showing me how to access dict attributes in a SQL request!

cc @merve who's far in front of me
posted an update about 1 month ago
reacted to abhishek's post with 🔥 about 2 months ago
INTRODUCING Hugging Face AutoTrain Client 🔥
Fine-tuning models got even easier!!!!
Now you can fine-tune SOTA models on all compatible dataset-model pairs on Hugging Face Hub using Python on Hugging Face Servers. Choose from a number of GPU flavors, millions of models and dataset pairs, and 10+ tasks 🤗

To try it, install autotrain-advanced using pip. You can also skip its pinned dependencies by installing with --no-deps, but then you'd need to install some dependencies by hand.

"pip install autotrain-advanced"

Github repo: https://github.com/huggingface/autotrain-advanced
reacted to MoritzLaurer's post with 🚀🤗 3 months ago
#phdone - I defended my PhD yesterday! A key lesson: it is amazing how open science and open source can empower beginners with limited resources:

I first learned about instruction-based classifiers like BERT-NLI 3-4 years ago, through the @HuggingFace ZeroShotClassificationPipeline. Digging deeper into this, it was surprisingly easy to find new datasets, newer base models, and reusable fine-tuning scripts on the HF Hub to create my own zeroshot models - although I didn't know much about fine-tuning at the time.
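
For readers who haven't used it, the zero-shot classification pipeline mentioned above can be tried in a few lines. This is a generic sketch using the default-style NLI model facebook/bart-large-mnli, not one of the author's own zeroshot models.

```python
from transformers import pipeline

# Zero-shot classification via an NLI model; swap in any zeroshot/NLI model from the Hub.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "The new EU regulation targets carbon emissions from heavy industry.",
    candidate_labels=["climate policy", "healthcare", "sports"],
)
print(result["labels"][0], round(result["scores"][0], 3))
```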

Thanks to the community effect of the Hub, my models were downloaded hundreds of thousands of times after a few months. Seeing my research being useful for people motivated me to improve and upload newer models. Leaving my contact details in the model cards led to academic cooperation and consulting contracts (and eventually my job at HF).

That's the power of open science & open source: learning, sharing, improving, collaborating.

I mean every word in my thesis acknowledgments (screenshot). I'm very grateful to my supervisors @vanatteveldt @CasAndreu @KasperWelbers for their guidance; to @profAndreaRenda and @CEPS_thinktank for enabling me to work part-time during the first year; to @huggingface for creating awesome tools and an awesome platform; and to many others who are not active on social media.

Links to the full thesis and the collection of my most recent models are below.

PS: If someone happens to speak Latin, let me know if my diploma contains some hidden Illuminati code or something :D
posted an update 3 months ago
Visionary Walter Murch (editor for Francis Ford Coppola), in 1999:

“So let's suppose a technical apotheosis some time in the middle of the 21st century, when it somehow becomes possible for one person to make an entire feature film, with virtual actors. Would this be a good thing?

If the history of oil painting is any guide, the broadest answer would be yes, with the obvious caution to keep a wary eye on the destabilizing effect of following too intently a hermetically personal vision. One need only look at the unraveling of painting or classical music in the 20th century to see the risks.

Let's go even further, and force the issue to its ultimate conclusion by supposing the diabolical invention of a black box that could directly convert a single person's thoughts into a viewable cinematic reality. You would attach a series of electrodes to various points on your skull and simply think the film into existence.

And since we are time-traveling, let us present this hypothetical invention as a Faustian bargain to the future filmmakers of the 21st century. If this box were offered by some mysterious cloaked figure in exchange for your eternal soul, would you take it?

The kind of filmmakers who would accept, even leap, at the offer are driven by the desire to see their own vision on screen in as pure a form as possible. They accept present levels of collaboration as the evil necessary to achieve this vision. Alfred Hitchcock, I imagine, would be one of them, judging from his description of the creative process: "The film is already made in my head before we start shooting."”
—
Read "A Digital Cinema of the Mind? Could Be" by Walter Murch: https://archive.nytimes.com/www.nytimes.com/library/film/050299future-film.html

reacted to singhsidhukuldeep's post with 🔥 3 months ago
The good folks at Meta have just unveiled Llama 3.2, pushing the boundaries of language models and computer vision.

Even more interesting is how they trained this cutting-edge model:

1๏ธโƒฃ Architecture:
Llama 3.2 uses an optimized transformer architecture with auto-regressive capabilities. The largest models (11B and 90B) now support multimodal inputs, integrating both text and images.

2๏ธโƒฃ Training Pipeline:
โ€ข Started with pretrained Llama 3.1 text models
โ€ข Added image adapters and encoders
โ€ข Pretrained on large-scale noisy (image, text) pair data
โ€ข Fine-tuned on high-quality in-domain and knowledge-enhanced (image, text) pairs

3๏ธโƒฃ Vision Integration:
โ€ข Trained adapter weights to integrate a pre-trained image encoder
โ€ข Used cross-attention layers to feed image representations into the language model
โ€ข Preserved text-only capabilities by not updating language model parameters during adapter training

4๏ธโƒฃ Post-Training Alignment:
โ€ข Multiple rounds of supervised fine-tuning (SFT)
โ€ข Rejection sampling (RS)
โ€ข Direct preference optimization (DPO)
โ€ข Synthetic data generation using Llama 3.1 for Q&A augmentation
โ€ข Reward model ranking for high-quality fine-tuning data

5๏ธโƒฃ Lightweight Models:
โ€ข Used pruning and distillation techniques for 1B and 3B models
โ€ข Structured pruning from Llama 3.1 8B model
โ€ข Knowledge distillation using Llama 3.1 8B and 70B as teachers

6๏ธโƒฃ Context Length:
All models support an impressive 128K token context length.

7๏ธโƒฃ Safety Measures:
Incorporated safety mitigation data to balance helpfulness and safety.

The result? A suite of models ranging from edge-friendly 1B parameters to powerful 90B parameter versions, capable of sophisticated reasoning across text and images. Llama 3.2 is set to revolutionize AI applications from mobile devices to enterprise-scale solutions.
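
To make step 3 above more concrete, here is a minimal, self-contained PyTorch sketch of the general idea: a trainable cross-attention adapter lets frozen language-model hidden states attend to image features. It is an illustration of the technique, not Meta's actual implementation; all dimensions are dummy values.

```python
import torch
import torch.nn as nn

class CrossAttentionAdapter(nn.Module):
    """Lets (frozen) LM hidden states attend to image features; only this module is trained."""

    def __init__(self, hidden_dim: int, image_dim: int, num_heads: int = 8):
        super().__init__()
        self.image_proj = nn.Linear(image_dim, hidden_dim)  # map image encoder output to LM width
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, text_hidden: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        img = self.image_proj(image_feats)                          # (B, num_patches, hidden_dim)
        attended, _ = self.cross_attn(query=text_hidden, key=img, value=img)
        return self.norm(text_hidden + attended)                    # residual keeps text-only behavior intact

# Dummy shapes: batch of 2, 16 text tokens of width 4096, 256 image patches of width 1024.
adapter = CrossAttentionAdapter(hidden_dim=4096, image_dim=1024)
text_hidden = torch.randn(2, 16, 4096)   # would come from a frozen Llama layer
image_feats = torch.randn(2, 256, 1024)  # would come from a pre-trained image encoder
print(adapter(text_hidden, image_feats).shape)  # torch.Size([2, 16, 4096])
```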

What are your thoughts on these advancements? How do you see Llama 3.2 impacting your industry? Let's discuss in the comments!
reacted to jsulz's post with 🚀 3 months ago
In August, the XetHub team joined Hugging Face (https://huggingface.co/blog/xethub-joins-hf) and we've been rolling up our sleeves to bring the best of both worlds together. We started with a deep dive into the current state of files stored with Git LFS on the Hub.

Getting this information was no small feat. We had to:
* Analyze a complete database dump of all repositories and files stored in Git LFS across Hugging Face.
* Parse through metadata on file sizes and types to accurately map the storage breakdown across Spaces, Models, and Datasets.

You can read more about the findings (with some jaw-dropping stats + charts) here https://www.linkedin.com/feed/update/urn:li:activity:7244486280351285248
reacted to asoria's post with 👍 3 months ago
๐Ÿ“ I wrote a tutorial on how to get started with the fine-tuning process using Hugging Face tools, providing an end-to-end workflow.

The tutorial covers creating a new dataset using the new SQL Console ๐Ÿ›ข and fine-tuning a model with SFT, guided by the Notebook Creator App ๐Ÿ“™.
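
As a rough idea of what the SFT step looks like in code, here is a minimal sketch with trl (not the article's actual notebook); the dataset repo, text column, and base model are placeholders, and the exact SFTConfig fields depend on your trl version.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder dataset: substitute the one you created with the SQL Console.
dataset = load_dataset("your-username/your-sft-dataset", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",   # small base model so a test run stays cheap
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="sft-output",
        dataset_text_field="text",        # column holding the training text (assumed name)
        max_seq_length=512,
        per_device_train_batch_size=2,
        num_train_epochs=1,
    ),
)
trainer.train()
```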

👉 You can read the full article here:
https://huggingface.co/blog/asoria/easy-fine-tuning-with-hf
asoria/auto-notebook-creator
reacted to fdaudens's post with 👍 3 months ago
🚀 Your AI toolkit just got a major upgrade! I updated the Journalists on Hugging Face community's collection with tools for investigative work, content creation, and data analysis.

Sharing these new additions with the links in case it's helpful:
- @wendys-llc 's excellent 6-part video series on AI for investigative journalism https://www.youtube.com/playlist?list=PLewNEVDy7gq1_GPUaL0OQ31QsiHP5ncAQ
- @jeremycaplan 's curated AI Spaces on HF https://wondertools.substack.com/p/huggingface
- @Xenova 's Whisper Timestamped (with diarization!) for private, on-device transcription Xenova/whisper-speaker-diarization & Xenova/whisper-word-level-timestamps
- Flux models for image gen & LoRAs autotrain-projects/train-flux-lora-ease
- FineGrain's object cutter finegrain/finegrain-object-cutter and object eraser (this one's cool) finegrain/finegrain-object-eraser
- FineVideo: massive open-source annotated dataset + explorer HuggingFaceFV/FineVideo-Explorer
- Qwen2 chat demos, including 2.5 & multimodal versions (crushing it on handwriting recognition) Qwen/Qwen2.5 & Qwen/Qwen2-VL
- GOT-OCR integration stepfun-ai/GOT_official_online_demo
- HTML to Markdown converter maxiw/HTML-to-Markdown
- Text-to-SQL query tool by @davidberenstein1957 for HF datasets davidberenstein1957/text-to-sql-hub-datasets

There's a lot of potential here for journalism and beyond. Give these a try and let me know what you build!

You can also add your favorite ones if you're part of the community!

Check it out: https://huggingface.co/JournalistsonHF

#AIforJournalism #HuggingFace #OpenSourceAI
reacted to davanstrien's post with 👍 6 months ago
reacted to alvdansen's post with 👍 6 months ago
New LoRA Model!

I trained this model on a new spot I'm really excited to share (soon!)

This Monday I will be posting my first beginning-to-end blog showing the tools I've used, the dataset, captioning techniques, and the parameters to fine-tune this LoRA.

For now, check out the model in the link below.

alvdansen/m3lt
reacted to DmitryRyumin's post with 🔥 6 months ago
🚀🎭🌟 New Research Alert - Portrait4D-v2 (Avatars Collection)! 🌟🎭🚀
📄 Title: Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer 🔝

📝 Description: Portrait4D-v2 is a novel method for one-shot 4D head avatar synthesis using pseudo multi-view videos and a vision transformer backbone, achieving superior performance without relying on 3DMM reconstruction.

👥 Authors: Yu Deng, Duomin Wang, and Baoyuan Wang

📄 Paper: Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer (2403.13570)

🌐 GitHub Page: https://yudeng.github.io/Portrait4D-v2/
📁 Repository: https://github.com/YuDeng/Portrait-4D

📺 Video: https://www.youtube.com/watch?v=5YJY6-wcOJo

🚀 CVPR-2023-24-Papers: https://github.com/DmitryRyumin/CVPR-2023-24-Papers

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🚀 Added to the Avatars Collection: DmitryRyumin/avatars-65df37cdf81fec13d4dbac36

🔍 Keywords: Portrait4D #4DAvatar #HeadSynthesis #3DModeling #TechInnovation #DeepLearning #ComputerGraphics #ComputerVision #Innovation
reacted to alvdansen's post with 🚀 6 months ago
Per popular request, I'm working on a beginning to end LoRA training workflow blog for a style.

It will focus on dataset curation through training on a pre-determined style, to give better insight into my process.

Curious what are some questions you might have that I can try to answer in it?
reacted to louisbrulenaudet's post with 👍 6 months ago
I am delighted to announce the publication of my LegalKit, a French labeled dataset built for legal ML training 🤗

This dataset comprises multiple query-document pairs (50k+) curated for training sentence embedding models within the domain of French law.

The labeling process follows a systematic approach to ensure consistency and relevance (see the sketch after this list):
- Initial Query Generation: Three instances of the LLaMA-3-70B model independently generate three different queries based on the same document.
- Selection of Optimal Query: A fourth instance of the LLaMA-3-70B model, using a dedicated selection prompt, evaluates the generated queries and selects the most suitable one.
- Final Label Assignment: The chosen query is used to label the document, aiming to ensure that the label accurately reflects the content and context of the original text.
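
A minimal sketch of that three-step flow, using huggingface_hub's InferenceClient against a (gated) Llama-3-70B-Instruct endpoint; the prompts and the sample document are simplified placeholders, not the actual ones used to build LegalKit.

```python
import re
from huggingface_hub import InferenceClient

MODEL = "meta-llama/Meta-Llama-3-70B-Instruct"  # gated; requires access and an HF token
client = InferenceClient(MODEL)

def ask(prompt: str) -> str:
    resp = client.chat_completion(messages=[{"role": "user", "content": prompt}], max_tokens=200)
    return resp.choices[0].message.content.strip()

document = "Article 1101 du Code civil : Le contrat est un accord de volontés..."  # placeholder legal text

# Step 1: three independent query generations for the same document.
queries = [ask(f"Write one short search query a lawyer might use to find this text:\n{document}")
           for _ in range(3)]

# Step 2: a fourth call selects the most suitable query.
numbered = "\n".join(f"{i + 1}. {q}" for i, q in enumerate(queries))
choice = ask(f"Candidate queries:\n{numbered}\nAnswer with the number of the best query for:\n{document}")

# Step 3: the chosen query becomes the label for this document.
match = re.search(r"\d", choice)
best_query = queries[int(match.group()) - 1] if match else queries[0]
print(best_query)
```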

Dataset: louisbrulenaudet/legalkit

Stay tuned for further updates and release information 🔥

@clem, if we can create an "HF for Legal" organization, similar to what exists for journalists, I am available!

Note: My special thanks to @alvdansen for their illustration models ❤️
reacted to fdaudens's post with 🚀 6 months ago
Updated the Journalists on 🤗 community page:
- new text-to-speech tools collection JournalistsonHF/text-to-speech-6675c4dccdaa11e86928a15b
- additional leaderboards in the eval collection: TTS-AGI/TTS-Arena and dylanebert/3d-arena
- new tools in the Text-Analysis collection: gokaygokay/Florence-2, pdf2dataset/pdf2dataset, cvachet/pdf-chatbot
- Xenova/realtime-whisper-webgpu in the Transcription collection
- radames/flash-sd3-taesd3 in the Image Tools collection
- Last but not least, okaris/omni-zero in the fun collection for zero-shot stylized portrait creation

Is there any tool you would like to see added?

Find all the curated tools here: https://huggingface.co/collections/JournalistsonHF/
reacted to alvdansen's post with ❤️ 6 months ago
I had a backlog of LoRA model weights for SDXL that I decided to prioritize this weekend and publish. I know many are using SD3 right now, however if you have the time to try them, I hope you enjoy them.

I intend to start writing more fully on the thought process behind my approach to curating and training style and subject finetuning, beginning this next week.

Thank you for reading this post! You can find the models on my page and I'll drop a few previews here.
reacted to harpreetsahota's post with 👍 7 months ago
The Coachella of Computer Vision, CVPR, is right around the corner. In anticipation of the conference, I curated a dataset of the papers.

I'll have a technical blog post out tomorrow doing some analysis on the dataset, but I'm so hyped that I wanted to get it out to the community ASAP.

The dataset consists of the following fields:

- An image of the first page of the paper
- title: The title of the paper
- authors_list: The list of authors
- abstract: The abstract of the paper
- arxiv_link: Link to the paper on arXiv
- other_link: Link to the project page, if found
- category_name: The primary category of this paper, according to the [arXiv taxonomy](https://arxiv.org/category_taxonomy)
- all_categories: All categories this paper falls into, according to arXiv taxonomy
- keywords: Extracted using GPT-4o

Here's how I created the dataset 👇🏼

Generic code for building this dataset can be found [here](https://github.com/harpreetsahota204/CVPR-2024-Papers).

This dataset was built using the following steps:

- Scrape the CVPR 2024 website for accepted papers
- Use DuckDuckGo to search for a link to the paper's abstract on arXiv
- Use arXiv.py (python wrapper for the arXiv API) to extract the abstract and categories, and download the pdf for each paper
- Use pdf2image to save an image of the paper's first page
- Use GPT-4o to extract keywords from the abstract
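
For reference, here is a condensed sketch of the arXiv-lookup and first-page-image steps using the arxiv and pdf2image packages; the DuckDuckGo search and GPT-4o keyword steps are omitted, and the example title is a placeholder.

```python
import arxiv
from pdf2image import convert_from_path

# Placeholder title; the real pipeline scrapes accepted-paper titles from the CVPR 2024 site.
title = "Example CVPR 2024 Paper Title"

client = arxiv.Client()
search = arxiv.Search(query=f'ti:"{title}"', max_results=1)
paper = next(client.results(search))  # raises StopIteration if nothing matches

print(paper.title)
print(paper.primary_category, paper.categories)  # arXiv taxonomy fields
print(paper.summary[:200])                       # abstract

pdf_path = paper.download_pdf(filename="paper.pdf")

# Render only the first page as an image (pdf2image requires poppler to be installed).
first_page = convert_from_path(pdf_path, first_page=1, last_page=1)[0]
first_page.save("first_page.png")
```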

Voxel51/CVPR_2024_Papers