Sylvain Filoni

fffiloni

AI & ML interests

ML for Animation • Alumni Arts Déco Paris • PSL

Articles

Organizations

fffiloni's activity

reacted to abhishek's post with 🔥 about 4 hours ago
view post
Post
2133
INTRODUCING Hugging Face AutoTrain Client 🔥
Fine-tuning models got even easier!!!!
Now you can fine-tune SOTA models on all compatible dataset-model pairs on Hugging Face Hub using Python on Hugging Face Servers. Choose from a number of GPU flavors, millions of models and dataset pairs and 10+ tasks 🤗

To try, install autotrain-advanced using pip. You can ignore dependencies and install without --no-deps and then you'd need to install some dependencies by hand.

"pip install autotrain-advanced"

Github repo: https://github.com/huggingface/autotrain-advanced
reacted to MoritzLaurer's post with 🚀🤗 about 1 month ago
view post
Post
3807
#phdone - I defended my PhD yesterday! A key lesson: it is amazing how open science and open source can empower beginners with limited resources:

I first learned about instruction-based classifiers like BERT-NLI 3-4 years ago, through the @HuggingFace ZeroShotClassificationPipeline. Digging deeper into this, it was surprisingly easy to find new datasets, newer base models, and reusable fine-tuning scripts on the HF Hub to create my own zeroshot models - although I didn't know much about fine-tuning at the time.

Thanks to the community effect of the Hub, my models were downloaded hundreds of thousands of times after a few months. Seeing my research being useful for people motivated me to improve and upload newer models. Leaving my contact details in the model cards led to academic cooperation and consulting contracts (and eventually my job at HF).

That's the power of open science & open source: learning, sharing, improving, collaborating.

I mean every word in my thesis acknowledgments (screenshot). I'm very grateful to my supervisors @vanatteveldt @CasAndreu @KasperWelbers for their guidance; to @profAndreaRenda and @CEPS_thinktank for enabling me to work part-time during the first year; to @huggingface for creating awesome tools and an awesome platform; and to many others who are not active on social media.

Links to the full thesis and the collection of my most recent models are below.

PS: If someone happens to speak Latin, let me know if my diploma contains some hidden Illuminati code or something :D
·
posted an update about 1 month ago
view post
Post
7756
Visionary Walter Murch (editor for Francis Ford Coppola), in 1999:

“ So let's suppose a technical apotheosis some time in the middle of the 21st century, when it somehow becomes possible for one person to make an entire feature film, with virtual actors. Would this be a good thing?

If the history of oil painting is any guide, the broadest answer would be yes, with the obvious caution to keep a wary eye on the destabilizing effect of following too intently a hermetically personal vision. One need only look at the unraveling of painting or classical music in the 20th century to see the risks.

Let's go even further, and force the issue to its ultimate conclusion by supposing the diabolical invention of a black box that could directly convert a single person's thoughts into a viewable cinematic reality. You would attach a series of electrodes to various points on your skull and simply think the film into existence.

And since we are time-traveling, let us present this hypothetical invention as a Faustian bargain to the future filmmakers of the 21st century. If this box were offered by some mysterious cloaked figure in exchange for your eternal soul, would you take it?

The kind of filmmakers who would accept, even leap, at the offer are driven by the desire to see their own vision on screen in as pure a form as possible. They accept present levels of collaboration as the evil necessary to achieve this vision. Alfred Hitchcock, I imagine, would be one of them, judging from his description of the creative process: "The film is already made in my head before we start shooting."”

Read "A Digital Cinema of the Mind? Could Be" by Walter Murch: https://archive.nytimes.com/www.nytimes.com/library/film/050299future-film.html

  • 1 reply
·
reacted to singhsidhukuldeep's post with 🔥 about 1 month ago
view post
Post
2453
Good folks at Meta has just unveiled Llama 3.2, pushing the boundaries of language models and computer vision.

Even more interesting is how they trained this cutting-edge model:

1️⃣ Architecture:
Llama 3.2 uses an optimized transformer architecture with auto-regressive capabilities. The largest models (11B and 90B) now support multimodal inputs, integrating both text and images.

2️⃣ Training Pipeline:
• Started with pretrained Llama 3.1 text models
• Added image adapters and encoders
• Pretrained on large-scale noisy (image, text) pair data
• Fine-tuned on high-quality in-domain and knowledge-enhanced (image, text) pairs

3️⃣ Vision Integration:
• Trained adapter weights to integrate a pre-trained image encoder
• Used cross-attention layers to feed image representations into the language model
• Preserved text-only capabilities by not updating language model parameters during adapter training

4️⃣ Post-Training Alignment:
• Multiple rounds of supervised fine-tuning (SFT)
• Rejection sampling (RS)
• Direct preference optimization (DPO)
• Synthetic data generation using Llama 3.1 for Q&A augmentation
• Reward model ranking for high-quality fine-tuning data

5️⃣ Lightweight Models:
• Used pruning and distillation techniques for 1B and 3B models
• Structured pruning from Llama 3.1 8B model
• Knowledge distillation using Llama 3.1 8B and 70B as teachers

6️⃣ Context Length:
All models support an impressive 128K token context length.

7️⃣ Safety Measures:
Incorporated safety mitigation data to balance helpfulness and safety.

The result? A suite of models ranging from edge-friendly 1B parameters to powerful 90B parameter versions, capable of sophisticated reasoning across text and images. Llama 3.2 is set to revolutionize AI applications from mobile devices to enterprise-scale solutions.

What are your thoughts on these advancements? How do you see Llama 3.2 impacting your industry? Let's discuss in the comments!
reacted to jsulz's post with 🚀 about 1 month ago
view post
Post
1796
In August, the XetHub team joined Hugging Face
- https://huggingface.co/blog/xethub-joins-hf - and we’ve been rolling up our sleeves to bring the best of both worlds together. We started with a deep dive into the current state of files stored with Git LFS on the Hub.

Getting this information was no small feat. We had to:
* Analyze a complete database dump of all repositories and files stored in Git LFS across Hugging Face.
* Parse through metadata on file sizes and types to accurately map the storage breakdown across Spaces, Models, and Datasets.

You can read more about the findings (with some jaw-dropping stats + charts) here https://www.linkedin.com/feed/update/urn:li:activity:7244486280351285248
reacted to asoria's post with 👍 about 1 month ago
reacted to fdaudens's post with 👍 about 2 months ago
view post
Post
835
🚀 Your AI toolkit just got a major upgrade! I updated the Journalists on Hugging Face community's collection with tools for investigative work, content creation, and data analysis.

Sharing these new additions with the links in case it’s helpful:
- @wendys-llc 's excellent 6-part video series on AI for investigative journalism https://www.youtube.com/playlist?list=PLewNEVDy7gq1_GPUaL0OQ31QsiHP5ncAQ
- @jeremycaplan 's curated AI Spaces on HF https://wondertools.substack.com/p/huggingface
- @Xenova 's Whisper Timestamped (with diarization!) for private, on-device transcription Xenova/whisper-speaker-diarization & Xenova/whisper-word-level-timestamps
- Flux models for image gen & LoRAs autotrain-projects/train-flux-lora-ease
- FineGrain's object cutter finegrain/finegrain-object-cutter and object eraser (this one's cool) finegrain/finegrain-object-eraser
- FineVideo: massive open-source annotated dataset + explorer HuggingFaceFV/FineVideo-Explorer
- Qwen2 chat demos, including 2.5 & multimodal versions (crushing it on handwriting recognition) Qwen/Qwen2.5 & Qwen/Qwen2-VL
- GOT-OCR integration stepfun-ai/GOT_official_online_demo
- HTML to Markdown converter maxiw/HTML-to-Markdown
- Text-to-SQL query tool by @davidberenstein1957 for HF datasets davidberenstein1957/text-to-sql-hub-datasets

There's a lot of potential here for journalism and beyond. Give these a try and let me know what you build!

You can also add your favorite ones if you're part of the community!

Check it out: https://huggingface.co/JournalistsonHF

#AIforJournalism #HuggingFace #OpenSourceAI
reacted to davanstrien's post with 👍 4 months ago
reacted to alvdansen's post with 👍 4 months ago
view post
Post
5705
New LoRA Model!

I trained this model on a new spot I'm really excited to share (soon!)

This Monday I will be posting my first beginning to end blog showing the tool I've used, dataset, captioning techniques, and parameters to finetune this LoRA.

For now, check out the model in the link below.

alvdansen/m3lt
·
reacted to DmitryRyumin's post with 🔥 4 months ago
view post
Post
3581
🚀🎭🌟 New Research Alert - Portrait4D-v2 (Avatars Collection)! 🌟🎭🚀
📄 Title: Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer 🔝

📝 Description: Portrait4D-v2 is a novel method for one-shot 4D head avatar synthesis using pseudo multi-view videos and a vision transformer backbone, achieving superior performance without relying on 3DMM reconstruction.

👥 Authors: Yu Deng, Duomin Wang, and Baoyuan Wang

📄 Paper: Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer (2403.13570)

🌐 GitHub Page: https://yudeng.github.io/Portrait4D-v2/
📁 Repository: https://github.com/YuDeng/Portrait-4D

📺 Video: https://www.youtube.com/watch?v=5YJY6-wcOJo

🚀 CVPR-2023-24-Papers: https://github.com/DmitryRyumin/CVPR-2023-24-Papers

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers curated by @DmitryRyumin

🚀 Added to the Avatars Collection: DmitryRyumin/avatars-65df37cdf81fec13d4dbac36

🔍 Keywords: Portrait4D #4DAvatar #HeadSynthesis #3DModeling #TechInnovation #DeepLearning #ComputerGraphics #ComputerVision #Innovation
  • 1 reply
·
reacted to alvdansen's post with 🚀 5 months ago
view post
Post
2496
Per popular request, I'm working on a beginning to end LoRA training workflow blog for a style.

It will focus on dataset curation through training on a pre-determined style to give a better insight on my process.

Curious what are some questions you might have that I can try to answer in it?
reacted to louisbrulenaudet's post with 👍 5 months ago
view post
Post
3184
I am delighted to announce the publication of my LegalKit, a French labeled dataset built for legal ML training 🤗

This dataset comprises multiple query-document pairs (+50k) curated for training sentence embedding models within the domain of French law.

The labeling process follows a systematic approach to ensure consistency and relevance:
- Initial Query Generation: Three instances of the LLaMA-3-70B model independently generate three different queries based on the same document.
- Selection of Optimal Query: A fourth instance of the LLaMA-3-70B model, using a dedicated selection prompt, evaluates the generated queries and selects the most suitable one.
- Final Label Assignment: The chosen query is used to label the document, aiming to ensure that the label accurately reflects the content and context of the original text.

Dataset: louisbrulenaudet/legalkit

Stay tuned for further updates and release information 🔥

@clem , if we can create an "HF for Legal" organization, similar to what exists for journalists, I am available!

Note : My special thanks to @alvdansen for their illustration models ❤️
  • 2 replies
·
reacted to fdaudens's post with 🚀 5 months ago
view post
Post
3371
Updated the Journalists on 🤗 community page:
- new text-to-speech tools collection JournalistsonHF/text-to-speech-6675c4dccdaa11e86928a15b
- additional leaderboards in the eval collection: TTS-AGI/TTS-Arena and dylanebert/3d-arena
- new tools in the Text-Analysis collection: gokaygokay/Florence-2, pdf2dataset/pdf2dataset, cvachet/pdf-chatbot
- Xenova/realtime-whisper-webgpu in the Transcription collection
- radames/flash-sd3-taesd3 in the Image Tools collection
- Last but not least, okaris/omni-zero in the fun collection for zero-shot stylized portrait creation

Is there any tool you would like to see added?

Find all the curated tools here: https://huggingface.co/collections/JournalistsonHF/
reacted to alvdansen's post with ❤️ 5 months ago
view post
Post
6776
I had a backlog of LoRA model weights for SDXL that I decided to prioritize this weekend and publish. I know many are using SD3 right now, however if you have the time to try them, I hope you enjoy them.

I intend to start writing more fully on the thought process behind my approach to curating and training style and subject finetuning, beginning this next week.

Thank you for reading this post! You can find the models on my page and I'll drop a few previews here.
·
reacted to harpreetsahota's post with 👍 5 months ago
view post
Post
2067
The Coachella of Computer Vision, CVPR, is right around the corner. In anticipation of the conference, I curated a dataset of the papers.

I'll have a technical blog post out tomorrow doing some analysis on the dataset, but I'm so hyped that I wanted to get it out to the community ASAP.

The dataset consists of the following fields:

- An image of the first page of the paper
- title: The title of the paper
- authors_list: The list of authors
- abstract: The abstract of the paper
- arxiv_link: Link to the paper on arXiv
- other_link: Link to the project page, if found
- category_name: The primary category this paper according to [arXiv taxonomy](https://arxiv.org/category_taxonomy)
- all_categories: All categories this paper falls into, according to arXiv taxonomy
- keywords: Extracted using GPT-4o

Here's how I created the dataset 👇🏼

Generic code for building this dataset can be found [here](https://github.com/harpreetsahota204/CVPR-2024-Papers).

This dataset was built using the following steps:

- Scrape the CVPR 2024 website for accepted papers
- Use DuckDuckGo to search for a link to the paper's abstract on arXiv
- Use arXiv.py (python wrapper for the arXiv API) to extract the abstract and categories, and download the pdf for each paper
- Use pdf2image to save the image of paper's first page
- Use GPT-4o to extract keywords from the abstract

Voxel51/CVPR_2024_Papers
reacted to thomwolf's post with 🔥 5 months ago
view post
Post
4398
[New crazy blog post alert] We are releasing an extensive blog post on the science of creating high quality web-scale datasets, detailing all the steps and learnings that came in our recent 15 trillion tokens 🍷FineWeb release

Inspired by the distill.pub interactive graphics papers, we settled to write the most extensive, enjoyable and in-depth tech report we could draft on so prepare for a 45-mmin read with interactive graphics and all.

And it's not all, in this article we also introduce 📚FineWeb-Edu a filtered subset of Common Crawl with 1.3T tokens containing only web pages with very high educational content. Up to our knowledge, FineWeb-Edu out-performs all openly release web-scale datasets by a significant margin on knowledge- and reasoning-intensive benchmarks like MMLU, ARC, and OpenBookQA

We also make a number of surprising observations on the "quality" of the internet it-self which may challenge some of the general assumptions on web data (not saying more, I'll let you draw your conclusions ;)

HuggingFaceFW/blogpost-fineweb-v1
  • 1 reply
·
replied to their post 5 months ago
view reply

Tu veux préciser ton point de vue sur cette citation ? Les « experts » du CNC font ici un peu de prospective, sans forcément être accurate (malheureusement)

reacted to louisbrulenaudet's post with 🧠 6 months ago
view post
Post
518
Integrating the French Taxation Embedding Benchmark Task (beta) into the MTEB 🤗

I'm excited to announce an integration of the French Taxation Embedding Benchmark task into the Massive Text Embedding Benchmark (MTEB).

This addition expands the diverse set of tasks available within MTEB, enabling researchers and practitioners to develop and evaluate retrieval models focused on retrieving relevant tax articles or content based on provided queries.

Link to the 🤗 Dataset : louisbrulenaudet/tax-retrieval-benchmark

Link to the GitHub repo : https://github.com/louisbrulenaudet/tax-retrieval-benchmark

Notes:
The Massive Text Embedding Benchmark for French Taxation and the Dataset are currently in beta and may not be suitable for direct use in production. The size of the Dataset may not be sufficient to handle a wide range of queries and scenarios encountered in real-world settings.

As the Dataset grows and matures, I will provide updates and guidance on its suitability for production use cases.
posted an update 6 months ago
view post
Post
18575
🇫🇷
Quel impact de l’IA sur les filières du cinéma, de l’audiovisuel et du jeu vidéo?
Etude prospective à destination des professionnels
— CNC & BearingPoint | 09/04/2024

Si l’Intelligence Artificielle (IA) est utilisée de longue date dans les secteurs du cinéma, de l’audiovisuel et du jeu vidéo, les nouvelles applications de l’IA générative bousculent notre vision de ce dont est capable une machine et possèdent un potentiel de transformation inédit. Elles impressionnent par la qualité de leurs productions et suscitent par conséquent de nombreux débats, entre attentes et appréhensions.

Le CNC a donc décider de lancer un nouvel Observatoire de l’IA Afin de mieux comprendre les usages de l’IA et ses impacts réels sur la filière de l’image. Dans le cadre de cet Observatoire, le CNC a souhaité dresser un premier état des lieux à travers la cartographie des usages actuels ou potentiels de l’IA à chaque étape du processus de création et de diffusion d’une œuvre, en identifiant les opportunités et risques associés, notamment en termes de métiers et d’emploi. Cette étude CNC / Bearing Point en a présenté les principaux enseignements le 6 mars, lors de la journée CNC « Créer, produire, diffuser à l’heure de l’intelligence artificielle ».

Le CNC publie la version augmentée de la cartographie des usages de l’IA dans les filières du cinéma, de l’audiovisuel et du jeu vidéo.

Lien vers la cartographie complète: https://www.cnc.fr/documents/36995/2097582/Cartographie+des+usages+IA_rapport+complet.pdf/96532829-747e-b85e-c74b-af313072cab7?t=1712309387891
·