Daniel Serrano

dnlserrano

https://dnlserrano.dev

AI & ML interests

computer vision, biometrics, face, facial recognition, deepfakes, pad, mad, age, bias

Organizations

None yet

dnlserrano's activity

updated a model 3 days ago

dnlserrano/vit-base-patch16-224-in21k-finetuned-lora-MSU-MFSD

Updated 3 days ago

published a model 4 days ago

dnlserrano/vit-base-patch16-224-in21k-finetuned-lora-MSU-MFSD

Updated 3 days ago

updated a model 5 days ago

dnlserrano/vit-base-patch16-224-in21k-finetuned-lora-food101

Updated 5 days ago

published a model 5 days ago

dnlserrano/vit-base-patch16-224-in21k-finetuned-lora-food101

Updated 5 days ago

liked 2 models about 1 month ago

laion/CLIP-convnext_base_w-laion2B-s13B-b82K-augreg

Zero-Shot Image Classification • Updated Apr 18, 2023 • 753k • 7

HuggingFaceTB/SmolLM2-1.7B-Instruct

Text Generation • Updated 7 days ago • 604k • 573

liked a model about 2 months ago

bullerwins/DeepSeek-V3-GGUF

Text Generation • Updated 22 days ago • 13.9k • 101

upvoted an article about 2 months ago

Article

Introduction to ggml

Aug 13, 2024

• 165

liked 5 models about 2 months ago

liked a Space 2 months ago

312

Qwen-VL-Max

📷

Interact with images and texts using Qwen-VL-Max

upvoted a paper 3 months ago

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 140

reacted to merve's post with 🔥 4 months ago

Post

5441

Another great week in open ML!
Here's a small recap 🫰🏻

Model releases
⏯️ Video Language Models
AI at Meta released Vision-CAIR/LongVU_Qwen2_7B, a new state-of-the-art long video LM based on DINOv2, SigLIP, Qwen2 and Llama 3.2

💬 Small language models
Hugging Face released HuggingFaceTB/SmolLM2-1.7B, a family of new smol language models with Apache 2.0 license that come in sizes 135M, 360M and 1.7B, along with datasets.
Meta released facebook/MobileLLM-1B, a new family of on-device LLMs of sizes 125M, 350M and 600M

🖼️ Image Generation
Stability AI released stabilityai/stable-diffusion-3.5-medium, a 2B model with commercially permissive license

🖼️💬Any-to-Any
gpt-omni/mini-omni2 is released! It is the closest reproduction of GPT-4o, a new LLM that can take image-text-audio input and output speech

Dataset releases
🖼️ Spawning/PD12M, a new captioning dataset of 12.4 million examples generated using Florence-2

reacted to merve's post with 🔥 5 months ago

Post

3798

Meta AI vision has been cooking @facebook
They shipped multiple models and demos for their papers at @ECCV 🤗

Here's a compilation of my top picks:
- Sapiens is a family of foundation models for human-centric depth estimation, segmentation and more, all models have open weights and demos 👏

All models have their demos and even torchscript checkpoints!
A collection of models and demos: facebook/sapiens-66d22047daa6402d565cb2fc
- VFusion3D is a state-of-the-art consistent 3D generation model from images

Model: facebook/vfusion3d
Demo: facebook/VFusion3D

- CoTracker is the state-of-the-art point (pixel) tracking model

Demo: facebook/cotracker
Model: facebook/cotracker

liked 3 models 6 months ago

guoyww/animatediff-motion-adapter-v1-5-2

Text-to-Video • Updated Nov 3, 2023 • 997 • 25

stabilityai/stable-video-diffusion-img2vid-xt

Image-to-Video • Updated Jul 10, 2024 • 662k • 2.94k

openai/whisper-large-v3

Automatic Speech Recognition • Updated Aug 12, 2024 • 4.16M • 4.15k