
alkinun

AtAndDev

AI & ML interests

LLMs, Alignment, Merging, Unsloth, DPO, SFT, ORPO, SPIN..

Recent Activity

reacted to singhsidhukuldeep's post with 👀 1 day ago
liked a Space 5 days ago
mteb/leaderboard
replied to their post 13 days ago
@nroggendorff is that you sama?

Organizations

ESPnet, CVPR Demo Track, BigScience Biomedical Datasets, ONNXConfig for all, video-p2p-library, Gradio-Themes-Party, Gradio-Blocks-Party, scikit-learn, Open-Source AI Meetup, lora concepts library, OpenBuddy Community, ECCV 2022, Kornia AI, Tune a video concepts library, SIGGRAPH 2022, Interspeech2022, Stable Diffusion concepts library, SIGGRAPH Asia 2022 Demos, Stable Diffusion Dreambooth Concepts Library, Musika, Blog-explorers, OpenSky, ICCV2023, ICML2023, huggingPartyParis, Multi🤖Transformers, Team Tonic, That Time I got Reincarnated as a Hugging Face Organization, ZeroGPU Explorers, Pirates Party for all software open source, MLX Community, recipe research, Narra, Social Post Explorers, Cognitive Computations, M4-ai, Spinner-GPT-4, Dev Mode Explorers, Stable Diffusion Community (Unofficial, Non-profit), Hugging Face Discord Community, Nerdy Face, OpenEndedLM, open/ acc, Data Is Better Together Contributor

AtAndDev's activity

reacted to singhsidhukuldeep's post with 👀 1 day ago
I just came across a groundbreaking paper titled "Hypencoder: Hypernetworks for Information Retrieval" by researchers from the University of Massachusetts Amherst that introduces a fundamentally new paradigm for search technology.

Most current retrieval models rely on simple inner product calculations between query and document vectors, which severely limits their expressiveness. The authors prove theoretically that inner product similarity functions fundamentally constrain what types of relevance relationships can be captured.

Hypencoder takes a radically different approach: instead of encoding a query as a vector, it generates a small neural network (called a "q-net") that acts as a learned relevance function. This neural network takes document representations as input and produces relevance scores.

Under the hood, Hypencoder uses:
- Attention-based hypernetwork layers (hyperhead layers) that transform contextualized query embeddings into weights and biases for the q-net
- A document encoder that produces vector representations similar to existing models
- A graph-based greedy search algorithm for efficient retrieval that can search 8.8M documents in under 60ms

The results are impressive - Hypencoder significantly outperforms strong dense retrieval models on standard benchmarks like MS MARCO and TREC Deep Learning Track. The performance gap widens even further on complex retrieval tasks like tip-of-the-tongue queries and instruction-following retrieval.

What makes this approach particularly powerful is that neural networks are universal approximators, allowing Hypencoder to express far more complex relevance relationships than inner product similarity functions. The framework is also flexible enough to replicate any existing neural retrieval method while adding the ability to learn query-dependent weights.
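
For intuition, here is a minimal PyTorch sketch of the query-to-q-net idea described above. It is not the authors' code: the class name, layer sizes, and plain linear hyperheads are illustrative assumptions (the paper uses attention-based hyperhead layers and a trained document encoder).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyHypencoder(nn.Module):
    """Toy hypernetwork: maps a query embedding to the weights of a tiny MLP
    (the "q-net"), which then scores document vectors directly."""
    def __init__(self, dim=768, hidden=128):
        super().__init__()
        self.dim, self.hidden = dim, hidden
        # One linear "head" per q-net parameter tensor (simplified stand-ins
        # for the paper's attention-based hyperhead layers).
        self.w1_head = nn.Linear(dim, dim * hidden)
        self.b1_head = nn.Linear(dim, hidden)
        self.w2_head = nn.Linear(dim, hidden)
        self.b2_head = nn.Linear(dim, 1)

    def forward(self, query_emb, doc_embs):
        # query_emb: (dim,)   doc_embs: (num_docs, dim)
        w1 = self.w1_head(query_emb).view(self.dim, self.hidden)
        b1 = self.b1_head(query_emb)
        w2 = self.w2_head(query_emb).view(self.hidden, 1)
        b2 = self.b2_head(query_emb)
        # Apply the generated q-net to every document representation.
        h = F.relu(doc_embs @ w1 + b1)        # (num_docs, hidden)
        return (h @ w2 + b2).squeeze(-1)      # (num_docs,) relevance scores

scores = ToyHypencoder()(torch.randn(768), torch.randn(5, 768))
```

The point of the sketch is that the scoring function's weights depend on the query, so relevance is no longer constrained to an inner product between two fixed vectors.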

replied to their post 13 days ago
posted an update 13 days ago
reacted to nroggendorff's post with 🤗 13 days ago
hello, dev mode explorers!
reacted to merve's post with ❤️🔥👍 13 days ago
Your weekly recap of open AI is here, and it's packed with models! merve/feb-14-releases-67af876b404cc27c6d837767

👀 Multimodal
> OpenGVLab released InternVideo 2.5 Chat models, new video LMs with long context
> AIDC released Ovis2 model family along with Ovis dataset, new vision LMs in different sizes (1B, 2B, 4B, 8B, 16B, 34B), with video and OCR support
> ColQwenStella-2b is a multilingual visual retrieval model that is SOTA for its size
> Hoags-2B-Exp is a new multilingual vision LM with contextual reasoning and long-context video understanding

💬 LLMs
A lot of math models!
> Open-R1 team released OpenR1-Math-220k, a large-scale math reasoning dataset, along with OpenR1-Qwen-7B, a Qwen2.5 model fine-tuned on it
> Nomic AI released new Nomic Embed multilingual retrieval model, a MoE with 500M total params and 305M active params, outperforming other models
> DeepScaleR-1.5B-Preview is a new DeepSeek-R1-Distill fine-tune using distributed RL on math
> LIMO is a new fine-tune of Qwen2.5-32B-Instruct on Math

🗣️ Audio
> Zonos-v0.1 is a new family of text-to-speech models, released with the model itself and embeddings

🖼️ Vision and Image Generation
> We have ported Apple's DepthPro to transformers for your convenience!
> illustrious-xl-v1.0 is a new illustration generation model
reacted to m-ric's post with 🚀 13 days ago
For those who haven't come across it yet, here's a handy trick to discuss an entire GitHub repo with an LLM:

=> Just replace "github" with "gitingest" in the URL, and you get the whole repo as a single string that you can then paste into your LLM
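
A tiny illustration of the swap (the helper name and example repo URL are placeholders, not part of the original post):

```python
# Minimal sketch of the URL trick described above.
def to_gitingest(github_url: str) -> str:
    """Swap 'github.com' for 'gitingest.com' so the repo renders as one text dump."""
    return github_url.replace("github.com", "gitingest.com", 1)

print(to_gitingest("https://github.com/huggingface/transformers"))
# -> https://gitingest.com/huggingface/transformers
# Open that URL in a browser, then copy the generated text into your LLM's context.
```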
reacted to merve's post with 🚀 20 days ago
reacted to ginipick's post with 🚀🔥 20 days ago
🌟 3D Llama Studio - AI 3D Generation Platform

📝 Project Overview
3D Llama Studio is an all-in-one AI platform that generates high-quality 3D models and stylized images from text or image inputs.

✨ Key Features

Text/Image to 3D Conversion 🎯

Generate 3D models from detailed text descriptions or reference images
Intuitive user interface

Text to Styled Image Generation 🎨

Customizable image generation settings
Adjustable resolution, generation steps, and guidance scale
Supports both English and Korean prompts

🛠️ Technical Features

Gradio-based web interface
Dark theme UI/UX
Real-time image generation and 3D modeling

💫 Highlights

User-friendly interface
Real-time preview
Random seed generation
High-resolution output support (up to 2048x2048)

🎯 Applications

Product design
Game asset creation
Architectural visualization
Educational 3D content

🔗 Try It Now!
Experience 3D Llama Studio:

ginigen/3D-LLAMA

#AI #3DGeneration #MachineLearning #ComputerVision #DeepLearning
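
For orientation only, here is a hypothetical Gradio layout with the kinds of controls the post describes (prompt box plus resolution, steps, and guidance sliders). It is not the actual ginigen/3D-LLAMA Space code, and the callback is a stub.

```python
import gradio as gr

def generate_stub(prompt, resolution, steps, guidance):
    # Placeholder: a real app would call the image / 3D generation backend here.
    return f"Would generate '{prompt}' at {resolution}px, {steps} steps, guidance {guidance}"

with gr.Blocks() as demo:
    prompt = gr.Textbox(label="Prompt (English or Korean)")
    resolution = gr.Slider(512, 2048, value=1024, step=64, label="Resolution")
    steps = gr.Slider(10, 100, value=30, step=1, label="Generation steps")
    guidance = gr.Slider(1.0, 20.0, value=7.5, step=0.5, label="Guidance scale")
    status = gr.Textbox(label="Status")
    gr.Button("Generate").click(generate_stub, [prompt, resolution, steps, guidance], status)

# demo.launch()
```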
reacted to fdaudens's post with 🔥 25 days ago
🎯 Kokoro TTS just hit v1.0! 🚀

Small but mighty: 82M parameters, runs locally, speaks multiple languages. The best part? It's Apache 2.0 licensed!
This could unlock so many possibilities ✨

Check it out: hexgrad/Kokoro-82M
posted an update about 1 month ago
everywhere i go i see his face
reacted to prithivMLmods's post with 😎🔥 about 1 month ago
Deepswipe by . . . Deepseek🐬🗿

Everything is now in recovery. 📉📈
reacted to onekq's post with 👍 about 1 month ago
So 🐋DeepSeek🐋 hits the mainstream media. But it has been a star in our little cult for at least 6 months. Its meteoric success was not overnight but two years in the making.

To learn their history, just look at their 🤗 repo https://huggingface.co/deepseek-ai

* End of 2023, they launched the first model (pretrained by themselves) following Llama 2 architecture
* June 2024, v2 (MoE architecture) surpassed Gemini 1.5, but behind Mistral
* September, v2.5 surpassed GPT 4o mini
* December, v3 surpassed GPT 4o
* Now R1 surpassed o1

Most importantly, if you think DeepSeek's success is singular and unrivaled, that's WRONG. The following models are also at or near the o1 bar.

* Minimax-01
* Kimi k1.5
* Doubao 1.5 pro
replied to mitkox's post about 1 month ago

I believe SGLang would be even faster, but I'm not sure if it supports non-NVIDIA devices.