Groundbreaking Research Alert: Can Large Language Models Really Understand Personal Preferences?
A fascinating new study from researchers at the University of Notre Dame, Xi'an Jiaotong University, and Université de Montréal introduces PERRECBENCH, a novel benchmark for evaluating how well Large Language Models (LLMs) understand user preferences in recommendation systems.
Key Technical Insights:
- The benchmark eliminates user rating bias and item quality factors by using relative ratings and grouped ranking approaches
- Implements three distinct ranking methods: pointwise rating prediction, pairwise comparison, and listwise ranking (illustrated after this list)
- Evaluates 19 state-of-the-art LLMs including Claude-3.5, GPT-4, Llama-3, Mistral, and Qwen models
- Uses Kendall's tau correlation to measure ranking accuracy (see the metric sketch after this list)
- Incorporates a BM25 retriever with a configurable number of history items (k=4 by default; see the retrieval sketch after this list)
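To make the three ranking methods concrete, here are illustrative prompt templates only; these are my own placeholders, not the benchmark's actual prompts, and the function names and wording are invented for the sketch.

```python
# Illustrative prompt templates; PERRECBENCH's actual prompts may differ.

def pointwise_prompt(user_history: str, item: str) -> str:
    # Ask the model to predict a single rating for one user-item pair.
    return (f"Given this user's review history:\n{user_history}\n"
            f"Predict the rating (1-5) the user would give to:\n{item}")

def pairwise_prompt(history_a: str, history_b: str, item: str) -> str:
    # Ask which of two users would rate the same item higher.
    return (f"Item: {item}\n"
            f"User A's history:\n{history_a}\n"
            f"User B's history:\n{history_b}\n"
            "Which user would rate this item higher? Answer 'A' or 'B'.")

def listwise_prompt(histories: list[str], item: str) -> str:
    # Ask the model to rank every user in the group for one shared item.
    users = "\n\n".join(f"User {i+1}:\n{h}" for i, h in enumerate(histories))
    return (f"Item: {item}\n{users}\n"
            "Rank the users from most to least likely to enjoy this item.")
```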
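For readers unfamiliar with the metric, here is a minimal sketch of how Kendall's tau compares a predicted ranking against a ground-truth ranking; the toy data is illustrative, not taken from the benchmark.

```python
from scipy.stats import kendalltau

# Ground-truth ordering of users by how much they actually liked an item,
# and a model's predicted ordering of the same users (toy data).
true_ranks = [1, 2, 3, 4, 5]
predicted_ranks = [2, 1, 3, 5, 4]

# Kendall's tau counts concordant vs. discordant pairs:
# +1 means identical orderings, 0 means no correlation, -1 means reversed.
tau, p_value = kendalltau(true_ranks, predicted_ranks)
print(f"Kendall's tau = {tau:.3f} (p = {p_value:.3f})")
```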
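And a rough sketch of retrieving a user's k most relevant history items with BM25 before building the prompt; this assumes the rank_bm25 package and uses invented review snippets, so treat the data and variable names as placeholders.

```python
from rank_bm25 import BM25Okapi

# Hypothetical user history: past reviews (placeholder data).
history = [
    "five stars, loved this sci-fi novel about first contact",
    "the hiking boots fell apart after two weeks, disappointing",
    "great espresso machine, easy to clean",
    "another space opera, solid world-building but slow pacing",
    "decent running shoes, a bit narrow",
]

# The target item whose rating the LLM should predict.
query = "hard sci-fi novel with alien linguistics"

# Naive whitespace tokenization, then build the BM25 index over the history.
tokenized_history = [doc.split() for doc in history]
bm25 = BM25Okapi(tokenized_history)

# Retrieve the k most relevant history items to include in the prompt.
k = 4
top_history = bm25.get_top_n(query.split(), history, n=k)
print(top_history)
```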
Notable Findings:
- Current LLMs struggle with true personalization, achieving only moderate correlation scores
- Larger models don't always personalize better, challenging the expectation that performance simply scales with model size
- Pairwise and listwise ranking methods outperform pointwise approaches
- Open-source models like Mistral-123B and Llama-3-405B compete well with proprietary models
- A weight merging strategy shows promise for improving personalization capabilities (see the sketch after this list)
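On the last point, a common way to realize weight merging is linear interpolation of two checkpoints' parameters. The snippet below is a generic sketch of that idea under the assumption of two PyTorch models with identical architectures; the paper's exact merging recipe may differ.

```python
import torch

def merge_weights(model_a, model_b, alpha=0.5):
    """Linearly interpolate the parameters of two identically-shaped models.

    alpha = 1.0 keeps model_a, alpha = 0.0 keeps model_b.
    """
    merged = {}
    state_a, state_b = model_a.state_dict(), model_b.state_dict()
    for name, param_a in state_a.items():
        param_b = state_b[name]
        if param_a.is_floating_point():
            merged[name] = alpha * param_a + (1.0 - alpha) * param_b
        else:
            # Non-float buffers (e.g. integer counters) are copied as-is.
            merged[name] = param_a.clone()
    return merged

# Usage sketch: load the merged weights back into one of the models.
# model_a.load_state_dict(merge_weights(model_a, model_b, alpha=0.5))
```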
The research reveals that while LLMs excel at many tasks, they still face significant challenges in understanding individual user preferences. This work opens new avenues for improving personalized recommendation systems and highlights the importance of developing better evaluation methods.
A must-read for anyone interested in LLMs, recommender systems, or personalization technology. The team has made their benchmark and code publicly available for further research.