Marcus Gawronsky
marcusinthesky
AI & ML interests
Representation Learning
Recent Activity
upvoted a paper about 1 month ago: LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning
Organizations
VLM Benchmarks
-
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
Paper • 2410.10139 • Published • 53 -
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks
Paper • 2410.10563 • Published • 39 -
LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content
Paper • 2410.10783 • Published • 28 -
TVBench: Redesigning Video-Language Evaluation
Paper • 2410.07752 • Published • 6
Multi-modal Mamba
-
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
Paper • 2403.14520 • Published • 36 -
ZigMa: Zigzag Mamba Diffusion Model
Paper • 2403.13802 • Published • 18 -
SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series
Paper • 2403.15360 • Published • 13 -
MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection
Paper • 2403.19888 • Published • 12
Tiny VLM Decoder
Foundational
DS
Open-vocabulary object detection (OVD)
-
Simple Open-Vocabulary Object Detection with Vision Transformers
Paper • 2205.06230 • Published • 2 -
google/owlvit-base-patch32
Zero-Shot Object Detection • 0.2B • Updated • 147k • 134 -
Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers
Paper • 2305.07011 • Published • 5 -
Multi-Modal Classifiers for Open-Vocabulary Object Detection
Paper • 2306.05493 • Published • 6
Multimodal Embeddings
-
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
Paper • 2403.19651 • Published • 23 -
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
Paper • 2404.04125 • Published • 30 -
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies
Paper • 2404.08197 • Published • 30 -
Gecko: Versatile Text Embeddings Distilled from Large Language Models
Paper • 2403.20327 • Published • 49
PeFT
Decoder Upcycled to Embeddings
ZecRec
-
Item-Language Model for Conversational Recommendation
Paper • 2406.02844 • Published • 12 -
Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation
Paper • 2412.18176 • Published • 16 -
hkuds/RecGPT_model
Updated • 2 -
hkuds/easyrec-roberta-base
Updated • 36 • 3