
gilangf3000

AI & ML interests

None yet

Recent Activity

liked a dataset about 20 hours ago
stallone/CommitPackFT
reacted to singhsidhukuldeep's post with 🤗 7 days ago
Remember Gemini and GPT-4o, both true multimodal models 🌟? Now we have a paper 📄 describing an architecture that might achieve that!

Uni-MoE: a native multimodal, Unified Mixture of Experts (MoE) architecture 🏗️. Uni-MoE integrates multiple modalities (text 📝, image 🖼️, audio 🎵, video 📹, speech 🗣️) using modality-specific encoders and connectors for cohesive multimodal understanding.

Training Strategy:
1️⃣ Train cross-modality alignment with diverse connectors 🔄.
2️⃣ Train modality-specific experts using cross-modality instruction data 📊.
3️⃣ Tune the full Uni-MoE framework with Low-Rank Adaptation (LoRA) on mixed multimodal data 🔧.

Technical Details:
- Modality-specific encoders: CLIP for images 🖼️, Whisper for speech 🗣️, BEATs for audio 🎵.
- MoE-based blocks: shared self-attention layers, feed-forward network (FFN) experts, and sparse routers for token-level expert allocation 🚀 (see the sketch after this activity list).
- Efficient training: LoRA fine-tunes the pre-trained experts and the self-attention layers 🛠️.

Uni-MoE outperforms traditional dense models on benchmarks such as A-OKVQA, OK-VQA, VQAv2, MMBench, RACE-Audio, and the English High School Listening Test 🏆.

The code is open-sourced as well: https://github.com/HITsz-TMG/UMOE-Scaling-Unified-Multimodal-LLMs/tree/master/Uni_MoE_v2
Paper: https://huggingface.co/papers/2405.11273
liked a model 7 days ago
fnlp/AnyGPT-base
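To make the MoE-based block described in the post above concrete, here is a minimal sketch of a token-level sparse MoE layer: a shared self-attention layer, FFN-based experts, and a softmax router that dispatches each token to its top-scoring expert. Everything below (the class name SparseMoEBlock, the dimensions, and the top-1 routing rule) is an illustrative assumption, not the actual Uni-MoE implementation from the linked repository.

```python
# Illustrative sketch only -- NOT the Uni-MoE reference code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoEBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, n_experts=4, d_ff=2048, top_k=1):
        super().__init__()
        # Shared self-attention layer: tokens of all modalities pass through it.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        # FFN-based experts: each expert is an independent feed-forward net.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        # Sparse router: a linear layer scoring each token against each expert.
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)
        x = x + attn_out
        h = self.norm2(x)
        # Route each token to its top-k experts by softmax score.
        scores = F.softmax(self.router(h), dim=-1)       # (B, T, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # (B, T, top_k)
        out = torch.zeros_like(h)
        for e, expert in enumerate(self.experts):
            # Tokens whose top-k choices include expert e.
            mask = (idx == e).any(dim=-1)                # (B, T)
            if mask.any():
                # Router weight each selected token assigned to expert e.
                w = (weights * (idx == e)).sum(dim=-1)[mask].unsqueeze(-1)
                out[mask] = out[mask] + w * expert(h[mask])
        return x + out
```

Usage sketch: `block = SparseMoEBlock()` then `y = block(torch.randn(2, 16, 512))`. Per the post's training recipe, LoRA adapters would then be attached to the expert and self-attention weights so that stage 3️⃣ trains only low-rank updates.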

Organizations

Wyla