蓋瑞王
gary109
AI & ML interests
GAN, Music, LLM
Recent Activity
Liked a dataset 10 days ago: HuggingFaceFW/fineweb-2
Liked a dataset 17 days ago: MonicaHuang/ML2025_HW9
Liked a model about 2 months ago: deepseek-ai/DeepSeek-R1-0528
Organizations
None yet
Generation 3D
-
MVDream: Multi-view Diffusion for 3D Generation
Paper • 2308.16512 • Published • 102 -
Learning Disentangled Avatars with Hybrid 3D Representations
Paper • 2309.06441 • Published • 6 -
Dynamic Mesh-Aware Radiance Fields
Paper • 2309.04581 • Published • 7 -
Towards Practical Capture of High-Fidelity Relightable Avatars
Paper • 2309.04247 • Published • 10
LLM
Multimodal LLM
-
MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
Paper • 2309.04662 • Published • 24 -
Neurons in Large Language Models: Dead, N-gram, Positional
Paper • 2309.04827 • Published • 17 -
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
Paper • 2309.05516 • Published • 10 -
DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs
Paper • 2309.03907 • Published • 12
Text-to-Image
-
PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models
Paper • 2309.05793 • Published • 50 -
InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation
Paper • 2309.06380 • Published • 32 -
ImageBind-LLM: Multi-modality Instruction Tuning
Paper • 2309.03905 • Published • 17 -
DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models
Paper • 2309.06933 • Published • 13
Transformers
-
Uncovering mesa-optimization algorithms in Transformers
Paper • 2309.05858 • Published • 13 -
ProPainter: Improving Propagation and Transformer for Video Inpainting
Paper • 2309.03897 • Published • 27 -
Approximating Two-Layer Feedforward Networks for Efficient Transformers
Paper • 2310.10837 • Published • 11 -
CLEX: Continuous Length Extrapolation for Large Language Models
Paper • 2310.16450 • Published • 10
Vision Transformers
-
Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts
Paper • 2309.04354 • Published • 15 -
Vision Transformers Need Registers
Paper • 2309.16588 • Published • 80 -
AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models
Paper • 2309.16414 • Published • 19 -
MotionLM: Multi-Agent Motion Forecasting as Language Modeling
Paper • 2309.16534 • Published • 15
text-to-3D
-
Text2Control3D: Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model
Paper • 2309.03550 • Published • 12 -
Text-Guided Generation and Editing of Compositional 3D Avatars
Paper • 2309.07125 • Published • 7 -
Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts
Paper • 2310.11784 • Published • 11 -
GaussianDreamer: Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors
Paper • 2310.08529 • Published • 18
ML
-
Statistical Rejection Sampling Improves Preference Optimization
Paper • 2309.06657 • Published • 14 -
Efficient Monotonic Multihead Attention
Paper • 2312.04515 • Published • 8 -
Layerwise Recurrent Router for Mixture-of-Experts
Paper • 2408.06793 • Published • 33 -
Scaling Up Diffusion and Flow-based XGBoost Models
Paper • 2408.16046 • Published • 10
Video Optimization
Others
-
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion
Paper • 2310.11248 • Published • 4 -
Textbooks Are All You Need II: phi-1.5 technical report
Paper • 2309.05463 • Published • 87 -
When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale
Paper • 2309.04564 • Published • 16 -
What's In My Big Data?
Paper • 2310.20707 • Published • 11
Auto
Application
-
NExT-GPT: Any-to-Any Multimodal LLM
Paper • 2309.05519 • Published • 78 -
Large Language Model for Science: A Study on P vs. NP
Paper • 2309.05689 • Published • 21 -
AstroLLaMA: Towards Specialized Foundation Models in Astronomy
Paper • 2309.06126 • Published • 18 -
Large Language Models for Compiler Optimization
Paper • 2309.07062 • Published • 23
Cost
Video Generation
-
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
Paper • 2310.19512 • Published • 16 -
MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer
Paper • 2311.12052 • Published • 32 -
Fast View Synthesis of Casual Videos
Paper • 2312.02135 • Published • 11 -
DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
Paper • 2312.04433 • Published • 10
ASR
-
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
Paper • 2311.00430 • Published • 58 -
MSTRE-Net: Multistreaming Acoustic Modeling for Automatic Lyrics Transcription
Paper • 2108.02625 • Published • 1 -
FLAP: Fast Language-Audio Pre-training
Paper • 2311.01615 • Published • 18 -
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
Paper • 2402.01831 • Published • 15
Whisper
Funny
SVC
yolo
Introduction to Generative AI 2024 (生成式AI導論 2024)
https://www.youtube.com/@HungyiLeeNTU
-
Re3: Generating Longer Stories With Recursive Reprompting and Revision
Paper • 2210.06774 • Published • 2 -
Constitutional AI: Harmlessness from AI Feedback
Paper • 2212.08073 • Published • 2 -
AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls
Paper • 2402.04253 • Published -
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
Paper • 2305.19118 • Published
RAG
Music Captions
-
Futga: Towards Fine-grained Music Understanding through Temporally-enhanced Generative Augmentation
Paper • 2407.20445 • Published • 23 -
LP-MusicCaps: LLM-Based Pseudo Music Captioning
Paper • 2307.16372 • Published • 38 -
The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation
Paper • 2311.10057 • Published • 1 -
MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models
Paper • 2408.01337 • Published • 12
Audio
-
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation
Paper • 2408.07547 • Published • 8 -
DeepSpeak Dataset v1.0
Paper • 2408.05366 • Published • 14 -
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Paper • 2408.15998 • Published • 88 -
Zero-shot Cross-lingual Voice Transfer for TTS
Paper • 2409.13910 • Published • 10
video segmentation
-
Tracking Anything with Decoupled Video Segmentation
Paper • 2309.03903 • Published • 28 -
ProPainter: Improving Propagation and Transformer for Video Inpainting
Paper • 2309.03897 • Published • 27 -
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces
Paper • 2312.15715 • Published • 21 -
SAM 2: Segment Anything in Images and Videos
Paper • 2408.00714 • Published • 116
Text-to-Audio
-
Large-Scale Automatic Audiobook Creation
Paper • 2309.03926 • Published • 54 -
FoleyGen: Visually-Guided Audio Generation
Paper • 2309.10537 • Published • 9 -
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models
Paper • 2310.11954 • Published • 25 -
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Paper • 2310.00704 • Published • 21
Prompting
-
From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
Paper • 2309.04269 • Published • 33 -
FIAT: Fusing learning paradigms with Instruction-Accelerated Tuning
Paper • 2309.04663 • Published • 6 -
Effective Long-Context Scaling of Foundation Models
Paper • 2309.16039 • Published • 30 -
Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts
Paper • 2310.11784 • Published • 11
Representations
-
Natural Language Supervision for General-Purpose Audio Representations
Paper • 2309.05767 • Published • 9 -
AudioSR: Versatile Audio Super-resolution at Scale
Paper • 2309.07314 • Published • 28 -
FoleyGen: Visually-Guided Audio Generation
Paper • 2309.10537 • Published • 9 -
Toward Joint Language Modeling for Speech Units and Text
Paper • 2310.08715 • Published • 10
Robot
-
LEAP Hand: Low-Cost, Efficient, and Anthropomorphic Hand for Robot Learning
Paper • 2309.06440 • Published • 11 -
Robotic Table Tennis: A Case Study into a High Speed Learning System
Paper • 2309.03315 • Published • 7 -
Video Language Planning
Paper • 2310.10625 • Published • 11 -
RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation
Paper • 2311.01455 • Published • 30
Diffusion Model
-
InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
Paper • 2309.03895 • Published • 14 -
ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning
Paper • 2309.16650 • Published • 10 -
CCEdit: Creative and Controllable Video Editing via Diffusion Models
Paper • 2309.16496 • Published • 9 -
FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling
Paper • 2310.15169 • Published • 10
Text-to-Video
-
Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation
Paper • 2309.03549 • Published • 6 -
CCEdit: Creative and Controllable Video Editing via Diffusion Models
Paper • 2309.16496 • Published • 9 -
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
Paper • 2310.11440 • Published • 17 -
LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation
Paper • 2310.10769 • Published • 9
RLHF
-
Stabilizing RLHF through Advantage Model and Selective Rehearsal
Paper • 2309.10202 • Published • 11 -
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
Paper • 2309.10150 • Published • 25 -
Robotic Offline RL from Internet Videos via Value-Function Pre-Training
Paper • 2309.13041 • Published • 8 -
Voyager: An Open-Ended Embodied Agent with Large Language Models
Paper • 2305.16291 • Published • 10
Image Completion
-
RealFill: Reference-Driven Generation for Authentic Image Completion
Paper • 2309.16668 • Published • 14 -
DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design
Paper • 2310.15144 • Published • 14 -
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Paper • 2201.12086 • Published • 3 -
TiC-CLIP: Continual Training of CLIP Models
Paper • 2310.16226 • Published • 9
multimodal
-
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Paper • 2310.09478 • Published • 21 -
Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V
Paper • 2310.19061 • Published • 8 -
OneLLM: One Framework to Align All Modalities with Language
Paper • 2312.03700 • Published • 24
Vision-Language
-
SILC: Improving Vision Language Pretraining with Self-Distillation
Paper • 2310.13355 • Published • 9 -
Woodpecker: Hallucination Correction for Multimodal Large Language Models
Paper • 2310.16045 • Published • 17 -
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Paper • 2201.12086 • Published • 3 -
ImageNetVC: Zero-Shot Visual Commonsense Evaluation on 1000 ImageNet Categories
Paper • 2305.15028 • Published • 1
Optimization
-
Large Language Models for Compiler Optimization
Paper • 2309.07062 • Published • 23 -
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
Paper • 2310.17157 • Published • 14 -
FP8-LM: Training FP8 Large Language Models
Paper • 2310.18313 • Published • 33 -
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
Paper • 2310.19102 • Published • 11
Semantic Segmentation
-
Linking Points With Labels in 3D: A Review of Point Cloud Semantic Segmentation
Paper • 1908.08854 • Published • 1 -
Segment Anything
Paper • 2304.02643 • Published • 4 -
Segment and Caption Anything
Paper • 2312.00869 • Published • 21 -
Open-Vocabulary Audio-Visual Semantic Segmentation
Paper • 2407.21721 • Published • 9
Code Generation
-
Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation
Paper • 2310.18628 • Published • 8 -
ChatCoder: Chat-based Refine Requirement Improves LLMs' Code Generation
Paper • 2311.00272 • Published • 11 -
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
Paper • 2312.04474 • Published • 33 -
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
Paper • 2402.14658 • Published • 84
Generative
-
Idempotent Generative Network
Paper • 2311.01462 • Published • 26 -
Adaptive Shells for Efficient Neural Radiance Field Rendering
Paper • 2311.10091 • Published • 20 -
Generative Powers of Ten
Paper • 2312.02149 • Published • 8 -
DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
Paper • 2312.04433 • Published • 10
AGI
music
-
A Novel 1D State Space for Efficient Music Rhythmic Analysis
Paper • 2111.00704 • Published -
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Paper • 2312.09911 • Published • 55 -
Music Style Transfer with Time-Varying Inversion of Diffusion Models
Paper • 2402.13763 • Published • 11 -
ChatMusician: Understanding and Generating Music Intrinsically with LLM
Paper • 2402.16153 • Published • 61
Datasets
-
Aria Everyday Activities Dataset
Paper • 2402.13349 • Published • 32 -
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
Paper • 2402.13616 • Published • 48 -
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Paper • 2402.19479 • Published • 35 -
Evaluating D-MERIT of Partial-annotation on Information Retrieval
Paper • 2406.16048 • Published • 36
Watermarking
Text-to-Embedding
image-to-3D
OCR
DeepSeek