Guanzhou Ke

guanzhouk

Guanzhou-Ke

AI & ML interests

Multi-modal learning

Organizations

None yet

guanzhouk's activity

upvoted 2 papers about 3 hours ago

LLaVA-OneVision: Easy Visual Task Transfer

Paper • 2408.03326 • Published Aug 6 • 59

To Code, or Not To Code? Exploring Impact of Code in Pre-training

Paper • 2408.10914 • Published about 1 month ago • 40

upvoted a paper 20 days ago

Law of Vision Representation in MLLMs

Paper • 2408.16357 • Published 22 days ago • 92

upvoted 3 papers 22 days ago

SWE-bench-java: A GitHub Issue Resolving Benchmark for Java

Paper • 2408.14354 • Published 24 days ago • 40

The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Paper • 2408.15237 • Published 23 days ago • 36

Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published 24 days ago • 119

upvoted a paper 24 days ago

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published 28 days ago • 109

upvoted 3 papers 26 days ago

SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

Paper • 2407.15841 • Published Jul 22 • 38

Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study

Paper • 2406.07057 • Published Jun 11 • 15

MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine

Paper • 2408.02900 • Published Aug 6 • 25

upvoted a paper 28 days ago

Controllable Text Generation for Large Language Models: A Survey

Paper • 2408.12599 • Published 28 days ago • 61

upvoted 5 papers about 1 month ago

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Paper • 2408.08872 • Published Aug 16 • 96

OpenResearcher: Unleashing AI for Accelerated Scientific Research

Paper • 2408.06941 • Published Aug 13 • 28

Imagen 3

Paper • 2408.07009 • Published Aug 13 • 60

Transformer Explainer: Interactive Learning of Text-Generative Models

Paper • 2408.04619 • Published Aug 8 • 152

Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches

Paper • 2408.04567 • Published Aug 8 • 23

upvoted 4 papers about 2 months ago

Medical SAM 2: Segment medical images as video via Segment Anything Model 2

Paper • 2408.00874 • Published Aug 1 • 40

The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31 • 102

SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain

Paper • 2407.19584 • Published Jul 28 • 60

SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages

Paper • 2407.19672 • Published Jul 29 • 53

upvoted a paper 2 months ago

Vision language models are blind

Paper • 2407.06581 • Published Jul 9 • 80

upvoted a paper 3 months ago

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Paper • 2406.16860 • Published Jun 24 • 55

upvoted a paper 4 months ago

What matters when building vision-language models?

Paper • 2405.02246 • Published May 3 • 98

upvoted 2 papers 7 months ago

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27 • 590

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Paper • 2402.17177 • Published Feb 27 • 88