Phan Hoang

phanhoang

upvoted a paper over 1 year ago

Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

Paper • 2412.04424 • Published Dec 5, 2024 • 62

upvoted 2 articles over 1 year ago

Article

ColFlor: Towards BERT-Size Vision-Language Document Retrieval Models

ahmed-masry • Oct 18, 2024 • 21

Article

Visually Multilingual: Introducing mcdse-2b

marco • Oct 27, 2024 • 41

upvoted a collection over 1 year ago

DocLayout-YOLO

Dataset and model for DocLayout-YOLO • 10 items • Updated Jan 14, 2025 • 21

upvoted a paper over 1 year ago

Loong: Generating Minute-level Long Videos with Autoregressive Language Models

Paper • 2410.02757 • Published Oct 3, 2024 • 36

upvoted 3 collections over 1 year ago

Emu3

Emu3: Next-Token Prediction is All You Need • 7 items • Updated Feb 4 • 81

Molmo

Artifacts for open multimodal language models. • 5 items • Updated Dec 23, 2025 • 309

Qwen2.5

Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 43 items • Updated Mar 2 • 719

upvoted 2 papers over 1 year ago

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published Sep 18, 2024 • 79

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Paper • 2409.01704 • Published Sep 3, 2024 • 83

upvoted an article over 1 year ago

Article

Making LLMs lighter with AutoGPTQ and transformers

marcsun13, fxmarty, PanEa, qwopqwop, ybelkada, TheBloke • Aug 23, 2023 • 64

upvoted 2 collections over 1 year ago

Awesome Document AI

A collection of open-source document AI 📄 📝 📈 • 27 items • Updated Mar 11, 2024 • 80

Qwen2-VL

Vision-language model series based on Qwen2 • 15 items • Updated Mar 2 • 231

upvoted an article over 1 year ago

Article

Fine-tune Llama 3 with ORPO

mlabonne • Apr 22, 2024 • 240

upvoted a paper over 1 year ago

Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models

Paper • 2408.02442 • Published Aug 5, 2024 • 21

upvoted 2 collections over 1 year ago

Function Calling Dataset

7 items • Updated Dec 5, 2023 • 10

Papers I want to read

Papers in my to-read list • 259 items • Updated Jan 10, 2025 • 32

upvoted 2 articles over 1 year ago

Article

Tool Use, Unified

Rocketknight1 • Aug 12, 2024 • 120

Article

Introducing TextImage Augmentation for Document Images

danaaubakirova, Molbap, Ternaus • Aug 6, 2024 • 33

upvoted an article almost 2 years ago

Article

A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes

ybelkada, timdettmers • Aug 17, 2022 • 131