Viktor Cerny
Nazzaroth2
AI & ML interests
Machine Translation (Focus English-Japanese)
Recent Activity
- Updated a collection 7 days ago: VLM RL Reasoning
- Updated a collection 13 days ago: RL_Papers in general
- Updated a collection 15 days ago: imageGen
Organizations
None yet
data synthesis
OCR
- Gemma 3 Technical Report
  Paper • 2503.19786 • Published • 52
- Kimi-VL Technical Report
  Paper • 2504.07491 • Published • 131
- InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
  Paper • 2504.10479 • Published • 276
- FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
  Paper • 2504.09925 • Published • 38
VLM RL Reasoning
- OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement
  Paper • 2503.17352 • Published • 23
- When Less is Enough: Adaptive Token Reduction for Efficient Image Representation
  Paper • 2503.16660 • Published • 73
- CoMP: Continual Multimodal Pre-training for Vision Foundation Models
  Paper • 2503.18931 • Published • 30
- MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding
  Paper • 2503.13964 • Published • 19
llm_compression
Loras
t2i consistency works
small_or_multimodal_llm
long_context
models to test out
RL_Papers in general
- Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning
  Paper • 2504.08672 • Published • 55
- A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis
  Paper • 2504.12322 • Published • 28
- Learning to Reason under Off-Policy Guidance
  Paper • 2504.14945 • Published • 85
- TTRL: Test-Time Reinforcement Learning
  Paper • 2504.16084 • Published • 117
imageGen
- Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models
  Paper • 2503.18446 • Published • 12
- Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models
  Paper • 2503.20240 • Published • 22
- BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation
  Paper • 2503.20672 • Published • 14
- Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models
  Paper • 2503.20198 • Published • 4
LLM-External_information
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
  Paper • 2310.11511 • Published • 78
- Improving Text Embeddings with Large Language Models
  Paper • 2401.00368 • Published • 81
- LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
  Paper • 2404.05961 • Published • 66
LLM_Reasoning-ErrorCorrection
3D (nerfs, gaussians, generation etc.)
videogames_roleplay
manga_translation
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23
- PALO: A Polyglot Large Multimodal Model for 5B People
  Paper • 2402.14818 • Published • 25
- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 128
- InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
  Paper • 2404.06512 • Published • 31
model training
Reward Modeling