Two months ago, we benchmarked @google's Veo2 model. It fell short, struggling with style consistency and temporal coherence, trailing behind Runway, Pika, @tencent, and even @alibaba-pai.
That's changed.
We just wrapped up benchmarking Veo3, and the improvements are substantial. It outperformed every other model by a wide margin across all key metrics. It's not just better, it's dominant across style, coherence, and prompt adherence. It's rare to see such a clear lead in today's hyper-competitive T2V landscape.
Dataset coming soon. Stay tuned.
reacted to jasoncorkill's post with 🔥 about 2 months ago
We just added Hidream I1 to our T2I leaderboard (https://www.rapidata.ai/leaderboard/image-models), benchmarked using 195k+ human responses from 38k+ annotators, all collected in under 24 hours.
We just published a dataset using a new (for us) preference modality: direct ranking based on aesthetic preference. We ranked a couple of thousand images from most to least preferred, all sampled from the Open Image Preferences v1 dataset by the amazing @data-is-better-together team.