This dataset was collected in roughly 4 hours using the Rapidata Python API, showcasing how quickly large-scale annotations can be performed with the right tooling!
All that at less than the cost of a single hour of a typical ML engineer in Zurich!
The new dataset contains ~22,000 human annotations evaluating AI-generated videos along several dimensions, such as Prompt-Video Alignment, Word for Word Prompt Alignment, Style, Speed of Time flow and Quality of Physics.
Runway Gen-3 Alpha: The Style and Coherence Champion
Runway's latest video generation model, Gen-3 Alpha, is something special. It ranks #3 overall on our text-to-video human preference benchmark, but in terms of style and coherence, it outperforms even OpenAI Sora.
However, it struggles with alignment, making it less predictable for controlled outputs.
We've released a new dataset with human evaluations of Runway Gen-3 Alpha: Rapidata's text-2-video human preferences dataset. If you're working on video generation and want to see how your model compares to the biggest players, we can benchmark it for you.
We benchmarked @xai-org's Aurora model. As far as we know, this is the first public evaluation of the model at scale.
We collected 401k human annotations over the past ~2 days for this. We have uploaded all of the annotation data to Hugging Face under a fully permissive license: Rapidata/xAI_Aurora_t2i_human_preferences