Video generation models as world simulators
This report describes a method for unifying the representation of visual data to enable large-scale training of generative models, and it evaluates the capabilities and limitations of a model named Sora. Unlike prior work, which often focuses on specific types of visual data, Sora is a generalist model that can generate videos and images of varying durations, aspect ratios, and resolutions.