Kuaishou releases the “KeLing” video model, which follows a technical route similar to Sora’s and can generate 1080p videos longer than 120 seconds
KeLing, Kuaishou’s latest domestically developed video generation model, follows a technical route similar to Sora’s and combines it with a number of self-developed innovations. It can generate videos more than 120 seconds long at resolutions up to 1080p, and can accurately model complex motion and physical properties.
Main features
1. High-quality video generation
- Duration and frame rate: KeLing can generate long videos of up to 2 minutes at 30 fps.
- Resolution: Generated videos reach up to 1080p, with clear, finely detailed image quality.
- Aspect ratio: Multiple aspect ratios are supported, including vertical video, to suit different usage scenarios and platforms.
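As a rough illustration of what these figures imply, the sketch below computes the number of frames in a clip and the pixel dimensions for a few aspect ratios. It assumes that "1080p" means the shorter side is 1080 pixels and that dimensions are rounded to even numbers. The function names are illustrative and are not part of any KeLing API.

```python
def frame_count(duration_s: float, fps: int) -> int:
    """Number of frames in a clip of the given duration and frame rate."""
    return round(duration_s * fps)


def dimensions(aspect_w: int, aspect_h: int, short_side: int = 1080) -> tuple[int, int]:
    """Return (width, height) with the shorter side fixed at `short_side`.

    Assumption: the longer side is derived from the aspect ratio and rounded
    to the nearest even number, since video codecs commonly expect even sizes.
    """
    long_side = short_side * max(aspect_w, aspect_h) / min(aspect_w, aspect_h)
    long_side = int(round(long_side / 2)) * 2
    if aspect_w >= aspect_h:
        return long_side, short_side  # landscape or square
    return short_side, long_side      # vertical video


if __name__ == "__main__":
    print("frames in 2 minutes at 30 fps:", frame_count(120, 30))
    for w, h in [(16, 9), (9, 16), (1, 1)]:
        print(f"{w}:{h} ->", dimensions(w, h))
```

A 2-minute clip at 30 fps comes to 3,600 frames, which is the length of sequence the model has to keep coherent.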
2. Physical world simulation
- Realistic physical properties: The KeLing large model can simulate real-world physical properties such as gravity, light and shadow reflection, and liquid flow.
- Detailed depiction: Details such as object movement, surface reflections, and shadow changes are rendered accurately, giving a realistic visual experience.
3. Complex motion characterization
- Precise motion modeling: The model can accurately represent complex, large-amplitude motion, such as animals running at high speed or astronauts walking on the moon.
- Continuity: Generated footage is coherent, movements are smooth, and subtle changes during motion are reproduced realistically.
4. Multiple control inputs
- Control inputs: Users can supply control information such as camera movement, frame rate, and edge, key-point, or depth information, giving rich control over the generated content.
- Prompt optimization: A dedicated language model expands and refines the prompts users enter, improving generation quality.
Technical realization
1. Model design
- Sora-like architecture: The model adopts a Sora-like DiT (Diffusion Transformer) structure, replacing the convolutional network of traditional diffusion models with a Transformer to improve generation capability and scalability.
- 3D VAE network: A self-developed 3D VAE compresses video jointly in space and time and improves reconstruction quality.
- Full attention mechanism: A 3D attention mechanism is used for spatiotemporal modeling, so that complex spatiotemporal motion can be modeled accurately while keeping computation efficient.
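To give a feel for why spatiotemporal compression matters when full 3D attention is used, the following back-of-the-envelope sketch counts the tokens a Transformer would attend over. The compression factors (4x in time, 8x in space) and the 2x2 spatial patch size are illustrative values only, not KeLing's actual settings, which the source does not state.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Compression:
    # Hypothetical settings chosen only for illustration.
    temporal: int = 4  # frames merged into one latent frame by the 3D VAE
    spatial: int = 8   # pixels merged into one latent cell along each axis
    patch: int = 2     # latent cells per Transformer patch along each axis


def token_count(frames: int, height: int, width: int, c: Compression) -> int:
    """Tokens seen by the Transformer after VAE compression and patching."""
    t = math.ceil(frames / c.temporal)
    h = math.ceil(height / c.spatial / c.patch)
    w = math.ceil(width / c.spatial / c.patch)
    return t * h * w


def attention_pairs(tokens: int) -> int:
    """Full attention compares every token with every other token: O(n^2)."""
    return tokens * tokens


if __name__ == "__main__":
    clip = dict(frames=150, height=1080, width=1920)  # 5 s at 30 fps, 1080p
    raw = token_count(**clip, c=Compression(temporal=1, spatial=1, patch=1))
    compressed = token_count(**clip, c=Compression())
    print("tokens without compression:", raw)
    print("tokens with compression:   ", compressed)
    print("attention cost ratio:      ",
          attention_pairs(raw) // attention_pairs(compressed))
```

Because full attention scales with the square of the token count, any reduction in tokens from the VAE shrinks attention cost far more than linearly.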
2. Data assurance
- Labeling system: A comprehensive labeling system is used to select and adjust the training data at a fine granularity, ensuring high-quality video data.
- Video description model: A video captioning model was developed to generate accurate, detailed, and structured video descriptions, improving how well the model follows text instructions.
3. Computational efficiency
- Distributed training cluster: Training runs on a distributed cluster, and hardware utilization is significantly improved through operator optimization, optimized recomputation strategies, and other means.
- Phased training strategy: Training proceeds in stages. A low-resolution stage first builds model capability from large amounts of data, and a high-resolution stage then improves detail.
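The phased strategy can be pictured as a simple resolution schedule. The sketch below is only an illustration: the stage names, resolutions, and step counts are hypothetical placeholders, since the source gives no concrete numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    height: int
    width: int
    steps: int  # number of training steps spent in this stage


# Hypothetical schedule: most steps at low resolution, fewer at high resolution.
SCHEDULE: tuple[Stage, ...] = (
    Stage("low_res", 256, 448, steps=80_000),
    Stage("high_res", 1080, 1920, steps=20_000),
)


def stage_for_step(step: int, schedule: tuple[Stage, ...] = SCHEDULE) -> Stage:
    """Return the stage that a given global training step falls into."""
    if step < 0:
        raise ValueError("step must be non-negative")
    boundary = 0
    for stage in schedule:
        boundary += stage.steps
        if step < boundary:
            return stage
    return schedule[-1]  # past the end of the schedule: stay in the final stage


if __name__ == "__main__":
    for step in (0, 79_999, 80_000, 120_000):
        s = stage_for_step(step)
        print(step, s.name, f"{s.width}x{s.height}")
```

A training loop would call `stage_for_step` to decide which resolution of data to sample at each step.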
Some examples
Large-amplitude, physically plausible motion
Video generation up to 2 minutes long
Simulating physical world properties
Strong concept combination ability
Movie-quality image generation
Free choice of output video aspect ratio
Expression and body driving
Built on self-developed 3D face and body reconstruction technology, combined with background stabilization and retargeting modules, this feature drives both facial expressions and full-body motion. From a single full-body photo, users can try the lively “singing and dancing” feature.
- Official website: https://kling.kuaishou.com/
- More test results: https://waytoagi.feishu.cn/wiki/GevKwyEt1i4SUVk0q2JcqQFtnRd