Decode Sora – OpenAI's viral video creation AI Malu Design

Sora doesn’t create videos by stitching together multiple still images; rather, it renders pixels in real time based on an understanding of physical movement.

OpenAI’s Sora is considered by experts to be the generative AI tool that produces the highest-quality footage today. “Sora marks a leap forward in the field of text-to-video generation,” ABC News wrote.

Time noted that before Sora appeared, the world already had video-making AI models such as Runway and Pika, but their weaknesses were poor video quality and short duration. Sora, by contrast, can create 60-second videos with complex scenes while keeping them smooth and coherent, although some errors remain.

OpenAI’s breakthrough

OpenAI has not released its text-to-video model to the public. In its description, the company also revealed very little about the underlying technology and the data used for training.

“Sora uses a diffusion model, creating video by starting with a noisy, low-resolution video then removing the noise through multiple steps until the output is satisfactory,” the company behind ChatGPT says of how Sora works. As a result, the AI can generate an entire video at once instead of producing short segments and joining them, as other tools do. The algorithm lets the model predict many frames at once, so the subject stays consistent while other details are generated.
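OpenAI has not published Sora’s code, so what follows is only a toy sketch of the general idea in the quote: start from random noise covering the whole clip and strip it away over many steps, updating every frame together rather than one segment at a time. It assumes a deterministic DDIM-style update, a simple linear noise schedule, a tiny hypothetical clip, and a hypothetical `predict_noise` oracle that already knows the target clip. In a real system that role is played by a trained network conditioned on the text prompt, and the sketch ignores the low-resolution starting point mentioned in the quote.

```python
import math
import random

# Hypothetical toy sizes: a "video" of FRAMES frames, each a row of PIXELS values.
FRAMES, PIXELS, STEPS = 4, 8, 50

def make_target_clip():
    """Hypothetical clean clip: a bright pixel that moves one position per frame."""
    return [[1.0 if p == f else 0.0 for p in range(PIXELS)] for f in range(FRAMES)]

def alpha_bar(t):
    """Linear schedule on the signal fraction: 1.0 at t=0 (clean), 1e-4 at t=STEPS (almost pure noise)."""
    return 1.0 - 0.9999 * t / STEPS

def predict_noise(x_t, t, target):
    """Stand-in for a trained denoiser (an assumption, not Sora's network).
    This oracle cheats by computing the exact noise from the known target clip."""
    a = alpha_bar(t)
    return [[(x - math.sqrt(a) * x0) / math.sqrt(1.0 - a) for x, x0 in zip(row, trow)]
            for row, trow in zip(x_t, target)]

def generate(target, seed=0):
    rng = random.Random(seed)
    # Start from Gaussian noise covering the WHOLE clip, not a single frame.
    x = [[rng.gauss(0.0, 1.0) for _ in range(PIXELS)] for _ in range(FRAMES)]
    for t in range(STEPS, 0, -1):
        a_t, a_prev = alpha_bar(t), alpha_bar(t - 1)
        eps = predict_noise(x, t, target)
        # Deterministic DDIM step: estimate the clean clip, then re-noise it
        # to the lower noise level t-1. All frames are updated in the same step.
        x = [[math.sqrt(a_prev) * ((xv - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t))
              + math.sqrt(1.0 - a_prev) * e
              for xv, e in zip(row, erow)]
             for row, erow in zip(x, eps)]
    return x

if __name__ == "__main__":
    target = make_target_clip()
    clip = generate(target)
    err = max(abs(a - b) for r, tr in zip(clip, target) for a, b in zip(r, tr))
    print("max deviation from target clip:", err)
```

Because the oracle is perfect, the loop recovers the target almost exactly; a learned denoiser only approximates the noise, which is one reason generated videos can still contain errors such as the ones described later in this article.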

Illustration of how Sora creates videos by removing noise step by step. Source: *Medium*

According to OpenAI, Sora builds on previous research on the Dall-E image-generation AI and ChatGPT text generation. However, Dr. Jim Fan, senior AI researcher at Nvidia, commented: “If you still think Sora is just a creative toy like Dall-E, think again. It is a data-driven physics engine that can simulate both real and virtual worlds.”

Sora is an end-to-end diffusion transformer model, he noted. Its key strength is the ability to understand text deeply before converting it into 3D visual forms. From there, the model makes predictions based on the rules of physical motion to render each pixel of the video as accurately as possible.
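The article does not explain how a diffusion transformer takes in a video. OpenAI’s technical report on Sora says videos are first compressed into a lower-dimensional latent representation and then cut into spacetime patches that serve as the transformer’s tokens. The minimal sketch below shows only the patch-cutting step, applied directly to a raw grayscale pixel array instead of a learned latent; the function name `to_spacetime_patches` and the patch sizes are hypothetical.

```python
from typing import List

Video = List[List[List[float]]]  # indexed as video[frame][row][col]

def to_spacetime_patches(video: Video, pt: int, ph: int, pw: int) -> List[List[float]]:
    """Cut a (T, H, W) clip into non-overlapping pt x ph x pw blocks and flatten
    each block into one token vector. Clip dimensions must divide evenly."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    if T % pt or H % ph or W % pw:
        raise ValueError("clip dimensions must be multiples of the patch size")
    tokens = []
    for t0 in range(0, T, pt):          # step through time
        for y0 in range(0, H, ph):      # step down the rows
            for x0 in range(0, W, pw):  # step across the columns
                # One token holds pixels from several consecutive frames,
                # so it carries both spatial and temporal information.
                tokens.append([video[t][y][x]
                               for t in range(t0, t0 + pt)
                               for y in range(y0, y0 + ph)
                               for x in range(x0, x0 + pw)])
    return tokens

if __name__ == "__main__":
    # Hypothetical 4-frame, 4x4 clip whose values encode (frame, row, col).
    clip = [[[t * 100 + y * 10 + x for x in range(4)] for y in range(4)] for t in range(4)]
    tokens = to_spacetime_patches(clip, pt=2, ph=2, pw=2)
    print(len(tokens), "tokens of length", len(tokens[0]))  # 8 tokens of length 8
```

Since each token spans several frames, a transformer attending over these tokens sees short stretches of motion rather than isolated stills, which fits Fan’s point that Sora does not simply assemble a sequence of separate images.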

“Sora’s simulator is not only based on learned data; it can also train itself, finding the most correct results to continue composing,” Fan analyzed. What makes Sora different, he says, is that it doesn’t create video by assembling a sequence of discrete images but renders sets of pixels in real time.

Sora outputs five videos simultaneously from a prompt describing a scene from five perspectives. Bill Peebles, one of the researchers behind Sora, said he did not intervene; the AI assembled the complete video on its own.

This reminds experts of the AI model for solving Olympiad math problems, developed by three PhDs of Vietnamese origin and published in the journal Nature last month. In its technical description of Sora, OpenAI also affirmed that this video-generation model will serve as a foundation for AI that can understand and simulate the real world.

“We believe this will be an important milestone to achieve AGI,” OpenAI stated.

Sora’s weakness

According to Medium, converting text into video is a challenging task because it requires AI to understand the meaning and context of the text as well as other aspects of images, videos, and physical movement. One reason OpenAI limits Sora to a small group of testers is that the model still has some drawbacks.

“Sora may have difficulty accurately simulating the physics of a complex scene. It may not understand cause-and-effect statements correctly,” OpenAI admitted.

For example, according to the company, Sora can create a video of a person biting a cookie, but afterward the cookie may still be intact, with no bite mark. It can also confuse left and right or front and back, for example showing a man running backwards on a treadmill.

According to analysts, however, the biggest concern about Sora stems from the very breakthrough OpenAI has achieved. The videos it creates are so lifelike that many people fear the model could be abused to spread false content, violate privacy, promote racism, and even influence election outcomes. Although OpenAI prohibits using Sora to create harmful content, the company has not yet found a way to tell AI-generated images from real ones so that they can be labeled and classified.

Fred Havemeyer, head of Macquarie’s AI research department, said that Sora’s remarkable abilities will raise many concerns about ethics and social impact. In his view, the negative impact of AI will be the most debated topic of 2024, and Sora is the opening shot.

According to the New York Times, OpenAI is still carefully withholding information about where the content used to train Sora comes from and how much of it is copyrighted. “Maybe they want to keep it secret to maintain a competitive advantage, but maybe they are afraid of being sued over copyright, similar to the trouble ChatGPT is facing,” the newspaper wrote.

Still, analysts agree that Sora is opening a new era of AI video creation, much as ChatGPT did when it appeared. Once commercialized, it could directly affect the film, media, and game design industries.

Reece Hayden, senior analyst at ABI Research, told CBS News that in the future, AI like Sora will even change the operating model of platforms like Netflix, letting users edit a story’s ending or create their own movie with just a few lines of text.


Vũ Ngọc Hùng