Baidu reportedly launched a new artificial intelligence (AI) video generation model on Wednesday. According to the report, the MuseStreamer AI model can also integrate Chinese audio into the videos it generates, making it the second such model after Google's Veo 3. The tech giant says it is the world's first AI model that supports native Chinese audio generation. Along with the model, the company reportedly introduced a new video content creation platform called HuiXiang. Notably, MuseStreamer and HuiXiang are currently not available outside China.
Baidu's MuseStreamer can reportedly generate Chinese audio
AI video generation technology has progressed dramatically over the past two years. We have gone from models that struggled to render people with the correct number of fingers to models that can now depict realistic physics and motion. However, most AI players have stayed away from generating videos with native audio.
At Google I/O 2025, Google became the first to offer this feature with Veo 3, which quickly became the talk of the town, leaving its main competitor, OpenAI's Sora, behind. The Mountain View-based tech firm has since rolled out Veo 3 in all 154 countries where the Gemini app is available, underscoring the company's aggressive push for this technology.
However, according to a Tech in Asia report (via AI Base), Chinese tech giant Baidu has now entered the race with its MuseStreamer AI model. It is said to be the only model that can produce videos with Chinese audio. Notably, Veo 3 only generates audio in English.
MuseStreamer is said to be capable of not only generating synchronized dialogue for videos, but also adding sound effects and ambient sounds to them. Baidu claims that the model scored 89.38% on the VBench I2V benchmark, placing it first. The tech giant is promoting the model as a consumer-friendly content creation tool.
Along with the AI model, Baidu reportedly introduced HuiXiang, a new video content platform. HuiXiang is said to be the front end for the AI model, allowing users to submit prompts and generate videos. According to the report, the platform currently supports generating 10-second videos at 1080p resolution. In comparison, Veo 3 can only create eight-second videos. There is no clarity on the default aspect ratio of the videos or on whether users can generate videos in other aspect ratios.