Researchers from the Hong Kong University of Science and Technology and Moonshot AI have developed a new AI model called AudioX. AudioX is a unified model that generates audio and music from multimodal inputs such as text, video, images, music, and audio. It supports a range of tasks, including text-to-audio, text-and-video-to-audio, and video-to-audio generation. With the model, the researchers aim to address the scarcity of high-quality multimodal training data and advance the field of multimodal audio generation.