Nvidia has created an AI Audio Generator that the company claims can create sounds that have not previously existed.
The company, best known for its microchips, claims to have created a ‘Swiss Army knife for sound’ with which users can create any sound they can imagine by entering text prompts.
What is Fugatto?
Fugatto stands for Foundational Generative Audio Transformer Opus 1. It is an AI model that can generate or alter audio.
Fugatto works on any mix of music, voices or sounds described with prompts, and can work from a combination of text and audio files.
This opens up many possibilities. The one leading the headlines is the creation of new, never-before-heard sounds. However, Fugatto can also be used to create music based on text prompts, add or remove instruments from an existing song, or even alter the accent or emotion in a voice recording.
What can Fugatto be used for?
Fugatto is able not only to generate new sounds but also to completely alter existing ones, from changing the sound of a musical instrument to changing the accent of a voice.
Although this sounds like simple fun to play around with, the real-world applications of the technology are far-reaching and could be incredibly impactful for a number of industries.
In music production, prototype ideas can be quickly edited and bounced around, with voices and instruments altered in minute detail to find new ways of creating.
In advertising, Fugatto could be used to adapt an existing campaign so that it has the most impact in each region, for example by altering the accents or emotions of voiceovers to suit a specific audience.
In gaming, developers would be able to use Fugatto to alter audio assets to match the changing action as users make choices in the game.
The model is not limited to single sounds. Its temporal interpolation features mean that evolving soundscapes can be created, for example a rainstorm that moves through a set area with thunder that builds to a crescendo. Users are able to control, in fine-grained detail, the way the soundscape evolves.
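To give a rough feel for what "interpolating over time" means, here is a minimal sketch in Python. It is not Fugatto's method or API, which Nvidia has not published in usable form. The function `interpolate_weights` and the attribute names `rain` and `thunder` are hypothetical, and the sketch simply assumes a soundscape can be described by a handful of numeric weights that are blended linearly from a starting state to an ending state.

```python
def interpolate_weights(start, end, steps):
    """Linearly blend two dicts of attribute weights across `steps` frames.

    Hypothetical illustration only: the attribute names and the idea of
    per-frame weights are assumptions, not part of any Nvidia interface.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    keys = sorted(set(start) | set(end))
    schedule = []
    for i in range(steps):
        t = i / (steps - 1)  # runs from 0.0 (start) to 1.0 (end)
        frame = {
            k: (1 - t) * start.get(k, 0.0) + t * end.get(k, 0.0)
            for k in keys
        }
        schedule.append(frame)
    return schedule


# A calm drizzle that builds into a full storm over five frames.
calm = {"rain": 0.2, "thunder": 0.0}
storm = {"rain": 1.0, "thunder": 0.8}
schedule = interpolate_weights(calm, storm, 5)

for frame in schedule:
    print(frame)
```

Each frame in the schedule sits a little further along the path from the calm state to the storm, which is the basic idea behind a sound that changes gradually rather than switching abruptly.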
Nvidia also highlights the possibility of ‘joyful noise’, meaning novel sounds that have not previously existed. The company cites the example that ‘Fugatto can make a trumpet bark or a saxophone meow’.
How does Fugatto work?
On the technical side, Fugatto is what's known as a foundational generative transformer model. The model builds on Nvidia's previous work in audio vocoding, speech modeling and audio understanding.
The full version of the model uses 2.5 billion parameters. It was trained on a bank of NVIDIA DGX systems packing 32 NVIDIA H100 Tensor Core GPUs.
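To put the 2.5 billion figure in perspective, the short Python calculation below estimates how much memory the model's weights alone would occupy at a few common numeric precisions. Nvidia has not said which precision Fugatto uses, so the precisions listed are assumptions, and the helper `weight_memory_gb` is written purely for illustration. The estimate covers only the stored weights, not the extra memory needed during training or generation.

```python
PARAMS = 2_500_000_000  # parameter count stated for the full model

# Bytes needed to store one parameter at each (assumed) precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(params, bytes_per_param):
    """Return the size of the weights in gigabytes (1 GB = 10**9 bytes)."""
    return params * bytes_per_param / 1e9


for dtype, size in BYTES_PER_PARAM.items():
    print(f"{dtype}: {weight_memory_gb(PARAMS, size):.1f} GB")
```

At 16-bit precision the weights come to about 5 GB, well within the memory of a single H100 GPU.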
In training the model, the Nvidia team used a multifaceted strategy to generate data and continuously expand the range of tasks that Fugatto could perform, all whilst achieving higher accuracy without requiring additional data.
The team also scrutinised existing datasets in order to discover new relationships between datapoints.
How to use Fugatto?
Although Nvidia has released an extensive blog post about Fugatto, how it works and its potential applications, the company has not yet revealed a timeline for public use.
Beyond the blog post, the only publicly available information on Fugatto at present is the research paper itself.
The model will likely be released through one of Nvidia's partners in the future.