Microsoft AI Generates Voices That Sing In Chinese and English

Researchers at Microsoft and Zhejiang University say they have developed DeepSinger, an AI system that trains on data from music websites and can produce singing voices in several languages.

The researchers describe their approach in a preprint published on arXiv: a specially designed component captures and stores each singer’s timbre (vocal quality) from unorganized, noisy singing data.

Like OpenAI’s Jukebox, DeepSinger has commercial implications. After a recording, music artists often hold pickup sessions to correct mistakes, make changes, or add material.

In principle, AI-assisted voice synthesis could replace some of these sessions and save singers’ employers time and money. However, it could also put singers out of work.

A more troubling use of this technology is creating fake voices of musicians that make it seem as though they sang lyrics they never did.

Recently, Jay Z’s label, Roc Nation, filed copyright notices against videos that used AI to make him appear to rap Billy Joel’s “We Didn’t Start the Fire.”

The researchers note that singing voices are more complex than normal speaking voices in their rhythms and patterns. Synthesizing singing is therefore especially demanding, because the model needs precise control over the duration and pitch of each sound.

In addition, songs used for training normally have to be analyzed by hand for their lyrics and audio, and few singing datasets are publicly available.

DeepSinger appears to address these challenges with a pipeline of several data-processing steps. The system first crawls music websites for songs performed by top singers in several languages.

It then uses Spleeter, a music source separation tool, to isolate the singing voice and splits the audio into sentences. Next, DeepSinger extracts the singing duration of each phoneme (the unit of sound that distinguishes one word from another) in the lyrics.
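The duration-extraction step can be illustrated with a minimal sketch. It assumes an alignment model has already assigned every audio frame to a position in the lyric’s phoneme sequence; the function name, the 10 ms frame hop, and the example phonemes are hypothetical and are not taken from the DeepSinger paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical frame hop: each audio frame covers 10 ms (not from the paper).
HOP_MS = 10


def phoneme_durations(
    phonemes: Sequence[str],
    frame_alignment: Sequence[int],
    hop_ms: int = HOP_MS,
) -> List[Tuple[str, int]]:
    """Turn a frame-level alignment into per-phoneme durations in milliseconds.

    frame_alignment[t] is the index (into `phonemes`) of the phoneme sung at
    frame t. Indices must never decrease, because singing follows the lyrics
    in order.
    """
    for prev, cur in zip(frame_alignment, frame_alignment[1:]):
        if cur < prev:
            raise ValueError("alignment must be monotonic (non-decreasing)")

    # Count how many frames were assigned to each phoneme position.
    counts = [0] * len(phonemes)
    for idx in frame_alignment:
        counts[idx] += 1

    # Frames per phoneme multiplied by the hop size gives duration in ms.
    return [(p, c * hop_ms) for p, c in zip(phonemes, counts)]


# Usage: phonemes of a short lyric fragment and an 8-frame alignment.
print(phoneme_durations(["n", "i", "h", "ao"], [0, 0, 1, 1, 1, 2, 3, 3]))
# prints [('n', 20), ('i', 30), ('h', 10), ('ao', 20)]
```

Counting frames by phoneme position rather than by phoneme symbol keeps a sound that appears twice in a row as two separate entries, each with its own duration.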

After filtering the lyrics and singing voices according to confidence scores generated by a model, the system uses its specially designed timbre component to handle imperfect or distorted training data.
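The confidence-based filtering step might look something like this minimal sketch. The `AlignedSentence` record, its fields, and the 0.6 threshold are hypothetical illustrations; the paper’s actual scoring model and cutoff may differ.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class AlignedSentence:
    """One sentence-level training example (hypothetical structure)."""
    audio_path: str    # path to a separated vocal clip
    lyrics: str        # lyrics text for that clip
    confidence: float  # alignment confidence from a model, assumed in [0, 1]


def filter_by_confidence(
    sentences: Sequence[AlignedSentence], threshold: float = 0.6
) -> List[AlignedSentence]:
    """Keep only sentences whose lyric/audio alignment the model trusts enough."""
    return [s for s in sentences if s.confidence >= threshold]


# Usage: the low-confidence clip (likely misaligned or noisy) is dropped.
data = [
    AlignedSentence("clip_001.wav", "first line of lyrics", 0.92),
    AlignedSentence("clip_002.wav", "second line of lyrics", 0.31),
    AlignedSentence("clip_003.wav", "third line of lyrics", 0.75),
]
kept = filter_by_confidence(data)
print([s.audio_path for s in kept])
# prints ['clip_001.wav', 'clip_003.wav']
```

The threshold trades quantity for quality: raising it keeps cleaner examples but leaves fewer of them for training.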

The researchers report that, given pitch information, durations, lyrics, and reference audio, DeepSinger can synthesize singing voices with high pitch accuracy and natural-sounding vocal quality.

The Microsoft researchers plan to use more advanced AI techniques to further improve the quality of the voices DeepSinger generates.


Boluwatife Ibosiola
