
February 13, 2024

The New MetaVoice-1B Was Just Released. Get Started Here.

Features, Getting Started and Gradio Playground

A new player in the text-to-speech and voice cloning arena, MetaVoice, has unveiled its innovative technology, MetaVoice 1B.

Distinguished by its open-source status under the Apache license, this technology invites widespread tinkering and enhancements.

MetaVoice 1B packs 1.2 billion parameters and was trained on an extensive dataset of 100,000 hours of speech.

It stands out for its ability to accurately clone American and British voices from just 30 seconds of sample audio in a zero-shot manner.

Plans are also underway to extend voice cloning customization to a wider variety of accents and languages.

Notably, the engineers designed MetaVoice 1B to produce emotionally resonant speech and to avoid inventing non-existent words, a common issue in some competing models.

Its backbone combines state-of-the-art causal and non-causal transformer models, multi-band diffusion techniques, and an advanced deep filter network, which together raise audio quality.

Key Characteristics

MetaVoice-1B is a 1.2B parameter base model for TTS (text-to-speech). It has been built with the following priorities:

  • Emotional speech rhythm and tone in English.
  • Zero-shot cloning for American & British voices from a 30-second reference clip (see the clip-preparation sketch below).
  • Support for voice cloning with finetuning.
    • Success with as little as 1 minute of training data for Indian speakers.
  • Support for long-form synthesis.

The model is released under the Apache 2.0 license. See the GitHub repository for details and to contribute.
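Before moving to the install steps, here is a minimal sketch of how you might prepare such a reference clip. It assumes ffmpeg is installed (covered in the next section); the file names are placeholders rather than files shipped with MetaVoice.

# Trim a longer recording down to a ~30-second mono reference clip.
# "my_voice.wav" and "my_voice_30s.wav" are placeholder file names.
import subprocess

subprocess.run(
    [
        "ffmpeg", "-y",           # overwrite the output if it already exists
        "-i", "my_voice.wav",     # source recording (placeholder)
        "-t", "30",               # keep only the first 30 seconds
        "-ac", "1",               # downmix to mono
        "my_voice_30s.wav",       # output reference clip (placeholder)
    ],
    check=True,
)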

Getting Started in Python

1. Installation
				
# Install a static ffmpeg build
wget https://johnvansickle.com/ffmpeg/builds/ffmpeg-git-amd64-static.tar.xz
wget https://johnvansickle.com/ffmpeg/builds/ffmpeg-git-amd64-static.tar.xz.md5

# Verify the integrity of the downloaded package
md5sum -c ffmpeg-git-amd64-static.tar.xz.md5

# Extract the archive
tar xvf ffmpeg-git-amd64-static.tar.xz

# Move the ffmpeg and ffprobe binaries to /usr/local/bin/
sudo mv ffmpeg-git-*-static/ffprobe ffmpeg-git-*-static/ffmpeg /usr/local/bin/

# Remove the leftover installation files
rm -rf ffmpeg-git-*

# Install the Python dependencies (run from the cloned MetaVoice repository)
pip install -r requirements.txt

# flash-attn only works on the latest NVIDIA GPUs; skip this step on other hardware
pip install flash-attn

# Install the MetaVoice package itself
pip install -e .
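Before moving on, it is worth confirming that ffmpeg landed on your PATH and that PyTorch can see a CUDA-capable GPU. The check below assumes requirements.txt pulls in torch; adjust it if your environment differs.

# Quick sanity check of the installation.
import shutil

import torch  # assumed to be installed via requirements.txt

assert shutil.which("ffmpeg"), "ffmpeg was not found on PATH"
print("ffmpeg:", shutil.which("ffmpeg"))
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))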
2. Usage
				
# Generate speech from text, conditioned on a reference speaker clip
python fam/llm/sample.py --huggingface_repo_id="metavoiceio/metavoice-1B-v0.1" --spk_cond_path="assets/bria.mp3"
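The sampling script writes its result to an audio file. As a quick check, you can inspect the generated clip with the ffprobe binary installed earlier; "output.wav" below is a placeholder for whatever path the script reports.

# Inspect a generated clip with ffprobe.
# "output.wav" is a placeholder; substitute the path reported by the sampling script.
import json
import subprocess

result = subprocess.run(
    ["ffprobe", "-v", "quiet", "-print_format", "json", "-show_format", "output.wav"],
    capture_output=True,
    text=True,
    check=True,
)
info = json.loads(result.stdout)
print("duration (s):", info["format"]["duration"])
print("format:", info["format"]["format_name"])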
3. Deploy
				
# Start a local server that exposes the model for inference
python fam/llm/serving.py --huggingface_repo_id="metavoiceio/metavoice-1B-v0.1"
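Once the server is running, you can call it from Python over HTTP. The URL, route, and payload below are hypothetical placeholders used to illustrate the pattern; check fam/llm/serving.py for the actual endpoint and request format it exposes.

# Illustrative client for the local server.
# The URL and payload fields are hypothetical; adjust them to match fam/llm/serving.py.
import requests

SERVER_URL = "http://localhost:8000/tts"  # placeholder host, port, and route

response = requests.post(
    SERVER_URL,
    json={
        "text": "Hello from MetaVoice-1B.",        # text to synthesise
        "speaker_ref_path": "my_voice_30s.wav",    # placeholder reference clip
    },
    timeout=120,
)
response.raise_for_status()

# Save the returned audio bytes (assuming the server responds with a WAV payload)
with open("generated.wav", "wb") as f:
    f.write(response.content)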
4. Use it via the Hugging Face Space

If you would rather not install anything locally, you can try the model in the browser through the hosted Gradio playground on Hugging Face Spaces.

Worth Reading:

AI Singing Voice Cloning in Python

End-to-End Python Guide for Data Processing, Training and Inference of AI-Cloned Voices. From Voice Data to Using Pre-trained and Custom Models.