Download xVATrainer for Windows
xVATrainer description
xVATrainer is the companion app to xVASynth, the AI text-to-speech app using video game voices. xVATrainer is used for creating the voice models for xVASynth, and for curating and pre-processing the datasets used for training these models.
Dataset annotation
The main screen of xVATrainer contains a dataset explorer, which gives you an easy way to view, analyze, and adjust the data samples in your dataset. It also lets you record a dataset of your own voice directly in the app, saved in the correct format.
Trainer
xVATrainer includes AI model training for the FastPitch1.1 (with a modified training set-up) and HiFi-GAN models (the xVASynth "v2" models). The training follows a multi-stage approach optimized for transfer learning (fine-tuning) quality. The generated models are exported in the format required by xVASynth, ready to use for generating audio.
Batch training is also supported, allowing you to queue up any number of datasets to train, with cross-session persistence. The training panel shows a cmd-like textual log of the training progress, a tensorboard-like visual graph for the most relevant metrics, and a task manager-like set of system resources graphs.
Tools
xVATrainer includes several data pre-processing tools to help with almost any data preparation work needed before training. The tools do not need to be run in any particular order, so long as your datasets end up as 22050Hz mono wav files of clean speech audio, up to about 10 seconds in length, with an associated transcript file containing each audio file's transcript. Depending on where your data comes from, you can pick the tools you need to bring your dataset into that format. The included tools are:
- Audio formatting - a tool to convert from most audio formats into the required 22050Hz mono .wav format
- AI speaker diarization - an AI model that automatically extracts short slices of speech audio from otherwise longer audio samples (including feature length movie sized audio clips). The audio slices are additionally separated automatically into different individual speakers
- AI source separation - an AI model that can remove background noise, music, and echo from an audio clip of speech
- Audio Normalization - a tool which normalizes (EBU R128) audio to standard loudness
- WEM to OGG - a tool to convert from a common audio format found in game files, to a playable .ogg format. Use the "Audio formatting" tool to convert this to the required .wav format
- Cluster speakers - a tool which uses an AI model to encode audio files, and then clusters them into a known or unknown number of clusters, either separating multiple speakers, or single-speaker audio styles
- Speaker similarity search - a tool which encodes some query files and a larger corpus of audio files, and then re-orders the larger corpus according to each file's similarity to all the query files
- Speaker cluster similarity search - the same as the "Speaker similarity search" tool, but using clusters calculated via the "Cluster speakers" tool as data points in the corpus to sort
- Transcribe - an AI model which automatically generates a text transcript for audio files
- WER transcript evaluation - a tool which examines your dataset's transcript against one auto-generated via the "Transcribe" tool to check for quality. Useful when supplying your own transcript, and checking if there are any transcription errors.
- Remove background noise - a more traditional noise removal tool, which uses a clip of just noise as reference to remove from a larger corpus of audio which consistently has matching background noise
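The target format described above (22050Hz, mono, wav, up to about 10 seconds) can be checked with a few lines of Python. The sketch below is not part of xVATrainer; the names `check_wav` and `write_silence` are hypothetical, and the hard 10-second cutoff is an assumption, since the text only says "about 10 seconds". It uses the standard-library `wave` module, which reads uncompressed PCM wav files only.

```python
import wave

TARGET_RATE_HZ = 22050   # sample rate stated in the text
TARGET_CHANNELS = 1      # mono
MAX_SECONDS = 10.0       # assumption: the text says "up to about 10 seconds"


def check_wav(path):
    """Return a list of problems; an empty list means the file fits the format."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        seconds = wav.getnframes() / rate

    problems = []
    if rate != TARGET_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {TARGET_RATE_HZ} Hz")
    if channels != TARGET_CHANNELS:
        problems.append(f"{channels} channels, expected mono")
    if seconds > MAX_SECONDS:
        problems.append(f"{seconds:.1f} s long, expected at most about {MAX_SECONDS:.0f} s")
    return problems


def write_silence(path, rate, channels, seconds):
    """Write a silent 16-bit PCM wav file (only used to demonstrate check_wav)."""
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * int(rate * seconds))
```

Running `check_wav` over every file in a dataset folder before training gives a quick list of clips that still need the "Audio formatting" tool or further slicing.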
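The metric behind the "WER transcript evaluation" tool, word error rate, is conventionally the word-level edit distance between a reference and a hypothesis transcript, divided by the number of reference words. The sketch below shows that standard definition; it is not taken from xVATrainer, the function name `word_error_rate` is hypothetical, and the lowercasing and whitespace splitting are assumptions about text normalization that the tool may handle differently.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

A value of 0 means the two transcripts match word for word. Lines with a high value are the ones worth checking by hand, since either the supplied transcript or the auto-generated one is likely wrong.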
Special thanks:
PTC001, Hector Medima, CinnaMewRoll, Grant Spielbusch, Sean Lyons, Charles Hufnagel, Kirill Akimov, Mister Lyosea, Anthony Crane, Sh1tMagnet, Rachel Wiles, Pimphat, NaMu, Kelly Roth, John Detwiler, Veks, Tempuc, ratbaby, Brennen Hahn, Benoit Jauvin-Girard, stljeffbb, DirectDogman, Ulik, Stormalize, Golem, Václav Švec, Adrilz, Hammerhead96 ., Jacob Porter, Strength, Majoros Kristóf, Michael Gill, John S., Jacob Garbe, Bart Kelsey, Idiotenschnitzel, Joe Bob Slim, Mikkel Jensen, Katherine Fishwick, Youbetterwork, Jaktt1337, David Keith vun Kannon, Bob, Imogen, Yic Zeiros, Danielle, Optimist Vamscenes, David, Hawkbat, Tom Harkness, Brandon Reynolds, Alex East, Rory Beaker, ionite, Snoutpunk, Joshua Jones, PatronGuy, flyingvelociraptor, crash blue, Yualien Lunaris, Sergey Trifonov, Anshela Asre, Leif, VGC-VR, David, Caden Black, Katsuki, Calvin Farage, hairahcaz, Just Becca, Solstice_, Max Loef, CHASE MCKELVY, dollspit, Loni Dennis, SpaceD0lphin, AcademicInside, lord parker, PConD, Joseph Paul Dennison, Krazon, Tara Cooksey, Caro Tuts, Blythe, Snud Swimp, Tako-kun, Retlaw83, Yael van Dok, FinalFrog, Donald Bass, Hazel Louise Steele, J. Quint, Lulzar, Vahzah Vulom, Ryan W, Laura Almeida, Alexandra Whitton, Zelda Hadley, Cookie, Pseudo Immortal, My Best Friend Is A Squid, Agito Rivers, Thuggysmurf, radbeetle