January 30, 2026

AI voice changer

The moment a long-held imagination became real


Eugene, UX/UI Designer

What if I could speak using a completely different voice?

Maybe the voice of a Marvel villain. Or the voice of a character from a Studio Ghibli film. Just small, playful wishes like that. And surprisingly, a way to actually make that dream come true already exists.
It involves using a voice-conversion model released by a Japanese developer. This model transforms your voice into a completely different one in real time as you speak. There is no record-first, edit-later step; the transformation happens instantly, at the very moment you speak.

W-okada’s voice-changer

Making real-time voice conversion possible

At the center of this experiment is an open-source voice-changer project released by W-okada. This project isn’t just a simple voice filter or a pitch-shifting tool. It is designed to run machine learning–based voice conversion models in real time, which is what makes it especially compelling.
W-okada’s voice changer analyzes audio input from your microphone on the fly, applies the characteristics of a pre-trained voice model, and outputs an entirely new voice. The key point is that all of this happens in real time—the moment you speak, the voice responds as a different character. Because of this capability, the project quickly spread among VTubers, streamers, and other creators.
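The pipeline described above can be sketched as a chunked processing loop. This is a minimal illustration only, with a hypothetical `convert_chunk()` stub standing in for the actual machine-learning model and real audio I/O; none of these names come from the project itself.

```python
# Minimal sketch of a real-time conversion loop (hypothetical names;
# the real app runs an ML voice-conversion model on live microphone
# input, not this stub).

CHUNK = 1024  # samples processed per iteration


def convert_chunk(samples):
    """Stand-in for the voice-conversion model: simply inverts the
    waveform so the loop has a per-chunk transformation to apply."""
    return [-s for s in samples]


def run_stream(input_samples):
    """Consume audio chunk by chunk, as a real-time pipeline would."""
    output = []
    for i in range(0, len(input_samples), CHUNK):
        chunk = input_samples[i:i + CHUNK]
        output.extend(convert_chunk(chunk))  # model runs once per chunk
    return output


mic = [0.1] * 4096         # pretend microphone capture
speaker = run_stream(mic)  # converted audio, same length as the input
```

The key property this sketch shows is that conversion happens per chunk as audio arrives, which is why chunk size directly determines the latency you experience.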
In this article, I’ll walk through how to install and use W-okada’s voice changer.

How to use

Below is a summary of the guidelines provided by W-okada. You can search for "W-okada voice changer" on Google, or find detailed setup instructions and documentation in the project's GitHub repository.

Usage

This is an app for performing voice changes with MMVC and so-vits-svc. It can be used in two main ways, in order of difficulty:
Using a pre-built binary
Setting up an environment with Docker or Anaconda and using it

Usage with pre-built binaries

Both Windows and Mac versions are available on Hugging Face.
v2 for Windows
Please download and use vcclient_win_std_xxx.zip. You can perform voice conversion on a reasonably high-performance CPU without a GPU, or use DirectML to leverage AMD and Nvidia GPUs. v2 supports both PyTorch and ONNX models.
If you have an Nvidia GPU, you can achieve faster voice conversion by using vcclient_win_cuda_xxx.zip.
v2 for Mac (Apple Silicon)
Please download and use vcclient_mac_xxx.zip.
v1
If you are using Windows with an Nvidia GPU, please download ONNX (cpu, cuda), PyTorch (cpu, cuda).
If you are using Windows with an AMD/Intel GPU, please download ONNX (cpu, DirectML) and PyTorch (cpu, cuda). AMD/Intel GPUs are only enabled for ONNX models.
In either case, GPU support in PyTorch and ONNX Runtime is enabled only if your hardware supports it.
If you are not using a GPU on Windows, please download ONNX (cpu, cuda) and PyTorch (cpu, cuda).
Download from Hugging Face.

Steps up to startup

Start GUI

Windows version
It is launched as follows.
1. Unzip the downloaded zip file and run start_http.bat.
If you have an older version, be sure to unzip the new one into a separate folder.
Mac version
It is launched as follows.
1. Unzip the downloaded file.
2. Run MMVCServerSIO by holding down the Control key and clicking it (or right-clicking to run it). If a message appears stating that the developer cannot be verified, hold down the Control key and click it again (or right-click to run it). A terminal will open, and the process will finish within a few seconds.
3. Next, run startHTTP.command in the same way: hold down the Control key and click it (or right-click to run it), repeating if the unverified-developer message appears. A terminal will open, and the launch process will begin.
4. In short, the key is to run both MMVCServerSIO and startHTTP.command, and MMVCServerSIO must be run first.
If you have the old version, be sure to unzip it into a separate folder.
If you are connecting remotely
Please use the https version of startHTTP.command (Mac) or start_http.bat (Windows) instead of the http version.
Access it with a browser (currently only Chrome is supported), and you will see the GUI.
Console
When you run the .bat file (Windows) or .command file (Mac), a console window will be displayed, and on first startup various data will be downloaded from the internet. Depending on your environment, this may take one to two minutes.
GUI
Once the download of the required data is complete, a dialog will be displayed. If you wish, press the yellow icon to reward the developer with a cup of coffee. Pressing the Start button will close the dialog.

GUI overview

All operations are performed from this screen.

Quick start

You can immediately perform voice conversion using the data downloaded at startup.

Operation

1. To get started, click the Model Selection area and select the model you would like to use. Once the model is loaded, the character images will be displayed on the screen.
2. Select the microphone (input) and speaker (output) you wish to use. If you are unfamiliar with the options, we recommend selecting client mode and then choosing your microphone and speaker. (The difference between client and server is explained later.)
3. When you press the start button, audio conversion will begin after a few seconds of data loading. Try saying something into the microphone; you should hear the converted audio from the speaker.

FAQ on quick start

Q1: The audio is becoming choppy and stuttering.
A1: Your PC's performance may be inadequate. Try increasing the CHUNK value (A in the figure; for example, 1024). Also try setting F0 Det to dio (B in the figure).
Q2: The voice is not being converted.
A2: Refer to trouble_shoot_communication.md and identify where the problem lies, and consider a solution.
Q3: The pitch is off.
A3: Although it wasn't explained in the Quick Start, if the model is pitch-changeable, you can change it with TUNE. Please refer to the more detailed explanation below.
Q4: The window doesn't show up or the window shows up but the contents are not displayed. A console error such as electron: Failed to load URL: http://localhost:18888/ with error: ERR_CONNECTION_REFUSED is displayed.
A4: There is a possibility that the virus checker is running. Please wait or designate the folder to be excluded at your own risk.
Q5: [4716:0429/213736.103:ERROR:gpu_init.cc(523)] Passthrough is not supported, GL is disabled, ANGLE is disabled is displayed.
A5: This is an error produced by the library used by this application, but it does not have any effect, so please ignore it.
Q6: My AMD GPU isn't being used.
A6: Please use the DirectML version. Additionally, AMD GPUs are only enabled for ONNX models. You can judge this by the GPU utilization rate going up in the Performance Monitor.
Q7: onnxruntime is not launching and produces an error.
A7: An error appears to occur if the folder path contains non-ASCII (Unicode) characters. Please extract to a path that uses only alphanumeric characters.

Model selection

Select the model you wish to use. By pressing the "edit" button, you can edit the list of models (model slots). Please refer to the model slots editing screen for more details.

Main control

The character image of the loaded model is displayed on the left side. The status of the real-time voice changer is overlaid on the top left of the character image. Use the buttons and sliders on the right side to control various settings.
Status of real-time voice changer
The lag time from speaking to conversion is buf + res seconds. When adjusting, please adjust the buffer time to be longer than the res time.
vol
This is the volume after voice conversion.
buf
The length of each chunk in milliseconds when capturing audio. Shortening the CHUNK will decrease this number.
res
The measured time it takes to convert one chunk of audio plus the EXTRA data. Decreasing either CHUNK or EXTRA will reduce this number.
Control
Start/stop button
Press "start" to begin voice conversion and "stop" to end it.
Pass through button
When this button is pressed, the input sound is output as-is, without conversion. By default, a confirmation dialog appears when it is activated, but you can skip this dialog through the Advanced Settings.
GAIN
in: Change the volume of the inputted audio for the model.
out: Change the volume of the converted audio.
TUNE
Enter a value for how much to convert the pitch of the voice. Conversion can also be done during inference. Below are some guidelines for settings.
+12 for male voice to female voice conversion
-12 for female voice to male voice conversion
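The guideline values above follow from standard equal-temperament arithmetic: each semitone scales frequency by a factor of 2^(1/12), so ±12 semitones is exactly one octave up or down. A small sketch of that relationship (the function name is my own, not part of the app):

```python
# TUNE is expressed in semitones; each semitone multiplies the
# fundamental frequency by 2**(1/12) (equal temperament).

def pitch_ratio(semitones):
    """Frequency multiplier corresponding to a TUNE value in semitones."""
    return 2 ** (semitones / 12)


up_octave = pitch_ratio(+12)    # 2.0: the male-to-female guideline
down_octave = pitch_ratio(-12)  # 0.5: the female-to-male guideline
```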
INDEX (Only for RVC)
You can specify the weight given to the features used in training. This is only valid for models that have an index file registered. 0 uses HuBERT's output as-is, and 1 assigns all weight to the original training features. If the index ratio is greater than 0, the search may take longer.
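Conceptually, the index ratio behaves like a linear blend between the two feature sources. The sketch below is a simplification of my own (RVC's actual nearest-neighbor retrieval is more involved), intended only to show what the 0-to-1 scale means.

```python
# Conceptual sketch: an index ratio r linearly interpolating between
# HuBERT features (r=0) and retrieved training features (r=1). This is
# an illustration, not RVC's actual retrieval implementation.

def blend(hubert_feats, index_feats, ratio):
    """ratio=0 keeps HuBERT's output; ratio=1 uses the index features."""
    return [(1 - ratio) * h + ratio * x
            for h, x in zip(hubert_feats, index_feats)]


blend([1.0, 0.0], [0.0, 1.0], 0.0)  # pure HuBERT output
blend([1.0, 0.0], [0.0, 1.0], 1.0)  # pure index features
```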
Voice
Set the speaker of the audio conversion.
Save setting
Save the settings specified. When the model is recalled again, the settings will be reflected. (Excluding some parts).
export to onnx
Converts the loaded PyTorch model to ONNX. It is only valid if the loaded model is an RVC PyTorch model.
Others
The configurable items vary depending on the AI model in use. Please check the features and other information on the model creator's site.

Configuration

You can configure the app's behavior and the conversion process here.
Noise
You can switch the noise cancellation feature on and off, however it is only available in Client Device Mode.
Echo: echo cancellation function.
Sup1, Sup2: noise suppression functions.
CHUNK (Input Chunk Num)
Determines how much audio is cut and converted in a single conversion. The higher the value, the more efficient the conversion, but the larger the buf value and the longer the maximum delay before conversion starts. The approximate time is displayed in buf.
EXTRA (Extra Data Length)
Determines how much past audio to include in the input when converting. The more past audio included, the better the accuracy of the conversion, but the longer res becomes because the computation takes more time. (The Transformer is likely the bottleneck, so computation time increases roughly with the square of this length.)
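The CHUNK/EXTRA trade-off can be made concrete with a toy cost model: buf grows linearly with CHUNK, while compute grows roughly with the square of the window including EXTRA. The constant below is invented purely for illustration; it is not a measured figure from the app.

```python
# Toy cost model for the CHUNK/EXTRA trade-off. The constant k is
# invented for illustration; only the shape of the curve matters:
# doubling the total window (chunk + extra) quadruples the cost.

def est_compute(chunk, extra, k=1e-6):
    """Hypothetical quadratic cost of converting chunk + extra samples."""
    n = chunk + extra
    return k * n * n


# Doubling the window from 1024 to 2048 samples costs 4x as much.
slow = est_compute(1024, 1024)
fast = est_compute(1024, 0)
```

This is why the guide recommends lowering CHUNK or EXTRA when res grows too large: the quadratic term dominates quickly.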

Model Slot Edit Screen

By pressing the edit button in the Model Slot Selection Area, you can edit the model slot.

Upload button

You can upload a model. On the upload screen, you can select the voice changer type for the upload. Press the back button to return to the Model Slot Edit Screen.

Sample button

You can download a sample. You can go back to the Model Slot Edit Screen by pressing the back button.

Pre-trained voice model

Of course, you can download and use pre-trained voice models. There are various Discord channels and community boards where people share fully trained RVC voice models, which you can download and use directly.
To import a downloaded model, click the Edit button, then select Upload. After setting the appropriate Voice Changer type for the model, attach both the PyTorch model file and the corresponding index file. Once uploaded, the model will be ready to use.

Responsible Use and Ethical Considerations

As real-time voice conversion models have become increasingly common, it's worth noting that the default model used in W-okada's RVC-based voice changer is trained on approximately 50 hours of the high-quality, open-source VCTK dataset. Because this dataset is openly licensed, you can use the default model with confidence and without copyright concerns.
However, the most important point remains: you should never train on or use someone else's voice for commercial purposes without their permission. Doing so can lead to serious issues involving copyright infringement and impersonation.