MiniMax Speech 02 on Novita AI: Models, Features, and Quick Start Guide

Novita AI provides four distinct models in the MiniMax Speech 02 series. Each model is designed to suit different scenarios, whether you need studio-quality narration or fast, interactive speech.

In the following sections, we’ll explore the differences between these models in more detail, helping you choose the best option for your specific use case.

MiniMax Speech 02 Algorithm

What Does ’02’ Refer to?

| Term | Meaning |
| --- | --- |
| 02 | Refers to the second generation of the MiniMax Speech model series. |
| TTS | Text-to-Speech: technology that converts written text into spoken audio. |
| Async | Asynchronous: the speech is generated in the background and delivered once it is ready, which is useful for long texts. |
| HD | High Definition/High Fidelity: focuses on producing audio that is very realistic and high-quality. |
| Turbo | Turbo (Low Latency): prioritizes speed and quick response, making it ideal for real-time interactions. |

MiniMax Speech 02 Models Comparison

| Model / API Name | Suitable Scenarios | Advantages | Supported Text Length |
| --- | --- | --- | --- |
| speech-02-hd Text to Speech | Short text, real-time dialogue | Extremely high audio quality and naturalness | Up to ~5,000 characters |
| speech-02-hd Async Long TTS | Audiobooks, long-form content | Supports long texts with the same audio quality | Up to hundreds of thousands or millions of characters, processed in queue |
| speech-02-turbo Text to Speech | Real-time voice interaction | Fast response, low latency | Up to ~5,000 characters |
| speech-02-turbo Async Long TTS | Long text in real-time interactions | Balances speed and scalability | Also supports long texts, with faster processing than synchronous mode |
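
The comparison above can be read as a simple decision rule. The sketch below is one way to encode it in Python. The function name `choose_variant`, the returned labels, and the hard 5,000-character cutoff are illustrative assumptions (the table only gives an approximate limit); none of them are part of the Novita API.

```python
# Approximate synchronous limit from the comparison table,
# treated as a hard cutoff for the purpose of this sketch.
SYNC_CHAR_LIMIT = 5000


def choose_variant(text: str, low_latency: bool) -> tuple[str, str]:
    """Return (model family, mode) following the comparison table.

    The labels are descriptive only; they are not guaranteed endpoint names.
    """
    # Turbo favors speed, HD favors audio quality.
    family = "speech-02-turbo" if low_latency else "speech-02-hd"
    # Short inputs fit the synchronous models; longer ones go to Async Long TTS.
    mode = "sync" if len(text) <= SYNC_CHAR_LIMIT else "async"
    return family, mode


print(choose_variant("Hello there!", low_latency=True))
print(choose_variant("x" * 20000, low_latency=False))
```

In practice the choice may also depend on pricing and queue times, which this rule ignores.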

MiniMax Speech 02 Customization Options

  • Extensive Voice Library:
    Access a library of over 300 authentic and natural-sounding voices, supporting true-to-life delivery in Cantonese, Mandarin Chinese, Japanese, Korean, and many other major languages.
  • Advanced Voice Controls:
    Effortlessly adjust emotion, volume, speaking rate, and output format for every voice to perfectly match your needs.
  • Innovative Voice Mixing:
    Combine multiple existing voices to create entirely new and unique vocal profiles.
  • Multiple Audio Formats:
    Output audio in a variety of formats, including FLAC, WAV, MP3, and PCM, for maximum compatibility.
  • Real-Time Streaming:
    Enjoy instant audio delivery with seamless real-time streaming, ensuring smooth integration into your applications.
  • High Concurrency Support:
    Robust infrastructure guarantees reliable performance, even under heavy workloads and high request volumes.
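
As a rough illustration of the voice controls and voice mixing described above, the sketch below assembles a request body in Python. The field names (`voice_setting`, `audio_setting`, `timber_weights`, and so on) are taken from the request example in the quick start part of this guide. The helper name `build_payload`, the voice IDs, the emotion label, and the numeric values are hypothetical. The sketch also assumes a request uses either a single `voice_id` or a `timber_weights` mix; check the API reference for the valid values and combinations.

```python
import json


def build_payload(text, voice_id=None, mix=None, emotion=None,
                  speed=1.0, vol=1.0, audio_format="mp3"):
    """Assemble a request body with voice controls and optional voice mixing.

    `mix` maps voice IDs to weights; when given, it is sent as
    `timber_weights` instead of a single `voice_id` (an assumption here).
    """
    if (voice_id is None) == (mix is None):
        raise ValueError("pass exactly one of voice_id or mix")

    voice_setting = {"speed": speed, "vol": vol}
    if emotion is not None:
        voice_setting["emotion"] = emotion

    payload = {
        "text": text,
        "voice_setting": voice_setting,
        "audio_setting": {"format": audio_format},
    }
    if voice_id is not None:
        voice_setting["voice_id"] = voice_id
    else:
        # Voice mixing: one entry per source voice with its weight.
        payload["timber_weights"] = [
            {"voice_id": vid, "weight": weight} for vid, weight in mix.items()
        ]
    return payload


# Hypothetical voice IDs and emotion label, for illustration only.
mixed = build_payload(
    "Welcome back!",
    mix={"example-voice-a": 70, "example-voice-b": 30},
    emotion="happy",
)
print(json.dumps(mixed, indent=2))
```

Keeping payload construction in one function makes it easier to validate settings before a request is sent.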

How Does MiniMax Improve Speech Synthesis?

Driven by Innovations, MiniMax Ranks First

[Figure: MiniMax Speech 02 ranks first. Source: Artificial Analysis Arena.]

MiniMax Speech 02 for Real-Time Synthesis or Robust Speech Recognition

| Scenario Type | Core Objective | Key Model Capabilities | Speech-02 Adaptation Method |
| --- | --- | --- | --- |
| Real-time Speech Synthesis | Fast response and streaming playback | Ultra-low latency, real-time output, natural timbre and intonation, multilingual support | Speech-02-Turbo generates audio instantly, supports up to about 5,000 characters for streaming output with minimal latency, ideal for conversational applications |
| Robust Speech Recognition (for ASR) | Synthesized speech must be clear, recognizable, and high quality | Exceptional speech clarity, accurate pronunciation with low error rate, good rhythm and intonation | Speech-02-HD is used to generate high-fidelity speech, with low word error rate, high speaker similarity, and excellent audio quality |
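
The word error rate mentioned in the table is usually measured by transcribing the synthesized audio with an ASR system and comparing the transcript with the input text. The sketch below computes WER = (substitutions + deletions + insertions) / reference words using a word-level edit distance. The transcripts are made-up strings, and the normalization (lowercasing and whitespace splitting) is a simplifying assumption. It illustrates the metric only and is not MiniMax's evaluation pipeline.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: the ASR transcript dropped one of four words.
print(word_error_rate("the quick brown fox", "the quick fox"))  # 0.25
```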

How to Access MiniMax Speech 02?

Step 1: Log In and Access the Model Library

Log in to your account and click on the Model Library button.

Step 2: Choose Your Model

Browse through the available options and select the model that suits your needs.

Step 3: Start Your Free Trial

Begin your free trial to explore the capabilities of the selected model.

Click “Try it” to see what each field represents and to choose values to customize your API settings.

Step 4: Get Your API Key

To authenticate with the API, you need an API key. Open the “Settings” page and copy your API key from there.
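
One common way to keep the key out of source code is to read it from an environment variable and build the request headers from it. The sketch below shows that pattern. The variable name `NOVITA_API_KEY` and the helper `auth_headers` are hypothetical, and the `Bearer` scheme is an assumption to confirm against the API reference.

```python
import os


def auth_headers(api_key: str) -> dict:
    """Build JSON request headers carrying the API key."""
    if not api_key:
        raise ValueError("API key is empty")
    return {
        "Content-Type": "application/json",
        # Assumed Bearer scheme; confirm in the API reference.
        "Authorization": f"Bearer {api_key}",
    }


# NOVITA_API_KEY is a hypothetical variable name chosen for this example.
api_key = os.environ.get("NOVITA_API_KEY", "")
if api_key:
    headers = auth_headers(api_key)
    print("Headers ready:", sorted(headers))
else:
    print("Set NOVITA_API_KEY before sending requests.")
```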

Step 5: Install the API

Install the requests library (for example, with pip install requests) and import it into your development environment. Use your API key to authenticate each request. The following Python example sends a text-to-speech request to the speech-02-hd endpoint; the values shown are placeholders.

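import requests

# Endpoint for the speech-02-hd text-to-speech model.
url = "https://api.novita.ai/v3/minimax-speech-02-hd"

# Every "<string>" and 123 below is a template placeholder: replace each one
# with a real value from the API reference before sending the request.
payload = {
    "text": "<string>",              # text to synthesize
    "voice_setting": {
        "speed": 123,                # speaking rate
        "vol": 123,                  # volume
        "pitch": 123,
        "voice_id": "<string>",      # a voice from the voice library
        "emotion": "<string>",
        "english_normalization": True
    },
    "audio_setting": {
        "sample_rate": 123,
        "bitrate": 123,
        "format": "<string>",        # output audio format
        "channel": 123
    },
    "pronunciation_dict": { "tone": [{}] },
    # Voice mixing: list the voices to blend and the weight of each.
    "timber_weights": [
        {
            "voice_id": "<string>",
            "weight": 123
        }
    ],
    # Assumption: with streaming on, the body may not be a single JSON
    # document, so it is turned off here to keep response.json() valid.
    "stream": False,
    "language_boost": "<string>",
    "output_format": "<string>"
}
headers = {
    "Content-Type": "application/json",
    # Assumed Bearer scheme; paste the key copied in Step 4.
    "Authorization": "Bearer <YOUR_API_KEY>"
}

response = requests.post(url, json=payload, headers=headers, timeout=60)
response.raise_for_status()  # fail early on HTTP 4xx/5xx responses

print(response.json())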

Step 6: Change to Another Model

You can click the sidebar in the upper left corner to select different audio models. Novita AI also offers voice cloning capabilities.

MiniMax Speech 02 is a top-performing text-to-speech solution that offers both high-fidelity and low-latency audio generation. With more than 300 voices, controls for emotion, volume, and speaking rate, and support for both real-time streaming and long-form asynchronous generation, it fits a wide range of speech synthesis scenarios. It also ranks first among speech models on Artificial Analysis Arena.

Frequently Asked Questions

What does “02” mean in MiniMax Speech 02?

“02” refers to the second generation of the MiniMax Speech model series, representing significant improvements in quality and speed.

Can MiniMax Speech 02 handle long texts?

Yes. The Async models (HD Async and Turbo Async) are designed to process long-form content, such as audiobooks, supporting up to millions of characters.

Does it support real-time streaming?

Yes. MiniMax Speech 02’s Turbo mode offers real-time streaming with ultra-low latency, perfect for interactive or conversational apps.

Novita AI is the All-in-one cloud platform that empowers your AI ambitions. Integrated APIs, serverless, GPU Instance — the cost-effective tools you need. Eliminate infrastructure, start free, and make your AI vision a reality.

