MiniMax Speech 2.8 Series on Novita AI: Expressive TTS with Emotional Tone Tags for Every Voice Application

Table Of Contents

What Is the MiniMax Speech 2.8 Series?
Key Features and What’s New
Model Variants: HD vs. Turbo, Sync vs. Async
How It Compares to Speech 2.6
Who Should Use Which Variant
How to Get Started on Novita AI
Conclusion

The MiniMax Speech 2.8 series is the latest upgrade to MiniMax’s leading text-to-speech lineup, introducing emotional tone tags — inline markers like (laughs), (sighs), and (gasps) that make AI-generated speech sound genuinely human. Available in four variants on Novita AI (HD Sync, HD Async, Turbo Sync, Turbo Async), the 2.8 series keeps the same pricing as its predecessor while adding a feature set that competitors simply don’t offer at this tier. If you’re building voice agents, audiobooks, or any audio content pipeline, this is the TTS model series to evaluate right now.

What Is the MiniMax Speech 2.8 Series?

MiniMax has consistently held a top position on the Artificial Analysis Speech Arena and the Hugging Face TTS Arena, outperforming industry heavyweights like OpenAI in blind evaluations.

The Speech 2.8 series is the latest evolution of that lineage. Built on MiniMax’s autoregressive Transformer architecture with a Flow-VAE decoder, it produces speech in a learned latent space rather than relying on traditional mel-spectrogram vocoders — the result is audio that sounds remarkably natural, with proper intonation, breathing, and emotional nuance.

The headline feature of the 2.8 series: emotional tone tags. For the first time, you can embed natural interjections directly into your text input, and the model renders them as authentic human sounds within the speech flow.

Novita AI now hosts the full Speech 2.8 series, giving developers instant API access with no cold starts.

Key Features and What’s New

Emotional Tone Tags

The standout addition. Insert parenthesized tags anywhere in your text, and the model weaves them seamlessly into the generated speech:


Tag	Effect	Example
`(laughs)`	Laughter	”That’s hilarious `(laughs)`”
`(chuckle)`	Light laugh	”Good one `(chuckle)`”
`(sighs)`	Sighing	”Oh well `(sighs)`, here we go”
`(gasps)`	Surprised gasp	”Wait `(gasps)`! Really?”
`(clears throat)`	Throat clearing	”`(clears throat)` Let’s begin”
`(coughs)`	Coughing	”Excuse me `(coughs)`”
`(sneezes)`	Sneezing	”Achoo `(sneezes)`! Sorry”

This isn’t just a novelty — it solves a real problem. Until now, making TTS output sound spontaneous required post-production editing or layering sound effects manually. With tone tags, expressiveness is baked directly into the generation pipeline.

Continuous Sound Mode

A new continuous_sound parameter smooths transitions between clauses, eliminating the subtle audio “seams” that can make synthesized speech feel stitched together. This is especially noticeable in longer passages.

Inherited from the MiniMax Speech Series

The Speech 2.8 series retains the full feature set of its predecessors:

40+ languages with language_boost for enhanced minor language/dialect recognition
9 emotion presets: happy, sad, angry, fearful, disgusted, surprised, calm, fluent, whisper
Voice cloning: use system voices, cloned voices, or text-generated voices
Voice mixing: blend up to 4 voices with weighted ratios via timber_weights
Voice modification: adjust pitch, timbre, and intensity independently (range -100 to 100)
Sound effects: spacious echo, auditorium echo, telephone distortion, robotic
Audio output formats: MP3, PCM, FLAC, WAV
Sample rates: 8,000 to 44,100 Hz
Pronunciation dictionary: custom rules for brand names, acronyms, and specialized terms
Streaming output: for real-time applications
Text limit: up to 10,000 characters per request (sync), up to 1,000,000 characters (async)

Model Variants: HD vs. Turbo, Sync vs. Async

Novita AI offers four endpoints in the Speech 2.8 series:


Variant	Endpoint	Best For
Speech 2.8 HD Sync	POST``/v3/minimax-speech-2.8-hd	Premium quality, real-time — audiobooks, professional voiceovers
Speech 2.8 HD Async	`POST /v3/async/minimax-speech-2.8-hd`	Premium quality, long-form — bulk audiobook production, batch processing
Speech 2.8 Turbo Sync	`POST /v3/minimax-speech-2.8-turbo`	Low latency, real-time — voice agents, chatbots, live customer support
Speech 2.8 Turbo Async	`POST /v3/async/minimax-speech-2.8-turbo`	Fast processing, long-form — mass content generation, large-scale dubbing

HD vs. Turbo: HD delivers studio-grade audio fidelity — richer tonal detail, more nuanced emotion rendering. Turbo optimizes for speed at slightly lower fidelity, making it ideal for real-time interactive scenarios.

Sync vs. Async: Sync returns audio in the API response (up to 10,000 characters). Async accepts up to 1,000,000 characters and returns a task_id for polling — perfect for audiobooks and batch workflows.

How It Compares to Speech 2.6


Feature	Speech 2.6	Speech 2.8
Audio Quality	Excellent	Excellent
Emotional Tone Tags	❌	✅ (laughs, sighs, gasps, etc.)
Continuous Sound Mode	❌	✅
40+ Languages	✅	✅
Voice Cloning	✅	✅
Voice Mixing (up to 4)	✅	✅
Emotion Presets (9 types)	✅	✅

The upgrade path is clear: the Speech 2.8 series gives you everything Speech 2.6 does, plus emotional tone tags and continuous sound mode, at the same price point. There’s no reason not to migrate.

Pricing on Novita AI

MiniMax Speech 2.8 series on Novita AI follows the same pricing structure as the 2.6 series:


Model	Price
Speech 2.8 Turbo (Sync & Async)	$60 / 1M characters
Speech 2.8 HD (Sync & Async)	$100 / 1M characters

For the latest pricing details, visit the Novita AI Pricing Console.

Ready to try the MiniMax Speech 2.8 series? Sign up for Novita AI and get free credits to start generating expressive, human-like speech in minutes. No infrastructure setup required.

Create your Account

Who Should Use Which Variant

Imagine you’re deciding which variant fits your project. Here’s a quick guide based on real use cases:

🎙️ “I’m building a podcast or audiobook platform”

→ Speech 2.8 HD Async

You need the highest audio fidelity, and your content is long-form. The async endpoint handles up to 1M characters per request — submit an entire chapter and retrieve the audio when it’s ready. Pair tone tags with emotion presets to bring characters to life: a narrator who (sighs) at a plot twist or (laughs) at a joke makes the listening experience dramatically more engaging.

🤖 “I’m building a real-time voice agent or chatbot”

→ Speech 2.8 Turbo Sync

Latency is everything. Turbo Sync is designed for real-time response, keeping conversations feeling natural. Add a (chuckle) when your agent cracks a joke, or a (clears throat) before delivering important information — small touches that make AI interactions feel less robotic.

🎮 “I’m adding voice to game NPCs or interactive apps”

→ Speech 2.8 HD Sync

Game characters need expressive, high-quality voices. HD Sync gives you studio-grade audio in real-time. Use voice mixing to create unique character timbres, and sprinkle in tone tags for dramatic moments — a villain who (laughs) menacingly, a companion who (gasps) at discoveries.

📹 “I’m producing video voiceovers at scale”

→ Speech 2.8 Turbo Async

You need fast batch processing without breaking the bank. Turbo Async balances speed and quality for high-volume video content — explainers, social media clips, training materials. Submit scripts in bulk and retrieve polished audio files.

How to Get Started on Novita AI

Step 1: Try It in the Playground

Before writing a single line of code, explore the MiniMax Speech 2.8 series directly in the Novita AI Playground:

Novita Playground

Step 2: Get Your API Key

Sign up for a Novita AI account (free tier available)
Navigate to the API Keys section in your dashboard
Generate a new key and save it

Step 3: Make Your First API Call

MiniMax Speech 2.8 supports two calling modes:


Mode	Best For	Response Type
Sync	Real-time dialogue, instant responses	Audio returned immediately
Async	Audiobooks, long content, batch processing	Task ID → poll for result

Option A: Sync Call (Instant Audio)

Use this for short text when you need immediate results.

cURL Example:

curl --request POST \
  --url https://api.novita.ai/v3/minimax-speech-2.8-hd \
  --header 'Authorization: <authorization>' \
  --header 'Content-Type: <content-type>' \
  --data '
{
  "text": "<string>",
  "stream": true,
  "voice_modify": {
    "pitch": 123,
    "timbre": 123,
    "intensity": 123,
    "sound_effects": "<string>"
  },
  "audio_setting": {
    "format": "<string>",
    "bitrate": 123,
    "channel": 123,
    "force_cbr": true,
    "sample_rate": 123
  },
  "output_format": "<string>",
  "voice_setting": {
    "vol": 123,
    "pitch": 123,
    "speed": 123,
    "emotion": "<string>",
    "voice_id": "<string>",
    "latex_read": true,
    "text_normalization": true
  },
  "aigc_watermark": true,
  "language_boost": "<string>",
  "stream_options": {
    "exclude_aggregated_audio": true
  },
  "timber_weights": [
    {
      "weight": 123,
      "voice_id": "<string>"
    }
  ],
  "subtitle_enable": true,
  "continuous_sound": true,
  "pronunciation_dict": {
    "tone": [
      {}
    ]
  }
}
'

Python Example:

import requests

url = "https://api.novita.ai/v3/minimax-speech-2.8-hd"

payload = {
    "text": "<string>",
    "stream": True,
    "voice_modify": {
        "pitch": 123,
        "timbre": 123,
        "intensity": 123,
        "sound_effects": "<string>"
    },
    "audio_setting": {
        "format": "<string>",
        "bitrate": 123,
        "channel": 123,
        "force_cbr": True,
        "sample_rate": 123
    },
    "output_format": "<string>",
    "voice_setting": {
        "vol": 123,
        "pitch": 123,
        "speed": 123,
        "emotion": "<string>",
        "voice_id": "<string>",
        "latex_read": True,
        "text_normalization": True
    },
    "aigc_watermark": True,
    "language_boost": "<string>",
    "stream_options": { "exclude_aggregated_audio": True },
    "timber_weights": [
        {
            "weight": 123,
            "voice_id": "<string>"
        }
    ],
    "subtitle_enable": True,
    "continuous_sound": True,
    "pronunciation_dict": { "tone": [{}] }
}
headers = {
    "Content-Type": "<content-type>",
    "Authorization": "<authorization>"
}

response = requests.post(url, json=payload, headers=headers)

print(response.text)

Option B: Async Call (For Long Text)

Use this for long text, or when you want to batch multiple requests.

1. Submit the task

cURL

curl --request POST \
  --url https://api.novita.ai/v3/async/minimax-speech-2.8-hd \
  --header 'Authorization: <authorization>' \
  --header 'Content-Type: <content-type>' \
  --data '
{
  "text": "<string>",
  "text_file_id": 123,
  "voice_modify": {
    "pitch": 123,
    "timbre": 123,
    "intensity": 123,
    "sound_effects": "<string>"
  },
  "audio_setting": {
    "format": "<string>",
    "bitrate": 123,
    "channel": 123,
    "audio_sample_rate": 123
  },
  "voice_setting": {
    "vol": 123,
    "pitch": 123,
    "speed": 123,
    "emotion": "<string>",
    "voice_id": "<string>",
    "english_normalization": true
  },
  "aigc_watermark": true,
  "language_boost": "<string>",
  "continuous_sound": true,
  "pronunciation_dict": {
    "tone": [
      {}
    ]
  }
}
'

Python

import requests

url = "https://api.novita.ai/v3/async/minimax-speech-2.8-hd"

payload = {
    "text": "<string>",
    "text_file_id": 123,
    "voice_modify": {
        "pitch": 123,
        "timbre": 123,
        "intensity": 123,
        "sound_effects": "<string>"
    },
    "audio_setting": {
        "format": "<string>",
        "bitrate": 123,
        "channel": 123,
        "audio_sample_rate": 123
    },
    "voice_setting": {
        "vol": 123,
        "pitch": 123,
        "speed": 123,
        "emotion": "<string>",
        "voice_id": "<string>",
        "english_normalization": True
    },
    "aigc_watermark": True,
    "language_boost": "<string>",
    "continuous_sound": True,
    "pronunciation_dict": { "tone": [{}] }
}
headers = {
    "Content-Type": "<content-type>",
    "Authorization": "<authorization>"
}

response = requests.post(url, json=payload, headers=headers)

print(response.text)

2. Poll for completion

cURL

 curl --request GET \
  --url https://api.novita.ai/v3/async/task-result \
  --header 'Authorization: <authorization>' \
  --header 'Content-Type: <content-type>'

Python

import requests

url = "https://api.novita.ai/v3/async/task-result"

headers = {
    "Content-Type": "<content-type>",
    "Authorization": "<authorization>"
}

response = requests.get(url, headers=headers)

print(response.text)

Step 4: Explore Advanced Features

Once you have the basics working, try these:

Voice mixing: Blend up to 4 voices for a unique timbre using timber_weights
Sound effects: Add spacious_echo or robotic filter via voice_modify.sound_effects
Pronunciation dictionary: Define custom pronunciation rules for brand names and acronyms
Streaming mode: Set "stream": true for real-time audio delivery in interactive apps
Voice modification: Fine-tune pitch, timbre, and intensity in voice_modify (range -100 to 100 each)

Conclusion

The MiniMax Speech 2.8 series brings a meaningful upgrade to an already top-tier TTS model family. The addition of emotional tone tags and continuous sound mode addresses two of the most common pain points in AI voice synthesis: making speech sound spontaneous, and eliminating unnatural transitions between clauses.

With four variants available on Novita AI — HD and Turbo, each in Sync and Async modes — the series covers every use case, from real-time voice agents to large-scale audiobook production. The pricing remains consistent with the 2.6 series, so you get strictly more capability for the same cost.

If you’re currently using Speech 2.6 or evaluating TTS options, the Speech 2.8 series is a straightforward upgrade. Try it in the Novita AI Playground or get started with the API today.

Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing affordable and reliable GPU cloud for building and scaling.

Frequently Asked Questions

Which variant should I choose: HD or Turbo?

Choose HD when audio quality is the priority — audiobooks, professional voiceovers, premium content.
Choose Turbo when latency matters — voice agents, chatbots, real-time interactive applications. Both support the full feature set including tone tags.

When should I use Sync vs. Async?

Use Sync for real-time, short-to-medium text (up to 10,000 characters).
Use Async for long-form content (up to 1,000,000 characters) or batch processing workflows.

Does Novita AI offer a free tier for testing?

Yes. Sign up for a Novita AI account to receive free credits, which you can use to test the Speech 2.8 series and other models in the Playground or via API.

MiniMax Speech 2.8 Series on Novita AI: Expressive TTS with Emotional Tone Tags for Every Voice Application

What Is the MiniMax Speech 2.8 Series?

Key Features and What’s New

Emotional Tone Tags

Continuous Sound Mode

Inherited from the MiniMax Speech Series

Model Variants: HD vs. Turbo, Sync vs. Async

How It Compares to Speech 2.6

Pricing on Novita AI

Who Should Use Which Variant

🎙️ “I’m building a podcast or audiobook platform”

🤖 “I’m building a real-time voice agent or chatbot”

🎮 “I’m adding voice to game NPCs or interactive apps”

📹 “I’m producing video voiceovers at scale”

How to Get Started on Novita AI

Step 1: Try It in the Playground

Step 2: Get Your API Key

Step 3: Make Your First API Call

Option A: Sync Call (Instant Audio)

Option B: Async Call (For Long Text)

1. Submit the task

2. Poll for completion

Step 4: Explore Advanced Features

Conclusion

Frequently Asked Questions

Product

RESOURCES

Partners

Company

What Is the MiniMax Speech 2.8 Series?

Key Features and What’s New

Emotional Tone Tags

Continuous Sound Mode

Inherited from the MiniMax Speech Series

Model Variants: HD vs. Turbo, Sync vs. Async

How It Compares to Speech 2.6

Pricing on Novita AI

Who Should Use Which Variant

🎙️ “I’m building a podcast or audiobook platform”

🤖 “I’m building a real-time voice agent or chatbot”

🎮 “I’m adding voice to game NPCs or interactive apps”

📹 “I’m producing video voiceovers at scale”

How to Get Started on Novita AI

Step 1: Try It in the Playground

Step 2: Get Your API Key

Step 3: Make Your First API Call

Option A: Sync Call (Instant Audio)

Option B: Async Call (For Long Text)

1. Submit the task

2. Poll for completion

Step 4: Explore Advanced Features

Conclusion

Frequently Asked Questions

Related Posts

Product

RESOURCES

Partners

Company