GLM TTS and ASR API Quick Start

Table Of Contents

When to Use This Quick Start
Prerequisites
GLM TTS Quick Start
GLM ASR Quick Start
GLM Voice Clone Quick Start
Pricing and Usage Notes
FAQ

This guide gets you from API key to working audio with the GLM audio APIs — GLM TTS for text-to-speech, GLM ASR for transcription, and GLM Voice Clone for custom voice synthesis. All three are synchronous REST endpoints with no polling or webhook step. If you build voice features, transcription pipelines, or Chinese-language audio applications, this is the fastest path to a working integration.

When to Use This Quick Start

Use this guide if you need to:

Convert text to speech with Chinese-optimized voices via POST /v3/glm-tts
Transcribe .wav or .mp3 audio files via POST /v3/glm-asr
Clone a voice from a short audio sample and synthesize new speech via POST /v3/glm-tts-voice-clone

All endpoints are available through the Novita AI API at https://api.novita.ai.

Prerequisites

A Novita AI account. Get your API key from the Novita AI console.
curl for the shell examples.
Python 3.8+ with requests installed for the Python examples.

Set your key as an environment variable:

export NOVITA_API_KEY="your_api_key_here"

GLM TTS Quick Start

Endpoint: POST https://api.novita.ai/v3/glm-tts

Converts text up to 1024 characters into speech. The response is binary audio — write it directly to a file.

Parameters

Parameter	Type	Default	Notes
`input`	string	—	Required. Up to 1024 characters.
`voice`	string	`tongtong`	System voice ID or cloned voice name.
`speed`	number	1.0	Range: 0.5–2.0
`volume`	number	1.0	Range: 0–10
`response_format`	string	`pcm`	`wav` or `pcm`. WAV includes a standard audio header; PCM is raw bytes at 24000 Hz.
`watermark_enabled`	boolean	true	Set `false` only if your account has watermark removal enabled.
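
The ranges in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Novita API: `build_tts_payload` is a hypothetical helper name, and it assumes the 1024-character limit counts Python string length (Unicode code points), which the service may measure differently.

```python
def build_tts_payload(text, voice="tongtong", speed=1.0, volume=1.0,
                      response_format="pcm"):
    """Return a JSON-ready dict for POST /v3/glm-tts, or raise ValueError.

    Defaults mirror the parameter table. The character count uses len(),
    which is an assumption about how the service measures length.
    """
    if not text:
        raise ValueError("input is required")
    if len(text) > 1024:
        raise ValueError(f"input is {len(text)} characters; the limit is 1024")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    if not 0 <= volume <= 10:
        raise ValueError("volume must be between 0 and 10")
    if response_format not in ("wav", "pcm"):
        raise ValueError("response_format must be 'wav' or 'pcm'")
    return {
        "input": text,
        "voice": voice,  # not validated: may be a system voice or a cloned voice
        "speed": speed,
        "volume": volume,
        "response_format": response_format,
    }
```

The returned dict can be passed as the `json=` argument of `requests.post`. The `voice` value is deliberately left unchecked because cloned voice names are not known in advance.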

System voices

Voice ID	Display name
`tongtong`	Tongtong (default)
`chuichui`	Chuichui
`xiaochen`	Xiaochen
`jam`	Dongdong Zoo – Jam
`kazi`	Dongdong Zoo – Kazi
`douji`	Dongdong Zoo – Douji
`luodo`	Dongdong Zoo – Luodo

curl

curl -s -X POST https://api.novita.ai/v3/glm-tts \
  -H "Authorization: Bearer $NOVITA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "你好，欢迎使用 Novita AI 语音合成接口。",
    "voice": "tongtong",
    "speed": 1.0,
    "volume": 5,
    "response_format": "wav"
  }' \
  --output output.wav

Python

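import os

import requests

# Request WAV so the saved file carries a standard audio header.
response = requests.post(
    "https://api.novita.ai/v3/glm-tts",
    headers={
        "Authorization": f"Bearer {os.environ['NOVITA_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "input": "你好，欢迎使用 Novita AI 语音合成接口。",
        "voice": "tongtong",
        "speed": 1.0,
        "volume": 5,
        "response_format": "wav",
    },
    timeout=60,  # client-side timeout; the value is arbitrary
)
response.raise_for_status()  # raise on any 4xx/5xx status

# The response body is binary audio, so write the raw bytes to disk.
with open("output.wav", "wb") as f:
    f.write(response.content)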

Limits: 1024 characters per request. For longer texts, split at sentence boundaries and concatenate the audio. Recommended playback sample rate: 24000 Hz. Voice names are case-sensitive.
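
The split-and-concatenate approach for longer texts can be sketched as follows. This is one possible implementation under stated assumptions, not an official client: `split_text` and `concat_wav` are hypothetical helper names, the sentence-boundary pattern is a simple heuristic, and `concat_wav` assumes each response is a plain PCM WAV file that Python's `wave` module can parse, with identical audio parameters across chunks.

```python
import io
import re
import wave

MAX_CHARS = 1024  # per-request limit stated above


def split_text(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    # Break after Chinese/Latin sentence punctuation or a newline, keeping
    # the punctuation attached to the sentence it ends.
    pattern = r"(?<=[。！？!?\n])|(?<=\. )"
    sentences = [s for s in re.split(pattern, text) if s]

    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        # Start a new chunk when the next sentence would overflow this one.
        if len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = ""
        current += sentence
    if current:
        chunks.append(current)
    return chunks


def concat_wav(wav_blobs):
    """Join WAV byte strings that share the same audio parameters."""
    if not wav_blobs:
        raise ValueError("no audio to concatenate")
    out = io.BytesIO()
    with wave.open(out, "wb") as writer:
        for index, blob in enumerate(wav_blobs):
            with wave.open(io.BytesIO(blob), "rb") as reader:
                if index == 0:
                    # Copy channels, sample width, and rate from the first file.
                    writer.setparams(reader.getparams())
                writer.writeframes(reader.readframes(reader.getnframes()))
    return out.getvalue()
```

To use it, send one request per chunk from `split_text` with `response_format` set to `wav`, collect each `response.content`, and write the result of `concat_wav` to a file.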

GLM ASR Quick Start

Endpoint: POST https://api.novita.ai/v3/glm-asr

Transcribes .wav or .mp3 audio using the GLM-ASR-2512 model. Audio can be passed as a URL or base64 string. Constraints: file ≤ 25 MB, duration ≤ 30 seconds.

Parameters

Parameter	Type	Notes
`file`	string	Required. URL or base64-encoded audio. `.wav` or `.mp3` only.
`prompt`	string	Optional. Prior transcript context, up to 8000 characters. Use for chunked transcription continuity.
`hotwords`	array	Optional. Up to 100 domain-specific terms for improved recognition accuracy.
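
Checking a local file against the stated constraints before encoding it can save a failed request. The sketch below is illustrative only: `encode_audio_for_asr` is a hypothetical helper, it assumes "25 MB" means 25 × 1024 × 1024 bytes of the raw file (not its base64 form), and it checks duration only for WAV files because the Python standard library has no MP3 decoder.

```python
import base64
import os
import wave

MAX_BYTES = 25 * 1024 * 1024  # assumption: the 25 MB limit applies to the raw file
MAX_SECONDS = 30


def encode_audio_for_asr(path):
    """Validate a local audio file and return it as a base64 string."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in (".wav", ".mp3"):
        raise ValueError(f"unsupported format {ext!r}; use .wav or .mp3")

    size = os.path.getsize(path)
    if size > MAX_BYTES:
        raise ValueError(f"file is {size} bytes; the limit is {MAX_BYTES}")

    # Duration can be read from the WAV header; MP3 duration is not checked.
    if ext == ".wav":
        with wave.open(path, "rb") as reader:
            seconds = reader.getnframes() / reader.getframerate()
        if seconds > MAX_SECONDS:
            raise ValueError(f"audio is {seconds:.1f} s; the limit is {MAX_SECONDS} s")

    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")
```

The returned string can be used as the `file` value in the request body.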

curl (URL input)

curl -s -X POST https://api.novita.ai/v3/glm-asr \
  -H "Authorization: Bearer $NOVITA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "file": "https://example.com/sample.wav",
    "hotwords": ["Novita", "GLM"]
  }'

Python (base64 input)

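import base64
import os

import requests

# Read the local file and base64-encode it for the JSON body.
with open("sample.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()

response = requests.post(
    "https://api.novita.ai/v3/glm-asr",
    headers={
        "Authorization": f"Bearer {os.environ['NOVITA_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={"file": audio_b64, "hotwords": ["Novita", "GLM"]},
    timeout=60,  # client-side timeout; the value is arbitrary
)
response.raise_for_status()  # raise on any 4xx/5xx status
print(response.json()["text"])  # the transcript is in the "text" field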

Response

{ "text": "你好，欢迎使用 Novita AI 语音合成接口。" }

Handling audio longer than 30 seconds: Split into ≤30-second chunks and chain requests using the prompt field to carry transcript context between chunks:

payload = {
    "file": next_chunk_b64,
    "prompt": previous_transcript,
}
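
A fuller version of that chaining loop might look like the sketch below. It is an illustration under assumptions rather than a reference implementation: `transcribe_chunks` is a hypothetical helper, the `transcribe` argument is any callable you supply that posts a payload to the endpoint and returns the `text` field, and transcripts are joined without a separator, which suits Chinese text but may need a space for English.

```python
def transcribe_chunks(chunks_b64, transcribe, max_prompt_chars=8000):
    """Transcribe audio chunks in order, passing prior text as context.

    chunks_b64: base64 strings (or URLs), each 30 seconds of audio or less.
    transcribe: callable taking a payload dict and returning transcript text.
    """
    parts = []
    for chunk in chunks_b64:
        payload = {"file": chunk}
        # Keep only the most recent text so the prompt stays within the
        # 8000-character limit on the prompt field.
        context = "".join(parts)[-max_prompt_chars:]
        if context:
            payload["prompt"] = context
        parts.append(transcribe(payload))
    return "".join(parts)
```

Passing the network call in as a function keeps the loop easy to test offline; in practice `transcribe` would wrap the `requests.post` call shown earlier and return `response.json()["text"]`.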

GLM Voice Clone Quick Start

Endpoint: POST https://api.novita.ai/v3/glm-tts-voice-clone

Takes a sample audio clip and synthesizes new speech in that voice. Assign a name to the cloned voice; reuse it as the voice parameter in GLM TTS without re-uploading the sample.

Parameters

Parameter	Type	Notes
`audio_url`	string	Required. URL to sample audio. ≤ 10 MB, 3–30 s recommended.
`input`	string	Required. Text to synthesize in the cloned voice.
`voice_name`	string	Required. Unique name you assign to this voice.
`text`	string	Optional. Transcript of the sample audio — improves clone quality.

curl

curl -s -X POST https://api.novita.ai/v3/glm-tts-voice-clone \
  -H "Authorization: Bearer $NOVITA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_url": "https://example.com/voice-sample.wav",
    "input": "这是用克隆声音合成的语音示例。",
    "voice_name": "my-custom-voice",
    "text": "示例音频的文字内容"
  }'

Python

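import os

import requests

response = requests.post(
    "https://api.novita.ai/v3/glm-tts-voice-clone",
    headers={
        "Authorization": f"Bearer {os.environ['NOVITA_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "audio_url": "https://example.com/voice-sample.wav",  # sample to clone
        "input": "这是用克隆声音合成的语音示例。",  # text to synthesize
        "voice_name": "my-custom-voice",  # name you assign to the clone
        "text": "示例音频的文字内容",  # transcript of the sample (optional)
    },
    timeout=60,  # client-side timeout; the value is arbitrary
)
response.raise_for_status()  # raise on any 4xx/5xx status

# Unlike GLM TTS, this endpoint returns JSON rather than binary audio.
data = response.json()
print(f"Voice timbre: {data['voice']}")
print(f"Audio URL: {data['audio_url']}")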

Response

{
  "voice": "my-custom-voice-timbre-id",
  "audio_url": "https://..."
}

The voice value returned here can be passed directly to the GLM TTS voice parameter for future synthesis calls.

Tips: Use a clean 5–15 second sample without background noise. Provide the text transcript of the sample to improve phoneme alignment.

Pricing and Usage Notes

Pricing as of June 2026, from novita.ai/pricing:

API	Price
GLM TTS	$0.28 / 1M characters
GLM ASR	$0.021 / 1M characters
GLM Voice Clone	$0.83 / 1M characters
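
To turn the per-million rates into a budget figure, multiply the rate by the character count and divide by one million. The sketch below shows that arithmetic; the dictionary keys are hypothetical labels chosen for this example, and the rates are copied from the table above, so they go stale if pricing changes.

```python
# USD per 1M characters, copied from the pricing table above.
PRICE_PER_MILLION_CHARS = {
    "glm-tts": 0.28,
    "glm-asr": 0.021,
    "glm-voice-clone": 0.83,
}


def estimate_cost(api, characters):
    """Return the estimated cost in USD for a number of characters."""
    return PRICE_PER_MILLION_CHARS[api] * characters / 1_000_000


# Example: 50,000 characters of TTS per day over a 30-day month.
monthly_chars = 50_000 * 30
print(f"${estimate_cost('glm-tts', monthly_chars):.2f}")
```

At those rates, 1.5 million characters of GLM TTS comes to roughly $0.42.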

GLM TTS is well-suited for high-volume Chinese-language synthesis where cost matters. If you need broader multilingual TTS across 30+ languages or async processing of long-form content, MiniMax Speech is the alternative to evaluate.

FAQ

What languages does GLM TTS support? GLM TTS is optimized for Mandarin Chinese and also handles mixed Chinese-English input. For broad multilingual coverage, use MiniMax Speech instead.

Can I reuse a cloned voice with GLM TTS? Yes. Pass the voice_name you assigned in the Voice Clone call as the voice parameter in GLM TTS. No need to re-upload the sample.

Why is there a 30-second limit on GLM ASR? The model processes audio synchronously. Split longer recordings at sentence boundaries and chain requests using the prompt field to carry context.

What is the difference between pcm and wav output? PCM is raw audio bytes at 24000 Hz with no header. WAV wraps the same audio in a standard container most libraries can read directly. Use WAV unless your pipeline requires raw PCM.
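
If a pipeline already holds raw PCM and needs a playable file, a WAV header can be added locally with Python's `wave` module. This is a sketch under assumptions: only the 24000 Hz rate comes from this guide, while 16-bit mono samples are assumed defaults that should be confirmed against the actual output; `pcm_to_wav` is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav(pcm_bytes, sample_rate=24000, channels=1, sample_width=2):
    """Wrap raw PCM bytes in a WAV container and return the WAV bytes.

    sample_rate follows the 24000 Hz figure above; channels=1 and
    sample_width=2 (16-bit) are assumptions, not documented values.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(sample_width)
        writer.setframerate(sample_rate)
        writer.writeframes(pcm_bytes)
    return buffer.getvalue()
```

If the resulting file plays at the wrong speed or as noise, the assumed channel count or sample width is the first thing to revisit.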

Does setting watermark_enabled: false always work? Only if you have completed watermark removal in your account settings. The flag is otherwise ignored.
