A voice studio for presets, cloning, and design.
Qwen3 TTS supports preset voices, multilingual generation, reference voice cloning, and prompt-based voice design.
Three ways to create speech
Multilingual control
Use auto language detection or choose a supported language explicitly.
Preset voices
Start quickly with built-in Qwen3 voices for different use cases.
Voice design
Create a voice direction from descriptive text when a preset is not enough.
Generate speech with a selected Qwen3 voice.
Pick a language and voice, then add optional style direction for tone and delivery.
Built for production voice workflows.
Use these pages as a focused audio workspace for scripts, product media, education content, and repeatable brand narration.
Multilingual content
Generate speech for videos, apps, courses, and support content across supported languages.
Preset voice production
Use built-in voices for repeatable narration when teams need a stable voice choice and quick turnaround.
Custom voice direction
Design a new voice with natural language when preset voices do not match the brand or content format.
Reference voice cloning
Upload a voice sample to create speech that follows the reference speaker for campaigns or character work.
A simple path from script to usable audio.
The generation panel handles task creation, status polling, preview, and download. The supporting content helps you prepare better inputs before submitting.
Pick a creation path
Start with a preset voice, clone from reference audio, or describe a new voice in Voice Design.
Set language behavior
Use auto detection for mixed scripts, or choose a language explicitly when pronunciation needs more control.
Tune the delivery
Add style instructions or voice descriptions to guide tone, emotion, role, and production context.
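The task creation and status polling that the generation panel handles can be sketched roughly as follows. This is a minimal illustration, not the page's actual implementation: the status names (`pending`, `running`, `succeeded`, `failed`), the `audio_url` field, and the `fetch_status` callable are all assumptions made for the example, and a fake fetcher stands in for the real backend.

```python
import time
from typing import Callable, Dict

# Hypothetical terminal status names; the real backend may use different ones.
TERMINAL_STATES = {"succeeded", "failed"}


def poll_task(
    fetch_status: Callable[[str], Dict[str, str]],
    task_id: str,
    interval_s: float = 1.0,
    timeout_s: float = 60.0,
) -> Dict[str, str]:
    """Call fetch_status until the task reaches a terminal state or times out."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_status(task_id)
        if task["status"] in TERMINAL_STATES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish in {timeout_s}s")
        time.sleep(interval_s)


def make_fake_fetcher() -> Callable[[str], Dict[str, str]]:
    """Stand-in for a backend call: returns three canned responses in order."""
    responses = iter([
        {"status": "pending"},
        {"status": "running"},
        {"status": "succeeded", "audio_url": "https://example.invalid/audio.wav"},
    ])
    return lambda task_id: next(responses)


result = poll_task(make_fake_fetcher(), "task-123", interval_s=0.0)
print(result["status"])
```

Passing the fetcher in as a callable keeps the polling loop independent of any particular HTTP client, so the same loop can be exercised without a network connection.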
Mode comparison
| Mode | Best for | Inputs | Output |
|---|---|---|---|
| Text to Speech | Fast production with preset voices | Text, language, voice, optional style instruction | Preset voice speech audio |
| Voice Clone | Reference speaker consistency | Text, reference audio, optional transcript, language | Cloned speech audio |
| Voice Design | Creating a voice direction from a prompt | Text, voice description, language | Designed voice speech audio |
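The inputs listed in the table can be pictured as three request shapes, one per mode. The sketch below is only an illustration under stated assumptions: the class names, field names, the `"auto"` default, and the `validate` helper are hypothetical and are not taken from the actual backend request model.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class TextToSpeechRequest:
    text: str
    voice: str                               # name of a preset voice
    language: str = "auto"                   # assumed default: auto detection
    style_instruction: Optional[str] = None  # optional tone and delivery hint


@dataclass
class VoiceCloneRequest:
    text: str
    reference_audio_path: str                # uploaded voice sample
    reference_transcript: Optional[str] = None
    language: str = "auto"


@dataclass
class VoiceDesignRequest:
    text: str
    voice_description: str                   # natural-language voice prompt
    language: str = "auto"


SpeechRequest = Union[TextToSpeechRequest, VoiceCloneRequest, VoiceDesignRequest]


def validate(request: SpeechRequest) -> List[str]:
    """Return a list of problems; an empty list means the inputs look complete."""
    problems: List[str] = []
    if not request.text.strip():
        problems.append("text is required")
    if isinstance(request, TextToSpeechRequest) and not request.voice.strip():
        problems.append("voice is required for Text to Speech")
    if isinstance(request, VoiceCloneRequest) and not request.reference_audio_path.strip():
        problems.append("reference audio is required for Voice Clone")
    if isinstance(request, VoiceDesignRequest) and not request.voice_description.strip():
        problems.append("voice_description is required for Voice Design")
    return problems


print(validate(VoiceDesignRequest(text="Hello", voice_description="")))
```

Checking inputs per mode before submitting catches the most common omission in each path: a missing voice, a missing reference sample, or an empty voice description.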
Input guide
Language selection
Choose auto for mixed-language scripts. Pick a specific language for predictable pronunciation.
Preset voices
Try several built-in voices before designing a custom voice; preset voices are the fastest path to stable output.
Voice design prompt
Describe speaker age, tone, accent, pacing, scene, and intended audience in one compact prompt.
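One way to keep a voice design prompt compact is to assemble it from the attributes named above in a fixed order. The helper below is a hypothetical sketch for drafting prompts offline; `build_voice_prompt` and its field names are assumptions for illustration, not part of Qwen3 TTS or this page.

```python
from typing import Dict, Tuple

# Attribute order follows the input guide: age, tone, accent, pacing, scene, audience.
FIELD_ORDER: Tuple[str, ...] = ("age", "tone", "accent", "pacing", "scene", "audience")


def build_voice_prompt(attributes: Dict[str, str]) -> str:
    """Join the non-empty attributes into one comma-separated sentence."""
    parts = [
        attributes[field].strip()
        for field in FIELD_ORDER
        if attributes.get(field, "").strip()
    ]
    if not parts:
        return ""
    return ", ".join(parts) + "."


prompt = build_voice_prompt({
    "age": "A calm middle-aged female host",
    "tone": "bright tone",
    "accent": "neutral accent",
    "pacing": "professional podcast pacing",
})
print(prompt)
```

Attributes that are left out are simply skipped, so the same helper works for a short two-attribute prompt or a full six-attribute description.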
Copy-ready prompts for stronger first results.
Use these examples as starting points, then adapt them to your project, audience, and delivery style.
Warm, confident, and conversational, suitable for a SaaS onboarding video.
A calm middle-aged female host with a neutral accent, bright tone, and professional podcast pacing.
Keep each sentence clean and avoid unnecessary symbols when mixing English, Chinese, and product names.
Common questions before generating.
A short reference for choosing the right mode and preparing inputs before sending a task to the backend.
Which Qwen3 mode should I start with?
Start with Text to Speech if a preset voice fits. Use Voice Clone for a reference speaker, and Voice Design when you need a custom voice direction.
What should I write in a voice design prompt?
Include speaker age, tone, accent, pace, emotion, and usage context. A precise prompt usually gives more controllable results.
Can Qwen3 handle multiple languages?
Yes. Use auto language detection, or pick one of the explicit languages supported by the backend request model.