The Voice feature provides speech synthesis and speech recognition, powered by OpenAI. Use it to convert text to natural-sounding speech or to transcribe audio to text, with high accuracy on clear audio.
Overview
Voice offers two main capabilities:
- Text-to-Speech (TTS) - Convert text to audio
- Speech-to-Text (STT) - Transcribe audio to text
Text-to-Speech (TTS)
What is TTS?
Text-to-Speech converts written text into natural-sounding audio. Use it for:
- Creating voiceovers
- Generating audio content
- Accessibility applications
- Audio notifications
- Content narration
Using TTS
- Navigate to Voice in the sidebar
- Select the Text to Speech tab
- Enter your text in the input field
- Choose a voice
- Click Generate Speech
- Download or play the audio
Available Voices
| Voice | Description | Best For |
|---|---|---|
| Alloy | Neutral, balanced | General purpose |
| Echo | Warm, conversational | Podcasts, casual content |
| Fable | Narrative, storytelling | Audiobooks, stories |
| Onyx | Deep, authoritative | Professional, formal |
| Nova | Friendly, energetic | Marketing, upbeat content |
| Shimmer | Clear, expressive | Instructions, tutorials |
Voice Examples
Alloy - A versatile voice suitable for any content type.
Echo - Best for conversational content that feels approachable.
Fable - Perfect for narrative content and storytelling.
Onyx - Ideal for professional presentations and formal content.
Nova - Great for energetic marketing and promotional material.
Shimmer - Excellent for clear instructional content.
TTS Settings
| Setting | Options | Description |
|---|---|---|
| Voice | 6 options | Choose the voice personality |
| Speed | 0.25x - 4.0x | Adjust playback speed |
| Format | MP3, WAV | Output audio format |
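The ranges in the settings table can be checked on the client before a request is made. The following is a minimal sketch, not part of the product: the `TTSSettings` class, its field names, its defaults, and the lowercase voice and format identifiers are all assumptions made for illustration (only `alloy` appears in lowercase in the API example).

```python
from dataclasses import dataclass

# Values taken from the settings table; lowercase spellings are an assumption.
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "wav"}
MIN_SPEED, MAX_SPEED = 0.25, 4.0


@dataclass(frozen=True)
class TTSSettings:
    """Hypothetical container that validates TTS settings on creation."""

    voice: str = "alloy"       # assumed default
    speed: float = 1.0         # assumed default
    audio_format: str = "mp3"  # assumed default

    def __post_init__(self):
        if self.voice not in VOICES:
            raise ValueError(f"unknown voice: {self.voice!r}")
        if not MIN_SPEED <= self.speed <= MAX_SPEED:
            raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
        if self.audio_format not in FORMATS:
            raise ValueError(f"unsupported format: {self.audio_format!r}")
```

Constructing `TTSSettings(voice="nova", speed=1.25)` succeeds, while `TTSSettings(speed=5.0)` raises a `ValueError` because the speed falls outside the documented range.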
Character Limits
- Single request: Up to 4,096 characters
- Longer content: Split into multiple requests
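One way to split longer content is to cut at sentence boundaries so that each request stays within the 4,096-character limit. The sketch below is illustrative only: `split_for_tts` is a hypothetical helper, and the sentence-boundary rule (a period, exclamation mark, or question mark followed by whitespace) is a simplifying assumption.

```python
import re

MAX_CHARS = 4096  # single-request limit stated above


def split_for_tts(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-wrapped.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as its own TTS request, and the resulting audio files joined in order.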
Tips for Better TTS
- Use Punctuation - Commas and periods create natural pauses
- Spell Out Numbers - Write "twenty-five" instead of "25" for clarity
- Add Emphasis - Use ALL CAPS sparingly for emphasis
- Preview First - Generate short samples before long content
Example Text
Welcome to Girard, your all-in-one AI platform.
Today, we're excited to show you how easy it is to create
amazing content using artificial intelligence.
Let's get started!
Speech-to-Text (STT)
What is STT?
Speech-to-Text transcribes audio into written text. Use it for:
- Meeting transcription
- Voice notes
- Accessibility
- Content creation
- Audio analysis
Using STT
- Navigate to Voice in the sidebar
- Select the Speech to Text tab
- Upload an audio file or record
- Click Transcribe
- Review and copy the text
Supported Audio Formats
| Format | Extension | Max Size |
|---|---|---|
| MP3 | .mp3 | 25 MB |
| MP4 | .mp4, .m4a | 25 MB |
| WAV | .wav | 25 MB |
| WebM | .webm | 25 MB |
| MPEG | .mpeg, .mpga | 25 MB |
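The table above can be turned into a quick pre-upload check. This is a sketch under stated assumptions: `check_audio_file` is a hypothetical helper, and 25 MB is interpreted as 25 × 1,048,576 bytes, which the documentation does not specify.

```python
from pathlib import Path

# Assumption: "25 MB" means 25 * 1024 * 1024 bytes.
MAX_BYTES = 25 * 1024 * 1024
EXTENSIONS = {".mp3", ".mp4", ".m4a", ".wav", ".webm", ".mpeg", ".mpga"}


def check_audio_file(name: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = Path(name).suffix.lower()
    if ext not in EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        size_mb = size_bytes / 1024 / 1024
        problems.append(f"file is {size_mb:.1f} MB; the limit is 25 MB")
    return problems
```

For a file on disk, the size can be obtained with `Path(name).stat().st_size` before calling the helper.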
Recording Audio
Record directly in the browser:
- Click the Record button
- Grant microphone permission if prompted
- Speak clearly into your microphone
- Click Stop when finished
- Review the recording
- Click Transcribe
Transcription Features
- Automatic Punctuation - Adds periods, commas, etc.
- Speaker Detection - Identifies different speakers (beta)
- Timestamps - Optional timestamp markers
- Language Detection - Automatically detects language
Supported Languages
STT supports multiple languages including:
- English (US, UK, AU)
- Spanish
- French
- German
- Italian
- Portuguese
- Japanese
- Chinese (Mandarin)
- And many more...
Tips for Better Transcription
- Clear Audio - Minimize background noise
- Speak Clearly - Enunciate words properly
- Quality Mic - Use a good microphone
- Optimal Distance - Keep the microphone neither too close nor too far away
- Steady Volume - Maintain consistent volume
Workflow Examples
Creating a Podcast Intro
- Write your intro script
- Choose an engaging voice (Nova or Echo)
- Generate the audio
- Download and add to your podcast
Welcome back to Tech Talk, the podcast where we dive deep
into the latest technology trends. I'm your host, and today
we have an exciting episode lined up for you.
Transcribing an Interview
- Upload the interview audio file
- Click Transcribe
- Review the text for accuracy
- Edit any errors
- Export for your article
Creating Audio Notifications
- Write short, clear messages
- Use an appropriate voice (Shimmer for instructions)
- Generate audio clips
- Integrate into your application
Your order has been confirmed. You'll receive a shipping
notification within 24 hours.
API Usage
TTS API Example
curl -X POST https://www.girardai.com/api/voice/tts \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, world!",
    "voice": "alloy"
  }'
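The same TTS request can be built from Python using only the standard library. This is a rough equivalent of the curl command above, not an official client: the function names are hypothetical, and treating the response body as raw audio bytes is an assumption, since the response format is not documented here.

```python
import json
import urllib.request

TTS_URL = "https://www.girardai.com/api/voice/tts"


def build_tts_request(text: str, voice: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        TTS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text: str, voice: str, api_key: str) -> bytes:
    """Send the request and return the response body.

    Assumption: the body is the generated audio as raw bytes.
    """
    with urllib.request.urlopen(build_tts_request(text, voice, api_key)) as resp:
        return resp.read()
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any credits are spent.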
STT API Example
curl -X POST https://www.girardai.com/api/voice/stt \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "audio=@recording.mp3"
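The `-F` flag in the curl command sends the file as a multipart form field named `audio`. A standard-library Python sketch of the same upload might look like the following; `build_stt_request` is a hypothetical helper, and the per-part content type is guessed from the filename, which may differ from what curl sends.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path

STT_URL = "https://www.girardai.com/api/voice/stt"


def build_stt_request(path: str, audio: bytes, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST with one `audio` field."""
    boundary = uuid.uuid4().hex
    filename = Path(path).name
    # Fall back to a generic type when the extension is not recognized.
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        STT_URL,
        data=head + audio + tail,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

To send it, read the file with `Path(path).read_bytes()`, pass the bytes as `audio`, and hand the returned request to `urllib.request.urlopen`. The shape of the transcription response is not documented here, so parsing it is left out.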
Credit Usage
| Action | Credits |
|---|---|
| TTS (per 1000 characters) | 1 |
| STT (per minute of audio) | 2 |
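The rates above allow a rough cost estimate before a job is run. The sketch below assumes that partial units are rounded up to the next whole unit; the table does not say how partial thousands of characters or partial minutes are billed, so treat the results as an upper-bound estimate. Both function names are hypothetical.

```python
import math

TTS_CREDITS_PER_1000_CHARS = 1
STT_CREDITS_PER_MINUTE = 2


def tts_credits(char_count: int) -> int:
    """Estimate TTS credits, rounding partial 1000-character units up (assumption)."""
    return math.ceil(char_count / 1000) * TTS_CREDITS_PER_1000_CHARS


def stt_credits(duration_seconds: float) -> int:
    """Estimate STT credits, rounding partial minutes up (assumption)."""
    return math.ceil(duration_seconds / 60) * STT_CREDITS_PER_MINUTE
```

Under this rounding assumption, a full 4,096-character TTS request is estimated at 5 credits and a 90-second recording at 4 credits.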
Quality & Limitations
TTS Quality
- Natural-sounding voices
- Consistent pronunciation
- Emotional expression
- Multiple languages supported
TTS Limitations
- Max 4,096 characters per request
- No voice cloning
- Limited custom pronunciation
- English voices work best with English text
STT Accuracy
- High accuracy for clear audio
- Handles accents well
- Automatic punctuation
- Good with technical terms
STT Limitations
- Background noise affects quality
- Very fast speech may be less accurate
- Some specialized terms may be missed
- Max file size 25 MB
Best Practices
For TTS
- Write for Speech
  - Use conversational language
  - Avoid complex sentences
  - Read aloud before generating
- Format Appropriately
  - Break long text into paragraphs
  - Use punctuation for pacing
  - Consider the listener
- Choose the Right Voice
  - Match voice to content type
  - Test multiple voices
  - Consider your audience
For STT
- Prepare Good Audio
  - Use quality recording equipment
  - Minimize background noise
  - Ensure clear speech
- Optimize Files
  - Keep under 25 MB
  - Use supported formats
  - Trim unnecessary portions
- Review Output
  - Check for errors
  - Verify technical terms
  - Add formatting as needed
Troubleshooting
TTS Issues
No audio generated:
- Check character count
- Verify text input
- Try a different voice
Pronunciation issues:
- Spell out problematic words
- Use phonetic spelling
- Break up complex terms
STT Issues
Low accuracy:
- Improve audio quality
- Reduce background noise
- Speak more clearly
Upload failed:
- Check file size (< 25 MB)
- Verify file format
- Try converting to MP3