
Voice

Text-to-speech and speech-to-text powered by OpenAI

The Voice feature provides speech synthesis and speech recognition, both powered by OpenAI. Use it to convert text to natural-sounding speech or to transcribe audio to text.

Overview

Voice offers two main capabilities:

  • Text-to-Speech (TTS) - Convert text to audio
  • Speech-to-Text (STT) - Transcribe audio to text

Text-to-Speech (TTS)

What is TTS?

Text-to-Speech converts written text into natural-sounding audio. Use it for:

  • Creating voiceovers
  • Generating audio content
  • Accessibility applications
  • Audio notifications
  • Content narration

Using TTS

  1. Navigate to Voice in the sidebar
  2. Select the Text to Speech tab
  3. Enter your text in the input field
  4. Choose a voice
  5. Click Generate Speech
  6. Download or play the audio

Available Voices

| Voice | Description | Best For |
| --- | --- | --- |
| Alloy | Neutral, balanced | General purpose |
| Echo | Warm, conversational | Podcasts, casual content |
| Fable | Narrative, storytelling | Audiobooks, stories |
| Onyx | Deep, authoritative | Professional, formal |
| Nova | Friendly, energetic | Marketing, upbeat content |
| Shimmer | Clear, expressive | Instructions, tutorials |

Voice Examples

Alloy - A versatile voice suitable for any content type.

Echo - Best for conversational content that feels approachable.

Fable - Perfect for narrative content and storytelling.

Onyx - Ideal for professional presentations and formal content.

Nova - Great for energetic marketing and promotional material.

Shimmer - Excellent for clear instructional content.

TTS Settings

| Setting | Options | Description |
| --- | --- | --- |
| Voice | 6 options | Choose the voice personality |
| Speed | 0.25x - 4.0x | Adjust playback speed |
| Format | MP3, WAV | Output audio format |
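The settings above can be checked before a request is made. The following is a minimal sketch, not part of the product: the function name `validate_tts_settings` is hypothetical, and the lowercase voice identifiers are an assumption based on the `"alloy"` value used in the TTS API example.

```python
# Values copied from the Available Voices and TTS Settings tables.
# Lowercase identifiers are an assumption based on the API example ("alloy").
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "wav"}
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def validate_tts_settings(voice: str, speed: float = 1.0, fmt: str = "mp3") -> list[str]:
    """Return a list of problems; an empty list means the settings look valid."""
    problems = []
    if voice.lower() not in VOICES:
        problems.append(f"unknown voice: {voice!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        problems.append(f"speed {speed} is outside {MIN_SPEED}-{MAX_SPEED}")
    if fmt.lower() not in FORMATS:
        problems.append(f"unsupported format: {fmt!r}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every invalid setting at once.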

Character Limits

  • Single request: Up to 4,096 characters
  • Longer content: Split into multiple requests
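Splitting longer content can be automated. The sketch below is one possible approach, not a documented feature: the helper name `split_for_tts` is hypothetical, and it assumes that breaking at sentence ends (`.`, `!`, `?`) gives the most natural audio boundaries. It collapses whitespace between sentences to a single space, so paragraph breaks are not preserved.

```python
import re

MAX_CHARS = 4096  # single-request limit stated above


def split_for_tts(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as its own request, and the resulting audio files joined in order.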

Tips for Better TTS

  1. Use Punctuation - Commas and periods create natural pauses
  2. Spell Out Numbers - Write "twenty-five" instead of "25" when the pronunciation matters
  3. Add Emphasis - Use ALL CAPS sparingly for emphasis
  4. Preview First - Generate short samples before long content

Example Text

Welcome to Girard, your all-in-one AI platform.

Today, we're excited to show you how easy it is to create
amazing content using artificial intelligence.

Let's get started!

Speech-to-Text (STT)

What is STT?

Speech-to-Text transcribes audio into written text. Use it for:

  • Meeting transcription
  • Voice notes
  • Accessibility
  • Content creation
  • Audio analysis

Using STT

  1. Navigate to Voice in the sidebar
  2. Select the Speech to Text tab
  3. Upload an audio file or record
  4. Click Transcribe
  5. Review and copy the text

Supported Audio Formats

| Format | Extension | Max Size |
| --- | --- | --- |
| MP3 | .mp3 | 25 MB |
| MP4 | .mp4, .m4a | 25 MB |
| WAV | .wav | 25 MB |
| WebM | .webm | 25 MB |
| MPEG | .mpeg, .mpga | 25 MB |
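A file can be checked against this table before upload. This is a minimal sketch with a hypothetical helper name, `check_audio_file`. It assumes 1 MB means 1,048,576 bytes; the service may count megabytes differently.

```python
from pathlib import Path

# Extensions and size cap taken from the Supported Audio Formats table.
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".m4a", ".wav", ".webm", ".mpeg", ".mpga"}
MAX_BYTES = 25 * 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes


def check_audio_file(name: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    extension = Path(name).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes; the limit is {MAX_BYTES}")
    return problems
```

For a file on disk, the size can be read with `Path(name).stat().st_size`.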

Recording Audio

Record directly in the browser:

  1. Click the Record button
  2. Grant microphone permission if prompted
  3. Speak clearly into your microphone
  4. Click Stop when finished
  5. Review the recording
  6. Click Transcribe

Transcription Features

  • Automatic Punctuation - Adds periods, commas, etc.
  • Speaker Detection - Identifies different speakers (beta)
  • Timestamps - Optional timestamp markers
  • Language Detection - Automatically detects language

Supported Languages

STT supports multiple languages including:

  • English (US, UK, AU)
  • Spanish
  • French
  • German
  • Italian
  • Portuguese
  • Japanese
  • Chinese (Mandarin)
  • And many more...

Tips for Better Transcription

  1. Clear Audio - Minimize background noise
  2. Speak Clearly - Enunciate words properly
  3. Quality Mic - Use a good microphone
  4. Optimal Distance - Stay neither too close to nor too far from the mic
  5. Steady Volume - Maintain consistent volume

Workflow Examples

Creating a Podcast Intro

  1. Write your intro script
  2. Choose an engaging voice (Nova or Echo)
  3. Generate the audio
  4. Download and add to your podcast

Example script:

Welcome back to Tech Talk, the podcast where we dive deep
into the latest technology trends. I'm your host, and today
we have an exciting episode lined up for you.

Transcribing an Interview

  1. Upload the interview audio file
  2. Click Transcribe
  3. Review the text for accuracy
  4. Edit any errors
  5. Export for your article

Creating Audio Notifications

  1. Write short, clear messages
  2. Use appropriate voice (Shimmer for instructions)
  3. Generate audio clips
  4. Integrate into your application

Example message:

Your order has been confirmed. You'll receive a shipping
notification within 24 hours.

API Usage

TTS API Example

curl -X POST https://www.girardai.com/api/voice/tts \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, world!",
    "voice": "alloy"
  }'
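
The same request can be built in Python with only the standard library. This is a sketch under stated assumptions: the endpoint, headers, and JSON fields come from the curl example above, while the function names `build_tts_request` and `synthesize` are hypothetical. The response format is not documented here, so `synthesize` simply returns the raw response body.

```python
import json
import urllib.request

TTS_URL = "https://www.girardai.com/api/voice/tts"  # from the curl example


def build_tts_request(api_key: str, text: str, voice: str = "alloy") -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        TTS_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def synthesize(api_key: str, text: str, voice: str = "alloy") -> bytes:
    """Send the request and return the raw response body.

    Assumption: whether the body is audio bytes or JSON is not stated here.
    """
    request = build_tts_request(api_key, text, voice)
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read()
```

Keeping request construction separate from sending makes the payload easy to inspect or test without a network call.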

STT API Example

curl -X POST https://www.girardai.com/api/voice/stt \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "audio=@recording.mp3"
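
A rough Python equivalent of the upload, using only the standard library, might look like the sketch below. The endpoint and the `audio` form field come from the curl example; the function name `build_stt_request` is hypothetical, and the per-part content type is guessed from the file name, which may differ from what curl sends.

```python
import mimetypes
import urllib.request
import uuid

STT_URL = "https://www.girardai.com/api/voice/stt"  # from the curl example


def build_stt_request(api_key: str, filename: str, audio: bytes) -> urllib.request.Request:
    """Build a multipart/form-data POST with the file in the "audio" field.

    Assumes `filename` contains no double quotes or line breaks.
    """
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        STT_URL,
        data=head + audio + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Pass the returned request to `urllib.request.urlopen` to send it. The structure of the transcription response is not documented here, so parsing it is left to the caller.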

Credit Usage

| Action | Credits |
| --- | --- |
| TTS (per 1000 characters) | 1 |
| STT (per minute of audio) | 2 |
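
These rates can be turned into a rough cost estimate. The sketch below uses hypothetical function names, and the rounding behavior is an assumption: this page does not say whether partial units are rounded up or billed proportionally, so both are available through `round_up`.

```python
import math

# Rates from the Credit Usage table.
TTS_CREDITS_PER_1000_CHARS = 1
STT_CREDITS_PER_MINUTE = 2


def estimate_tts_credits(char_count: int, round_up: bool = True) -> float:
    """Estimate credits for a TTS request of `char_count` characters."""
    units = char_count / 1000
    if round_up:  # assumption: partial blocks of 1000 characters are billed in full
        units = math.ceil(units)
    return units * TTS_CREDITS_PER_1000_CHARS


def estimate_stt_credits(duration_seconds: float, round_up: bool = True) -> float:
    """Estimate credits for transcribing `duration_seconds` of audio."""
    minutes = duration_seconds / 60
    if round_up:  # assumption: partial minutes are billed in full
        minutes = math.ceil(minutes)
    return minutes * STT_CREDITS_PER_MINUTE
```

Treat the result as an estimate and compare it with the figures shown in Usage Monitoring.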

Quality & Limitations

TTS Quality

  • Natural-sounding voices
  • Consistent pronunciation
  • Emotional expression
  • Multiple languages supported

TTS Limitations

  • Max 4,096 characters per request
  • No voice cloning
  • Limited custom pronunciation
  • English voices work best with English text

STT Accuracy

  • High accuracy for clear audio
  • Handles accents well
  • Automatic punctuation
  • Good with technical terms

STT Limitations

  • Background noise affects quality
  • Very fast speech may be less accurate
  • Some specialized terms may be missed
  • Max file size 25 MB

Best Practices

For TTS

  1. Write for Speech
    • Use conversational language
    • Avoid complex sentences
    • Read aloud before generating

  2. Format Appropriately
    • Break long text into paragraphs
    • Use punctuation for pacing
    • Consider the listener

  3. Choose the Right Voice
    • Match voice to content type
    • Test multiple voices
    • Consider your audience

For STT

  1. Prepare Good Audio
    • Use quality recording equipment
    • Minimize background noise
    • Ensure clear speech

  2. Optimize Files
    • Keep under 25 MB
    • Use supported formats
    • Trim unnecessary portions

  3. Review Output
    • Check for errors
    • Verify technical terms
    • Add formatting as needed

Troubleshooting

TTS Issues

No audio generated:

  • Check character count
  • Verify text input
  • Try a different voice

Pronunciation issues:

  • Spell out problematic words
  • Use phonetic spelling
  • Break up complex terms

STT Issues

Low accuracy:

  • Improve audio quality
  • Reduce background noise
  • Speak more clearly

Upload failed:

  • Check file size (25 MB maximum)
  • Verify file format
  • Try converting to MP3

