Voice API

API reference for text-to-speech and speech-to-text

The Voice API provides text-to-speech (TTS) and speech-to-text (STT) capabilities powered by OpenAI. Convert text to natural speech or transcribe audio to text.

Base URL

https://www.girardai.com/api/voice

Authentication

All requests must be authenticated with a Bearer token in the Authorization header:

Authorization: Bearer sk_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Text-to-Speech (TTS)

Convert text to natural-sounding speech.

Endpoint: POST /api/voice/tts

Request Body

| Field | Type   | Required | Description                               |
|-------|--------|----------|-------------------------------------------|
| text  | string | Yes      | Text to convert to speech                 |
| voice | string | No       | Voice ID (default: "alloy")               |
| speed | number | No       | Speed multiplier (0.25-4.0, default: 1.0) |

Available Voices

| Voice ID | Description             | Character     |
|----------|-------------------------|---------------|
| alloy    | Neutral, balanced       | Versatile     |
| echo     | Warm, conversational    | Friendly      |
| fable    | Narrative, storytelling | Dramatic      |
| onyx     | Deep, authoritative     | Professional  |
| nova     | Friendly, energetic     | Upbeat        |
| shimmer  | Clear, expressive       | Instructional |
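To pick a voice, it can help to generate a short sample with each ID and compare them by ear before committing to long audio. A minimal sketch (the endpoint and fields are as documented above; the sample text and output filenames are arbitrary):

```python
import os
import requests

VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]

def preview_voices(sample_text="This is a short voice sample.", out_dir="voice_samples"):
    """Generate one short MP3 per voice so they can be compared by ear."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for voice in VOICES:
        response = requests.post(
            "https://www.girardai.com/api/voice/tts",
            headers={"Authorization": f"Bearer {os.environ['GIRARDAI_API_KEY']}"},
            json={"text": sample_text, "voice": voice},
        )
        response.raise_for_status()
        path = os.path.join(out_dir, f"sample_{voice}.mp3")
        with open(path, "wb") as f:
            f.write(response.content)  # MP3 bytes, per the Response section below
        paths.append(path)
    return paths
```

Each sample is short, so the whole sweep costs only a few credits.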

Example Request

curl -X POST https://www.girardai.com/api/voice/tts \
  -H "Authorization: Bearer sk_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to Girard, your all-in-one AI platform.",
    "voice": "nova",
    "speed": 1.0
  }'

Response

Returns an MP3 audio file:

Content-Type: audio/mpeg
Content-Disposition: attachment; filename="speech.mp3"

[Binary audio data]

Response Headers

| Header         | Description        |
|----------------|--------------------|
| Content-Type   | audio/mpeg         |
| Content-Length | File size in bytes |
| X-Credits-Used | Credits consumed   |

JavaScript Example

// Note: in a browser, route this call through your backend so the API key is never exposed to clients.
async function textToSpeech(text, voice = 'alloy') {
  const response = await fetch('https://www.girardai.com/api/voice/tts', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.GIRARDAI_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ text, voice }),
  });

  if (!response.ok) {
    throw new Error(`TTS request failed: ${response.status}`);
  }

  // Get audio blob
  const audioBlob = await response.blob();

  // Create audio URL
  const audioUrl = URL.createObjectURL(audioBlob);

  return audioUrl;
}

// Usage
const audioUrl = await textToSpeech('Hello, world!', 'nova');
const audio = new Audio(audioUrl);
audio.play();

Python Example

import requests
import os

def text_to_speech(text, voice='alloy', speed=1.0, output_file='speech.mp3'):
    """Convert text to speech and save to file."""
    response = requests.post(
        'https://www.girardai.com/api/voice/tts',
        headers={
            'Authorization': f'Bearer {os.environ["GIRARDAI_API_KEY"]}',
            'Content-Type': 'application/json',
        },
        json={
            'text': text,
            'voice': voice,
            'speed': speed,
        }
    )

    if response.status_code == 200:
        with open(output_file, 'wb') as f:
            f.write(response.content)
        return output_file
    else:
        raise Exception(f'TTS failed: {response.text}')

# Usage
text_to_speech(
    'Welcome to our product demo.',
    voice='nova',
    output_file='welcome.mp3'
)

Speech-to-Text (STT)

Transcribe audio to text.

Endpoint: POST /api/voice/stt

Request

Send the audio file as multipart/form-data:

| Field    | Type   | Required | Description                                    |
|----------|--------|----------|------------------------------------------------|
| audio    | file   | Yes      | Audio file to transcribe                       |
| language | string | No       | Language code (auto-detected if not specified) |

Supported Audio Formats

| Format | Extensions   | Max Size |
|--------|--------------|----------|
| MP3    | .mp3         | 25 MB    |
| MP4    | .mp4, .m4a   | 25 MB    |
| WAV    | .wav         | 25 MB    |
| WebM   | .webm        | 25 MB    |
| MPEG   | .mpeg, .mpga | 25 MB    |

Example Request

curl -X POST https://www.girardai.com/api/voice/stt \
  -H "Authorization: Bearer sk_live_xxx" \
  -F "audio=@recording.mp3"

Response

{
  "success": true,
  "data": {
    "text": "Hello, this is a test recording of the speech to text feature.",
    "duration": 4.5,
    "language": "en"
  }
}

| Field    | Type   | Description               |
|----------|--------|---------------------------|
| text     | string | Transcribed text          |
| duration | number | Audio duration in seconds |
| language | string | Detected language code    |

JavaScript Example

async function speechToText(audioFile) {
  const formData = new FormData();
  formData.append('audio', audioFile);

  const response = await fetch('https://www.girardai.com/api/voice/stt', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.GIRARDAI_API_KEY}`,
    },
    body: formData,
  });

  const result = await response.json();

  if (result.success) {
    return result.data.text;
  }

  throw new Error(result.error);
}

// Usage with file input
const fileInput = document.getElementById('audioFile');
fileInput.addEventListener('change', async (e) => {
  const file = e.target.files[0];
  const transcript = await speechToText(file);
  console.log('Transcript:', transcript);
});

Python Example

import requests
import os

def speech_to_text(audio_file_path, language=None):
    """Transcribe audio file to text."""
    with open(audio_file_path, 'rb') as audio_file:
        files = {'audio': audio_file}
        data = {}

        if language:
            data['language'] = language

        response = requests.post(
            'https://www.girardai.com/api/voice/stt',
            headers={
                'Authorization': f'Bearer {os.environ["GIRARDAI_API_KEY"]}',
            },
            files=files,
            data=data if data else None
        )

    result = response.json()

    if result.get('success'):
        return result['data']['text']

    raise Exception(result.get('error', 'STT failed'))

# Usage
transcript = speech_to_text('meeting_recording.mp3')
print(transcript)

Combined Example

Create a voice assistant that listens and responds:

JavaScript

class VoiceAssistant {
  constructor(apiKey) {
    this.apiKey = apiKey;
    this.baseUrl = 'https://www.girardai.com/api';
  }

  async transcribe(audioBlob) {
    const formData = new FormData();
    formData.append('audio', audioBlob, 'recording.webm');

    const response = await fetch(`${this.baseUrl}/voice/stt`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.apiKey}`,
      },
      body: formData,
    });

    const result = await response.json();
    return result.data.text;
  }

  async getAIResponse(text) {
    const response = await fetch(`${this.baseUrl}/chat`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        messages: [{ role: 'user', content: text }],
      }),
    });

    const result = await response.json();
    return result.data.content;
  }

  async speak(text, voice = 'nova') {
    const response = await fetch(`${this.baseUrl}/voice/tts`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ text, voice }),
    });

    const audioBlob = await response.blob();
    const audioUrl = URL.createObjectURL(audioBlob);
    const audio = new Audio(audioUrl);

    return new Promise((resolve) => {
      audio.onended = resolve;
      audio.play();
    });
  }

  async processVoiceInput(audioBlob) {
    // 1. Transcribe audio
    const userText = await this.transcribe(audioBlob);
    console.log('User said:', userText);

    // 2. Get AI response
    const aiResponse = await this.getAIResponse(userText);
    console.log('AI response:', aiResponse);

    // 3. Speak response
    await this.speak(aiResponse);

    return { userText, aiResponse };
  }
}

// Usage
const assistant = new VoiceAssistant(process.env.GIRARDAI_API_KEY);

// When recording stops
mediaRecorder.onstop = async () => {
  const audioBlob = new Blob(audioChunks, { type: 'audio/webm' });
  const result = await assistant.processVoiceInput(audioBlob);
};

Python

import requests
import os
from pathlib import Path

class VoiceAssistant:
    def __init__(self):
        self.api_key = os.environ.get('GIRARDAI_API_KEY')
        self.base_url = 'https://www.girardai.com/api'

    def transcribe(self, audio_path: str) -> str:
        """Transcribe audio to text."""
        with open(audio_path, 'rb') as f:
            response = requests.post(
                f'{self.base_url}/voice/stt',
                headers={'Authorization': f'Bearer {self.api_key}'},
                files={'audio': f}
            )
        return response.json()['data']['text']

    def get_ai_response(self, text: str) -> str:
        """Get AI response for text."""
        response = requests.post(
            f'{self.base_url}/chat',
            headers={
                'Authorization': f'Bearer {self.api_key}',
                'Content-Type': 'application/json'
            },
            json={'messages': [{'role': 'user', 'content': text}]}
        )
        return response.json()['data']['content']

    def speak(self, text: str, output_path: str = 'response.mp3', voice: str = 'nova') -> str:
        """Convert text to speech."""
        response = requests.post(
            f'{self.base_url}/voice/tts',
            headers={
                'Authorization': f'Bearer {self.api_key}',
                'Content-Type': 'application/json'
            },
            json={'text': text, 'voice': voice}
        )

        with open(output_path, 'wb') as f:
            f.write(response.content)

        return output_path

    def process_voice(self, audio_path: str) -> dict:
        """Process voice input end-to-end."""
        # 1. Transcribe
        user_text = self.transcribe(audio_path)
        print(f'User: {user_text}')

        # 2. Get AI response
        ai_response = self.get_ai_response(user_text)
        print(f'AI: {ai_response}')

        # 3. Generate speech
        audio_output = self.speak(ai_response)

        return {
            'user_text': user_text,
            'ai_response': ai_response,
            'audio_output': audio_output
        }


# Usage
assistant = VoiceAssistant()
result = assistant.process_voice('user_question.mp3')

Error Responses

TTS Errors

{
  "success": false,
  "error": "Text is required"
}

| Status | Error                          | Description             |
|--------|--------------------------------|-------------------------|
| 400    | "Text is required"             | Missing text field      |
| 400    | "Text too long"                | Exceeds character limit |
| 400    | "Invalid voice"                | Unknown voice ID        |
| 401    | "Unauthorized"                 | Invalid API key         |
| 503    | "Voice service not configured" | Server issue            |

STT Errors

{
  "success": false,
  "error": "Audio file is required"
}

| Status | Error                          | Description            |
|--------|--------------------------------|------------------------|
| 400    | "Audio file is required"       | Missing audio file     |
| 400    | "File too large"               | Exceeds 25 MB limit    |
| 400    | "Unsupported format"           | Invalid audio format   |
| 401    | "Unauthorized"                 | Invalid API key        |
| 503    | "Voice service not configured" | Server issue           |

Limits & Quotas

TTS Limits

| Limit               | Value               |
|---------------------|---------------------|
| Max text length     | 4,096 characters    |
| Max requests/minute | 60 (varies by plan) |
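The 4,096-character cap means longer documents have to be split client-side before synthesis. One simple approach (a sketch; the limit comes from the table above, the splitting strategy is ours) is to break on sentence boundaries so chunks sound natural:

```python
import re

MAX_TTS_CHARS = 4096  # per-request limit from the TTS Limits table

def chunk_text(text, limit=MAX_TTS_CHARS):
    """Split text into chunks under the TTS character limit,
    preferring to break at sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > limit:
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if len(current) + len(sentence) + 1 > limit:
            if current:
                chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent to `/api/voice/tts` separately and the resulting MP3 segments concatenated.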

STT Limits

| Limit               | Value               |
|---------------------|---------------------|
| Max file size       | 25 MB               |
| Max audio duration  | 30 minutes          |
| Max requests/minute | 30 (varies by plan) |

Credit Usage

| Action                     | Credits |
|----------------------------|---------|
| TTS (per 1,000 characters) | 1       |
| STT (per minute of audio)  | 2       |
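Using the rates above, a rough cost estimate can be computed before submitting a job. A sketch (the round-up behavior is an assumption; actual billing may round differently):

```python
import math

TTS_CREDITS_PER_1K_CHARS = 1  # from the Credit Usage table
STT_CREDITS_PER_MINUTE = 2    # from the Credit Usage table

def estimate_tts_credits(text: str) -> int:
    """Estimate TTS cost at 1 credit per 1,000 characters, rounded up."""
    return math.ceil(len(text) / 1000) * TTS_CREDITS_PER_1K_CHARS

def estimate_stt_credits(duration_seconds: float) -> int:
    """Estimate STT cost at 2 credits per minute of audio, rounded up."""
    return math.ceil(duration_seconds / 60) * STT_CREDITS_PER_MINUTE
```

The actual charge is reported per request in the `X-Credits-Used` response header.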

Best Practices

For TTS

  1. Optimize text - Use punctuation for natural pauses
  2. Choose appropriate voice - Match voice to content type
  3. Test with samples - Preview before generating long audio
  4. Handle errors - Implement retry logic
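Point 4 can be sketched with simple exponential backoff: retry on 429 and 5xx, but fail fast on 4xx validation errors, which will not succeed on retry (the status-code policy here is ours; the error codes themselves are listed under Error Responses):

```python
import time
import requests

def should_retry(status_code: int) -> bool:
    """Retry on rate limits and server errors; 4xx validation errors won't recover."""
    return status_code == 429 or status_code >= 500

def tts_with_retry(payload, api_key, max_attempts=3):
    """POST to the TTS endpoint, retrying transient failures with backoff."""
    for attempt in range(max_attempts):
        response = requests.post(
            "https://www.girardai.com/api/voice/tts",
            headers={"Authorization": f"Bearer {api_key}"},
            json=payload,
            timeout=30,
        )
        if not should_retry(response.status_code):
            response.raise_for_status()  # raises immediately on e.g. "Invalid voice"
            return response.content
        if attempt < max_attempts - 1:
            time.sleep(2 ** attempt)  # 1s, 2s, ...
    response.raise_for_status()
```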

For STT

  1. Use quality audio - Clear recordings transcribe better
  2. Minimize background noise - Improves accuracy
  3. Compress large files - Stay under 25 MB limit
  4. Specify language - Improves accuracy for non-English
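The file-size limit and supported extensions above can be enforced client-side before uploading, saving a round trip on files the endpoint would reject (a sketch; the values come from the Supported Audio Formats table):

```python
import os

MAX_FILE_BYTES = 25 * 1024 * 1024  # 25 MB STT limit
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".m4a", ".wav", ".webm", ".mpeg", ".mpga"}

def validate_audio_file(path: str) -> None:
    """Raise ValueError if the file would be rejected by the STT endpoint."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported format: {ext}")
    if os.path.getsize(path) > MAX_FILE_BYTES:
        raise ValueError("File too large: exceeds 25 MB limit")
```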

Previous: Agents API | Next: Workflows API
