Voice API

API reference for text-to-speech and speech-to-text

The Voice API provides text-to-speech (TTS) and speech-to-text (STT) capabilities powered by OpenAI. Convert text to natural speech or transcribe audio to text.

Base URL

https://www.girardai.com/api/voice

Authentication

All requests must be authenticated with a Bearer token in the Authorization header:

Authorization: Bearer sk_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Text-to-Speech (TTS)

Convert text to natural-sounding speech.

Endpoint: POST /api/voice/tts

Request Body

| Field | Type   | Required | Description                               |
|-------|--------|----------|-------------------------------------------|
| text  | string | Yes      | Text to convert to speech                 |
| voice | string | No       | Voice ID (default: "alloy")               |
| speed | number | No       | Speed multiplier (0.25-4.0, default: 1.0) |

Available Voices

| Voice ID | Description             | Character     |
|----------|-------------------------|---------------|
| alloy    | Neutral, balanced       | Versatile     |
| echo     | Warm, conversational    | Friendly      |
| fable    | Narrative, storytelling | Dramatic      |
| onyx     | Deep, authoritative     | Professional  |
| nova     | Friendly, energetic     | Upbeat        |
| shimmer  | Clear, expressive       | Instructional |
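To pick a voice, it can help to generate a short sample with each ID and compare them by ear before committing to long audio. A minimal sketch (the endpoint and fields are as documented above; the sample text and output filenames are arbitrary):

```python
import os
import requests

VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]

def preview_voices(sample_text="This is a short voice sample.", out_dir="voice_samples"):
    """Generate one short MP3 per voice so they can be compared by ear."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for voice in VOICES:
        response = requests.post(
            "https://www.girardai.com/api/voice/tts",
            headers={"Authorization": f"Bearer {os.environ['GIRARDAI_API_KEY']}"},
            json={"text": sample_text, "voice": voice},
        )
        response.raise_for_status()
        path = os.path.join(out_dir, f"sample_{voice}.mp3")
        with open(path, "wb") as f:
            f.write(response.content)  # MP3 bytes, per the Response section below
        paths.append(path)
    return paths
```

Each sample is short, so the whole sweep costs only a few credits.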

Example Request

curl -X POST https://www.girardai.com/api/voice/tts \
  -H "Authorization: Bearer sk_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to Girard, your all-in-one AI platform.",
    "voice": "nova",
    "speed": 1.0
  }'

Response

Returns an MP3 audio file:

Content-Type: audio/mpeg
Content-Disposition: attachment; filename="speech.mp3"

[Binary audio data]

Response Headers

| Header         | Description        |
|----------------|--------------------|
| Content-Type   | audio/mpeg         |
| Content-Length | File size in bytes |
| X-Credits-Used | Credits consumed   |

JavaScript Example

// Note: in a browser, route this call through your backend so the API key is never exposed to clients.
async function textToSpeech(text, voice = 'alloy') {
  const response = await fetch('https://www.girardai.com/api/voice/tts', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.GIRARDAI_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ text, voice }),
  });

  if (!response.ok) {
    throw new Error(`TTS request failed: ${response.status}`);
  }

  // Get audio blob
  const audioBlob = await response.blob();

  // Create audio URL
  const audioUrl = URL.createObjectURL(audioBlob);

  return audioUrl;
}

// Usage
const audioUrl = await textToSpeech('Hello, world!', 'nova');
const audio = new Audio(audioUrl);
audio.play();

Python Example

import requests
import os

def text_to_speech(text, voice='alloy', speed=1.0, output_file='speech.mp3'):
    """Convert text to speech and save to file."""
    response = requests.post(
        'https://www.girardai.com/api/voice/tts',
        headers={
            'Authorization': f'Bearer {os.environ["GIRARDAI_API_KEY"]}',
            'Content-Type': 'application/json',
        },
        json={
            'text': text,
            'voice': voice,
            'speed': speed,
        }
    )

    if response.status_code == 200:
        with open(output_file, 'wb') as f:
            f.write(response.content)
        return output_file
    else:
        raise Exception(f'TTS failed: {response.text}')

# Usage
text_to_speech(
    'Welcome to our product demo.',
    voice='nova',
    output_file='welcome.mp3'
)

Speech-to-Text (STT)

Transcribe audio to text.

Endpoint: POST /api/voice/stt

Request

Send the audio file as multipart/form-data:

| Field    | Type   | Required | Description                                    |
|----------|--------|----------|------------------------------------------------|
| audio    | file   | Yes      | Audio file to transcribe                       |
| language | string | No       | Language code (auto-detected if not specified) |

Supported Audio Formats

| Format | Extensions   | Max Size |
|--------|--------------|----------|
| MP3    | .mp3         | 25 MB    |
| MP4    | .mp4, .m4a   | 25 MB    |
| WAV    | .wav         | 25 MB    |
| WebM   | .webm        | 25 MB    |
| MPEG   | .mpeg, .mpga | 25 MB    |

Example Request

curl -X POST https://www.girardai.com/api/voice/stt \
  -H "Authorization: Bearer sk_live_xxx" \
  -F "audio=@recording.mp3"

Response

{
  "success": true,
  "data": {
    "text": "Hello, this is a test recording of the speech to text feature.",
    "duration": 4.5,
    "language": "en"
  }
}

| Field    | Type   | Description               |
|----------|--------|---------------------------|
| text     | string | Transcribed text          |
| duration | number | Audio duration in seconds |
| language | string | Detected language code    |

JavaScript Example

async function speechToText(audioFile) {
  const formData = new FormData();
  formData.append('audio', audioFile);

  const response = await fetch('https://www.girardai.com/api/voice/stt', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.GIRARDAI_API_KEY}`,
    },
    body: formData,
  });

  const result = await response.json();

  if (result.success) {
    return result.data.text;
  }

  throw new Error(result.error);
}

// Usage with file input
const fileInput = document.getElementById('audioFile');
fileInput.addEventListener('change', async (e) => {
  const file = e.target.files[0];
  const transcript = await speechToText(file);
  console.log('Transcript:', transcript);
});

Python Example

import requests
import os

def speech_to_text(audio_file_path, language=None):
    """Transcribe audio file to text."""
    with open(audio_file_path, 'rb') as audio_file:
        files = {'audio': audio_file}
        data = {}

        if language:
            data['language'] = language

        response = requests.post(
            'https://www.girardai.com/api/voice/stt',
            headers={
                'Authorization': f'Bearer {os.environ["GIRARDAI_API_KEY"]}',
            },
            files=files,
            data=data if data else None
        )

    result = response.json()

    if result.get('success'):
        return result['data']['text']

    raise Exception(result.get('error', 'STT failed'))

# Usage
transcript = speech_to_text('meeting_recording.mp3')
print(transcript)

Combined Example

Create a voice assistant that listens and responds:

JavaScript

class VoiceAssistant {
  constructor(apiKey) {
    this.apiKey = apiKey;
    this.baseUrl = 'https://www.girardai.com/api';
  }

  async transcribe(audioBlob) {
    const formData = new FormData();
    formData.append('audio', audioBlob, 'recording.webm');

    const response = await fetch(`${this.baseUrl}/voice/stt`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.apiKey}`,
      },
      body: formData,
    });

    const result = await response.json();
    return result.data.text;
  }

  async getAIResponse(text) {
    const response = await fetch(`${this.baseUrl}/chat`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        messages: [{ role: 'user', content: text }],
      }),
    });

    const result = await response.json();
    return result.data.content;
  }

  async speak(text, voice = 'nova') {
    const response = await fetch(`${this.baseUrl}/voice/tts`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ text, voice }),
    });

    const audioBlob = await response.blob();
    const audioUrl = URL.createObjectURL(audioBlob);
    const audio = new Audio(audioUrl);

    return new Promise((resolve) => {
      audio.onended = resolve;
      audio.play();
    });
  }

  async processVoiceInput(audioBlob) {
    // 1. Transcribe audio
    const userText = await this.transcribe(audioBlob);
    console.log('User said:', userText);

    // 2. Get AI response
    const aiResponse = await this.getAIResponse(userText);
    console.log('AI response:', aiResponse);

    // 3. Speak response
    await this.speak(aiResponse);

    return { userText, aiResponse };
  }
}

// Usage
const assistant = new VoiceAssistant(process.env.GIRARDAI_API_KEY);

// When recording stops
mediaRecorder.onstop = async () => {
  const audioBlob = new Blob(audioChunks, { type: 'audio/webm' });
  const result = await assistant.processVoiceInput(audioBlob);
};

Python

import requests
import os
from pathlib import Path

class VoiceAssistant:
    def __init__(self):
        self.api_key = os.environ.get('GIRARDAI_API_KEY')
        self.base_url = 'https://www.girardai.com/api'

    def transcribe(self, audio_path: str) -> str:
        """Transcribe audio to text."""
        with open(audio_path, 'rb') as f:
            response = requests.post(
                f'{self.base_url}/voice/stt',
                headers={'Authorization': f'Bearer {self.api_key}'},
                files={'audio': f}
            )
        return response.json()['data']['text']

    def get_ai_response(self, text: str) -> str:
        """Get AI response for text."""
        response = requests.post(
            f'{self.base_url}/chat',
            headers={
                'Authorization': f'Bearer {self.api_key}',
                'Content-Type': 'application/json'
            },
            json={'messages': [{'role': 'user', 'content': text}]}
        )
        return response.json()['data']['content']

    def speak(self, text: str, output_path: str = 'response.mp3', voice: str = 'nova') -> str:
        """Convert text to speech."""
        response = requests.post(
            f'{self.base_url}/voice/tts',
            headers={
                'Authorization': f'Bearer {self.api_key}',
                'Content-Type': 'application/json'
            },
            json={'text': text, 'voice': voice}
        )

        with open(output_path, 'wb') as f:
            f.write(response.content)

        return output_path

    def process_voice(self, audio_path: str) -> dict:
        """Process voice input end-to-end."""
        # 1. Transcribe
        user_text = self.transcribe(audio_path)
        print(f'User: {user_text}')

        # 2. Get AI response
        ai_response = self.get_ai_response(user_text)
        print(f'AI: {ai_response}')

        # 3. Generate speech
        audio_output = self.speak(ai_response)

        return {
            'user_text': user_text,
            'ai_response': ai_response,
            'audio_output': audio_output
        }


# Usage
assistant = VoiceAssistant()
result = assistant.process_voice('user_question.mp3')

Error Responses

TTS Errors

{
  "success": false,
  "error": "Text is required"
}

| Status | Error                          | Description             |
|--------|--------------------------------|-------------------------|
| 400    | "Text is required"             | Missing text field      |
| 400    | "Text too long"                | Exceeds character limit |
| 400    | "Invalid voice"                | Unknown voice ID        |
| 401    | "Unauthorized"                 | Invalid API key         |
| 503    | "Voice service not configured" | Server issue            |

STT Errors

{
  "success": false,
  "error": "Audio file is required"
}

| Status | Error                          | Description            |
|--------|--------------------------------|------------------------|
| 400    | "Audio file is required"       | Missing audio file     |
| 400    | "File too large"               | Exceeds 25 MB limit    |
| 400    | "Unsupported format"           | Invalid audio format   |
| 401    | "Unauthorized"                 | Invalid API key        |
| 503    | "Voice service not configured" | Server issue           |

Limits & Quotas

TTS Limits

| Limit               | Value               |
|---------------------|---------------------|
| Max text length     | 4,096 characters    |
| Max requests/minute | 60 (varies by plan) |
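The 4,096-character cap means longer documents have to be split client-side before synthesis. One simple approach (a sketch; the limit comes from the table above, the splitting strategy is ours) is to break on sentence boundaries so chunks sound natural:

```python
import re

MAX_TTS_CHARS = 4096  # per-request limit from the TTS Limits table

def chunk_text(text, limit=MAX_TTS_CHARS):
    """Split text into chunks under the TTS character limit,
    preferring to break at sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > limit:
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if len(current) + len(sentence) + 1 > limit:
            if current:
                chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent to `/api/voice/tts` separately and the resulting MP3 segments concatenated.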

STT Limits

| Limit               | Value               |
|---------------------|---------------------|
| Max file size       | 25 MB               |
| Max audio duration  | 30 minutes          |
| Max requests/minute | 30 (varies by plan) |

Credit Usage

| Action                     | Credits |
|----------------------------|---------|
| TTS (per 1,000 characters) | 1       |
| STT (per minute of audio)  | 2       |
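Using the rates above, a rough cost estimate can be computed before submitting a job. A sketch (the round-up behavior is an assumption; actual billing may round differently):

```python
import math

TTS_CREDITS_PER_1K_CHARS = 1  # from the Credit Usage table
STT_CREDITS_PER_MINUTE = 2    # from the Credit Usage table

def estimate_tts_credits(text: str) -> int:
    """Estimate TTS cost at 1 credit per 1,000 characters, rounded up."""
    return math.ceil(len(text) / 1000) * TTS_CREDITS_PER_1K_CHARS

def estimate_stt_credits(duration_seconds: float) -> int:
    """Estimate STT cost at 2 credits per minute of audio, rounded up."""
    return math.ceil(duration_seconds / 60) * STT_CREDITS_PER_MINUTE
```

The actual charge is reported per request in the `X-Credits-Used` response header.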

Best Practices

For TTS

  1. Optimize text - Use punctuation for natural pauses
  2. Choose appropriate voice - Match voice to content type
  3. Test with samples - Preview before generating long audio
  4. Handle errors - Implement retry logic
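Point 4 can be sketched with simple exponential backoff: retry on 429 and 5xx, but fail fast on 4xx validation errors, which will not succeed on retry (the status-code policy here is ours; the error codes themselves are listed under Error Responses):

```python
import time
import requests

def should_retry(status_code: int) -> bool:
    """Retry on rate limits and server errors; 4xx validation errors won't recover."""
    return status_code == 429 or status_code >= 500

def tts_with_retry(payload, api_key, max_attempts=3):
    """POST to the TTS endpoint, retrying transient failures with backoff."""
    for attempt in range(max_attempts):
        response = requests.post(
            "https://www.girardai.com/api/voice/tts",
            headers={"Authorization": f"Bearer {api_key}"},
            json=payload,
            timeout=30,
        )
        if not should_retry(response.status_code):
            response.raise_for_status()  # raises immediately on e.g. "Invalid voice"
            return response.content
        if attempt < max_attempts - 1:
            time.sleep(2 ** attempt)  # 1s, 2s, ...
    response.raise_for_status()
```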

For STT

  1. Use quality audio - Clear recordings transcribe better
  2. Minimize background noise - Improves accuracy
  3. Compress large files - Stay under 25 MB limit
  4. Specify language - Improves accuracy for non-English
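The file-size limit and supported extensions above can be enforced client-side before uploading, saving a round trip on files the endpoint would reject (a sketch; the values come from the Supported Audio Formats table):

```python
import os

MAX_FILE_BYTES = 25 * 1024 * 1024  # 25 MB STT limit
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".m4a", ".wav", ".webm", ".mpeg", ".mpga"}

def validate_audio_file(path: str) -> None:
    """Raise ValueError if the file would be rejected by the STT endpoint."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported format: {ext}")
    if os.path.getsize(path) > MAX_FILE_BYTES:
        raise ValueError("File too large: exceeds 25 MB limit")
```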

Previous: Agents API | Next: Workflows API
