The Voice API provides text-to-speech (TTS) and speech-to-text (STT) capabilities powered by OpenAI. Convert text to natural speech or transcribe audio to text.
Base URL
https://www.girardai.com/api/voice
Authentication
All requests require authentication via Bearer token:
Authorization: Bearer sk_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Text-to-Speech (TTS)
Convert text to natural-sounding speech.
Endpoint: POST /api/voice/tts
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Text to convert to speech |
| voice | string | No | Voice ID (default: "alloy") |
| speed | number | No | Speed multiplier (0.25-4.0, default: 1.0) |
Available Voices
| Voice ID | Description | Character |
|---|---|---|
| alloy | Neutral, balanced | Versatile |
| echo | Warm, conversational | Friendly |
| fable | Narrative, storytelling | Dramatic |
| onyx | Deep, authoritative | Professional |
| nova | Friendly, energetic | Upbeat |
| shimmer | Clear, expressive | Instructional |
Example Request
curl -X POST https://www.girardai.com/api/voice/tts \
-H "Authorization: Bearer sk_live_xxx" \
-H "Content-Type: application/json" \
-d '{
"text": "Welcome to Girard, your all-in-one AI platform.",
"voice": "nova",
"speed": 1.0
}'
Response
Returns the generated audio as an MP3 file:
Content-Type: audio/mpeg
Content-Disposition: attachment; filename="speech.mp3"
[Binary audio data]
Response Headers
| Header | Description |
|---|---|
| Content-Type | audio/mpeg |
| Content-Length | File size in bytes |
| X-Credits-Used | Credits consumed |
JavaScript Example
async function textToSpeech(text, voice = 'alloy') {
const response = await fetch('https://www.girardai.com/api/voice/tts', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.GIRARDAI_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({ text, voice }),
});
if (!response.ok) {
throw new Error(`TTS request failed: ${response.status}`);
}
// Get audio blob
const audioBlob = await response.blob();
// Create audio URL
const audioUrl = URL.createObjectURL(audioBlob);
return audioUrl;
}
// Usage
const audioUrl = await textToSpeech('Hello, world!', 'nova');
const audio = new Audio(audioUrl);
audio.play();
Python Example
import requests
import os
def text_to_speech(text, voice='alloy', speed=1.0, output_file='speech.mp3'):
"""Convert text to speech and save to file."""
response = requests.post(
'https://www.girardai.com/api/voice/tts',
headers={
'Authorization': f'Bearer {os.environ["GIRARDAI_API_KEY"]}',
'Content-Type': 'application/json',
},
json={
'text': text,
'voice': voice,
'speed': speed,
}
)
if response.status_code == 200:
with open(output_file, 'wb') as f:
f.write(response.content)
return output_file
else:
raise Exception(f'TTS failed: {response.text}')
# Usage
text_to_speech(
'Welcome to our product demo.',
voice='nova',
output_file='welcome.mp3'
)
Speech-to-Text (STT)
Transcribe audio to text.
Endpoint: POST /api/voice/stt
Request
Send audio file as multipart form data:
| Field | Type | Required | Description |
|---|---|---|---|
| audio | file | Yes | Audio file to transcribe |
| language | string | No | Language code (auto-detected if not specified) |
Supported Audio Formats
| Format | Extensions | Max Size |
|---|---|---|
| MP3 | .mp3 | 25 MB |
| MP4 | .mp4, .m4a | 25 MB |
| WAV | .wav | 25 MB |
| WebM | .webm | 25 MB |
| MPEG | .mpeg, .mpga | 25 MB |
Example Request
curl -X POST https://www.girardai.com/api/voice/stt \
-H "Authorization: Bearer sk_live_xxx" \
-F "audio=@recording.mp3"
Response
{
"success": true,
"data": {
"text": "Hello, this is a test recording of the speech to text feature.",
"duration": 4.5,
"language": "en"
}
}
| Field | Type | Description |
|---|---|---|
| text | string | Transcribed text |
| duration | number | Audio duration in seconds |
| language | string | Detected language code |
JavaScript Example
async function speechToText(audioFile) {
const formData = new FormData();
formData.append('audio', audioFile);
const response = await fetch('https://www.girardai.com/api/voice/stt', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.GIRARDAI_API_KEY}`,
},
body: formData,
});
const result = await response.json();
if (result.success) {
return result.data.text;
}
throw new Error(result.error);
}
// Usage with file input
const fileInput = document.getElementById('audioFile');
fileInput.addEventListener('change', async (e) => {
const file = e.target.files[0];
const transcript = await speechToText(file);
console.log('Transcript:', transcript);
});
Python Example
import requests
import os
def speech_to_text(audio_file_path, language=None):
"""Transcribe audio file to text."""
with open(audio_file_path, 'rb') as audio_file:
files = {'audio': audio_file}
data = {}
if language:
data['language'] = language
response = requests.post(
'https://www.girardai.com/api/voice/stt',
headers={
'Authorization': f'Bearer {os.environ["GIRARDAI_API_KEY"]}',
},
files=files,
data=data if data else None
)
result = response.json()
if result.get('success'):
return result['data']['text']
raise Exception(result.get('error', 'STT failed'))
# Usage
transcript = speech_to_text('meeting_recording.mp3')
print(transcript)
Combined Example
Create a voice assistant that listens and responds:
JavaScript
class VoiceAssistant {
constructor(apiKey) {
this.apiKey = apiKey;
this.baseUrl = 'https://www.girardai.com/api';
}
async transcribe(audioBlob) {
const formData = new FormData();
formData.append('audio', audioBlob, 'recording.webm');
const response = await fetch(`${this.baseUrl}/voice/stt`, {
method: 'POST',
headers: {
'Authorization': `Bearer ${this.apiKey}`,
},
body: formData,
});
const result = await response.json();
return result.data.text;
}
async getAIResponse(text) {
const response = await fetch(`${this.baseUrl}/chat`, {
method: 'POST',
headers: {
'Authorization': `Bearer ${this.apiKey}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
messages: [{ role: 'user', content: text }],
}),
});
const result = await response.json();
return result.data.content;
}
async speak(text, voice = 'nova') {
const response = await fetch(`${this.baseUrl}/voice/tts`, {
method: 'POST',
headers: {
'Authorization': `Bearer ${this.apiKey}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({ text, voice }),
});
const audioBlob = await response.blob();
const audioUrl = URL.createObjectURL(audioBlob);
const audio = new Audio(audioUrl);
return new Promise((resolve) => {
audio.onended = resolve;
audio.play();
});
}
async processVoiceInput(audioBlob) {
// 1. Transcribe audio
const userText = await this.transcribe(audioBlob);
console.log('User said:', userText);
// 2. Get AI response
const aiResponse = await this.getAIResponse(userText);
console.log('AI response:', aiResponse);
// 3. Speak response
await this.speak(aiResponse);
return { userText, aiResponse };
}
}
// Usage
const assistant = new VoiceAssistant(process.env.GIRARDAI_API_KEY);
// When recording stops (assumes a MediaRecorder instance and an
// audioChunks array set up elsewhere to capture microphone input)
mediaRecorder.onstop = async () => {
const audioBlob = new Blob(audioChunks, { type: 'audio/webm' });
const result = await assistant.processVoiceInput(audioBlob);
};
Python
import requests
import os
from pathlib import Path
class VoiceAssistant:
def __init__(self):
self.api_key = os.environ.get('GIRARDAI_API_KEY')
self.base_url = 'https://www.girardai.com/api'
def transcribe(self, audio_path: str) -> str:
"""Transcribe audio to text."""
with open(audio_path, 'rb') as f:
response = requests.post(
f'{self.base_url}/voice/stt',
headers={'Authorization': f'Bearer {self.api_key}'},
files={'audio': f}
)
return response.json()['data']['text']
def get_ai_response(self, text: str) -> str:
"""Get AI response for text."""
response = requests.post(
f'{self.base_url}/chat',
headers={
'Authorization': f'Bearer {self.api_key}',
'Content-Type': 'application/json'
},
json={'messages': [{'role': 'user', 'content': text}]}
)
return response.json()['data']['content']
def speak(self, text: str, output_path: str = 'response.mp3', voice: str = 'nova') -> str:
"""Convert text to speech."""
response = requests.post(
f'{self.base_url}/voice/tts',
headers={
'Authorization': f'Bearer {self.api_key}',
'Content-Type': 'application/json'
},
json={'text': text, 'voice': voice}
)
with open(output_path, 'wb') as f:
f.write(response.content)
return output_path
def process_voice(self, audio_path: str) -> dict:
"""Process voice input end-to-end."""
# 1. Transcribe
user_text = self.transcribe(audio_path)
print(f'User: {user_text}')
# 2. Get AI response
ai_response = self.get_ai_response(user_text)
print(f'AI: {ai_response}')
# 3. Generate speech
audio_output = self.speak(ai_response)
return {
'user_text': user_text,
'ai_response': ai_response,
'audio_output': audio_output
}
# Usage
assistant = VoiceAssistant()
result = assistant.process_voice('user_question.mp3')
Error Responses
TTS Errors
{
"success": false,
"error": "Text is required"
}
| Status | Error | Description |
|---|---|---|
| 400 | "Text is required" | Missing text field |
| 400 | "Text too long" | Exceeds character limit |
| 400 | "Invalid voice" | Unknown voice ID |
| 401 | "Unauthorized" | Invalid API key |
| 503 | "Voice service not configured" | Server issue |
STT Errors
{
"success": false,
"error": "Audio file is required"
}
| Status | Error | Description |
|---|---|---|
| 400 | "Audio file is required" | Missing audio file |
| 400 | "File too large" | Exceeds 25 MB limit |
| 400 | "Unsupported format" | Invalid audio format |
| 401 | "Unauthorized" | Invalid API key |
| 503 | "Voice service not configured" | Server issue |
Limits & Quotas
TTS Limits
| Limit | Value |
|---|---|
| Max text length | 4,096 characters |
| Max requests/minute | 60 (varies by plan) |
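Text over the 4,096-character limit must be split across multiple TTS requests. A minimal sketch of sentence-aware chunking (the `max_len` parameter and splitting strategy are illustrative, not part of the API):

```python
import re

def chunk_text(text, max_len=4096):
    """Split text into chunks of at most max_len characters,
    preferring sentence boundaries so pauses sound natural.
    Single sentences longer than max_len are emitted as-is."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current = [], ''
    for sentence in sentences:
        # Start a new chunk if appending this sentence would overflow
        if current and len(current) + 1 + len(sentence) > max_len:
            chunks.append(current)
            current = sentence
        else:
            current = f'{current} {sentence}'.strip() if current else sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as a separate TTS request and the resulting audio files concatenated.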
STT Limits
| Limit | Value |
|---|---|
| Max file size | 25 MB |
| Max audio duration | 30 minutes |
| Max requests/minute | 30 (varies by plan) |
Credit Usage
| Action | Credits |
|---|---|
| TTS (per 1,000 characters) | 1 |
| STT (per minute of audio) | 2 |
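Based on the rates above, a rough cost estimator (an illustrative helper, not an official SDK function; it assumes partial units round up, so confirm the billing granularity for your plan):

```python
import math

def estimate_credits(tts_chars=0, stt_seconds=0):
    """Estimate credits: 1 per 1,000 TTS characters,
    2 per minute of STT audio, partial units rounded up."""
    tts_credits = math.ceil(tts_chars / 1000) if tts_chars else 0
    stt_credits = 2 * math.ceil(stt_seconds / 60) if stt_seconds else 0
    return tts_credits + stt_credits
```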
Best Practices
For TTS
- Optimize text - Use punctuation for natural pauses
- Choose appropriate voice - Match voice to content type
- Test with samples - Preview before generating long audio
- Handle errors - Implement retry logic
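The retry advice above can be sketched as a generic wrapper with exponential backoff (retry counts and delays are illustrative; retry only transient failures such as 429 or 5xx responses, not 400-level validation errors):

```python
import time

def with_retries(request_fn, max_attempts=3, base_delay=1.0):
    """Call request_fn, retrying on exceptions with exponential
    backoff. request_fn should raise on transient failures
    (e.g. HTTP 429/5xx) and return normally on success."""
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Back off: base_delay, 2x, 4x, ...
            time.sleep(base_delay * (2 ** attempt))

# Usage with the text_to_speech helper defined earlier:
# audio = with_retries(lambda: text_to_speech('Hello', voice='nova'))
```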
For STT
- Use quality audio - Clear recordings transcribe better
- Minimize background noise - Improves accuracy
- Compress large files - Stay under 25 MB limit
- Specify language - Improves accuracy for non-English
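To honor the 25 MB upload limit, it can help to check file size before sending the request (a minimal sketch; the limit value mirrors the STT quota table above):

```python
import os

MAX_STT_BYTES = 25 * 1024 * 1024  # 25 MB limit from the STT quotas

def check_audio_size(path):
    """Return the file size in bytes, raising if it exceeds
    the STT upload limit."""
    size = os.path.getsize(path)
    if size > MAX_STT_BYTES:
        raise ValueError(
            f'{path} is {size / 1_048_576:.1f} MB; '
            f'compress or split it to stay under 25 MB'
        )
    return size
```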
Previous: Agents API | Next: Workflows API