Phemius API

v0.1.0

High-quality text-to-speech API with real-time WebSocket streaming, OpenAI drop-in compatibility, and voice cloning.

Base URL: https://api.phemius.dev

Authentication

All API requests require a valid API key passed via the Authorization header. Keys are prefixed with sk_test_ (test) or sk_live_ (production). Keys are hashed server-side and cannot be recovered — store them securely after creation.

Authorization: Bearer sk_test_abc123...
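
As an illustration of the header format above, the following Python sketch builds the request headers and rejects keys that lack one of the two documented prefixes. The helper name auth_headers is hypothetical and not part of any Phemius SDK; the prefix check is a client-side convenience only.

```python
def auth_headers(api_key: str) -> dict[str, str]:
    """Build HTTP headers for a Phemius API request."""
    # Documented key prefixes: sk_test_ (test) and sk_live_ (production).
    if not api_key.startswith(("sk_test_", "sk_live_")):
        raise ValueError("API key must start with sk_test_ or sk_live_")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Because keys cannot be recovered after creation, it is usually safer to read the key from an environment variable or secret store than to embed it in source code.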

Quickstart

1. Get an API key

Sign up at phemius.dev and create an API key from the dashboard.

2. Make your first request

curl -X POST https://api.phemius.dev/v1/speech \
  -H "Authorization: Bearer sk_test_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Integrate natural sounding speech into your application with just a few lines of code. Low latency, high fidelity, and built for scale.", "voice": "shimmer", "model": "phemius-fast"}' \
  --output speech.pcm

3. Play the audio

Play the raw PCM file:

ffplay -f s16le -ar 24000 -ch_layout mono speech.pcm

POST /v1/speech

Create Speech

Synthesize text to speech. Returns raw PCM audio (16-bit, 24kHz, mono). Characters are billed after successful synthesis.

Request Body

| Parameter | Type | Required / Default | Description |
|---|---|---|---|
| text | string | Required | The text to synthesize. Min: 1, Max: 5,000 characters. |
| voice | string | default | Voice name. Use a built-in name (alloy, echo, fable, onyx, nova, shimmer, aria) or a raw Kokoro voicepack name. |
| model | string (phemius-fast) | phemius-fast | TTS model to use. |
| voice_id | string or null | Optional | UUID of a custom cloned voice. Mutually exclusive with voice. When set, audio caching is disabled. |
| speed | number | 1 | Playback speed multiplier. Range: 0.25–4. |

Response

200 audio/pcm

Raw PCM audio bytes (16-bit signed, 24kHz, mono)

Response Headers

X-Chars-Billed: Number of characters billed for this request

Errors

| Status | Description |
|---|---|
| 401 | Invalid or missing API key |
| 422 | Validation error (text too long, invalid speed, etc.) |
| 429 | Rate limit or monthly character limit exceeded |
| 502 | Speech synthesis failed (upstream model error) |

Examples

curl -X POST https://api.phemius.dev/v1/speech \
  -H "Authorization: Bearer sk_test_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "voice": "nova"}' \
  --output speech.pcm
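
As a rough Python equivalent of the curl example above, the sketch below builds the request with the standard library and checks the documented limits client-side before anything is sent. The helper names build_speech_request and synthesize are illustrative and not part of any Phemius SDK, and counting characters with len() is an assumption about how the 5,000-character limit is measured.

```python
import json
import urllib.request

API_BASE = "https://api.phemius.dev"


def build_speech_request(api_key, text, voice=None, voice_id=None,
                         model="phemius-fast", speed=1.0):
    """Build (but do not send) a POST /v1/speech request."""
    # Client-side checks that mirror the documented request limits.
    if not 1 <= len(text) <= 5000:
        raise ValueError("text must be 1 to 5,000 characters")
    if not 0.25 <= speed <= 4:
        raise ValueError("speed must be between 0.25 and 4")
    if voice is not None and voice_id is not None:
        raise ValueError("voice and voice_id are mutually exclusive")

    body = {"text": text, "model": model, "speed": speed}
    if voice_id is not None:
        body["voice_id"] = voice_id
    elif voice is not None:
        body["voice"] = voice

    return urllib.request.Request(
        f"{API_BASE}/v1/speech",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(request, timeout=30.0):
    """Send the request; return (pcm_bytes, X-Chars-Billed header or None)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read(), resp.headers.get("X-Chars-Billed")
```

A call such as synthesize(build_speech_request(key, "Hello world", voice="nova")) returns the raw PCM bytes; urllib raises urllib.error.HTTPError for the 4xx and 5xx statuses listed above.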

WebSocket /v1/speech/stream

Stream Speech (WebSocket)

Real-time streaming speech synthesis over WebSocket. Audio is delivered as binary frames as it's generated, enabling low-latency playback. Ideal for interactive applications.

Protocol Flow

Client: Send authentication

{
  "api_key": "sk_test_YOUR_KEY"
}

Client: Send synthesis request

{
  "text": "Hello world",
  "voice": "nova",
  "model": "phemius-fast",
  "speed": 1
}

Server: Sends audio chunks as binary WebSocket frames. Each chunk is raw PCM (16-bit, 24kHz, mono). Chunks arrive as they're generated for low-latency streaming.

Server: Sends a final summary message

{
  "done": true,
  "chars_billed": 11,
  "ttfb_ms": 245,
  "cached": false
}

Request Fields

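| Parameter | Type | Required / Default | Description |
|---|---|---|---|
| text | string | Required | Max: 5,000 characters |
| voice | string | default | |
| model | string | phemius-fast | |
| voice_id | string or null | Optional | |
| speed | number | 1 | Range: 0.25–4 |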

Close Codes

| Code | Reason |
|---|---|
| 4001 | Missing or invalid API key |
| 4029 | Rate limit exceeded |
| 4400 | Invalid request (validation error) |

Error Messages

These errors are sent as JSON messages before the connection closes — they are not WebSocket close codes.

| Code | Description | JSON message |
|---|---|---|
| 5000 | Internal server error | {"error": "...", "code": 5000} |
| 5002 | Speech synthesis failed | {"error": "...", "code": 5002} |
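
The two signalling channels above (close codes and JSON error messages) can be handled separately on the client. The Python sketch below is one possible way to do that using only the codes documented here; the function names describe_close and parse_text_frame are hypothetical.

```python
import json

# Close codes and JSON error codes exactly as documented above.
CLOSE_CODES = {
    4001: "Missing or invalid API key",
    4029: "Rate limit exceeded",
    4400: "Invalid request (validation error)",
}
ERROR_CODES = {
    5000: "Internal server error",
    5002: "Speech synthesis failed",
}


def describe_close(code: int) -> str:
    """Translate a WebSocket close code into a readable reason."""
    return CLOSE_CODES.get(code, f"Unrecognized close code {code}")


def parse_text_frame(payload: str) -> dict:
    """Parse a JSON text frame; raise if it carries an error message."""
    msg = json.loads(payload)
    if "error" in msg:
        code = msg.get("code")
        label = ERROR_CODES.get(code, "Unknown error")
        raise RuntimeError(f"{label} ({code}): {msg['error']}")
    return msg
```

Binary frames carry audio and never pass through parse_text_frame; only text frames (the final summary or an error message) do.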

Examples

const ws = new WebSocket('wss://api.phemius.dev/v1/speech/stream');

ws.onopen = () => {
  ws.send(JSON.stringify({ api_key: 'sk_test_YOUR_KEY' }));
  ws.send(JSON.stringify({
    text: 'Hello world',
    voice: 'nova',
    speed: 1.0,
  }));
};

const chunks = [];
ws.onmessage = (event) => {
  if (event.data instanceof Blob) {
    // Binary audio chunk — append to buffer or play immediately
    chunks.push(event.data);
  } else {
    const msg = JSON.parse(event.data);
    if (msg.done) {
      console.log(`Billed: ${msg.chars_billed} chars, TTFB: ${msg.ttfb_ms}ms`);
    } else if (msg.error) {
      console.error(`Error ${msg.code}: ${msg.error}`);
    }
  }
};

POST /v1/audio/speech

Create Speech (OpenAI Compatible)

Drop-in replacement for the OpenAI TTS API. Use your existing OpenAI SDK code — just change the base URL and API key. Always returns raw PCM audio.

Request Body

| Parameter | Type | Required / Default | Description |
|---|---|---|---|
| model | string | tts-1 | Must be tts-1. Other values return 400. |
| input | string | Required | The text to synthesize. Min: 1, Max: 10,000 characters. |
| voice | string | alloy | OpenAI voice name (alloy, echo, fable, onyx, nova, shimmer) or any Phemius voice name. |
| speed | number | 1 | Playback speed multiplier. Range: 0.25–4. |

Response

200 audio/pcm

Raw PCM audio bytes (16-bit signed, 24kHz, mono)

Response Headers

X-Chars-Billed: Number of characters billed
X-Request-Id: Unique request identifier

Errors

| Status | Description |
|---|---|
| 400 | Unsupported model (use tts-1) |
| 401 | Invalid or missing API key |
| 422 | Validation error |
| 429 | Rate limit exceeded |
| 502 | Speech synthesis failed |

Voice Mapping

OpenAI voice names are first mapped to Phemius voice names, which then resolve to Kokoro voicepacks. You can also pass Phemius voice names or raw voicepack names directly.

| OpenAI Voice | Phemius Voice | Kokoro Voicepack |
|---|---|---|
| alloy | aria | af_bella |
| echo | marcus | am_adam |
| fable | sophia | bf_emma |
| onyx | orion | bm_lewis |
| nova | nova | af_bella |
| shimmer | aria | af_bella |
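
To make the two-step lookup concrete, here is a small Python sketch that reproduces the table above as dictionaries. It only illustrates the resolution order; the actual mapping happens server-side, and the function name resolve_voicepack is hypothetical.

```python
# Step 1: OpenAI voice name -> Phemius voice name (from the table above).
OPENAI_TO_PHEMIUS = {
    "alloy": "aria",
    "echo": "marcus",
    "fable": "sophia",
    "onyx": "orion",
    "nova": "nova",
    "shimmer": "aria",
}

# Step 2: Phemius voice name -> Kokoro voicepack (from the table above).
PHEMIUS_TO_KOKORO = {
    "aria": "af_bella",
    "marcus": "am_adam",
    "sophia": "bf_emma",
    "orion": "bm_lewis",
    "nova": "af_bella",
}


def resolve_voicepack(voice: str) -> str:
    """Resolve a voice name to a voicepack; unknown names pass through unchanged."""
    phemius_name = OPENAI_TO_PHEMIUS.get(voice, voice)
    return PHEMIUS_TO_KOKORO.get(phemius_name, phemius_name)
```

A name that matches neither dictionary is returned as-is, which models passing a raw voicepack name directly.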

Examples

curl -X POST https://api.phemius.dev/v1/audio/speech \
  -H "Authorization: Bearer sk_test_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "tts-1", "input": "Integrate natural sounding speech into your application with just a few lines of code. Low latency, high fidelity, and built for scale.", "voice": "shimmer"}' \
  --output speech.pcm

GET /v1/voices

List Voices

List all custom voices belonging to the authenticated user.

Response

200 application/json

{
  "voices": [
    {
      "id": "uuid",
      "name": "string",
      "model": "phemius-fast",
      "created_at": "datetime"
    }
  ]
}

Errors

| Status | Description |
|---|---|
| 401 | Invalid or missing API key |

Examples

curl -H "Authorization: Bearer sk_test_YOUR_KEY" \
  https://api.phemius.dev/v1/voices

POST /v1/voices

Create Voice

Create a new custom voice. Free plan is limited to 2 voices.

Request Body

| Parameter | Type | Required / Default | Description |
|---|---|---|---|
| name | string | Required | Display name for the voice. Min: 1, Max: 100 characters. |
| model | string (phemius-fast) | phemius-fast | TTS model for this voice. |

Response

201 application/json

{
  "id": "uuid",
  "name": "string",
  "model": "string",
  "created_at": "datetime"
}

Errors

| Status | Description |
|---|---|
| 401 | Invalid or missing API key |
| 403 | Free plan limited to 2 voices |
| 422 | Validation error (name too long, etc.) |

Examples

curl -X POST https://api.phemius.dev/v1/voices \
  -H "Authorization: Bearer sk_test_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "Alice"}'

DELETE /v1/voices/{voice_id}

Delete Voice

Delete a custom voice by ID. Only the owner can delete their voices.

Path Parameters

| Parameter | Type | Required / Default | Description |
|---|---|---|---|
| voice_id | string | Required | The voice ID to delete |

Response

204

Voice deleted successfully (no body)

Errors

| Status | Description |
|---|---|
| 401 | Invalid or missing API key |
| 404 | Voice not found |

Examples

curl -X DELETE https://api.phemius.dev/v1/voices/YOUR_VOICE_ID \
  -H "Authorization: Bearer sk_test_YOUR_KEY"

GET /v1/jobs

List Jobs

List all async jobs (voice cloning, bulk synthesis) for the authenticated user, ordered by most recent first.

Response

200 application/json

{
  "jobs": [
    {
      "id": "uuid",
      "type": "string",
      "status": "queued",
      "progress": "number",
      "result": "object|null",
      "error": "string|null",
      "created_at": "datetime",
      "updated_at": "datetime"
    }
  ]
}

Errors

| Status | Description |
|---|---|
| 401 | Invalid or missing API key |

Examples

curl -H "Authorization: Bearer sk_test_YOUR_KEY" \
  https://api.phemius.dev/v1/jobs

GET /v1/jobs/{job_id}

Get Job

Get the status and details of a specific async job.

Path Parameters

| Parameter | Type | Required / Default | Description |
|---|---|---|---|
| job_id | string | Required | The job ID to retrieve |

Response

200 application/json

{
  "id": "uuid",
  "type": "string",
  "status": "queued",
  "progress": "number",
  "result": "object|null",
  "error": "string|null",
  "created_at": "datetime",
  "updated_at": "datetime"
}

Errors

| Status | Description |
|---|---|
| 401 | Invalid or missing API key |
| 404 | Job not found |

Examples

curl -H "Authorization: Bearer sk_test_YOUR_KEY" \
  https://api.phemius.dev/v1/jobs/YOUR_JOB_ID
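
Since jobs are asynchronous, a client typically polls this endpoint until the job finishes. The Python sketch below shows one way to structure that loop. It rests on an assumption: the schema above only shows the status value "queued", so the terminal status names "completed" and "failed" are hypothetical placeholders that should be replaced with the values the API actually returns. The fetch_job callable is also hypothetical; it stands in for whatever function performs GET /v1/jobs/{job_id} and returns the parsed JSON.

```python
import time
from typing import Callable

# ASSUMPTION: these terminal status names are placeholders; the reference
# above documents only "queued".
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(fetch_job: Callable[[], dict], interval: float = 2.0,
                 max_polls: int = 30, sleep=time.sleep) -> dict:
    """Poll fetch_job() until the job looks finished or max_polls is reached."""
    for attempt in range(max_polls):
        job = fetch_job()
        # A non-null "error" field is treated as terminal regardless of status.
        if job.get("error") is not None or job["status"] in TERMINAL_STATUSES:
            return job
        if attempt < max_polls - 1:
            sleep(interval)
    raise TimeoutError(f"job still not finished after {max_polls} polls")
```

Keep the polling interval well inside the plan's requests-per-minute limit; at 5 requests per minute, an interval below 12 seconds would exhaust the Free plan's quota on its own.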

OpenAI TTS Migration

Phemius is a drop-in replacement for the OpenAI TTS API. Change two lines, the base URL and the API key, and your existing request code keeps working. The response is always raw PCM, so update any code that expects another audio format.

Before (OpenAI)

from openai import OpenAI
client = OpenAI(api_key="sk-...")

response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello world",
)
response.stream_to_file("speech.mp3")  # OpenAI returns MP3 by default

After (Phemius)

from openai import OpenAI
client = OpenAI(
    api_key="sk_test_YOUR_PHEMIUS_KEY",
    base_url="https://api.phemius.dev/v1",
)

response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello world",
)
response.stream_to_file("speech.pcm")

Key Differences

  • Only tts-1 model is supported (tts-1-hd is not available)
  • All six OpenAI voices are mapped to Kokoro equivalents
  • Responses are always raw PCM audio; there is no response_format parameter and no format conversion
  • Same speed range: 0.25x to 4.0x
  • Higher character limit per request: 10,000 chars (vs OpenAI's 4,096)

Audio Formats

Native Endpoint /v1/speech

The native /v1/speech endpoint always returns raw PCM audio (16-bit signed, 24kHz, mono). There is no response_format parameter — the output is always PCM. To play it:

ffplay -f s16le -ar 24000 -ch_layout mono speech.pcm
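
Most players and audio libraries expect a container rather than headerless PCM. As a minimal sketch, the Python standard library's wave module can wrap the response bytes in a WAV header using the parameters stated above (16-bit, 24kHz, mono). The function name pcm_to_wav is illustrative.

```python
import wave


def pcm_to_wav(pcm: bytes, target) -> None:
    """Write raw 16-bit, 24kHz, mono PCM to a WAV file.

    target may be a file path or a writable, seekable binary file object.
    """
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 16-bit samples = 2 bytes each
        wav.setframerate(24000)  # 24kHz
        wav.writeframes(pcm)
```

For example, pcm_to_wav(open("speech.pcm", "rb").read(), "speech.wav") produces a file that ordinary players can open without extra flags.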

OpenAI-Compatible Endpoint /v1/audio/speech

The OpenAI-compatible /v1/audio/speech endpoint also always returns raw PCM audio. There is no response_format parameter.

WebSocket Streaming

The WebSocket endpoint at /v1/speech/stream enables real-time audio streaming with low time-to-first-byte. Audio chunks are sent as binary WebSocket frames as they're generated.

Audio Specification

Format: Raw PCM
Bit depth: 16
Sample rate: 24000 Hz
Channels: 1
Byte order: little-endian
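
These parameters fix the data rate at 24,000 samples × 2 bytes × 1 channel = 48,000 bytes per second, which makes it easy to work out how much audio a buffer holds. The Python sketch below computes that duration and decodes the little-endian samples; the function names are illustrative.

```python
import struct

SAMPLE_RATE = 24_000   # samples per second
BYTES_PER_SAMPLE = 2   # 16-bit
CHANNELS = 1           # mono


def duration_seconds(num_bytes: int) -> float:
    """Length of audio, in seconds, represented by num_bytes of PCM."""
    return num_bytes / (SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS)


def decode_samples(pcm: bytes) -> list[int]:
    """Decode little-endian signed 16-bit samples; a trailing odd byte is ignored."""
    count = len(pcm) // BYTES_PER_SAMPLE
    return list(struct.unpack(f"<{count}h", pcm[: count * BYTES_PER_SAMPLE]))
```

For instance, a 96,000-byte buffer holds 2 seconds of audio.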

Connection Flow

  1. Connect to wss://api.phemius.dev/v1/speech/stream
  2. Send JSON: {"api_key": "sk_test_..."}
  3. Send JSON: {"text": "...", "voice": "...", ...}
  4. Receive binary audio chunks
  5. Receive JSON: {"done": true, "chars_billed": N, "ttfb_ms": N, "cached": bool}

Responses under 60 seconds are automatically cached. Subsequent identical requests return cached audio instantly. Requests with voice_id (custom voices) bypass the cache.
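
The connection flow above can be sketched in Python as a single coroutine. To stay independent of any particular WebSocket library, the sketch assumes only that the connection object has an awaitable send() method and can be iterated with async for, yielding bytes for binary frames and str for text frames. The function name stream_speech is hypothetical.

```python
import json


async def stream_speech(ws, api_key: str, request: dict) -> tuple[bytes, dict]:
    """Run steps 2 to 5 of the flow on an already-open connection.

    Returns (pcm_audio, summary), where summary is the final JSON message.
    """
    # Steps 2 and 3: authenticate, then send the synthesis request.
    await ws.send(json.dumps({"api_key": api_key}))
    await ws.send(json.dumps(request))

    audio = bytearray()
    async for frame in ws:
        # Step 4: binary frames are raw PCM chunks.
        if isinstance(frame, (bytes, bytearray)):
            audio.extend(frame)
            continue
        # Text frames are JSON: either an error or the final summary.
        msg = json.loads(frame)
        if "error" in msg:
            raise RuntimeError(f"Error {msg.get('code')}: {msg['error']}")
        if msg.get("done"):
            # Step 5: the summary marks the end of the stream.
            return bytes(audio), msg
    raise ConnectionError("connection closed before the final summary message")
```

The third-party websockets package provides a compatible connection object: open it with websockets.connect("wss://api.phemius.dev/v1/speech/stream") inside an async with block and pass it to stream_speech. For playback with the lowest latency, hand each chunk to the audio output as it arrives instead of accumulating it.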

Rate Limits

Rate limits are applied per API key. Request limits (requests per minute) use a rolling window; character limits reset monthly.

| Plan | Requests/min | Monthly Characters | Note |
|---|---|---|---|
| Free | 5 | 10,000 | Hard capped, no overage |
| Developer | 20 | 500,000 included | $8/1M chars overage |
| Growth | 100 | 2,000,000 included | $6/1M chars overage |

The Retry-After header indicates the number of seconds until the limit resets. WebSocket connections are closed with code 4029 when rate limited.
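
A client can use the Retry-After header to decide how long to wait before retrying a 429 response. The Python sketch below is one possible policy, not a prescribed one: it honors Retry-After when present and otherwise falls back to capped exponential backoff, which is an assumption of this example. The function name retry_delay is hypothetical. Note that a 429 caused by the monthly character cap will not clear by waiting a few seconds.

```python
from typing import Mapping, Optional


def retry_delay(status: int, headers: Mapping[str, str], attempt: int,
                cap: float = 60.0) -> Optional[float]:
    """Seconds to wait before retrying, or None if the status is not 429.

    attempt is the zero-based retry count, used only for the fallback backoff.
    """
    if status != 429:
        return None
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            # Documented as the number of seconds until the limit resets.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # unparseable value: fall through to the backoff below
    # Fallback (assumption): exponential backoff of 1s, 2s, 4s, ... up to cap.
    return min(2.0 ** attempt, cap)
```

The headers argument is a plain mapping here, so the lookup is case-sensitive; response header objects from HTTP libraries are usually case-insensitive and can be passed directly.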

Error Handling

HTTP Errors

| Status | Description |
|---|---|
| 400 | Bad request: invalid model or malformed request |
| 401 | Unauthorized: missing, invalid, or revoked API key |
| 403 | Forbidden: plan limit reached (e.g. max voices) |
| 404 | Not found: resource doesn't exist or doesn't belong to you |
| 422 | Validation error: request body failed schema validation |
| 429 | Rate limited: too many requests or monthly character cap exceeded |
| 502 | Bad gateway: upstream TTS model failed |

WebSocket Close Codes

| Code | Description |
|---|---|
| 4001 | Authentication failed |
| 4029 | Rate limit exceeded |
| 4400 | Invalid request body |

WebSocket Error Messages

Sent as JSON before the connection closes — not WebSocket close frame codes.

| Code | Description |
|---|---|
| 5000 | Internal server error |
| 5002 | Speech synthesis failed |

Built-in Voices

The following voices are available with any speech endpoint. Pass the name as the voice parameter.

| Name | Voicepack | Gender | Description |
|---|---|---|---|
| default | af_heart | female | Default voice |
| alloy | af_heart | female | Warm and balanced |
| echo | am_adam | male | Clear and articulate |
| fable | bf_emma | female | Expressive storyteller |
| onyx | bm_lewis | male | Deep and resonant |
| nova | af_bella | female | Bright and energetic |
| shimmer | af_sky | female | Soft and gentle |
| aria | af_bella | female | Natural and expressive |

Pass any Kokoro voicepack name directly (e.g. af_heart, am_adam). If the voice name doesn't match a built-in alias, it's used as a raw voicepack name.

Plans & Limits

| Plan | Rate Limit | Monthly Chars | Max Voices | Billing |
|---|---|---|---|---|
| Free | 5 RPM | 10,000 | 2 | No billing; hard capped at 10,000 chars/month |
| Developer | 20 RPM | Unlimited | 10 | $10/mo + $8/1M chars overage |
| Growth | 100 RPM | Unlimited | Unlimited | $25/mo + $6/1M chars overage |
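
As a worked example of the pricing above, the Python sketch below estimates a monthly bill. It combines the base prices in this table with the included character allowances from the Rate Limits section (500,000 for Developer, 2,000,000 for Growth) and assumes overage is billed pro rata per character, which this page does not state explicitly. The function name monthly_cost is hypothetical.

```python
# Base price (USD/month), included characters, and overage price per 1M chars.
PLANS = {
    "free": {"base": 0.0, "included": 10_000, "overage_per_million": None},
    "developer": {"base": 10.0, "included": 500_000, "overage_per_million": 8.0},
    "growth": {"base": 25.0, "included": 2_000_000, "overage_per_million": 6.0},
}


def monthly_cost(plan: str, chars: int) -> float:
    """Estimated monthly cost in USD for synthesizing chars characters."""
    p = PLANS[plan]
    extra = max(0, chars - p["included"])
    if extra == 0:
        return p["base"]
    if p["overage_per_million"] is None:
        # The Free plan is hard capped: usage beyond the allowance is rejected.
        raise ValueError(f"{plan} plan is hard capped at {p['included']:,} characters")
    # ASSUMPTION: overage is charged pro rata rather than in whole 1M blocks.
    return p["base"] + extra / 1_000_000 * p["overage_per_million"]
```

Under these assumptions, 1,500,000 characters on the Developer plan come to $10 + 1.0 × $8 = $18, while the same volume on Growth stays within the allowance at $25.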