Phemius API

v0.1.0

High-quality text-to-speech API with real-time WebSocket streaming, OpenAI drop-in compatibility, and voice cloning.

Base URL: https://api.phemius.dev

Authentication

All API requests require a valid API key passed via the Authorization header. Keys are prefixed with sk_test_ (test) or sk_live_ (production). Keys are hashed server-side and cannot be recovered — store them securely after creation.

Authorization: Bearer sk_test_abc123...
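As a minimal sketch of the header format above, the following Python helper builds the Authorization header and rejects keys that do not carry one of the two documented prefixes. The function name auth_headers is hypothetical and not part of any Phemius SDK.

```python
def auth_headers(api_key: str) -> dict[str, str]:
    """Build the Authorization header for a Phemius API key."""
    # Documented key prefixes: sk_test_ (test) and sk_live_ (production).
    if not api_key.startswith(("sk_test_", "sk_live_")):
        raise ValueError("API key must start with sk_test_ or sk_live_")
    return {"Authorization": f"Bearer {api_key}"}
```

Checking the prefix locally only catches obvious mistakes such as pasting the wrong credential. The server remains the authority on whether a key is valid.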

Quickstart

Get an API key

Make your first request

curl -X POST https://api.phemius.dev/v1/speech \
  -H "Authorization: Bearer sk_test_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Integrate natural sounding speech into your application with just a few lines of code. Low latency, high fidelity, and built for scale.", "voice": "shimmer", "model": "phemius-fast"}' \
  --output speech.pcm

Play the audio

Play the raw PCM file: ffplay -f s16le -ar 24000 -ch_layout mono speech.pcm

POST /v1/speech

Create Speech

Synthesize text to speech. Returns raw PCM audio (16-bit, 24kHz, mono). Characters are billed after successful synthesis.

Request Body

Parameter	Type	Required / Default	Description
`text`	`string`	Required	The text to synthesize. Min: 1, Max: 5,000 characters
`voice`	`string`	`default`	Voice name. Use a built-in name (alloy, echo, fable, onyx, nova, shimmer, aria) or a raw Kokoro voicepack name.
`model`	`string (phemius-fast)`	`phemius-fast`	TTS model to use
`voice_id`	`string\|null`	—	UUID of a custom cloned voice. Mutually exclusive with `voice`. When set, audio caching is disabled.
`speed`	`number`	`1`	Playback speed multiplier. Range: 0.25–4
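The constraints in the table above can be checked client-side before a request is sent, which avoids a round trip that would end in a 422. This is a sketch only: build_speech_payload is a hypothetical helper, and it assumes the 5,000 limit counts Unicode code points the way Python's len() does, which the documentation does not state.

```python
from typing import Optional

MAX_TEXT_CHARS = 5000          # documented maximum for `text`
MIN_SPEED, MAX_SPEED = 0.25, 4.0  # documented range for `speed`


def build_speech_payload(
    text: str,
    voice: Optional[str] = None,
    voice_id: Optional[str] = None,
    speed: float = 1.0,
    model: str = "phemius-fast",
) -> dict:
    """Validate the documented constraints and return a JSON-ready body."""
    if not 1 <= len(text) <= MAX_TEXT_CHARS:
        raise ValueError(f"text must be 1 to {MAX_TEXT_CHARS} characters")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    # `voice` and `voice_id` are documented as mutually exclusive.
    if voice is not None and voice_id is not None:
        raise ValueError("pass either voice or voice_id, not both")

    payload = {"text": text, "model": model, "speed": speed}
    if voice_id is not None:
        payload["voice_id"] = voice_id
    elif voice is not None:
        payload["voice"] = voice
    # With neither set, the server-side default voice applies.
    return payload
```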

Response

200 audio/pcm

Raw PCM audio bytes (16-bit signed, 24kHz, mono)

Response Headers

X-Chars-Billed: Number of characters billed for this request

Errors

Status	Description
`401`	Invalid or missing API key
`422`	Validation error (text too long, invalid speed, etc.)
`429`	Rate limit or monthly character limit exceeded
`502`	Speech synthesis failed (upstream model error)

Examples

curl -X POST https://api.phemius.dev/v1/speech \
  -H "Authorization: Bearer sk_test_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "voice": "nova"}' \
  --output speech.pcm
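For readers who prefer Python to curl, the same request can be sketched with the standard library alone. The helper names build_speech_request and synthesize are hypothetical. The endpoint, headers, and body fields are taken from the curl example above.

```python
import json
import urllib.request

BASE_URL = "https://api.phemius.dev"


def build_speech_request(api_key: str, text: str, voice: str = "nova") -> urllib.request.Request:
    """Prepare (but do not send) a POST /v1/speech request."""
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/speech",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def synthesize(api_key: str, text: str, out_path: str = "speech.pcm") -> str:
    """Send the request, save the raw PCM, and return the X-Chars-Billed header."""
    request = build_speech_request(api_key, text)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        billed = response.headers.get("X-Chars-Billed")
        with open(out_path, "wb") as f:
            f.write(response.read())
    return billed
```

Calling synthesize("sk_test_YOUR_KEY", "Hello world") would write speech.pcm in the current directory, mirroring the --output flag in the curl example.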

WebSocket /v1/speech/stream

Stream Speech (WebSocket)

Real-time streaming speech synthesis over WebSocket. Audio is delivered as binary frames as it's generated, enabling low-latency playback. Ideal for interactive applications.

Protocol Flow

Client

Send authentication

{
  "api_key": "sk_test_YOUR_KEY"
}

Client

Send synthesis request

{
  "text": "Hello world",
  "voice": "nova",
  "model": "phemius-fast",
  "speed": 1
}

Server

Receive audio chunks as binary WebSocket frames. Each chunk is raw PCM (16-bit, 24kHz, mono). Chunks arrive as they're generated for low-latency streaming.

Server

Final summary message

{
  "done": true,
  "chars_billed": 11,
  "ttfb_ms": 245,
  "cached": false
}

Request Fields

Parameter	Type	Required / Default	Description
`text`	`string`	Required	Max: 5,000
`voice`	`string`	`default`
`model`	`string`	`phemius-fast`
`voice_id`	`string\|null`	—
`speed`	`number`	`1`	Range: 0.25–4

Close Codes

Code	Reason
`4001`	Missing or invalid API key
`4029`	Rate limit exceeded
`4400`	Invalid request (validation error)

Error Messages

These errors are sent as JSON messages before the connection closes — they are not WebSocket close codes.

Code	Description
`5000`	Internal server error — sent as JSON `{"error": "...", "code": 5000}` before the connection closes
`5002`	Speech synthesis failed — sent as JSON `{"error": "...", "code": 5002}` before the connection closes
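The protocol above mixes binary frames (audio) with JSON text frames (the final summary or an error), so a client has to branch on the frame type. The sketch below shows that dispatch in Python without committing to a particular WebSocket library. StreamState and handle_frame are hypothetical names, and the code assumes your library delivers binary frames as bytes and text frames as str.

```python
import json
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class StreamState:
    """Accumulates what the server has sent on one stream."""
    audio: bytearray = field(default_factory=bytearray)
    summary: Optional[dict] = None  # the {"done": true, ...} message
    error: Optional[dict] = None    # an {"error": ..., "code": ...} message

    @property
    def finished(self) -> bool:
        return self.summary is not None or self.error is not None


def handle_frame(state: StreamState, frame: Union[bytes, str]) -> StreamState:
    """Route one incoming WebSocket frame into the stream state."""
    if isinstance(frame, (bytes, bytearray)):
        # Binary frame: raw PCM chunk (16-bit, 24kHz, mono).
        state.audio.extend(frame)
        return state
    message = json.loads(frame)
    if message.get("done"):
        state.summary = message
    elif "error" in message:
        state.error = message
    return state
```

A receive loop would call handle_frame for each incoming frame and stop once state.finished is true or the connection closes with one of the close codes listed above.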

Examples

const ws = new WebSocket('wss://api.phemius.dev/v1/speech/stream');

ws.onopen = () => {
  ws.send(JSON.stringify({ api_key: 'sk_test_YOUR_KEY' }));
  ws.send(JSON.stringify({
    text: 'Hello world',
    voice: 'nova',
    speed: 1.0,
  }));
};

const chunks = [];
ws.onmessage = (event) => {
  if (event.data instanceof Blob) {
    // Binary audio chunk — append to buffer or play immediately
    chunks.push(event.data);
  } else {
    const msg = JSON.parse(event.data);
    if (msg.done) {
      console.log(`Billed: ${msg.chars_billed} chars, TTFB: ${msg.ttfb_ms}ms`);
    } else if (msg.error) {
      console.error(`Error ${msg.code}: ${msg.error}`);
    }
  }
};

POST /v1/audio/speech

Create Speech (OpenAI Compatible)

Drop-in replacement for the OpenAI TTS API. Use your existing OpenAI SDK code — just change the base URL and API key. Always returns raw PCM audio.

Request Body

Parameter	Type	Required / Default	Description
`model`	`string`	`tts-1`	Must be `tts-1`. Other values return 400.
`input`	`string`	Required	The text to synthesize. Min: 1, Max: 10,000 characters
`voice`	`string`	`alloy`	OpenAI voice name (alloy, echo, fable, onyx, nova, shimmer) or any Phemius voice name
`speed`	`number`	`1`	Playback speed multiplier. Range: 0.25–4

Response

200 audio/pcm

Raw PCM audio bytes (16-bit signed, 24kHz, mono)

Response Headers

X-Chars-Billed: Number of characters billed

X-Request-Id: Unique request identifier

Errors

Status	Description
`400`	Unsupported model (use tts-1)
`401`	Invalid or missing API key
`422`	Validation error
`429`	Rate limit exceeded
`502`	Speech synthesis failed

Voice Mapping

OpenAI voice names are first mapped to Phemius voice names, which then resolve to Kokoro voicepacks. You can also pass Phemius voice names or raw voicepack names directly.

OpenAI Voice	Phemius Voice	Kokoro Voicepack
`alloy`	`aria`	`af_bella`
`echo`	`marcus`	`am_adam`
`fable`	`sophia`	`bf_emma`
`onyx`	`orion`	`bm_lewis`
`nova`	`nova`	`af_bella`
`shimmer`	`aria`	`af_bella`
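The two-step resolution described above (OpenAI name, then Phemius name, then voicepack) can be sketched as a pair of dictionary lookups with pass-through for unknown names. The dictionaries below simply transcribe the table in this section. resolve_voicepack is a hypothetical name, and the actual server-side resolution may differ.

```python
# Transcribed from the Voice Mapping table above.
OPENAI_TO_PHEMIUS = {
    "alloy": "aria",
    "echo": "marcus",
    "fable": "sophia",
    "onyx": "orion",
    "nova": "nova",
    "shimmer": "aria",
}
PHEMIUS_TO_VOICEPACK = {
    "aria": "af_bella",
    "marcus": "am_adam",
    "sophia": "bf_emma",
    "orion": "bm_lewis",
    "nova": "af_bella",
}


def resolve_voicepack(voice: str) -> str:
    """Map an OpenAI or Phemius voice name to a Kokoro voicepack name."""
    # Step 1: OpenAI name -> Phemius name (Phemius names pass through).
    phemius_name = OPENAI_TO_PHEMIUS.get(voice, voice)
    # Step 2: Phemius name -> voicepack (raw voicepack names pass through).
    return PHEMIUS_TO_VOICEPACK.get(phemius_name, phemius_name)
```

With this table, resolve_voicepack("echo") and resolve_voicepack("marcus") both yield am_adam, while a raw voicepack name such as af_heart is returned unchanged.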

Examples

curl -X POST https://api.phemius.dev/v1/audio/speech \
  -H "Authorization: Bearer sk_test_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "tts-1", "input": "Integrate natural sounding speech into your application with just a few lines of code. Low latency, high fidelity, and built for scale.", "voice": "shimmer"}' \
  --output speech.pcm

GET /v1/voices

List Voices

List all custom voices belonging to the authenticated user.

Response

200 application/json

{
  "voices": [
    {
      "id": "uuid",
      "name": "string",
      "model": "phemius-fast",
      "created_at": "datetime"
    }
  ]
}

Errors

Status	Description
`401`	Invalid or missing API key

Examples

curl -H "Authorization: Bearer sk_test_YOUR_KEY" \
  https://api.phemius.dev/v1/voices

POST /v1/voices

Create Voice

Create a new custom voice. Free plan is limited to 2 voices.

Request Body

Parameter	Type	Required / Default	Description
`name`	`string`	Required	Display name for the voice Min: 1 Max: 100
`model`	`string (phemius-fast)`	`phemius-fast`	TTS model for this voice

Response

201 application/json

{
  "id": "uuid",
  "name": "string",
  "model": "string",
  "created_at": "datetime"
}

Errors

Status	Description
`401`	Invalid or missing API key
`403`	Free plan limited to 2 voices
`422`	Validation error (name too long, etc.)

Examples

curl -X POST https://api.phemius.dev/v1/voices \
  -H "Authorization: Bearer sk_test_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "Alice"}'

DELETE /v1/voices/{voice_id}

Delete Voice

Delete a custom voice by ID. Only the owner can delete their voices.

Path Parameters

Parameter	Type	Required / Default	Description
`voice_id`	`string`	—	The voice ID to delete

Response

204

Voice deleted successfully (no body)

Errors

Status	Description
`401`	Invalid or missing API key
`404`	Voice not found

Examples

curl -X DELETE https://api.phemius.dev/v1/voices/YOUR_VOICE_ID \
  -H "Authorization: Bearer sk_test_YOUR_KEY"

GET /v1/jobs

List Jobs

List all async jobs (voice cloning, bulk synthesis) for the authenticated user, ordered by most recent first.

Response

200 application/json

{
  "jobs": [
    {
      "id": "uuid",
      "type": "string",
      "status": "queued",
      "progress": "number",
      "result": "object|null",
      "error": "string|null",
      "created_at": "datetime",
      "updated_at": "datetime"
    }
  ]
}

Errors

Status	Description
`401`	Invalid or missing API key

Examples

curl -H "Authorization: Bearer sk_test_YOUR_KEY" \
  https://api.phemius.dev/v1/jobs

GET /v1/jobs/{job_id}

Get Job

Get the status and details of a specific async job.

Path Parameters

Parameter	Type	Required / Default	Description
`job_id`	`string`	—	The job ID to retrieve

Response

200 application/json

{
  "id": "uuid",
  "type": "string",
  "status": "queued",
  "progress": "number",
  "result": "object|null",
  "error": "string|null",
  "created_at": "datetime",
  "updated_at": "datetime"
}
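Because jobs are asynchronous, a client will typically poll this endpoint until the job leaves the queue. The sketch below shows one way to do that in Python. Only the status value "queued" appears in this documentation, so the function treats any other status as "no longer pending". If the API has further in-progress statuses, add them to the pending argument. wait_for_job and its parameters are hypothetical, and fetch_job stands for whatever function performs GET /v1/jobs/{job_id} and returns the decoded JSON.

```python
import time
from typing import Callable, Collection


def wait_for_job(
    fetch_job: Callable[[], dict],
    pending: Collection[str] = ("queued",),  # assumption: only "queued" is documented
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the job's status is no longer in `pending`, or time out."""
    deadline = clock() + timeout
    while True:
        job = fetch_job()
        if job["status"] not in pending:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job still {job['status']!r} after {timeout}s")
        sleep(interval)
```

The sleep and clock parameters exist so the loop can be exercised in tests without real delays. The polling interval should stay well within your plan's requests-per-minute limit.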

Errors

Status	Description
`401`	Invalid or missing API key
`404`	Job not found

Examples

curl -H "Authorization: Bearer sk_test_YOUR_KEY" \
  https://api.phemius.dev/v1/jobs/YOUR_JOB_ID

OpenAI TTS Migration

Phemius is a drop-in replacement for the OpenAI TTS API. Change two lines — base URL and API key — and your existing code works immediately.

Before (OpenAI)

from openai import OpenAI
client = OpenAI(api_key="sk-...")

response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello world",
)
response.stream_to_file("speech.pcm")

After (Phemius)

from openai import OpenAI
client = OpenAI(
    api_key="sk_test_YOUR_PHEMIUS_KEY",
    base_url="https://api.phemius.dev/v1",
)

response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello world",
)
response.stream_to_file("speech.pcm")

Key Differences

Only tts-1 model is supported (tts-1-hd is not available)
All six OpenAI voices are mapped to Kokoro equivalents
Both endpoints return raw PCM audio — no format conversion
Same speed range: 0.25x to 4.0x
Higher character limit per request: 10,000 chars (vs OpenAI's 4,096)

Audio Formats

Native Endpoint `/v1/speech`

The native /v1/speech endpoint always returns raw PCM audio (16-bit signed, 24kHz, mono). There is no response_format parameter — the output is always PCM. To play it:

ffplay -f s16le -ar 24000 -ch_layout mono speech.pcm
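Many players and libraries cannot open headerless PCM directly. Since the format is fixed (16-bit signed, 24kHz, mono), a WAV header can be added with Python's standard wave module. This is a minimal sketch, and pcm_to_wav_bytes is a hypothetical helper name.

```python
import io
import wave

SAMPLE_RATE = 24000  # Hz
SAMPLE_WIDTH = 2     # bytes per sample (16-bit)
CHANNELS = 1         # mono


def pcm_to_wav_bytes(pcm: bytes) -> bytes:
    """Wrap raw little-endian PCM in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    # wave leaves a caller-supplied file object open, so getvalue() is safe here.
    return buffer.getvalue()
```

To convert a saved response, read speech.pcm in binary mode, pass the bytes to pcm_to_wav_bytes, and write the result to speech.wav.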

OpenAI-Compatible Endpoint `/v1/audio/speech`

The OpenAI-compatible /v1/audio/speech endpoint also always returns raw PCM audio. There is no response_format parameter.

WebSocket Streaming

The WebSocket endpoint at /v1/speech/stream enables real-time audio streaming with low time-to-first-byte. Audio chunks are sent as binary WebSocket frames as they're generated.

Audio Specification

format: Raw PCM
bit depth: 16-bit signed
sample rate: 24000 Hz
channels: 1 (mono)
byte order: little-endian
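Given the fixed audio parameters (16-bit signed samples, 24000 Hz, mono, little-endian), the playback duration of a buffer follows directly from its size, and samples can be decoded with the standard struct module. The helper names below are hypothetical and the code is a sketch.

```python
import struct


def pcm_duration_seconds(
    num_bytes: int,
    sample_rate: int = 24000,
    bytes_per_sample: int = 2,
    channels: int = 1,
) -> float:
    """Duration of a raw PCM buffer: bytes / (rate * width * channels)."""
    return num_bytes / (sample_rate * bytes_per_sample * channels)


def decode_samples(pcm: bytes) -> list[int]:
    """Decode little-endian signed 16-bit samples into Python ints."""
    count = len(pcm) // 2  # ignore a trailing odd byte from a partial chunk
    return list(struct.unpack(f"<{count}h", pcm[: count * 2]))
```

One second of audio is therefore 48,000 bytes, which is useful for sizing playback buffers when chunks arrive over the WebSocket.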

Connection Flow

1. Connect to wss://api.phemius.dev/v1/speech/stream
2. Send JSON: {"api_key": "sk_test_..."}
3. Send JSON: {"text": "...", "voice": "...", ...}
4. Receive binary audio chunks
5. Receive JSON: {"done": true, "chars_billed": N, "ttfb_ms": N, "cached": bool}

Responses under 60 seconds are automatically cached. Subsequent identical requests return cached audio instantly. Requests with voice_id (custom voices) bypass the cache.

Rate Limits

Rate limits are applied per API key and reset on a rolling window (requests per minute) or monthly (character limits).

Plan	Requests/min	Monthly Characters	Note
Free	5	10,000	Hard capped — no overage
Developer	20	500,000 included	$8/1M chars overage
Growth	100	2,000,000 included	$6/1M chars overage

The Retry-After header indicates the number of seconds until the limit resets. WebSocket connections are closed with code 4029 when rate limited.
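A client that receives a 429 can use Retry-After to decide how long to wait. The sketch below honors the header when it is present. The fallback to capped exponential backoff when the header is missing or unparseable is an assumption of this example, not documented Phemius behavior, and retry_delay is a hypothetical name.

```python
def retry_delay(headers: dict, attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    value = headers.get("Retry-After")
    if value is not None:
        try:
            # Documented meaning: seconds until the limit resets.
            return max(0.0, float(value))
        except ValueError:
            pass  # fall through to backoff on an unparseable value
    # Assumed fallback: exponential backoff, capped.
    return min(cap, base * 2 ** attempt)
```

Note that a plain dict lookup is case-sensitive. HTTP client libraries usually expose a case-insensitive header mapping, which can be passed in directly.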

Error Handling

HTTP Errors

Status	Description
`400`	Bad request — invalid model or malformed request
`401`	Unauthorized — missing, invalid, or revoked API key
`403`	Forbidden — plan limit reached (e.g. max voices)
`404`	Not found — resource doesn't exist or doesn't belong to you
`422`	Validation error — request body failed schema validation
`429`	Rate limited — too many requests or monthly character cap exceeded
`502`	Bad gateway — upstream TTS model failed
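One way to act on the table above is to convert non-2xx statuses into a single exception type that records whether a retry is reasonable. This is a sketch: PhemiusHTTPError and error_for_status are hypothetical names, and treating only 429 and 502 as retryable is this example's assumption.

```python
from typing import Optional

# Short labels taken from the HTTP Errors table.
HTTP_ERRORS = {
    400: "Bad request",
    401: "Unauthorized",
    403: "Forbidden",
    404: "Not found",
    422: "Validation error",
    429: "Rate limited",
    502: "Bad gateway",
}

# Assumption: only rate limiting and upstream failures are worth retrying.
RETRYABLE_STATUSES = frozenset({429, 502})


class PhemiusHTTPError(Exception):
    def __init__(self, status: int, description: str, retryable: bool):
        super().__init__(f"HTTP {status}: {description}")
        self.status = status
        self.retryable = retryable


def error_for_status(status: int) -> Optional[PhemiusHTTPError]:
    """Return None for 2xx responses, otherwise a describing exception."""
    if 200 <= status < 300:
        return None
    description = HTTP_ERRORS.get(status, "Unexpected status")
    return PhemiusHTTPError(status, description, status in RETRYABLE_STATUSES)
```

A 429 caused by the monthly character cap will not clear on a short retry, so callers on a hard-capped plan may want to stop retrying after the first few attempts.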

WebSocket Close Codes

Code	Description
`4001`	Authentication failed
`4029`	Rate limit exceeded
`4400`	Invalid request body

WebSocket Error Messages

Sent as JSON before the connection closes — not WebSocket close frame codes.

Code	Description
`5000`	Internal server error — sent as JSON message before close
`5002`	Speech synthesis failed — sent as JSON message before close

Built-in Voices

The following voices are available with any speech endpoint. Pass the name as the voice parameter.

Name	Voicepack	Gender	Description
`default`	`af_heart`	female	Default voice
`alloy`	`af_heart`	female	Warm and balanced
`echo`	`am_adam`	male	Clear and articulate
`fable`	`bf_emma`	female	Expressive storyteller
`onyx`	`bm_lewis`	male	Deep and resonant
`nova`	`af_bella`	female	Bright and energetic
`shimmer`	`af_sky`	female	Soft and gentle
`aria`	`af_bella`	female	Natural and expressive

Pass any Kokoro voicepack name directly (e.g. af_heart, am_adam). If the voice name doesn't match a built-in alias, it's used as a raw voicepack name.

Plans & Limits

Plan	Rate Limit	Monthly Chars	Max Voices	Billing
Free	5 RPM	10,000	2	No billing — hard capped at 10,000 chars/month
Developer	20 RPM	Unlimited	10	$10/mo + $8/1M chars overage
Growth	100 RPM	Unlimited	Unlimited	$25/mo + $6/1M chars overage
