MCP TTS VOICEVOX - Secure MCP Server by ALMC Security 2025

A text-to-speech MCP server using VOICEVOX

Features

  • Advanced playback control - Flexible audio processing with queue management, immediate playback, and synchronous/asynchronous control
  • Prefetching - Pre-generates next audio for smooth playback
  • Cross-platform support - Works on Windows, macOS, and Linux (including WSL environment audio playback)
  • Stdio/HTTP support - Supports Stdio, SSE, and StreamableHttp
  • Multiple speaker support - Individual speaker specification per segment
  • Automatic text segmentation - Stable audio synthesis through automatic long text segmentation
  • Independent client library - Provided as a separate package @kajidog/voicevox-client
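The automatic text segmentation feature can be pictured with a minimal sketch. This is not the package's actual algorithm: `segmentText` and its `maxLen` parameter are hypothetical names, and the splitting rule (break after sentence-ending punctuation or at newlines, then hard-split anything still too long) is an assumption made for illustration.

```typescript
// Hypothetical helper, not part of @kajidog/mcp-tts-voicevox.
// Splits text into sentences, then hard-splits any sentence that is
// still longer than maxLen characters.
function segmentText(text: string, maxLen: number = 100): string[] {
  if (maxLen < 1) {
    throw new Error("maxLen must be at least 1");
  }
  // A sentence is a run of non-terminator characters followed by any
  // Japanese or Latin sentence terminators. Newlines always end a sentence.
  const sentences = text.match(/[^。！？.!?\n]+[。！？.!?]*/g) ?? [];

  const segments: string[] = [];
  for (const raw of sentences) {
    const sentence = raw.trim();
    // Hard-split overly long sentences into maxLen-sized pieces.
    for (let i = 0; i < sentence.length; i += maxLen) {
      segments.push(sentence.slice(i, i + maxLen));
    }
  }
  return segments;
}
```

Note that this naive rule also splits at the period in a decimal such as 3.14, so a real implementation would likely need more careful boundary detection.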

Requirements

Installation

npm install -g @kajidog/mcp-tts-voicevox

Usage

As MCP Server

1. Start VOICEVOX Engine

Start the VOICEVOX Engine and make sure it is listening on the default port (http://localhost:50021).

2. Start MCP Server

Standard I/O mode (recommended):

npx @kajidog/mcp-tts-voicevox

HTTP server mode:

# Linux/macOS
MCP_HTTP_MODE=true npx @kajidog/mcp-tts-voicevox

# Windows PowerShell
$env:MCP_HTTP_MODE='true'; npx @kajidog/mcp-tts-voicevox

MCP Tools

speak - Text-to-speech

Converts text to speech and plays it.

Parameters:

  • text: String to speak. Separate multiple segments with newlines; prefix a segment with a speaker ID in "1:text" format to set its speaker.
  • speaker (optional): Speaker ID
  • speedScale (optional): Playback speed
  • immediate (optional): Whether to start playback immediately (default: true)
  • waitForStart (optional): Whether to wait for playback to start (default: false)
  • waitForEnd (optional): Whether to wait for playback to end (default: false)

Examples:

// Simple text
{ "text": "Hello\nIt's a nice day today" }

// Speaker specification
{ "text": "Hello", "speaker": 3 }

// Per-segment speaker specification
{ "text": "1:Hello\n3:It's a nice day today" }

// Immediate playback (bypass queue)
{
  "text": "Emergency message",
  "immediate": true,
  "waitForEnd": true
}

// Wait for playback to complete (synchronous processing)
{
  "text": "Wait for this audio playback to complete before next processing",
  "waitForEnd": true
}

// Add to queue but don't auto-play
{
  "text": "Wait for manual playback start",
  "immediate": false
}
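
The text format used in the examples above (newline-separated segments, optional "1:text" speaker prefix) could be interpreted roughly as follows. This is a sketch under assumptions, not the server's own parser: `parseSpeakText`, `Segment`, and `defaultSpeaker` are hypothetical names, and the exact prefix rule (digits, a colon, then the text) is inferred from the examples.

```typescript
// Hypothetical types and helper, not part of the package's API.
interface Segment {
  text: string;
  speaker: number | undefined; // undefined means "no speaker chosen"
}

function parseSpeakText(input: string, defaultSpeaker?: number): Segment[] {
  const segments: Segment[] = [];
  for (const rawLine of input.split("\n")) {
    const line = rawLine.trim();
    if (line.length === 0) continue; // skip blank lines

    // Assumed prefix rule: one or more digits, a colon, then the text.
    const match = /^(\d+):(.*)$/.exec(line);
    if (match) {
      segments.push({ speaker: Number(match[1]), text: (match[2] ?? "").trim() });
    } else {
      // No prefix: fall back to the speaker passed by the caller.
      segments.push({ speaker: defaultSpeaker, text: line });
    }
  }
  return segments;
}
```

For example, `parseSpeakText("1:Hello\n3:It's a nice day today")` yields two segments with speakers 1 and 3. Under this assumed rule, a line such as "10:30 meeting" would also be read as speaker 10, which a real parser may handle differently.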

Advanced Playback Control Features

Immediate Playback (immediate: true)

Play audio immediately by bypassing the queue:

  • Parallel operation with regular queue: Does not interfere with existing queue playback
  • Multiple simultaneous playback: Multiple immediate playbacks can run simultaneously
  • Ideal for urgent notifications: Prioritizes important messages

Synchronous Playback Control (waitForEnd: true)

Wait for playback completion to synchronize processing:

  • Sequential processing: Execute next processing after audio playback
  • Timing control: Enables coordination between audio and other processing
  • UI synchronization: Align screen display with audio timing

// Example 1: Play urgent message immediately and wait for completion
{
  "text": "Emergency! Please check immediately",
  "immediate": true,
  "waitForEnd": true
}

// Example 2: Step-by-step audio guide
{
  "text": "Step 1: Please open the file",
  "waitForEnd": true
}
// Next processing executes after the above audio completes
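
A step-by-step guide like Example 2 amounts to one speak call per step, each with waitForEnd set so that the next call is only issued after the previous audio finishes. The sketch below only builds the argument objects; `buildGuideRequests` and `SpeakArgs` are hypothetical names, and how the calls are actually sent depends on the MCP client in use.

```typescript
// Hypothetical shape for the arguments of the speak tool,
// limited to the parameters used here.
interface SpeakArgs {
  text: string;
  waitForEnd: boolean;
}

// Builds one speak request per step, numbered from 1.
function buildGuideRequests(steps: string[]): SpeakArgs[] {
  return steps.map((step, index) => ({
    text: "Step " + (index + 1) + ": " + step,
    waitForEnd: true, // block until this step's audio has finished
  }));
}
```

A client would then send these requests one at a time, in order, performing any per-step work between calls.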

Other Tools

  • generate_query - Generate query for speech synthesis
  • synthesize_file - Generate audio file
  • stop_speaker - Stop playback and clear queue
  • get_speakers - Get speaker list
  • get_speaker_detail - Get speaker details

Package Structure

@kajidog/mcp-tts-voicevox (this package)

  • MCP Server - Communicates with MCP clients like Claude Desktop
  • HTTP Server - Remote MCP communication via SSE/StreamableHTTP

@kajidog/voicevox-client (independent package)

  • General-purpose library - Communication functionality with VOICEVOX Engine
  • Cross-platform - Node.js and browser environment support
  • Advanced playback control - Immediate playback, synchronous playback, and queue management features

MCP Configuration Examples

Claude Desktop Configuration

Add the following configuration to your claude_desktop_config.json file:

{
  "mcpServers": {
    "tts-mcp": {
      "command": "npx",
      "args": ["-y", "@kajidog/mcp-tts-voicevox"]
    }
  }
}

When SSE Mode is Required

If you need speech synthesis in SSE mode, you can use mcp-remote for SSE↔Stdio conversion:

  1. Claude Desktop Configuration

    {
      "mcpServers": {
        "tts-mcp-proxy": {
          "command": "npx",
          "args": ["-y", "mcp-remote", "http://localhost:3000/sse"]
        }
      }
    }
    
  2. Starting SSE Server

    Mac/Linux:

    MCP_HTTP_MODE=true MCP_HTTP_PORT=3000 npx @kajidog/mcp-tts-voicevox
    

    Windows:

    $env:MCP_HTTP_MODE='true'; $env:MCP_HTTP_PORT='3000'; npx @kajidog/mcp-tts-voicevox
    

AivisSpeech Configuration Example

{
  "mcpServers": {
    "tts-mcp": {
      "command": "npx",
      "args": ["-y", "@kajidog/mcp-tts-voicevox"],
      "env": {
        "VOICEVOX_URL": "http://127.0.0.1:10101",
        "VOICEVOX_DEFAULT_SPEAKER": "888753764"
      }
    }
  }
}

Environment Variables

VOICEVOX Configuration

  • VOICEVOX_URL: VOICEVOX Engine URL (default: http://localhost:50021)
  • VOICEVOX_DEFAULT_SPEAKER: Default speaker ID (default: 1)
  • VOICEVOX_DEFAULT_SPEED_SCALE: Default playback speed (default: 1.0)

Playback Options Configuration

  • VOICEVOX_DEFAULT_IMMEDIATE: Whether to start playback immediately when added to queue (default: true)
  • VOICEVOX_DEFAULT_WAIT_FOR_START: Whether to wait for playback to start (default: false)
  • VOICEVOX_DEFAULT_WAIT_FOR_END: Whether to wait for playback to end (default: false)

Usage Examples:

# Example 1: Wait for completion for all audio playback (synchronous processing)
export VOICEVOX_DEFAULT_WAIT_FOR_END=true
npx @kajidog/mcp-tts-voicevox

# Example 2: Wait for both playback start and end
export VOICEVOX_DEFAULT_WAIT_FOR_START=true
export VOICEVOX_DEFAULT_WAIT_FOR_END=true
npx @kajidog/mcp-tts-voicevox

# Example 3: Manual control (disable auto-play)
export VOICEVOX_DEFAULT_IMMEDIATE=false
npx @kajidog/mcp-tts-voicevox

These options allow fine-grained control of audio playback behavior according to application requirements.
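
One way to picture how these defaults might combine with per-call parameters is the sketch below. It rests on two assumptions that the text above does not confirm: that the variables are read as the literal string "true", and that a parameter passed to speak takes precedence over the environment default. `resolvePlaybackOptions`, `envFlag`, and `PlaybackOptions` are hypothetical names.

```typescript
// Hypothetical types and helpers, not the package's actual code.
type Env = Record<string, string | undefined>;

interface PlaybackOptions {
  immediate: boolean;
  waitForStart: boolean;
  waitForEnd: boolean;
}

// Reads a boolean flag; unset or empty values fall back to the default.
// Assumption: only the string "true" (case-insensitive) enables a flag.
function envFlag(env: Env, name: string, fallback: boolean): boolean {
  const raw = env[name];
  if (raw === undefined || raw === "") return fallback;
  return raw.toLowerCase() === "true";
}

// Assumption: per-call overrides win over environment defaults.
function resolvePlaybackOptions(
  env: Env,
  overrides: Partial<PlaybackOptions> = {}
): PlaybackOptions {
  return {
    immediate: overrides.immediate ?? envFlag(env, "VOICEVOX_DEFAULT_IMMEDIATE", true),
    waitForStart: overrides.waitForStart ?? envFlag(env, "VOICEVOX_DEFAULT_WAIT_FOR_START", false),
    waitForEnd: overrides.waitForEnd ?? envFlag(env, "VOICEVOX_DEFAULT_WAIT_FOR_END", false),
  };
}
```

In Node.js, `process.env` could be passed as the `env` argument. Taking the environment as a parameter keeps the function easy to test.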

Server Configuration

  • MCP_HTTP_MODE: Enable HTTP server mode (set to true to enable)
  • MCP_HTTP_PORT: HTTP server port number (default: 3000)
  • MCP_HTTP_HOST: HTTP server host (default: 0.0.0.0)
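
The sketch below shows how these three variables and their documented defaults might be read, and how a client-side SSE URL follows from them. `readHttpSettings`, `sseUrl`, and `HttpSettings` are hypothetical names; the /sse path is taken from the mcp-remote example earlier in this document. Because 0.0.0.0 is a bind address rather than something a client connects to, the URL helper takes the client-facing host as a separate argument.

```typescript
// Hypothetical types and helpers, not the package's actual code.
type ServerEnv = Record<string, string | undefined>;

interface HttpSettings {
  enabled: boolean;
  host: string;
  port: number;
}

function readHttpSettings(env: ServerEnv): HttpSettings {
  const port = Number(env["MCP_HTTP_PORT"] || "3000");
  // Reject non-numeric, fractional, or out-of-range ports.
  if (!(port >= 1 && port <= 65535) || Math.floor(port) !== port) {
    throw new Error("MCP_HTTP_PORT must be an integer between 1 and 65535");
  }
  return {
    enabled: env["MCP_HTTP_MODE"] === "true",
    host: env["MCP_HTTP_HOST"] || "0.0.0.0",
    port,
  };
}

// Builds the URL a client would use; clientHost defaults to localhost.
function sseUrl(settings: HttpSettings, clientHost: string = "localhost"): string {
  return "http://" + clientHost + ":" + settings.port + "/sse";
}
```

For instance, with only `MCP_HTTP_MODE=true` set, `sseUrl(readHttpSettings(env))` gives `http://localhost:3000/sse`.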

Usage with WSL (Windows Subsystem for Linux)

This section describes how to connect from a WSL environment to an MCP server running on the Windows host.

1. Windows Host Configuration

Start the MCP server from PowerShell (this example targets AivisSpeech):

$env:MCP_HTTP_MODE='true'; $env:MCP_HTTP_PORT='3000'; $env:VOICEVOX_URL='http://127.0.0.1:10101'; $env:VOICEVOX_DEFAULT_SPEAKER='888753764'; npx @kajidog/mcp-tts-voicevox

2. WSL Environment Configuration

Check Windows host IP address:

# Get Windows host IP address from WSL
ip route show | grep default | awk '{print $3}'

The address is usually of the form 172.x.x.1.
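
For scripts that need the same address programmatically, the grep/awk pipeline above can be mirrored by a small function that takes the output of `ip route show` as a string. This is an illustrative sketch: `defaultGateway` is a hypothetical helper, and unlike the pipeline it returns only the first matching line's third field.

```typescript
// Hypothetical helper mirroring: grep default | awk '{print $3}'
function defaultGateway(ipRouteOutput: string): string | undefined {
  for (const line of ipRouteOutput.split("\n")) {
    // Same filter as `grep default`.
    if (line.indexOf("default") === -1) continue;
    // awk splits on runs of whitespace; $3 is the third field.
    const fields = line.trim().split(/\s+/);
    if (fields.length >= 3) return fields[2];
  }
  return undefined; // no default route found
}
```

Given a line such as `default via 172.29.176.1 dev eth0`, the function returns `172.29.176.1`.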

Claude Code .mcp.json configuration example:

{
  "mcpServers": {
    "tts": {
      "type": "sse",
      "url": "http://172.29.176.1:3000/sse"
    }
  }
}

Important Points:

  • Within WSL, localhost and 127.0.0.1 refer to the WSL environment itself, so they cannot be used to reach services on the Windows host
  • Use the WSL gateway IP (usually 172.x.x.1) to access the Windows host
  • Ensure the port is not blocked by the Windows firewall

Connection Test:

# Check connection to Windows host MCP server from WSL
curl http://172.29.176.1:3000

If the server is reachable, it returns 404 Not Found, because nothing is served at the root path.

Troubleshooting

Common Issues

  1. VOICEVOX Engine is not running

    Confirm that the engine responds. This request should return the speaker list as JSON:

    curl http://localhost:50021/speakers
    
  2. Audio is not playing

    • Check system audio output device
    • Check platform-specific audio playback tools:
      • Linux: Requires one of aplay, paplay, play, ffplay
      • macOS: afplay (pre-installed)
      • Windows: PowerShell (pre-installed)
  3. Not recognized by MCP client

    • Check package installation: npm list -g @kajidog/mcp-tts-voicevox
    • Check JSON syntax in configuration file

License

ISC

Developer Information

Instructions for developing this repository locally.

Setup

  1. Clone the repository:
    git clone https://github.com/kajidog/mcp-tts-voicevox.git
    cd mcp-tts-voicevox
    
  2. Install pnpm (if not already installed).
  3. Install dependencies:
    pnpm install
    

Main Development Commands

You can run the following commands in the project root.

  • Build all packages:
    pnpm build
    
  • Run all tests:
    pnpm test
    
  • Run all linters:
    pnpm lint
    
  • Start root server in development mode:
    pnpm dev
    
  • Start stdio interface in development mode:
    pnpm dev:stdio
    

These commands also run the corresponding tasks for the related packages in the workspace.
