rospeex is a cloud-based speech communication toolkit for ROS (Robot Operating System). It supports speech recognition and synthesis in English, Chinese, Japanese, and Korean.

You can write a simple dialogue function in only 10 lines of Python or C++ code.

Installation instructions are available on the official page.


Speech recognition without rospeex

Our cloud-based speech recognition service is also available for non-ROS users.

  • Academic use only. If you'd like to use this service for commercial purposes, please contact me for licensing information.
  • Absolutely no warranty.

Sample code in C++ and Python:

# -*- coding: utf-8 -*-
# Usage: python sample.py input.wav
import sys
import base64
import json
import urllib2

# Cloud-based speech recognition URL
URL ='http://rospeex.nict.go.jp/nauth_json/jsServices/VoiceTraSR'

def read_wavfile(filename):
    with open(filename,'rb') as rf:
        wav = rf.read()
    return wav

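def post_to_recognizer(wav):
    # The service expects the WAV data as base64 text inside a JSON body
    buf = base64.b64encode(wav)
    # params = (language code, audio description); "ja" requests Japanese
    json_data = { "method":"recognize",
                  "params":( "ja",
                             {"audio":buf, "audioType":"audio/x-wav", "voiceType":"*" } ) }
    json_obj = json.dumps(json_data)
    # Passing data to urllib2.Request makes this an HTTP POST
    req = urllib2.Request(URL, json_obj)
    cont = urllib2.urlopen(req).read()
    return cont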

def print_text(json_str):
    json_obj = json.loads(json_str)
    print json_obj['result'].encode('utf-8')

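if __name__=='__main__':
    argv = sys.argv
    if len(argv) < 2:
        sys.exit('Usage: python sample.py input.wav')
    wav = read_wavfile(argv[1])
    recognition_result = post_to_recognizer(wav)
    # Print the recognized text returned by the service
    print_text(recognition_result)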
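
The sample above targets Python 2.7 (`urllib2` and the `print` statement do not exist in Python 3). Below is a minimal Python 3 sketch of the same request, assuming the service accepts exactly the JSON body used in the Python 2 sample. The helper names `build_recognize_request`, `parse_recognition_result`, and `recognize` are hypothetical and not part of rospeex.

```python
# Python 3 sketch of the cloud speech recognition client.
# Assumption: the endpoint accepts the same JSON body as the Python 2 sample.
import base64
import json
import urllib.request

URL = 'http://rospeex.nict.go.jp/nauth_json/jsServices/VoiceTraSR'


def build_recognize_request(wav, language='ja'):
    """Return the JSON request body (bytes) for raw WAV data."""
    body = {
        "method": "recognize",
        "params": [
            language,
            {
                # b64encode returns bytes; json.dumps needs str in Python 3
                "audio": base64.b64encode(wav).decode('ascii'),
                "audioType": "audio/x-wav",
                "voiceType": "*",
            },
        ],
    }
    return json.dumps(body).encode('utf-8')


def parse_recognition_result(response_body):
    """Extract the recognized text from the 'result' field of the response."""
    return json.loads(response_body)['result']


def recognize(wav, language='ja', url=URL):
    """POST the WAV data (data= makes it a POST) and return the text."""
    req = urllib.request.Request(url, data=build_recognize_request(wav, language))
    with urllib.request.urlopen(req) as resp:
        return parse_recognition_result(resp.read())
```

Usage: `with open('input.wav', 'rb') as f: print(recognize(f.read()))`. Keeping request construction and response parsing separate from the network call lets both be checked without contacting the server.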



Non-monologue speech synthesis for service robots

You can try our cloud-based speech synthesis system here.

  • INCOMPATIBLE with IE/Safari. Compatible with Firefox and Google Chrome.
  • Non-commercial use only.
  • Absolutely no warranty.

Sample code in C++ and Python:

#!/usr/bin/env python2
# coding: utf-8
# Python 2.7 sample code for rospeex TTS
import base64
import urllib2
import json

URL = 'http://rospeex.nict.go.jp/nauth_json/jsServices/VoiceTraSS'

def main():
    databody = {"method": "speak",
                "params": ["1.1",
                          {"language": "ja", "text": "こんにちは", "voiceType": "*", "audioType": "audio/x-wav"}]}
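    request = urllib2.Request(URL, json.dumps(databody))
    response = urllib2.urlopen(request).read()
    # The synthesized speech is returned base64-encoded in result.audio
    tmp = json.loads(response)['result']['audio']
    wav = base64.decodestring(tmp.encode('utf-8'))

    # Save the synthesized speech as a WAV file
    with open("out.wav", "wb") as f:
        f.write(wav)

if __name__ == "__main__":
    main()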
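
A minimal Python 3 sketch of the same TTS request, assuming the endpoint accepts the identical `speak` body used in the Python 2.7 sample. `base64.decodestring` was removed in Python 3.9, so `base64.b64decode` is used instead. The function names `build_speak_request`, `decode_audio`, and `synthesize` are hypothetical.

```python
# Python 3 sketch of the cloud TTS client.
# Assumption: the endpoint accepts the same "speak" body as the 2.7 sample.
import base64
import json
import urllib.request

URL = 'http://rospeex.nict.go.jp/nauth_json/jsServices/VoiceTraSS'


def build_speak_request(text, language='ja'):
    """Return the JSON request body (bytes) asking for WAV output."""
    body = {
        "method": "speak",
        "params": [
            "1.1",
            {"language": language, "text": text,
             "voiceType": "*", "audioType": "audio/x-wav"},
        ],
    }
    # json.dumps escapes non-ASCII text as \uXXXX by default
    return json.dumps(body).encode('utf-8')


def decode_audio(response_body):
    """Decode the base64 WAV data found at result.audio."""
    audio_b64 = json.loads(response_body)['result']['audio']
    return base64.b64decode(audio_b64)


def synthesize(text, language='ja', out_path='out.wav', url=URL):
    """Request speech for `text` and save the returned WAV data."""
    req = urllib.request.Request(url, data=build_speak_request(text, language))
    with urllib.request.urlopen(req) as resp:
        wav = decode_audio(resp.read())
    with open(out_path, 'wb') as f:
        f.write(wav)
    return out_path
```

For example, `synthesize('こんにちは')` would write the returned audio to `out.wav`, as the Python 2.7 sample does.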

iPhone App "Kyo no Osusume"

Discover your own Kyoto in a unique way with Kyo-no-Osusume! Just tell it what you feel like doing and what you want to experience in Kyoto, and it recommends destinations for you based on a questionnaire database collected from 4,000 people.



RoboCup 2011 Istanbul Noise Database

These databases can be used to test your robot's speech recognition system. Playing the files back at 75 dB (very noisy) simulates the noise conditions of a typical RoboCup@Home environment.

Who Is Who (1h42m, 188MB)
Enhanced Who Is Who (1h44m, 192MB)
Shopping Mall (0h28m, 52MB)

Sentence Generator 2010 for the General Purpose Service Robots Test

In the GPSR test, the task and the order of its steps are not predefined. Instead, the task is given at random on site as a spoken command in the form of a complex sentence. Sentence Generator 2010 generates such random commands according to a defined grammar. Example commands:


  • Go to the back door, grasp the chips, and bring it to the armchair.
  • Go to the dining table, introduce yourself, and leave the apartment.
  • Find a person, bring the yoghurt from the closet, and leave the apartment.
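
The generator's grammar is not reproduced on this page. As a rough illustration of grammar-based random generation, the Python 3 sketch below expands a small hypothetical grammar, reconstructed only from the three example commands above, by picking a random production for each nonterminal (written in angle brackets) until only words remain. It is not the actual Sentence Generator 2010.

```python
import random

# Hypothetical toy grammar inferred from the example commands;
# it is NOT the grammar used by Sentence Generator 2010.
GRAMMAR = {
    "<command>": [["<action>", ", ", "<action>", ", and ", "<final>", "."]],
    "<action>": [
        ["go to the ", "<location>"],
        ["find a person"],
        ["grasp the ", "<object>"],
        ["introduce yourself"],
        ["bring the ", "<object>", " from the ", "<location>"],
    ],
    "<final>": [["leave the apartment"], ["go to the ", "<location>"]],
    "<location>": [["back door"], ["dining table"], ["closet"], ["armchair"]],
    "<object>": [["chips"], ["yoghurt"]],
}


def expand(symbol, rng):
    """Recursively expand a symbol; strings not in GRAMMAR are terminals."""
    if symbol not in GRAMMAR:
        return symbol
    production = rng.choice(GRAMMAR[symbol])
    return "".join(expand(s, rng) for s in production)


def generate_command(rng=random):
    """Generate one random command and capitalize its first letter."""
    sentence = expand("<command>", rng)
    return sentence[0].upper() + sentence[1:]
```

Passing a seeded `random.Random(0)` as `rng` makes the output reproducible, which is handy when the same set of commands must be replayed across test runs.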