Speech-to-Text with IBM Watson and PHP

I needed a speech-to-text engine for a project that I’m working on at Nexmo, and we settled on the IBM Watson speech-to-text engine.

Unfortunately, IBM don’t have an official PHP client library, and the ones that I could find on GitHub didn’t support speech-to-text. Here’s how I took an MP3 file and retrieved the text spoken in it.

First, sign up for IBM Bluemix. It may have already created a speech-to-text project for you. If so, click Show next to Credentials at the bottom of the page and make a note of the username and password, as we’ll need them later.

If it didn’t, you’ll need to do the following:

  • Click on Create project in the top right
  • Click Get Watson Services
  • Select Speech to Text
  • Click Add Services on the right
  • Give your project a name
  • Click Create a project

Once you have your credentials, it’s time to make the request. We’re going to use Guzzle, so let’s install it first:

composer require guzzlehttp/guzzle

Once it’s installed, we can send the file to IBM by creating a client and making a POST request to https://stream.watsonplatform.net/speech-to-text/api/v1/recognize, using the credentials that we made a note of earlier.

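<?php

require_once 'vendor/autoload.php';

// Guzzle client pointed at the Watson speech-to-text host
$client = new GuzzleHttp\Client([
    'base_uri' => 'https://stream.watsonplatform.net/'
]);

// Open the audio file as a stream so Guzzle can send it as the request body
$audio = fopen('/path/to/audio.mp3', 'r');

$resp = $client->request('POST', 'speech-to-text/api/v1/recognize', [
    // Replace these with the username and password from your credentials
    'auth' => ['username', 'password'],
    'headers' => [
        // Tell Watson that the body is MP3 audio
        'Content-Type' => 'audio/mpeg',
    ],
    'body' => $audio
]);

// Print the raw JSON response
echo $resp->getBody();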
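The example above passes the credentials as literal strings. If you’d rather keep them out of your source code, one option is to read them from environment variables. This is a minimal sketch: the helper name `watsonCredentials()` and the variable names `WATSON_USERNAME` and `WATSON_PASSWORD` are hypothetical, chosen for this example.

```php
<?php

// Hypothetical helper: reads Watson credentials from environment variables
// (WATSON_USERNAME and WATSON_PASSWORD are names chosen for this example)
// and returns them in the [username, password] shape that Guzzle's 'auth' option accepts.
function watsonCredentials(): array
{
    $username = getenv('WATSON_USERNAME');
    $password = getenv('WATSON_PASSWORD');

    // getenv() returns false when a variable is not set
    if ($username === false || $password === false) {
        throw new RuntimeException('Set WATSON_USERNAME and WATSON_PASSWORD first');
    }

    return [$username, $password];
}
```

With that in place, the request option becomes `'auth' => watsonCredentials(),` and the rest of the request stays the same.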

The IBM API responds with a JSON document containing the transcribed text. Here’s the response from my test call, in which I said “Hello Watson, this is a test, will you transcribe this correctly for me please”:

{
   "results": [
      {
         "alternatives": [
            {
               "confidence": 0.55,
               "transcript": "hello Watson "
            }
         ],
         "final": true
      },
      {
         "alternatives": [
            {
               "confidence": 0.998,
               "transcript": "this is a test "
            }
         ],
         "final": true
      },
      {
         "alternatives": [
            {
               "confidence": 0.735,
               "transcript": "were you transcribe this correctly for me please "
            }
         ],
         "final": true
      }
   ],
   "result_index": 0
}
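The response splits the speech into several `results`, each with its own `alternatives`. To get a single string back, you can join the transcript of the first alternative in each result. This is a sketch that assumes the response shape shown above; the function name `extractTranscript()` is hypothetical, and `JSON_THROW_ON_ERROR` needs PHP 7.3 or later.

```php
<?php

// Hypothetical helper: joins the first alternative of each final result
// into one transcript. Assumes the shape results[].alternatives[].transcript.
function extractTranscript(string $json): string
{
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    $segments = [];
    foreach ($data['results'] ?? [] as $result) {
        // Skip results not marked final; use the first alternative's transcript
        if (!empty($result['final']) && isset($result['alternatives'][0]['transcript'])) {
            // Each transcript ends with a trailing space, so trim before joining
            $segments[] = trim($result['alternatives'][0]['transcript']);
        }
    }

    return implode(' ', $segments);
}
```

Using the response from the request, that would be `echo extractTranscript((string) $resp->getBody());`, which for the response above gives “hello Watson this is a test were you transcribe this correctly for me please”.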

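Overall, I think it did pretty well, considering that I was speaking fairly quickly into the default laptop microphone. The only mistake was transcribing “will you” as “were you”, and the lowest-confidence segment (0.55) was “hello Watson”, which it still got right.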
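Each result carries a `confidence` score, so one way to find segments worth double-checking is to flag those below a cut-off. This is a sketch under assumptions: the function name `lowConfidenceSegments()` is hypothetical, only the first alternative of each result is inspected, and the 0.7 default threshold is an arbitrary value for illustration, not a figure recommended by IBM.

```php
<?php

// Hypothetical helper: returns segments whose confidence is below $threshold.
// The 0.7 default is arbitrary, chosen only for this example.
function lowConfidenceSegments(string $json, float $threshold = 0.7): array
{
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    $flagged = [];
    foreach ($data['results'] ?? [] as $result) {
        // Look only at the first alternative of each result
        $best = $result['alternatives'][0] ?? null;
        if (isset($best['confidence']) && $best['confidence'] < $threshold) {
            $flagged[] = [
                'transcript' => trim($best['transcript'] ?? ''),
                'confidence' => $best['confidence'],
            ];
        }
    }

    return $flagged;
}
```

With the response above and the default threshold, only “hello Watson” (0.55) is flagged. Raising the threshold to 0.8 would also catch the “were you transcribe…” segment (0.735), which is where the actual mistake was.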

Michael is a polyglot software engineer, committed to reducing complexity in systems and making them more predictable. Working with a variety of languages and tools, he shares his technical expertise with audiences all around the world at user groups and conferences. You can follow @mheap on Twitter.
