Speech-to-Text with IBM Watson and PHP

05 Feb 2018 in Tech

I needed a speech-to-text engine for a project that I'm working on at Nexmo, and we settled on the IBM Watson speech-to-text engine.

Unfortunately, IBM don't have an official PHP client library, and the ones that I could find on GitHub didn't support speech-to-text. Here's how I took an mp3 file and retrieved the text within it:

First, sign up for IBM Bluemix. It may have already created a speech-to-text project for you. If so, click Show next to the credentials at the bottom and make a note of the username and password - we'll need them later.

If it didn't, you'll need to do the following:

1. Click on Create project in the top right
2. Click Get Watson Services
3. Select Speech to Text
4. Click Add Services on the right
5. Give your project a name
6. Click Create a project

Once you have your credentials, it's time to make the request. We're going to use Guzzle, so let's install it first:

bash
composer require guzzlehttp/guzzle

Once it's installed, we can send the file off to IBM by creating a client and making a POST request to https://stream.watsonplatform.net/speech-to-text/api/v1/recognize using the credentials that we made a note of earlier.

php
<?php
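require_once 'vendor/autoload.php';

// Relative request paths are resolved against this base URI
$client = new GuzzleHttp\Client([
   'base_uri' => 'https://stream.watsonplatform.net/'
]);

// Open the audio file as a stream so Guzzle can send it as the request body
$audio = fopen('/path/to/audio.mp3', 'r');

$resp = $client->request('POST', 'speech-to-text/api/v1/recognize', [
   // Replace with the username and password from your service credentials
   'auth' => ['username', 'password'],
   'headers' => [
      // Tells Watson that the body is MP3 audio
      'Content-Type' => 'audio/mpeg',
   ],
   'body' => $audio
]);

// Print the raw JSON response
echo $resp->getBody();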

The IBM API will respond with a JSON document containing the transcribed text. Here's the response from my test call, where I said "Hello Watson, this is a test, will you transcribe this correctly for me please":

json
{
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.55,
          "transcript": "hello Watson "
        }
      ],
      "final": true
    },
    {
      "alternatives": [
        {
          "confidence": 0.998,
          "transcript": "this is a test "
        }
      ],
      "final": true
    },
    {
      "alternatives": [
        {
          "confidence": 0.735,
          "transcript": "were you transcribe this correctly for me please "
        }
      ],
      "final": true
    }
  ],
  "result_index": 0
}

Overall I think it did pretty well, considering that I was speaking fairly quickly into the default laptop microphone.
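
The response splits the transcript across several results, so to get a single string you'll probably want to stitch them back together. Here's a minimal sketch of one way to do that. `extractTranscript` is a hypothetical helper of my own (it isn't part of Guzzle or any IBM library), and it assumes that the first entry in `alternatives` is the one you want. The sample JSON is a trimmed copy of the response above.

```php
<?php
// Hypothetical helper: joins the first alternative of each result
// into a single string. Returns an empty string if the JSON is not
// in the expected shape.
function extractTranscript(string $json): string
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['results'])) {
        return '';
    }

    $parts = [];
    foreach ($data['results'] as $result) {
        // Assumption: the first alternative is the preferred one
        if (isset($result['alternatives'][0]['transcript'])) {
            // Each transcript has a trailing space, so trim it
            $parts[] = trim($result['alternatives'][0]['transcript']);
        }
    }

    return implode(' ', $parts);
}

// Trimmed copy of the response shown earlier
$json = '{"results":[{"alternatives":[{"confidence":0.55,"transcript":"hello Watson "}],"final":true},{"alternatives":[{"confidence":0.998,"transcript":"this is a test "}],"final":true}],"result_index":0}';

echo extractTranscript($json), PHP_EOL; // prints "hello Watson this is a test"
```

With the Guzzle request from earlier, you could pass in the body directly with `extractTranscript((string) $resp->getBody())`.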