Speech-to-Text with IBM Watson and PHP

05 Feb 2018 in Development

I needed a speech-to-text engine for a project that I'm working on at Nexmo, and we settled on the IBM Watson speech-to-text engine.

Unfortunately, IBM don't have an official PHP client library, and the ones that I could find on GitHub didn't support speech-to-text. Here's how I took an mp3 and retrieved the text within it.

First, sign up for IBM Bluemix. It may have already created a speech-to-text project for you. If so, click Show next to the credentials at the bottom and make a note of the username and password - we'll need them later.

If it didn't, you'll need to do the following:

  • Click on Create project in the top right

  • Click Get Watson Services

  • Select Speech to Text

  • Click Add Services on the right

  • Give your project a name

  • Click Create a project

Once you have your credentials, it's time to make the request. We're going to use Guzzle, so let's install it first:

composer require guzzlehttp/guzzle

Once it's installed, we can send the file off to IBM by creating a client and making a POST request to https://stream.watsonplatform.net/speech-to-text/api/v1/recognize using the credentials that we made a note of earlier. The Content-Type header tells Watson the audio format, which is audio/mpeg for an mp3.

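<?php
require_once 'vendor/autoload.php';

// All requests are made relative to the Watson streaming host
$client = new GuzzleHttp\Client([
    'base_uri' => 'https://stream.watsonplatform.net/'
]);

// Open the audio in binary mode; Guzzle streams the resource as the request body
$audio = fopen('/path/to/audio.mp3', 'rb');
$resp = $client->request('POST', 'speech-to-text/api/v1/recognize', [
    // Replace with the username and password from your service credentials
    'auth' => ['username', 'password'],
    'headers' => [
        // Must match the format of the file being sent
        'Content-Type' => 'audio/mpeg',
    ],
    'body' => $audio
]);

// Print the raw JSON response
echo $resp->getBody();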

The IBM API will respond with a JSON document containing the transcribed text. Here's the response from my test call, where I said "Hello Watson, this is a test, will you transcribe this correctly for me please":

{
    "results": [
        {
            "alternatives": [
                {
                    "confidence": 0.55,
                    "transcript": "hello Watson "
                }
            ],
            "final": true
        },
        {
            "alternatives": [
                {
                    "confidence": 0.998,
                    "transcript": "this is a test "
                }
            ],
            "final": true
        },
        {
            "alternatives": [
                {
                    "confidence": 0.735,
                    "transcript": "were you transcribe this correctly for me please "
                }
            ],
            "final": true
        }
    ],
    "result_index": 0
}

Overall I think it did pretty well, considering that I was speaking fairly quickly into the default laptop microphone.
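
If you only want the text rather than the full JSON document, the response can be decoded and the pieces joined together. Here's a minimal sketch, assuming the response always has the shape shown above. The watsonTranscript function is a hypothetical helper of my own naming, not part of any IBM library, and it simply takes the first alternative of each result:

```php
<?php
// Hypothetical helper (not part of any IBM library): join the first
// alternative of every result into a single transcript string.
function watsonTranscript(string $json): string
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['results'])) {
        throw new InvalidArgumentException('Unexpected response format');
    }

    $parts = [];
    foreach ($data['results'] as $result) {
        // Skip any result that has no alternatives rather than failing
        if (isset($result['alternatives'][0]['transcript'])) {
            // Each transcript ends with a trailing space, so trim it
            $parts[] = trim($result['alternatives'][0]['transcript']);
        }
    }

    return implode(' ', $parts);
}

// Shortened copy of the response from my test call
$json = '{"results":[{"alternatives":[{"confidence":0.55,"transcript":"hello Watson "}],"final":true},{"alternatives":[{"confidence":0.998,"transcript":"this is a test "}],"final":true}],"result_index":0}';

echo watsonTranscript($json), PHP_EOL;
```

In the script above you would pass in (string) $resp->getBody() instead of the hard-coded JSON. The confidence value sits alongside each transcript, so the same loop could also skip or flag low-confidence segments if that matters for your project.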