Speech-to-Text with IBM Watson and PHP
I needed a speech-to-text engine for a project that I'm working on at Nexmo, and we settled on IBM Watson.
Unfortunately, IBM don't have an official PHP client library, and the ones that I could find on GitHub didn't support speech-to-text. Here's how I took an MP3 and retrieved the text within it.
First, sign up for IBM Bluemix. It may have already created a speech-to-text project for you. If so, click Show next to Credentials at the bottom and make a note of the username and password - we'll need them later.
If it didn't, you'll need to do the following:
Click on Create project in the top right
Click Get Watson Services
Select Speech to Text
Click Add Services on the right
Give your project a name
Click Create a project
Once you have your credentials, it's time to make the request. We're going to use Guzzle, so let's install it first:
composer require guzzlehttp/guzzle
Once it's installed, we can send the file off to IBM by creating a client and making a POST request to https://stream.watsonplatform.net/speech-to-text/api/v1/recognize using the credentials that we made a note of earlier.
<?php
require_once 'vendor/autoload.php';

// Create a client pointed at the Watson API host
$client = new GuzzleHttp\Client([
    'base_uri' => 'https://stream.watsonplatform.net/'
]);

// Open the audio file as a stream so Guzzle can send it as the request body
$audio = fopen('/path/to/audio.mp3', 'r');

$resp = $client->request('POST', 'speech-to-text/api/v1/recognize', [
    'auth' => ['username', 'password'],
    'headers' => [
        'Content-Type' => 'audio/mpeg',
    ],
    'body' => $audio
]);

echo $resp->getBody();
The IBM API will respond with a JSON document containing the transcribed text. Here's the response from my test call, where I said "Hello Watson, this is a test, will you transcribe this correctly for me please":
{
    "results": [
        {
            "alternatives": [
                {
                    "confidence": 0.55,
                    "transcript": "hello Watson "
                }
            ],
            "final": true
        },
        {
            "alternatives": [
                {
                    "confidence": 0.998,
                    "transcript": "this is a test "
                }
            ],
            "final": true
        },
        {
            "alternatives": [
                {
                    "confidence": 0.735,
                    "transcript": "were you transcribe this correctly for me please "
                }
            ],
            "final": true
        }
    ],
    "result_index": 0
}
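If you want the transcript as a single string rather than raw JSON, something like the following should work. It's only a sketch: extractTranscript is a hypothetical helper name of my own (it isn't part of Guzzle or any IBM library), and it assumes the first entry in each "alternatives" list is the one you want. The sample JSON is the response from the test call above.

```php
<?php
// Hypothetical helper: joins the first alternative of each result
// into one transcript string. Returns '' if the JSON is not usable.
function extractTranscript(string $json): string
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['results']) || !is_array($data['results'])) {
        return '';
    }

    $parts = [];
    foreach ($data['results'] as $result) {
        // Skip results that carry no alternatives
        if (empty($result['alternatives'])) {
            continue;
        }
        // Assumption: the first alternative is the preferred one
        $parts[] = trim($result['alternatives'][0]['transcript'] ?? '');
    }

    return implode(' ', $parts);
}

// Sample response copied from the test call
$json = <<<'JSON'
{"results": [
  {"alternatives": [{"confidence": 0.55, "transcript": "hello Watson "}], "final": true},
  {"alternatives": [{"confidence": 0.998, "transcript": "this is a test "}], "final": true},
  {"alternatives": [{"confidence": 0.735, "transcript": "were you transcribe this correctly for me please "}], "final": true}
], "result_index": 0}
JSON;

echo extractTranscript($json), PHP_EOL;
```

In the request script, you would pass (string) $resp->getBody() to the helper instead of the sample string.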
Overall I think it did pretty well, considering that I was speaking fairly quickly into the default laptop microphone.