Using the PHP SDK with Google BigQuery

19 Nov 2012 in TIL

TL;DR: Google's SDK sucks. Here's how to insert things into BigQuery via Google Cloud Storage. If you want to POST directly into BigQuery, you can't do it using the SDK. More info on that coming shortly...

Last week, I was (un?)fortunate enough to get chance to play with Google BigQuery. I wanted to interact with it programatically, so I went in search of the documentation. After a bit of reading, I worked out I needed to use a service account, and downloaded the p12 key file that I needed.

I quickly realised that the easiest way to get up and running would be to use the PHP Google SDK. (This wasn't the case). The SDK is incomplete, the PHP one is undocumented and the examples provided don't include a BigQuery example. I ended up finding some Java documentation and thankfully, they ported the libraries almost verbatim so I could use it as a framework to put together an example.

So, before I ramble on any more, some code!

Creating a client

The first thing we need to do is create a client and authenticate.

php
<?php
// Require our SDK
require_once 'google-api-php-client/src/Google_Client.php';
require_once 'google-api-php-client/src/contrib/Google_BigqueryService.php';
// Define some stuff we're gonna use later (change for your details)
define("CLIENT_ID", '1234567890.apps.googleusercontent.com');
define("SERVICE_ACCOUNT_NAME", '[email protected]');
define("KEY_FILE", './key.p12');
define("PROJECT_ID", 1234567890);
define("DATASET_ID, "ImportTest");
define("TABLE_ID, "TableOne");
// Create a client for making the requests
$client = new Google_Client();
$client->setApplicationName("Google BigQuery Test");
// Load the key in PKCS 12 format (you need to download this from the
// Google API Console when the service account was created.)
$key = file_get_contents(KEY_FILE);
$client->setAssertionCredentials(new Google_AssertionCredentials(
SERVICE_ACCOUNT_NAME,
array('https://www.googleapis.com/auth/bigquery', 'https://www.googleapis.com/auth/devstorage.read_write'),
$key)
);
$client->setClientId(CLIENT_ID);

GET

Once you have a client, it's quite easy to get various different things, here's a few examples:

php
$service = new Google_BigqueryService($client);
// List the tables in "ImportTest"
$service->tables->listTables(PROJECT_ID, "ImportTest");
// Get the table "TableOne" in "ImportTest"'s details
$service->tables->get(PROJECT_ID, "ImportTest", "TableOne");
// Get the data from "TableOne"
$service->tabledata->listTabledata(PROJECT_ID, "ImportTest", "TableOne");

POST

Inserting data's a bit harder. They have a ridiculously convoluted way of setting it up, but here's an example that works:

php
// Create our job
$job = new Google_Job;
// Create a job reference and assign it
$jobRef = new Google_JobReference;
$jobRef->setProjectId(PROJECT_ID);
$job->setJobReference($jobRef);
// What's our destination table?
$dest = new Google_TableReference;
$dest->setProjectId(PROJECT_ID);
$dest->setDatasetId(DATASET_ID);
$dest->setTableId(TABLE_ID);
// What's our schema?
$f1 = new Google_TableFieldSchema;
$f1->setType("string");
$f1->setName("kind");
$f2 = new Google_TableFieldSchema;
$f2->setType("string");
$f2->setName("fullName");
// Create a tableschema
$schema = new Google_TableSchema;
$schema->setFields(array($f1, $f2));
// We're performing the 'load' method with JSON data, with some sensible options (like append)
$confLoad = new Google_JobConfigurationLoad;
$confLoad->setSchema($schema);
$confLoad->setCreateDisposition("CREATE_IF_NEEDED");
$confLoad->setWriteDisposition("WRITE_APPEND");
$confLoad->setDestinationTable($dest);
$confLoad->sourceFormat = 'NEWLINE_DELIMITED_JSON';
// Our data sources, these come from Google Cloud Storage
$confLoad->setSourceUris(array(
// File paths here
"gs://json_file_bucket/some_name.json"
));
// Create a job configuration
$conf = new Google_JobConfiguration;
$conf->setLoad($confLoad);
// Pass our config into the job
$job->setConfiguration($conf);
// Run the insert (this is asynchronous)
$service = new Google_BigQueryService($client);
$running = $service->jobs->insert(PROJECT_ID, $job);
// Show some details
echo "Inserting".PHP_EOL;
echo "Job ID:". $running->jobReference['jobId'].PHP_EOL;

As the job is run asynchronously, we need to query the API to see if the job completed. I tend to use the developer console to view the job's progress, along with any errors that may arise. If you get errors about things being malformed, you can try inserting via the console to see if it's your data or your code.

Useful links:

BigQuery Links: