TL;DR: Google’s SDK sucks. Here’s how to insert things into BigQuery via Google Cloud Storage. If you want to POST directly into BigQuery, you can’t do it using the SDK. More info on that coming shortly…
Last week, I was (un?)fortunate enough to get chance to play with Google BigQuery. I wanted to interact with it programatically, so I went in search of the documentation. After a bit of reading, I worked out I needed to use a service account, and downloaded the p12 key file that I needed.
I quickly realised that the easiest way to get up and running would be to use the PHP Google SDK. (This wasn’t the case). The SDK is incomplete, the PHP one is undocumented and the examples provided don’t include a BigQuery example. I ended up finding some Java documentation and thankfully, they ported the libraries almost verbatim so I could use it as a framework to put together an example.
So, before I ramble on any more, some code!
Creating a client
The first thing we need to do is create a client and authenticate.
<?php // Require our SDK require_once 'google-api-php-client/src/Google_Client.php'; require_once 'google-api-php-client/src/contrib/Google_BigqueryService.php'; // Define some stuff we're gonna use later (change for your details) define("CLIENT_ID", '1234567890.apps.googleusercontent.com'); define("SERVICE_ACCOUNT_NAME", '[email protected]'); define("KEY_FILE", './key.p12'); define("PROJECT_ID", 1234567890); define("DATASET_ID, "ImportTest"); define("TABLE_ID, "TableOne"); // Create a client for making the requests $client = new Google_Client(); $client->setApplicationName("Google BigQuery Test"); // Load the key in PKCS 12 format (you need to download this from the // Google API Console when the service account was created.) $key = file_get_contents(KEY_FILE); $client->setAssertionCredentials(new Google_AssertionCredentials( SERVICE_ACCOUNT_NAME, array('https://www.googleapis.com/auth/bigquery', 'https://www.googleapis.com/auth/devstorage.read_write'), $key) ); $client->setClientId(CLIENT_ID);
Once you have a client, it’s quite easy to get various different things, here’s a few examples:
$service = new Google_BigqueryService($client); // List the tables in "ImportTest" $service->tables->listTables(PROJECT_ID, "ImportTest"); // Get the table "TableOne" in "ImportTest"'s details $service->tables->get(PROJECT_ID, "ImportTest", "TableOne"); // Get the data from "TableOne" $service->tabledata->listTabledata(PROJECT_ID, "ImportTest", "TableOne");
Inserting data’s a bit harder. They have a ridiculously convoluted way of setting it up, but here’s an example that works:
// Create our job $job = new Google_Job; // Create a job reference and assign it $jobRef = new Google_JobReference; $jobRef->setProjectId(PROJECT_ID); $job->setJobReference($jobRef); // What's our destination table? $dest = new Google_TableReference; $dest->setProjectId(PROJECT_ID); $dest->setDatasetId(DATASET_ID); $dest->setTableId(TABLE_ID); // What's our schema? $f1 = new Google_TableFieldSchema; $f1->setType("string"); $f1->setName("kind"); $f2 = new Google_TableFieldSchema; $f2->setType("string"); $f2->setName("fullName"); // Create a tableschema $schema = new Google_TableSchema; $schema->setFields(array($f1, $f2)); // We're performing the 'load' method with JSON data, with some sensible options (like append) $confLoad = new Google_JobConfigurationLoad; $confLoad->setSchema($schema); $confLoad->setCreateDisposition("CREATE_IF_NEEDED"); $confLoad->setWriteDisposition("WRITE_APPEND"); $confLoad->setDestinationTable($dest); $confLoad->sourceFormat = 'NEWLINE_DELIMITED_JSON'; // Our data sources, these come from Google Cloud Storage $confLoad->setSourceUris(array( // File paths here "gs://json_file_bucket/some_name.json" )); // Create a job configuration $conf = new Google_JobConfiguration; $conf->setLoad($confLoad); // Pass our config into the job $job->setConfiguration($conf); // Run the insert (this is asynchronous) $service = new Google_BigQueryService($client); $running = $service->jobs->insert(PROJECT_ID, $job); // Show some details echo "Inserting".PHP_EOL; echo "Job ID:". $running->jobReference['jobId'].PHP_EOL;
As the job is run asynchronously, we need to query the API to see if the job completed. I tend to use the developer console to view the job’s progress, along with any errors that may arise. If you get errors about things being malformed, you can try inserting via the console to see if it’s your data or your code.
- https://code.google.com/apis/console/b/0/ – Google API console
- https://storage.cloud.google.com/ – Google Cloud Storage manager
- https://developers.google.com/bigquery/docs/reference/v2/jobs#resource – An explanation of the request to send
- https://developers.google.com/bigquery/docs/reference/v2/jobs/list#try-it – View job history, filter by state (done, running, pending)
- https://developers.google.com/apis-explorer/#s/bigquery/v2/bigquery.jobs.insert – Insert test data via the API
- https://developers.google.com/apis-explorer/#s/bigquery/v2/bigquery.jobs.get – Get a specific job to inspect it’s details
Michael is a polyglot software engineer, committed to reducing complexity in systems and making them more predictable. Working with a variety of languages and tools, he shares his technical expertise to audiences all around the world at user groups and conferences. You can follow @mheap on Twitter