A few weeks back I was fortunate enough to have a few pints with Nick Payne, the developer behind faavorite. We chatted about a few things, but the conversation always came back to either tech stacks or Twitter's API. Nick mentioned that he had been meaning to write up the faavorite tech stack and I thought that I should do the same for TweetDig. Now that Nick's published his post on faavorite, it's about time that I wrote one for us.
Just in case you don't want to read the entire thing, here's the executive summary: Linux, nginx, PHP-FPM, NodeJS, Redis, mySQL, BackboneJS, Handlebars.js.
I thought this would be an easy one. Both Steve and I knew PHP so that was a shoo-in, and what better accompaniment for PHP than a nice mySQL database? We quickly realised that whilst PHP was great for pulling data out of the database and getting it exposed, it just wasn't going to cut it when working with a streaming connection.
I'd been meaning to play with NodeJS for a while and its event-driven model mapped pretty much perfectly onto the streaming API, so I decided to give it a go. Thanks to awesome libraries like ntwitter and node-mysql I was connected and saving data almost instantly. However, just receiving the raw data from Twitter isn't the nicest way to work, so I spent a few hours developing TDTwitterStream. It sits on top of ntwitter and emits events depending on what data it receives, meaning that your code now only has to listen for events rather than inspect the data that Twitter sends through the stream.
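To give a rough idea of what "emit events depending on what data it receives" can look like, here's a minimal sketch. It is not TDTwitterStream's actual API: the `StreamEvents` class, the event names and the assumption that the underlying stream fires a `data` event with one parsed message at a time are all made up for illustration. A plain `EventEmitter` stands in for the real stream.

```javascript
const { EventEmitter } = require('events');

// Hypothetical wrapper (not the real TDTwitterStream): inspects each raw
// message once and re-emits it under a descriptive event name.
class StreamEvents extends EventEmitter {
  constructor(rawStream) {
    super();
    // Assumption: rawStream emits 'data' with one parsed JSON message.
    rawStream.on('data', (message) => this.dispatch(message));
  }

  dispatch(message) {
    if (message.delete) {
      // A deletion notice refers to a previously delivered status.
      this.emit('delete', message.delete.status);
    } else if (message.event) {
      // Named events are prefixed so they can never collide with
      // EventEmitter's special 'error' event.
      this.emit('event:' + message.event, message);
    } else if (message.friends) {
      this.emit('friends', message.friends);
    } else if (message.text && message.user) {
      this.emit('tweet', message);
    } else {
      this.emit('unknown', message);
    }
  }
}

// Usage: a plain EventEmitter stands in for the real streaming connection.
const raw = new EventEmitter();
const stream = new StreamEvents(raw);
const seen = [];

stream.on('tweet', (tweet) => seen.push('tweet:' + tweet.user.screen_name));
stream.on('delete', (status) => seen.push('delete:' + status.id));

raw.emit('data', { text: 'hello', user: { screen_name: 'example' } });
raw.emit('data', { delete: { status: { id: 42 } } });
```

The consuming code only subscribes to the events it cares about, and the "what kind of message is this?" checks live in one place.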
So, we had PHP, mySQL and Node all playing together nicely. It was working, but it didn't really handle failure very well, so we added a layer between the stream and our database, which currently consists of a Redis server. This gives us a nice little buffer in case anything goes wrong, and leads nicely into the next section…
When designing the architecture, I had one thing in mind. I wanted to be able to scale any component of the system individually. Having a few decoupled systems and making stateless requests makes this very easy to do. In my mind, there are 5 main components to TweetDig:
Get the tweets from Twitter
Store the tweet + tweeter in the database
Run the tweet through the filters created by everyone that saw that tweet
Expose the data via an API
Show the data on screen for a user to use
Or as a flow chain (sorry, no fancy diagrams from me): Twitter->Node->Redis->Node->mySQL->PHP->Interface.
Separating the system into distinct subsystems has various advantages. The most important one to me is that they are all interchangeable. At the moment the API is written in PHP. Should PHP start struggling to serve all of the incoming requests one day, we could quite happily rewrite the API in another language and nothing else in the system would be affected. If NodeJS suddenly becomes a terrible platform, we could swap it out for Python using something like Twisted.
There's an awesome side effect to this architecture that's actually rather useful. We can disable any section of it and none of the others would be affected. A few weeks ago we had a database issue (we ran out of HDD space) and instead of losing all of the data like we normally would have, the process that filters tweets simply stopped trying to connect to the DB and the incoming tweets were buffered into Redis but not taken out again. 10 minutes later (once I'd taken a snapshot of the HDD and put it onto a 250GB volume) I restarted the filtering system and the count in Redis started to drop. One potentially disastrous mistake, and no lost data at all. All our users saw was a maintenance message for 10 minutes and slightly delayed tweets for 10 minutes or so after the site became available again.
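The buffering behaviour during that outage can be sketched roughly as follows. This is an illustration under assumptions, not our production code: `TweetBuffer` is an in-memory stand-in for a Redis list, `FakeDatabase` is a made-up class whose availability can be toggled, and `drain` is a hypothetical name for the consumer step.

```javascript
// In-memory stand-in for a Redis list: the stream process pushes on one
// end, the filtering process reads from the other.
class TweetBuffer {
  constructor() { this.items = []; }
  push(tweet) { this.items.push(tweet); }
  peek() { return this.items[0]; }
  shift() { return this.items.shift(); }
  get length() { return this.items.length; }
}

// Hypothetical database whose availability can be toggled to simulate
// an outage such as a full disk.
class FakeDatabase {
  constructor() { this.available = true; this.rows = []; }
  save(tweet) {
    if (!this.available) throw new Error('database unavailable');
    this.rows.push(tweet);
  }
}

// Consumer: move tweets from the buffer into the database. On the first
// failure it stops and leaves everything else queued for the next run.
function drain(buffer, db) {
  let saved = 0;
  while (buffer.length > 0) {
    try {
      db.save(buffer.peek());
    } catch (err) {
      break; // keep the tweet in the buffer and retry later
    }
    buffer.shift(); // only remove a tweet once it is safely stored
    saved += 1;
  }
  return saved;
}

const buffer = new TweetBuffer();
const db = new FakeDatabase();

buffer.push({ id: 1 });
drain(buffer, db);                        // normal operation

db.available = false;                     // the outage begins
buffer.push({ id: 2 });
buffer.push({ id: 3 });
const savedDuringOutage = drain(buffer, db);
const queuedDuringOutage = buffer.length; // tweets pile up, none are lost

db.available = true;                      // the database is back
const savedAfterRecovery = drain(buffer, db);
```

The important detail is that a tweet is only removed from the buffer after the save succeeds, so a failing database delays tweets rather than dropping them.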
Keeping it all alive
The downside to having all of these distributed systems is making sure that they're all running as they should be. The database issue occurred because we didn't have sufficient server monitoring in place. I'll be one of the first to admit that I'm not a DevOps guy. I can install and configure a server with nginx/PHP-FPM/NodeJS/Redis/mySQL and do some basic tuning (slow query log and the like), but I'm far from comfortable doing it all. After doing some reading around, it looked like Nagios and StatsD along with Graphite were the way to go. It also looked very complicated.
Then, almost as though the world knew what I was working on, someone tweeted a link to Amon.cx just as I was starting to give up for the day. Amon is awesome. A one-line install script, and automated agents for any server that you want to manage. Just install the agents, give them a server identifier code and it just works. It monitors CPU, LoadAvg, RAM, HDD and Network by default, along with any named processes that you tell it to watch.
Once the servers were covered, there was only one more thing to take care of. There are a few conditions under which a node process would shut down, the most obvious of which is a crash. To make sure that the system continues as normal after a crash, we run all our node scripts under Forever. Should a crash occur, a stack trace is sent over to Amon (it even does event logging!) and the process is restarted and picks up where it left off.
Finally, the stuff that people actually get to see! If we've made it this far, we're successfully collecting and exposing data to be used in some kind of interface. Again, the technology choice for the frontend was half valid reasons, half an excuse to play with some new technology.
The frontend site is powered by BackboneJS and uses Handlebars.js for templating. Backbone is pretty awesome, but there are quite a lot of things that I've done that aren't as "correct" as they could be. On the flip side, there are a few things that I think Backbone gets wrong, but I can just extend the basic Backbone.Controller and modify it as I see fit. I could go on forever about Backbone, but I think I'll save that for another post.
Because the API is stateless, we send an authentication token with every request to identify the user. To do this, we use jQuery's $.ajaxSetup to set some default headers, including the auth token and default content type. API responses are namespaced into summary, data and error. Backbone handles parsing this response: it shows the error message if one is present, and parses the data attribute into whatever collection or model the request was made from.
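As a rough sketch of that request/response handling, kept free of jQuery and Backbone so it runs on its own: the header name `X-Auth-Token`, the helper names and the exact shape of `error` are assumptions for illustration, not what the API actually uses.

```javascript
// Hypothetical helper: builds the default headers sent with every request.
// With jQuery this object could be passed as $.ajaxSetup({ headers: ... }).
// The header name 'X-Auth-Token' is an assumption.
function buildDefaultHeaders(authToken) {
  return {
    'X-Auth-Token': authToken,
    'Content-Type': 'application/json',
  };
}

// Hypothetical helper: unwraps a { summary, data, error } response.
// It could be called from a Backbone model's or collection's parse method.
// Returns the data payload, or null after reporting an error.
function parseEnvelope(response, showError) {
  if (response.error) {
    showError(response.error);
    return null;
  }
  return response.data;
}

// Usage with two made-up responses.
const shownErrors = [];
const showError = (error) => shownErrors.push(error);

const tweets = parseEnvelope(
  { summary: { count: 1 }, data: [{ id: 1, text: 'hello' }], error: null },
  showError
);

const failed = parseEnvelope(
  { summary: null, data: null, error: 'Invalid token' },
  showError
);
```

Doing the unwrapping in one place means individual models and collections never need to know about the envelope at all.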
Once the data is in a model, we use Handlebars to render the data in a predefined template. Using logic-less templates has been an awesome learning experience. Keeping all of the data processing in the model means that no matter what context I use the model in, all of the additional metadata that's been generated on the client side is available.
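Here's a small sketch of that split between model and template. To keep it self-contained, `render` is a tiny stand-in for a compiled Handlebars template (it only substitutes `{{name}}` placeholders and, unlike Handlebars, does no HTML escaping), and `TweetModel`, `toTemplateData` and the derived fields are hypothetical names.

```javascript
// Hypothetical model: all derived values are computed here, so the
// template never needs any logic of its own.
class TweetModel {
  constructor(attributes) {
    this.attributes = attributes;
  }

  // Returns the raw attributes plus client-side metadata.
  toTemplateData() {
    const { screenName, text } = this.attributes;
    return {
      ...this.attributes,
      mention: '@' + screenName,
      wordCount: text.trim().split(/\s+/).length,
    };
  }
}

// Stand-in for a compiled Handlebars template: replaces {{name}} with
// the matching value from the context. No escaping, helpers or blocks.
function render(template, context) {
  return template.replace(/\{\{(\w+)\}\}/g, (match, key) => String(context[key]));
}

const model = new TweetModel({
  screenName: 'example',
  text: 'hello from the stream',
});

const output = render(
  '{{mention}}: {{text}} ({{wordCount}} words)',
  model.toTemplateData()
);
// output is '@example: hello from the stream (4 words)'
```

Any other view that uses the same model gets `mention` and `wordCount` for free, which is the benefit described above.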
There's still a lot of work to be done on the frontend (and the backend) with regard to caching, deferred loading of data and keeping the data in the interface consistent. For now though, it works well enough! The site is online, filtering approximately a million tweets a day without too much trouble. I'm using it as my primary Twitter client (in conjunction with a prototype mobile app). Now we just need to work on making it more responsive and easier to use for everyone that didn't play a part in developing it :)
One last thing
What impresses me most about faavorite is that it's all running on a single box. We're currently running our own mini EC2 cloud, consisting of one instance for Node processes (Small-Extra CPU), one for the frontend (Backbone) + backend (PHP) (Micro), one for Redis (Small) and one for the database (Large). That adds up to just over 11.5GB RAM and five and a half cores. The large DB instance is a forward-planning thing. We don't need all 8GB of RAM yet, but I'd rather have it and not need it, than need it and not have it. To think that faavorite is running on a single quad core 2GB VM is just insane.
I learned quite a lot about NodeJS, Backbone and Amazon's IaaS offering whilst building TweetDig, and I think there's quite a lot in there that might come in useful to other people, including one rather nasty Heisenbug that I encountered with Node and EC2. That'll have to wait until another time though :)