I needed a very simple Twitter cache for a project I’m working on. And I was very happy to trade off some realtime accuracy for reliability. In addition to caching the tweets, I also needed to pre-process them into css-able html with clickable links, usernames, and hashtags. The web had a few nice examples of how to use regular expressions to parse the raw tweet text, but I decided to take what I liked and do the rest myself.
Links
Here’s the PHP code for parsing links out of the raw tweet text:
$text = preg_replace( '@(https?://([-\w\.]+)+(/([\w/_\.]*(\?\S+)?(#\S+)?)?)?)@', '<a href="$1">$1</a>', $text);
I only wanted http and https links, with an optional query part (\?\S+)? and an option anchor part (#\S+)?. The conversion of a text link into an html link is done using back references, which in PHP is $1, $2, etc. In the expression above, I use $1 twice to put the matched link into both the href attribute and the link text.
Users
Here’s the PHP code for parsing Twitter usernames:
$text = preg_replace( '/@(\w+)/', '<a href="http://twitter.com/$1">@$1</a>', $text);
Nothing special, just take the @ and all following word characters (letters, digits, and underscores), and turn it into a user link.
Hashtags
Here’s the PHP code for parsing Twitter hashtags:
$text = preg_replace( '/\s+#(\w+)/', ' <a href="http://search.twitter.com/search?q=%23$1">#$1</a>', $text);
Getting the hashtags right was the most tricky of the three. I decided to only grab hashtags that were proceeded by one or more spaces. The real magic is the %23 in the query string, which forces a search on the complete hashtag, including the # part. For example, compare a search for #flex to a search for flex.
The Cache
The cache is just a simple cron job that periodically queries Twitter and retrieves the latest tweets. Most importantly, the cache fails gracefully if Twitter is inaccessible, which it does by doing exactly nothing if Twitter is down. This guarantees that my app always has valid data (when my server is up, the cache is up too), but with the possibility that the data is a little old.
Here’s the notable function in the cache:
function getTweets($user, $num = 3) { //first, get the user's timeline $ch = curl_init(); curl_setopt($ch, CURLOPT_URL, "http://twitter.com/statuses/user_timeline/$user.json?count=$num"); curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); $json = curl_exec($ch); curl_close($ch); if ($json === false) { return false; } //abort on error //second, convert the resulting json into PHP $result = json_decode($json); //third, build up the html output $s = ''; foreach ($result as $item) { //handle any special characters $text = htmlentities($item->text, ENT_QUOTES, 'utf-8'); //build the metadata part $meta = date('g:ia M jS', strtotime($item->created_at)) . ' from ' . $item->source; //parse the tweet text into html $text = preg_replace('@(https?://([-\w\.]+)+(/([\w/_\.]*(\?\S+)?(#\S+)?)?)?)@', '<a href="$1">$1</a>', $text); $text = preg_replace('/@(\w+)/', '<a href="http://twitter.com/$1">@$1</a>', $text); $text = preg_replace('/\s#(\w+)/', ' <a href="http://search.twitter.com/search?q=%23$1">#$1</a>', $text); //assemble everything $s .= '<p class="tweet">' . $text . "<br />\n" . '<span class="tweet-meta">' . $meta . "</span></p>\n"; } return $s; }
First, we query the user’s JSON timeline using cURL. Second, we use PHP’s awesome json_decode function to convert the JSON into objects. And lastly, we iterate over the tweets and parse everything into our desired HTML output.
Here some sample output from my twitter feed:
<p class="tweet">Been reading Programming Goggle App Engine. Actually feeling dumber now than before I started. Too much to learn.<br /> <span class="tweet-meta">2:58pm Feb 14th from <a href="http://www.tweetdeck.com/" rel="nofollow">TweetDeck</a></span></p> <p class="tweet">Blog Post :: Async Testing with FlexUnit 4 :: <a href="http://bit.ly/cGLnaI">http://bit.ly/cGLnaI</a><br /> <span class="tweet-meta">3:33pm Feb 11th from <a href="http://www.tweetdeck.com/" rel="nofollow">TweetDeck</a></span></p> <p class="tweet">Blog Post :: A Better HTML Template for Flex 4 :: <a href="http://bit.ly/70DLsj">http://bit.ly/70DLsj</a><br /> <span class="tweet-meta">12:55pm Jan 25th from <a href="http://www.tweetdeck.com/" rel="nofollow">TweetDeck</a></span></p>
Once I have the output, I can do whatever I want with it: save to disk, stick it in the database, keep it in memory, cache it in memcache, etc. In my case, I wanted the simplest possible option, so I chose to write it out as a static html file.
The end. The rest of the app’s not ready yet…
