Saturnboy
 2.23

I needed a very simple Twitter cache for a project I’m working on, and I was happy to trade some real-time accuracy for reliability. In addition to caching the tweets, I also needed to pre-process them into css-able html with clickable links, usernames, and hashtags. The web had a few nice examples of how to use regular expressions to parse the raw tweet text, so I took what I liked and did the rest myself.

Links

Here’s the PHP code for parsing links out of the raw tweet text:

$text = preg_replace(
    '@(https?://([-\w\.]+)+(/([\w/_\.]*(\?\S+)?(#\S+)?)?)?)@',
     '<a href="$1">$1</a>',
    $text);

I only wanted http and https links, with an optional query part (\?\S+)? and an optional anchor part (#\S+)?. The conversion of a text link into an html link is done with back references, which in a PHP replacement string are written $1, $2, and so on. In the expression above, $1 is the entire matched URL, and I use it twice: once in the href attribute and once as the link text.
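
To see the back references at work, here is a small sketch that runs the same pattern over the text of one of the sample tweets further down (the $linkPattern, $tweet, and $html names are just for illustration):

```php
<?php
// The link pattern from above, unchanged. '@' is the regex delimiter,
// so the slashes inside the URL part don't need escaping.
$linkPattern = '@(https?://([-\w\.]+)+(/([\w/_\.]*(\?\S+)?(#\S+)?)?)?)@';

$tweet = 'Blog Post :: Async Testing with FlexUnit 4 :: http://bit.ly/cGLnaI';

// $1 is the whole matched URL, so it lands in both the href and the link text.
$html = preg_replace($linkPattern, '<a href="$1">$1</a>', $tweet);

echo $html, "\n";
```

One thing to keep in mind: the path character class [\w/_\.] has no -, %, or ~. A URL such as http://example.com/my-post is therefore only linked up to http://example.com/my, and the rest of it is left as plain text.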

Users

Here’s the PHP code for parsing Twitter usernames:

$text = preg_replace(
    '/@(\w+)/',
    '<a href="http://twitter.com/$1">@$1</a>',
    $text);

Nothing special, just take the @ and all following word characters (letters, digits, and underscores), and turn it into a user link.
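
Because the pattern only looks for @ followed by word characters, it also fires inside e-mail addresses. The sketch below shows that behavior and one possible tweak. The tweak is my own assumption and not part of the original code: it adds a negative lookbehind so the @ must not follow a word character. The sample text and handle are made up.

```php
<?php
$text = 'ping @someuser or mail me@example.com';

// Original pattern: any @ followed by word characters becomes a user link.
$loose = preg_replace('/@(\w+)/', '<a href="http://twitter.com/$1">@$1</a>', $text);

// Hypothetical tweak: (?<!\w) requires that the @ is not preceded by a word
// character, so the "me@example" part of the e-mail address is left alone.
$strict = preg_replace('/(?<!\w)@(\w+)/', '<a href="http://twitter.com/$1">@$1</a>', $text);

echo $loose, "\n";
echo $strict, "\n";
```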

Hashtags

Here’s the PHP code for parsing Twitter hashtags:

$text = preg_replace(
    '/\s+#(\w+)/',
    ' <a href="http://search.twitter.com/search?q=%23$1">#$1</a>',
    $text);

Getting the hashtags right was the trickiest of the three. I decided to only grab hashtags that are preceded by one or more whitespace characters. The real magic is the %23 in the query string. It is the URL-encoded #, and it forces a search on the complete hashtag, including the # part. For example, compare a search for #flex to a search for flex.
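
Here is a quick sketch of what the whitespace requirement does and doesn't catch. The sample string and the linkHashtags name are made up. A hashtag at the very start of a tweet has no whitespace in front of it, so it is skipped. On the plus side, the pattern never mistakes the # inside a numeric entity for a hashtag. An example is the &#039; that htmlentities(..., ENT_QUOTES) produces for an apostrophe.

```php
<?php
// Same pattern and replacement as the hashtag code above, wrapped in a helper.
function linkHashtags(string $text): string {
    return preg_replace(
        '/\s+#(\w+)/',
        ' <a href="http://search.twitter.com/search?q=%23$1">#$1</a>',
        $text);
}

// Escape first, as getTweets() does; ENT_QUOTES turns ' into &#039;.
$text = htmlentities("#flex isn't hard, try #flex4", ENT_QUOTES, 'utf-8');

// The leading #flex and the &#039; entity are left alone; only #flex4 is linked.
echo linkHashtags($text), "\n";
```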

The Cache

The cache is just a simple cron job that periodically queries Twitter and retrieves the latest tweets. Most importantly, the cache fails gracefully if Twitter is inaccessible: it simply does nothing and leaves the previously cached output in place. This guarantees that my app always has valid data (when my server is up, the cache is up too), at the cost of the data sometimes being a little stale.

Here’s the notable function in the cache:

function getTweets($user, $num = 3) {
    //first, get the user's timeline
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, "http://twitter.com/statuses/user_timeline/$user.json?count=$num");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FAILONERROR, true); //treat HTTP errors (4xx/5xx) as failures, not as data
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);       //give up rather than hang the cron job
    $json = curl_exec($ch);
    curl_close($ch);

    if ($json === false) { return false; } //abort on error

    //second, convert the resulting json into PHP
    $result = json_decode($json);
    if (!is_array($result)) { return false; } //abort on malformed or unexpected json

    //third, build up the html output
    $s = '';
    foreach ($result as $item) {
        //handle any special characters
        $text = htmlentities($item->text, ENT_QUOTES, 'utf-8');

        //build the metadata part
        $meta = date('g:ia M jS', strtotime($item->created_at)) . ' from ' . $item->source;

        //parse the tweet text into html
        $text = preg_replace('@(https?://([-\w\.]+)+(/([\w/_\.]*(\?\S+)?(#\S+)?)?)?)@', '<a href="$1">$1</a>', $text);
        $text = preg_replace('/@(\w+)/', '<a href="http://twitter.com/$1">@$1</a>', $text);
        $text = preg_replace('/\s+#(\w+)/', ' <a href="http://search.twitter.com/search?q=%23$1">#$1</a>', $text);

        //assemble everything
        $s .= '<p class="tweet">' . $text . "<br />\n" . '<span class="tweet-meta">' . $meta . "</span></p>\n";
    }

    return $s;
}

First, we query the user’s JSON timeline using cURL. Second, we use PHP’s awesome json_decode function to convert the JSON into objects. And lastly, we iterate over the tweets and parse everything into our desired HTML output.
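
The metadata line relies on strtotime() to make sense of Twitter's created_at string. If you'd rather spell the format out, DateTime::createFromFormat() can parse it explicitly. Here is a minimal sketch. It assumes created_at uses the "Sun Feb 14 21:58:00 +0000 2010" layout. The sample timestamp and the formatTweetDate helper are made up for illustration.

```php
<?php
// Hypothetical helper: parse a created_at value in the
// "Day Mon DD HH:MM:SS +ZZZZ YYYY" layout and format it like the meta line.
function formatTweetDate(string $createdAt, string $tz = 'UTC'): ?string {
    $dt = DateTime::createFromFormat('D M d H:i:s O Y', $createdAt);
    if ($dt === false) {
        return null; // unexpected format: let the caller decide what to do
    }
    // Convert to the timezone the page should display, then use the same
    // date pattern as getTweets().
    $dt->setTimezone(new DateTimeZone($tz));
    return $dt->format('g:ia M jS');
}

echo formatTweetDate('Sun Feb 14 21:58:00 +0000 2010'), "\n";
```

An explicit format fails loudly (null) on anything unexpected, while strtotime() may quietly produce a surprising date.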

Here is some sample output from my Twitter feed:

<p class="tweet">Been reading Programming Goggle App Engine. Actually feeling dumber now than before I started. Too much to learn.<br /> 
<span class="tweet-meta">2:58pm Feb 14th from <a href="http://www.tweetdeck.com/" rel="nofollow">TweetDeck</a></span></p>
 
<p class="tweet">Blog Post :: Async Testing with FlexUnit 4 :: <a href="http://bit.ly/cGLnaI">http://bit.ly/cGLnaI</a><br /> 
<span class="tweet-meta">3:33pm Feb 11th from <a href="http://www.tweetdeck.com/" rel="nofollow">TweetDeck</a></span></p>
 
<p class="tweet">Blog Post :: A Better HTML Template for Flex 4 :: <a href="http://bit.ly/70DLsj">http://bit.ly/70DLsj</a><br /> 
<span class="tweet-meta">12:55pm Jan 25th from <a href="http://www.tweetdeck.com/" rel="nofollow">TweetDeck</a></span></p>

Once I have the output, I can do whatever I want with it: save to disk, stick it in the database, keep it in memory, cache it in memcache, etc. In my case, I wanted the simplest possible option, so I chose to write it out as a static html file.
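
Below is a minimal sketch of the "do nothing if Twitter is down" step, assuming getTweets() returns false on failure as in the function above. The writeCache name, the temp-file-then-rename approach, and the file location are my own choices for illustration. Writing a temp file and renaming it over the old one means readers never see a half-written page, since on POSIX filesystems the rename happens in one step. To keep the sketch runnable on its own, a literal string stands in for the real fetch.

```php
<?php
// Write fresh HTML to $path only when we actually got some.
// Returns true if the cache file was updated, false if it was left untouched.
function writeCache($html, string $path): bool {
    if (!is_string($html) || $html === '') {
        return false; // fetch failed or nothing parsed: keep the old file as-is
    }
    // Write to a temp file next to the cache, then rename it over the old one.
    $tmp = $path . '.tmp';
    if (file_put_contents($tmp, $html) === false) {
        return false;
    }
    return rename($tmp, $path);
}

// In the real cron script this would be: writeCache(getTweets($user), $path);
$path = sys_get_temp_dir() . '/tweets.html'; // hypothetical location
writeCache('<p class="tweet">hello</p>' . "\n", $path);
writeCache(false, $path); // simulated failed fetch: previous content survives
echo file_get_contents($path);
```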

The end. The rest of the app’s not ready yet…


 1.12
Code

Recently, I got caught by a special characters vs. html entities issue in Flex 4. For reference, you can read more about special characters in the text property of a text component in the official docs, and also on Flex Examples. Unfortunately, neither of these was exactly what I was looking for.

The Problem

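I had an array of names in ActionScript that potentially contained special characters, stored as numeric html entities (for example, Nen&#234; for Nenê), and I wanted to output them in a Spark List component. Nothing magical required, just get them on the screen. The catch is that inside the usual CDATA block of a Script tag the entities are never decoded, so the List displays Nen&#234; literally.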

Solution #1

One option, which I’ve NEVER seen in anyone’s code ever, is to move the definition of the array into its own Script tag without the CDATA block. For example:

<fx:Script>
    [Bindable] private var nuggets:Array = [
        'Carmelo Anthony',
        'Chauncey Billups',
        'Nen&#234;',
        'Kenyon Martin'];
</fx:Script>

Now, it doesn’t matter if we use single quotes (') or double quotes ("), because outside of a CDATA block all numeric html entities are decoded. Flash Builder 4 automatically inserts the CDATA block when you open a Script tag, which probably means that almost no one has ever heard of this possible solution. It’s so weird that I can’t recommend it, especially since without CDATA any < or & in your ActionScript (a less-than comparison, or &&) would have to be escaped too.

Digging Deeper: The official docs will tell you that only a few named html entities work (&lt;, &gt;, &amp;, &quot;, &apos;), and for anything else you must use numeric html entities (&#NNN;).

Solution #2

Another option, which I actually thought of first, is to convert the numeric html entities into the corresponding special characters with a regular expression. For example:

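private function makeSpecialChars(item:Object):String {
    //find every numeric html entity (&#NNN;) and hand it to replaceFunc
    return item.toString().replace(/&#\d+;/g, replaceFunc);
}

private function replaceFunc():String {
    //arguments[0] is the matched entity, e.g. "&#234;"
    var s:String = arguments[0];
    //strip the leading "&#" and trailing ";" to leave just the digits
    s = s.substring(2, s.length - 1);
    //turn the decimal character code into the actual character
    s = String.fromCharCode(parseInt(s));
    return s;
}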

We use a simple regular expression to match any decimal numeric html entity (hex entities such as &#xEA; are not matched), and then call a replacement function to convert the entity into a special character. The static method String.fromCharCode does the actual conversion.
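
The same match-then-convert idea carries over to PHP, the language of the Twitter post, if you ever need it on the server side. Here is a rough sketch. It assumes the mbstring extension is available for mb_chr() (PHP 7.2+), and decodeNumericEntities is a made-up name. Because PHP strings are bytes, mb_chr() encodes the character code as UTF-8 rather than producing a UTF-16 character the way String.fromCharCode does.

```php
<?php
// Replace each decimal numeric entity (&#NNN;) with the matching UTF-8 character.
function decodeNumericEntities(string $s): string {
    return preg_replace_callback(
        '/&#(\d+);/',
        // The capture group puts just the digits in $m[1], so there is no
        // need to trim "&#" and ";" by hand the way replaceFunc does.
        function (array $m): string {
            $char = mb_chr((int) $m[1], 'UTF-8');
            return $char === false ? $m[0] : $char; // leave invalid codes as-is
        },
        $s);
}

echo decodeNumericEntities('Nen&#234;'), "\n"; // Nenê
```

In practice, PHP's built-in html_entity_decode('Nen&#234;', ENT_QUOTES, 'UTF-8') already handles numeric entities (hex ones included), so the regex version is mostly useful as a side-by-side with the ActionScript.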

Combining the converter code with a List’s labelFunction property gives us this:

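<fx:Script>
<![CDATA[
    import mx.collections.ArrayList;

    [Bindable] private var nuggets2:Array = [
        'Carmelo Anthony',
        'Chauncey Billups',
        'Nen&#234;',
        'Kenyon Martin'];

    //labelFunction: convert numeric html entities into special characters
    private function makeSpecialChars(item:Object):String {
        return item.toString().replace(/&#\d+;/g, replaceFunc);
    }

    private function replaceFunc():String {
        var s:String = arguments[0];
        s = s.substring(2, s.length - 1);
        return String.fromCharCode(parseInt(s));
    }
]]>
</fx:Script>

<s:List dataProvider="{new ArrayList(nuggets2)}"
    labelFunction="makeSpecialChars" />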

For each element in the array, the labelFunction gets called and any numeric html entity is converted into the corresponding special character. No magic.

The Result

Here is the final result with both Solution #1 and Solution #2 together (view source enabled):


If you view the source, you can see that I’m using two separate Script tags, one with a CDATA block and one without. Who does that?


© 2017 saturnboy.com