Zorba 2.2 significantly improves the processing of XML, JSON, and textual streams. In this post, we would like to showcase how XQuery can be leveraged to build applications on top of the Twitter Streaming API.
The Application
The application we would like to build is fairly trivial. First, we fetch a stream of public statuses from Twitter (aka tweets). Each tweet is parsed as JSON. If the tweet is geo-tagged, we transform it into a KML placemark. Finally, we store the tweets into a KML document.
To keep the KML document fresh with the latest tweets, we will overwrite it every time an arbitrary number of tweets have been processed. To visualize the Twitter updates with Google earth, we will create a KML Network Link that points to the KML document updated by Zorba.
Let’s dive in.
The Tweets
First things first: we fetch a stream of tweets from Twitter.
We will use the statuses/sample method from the Twitter Streaming API. This method returns a random sample of tweets. Since this method is returning a stream, the connection between the client and the server will virtually never end.
The good news is that EXPath HTTP Client implementation in Zorba streams HTTP responses. This is particularly well suited to process large amount of data efficiently that are returned from an HTTP server. This feature is also a savior in this particular application scenario since the HTTP response we would like to process is never-ending.
Let’s go ahead and write a small query that fetches the stream of tweets. We ask the HTTP client to serialize the HTTP response as text by using the attribute override-media-type=”text/plain”.
import module namespace http = "http://expath.org/ns/http-client";
http:send-request(
< ="https://stream.twitter.com/1/statuses/sample.json"
="GET"
="xqueryblog"
="1qaz2w"
="text/plain"
/>)
If you execute this query, you will notice that the HTTP response from the send-request() function is already being serialized even thought the HTTP connection is not closed yet. Pretty cool, no?
At this point, we want to transform the stream into a sequence of tweets. By default, the Twitter API delimits each JSON objects with a carriage return character ( ). Zorba provides a split() function to return a sequence of strings from an input. This function is similar to fn:tokenize() but doesn’t use a regular expression as a separator. This restriction makes it easier to split large/infinite strings in a streaming manner.
In the code snippet below, we split the response for each tweet and parse its JSON representation using the JSON module.
import module namespace http = "http://expath.org/ns/http-client";
import module namespace string = "http://www.zorba-xquery.com/modules/string";
import module namespace json = "http://www.zorba-xquery.com/modules/converters/json";
declare namespace j = "http://john.snelson.org.uk/parsing-json-into-xquery";
declare function local:parse-tweets($stream as xs:string) as element(j:json)*
{
let $raw-tweets as xs:string* := string:split($stream, " ")
for $raw-tweet in $raw-tweets
return json:parse($raw-tweet)
};
let $stream :=
http:send-request(
< ="https://stream.twitter.com/1/statuses/sample.json"
="GET"
="xqueryblog"
="1qaz2w"
="text/plain"
/>)[2]
return
local:parse-tweets($stream)
In the query above, the predicate [2] means that we are only interested by the HTTP response body. The HTTP response status code and headers are available in the first item of the function result. Examples on how to use the EXPath HTTP client are available in theZorba live demo interface.
Some tweets contain unicode characters that are not supported in XML 1.0. Therefore, you should run the query above by setting the version parameter from the XML serialization to 1.1:
$zorba -q streaming-example.xq -f -i -z version="1.1"
The Placemarks
The next step is to transform each tweet into a KML placemark. As you can see in the code snippet below, this is a pretty straightforward operation.
declare function local:placemark($json as element(j:json))
{
let $coordinates := $json/j:pair[@name = "coordinates"]/j:pair[@name = "coordinates"]/j:item
let $lat := $coordinates[1]/text()
let $long := $coordinates[2]/text()
let $coordinates := $lat || ", " || $long || ", 0"
return
< ="http://www.opengis.net/kml/2.2">
<>{$json//j:pair[@name = "name"]/text()}</>
<>1</>
<>{$json/j:pair[@name = "text"]/text()}</>
<>
<>
<>2</>
<>
<>{$json//j:pair[@name = "profile_image_url"]/text()}</>
</>
</>
<><>1</></>
</>
<>
<>0</>
<>1</>
<>{$coordinates}</>
</>
</>
};
See example3.xq to use this function directly with the Twitter API.
Google Earth Integration
We’re almost there. To integrate with Google Earth, we need to add some scripting to our query. First, we define a function namedfetch-placemarks($limit). This function will fetch a certain number of tweets from Twitter and transform them into KML placemarks. The number of tweets to fetch is set by the $limit parameter. See the function definition below.
declare %ann:sequential function local:fetch-placemarks($limit as xs:integer)
as element(Placemark)*
{
let $stream :=
http:send-request(
< ="https://stream.twitter.com/1/statuses/sample.json"
="GET"
="xqueryblog"
="1qaz2w"
="text/plain"
/>)[2]
let $tweets := local:parse-tweets($stream)
let $tweets := subsequence($tweets, 1, $limit)
for $tweet in $tweets
where $tweet/j:pair[@name = "coordinates"]/@type != "null"
return local:placemark($tweet)
};
This function invokes parse-tweets() which is the exact same function that we declared previously. The main query expression an infinite loop to fetch the KML placemarks from a Twitter stream and store it into a KML document named tweets.xml. For each whileiteration, the tweets.xml document will be refreshed with new content. This is done in the main query below.
(: XQuery 4ever :)
while(true()) {
trace("...", "refresh KML document");
let $placemarks := local:fetch-placemarks(1000)
let $kml :=
< ="http://www.opengis.net/kml/2.2"
="http://www.w3.org/2005/Atom"
="http://www.google.com/kml/ext/2.2">
<>
<>Tweets</>
<>1</>
< ="tweets">
{$placemarks}
</>
</>
</>
return file:write("tweets.kml", $kml, ());
}
Notice that the main query invokes fetch-placemarks() which is the exact same function that we declared previously.
Finally, we create network link that points to tweets.xml:
<?xml version="1.0" encoding="UTF-8"?>
xmlns="http://www.opengis.net/kml/2.2"
Realtime Tweets
file:///{PATH}/tweets.kml
onInterval
5
Add the network link to Google Earth and that’s it — we’re done.
Take-Away
The source code of this application is available on Launchpad. We hope that you will put Zorba streaming capabilities to the test.
If you are looking for more complex JSON processing with XQuery, you should checkout the Zorba JSONiq development branch.