{"API Voice"}

The Twitter Firehose


“Firehose” is the name given to the massive, real-time stream of Tweets that flows from Twitter each day. Twitter provides access to this “firehose” using a streaming technology called XMPP. The access was originally developed by John Kalucki for Technorati in 2007, but after receiving requests from other companies, Twitter began making more arrangements for firehose access.

“Experimentation with XMPP is informing the future direction of our developer platform. Ultimately, our API will be updated and extended to support a wider variety of interesting projects by a great many talented developers.”

With rate limits imposed on the REST API, drinking from the firehose became the dream of every developer looking to get access to the ever-increasing flow of Tweets coming from Twitter every day, eventually giving firehose access a sort of mythical status.

While Twitter had several loose partnerships for firehose access with companies like Summize as early as 2007, you don’t start seeing official firehose partnership announcements until the end of 2009, with two major search engine partnerships:

Bing - Search Engine
Google - Search Engine

Then, shortly afterwards, at the beginning of 2010, they added another major search partnership:

Yahoo - Search Engine

Then in January of 2010, out of the firehose access, the Twitter Streaming API was born. It wasn’t quite firehose level, though. It was more of a garden hose, limited to 1% of the full firehose stream--higher levels of access were still only available to the handful of selected partners.

Shortly afterwards, in March of 2010, it seemed like even small companies would get access to the Twitter firehose, with a batch announcement of seven smaller Twitter firehose partnerships:

Collecta - Real-Time Search
CrowdEye - Location-based Search
Chainn Search - Social Search
Ellerdale - Real-Time Web, Semantic Analysis
Kosmix - Categorization Engine
Scoopler - Real-time Search Engine
Twazzup - Real-Time News

It appeared Twitter would deliver on the firehose access it had promised.

“Full investment in this ecosystem of innovation, means all our partners should have access to the same volume of data, regardless of company size. More than fifty thousand interesting applications are currently using our freely available, rate-limited platform offerings. With access to the full Firehose of data, it is possible to move far beyond the Twitter experiences we know today. In fact, we’re pretty sure that some amazing innovation is possible.”

However, during the rest of 2010, Twitter formalized what looks like only three other firehose partnerships:

Jive - Social Business Software
PeopleBrowsr - Social Analytics
Gnip - Real-Time Social Media Data

By 2011, it was clear that full Twitter firehose access was a private club, with Twitter admitting only five new firehose partners:

MediaSift - Real-Time Social Media Data
SocialFlow - Social Media Optimization
NTT DOCOMO - Japanese Mobile Provider
Crimson Hexagon - Social Media Monitoring, Analysis and Analytics
Mass Relevance - Social Media Integration Platform

At the end of 2011, there were 230 million Tweets a day flowing through the firehose, and Twitter began routing requests for firehose access to a single partner, Gnip. In April 2012 this became the official response--resulting in only three new firehose partners in 2012:

Sysomos - Social Media Monitoring and Analytics
Yandex - Search Engine
DataMinr - Social Analytics

In 2012, full Twitter firehose access is officially handled through Twitter’s reseller partners, Gnip and Datasift. Some Twitter firehose partners still have their access, while others have had their access cut off or not renewed. The most notable is Google: when its deal with Twitter expired in July 2011, the two failed to establish a new partnership, keeping Tweets out of Google’s search index.

From what I can tell, 20+ companies have been given full Twitter firehose access. I’m sure there are more, but without public announcements they aren’t obvious, so this count reflects only the reports I’ve found from Twitter. Several companies, like Chainn Search, went out of business, while others were acquired: Ellerdale by Flipboard and Kosmix by Walmart. And in an interesting twist, Scoopler was shut down and the team went to work on Google+.

I think it’s safe to say that we won’t see much more talk about full Twitter firehose access arrangements. Direct access will remain something a privileged few get, with all other requests for access routed to Gnip and Datasift.

With Twitter continuing to restrict access to Tweets via the REST and Streaming APIs, and access being routed to the two approved resellers, I don’t think the Twitter firehose is about access and distribution anymore. It has become about monetization and control.