“Cloudera’s software is publicly accessible in the extreme. The 4-year-old startup deals in open source software — software that’s freely available to everyone. It specializes in an open source platform called Hadoop, which seeks to mimic the way Google applies tens of thousands of servers to a common task. In many ways, Hadoop is still years behind the software Google now uses in its worldwide network of data centers. But Marcel Kornacker is closing the gap.
After moving to Cloudera, he built Impala, a tool based on the engine Google uses to ask questions of its worldwide database. Plugging into Hadoop, Impala lets you query a collection of digital data in much the way we have always queried databases. If you’re dealing with a database of, say, online travel reservations, you can instantly ask for a list of all reservations over a certain price or all those that used a certain digital coupon. The difference is that Impala, like the Google query engine it’s based on, works with petabytes of data — aka millions of gigabytes. And unlike Google’s engine, Impala is open source.”
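
To make the travel-reservation example concrete, here is a minimal sketch of the two questions the passage describes: reservations over a certain price, and reservations that used a certain coupon. It is an illustration only. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in and does not connect to Impala or Hadoop. The table name `reservations`, its columns, the sample rows, and the coupon code `SPRING10` are all made up for the example.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory table of hypothetical travel reservations."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reservations ("
        "id INTEGER PRIMARY KEY, traveler TEXT, price REAL, coupon_code TEXT)"
    )
    # Sample rows are invented for illustration; they come from no real dataset.
    conn.executemany(
        "INSERT INTO reservations (traveler, price, coupon_code) VALUES (?, ?, ?)",
        [
            ("Ada", 120.0, None),
            ("Grace", 480.0, "SPRING10"),
            ("Linus", 950.0, None),
            ("Edsger", 310.0, "SPRING10"),
        ],
    )
    conn.commit()
    return conn


def reservations_over(conn, min_price):
    """Return all reservations priced strictly above min_price."""
    return conn.execute(
        "SELECT id, traveler, price FROM reservations "
        "WHERE price > ? ORDER BY id",
        (min_price,),
    ).fetchall()


def reservations_with_coupon(conn, coupon_code):
    """Return all reservations that used the given coupon code."""
    return conn.execute(
        "SELECT id, traveler, price FROM reservations "
        "WHERE coupon_code = ? ORDER BY id",
        (coupon_code,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(reservations_over(conn, 400))
    print(reservations_with_coupon(conn, "SPRING10"))
```

The queries themselves are ordinary SQL of the kind the passage has in mind when it says Impala lets you query data "in much the way we have always queried databases". What the sketch cannot show is the scale: here the table holds four rows on one machine, while the article describes the same style of question being asked of petabytes spread across many servers.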