It’s an irony of the second Age of Reason that the abundance of data—the effervescence of sources and ease of delivery—makes so many more questions answerable while at the same time making it very easy to get lost. We’ve dedicated an issue to exploration, to a broad, cross-platform look at the fruits of Big Data.
Employers say the ideal candidate must have more than traditional market-research skills: the ability to find patterns in millions of pieces of data streaming in from different sources, to infer from those patterns how customers behave and to write statistical models that pinpoint behavioral triggers.
At e-commerce site operator Etsy Inc., for instance, a biostatistics Ph.D. who spent years mining medical records for early signs of breast cancer now writes statistical models to figure out the terms people use when they search Etsy for a new fashion they saw on the street.
To get help, employers are increasingly looking to an elite program called the Insight Data Science Fellows Program, which helps funnel doctoral candidates from fields like astrophysics, neuroscience and math into the profession. The program, based near Stanford University and funded by tech companies, has a 100% placement rate.
Here, the Chicago mapping and traffic information enterprise, started life as NAVTEQ, which Nokia acquired in 2007. The map databases curated here go into four out of five cars with in-dash navigation, and the company makes 2.7 million map database revisions every day. The company's global traffic-monitoring effort supports nav/traffic routing in 41 countries and processes more than 1 billion data points per day, coming in primarily from anonymized cellphone GPS signals and roadside traffic monitors. Twenty-five traffic "editors" cover North America and Australia from Chicago, monitoring police scanners, government Twitter feeds, and 12,000 traffic cameras to provide real-time traffic route guidance.
The traffic team is done with this probe data minutes after it arrives, but it doesn't get discarded. The Big Data analysis team at Here teases other useful info out of it, such as mapping drive-through restaurant lanes and confirming POI viability. (No cellphones have stopped at that supposed fuel station in a month; should we strike it from the database?) But the most interesting knowledge they're teasing out of this pile of ones and zeroes is behavioral. By studying the speed traces of millions of vehicles on freeway ramps, dead-man's-curves, and blind-uncontrolled intersections, they can begin to model how real humans behave in these situations and teach this to the would-be autopilots.
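The fuel-station check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Here's actual method: the function name stale_pois, the (poi_id, timestamp) event format, and the 30-day window are all hypothetical, and a real pipeline would first have to match raw GPS probes to points of interest.

```python
from datetime import datetime, timedelta

def stale_pois(pois, stop_events, now, window_days=30):
    """Return the POI ids with no recorded stop in the last `window_days`.

    `pois` is an iterable of POI ids; `stop_events` is an iterable of
    (poi_id, timestamp) pairs. Both formats are assumptions for this sketch.
    """
    cutoff = now - timedelta(days=window_days)
    # Collect every POI that saw at least one stop inside the window.
    recently_visited = {poi_id for poi_id, ts in stop_events if ts >= cutoff}
    # Anything never seen in the window is a candidate for review.
    return sorted(p for p in pois if p not in recently_visited)

# Hypothetical data: one station with a recent stop, one without.
now = datetime(2014, 6, 30)
events = [
    ("fuel_a", datetime(2014, 6, 20)),
    ("fuel_b", datetime(2014, 4, 1)),
]
print(stale_pois(["fuel_a", "fuel_b"], events, now))  # ['fuel_b']
```

A flagged entry is only a candidate for removal; as the parenthetical question in the text suggests, a person would still decide whether to strike it from the database.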
Companies ranging from established giants such as IBM, SAS, and Microsoft to startups such as Tranzlogic and Kaggle offer affordable, cloud-based data-crunching services, which can help you get nondigitized data into data-crunchable form, and today virtually anyone can get his or her hands dirty in the great Big Data mud pile.
Businesses successfully mining Big Data are cross-referencing their internal information (pricing histories, customer traffic patterns) with multiple outside sources to increase revenue by understanding customers' behavior better, reduce costs by eliminating inefficiencies and human bias, strengthen client bonds by anticipating clients' needs, enrich service offerings with new knowledge, and give employees new tools to perform their jobs better.
Oracle invited me to moderate a panel at their DaaS launch today. Watch the whole hour for some great perspectives on this exciting new market category; our panel starts around 26:00.
Talking to Steve Miranda and Omar Tawakol before and after, I got even more excited about the concept:
a) There is a new generation of data asset entrepreneurs like Omar (whose company, BlueKai, Oracle acquired) who are about separating signal from noise. Under NDA, I have been briefed by other entrepreneurs focused on supply chain, HR and other DaaS. Scary smart folks.
b) Watching Omar talk about all the social and other marketing data feeds, and Steve talk about feeds from machines and wearables, you realize we are no longer in the Kansas of internal, structured data that so many companies are still heavily invested in.
c) How rapidly business and deployment models are changing. While the old model of “buy hardware, install software, load data, clean for quality – pay for all that” is not going away, the speed of getting started with DaaS and its economics are going to be compelling.
d) How effortlessly Oracle, with its decades of data management experience and horizontal and vertical application knowledge, could play in a wide range of DaaS categories.
Each month more than 11 million people–mostly 35- to 65-year-old women–visit Wayfair.com to browse its massive housewares catalog, an online directory hundreds of times larger than any Sears, Roebuck ever produced. Shipping is free for orders over $49; assembly is usually up to you. Wayfair doesn’t make anything. Many of its goods are produced by mom-and-pop operations, and the site will carry a product even if it sells it only once.
The key to this enterprise is a series of algorithms that fulfills orders–with a 98% success rate that’s improving all the time. Deployed to manage 7,000 vendors and a head-spinningly convoluted supply chain, that secret sauce makes shopping a virtually frictionless experience. Wayfair is as much a data miner as it is a retailer.
As many as 1 in 10 patients respond well in clinical trials of experimental medicines that U.S. regulators end up rejecting, according to the National Cancer Institute (NCI). To understand why these patients had such a response, researchers are beginning to use DNA sequencing technology to determine if the patients they call “exceptional responders” carry gene variations that can lead to better targeted therapies, including new treatments and the reconsideration of others.
Traditional treatments such as chemotherapy kill healthy cells along with malignant ones, but targeted therapies are designed to leave healthy cells unscathed and home in on cancer cells that make tumors grow and spread. The catch is that they don’t work for everyone, and even patients who find them helpful tend to develop resistance over time. The NCI and academic medical centers including Memorial Sloan Kettering Cancer Center in New York, the Dana-Farber Cancer Institute and Massachusetts General Hospital in Boston, and the Broad Institute in Cambridge, Mass., are creating a national database of exceptional responders to aid research. “What was yesterday’s miracle event is today becoming a subject of scientific inquiry,” says Leonard Lichtenfeld, an oncologist and the deputy chief medical officer of the American Cancer Society.
Take well-known U.S. universities such as Carnegie Mellon and Purdue. In each case, LinkedIn has data on the career paths of more than 60,000 graduates. That’s a data set big enough to allow for some fascinating fine-grained distinctions. Type in MIT, and you quickly learn that graduates are unusually likely to land jobs at Google, IBM, and Oracle. Plug in Purdue, and employers such as Lilly, Cummins, and Boeing predominate.
Such information is a gold mine for high-school juniors and seniors, says Purvi Modi, a college advisor in Cupertino, California, since most high-school students have only a hazy idea of what careers are out there. By using LinkedIn’s tool, students interested in specialties such as solar energy, screenwriting, or making medical devices can pinpoint schools with the best track records of sending graduates into those fields. Modi, who advises about 300 students a year, says about 40 percent of them now cruise through this part of LinkedIn’s database, known as University Pages, to get insights. That’s impressive, given that the data-combing service has been fully available only since August 2013.
It will no longer be just a research tool; reading all of your DNA (rather than looking at just certain genes) will soon be cheap enough to be used regularly for pinpointing medical problems and identifying treatments. This will be an enormous business, and one company dominates it: Illumina. The San Diego–based company sells everything from sequencing machines that identify each nucleotide in DNA to software and services that analyze the data. In the coming age of genomic medicine, Illumina is poised to be what Intel was to the PC era: the dominant supplier of the fundamental technology.
Illumina already held 70 percent of the market for genome-sequencing machines when it made a landmark announcement in January: using 10 of its latest machines in parallel makes it feasible to read a person’s genome for $1,000, long considered a crucial threshold for moving sequencing into clinical applications. Medical research stands to benefit as well. More researchers will have the ability to do large-scale studies that could lead to more precise understanding of diseases and help usher in truly personalized medicine.