Companies ranging from established giants such as IBM, SAS, and Microsoft to startups such as Tranzlogic and Kaggle offer affordable, cloud-based data-crunching services (which can help you get nondigitized data into data-crunchable form), and today virtually anyone can get his or her hands dirty in the great Big Data mud pile.
Businesses successfully mining Big Data are cross-referencing their internal information (pricing histories, customer traffic patterns) with multiple outside sources to increase revenue by understanding customers' behavior better, reduce costs by eliminating inefficiencies and human bias, strengthen client bonds by anticipating clients' needs, enrich service offerings with new knowledge, and give employees new tools to perform their jobs better.
Oracle invited me to moderate a panel at its Data as a Service (DaaS) launch today. Watch the whole hour for some great perspectives on this exciting new market category; our panel starts around 26:00.
Talking to Steve Miranda and Omar Tawakol before and after, I got even more excited about the concept:
a) There is a new generation of data asset entrepreneurs like Omar (whose company, BlueKai, Oracle acquired) who are about separating signal from noise. Under NDA, I have been briefed by other entrepreneurs focused on supply chain, HR and other DaaS. Scary smart folks.
b) Watching Omar talk about all the social and other marketing data feeds, and Steve talk about feeds from machines and wearables, you realize we are no longer in the Kansas of internal, structured data in which so many companies are still heavily invested.
c) How rapidly business and deployment models are changing. While the old model of “buy hardware, install software, load data, clean for quality – pay for all that” is not going away, the speed of getting started with DaaS and its economics are going to be compelling.
d) How effortlessly Oracle, with its decades of data management experience and horizontal and vertical application knowledge, could play in a wide range of DaaS categories.
Each month more than 11 million people–mostly 35- to 65-year-old women–visit Wayfair.com to browse its massive housewares catalog, an online directory hundreds of times larger than any Sears, Roebuck ever produced. Shipping is free for orders over $49; assembly is usually up to you. Wayfair doesn’t make anything. Many of its goods are produced by mom-and-pop operations, and the site will carry a product even if it sells it only once.
The key to this enterprise is a series of algorithms that fulfills orders–with a 98% success rate that’s improving all the time. Deployed to manage 7,000 vendors and a head-spinningly convoluted supply chain, that secret sauce makes shopping a virtually frictionless experience. Wayfair is as much a data miner as it is a retailer.
As many as 1 in 10 patients respond well in clinical trials of experimental medicines that U.S. regulators end up rejecting, according to the National Cancer Institute (NCI). To understand why these patients had such a response, researchers are beginning to use DNA sequencing technology to determine if the patients they call “exceptional responders” carry gene variations that can lead to better targeted therapies, including new treatments and the reconsideration of others.
Traditional treatments such as chemotherapy kill healthy cells along with malignant ones, but targeted therapies are designed to leave healthy cells unscathed and home in on cancer cells that make tumors grow and spread. The catch is that they don’t work for everyone, and even patients who find them helpful tend to develop resistance over time. The NCI and academic medical centers including Memorial Sloan Kettering Cancer Center in New York, the Dana-Farber Cancer Institute and Massachusetts General Hospital in Boston, and the Broad Institute in Cambridge, Mass., are creating a national database of exceptional responders to aid research. “What was yesterday’s miracle event is today becoming a subject of scientific inquiry,” says Leonard Lichtenfeld, an oncologist and the deputy chief medical officer of the American Cancer Society.
Take well-known U.S. universities such as Carnegie Mellon and Purdue. In each case, LinkedIn has data on the career paths of more than 60,000 graduates. That’s a data set big enough to allow for some fascinating fine-grained distinctions. Type in MIT, and you quickly learn that graduates are unusually likely to land jobs at Google, IBM, and Oracle. Plug in Purdue, and employers such as Lilly, Cummins, and Boeing predominate.
Such information is a gold mine for high-school juniors and seniors, says Purvi Modi, a college advisor in Cupertino, California, since most high-school students have only a hazy idea of what careers are out there. By using LinkedIn’s tool, students interested in specialties such as solar energy, screenwriting, or making medical devices can pinpoint schools with the best track records of sending graduates into those fields. Modi, who advises about 300 students a year, says about 40 percent of them now cruise through this part of LinkedIn’s database, known as University Pages, to get insights. That’s impressive, given that the data-combing service has been fully available only since August 2013.
“It will no longer be just a research tool; reading all of your DNA (rather than looking at just certain genes) will soon be cheap enough to be used regularly for pinpointing medical problems and identifying treatments. This will be an enormous business, and one company dominates it: Illumina. The San Diego–based company sells everything from sequencing machines that identify each nucleotide in DNA to software and services that analyze the data. In the coming age of genomic medicine, Illumina is poised to be what Intel was to the PC era—the dominant supplier of the fundamental technology.
Illumina already held 70 percent of the market for genome-sequencing machines when it made a landmark announcement in January: using 10 of its latest machines in parallel makes it feasible to read a person’s genome for $1,000, long considered a crucial threshold for moving sequencing into clinical applications. Medical research stands to benefit as well. More researchers will have the ability to do large-scale studies that could lead to more precise understanding of diseases and help usher in truly personalized medicine.”
Happy Cinco de Mayo! As attention turns to jalapeño and other peppers today, the reality is that one person’s five-alarm chili can actually be blander than someone else’s Thai Hot curry.
The world needs a spice scale as we quantify everything in our lives. Sure, there is the Scoville scale, but go check how few of even the hot sauces on the “Wall of Fire” at the Cajun restaurant Heaven on Seven in Chicago show the scale.
Expect more brands like Kraft to provide more quantification as in the packaging below.
“Kraft redesigned the packaging of its line of spicy cheese to include a "heat scale." Five chili peppers show the level of spice from mild (Smoky Chipotle) to extra hot (Hot Habañero). The company is considering adding ghost pepper, known as one of the world's spiciest, to fill in the fifth pepper on the scale,” says the WSJ.
Nice interactive data visualization from Fathom Information Design of how different countries are aging – if it had countries like Mexico and Pakistan it would also show how the world is getting younger.
BusinessWeek on Dallas Museum of Art trading free entry for patron data
“The program has so far delivered about 2 million records that show how visitors use the museum. “In the past all we’ve ever known is that some number of anonymous people have entered a space,” says Robert Stein, the museum’s deputy director and a software developer who previously worked with Anderson at the Indianapolis Museum of Art. The DMA is using the data to learn which galleries are the most popular, which events attract visitors from city neighborhoods where museum membership is thin, and the rate of repeat visits. It’s similar to the metrics relied on by online retailers, Stein says, but “instead of clickstreams, you’re looking at streams of activity in a physical space.” Mindful of privacy concerns, the museum tracks activity only when members decide to check in to the museum or scan a card in a gallery.”
“This assigns each shot a probability related to its position, and thus determines how well a goalscorer is performing. The statistic filters out the quality of the opposition and the quality of the player's team. Last year, for instance, Tottenham's Gareth Bale had 161 shots and 21 goals, when, according to the goal-expectation model, he was due to score only 11. "Bale would regularly shoot from situations with a low probability of success, such as from a distance of 30 yards, and score," says Paul Boanas, Prozone's senior account manager and a former performance analyst. "This type of contextual information helps to explain why he's worth so much."
Some of the most important elements of football remain very hard to quantify and it's difficult to understand what we can't measure. Consider defence. Using data from the last ten seasons of the Premier League, Anderson and Sally compared the value of a goal scored and the value of a goal conceded. They found that scoring a goal, on average, is worth slightly more than one point, whereas not conceding produces, on average, 2.5 points per match. "Goals that don't happen are more valuable than goals that do happen," Anderson says. "It's counterintuitive. The question is: how do we measure something that doesn't happen? The challenge is to see the unseen."”
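For readers who want to see the arithmetic behind the goal-expectation idea quoted above, here is a minimal Python sketch. It is not Prozone's actual model: the `Shot` class, the per-shot probabilities in the toy example, and the function names are all my own assumptions for illustration. Only the Bale figures (161 shots, 21 goals, 11 expected) come from the excerpt.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    # Assumed chance of scoring from this position (0.0 to 1.0).
    # A real model would estimate this from historical shot data.
    probability: float
    scored: bool


def expected_goals(shots):
    """Sum the per-shot scoring probabilities."""
    return sum(s.probability for s in shots)


def goals_above_expectation(shots):
    """Actual goals minus expected goals; positive means overperforming."""
    actual = sum(1 for s in shots if s.scored)
    return actual - expected_goals(shots)


# Toy example with made-up probabilities (hypothetical data).
toy_shots = [Shot(0.5, True), Shot(0.25, False), Shot(0.25, True)]
toy_expected = expected_goals(toy_shots)
toy_delta = goals_above_expectation(toy_shots)

# Aggregate figures quoted in the excerpt for Gareth Bale.
bale_shots, bale_goals, bale_expected = 161, 21, 11
avg_shot_probability = bale_expected / bale_shots  # average chance per shot
conversion_rate = bale_goals / bale_shots          # what he actually converted
overperformance = bale_goals - bale_expected       # goals above the model
```

On the quoted numbers, Bale's shots carried an average scoring chance of roughly 7 percent, yet he converted about 13 percent of them, which is where the 10 goals above expectation come from.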