Why Detailed Data Is As Important As Big Data
The increasing ability of companies to get transaction-level, detail-level data (clickstream data versus summary data) presents a huge opportunity, says Boston College’s Sam Ransbotham.
Big data gets all the press these days, but as important — and perhaps even more important — is detailed data.
That’s according to Sam Ransbotham, an assistant professor at Boston College in the Information Systems department. He’s been at BC for four years, and before that he was at the Georgia Institute of Technology, where he got his PhD in IT management and his BA in chemical engineering.
Detailed data gives companies “the opportunity to try to figure out the ways that items, such as customers, differ,” he says. And that’s not just demographically, but in their behavior. “By observing detailed, transactional data, we can actually find much more interesting things than we can by lumping them into demographic groups.”
Ransbotham’s initial research interests were in security and risk, but that led him to analytics. “What really sparked my interest is how do you make sense of that much data, since detection system logs have data in huge numbers, the billions of records magnitude. How do you spot trends and how do you figure out what’s going on in those trends?” His research also now encompasses what he calls “more positive areas” of customer service and customer reviews and how those things are being used in marketing.
In a conversation with David Kiron, executive editor of Innovation Hubs at MIT Sloan Management Review, Ransbotham explains why detailed data can tell companies not just why someone did something but why they didn’t do something else, how hospitals don’t seem to face heightened malpractice risks when they install electronic medical record systems and what companies should and should not be worried about when their customers fire off real-time comments on Twitter.
You see all the talk about what big data is doing to the competitive landscape. What do you see as “big data”? What does that mean to you?
It’s easy to talk about the number of records, just the total volume, and there’s no question that that’s increasing and is huge. But beyond the size, what matters is the type of data: transaction-level, detail-level data, such as clickstream data, as opposed to summary data. And that’s really the more interesting of the trends. The details are what give companies the opportunity to do so much more.
What kinds of “more”? How is big data different from regular kinds of analytics?
There’s the opportunity to try to figure out the ways that items, say customers, differ. And not just how they differ demographically, which is how people have been reporting and thinking about things forever, but how their behavior differs. By observing detailed transactional-level data, we can actually find much more interesting things than we can by lumping them into demographic groups.
Can you give us an example? What kinds of behavioral patterns can you discern with new techniques that you couldn’t before, that could help managers manage better?
One of the things that we’ve become very good at is capturing data about what people are doing. If you think about classic point-of-sale systems, they capture details of the transactions. What people buy, what time of day, whether the product was on sale.
But those systems don’t tell you what the customer didn’t do. What did he look at and not buy? What did he not look at? How did he walk around the store?
I know some researchers are monitoring where shopping carts are in the store and where people are looking when they’re in grocery stores and those sorts of things. But when we talk about the Web, we’re getting new kinds of data that do show us the kinds of things that people looked at but didn’t buy. That’s opening up a new opportunity to understand how people are going about the process. Web data, clickstream data, was one of the first chances we got to look into that.
We’re still limited there, because companies tend to get just what people did on their website, and not what people did categorically. You also can’t tell if people have shopped in stores and then are buying online. But we’re gradually getting more and more data about what people are doing in the process of shopping and of buying.
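To make the "looked at but didn’t buy" idea concrete, here is a minimal sketch in Python. It assumes a hypothetical clickstream log of (customer, action, item) tuples with "view" and "purchase" actions; the event format and the function name are illustrative assumptions, not anything described in the interview.

```python
from collections import defaultdict

def viewed_not_bought(events):
    """Return, per customer, the items they viewed but never purchased.

    `events` is a hypothetical clickstream log: an iterable of
    (customer_id, action, item) tuples, where action is "view" or "purchase".
    """
    viewed = defaultdict(set)
    bought = defaultdict(set)
    for customer, action, item in events:
        if action == "view":
            viewed[customer].add(item)
        elif action == "purchase":
            bought[customer].add(item)
    # Set difference: everything seen minus everything bought.
    return {customer: viewed[customer] - bought[customer] for customer in viewed}

# Made-up example data, for illustration only.
sample_events = [
    ("c1", "view", "milk"),
    ("c1", "view", "eggs"),
    ("c1", "purchase", "milk"),
    ("c2", "purchase", "bread"),
]
result = viewed_not_bought(sample_events)
```

A point-of-sale system would record only the two purchase rows; the view rows are the extra detail that clickstream data contributes.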
Can you draw that out a little bit — what can you learn from what customers didn’t do?
Think about the example of online grocery stores: You can tell what people looked at but didn’t purchase. And you’d want to know, was it because of price? Was there some attribute of it? Was it the photo, or the text?
Companies that are savvy can start to manipulate those variables. So, they’ll pick half their customers and send them down one path and half down another path. One path might have more information, or less information, or better pictures, or more detailed pictures, or less detailed pictures. Companies can really start to understand what types of information and presentations make a difference to consumers.
Experimentation is cheap now, isn’t it?
It’s cheap and it’s fast. If you have a random way of showing people different things on your website, then you can pretty quickly, with a very small number of observations, start to figure out what’s working and what isn’t. In real time, you can begin to refine your presentation — and I’m using Web commerce as an example, just because that’s an easy way of running experiments, but experimentation can go well beyond that context.
It goes back to some things we were just talking about in terms of the difference between big data and detailed data — you really don’t have to have that much data for an experiment like that. It’s not like you need to run it for six months. These are answers you can find out with not the huge volume, not the billions of records, but with the detailed level of the records. Randomness combined with a clear decision point (such as a purchase) is powerful.
And the thing is this: if your company is not doing this, somebody else is, and they’re doing it quickly, too.
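The randomized split described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method of any particular company: visitors are assigned to one of two page variants at random, and the purchase rates are compared with a standard two-proportion z-test. The function names and the visitor counts are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

def assign_variant(rng):
    """Randomly send a visitor down path "A" or path "B" with equal probability."""
    return "A" if rng.random() < 0.5 else "B"

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare purchase rates of two variants.

    conv_a, conv_b: number of visitors who purchased in each variant.
    n_a, n_b: number of visitors shown each variant.
    Returns (z, two_sided_p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # No variation at all (nobody or everybody purchased): no evidence of a difference.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

rng = random.Random(42)  # fixed seed so the sketch is reproducible
variant = assign_variant(rng)

# Made-up counts: 1,000 visitors per variant, 50 vs. 80 purchases.
z, p_value = two_proportion_z_test(50, 1000, 80, 1000)
```

The purchase is the "clear decision point" and the random assignment is what lets the difference in rates be attributed to the page variant rather than to who happened to see it.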
Yeah, talk about that: What are the competitive pressures to build a capability around being able to deal with a variety of analytics?
Well, I’m a technical person in general, so I don’t want to minimize the importance of having good technical people on your team, and certainly building models, understanding them and running them is important. But I think that’s really the secondary skill here, and not the one that’s most in demand.
The competitive challenge is that while it’s hard to find people that do the technical things, it’s even harder to find people who can interpret them, who can use creativity to ask provocative questions, who can think about experiments to run that would be interesting. It’s hard to have a corporate culture that encourages that sort of manipulation, experimentation and data-based decision making.
Again, there’s certainly a shortage of people who have the technical skills, but I think we’ll see, much like we have in the rest of IT, that those things move more quickly towards commodities than these managerial skills do.
How do companies deal with this expertise shortage, finding people who can blend analytic skills with business expertise?
We’re certainly seeing that employers want people with lots of technical skill coming straight out of school. That’s where they’re pulling some resources from. But for managerial skills, companies are sending people out to explore what other people are doing and trying to stimulate some thinking that way.
Let’s switch gears and talk about some of your research on security and risk with IT. Tell us about what you found looking at healthcare and malpractice lawsuits.
Sure. One of the things I’ve looked at is medical malpractice lawsuits and whether there is increased risk to hospitals that install computer systems that, as a byproduct, log what happens during patient care. Does this affect their medical malpractice, the lawsuits? Does it change anything?
This is a study I did with Eric Overby at Georgia Tech, and we looked at data in the state of Florida because it has some laws about reporting that make it public information. We also have information about who’s installing computer systems, particularly things like electronic medical records.
So on the one hand, there’s a lot of evidence out there that says that patient health care improves with the installation of these systems. They can help doctors prevent drug interactions, they can improve accuracy. Lots of positives out there. But at the same time, there’s a fear that all this detailed information can be used against hospitals in the context of medical malpractice. That if something goes wrong, then people go through an electronic discovery process and try to dig through these detailed logs and find out something that’s happened. You’re talking about lots of different people involved with a patient and there can be lots of opportunities for something to not be absolutely perfect.
It’s an inherently empirical question here of which of these tensions is stronger. Which way do things work out? To me, it’s that process that’s the most interesting, trying to figure out how the data we’re collecting can be used to answer that question.
So what did you find? Did the data increase the risk?
The net result is that we don’t see any adverse effect of installing those systems. If anything, there’s improvement. But they’re certainly not worse. So you get all the patient health care benefits, and it doesn’t seem to be hurting from a medical malpractice perspective.
That’s a fine result.
Yes. And to tie that back into what we were talking about earlier, the presence of all this data is new. It is unusual for managers who are used to making decisions the way they’ve made them all along, who are used to relying on their experience, whether or not it’s right or wrong, good or bad. The idea that you would actually look to data to answer these things is a big shift. It’s a big shift for physicians, it’s a big shift for a manager.
Do you have a perspective on what kind of help leaders need to accept that their experience may not be worth all they thought, that data can supplant it? How can they make that shift in attitude?
Well, I don’t think that data completely supplants experience. Maybe to make that point stronger, I’ll say that you get billions of records out there to analyze, and we need to shift people who have that experience, who have relied on that, into guiding those questions. We’re not John Henry, the steel-driving man, fighting the machine. These are tools. We still need people to help understand what kind of experiments to run, and to understand how to shape those tools.
On the other hand, we certainly don’t need people bookkeeping by hand. That’s not a good use of people. So it’s trying to apply people where they’re most useful.
Now, your question was more about some of the cultural shifts and people skills to make those transitions, and that’s something I don’t know. I’m not sure that I’m qualified to answer that.
Ok. Let’s switch gears one more time. Your research also looks at the volume of data in social media and mobile devices, and what all that data and speed at which it’s generated means for companies. Can you talk a little about that?
So, we have all these social media tools out there, and if you think about it, what have those things done, really? At the core, they reduce transaction cost and coordination cost. They’ve made it really easy for us to share stuff.
By sharing stuff, I mean that people are creating data and providing feedback, and they’re doing it right at the time of the good or bad experience. The idea that these things are firing in real time and that they’re visible to everybody is, I think, a brave new world.
So we as customers are walking around with mobile devices that make it so easy to take a picture, to post something, to act immediately. Maybe it’s just an overall trend in society to react to things so quickly, maybe too quickly, but in either case, the devices and the infrastructure have certainly enabled that.
Here’s the question for companies: what are the risks of this? Before cell phones, if you went to a restaurant, maybe you had to wait in line for longer than you wanted to, but the food was great and it was a nice night. By the time you got home, you said, “Ah, that was a nice evening.” Whereas today, the worry is that in our modern world, you’ve already fired off Facebook updates and Twitter updates while you’re waiting in line, complaining about the restaurant. You don’t just turn to the person in line next to you and make some comment about how things are terrible; you get to broadcast that everywhere quickly.
So I did a study looking at restaurant reviews (with Nick Lurie, funded by the Wharton Customer Analytics Initiative), where we had some coming from mobile users and some coming from desktop users. It’s not clear what should happen. On the one hand, mobile people might react like I said, react instantaneously and not really kind of get the holistic experience in their head. On the other hand, they don’t have problems with recall bias, and they’re more likely to be accurate the closer they are to the actual experience.
The study looked at how those reviews are different. We did a lot of text analysis and said, okay, the things that people were actually typing, how did they differ? Are there more emotional words? Are there more positive words or negative words, or are there more words that indicate future thinking or past thinking? We looked at those variables and tried to explain the difference in influence of reviews written on mobile and desktops.
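As a rough illustration of this kind of word-category counting, the Python sketch below computes the share of a review’s words that fall into each category. The interview does not say which lexicon or tool the study used, so the tiny word lists and the function name here are hypothetical placeholders, not the study’s method.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists; a real analysis would use a vetted lexicon.
CATEGORIES = {
    "positive": {"great", "good", "delicious", "friendly", "love", "nice"},
    "negative": {"terrible", "bad", "slow", "rude", "awful", "cold"},
    "future": {"will", "tomorrow", "soon", "next"},
    "past": {"was", "were", "had", "yesterday", "ago"},
}

def category_shares(text):
    """Return the fraction of words in `text` that belong to each category."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    counts = Counter()
    for word in words:
        for name, vocabulary in CATEGORIES.items():
            if word in vocabulary:
                counts[name] += 1
    return {name: (counts[name] / total if total else 0.0) for name in CATEGORIES}

shares = category_shares("The food was great")
```

Shares like these, computed separately for mobile and desktop reviews, are the sort of variables that could then be compared across the two groups.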
Have you reached a point in your analysis where you can recommend to managers how concerned to be about these immediate reviews that come from customers who post to Facebook or who tweet as the experience is happening?
I’d say some of the fears, like the fears about health care electronic medical records and litigation, are unfounded. What we saw when we were looking at the influence of reviews is that mobile reviews are less influential. Perhaps people recognize that other people are hotheads, or that the mobile experience might be jaded, and they’ll discount that. Which I think is really interesting. I think that companies don’t need to panic about this as much as they thought. Yes, you get a few stories out there like the guy who made the YouTube video “United Breaks Guitars” and those sorts of quintessential social media explosions, but for the most part, people are discounting those things. At least in our restaurant context they seem to be.
You can go back to where does competitive advantage come from and how can you sustain it. Some of the things that we’re talking about, the data about your customers and how they behave, can really become a source of advantage for companies. Now again, the challenge is that other people are trying to do this as well at the same time, and so it may just be the kind of thing where you need to run just a little bit faster than everybody else. Companies need to figure out how to turn that into some sort of competitive advantage, a sustainable or non-ephemeral competitive advantage. Those are the things that we’re still working on.