I figured I would write a short post from the trenches. I wanted to pose a question to the masses...
How do you know that your "counts" of sound bites are even accurate?
I have been spending an enormous amount of time working with people across the social media wild west, and one of the things I cannot get my head around is the battle of numbers in social media data. Culturally, we are shifting the very foundation of what data means, away from strict quantification and toward a blind faith that the more data, the better.
As a scientist trained in variable management, I was always taught that there is no bad data, only data. That tenet surely holds as the field of social media measurement continues to mature slowly. That being said, I am surprised at how hard people push for quantification while rarely questioning how the data was counted.
Yes, it is imperative that when someone tells you a sound bite is positive, it truly is, but for any technology company, living up to a perfect coding standard is impossible. Then I will see someone question how many sound bites they have on their brand.
I am shocked that people are so quick to declare victory over their Facebook fan page likes, but are so loath to stop and think when you tell them that their multimillion-dollar campaign only generated 7,500 sound bites naturally. Why is this the case?
Yes, social media data is extremely powerful. The data is vast, and the potential for real-time understanding to translate into business results is real and will become tangible as this year wears on. But simply saying more data is better, without asking how the numbers were generated, is folly.
So here are some thoughts on how to think about social media analytics, to "clear the air." We all want our tools to look pretty and do wild things. But what else is there to consider? I would argue there is more than one thing to think about when choosing a great tool.
In fact, there are three critical components to think about:
Features - This is the obvious part. This is about bells and whistles: slicing and dicing the data so you can mine it and use it. Most people focus solely on how easy a tool is to use, and by the way, I couldn't agree more with that thinking. Many tools are very powerful but very difficult to use. Having something that is flexible, powerful and nimble is critical, and it is becoming table stakes for anyone interested in building something good. But frankly, even if you have a great tool, there are two other things to think about...
Content - This is the point I am trying to make here. Did you get ALL the content? Or better yet, did you get ALL the CLEAN content? Here is the distinction: if you get 100,000 sound bites and 30,000 are duplicate posts, you really have 70,000, and it doesn't matter how great your features are. You now have crappy content, and you are comparing apples to oranges in doing your work. Many social media junkies focus solely on the numbers and not on whether they are accurate. Are you asking your feature provider to open the kimono and tell you how they get their content? Do you know how they verify their streams? I think people need to spend a whole lot more time on this issue before worrying about how prettily the data is sliced and diced. Why? Because if you are so worried about the ROI of social media data, then your content, and thus your counts, had better be good; otherwise your data is bad and your recommendation is flawed.
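To make the duplicate problem concrete, here is a minimal sketch of the difference between a raw count and a deduplicated count. It assumes each sound bite is just a text string and uses a hypothetical duplicate rule (identical text after lowercasing and collapsing whitespace); real providers may also strip URLs or retweet prefixes, or use fuzzy matching, so treat this as an illustration rather than how any particular tool works.

```python
import re


def normalize(text: str) -> str:
    """Hypothetical duplicate rule: lowercase and collapse whitespace so
    trivially different copies of the same post compare as equal."""
    return re.sub(r"\s+", " ", text.strip().lower())


def count_sound_bites(posts: list[str]) -> dict[str, int]:
    """Return the raw count, the unique count, and the number of duplicates."""
    unique = {normalize(p) for p in posts}
    return {
        "raw": len(posts),
        "unique": len(unique),
        "duplicates": len(posts) - len(unique),
    }


posts = [
    "Love the new campaign!",
    "love the new   campaign!",
    "Love the new campaign!",
    "Not impressed with the ad.",
]
print(count_sound_bites(posts))  # {'raw': 4, 'unique': 2, 'duplicates': 2}
```

Half of this tiny sample is duplicates; at the scale in the paragraph above, the same check would turn 100,000 "sound bites" into 70,000.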
Accuracy - Again, if you are trying to understand what social media is telling you (not just listen to it), then you need to be concerned with this small thing. No one is going to get you all the way to 100% accuracy. At least, not until Skynet is built and we all go down in a ball of Matrix flames. But for now, after you get your counts right, you need to make sure that sentiment is accurate. There are a lot of claims out there about how best to calculate it. While I work for an NLP company, I won't argue that we are the end-all, be-all, but we do target accuracy on a smaller sample set as one of our goals, to help you know WHY people feel a certain way. Accuracy is the key to good sentiment analysis and critical if you want to get away from coding a thousand sound bites by hand...ugh.
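One simple way to check a vendor's sentiment claims yourself is to hand-code a small random sample and measure how often the tool agrees with the human. The sketch below assumes three hypothetical labels ("positive", "negative", "neutral") and uses plain agreement rate as the accuracy measure; it is not the method any particular NLP company uses, and the function names are illustrative only.

```python
import random


def draw_review_sample(sound_bites: list[str], k: int, seed: int = 0) -> list[str]:
    """Pick k sound bites at random for a person to code by hand.
    A fixed seed keeps the sample reproducible."""
    rng = random.Random(seed)
    return rng.sample(sound_bites, k)


def sentiment_accuracy(predicted: list[str], hand_coded: list[str]) -> float:
    """Fraction of sampled sound bites where the tool's label matches the human's."""
    if len(predicted) != len(hand_coded) or not predicted:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(p == h for p, h in zip(predicted, hand_coded))
    return matches / len(predicted)


tool_labels = ["positive", "negative", "neutral", "positive", "negative"]
human_labels = ["positive", "negative", "positive", "positive", "neutral"]
print(sentiment_accuracy(tool_labels, human_labels))  # 0.6
```

Coding a few hundred sampled sound bites by hand is far less painful than coding every one of them, and it gives you a number to hold the tool to.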
Why am I writing this? Because I see a lot of resistance from the people I work with when they try to get comfortable with social media data. It starts with not asking questions about numbers that appear to be bigger and better. It is about making sure that what is under the hood is in working order; otherwise you just bought a car with a beautiful body, paint job, awesome seats, a great stereo and an even cooler steering wheel, but a Yugo engine under the hood. That car is great for standing around at a party and looking good, but it couldn't get you to the liquor store to buy a case of beer...