This Webecology research report has been making the rounds on Twitter. I didn’t have time to read it until now, so here are my reading notes:
The Webecology team uses large-scale data mining to identify patterns indicative of online culture and community. I wish I could do this, too, and I will, as soon as I find a research partner to help with the data mining part.
For this project, the authors set out to create a more accurate measure of influence on Twitter that goes beyond either:
- number of followers; or
- followers/friends ratio
The authors defined influence on Twitter as:
influence on Twitter = the potential of an action of a user to initiate a further action by another user
Specifically, influence means the potential of a tweet to generate replies, mentions (conversational behaviors), RTs, and attributions (content-pushing behaviors).
This is an atheoretical, operational definition of influence (the study’s Achilles’ heel).
As far as I understand, all 4 actions were weighted equally. So, an RT counts the same as an @reply in determining influence.
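To make the equal-weighting point concrete, here is a minimal sketch of how such a per-tweet score could be computed. It reflects my reading of the report, not the authors' actual code: the `ActionCounts` container, the `influence_per_tweet` function, and the numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ActionCounts:
    """Actions triggered by one user's tweets (hypothetical container)."""
    replies: int
    mentions: int
    retweets: int
    attributions: int


def influence_per_tweet(counts: ActionCounts, n_tweets: int) -> float:
    """Sum all four action types with equal weight, then average per tweet."""
    if n_tweets <= 0:
        raise ValueError("n_tweets must be positive")
    total = (counts.replies + counts.mentions
             + counts.retweets + counts.attributions)
    return total / n_tweets


# Made-up numbers, purely for illustration.
example = ActionCounts(replies=30, mentions=10, retweets=50, attributions=10)
print(influence_per_tweet(example, n_tweets=20))  # 100 actions / 20 tweets = 5.0
```

With equal weights, swapping 10 retweets for 10 @replies leaves the score unchanged, which is exactly why an RT and an @reply count the same here.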
They selected 12 Twitter accounts to study. The selection was based on this criterion: the 12 accounts were “widely perceived to be among the more influential users on Twitter.” It is not clear who did the perceiving, and what definition or measure of influence they used in the process of perception. IMO, the arbitrary selection of the sample is another major weakness – but in this case, I can live with it, because the purpose is not to derive conclusions about Twitter culture as much as it is to demonstrate how the methodology can be used.
Then, the 12 users were grouped into 3 categories. Here is a table with the accounts they analyzed, and their number of tweets over 10 days, as well as the number of followers and friends at the end of the 10 days:
[Table not recoverable; the only surviving entries are: Stanley Kirk Burrell, CNN Breaking News, Social Media Analysts.]
The data that they mined was collected over 10 days, in August 2009. The data included:
- The 2143 tweets generated by the 12 users
- The 90,130 actions (responses, RTs) triggered by the original 2143 tweets
- All the tweets generated in connection with the 12 users (by their followers and friends; a total of 134,654 tweets, 15,866,629 followers, and 899,773 friends/followees)
The authors produced 2 types of influence reports, based on the type of action that was triggered:
- conversational action (people replied, or mentioned the user – e.g. “meeting @sockington for catnip”)
- content-pushing action (people retweeted, or gave attribution – e.g. “via @username”)
Please note that a mention may or may not be a response to a tweet. Mentions that were not responses to a tweet fall outside the authors’ definition of Twitter influence, and they should have been excluded from the analysis.
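For illustration, the four action types could be told apart from tweet text roughly like this. This is a sketch under my own assumptions: the report does not say how the authors classified tweets, and `classify_action`, `ACTION_GROUP`, and the text patterns are hypothetical. Note that text alone cannot tell whether a mention is actually a response to a tweet, which is the problem raised above.

```python
import re
from typing import Optional

# Map each action type to the report's two categories.
ACTION_GROUP = {
    "reply": "conversational",
    "mention": "conversational",
    "retweet": "content-pushing",
    "attribution": "content-pushing",
}


def classify_action(text: str, user: str) -> Optional[str]:
    """Guess which action a tweet performs toward `user` (without the @).

    Heuristic patterns assumed here, not taken from the report:
    "RT @user" is a retweet, "via @user" an attribution, a tweet starting
    with "@user" a reply, and any other "@user" a mention.
    """
    handle = "@" + re.escape(user) + r"\b"
    if not re.search(handle, text, re.IGNORECASE):
        return None  # the tweet does not reference this user at all
    if re.search(r"\bRT\s*" + handle, text, re.IGNORECASE):
        return "retweet"
    if re.search(r"\bvia\s*" + handle, text, re.IGNORECASE):
        return "attribution"
    if re.match(r"\s*" + handle, text, re.IGNORECASE):
        return "reply"
    return "mention"


action = classify_action("meeting @sockington for catnip", "sockington")
print(action, "->", ACTION_GROUP[action])
```

A real pipeline would more likely rely on reply metadata than on text patterns, but the sketch shows why a bare mention is the hardest case to tie back to a specific tweet.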
Here we go, on to the findings:
This graph shows you the amount of conversational activity (@replies and mentions) each user got in response to one (average) tweet.
This graph shows you how much content action (retweets and attributions) each user got for each (average) tweet:
So here we see that, per tweet, @sockington did get more retweets than @chrisbrogan.
The authors claim that these graphs of influence/tweet are the most accurate measure of Twitter influence so far. Therefore:
@sockington IS more influential on Twitter than @chrisbrogan,
because the fake cat gets more retweets. (sorry, @sockington, I do love you!!!)
I know exactly what you’re thinking: it starts with B and ends with T.
That’s because here we have a problem of construct validity. The measures do not actually measure influence. I wish the authors had read some research in communication & persuasion about the concept of influence, then worked their way from a conceptual to an operational definition.
Obviously, @sockington gets more retweets because he’s cuter & funnier than @chrisbrogan (sorry, Chris!). We don’t know why people reply or retweet. This study ignores a very important aspect of human relations: meaning. There is meaning in tweets, and meaning in why people retweet. But that is not captured in this study.
That being said, the report shows what can be done with data mining – it’s awesome! With a bit of help from people who know how to study meaning (hint, hint!), this type of research will be extremely valuable.
If anything, let this be an argument for computers & communication people working together, across disciplines.
In a future post, I will review conceptual and operational definitions of influence.