There is an incredible amount of potential stored within social networks and the Internet of Things.
On projects at Odopod, we've scored site contributors based on their social activities. We've provided tools for our clients to hold conversations in Twitter and bring those conversations into their sites. We've generated countless shares and likes. And we've only begun to scratch the surface.
With the continued growth of data available to us via APIs and increasingly sophisticated open source tools, we're looking forward to more and more opportunities to skim a little data and shape it into something both fun and useful.
I recently presented some related research during a brown bag lunch discussion. Here are some highlights.
Many of the examples I use are based on concepts presented in Mining the Social Web by Matthew A. Russell. The book is a great overview of contemporary open source tools, APIs and data-mining practices. The examples and techniques discussed make a broad range of data mining tasks approachable without a PhD.
Working with relationships
Network graphs are a very useful tool for making sense of relationships among a group of things. These graphs provide a model for storing nodes and connections between them. For example, a group of Facebook users could represent the nodes in a graph and their friendships would be used to form connections (aka edges) between the nodes.
Once a network graph is set up, several functions are available for analyzing the data. You can determine the density of a graph, which could tell you if your followers on Twitter form a tight-knit community or if they are strangers to one another. You can also quickly find the nodes that are highly connected to others in the community. It is possible to locate sub networks (called cliques) within the main network. And, of course, you can play your own game of "six degrees of separation" by getting the shortest path between any two nodes in the graph.
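To make those operations concrete, here is a minimal Python sketch using only the standard library. The account names and edges are made up for illustration, and the function names are my own rather than any library's API. Clique-finding is left out for brevity; graph libraries such as NetworkX provide it.

```python
from collections import deque

# Hypothetical data: each pair is an undirected connection (edge).
edges = [
    ("ann", "bob"), ("ann", "cat"), ("bob", "cat"),
    ("cat", "dan"), ("dan", "eve"),
]

def build_graph(edges):
    """Return an adjacency map: node -> set of neighbouring nodes."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def density(graph):
    """Fraction of all possible undirected edges that actually exist."""
    n = len(graph)
    if n < 2:
        return 0.0
    edge_count = sum(len(nbrs) for nbrs in graph.values()) / 2
    return edge_count / (n * (n - 1) / 2)

def most_connected(graph, top=3):
    """Nodes ordered by number of connections, highest first."""
    return sorted(graph, key=lambda node: (-len(graph[node]), node))[:top]

def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of nodes, or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nbr in sorted(graph[node]):
            if nbr not in previous:
                previous[nbr] = node
                queue.append(nbr)
    return None

graph = build_graph(edges)
print(density(graph))                      # share of possible edges present
print(most_connected(graph))               # hubs of the network
print(shortest_path(graph, "ann", "eve"))  # degrees of separation
```

A density near 1 would suggest a tight-knit group, while a value near 0 would suggest that most members are strangers to one another.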
While it is common to make network graphs based on friendships or other relationships in a social network, there are countless other ways to make use of these graphs. For example, you can learn about how your messages spread through Twitter by making nodes for retweeted messages and connecting each one to the tweet it was retweeted from.
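As a rough sketch of the retweet idea, the Python below links each retweet to the tweet it came from and then measures how far a message travelled. The tweet IDs and the `(tweet_id, source_id)` record shape are assumptions made for this example, not the format of any particular API.

```python
from collections import defaultdict

# Hypothetical records: (tweet_id, id of the tweet it was retweeted from or None).
tweets = [(1, None), (2, 1), (3, 1), (4, 2), (5, None), (6, 4)]

def retweet_tree(tweets):
    """Map each source tweet to the list of tweets that retweeted it."""
    children = defaultdict(list)
    for tweet_id, source_id in tweets:
        if source_id is not None:
            children[source_id].append(tweet_id)
    return children

def spread(children, root):
    """Return (number of retweets descending from root, longest chain length)."""
    total, depth = 0, 0
    stack = [(root, 0)]
    while stack:
        node, level = stack.pop()
        depth = max(depth, level)
        for child in children.get(node, []):
            total += 1
            stack.append((child, level + 1))
    return total, depth

children = retweet_tree(tweets)
print(spread(children, 1))  # reach and chain length of the first message
```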
Graphing Odopod's Twitter network
As I was looking at a network graph of some Odopod employees and their followers, I realized that the shared relationships within the graph could be used to find Twitter accounts for others who work here.
I originally built a graph using 20 fellow employees that I was already following on Twitter. By looking at the largest cliques (sub networks), the people appearing most often in those cliques and the most connected nodes in general, I found additional accounts for people who work at Odopod. By adding those and rebuilding the graph, I found even more. In the end, I extended the group of 20 accounts to 30.
Given a partial list of any community, this technique could be used to find additional members.
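One simple way to approximate this is sketched below in Python. Note that it swaps in a plainer heuristic than the clique analysis described above: it ranks unknown accounts by how many known members they are connected to. The account names, the `min_links` threshold and the function name are all hypothetical.

```python
def suggest_members(connections, seeds, min_links=2):
    """Rank accounts outside `seeds` by how many seed accounts they connect to."""
    seeds = set(seeds)
    linked_seeds = {}
    for a, b in connections:
        for candidate, other in ((a, b), (b, a)):
            if candidate not in seeds and other in seeds:
                # A set avoids double counting a repeated connection.
                linked_seeds.setdefault(candidate, set()).add(other)
    ranked = sorted(linked_seeds.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(acct, len(s)) for acct, s in ranked if len(s) >= min_links]

# Hypothetical data: three known members and their connections.
seeds = {"ann", "bob", "cat"}
connections = [
    ("ann", "dan"), ("bob", "dan"), ("cat", "dan"),
    ("ann", "eve"), ("bob", "eve"),
    ("cat", "fay"), ("ann", "bob"),
]
print(suggest_members(connections, seeds))
```

Confirmed candidates can be added to the seed set and the function run again, which mirrors the rebuild-and-repeat loop described above. The suggestions still need a human check, since a well-connected outsider can score as highly as a real member.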
A force-directed plot of a network graph that includes Twitter accounts of Odopod employees and their followers. The large mass in the center is made up of @Odopod followers collected around the @Odopod node.
Other visualizations of Network Graphs
There are several ways to visualize network graphs. One alternative to the force directed render above is an RGraph (radial graph). RGraphs show the degrees of separation between nodes by mapping them to concentric circles.
This interactive RGraph consists of my friends on Facebook. Two states of the graph are shown in the image below. On the left I am at the center of the graph and everyone in the network is one degree away from me. On the right one of my friends is centered and everyone else is either one degree (connected directly) or two degrees away (connected via someone else).
Two states of an RGraph mapping my Facebook friends.
Given that this data consists only of my direct friends, no one in the graph is further than two degrees removed from anyone else. Other data sets could have many more degrees of separation and the RGraph would include additional concentric circles to represent those distances.
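The ring a node lands on is just its distance from the centered node. The Python sketch below computes those rings with a breadth-first search. The tiny friend network is invented for illustration and the code is independent of whichever library renders the RGraph.

```python
from collections import deque

def rings(graph, center):
    """Group nodes by degrees of separation from `center` (ring 0 is the center)."""
    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in distance:
                distance[nbr] = distance[node] + 1
                queue.append(nbr)
    grouped = {}
    for node, d in distance.items():
        grouped.setdefault(d, []).append(node)
    return {d: sorted(nodes) for d, nodes in sorted(grouped.items())}

# Hypothetical network: "me" plus three direct friends, two of whom know each other.
friends = {
    "me": {"ann", "bob", "cat"},
    "ann": {"me", "bob"},
    "bob": {"me", "ann"},
    "cat": {"me"},
}
print(rings(friends, "me"))   # everyone sits on the first ring
print(rings(friends, "cat"))  # re-centering pushes some friends to the second ring
```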
Working with content
As interesting as relationships within network graphs are, the content contained within profiles and documents (messages) is often a richer source of data.
Some interesting possibilities include: finding the most vocal people in a community; finding common interests, professions and home towns among a set of profiles; charting the most commonly referenced topics in a set of documents; and reconstructing conversations among two or more people.
Analyzing the contents of @Odopod's Twitter feed
One way to quickly summarize a collection of Twitter messages is to sum the frequency of hash tags, account mentions and links. The interactive arc diagram below shows this type of summary made for Odopod's Twitter feed.
Arc diagram of entities mentioned more than once by @Odopod
The list is sorted by frequency of use and the number in parentheses to the right of the entity indicates the frequency. For brevity, the list is limited to items mentioned more than once.
Arcs in the diagram connect items appearing in the same tweet and the width of the arc reflects the number of times the entities were mentioned together. Finally, the size of each node reflects the total number of times the entity was mentioned in a tweet with another item in the list.
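The counts behind a diagram like this can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code behind the diagram: the sample tweets are invented, entities are pulled out with a rough regular expression instead of Twitter's own entity data, and each entity is counted once per tweet.

```python
import re
from collections import Counter
from itertools import combinations

# Rough pattern for hash tags, mentions and links (an approximation).
ENTITY = re.compile(r"[#@]\w+|https?://\S+")

def normalize(entity):
    """Lowercase tags and mentions; leave links untouched."""
    return entity if entity.startswith("http") else entity.lower()

def summarize(tweets):
    """Return (entity frequency, frequency of entity pairs in the same tweet)."""
    frequency = Counter()
    together = Counter()
    for text in tweets:
        entities = sorted({normalize(e) for e in ENTITY.findall(text)})
        frequency.update(entities)                       # node labels
        together.update(combinations(entities, 2))       # arc widths
    return frequency, together

# Hypothetical tweets.
tweets = [
    "Great talk on #dataviz by @alice http://example.com/a",
    "More #dataviz from @alice",
    "#DataViz and #maps",
]
frequency, together = summarize(tweets)
repeated = {e: n for e, n in frequency.items() if n > 1}
print(repeated)
print(together.most_common(1))
```

Filtering to entities seen more than once, as in `repeated`, corresponds to the cutoff used for the list above.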
This technique can be used on messages from a group of accounts, all tweets referencing links from your site, or any other set of tweets that can be collected.
Additionally, it can be extended beyond the scope of Twitter's hash tags, mentions and links. Using open source text mining libraries, relevant terms can be extracted and allow for similar frequency analysis to be made on any set of text documents.
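As one example of what term extraction can look like, the Python sketch below scores words with TF-IDF, a common weighting that favors terms frequent in one document but rare across the set. TF-IDF is my choice for illustration and is not named in the post. The tokenizer is deliberately naive and the documents are invented.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Naive tokenizer: lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def tf_idf(documents):
    """Return one {term: score} dict per document."""
    token_lists = [tokenize(d) for d in documents]
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # count each term once per document
    total = len(token_lists)
    scores = []
    for tokens in token_lists:
        counts = Counter(tokens)
        scores.append({
            term: (n / len(tokens)) * math.log(total / doc_freq[term])
            for term, n in counts.items()
        })
    return scores

def top_terms(documents, k=3):
    """The k highest-scoring terms of each document (ties broken alphabetically)."""
    result = []
    for doc_scores in tf_idf(documents):
        ranked = sorted(doc_scores, key=lambda t: (-doc_scores[t], t))
        result.append(ranked[:k])
    return result

# Hypothetical documents.
docs = [
    "the map shows the data",
    "the chart shows the trend",
    "the map of the city",
]
print(top_terms(docs, k=1))
```

Words that appear in every document, such as "the" here, score zero, so they drop out without a hand-made stop word list.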
Capturing #CreativeJS and #creativecoding discussion
It's becoming more common for conversations to be held on Twitter. Following these conversations and returning to them after the fact can be difficult. Twitter.com does provide a way to expand a given message and see it in the context of a conversation, but this is quite limited when there are many people talking in the thread or if there are multiple threads happening simultaneously.
I recently built a proof-of-concept to follow and analyze conversations around specific topics and links. The goal was to determine who was talking and what they were saying.
For testing purposes, I chose #CreativeJS and #creativecoding as topics of interest. More important than my personal interest in creative programming is the fact that @seb_ly, who tweets on these topics, often engages in extended conversations with others.
Once I collected the relevant tweets, I summed the frequency of message authors as well as the other accounts, tags and links mentioned in their tweets. I looked at results for all the tweets as well as two subsets: retweets and conversations (based on replies).
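A stripped-down version of that grouping might look like the Python below. It is a sketch, not the proof-of-concept itself: the simplified field names (`reply_to`, `retweet`) and the sample tweets are assumptions, and sorting by ID is used as a stand-in for chronological order.

```python
from collections import Counter, defaultdict

# Hypothetical collected tweets with simplified fields.
tweets = [
    {"id": 1, "author": "seb", "reply_to": None, "retweet": False},
    {"id": 2, "author": "amy", "reply_to": 1, "retweet": False},
    {"id": 3, "author": "seb", "reply_to": 2, "retweet": False},
    {"id": 4, "author": "joe", "reply_to": None, "retweet": True},
    {"id": 5, "author": "kim", "reply_to": None, "retweet": False},
    {"id": 6, "author": "joe", "reply_to": 99, "retweet": False},  # parent not collected
]

def conversations(tweets):
    """Group tweets into reply threads keyed by the ID of the thread's first tweet."""
    by_id = {t["id"]: t for t in tweets}

    def root_of(tweet):
        seen = set()  # guards against malformed, circular reply data
        while tweet["reply_to"] in by_id and tweet["id"] not in seen:
            seen.add(tweet["id"])
            tweet = by_id[tweet["reply_to"]]
        return tweet["id"]

    threads = defaultdict(list)
    for t in sorted(tweets, key=lambda t: t["id"]):
        threads[root_of(t)].append(t["id"])
    # A lone tweet is not a conversation.
    return {root: ids for root, ids in threads.items() if len(ids) > 1}

authors = Counter(t["author"] for t in tweets)   # who is talking overall
retweets = [t["id"] for t in tweets if t["retweet"]]
threads = conversations(tweets)
print(authors.most_common(2), retweets, threads)
```

Replies whose parent tweet was never collected, like tweet 6 here, cannot be attached to a thread, which is one reason a sample gathered over a short window will miss parts of longer conversations.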
The results of the conversation analysis show the summary information as well as an archive of the conversations occurring during the time that this data sample was collected.
Archive of one conversation mentioning the #CreativeJS tag
In the future we will build on this proof-of-concept and look at things like how conversations unfold over time and whether the people involved send out additional tweets to their networks after the conversations take place.
It's a multidisciplinary problem
All the examples shown above are strictly technical exercises to explore the potential of an expanded set of techniques. After spending just a little time working with these tools and APIs, it is clear to me that there is a great opportunity to pull in data and extract compelling and useful content.
These projects will continue to draw on all of Odopod's strengths. When it comes to engaging our audiences and communicating clearly, sophisticated concepts and visualizations are just as critical as our technical ability to wrangle the bits. I look forward to seeing how these ideas evolve and finding ways to incorporate them into more of our work.
What role do you see data mining playing in the future of your work? Let us know in the comments below.
The slides from my original presentation are online and include a few examples not mentioned above.