Trending On NYT vs. Twitter
Posted: October 9th, 2009 | Author: Kunal | Filed under: Assignments
Midterm Project Proposal
Stephen Varga & Kunal D Patel
1. Overview
A short summary of what your project will be. Give me your best elevator pitch here.
Our project will be a real-time, split-screen image comparison of trending topics on The New York Times and Twitter. By analyzing comment activity and article keywords, we hope to generate a list of ‘trending topics’ for the Times that can be compared against Twitter’s well-established trending topics. Both sets of keywords will be used to search for relevant images through Bing’s recently announced API, and the images will be displayed in side-by-side feeds as an immediate visual comparison of active topics in both environments. The visualization will compare and contrast popular subjects in a controlled environment, where users are given content to discuss (NYT articles), with those in an uncontrolled, user-generated environment (Twitter).
2. Data
The data sources you will be using
- The NYT Community API – to determine which articles currently have the highest comment rates
- The NYT Article Search API – once ‘trending’ articles have been determined, mine them for keywords
- Twitter API – retrieve the current trending topics
- Bing API – search for images relevant to the NYT keywords and the Twitter trending topics
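As a rough illustration of the first two steps above, here is a minimal sketch in Python of how articles might be ranked by recent comment rate and then mined for keywords. It makes no API calls. The record layout (`comment_times`, `keywords`), the one-hour window, and all function names are assumptions made for the example, not part of any of the APIs listed or of our final design.

```python
from collections import Counter
from datetime import datetime, timedelta


def comment_rate(comment_times, now, window=timedelta(hours=1)):
    """Comments per hour posted within the trailing window ending at `now`."""
    recent = [t for t in comment_times if now - window <= t <= now]
    return len(recent) / (window.total_seconds() / 3600)


def trending_articles(articles, now, top_n=5):
    """Return the top_n articles, ordered by descending recent comment rate.

    Each article is a dict with hypothetical keys 'comment_times'
    (list of datetime) and 'keywords' (list of str).
    """
    ranked = sorted(
        articles,
        key=lambda a: comment_rate(a["comment_times"], now),
        reverse=True,
    )
    return ranked[:top_n]


def top_keywords(articles, top_n=10):
    """Count keywords (case-insensitively) across articles, most frequent first."""
    counts = Counter(kw.lower() for a in articles for kw in a["keywords"])
    return [kw for kw, _ in counts.most_common(top_n)]
```

The keyword list returned by `top_keywords` would then play the same role for the Times that Twitter's trending topics play for Twitter. Ranking by a rate over a short window, rather than by total comment count, is one way to favor articles that are active right now over older articles that have simply accumulated more comments.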
3. Design Questions
A set of questions that you intend to answer or explore. At least one question should be about the data itself (i.e., what is the story you’re hoping to tell?), but these questions may also address design methods or technical approaches.
- Are there discernible differences between the topics that interest active participants in formal and informal news networks?
- Can an analysis of trending topics generated by each user group offer insights into the demographics of the groups?
- How do we ensure that our trending topics are viable metrics for judging user groups?
- How can we regulate image searches for these topics to return relevant images?
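To make the first question more concrete, one possible way to quantify how much the two topic lists differ is set overlap. The sketch below, in Python, computes the shared topics and a Jaccard similarity score. The normalization rule (lowercasing and stripping a leading ‘#’) and the function names are assumptions for illustration only; the proposal does not commit to this measure.

```python
def normalize(topic):
    """Lowercase, strip a leading '#', and collapse whitespace."""
    return " ".join(topic.lower().lstrip("#").split())


def compare_topics(nyt_topics, twitter_topics):
    """Compare two topic lists as sets.

    Returns the shared topics, the topics unique to each source, and the
    Jaccard similarity (shared / union), which is 0.0 when both are empty.
    """
    nyt = {normalize(t) for t in nyt_topics}
    twitter = {normalize(t) for t in twitter_topics}
    shared = nyt & twitter
    union = nyt | twitter
    return {
        "shared": shared,
        "nyt_only": nyt - twitter,
        "twitter_only": twitter - nyt,
        "jaccard": len(shared) / len(union) if union else 0.0,
    }
```

A score near 0 would suggest the two audiences are discussing largely different subjects, while a score near 1 would suggest heavy overlap. Exact string matching is crude: ‘Health Care’ and ‘#healthcare’ would not match under this normalization, so any real comparison would likely need fuzzier matching.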
4. Prior Art / Precedents
Discuss at least two existing works that are similar in some respect to your proposed project. How do you see your project in relation to this ecosystem of other works? Will it contribute something unique? Will it address problems that you see in other works?
10×10 – Jonathan Harris’ hourly scraping of several international news feeds, visualized as a sorted list of the 100 most “important” words in the news connected to a 10×10 grid of corresponding images.
Pingwire – Allan Grinshtein’s real-time visualization of images from three popular Twitter image-hosting services (twitpic, yfrog, and twitgoo).
Neither of these works directly addresses the role of each network’s users, which is where our focus lies. Harris uses his own algorithms to determine the 100 most important words in the news, and Pingwire is simply a real-time aggregated feed of hosted images. By approaching data mining and collection from the perspective of the users, we believe we are contributing a unique comparison of the cultures of The Times and Twitter.
5. Collaboration
A brief explanation of how you plan to collaborate with your partner.
For the purposes of data mining, the most effective solution will likely be to split the work by data source. We both have experience with PHP and MySQL for mining and storing data, and the Bing API offers response formats we are both familiar with (JSON, XML). While we have yet to finalize a programming language for the visualization, we will likely use openFrameworks for its speed and our familiarity with it. We will plan the visualization, data mining, context and interaction design processes together and split tasks as needed.