Millions of news stories, blog articles, and status updates are published every day. You would like to know what is going on in the news (let’s just call it news, even if it’s actually a blog post). So what options do you have on the Internet?
You rely on your favourite manually curated website that selects the news for you. You may read through a few such websites. This gives you access to content of predictably good quality. But at the same time, you relinquish control over what information (and what opinions) get a chance to appear before your eyes. You are under the total control of your selected news source(s).
You visit your favourite social voting site (Reddit, Hacker News, Digg). You look through the top 20 links. You read the discussions. This gives you something to feed your brain with. You expect the community to produce top links that are generally relevant to your area of interest. This is great, as it gives you a fine entry point to the world of information. But it is also very limited. It’s like using the MTV top 20 to get to know what’s going on in the world of music.
News articles are grouped by how similar they are to each other, and then packed into a predefined set of standard sections. Think Google News, Yahoo News, News360. You can browse the overall top news. You can also choose a section and browse the top news within that section. Within each group of news, you can explore other articles – which provides you with an additional dose of objectivity that is lacking in curated media and social voting sites. Already better. But is this all that can be achieved? No.
Don’t people all have different interests? Let’s give them a chance to tell us what their interests are, so we can learn from them and show each of them better-quality news tailored to them! Of course, we cannot ask people to actually type in their keywords; people are so lazy these days. But surely people can click a like or dislike button on a specific news article. We can then use sophisticated algorithms to infer the interests of a specific person, overlay this on top of our aggregated news feed, and make everyone happy by showing them the news they would actually like. Think Zite, and the above aggregators.
Isn’t it great? No. Here are the reasons why (and what can be done about it).
Here are preliminary results for sentiment analysis on Intel (INTC) in 2011. Each dot represents a news article. The horizontal axis shows time. The vertical axis shows news article sentiment. If you open the large picture (click on it), you will see there is some correlation between positive / negative sentiment and subsequent price changes.
I am beginning a new foray deeper into computational linguistics, in the area of sentiment analysis. In this post I want to outline the main goals and the reasons behind them. Later I plan to post updates on the progress.
Exploratory Search. We have been working for some time on exploratory search at Readrz. The goal was to allow people to explore a collection of texts. Overall, this is a very big topic. The texts can be explored along semantic dimensions, and along the time dimension if the texts are timestamped. Semantic dimensions can be predefined categories, tag clouds, search queries, or more vaguely defined topics (more on this in the next paragraph). When semantic dimensions are specified manually as categories/tags and then associated with documents by the people working with them, exploratory search becomes a relatively easy task. We can implement filtering by the defined categories and by time. We can let people see related categories by inferring them from documents that are associated with multiple categories. We can also easily compute the “activity” in different categories over time, present it to users as graphs, and let them navigate through time. This has been implemented in many online systems that allow tagging of documents, photos, etc.
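To make the manually tagged case concrete, here is a minimal Python sketch of the three operations mentioned above: filtering by category and time, inferring related categories from co-occurrence, and counting activity per category over time. It is only an illustration, not how Readrz is implemented; the toy documents, the tag names, and the function names (filter_docs, related_tags, activity_by_day) are all made up for this example.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical toy collection: (publication date, set of manually assigned tags).
DOCS = [
    (date(2011, 5, 2), {"football", "celebrities"}),
    (date(2011, 5, 2), {"football", "business"}),
    (date(2011, 5, 3), {"football", "celebrities"}),
    (date(2011, 5, 3), {"technology"}),
]


def filter_docs(docs, tag=None, start=None, end=None):
    """Keep documents that carry `tag` and are dated within [start, end]."""
    return [
        (day, tags)
        for day, tags in docs
        if (tag is None or tag in tags)
        and (start is None or day >= start)
        and (end is None or day <= end)
    ]


def related_tags(docs, tag):
    """Count how often every other tag shares a document with `tag`."""
    counts = Counter()
    for _, tags in docs:
        if tag in tags:
            counts.update(tags - {tag})
    return counts


def activity_by_day(docs):
    """Count documents per tag per day, e.g. as input for an activity graph."""
    activity = defaultdict(Counter)
    for day, tags in docs:
        for tag in tags:
            activity[tag][day] += 1
    return activity


if __name__ == "__main__":
    print(filter_docs(DOCS, tag="football", start=date(2011, 5, 3)))
    print(related_tags(DOCS, "football").most_common())
    print(dict(activity_by_day(DOCS)["football"]))
```

With real data, the per-day counts would feed the activity graphs, and the co-occurrence counts would drive the list of related categories shown next to the current one.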
Several lessons learned from implementing exploratory search at Readrz (so far):
Today we released the first version of search! At the moment, you can only search within the section currently selected on the left. We are working on global search across sections, and it will be released in the near future. Below are examples of some searches…
A search for “football” within the Celebrities section brings up Daniel Radcliffe’s opinions about footballers, plus another story about the romance of a famous footballer: