Date of Degree
PhD (Doctor of Philosophy)
Social media offers a powerful outlet for people's thoughts and feelings: it is an enormous, ever-growing source of texts ranging from everyday observations to involved discussions. This thesis contributes to the field of sentiment analysis, which aims to extract emotions and opinions from text. A basic goal is to classify text as expressing either positive or negative emotion. Sentiment classifiers have been built for social media text such as product reviews, blog posts, and even Twitter messages. With the increasing complexity of text sources and topics, it is time to re-examine the standard sentiment extraction approaches, and possibly to redefine and enrich the definition of sentiment. Thus, this thesis begins by introducing a rich multi-dimensional model based on Affect Control Theory and showing its usefulness in sentiment classification. Next, unlike sentiment analysis research to date, we examine sentiment expression and polarity classification within and across various social media streams by building topical datasets. When comparing Twitter, reviews, and blogs on consumer product topics, we show that it is possible, and sometimes even beneficial, to train sentiment classifiers on text sources that differ from the target text. This is not the case, however, when we compare political discussion in YouTube comments to Twitter posts, demonstrating the difficulty of political sentiment classification. We further show that neither discussion volume nor sentiment expressed in these streams corresponds well to national polls, calling into question recent research linking the two. The complexity of political discussion also calls for a more specific re-definition of "sentiment" as agreement with the author's political stance. We conclude that sentiment must be defined, and tools for its analysis designed, within a larger framework of human interaction.
Affect Control Theory, Data Mining, Sentiment Analysis, Social Media, Text Retrieval
xi, 174 pages
Includes bibliographical references (pages 167-174).
Copyright 2012 Yelena Mejova