Document Type

Dissertation

Date of Degree

Spring 2012

Degree Name

PhD (Doctor of Philosophy)

Degree In

Computer Science

First Advisor

Padmini Srinivasan

Abstract

Social media offers a powerful outlet for people's thoughts and feelings -- it is an enormous, ever-growing source of texts ranging from everyday observations to involved discussions. This thesis contributes to the field of sentiment analysis, which aims to extract emotions and opinions from text. A basic goal is to classify text as expressing either positive or negative emotion. Sentiment classifiers have been built for social media text such as product reviews, blog posts, and even Twitter messages. With the increasing complexity of text sources and topics, it is time to re-examine the standard sentiment extraction approaches, and possibly to redefine and enrich the definition of sentiment. Thus, this thesis begins by introducing a rich multi-dimensional model based on Affect Control Theory and showing its usefulness in sentiment classification. Next, unlike sentiment analysis research to date, we examine sentiment expression and polarity classification within and across various social media streams by building topical datasets. When comparing Twitter, reviews, and blogs on consumer product topics, we show that it is possible, and sometimes even beneficial, to train sentiment classifiers on text sources that differ from the target text. This is not the case, however, when we compare political discussion in YouTube comments to Twitter posts, demonstrating the difficulty of political sentiment classification. We further show that neither the discussion volume nor the sentiment expressed in these streams corresponds well to national polls, calling into question recent research linking the two. The complexity of political discussion also calls for a more specific redefinition of "sentiment" as agreement with the author's political stance. We conclude that sentiment must be defined, and tools for its analysis designed, within a larger framework of human interaction.

Keywords

Affect Control Theory, Data Mining, Sentiment Analysis, Social Media, Text Retrieval

Pages

xi, 174 pages

Bibliography

Includes bibliographical references (pages 167-174).

Copyright

Copyright 2012 Yelena Mejova
