Document Type

Dissertation

Date of Degree

Spring 2015

Degree Name

PhD (Doctor of Philosophy)

Degree In

Computer Science

First Advisor

Padmini Srinivasan

Abstract

Social media surveillance is becoming increasingly popular. However, current surveillance methods do not make use of well-respected surveys that were established over many decades in domains outside computer science. In addition, previous social media surveillance has not been evaluated sufficiently, especially surveillance of happiness on social media. These gaps motivated us to develop a general computational methodology for translating a well-known survey into a social media surveillance strategy, so that traditional surveys can be used to broaden social media surveillance. The methodology can bridge domains such as psychology and social science with computer science. We use life satisfaction on social media as a case study to illustrate our survey-to-surveillance methodology. We start with a well-known life satisfaction survey and expand its statements to generate templates. We then use the templates to build queries in our information retrieval system, which retrieves social media posts that can be considered valid responses to the original survey. Filters are applied to improve the performance of the retrieval system.
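As a rough illustration of the template-to-query step summarized above, the Python sketch below expands a template into phrase variants, compiles them into a single case-insensitive query, and applies a filter to the matches. The template, slot values, retweet filter, and sample posts are all hypothetical; the abstract does not describe the dissertation's actual templates, retrieval system, or filters.

```python
import itertools
import re

def expand_template(template, slots):
    """Yield every phrase produced by filling the template's slots."""
    keys = sorted(slots)
    for values in itertools.product(*(slots[k] for k in keys)):
        yield template.format(**dict(zip(keys, values)))

def build_query(phrases):
    """Compile phrases into one case-insensitive, word-bounded regex."""
    alternatives = "|".join(re.escape(p) for p in phrases)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

def passes_filters(post):
    """Hypothetical filter: drop retweets, which are not first-hand responses."""
    return not post.startswith("RT ")

def retrieve(posts, query):
    """Keep posts that match the query and pass the filters."""
    return [p for p in posts if query.search(p) and passes_filters(p)]

# Hypothetical template loosely modeled on a survey-style statement.
TEMPLATE = "I am {adverb} {adjective} with my life"
SLOTS = {"adverb": ["really", "so", "very"], "adjective": ["satisfied", "happy"]}

posts = [
    "I am so happy with my life right now",
    "RT I am very satisfied with my life",
    "Not sure I am happy with anything",
    "honestly i am really satisfied with my life!",
]
query = build_query(expand_template(TEMPLATE, SLOTS))
print(retrieve(posts, query))
```

Here the retweet matches the query but is removed by the filter, which shows why filtering is kept as a separate step from matching.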

To evaluate our surveillance method, we developed a novel method for building a gold standard dataset. Instead of labeling every data instance in the traditional way, we ask human workers to "find" as many of the positives as possible in the dataset; the remaining instances are assumed to be negatives. We used this method to build the gold standard dataset for the life satisfaction case study, and we built three additional gold standard datasets to further demonstrate its value. Using the life satisfaction gold standard dataset, we show that our surveillance method for life satisfaction outperforms other popular methods (lexicon-based and machine learning based methods) used by previous researchers.
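Because only the positives are collected, a retrieval system can be scored against such a gold standard by treating every instance not marked positive as negative. The Python sketch below computes precision, recall, and F1 under that assumption; the function name and the post IDs are hypothetical and are not taken from the dissertation.

```python
def evaluate(retrieved_ids, gold_positive_ids):
    """Precision, recall, and F1 against a 'find the positives' gold standard.

    Any instance the workers did not mark as positive is treated as negative,
    so only the set of found positives is needed.
    """
    retrieved = set(retrieved_ids)
    gold = set(gold_positive_ids)
    true_positives = len(retrieved & gold)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical post IDs: the system retrieved 1-4; workers found 2, 3, and 5.
p, r, f = evaluate([1, 2, 3, 4], [2, 3, 5])
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
# prints "precision=0.500 recall=0.667 f1=0.571"
```

Under this scheme, a true positive that the workers missed is counted as a false positive, so precision computed this way is a conservative estimate to the extent that workers miss positives.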

Using our surveillance method for life satisfaction on social media, we conducted a comprehensive analysis of life satisfaction expressions on Twitter. We show not only the time series and the daily and weekly cycles of life satisfaction on social media, but also differences in the characteristics of users with different life satisfaction expressions, including psychosocial features such as anxiety, anger, and depression. In addition, we present the geographic distribution of life satisfaction, both across the U.S. and at places around the world. This thesis is the first to systematically explore life satisfaction expressions on Twitter, using computational methods derived from an established survey on life satisfaction.

Public Abstract

Social media surveillance is becoming increasingly popular. However, current surveillance methods do not make use of well-respected surveys that were established over many decades in domains outside computer science. In addition, current social media surveillance methods are not accurate enough. These gaps motivated us to develop a general computational methodology for translating a well-known survey into a social media surveillance strategy, so that traditional surveys can be used to broaden social media surveillance. The methodology can bridge domains such as psychology and social science with computer science.

We use life satisfaction on social media as a case study to illustrate our survey-to-surveillance methodology. In addition, we developed a novel method for building the dataset used to evaluate our surveillance method. We show that this dataset-building method is sound and that our surveillance method for life satisfaction outperforms other popular methods used by previous researchers.

Using our surveillance method for life satisfaction on social media, we conducted a comprehensive analysis of life satisfaction expressions on Twitter. We show not only the time series and the daily and weekly cycles of life satisfaction on social media, but also differences in the characteristics of social media users with different life satisfaction expressions, including psychosocial features such as anxiety, anger, and depression. In addition, we present the geographic distribution of life satisfaction. This thesis is the first to systematically explore life satisfaction expressions on Twitter, using computational methods derived from an established survey on life satisfaction.

Keywords

Gold Standard Dataset, Life Satisfaction, Social Media Surveillance

Pages

xv, 155 pages

Bibliography

Includes bibliographical references (pages 149-155).

Copyright

Copyright 2015 Chao Yang
