Degree Name

PhD (Doctor of Philosophy)

Degree In

Business Administration

First Advisor

Padmini Srinivasan

First Committee Member

Padmini Srinivasan

Second Committee Member

Ramji Balakrishnan

Third Committee Member

Warren Boe

Fourth Committee Member

Mort Pincus

Fifth Committee Member

Nick Street


Text mining and machine learning methodologies have been applied in the biomedical and business domains to discover new relationships and knowledge. Company annual reports (10-K filings) are among the most important mandatory information disclosures, yet they have remained untapped by the text mining and machine learning community. Previous research indicates that the narrative disclosures in annual reports can be used to assess a company's short-term financial prospects. In this study, we apply text classification methods to 10-K filings to systematically assess the predictive potential of annual reports. We specify our research problem along five dimensions: financial performance indicators, choice of predictions, evaluation criteria, document representation, and experimental design. The different combinations of choices along these five dimensions give us different perspectives on, and insights into, the feasibility of using annual reports to predict a company's future performance. Our results confirm that predictive models can be successfully built from the textual content of annual reports. Mock portfolios constructed from the firms selected by the text-based model's predictions are shown to produce positive average stock returns. Sub-sample experiments and post-hoc analysis further confirm that the text-based model is able to capture textual differences among firms with different financial characteristics. We see a rich set of research questions in this area that promise further insight.
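The abstract does not say which document representation or classifier was used, so the following Python sketch is only a rough illustration of the general pipeline it describes: represent each filing as text features, train a classifier on labeled filings, predict labels for new filings, and form a mock portfolio from the firms predicted to do well. It assumes a TF-IDF bag-of-words representation, a nearest-centroid (Rocchio-style) classifier, two hypothetical labels ("up" and "down"), and an equal-weighted portfolio. All function names, labels, texts, and returns are hypothetical and invented for illustration; none of them come from the dissertation.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Toy tokenizer (assumption): lowercase and split on whitespace. A real
    # pipeline would also handle punctuation, stop words, and possibly stemming.
    return text.lower().split()


def fit_idf(train_docs):
    """Smoothed inverse document frequency, estimated on training filings only."""
    n = len(train_docs)
    df = Counter(term for doc in train_docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def normalize(vec):
    # Scale a sparse vector to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def vectorize(doc, idf):
    # TF-IDF bag of words; terms never seen in training are dropped.
    tf = Counter(t for t in tokenize(doc) if t in idf)
    return normalize({t: c * idf[t] for t, c in tf.items()})


def train_centroids(vectors, labels):
    # Hypothetical nearest-centroid classifier: one unit-length mean vector per label.
    sums = defaultdict(Counter)
    counts = Counter(labels)
    for vec, label in zip(vectors, labels):
        sums[label].update(vec)
    return {label: normalize({t: w / counts[label] for t, w in s.items()})
            for label, s in sums.items()}


def predict(vec, centroids):
    # Choose the label whose centroid has the highest cosine similarity to vec.
    return max(centroids, key=lambda label: sum(
        w * centroids[label].get(t, 0.0) for t, w in vec.items()))


def mock_portfolio_return(predictions, returns, target="up"):
    # Equal-weighted average return of firms predicted as `target`;
    # None if no firm was selected.
    picked = [r for p, r in zip(predictions, returns) if p == target]
    return sum(picked) / len(picked) if picked else None


# Toy sentences invented for illustration; they are not real 10-K text.
train_docs = [
    "revenue growth strong demand expanded margins",
    "record sales growth and strong cash flow",
    "impairment losses litigation and declining demand",
    "going concern doubt covenant breach and losses",
]
train_labels = ["up", "up", "down", "down"]
test_docs = ["strong growth in sales", "litigation losses and impairment"]
test_returns = [0.04, -0.02]  # toy next-period returns, also invented

idf = fit_idf(train_docs)
centroids = train_centroids([vectorize(d, idf) for d in train_docs], train_labels)
predictions = [predict(vectorize(d, idf), centroids) for d in test_docs]
print(predictions)
print(mock_portfolio_return(predictions, test_returns))
```

A nearest-centroid classifier is used here only because it fits in a few lines of standard-library code. Another linear text classifier could be swapped in without changing the overall train, predict, and portfolio steps.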


xi, 100 pages


Includes bibliographical references (pages 93-100).


Copyright 2007 Xin Ying Qiu