Document Type


Date of Degree


Degree Name

PhD (Doctor of Philosophy)

Degree In

Business Administration

First Advisor

Srinivasan, Padmini

First Committee Member

Srinivasan, Padmini

Second Committee Member

Balakrishnan, Ramji

Third Committee Member

Boe, Warren

Fourth Committee Member

Pincus, Mort

Fifth Committee Member

Street, Nick


Text mining and machine learning methods have been applied in the biomedical and business domains to discover new relationships and knowledge. Company annual reports (10-K filings), among the most important mandatory information disclosures, have remained untapped by the text mining and machine learning community. Previous research indicates that the narrative disclosures in company annual reports can be used to assess a company's short-term financial prospects. In this study, we apply text classification methods to 10-K filings to systematically assess the predictive potential of company annual reports. We specify our research problem along five dimensions: financial performance indicators, choice of predictions, evaluation criteria, document representation, and experiment design. Different combinations of choices along these five dimensions give us different perspectives on, and insights into, the feasibility of using annual reports to predict a company's future performance. Our results confirm that predictive models can be built successfully from the textual content of annual reports. Mock portfolios constructed with firms predicted by the text-based model are shown to produce positive average stock returns. Sub-sample experiments and post-hoc analysis further confirm that the text-based model captures the textual differences among firms with different financial characteristics. We see a rich set of research questions that promise further insight in this research area.


xi, 100 pages


Includes bibliographical references (pages 93-100).


Copyright 2007 Xin Ying Qiu