Document Type

Dissertation

Date of Degree

Fall 2014

Degree Name

PhD (Doctor of Philosophy)

Degree In

Informatics

First Advisor

David Eichmann

Abstract

Emerging topic detection algorithms have the potential to assist researchers in maintaining awareness of current trends in biomedical fields, a feat not easily achieved with existing methods. Though topic detection algorithms for news cycles exist, several aspects of the biomedical domain make applying them directly to scientific literature problematic.

This dissertation offers a framework for emerging topic detection in biomedicine. The framework includes a novel set of weightings based on the historical importance of each identified topic. Features such as journal impact factor and funding data are used to develop a fitness score that identifies which topics are likely to burst in the future. Bursts were characterized by discipline over an extended planning horizon to establish what a typical burst trend looks like in this space and, in turn, how to identify important or emerging trends. Cluster analysis was used to create an overlapping hierarchical structure of the scientific literature at the discipline level, which lets different users adjust the granularity of emerging topic detection (e.g., discipline level or research area level). Cluster analysis also identifies terms that may be missing from annotated taxonomies because they are new or were not considered relevant when the taxonomy was last updated. Weighting topics by historical frequency improves identification of bursts associated with less well-known areas, which are therefore more surprising. The fitness score allows for the early identification of bursty terms. This framework will benefit policy makers, clinicians, and researchers.

Keywords

Burst Detection, Clustering Analysis

Pages

xiii, 110 pages

Bibliography

Includes bibliographical references (pages 99-107).

Copyright

Copyright 2014 Charisse Rene Madlock-Brown
