PhD (Doctor of Philosophy)
Over the past thirty years, clinical research has benefited substantially from the adoption of electronic medical record systems. As adoption has grown, so has the number of researchers seeking to improve the overall analytical environment through new tools and models. Although much work has been done, many areas remain uninvestigated, two of which are explored in this dissertation.
The first pertains to the physical storage of the data itself. There are two generally accepted storage models: relational and entity-attribute-value (EAV). For clinical data, EAV systems are preferred because they naturally manage many-to-many relationships, sparse attributes, and dynamic processes, while requiring minimal conversion effort and reducing federation complexity. However, the relational database management systems on which they are implemented are not designed to organize and retrieve data in this format, which erodes their performance gains. To counter this effect, we present the foundation for an EAV Database Management System (EDBMS). We discuss data conversion methodologies, formulate the requisite metadata and partitioned, type-sensing index structures, and provide detailed runtime and experimental analyses comparing our approach with five existing methods. Our results show that the prototype, EAVDB, reduces space and conversion requirements while improving overall query performance.
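To make the storage trade-off concrete, the following is a minimal sketch of a generic relational-to-EAV conversion and the reverse pivot. It is not EAVDB's conversion methodology or its type-sensing indexes. The function names `to_eav` and `from_eav` and the clinical column names are hypothetical. Sparse attributes cost nothing in EAV because NULL cells are simply not stored. The price is that every row-oriented query must reassemble (pivot) the triples, and a conventional RDBMS is not optimized for this.

```python
from typing import Any, Dict, List, Tuple

# An EAV triple: (entity id, attribute name, value)
Triple = Tuple[Any, str, Any]


def to_eav(rows: List[Dict[str, Any]], key: str) -> List[Triple]:
    """Flatten wide relational rows into EAV triples, skipping NULL (None) cells."""
    triples: List[Triple] = []
    for row in rows:
        entity = row[key]
        for attribute, value in row.items():
            # Only attributes that actually carry a value are stored.
            if attribute != key and value is not None:
                triples.append((entity, attribute, value))
    return triples


def from_eav(triples: List[Triple], key: str) -> List[Dict[str, Any]]:
    """Pivot EAV triples back into wide rows; absent attributes become None.

    Entities with no stored attributes at all cannot be recovered, because
    EAV keeps no record of them.
    """
    attributes = sorted({a for _, a, _ in triples})
    rows: Dict[Any, Dict[str, Any]] = {}
    for entity, attribute, value in triples:
        row = rows.setdefault(entity, {key: entity, **{a: None for a in attributes}})
        row[attribute] = value
    return list(rows.values())


# Hypothetical sparse lab-result table: most patients lack most measurements.
rows = [
    {"patient_id": 1, "hba1c": 7.2, "ldl": None},
    {"patient_id": 2, "hba1c": None, "ldl": 130},
]
triples = to_eav(rows, "patient_id")
print(triples)  # [(1, 'hba1c', 7.2), (2, 'ldl', 130)]
```

Four wide cells shrink to two triples here. The round trip through `from_eav` shows the reassembly work that EAV storage pushes onto query time.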
The second topic concerns query performance in a federated environment. One method of decreasing query execution time is to pre-compute and store "beneficial" queries (views). The View Selection Problem (VSP) identifies these views subject to resource constraints; however, a federated model has yet to be developed. In this dissertation, we present three advances in view materialization. First, a more robust optimization function, the Minimum-Maintenance View Selection Problem (MMVSP), is derived by combining existing approaches. Second, the Federated View Selection Problem (FVSP), which builds on the MMVSP, and the federated data cube lattice are formalized. The FVSP allows for multiple querying nodes, partial and full materialization, and restrictions on data propagation. The latter two features are shown to greatly reduce the number of valid solutions within the solution space, which motivates a novel, multi-tiered approach. Lastly, EAV materialization, introduced in this dissertation, is incorporated into an expanded, multi-modal variant of the FVSP. Because, to the best of our knowledge, no models or heuristics exist for either the federated or the EAV VSP, this research defines two new branches of data warehouse optimization. Coupled with our EDBMS design, this dissertation addresses two main challenges associated with clinical data warehousing and federation.
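As background for the classic, non-federated VSP, the following is a minimal sketch of the well-known greedy heuristic of Harinarayan, Rajaraman, and Ullman on a data cube lattice. It is not the MMVSP or FVSP formulation. It assumes a linear cost model, in which answering a group-by costs the row count of its smallest materialized ancestor. It also assumes a budget of `k` views beyond the always-materialized base cuboid. The function names and the view sizes below are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional

# A view is identified by the set of dimensions it groups by.
View = FrozenSet[str]


def answers(v: View, w: View) -> bool:
    # A finer group-by (more dimensions) can answer any coarser one.
    return w <= v


def query_cost(w: View, chosen: List[View], size: Dict[View, int]) -> int:
    # Linear cost model: rows scanned in the smallest materialized ancestor of w.
    return min(size[v] for v in chosen if answers(v, w))


def greedy_select(size: Dict[View, int], k: int) -> List[View]:
    top = max(size, key=len)  # base cuboid; must always be materialized
    chosen = [top]
    for _ in range(k):
        best: Optional[View] = None
        best_benefit = 0
        for v in size:
            if v in chosen:
                continue
            # Benefit: total cost reduction over every query v can answer.
            benefit = sum(
                max(0, query_cost(w, chosen, size) - size[v])
                for w in size
                if answers(v, w)
            )
            if benefit > best_benefit:
                best, best_benefit = v, benefit
        if best is None:  # no remaining view reduces any query cost
            break
        chosen.append(best)
    return chosen


# Hypothetical view sizes (thousands of rows) over dimensions p, s, c.
size = {
    frozenset("psc"): 6000, frozenset("pc"): 6000, frozenset("ps"): 800,
    frozenset("sc"): 6000, frozenset("p"): 200, frozenset("s"): 10,
    frozenset("c"): 100, frozenset(): 1,
}
print([sorted(v) for v in greedy_select(size, 2)])
# [['c', 'p', 's'], ['p', 's'], ['c']]
```

A federated variant would additionally have to decide at which node each view lives and which data may cross node boundaries. This sketch deliberately omits those choices.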
Clinical data warehousing, EAV database management system, EAV view selection problem, Entity-attribute-value system, Federated view selection problem
xix, 284 pages
Includes bibliographical references (pages 259-284).
Copyright 2013 Ray Hylock