DOI of Published Version
10.1186/s12859-015-0630-0
The rapid pace of bioscience research makes it challenging to track relevant articles in one’s area of interest. MEDLINE, a primary source for biomedical literature, offers access to more than 20 million citations, with three-quarters of a million new ones added each year. Thus, it is not surprising to see active research on building new document retrieval and sentence retrieval systems. We present Ferret, a prototype retrieval system designed to retrieve and rank sentences (and their documents) conveying gene-centric relationships of interest to a scientist. The prototype has several features. For example, it is designed to handle gene name ambiguity and perform query expansion. Inputs can be a list of genes with an optional list of keywords. Sentences are retrieved across species, but the species discussed in the records are identified. Results are presented in the form of a heat map, and sentences corresponding to specific cells of the heat map may be selected for display. Ferret is designed to assist bioscientists at different stages of research, from early idea exploration to advanced analysis of results from bench experiments.
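The heat map described above can be pictured as a gene-by-keyword grid in which each cell holds the sentences that mention both terms. The following is a minimal sketch of that idea only, not Ferret's implementation: the function name `build_heat_map`, the placeholder gene names, and the sample sentences are all hypothetical, and plain token matching stands in for the gene name disambiguation, query expansion, and ranking that the prototype performs.

```python
import re
from collections import defaultdict


def build_heat_map(sentences, genes, keywords):
    """Group sentences by the (gene, keyword) pairs they mention.

    Hypothetical helper: uses case-insensitive whole-token matching,
    which is an assumption and much simpler than a real retrieval system.
    """
    cells = defaultdict(list)
    for sentence in sentences:
        # Lowercased set of alphanumeric tokens in this sentence.
        tokens = set(re.findall(r"[A-Za-z0-9-]+", sentence.lower()))
        for gene in genes:
            if gene.lower() not in tokens:
                continue
            for keyword in keywords:
                if keyword.lower() in tokens:
                    cells[(gene, keyword)].append(sentence)
    return cells


# Made-up sentences with placeholder gene names, for illustration only.
sentences = [
    "GENE1 is required for ethylene signaling.",
    "GENE2 and GENE1 respond to ethylene under stress.",
    "GENE2 expression was unchanged.",
]

cells = build_heat_map(sentences, ["GENE1", "GENE2"], ["ethylene", "stress"])

# The number of sentences per cell is what a heat map would color by.
counts = {cell: len(hits) for cell, hits in cells.items()}
```

Selecting a cell such as `("GENE1", "ethylene")` and reading `cells[("GENE1", "ethylene")]` corresponds to displaying the sentences behind one square of the heat map.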
Three live case studies in plant biology, all involving Arabidopsis thaliana, are presented. The first is to discover genes that may relate to the open immature flower phenotype in Arabidopsis. The second is about finding associations reported between ethylene signaling and a set of 300+ Arabidopsis genes. The third is about searching for potential gene targets of an Arabidopsis transcription factor hypothesized to be involved in plant stress responses. Ferret was successful in finding valuable information in all three cases. In the first case, the bZIP family of genes was identified. In the second case, sentences indicating relevant associations were found in other species such as potato and jasmine. In the third case, sentences led to new research questions about the plant hormone salicylic acid.
Ferret successfully retrieved relevant gene-centric sentences from PubMed records. The three case studies demonstrate end-user satisfaction with the system.
OAfund, Gene-centric relationships, Sentence retrieval, Text retrieval, Sentence ranking, Scientific workflow
Granting or Sponsoring Agency
NSF DBI & MCB
1146256 (PS), 1146300 & 0950158 (XZ), 1147144 & 0923796 (CC)
Journal Article Version
Version of Record
Published Article/Book Citation
BMC Bioinformatics 2015, 16:198. doi:10.1186/s12859-015-0630-0
© 2015 Srinivasan et al.
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.