Document Type

Article

Peer Reviewed

Yes

Publication Date

9-15-2016

NLM Title Abbreviation

Cancer Inform

Journal/Book/Conference Title

Cancer Informatics

PubMed ID

27679461

DOI of Published Version

10.4137/CIN.S40043

Abstract

Discovering important genes that account for the phenotype of interest has long been a challenge in genome-wide expression analysis. Analyses such as gene set enrichment analysis (GSEA) that incorporate pathway information have become widespread in hypothesis testing, but pathway-based approaches have been largely absent from regression methods due to the challenges of dealing with overlapping pathways and the resulting lack of available software. The R package grpreg is widely used to fit group lasso and other group-penalized regression models; in this study, we develop an extension, grpregOverlap, to allow for overlapping group structure using a latent variable approach. We compare this approach to the ordinary lasso and to GSEA using both simulated and real data. We find that incorporation of prior pathway information can substantially improve the accuracy of gene expression classifiers, and we shed light on several ways in which hypothesis-testing approaches such as GSEA differ from regression approaches with respect to the analysis of pathway data.

Keywords

OAfund, overlapping group lasso, penalized logistic regression, gene set enrichment analysis, pathway selection

Granting or Sponsoring Agency

NIH

Grant Number

5-P30-CA086862

Journal Article Version

Version of Record

Published Article/Book Citation

Cancer Informatics 2016;15:179–187. doi:10.4137/CIN.S40043

Rights

Copyright © 2016 the authors

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 License

Included in

Biostatistics Commons

URL

https://ir.uiowa.edu/biostat_pubs/3