Self-updating map
This report describes the creation and study of a road map of the HTTP repository of The Cancer Genome Atlas (TCGA). The road map was built to enable the use of this unprecedented biomolecular data resource in Web 3.0 applications, and to enhance the reproducibility of biomolecular research delivered as elements of a computational ecosystem. In a previous report (Deus et al., 2010), the authors identified a Resource Description Framework (RDF) data model describing the contents of the TCGA file repository.

The development of the TCGA Roadmap engine was found to provide specific clues about how biomedical big data initiatives should be exposed as public resources for exploratory analysis, data mining and reproducible research. These specific design elements align with the concept of knowledge reengineering and represent a sharp departure from top-down approaches in grid initiatives such as caBIG. They also present a much more interoperable and reproducible alternative to the still pervasive use of data portals.

Almeida 1 2
Associate Editor: Janet Kelso
0 Digital Enterprise Research Institute (DERI), National University of Ireland, Galway, Ireland
1 Department of Electrical and Computer Engineering, University of Alabama at Birmingham, Birmingham, AL 35249, USA
2 Division of Informatics, Department of Pathology, University of Alabama at Birmingham, Birmingham, AL 35233-7331, USA

Motivation: Since 2011, the files of The Cancer Genome Atlas (TCGA) have been accessible through HTTP from a public site, creating entirely new possibilities for cancer informatics by enhancing data discovery and retrieval.
Significantly, these enhancements enable the reporting of analysis results that can be fully traced to, and reproduced from, their source data.

Availability: A prepared dashboard, including links to source code and a SPARQL endpoint, is available at […].

Full text: https://bioinformatics.oxfordjournals.org/content/29/10/1333pdf

Advance Access publication April

A self-updating road map of The Cancer Genome Atlas
David E.

However, to realize this possibility, a continually updated road map of files in the TCGA is required. In the past 2 years, the third generation of web technologies (Hendler, 2009) has matured to the extent that it now provides the foundation for multiple big data resources, such as those integrated by (Hendler et al., 2012).
Simultaneous with the expansion of the TCGA, the tooling required to enable computational ecosystems for data-driven medical genomics (Almeida, 2010) is maturing rapidly, to the point that tools operating within, and providing, such ecosystems are beginning to appear (Almeida et al., 2012b). The concern that the web browser is computationally inefficient for advanced numerical procedures has also been amply overcome, as we found when identifying sequence analysis procedures that make use of the MapReduce (Dean and Ghemawat, 2008) distributed computing template (Almeida et al., 2012a; Vinga et al., 2012).

Specifically, this engine uses JavaScript in conjunction with the World Wide Web Consortium’s (W3C) Resource Description Framework (RDF) and SPARQL, the query language for RDF, to capture metadata of files in the TCGA open-access HTTP directory. The resulting index may be queried using SPARQL, and it enables file-level provenance annotations as well as discovery of arbitrary subsets of files, based on their metadata, using web standard languages. In turn, these abilities enhance the reproducibility and distribution of novel results delivered as elements of a web-based computational ecosystem.
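
The idea of capturing file metadata from an HTTP directory as RDF and then selecting subsets of files by pattern can be illustrated with a minimal JavaScript sketch. It is not the TCGA Roadmap engine's code: the base URL, the predicate vocabulary, the sample listing and the function names (`listingToTriples`, `matchPattern`, `toNTriples`) are all assumptions made for illustration, and the pattern matcher only mimics a single SPARQL triple pattern rather than running a real SPARQL engine.

```javascript
// Illustrative sketch only. BASE, VOCAB, the sample listing and all function
// names are hypothetical; they are not taken from the TCGA Roadmap engine.
const BASE = "https://example.org/tcga/";   // placeholder for the HTTP directory root
const VOCAB = "https://example.org/vocab#"; // placeholder predicate namespace

// A small Apache-style directory listing, inlined so the example runs offline.
const listingHtml = `
<a href="brca/">brca/</a> 2012-03-01 10:15 -
<a href="README.txt">README.txt</a> 2012-02-11 08:00 1.2K
<a href="gbm_clinical.txt">gbm_clinical.txt</a> 2012-04-20 17:42 88K
`;

// Turn every entry of the listing into [subject, predicate, object] triples.
function listingToTriples(html, baseUrl) {
  const triples = [];
  const row = /<a href="([^"]+)">[^<]*<\/a>\s+(\d{4}-\d{2}-\d{2})\s+(\d{2}:\d{2})/g;
  let m;
  while ((m = row.exec(html)) !== null) {
    const [, href, date, time] = m;
    const subject = baseUrl + href;
    const kind = href.endsWith("/") ? "Directory" : "File";
    triples.push([subject, VOCAB + "type", VOCAB + kind]);
    triples.push([subject, VOCAB + "parent", baseUrl]);
    triples.push([subject, VOCAB + "lastModified", `${date}T${time}`]);
  }
  return triples;
}

// Match one triple pattern; null is a wildcard, playing the role of a SPARQL
// variable. Roughly: SELECT ?f WHERE { ?f v:type v:File }
function matchPattern(triples, [s, p, o]) {
  return triples.filter(([ts, tp, to]) =>
    (s === null || s === ts) &&
    (p === null || p === tp) &&
    (o === null || o === to));
}

// Serialize as N-Triples: URLs become IRIs, everything else a plain literal.
function toNTriples(triples) {
  return triples.map(([s, p, o]) => {
    const obj = /^https?:\/\//.test(o) ? `<${o}>` : JSON.stringify(o);
    return `<${s}> <${p}> ${obj} .`;
  }).join("\n");
}

const triples = listingToTriples(listingHtml, BASE);
const files = matchPattern(triples, [null, VOCAB + "type", VOCAB + "File"])
  .map(([subject]) => subject);

console.log(toNTriples(triples));
console.log(files);
```

In this sketch each file URL doubles as the RDF subject, so any result reported against the index can point back to the exact source file. A real deployment would load the serialized triples into a triple store and query them through its SPARQL endpoint instead of the in-memory filter used here.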