Convert ExtendingGenomicRanges.Rnw to .Rmd #69
Open: hsadia538 wants to merge 5 commits into `Bioconductor:devel` from `hsadia538:extGenomicRanges-rmd`.
Commits:

- `7a4c48c` Remove ExtendingGenomicRanges.Rnw (hsadia538)
- `e1cb01f` Convert ExtendingGenomicRanges.Rnw to ExtendingGenomicRanges.Rmd (hsadia538)
- `c894b54` Update ExtendingGenomicRanges.Rmd (hsadia538)
- `24992b2` updated links, added missing words, wrap lines (sonali8434)
- `1ecc930` Add backtick, set message to false (jwokaty)

New file, 98 lines (`@@ -0,0 +1,98 @@`):

---
title: "Extending *GenomicRanges*"
author:
  - name: "Michael Lawrence"
  - name: "Bioconductor Team"
date: "Edited: Oct 2014; Compiled: `r format(Sys.time(), '%d %B, %Y')`"
package: GenomicRanges
vignette: >
  %\VignetteIndexEntry{Extending Genomic Ranges}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
output:
  BiocStyle::html_document:
    number_sections: yes
    toc: yes
    toc_depth: 4
---

# Introduction

The goal of `r Biocpkg("GenomicRanges")` is to provide general containers for
genomic data. The central class, at least from the user's perspective, is
*GRanges*, which formalizes the notion of ranges while allowing arbitrary
"metadata columns" to be attached to it. These columns offer the same
flexibility as the venerable *data.frame* and let users adapt *GRanges* to a
wide variety of ad hoc use cases.
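
A minimal sketch of what this looks like in practice; the ranges and the
`score` and `gene` columns are made up for illustration. The metadata columns
are stored as a *DataFrame* alongside the ranges and can be used much like
*data.frame* columns, here to filter ranges by score:

```{r granges-mcols-example, message=FALSE}
library(GenomicRanges)

## Three toy ranges with two ad hoc metadata columns, 'score' and 'gene'
gr <- GRanges(seqnames = c("chr1", "chr1", "chr2"),
              ranges = IRanges(start = c(1, 50, 10), width = 20),
              strand = c("+", "-", "*"),
              score = c(0.5, 1.2, 3.4),
              gene = c("a", "b", "c"))

## The metadata columns form a DataFrame with one row per range
mcols(gr)

## '$' reaches into the metadata columns, so they can drive subsetting
gr[gr$score > 1]
```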

The more we encounter a particular problem, the better we understand it. We
eventually develop a systematic approach for solving the most frequently
encountered problems, and every systematic approach deserves a systematic
implementation. For example, we might want to formally store genetic variants,
with information on alleles and read depths. The metadata columns, which were so
useful during prototyping, are inappropriate for extending the formal semantics
of our data structure: for the sake of data integrity, we need to ensure that
the columns are always present and that they meet certain constraints.
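
To make the integrity problem concrete, here is a sketch with hypothetical
variant calls kept in ordinary metadata columns (`ref`, `alt` and `depth` are
illustrative names, not a formal schema). *GRanges* accepts both a dropped
column and a column overwritten with a value of the wrong type without
complaint:

```{r mcols-no-constraints, message=FALSE}
library(GenomicRanges)

## Hypothetical variant calls stored as plain metadata columns
vars <- GRanges("chr1", IRanges(c(100, 200), width = 1),
                ref = c("A", "C"), alt = c("G", "T"),
                depth = c(30L, 25L))

## Nothing stops a required column from being dropped ...
mcols(vars)$depth <- NULL

## ... or an allele column from being replaced by numbers
mcols(vars)$alt <- c(1, 2)

mcols(vars)
```

A formal class can rule out both changes through typed slots and validity
checks.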

We might also find that our prototype does not scale well to the increased data
volume that often occurs when we advance past the prototype stage. *GRanges* is
meant mostly for prototyping and stores its data in memory as simple R data
structures. We may require something more specialized when the data are large;
for example, we might store the data as a Tabix-indexed file, or in a database.

The `r Biocpkg("GenomicRanges")` package does not directly solve either of these
problems, because there are no general solutions. However, it is adaptable to
specialized use cases.

# The *GenomicRanges* abstraction

Unbeknownst to many, most of the *GRanges* implementation is provided by methods
on the *GenomicRanges* class, the virtual parent class of *GRanges*.
*GenomicRanges* methods provide everything except the actual data storage and
retrieval, which *GRanges* implements directly using slots. For example, the
ranges are retrieved like this:

```{r granges-ranges, message=FALSE}
library(GenomicRanges)
selectMethod(ranges, "GRanges")
```
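
One quick way to confirm the class relationships described here is with the
standard **methods** introspection functions `isVirtualClass()` and
`extends()`:

```{r genomicranges-virtual, message=FALSE}
library(GenomicRanges)

## GenomicRanges is a virtual class: it cannot be instantiated directly
isVirtualClass("GenomicRanges")

## Concrete implementations inherit its methods
extends("GRanges", "GenomicRanges")
extends("DelegatingGenomicRanges", "GenomicRanges")
```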

An alternative implementation is *DelegatingGenomicRanges*, which stores all of
its data in a delegate *GenomicRanges* object:

```{r delegating-granges-ranges}
selectMethod(ranges, "DelegatingGenomicRanges")
```
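
A minimal sketch of a new class built on *DelegatingGenomicRanges*. It assumes,
as the `selectMethod()` output for *DelegatingGenomicRanges* suggests, that the
wrapped object lives in a slot named `delegate`; the class name
`MyDelegatingRanges` is hypothetical. Accessors such as `ranges()` are then
answered by the delegate:

```{r delegating-subclass-sketch, message=FALSE}
library(GenomicRanges)

## Hypothetical subclass; all storage lives in the inherited 'delegate' slot
setClass("MyDelegatingRanges", contains = "DelegatingGenomicRanges")

gr <- GRanges("chr1", IRanges(c(1, 10, 20), width = 5))
dgr <- new("MyDelegatingRanges", delegate = gr)

## Accessors are forwarded to the delegate
ranges(dgr)
start(dgr)
```

The subclass defines no accessors of its own; everything shown is inherited
from *DelegatingGenomicRanges*.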

This abstraction enables us to pursue more efficient implementations for
particular tasks. One example is *GNCList*, which is indexed for fast range
queries and keeps the underlying ranges in a `granges` slot, whose class we
show here:

```{r gnclist-granges}
getSlots("GNCList")["granges"]
```
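
A sketch of typical *GNCList* usage with made-up ranges: `GNCList()` builds the
index from a *GRanges* once, and the result can be passed as the subject of
`findOverlaps()`. The hits should be the same as with the plain *GRanges*
subject; the index mainly pays off when the same subject is queried repeatedly.

```{r gnclist-overlaps, message=FALSE}
library(GenomicRanges)

subject <- GRanges("chr1", IRanges(c(1, 15, 40), width = 10))
query <- GRanges("chr1", IRanges(c(5, 45), width = 3))

## Build the nested containment list index once ...
gnc <- GNCList(subject)

## ... then use it as the subject of overlap queries
hits_indexed <- findOverlaps(query, gnc)
hits_plain <- findOverlaps(query, subject)
hits_indexed
```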

The `r Biocpkg("MutableRanges")` package in svn provides other, untested
examples.

# Formalizing `mcols`: Extra column slots

A problem orthogonal to data storage is adding semantics by formalizing the
metadata columns, which we solve using the "extra column slot" mechanism.
Whenever *GenomicRanges* needs to operate on its metadata columns, it also
delegates to the internal `extraColumnSlotNames` generic, whose methods should
return a character vector naming the slots in the *GenomicRanges* subclass that
correspond to columns (i.e., slots that hold one value per range). It extracts
the slot values and manipulates them as it would a metadata column, except that
they are now formal slots with formal types.
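
A minimal sketch of the mechanism, assuming a hypothetical *ScoredRanges*
subclass of *GRanges* with one numeric `confidence` slot. Because
`extraColumnSlotNames` is internal, the method is registered on the generic
reached with `GenomicRanges:::`:

```{r extra-column-slot-sketch, message=FALSE}
library(GenomicRanges)

## Hypothetical subclass adding one formal, per-range column slot
setClass("ScoredRanges", contains = "GRanges",
         slots = c(confidence = "numeric"))

## Declare 'confidence' as an extra column slot via the internal generic
setMethod(GenomicRanges:::extraColumnSlotNames, "ScoredRanges",
          function(x) "confidence")

## Initialize the inherited part from an ordinary GRanges
sr <- new("ScoredRanges",
          GRanges("chr1", IRanges(c(1, 10), width = 5)),
          confidence = c(0.9, 0.2))

GenomicRanges:::extraColumnSlotNames(sr)
```

Unlike a metadata column, `confidence` cannot be removed, and assigning a
non-numeric value to it is rejected by the slot type.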

An example is the *VRanges* class in `r Biocpkg("VariantAnnotation")`. It stores
information on the variants by adding these column slots:

```{r vranges, message=FALSE, warning=FALSE}
GenomicRanges:::extraColumnSlotNames(VariantAnnotation:::VRanges())
```
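
For illustration, a small *VRanges* built from two made-up variants. The allele
and depth fields are formal slots with dedicated accessors such as `ref()`,
`alt()` and `altDepth()`, rather than free-form metadata columns:

```{r vranges-sketch, message=FALSE, warning=FALSE}
library(VariantAnnotation)

## Two made-up single-nucleotide variants on chr1
vr <- VRanges(seqnames = c("chr1", "chr1"),
              ranges = IRanges(c(100, 200), width = 1),
              ref = c("A", "C"), alt = c("G", "T"),
              totalDepth = c(30L, 25L), altDepth = c(12L, 5L))

## Variant fields are read through formal accessors
ref(vr)
alt(vr)
altDepth(vr)
```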

Mostly for historical reasons, *VRanges* extends *GRanges*. However, since the
data storage mechanism and the set of extra column slots are orthogonal, it is
probably best practice to take a composition approach by extending
*DelegatingGenomicRanges*.

The original `ExtendingGenomicRanges.Rnw` file was deleted.