Extends SingleCellExperiment, with additional validity checks on the metadata() slot.

Chromium(dir, format = c("mtx", "hdf5"), filtered = TRUE,
  sampleMetadataFile = NULL, organism = NULL, ensemblRelease = NULL,
  genomeBuild = NULL, refdataDir = NULL, gffFile = NULL,
  transgeneNames = NULL, spikeNames = NULL,
  interestingGroups = "sampleName")

Arguments

dir

character(1). Path to Cell Ranger output directory (final upload). This directory must contain filtered_gene_bc_matrices* directories, nested as described under "Directory structure" below.

format

character(1). Output format, either MatrixMarket ("mtx") or HDF5 ("hdf5").

filtered

logical(1). Use filtered (recommended) or raw counts. Note that raw counts still contain only whitelisted cellular barcodes.

sampleMetadataFile

character(1). Sample metadata file path. CSV or TSV is preferred, but Excel worksheets are also supported. Check the documentation for conventions and required columns.

organism

character(1). Full Latin organism name (e.g. "Homo sapiens").

ensemblRelease

integer(1). Ensembl release version (e.g. 90). We recommend setting this value if possible, for improved reproducibility. When left unset, the latest release available via AnnotationHub/ensembldb is used. Note that the latest version available can vary, depending on the versions of AnnotationHub and ensembldb in use.

genomeBuild

character(1). Ensembl genome build assembly name (e.g. "GRCh38"). If NULL, defaults to the most recent build available. Note: do not pass in UCSC build IDs (e.g. "hg38").

refdataDir

character(1) or NULL. Directory path to Cell Ranger reference annotation data.

gffFile

character(1). GFF/GTF (General Feature Format) file. Generally, we recommend using a GTF (GFFv2) instead of a GFFv3 file if possible.

transgeneNames

character. Vector indicating which assay rows denote transgenes (e.g. EGFP, TDTOMATO).

spikeNames

character. Vector indicating which assay rows denote spike-in sequences (e.g. ERCCs).

interestingGroups

character. Groups of interest that define the samples. If left unset, defaults to sampleName.
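
Because genomeBuild expects an Ensembl assembly name rather than a UCSC build ID, it can help to check the value before calling the function. The following is a rough heuristic sketch in base R. The helper name looksLikeUCSC is hypothetical and not part of this package, and the rule it applies (UCSC IDs start with a lowercase letter and end in digits) is an assumption that will not hold for every assembly.

```r
# Hypothetical helper, not part of the package.
# Assumption: UCSC build IDs start with a lowercase letter and end in digits
# (e.g. "hg38", "mm10"), whereas Ensembl assembly names such as "GRCh38"
# start with an uppercase letter.
looksLikeUCSC <- function(genomeBuild) {
    grepl("^[a-z][A-Za-z]*[0-9]+$", genomeBuild)
}

builds <- c("GRCh38", "hg38", "GRCm38", "mm10")
flagged <- builds[looksLikeUCSC(builds)]
print(flagged)
```

Any value flagged this way is worth reviewing by hand before it is passed as genomeBuild.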

Value

Chromium.

Details

Read 10x Genomics Cell Ranger output for a Chromium data set into a SingleCellExperiment object.

Directory structure

Cell Ranger's output directory structure can vary, but this function requires a single, consistent structure for all datasets, including those that contain only a single sample:

file.path(
    "<dir>",
    "<sampleName>",
    "outs",
    "filtered_gene_bc_matrices*",
    "<genomeBuild>",
    "matrix.mtx"
)
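
The layout above can be checked before import. This is a minimal sketch in base R, assuming MatrixMarket output and filtered counts. The helper name findMatrixFiles is hypothetical and not part of this package, and the temporary directory exists only to make the example self-contained.

```r
# Hypothetical helper, not part of the package: locate matrix.mtx files under
# <dir>/<sampleName>/outs/filtered_gene_bc_matrices*/<genomeBuild>/matrix.mtx
findMatrixFiles <- function(dir) {
    pattern <- file.path(
        dir, "*", "outs", "filtered_gene_bc_matrices*", "*", "matrix.mtx"
    )
    files <- Sys.glob(pattern)
    # The sample directory is the fourth ancestor of each matrix.mtx file.
    names(files) <- basename(dirname(dirname(dirname(dirname(files)))))
    files
}

# Build a throwaway directory tree that follows the required layout.
root <- file.path(tempdir(), "cellranger_layout_demo")
target <- file.path(
    root, "sample1", "outs", "filtered_gene_bc_matrices", "GRCh38"
)
dir.create(target, recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(target, "matrix.mtx")))

found <- findMatrixFiles(root)
print(names(found))
```

An empty result suggests that the directory passed as dir does not follow the expected structure.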

Sample metadata

A user-supplied sample metadata file defined by sampleMetadataFile is required for multiplexed datasets. Otherwise this can be left NULL, and minimal sample data will be used, based on the directory names.
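
To illustrate what minimal sample data based on directory names could look like, here is a sketch in base R. It assumes that each child directory of dir corresponds to one sample. The helper name minimalSampleData is hypothetical, and the columns it returns (sampleID, sampleName) are an assumption rather than the exact columns the function generates.

```r
# Hypothetical helper, not part of the package.
# Assumption: every child directory of `dir` is one sample.
minimalSampleData <- function(dir) {
    sampleDirs <- list.dirs(dir, full.names = FALSE, recursive = FALSE)
    data.frame(
        # make.names() is used here only to get syntactically valid IDs.
        sampleID = make.names(sampleDirs),
        sampleName = sampleDirs,
        stringsAsFactors = FALSE
    )
}

# Throwaway directory with two sample folders.
root <- file.path(tempdir(), "cellranger_samples_demo")
for (s in c("sampleA", "sampleB")) {
    dir.create(file.path(root, s), recursive = TRUE, showWarnings = FALSE)
}

sampleData <- minimalSampleData(root)
print(sampleData)
```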

Reference data

We strongly recommend supplying the corresponding Cell Ranger reference data with the refdataDir argument. The gene annotations defined in its GTF file are converted into a GRanges object, which is slotted into rowRanges. Otherwise, the function attempts to use the most current annotations available from Ensembl, and some gene IDs may fail to match because they have been deprecated in the current Ensembl release.
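
When the Cell Ranger reference data is not at hand, pinning organism, ensemblRelease, and genomeBuild at least makes the Ensembl lookup reproducible. The sketch below uses the example values from the argument descriptions and the example data path from the Examples section. It assumes that the package is installed under the name "Chromium" and exports Chromium(), and the call is skipped otherwise. Whether release 90 matches the annotations used for a given alignment is something to verify per dataset.

```r
# Argument values taken from the examples on this page.
args <- list(
    dir = system.file("extdata/cellranger", package = "Chromium"),
    organism = "Homo sapiens",
    ensemblRelease = 90L,
    genomeBuild = "GRCh38"
)

# system.file() returns "" when the package or path is missing, so the
# import only runs when the example data is actually available.
if (nzchar(args$dir) && requireNamespace("Chromium", quietly = TRUE)) {
    x <- do.call(Chromium::Chromium, args)
}
```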

See also

Chromium.

Examples

dir <- system.file("extdata/cellranger", package = "Chromium")
x <- Chromium(dir)
#> Failed to detect sequencing lanes.
#> Importing counts.
#> Unknown organism. Skipping annotations.
#> Calculating cellular barcode metrics.
#> 100 cells detected.
#> Calculating metrics without biotype information.
#> `rowRanges` is required to calculate: nCoding, nMito, mitoRatio
#> 81 / 100 cellular barcodes passed pre-filtering (81.0%)
#> class: Chromium
#> dim: 100 81
#> metadata(15): version pipeline ... wd sessionInfo
#> assays(1): counts
#> rownames(100): ENSG00000008128 ENSG00000008130 ... ENSG00000279457
#> ENSG00000279928
#> rowData names(0):
#> colnames(81): AAACCTGAGACAGACC AAACCTGAGCGCCTCA ... AACTCAGTCCAACCAA
#> AACTCCCAGAAACCTA
#> colData names(7): nUMI nGene ... mitoRatio sampleID
#> reducedDimNames(0):
#> spikeNames(0):