Usage

General command line usage

| Flag | Type | Description | Default |
| --- | --- | --- | --- |
| -d, --data | Path | Path to the h5ad file. | data/adata_raw.h5ad |
| -o, --output | Path | Path to the output folder for the processed data. | results |
| -p, --parameters | Path | Path to the config file. | config.gin |
| -v, --verbose | Flag | Generate verbose logs. | |

By default, moresca expects the input data in H5AD format at data/adata_raw.h5ad. The output directory and any figure folders specified in the config are created on the fly if they do not exist yet.
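Creating folders on the fly is typically done with pathlib. The snippet below is a minimal sketch of that pattern, not moresca's own code; the helper name ensure_dirs and the folder layout inside the temporary directory are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def ensure_dirs(*dirs: Path) -> None:
    """Create each directory (and missing parents); do nothing if it exists."""
    for directory in dirs:
        directory.mkdir(parents=True, exist_ok=True)


# Work inside a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    output = Path(tmp) / "results"   # hypothetical output folder
    figures = Path(tmp) / "figures"  # hypothetical figure folder
    ensure_dirs(output, figures)
    ensure_dirs(output, figures)     # a second call is a no-op
    created = output.is_dir() and figures.is_dir()

print(created)
```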

Currently, the script performs the most common operations, from doublet removal to differential gene expression (DEG) analysis of the clusters found. Ambient RNA correction is not included; if you need it, run it separately beforehand.

The following example executes the pipeline with the h5ad file example_data.h5ad and the parameter file config.gin, saving the output in the folder results.

moresca -d example_data.h5ad -o results -p config.gin
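The flags listed above map naturally onto an argparse parser. The following is a minimal sketch of such a parser, not moresca's actual implementation: the function name build_parser is hypothetical, and only the flag names, types, and defaults are taken from this page.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="moresca")
    parser.add_argument("-d", "--data", type=Path,
                        default=Path("data/adata_raw.h5ad"),
                        help="Path to the h5ad file.")
    parser.add_argument("-o", "--output", type=Path,
                        default=Path("results"),
                        help="Path to the output folder for the processed data.")
    parser.add_argument("-p", "--parameters", type=Path,
                        default=Path("config.gin"),
                        help="Path to the config file.")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Generate verbose logs.")
    return parser


# Same arguments as the example invocation above.
args = build_parser().parse_args(
    ["-d", "example_data.h5ad", "-o", "results", "-p", "config.gin"]
)
print(args.data, args.output, args.parameters, args.verbose)
```

Calling parse_args with an empty list returns the documented defaults, which is a quick way to check that a wrapper script and this table stay in sync.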

Using the config.gin

By default, the parameter file looks like this:

# config.gin
quality_control:
    apply = True
    doublet_removal = True
    outlier_removal = True
    min_genes = 200
    min_counts = None
    max_counts = None
    min_cells = 10
    max_genes = None
    mt_threshold = 15
    rb_threshold = 10
    hb_threshold = 1
    figures = "figures/"
    pre_qc_plots = True
    post_qc_plots = True

normalization:
    apply = True
    method = "log1pPF"
    remove_mt = False
    remove_rb = False
    remove_hb = False

feature_selection:
    apply = True
    species = "hsapiens"
    method = "seurat_v3"
    number_features = 2000

scaling:
    apply = True
    max_value = None

pca:
    apply = True
    n_comps = 100
    use_highly_variable = True

batch_effect_correction:
    apply = False
    method = "harmony"
    batch_key = None

neighborhood_graph:
    apply = True
    n_neighbors = 30
    n_pcs = None
    metric = "cosine"

clustering:
    apply = True
    method = "leiden"
    resolution = 1.0

diff_gene_exp:
    apply = True
    method = "wilcoxon"
    groupby = "leiden_r1.0"
    use_raw = False
    layer = "unscaled"
    tables = None

umap:
    apply = True

plotting:
    apply = True
    umap = True
    path = "figures/"
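To inspect such a file programmatically without running the pipeline, the block layout shown above can be read into a nested dictionary. This is a minimal sketch under the assumption that every section is a `name:` header followed by indented `key = value` lines whose values are Python literals. It is not how moresca or gin loads the file, and parse_blocks is a hypothetical helper.

```python
import ast


def parse_blocks(text: str) -> dict:
    """Parse 'section:' headers and 'key = value' lines into a nested dict."""
    config: dict = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and full-line comments
        if line.endswith(":") and "=" not in line:
            section = line[:-1]
            config[section] = {}
            continue
        if section is None:
            raise ValueError(f"setting outside of a section: {line!r}")
        key, value = line.split("=", 1)
        # literal_eval handles True/False/None, numbers, and quoted strings.
        config[section][key.strip()] = ast.literal_eval(value.strip())
    return config


sample = """
# config.gin
pca:
    apply = True
    n_comps = 100
    use_highly_variable = True

clustering:
    apply = True
    method = "leiden"
    resolution = 1.0
"""

cfg = parse_blocks(sample)
print(cfg["pca"]["n_comps"], cfg["clustering"]["method"])
```

The sketch deliberately rejects anything that is not a plain literal, so an unquoted value raises an error instead of being silently misread.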

The parameters currently accept the values listed below.

Note

The parameter species is only used for the feature selection based on anti-correlation. For available values check here. It defaults to hsapiens.

quality_control

| Parameter | Values | Description |
| --- | --- | --- |
| apply | bool | Whether to apply the quality control steps or not. |
| doublet_removal | bool | Whether to perform doublet removal or not. |
| outlier_removal | bool | Whether to remove outliers or not. |
| min_genes | int, float, bool, None | The minimum number of genes required for a cell to pass quality control. |
| min_counts | int, float, bool, None | The minimum total counts required for a cell to pass quality control. |
| max_counts | int, float, bool, None | The maximum total counts allowed for a cell to pass quality control. |
| min_cells | int, float, bool, None | The minimum number of cells required for a gene to pass quality control. |
| max_genes | int, float, str, bool, None | The maximum number of genes allowed for a cell to pass quality control. |
| mt_threshold | int, float, str, bool, None | The threshold for the percentage of counts in mitochondrial genes (maximum). |
| rb_threshold | int, float, str, bool, None | The threshold for the percentage of counts in ribosomal genes (minimum). |
| hb_threshold | int, float, str, bool, None | The threshold for the percentage of counts in hemoglobin genes (maximum). |
| figures | str, Path, None | The path to the output directory for the quality control plots. |
| pre_qc_plots | bool, None | Whether to generate plots of QC covariates before quality control or not. |
| post_qc_plots | bool, None | Whether to generate plots of QC covariates after quality control or not. |
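As an illustration of how the cell-level thresholds combine, the sketch below applies them to per-cell summary values and treats None as "no limit". It assumes that mt_threshold and hb_threshold are upper bounds and rb_threshold is a lower bound, as the descriptions above state, and that all comparisons are inclusive. The CellStats class and the passes_qc function are hypothetical and not part of moresca; doublet removal, outlier removal, and the gene-level min_cells filter are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CellStats:
    """Hypothetical per-cell QC summary (not a moresca type)."""
    n_genes: int
    total_counts: float
    pct_mt: float  # percent of counts in mitochondrial genes
    pct_rb: float  # percent of counts in ribosomal genes
    pct_hb: float  # percent of counts in hemoglobin genes


def _at_least(value: float, limit: Optional[float]) -> bool:
    return limit is None or value >= limit


def _at_most(value: float, limit: Optional[float]) -> bool:
    return limit is None or value <= limit


def passes_qc(cell: CellStats, *, min_genes=200, max_genes=None,
              min_counts=None, max_counts=None,
              mt_threshold=15, rb_threshold=10, hb_threshold=1) -> bool:
    """Return True if the cell satisfies every threshold that is not None."""
    return (
        _at_least(cell.n_genes, min_genes)
        and _at_most(cell.n_genes, max_genes)
        and _at_least(cell.total_counts, min_counts)
        and _at_most(cell.total_counts, max_counts)
        and _at_most(cell.pct_mt, mt_threshold)
        and _at_least(cell.pct_rb, rb_threshold)
        and _at_most(cell.pct_hb, hb_threshold)
    )


good = CellStats(n_genes=1500, total_counts=5000, pct_mt=5.0, pct_rb=20.0, pct_hb=0.1)
high_mt = CellStats(n_genes=1500, total_counts=5000, pct_mt=30.0, pct_rb=20.0, pct_hb=0.1)
print(passes_qc(good), passes_qc(high_mt))
```

The keyword defaults match the default config.gin shown earlier, so passing mt_threshold=None mirrors switching that filter off in the config.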

normalization

| Parameter | Values | Description |
| --- | --- | --- |
| apply | bool | Whether to apply the normalization steps or not. |
| method | log1pCP10k, log1pPF, PFlog1pPF, analytical_pearson, None, False | The normalization method to use. |
| remove_mt | bool, None | Whether to remove mitochondrial genes or not. |
| remove_rb | bool, None | Whether to remove ribosomal genes or not. |
| remove_hb | bool, None | Whether to remove hemoglobin genes or not. |
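The names of the first three normalization methods describe a sequence of steps. The sketch below spells them out in plain Python, assuming the common reading of these names: CP10k scales every cell to 10,000 total counts, PF (proportional fitting) scales every cell to the mean total count across cells, and log1p is applied element-wise. This is an illustration of the idea on a tiny made-up matrix, not moresca's implementation, and all function names are hypothetical.

```python
import math


def proportional_fit(matrix, target=None):
    """Scale each row (cell) so it sums to `target` (default: mean row sum).

    Assumes no cell has zero total counts.
    """
    totals = [sum(row) for row in matrix]
    if target is None:
        target = sum(totals) / len(totals)
    return [[value * target / total for value in row]
            for row, total in zip(matrix, totals)]


def log1p(matrix):
    """Element-wise log(1 + x)."""
    return [[math.log1p(value) for value in row] for row in matrix]


def log1p_cp10k(counts):
    return log1p(proportional_fit(counts, target=10_000))


def log1p_pf(counts):
    return log1p(proportional_fit(counts))


def pf_log1p_pf(counts):
    # A second proportional fit after the log transform.
    return proportional_fit(log1p_pf(counts))


# Rows are cells, columns are genes (made-up counts).
counts = [[1, 3, 0], [10, 30, 0], [2, 2, 4]]
normalized = pf_log1p_pf(counts)
print([round(sum(row), 6) for row in normalized])
```

In this toy matrix the first two cells differ only in sequencing depth, so they end up with identical normalized profiles.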

feature_selection

| Parameter | Values | Description |
| --- | --- | --- |
| apply | bool | Whether to apply the feature selection steps or not. |
| method | seurat, seurat_v3, analytical_pearson, anti_correlation, triku, hotspot, None, False | The feature selection method to use. |
| species | str | Species of the data. Only used if method is anti_correlation. |
| number_features | int, None | The number of top features to select (only applicable for certain methods). |

scaling

| Parameter | Values | Description |
| --- | --- | --- |
| apply | bool | Whether to apply the scaling step or not. |
| max_value | int, float, None | The maximum value to which the data will be scaled. If None, the data will be scaled to unit variance. |

pca

| Parameter | Values | Description |
| --- | --- | --- |
| apply | bool | Whether to apply the PCA or not. |
| n_comps | int, float | The number of principal components to compute. A float is interpreted as the proportion of the total variance to retain. |
| use_highly_variable | bool | Whether to use highly variable genes for PCA computation. |
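When n_comps is a float, it is read as a fraction of the variance to retain. One common way to turn such a fraction into a component count is to take the smallest number of components whose cumulative explained variance ratio reaches it. The sketch below shows that rule on made-up ratios; the rule itself is an assumption rather than a description of moresca's code, and n_components_for is a hypothetical helper.

```python
from itertools import accumulate


def n_components_for(explained_variance_ratio, n_comps):
    """Resolve `n_comps` to a component count.

    An int is used as-is (capped at the number of available components);
    a float in (0, 1] is treated as the variance fraction to retain.
    """
    ratios = list(explained_variance_ratio)
    if isinstance(n_comps, int):
        return min(n_comps, len(ratios))
    if not 0.0 < n_comps <= 1.0:
        raise ValueError("a float n_comps must lie in (0, 1]")
    for count, cumulative in enumerate(accumulate(ratios), start=1):
        if cumulative >= n_comps:
            return count
    return len(ratios)  # fraction not reachable: keep everything


# Made-up explained variance ratios, sorted in decreasing order.
ratios = [0.5, 0.2, 0.1, 0.05, 0.05]
print(n_components_for(ratios, 0.75), n_components_for(ratios, 2))
```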

batch_effect_correction

| Parameter | Values | Description |
| --- | --- | --- |
| apply | bool | Whether to apply the batch effect correction or not. |
| method | harmony, None, False | The batch effect correction method to use. |
| batch_key | str | The key in adata.obs that identifies the batches. |

neighborhood_graph

| Parameter | Values | Description |
| --- | --- | --- |
| apply | bool | Whether to compute the neighborhood graph or not. |
| n_neighbors | int | The number of neighbors to consider for each cell. |
| n_pcs | int, None | The number of principal components to use for the computation. |
| metric | str | The distance metric to use for computing the neighborhood graph. |

clustering

| Parameter | Values | Description |
| --- | --- | --- |
| apply | bool | Whether to perform clustering or not. |
| method | leiden, phenograph, None, False | The clustering method to use. |
| resolution | float, int, list, tuple, auto | The resolution parameter for the clustering method. Can be a single value or a list of values. |
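In the default config, the clustering method leiden with resolution 1.0 is paired with groupby = "leiden_r1.0" in the diff_gene_exp section. This suggests that each resolution produces its own key of the form method_r followed by the resolution. The sketch below derives such keys for a single value or a list of values; the naming pattern is inferred from that one default and is an assumption, cluster_keys is a hypothetical helper, and the "auto" setting is not covered.

```python
def cluster_keys(method, resolution):
    """Return one assumed result key per resolution, e.g. 'leiden_r1.0'."""
    if isinstance(resolution, str):
        raise ValueError("string settings such as 'auto' are not handled here")
    if isinstance(resolution, (list, tuple)):
        resolutions = list(resolution)
    else:
        resolutions = [resolution]
    # Assumption: integers are written like floats (1 -> '1.0').
    return [f"{method}_r{float(value)}" for value in resolutions]


print(cluster_keys("leiden", 1.0))
print(cluster_keys("leiden", [0.5, 1, 2.0]))
```

If the pattern holds, running several resolutions means groupby has to name the one key that the differential expression step should use.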

diff_gene_exp

| Parameter | Values | Description |
| --- | --- | --- |
| apply | bool | Whether to perform differential gene expression analysis or not. |
| method | wilcoxon, t-test, logreg, t-test_overestim_var | The differential gene expression analysis method to use. |
| groupby | str | The key in adata.obs that identifies the groups for comparison. |
| use_raw | bool, None | Whether to use the raw gene expression data or not. |
| layer | str, None | The layer in adata.layers to use for the differential gene expression analysis. |
| corr_method | benjamini-hochberg, bonferroni | The method to use for multiple testing correction. |
| tables | str, Path, None | The path to the output directory for the differential expression tables. |
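The two corr_method options differ in how strongly they adjust p-values. The sketch below implements the textbook definitions of both corrections in plain Python to make the difference concrete; it is not the code the pipeline runs, and the p-values are made up.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


pvalues = [0.01, 0.04, 0.03, 0.005]  # made-up p-values for four genes
print(bonferroni(pvalues))
print(benjamini_hochberg(pvalues))
```

Bonferroni controls the family-wise error rate and is the stricter of the two, while Benjamini-Hochberg controls the false discovery rate and usually retains more genes at the same cutoff.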

umap

| Parameter | Values | Description |
| --- | --- | --- |
| apply | bool | Whether to run UMAP or not. |

plotting

| Parameter | Values | Description |
| --- | --- | --- |
| apply | bool | Whether to create plots or not. |
| umap | bool | Whether to plot the UMAP or not. |
| path | str, Path | The path to the output directory for the plots. |