Title: | A Revival of the ClaNC Algorithm |
---|---|
Description: | Classification of microarrays to nearest centroids (ClaNC) <doi:10.1093/bioinformatics/bti756> selects optimal genes for centroids, similar to Prediction Analysis for Microarrays (PAM) but using fewer corrective factors, resulting in greater sensitivity and accuracy. Unfortunately, the original source of ClaNC can no longer be found. 'reclanc' reimplements this algorithm, with the additional benefit of increased interoperability with standard data structures and modeling ecosystems. |
Authors: | Kai Aragaki [aut, cre] |
Maintainer: | Kai Aragaki <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.0.9000 |
Built: | 2025-02-01 05:34:43 UTC |
Source: | https://github.com/KaiAragaki/reclanc |
Calculate centroids from expression data with ClaNC
    clanc(x, ...)

    ## Default S3 method:
    clanc(x, ...)

    ## S3 method for class 'data.frame'
    clanc(x, classes, active, priors = "equal", ...)

    ## S3 method for class 'matrix'
    clanc(x, classes, active, priors = "equal", ...)

    ## S3 method for class 'SummarizedExperiment'
    clanc(x, classes, active, priors = "equal", assay = 1, ...)

    ## S3 method for class 'ExpressionSet'
    clanc(x, classes, active, priors = "equal", ...)

    ## S3 method for class 'formula'
    clanc(formula, data, active, priors = "equal", ...)

    ## S3 method for class 'recipe'
    clanc(x, data, active, priors = "equal", ...)
x |
Depending on the context:
Expression should be library-size corrected, but not scaled. If supplying a data frame, matrix, ExpressionSet, or SummarizedExperiment, the rows should represent genes and the columns should represent samples (as is standard for expression data). The column names should be sample IDs, while the row names should be gene IDs. If a recipe is provided, the data should have genes as columns (to match the formula provided to the recipe). |
... |
Not currently used, but required for extensibility. |
classes |
When
When |
active |
Either a single number or a numeric vector whose length equals the number of unique class labels. Represents the number of class-specific genes that should be selected for a centroid. Note that different numbers of genes can be selected for each class. See details. When |
priors |
Can take a variety of values:
When |
assay |
When a SummarizedExperiment is used, the index or name of the assay |
formula |
A formula specifying the classes on the left-hand side, and the predictor terms on the right-hand side. |
data |
When a recipe or formula is used,
|
The original description of ClaNC can be found at <doi:10.1093/bioinformatics/bti756>.
While active sets the number of class-specific genes, each centroid will have more than that number of genes. To explain by way of example, if active = 5 and there are 3 classes, each centroid will have 15 genes, with 5 of those genes being particular to a given class. If these genes are 'active' in that class, their values will be the mean of the class. If the genes are not active in that given class, their values will be the overall expression of the given gene across all classes.
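The centroid layout described above can be sketched in base R. This is only an illustration under stated assumptions: the toy data, the object names, and especially the rule used to pick the "active" genes are hypothetical stand-ins, not the ClaNC selection criterion or the internals of reclanc.

```r
# Hypothetical toy data: 30 genes (rows) x 12 samples (columns), 3 classes.
set.seed(1)
expr <- matrix(
  rnorm(30 * 12),
  nrow = 30,
  dimnames = list(paste0("gene", 1:30), paste0("sample", 1:12))
)
classes <- factor(rep(c("a", "b", "c"), each = 4))
active <- 5

# Mean of every gene within each class, and across all samples.
class_means <- sapply(levels(classes), function(cl) {
  rowMeans(expr[, classes == cl, drop = FALSE])
})
overall_mean <- rowMeans(expr)

# Stand-in selection rule (an assumption, NOT the ClaNC criterion):
# genes 1-5 are "active" in class a, 6-10 in class b, 11-15 in class c.
active_genes <- split(rownames(expr)[1:15], rep(levels(classes), each = active))
selected <- unlist(active_genes, use.names = FALSE)

# Start every selected gene at its overall mean in every class, then
# overwrite each class's own active genes with that class's mean.
centroids <- matrix(
  overall_mean[selected],
  nrow = length(selected),
  ncol = nlevels(classes),
  dimnames = list(selected, levels(classes))
)
for (cl in levels(classes)) {
  genes <- active_genes[[cl]]
  centroids[genes, cl] <- class_means[genes, cl]
}

dim(centroids)  # active * number of classes genes, one column per class
```

Each column of `centroids` holds 15 values, of which only 5 differ from the overall mean, which matches the active = 5, 3-class example in the text.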
A clanc object.
    expression_matrix <- synthetic_expression$expression
    head(expression_matrix)
    classes <- synthetic_expression$classes
    classes

    # data.frame/tibble/matrix interface:
    clanc(expression_matrix, classes = classes, active = 5, priors = "equal")

    # Formula interface:
    # Data must have class included as a column
    # Genes must be *columns* and samples must be *rows*
    # Hence the data transposition.
    for_formula <- data.frame(class = classes, t(expression_matrix))
    clanc(class ~ ., for_formula, active = 5, priors = "equal")

    # Recipes interface:
    rec <- recipes::recipe(class ~ ., data = for_formula)
    clanc(rec, for_formula, active = 5, priors = "equal")

    # SummarizedExperiment interface:
    se <- SummarizedExperiment::SummarizedExperiment(
      expression_matrix,
      colData = data.frame(
        class = classes,
        active = 5,
        prior = c(0.5, 0.5)
      )
    )
    clanc(se, classes = "class", active = "active", priors = "equal")

    # ExpressionSet interface:
    adf <- data.frame(
      row.names = colnames(expression_matrix),
      class = classes
    ) |>
      Biobase::AnnotatedDataFrame()
    es <- Biobase::ExpressionSet(expression_matrix, adf)
    clanc(es, classes = "class", active = 5, priors = 0.5)
clanc
Predict from a clanc
    ## S3 method for class 'clanc'
    predict(object, new_data, type, assay = NULL, format = c("wide", "tall"), ...)
object |
A clanc object. |
new_data |
A data frame or matrix of new predictors. |
type |
A single character. The type of predictions to generate. Valid options are:
|
assay |
If |
format |
Character. Are the data "wide" (default), with genes as columns, or "tall", with genes as rows? |
... |
Not used, but required for extensibility. |
method |
If |
A tibble of predictions. The number of rows in the tibble is guaranteed to be the same as the number of rows in new_data.
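The two layouts named by the format argument can be pictured with a small base-R sketch. The toy matrix and its dimnames are hypothetical, and the snippet only shows the shapes involved; it makes no claim about what predict() does internally.

```r
# Hypothetical "tall" matrix: genes as rows, samples as columns.
tall <- matrix(
  1:6,
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("sample", 1:2))
)

# Transposing gives the "wide" layout: samples as rows, genes as columns.
wide <- t(tall)

dim(tall)  # genes x samples
dim(wide)  # samples x genes
```

Expression matrices that follow the genes-as-rows convention are "tall" in this sense, while data prepared for a formula or recipe (genes as columns) are "wide".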
Synthetic Expression of Two Distinct Classes
synthetic_expression
A list containing two items:
expression: Normalized log expression of 12 samples across 100 genes
classes: A factor vector of classes of each of the 12 samples
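For readers without the package installed, an object of the same shape can be approximated in base R. This is a hypothetical stand-in: the class labels, the seed, the number of shifted genes, and the size of the shift are all assumptions, and nothing here reflects how the packaged dataset was actually generated.

```r
set.seed(42)
n_genes <- 100
n_samples <- 12

# Two classes of six samples each (labels "A" and "B" are assumptions).
classes <- factor(rep(c("A", "B"), each = n_samples / 2))

# Genes as rows, samples as columns, matching the layout described for x.
expression <- matrix(
  rnorm(n_genes * n_samples),
  nrow = n_genes,
  dimnames = list(
    paste0("gene", seq_len(n_genes)),
    paste0("sample", seq_len(n_samples))
  )
)

# Make the classes distinct by shifting the first 10 genes in class "B".
shifted <- 1:10
expression[shifted, classes == "B"] <- expression[shifted, classes == "B"] + 2

toy_expression <- list(expression = expression, classes = classes)
str(toy_expression, max.level = 1)
```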