Package 'reclanc' reference manual

Title:	A Revival of the ClaNC Algorithm
Description:	Classification of microarrays to nearest centroids (ClaNC) <doi:10.1093/bioinformatics/bti756> selects optimal genes for centroids, similar to Prediction Analysis for Microarrays (PAM) but using fewer corrective factors, resulting in greater sensitivity and accuracy. Unfortunately, the original source of ClaNC can no longer be found. 'reclanc' reimplements this algorithm, with the additional benefit of increased interoperability with standard data structures and modeling ecosystems.
Authors:	Kai Aragaki [aut, cre], Alan Dabney [aut, cph] (Original creator of ClaNC)
Maintainer:	Kai Aragaki <[email protected]>
License:	MIT + file LICENSE
Version:	0.0.0.9000
Built:	2025-04-02 05:43:43 UTC
Source:	https://github.com/KaiAragaki/reclanc

Calculate centroids from expression data with ClaNC

Description

Calculate centroids from expression data with ClaNC

Usage

clanc(x, ...)

## Default S3 method:
clanc(x, ...)

## S3 method for class 'data.frame'
clanc(x, classes, active, priors = "equal", ...)

## S3 method for class 'matrix'
clanc(x, classes, active, priors = "equal", ...)

## S3 method for class 'SummarizedExperiment'
clanc(x, classes, active, priors = "equal", assay = 1, ...)

## S3 method for class 'ExpressionSet'
clanc(x, classes, active, priors = "equal", ...)

## S3 method for class 'formula'
clanc(formula, data, active, priors = "equal", ...)

## S3 method for class 'recipe'
clanc(x, data, active, priors = "equal", ...)

Arguments

`x`	Depending on the context, one of: a data frame of expression; a matrix of expression; a recipe specifying a set of preprocessing steps, created with `recipes::recipe()`; an ExpressionSet; or a SummarizedExperiment with `assay` containing expression. Expression should be library-size corrected, but not scaled. If supplying a data frame, matrix, ExpressionSet, or SummarizedExperiment, the rows should represent genes and the columns should represent samples (as is standard for expression data). The column names should be sample IDs, and the row names should be gene IDs. If a recipe is provided, the data should have genes as columns (to match the formula provided to the recipe).
`...`	Not currently used, but required for extensibility.
`classes`	When `x` is a data frame or matrix, `classes` contains class labels as either a data frame with one factor column or a factor vector. When `x` is an ExpressionSet or SummarizedExperiment, `classes` is the name of the column in `pData(x)` or `colData(x)` that contains the classes as a factor.
`active`	Either a single number or a numeric vector with length equal to the number of unique class labels. Represents the number of class-specific genes that should be selected for a centroid. Note that different numbers of genes can be selected for each class. See Details. When `x` is an ExpressionSet or SummarizedExperiment, `active` can additionally be the name of the column in `pData(x)` or `colData(x)` that contains the numeric vector.
`priors`	Can take one of several values: `"equal"`, where each class has an equal prior; `"class"`, where each class has a prior equal to its frequency in the training set; or a numeric vector with length equal to the number of classes. When `x` is an ExpressionSet or SummarizedExperiment, `priors` can additionally be the name of the column in `pData(x)` or `colData(x)` that contains the numeric vector.
`assay`	When a SummarizedExperiment is used, the index or name of the assay.
`formula`	A formula specifying the classes on the left-hand side, and the predictor terms on the right-hand side.
`data`	When a recipe or formula is used, a data frame containing both expression and classes, where the columns are the genes or the class and the rows are the samples.
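
The two named `priors` options can be illustrated with a few lines of base R. This is only a sketch of what the argument descriptions imply, not reclanc's internal code; the class labels below are made up for illustration.

```r
# Hypothetical class labels for 12 samples (not the package's data)
classes <- factor(rep(c("A", "B"), times = c(8, 4)))
n_classes <- nlevels(classes)

# priors = "equal": every class gets the same prior
equal_priors <- rep(1 / n_classes, n_classes)
names(equal_priors) <- levels(classes)

# priors = "class": each prior is the class frequency in the training set
class_priors <- as.vector(table(classes)) / length(classes)
names(class_priors) <- levels(classes)

equal_priors
class_priors
```

With unbalanced classes, as here, the two options give different priors; with balanced classes they coincide.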

Details

The original description of ClaNC can be found at <doi:10.1093/bioinformatics/bti756>.

While `active` sets the number of class-specific genes, each centroid will have more than that number of genes. For example, if `active = 5` and there are 3 classes, each centroid will have 15 genes, 5 of which are particular to a given class. If a gene is 'active' in a class, its value in that class's centroid will be the mean of the class. If the gene is not active in that class, its value will be the overall expression of the gene across all classes.
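
The counting in this example can be reproduced in base R. The sketch below is not the ClaNC gene-selection procedure: the active genes are assigned arbitrarily (a placeholder for the selection ClaNC performs), the data are random, and the "overall" value is assumed to be the mean across all samples. It only shows how 3 classes with 5 active genes each lead to 15-gene centroids.

```r
set.seed(1)

# Hypothetical data: 20 genes (rows) x 9 samples (columns), 3 classes
expr <- matrix(
  rnorm(20 * 9),
  nrow = 20,
  dimnames = list(paste0("gene", 1:20), paste0("s", 1:9))
)
classes <- factor(rep(c("A", "B", "C"), each = 3))
active <- 5

# Placeholder selection: hand 5 arbitrary genes to each class.
# ClaNC chooses these genes itself; this is only for illustration.
active_genes <- split(rownames(expr)[1:15], rep(levels(classes), each = active))

# Every centroid covers the union of all classes' active genes
selected <- unlist(active_genes, use.names = FALSE)

# Assumption: "overall expression" is the mean across all samples
overall <- rowMeans(expr[selected, ])

# Start each centroid at the overall value, then overwrite the genes
# that are active in that class with the class mean
centroids <- sapply(levels(classes), function(cl) {
  centroid <- overall
  genes <- active_genes[[cl]]
  centroid[genes] <- rowMeans(expr[genes, classes == cl, drop = FALSE])
  centroid
})

dim(centroids)
```

Each column of `centroids` is one class centroid. It differs from the overall values only at the 5 genes assigned to that class.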

Value

A clanc object.

Examples


expression_matrix <- synthetic_expression$expression
head(expression_matrix)
classes <- synthetic_expression$classes
classes

# data.frame/tibble/matrix interface:

clanc(expression_matrix, classes = classes, active = 5, priors = "equal")

# Formula interface:

# Data must have class included as a column
# Genes must be *columns* and samples must be *rows*
# Hence the data transposition.
for_formula <- data.frame(class = classes, t(expression_matrix))

clanc(class ~ ., for_formula, active = 5, priors = "equal")


# Recipes interface:

rec <- recipes::recipe(class ~ ., data = for_formula)

clanc(rec, for_formula, active = 5, priors = "equal")

# SummarizedExperiment interface:
se <- SummarizedExperiment::SummarizedExperiment(
  expression_matrix,
  colData = data.frame(
    class = classes,
    active = 5,
    prior = c(0.5, 0.5)
  )
)

clanc(se, classes = "class", active = "active", priors = "equal")

# ExpressionSet interface:
adf <- data.frame(
  row.names = colnames(expression_matrix),
  class = classes
) |>
  Biobase::AnnotatedDataFrame()

es <- Biobase::ExpressionSet(expression_matrix, adf)
clanc(es, classes = "class", active = 5, priors = 0.5)


Predict from a `clanc`

Description

Predict from a clanc

Usage

## S3 method for class 'clanc'
predict(object, new_data, type, assay = NULL, format = c("wide", "tall"), ...)

Arguments

`object`	A `clanc` object.
`new_data`	A data frame or matrix of new predictors.
`type`	A single character. The type of predictions to generate. Valid options are: `"numeric"` for numeric predictions.
`assay`	If `object` inherits `SummarizedExperiment`, the index of the assay.
`format`	Character. Are the data "wide" (default), with genes as columns, or "tall", with genes as rows?
`...`	Not used, but required for extensibility.
`method`	If `type` is `"numeric"`, the correlation method to use.

Value

A tibble of predictions. The number of rows in the tibble is guaranteed to be the same as the number of rows in new_data.
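
The `format` and `method` arguments can be pictured with base R. The sketch below is not `predict.clanc()` itself. It assumes that a numeric prediction is a correlation between each sample and each class centroid over their shared genes, and the centroids and new data are random, hypothetical values.

```r
set.seed(2)
genes <- paste0("gene", 1:6)

# Hypothetical centroids: genes as rows, classes as columns
centroids <- matrix(rnorm(12), nrow = 6, dimnames = list(genes, c("A", "B")))

# New data in "tall" format: genes as rows, samples as columns
tall <- matrix(rnorm(18), nrow = 6, dimnames = list(genes, paste0("s", 1:3)))

# The same data in "wide" format: samples as rows, genes as columns
wide <- t(tall)

# Correlate each sample with each centroid over the shared genes.
# cor() correlates columns, so the wide data are transposed back.
scores <- cor(
  t(wide[, genes, drop = FALSE]),
  centroids[genes, ],
  method = "spearman"
)

scores
```

The result has one row per sample and one column per class, which matches the guarantee above that the output has as many rows as wide-format `new_data`.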

Synthetic Expression of Two Distinct Classes

Description

Synthetic Expression of Two Distinct Classes

Usage

synthetic_expression

Format

`synthetic_expression`

A list containing two items:

expression: Normalized log expression of 12 samples across 100 genes
classes: A factor vector of classes of each of the 12 samples
