Model structure

class celltypist.models.Model(clf, scaler, description)

Bases: object

Class that wraps the logistic Classifier and the StandardScaler.

Parameters:

clf – A logistic Classifier incorporated in the loaded model.
scaler – A StandardScaler incorporated in the loaded model.
description – Description of the model as a dictionary.

classifier: The logistic Classifier incorporated in the loaded model.

scaler: The StandardScaler incorporated in the loaded model.

description: Description of the loaded model.

property cell_types: ndarray: Get cell types included in the model.

convert(map_file: str | None = None, sep: str = ',', convert_from: int | None = None, convert_to: int | None = None, unique_only: bool = True, collapse: str = 'average', random_state: int = 0) → None

Convert the model of one species to another species by mapping orthologous genes. Note that when provided with a custom map file, this method can be used to convert genes in the model to other formats (orthologous genes, Ensembl IDs, HGNC IDs, etc.).

Parameters:

map_file – A two-column gene mapping file between two species. Defaults to the built-in human-mouse (mouse-human) mapping file provided by CellTypist.
sep – Delimiter of the mapping file. Defaults to a comma (i.e., a user-provided mapping file is expected to be a CSV file).
convert_from – Column index (0 or 1) of the mapping file corresponding to the species converted from. Detected automatically by default.
convert_to – Column index (0 or 1) of the mapping file corresponding to the species converted to. Detected automatically by default.
unique_only – Whether to use only 1:1 orthologs between the two species. (Default: True)
collapse – How 1:N orthologs are handled. Possible values are ‘average’, which averages the classifier weights, and ‘random’, which randomly chooses one gene’s weights from all its orthologs. This argument is ignored if unique_only = True. (Default: ‘average’)
random_state – Random seed for reproducibility. This argument is only relevant if unique_only = False and collapse = ‘random’. (Default: 0)

Returns:

None. The original model is modified in place by converting it to the other species.

Return type:

None
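To make the unique_only and collapse options concrete, here is a minimal pure-Python sketch of the idea. It is not CellTypist's implementation: the helper convert_weights, the toy gene names, and the reading of "1:N" as several source genes sharing one target gene are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict
from statistics import mean


def convert_weights(weights, gene_map, unique_only=True,
                    collapse="average", random_state=0):
    """Hypothetical helper: map per-gene weights onto target-species genes.

    weights:  dict of source gene -> weight (one cell type, toy values)
    gene_map: list of (source_gene, target_gene) pairs
    """
    # Keep only the pairs whose source gene is present in the model.
    pairs = [(src, tgt) for src, tgt in gene_map if src in weights]
    src_counts = Counter(src for src, _ in pairs)
    tgt_counts = Counter(tgt for _, tgt in pairs)

    if unique_only:
        # Keep a pair only if both genes appear exactly once (1:1 orthologs).
        return {tgt: weights[src] for src, tgt in pairs
                if src_counts[src] == 1 and tgt_counts[tgt] == 1}

    if collapse not in ("average", "random"):
        raise ValueError(f"unknown collapse method: {collapse!r}")

    # Collect every source weight that maps onto each target gene.
    grouped = defaultdict(list)
    for src, tgt in pairs:
        grouped[tgt].append(weights[src])

    if collapse == "average":
        return {tgt: mean(ws) for tgt, ws in grouped.items()}
    rng = random.Random(random_state)  # seeded, so the choice is reproducible
    return {tgt: rng.choice(ws) for tgt, ws in grouped.items()}


# Toy data: genes "B" and "C" both map to the same target gene "bc".
weights = {"A": 1.0, "B": 3.0, "C": 5.0}
gene_map = [("A", "a"), ("B", "bc"), ("C", "bc")]

strict = convert_weights(weights, gene_map)
averaged = convert_weights(weights, gene_map, unique_only=False)
sampled = convert_weights(weights, gene_map, unique_only=False, collapse="random")
print(strict, averaged, sampled)
```

With unique_only left at True, the shared target gene is dropped entirely. With unique_only = False, it receives either the mean of the contributing weights or one of them chosen at random.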

extract_top_markers(cell_type: str, top_n: int = 10, only_positive: bool = True) → ndarray

Extract the top driving genes for a given cell type.

Parameters:

cell_type – The cell type to extract markers for.
top_n – Number of markers to extract for a given cell type. (Default: 10)
only_positive – Whether to extract positive markers only. Set to False to include negative markers as well. (Default: True)

Returns:

An array of marker genes for the query cell type.

Return type:

ndarray
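The ranking behind this method can be sketched in plain Python as follows. This is an illustration only: the helper top_markers and the weights are hypothetical, and ranking by absolute weight when only_positive is False is an assumed reading of "include negative markers", not a confirmed detail of CellTypist.

```python
def top_markers(weights, top_n=10, only_positive=True):
    """Hypothetical helper: rank genes by their weight for one cell type.

    weights: dict of gene -> classifier weight (toy values)
    """
    if only_positive:
        # Largest positive weights first.
        ranked = sorted(weights, key=weights.get, reverse=True)
    else:
        # Assumption: strongly negative genes count as markers too,
        # so rank by the magnitude of the weight.
        ranked = sorted(weights, key=lambda gene: abs(weights[gene]), reverse=True)
    return ranked[:top_n]


# Made-up weights for a single cell type.
weights = {"CD3E": 2.1, "CD19": -3.0, "MS4A1": 0.4, "NKG7": 1.2}

positive = top_markers(weights, top_n=2)
signed = top_markers(weights, top_n=2, only_positive=False)
print(positive, signed)
```

In this toy example, the strongly negative gene only enters the result when only_positive is set to False.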

property features: ndarray: Get genes included in the model.

static load(model: str | None = None)

Load the desired model.

Parameters:

model – Model name specifying the model you want to load. Defaults to ‘Immune_All_Low.pkl’ if not provided. To see all available models and their descriptions, use models_description().

Returns:

A Model object.

Return type:

Model

predict_labels_and_prob(indata, mode: str = 'best match', p_thres: float = 0.5) → tuple

Get the decision matrix, probability matrix, and predicted cell types for the input data.

Parameters:

indata – The input array-like object used as a query.
mode – The way cell prediction is performed. For each query cell, the default (‘best match’) is to choose the cell type with the largest score/probability as the final prediction. Setting to ‘prob match’ will enable a multi-label classification, which assigns 0 (i.e., unassigned), 1, or >=2 cell type labels to each query cell. (Default: ‘best match’)
p_thres – Probability threshold for the multi-label classification. Ignored if mode is ‘best match’. (Default: 0.5)

Returns:

A tuple of decision score matrix, raw probability matrix, and predicted cell type labels.

Return type:

tuple
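The difference between the two modes can be illustrated for a single query cell with a short pure-Python sketch. It is not the library's code: the helper predict_cell, the sigmoid conversion from decision scores to probabilities, and the strict "greater than" comparison against p_thres are assumptions, and the scores are made up.

```python
import math


def sigmoid(score):
    # Assumption: probabilities come from a per-class logistic transform.
    return 1.0 / (1.0 + math.exp(-score))


def predict_cell(scores, cell_types, mode="best match", p_thres=0.5):
    """Hypothetical helper: label one cell from its per-class decision scores."""
    probs = [sigmoid(s) for s in scores]
    if mode == "best match":
        # Exactly one label: the cell type with the largest score.
        best = max(range(len(scores)), key=lambda i: scores[i])
        labels = [cell_types[best]]
    elif mode == "prob match":
        # Zero, one, or several labels: every cell type above the threshold.
        labels = [ct for ct, p in zip(cell_types, probs) if p > p_thres]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return probs, labels


cell_types = ["T cells", "B cells", "NK cells"]
scores = [2.0, 0.5, -1.0]  # toy decision scores for one cell

_, best = predict_cell(scores, cell_types)
_, multi = predict_cell(scores, cell_types, mode="prob match")
_, unassigned = predict_cell(scores, cell_types, mode="prob match", p_thres=0.95)
print(best, multi, unassigned)
```

An empty label list in this sketch corresponds to the unassigned case described for ‘prob match’.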

write(file: str) → None

Write out the model.