Model structure

class celltypist.models.Model(clf, scaler, description)

Bases: object

Class that wraps the logistic Classifier and the StandardScaler.

Parameters:
  • clf – A logistic Classifier incorporated in the loaded model.

  • scaler – A StandardScaler incorporated in the loaded model.

  • description – Description of the model as a dictionary.

classifier

The logistic Classifier incorporated in the loaded model.

scaler

The StandardScaler incorporated in the loaded model.

description

Description of the loaded model.

property cell_types: ndarray

Get cell types included in the model.

convert(map_file: str | None = None, sep: str = ',', convert_from: int | None = None, convert_to: int | None = None, unique_only: bool = True, collapse: str = 'average', random_state: int = 0) -> None

Convert the model of one species to another species by mapping orthologous genes. Note that when provided with a custom map file, this method can be used to convert genes in the model to other formats (orthologous genes, Ensembl IDs, HGNC IDs, etc.).

Parameters:
  • map_file – A two-column gene mapping file between two species. Defaults to the built-in human-mouse mapping file provided by CellTypist, which supports conversion in either direction.

  • sep – Delimiter of the mapping file. Defaults to a comma (i.e., a user-provided mapping file is expected to be a CSV file unless sep is changed).

  • convert_from – Column index (0 or 1) of the mapping file corresponding to the species converted from. Defaults to automatic detection.

  • convert_to – Column index (0 or 1) of the mapping file corresponding to the species converted to. Defaults to automatic detection.

  • unique_only – Whether to leverage only 1:1 orthologs between the two species. (Default: True)

  • collapse – The way 1:N orthologs are handled. Possible values are ‘average’ which averages the classifier weights and ‘random’ which randomly chooses one gene’s weights from all its orthologs. This argument is ignored if unique_only = True. (Default: ‘average’)

  • random_state – Random seed for reproducibility. This argument is only relevant if unique_only = False and collapse = ‘random’. (Default: 0)

Returns:

Nothing is returned. The original model is modified in place so that its genes are those of the other species.

Return type:

None
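The two collapse options can be pictured with a small sketch. This is plain Python written for illustration, not CellTypist's implementation: the helper collapse_weights, the gene names, and the weight values are all hypothetical, and each gene carries a single weight here, whereas a real model holds one weight per cell type.

```python
import random
from statistics import mean


def collapse_weights(weights, orthologs, collapse="average", random_state=0):
    """Collapse the weights of several orthologous genes into one value.

    weights   : dict mapping gene name -> classifier weight (hypothetical values)
    orthologs : gene names that all map to the same gene in the other species
    """
    if collapse == "average":
        # 'average': take the mean of the weights of all orthologs.
        return mean(weights[gene] for gene in orthologs)
    if collapse == "random":
        # 'random': keep the weight of one randomly chosen ortholog.
        # Seeding the generator makes the choice reproducible.
        rng = random.Random(random_state)
        return weights[rng.choice(orthologs)]
    raise ValueError("collapse must be 'average' or 'random'")


# Hypothetical 1:N case: two genes map to a single gene in the other species.
weights = {"GeneA1": 0.5, "GeneA2": 1.5, "GeneB": -1.0}
averaged = collapse_weights(weights, ["GeneA1", "GeneA2"], collapse="average")
picked = collapse_weights(weights, ["GeneA1", "GeneA2"], collapse="random", random_state=0)
```

With unique_only = True, genes with 1:N orthologs are not used at all, so neither branch applies.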

extract_top_markers(cell_type: str, top_n: int = 10, only_positive: bool = True) -> ndarray

Extract the top driving genes for a given cell type.

Parameters:
  • cell_type – The cell type to extract markers for.

  • top_n – Number of markers to extract for a given cell type. (Default: 10)

  • only_positive – Whether to extract positive markers only. Set to False to include negative markers as well. (Default: True)

Returns:

An array of marker genes for the query cell type.

Return type:

ndarray
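As a rough illustration of what "top driving genes" can mean, the sketch below ranks genes by their classifier weight for one cell type. It assumes that positive markers are the genes with the largest weights and that, when negative markers are included, genes are ranked by absolute weight. That ranking rule is an assumption made for this example, and the helper top_markers, the gene list, and the weights are hypothetical.

```python
def top_markers(coefficients, genes, top_n=10, only_positive=True):
    """Return the top_n gene names ranked by weight for one cell type.

    coefficients : per-gene weights for a single cell type (hypothetical values)
    genes        : gene names, in the same order as coefficients
    """
    # Rank by raw weight for positive markers only; otherwise rank by
    # magnitude so that strongly negative genes are also considered.
    scores = coefficients if only_positive else [abs(c) for c in coefficients]
    order = sorted(range(len(genes)), key=lambda i: scores[i], reverse=True)
    return [genes[i] for i in order[:top_n]]


genes = ["CD3E", "CD19", "MS4A1", "LYZ"]
b_cell_weights = [-2.0, 1.2, 0.9, -0.1]  # made-up weights for a B cell class

positive_only = top_markers(b_cell_weights, genes, top_n=2)
with_negative = top_markers(b_cell_weights, genes, top_n=2, only_positive=False)
```

In this toy input, the strongly negative gene only appears once only_positive is set to False.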

property features: ndarray

Get genes included in the model.

static load(model: str | None = None)

Load the desired model.

Parameters:

model – Name of the model to load. Defaults to ‘Immune_All_Low.pkl’ if not provided. To see all available models and their descriptions, use models_description().

Returns:

A Model object.

Return type:

Model
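A minimal usage sketch, using only the names documented on this page (Model.load, cell_types, features, description). The helper summarize_model is hypothetical, and the sketch assumes that the celltypist package is installed and that the requested model file is already available locally.

```python
def summarize_model(model_name="Immune_All_Low.pkl"):
    """Load a CellTypist model and return a few summary facts about it."""
    # Imported inside the function so that defining the helper does not
    # require celltypist; calling it does.
    from celltypist import models

    model = models.Model.load(model=model_name)
    return {
        "n_cell_types": len(model.cell_types),  # cell types the model can predict
        "n_genes": len(model.features),         # genes the model was trained on
        "description": model.description,       # model description
    }


# Example call (requires celltypist and the model file):
# summary = summarize_model("Immune_All_Low.pkl")
```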

predict_labels_and_prob(indata, mode: str = 'best match', p_thres: float = 0.5) -> tuple

Get the decision matrix, probability matrix, and predicted cell types for the input data.

Parameters:
  • indata – The input array-like object used as a query.

  • mode – The way cell prediction is performed. For each query cell, the default (‘best match’) is to choose the cell type with the largest score/probability as the final prediction. Setting to ‘prob match’ will enable a multi-label classification, which assigns 0 (i.e., unassigned), 1, or >=2 cell type labels to each query cell. (Default: ‘best match’)

  • p_thres – Probability threshold for the multi-label classification. Ignored if mode is ‘best match’. (Default: 0.5)

Returns:

A tuple of decision score matrix, raw probability matrix, and predicted cell type labels.

Return type:

tuple
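The difference between the two modes can be sketched on a small probability matrix. This is an illustration in plain Python rather than the library's code: the helper assign_labels is hypothetical, and the '|' separator, the 'Unassigned' label, and the strict comparison against p_thres are assumptions made for the example.

```python
def assign_labels(probabilities, cell_types, mode="best match", p_thres=0.5):
    """Turn per-cell probabilities into predicted labels.

    probabilities : one row per query cell, one probability per cell type
    cell_types    : cell type names, in the same order as the columns
    """
    labels = []
    for row in probabilities:
        if mode == "best match":
            # Single label: the cell type with the largest probability.
            best = max(range(len(row)), key=lambda i: row[i])
            labels.append(cell_types[best])
        elif mode == "prob match":
            # Multi-label: every cell type whose probability exceeds p_thres.
            hits = [ct for ct, p in zip(cell_types, row) if p > p_thres]
            labels.append("|".join(hits) if hits else "Unassigned")
        else:
            raise ValueError("mode must be 'best match' or 'prob match'")
    return labels


cell_types = ["B cells", "T cells", "Monocytes"]
probabilities = [
    [0.9, 0.1, 0.2],  # one clear label
    [0.6, 0.7, 0.1],  # two cell types above the threshold
    [0.2, 0.3, 0.4],  # nothing above the threshold
]

best = assign_labels(probabilities, cell_types, mode="best match")
multi = assign_labels(probabilities, cell_types, mode="prob match", p_thres=0.5)
```

The third cell shows why the two modes can disagree: 'best match' always returns a label, while 'prob match' leaves a cell unassigned when no probability clears the threshold.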

write(file: str) -> None

Write the model out to the path given by file.