Model structure
- class celltypist.models.Model(clf, scaler, description)
Bases: object
Class that wraps the logistic Classifier and the StandardScaler.
- Parameters:
clf – A logistic Classifier incorporated in the loaded model.
scaler – A StandardScaler incorporated in the loaded model.
description – Description of the model as a dictionary.
- classifier
The logistic Classifier incorporated in the loaded model.
- scaler
The StandardScaler incorporated in the loaded model.
- description
Description of the loaded model.
- convert(map_file: str | None = None, sep: str = ',', convert_from: int | None = None, convert_to: int | None = None, unique_only: bool = True, collapse: str = 'average', random_state: int = 0) -> None
Convert the model of one species to another species by mapping orthologous genes. Note that when provided with a custom map file, this method can be used to convert genes in the model to other formats (orthologous genes, Ensembl IDs, HGNC IDs, etc.).
- Parameters:
map_file – A two-column gene mapping file between two species. Defaults to a human-mouse (mouse-human) conversion using the built-in mapping file provided by CellTypist.
sep – Delimiter of the mapping file. Defaults to a comma (i.e., a CSV file is expected if the user provides a mapping file).
convert_from – Column index (0 or 1) of the mapping file corresponding to the species converted from. Defaults to automatic detection.
convert_to – Column index (0 or 1) of the mapping file corresponding to the species converted to. Defaults to automatic detection.
unique_only – Whether to leverage only 1:1 orthologs between the two species. (Default: True)
collapse – The way 1:N orthologs are handled. Possible values are ‘average’ which averages the classifier weights and ‘random’ which randomly chooses one gene’s weights from all its orthologs. This argument is ignored if unique_only = True. (Default: ‘average’)
random_state – Random seed for reproducibility. This argument is only relevant if unique_only = False and collapse = ‘random’. (Default: 0)
- Returns:
Nothing is returned; the original model is modified in place by converting it to the other species.
- Return type:
None
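The two collapse options can be pictured with a small NumPy sketch. This is not CellTypist's own code: the helper name collapse_weights and the layout of the weight matrix (one row per cell type, one column per ortholog) are assumptions made for illustration only.

```python
import numpy as np

def collapse_weights(weights, mode="average", random_state=0):
    """Collapse the weights of several orthologs into a single weight vector.

    weights: array of shape (n_cell_types, n_orthologs); hypothetical layout.
    """
    weights = np.asarray(weights, dtype=float)
    if mode == "average":
        # Average the classifier weights across all orthologs.
        return weights.mean(axis=1)
    if mode == "random":
        # Pick the weights of one ortholog at random, seeded for reproducibility.
        rng = np.random.default_rng(random_state)
        return weights[:, rng.integers(weights.shape[1])]
    raise ValueError("mode must be 'average' or 'random'")

# Two cell types (rows), two orthologs of the same gene (columns).
w = np.array([[1.0, 3.0],
              [-2.0, 0.0]])
averaged = collapse_weights(w, "average")
picked = collapse_weights(w, "random", random_state=0)
```

With ‘average’ every ortholog contributes equally, whereas ‘random’ keeps one ortholog's weights unchanged, so fixing the seed is what makes that choice repeatable.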
- extract_top_markers(cell_type: str, top_n: int = 10, only_positive: bool = True) -> ndarray
Extract the top driving genes for a given cell type.
- Parameters:
cell_type – The cell type to extract markers for.
top_n – Number of markers to extract for a given cell type. (Default: 10)
only_positive – Whether to extract positive markers only. Set to False to include negative markers as well. (Default: True)
- Returns:
An array of marker genes for the query cell type.
- Return type:
ndarray
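As a rough illustration of how marker genes can be ranked from a linear classifier, the sketch below sorts genes by the coefficients of one cell type. The helper top_markers, the gene names, and the coefficient values are hypothetical, and ranking by absolute value when only_positive is False is an assumption about what "including negative markers" means, not a statement of CellTypist's implementation.

```python
import numpy as np

def top_markers(coef_row, genes, top_n=10, only_positive=True):
    """Rank genes by their coefficient for a single cell type."""
    coef_row = np.asarray(coef_row, dtype=float)
    genes = np.asarray(genes)
    # Positive markers: largest coefficients first.
    # Otherwise: largest absolute coefficients first (assumption).
    scores = coef_row if only_positive else np.abs(coef_row)
    order = np.argsort(-scores)[:top_n]
    return genes[order]

genes = ["CD3E", "MS4A1", "CD19", "LYZ"]
coefs = [0.2, 1.5, 0.9, -2.0]  # made-up coefficients for one cell type

positive = top_markers(coefs, genes, top_n=2)
both_signs = top_markers(coefs, genes, top_n=2, only_positive=False)
```

In this toy example the strongly negative gene only appears once only_positive is set to False.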
- predict_labels_and_prob(indata, mode: str = 'best match', p_thres: float = 0.5) -> tuple
Get the decision matrix, probability matrix, and predicted cell types for the input data.
- Parameters:
indata – The input array-like object used as a query.
mode – The way cell prediction is performed. For each query cell, the default (‘best match’) is to choose the cell type with the largest score/probability as the final prediction. Setting to ‘prob match’ will enable a multi-label classification, which assigns 0 (i.e., unassigned), 1, or >=2 cell type labels to each query cell. (Default: ‘best match’)
p_thres – Probability threshold for the multi-label classification. Ignored if mode is ‘best match’. (Default: 0.5)
- Returns:
A tuple of decision score matrix, raw probability matrix, and predicted cell type labels.
- Return type:
tuple
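The difference between the two modes can be sketched on a small probability matrix. This is a simplified illustration rather than the library's code: the helper assign_labels, the "Unassigned" label, and the "|" separator used to join multiple labels are assumptions, and the probabilities are invented.

```python
import numpy as np

def assign_labels(prob, cell_types, mode="best match", p_thres=0.5):
    """Turn a (n_cells, n_cell_types) probability matrix into labels."""
    prob = np.asarray(prob, dtype=float)
    cell_types = np.asarray(cell_types)
    if mode == "best match":
        # One label per cell: the cell type with the largest probability.
        return cell_types[prob.argmax(axis=1)].tolist()
    if mode == "prob match":
        # Zero, one, or several labels per cell: every type above p_thres.
        labels = []
        for row in prob:
            hits = cell_types[row > p_thres]
            labels.append("|".join(hits) if hits.size else "Unassigned")
        return labels
    raise ValueError("mode must be 'best match' or 'prob match'")

cell_types = ["B cells", "T cells", "NK cells"]
prob = [[0.9, 0.1, 0.2],   # one clear hit
        [0.6, 0.7, 0.1],   # two types above the threshold
        [0.2, 0.3, 0.4]]   # nothing above the threshold

best = assign_labels(prob, cell_types, mode="best match")
multi = assign_labels(prob, cell_types, mode="prob match", p_thres=0.5)
```

Note that ‘best match’ always returns a label, even for the third cell where no probability exceeds the threshold, while ‘prob match’ leaves that cell unassigned.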