Downsampling function

celltypist.samples.downsample_adata(adata: AnnData, mode: str = 'total', n_cells: int | None = None, by: str | None = None, balance_cell_type: bool = False, random_state: int = 0, return_index: bool = True) → AnnData | ndarray

Downsample cells to a given number (either in total or per cell type).

Parameters:
  • adata – An AnnData object representing the input data.

  • mode – The way downsampling is performed. Defaults to downsampling the input cells to a total of n_cells. Set to ‘each’ to downsample the cells within each cell type to n_cells. (Default: ‘total’)

  • n_cells – The total number of cells (mode = ‘total’) or the number of cells from each cell type (mode = ‘each’) to sample. In the latter case, all cells of a given cell type are selected if that type has fewer than n_cells cells.

  • by – Key (column name) of the input AnnData representing the cell types.

  • balance_cell_type – Whether to balance the cell type frequencies when mode = ‘total’. Setting to True will sample rare cell types with a higher probability, ensuring close-to-even cell type compositions. This argument is ignored if mode = ‘each’. (Default: False)

  • random_state – Random seed for reproducibility. (Default: 0)

  • return_index – Whether to return only the indices of the downsampled cells. Set to False to get a downsampled version of the input AnnData instead. (Default: True)
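The difference between mode = ‘total’ and mode = ‘each’ described in the parameters above can be illustrated with a small standard-library sketch. This is not celltypist's implementation: the helper name downsample_indices and the toy labels list are hypothetical, and capping n_cells at the number of available cells in ‘total’ mode is an assumption of this sketch rather than documented behaviour.

```python
import random
from collections import Counter, defaultdict


def downsample_indices(labels, mode="total", n_cells=None, random_state=0):
    """Hypothetical helper mimicking the documented semantics on a list of labels.

    Returns the sorted positions of the sampled cells.
    """
    if n_cells is None:
        raise ValueError("n_cells must be provided")
    rng = random.Random(random_state)  # seeded for reproducibility

    if mode == "total":
        # Sample n_cells positions from all cells, ignoring cell types.
        # Assumption of this sketch: cap at the number of available cells.
        k = min(n_cells, len(labels))
        return sorted(rng.sample(range(len(labels)), k))

    if mode == "each":
        # Group cell positions by cell type.
        groups = defaultdict(list)
        for i, label in enumerate(labels):
            groups[label].append(i)
        chosen = []
        for members in groups.values():
            if len(members) <= n_cells:
                # Types with too few cells are kept in full, as documented.
                chosen.extend(members)
            else:
                chosen.extend(rng.sample(members, n_cells))
        return sorted(chosen)

    raise ValueError("mode must be 'total' or 'each'")


# Toy data: 6 T cells, 3 B cells, 1 NK cell.
labels = ["T"] * 6 + ["B"] * 3 + ["NK"] * 1

total_idx = downsample_indices(labels, mode="total", n_cells=4)
each_idx = downsample_indices(labels, mode="each", n_cells=2)
each_counts = Counter(labels[i] for i in each_idx)
```

With the toy labels, ‘total’ yields 4 positions overall, while ‘each’ yields 2 T cells, 2 B cells, and the single NK cell.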

Returns:

Depending on return_index, the downsampled cell indices (ndarray) or a subset of the input AnnData.
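To give an intuition for balance_cell_type, the following sketch draws cells without replacement using weights inversely proportional to each cell type's frequency, so that rare types are favoured. The weighting scheme, the helper name balanced_downsample, and the toy labels are assumptions made for illustration; the documentation above only states that rare cell types are sampled with a higher probability, not how.

```python
import random
from collections import Counter


def balanced_downsample(labels, n_cells, random_state=0):
    """Hypothetical sketch: weighted sampling without replacement.

    Each cell gets weight 1 / (size of its cell type). Cells are ranked by
    the key u ** (1 / weight), with u uniform in [0, 1), and the n_cells
    largest keys are kept (the Efraimidis-Spirakis scheme).
    """
    counts = Counter(labels)
    rng = random.Random(random_state)
    # 1 / weight equals the cell type's count, so the key is u ** count.
    keys = [(rng.random() ** counts[label], i) for i, label in enumerate(labels)]
    k = min(n_cells, len(labels))
    top = sorted(keys, reverse=True)[:k]
    return sorted(i for _, i in top)


# Toy data: one abundant and one rare cell type.
labels = ["T"] * 90 + ["B"] * 10
idx = balanced_downsample(labels, n_cells=20)
composition = Counter(labels[i] for i in idx)
```

Here composition holds the per-type counts of the sampled cells; under this weighting the rare type tends to be over-represented relative to its 10% share of the input.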