Functions

pyCellPhenoX.marker_discovery module

pyCellPhenoX.marker_discovery.marker_discovery(shap_df, expression_mat)

Parameters:
  • shap_df (dataframe) – Cell-by-column table; the columns hold metadata, SHAP values for each latent dimension, and the interpretable score.

  • expression_mat (dataframe) – Cell-by-feature expression matrix (features such as genes or proteins).

pyCellPhenoX.neighborhoodAbundanceMatrix module

pyCellPhenoX.nonnegativeMatrixFactorization module

pyCellPhenoX.nonnegativeMatrixFactorization.nonnegativeMatrixFactorization(X, numberOfComponents=-1, min_k=2, max_k=12)

Perform non-negative matrix factorization (NMF).

Parameters:
  • X (dataframe) – The marker-by-cell matrix to be decomposed.

  • numberOfComponents (int) – Number of components (ranks) to learn. If -1, k is selected by testing ranks between min_k and max_k. Defaults to -1.

  • min_k (int) – Minimum number of ranks to test when numberOfComponents is -1. Defaults to 2.

  • max_k (int) – Maximum number of ranks to test when numberOfComponents is -1. Defaults to 12.

Returns:

W and H matrices

Return type:

tuple
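
To make the roles of W and H concrete, here is a minimal NumPy sketch of NMF using multiplicative updates. It is not the pyCellPhenoX implementation: the helper name nmf_multiplicative, the update rule, and the choice to record reconstruction error per rank are all assumptions made for illustration. The source does not say which criterion is used to pick k when numberOfComponents is -1, nor which orientation the returned matrices have.

```python
import numpy as np

def nmf_multiplicative(X, k, n_iter=200, seed=0, eps=1e-10):
    """Hypothetical helper: factor a nonnegative X (n x m) as W (n x k) @ H (k x m)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry of W and H nonnegative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy nonnegative matrix: 8 markers by 20 cells, built from 3 latent factors.
rng = np.random.default_rng(1)
X = rng.random((8, 3)) @ rng.random((3, 20))

# Fit a small range of ranks and record the reconstruction error for each.
errors = {}
for k in range(2, 6):
    W, H = nmf_multiplicative(X, k)
    errors[k] = np.linalg.norm(X - W @ H)
```

In this sketch W has one row per marker and H has one column per cell. Inspecting errors across ranks is one common way to choose k by eye; other criteria exist.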

pyCellPhenoX.plot_interpretablescore_boxplot module

pyCellPhenoX.plot_interpretablescore_umap module

pyCellPhenoX.preprocessing module

pyCellPhenoX.preprocessing.preprocessing(latent_features, meta, sub_samp=False, subset_percentage=0.99, bal_col=['subject_id', 'cell_type', 'disease'], target='disease', covariates=[], interaction_covs=[])

Prepare the data in the format that CellPhenoX expects.

Parameters:
  • latent_features (pd.DataFrame) – Latent embeddings of the NAM (e.g., NMF ranks or principal components).

  • meta (dataframe) – Dataframe containing metadata (e.g., covariates and the target/outcome variable for the classification model).

  • sub_samp (bool, optional) – Whether to subsample the data. Defaults to False.

  • subset_percentage (float, optional) – If sub_samp is True, the desired proportion of rows to keep. Defaults to 0.99.

  • bal_col (list, optional) – List of column names in meta to balance the subsampling by. Defaults to ["subject_id", "cell_type", "disease"].

  • target (str, optional) – Name of the outcome column in meta. Defaults to "disease".

  • covariates (list, optional) – List of column names in meta to include as features/predictors in the classification model. Defaults to [].

  • interaction_covs (list, optional) – Optionally, pass a list of the colum

Returns:

X, the latent embeddings and covariates (the predictors), and y, the model outcome (the target variable)

Return type:

tuple (dataframe, series)
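
The parameters above describe joining latent embeddings with selected metadata columns and, optionally, subsampling in a balanced way. The pandas sketch below shows one way that could look. It is an illustration under assumptions, not the library's code: build_xy, the toy column names (rank_1, rank_2, age), and the use of groupby sampling for balancing are all hypothetical, and interaction terms and categorical encoding are left out.

```python
import pandas as pd

def build_xy(latent_features, meta, target="disease", covariates=(),
             sub_samp=False, subset_percentage=0.99,
             bal_col=("subject_id", "cell_type", "disease")):
    """Hypothetical helper: assemble predictors X and outcome y, indexed by cell."""
    if sub_samp:
        # Keep the same fraction of cells within each combination of bal_col values.
        keep = meta.groupby(list(bal_col)).sample(
            frac=subset_percentage, random_state=0
        ).index
        latent_features, meta = latent_features.loc[keep], meta.loc[keep]
    # Predictors: latent embeddings side by side with the chosen covariates.
    X = pd.concat([latent_features, meta[list(covariates)]], axis=1)
    y = meta[target]
    return X, y

# Toy inputs: four cells from two subjects.
meta = pd.DataFrame(
    {
        "subject_id": ["s1", "s1", "s2", "s2"],
        "cell_type": ["T", "T", "B", "B"],
        "disease": [1, 1, 0, 0],
        "age": [40, 40, 55, 55],
    },
    index=["c1", "c2", "c3", "c4"],
)
latent = pd.DataFrame(
    {"rank_1": [0.1, 0.4, 0.3, 0.9], "rank_2": [0.7, 0.2, 0.5, 0.1]},
    index=meta.index,
)

X, y = build_xy(latent, meta, covariates=["age"])
X_sub, y_sub = build_xy(latent, meta, sub_samp=True, subset_percentage=0.5)
```

Both inputs share the same cell index here, which is what lets the concatenation and the subsampling line up row for row.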

pyCellPhenoX.principalComponentAnalysis module

pyCellPhenoX.principalComponentAnalysis.principalComponentAnalysis(X, var)

Perform principal component analysis (PCA).

Parameters:
  • X (dataframe) – The marker-by-cell matrix to be decomposed.

  • var (float) – Desired proportion of variance explained.

Returns:

principal components

Return type:

dataframe
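
Selecting components by a target proportion of explained variance can be sketched with NumPy's SVD. This is an illustration, not the pyCellPhenoX implementation: the helper pca_by_variance is hypothetical, and it assumes rows are observations and columns are features, which may differ from the orientation the library expects.

```python
import numpy as np

def pca_by_variance(X, var):
    """Hypothetical helper: return the leading PC scores explaining at least `var` of the variance."""
    Xc = X - X.mean(axis=0)                      # center each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)              # variance ratio per component
    # Smallest k whose cumulative ratio reaches `var`, capped at the number of components.
    k = int(np.searchsorted(np.cumsum(explained), var)) + 1
    k = min(k, len(s))
    scores = U[:, :k] * s[:k]                    # projections onto the first k components
    return scores, explained[:k]

# Toy data: 50 observations, 6 features with very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6)) * np.array([5.0, 3.0, 1.0, 0.5, 0.2, 0.1])

scores, explained = pca_by_variance(X, var=0.9)
```

The returned scores have one row per observation and as many columns as were needed to reach the requested variance.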