Functions

pyCellPhenoX.marker_discovery module

pyCellPhenoX.marker_discovery.marker_discovery(shap_df, expression_mat)

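Identify markers from the SHAP values and interpretable score in shap_df together with the expression values in expression_mat.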

Parameters:

shap_df (dataframe) – Cells by columns, where the columns include metadata, the SHAP values for each latent dimension, and the interpretable score.
expression_mat (dataframe) – Cells by features (genes, proteins, etc.).
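The documentation does not say how marker_discovery scores markers. Purely as an illustration of the two inputs, here is a hypothetical sketch, `rank_markers_sketch`, that assumes both DataFrames are indexed by the same cell IDs and that the interpretable score sits in a column named `interpretable_score` (an assumed name). It ranks each feature by its Spearman correlation with that score; this is one plausible approach, not the library's implementation.

```python
import pandas as pd


def rank_markers_sketch(shap_df, expression_mat, score_col="interpretable_score"):
    """Hypothetical: rank features by Spearman correlation with a per-cell score.

    Assumes shap_df and expression_mat share a cell index and that the
    score column is named `score_col` (an assumption, not a documented name).
    """
    # Keep only cells present in both tables, in the same order.
    cells = shap_df.index.intersection(expression_mat.index)
    score = shap_df.loc[cells, score_col]
    expr = expression_mat.loc[cells]
    # Spearman's rho is the Pearson correlation of the ranks (ties get average ranks).
    rho = expr.rank().corrwith(score.rank())
    return rho.sort_values(ascending=False)
```

Positively correlated features come first and negatively correlated ones last; a feature with constant expression yields NaN and is sorted to the end.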

pyCellPhenoX.neighborhoodAbundanceMatrix module

pyCellPhenoX.nonnegativeMatrixFactorization module

pyCellPhenoX.nonnegativeMatrixFactorization.nonnegativeMatrixFactorization(X, numberOfComponents=-1, min_k=2, max_k=12)

Perform non-negative matrix factorization (NMF).

Parameters:

X (dataframe) – The marker by cell matrix to be decomposed.
numberOfComponents (int) – Number of components (ranks) to learn. If -1, the number of ranks k is selected automatically. Defaults to -1.
min_k (int) – Minimum number of ranks to test when k is selected automatically. Defaults to 2.
max_k (int) – Maximum number of ranks to test when k is selected automatically. Defaults to 12.

Returns:

The factor matrices W and H, such that X ≈ WH.

Return type:

tuple
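A minimal NumPy sketch of NMF, assuming the Lee and Seung multiplicative updates for the Frobenius loss; the library may use a different solver (for example scikit-learn's). `nmf_multiplicative` and `select_rank` are hypothetical names, and the rank-selection rule in `select_rank` (the smallest k whose reconstruction error is within `tol` of the best error over min_k..max_k) is an assumption, since the documentation does not say how k is selected.

```python
import numpy as np


def nmf_multiplicative(X, k, n_iter=500, seed=0, eps=1e-10):
    """Factor non-negative X (m x n) as W (m x k) @ H (k x n).

    Uses Lee-Seung multiplicative updates for the Frobenius loss; the
    updates multiply by non-negative ratios, so W and H stay non-negative.
    """
    X = np.asarray(X, dtype=float)
    if (X < 0).any():
        raise ValueError("NMF requires a non-negative matrix")
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps  # random positive initialization
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def select_rank(X, min_k=2, max_k=12, tol=0.05, **nmf_kwargs):
    """Hypothetical rule: return the smallest k in [min_k, max_k] whose
    reconstruction error is within `tol` (relative) of the best error."""
    X = np.asarray(X, dtype=float)
    errors = {}
    for k in range(min_k, max_k + 1):
        W, H = nmf_multiplicative(X, k, **nmf_kwargs)
        errors[k] = np.linalg.norm(X - W @ H)  # Frobenius norm of the residual
    best = min(errors.values())
    chosen = min(k for k, e in errors.items() if e <= (1 + tol) * best)
    return chosen, errors
```

Calling `k, errors = select_rank(X)` and then `W, H = nmf_multiplicative(X, k)` mirrors the numberOfComponents=-1 path under these assumptions; passing a fixed k directly corresponds to setting numberOfComponents.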

pyCellPhenoX.plot_interpretablescore_boxplot module

pyCellPhenoX.plot_interpretablescore_umap module

pyCellPhenoX.preprocessing module

pyCellPhenoX.preprocessing.preprocessing(latent_features, meta, sub_samp=False, subset_percentage=0.99, bal_col=['subject_id', 'cell_type', 'disease'], target='disease', covariates=[], interaction_covs=[])

Prepare the data in the format expected by CellPhenoX.

Parameters:

latent_features (pd.DataFrame) – Latent embeddings (e.g., NMF ranks or principal components) of the neighborhood abundance matrix (NAM).
meta (dataframe) – Metadata (e.g., covariates and the target/outcome variable for the classification model).
sub_samp (bool, optional) – Whether to subsample the data. Defaults to False.
subset_percentage (float, optional) – If sub_samp is True, the proportion of rows to keep. Defaults to 0.99.
bal_col (list, optional) – List of column names in meta to balance the subsampling by. Defaults to [“subject_id”, “cell_type”, “disease”].
target (str, optional) – Name of the outcome column in meta. Defaults to “disease”.
covariates (list, optional) – List of column names in meta to include as features/predictors in the classification model. Defaults to [].
interaction_covs (list, optional) – Optionally, pass a list of the colum

Returns:

X, the latent embeddings and covariates (the predictors); y, the model outcome (the target variable).

Return type:

tuple (dataframe, series)
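The steps described for preprocessing can be sketched in pandas as follows. `preprocess_sketch` is a hypothetical stand-in, not the library function: it assumes latent_features and meta share a cell index, interprets balanced subsampling as drawing the same fraction of cells from each combination of the bal_col columns, and omits interaction_covs because its description is incomplete.

```python
import pandas as pd


def preprocess_sketch(latent_features, meta, sub_samp=False,
                      subset_percentage=0.99,
                      bal_col=("subject_id", "cell_type", "disease"),
                      target="disease", covariates=()):
    """Hypothetical stand-in for preprocessing(); returns (X, y)."""
    # Assumption: both tables are indexed by the same cell IDs.
    meta = meta.loc[latent_features.index]
    if sub_samp:
        # Balanced subsampling (assumed meaning): draw the same fraction
        # from every combination of the balancing columns.
        keep = (meta.groupby(list(bal_col))
                    .sample(frac=subset_percentage, random_state=0)
                    .index)
        latent_features = latent_features.loc[keep]
        meta = meta.loc[keep]
    # Predictors: latent embeddings plus any requested covariates.
    X = pd.concat([latent_features, meta[list(covariates)]], axis=1)
    y = meta[target]  # outcome for the classifier
    return X, y
```

Categorical covariates would still need encoding (for example with `pd.get_dummies`) before most classifiers can use X; whether the library does this is not documented here.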

pyCellPhenoX.principalComponentAnalysis module

pyCellPhenoX.principalComponentAnalysis.principalComponentAnalysis(X, var)

Perform principal component analysis (PCA).

Parameters:

X (dataframe) – The marker by cell matrix to be decomposed.
var (float) – Desired proportion of variance explained by the retained components.

Returns:

principal components

Return type:

dataframe
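A minimal NumPy sketch of selecting principal components by a target proportion of explained variance, via the singular value decomposition of the centered matrix. `pca_by_variance` is a hypothetical name, and the sketch assumes the rows of X are the observations (transpose X first if the cells are in the columns); the library may instead rely on a package such as scikit-learn, whose `PCA(n_components=0.9)` applies a similar variance threshold.

```python
import numpy as np
import pandas as pd


def pca_by_variance(X, var):
    """Hypothetical: keep the fewest principal components whose cumulative
    explained-variance ratio reaches `var`; rows of X are observations."""
    A = np.asarray(X, dtype=float)
    A = A - A.mean(axis=0)                  # center each column (feature)
    U, S, _ = np.linalg.svd(A, full_matrices=False)
    ratio = S**2 / np.sum(S**2)             # explained-variance ratio per PC
    n = int(np.searchsorted(np.cumsum(ratio), var)) + 1
    n = min(n, len(S))                      # guard against rounding when var is near 1
    scores = U[:, :n] * S[:n]               # projection of A onto the top n axes
    index = X.index if isinstance(X, pd.DataFrame) else None
    return pd.DataFrame(scores, index=index,
                        columns=[f"PC{i + 1}" for i in range(n)])
```

Because the singular values come back in decreasing order, `np.searchsorted` on the cumulative ratio returns the first component at which the threshold is met.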