Functions
pyCellPhenoX.marker_discovery module
- pyCellPhenoX.marker_discovery.marker_discovery(shap_df, expression_mat)
- Parameters:
shap_df (dataframe) – cells by columns, where the columns hold metadata, SHAP values for each latent dimension, and the interpretable score
expression_mat (dataframe) – cells by features (genes, proteins, etc.)
pyCellPhenoX.neighborhoodAbundanceMatrix module
pyCellPhenoX.nonnegativeMatrixFactorization module
- pyCellPhenoX.nonnegativeMatrixFactorization.nonnegativeMatrixFactorization(X, numberOfComponents=-1, min_k=2, max_k=12)
Perform non-negative matrix factorization (NMF)
- Parameters:
X (dataframe) – the marker by cell matrix to be decomposed
numberOfComponents (int) – number of components (ranks) to learn; if -1, k is selected automatically
min_k (int) – minimum number of ranks to test when k is selected automatically
max_k (int) – maximum number of ranks to test when k is selected automatically
- Returns:
W and H matrices
- Return type:
tuple
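To make the W and H outputs and the min_k/max_k range concrete, here is a minimal NumPy sketch of NMF using multiplicative updates. It is an illustration under assumptions, not the pyCellPhenoX implementation: the helper names `nmf_multiplicative` and `reconstruction_errors` are hypothetical, and the criterion the library uses to select k is not stated in this reference, so the sketch only reports the reconstruction error for each candidate rank.

```python
import numpy as np


def nmf_multiplicative(X, k, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix X (n x m) into W (n x k) and H (k x m).

    Hypothetical helper: Lee-Seung multiplicative updates for the
    Frobenius-norm objective, not the library's own routine.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Update H with W fixed, then W with H fixed; eps avoids division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def reconstruction_errors(X, min_k=2, max_k=12):
    """Return {k: ||X - WH||_F} for every rank from min_k to max_k inclusive."""
    errors = {}
    for k in range(min_k, max_k + 1):
        W, H = nmf_multiplicative(X, k)
        errors[k] = float(np.linalg.norm(np.asarray(X, dtype=float) - W @ H))
    return errors
```

With a matrix of shape (n, m) and rank k, W has shape (n, k) and H has shape (k, m). Inspecting how the error falls as k grows is one common way to choose a rank by eye.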
pyCellPhenoX.plot_interpretablescore_boxplot module
pyCellPhenoX.plot_interpretablescore_umap module
pyCellPhenoX.preprocessing module
- pyCellPhenoX.preprocessing.preprocessing(latent_features, meta, sub_samp=False, subset_percentage=0.99, bal_col=['subject_id', 'cell_type', 'disease'], target='disease', covariates=[], interaction_covs=[])
Prepare the data in the format expected by CellPhenoX
- Parameters:
latent_features (pd.DataFrame) – Latent embeddings (e.g., NMF ranks, or principal components) of the NAM
meta (dataframe) – Dataframe containing metadata (e.g., covariates, target/outcome variable for classification model)
sub_samp (bool, optional) – Optionally, subsample the data. Defaults to False.
subset_percentage (float, optional) – If sub_samp = True, specify the desired proportion of rows. Defaults to 0.99.
bal_col (list, optional) – List of column names in meta to balance the subsampling by. Defaults to ["subject_id", "cell_type", "disease"].
target (str) – Name of the outcome column in meta. Defaults to "disease".
covariates (list, optional) – List of column names in meta to include as features/predictors in the classification model. Defaults to [].
interaction_covs (list, optional) – Optionally, pass a list of the colum
- Returns:
X, the latent embeddings and covariates (your predictors), and y, the model outcome (your target variable)
- Return type:
tuple (dataframe, series)
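The shape of the returned pair can be illustrated with a short pandas sketch. This is not the library's code: `build_design_matrix` is a hypothetical helper that assumes latent_features and meta share the same row index, and it leaves out subsampling, interaction terms, and any encoding of categorical covariates.

```python
import pandas as pd


def build_design_matrix(latent_features, meta, target="disease", covariates=()):
    """Return (X, y): latent embeddings plus selected covariates, and the outcome.

    Hypothetical helper; assumes both frames are indexed by the same cells.
    """
    # Column-bind the embeddings with the requested metadata columns.
    X = pd.concat([latent_features, meta[list(covariates)]], axis=1)
    # The outcome is a single column of meta, returned as a Series.
    y = meta[target]
    return X, y


# Toy data: three cells, two latent dimensions, one covariate.
latent = pd.DataFrame({"rank_1": [0.1, 0.2, 0.3], "rank_2": [1.0, 0.5, 0.0]})
meta = pd.DataFrame({"disease": [0, 1, 1], "age": [30, 41, 57]})
X, y = build_design_matrix(latent, meta, covariates=["age"])
```

Here X has one row per cell with the columns rank_1, rank_2, and age, and y is the disease column.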
pyCellPhenoX.principalComponentAnalysis module
- pyCellPhenoX.principalComponentAnalysis.principalComponentAnalysis(X, var)
Perform principal component analysis (PCA)
- Parameters:
X (dataframe) – the marker by cell matrix to be decomposed
var (float) – desired proportion of variance explained
- Returns:
principal components
- Return type:
dataframe
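One way to read the var parameter is as a threshold on cumulative explained variance: keep the fewest components whose combined variance ratio reaches var. The NumPy sketch below shows that idea only. `pca_by_variance` is a hypothetical helper, it assumes rows are observations and columns are features, and it returns arrays rather than a dataframe, so the library's own behavior may differ.

```python
import numpy as np


def pca_by_variance(X, var):
    """Return (scores, ratios) for the fewest components explaining >= var.

    Hypothetical helper: PCA via SVD of the column-centered matrix.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # variance ratio per component
    # First position where the cumulative ratio reaches var, as a count.
    k = int(np.searchsorted(np.cumsum(explained), var) + 1)
    k = min(k, len(S))  # guard against floating-point shortfall at var = 1.0
    scores = U[:, :k] * S[:k]  # projection of the data onto the first k components
    return scores, explained[:k]
```

For example, passing var=0.9 to data dominated by one direction of variation returns a single column of scores, while var=1.0 returns all components.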