pyCellPhenoX utilities¶
pyCellPhenoX.utils.balanced_sample module¶
- pyCellPhenoX.utils.balanced_sample.balanced_sample(group, subset_percentage)¶
Perform balanced sampling on a DataFrame group.
- Parameters:
group (DataFrame) – The DataFrame or group to sample from.
subset_percentage (float) – The fraction of the group to sample (between 0.0 and 1.0).
- Returns:
A randomly sampled fraction of the group, based on the given percentage.
- Return type:
DataFrame
pyCellPhenoX.utils.check_indices module¶
- pyCellPhenoX.utils.check_indices.check_indices(a, b)¶
Check that the indices are matching for the dataframes and assign indices if they aren’t.
- Parameters:
a (pd.DataFrame) – DataFrame 1
b (pd.DataFrame) – DataFrame 2
- Return type:
a, b (pd.DataFrame, pd.DataFrame)
pyCellPhenoX.utils.reducedim module¶
- pyCellPhenoX.utils.reducedim.reduceDim(reducMethod, reducMethodParams, expression_mat)¶
Call the reduction method specified by user
- Parameters:
reducMethod (str) – the name of the method to be used (“nmf” or “pca”)
reducMethodParams (dict) – parameters for the method selected
- Returns:
one matrix if PCA selected, tuple of matrices if NMF selected
- Return type:
matrix/matrices
pyCellPhenoX.utils.select_num_components module¶
- pyCellPhenoX.utils.select_num_components.select_number_of_components(eigenvalues, var)¶
Find the number of the components based on the percentage of accumulated variance
- Parameters:
eigenvalues (array) – array of eigenvalues (explained variances) for the components
var (float) – desired proportion of variance explained
- Returns:
number of components
- Return type:
int
pyCellPhenoX.utils.select_optimal_k module¶
- pyCellPhenoX.utils.select_optimal_k.select_optimal_k(X, min_k, max_k)¶
Select optimal k (number of components) and generate elbow plot for silhouette score
- Parameters:
X (dataframe) – the marker by cell matrix to be decomposed
numberOfComponents (int) – number of components or ranks to learn (if -1, then we will select k)
min_k (int) – alternatively, provide the minimum number of ranks to test
max_k (int) – and the maximum number of ranks to test
- Returns:
optimal k for decomposition
- Return type:
int