API

Clustering

constclust.cluster(adata, n_neighbors, resolutions, random_state, n_procs=1, neighbor_kwargs={}, leiden_kwargs={}, progress_bar=True)

Generate clusterings for each combination of n_neighbors, resolutions, and random_state.

Parameters

adata : AnnData: Object to be clustered.
n_neighbors : Collection[int]: Values for the number of neighbors.
resolutions : Collection[float]: Values for the resolution parameter for modularity optimization.
random_state : Collection[int]: Random seeds to start with.
n_procs : int (default: 1): Number of processes to use.
neighbor_kwargs : dict (default: {}): Keyword arguments to pass to all calls to scanpy.pp.neighbors(). For example: {"use_rep": "X"}.
leiden_kwargs : dict (default: {}): Keyword arguments to pass to all calls to leidenalg.find_partition(). For example: {"partition_type": leidenalg.CPMVertexPartition}.
progress_bar : bool (default: True): Whether to display a progress bar for the clustering process.

Return type

Tuple[DataFrame, DataFrame]

Returns

Pair of dataframes, where the first contains the settings for each partitioning,
and the second contains the partitionings.

Example

>>> import numpy as np
>>> from constclust import cluster
>>> params, clusterings = cluster(
...     adata,
...     n_neighbors=np.linspace(15, 90, 4, dtype=int),
...     resolutions=np.geomspace(0.05, 20, 50),
...     random_state=[0, 1, 2, 3],
...     n_procs=4,
... )
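
Because one clustering is generated per combination of the three parameter collections, the grid in the example above grows multiplicatively. The sketch below only counts those combinations using the standard library; it does not call constclust, and the variable names `grid` and `n_clusterings` are illustrative.

```python
from itertools import product

# The same grids as in the example above, written without NumPy.
n_neighbors = [15, 40, 65, 90]  # np.linspace(15, 90, 4, dtype=int)
resolutions = [0.05 * (20 / 0.05) ** (i / 49) for i in range(50)]  # np.geomspace(0.05, 20, 50)
random_state = [0, 1, 2, 3]

# One entry per (n_neighbors, resolution, random_state) combination.
grid = list(product(n_neighbors, resolutions, random_state))
n_clusterings = len(grid)
print(n_clusterings)  # 4 * 50 * 4 combinations
```

Counting the grid first is a cheap way to estimate run time before choosing `n_procs`.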

Reconciling

constclust.reconcile(settings, clusterings, paramtypes='oou', nprocs=1)

Constructor for reconciler object.

Parameters

settings : DataFrame: Parameterizations of each clustering.
clusterings : DataFrame: Assignments from each clustering.
nprocs : int (default: 1): Number of processes to use.

Example

>>> params, clusterings = cluster(adata, ... )
>>> reconciler = reconcile(params, clusterings)

Return type: Reconciler

class constclust.aggregate.Component(reconciler, cluster_ids)

A connected component from a Reconciler.

_parent

The Reconciler which generated this component.

Type: Reconciler

settings

Subset of the parent's settings. Contains only settings for clusterings which appear in this component.

Type: pandas.DataFrame

cluster_ids

Which clusters are in this component.

Type: numpy.ndarray

intersect

Intersection of samples in this component.

Type: numpy.ndarray[int]

intersect_names

Names of samples in the intersection of this component.

Type: numpy.ndarray[str]

union

Union of samples in this component.

Type: numpy.ndarray[int]

union_names

Names of samples in the union of this component.

Type: numpy.ndarray[str]
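
To make the intersect and union attributes concrete, here is a minimal sketch using plain Python sets. It assumes that the intersection holds samples present in every cluster of the component and the union holds samples present in at least one. The data and the names `clusters` and `obs_names` are hypothetical, and the real attributes are NumPy arrays rather than lists.

```python
# Hypothetical component: each cluster is the set of sample positions it contains.
clusters = [
    {0, 1, 2, 3},
    {1, 2, 3, 4},
    {1, 2, 3},
]
# Hypothetical sample names, indexed by integer position.
obs_names = ["cell_a", "cell_b", "cell_c", "cell_d", "cell_e"]

# Samples found in every cluster vs. samples found in at least one cluster.
intersect = sorted(set.intersection(*clusters))
union = sorted(set.union(*clusters))

# The *_names variants translate integer positions back to sample names.
intersect_names = [obs_names[i] for i in intersect]
union_names = [obs_names[i] for i in union]
```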

class constclust.aggregate.ComponentList(components)

A set of consistent components identified from many clustering solutions.

This is considered to be an immutable list, so the results of operations are cached.

property obs_names

The set of observations these components were found on.

Return type: Index

to_graph(overlap='intersect')

Builds a hierarchical graph of the components.

Return type: DiGraph

describe()

Calculates summary statistics for components.

Example

>>> stats = comp_list.describe()

Return type: DataFrame

filter(func=None, *, min_intersect=None, max_intersect=None, min_union=None, max_union=None, min_solutions=None, max_solutions=None)

Filter components from this collection. Returns a copy.

Example

>>> to_examine = comp_list.filter(min_intersect=20, min_solutions=100)

Return type: ComponentList
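
The keyword bounds of filter can be pictured with the sketch below. It is not the library's implementation. It assumes that the intersect bounds apply to the size of a component's intersect, that the solutions bounds apply to the number of clusterings the component appears in, and that all bounds are inclusive; the documentation above does not state any of this. The function `filter_components` and the dictionary keys are hypothetical, and the union bounds are omitted for brevity.

```python
# Hypothetical stand-ins for components: only the sizes the keywords refer to.
components = [
    {"n_intersect": 35, "n_union": 50, "n_solutions": 240},
    {"n_intersect": 12, "n_union": 90, "n_solutions": 310},
    {"n_intersect": 60, "n_union": 64, "n_solutions": 40},
]


def filter_components(comps, func=None, *, min_intersect=None, max_intersect=None,
                      min_solutions=None, max_solutions=None):
    """Return a new list with the components that satisfy every given bound."""
    bounds = [
        ("n_intersect", min_intersect, max_intersect),
        ("n_solutions", min_solutions, max_solutions),
    ]
    kept = []
    for comp in comps:
        # Optional arbitrary predicate, applied before the keyword bounds.
        if func is not None and not func(comp):
            continue
        # A bound of None means "no constraint" (assumed inclusive otherwise).
        in_bounds = all(
            (low is None or comp[key] >= low) and (high is None or comp[key] <= high)
            for key, low, high in bounds
        )
        if in_bounds:
            kept.append(dict(comp))  # copy, so the input list is left untouched
    return kept


to_examine = filter_components(components, min_intersect=20, min_solutions=100)
```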

plot_components(adata, *, x_param='n_neighbors', y_param='resolution', embedding_basis='X_umap', embedding_kwargs=mappingproxy({}))

Plot parameter space and scatter plot for each component.

The parameter space is a heatmap showing the range of parameters each component was found in. The scatter plot shows which observations were included in the component in a 2D embedding of the dataset.

Parameters

x_param : str (default: 'n_neighbors'): Which key from the parameters will be along the x-axis of the heatmaps.
y_param : str (default: 'resolution'): Which key from the parameters will be along the y-axis of the heatmaps.
embedding_basis : str (default: 'X_umap'): Basis from adata to use for embedding plot.
embedding_kwargs : Mapping (default: mappingproxy({})): Keyword arguments to pass to sc.pl.embedding.

Example

>>> comps.plot_components(adata)

plot_hierarchies(coords, *, overlap='intersect', scatter_kwargs=mappingproxy({}))

Find and plot interactive hierarchies of components.

Parameters

coords : Union[ndarray, DataFrame]: Coordinates to use in scatter plots. Should have shape (n_obs, 2). If it's a dataframe, its index should contain the same elements as self.obs_names.
scatter_kwargs : Mapping (default: mappingproxy({})): Keyword arguments passed to ds_umap.

Example

>>> from bokeh.io import show
>>> comps = reconciler.get_components(0.9, min_cells=5)
>>> show(
...     comps
...     .filter(min_solutions=100)
...     .plot_hierarchies(coords=adata.obsm["X_umap"])
... )

class constclust.aggregate.ReconcilerBase

Base type for reconciler.

Implements the subsetting methods; providing the data is left to subclasses.

property obs_names

The set of observations clusters were found on.

Return type: Index

get_param_range(clusters)

Given a set of clusters, returns the range of parameters for which they were calculated.

Parameters

clusters : Collection[int]: If it's a collection of ints, it is interpreted as a range of parameter ids.

subset_clusterings(clusterings_to_keep)

Take subset of Reconciler, where only clusterings_to_keep are present.

Reduces size of both .settings and .clusterings.

Parameters

clusterings_to_keep: Indexer into Reconciler.settings. Anything that should give the correct result for reconciler.settings.loc[clusterings_to_keep].

Return type

ReconcilerSubset

subset_cells(cells_to_keep)

Take subset of Reconciler, where only cells_to_keep are present.

Parameters

cells_to_keep: Indexer into Reconciler.clusterings. Anything that should give the correct result for reconciler.clusterings.loc[cells_to_keep].

Return type

ReconcilerSubset

describe_clusters(log1p=False)

Describe the clusters in this Reconciler.

Parameters

log1p : bool (default: False): Whether to also return log transformed values for numeric cols.

Return type

DataFrame

Returns

DataFrame containing summary statistics on the clusters in this reconciler. Good for plotting.

Example

>>> import hvplot.pandas
>>> clusters = reconciler.describe_clusters(log1p=True)
>>> clusters.hvplot.scatter(
...     "log1p_resolution",
...     "log1p_n_obs",
...     datashade=True,
...     dynspread=True,
... )
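
As a rough picture of what the log1p option adds, the sketch below computes log(1 + x) for each numeric value and stores it under a `log1p_`-prefixed key, following the column names used in the example above. The rows and the helper `add_log1p` are hypothetical; the actual method returns a DataFrame and may differ in detail.

```python
import math

# Hypothetical per-cluster summary rows.
rows = [
    {"resolution": 0.05, "n_obs": 1200},
    {"resolution": 1.0, "n_obs": 85},
]


def add_log1p(row):
    """Return a copy of row with a log1p_<name> entry for every numeric value."""
    out = dict(row)
    for key, value in row.items():
        if isinstance(value, (int, float)):
            out[f"log1p_{key}"] = math.log1p(value)  # log(1 + value)
    return out


described = [add_log1p(row) for row in rows]
```

Using log1p rather than a plain logarithm keeps zero-valued entries finite, which is convenient when plotting cluster sizes and resolutions on a log-like scale.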

describe_clusterings()

Convenience function to generate summary statistics for clusterings in a reconciler.

Example

>>> import seaborn as sns
>>> clusterings = reconciler.describe_clusterings()
>>> sns.jointplot(data=clusterings, x="resolution", y="max_n_obs")

Return type: DataFrame

class constclust.aggregate.ReconcilerSubset(parent, settings, clusterings, mapping, graph)

Subset of a Reconciler.

_parent

Reconciler this subset was derived from.

Type: Reconciler

settings

Settings for clusterings in this subset.

Type: pandas.DataFrame

clusterings

Clusterings contained in this subset.

Type: pandas.DataFrame

graph

Reference to graph from parent.

Type: igraph.Graph

cluster_ids

Integer ids of all clusters in this subset.

Type: numpy.ndarray[int]

_mapping

pd.Series with a MultiIndex. Unlike the _mapping from Reconciler, this does not necessarily have all clusters, so ranges of clusters cannot be assumed to be contiguous. Additionally, you can’t just index into this with cluster_ids as positions.

Type: pandas.Series

_obs_names

Maps from integer position to input cell name.

Type: pandas.Series

find_contained_components(min_presence, min_weight=0.9, min_cells=2)

Find components contained in a subset.

get_components(min_weight, min_cells=2)

Return connected components of graph, with edges filtered by min_weight.

Parameters

min_weight : float: Minimum edge weight for inclusion of a clustering.
min_cells : int (default: 2): Minimum cells a component should have.

Return type

List[Component]

class constclust.aggregate.Reconciler(settings, clusterings, mapping, graph)

Collects and reconciles many clusterings by local (in parameter space) stability.

settings

Contains settings for all clusterings. Index corresponds to .clusterings columns, while columns should correspond to the parameters which were varied.

Type: pandas.DataFrame

clusterings

Contains cluster assignments for each cell, for each clustering. Columns correspond to .settings index, while the index corresponds to the cells. Each cluster is encoded with a unique cluster id.

Type: pandas.DataFrame

graph

Weighted graph. Nodes are clusters (identified by unique cluster id integer, same as in .clusterings). Edges connect clusters with shared contents. Weight is the Jaccard similarity between the contents of the clusters.

Type: igraph.Graph
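
For reference, the Jaccard similarity of two clusters is the number of cells they share divided by the number of cells in either. A minimal sketch with plain Python sets follows; the helper `jaccard` and the example clusters are illustrative and not part of constclust.

```python
def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two collections of cell indices."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty clusters
    return len(a & b) / len(union)


cluster_a = {0, 1, 2, 3}
cluster_b = {1, 2, 3, 4}

# Three shared cells out of five distinct cells.
weight = jaccard(cluster_a, cluster_b)
```

Identical clusters get weight 1 and disjoint clusters get weight 0, so an edge weight near 1 indicates two clusters with nearly the same contents.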

cluster_ids

Integer ids of all clusters in this Reconciler.

Type: numpy.ndarray[int]

_obs_names

Ordered set for names of the cells. Internally they are referred to by integer positions.

Type: pandas.Series

_mapping

pd.Series with a MultiIndex. Index has levels clustering and cluster. Each position in the index should have a unique value at level “cluster”, which corresponds to a cluster in the clustering dataframe. Values are np.arrays with the indices of cells in the relevant cluster. This should be considered immutable, though this is not the case for ReconcilerSubset objects.

Type: pandas.Series

find_components(min_weight, clusters, min_cells=2)

Return components from filtered graph which contain specified clusters.

Parameters

min_weight : float: Minimum weight for edges to be kept in graph. Should be over 0.5.
clusters : np.array[int]: Clusters which you’d like to search from.

get_components(min_weight, min_cells=2)

Return connected components of graph, with edges filtered by min_weight.

Parameters

min_weight : float: Minimum weight for edges to be kept in graph. Should be in range [0.5, 1].
min_cells : int (default: 2): Minimum number of cells in a component.

Returns

All components from Reconciler, sorted by number of clusterings.

Return type

List[Component]
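
The idea behind get_components, dropping edges below min_weight and then taking connected components, can be sketched in pure Python as follows. This is an illustration under simplifying assumptions, not the library's code: the edge list is hypothetical, the threshold is assumed to be inclusive, the min_cells filter is omitted, and components are sorted by number of clusters rather than by number of clusterings.

```python
from collections import defaultdict

# Hypothetical weighted edges between cluster ids: (cluster_i, cluster_j, weight).
edges = [(0, 1, 0.95), (1, 2, 0.92), (2, 3, 0.40), (3, 4, 0.97), (5, 6, 0.60)]
nodes = range(7)


def components_above(nodes, edges, min_weight):
    """Connected components of the graph restricted to edges with weight >= min_weight."""
    adjacency = defaultdict(set)
    for i, j, weight in edges:
        if weight >= min_weight:
            adjacency[i].add(j)
            adjacency[j].add(i)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first traversal from an unvisited node collects one component.
        stack = [start]
        seen.add(start)
        members = []
        while stack:
            node = stack.pop()
            members.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(members))
    # Largest components first.
    return sorted(components, key=len, reverse=True)


comps = components_above(nodes, edges, min_weight=0.9)
```

Raising min_weight splits the graph into smaller, more internally consistent components, while lowering it merges them.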

Plotting

constclust.plotting.component_param_range(component, x='n_neighbors', y='resolution', ax=None)

Given a component, show which parameters it’s found at as a heatmap.

Parameters

component : Component: The component to plot.
x : str (default: 'n_neighbors'): The parameter to place on the x axis.
y : str (default: 'resolution'): The parameter to place on the y axis.
ax : Optional[Axis] (default: None): Optional axis to plot on.

Example

>>> comps = reconciler.get_components(0.9)
>>> plotting.component_param_range(comps[0])

Return type: Axis

constclust.plotting.component(component, adata, x='n_neighbors', y='resolution', embedding_basis='X_umap', plot_global=False, aspect=None, embedding_kwargs={})

Plot stability and embedding for component.

Parameters

component : Component: Component object to plot.
adata : AnnData: AnnData to use for plotting UMAP. Should have the same cell names as the Component's parent Reconciler.
x : str (default: 'n_neighbors'): Parameter to plot on the X-axis of the heatmap.
y : str (default: 'resolution'): Parameter to plot on the Y-axis of the heatmap.
embedding_basis : str (default: 'X_umap'): Which basis from the AnnData object to use for embedding.
aspect : Optional[float] (default: None): Aspect ratio of entire plot. Defaults to 1/2.
embedding_kwargs : dict (default: {}): Arguments passed to sc.pl.embedding.