API

Clustering

constclust.cluster(adata, n_neighbors, resolutions, random_state, n_procs=1, neighbor_kwargs={}, leiden_kwargs={}, progress_bar=True)[source]

Generate clusterings for each combination of n_neighbors, resolutions, and random_state.

Parameters
adata : AnnData

Object to be clustered.

n_neighbors : Collection[int]

Values for the number of neighbors.

resolutions : Collection[float]

Values for resolution parameter for modularity optimization.

random_state : Collection[int]

Random seeds to start with.

n_procs : int (default: 1)

Number of processes to use.

neighbor_kwargs : dict (default: {})

Keyword arguments to pass to all calls to scanpy.pp.neighbors(). For example: {"use_rep": "X"}.

leiden_kwargs : dict (default: {})

Keyword arguments to pass to all calls to leidenalg.find_partition(). For example: {"partition_type": leidenalg.CPMVertexPartition}.

progress_bar : bool (default: True)

Whether to display a progress bar for the clustering process.

Return type

Tuple[DataFrame, DataFrame]

Returns

Pair of dataframes, where the first contains the settings for each partitioning and the second contains the partitionings.

Example

>>> import numpy as np
>>> from constclust import cluster
>>> params, clusterings = cluster(
        adata,
        n_neighbors=np.linspace(15, 90, 4, dtype=int),
        resolutions=np.geomspace(0.05, 20, 50),
        random_state=[0, 1, 2, 3],
        n_procs=4
    )
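
Because one clustering is generated per combination, the number of clusterings is the product of the sizes of the three collections. The sketch below enumerates that grid with the standard library only. It is an illustration of the combinatorics, not constclust's implementation; `param_grid` and the `random_state` key are names assumed here.

```python
from itertools import product

# Stand-ins for the arguments in the example above.
n_neighbors = [15, 40, 65, 90]  # same values as np.linspace(15, 90, 4, dtype=int)

# 50 log-spaced resolutions from 0.05 to 20, similar to np.geomspace(0.05, 20, 50).
ratio = (20 / 0.05) ** (1 / 49)
resolutions = [0.05 * ratio**i for i in range(50)]

random_state = [0, 1, 2, 3]

# One entry per (n_neighbors, resolution, random_state) combination.
# "param_grid" is a hypothetical name used only for this sketch.
param_grid = [
    {"n_neighbors": k, "resolution": r, "random_state": s}
    for k, r, s in product(n_neighbors, resolutions, random_state)
]

print(len(param_grid))  # 4 * 50 * 4 combinations
```

With these inputs the grid has 800 entries, so runtime scales quickly as any of the three collections grows.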

Reconciling

constclust.reconcile(settings, clusterings, paramtypes='oou', nprocs=1)[source]

Constructor for reconciler object.

Parameters
settings : DataFrame

Parameterizations of each clustering.

clusterings : DataFrame

Assignments from each clustering.

nprocs : int (default: 1)

Number of processes to use.

Example

>>> params, clusterings = cluster(adata, ... )
>>> reconciler = reconcile(params, clusterings)

Return type

Reconciler

class constclust.aggregate.Component(reconciler, cluster_ids)[source]

A connected component from a Reconciler

_parent

The Reconciler which generated this component.

Type

Reconciler

settings

Subset of the parent’s settings. Contains only settings for clusterings that appear in this component.

Type

pandas.DataFrame

cluster_ids

Which clusters are in this component.

Type

numpy.ndarray

intersect

Intersection of samples in this component.

Type

numpy.ndarray[int]

intersect_names

Names of samples in the intersection of this component.

Type

numpy.ndarray[str]

union

Union of samples in this component.

Type

numpy.ndarray[int]

union_names

Names of samples in the union of this component.

Type

numpy.ndarray[str]
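
The relationship between these four attributes can be sketched with plain Python sets. This assumes that `intersect` holds the samples present in every cluster of the component and `union` holds the samples present in at least one of them; the cluster ids and sample names below are made up for illustration.

```python
# Hypothetical clusters, each given as the set of sample positions it contains.
clusters = {
    101: {0, 1, 2, 3},
    205: {1, 2, 3, 4},
    318: {1, 2, 3},
}

members = list(clusters.values())
intersect = sorted(set.intersection(*members))  # samples found in every cluster
union = sorted(set.union(*members))             # samples found in any cluster

# Map integer positions back to names, in the spirit of
# intersect_names and union_names.
obs_names = ["cell_a", "cell_b", "cell_c", "cell_d", "cell_e"]
intersect_names = [obs_names[i] for i in intersect]
union_names = [obs_names[i] for i in union]

print(intersect_names, union_names)
```

A component whose intersect and union are close in size is one whose clusters largely agree on membership.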

class constclust.aggregate.ComponentList(components)[source]

A set of consistent components identified from many clustering solutions.

This is treated as an immutable list, so the results of operations are cached.

property obs_names

The set of observations these components were found on.

Return type

Index

to_graph(overlap='intersect')[source]

Builds a hierarchical graph of the components.

Return type

DiGraph

describe()[source]

Calculates summary statistics for components.

Example

>>> stats = comp_list.describe()

Return type

DataFrame

filter(func=None, *, min_intersect=None, max_intersect=None, min_union=None, max_union=None, min_solutions=None, max_solutions=None)[source]

Filter components from this collection. Returns a copy.

Example

>>> to_examine = comp_list.filter(min_intersect=20, min_solutions=100)

Return type

ComponentList

plot_components(adata, *, x_param='n_neighbors', y_param='resolution', embedding_basis='X_umap', embedding_kwargs=mappingproxy({}))[source]

Plot parameter space and scatter plot for each component.

The parameter space is a heatmap, showing the range of parameters each component was found in. The scatter plot shows which observations were included in the component in a 2d embedding of the dataset.

Parameters
x_param : str (default: 'n_neighbors')

Which key from the parameters will be along the x-axis of the heatmaps.

y_param : str (default: 'resolution')

Which key from the parameters will be along the y-axis of the heatmaps.

embedding_basis : str (default: 'X_umap')

Basis from adata to use for embedding plot.

embedding_kwargs : Mapping (default: mappingproxy({}))

Keyword arguments to pass to sc.pl.embedding.

Example

>>> comps.plot_components(adata)

plot_hierarchies(coords, *, overlap='intersect', scatter_kwargs=mappingproxy({}))[source]

Find and plot interactive hierarchies of components.

Parameters
coords : Union[ndarray, DataFrame]

Coordinates to use in scatter plots. Should have shape (n_obs, 2). If it’s a dataframe, its index should contain the same elements as self.obs_names.

scatter_kwargs : Mapping (default: mappingproxy({}))

Keyword arguments passed to ds_umap.

Example

>>> from bokeh.io import show
>>> comps = reconciler.get_components(0.9, min_cells=5)
>>> show(
        comps
        .filter(min_solutions=100)
        .plot_hierarchies(coords=adata.obsm["X_umap"])
    )

class constclust.aggregate.ReconcilerBase[source]

Base type for reconciler.

Implements the methods for subsetting; providing the data is up to the subclass.

property obs_names

The set of observations clusters were found on.

Return type

Index

get_param_range(clusters)[source]

Given a set of clusters, returns the range of parameters for which they were calculated.

Parameters
clusters : Collection[Int]

A collection of ints is interpreted as a range of parameter ids.

subset_clusterings(clusterings_to_keep)[source]

Take subset of Reconciler, where only clusterings_to_keep are present.

Reduces size of both .settings and .clusterings.

Parameters
clusterings_to_keep

Indexer into Reconciler.settings. Anything that should give the correct result for reconciler.settings.loc[clusterings_to_keep].

Return type

ReconcilerSubset

subset_cells(cells_to_keep)[source]

Take subset of Reconciler, where only cells_to_keep are present.

Parameters
cells_to_keep

Indexer into Reconciler.clusterings. Anything that should give the correct result for reconciler.clusterings.loc[cells_to_keep].

Return type

ReconcilerSubset

describe_clusters(log1p=False)[source]

Describe the clusters in this Reconciler.

Parameters
log1p : bool (default: False)

Whether to also return log-transformed values for numeric columns.

Return type

DataFrame

Returns

DataFrame containing summary statistics on the clusters in this reconciler. Good for plotting.

Example

>>> import hvplot.pandas
>>> clusters = reconciler.describe_clusters(log1p=True)
>>> clusters.hvplot.scatter(
    "log1p_resolution",
    "log1p_n_obs",
    datashade=True,
    dynspread=True
)

describe_clusterings()[source]

Convenience function to generate summary statistics for clusterings in a reconciler.

Example

>>> import seaborn as sns
>>> clusterings = reconciler.describe_clusterings()
>>> sns.jointplot(data=clusterings, x="resolution", y="max_n_obs")

Return type

DataFrame

class constclust.aggregate.ReconcilerSubset(parent, settings, clusterings, mapping, graph)[source]

Subset of a Reconciler

_parent

Reconciler this subset was derived from.

Type

Reconciler

settings

Settings for clusterings in this subset.

Type

pandas.DataFrame

clusterings

Clusterings contained in this subset.

Type

pandas.DataFrame

graph

Reference to graph from parent.

Type

igraph.Graph

cluster_ids

Integer ids of all clusters in this subset.

Type

np.ndarray[int]

_mapping

pd.Series with a MultiIndex. Unlike the _mapping from Reconciler, this does not necessarily have all clusters, so ranges of clusters cannot be assumed to be contiguous. Additionally, you can’t just index into this with cluster_ids as positions.

Type

pandas.Series

_obs_names

Maps from integer position to input cell name.

Type

pd.Series

find_contained_components(min_presence, min_weight=0.9, min_cells=2)[source]

Find components contained in a subset.

get_components(min_weight, min_cells=2)[source]

Return connected components of graph, with edges filtered by min_weight.

Parameters
min_weight : float

Minimum edge weight for inclusion of a clustering.

min_cells : int (default: 2)

Minimum cells a component should have.

Return type

List[Component]

class constclust.aggregate.Reconciler(settings, clusterings, mapping, graph)[source]

Collects and reconciles many clusterings by local (in parameter space) stability.

settings

Contains settings for all clusterings. Index corresponds to .clusterings columns, while columns should correspond to the parameters which were varied.

Type

pandas.DataFrame

clusterings

Contains cluster assignments for each cell, for each clustering. Columns correspond to the .settings index, while the index corresponds to the cells. Each cluster is encoded with a unique cluster id.

Type

pandas.DataFrame

graph

Weighted graph. Nodes are clusters (identified by unique cluster id integer, same as in .clusterings). Edges connect clusters with shared contents. Weight is the Jaccard similarity between the contents of the clusters.

Type

igraph.Graph
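
For reference, the Jaccard similarity of two clusters is the size of their shared contents divided by the size of their combined contents. The sketch below computes it for two clusters given as collections of cell positions. It only illustrates the edge weight described above and is not the library's code; the `jaccard` helper and the example clusters are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two collections of cell ids."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both are empty
    return len(a & b) / len(a | b)


# Two hypothetical clusters sharing cells 2 and 3.
cluster_x = [0, 1, 2, 3]
cluster_y = [2, 3, 4, 5]

weight = jaccard(cluster_x, cluster_y)  # 2 shared cells out of 6 distinct cells
print(weight)
```

Identical clusters get weight 1 and disjoint clusters get weight 0, so a threshold such as 0.9 keeps only edges between nearly identical clusters.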

cluster_ids

Integer ids of all clusters in this Reconciler.

Type

numpy.ndarray[int]

_obs_names

Ordered set for names of the cells. Internally they are referred to by integer positions.

Type

pandas.Series

_mapping

pd.Series with a MultiIndex. The index has levels "clustering" and "cluster". Each position in the index should have a unique value at level "cluster", which corresponds to a cluster in the clustering dataframe. Values are np.arrays with the indices of the cells in the relevant cluster. This should be considered immutable, though this is not the case for ReconcilerSubset objects.

Type

pandas.Series

find_components(min_weight, clusters, min_cells=2)[source]

Return components from filtered graph which contain specified clusters.

Parameters
min_weight : float

Minimum weight for edges to be kept in graph. Should be over 0.5.

clusters : np.array[int]

Clusters which you’d like to search from.

get_components(min_weight, min_cells=2)[source]

Return connected components of graph, with edges filtered by min_weight.

Parameters
min_weight : float

Minimum weight for edges to be kept in graph. Should be in range [0.5, 1].

min_cells : int (default: 2)

Minimum number of cells in a component.

Returns

All components from Reconciler, sorted by number of clusterings.

Return type

List[Component]
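
The idea of filtering edges by weight and then taking connected components can be sketched without any graph library. The version below uses a small union-find over `(u, v, weight)` tuples. It is an illustration under assumptions: the `components_above` helper is hypothetical, edges are kept when `weight >= min_weight`, `min_cells` is not modelled, and the output is sorted by the number of clusters per component rather than by number of clusterings.

```python
def components_above(edges, min_weight):
    """Group node ids into connected components, using only edges whose
    weight is at least min_weight. `edges` is an iterable of (u, v, weight)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, w in edges:
        # Register both nodes even if the edge itself is dropped.
        root_u = find(u)
        root_v = find(v)
        if w >= min_weight:
            parent[root_u] = root_v

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    # Largest components first.
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical cluster graph: the 0.6 edge is removed by the threshold.
edges = [(1, 2, 0.95), (2, 3, 0.92), (3, 4, 0.6), (4, 5, 0.97)]
comps = components_above(edges, min_weight=0.9)
print(comps)
```

Here clusters 1 to 3 and clusters 4 to 5 end up in separate components because the only edge joining them falls below the threshold.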

Plotting

constclust.plotting.component_param_range(component, x='n_neighbors', y='resolution', ax=None)[source]

Given a component, show which parameters it’s found at as a heatmap.

Parameters
component : Component

The component to plot.

x : str (default: 'n_neighbors')

The parameter for the x axis.

y : str (default: 'resolution')

The parameter to place on the y axis.

ax : Optional[Axis] (default: None)

Optional axis to plot on.

Example

>>> comps = reconciler.get_components(0.9)
>>> plotting.component_param_range(comps[0])

Return type

Axis

constclust.plotting.component(component, adata, x='n_neighbors', y='resolution', embedding_basis='X_umap', plot_global=False, aspect=None, embedding_kwargs={})[source]

Plot stability and embedding for component.

Parameters
component : Component

Component object to plot.

adata : AnnData

AnnData to use for plotting UMAP. Should have the same cell names as the Component’s parent Reconciler.

x : str (default: 'n_neighbors')

Parameter to plot on the X-axis of the heatmap.

y : str (default: 'resolution')

Parameter to plot on the Y-axis of the heatmap.

embedding_basis : str (default: 'X_umap')

Which basis from the AnnData object to use for embedding.

aspect : Optional[float] (default: None)

Aspect ratio of entire plot. Defaults to 1/2.

embedding_kwargs : dict (default: {})

Arguments passed to sc.pl.embedding.