API Reference

class kmapper.KeplerMapper(verbose=0)[source]

With this class you can build topological networks from (high-dimensional) data.

  1. Fit a projection/lens/function to a dataset and transform it. For instance “mean_of_row(x) for x in X”.
  2. Map this projection with overlapping intervals/hypercubes. Cluster the points inside each interval (note: we cluster on the inverse image, i.e. the original data, to lessen projection loss). If two clusters/nodes share members (due to the overlap), connect them with an edge.
  3. Visualize the network using HTML and D3.js.
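The first two steps can be illustrated without the library. The following is a minimal, library-independent sketch in plain Python, not KeplerMapper's implementation. It assumes a row-sum lens, a cover of equal-width overlapping 1-D intervals, and a toy clusterer; `gap_cluster`, `toy_mapper` and the node naming scheme are hypothetical names chosen for this illustration.

```python
from itertools import combinations

def gap_cluster(ids, data, max_gap=1.5):
    """Toy stand-in for a clusterer: sort members by their first coordinate
    and start a new cluster whenever the gap to the previous point exceeds max_gap."""
    ordered = sorted(ids, key=lambda i: data[i][0])
    clusters, current = [], [ordered[0]]
    for prev, cur in zip(ordered, ordered[1:]):
        if data[cur][0] - data[prev][0] > max_gap:
            clusters.append(current)
            current = []
        current.append(cur)
    clusters.append(current)
    return clusters

def toy_mapper(data, n_intervals=3, overlap=0.25):
    # Step 1: project each row to a single number (here: the row sum).
    lens = [sum(row) for row in data]
    lo, hi = min(lens), max(lens)
    width = (hi - lo) / n_intervals
    nodes = {}
    # Step 2: cover the lens with overlapping intervals and cluster the
    # original rows that fall into each interval.
    for k in range(n_intervals):
        start = lo + k * width - overlap * width
        stop = lo + (k + 1) * width + overlap * width
        members = [i for i, value in enumerate(lens) if start <= value <= stop]
        if not members:
            continue
        for c, cluster in enumerate(gap_cluster(members, data)):
            nodes[f"interval{k}_cluster{c}"] = cluster
    # Nodes that share at least one member are joined by an edge.
    links = [(a, b) for a, b in combinations(nodes, 2)
             if set(nodes[a]) & set(nodes[b])]
    return nodes, links

data = [[float(i), 0.0] for i in range(7)]
nodes, links = toy_mapper(data)
print(nodes)
print(links)
```

With this input each interval yields one cluster, and neighbouring intervals are linked because they share exactly one row (rows 2 and 4 respectively).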
KM has a number of nice features, some of which get forgotten.
  • project : For some projections, such as knn_distance_#, it makes sense to use a distance matrix. Use distance_matrix=<metric> for a custom metric.
  • fit_transform : Applies a sequence of projections. Currently, this API is a little confusing and will be changed in the future.
Parameters:
verbose: int, default is 0

Logging level. Currently 3 levels (0,1,2) are supported. For no logging, set verbose=0. For some logging, set verbose=1. For complete logging, set verbose=2.

project(X, projection='sum', scaler=MinMaxScaler(copy=True, feature_range=(0, 1)), distance_matrix=None)[source]

Creates the projection/lens from a dataset. Input the data set. Specify a projection/lens type. Output the projected data/lens.

Parameters:
X : Numpy Array

The data to fit a projection/lens to.

projection :

Projection parameter is either a string, a Scikit-learn class with fit_transform, like manifold.TSNE(), or a list of dimension indices. A string from [“sum”, “mean”, “median”, “max”, “min”, “std”, “dist_mean”, “l2norm”, “knn_distance_n”]. If using knn_distance_n write the number of desired neighbors in place of n: knn_distance_5 for summed distances to 5 nearest neighbors. Default = “sum”.
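As an illustration of what the row-wise string projections mean, here is a minimal sketch in plain Python. It assumes that each string names the corresponding statistic computed over every row and that the lens has one column; it is not the library's code, and `ROW_FUNCS` and `row_lens` are hypothetical names.

```python
import math
import statistics

# Assumed meaning of some projection strings: one statistic per row.
ROW_FUNCS = {
    "sum": sum,
    "mean": statistics.mean,
    "median": statistics.median,
    "max": max,
    "min": min,
    "std": statistics.pstdev,  # population standard deviation (an assumption)
    "l2norm": lambda row: math.sqrt(sum(v * v for v in row)),
}

def row_lens(X, projection="sum"):
    """Return a one-column lens: one value per row of X."""
    func = ROW_FUNCS[projection]
    return [[func(row)] for row in X]

X = [[3.0, 4.0], [1.0, 1.0], [0.0, 2.0]]
lens_sum = row_lens(X, "sum")        # [[7.0], [2.0], [2.0]]
lens_norm = row_lens(X, "l2norm")    # first row has norm 5.0
print(lens_sum, lens_norm)
```

Note that rows 2 and 3 receive the same "sum" value although they are different points, which is the kind of projection loss that clustering on the original data is meant to lessen.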

scaler : Scikit-Learn API compatible scaler.

Scaler applied to the projection before mapping. If None, no scaling is done; otherwise the scaler is applied to the projection. Default: preprocessing.MinMaxScaler() (Min-Max scaling).

distance_matrix : Either str or None

If not None, one of [“braycurtis”, “canberra”, “chebyshev”, “cityblock”, “correlation”, “cosine”, “dice”, “euclidean”, “hamming”, “jaccard”, “kulsinski”, “mahalanobis”, “matching”, “minkowski”, “rogerstanimoto”, “russellrao”, “seuclidean”, “sokalmichener”, “sokalsneath”, “sqeuclidean”, “yule”]. If None, do nothing; otherwise create a square distance matrix with the chosen metric before applying the projection.
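To make the interaction between distance_matrix and a knn_distance_n projection concrete, the sketch below builds a square pairwise distance matrix in plain Python and then sums, for each row, the distances to the n nearest neighbors. It is an illustration rather than the library's code: the helper names are hypothetical, only two of the listed metrics are shown, and leaving out each point's zero distance to itself is an assumption.

```python
import math

# Two of the metric names from the list above, written out by hand.
METRICS = {
    "euclidean": lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))),
    "cityblock": lambda a, b: sum(abs(x - y) for x, y in zip(a, b)),
}

def pairwise_distances(X, metric="euclidean"):
    """Square matrix D with D[i][j] = distance between rows i and j."""
    dist = METRICS[metric]
    return [[dist(a, b) for b in X] for a in X]

def knn_distance(D, n):
    """One-column lens: summed distance to the n nearest other points.
    Assumption: the point itself is not counted as a neighbor."""
    lens = []
    for i, row in enumerate(D):
        others = sorted(d for j, d in enumerate(row) if j != i)
        lens.append([sum(others[:n])])
    return lens

X = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [0.0, 1.0]]
D = pairwise_distances(X, "euclidean")
lens = knn_distance(D, 2)
print(lens[0])  # [6.0]: nearest neighbors of the first point are at distance 1 and 5
```

Points in dense regions get small lens values and isolated points get large ones, which is why this lens is often paired with a distance matrix.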

Returns:
lens : Numpy Array

projected data.

Examples

>>> projected_data = mapper.project(data, projection="sum", scaler=km.preprocessing.MinMaxScaler())

fit_transform(X, projection='sum', scaler=MinMaxScaler(copy=True, feature_range=(0, 1)), distance_matrix=False)[source]

Same as .project() but accepts lists for arguments so you can chain.

map(lens, X=None, clusterer=DBSCAN(algorithm='auto', eps=0.5, leaf_size=30, metric='euclidean', metric_params=None, min_samples=3, n_jobs=None, p=None), cover=<kmapper.cover.Cover object>, nerve=<kmapper.nerve.GraphNerve object>, precomputed=False, overlap_perc=None, nr_cubes=None, coverer=None)[source]

Apply Mapper algorithm on this projection and build a simplicial complex. Returns a dictionary with nodes and links.

Parameters:
lens: Numpy Array

Lower dimensional representation of data. In general will be output of fit_transform.

X: Numpy Array

Original data or data to run clustering on. If None, then use lens as default.

clusterer: Default: DBSCAN

Scikit-learn API compatible clustering algorithm. Must provide fit and predict.

cover: type kmapper.Cover

Cover scheme for lens. Instance of kmapper.cover providing methods define_bins and find_entries.

nerve: kmapper.Nerve

Nerve builder implementing __call__(nodes) API

precomputed : Boolean

Tell Mapper whether the data that you are clustering on is a precomputed distance matrix. If set to True, the assumption is that you are also telling your clusterer that metric='precomputed' (which is an argument for DBSCAN among others), which will then cause the clusterer to expect a square distance matrix for each hypercube. With precomputed=True, Mapper passes a square matrix to the clusterer to fit on for each hypercube.
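The idea of "a square matrix for each hypercube" can be pictured as follows. This is a sketch of the concept only, assuming the members of a hypercube are known as row indices of the full distance matrix; `square_submatrix` is a hypothetical helper, not part of kmapper.

```python
def square_submatrix(D, ids):
    """Keep only the rows and columns of D that belong to the given members."""
    return [[D[i][j] for j in ids] for i in ids]

# Full precomputed distance matrix for four samples.
D = [
    [0.0, 1.0, 4.0, 9.0],
    [1.0, 0.0, 2.0, 6.0],
    [4.0, 2.0, 0.0, 3.0],
    [9.0, 6.0, 3.0, 0.0],
]
cube_members = [0, 2, 3]  # samples that fall into one hypercube
sub = square_submatrix(D, cube_members)
print(sub)  # [[0.0, 4.0, 9.0], [4.0, 0.0, 3.0], [9.0, 3.0, 0.0]]
```

Selecting rows alone would produce a 3 x 4 matrix, which a clusterer configured for precomputed distances could not interpret; both rows and columns have to be restricted to the members.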

nr_cubes: Int (Deprecated)

The number of intervals/hypercubes to create. Default = 10. (DeprecationWarning: define Cover explicitly in future versions)

overlap_perc: Float (Deprecated)

The percentage of overlap “between” the intervals/hypercubes. Default = 0.1. (DeprecationWarning: define Cover explicitly in future versions)

Returns:
simplicial_complex : dict

A dictionary with “nodes”, “links” and “meta” information.

Examples

>>> simplicial_complex = mapper.map(lens, X=None, clusterer=cluster.DBSCAN(eps=0.5,min_samples=3), cover=km.Cover(n_cubes=[10,20], perc_overlap=0.4))

>>> print(simplicial_complex["nodes"])
>>> print(simplicial_complex["links"])
>>> print(simplicial_complex["meta"])

visualize(graph, color_function=None, custom_tooltips=None, custom_meta=None, path_html='mapper_visualization_output.html', title='Kepler Mapper', save_file=True, X=None, X_names=[], lens=None, lens_names=[], show_tooltips=True, nbins=10)[source]

Generate a visualization of the simplicial complex Mapper output. Turns the complex dictionary into an HTML/D3.js visualization.

Parameters:
graph : dict

Simplicial complex output from the map method.

path_html : String

File name for outputting the resulting HTML.

custom_meta: dict

Render (key, value) in the Mapper Summary pane.

custom_tooltips: list or array like

Value to display for each entry in the node. The cluster data pane will display entry for all values in the node. Default is index of data.

save_file: bool, default is True

Save file to path_html.

X: numpy arraylike

If supplied, compute statistics information about the original data source with respect to each node.

X_names: list of strings

Names of each variable in X to be displayed. If None, then display names by index.

lens: numpy arraylike

If supplied, compute statistics of each node based on the projection/lens

lens_names: list of strings

Names of each variable in lens to be displayed. If None, then display names by index.

show_tooltips: bool, default is True.

If False, completely disable tooltips. This is useful when using the output in space-tight pages or when displaying node data in custom ways.

nbins: int, default is 10

Number of bins shown in histogram of tooltip color distributions.

Returns:
html: string

Returns the same html that is normally output to path_html. Complete graph and data ready for viewing.

Examples

>>> mapper.visualize(simplicial_complex, path_html="mapper_visualization_output.html",
                     custom_meta={'Data': 'MNIST handwritten digits',
                                  'Created by': 'Franklin Roosevelt'})

data_from_cluster_id(cluster_id, graph, data)[source]

Returns the original data of each cluster member for a given cluster ID

Parameters:
cluster_id : String

ID of the cluster.

graph : dict

The resulting dictionary after applying map()

data : Numpy Array

Original dataset. Accepts both 1-D and 2-D arrays.

Returns:
entries:

rows of cluster member data as Numpy array.
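A minimal sketch of the lookup this method describes, assuming that graph["nodes"] maps each cluster ID to the list of row indices of its members. The function name, the node IDs, the shape of "links", and returning an empty list for an unknown ID are assumptions made for this illustration; plain lists stand in for Numpy arrays.

```python
def rows_for_cluster(cluster_id, graph, data):
    """Look up a node's member indices and return the matching rows of data."""
    members = graph["nodes"].get(cluster_id, [])
    return [data[i] for i in members]

# Hand-written stand-in for the dictionary returned by map().
graph = {
    "nodes": {"cube0_cluster0": [0, 2], "cube1_cluster0": [2, 3]},
    "links": {"cube0_cluster0": ["cube1_cluster0"]},
    "meta": {},
}
data = [[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [5.9, 3.0]]

rows = rows_for_cluster("cube1_cluster0", graph, data)
print(rows)  # [[6.2, 2.9], [5.9, 3.0]]
```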

Cover Schemes

Cover schemes provide a customizable way of defining a cover for your lens.

class kmapper.cover.Cover(n_cubes=10, perc_overlap=0.2, nr_cubes=None, overlap_perc=None, limits=None)[source]

Helper class that defines the default covering scheme

Parameters:
limits: Numpy Array (n_dim,2)

(lower bound, upper bound) for every dimension, e.g. [[min_1, max_1], [min_2, max_2], [min_3, max_3]]. If a value is set to np.float('inf'), the bound is assumed to be the min/max value of that dimension. If limits is None, the limits are defined by the minimum and maximum values of the lens in all dimensions.

define_bins(data)[source]

Returns an iterable of all bins in the cover.

Warning: This function must assume that the first column of data are indices.

Examples

If there are 4 cubes per dimension and 3 dimensions return the bottom left (origin) coordinates of 64 hypercubes, as a sorted list of Numpy arrays
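The count in this example follows from 4 x 4 x 4 = 64. A minimal sketch of how such origin coordinates could be enumerated, assuming equal-width cubes over known limits; tuples stand in for Numpy arrays and `cube_origins` is a hypothetical name, not the library's method.

```python
from itertools import product

def cube_origins(limits, n_cubes):
    """Return the lower corner of every cube in a regular grid.

    limits: one (lower, upper) pair per dimension.
    n_cubes: number of cubes along each dimension.
    """
    axes = []
    for lower, upper in limits:
        width = (upper - lower) / n_cubes
        axes.append([lower + k * width for k in range(n_cubes)])
    # Cartesian product of the per-dimension starting points, sorted.
    return sorted(product(*axes))

origins = cube_origins([(0.0, 1.0), (0.0, 2.0), (-1.0, 1.0)], n_cubes=4)
print(len(origins))  # 64
print(origins[0])    # (0.0, 0.0, -1.0)
```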

find_entries(data, cube, verbose=0)[source]

Find all entries in data that are in the given cube.

Parameters:
data: Numpy array

Either projected data or original data.

cube:

an item from the list of cubes provided by cover.define_bins iterable.

Returns:
hypercube: Numpy Array

All entries in data that are in cube.
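The selection can be sketched as a simple box test. The code below is an illustration under several assumptions, not the library's implementation: the first column of each row is an index, a cube is identified by its lower corner, each cube is enlarged by perc_overlap times its width, and the lower bound is inclusive while the upper bound is exclusive. `entries_in_cube` and its `widths` argument are hypothetical.

```python
def entries_in_cube(data, lower, widths, perc_overlap=0.2):
    """Return rows whose coordinates fall inside one (enlarged) cube.

    data: rows of the form [index, coord_1, ..., coord_d].
    lower: lower corner of the cube, one value per coordinate.
    widths: cube side length per coordinate, before overlap is added.
    """
    upper = [lo + w * (1.0 + perc_overlap) for lo, w in zip(lower, widths)]
    selected = []
    for row in data:
        coords = row[1:]  # skip the index column
        if all(lo <= c < up for lo, c, up in zip(lower, coords, upper)):
            selected.append(row)
    return selected

data = [[0, 0.05], [1, 0.30], [2, 0.50], [3, 0.90]]
hypercube = entries_in_cube(data, lower=[0.25], widths=[0.25], perc_overlap=0.2)
print(hypercube)  # [[1, 0.3], [2, 0.5]]
```

Row 2 is included only because of the overlap: with perc_overlap=0 the cube would end at 0.5 and that row would belong to the next cube alone.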

class kmapper.cover.CubicalCover(n_cubes=10, perc_overlap=0.2, nr_cubes=None, overlap_perc=None, limits=None)[source]

Explicit definition of a cubical cover as the default behavior of the cover class. This is currently identical to the default cover class.

Nerves

class kmapper.nerve.Nerve[source]

Base class for implementations of a nerve finder to build a Mapper complex.

class kmapper.nerve.GraphNerve(min_intersection=1)[source]

Creates the 1-skeleton of the Mapper complex.

Parameters:
min_intersection: int, default is 1

Minimum intersection considered when computing the nerve. An edge will be created only when the intersection between two nodes is greater than or equal to min_intersection

compute(nodes)[source]

Helper function to find edges of the overlapping clusters.

Parameters:
nodes:

A dictionary with entries {node id}:{list of ids in node}

Returns:
edges:

A 1-skeleton of the nerve (intersecting nodes)

simplices:

Complete list of simplices
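The role of min_intersection can be illustrated with a short, library-independent sketch. It assumes the nodes dictionary described above and links every pair of nodes whose member lists share at least min_intersection entries; the function name and the exact shape of the returned edges and simplices are assumptions, not the library's guaranteed output.

```python
from itertools import combinations

def graph_nerve(nodes, min_intersection=1):
    """Link every pair of nodes sharing at least min_intersection members."""
    edges = {}
    simplices = [[node_id] for node_id in nodes]  # 0-simplices: the nodes
    for a, b in combinations(nodes, 2):
        shared = set(nodes[a]) & set(nodes[b])
        if len(shared) >= min_intersection:
            edges.setdefault(a, []).append(b)
            simplices.append([a, b])              # 1-simplices: the edges
    return edges, simplices

nodes = {"a": [0, 1, 2], "b": [2, 3], "c": [1, 2, 4], "d": [5]}
edges1, simplices1 = graph_nerve(nodes, min_intersection=1)
edges2, _ = graph_nerve(nodes, min_intersection=2)
print(edges1)  # {'a': ['b', 'c'], 'b': ['c']}
print(edges2)  # {'a': ['c']}
```

Raising min_intersection from 1 to 2 drops the edges that rest on a single shared member, leaving only the a-c edge, whose nodes share members 1 and 2.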

class kmapper.nerve.SimplicialNerve[source]

Creates the entire Cech complex of the covering defined by the nodes.

Warning: Not implemented yet.

Adapters

Adapt Mapper format into other common formats.

  • networkx
kmapper.adapter.to_networkx(graph)[source]

Convert a Mapper 1-complex to a networkx graph.

Parameters:
graph: dictionary, graph object returned from `kmapper.map`
Returns:
g: graph as networkx.Graph() object
kmapper.adapter.to_nx(graph)

Convert a Mapper 1-complex to a networkx graph.

Parameters:
graph: dictionary, graph object returned from `kmapper.map`
Returns:
g: graph as networkx.Graph() object

Visuals

kmapper.jupyter.display(path_html='mapper_visualization_output.html')[source]

Displays an HTML file inside a Jupyter Notebook output cell.

Parameters:
path_html : str

Path to html. Use file name for file inside current working directory. Use file:// browser url-format for path to local file. Use https:// urls for externally hosted resources.

Notes

Thanks to https://github.com/smartinsightsfromdata for the issue: https://github.com/MLWave/kepler-mapper/issues/10

kmapper.plotlyviz.plotlyviz(scomplex, colorscale=[[0.0, 'rgb(68, 1, 84)'], [0.1, 'rgb(72, 35, 116)'], [0.2, 'rgb(64, 67, 135)'], [0.3, 'rgb(52, 94, 141)'], [0.4, 'rgb(41, 120, 142)'], [0.5, 'rgb(32, 144, 140)'], [0.6, 'rgb(34, 167, 132)'], [0.7, 'rgb(68, 190, 112)'], [0.8, 'rgb(121, 209, 81)'], [0.9, 'rgb(189, 222, 38)'], [1.0, 'rgb(253, 231, 36)']], title='Kepler Mapper', graph_layout='kk', color_function=None, color_function_name=None, dashboard=False, graph_data=False, factor_size=3, edge_linewidth=1.5, node_linecolor='rgb(200,200,200)', width=600, height=500, bgcolor='rgba(240, 240, 240, 0.95)', left=10, bottom=35, summary_height=300, summary_width=600, summary_left=20, summary_right=20, hist_left=25, hist_right=25, member_textbox_width=800, filename=None)[source]

Visualizations and dashboards for kmapper graphs using Plotly. This method is suitable for use in Jupyter notebooks.

Parameters:
scomplex: dict

Simplicial complex is the output from the KeplerMapper map method.

title: str

Title of output graphic

graph_layout: igraph layout

Recommended: ‘kk’ (Kamada-Kawai) or ‘fr’ (Fruchterman-Reingold).

colorscale:

Plotly colorscale (colormap) used to color graph nodes.

dashboard: bool, default is False

If true, display complete dashboard of node information

graph_data: bool, default is False

If true, display graph metadata

factor_size: double, default is 3

a factor for the node size

edge_linewidth: double, default is 1.5
node_linecolor: color str, default is “rgb(200,200,200)”
width: int, default is 600
height: int, default is 500
bgcolor: color str, default is “rgba(240, 240, 240, 0.95)”
left: int, default is 10
bottom: int, default is 35
summary_height: int, default is 300
summary_width: int, default is 600
summary_left: int, default is 20
summary_right: int, default is 20
hist_left: int, default is 25
hist_right: int, default is 25
member_textbox_width: int, default is 800
filename: str, default is None

If filename is given, the graphic will be saved to that file.