API Reference

class kmapper.KeplerMapper(verbose=0)
With this class you can build topological networks from (high-dimensional) data.
 Fit a projection/lens/function to a dataset and transform it. For instance, "mean_of_row(x) for x in X".
 Map this projection with overlapping intervals/hypercubes. Cluster the points inside each interval (note: we cluster on the inverse image/original data to lessen projection loss). If two clusters/nodes share members (due to the overlap), connect them with an edge.
 Visualize the network using HTML and D3.js.
 KM has a number of nice features, some of which get forgotten.
 project : For some projections it makes sense to use a distance matrix, such as knn_distance_#. Use distance_matrix = <metric> for a custom metric.
 fit_transform : Applies a sequence of projections. Currently, this API is a little confusing and will be changed in the future.
Parameters:  verbose: int, default is 0
Logging level. Currently 3 levels (0,1,2) are supported. For no logging, set verbose=0. For some logging, set verbose=1. For complete logging, set verbose=2.
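
The steps in the class description (project, cover, cluster, connect) can be sketched in a few lines of plain Python. This is only a toy illustration under stated assumptions, not KeplerMapper's implementation: the lens is the row sum, the cover is a set of equal 1-D intervals widened by a fraction of their width, and clustering is a hypothetical single-linkage helper run on the original points. The names toy_mapper, single_linkage, overlap and eps are invented for this sketch.

```python
import math
from itertools import combinations


def single_linkage(points, idx, eps):
    """Group the indices in idx into connected components of the eps-neighbour graph."""
    clusters, unseen = [], list(idx)
    while unseen:
        frontier, component = [unseen.pop()], []
        while frontier:
            j = frontier.pop()
            component.append(j)
            near = [k for k in unseen if math.dist(points[j], points[k]) <= eps]
            for k in near:
                unseen.remove(k)
            frontier.extend(near)
        clusters.append(sorted(component))
    return clusters


def toy_mapper(X, n_intervals=3, overlap=0.3, eps=1.5):
    """Toy Mapper: row-sum lens, overlapping 1-D intervals, clustering on original data."""
    lens = [sum(row) for row in X]                      # step 1: project
    lo, hi = min(lens), max(lens)
    width = (hi - lo) / n_intervals
    nodes = {}
    for i in range(n_intervals):                        # step 2: cover the lens
        a = lo + (i - overlap) * width
        b = lo + (i + 1 + overlap) * width
        idx = [j for j, v in enumerate(lens) if a <= v <= b]
        # cluster the original points whose lens value falls in this interval
        for k, members in enumerate(sorted(single_linkage(X, idx, eps))):
            nodes[f"cube{i}_cluster{k}"] = members
    # connect clusters that share at least one member
    links = [(u, v) for u, v in combinations(nodes, 2)
             if set(nodes[u]) & set(nodes[v])]
    return nodes, links


X = [[0, 0], [1, 0], [2, 0], [3, 0], [4, 0], [5, 0], [6, 0]]
nodes, links = toy_mapper(X)
```

For seven evenly spaced points on a line, the sketch yields three nodes joined in a chain, because neighbouring intervals share one point each.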

project(X, projection='sum', scaler=MinMaxScaler(copy=True, feature_range=(0, 1)), distance_matrix=None)
Creates the projection/lens from a dataset. Input the data set. Specify a projection/lens type. Output the projected data/lens.
Parameters:  X : Numpy Array
The data to fit a projection/lens to.
 projection :
Projection parameter is either a string, a scikit-learn class with fit_transform, like manifold.TSNE(), or a list of dimension indices. A string from ["sum", "mean", "median", "max", "min", "std", "dist_mean", "l2norm", "knn_distance_n"]. If using knn_distance_n, write the number of desired neighbors in place of n: knn_distance_5 for summed distances to the 5 nearest neighbors. Default = "sum".
 scaler : scikit-learn API compatible scaler.
Scaler applied to the projection before mapping. Use None for no scaling. Default = preprocessing.MinMaxScaler().
 distance_matrix : Either str or None
If not None, then any of ["braycurtis", "canberra", "chebyshev", "cityblock", "correlation", "cosine", "dice", "euclidean", "hamming", "jaccard", "kulsinski", "mahalanobis", "matching", "minkowski", "rogerstanimoto", "russellrao", "seuclidean", "sokalmichener", "sokalsneath", "sqeuclidean", "yule"]. If None (or False), do nothing; otherwise create a square pairwise distance matrix with the chosen metric before applying the projection.
Returns:  lens : Numpy Array
projected data.
Examples
>>> projected_data = mapper.project(data, projection="sum", scaler=km.preprocessing.MinMaxScaler() )
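
As a rough illustration of what the string projections mean, the sketch below applies a row-wise statistic and then min-max scales the result to [0, 1]. It assumes each named lens is computed per row of X and uses plain Python lists so it runs without dependencies; project_rows and ROW_LENSES are hypothetical names, and the distance-based lenses ("dist_mean", "knn_distance_n") are left out.

```python
import math
import statistics

# Assumed row-wise meaning of the simple string lenses.
ROW_LENSES = {
    "sum": sum,
    "mean": statistics.mean,
    "median": statistics.median,
    "max": max,
    "min": min,
    "std": statistics.pstdev,  # population standard deviation
    "l2norm": lambda row: math.sqrt(sum(v * v for v in row)),
}


def project_rows(X, projection="sum", scale=True):
    """Apply a row-wise statistic, then optionally min-max scale to [0, 1]."""
    lens = [ROW_LENSES[projection](row) for row in X]
    if scale:
        lo, hi = min(lens), max(lens)
        span = (hi - lo) or 1.0  # avoid dividing by zero for a constant lens
        lens = [(v - lo) / span for v in lens]
    return lens


X = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
lens = project_rows(X, projection="sum")
```

Here the row sums 6, 15 and 24 are rescaled so that the smallest maps to 0 and the largest to 1.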

fit_transform(X, projection='sum', scaler=MinMaxScaler(copy=True, feature_range=(0, 1)), distance_matrix=False)
Same as .project() but accepts lists for arguments so you can chain.

map(lens, X=None, clusterer=DBSCAN(algorithm='auto', eps=0.5, leaf_size=30, metric='euclidean', metric_params=None, min_samples=3, n_jobs=None, p=None), cover=<kmapper.cover.Cover object>, nerve=<kmapper.nerve.GraphNerve object>, precomputed=False, overlap_perc=None, nr_cubes=None, coverer=None)
Apply Mapper algorithm on this projection and build a simplicial complex. Returns a dictionary with nodes and links.
Parameters:  lens: Numpy Array
Lower-dimensional representation of the data. In general, this will be the output of fit_transform.
 X: Numpy Array
Original data or data to run clustering on. If None, then use lens as default.
 clusterer: Default: DBSCAN
scikit-learn API compatible clustering algorithm. Must provide fit and predict.
 cover: type kmapper.Cover
Cover scheme for lens. An instance of a class from kmapper.cover providing the methods define_bins and find_entries.
 nerve: kmapper.Nerve
Nerve builder implementing __call__(nodes) API
 precomputed : Boolean
Tell Mapper whether the data that you are clustering on is a precomputed distance matrix. If set to True, the assumption is that you are also telling your clusterer that metric='precomputed' (which is an argument for DBSCAN among others), which will then cause the clusterer to expect a square distance matrix. With precomputed=True, Mapper gives the clusterer a square matrix to fit on for each hypercube.
 nr_cubes: Int (Deprecated)
The number of intervals/hypercubes to create. Default = 10. (DeprecationWarning: define Cover explicitly in future versions)
 overlap_perc: Float (Deprecated)
The percentage of overlap “between” the intervals/hypercubes. Default = 0.1. (DeprecationWarning: define Cover explicitly in future versions)
Returns:  simplicial_complex : dict
A dictionary with “nodes”, “links” and “meta” information.
Examples
>>> simplicial_complex = mapper.map(lens, X=None, clusterer=cluster.DBSCAN(eps=0.5,min_samples=3), cover=km.Cover(n_cubes=[10,20], perc_overlap=0.4))
>>> print(simplicial_complex["nodes"])
>>> print(simplicial_complex["links"])
>>> print(simplicial_complex["meta"])
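
The precomputed option says the clusterer receives a square matrix for each hypercube. The sketch below illustrates that idea only; it is not the library's code. It assumes a hypercube is described by the list of row indices that fall inside it and extracts the matching rows and columns of a full pairwise distance matrix. The names submatrix and cube_members are hypothetical.

```python
def submatrix(dist, members):
    """Return the square block of dist restricted to the rows/columns in members."""
    return [[dist[i][j] for j in members] for i in members]


# Pairwise distances between four points on a line at positions 0, 1, 3, 7.
points = [0, 1, 3, 7]
dist = [[abs(a - b) for b in points] for a in points]

cube_members = [1, 2, 3]  # hypothetical indices falling in one hypercube
block = submatrix(dist, cube_members)
```

The resulting block is again square and symmetric with a zero diagonal, which is the shape a clusterer configured with metric='precomputed' expects.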

visualize(graph, color_function=None, custom_tooltips=None, custom_meta=None, path_html='mapper_visualization_output.html', title='Kepler Mapper', save_file=True, X=None, X_names=[], lens=None, lens_names=[], show_tooltips=True, nbins=10)
Generate a visualization of the simplicial complex mapper output. Turns the complex dictionary into an HTML/D3.js visualization.
Parameters:  graph : dict
Simplicial complex output from the map method.
 path_html : String
File name for outputting the resulting HTML.
 custom_meta: dict
Render (key, value) in the Mapper Summary pane.
 custom_tooltips: list or array like
Value to display for each entry in the node. The cluster data pane will display an entry for every value in the node. Default is the index of the data.
 save_file: bool, default is True
Save file to path_html.
 X: numpy array-like
If supplied, compute statistics information about the original data source with respect to each node.
 X_names: list of strings
Names of each variable in X to be displayed. If None, then display names by index.
 lens: numpy array-like
If supplied, compute statistics of each node based on the projection/lens.
 lens_names: list of strings
Names of each variable in lens to be displayed. If None, then display names by index.
 show_tooltips: bool, default is True.
If false, completely disable tooltips. This is useful when using the output in space-tight pages or when displaying node data in custom ways.
 nbins: int, default is 10
Number of bins shown in histogram of tooltip color distributions.
Returns:  html: string
Returns the same html that is normally output to path_html. Complete graph and data ready for viewing.
Examples
>>> mapper.visualize(simplicial_complex, path_html="mapper_visualization_output.html", custom_meta={'Data': 'MNIST handwritten digits', 'Created by': 'Franklin Roosevelt' }, )

data_from_cluster_id(cluster_id, graph, data)
Returns the original data of each cluster member for a given cluster ID.
Parameters:  cluster_id : String
ID of the cluster.
 graph : dict
The resulting dictionary after applying map()
 data : Numpy Array
Original dataset. Accepts both 1D and 2D array.
Returns:  entries:
rows of cluster member data as Numpy array.
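
A minimal sketch of this lookup, assuming that graph["nodes"] maps each cluster ID to a list of row indices into the original data (the documentation above only states that the dictionary has "nodes", "links" and "meta" keys). rows_for_cluster is a hypothetical stand-in written with plain lists, not the library function.

```python
def rows_for_cluster(cluster_id, graph, data):
    """Look up the member indices for cluster_id and return the matching rows."""
    members = graph["nodes"].get(cluster_id, [])
    return [data[i] for i in members]


# Hand-built graph in the assumed shape: node ID -> list of row indices.
graph = {"nodes": {"cube0_cluster0": [0, 2]}, "links": {}, "meta": {}}
data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

rows = rows_for_cluster("cube0_cluster0", graph, data)
```

An unknown cluster ID returns an empty list in this sketch; the real method may behave differently.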
Cover Schemes
Cover schemes provide a customizable way of defining a cover for your lens.

class kmapper.cover.Cover(n_cubes=10, perc_overlap=0.2, nr_cubes=None, overlap_perc=None, limits=None)
Helper class that defines the default covering scheme.
Parameters:  limits: Numpy Array (n_dim,2)
(lower bound, upper bound) for every dimension. If a value is set to float('inf'), the bound will be assumed to be the min/max value of the dimension. If limits is None, the limits are defined by the minimum and maximum values of the lens for all dimensions, i.e. [[min_1, max_1], [min_2, max_2], [min_3, max_3]].

define_bins(data)
Returns an iterable of all bins in the cover.
Warning: This function must assume that the first column of data contains indices.
Examples
If there are 4 cubes per dimension and 3 dimensions, return the bottom-left (origin) coordinates of the 64 hypercubes as a sorted list of Numpy arrays.
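
The 4 x 4 x 4 case in the example can be reproduced with a short sketch. It assumes equal-width cells per dimension and ignores overlap, which only affects the cube side lengths and not the origins in this simplified picture; cube_origins is a hypothetical helper that returns tuples rather than Numpy arrays.

```python
from itertools import product


def cube_origins(limits, n_cubes):
    """Lower-left corner of every hypercube for n_cubes equal cells per dimension."""
    axes = []
    for lo, hi in limits:
        step = (hi - lo) / n_cubes
        axes.append([lo + k * step for k in range(n_cubes)])
    # Cartesian product of the per-dimension starting points, sorted.
    return sorted(product(*axes))


origins = cube_origins([(0.0, 1.0)] * 3, n_cubes=4)
```

With three dimensions on [0, 1] this gives 64 origins, running from (0.0, 0.0, 0.0) to (0.75, 0.75, 0.75).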

find_entries(data, cube, verbose=0)
Find all entries in data that are in the given cube.
Parameters:  data: Numpy array
Either projected data or original data.
 cube:
An item from the iterable of cubes provided by cover.define_bins.
Returns:  hypercube: Numpy Array
All entries in data that are in cube.
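
A sketch of the membership test, assuming a cube is described by its lower corner and its side lengths, and that the first column of data holds row indices as the define_bins warning requires. entries_in_cube, lower and sides are hypothetical names; the library's cube representation may differ.

```python
def entries_in_cube(data, lower, sides):
    """Rows of data (first column = index) whose other columns lie inside the cube."""
    return [
        row for row in data
        if all(lo <= v <= lo + s for v, lo, s in zip(row[1:], lower, sides))
    ]


data = [
    [0, 0.1, 0.2],   # index 0, inside the cube
    [1, 0.6, 0.1],   # index 1, first coordinate is outside
    [2, 0.4, 0.45],  # index 2, inside the cube
]
inside = entries_in_cube(data, lower=(0.0, 0.0), sides=(0.5, 0.5))
```

Keeping the index column means the selected rows can later be traced back to the original data set.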
Nerves

class kmapper.nerve.Nerve
Base class for implementations of a nerve finder to build a Mapper complex.

class kmapper.nerve.GraphNerve(min_intersection=1)
Creates the 1-skeleton of the Mapper complex.
Parameters:  min_intersection: int, default is 1
Minimum intersection considered when computing the nerve. An edge will be created only when the intersection between two nodes is greater than or equal to min_intersection.
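
The effect of min_intersection can be sketched as follows. This is an illustrative reimplementation of the rule stated above, assuming nodes are given as a dictionary from node ID to member indices; graph_nerve is a hypothetical function, not the GraphNerve class itself.

```python
from itertools import combinations


def graph_nerve(nodes, min_intersection=1):
    """Return edges between nodes that share at least min_intersection members."""
    edges = []
    for a, b in combinations(sorted(nodes), 2):
        shared = set(nodes[a]) & set(nodes[b])
        if len(shared) >= min_intersection:
            edges.append((a, b))
    return edges


nodes = {"a": [0, 1, 2], "b": [2, 3], "c": [1, 2, 5]}
edges_default = graph_nerve(nodes)                     # any shared member
edges_strict = graph_nerve(nodes, min_intersection=2)  # at least two shared members
```

Raising min_intersection prunes edges that rest on a single shared point: here all three pairs are linked by default, but only "a" and "c" share two members.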
Adapters
Adapt Mapper format into other common formats.
 networkx

kmapper.adapter.to_networkx(graph)
Convert a Mapper 1-complex to a networkx graph.
Parameters:  graph: dictionary, graph object returned from `kmapper.map`
Returns:  g: graph as networkx.Graph() object

kmapper.adapter.to_nx(graph)
Convert a Mapper 1-complex to a networkx graph.
Parameters:  graph: dictionary, graph object returned from `kmapper.map`
Returns:  g: graph as networkx.Graph() object
Visuals

kmapper.jupyter.display(path_html='mapper_visualization_output.html')
Displays an HTML file inside a Jupyter Notebook output cell.
Parameters:  path_html : str
Path to the HTML file. Use a file name for a file inside the current working directory. Use the file:// browser URL format for a path to a local file. Use https:// URLs for externally hosted resources.
Notes
Thanks to https://github.com/smartinsightsfromdata for the issue: https://github.com/MLWave/keplermapper/issues/10

kmapper.plotlyviz.plotlyviz(scomplex, colorscale=[[0.0, 'rgb(68, 1, 84)'], [0.1, 'rgb(72, 35, 116)'], [0.2, 'rgb(64, 67, 135)'], [0.3, 'rgb(52, 94, 141)'], [0.4, 'rgb(41, 120, 142)'], [0.5, 'rgb(32, 144, 140)'], [0.6, 'rgb(34, 167, 132)'], [0.7, 'rgb(68, 190, 112)'], [0.8, 'rgb(121, 209, 81)'], [0.9, 'rgb(189, 222, 38)'], [1.0, 'rgb(253, 231, 36)']], title='Kepler Mapper', graph_layout='kk', color_function=None, color_function_name=None, dashboard=False, graph_data=False, factor_size=3, edge_linewidth=1.5, node_linecolor='rgb(200,200,200)', width=600, height=500, bgcolor='rgba(240, 240, 240, 0.95)', left=10, bottom=35, summary_height=300, summary_width=600, summary_left=20, summary_right=20, hist_left=25, hist_right=25, member_textbox_width=800, filename=None)
Visualizations and dashboards for kmapper graphs using Plotly. This method is suitable for use in Jupyter notebooks.
Parameters:  scomplex: dict
Simplicial complex is the output from the KeplerMapper map method.
 title: str
Title of output graphic
 graph_layout: igraph layout
Recommended: 'kk' (kamada-kawai) or 'fr' (fruchterman-reingold).
 colorscale:
Plotly colorscale (colormap) used to color graph nodes.
 dashboard: bool, default is False
If true, display complete dashboard of node information
 graph_data: bool, default is False
If true, display graph metadata
 factor_size: double, default is 3
a factor for the node size
 edge_linewidth : double, default is 1.5
 node_linecolor: color str, default is “rgb(200,200,200)”
 width: int, default is 600
 height: int, default is 500
 bgcolor: color str, default is "rgba(240, 240, 240, 0.95)"
 left: int, default is 10
 bottom: int, default is 35
 summary_height: int, default is 300
 summary_width: int, default is 600
 summary_left: int, default is 20
 summary_right: int, default is 20
 hist_left: int, default is 25
 hist_right: int, default is 25
 member_textbox_width: int, default is 800
 filename: str, default is None
If filename is given, the graphic will be saved to that file.