hana_ml.visualizers package

The Visualizers Package consists of the following sections:

hana_ml.visualizers.eda

This module represents an EDA plotter. Matplotlib is used for all visualizations.

class hana_ml.visualizers.eda.EDAVisualizer(ax=None, size=None, cmap=None, title=None)

Bases: hana_ml.visualizers.visualizer_base.Visualizer

Class for all EDA visualizations, including:
  • Distribution plot

  • Pie plot

  • Correlation plot

  • Scatter plot

  • Bar plot

  • Box plot

Parameters
ax : matplotlib.Axes, optional

The axes to use to plot the figure. Default value: Current axes.

size : tuple of integers, optional

(width, height) of the plot in dpi. Default value: Current size of the plot.

title : str, optional

This plot’s title. Default value: Empty str.

Attributes
ax

Returns the matplotlib Axes where the Visualizer will draw.

cmap

Returns the color map being used for the plot.

size

Returns the size of the plot in pixels.

title

Returns the title of the plot.

Methods

bar_plot(self, data, column, aggregation[, …])

Returns a bar plot for the HANA DataFrame column specified.

box_plot(self, data, column[, outliers, …])

Returns a box plot for the HANA DataFrame column specified.

correlation_plot(self, data[, key, …])

Returns a correlation plot for the HANA DataFrame columns specified.

distribution_plot(self, data, column, bins)

Returns a distribution plot for the HANA DataFrame column specified.

pie_plot(self, data, column[, explode, …])

Returns a pie plot for the HANA DataFrame column specified.

scatter_plot(self, data, x, y, x_bins, y_bins)

Returns a scatter plot for the HANA DataFrame columns specified.

set_ax(self, ax)

Sets the Axes

set_cmap(self, cmap)

Sets the colormap

set_size(self, size)

Sets the size

set_title(self, title)

Sets the title of the plot

distribution_plot(self, data, column, bins, title=None, x_axis_fontsize=10, x_axis_rotation=90, debrief=True)

Returns a distribution plot for the HANA DataFrame column specified.

Parameters
data : DataFrame

DataFrame to use for the plot.

column : str

Column in the DataFrame being plotted.

bins : int

Number of bins to create based on the value of column.

title : str, optional

Title for the plot.

x_axis_fontsize : int, optional

Size of x axis labels.

x_axis_rotation : int, optional

Rotation of x axis labels.

debrief : bool, optional

Include skewness debrief.

Returns
ax : Axes

The axes for the plot.

bin_data : pandas.DataFrame

The data used in the plot.
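The binning behind a distribution plot can be illustrated locally. The following is a minimal sketch, assuming equal-width bins between the column minimum and maximum; the helper name `equal_width_bins` is hypothetical and this is not the library's implementation, which works on a HANA DataFrame.

```python
def equal_width_bins(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in values:
        # The maximum value is assigned to the last bin, not to a new one.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


# Hypothetical sample column with 8 values, split into 3 bins.
edges, counts = equal_width_bins([1, 2, 2, 3, 7, 8, 9, 10], bins=3)
print(edges, counts)
```

The per-bin counts play the role that bin_data plays in the return value above: they are the numbers the bars are drawn from.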

pie_plot(self, data, column, explode=0.03, title=None, legend=True)

Returns a pie plot for the HANA DataFrame column specified.

Parameters
data : DataFrame

DataFrame to use for the plot.

column : str

Column in the DataFrame being plotted.

explode : float, optional

Relative spacing between pie segments.

title : str, optional

Title for the plot.

legend : bool, optional

Whether to display a legend for the plot.

Returns
ax : Axes

The axes for the plot. This can be used to set specific properties for the plot.

pie_data : pandas.DataFrame

The data used in the plot.

correlation_plot(self, data, key=None, corr_cols=None, label=True, cmap='RdYlBu')

Returns a correlation plot for the HANA DataFrame columns specified.

Parameters
data : DataFrame

DataFrame to use for the plot.

key : str, optional

Name of ID column.

corr_cols : list of str, optional

Columns in the DataFrame being plotted. If None, all numeric columns will be plotted.

label : bool, optional

Whether to plot a colorbar.

cmap : matplotlib.pyplot.colormap, optional

Color map to use for the plot.

Returns
ax : Axes

The axes for the plot. This can be used to set specific properties for the plot.

corr : pandas.DataFrame

The data used in the plot.
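To make the returned corr data easier to interpret, the sketch below computes a pairwise correlation matrix in plain Python. It assumes Pearson correlation, which the text above does not state explicitly; the helpers `pearson` and `correlation_matrix` and the sample columns are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map each pair of column names to their correlation (assumed Pearson)."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical numeric columns: B rises with A, C falls as A rises.
columns = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1]}
corr = correlation_matrix(columns)
print(corr["A"]["B"], corr["A"]["C"])
```

The matrix is symmetric with ones on the diagonal, so a heat map of it shows each pair twice.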

scatter_plot(self, data, x, y, x_bins, y_bins, title=None, label=True, cmap='Blues', debrief=True, rounding_precision=3)

Returns a scatter plot for the HANA DataFrame columns specified.

Parameters
data : DataFrame

DataFrame to use for the plot.

x : str

Column to be plotted on the x axis.

y : str

Column to be plotted on the y axis.

x_bins : int

Number of x axis bins to create based on the value of column.

y_bins : int

Number of y axis bins to create based on the value of column.

title : str, optional

Title for the plot.

label : str, optional

Label for the color bar.

cmap : matplotlib.pyplot.colormap, optional

Color map to use for the plot.

debrief : bool, optional

Include correlation debrief.

rounding_precision : int, optional

The rounding precision for bin size.

Returns
ax : Axes

The axes for the plot.

bin_matrix : pandas.DataFrame

The data used in the plot.

bar_plot(self, data, column, aggregation, title=None)

Returns a bar plot for the HANA DataFrame column specified.

Parameters
data : DataFrame

DataFrame to use for the plot.

column : str

Column to be aggregated.

aggregation : dict

Aggregation conditions (‘avg’, ‘count’, ‘max’, ‘min’).

title : str, optional

Title for the plot.

Returns
ax : Axes

The axes for the plot.

bar_data : pandas.DataFrame

The data used in the plot.

Examples

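>>> import matplotlib.pyplot as plt
>>> from hana_ml.visualizers.eda import EDAVisualizer
>>> # data is assumed to be an existing HANA DataFrame
>>> f = plt.figure()
>>> ax1 = f.add_subplot(111)
>>> eda = EDAVisualizer(ax1)
>>> ax, bar_data = eda.bar_plot(data=data, column='COLUMN',
...                             aggregation={'COLUMN':'count'})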

Returns : bar plot (count) of ‘COLUMN’

>>> ax1 = f.add_subplot(111)
>>> eda = EDAVisualizer(ax1)
>>> ax, bar_data = eda.bar_plot(data=data, column='COLUMN',
...                             aggregation={'OTHER_COLUMN':'avg'})

Returns : bar plot (avg) of ‘COLUMN’ against ‘OTHER_COLUMN’
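The two aggregation forms above can be mimicked on local data. This is a rough sketch, assuming the dict key names the column being aggregated and the value names the function, as in the examples; the helper `aggregate` and the sample rows are hypothetical and do not touch HANA.

```python
from collections import defaultdict


def aggregate(rows, column, aggregation):
    """Group rows by `column` and reduce the target column named in `aggregation`."""
    # The dict is expected to hold exactly one {target_column: function} pair.
    (target, func), = aggregation.items()
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[target])
    reducers = {
        "count": len,
        "avg": lambda vals: sum(vals) / len(vals),
        "max": max,
        "min": min,
    }
    return {key: reducers[func](vals) for key, vals in groups.items()}


# Hypothetical rows standing in for a table with CITY and PRICE columns.
rows = [
    {"CITY": "Berlin", "PRICE": 10.0},
    {"CITY": "Berlin", "PRICE": 20.0},
    {"CITY": "Paris", "PRICE": 30.0},
]
counts = aggregate(rows, "CITY", {"CITY": "count"})
averages = aggregate(rows, "CITY", {"PRICE": "avg"})
print(counts, averages)
```

Each key of the result corresponds to one bar, and its value to the bar height.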

box_plot(self, data, column, outliers=False, title=None, groupby=None)

Returns a box plot for the HANA DataFrame column specified.

Parameters
data : DataFrame

DataFrame to use for the plot.

column : str

Column in the DataFrame being plotted.

outliers : bool, optional

Whether to plot suspected outliers and outliers.

title : str, optional

Title for the plot.

groupby : str, optional

Column to group by and compare.

Returns
ax : Axes

The axes for the plot.

cont : pandas.DataFrame

The data used in the plot.
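The distinction between suspected outliers and outliers can be illustrated with quartile fences. The sketch below assumes the common Tukey convention (beyond 1.5 times the interquartile range for suspected outliers, beyond 3 times for outliers); the text above does not confirm which thresholds the library uses, and `box_stats` is a hypothetical helper.

```python
from statistics import quantiles


def box_stats(values):
    """Quartiles plus suspected outliers and outliers under assumed Tukey fences."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)  # assumed fence for suspected outliers
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)  # assumed fence for outliers
    suspected = [
        v for v in values
        if not (inner[0] <= v <= inner[1]) and (outer[0] <= v <= outer[1])
    ]
    outliers = [v for v in values if not (outer[0] <= v <= outer[1])]
    return {"q1": q1, "median": median, "q3": q3,
            "suspected": suspected, "outliers": outliers}


# Hypothetical sample: nine regular values and two increasingly extreme ones.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 40])
print(stats)
```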

ax

Returns the matplotlib Axes where the Visualizer will draw.

cmap

Returns the color map being used for the plot.

set_ax(self, ax)

Sets the Axes

set_cmap(self, cmap)

Sets the colormap

set_size(self, size)

Sets the size

set_title(self, title)

Sets the title of the plot

size

Returns the size of the plot in pixels.

title

Returns the title of the plot.

class hana_ml.visualizers.eda.Profiler

Bases: object

Class to build a HANA Profiler, including:
  • Variable descriptions

  • Missing values %

  • High cardinality %

  • Skewness

  • Numeric distributions

  • Categorical distributions

  • Correlations

  • High correlation warnings

Methods

description(self, data, key[, bins, …])

Returns a HANA profiler, including: variable descriptions, missing values %, high cardinality %, skewness, numeric distributions, categorical distributions, correlations, and high correlation warnings.

set_size(self, fig, figsize)

Set the size of the data description plot, in inches.

description(self, data, key, bins=20, missing_threshold=10, card_threshold=100, skew_threshold=0.5, figsize=None)

Returns a HANA profiler, including:
  • Variable descriptions

  • Missing values %

  • High cardinality %

  • Skewness

  • Numeric distributions

  • Categorical distributions

  • Correlations

  • High correlation warnings

Parameters
data : DataFrame

DataFrame to use for the plot.

key : str, optional

Key in the DataFrame.

bins : int, optional

Number of bins for numeric distributions. Default value = 20.

missing_threshold : float

Percentage threshold to display missing values.

card_threshold : int

Threshold for column to be considered with high cardinality.

skew_threshold : float

Absolute value threshold for column to be considered as highly skewed.

tight_layout : bool, optional

Use matplotlib tight layout or not.

figsize : tuple, optional

Size of figure to be plotted. First element is width, second is height.

Note: categorical columns with cardinality warnings are not plotted.
Returns
fig : Figure

The matplotlib figure of the profiler.
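How the three thresholds interact can be sketched on a single local column. This is an illustration under stated assumptions: a warning is raised when a value strictly exceeds its threshold, and skewness is the standardized third moment. Neither detail is confirmed above, and `profile_column` is a hypothetical helper.

```python
def profile_column(values, missing_threshold=10, card_threshold=100,
                   skew_threshold=0.5):
    """Return the missing-value percentage and the warnings raised for one column."""
    present = [v for v in values if v is not None]
    missing_pct = 100.0 * (len(values) - len(present)) / len(values)
    warnings = []
    if missing_pct > missing_threshold:
        warnings.append("missing")
    if len(set(present)) > card_threshold:
        warnings.append("high cardinality")
    if len(present) > 2 and all(isinstance(v, (int, float)) for v in present):
        n = len(present)
        mean = sum(present) / n
        m2 = sum((v - mean) ** 2 for v in present) / n
        m3 = sum((v - mean) ** 3 for v in present) / n
        # Standardized third moment; 0.0 for a constant column.
        skew = m3 / m2 ** 1.5 if m2 else 0.0
        if abs(skew) > skew_threshold:
            warnings.append("skewed")
    return missing_pct, warnings


# Hypothetical columns: one with a gap and a long right tail, one symmetric.
pct_a, warn_a = profile_column([1, 1, 1, 2, None, 50])
pct_b, warn_b = profile_column([1, 2, 3, 4, 5])
print(pct_a, warn_a, pct_b, warn_b)
```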

set_size(self, fig, figsize)

Set the size of the data description plot, in inches.

Parameters
fig : ax

The returned axes constructed by the description method.

figsize : tuple

Tuple of width and height for the plot.

hana_ml.visualizers.metrics

This module represents a visualizer for metrics.

class hana_ml.visualizers.metrics.MetricsVisualizer(ax=None, size=None, cmap=None, title=None)

Bases: hana_ml.visualizers.visualizer_base.Visualizer, object

The MetricsVisualizer is used to visualize metrics.

Parameters
ax : matplotlib.Axes, optional

The axes to use to plot the figure. Default value: Current axes.

size : tuple of integers, optional

(width, height) of the plot in dpi. Default value: Current size of the plot.

title : str, optional

This plot’s title. Default value: Empty str.

Attributes
ax

Returns the matplotlib Axes where the Visualizer will draw.

cmap

Returns the color map being used for the plot.

size

Returns the size of the plot in pixels.

title

Returns the title of the plot.

Methods

plot_confusion_matrix(self, df[, normalize])

This function plots the confusion matrix and returns the Axes where this is drawn.

set_ax(self, ax)

Sets the Axes

set_cmap(self, cmap)

Sets the colormap

set_size(self, size)

Sets the size

set_title(self, title)

Sets the title of the plot

plot_confusion_matrix(self, df, normalize=False)

This function plots the confusion matrix and returns the Axes where this is drawn.

Parameters
df : DataFrame

Data points for the resulting confusion matrix. The columns of this DataFrame should match (‘CLASS’, ‘’).
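The effect of a normalize option on a confusion matrix can be shown on a small local example. The sketch assumes normalization divides each row (actual class) by its total, which is a common convention but is not confirmed above; `normalize_rows` and the sample counts are hypothetical.

```python
def normalize_rows(matrix):
    """Divide every row by its sum so each row adds up to 1 (assumed convention)."""
    normalized = []
    for row in matrix:
        total = sum(row)
        # Leave an all-zero row as zeros instead of dividing by zero.
        normalized.append([cell / total if total else 0.0 for cell in row])
    return normalized


# Hypothetical 2-class counts: rows are actual classes, columns are predictions.
matrix = [[8, 2], [1, 9]]
normalized = normalize_rows(matrix)
print(normalized)
```

With row-wise normalization the diagonal reads as per-class recall, which makes classes of different sizes comparable.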

ax

Returns the matplotlib Axes where the Visualizer will draw.

cmap

Returns the color map being used for the plot.

set_ax(self, ax)

Sets the Axes

set_cmap(self, cmap)

Sets the colormap

set_size(self, size)

Sets the size

set_title(self, title)

Sets the title of the plot

size

Returns the size of the plot in pixels.

title

Returns the title of the plot.

hana_ml.visualizers.m4_sampling

M4 algorithm for sampling query

hana_ml.visualizers.m4_sampling.get_min_index(data)

Get Minimum Timestamp of Time Series Data

Parameters
data : DataFrame

Time series data whose 1st column is index and 2nd one is value.

Returns
datetime

Return the minimum timestamp.

hana_ml.visualizers.m4_sampling.get_max_index(data)

Get Maximum Timestamp of Time Series Data

Parameters
data : DataFrame

Time series data whose 1st column is index and 2nd one is value.

Returns
datetime

Return the maximum timestamp.

hana_ml.visualizers.m4_sampling.m4_sampling(data, width)

M4 algorithm for big data visualization

Parameters
data : DataFrame

Data to be sampled. Time series data whose 1st column is index and 2nd one is value.

width : int

Sampling rate. It indicates how many pixels are in the picture.

Returns
DataFrame

Return the sampled dataframe.
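The idea behind M4 can be sketched in plain Python: split the time range into width equal intervals (one per pixel column) and keep, for each interval, the first, last, minimum-value and maximum-value points. The sketch assumes numeric timestamps and an in-memory list of tuples; `m4_sample` is a hypothetical helper, not the function documented above, which operates on a HANA DataFrame.

```python
def m4_sample(points, width):
    """Keep first, last, min and max point per pixel column; return them sorted by time."""
    t_min = min(t for t, _ in points)
    t_max = max(t for t, _ in points)
    span = (t_max - t_min) or 1  # avoid division by zero for a single timestamp
    groups = {}
    for t, v in points:
        # Pixel column index; the latest timestamp falls into the last column.
        col = min(int(width * (t - t_min) / span), width - 1)
        groups.setdefault(col, []).append((t, v))
    kept = set()
    for bucket in groups.values():
        kept.add(min(bucket, key=lambda p: p[0]))  # first point in the column
        kept.add(max(bucket, key=lambda p: p[0]))  # last point in the column
        kept.add(min(bucket, key=lambda p: p[1]))  # lowest value in the column
        kept.add(max(bucket, key=lambda p: p[1]))  # highest value in the column
    return sorted(kept)


# Hypothetical series of 20 points reduced to 2 pixel columns.
points = [(t, (t * 7) % 10) for t in range(20)]
sampled = m4_sample(points, width=2)
print(sampled)
```

At most four points survive per pixel column, so the output size depends on width rather than on the length of the series.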

hana_ml.visualizers.model_debriefing

class hana_ml.visualizers.model_debriefing.TreeModelDebriefing

Bases: object

Visualize tree model.

Methods

tree_debrief(self, model)

Visualize tree model by data in JSON or XML format.

tree_debrief_from_file(path)

Visualize tree model by a DOT, JSON or XML file.

tree_debrief_with_dot(self, model)

Visualize tree model by data in DOT format.

tree_export(self, model)

Export tree model as a JSON or XML file.

tree_export_with_dot(self, model)

Export tree model as a DOT file.

tree_parse(self, model)

Transform tree model content using DOT language.

tree_debrief(self, model)

Visualize tree model by data in JSON or XML format.

Parameters
model : DataFrame

Tree model.

Returns
JSON or XML Component

This object can be rendered by a browser.

tree_debrief_with_dot(self, model)

Visualize tree model by data in DOT format.

Parameters
model : DataFrame

Tree model.

Returns
SVG Component

This object can be rendered by a browser.

tree_parse(self, model)

Transform tree model content using DOT language.

Parameters
model : DataFrame

Tree model.
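For readers unfamiliar with the DOT language, the sketch below turns a small nested dictionary into DOT text. The node layout (a "label" key and an optional "children" list) is purely hypothetical and does not reflect the format of the stored tree model; `tree_to_dot` is an illustrative helper, not part of the class.

```python
def tree_to_dot(root):
    """Render a nested {'label': ..., 'children': [...]} dict as a DOT digraph."""
    lines = ["digraph Tree {"]
    stack = [(root, None)]
    next_id = 0
    while stack:
        node, parent_id = stack.pop()
        node_id = next_id
        next_id += 1
        lines.append(f'  n{node_id} [label="{node["label"]}"];')
        if parent_id is not None:
            lines.append(f"  n{parent_id} -> n{node_id};")
        # Push children in reverse so they are visited left to right.
        for child in reversed(node.get("children", [])):
            stack.append((child, node_id))
    lines.append("}")
    return "\n".join(lines)


# Hypothetical one-split tree.
tree = {"label": "AGE < 30", "children": [{"label": "YES"}, {"label": "NO"}]}
dot = tree_to_dot(tree)
print(dot)
```

Text in this form can be passed to Graphviz to produce an SVG image.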

tree_export(self, model)

Export tree model as a JSON or XML file.

Parameters
model : DataFrame

Tree model.

Returns
Interactive Text and Button Widgets

These widgets can be rendered by a browser.

tree_export_with_dot(self, model)

Export tree model as a DOT file.

Parameters
model : DataFrame

Tree model.

Returns
Interactive Text and Button Widgets

These widgets can be rendered by a browser.

static tree_debrief_from_file(path)

Visualize tree model by a DOT, JSON or XML file.

Parameters
path : str

File path.

Returns
SVG, JSON or XML Component

This object can be rendered by a browser.