hana_ml.visualizers package
The Visualizers Package consists of the following sections:
hana_ml.visualizers.eda
This module provides an EDA (exploratory data analysis) plotter. Matplotlib is used for all visualizations.
-
class hana_ml.visualizers.eda.EDAVisualizer(ax=None, size=None, cmap=None, title=None)
Bases: hana_ml.visualizers.visualizer_base.Visualizer
- Class for all EDA visualizations, including:
Distribution plot
Pie plot
Correlation plot
Scatter plot
Bar plot
Box plot
- Parameters
- ax : matplotlib.Axes, optional
The axes to use to plot the figure. Default value: current axes.
- size : tuple of integers, optional
(width, height) of the plot in dpi. Default value: current size of the plot.
- title : str, optional
This plot’s title. Default value: empty str.
- Attributes
Methods
bar_plot(self, data, column, aggregation[, …])
Returns a bar plot for the HANA DataFrame column specified.
box_plot(self, data, column[, outliers, …])
Returns a box plot for the HANA DataFrame column specified.
correlation_plot(self, data[, key, …])
Returns a correlation plot for the HANA DataFrame columns specified.
distribution_plot(self, data, column, bins)
Returns a distribution plot for the HANA DataFrame column specified.
pie_plot(self, data, column[, explode, …])
Returns a pie plot for the HANA DataFrame column specified.
scatter_plot(self, data, x, y, x_bins, y_bins)
Returns a scatter plot for the HANA DataFrame columns specified.
set_ax(self, ax)
Sets the Axes.
set_cmap(self, cmap)
Sets the colormap.
set_size(self, size)
Sets the size.
set_title(self, title)
Sets the title of the plot.
-
distribution_plot(self, data, column, bins, title=None, x_axis_fontsize=10, x_axis_rotation=90, debrief=True)
Returns a distribution plot for the HANA DataFrame column specified.
- Parameters
- data : DataFrame
DataFrame to use for the plot.
- column : str
Column in the DataFrame being plotted.
- bins : int
Number of bins to create based on the value of column.
- title : str, optional
Title for the plot.
- x_axis_fontsize : int, optional
Size of the x axis labels.
- x_axis_rotation : int, optional
Rotation of the x axis labels.
- debrief : bool, optional
Whether to include the skewness debrief.
- Returns
- ax : Axes
The axes for the plot.
- bin_data : pandas.DataFrame
The data used in the plot.
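To illustrate what the bins parameter controls, here is a minimal pure-Python sketch of equal-width binning. It assumes equal-width bins over the column's range; the helper name equal_width_bins and the sample data are hypothetical, and hana_ml computes its bins on the HANA side, possibly with different edge handling.

```python
def equal_width_bins(values, bins):
    """Count values per equal-width bin; returns a list of (lower, upper, count)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 when all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin instead of opening a new one.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, counts[i]) for i in range(bins)]


# Hypothetical sample column.
ages = [22, 25, 31, 38, 40, 41, 47, 52, 60, 62]
bin_data = equal_width_bins(ages, bins=4)
for lower, upper, count in bin_data:
    print(f"[{lower}, {upper}): {count}")
```

Each tuple corresponds to one bar of the distribution plot: the bin edges and the number of rows that fall between them.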
-
pie_plot(self, data, column, explode=0.03, title=None, legend=True)
Returns a pie plot for the HANA DataFrame column specified.
- Parameters
- data : DataFrame
DataFrame to use for the plot.
- column : str
Column in the DataFrame being plotted.
- explode : float, optional
Relative spacing between pie segments.
- title : str, optional
Title for the plot.
- legend : bool, optional
Whether to show a legend for the plot.
- Returns
- ax : Axes
The axes for the plot. This can be used to set specific properties for the plot.
- pie_data : pandas.DataFrame
The data used in the plot.
-
correlation_plot(self, data, key=None, corr_cols=None, label=True, cmap='RdYlBu')
Returns a correlation plot for the HANA DataFrame columns specified.
- Parameters
- data : DataFrame
DataFrame to use for the plot.
- key : str, optional
Name of the ID column.
- corr_cols : list of str, optional
Columns in the DataFrame being plotted. If None, all numeric columns are plotted.
- label : bool, optional
Whether to plot a colorbar.
- cmap : matplotlib.pyplot.colormap, optional
Color map to use for the plot.
- Returns
- ax : Axes
The axes for the plot. This can be used to set specific properties for the plot.
- corr : pandas.DataFrame
The data used in the plot.
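The returned corr data is a pairwise correlation matrix. As a rough sketch of what such a matrix contains, the following computes Pearson coefficients in plain Python. The use of the Pearson coefficient is an assumption, and the helper names and sample columns are hypothetical rather than part of hana_ml.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict holding the coefficient for every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical numeric columns.
columns = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1]}
corr = correlation_matrix(columns)
print(corr["A"]["B"], corr["A"]["C"])
```

The matrix is symmetric with ones on the diagonal, which is why a heat map with a diverging color map such as 'RdYlBu' is a natural way to display it.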
-
scatter_plot(self, data, x, y, x_bins, y_bins, title=None, label=True, cmap='Blues', debrief=True, rounding_precision=3)
Returns a scatter plot for the HANA DataFrame columns specified.
- Parameters
- data : DataFrame
DataFrame to use for the plot.
- x : str
Column to be plotted on the x axis.
- y : str
Column to be plotted on the y axis.
- x_bins : int
Number of x axis bins to create based on the value of column.
- y_bins : int
Number of y axis bins to create based on the value of column.
- title : str, optional
Title for the plot.
- label : str, optional
Label for the color bar.
- cmap : matplotlib.pyplot.colormap, optional
Color map to use for the plot.
- debrief : bool, optional
Whether to include the correlation debrief.
- rounding_precision : int, optional
The rounding precision for bin size.
- Returns
- ax : Axes
The axes for the plot.
- bin_matrix : pandas.DataFrame
The data used in the plot.
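Because the scatter plot is binned on both axes, the data behind it is a grid of counts rather than raw points. The sketch below builds such a grid in plain Python, assuming equal-width bins on each axis; bin_index, bin_matrix, and the sample values are hypothetical names for illustration only.

```python
def bin_index(value, lo, hi, bins):
    """Map a value to an equal-width bin index in the range [0, bins - 1]."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * bins), bins - 1)


def bin_matrix(xs, ys, x_bins, y_bins):
    """Count points per (y bin, x bin) cell; rows are y bins, columns are x bins."""
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    matrix = [[0] * x_bins for _ in range(y_bins)]
    for x, y in zip(xs, ys):
        row = bin_index(y, y_lo, y_hi, y_bins)
        col = bin_index(x, x_lo, x_hi, x_bins)
        matrix[row][col] += 1
    return matrix


# Hypothetical sample points.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 1, 1, 2, 2, 3, 3]
matrix = bin_matrix(xs, ys, x_bins=2, y_bins=2)
print(matrix)
```

Coloring each cell by its count is what lets the plot scale to large tables: only x_bins times y_bins numbers need to leave the database.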
-
bar_plot(self, data, column, aggregation, title=None)
Returns a bar plot for the HANA DataFrame column specified.
- Parameters
- data : DataFrame
DataFrame to use for the plot.
- column : str
Column to be aggregated.
- aggregation : dict
Aggregation conditions (‘avg’, ‘count’, ‘max’, ‘min’).
- title : str, optional
Title for the plot.
- Returns
- ax : Axes
The axes for the plot.
- bar_data : pandas.DataFrame
The data used in the plot.
Examples
>>> import matplotlib.pyplot as plt
>>> from hana_ml.visualizers.eda import EDAVisualizer
>>> f = plt.figure()
>>> ax1 = f.add_subplot(111)
>>> eda = EDAVisualizer(ax1)
>>> ax, bar_data = eda.bar_plot(data=data, column='COLUMN', aggregation={'COLUMN':'count'})
Returns : bar plot (count) of ‘COLUMN’
>>> ax1 = f.add_subplot(111)
>>> eda = EDAVisualizer(ax1)
>>> ax, bar_data = eda.bar_plot(data=data, column='COLUMN', aggregation={'OTHER_COLUMN':'avg'})
Returns : bar plot (avg) of ‘COLUMN’ against ‘OTHER_COLUMN’
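The aggregation dict maps a column name to one of the aggregation names listed above. A minimal sketch of the group-and-aggregate step this implies is shown here in plain Python; the aggregate helper, the AGGREGATORS table, and the sample rows are hypothetical, and the real aggregation is presumably pushed down to HANA.

```python
from collections import defaultdict

# The four aggregation names mentioned in the parameter description.
AGGREGATORS = {
    "avg": lambda vals: sum(vals) / len(vals),
    "count": len,
    "max": max,
    "min": min,
}


def aggregate(rows, column, aggregation):
    """Group rows by `column` and aggregate the single column named in `aggregation`."""
    (target, func_name), = aggregation.items()
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[target])
    func = AGGREGATORS[func_name]
    return {key: func(vals) for key, vals in groups.items()}


# Hypothetical rows standing in for a HANA table.
rows = [
    {"CLASS": "A", "PRICE": 10.0},
    {"CLASS": "A", "PRICE": 20.0},
    {"CLASS": "B", "PRICE": 5.0},
]
counts = aggregate(rows, "CLASS", {"CLASS": "count"})
averages = aggregate(rows, "CLASS", {"PRICE": "avg"})
print(counts, averages)
```

The two calls mirror the two documented examples: counting rows per category, and averaging another column per category.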
-
box_plot(self, data, column, outliers=False, title=None, groupby=None)
Returns a box plot for the HANA DataFrame column specified.
- Parameters
- data : DataFrame
DataFrame to use for the plot.
- column : str
Column in the DataFrame being plotted.
- outliers : bool, optional
Whether to plot suspected outliers and outliers.
- title : str, optional
Title for the plot.
- groupby : str, optional
Column to group by and compare.
- Returns
- ax : Axes
The axes for the plot.
- cont : pandas.DataFrame
The data used in the plot.
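The distinction between suspected outliers and outliers is commonly drawn with Tukey's fences. The sketch below assumes that convention (1.5 times the interquartile range for suspected outliers, 3 times for outliers); the source does not state which rule hana_ml applies, and box_stats and the sample data are hypothetical.

```python
from statistics import quantiles


def box_stats(values):
    """Quartiles plus Tukey-style inner (1.5 * IQR) and outer (3 * IQR) fences."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)
    # Suspected outliers lie outside the inner fences but inside the outer ones.
    suspected = [
        v for v in values
        if (v < inner[0] or v > inner[1]) and outer[0] <= v <= outer[1]
    ]
    # Outliers lie beyond the outer fences.
    outliers = [v for v in values if v < outer[0] or v > outer[1]]
    return {"q1": q1, "median": median, "q3": q3,
            "suspected": suspected, "outliers": outliers}


# Hypothetical sample column.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 50])
print(stats)
```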
-
ax
Returns the matplotlib Axes where the Visualizer will draw.
-
cmap
Returns the color map being used for the plot.
-
set_ax(self, ax)
Sets the Axes.
-
set_cmap(self, cmap)
Sets the colormap.
-
set_size(self, size)
Sets the size.
-
set_title(self, title)
Sets the title of the plot.
-
size
Returns the size of the plot in pixels.
-
title
Returns the title of the plot.
-
class hana_ml.visualizers.eda.Profiler
Bases: object
Class to build a HANA Profiler, including:
- Variable descriptions
- Missing values %
- High cardinality %
- Skewness
- Numeric distributions
- Categorical distributions
- Correlations
- High correlation warnings
Methods
description(self, data, key[, bins, …])
Returns a HANA profiler, including variable descriptions, missing values %, high cardinality %, skewness, numeric distributions, categorical distributions, correlations, and high correlation warnings.
set_size(self, fig, figsize)
Sets the size of the data description plot, in inches.
-
description(self, data, key, bins=20, missing_threshold=10, card_threshold=100, skew_threshold=0.5, figsize=None)
Returns a HANA profiler, including:
- Variable descriptions
- Missing values %
- High cardinality %
- Skewness
- Numeric distributions
- Categorical distributions
- Correlations
- High correlation warnings
- Parameters
- data : DataFrame
DataFrame to use for the plot.
- key : str
Key in the DataFrame.
- bins : int, optional
Number of bins for numeric distributions. Default value = 20.
- missing_threshold : float, optional
Percentage threshold to display missing values. Default value = 10.
- card_threshold : int, optional
Threshold for a column to be considered as having high cardinality. Default value = 100.
- skew_threshold : float, optional
Absolute value threshold for a column to be considered as highly skewed. Default value = 0.5.
- tight_layout : bool, optional
Whether to use the matplotlib tight layout.
- figsize : tuple, optional
Size of the figure to be plotted. The first element is the width, the second is the height.
- Note: categorical columns with cardinality warnings are not plotted.
- Returns
- fig : Figure
The matplotlib figure of the profiler.
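To make the three thresholds concrete, the following plain-Python sketch shows how they could be applied to a single column. It is only an illustration under stated assumptions: strict greater-than comparisons, population skewness, and None as the missing-value marker are all assumed, and the function names are hypothetical rather than part of hana_ml.

```python
def skewness(values):
    """Population skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 else 0.0


def profile_column(values, missing_threshold=10, card_threshold=100, skew_threshold=0.5):
    """Return the list of warnings a column would trigger under the assumed rules."""
    present = [v for v in values if v is not None]
    missing_pct = 100.0 * (len(values) - len(present)) / len(values)
    warnings = []
    if missing_pct > missing_threshold:
        warnings.append("missing values")
    if len(set(present)) > card_threshold:
        warnings.append("high cardinality")
    numeric = all(isinstance(v, (int, float)) for v in present)
    if numeric and len(present) > 2 and abs(skewness(present)) > skew_threshold:
        warnings.append("skewed")
    return warnings


# Hypothetical column: 2 of 9 values missing and a long right tail.
column_warnings = profile_column([1, 1, 1, 2, 2, 3, 10, None, None])
print(column_warnings)
```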
-
set_size(self, fig, figsize)
Sets the size of the data description plot, in inches.
- Parameters
- fig : ax
The returned axes constructed by the description method.
- figsize : tuple
Tuple of width and height for the plot.
-
hana_ml.visualizers.metrics
This module provides a visualizer for metrics.
-
class hana_ml.visualizers.metrics.MetricsVisualizer(ax=None, size=None, cmap=None, title=None)
Bases: hana_ml.visualizers.visualizer_base.Visualizer, object
The MetricsVisualizer is used to visualize metrics.
- Parameters
- ax : matplotlib.Axes, optional
The axes to use to plot the figure. Default value: current axes.
- size : tuple of integers, optional
(width, height) of the plot in dpi. Default value: current size of the plot.
- title : str, optional
This plot’s title. Default value: empty str.
- Attributes
Methods
plot_confusion_matrix(self, df[, normalize])
Plots the confusion matrix and returns the Axes where it is drawn.
set_ax(self, ax)
Sets the Axes.
set_cmap(self, cmap)
Sets the colormap.
set_size(self, size)
Sets the size.
set_title(self, title)
Sets the title of the plot.
-
plot_confusion_matrix(self, df, normalize=False)
Plots the confusion matrix and returns the Axes where it is drawn.
- Parameters
- df : DataFrame
Data points to the resulting confusion matrix. This dataframe’s columns should match columns (‘CLASS’, ‘’)
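The normalize flag is not described further in the source. A common convention, assumed here, is to divide each row of the matrix by its row total so that every cell shows the fraction of an actual class. The sketch below illustrates that convention in plain Python; normalize_rows and the sample counts are hypothetical.

```python
def normalize_rows(matrix):
    """Divide every cell by its row total; rows that sum to zero stay all zeros."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([cell / total if total else 0.0 for cell in row])
    return normalized


# Hypothetical counts: rows are actual classes, columns are predicted classes.
counts = [[8, 2], [1, 9]]
normalized = normalize_rows(counts)
print(normalized)
```

Row normalization makes classes of different sizes comparable, since each row then sums to one.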
-
ax
Returns the matplotlib Axes where the Visualizer will draw.
-
cmap
Returns the color map being used for the plot.
-
set_ax(self, ax)
Sets the Axes.
-
set_cmap(self, cmap)
Sets the colormap.
-
set_size(self, size)
Sets the size.
-
set_title(self, title)
Sets the title of the plot.
-
size
Returns the size of the plot in pixels.
-
title
Returns the title of the plot.
hana_ml.visualizers.m4_sampling
M4 algorithm for sampling queries.
-
hana_ml.visualizers.m4_sampling.get_min_index(data)
Gets the minimum timestamp of the time series data.
- Parameters
- data : DataFrame
Time series data whose 1st column is the index and 2nd column is the value.
- Returns
- datetime
The minimum timestamp.
-
hana_ml.visualizers.m4_sampling.get_max_index(data)
Gets the maximum timestamp of the time series data.
- Parameters
- data : DataFrame
Time series data whose 1st column is the index and 2nd column is the value.
- Returns
- datetime
The maximum timestamp.
-
hana_ml.visualizers.m4_sampling.m4_sampling(data, width)
M4 algorithm for big data visualization.
- Parameters
- data : DataFrame
Data to be sampled. Time series data whose 1st column is the index and 2nd column is the value.
- width : int
Sampling rate. It indicates how many pixels wide the picture is.
- Returns
- DataFrame
The sampled dataframe.
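For readers unfamiliar with M4, the general idea is to split the time range into width buckets (one per pixel column) and keep only the first, last, minimum, and maximum point of each bucket. The sketch below shows that idea in plain Python on a list of (timestamp, value) tuples; it uses integer timestamps for simplicity, the name m4_sample is hypothetical, and it is not the hana_ml implementation, which operates on a HANA DataFrame.

```python
def m4_sample(points, width):
    """Keep first, last, min, and max of each of `width` equal time buckets.

    `points` is a list of (timestamp, value) tuples sorted by timestamp.
    """
    t_min, t_max = points[0][0], points[-1][0]
    span = (t_max - t_min) or 1  # avoid division by zero for a single timestamp
    buckets = {}
    for t, v in points:
        k = min(int((t - t_min) / span * width), width - 1)
        buckets.setdefault(k, []).append((t, v))
    sampled = []
    for k in sorted(buckets):
        group = buckets[k]
        keep = {
            group[0],                        # first point in the bucket
            group[-1],                       # last point in the bucket
            min(group, key=lambda p: p[1]),  # point with the minimum value
            max(group, key=lambda p: p[1]),  # point with the maximum value
        }
        sampled.extend(sorted(keep))
    return sampled


# Hypothetical series with integer timestamps.
points = [(0, 5), (1, 9), (2, 1), (3, 4), (4, 7), (5, 7), (6, 2), (7, 8)]
sampled = m4_sample(points, width=2)
print(sampled)
```

At most four points survive per bucket, so the result has at most 4 * width rows regardless of how long the input series is.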
hana_ml.visualizers.model_debriefing
-
class hana_ml.visualizers.model_debriefing.TreeModelDebriefing
Bases: object
- Visualizes a tree model.
- Dependency packages
1. Graphviz
To render the generated DOT source code, you need to install Graphviz.
Download page: https://www.graphviz.org/download/.
Make sure that the directory containing the dot executable is on your system path.
2. graphviz
3. pydotplus
4. ipywidgets
To render the Jupyter widgets, you also need to install the JupyterLab extension.
Install page: https://ipywidgets.readthedocs.io/en/latest/user_install.
Methods
tree_debrief(self, model)
Visualizes a tree model from data in JSON or XML format.
tree_debrief_from_file(path)
Visualizes a tree model from a DOT, JSON or XML file.
tree_debrief_with_dot(self, model)
Visualizes a tree model from data in DOT format.
tree_export(self, model)
Exports a tree model as a JSON or XML file.
tree_export_with_dot(self, model)
Exports a tree model as a DOT file.
tree_parse(self, model)
Transforms tree model content using the DOT language.
-
tree_debrief(self, model)
Visualizes a tree model from data in JSON or XML format.
- Parameters
- model : DataFrame
Tree model.
- Returns
- JSON or XML Component
This object can be rendered by a browser.
-
tree_debrief_with_dot(self, model)
Visualizes a tree model from data in DOT format.
- Parameters
- model : DataFrame
Tree model.
- Returns
- SVG Component
This object can be rendered by a browser.
-
tree_parse(self, model)
Transforms tree model content using the DOT language.
- Parameters
- model : DataFrame
Tree model.
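As a rough idea of what transforming a tree into the DOT language involves, the sketch below walks a nested dict and emits DOT source that Graphviz could render. The dict schema ("label" and "children" keys) and the function name tree_to_dot are hypothetical; the actual hana_ml model is a DataFrame whose content format is not described here.

```python
def tree_to_dot(tree):
    """Serialize a nested {"label": ..., "children": [...]} dict as DOT source."""
    lines = ["digraph Tree {"]
    counter = 0

    def visit(node):
        nonlocal counter
        node_id = counter
        counter += 1
        lines.append(f'  n{node_id} [label="{node["label"]}"];')
        for child in node.get("children", []):
            child_id = visit(child)
            lines.append(f"  n{node_id} -> n{child_id};")
        return node_id

    visit(tree)
    lines.append("}")
    return "\n".join(lines)


# Hypothetical one-split decision tree.
tree = {"label": "AGE < 30", "children": [{"label": "YES"}, {"label": "NO"}]}
dot_source = tree_to_dot(tree)
print(dot_source)
```

The resulting text can be passed to the dot executable or to the graphviz Python package, which is why both appear in the dependency list above.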
-
tree_export(self, model)
Exports a tree model as a JSON or XML file.
- Parameters
- model : DataFrame
Tree model.
- Returns
- Interactive Text and Button Widgets
These widgets can be rendered by a browser.
-
tree_export_with_dot(self, model)
Exports a tree model as a DOT file.
- Parameters
- model : DataFrame
Tree model.
- Returns
- Interactive Text and Button Widgets
These widgets can be rendered by a browser.
-
static tree_debrief_from_file(path)
Visualizes a tree model from a DOT, JSON or XML file.
- Parameters
- path : str
File path.
- Returns
- SVG, JSON or XML Component
This object can be rendered by a browser.