hana_ml.graph.algorithms package
This package contains various algorithms you can use to explore and work on a graph.
The general pattern is: Create an algorithm object instance (which expects a Graph object in the constructor) and then call execute(<parameters>) on that algorithm instance. This can be combined in one statement.
>>> import hana_ml.graph.algorithms as hga
>>> sp = hga.ShortestPath(graph=g).execute(source="1", target="3")
The execute statement always returns the algorithm instance itself, so that it can be used to access the result properties.
>>> print("Vertices", sp.vertices)
>>> print("Edges", sp.edges)
>>> print("Weight:", sp.weight)
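The chaining works because execute() hands back the object it was called on. A minimal sketch of that pattern follows; EchoAlgorithm and its parameters property are hypothetical stand-ins and not part of hana_ml.

```python
class EchoAlgorithm:
    """Hypothetical stand-in for an algorithm class; not part of hana_ml."""

    def __init__(self, graph):
        self._graph = graph
        self._results = {}

    def execute(self, **kwargs):
        # A real algorithm would run a graph script here and store its output.
        self._results = dict(kwargs)
        return self  # returning self is what enables the one-statement form

    @property
    def parameters(self):
        # Result property, readable once execute() has been called.
        return self._results


algo = EchoAlgorithm(graph=None).execute(source="1", target="3")
print(algo.parameters)
```

Because the constructor call and execute() share one expression, the variable on the left always holds an instance whose result properties are already populated.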
If you want to create a new algorithm, have a closer look at algorithm_base.AlgorithmBase.
The following algorithms are available:
- class hana_ml.graph.algorithms.ShortestPath(graph: Graph)
Bases: AlgorithmBase
Given a source and target vertex_key with optional weight and direction, get the shortest path between them.
The procedure may fail for HANA versions prior to SP05; therefore, this is checked at execution time.
The user can take the results and visualize them with libraries such as NetworkX using the edges property.
The calculation is started by calling execute().
Examples
>>> import hana_ml.graph.algorithms as hga
>>> sp = hga.ShortestPath(graph=g).execute(source="1", target="3")
>>> print("Vertices", sp.vertices)
>>> print("Edges", sp.edges)
>>> print("Weight:", sp.weight)
- execute(source: str, target: str, weight: str = None, direction: str = 'OUTGOING') ShortestPath
Executes the calculation of the shortest path.
- Parameters:
- source: str
Vertex key from which the shortest path will start.
- target: str
Vertex key at which the shortest path will end.
- weight: str, optional
Name of the column that provides the weight.
Defaults to None.
- direction: str, optional
One of OUTGOING, INCOMING, or ANY. Determines which edge direction the algorithm follows.
Defaults to OUTGOING.
- Returns:
- ShortestPath
ShortestPath object instance
- property vertices: DataFrame
- Returns:
- pandas.DataFrame
A Pandas DataFrame that contains the vertices of the shortest path
- property edges: DataFrame
- Returns:
- pandas.DataFrame
A Pandas DataFrame that contains the edges of the shortest path
- property weight: float
Weight of the shortest path. Returns 1.0 if no weight column was provided to the execute() call. Returns -1.0 as initial value.
- Returns:
- float
Weight of the shortest path.
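To make the effect of the direction parameter concrete, here is a minimal pure-Python sketch, assuming an unweighted graph given as (source, target) pairs. The functions adjacency and shortest_path and the sample EDGES are hypothetical; this is not how hana_ml computes the path, which runs a graph script inside SAP HANA.

```python
from collections import deque

EDGES = [("1", "2"), ("2", "3"), ("4", "1")]  # made-up (source, target) pairs


def adjacency(edges, direction="OUTGOING"):
    """Build neighbor lists for OUTGOING, INCOMING or ANY traversal."""
    adj = {}
    for src, tgt in edges:
        if direction in ("OUTGOING", "ANY"):
            adj.setdefault(src, []).append(tgt)
        if direction in ("INCOMING", "ANY"):
            adj.setdefault(tgt, []).append(src)
    return adj


def shortest_path(edges, source, target, direction="OUTGOING"):
    """Breadth-first search; returns the vertex list or None if unreachable."""
    adj = adjacency(edges, direction)
    previous = {source: None}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        if vertex == target:
            path = []
            while vertex is not None:  # walk back to the source
                path.append(vertex)
                vertex = previous[vertex]
            return path[::-1]
        for neighbor in adj.get(vertex, []):
            if neighbor not in previous:
                previous[neighbor] = vertex
                queue.append(neighbor)
    return None


print(shortest_path(EDGES, "1", "3"))              # follows edge direction
print(shortest_path(EDGES, "1", "4"))              # no outgoing route to "4"
print(shortest_path(EDGES, "1", "4", "INCOMING"))  # follows edges backwards
```

With OUTGOING, vertex "4" is unreachable from "1" because the only connecting edge points the other way; INCOMING or ANY makes it reachable.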
- class hana_ml.graph.algorithms.Neighbors(graph: Graph)
Bases: _NeighborsBase
Get a virtual subset of the graph based on a start_vertex and all vertices within a lower_bound->upper_bound count of degrees of separation.
The calculation is started by calling execute().
Examples
>>> import hana_ml.graph.algorithms as hga
>>> nb = hga.Neighbors(graph=g).execute(start_vertex="1")
>>> print("Vertices", nb.vertices)
- execute(start_vertex: str, direction: str = 'OUTGOING', lower_bound: int = 1, upper_bound: int = 1) Neighbors
Executes the calculation of the neighbors.
- Parameters:
- start_vertex: str
Start vertex on which the subset is based.
- direction: str, optional
One of OUTGOING, INCOMING, or ANY. Determines which edge direction the algorithm follows.
Defaults to OUTGOING.
- lower_bound: int, optional
The count of degrees of separation from which to start considering neighbors. If you want to include the start vertex, set lower_bound=0.
Defaults to 1.
- upper_bound: int, optional
The count of degrees of separation at which to end considering neighbors.
Defaults to 1.
- Returns:
- Neighbors
Neighbors object instance
- property vertices: DataFrame
- Returns:
- pandas.DataFrame
A Pandas DataFrame that contains the vertices
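The interplay of lower_bound and upper_bound can be illustrated with a small breadth-first search. This is a sketch under the assumption that "degrees of separation" means hop distance along OUTGOING edges; the neighbors function and the ADJ sample data are hypothetical and not the hana_ml implementation.

```python
from collections import deque


def neighbors(adj, start_vertex, lower_bound=1, upper_bound=1):
    """Return vertices whose hop distance from start_vertex is within bounds."""
    distance = {start_vertex: 0}
    queue = deque([start_vertex])
    while queue:
        vertex = queue.popleft()
        if distance[vertex] == upper_bound:
            continue  # do not expand beyond the upper bound
        for nxt in adj.get(vertex, []):
            if nxt not in distance:
                distance[nxt] = distance[vertex] + 1
                queue.append(nxt)
    return sorted(v for v, d in distance.items() if lower_bound <= d <= upper_bound)


ADJ = {"1": ["2", "3"], "2": ["4"], "3": ["4"], "4": ["5"]}
print(neighbors(ADJ, "1"))                              # direct neighbors only
print(neighbors(ADJ, "1", lower_bound=0, upper_bound=2))  # includes the start vertex
```

Setting lower_bound=0 admits the start vertex itself, since it is the only vertex at distance zero.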
- class hana_ml.graph.algorithms.NeighborsSubgraph(graph: Graph)
Bases: _NeighborsBase
Get a virtual subset of the graph based on a start_vertex and all vertices within a lower_bound->upper_bound count of degrees of separation. The result is similar to neighbors() but includes edges, which can be useful for visualization.
Note: The edges table also contains edges between neighbors, if there are any (not only edges from the start vertex).
The calculation is started by calling execute().
Examples
>>> import hana_ml.graph.algorithms as hga
>>> nb = hga.NeighborsSubgraph(graph=g).execute(start_vertex="1")
>>> print("Vertices", nb.vertices)
>>> print("Edges", nb.edges)
- execute(start_vertex: str, direction: str = 'OUTGOING', lower_bound: int = 1, upper_bound: int = 1) NeighborsSubgraph
Executes the calculation of the neighbors with edges.
- Parameters:
- start_vertex: str
Start vertex on which the subset is based.
- direction: str, optional
One of OUTGOING, INCOMING, or ANY. Determines which edge direction the algorithm follows.
Defaults to OUTGOING.
- lower_bound: int, optional
The count of degrees of separation from which to start considering neighbors. If you want to include the start vertex, set lower_bound=0.
Defaults to 1.
- upper_bound: int, optional
The count of degrees of separation at which to end considering neighbors.
Defaults to 1.
- Returns:
- NeighborsSubgraph
NeighborsSubgraph object instance
- property vertices: DataFrame
- Returns:
- pandas.DataFrame
A Pandas DataFrame that contains the vertices
- property edges: DataFrame
- Returns:
- pandas.DataFrame
A Pandas DataFrame that contains the edges between neighbors
- class hana_ml.graph.algorithms.KShortestPaths(graph: Graph)
Bases: AlgorithmBase
Given a source and target vertex_key with optional weight, get the top-k shortest paths between them.
The procedure may fail for HANA versions prior to SP05; therefore, this is checked at execution time.
The calculation is started by calling execute().
Examples
>>> import hana_ml.graph.algorithms as hga
>>> topk = hga.KShortestPaths(graph=g).execute(source="1", target="3", k=3)
>>> print("Paths", topk.paths)
- execute(source: str, target: str, k: int, weight: str = None) KShortestPaths
Executes the calculation of the top-k shortest paths.
- Parameters:
- source: str
Vertex key from which the shortest paths will start.
- target: str
Vertex key at which the shortest paths will end.
- k: int
Number of paths that will be calculated.
- weight: str, optional
Name of the column that provides the weight.
Defaults to None.
- Returns:
- KShortestPaths
KShortestPaths object instance
- property paths: DataFrame
- Returns:
- pandas.DataFrame
A Pandas DataFrame that contains the paths
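As an illustration of what "top-k shortest paths" means, the following sketch enumerates paths by brute force and keeps the k cheapest. It assumes simple paths (no repeated vertex) and a small graph; whether SAP HANA applies the same restriction is not stated here, and the k_shortest_paths function and ADJ data are hypothetical.

```python
def k_shortest_paths(adj, source, target, k):
    """Enumerate simple paths by DFS and keep the k with the lowest weight."""
    found = []

    def walk(vertex, path, weight):
        if vertex == target:
            found.append((weight, path))
            return
        for nxt, edge_weight in adj.get(vertex, []):
            if nxt not in path:  # keep paths simple (no repeated vertex)
                walk(nxt, path + [nxt], weight + edge_weight)

    walk(source, [source], 0.0)
    return sorted(found)[:k]


# Adjacency lists of (target, weight) pairs; values are made up.
ADJ = {
    "1": [("2", 1.0), ("3", 5.0)],
    "2": [("3", 1.0), ("4", 4.0)],
    "4": [("3", 1.0)],
}
for weight, path in k_shortest_paths(ADJ, "1", "3", k=2):
    print(weight, path)
```

The exhaustive enumeration is exponential in the worst case, so it is only suitable for explaining the result shape, not for real workloads.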
- class hana_ml.graph.algorithms.TopologicalSort(graph: Graph)
Bases: AlgorithmBase
Calculates the topological sort if possible.
A topological ordering of a directed graph is a linear ordering such that for each edge the source vertex comes before the target vertex in the ordering.
The topological order is not necessarily unique. A directed graph is topologically sortable if and only if it does not contain any directed cycles.
There are several commonly used algorithms for finding a topological order of a directed graph. This implementation is based on depth-first search.
If the directed graph contains a directed cycle, the Boolean property is_sortable returns False. Otherwise, the algorithm returns a topological order.
The calculation is started by calling execute().
Examples
>>> import hana_ml.graph.algorithms as hga
>>> ts = hga.TopologicalSort(graph=g).execute()
>>> print("Vertices", ts.vertices)
>>> print("Sortable", ts.is_sortable)
- execute() TopologicalSort
Executes the topological sort.
- Returns:
- TopologicalSort
TopologicalSort object instance
- property vertices: DataFrame
- Returns:
- pandas.DataFrame
A Pandas DataFrame that contains the topologically sorted vertices
- property is_sortable: bool
Flag indicating whether the graph is topologically sortable (for example, False for cyclic graphs).
- Returns:
- bool
True if the graph is topologically sortable, False otherwise.
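The relationship between depth-first search, cycle detection and the resulting order can be sketched in plain Python. This is a generic textbook variant under the assumption of a graph given as adjacency lists; topological_sort here is a hypothetical helper, not the graph script used by hana_ml.

```python
def topological_sort(adj):
    """Return (is_sortable, order) using depth-first search with three colors."""
    WHITE, GRAY, BLACK = 0, 1, 2
    vertices = set(adj) | {t for targets in adj.values() for t in targets}
    color = {v: WHITE for v in vertices}
    order = []

    def visit(vertex):
        color[vertex] = GRAY
        for nxt in adj.get(vertex, []):
            if color[nxt] == GRAY:  # back edge: directed cycle found
                return False
            if color[nxt] == WHITE and not visit(nxt):
                return False
        color[vertex] = BLACK
        order.append(vertex)  # appended only after all successors are done
        return True

    for vertex in sorted(vertices):
        if color[vertex] == WHITE and not visit(vertex):
            return False, []
    return True, order[::-1]


print(topological_sort({"a": ["b", "c"], "b": ["d"], "c": ["d"]}))
print(topological_sort({"a": ["b"], "b": ["a"]}))
```

A vertex that is still being processed (gray) when it is reached again marks a directed cycle, which is exactly the case in which no topological order exists.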
- class hana_ml.graph.algorithms.ShortestPathsOneToAll(graph: Graph)
Bases: AlgorithmBase
Calculates the shortest paths from a start vertex to all other vertices in the graph.
The procedure may fail for HANA versions prior to SP05; therefore, this is checked at execution time.
The calculation is started by calling execute().
Examples
>>> import hana_ml.graph.algorithms as hga
>>> spoa = hga.ShortestPathsOneToAll(graph=g).execute(
...     source=2257, direction='OUTGOING', weight='DIST_KM'
... )
>>> print("Vertices", spoa.vertices)
>>> print("Edges", spoa.edges)
- execute(source: str, weight: str = None, direction: str = 'OUTGOING') ShortestPathsOneToAll
Executes the calculation of the shortest paths one to all.
- Parameters:
- source: str
Vertex key from which the shortest paths one to all will start.
- weight: str, optional
Name of the column that provides the weight.
Defaults to None.
- direction: str, optional
One of OUTGOING, INCOMING, or ANY. Determines which edge direction the algorithm follows.
Defaults to OUTGOING.
- Returns:
- ShortestPathsOneToAll
ShortestPathsOneToAll object instance
- property vertices: DataFrame
- Returns:
- pandas.DataFrame
A Pandas DataFrame that contains the vertices and the distance to the start vertex
- property edges: DataFrame
- Returns:
- pandas.DataFrame
A Pandas DataFrame that contains the edges which are on one of the shortest paths
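The shape of the two results (a distance per vertex and the edges lying on shortest paths) can be sketched with Dijkstra's algorithm. This sketch assumes non-negative weights and keeps only one shortest-path tree when ties exist; shortest_paths_one_to_all and the ADJ data are hypothetical, and the actual computation in hana_ml happens inside SAP HANA.

```python
import heapq


def shortest_paths_one_to_all(adj, source):
    """Dijkstra: return distances and the edge used to reach each vertex."""
    distance = {source: 0.0}
    tree_edge = {}
    heap = [(0.0, source)]
    while heap:
        dist, vertex = heapq.heappop(heap)
        if dist > distance[vertex]:
            continue  # stale heap entry, a shorter route was found already
        for nxt, weight in adj.get(vertex, []):
            candidate = dist + weight
            if candidate < distance.get(nxt, float("inf")):
                distance[nxt] = candidate
                tree_edge[nxt] = (vertex, nxt)
                heapq.heappush(heap, (candidate, nxt))
    return distance, sorted(tree_edge.values())


# Adjacency lists of (target, weight) pairs; values are made up.
ADJ = {"A": [("B", 4.0), ("C", 1.0)], "C": [("B", 2.0)], "B": [("D", 5.0)]}
distance, edges = shortest_paths_one_to_all(ADJ, "A")
print(distance)
print(edges)
```

The direct edge from "A" to "B" does not appear in the edge list because the detour via "C" is cheaper.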
- class hana_ml.graph.algorithms.StronglyConnectedComponents(graph: Graph)
Bases: AlgorithmBase
Identifies the strongly connected components of a graph.
A directed graph is called strongly connected if each of its vertices is reachable from every other vertex. Being strongly connected is an equivalence relation, and therefore the strongly connected components (scc) of the graph form a partition of the vertex set.
The induced subgraphs on these subsets are the strongly connected components. Note that each vertex of the graph is part of exactly one scc, but not every edge is part of an scc (an edge that is part of one belongs to exactly one scc).
If each scc contains only one vertex, the graph is a directed acyclic graph, because any strongly connected graph with more than one vertex contains a directed cycle.
The calculation is started by calling execute().
Examples
>>> import hana_ml.graph.algorithms as hga
>>> scc = hga.StronglyConnectedComponents(graph=g).execute()
>>> print("Vertices", scc.vertices)
>>> print("Components", scc.components)
>>> print("Number of Components", scc.components_count)
- execute() StronglyConnectedComponents
Executes the calculation of the strongly connected components.
- Returns:
- StronglyConnectedComponents
StronglyConnectedComponents object instance
- property vertices: DataFrame
- Returns:
- pandas.DataFrame
A Pandas DataFrame that contains an assignment of each vertex to a strongly connected component.
- property components: DataFrame
- Returns:
- pandas.DataFrame
A Pandas DataFrame that contains the strongly connected components and the number of vertices in each component.
- property components_count: int
- Returns:
- int
The number of strongly connected components in the graph.
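The vertex-to-component assignment and the component count can be reproduced on a toy graph with Kosaraju's algorithm. This is a generic textbook sketch, named here as a swapped-in technique; the source does not state which algorithm SAP HANA uses, and strongly_connected_components and ADJ are hypothetical.

```python
def strongly_connected_components(adj):
    """Kosaraju's algorithm: map every vertex to a component number."""
    vertices = set(adj) | {t for targets in adj.values() for t in targets}
    reverse = {v: [] for v in vertices}
    for src, targets in adj.items():
        for tgt in targets:
            reverse[tgt].append(src)

    finished, seen = [], set()

    def first_pass(vertex):
        # Record vertices in order of DFS completion on the original graph.
        seen.add(vertex)
        for nxt in adj.get(vertex, []):
            if nxt not in seen:
                first_pass(nxt)
        finished.append(vertex)

    for vertex in sorted(vertices):
        if vertex not in seen:
            first_pass(vertex)

    component = {}

    def second_pass(vertex, number):
        # Collect everything reachable on the reversed graph.
        component[vertex] = number
        for nxt in reverse[vertex]:
            if nxt not in component:
                second_pass(nxt, number)

    count = 0
    for vertex in reversed(finished):
        if vertex not in component:
            count += 1
            second_pass(vertex, count)
    return component, count


ADJ = {"1": ["2"], "2": ["3"], "3": ["1", "4"], "4": ["5"], "5": ["4"]}
component, count = strongly_connected_components(ADJ)
print(component, count)
```

The edge from "3" to "4" connects two different components, which illustrates that not every edge belongs to an scc.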
- class hana_ml.graph.algorithms.WeaklyConnectedComponents(graph: Graph)
Bases: AlgorithmBase
Identifies (weakly) connected components.
An undirected graph is called connected if each of its vertices is reachable from every other vertex. A directed graph is called weakly connected if the undirected graph naturally derived from it is connected. Being weakly connected is an equivalence relation between vertices, and therefore the weakly connected components (wcc) of the graph form a partition of the vertex set.
The induced subgraphs on these subsets are the weakly connected components. Note that each vertex and each edge of the graph is part of exactly one wcc.
The calculation is started by calling execute().
Examples
>>> import hana_ml.graph.algorithms as hga
>>> cc = hga.WeaklyConnectedComponents(graph=g).execute()
>>> print("Vertices", cc.vertices)
>>> print("Components", cc.components)
>>> print("Number of Components", cc.components_count)
- execute() WeaklyConnectedComponents
Executes the calculation of the weakly connected components.
- Returns:
- WeaklyConnectedComponents
WeaklyConnectedComponents object instance
- property vertices: DataFrame
- Returns:
- pandas.DataFrame
A Pandas DataFrame that contains an assignment of each vertex to a weakly connected component.
- property components: DataFrame
- Returns:
- pandas.DataFrame
A Pandas DataFrame that contains the weakly connected components and the number of vertices in each component.
- property components_count: int
- Returns:
- int
The number of weakly connected components in the graph.
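Since weak connectivity ignores edge direction, the components can be sketched with a union-find structure. The function weakly_connected_components and its sample input are hypothetical; this is a generic technique for illustration, not the hana_ml implementation.

```python
def weakly_connected_components(vertices, edges):
    """Group vertices into components while ignoring edge direction."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps trees shallow
            v = parent[v]
        return v

    for src, tgt in edges:
        parent[find(src)] = find(tgt)  # merge the two endpoints' components

    groups = {}
    for v in vertices:
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(group) for group in groups.values())


components = weakly_connected_components(
    ["1", "2", "3", "4", "5"],
    [("1", "2"), ("3", "2"), ("4", "5")],
)
print(components, len(components))
```

Vertices "1" and "3" end up in the same component although neither can reach the other along directed edges.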
- class hana_ml.graph.algorithms.algorithm_base.AlgorithmBase(graph: Graph)
Bases: object
Algorithm base class from which every algorithm should derive.
To implement a new algorithm, do the following:
- Create a new class that derives from AlgorithmBase
- Implement the constructor
Set self._graph_script. It can contain {key} templates, which are processed by self._graph_script.format() at runtime. Here is an example from the ShortestPath implementation:
>>> self._graph_script = """
...     DO (
...         IN i_startVertex {vertex_dtype} => '{start_vertex}',
...         IN i_endVertex {vertex_dtype} => '{end_vertex}',
...         IN i_direction NVARCHAR(10) => '{direction}',
...         OUT o_vertices TABLE ({vertex_columns}, "VERTEX_ORDER" BIGINT) => ?,
...         OUT o_edges TABLE ({edge_columns}, "EDGE_ORDER" BIGINT) => ?,
...         OUT o_scalars TABLE ("WEIGHT" DOUBLE) => ?
...     )
...     LANGUAGE GRAPH
...     BEGIN
...         GRAPH g = Graph("{schema}", "{workspace}");
...         VERTEX v_start = Vertex(:g, :i_startVertex);
...         VERTEX v_end = Vertex(:g, :i_endVertex);
...         {weighted_definition}
...         o_vertices = SELECT {vertex_select}, :VERTEX_ORDER FOREACH v IN Vertices(:p) WITH ORDINALITY AS VERTEX_ORDER;
...         o_edges = SELECT {edge_select}, :EDGE_ORDER FOREACH e IN Edges(:p) WITH ORDINALITY AS EDGE_ORDER;
...         DOUBLE p_weight = DOUBLE(WEIGHT(:p));
...         o_scalars."WEIGHT"[1L] = :p_weight;
...     END;
... """
Set self._graph_script_vars. This is a dictionary of tuples which define the parameters being used and replaced in the self._graph_script template. The key of the dictionary is the name of the placeholder in the graph script.
You can either map it to parameters from the execute() signature or assign default values. If you have placeholders in the script which need to be calculated, you can do that by overwriting _process_parameters() and setting them there. Each tuple in the dictionary is expected to have the following format:
>>> {
...     "name_in_graph_script": (
...         "parameter name|None",   # Parameter name expected in the execute() signature. If the placeholder is only needed in the script, this can be set to None. Then it is not expected as a signature parameter.
...         mandatory: True|False,   # Needs to be passed to execute()
...         type|None,               # Expected type
...         default value|None       # Default value for optional parameters or parameters not part of the execute() signature
...     )
... }
The default values do not need to be static; they can be dynamic as well. There are also some convenience functions available to provide the most common replacement strings in scripts. Here is a more complex example from the ShortestPath implementation:
>>> self._graph_script_vars = {
...     "start_vertex": ("source", True, None, None),
...     "end_vertex": ("target", True, None, None),
...     "weight": ("weight", False, str, None),
...     "direction": ("direction", False, str, DEFAULT_DIRECTION),
...     "schema": (None, False, str, self._graph.schema),
...     "workspace": (None, False, str, self._graph.workspace_name),
...     "vertex_dtype": (None, False, str, self._graph.vertex_key_col_dtype),
...     "vertex_columns": (None, False, str, self._default_vertex_cols()),
...     "edge_columns": (None, False, str, self._default_edge_cols()),
...     "vertex_select": (None, False, str, self._default_vertex_select("v")),
...     "edge_select": (None, False, str, self._default_edge_select("e")),
... }
- If necessary, overwrite _process_parameters()
- If necessary, overwrite _validate_parameters()
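The mapping from the tuple definitions to the values that end up in the script can be sketched as follows. This is a simplified illustration under the tuple layout described above (parameter name, mandatory flag, type, default); resolve_template_values, SCRIPT and SCRIPT_VARS are hypothetical and do not reproduce the actual AlgorithmBase code.

```python
def resolve_template_values(script_vars, arguments):
    """Map execute() keyword arguments onto graph script placeholders."""
    values = {}
    for placeholder, (param, mandatory, expected_type, default) in script_vars.items():
        if param is not None and param in arguments:
            value = arguments[param]       # taken from the execute() call
        elif mandatory:
            raise ValueError(f"Parameter '{param}' is mandatory")
        else:
            value = default                # optional or script-only placeholder
        if expected_type is not None and value is not None:
            if not isinstance(value, expected_type):
                raise TypeError(f"Parameter '{param}' must be {expected_type.__name__}")
        values[placeholder] = value
    return values


# Made-up template and definitions, loosely modeled on the ShortestPath example.
SCRIPT = 'Graph("{schema}", "{workspace}"): {start_vertex} to {end_vertex} ({direction})'
SCRIPT_VARS = {
    "start_vertex": ("source", True, None, None),
    "end_vertex": ("target", True, None, None),
    "direction": ("direction", False, str, "OUTGOING"),
    "schema": (None, False, str, "MY_SCHEMA"),
    "workspace": (None, False, str, "MY_WORKSPACE"),
}

values = resolve_template_values(SCRIPT_VARS, {"source": "1", "target": "3"})
print(SCRIPT.format(**values))
```

Placeholders whose parameter name is None never appear in the execute() signature; they are always filled from their default, which is how graph-specific values such as schema and workspace reach the script.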
- Parameters:
- graph: Graph
Graph object, the algorithm is executed on
Methods
execute(**kwargs): Execute the algorithm
projection_expr_from_cols(source, variable): Turn columns into a string for projection expression
signature_from_cols(source[, column_filter]): Turn columns into a string for script parameters
- static signature_from_cols(source: DataFrame, column_filter: list = None) str
Turn columns into a string for script parameters
A common pattern in graph scripts is the definition of OUT parameters based on tables: OUT o_edges TABLE (<edge_columns>) => ?. The edge_columns depend dynamically on the graph definition and therefore need to be derived at runtime from the given graph's edges or vertices.
This helper method turns the edge or vertex columns (or a subset) into a string that can be used in the graph script, replacing a placeholder.
- Parameters:
- source: DataFrame
The DataFrame to read the columns from
- column_filter: list, optional
Subset of columns to be considered. If None, all columns are used
- Returns:
- str
String in the form of "<column name>" <data_type>
Example: "edge_id" INT, "from" INT, "to" INT
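The string format described above can be reproduced with a few lines of plain Python. In this sketch the input is a list of (column name, data type) pairs rather than a DataFrame, and signature_from_pairs is a hypothetical helper, not the library method.

```python
def signature_from_pairs(columns, column_filter=None):
    """Join (name, data type) pairs into a TABLE(...) column signature."""
    selected = [
        (name, dtype)
        for name, dtype in columns
        if column_filter is None or name in column_filter
    ]
    return ", ".join(f'"{name}" {dtype}' for name, dtype in selected)


# Made-up edge columns; "length" is dropped by the filter below.
EDGE_COLUMNS = [("edge_id", "INT"), ("from", "INT"), ("to", "INT"), ("length", "DOUBLE")]
print(signature_from_pairs(EDGE_COLUMNS, ["edge_id", "from", "to"]))
```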
- static projection_expr_from_cols(source: DataFrame, variable: str, column_filter: list = None) str
Turn columns into a string for projection expression
A common pattern in graph scripts is the assignment of projection expressions to an OUT parameter. These expressions define a SELECT statement from a container element and therefore need a list of columns that should be selected. Example: SELECT <columns> FOREACH <variable> IN Edges(...)
This helper method turns the edge or vertex columns (or a subset) into a string that can be used in the select statement, replacing a placeholder.
- Parameters:
- source: DataFrame
The DataFrame to read the columns from
- variable: str
Name of the iterator variable used in the script. Will prefix the column names.
- column_filter: list, optional
Subset of columns to be considered. If None, all columns are used
- Returns:
- str
String in the form of ':<variable>."<column name>"'
Example: :e."edge_id", :e."from", :e."to"
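The projection format can be sketched the same way. Here the input is a plain list of column names instead of a DataFrame, and projection_from_names is a hypothetical helper used only to show how the iterator variable prefixes each column.

```python
def projection_from_names(names, variable, column_filter=None):
    """Prefix each selected column with the script iterator variable."""
    selected = [n for n in names if column_filter is None or n in column_filter]
    return ", ".join(f':{variable}."{name}"' for name in selected)


print(projection_from_names(["edge_id", "from", "to"], "e"))
print(projection_from_names(["guid", "name"], "v", column_filter=["guid"]))
```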
- _default_vertex_cols() str
Convenience method that calls signature_from_cols() with just the vertex key column from the graph.
- Returns:
- str
'"<key_column_name>" <key_column_datatype>'
Example: '"guid" INT'
- _default_edge_cols() str
Convenience method that calls signature_from_cols() with just the following edge columns from the graph:
edge_key_column
edge_target_column
edge_source_column
- Returns:
- str
'"<key_col>" <col_dtyp>, "<source_col>" <source_dtyp>, "<target_col>" <target_dtyp>'
Example: '"edge_id" INT, "from" INT, "to" INT'
- _default_vertex_select(variable: str) str
Convenience method that calls projection_expr_from_cols() with just the vertex key column from the graph.
- Parameters:
- variable: str
Name of the iterator variable used in the script. Will prefix the column names.
- Returns:
- str
':<variable>."<key_col>"'
Example: ':v."guid"'
- _default_edge_select(variable: str) str
Convenience method that calls projection_expr_from_cols() with just the following edge columns from the graph:
edge_key_column
edge_target_column
edge_source_column
- Parameters:
- variable: str
Name of the iterator variable used in the script. Will prefix the column names.
- Returns:
- str
':<variable>."<key_col>", :<variable>."<source_col>", :<variable>."<target_col>"'
Example: ':e."edge_id", :e."from", :e."to"'
- _process_parameters(arguments)
Validates and processes the parameters provided when calling execute(). The results are stored in the _template_vals dictionary. Every placeholder key is mapped to the corresponding value according to the _graph_script_vars definition. The _template_vals dictionary is passed to the _graph_script.format() placeholder replacement.
If you need additional parameters to be added to the dictionary, simply overwrite this method in your algorithm implementation class. Make sure you still call super()._process_parameters(arguments). You can then add additional placeholder keys to the dictionary or modify existing values before they get replaced in the script template.
- Parameters:
- arguments: kwargs
Arguments provided to execute() by the caller
Examples
>>> super()._process_parameters(arguments)
>>> self._templ_vals["my_value_in_script"] = "Replacement Text"
- _validate_parameters()
This method is called after the input parameters are processed and mapped. It can be overwritten to implement specific validity checks.
Examples
>>> # Version check
>>> if int(self._graph.connection_context.hana_major_version()) < 4:
...     raise EnvironmentError(
...         "SAP HANA version is not compatible with this method"
...     )
- execute(**kwargs)
Execute the algorithm
- Parameters:
- kwargs:
List of keyword parameters as specified by the implementing class
- Returns:
- self
AlgorithmBase for command chaining