LinkPrediction

class hana_ml.algorithms.pal.linkpred.LinkPrediction(method, beta=None, min_score=None, thread_ratio=None)

Link predictor that calculates proximity scores between nodes in a network that are not directly linked. The scores help predict missing links: the higher the proximity score, the more likely the two nodes are to be linked.

Parameters:
method : {'common_neighbors', 'jaccard', 'adamic_adar', 'katz'}

Method for computing the proximity between two nodes that are not directly linked.

beta : float, optional

A parameter used in the calculation of the Katz similarity (proximity) score. Valid only when method is 'katz'.

Defaults to 0.005.

min_score : float, optional

Links whose scores are lower than min_score are filtered out of the result table.

Defaults to 0.
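
The four method names correspond to well-known link-prediction measures. The following is a minimal pure-Python sketch of their textbook definitions, not of PAL's internal implementation: PAL may normalize or scale the scores differently. The helper names (build_adjacency, common_neighbors, jaccard, adamic_adar, katz) and the max_len cutoff for the Katz sum are assumptions made for illustration only.

```python
import math

def build_adjacency(edges):
    """Map each node to the set of its neighbors (undirected graph)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def common_neighbors(adj, x, y):
    """Number of neighbors shared by x and y."""
    return len(adj[x] & adj[y])

def jaccard(adj, x, y):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = adj[x] | adj[y]
    return len(adj[x] & adj[y]) / len(union) if union else 0.0

def adamic_adar(adj, x, y):
    """Each shared neighbor z contributes 1 / log(degree(z)).

    A shared neighbor of two distinct nodes has degree >= 2, so the log is positive.
    """
    return sum(1.0 / math.log(len(adj[z])) for z in adj[x] & adj[y])

def katz(adj, x, y, beta=0.005, max_len=4):
    """Truncated Katz score: sum over l = 1..max_len of
    beta**l * (number of walks of length l from x to y).

    max_len is a hypothetical cutoff for this sketch only.
    """
    walks = {x: 1}  # number of walks of the current length ending at each node
    score = 0.0
    for length in range(1, max_len + 1):
        nxt = {}
        for node, count in walks.items():
            for nb in adj[node]:
                nxt[nb] = nxt.get(nb, 0) + count
        walks = nxt
        score += beta ** length * walks.get(y, 0)
    return score

# Small undirected network used only to exercise the functions above.
edges = [(1, 2), (1, 4), (2, 3), (3, 4), (5, 1),
         (6, 2), (7, 4), (7, 5), (6, 7), (5, 4)]
adj = build_adjacency(edges)
print(common_neighbors(adj, 1, 3))
print(jaccard(adj, 1, 3))
print(adamic_adar(adj, 1, 3))
print(katz(adj, 1, 3, beta=0.005))
```

In this sketch a smaller beta weights short walks more heavily relative to long ones, which is the role the beta parameter plays in the Katz measure.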

Examples

Input DataFrame df:

>>> df.collect()
   NODE1  NODE2
0      1      2
1      1      4
2      2      3
3      3      4
4      5      1
5      6      2
6      7      4
7      7      5
8      6      7
9      5      4

Create a LinkPrediction instance:

>>> lp = LinkPrediction(method='common_neighbors',
...                     beta=0.005,
...                     min_score=0,
...                     thread_ratio=0.2)

Calculate the proximity scores of all pairs of nodes that are not directly linked, and check the result:

>>> res = lp.proximity_score(data=df, node1='NODE1', node2='NODE2')
>>> res.collect()
    NODE1  NODE2     SCORE
0       1      3  0.285714
1       1      6  0.142857
2       1      7  0.285714
3       2      4  0.285714
4       2      5  0.142857
5       2      7  0.142857
6       4      6  0.142857
7       3      5  0.142857
8       3      6  0.142857
9       3      7  0.142857
10      5      6  0.142857
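
The scores above are consistent with counting the common neighbors of each unlinked pair and dividing by the total number of nodes (7). This is an inference from the example output rather than a documented statement about PAL, so the sketch below is only a plausibility check. The function name normalized_common_neighbors is hypothetical.

```python
from itertools import combinations

# Links from the example input table (NODE1, NODE2).
edges = [(1, 2), (1, 4), (2, 3), (3, 4), (5, 1),
         (6, 2), (7, 4), (7, 5), (6, 7), (5, 4)]

def normalized_common_neighbors(edges, min_score=0.0):
    """Score every unlinked node pair by (#common neighbors) / (#nodes).

    Assumption: dividing by the node count is inferred from the example
    output and is not confirmed as PAL's formula.
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    n_nodes = len(adj)
    scores = {}
    for x, y in combinations(sorted(adj), 2):
        if y in adj[x]:
            continue  # skip pairs that are already directly linked
        score = len(adj[x] & adj[y]) / n_nodes
        if score >= min_score:  # drop pairs scoring lower than min_score
            scores[(x, y)] = score
    return scores

scores = normalized_common_neighbors(edges)
for (x, y), s in scores.items():
    print(x, y, round(s, 6))
```

Raising min_score in the sketch (for example to 0.2) keeps only the pairs that share two neighbors, which mirrors how min_score trims the result table.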

Attributes:
None

Methods

proximity_score(data[, node1, node2])

Predicts proximity scores between nodes using the currently selected method.

proximity_score(data, node1=None, node2=None)

Predicts proximity scores between nodes using the currently selected method.

Parameters:
data : DataFrame

Network data with nodes and links.

Each row represents one link, given as a pair of adjacent nodes (node1, node2) stored in two columns.

node1 : str, optional

Column name of data that gives node1 of all available links (see data).

Defaults to the name of the first column of data if not provided.

node2 : str, optional

Column name of data that gives node2 of all available links (see data).

Defaults to the name of the last column of data if not provided.

Returns:
DataFrame

The proximity scores of pairs of nodes that are not directly linked and whose scores are not lower than min_score, structured as follows:

  • 1st column: node1 of a link

  • 2nd column: node2 of a link

  • 3rd column: proximity score of the two nodes

property fit_hdbprocedure

Returns the generated hdbprocedure for fit.

property predict_hdbprocedure

Returns the generated hdbprocedure for predict.

Inherited Methods from PALBase

Besides the methods mentioned above, the LinkPrediction class also inherits methods from the PALBase class. Please refer to PAL Base for more details.