LinkPrediction
- class hana_ml.algorithms.pal.linkpred.LinkPrediction(method, beta=None, min_score=None, thread_ratio=None)
Link predictor that calculates proximity scores between nodes in a network that are not directly linked, which helps predict missing links (the higher the proximity score, the more likely the two nodes are to be linked).
- Parameters:
- method : {'common_neighbors', 'jaccard', 'adamic_adar', 'katz'}
Method for computing the proximity between 2 nodes that are not directly linked.
- beta : float, optional
A parameter included in the calculation of the Katz similarity (proximity) score. The value should be between 0 and 1. A smaller beta is preferred. Valid only when method is 'katz'. Defaults to 0.005.
- min_score : float, optional
The links whose scores are lower than min_score will be filtered out from the result table. Defaults to 0.
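To make the neighbor-based choices of method more concrete, the following is a minimal sketch in plain Python using the textbook definitions of the three measures. It is not PAL's implementation: PAL may normalize scores differently, so these values need not match the SCORE column in the examples. The helper names build_adjacency and neighbor_scores are hypothetical, and links are assumed to be undirected.

```python
import math
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from (node1, node2) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def neighbor_scores(edges, method="common_neighbors"):
    """Score every pair of nodes that is not directly linked.

    Uses textbook definitions (an assumption; PAL's scaling may differ).
    """
    adj = build_adjacency(edges)
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # the pair is already linked, so nothing to predict
        shared = adj[u] & adj[v]
        if method == "common_neighbors":
            score = float(len(shared))
        elif method == "jaccard":
            score = len(shared) / len(adj[u] | adj[v])
        elif method == "adamic_adar":
            # A shared neighbor has degree >= 2, so log(degree) > 0.
            score = sum(1.0 / math.log(len(adj[z])) for z in shared)
        else:
            raise ValueError(f"unsupported method: {method!r}")
        scores[(u, v)] = score
    return scores


# A 4-cycle: the two diagonals (1, 3) and (2, 4) are the missing links.
edges = [(1, 2), (1, 4), (2, 3), (3, 4)]
print(neighbor_scores(edges, "common_neighbors"))
print(neighbor_scores(edges, "jaccard"))
```

In this sketch, 'common_neighbors' counts shared neighbors, 'jaccard' divides that count by the size of the combined neighborhood, and 'adamic_adar' gives less weight to shared neighbors that have many links of their own.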
Examples
Input DataFrame df:
>>> df.collect()
   NODE1  NODE2
0      1      2
1      1      4
...
8      6      7
9      5      4
Create a LinkPrediction instance:
>>> lp = LinkPrediction(method='common_neighbors',
...                     beta=0.005,
...                     min_score=0)
Calculate the proximity scores of all node pairs in the network that are not directly linked, and check the result:
>>> res = lp.proximity_score(data=df, node1='NODE1', node2='NODE2')
>>> res.collect()
    NODE1  NODE2     SCORE
0       1      3  0.285714
1       1      6  0.142857
...
9       3      7  0.142857
10      5      6  0.142857
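The example above uses 'common_neighbors', whereas beta is only valid for 'katz'. To illustrate what beta controls, here is a minimal sketch of a truncated Katz index in plain Python: each walk of length l between two nodes contributes beta**l, so a smaller beta discounts long walks more strongly. This is the textbook formulation, not necessarily what PAL computes; the function name katz_scores and the truncation parameter max_length are hypothetical, and links are assumed to be undirected.

```python
def katz_scores(edges, beta=0.005, max_length=4):
    """Truncated Katz index for node pairs that are not directly linked.

    score(u, v) = sum over l = 1..max_length of beta**l * (walks of length l).
    max_length is a hypothetical truncation used only in this sketch.
    """
    nodes = sorted({n for edge in edges for n in edge})
    index = {n: i for i, n in enumerate(nodes)}
    size = len(nodes)

    # Symmetric 0/1 adjacency matrix.
    adj = [[0] * size for _ in range(size)]
    for u, v in edges:
        adj[index[u]][index[v]] = 1
        adj[index[v]][index[u]] = 1

    walks = [row[:] for row in adj]  # walk counts for the current length
    total = [[0.0] * size for _ in range(size)]
    for length in range(1, max_length + 1):
        for i in range(size):
            for j in range(size):
                total[i][j] += (beta ** length) * walks[i][j]
        # Extend every walk by one step (matrix product walks x adj).
        walks = [
            [sum(walks[i][k] * adj[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)
        ]

    return {
        (nodes[i], nodes[j]): total[i][j]
        for i in range(size)
        for j in range(i + 1, size)
        if adj[i][j] == 0
    }


# Path 1 - 2 - 3: nodes 1 and 3 are joined only by walks of even length.
print(katz_scores([(1, 2), (2, 3)], beta=0.1, max_length=2))
```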
- Attributes:
- None
Methods
get_model_metrics()  Get the model metrics.
get_score_metrics()  Get the score metrics.
proximity_score(data[, node1, node2])  Predicts proximity scores between nodes under current choice of method.
- proximity_score(data, node1=None, node2=None)
Predicts proximity scores between nodes under current choice of method.
- Parameters:
- data : DataFrame
Network data with nodes and links.
Each row represents a link, given as a pair of adjacent nodes (node1, node2) stored in two columns.
- node1 : str, optional
Column name of data that gives node1 of all available links. Defaults to the name of the first column of data if not provided.
- node2 : str, optional
Column name of data that gives node2 of all available links. Defaults to the name of the last column of data if not provided.
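The column defaults described above can be sketched as a small helper. This only mirrors the documented behavior (first column for node1, last column for node2) and is not taken from the library source; the function name resolve_link_columns and its validation step are hypothetical.

```python
def resolve_link_columns(columns, node1=None, node2=None):
    """Apply the documented defaults: first column for node1, last for node2."""
    if not columns:
        raise ValueError("data must have at least one column")
    node1 = columns[0] if node1 is None else node1
    node2 = columns[-1] if node2 is None else node2
    # Validation added for this sketch only.
    for name in (node1, node2):
        if name not in columns:
            raise ValueError(f"unknown column: {name!r}")
    return node1, node2


print(resolve_link_columns(["NODE1", "NODE2"]))
print(resolve_link_columns(["ID", "SRC", "DST"], node1="SRC"))
```

With a table that has extra columns, such as the second call, passing node1 and node2 explicitly avoids picking up an unintended first or last column.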
- Returns:
- DataFrame
The proximity scores of pairs of nodes with missing links between them that are above 'min_score', structured as follows:
1st column: node1 of a link
2nd column: node2 of a link
3rd column: proximity score of the two nodes
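The shape of the result can be sketched in plain Python as follows. The filter follows the description of min_score (links scoring lower than min_score are dropped); the function name to_result_rows and the sample scores are made up for illustration and are not output from PAL.

```python
def to_result_rows(scores, min_score=0.0):
    """Turn {(node1, node2): score} into sorted (node1, node2, score) rows.

    Pairs whose score is lower than min_score are dropped.
    """
    rows = [(u, v, s) for (u, v), s in scores.items() if s >= min_score]
    return sorted(rows)


# Illustrative scores only, not produced by PAL.
sample = {(1, 3): 0.3, (1, 6): 0.1, (3, 7): 0.0}
for row in to_result_rows(sample, min_score=0.1):
    print(row)
```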
- get_model_metrics()
Get the model metrics.
- Returns:
- DataFrame
The model metrics.
- get_score_metrics()
Get the score metrics.
- Returns:
- DataFrame
The score metrics.
Inherited Methods from PALBase
Besides the methods described above, the LinkPrediction class also inherits methods from the PALBase class. Please refer to PAL Base for more details.