LinkPrediction
- class hana_ml.algorithms.pal.linkpred.LinkPrediction(method, beta=None, min_score=None, thread_ratio=None)
Link predictor that calculates proximity scores between nodes in a network that are not directly linked. This is helpful for predicting missing links: the higher the proximity score, the more likely the two nodes are to be linked.
- Parameters:
- method : {'common_neighbors', 'jaccard', 'adamic_adar', 'katz'}
Method for computing the proximity between two nodes that are not directly linked.
- beta : float, optional
A parameter used in calculating the Katz similarity (proximity) score. The value should be between 0 and 1; a smaller beta is preferred. Valid only when method is 'katz'.
Defaults to 0.005.
- min_score : float, optional
Links whose scores are lower than min_score are filtered out of the result table.
Defaults to 0.
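As a rough illustration of what the four method options measure, the following plain-Python sketch computes the textbook forms of these scores on a toy undirected graph. It is not the PAL implementation: the helper names (build_adjacency, common_neighbors, jaccard, adamic_adar, katz), the max_len truncation, and the toy edge list are assumptions made for illustration, and PAL may scale or normalize the scores differently.

```python
import math


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """Raw count of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    """Each shared neighbor z contributes 1 / log(degree(z)).

    Assumes u != v and no self-loops, so every shared neighbor has
    degree >= 2 and the logarithm is strictly positive.
    """
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v])


def katz(adj, u, v, beta=0.005, max_len=6):
    """Truncated Katz series: sum over l of beta**l * (walks of length l from u to v).

    max_len is a hypothetical cutoff used only in this sketch.
    """
    walks = {u: 1}  # walks[n] = number of walks of the current length from u to n
    score = 0.0
    for length in range(1, max_len + 1):
        nxt = {}
        for node, count in walks.items():
            for nb in adj[node]:
                nxt[nb] = nxt.get(nb, 0) + count
        walks = nxt
        score += beta ** length * walks.get(v, 0)
    return score


# Toy graph (illustrative data only): nodes 1 and 4 are not directly linked.
edges = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]
adj = build_adjacency(edges)
for name, fn in [("common_neighbors", common_neighbors),
                 ("jaccard", jaccard),
                 ("adamic_adar", adamic_adar),
                 ("katz", katz)]:
    print(name, fn(adj, 1, 4))
```

On this toy graph, nodes 1 and 4 share two neighbors (2 and 3), so the raw common-neighbor count is 2 and the Jaccard score is 2/3. In the Katz series each walk of length l is weighted by beta**l, so a small beta makes short walks dominate the score.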
- Attributes:
- None
Methods
proximity_score(data[, node1, node2])
Predicts proximity scores between nodes under the current choice of method.
Examples
Input DataFrame df:
>>> df.collect()
   NODE1  NODE2
0      1      2
1      1      4
...
8      6      7
9      5      4
Create a LinkPrediction instance:
>>> lp = LinkPrediction(method='common_neighbors',
...                     beta=0.005,
...                     min_score=0)
Calculate the proximity scores of all node pairs in the network that are not directly linked, and check the result:
>>> res = lp.proximity_score(data=df, node1='NODE1', node2='NODE2')
>>> res.collect()
    NODE1  NODE2     SCORE
0       1      3  0.285714
1       1      6  0.142857
...
9       3      7  0.142857
10      5      6  0.142857
- proximity_score(data, node1=None, node2=None)
Predicts proximity scores between nodes under the current choice of method.
- Parameters:
- data : DataFrame
Network data with nodes and links.
Nodes are in columns and links are in rows, where each link is represented by a pair of adjacent nodes as (node1, node2).
- node1 : str, optional
Column name of data that gives node1 of all available links.
Defaults to the name of the first column of data if not provided.
- node2 : str, optional
Column name of data that gives node2 of all available links.
Defaults to the name of the last column of data if not provided.
- Returns:
- DataFrame
The proximity scores of pairs of nodes with missing links between them whose scores are not lower than min_score, structured as follows:
1st column: node1 of a link
2nd column: node2 of a link
3rd column: proximity score of the two nodes
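The input and output layout described above can be sketched in plain Python as follows. This is a minimal illustration, not the PAL procedure: score_unlinked_pairs is a hypothetical helper, it uses the raw common-neighbor count as the score, it treats links as undirected, and it assumes that a pair scoring exactly min_score is kept.

```python
from itertools import combinations


def score_unlinked_pairs(rows, node1=0, node2=-1, min_score=0.0):
    """Return (node1, node2, score) rows for node pairs that are not directly linked.

    rows      : sequence of tuples, one tuple per link.
    node1     : position of the first endpoint (first column by default).
    node2     : position of the second endpoint (last column by default).
    min_score : pairs scoring below this value are dropped.
    """
    # Build an undirected adjacency map from the link rows.
    adj = {}
    for row in rows:
        u, v = row[node1], row[node2]
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    result = []
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already linked, so there is nothing to predict
        score = len(adj[u] & adj[v])  # raw common-neighbor count (assumed scoring)
        if score >= min_score:
            result.append((u, v, score))
    return result


# Toy link table (illustrative data only).
rows = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]
for row in score_unlinked_pairs(rows, min_score=1):
    print(row)
```

Each returned tuple mirrors the three-column result: the two unlinked nodes followed by their score. Raising min_score shortens the result by dropping the weakest candidate links.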
Inherited Methods from PALBase
Besides the methods mentioned above, the LinkPrediction class also inherits methods from the PALBase class. Please refer to PAL Base for more details.