Link Prediction — hanaml.LinkPredict • hana.ml.r

hanaml.LinkPredict is an R wrapper for the SAP HANA PAL link prediction algorithm.

hanaml.LinkPredict(
  data,
  used.cols = NULL,
  method,
  katz.beta = NULL,
  min.score = NULL
)

Arguments

data: DataFrame
DataFrame containing links among all nodes in a social network.
used.cols: list of characters, optional
Specifies the two columns of data that hold the two end nodes of each link. The first node is named "node1" and the second "node2", for example c(node1 = "NODE1", node2 = "NODE2"). Defaults to the first and second columns of data if not provided.
method: c("common.neighbors", "jaccard", "adamic.adar", "katz")
Method for predicting potential missing links between nodes.
katz.beta: double, optional
The beta parameter for the "katz" method. The value should be between 0 and 1. Values closer to 0 are usually preferred.
Only valid when method is "katz".
Defaults to 0.005.
min.score: double, optional
Link prediction algorithms compute scores for all pairs of nodes with missing links. A link is assumed to exist only if its computed score is above `min.score`; links whose scores fall below this threshold are filtered out of the result table.
Defaults to 0.
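
To illustrate how katz.beta and min.score interact, here is a minimal base R sketch of the textbook Katz index, computed locally on the example edge list from this page. It does not call SAP HANA, and it is an assumption that PAL's "katz" method matches this formula exactly (PAL may truncate path lengths or normalize scores). The names katz_scores, katz_beta and min_score are hypothetical and used only for illustration.

```r
# Edge list from the example on this page, treated as undirected links.
edges <- data.frame(
  NODE1 = c(1, 1, 2, 3, 5, 6, 7, 7, 6, 5),
  NODE2 = c(2, 4, 3, 4, 1, 2, 4, 5, 7, 4)
)

n <- max(c(edges$NODE1, edges$NODE2))  # nodes are labelled 1..n here

# Symmetric adjacency matrix.
adj <- matrix(0, n, n)
adj[cbind(edges$NODE1, edges$NODE2)] <- 1
adj[cbind(edges$NODE2, edges$NODE1)] <- 1

# Textbook Katz index: sum over l >= 1 of beta^l * A^l = (I - beta * A)^-1 - I.
# The series converges only if beta < 1 / (largest eigenvalue of A).
# Assumption: this is the standard definition, not necessarily PAL's.
katz_scores <- function(adj, beta) {
  lambda_max <- max(abs(eigen(adj, symmetric = TRUE, only.values = TRUE)$values))
  stopifnot(beta > 0, beta < 1 / lambda_max)
  identity <- diag(nrow(adj))
  solve(identity - beta * adj) - identity
}

katz_beta <- 0.005  # documented default of katz.beta
min_score <- 0      # documented default of min.score

k <- katz_scores(adj, katz_beta)

# Score only the unlinked pairs (i < j), then keep scores above the threshold.
missing <- which(adj == 0 & upper.tri(adj), arr.ind = TRUE)
result <- data.frame(
  NODE1 = missing[, "row"],
  NODE2 = missing[, "col"],
  SCORE = k[missing]
)
result <- result[result$SCORE > min_score, ]
print(result)
```

Under this definition, an unlinked pair has no path of length 1, so with a small beta its score is dominated by beta^2 times the number of common neighbors. Raising min_score therefore removes the pairs with the fewest short paths first.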

Value

DataFrame
The data frame that contains the computed scores of all missing links in a network.

Details

Predicting potential missing links between different nodes is a common task in social network analysis. Link prediction algorithms compute the distance between any two nodes using the existing links in a social network, and make predictions about the missing links based on these distances.
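
As a rough illustration of how such scores are derived from existing links, the following base R sketch computes the textbook common neighbors, Jaccard and Adamic-Adar measures for a single pair of nodes, using the example edge list from this page. The helper pair_scores is hypothetical, the natural logarithm in Adamic-Adar is an assumption, and PAL may normalize these values differently (the common.neighbors scores in the example output on this page are consistent with the raw count divided by the number of nodes).

```r
# Edge list from the example on this page, treated as undirected links.
edges <- data.frame(
  NODE1 = c(1, 1, 2, 3, 5, 6, 7, 7, 6, 5),
  NODE2 = c(2, 4, 3, 4, 1, 2, 4, 5, 7, 4)
)

nodes <- sort(unique(c(edges$NODE1, edges$NODE2)))

# Neighbor set of every node, keyed by the node label as a string.
neighbors <- lapply(nodes, function(v) {
  sort(unique(c(edges$NODE2[edges$NODE1 == v], edges$NODE1[edges$NODE2 == v])))
})
names(neighbors) <- nodes

# Hypothetical helper: textbook neighbor-based scores for the pair (x, y).
pair_scores <- function(x, y) {
  nx <- neighbors[[as.character(x)]]
  ny <- neighbors[[as.character(y)]]
  common <- intersect(nx, ny)
  # Degree of each common neighbor; always >= 2, so log() is never zero.
  degs <- lengths(neighbors[as.character(common)])
  c(
    common.neighbors = length(common),                 # raw count
    jaccard = length(common) / length(union(nx, ny)),  # overlap ratio
    adamic.adar = sum(1 / log(degs))                   # down-weights hub nodes
  )
}

# Nodes 1 and 3 are not linked but share the neighbors 2 and 4.
print(pair_scores(1, 3))
```

All three measures grow with the number of shared neighbors. Jaccard additionally penalizes pairs whose neighborhoods are large but mostly disjoint, and Adamic-Adar gives less weight to shared neighbors that are themselves highly connected.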

Examples

Social network data containing existing links between nodes:


> data$Collect()
   NODE1 NODE2
1      1     2
2      1     4
3      2     3
4      3     4
5      5     1
6      6     2
7      7     4
8      7     5
9      6     7
10     5     4

Creating a LinkPredict instance for predicting potential missing links between all nodes:


> lp <- hanaml.LinkPredict(data = data,
                           used.cols = c(node1 = "NODE1", node2 = "NODE2"),
                           method = "common.neighbors")

Output:


> lp$result
   NODE1 NODE2     SCORE
1      1     3 0.2857143
2      1     6 0.1428571
3      1     7 0.2857143
4      2     4 0.2857143
5      2     5 0.1428571
6      2     7 0.1428571
7      4     6 0.1428571
8      3     5 0.1428571
9      3     6 0.1428571
10     3     7 0.1428571
11     5     6 0.1428571
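
The scores above can be cross-checked locally. The sketch below is plain base R, not a call to SAP HANA, and it assumes that the "common.neighbors" score is the number of shared neighbors divided by the total number of nodes (7 here). That assumption is inferred from the output shown above and is not stated on this page.

```r
# Edge list from the example above, treated as undirected links.
edges <- data.frame(
  NODE1 = c(1, 1, 2, 3, 5, 6, 7, 7, 6, 5),
  NODE2 = c(2, 4, 3, 4, 1, 2, 4, 5, 7, 4)
)

n <- max(c(edges$NODE1, edges$NODE2))  # 7 nodes, labelled 1..7

# Symmetric adjacency matrix.
adj <- matrix(0, n, n)
adj[cbind(edges$NODE1, edges$NODE2)] <- 1
adj[cbind(edges$NODE2, edges$NODE1)] <- 1

# Entry [i, j] of adj %*% adj counts the neighbors shared by i and j.
common <- adj %*% adj

# Unlinked pairs (i < j) are the candidate missing links.
missing <- which(adj == 0 & upper.tri(adj), arr.ind = TRUE)

scores <- data.frame(
  NODE1 = missing[, "row"],
  NODE2 = missing[, "col"],
  SCORE = common[missing] / n  # assumed normalization by node count
)

# Mimic the default min.score = 0: keep only scores above the threshold.
scores <- scores[scores$SCORE > 0, ]
print(scores)
```

The rows come out in a different order than in lp$result, but under the stated assumption the pairs and scores agree. For example, nodes 1 and 3 share the neighbors 2 and 4, which gives 2/7, approximately 0.2857143.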