hanaml.PageRank.Rd
hanaml.PageRank is an R wrapper for the SAP HANA PAL PageRank algorithm.
hanaml.PageRank(
data,
used.cols = NULL,
damping = NULL,
max.iter = NULL,
tol = NULL
)
DataFrame
DataFrame containing links among all nodes in a social network.
list of characters, optional
The two columns of data that hold the source and sink nodes of each link,
given as a named vector in which the source column is named "source" and
the sink column is named "sink".
Defaults to the 1st and 2nd columns of data if not provided.
double, optional
The damping factor for PageRank scores.
Defaults to 0.85.
integer, optional
The maximum number of iterations of the power method used to solve
the PageRank problem.
The value 0 means no maximum number of iterations is set,
and the calculation stops when the result converges.
Defaults to 0.
double, optional
The stopping criterion for the power method.
When the mean improvement in the rank values is less than this value,
the calculation stops.
Defaults to 1e-6.
DataFrame
The data frame that contains the ranking scores of all nodes in a network.
PageRank is an algorithm used by search engines to measure the importance of web pages. A page is considered more important if it receives more links from other pages. The PageRank score represents the likelihood that a visitor reaches a particular page by randomly clicking links on other pages; a higher score means a greater probability of the page being reached.
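As a point of reference, the textbook form of the PageRank recurrence iterated by the power method can be written as below. The symbols are introduced here for illustration and do not come from the PAL documentation: $d$ is the damping factor, $N$ the number of nodes, $\mathrm{In}(v)$ the set of nodes linking to $v$, and $\mathrm{out}(u)$ the number of outgoing links of $u$. Whether PAL uses exactly this normalization is an assumption.

```latex
r^{(k+1)}(v) = \frac{1 - d}{N} + d \sum_{u \in \mathrm{In}(v)} \frac{r^{(k)}(u)}{\mathrm{out}(u)}
```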
Social network data containing the existing links between nodes:
> data$Collect()
FROM_NODE TO_NODE
1 Node1 Node2
2 Node1 Node3
3 Node1 Node4
4 Node2 Node3
5 Node2 Node4
6 Node3 Node1
7 Node4 Node1
8 Node4 Node3
Call the function for calculating the ranking scores of all nodes in the network:
> result <- hanaml.PageRank(data = data,
used.cols = c(source = "FROM_NODE", sink = "TO_NODE"),
damping = 0.85)
Output:
> result$Collect()
NODE RANK
1 Node1 0.3681516
2 Node2 0.1418082
3 Node3 0.2879621
4 Node4 0.2020780
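To make the roles of damping, max.iter and tol more concrete, the following is a minimal base R sketch of a power-method PageRank run locally on the same eight links. It is not the PAL implementation: the function name pagerank_sketch, the uniform handling of nodes without outgoing links, and the reading of "mean improvement" as the mean absolute change in rank are all assumptions made for illustration. No SAP HANA connection is needed.

```r
# Illustrative power-method PageRank in base R (hypothetical helper, not PAL).
pagerank_sketch <- function(edges, damping = 0.85, max.iter = 0, tol = 1e-6) {
  src <- as.character(edges$FROM_NODE)
  dst <- as.character(edges$TO_NODE)
  nodes <- sort(unique(c(src, dst)))
  n <- length(nodes)
  from <- match(src, nodes)
  to <- match(dst, nodes)
  out.degree <- tabulate(from, nbins = n)

  rank <- rep(1 / n, n)  # start from a uniform distribution
  iter <- 0
  repeat {
    iter <- iter + 1
    # Each source node splits its current rank evenly over its outgoing links.
    contrib <- rank[from] / out.degree[from]
    incoming <- vapply(seq_len(n), function(i) sum(contrib[to == i]), numeric(1))
    # Assumption: rank held by nodes without outgoing links is spread uniformly.
    dangling <- sum(rank[out.degree == 0])
    new.rank <- (1 - damping) / n + damping * (incoming + dangling / n)
    # Assumption: "mean improvement" is taken as the mean absolute change.
    change <- mean(abs(new.rank - rank))
    rank <- new.rank
    # max.iter = 0 means no iteration cap, mirroring the documented default.
    if (change < tol || (max.iter > 0 && iter >= max.iter)) break
  }
  data.frame(NODE = nodes, RANK = rank, stringsAsFactors = FALSE)
}

# The same links as in the example data above.
edges <- data.frame(
  FROM_NODE = c("Node1", "Node1", "Node1", "Node2", "Node2", "Node3", "Node4", "Node4"),
  TO_NODE   = c("Node2", "Node3", "Node4", "Node3", "Node4", "Node1", "Node1", "Node3"),
  stringsAsFactors = FALSE
)

scores <- pagerank_sketch(edges, damping = 0.85)
print(scores)
```

Under these assumptions the scores sum to 1 and should land close to the RANK column shown above. Small differences in the last digits are expected, since the exact stopping rule used by PAL is not specified here.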