hanaml.PageRank {hana.ml.r}    R Documentation

Page Rank

Description

hanaml.PageRank is an R wrapper for the PageRank algorithm of the SAP HANA Predictive Analysis Library (PAL).

Usage

hanaml.PageRank(conn.context,
                 data,
                 used.cols = NULL,
                 damping = NULL,
                 max.iter = NULL,
                 tol = NULL)

Arguments

conn.context

ConnectionContext
Connection to the SAP HANA system.

data

DataFrame
DataFrame containing links among all nodes in a social network.

used.cols

list of characters, optional
Specifies the two columns of data that hold the source and sink nodes of each link. The source column is given under the name "source" and the sink column under the name "sink", for example c(source = "FROM_NODE", sink = "TO_NODE"). Defaults to the first and second columns of data if not provided.

damping

double, optional
The damping factor for PageRank scores.
Defaults to 0.85.

max.iter

integer, optional
The maximum number of iterations of the power method used to solve the PageRank problem. The value 0 means that no maximum is set, and the calculation stops only when the result converges.
Defaults to 0.

tol

double, optional
The stopping criterion for the power method. The calculation stops when the mean improvement of the rank values between iterations is less than this value.
Defaults to 1e-6.

Details

PageRank is an algorithm used by search engines to measure the importance of web pages. A page is considered more important if it receives more links from other pages. The PageRank of a page represents the likelihood that a visitor reaches it by randomly clicking links on other pages. A higher rank means a greater probability of the page being reached.
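
The following plain-R sketch illustrates how the damping, max.iter and tol arguments interact in a power-method PageRank. It is not the PAL implementation and does not connect to SAP HANA. The function name pagerank_sketch is hypothetical, and the sketch assumes that every node has at least one outgoing link, that no link is listed twice, and that "mean improvement" means the mean absolute change of the ranks between two iterations.

```r
# Illustrative sketch only: a local power-method PageRank in base R.
# Assumes every node has an outgoing link and no link is duplicated.
pagerank_sketch <- function(edges, damping = 0.85, max.iter = 0, tol = 1e-6) {
  src <- as.character(edges$FROM_NODE)
  dst <- as.character(edges$TO_NODE)
  nodes <- sort(unique(c(src, dst)))
  n <- length(nodes)
  from <- match(src, nodes)
  to <- match(dst, nodes)
  out.degree <- tabulate(from, nbins = n)

  # Transition matrix: M[j, i] = 1 / outdegree(i) for each link i -> j.
  M <- matrix(0, n, n)
  M[cbind(to, from)] <- 1 / out.degree[from]

  rank <- rep(1 / n, n)  # start from a uniform distribution
  iter <- 0
  repeat {
    new.rank <- (1 - damping) / n + damping * as.vector(M %*% rank)
    delta <- mean(abs(new.rank - rank))  # assumed stopping measure
    rank <- new.rank
    iter <- iter + 1
    # max.iter = 0 means no iteration limit; stop on convergence only.
    if (delta < tol || (max.iter > 0 && iter >= max.iter)) break
  }
  data.frame(NODE = nodes, RANK = rank, stringsAsFactors = FALSE)
}

# The eight links used in the Examples section.
edges <- data.frame(
  FROM_NODE = c("Node1", "Node1", "Node1", "Node2",
                "Node2", "Node3", "Node4", "Node4"),
  TO_NODE   = c("Node2", "Node3", "Node4", "Node3",
                "Node4", "Node1", "Node1", "Node3"),
  stringsAsFactors = FALSE
)
res <- pagerank_sketch(edges, damping = 0.85)
print(res)
```

Under these assumptions the ranks sum to 1 and should lie close to the values shown in the Examples section, with Node1 ranked highest. A smaller tol or a positive max.iter trades accuracy against the number of iterations.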

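Value

An object whose result element holds the computed ranking, with one row per node and the columns NODE and RANK (see Examples).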

Examples

## Not run: 
## Social network data containing the existing links between nodes:

> df
  FROM_NODE TO_NODE
1     Node1   Node2
2     Node1   Node3
3     Node1   Node4
4     Node2   Node3
5     Node2   Node4
6     Node3   Node1
7     Node4   Node1
8     Node4   Node3

## Create a PageRank instance to calculate the ranking scores of all nodes in the network:

> pr <- hanaml.PageRank(conn.context = conn,
                        data = df,
                        used.cols = c(source = "FROM_NODE",
                                      sink = "TO_NODE"),
                        damping = 0.85)

## Computed ranking result:

> pr$result
   NODE      RANK
1 Node1 0.3681516
2 Node2 0.1418082
3 Node3 0.2879621
4 Node4 0.2020780

## End(Not run)

[Package hana.ml.r version 1.0.8 Index]