PageRank
- class hana_ml.algorithms.pal.pagerank.PageRank(damping=None, max_iter=None, tol=None, thread_ratio=None)
A PageRank model.
- Parameters
- damping : float, optional
The damping factor d.
Defaults to 0.85.
- max_iter : int, optional
The maximum number of iterations of the power method.
The value 0 means no maximum number of iterations is set and the calculation stops when the result converges.
Defaults to 0.
- tol : float, optional
Specifies the stop condition.
The calculation stops when the mean improvement of the rank values falls below this value.
Defaults to 1e-6.
- thread_ratio : float, optional
Specifies the ratio of the total number of threads that this function can use.
The value range is from 0 to 1, where 0 means only using 1 thread, and 1 means using at most all the currently available threads.
Values outside the range will be ignored and this function heuristically determines the number of threads to use.
Defaults to 0.
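The way damping, max_iter, and tol interact can be illustrated with a minimal pure-Python sketch of the power method. This is not the PAL implementation: the function name pagerank, the uniform handling of nodes without outgoing links, and the reading of "mean improvement" as the mean absolute change in rank between iterations are all assumptions made for illustration. The edge list is the one used in the Examples section of this page.

```python
def pagerank(edges, damping=0.85, max_iter=0, tol=1e-6):
    """Illustrative power-iteration PageRank over (from_node, to_node) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Start from a uniform distribution over all nodes.
    ranks = {node: 1.0 / n for node in nodes}
    iteration = 0
    # max_iter == 0 means "no cap": iterate until the tol condition is met.
    while max_iter == 0 or iteration < max_iter:
        # Assumption: rank held by nodes without out-links is spread uniformly.
        dangling = sum(ranks[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {node: base for node in nodes}
        for src in nodes:
            targets = out_links[src]
            for dst in targets:
                new_ranks[dst] += damping * ranks[src] / len(targets)
        # Assumption: "mean improvement" is the mean absolute change per node.
        mean_change = sum(abs(new_ranks[x] - ranks[x]) for x in nodes) / n
        ranks = new_ranks
        iteration += 1
        if mean_change < tol:
            break
    return ranks


edges = [
    ("Node1", "Node2"), ("Node1", "Node3"), ("Node1", "Node4"),
    ("Node2", "Node3"), ("Node2", "Node4"),
    ("Node3", "Node1"),
    ("Node4", "Node1"), ("Node4", "Node3"),
]
ranks = pagerank(edges)
```

With the default arguments, the ranks computed by this sketch agree with the RANK values shown in the Examples section to about four decimal places, which suggests (but does not prove) that the defaults behave like a standard damped power iteration.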
Examples
Input DataFrame df containing the link information:
>>> df.collect()
  FROM_NODE TO_NODE
0     Node1   Node2
1     Node1   Node3
2     Node1   Node4
3     Node2   Node3
4     Node2   Node4
5     Node3   Node1
6     Node4   Node1
7     Node4   Node3
Create a PageRank instance:
>>> pr = PageRank()
Call run() on the input data:
>>> result = pr.run(data=df)
>>> result.collect()
    NODE      RANK
0  NODE1  0.368152
1  NODE2  0.141808
2  NODE3  0.287962
3  NODE4  0.202078
- Attributes
- None
Methods
- run(data)
This method reads link information and calculates rank for each node.
- Parameters
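- data : DataFrame
Data containing the link information, with one row per link from a source node to a target node (as in the FROM_NODE and TO_NODE columns of the example above).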
- Returns
- DataFrame
Calculated rank values and corresponding node names, structured as follows:
NODE: node names.
RANK: the PageRank of the corresponding node.
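As a quick sanity check on the NODE/RANK structure, the sketch below works on the four rows copied from the example result above, written as plain (NODE, RANK) tuples rather than a hana_ml DataFrame. It only illustrates how the returned values can be read; the variable names are hypothetical.

```python
# Rows copied from the example result above as (NODE, RANK) pairs.
rows = [
    ("NODE1", 0.368152),
    ("NODE2", 0.141808),
    ("NODE3", 0.287962),
    ("NODE4", 0.202078),
]

# The ranks in the example form a distribution over nodes, so they sum to about 1.
total = sum(rank for _, rank in rows)

# Order nodes from highest to lowest rank.
ordered = sorted(rows, key=lambda row: row[1], reverse=True)
top_node = ordered[0][0]
```

Here NODE1 ranks highest, which is consistent with it being the only target of NODE3 and one of two targets of NODE4 in the input links.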
- property fit_hdbprocedure
Returns the generated hdbprocedure for fit.
- property predict_hdbprocedure
Returns the generated hdbprocedure for predict.