transform.LatentDirichletAllocation {hana.ml.r}    R Documentation

Make Inference from a "LatentDirichletAllocation" Object

Description

Similar to other predict methods, this function applies a fitted "LatentDirichletAllocation" object to new documents and infers their topic distributions by Gibbs sampling.

Usage

## S3 method for class 'LatentDirichletAllocation'
transform(model, data, key,
  document = NULL, burn.in = NULL, iteration = NULL, thin = NULL,
  seed = NULL, gibbs.init = NULL, delimiters = NULL,
  output.word.assignment = NULL)

Arguments

model

R6Class object
A fitted "LatentDirichletAllocation" object used for prediction.

data

DataFrame
Data for prediction, containing the documents to be transformed.

key

character
Name of the ID column.

document

character, optional
Name of the column containing the document text.

burn.in

integer, optional
Number of initial Gibbs iterations to discard (burn-in).
Defaults to 0.

iteration

integer, optional
Number of Gibbs iterations.
Defaults to 2000.

thin

integer, optional
Thinning interval: the number of in-between Gibbs iterations omitted between retained samples.
Defaults to 1.
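The three sampling parameters interact as in a standard Gibbs chain. The sketch below lists which iterations would contribute samples, assuming (this page does not state it) that the burn.in iterations are run first and discarded, that iteration further iterations follow, and that every thin-th of those is kept. retained_iterations is a hypothetical helper, not part of hana.ml.r.

```r
# Hypothetical helper illustrating burn-in and thinning; not part of hana.ml.r.
# Assumption: `burn.in` iterations are discarded first, then `iteration`
# further iterations are run and every `thin`-th one is kept as a sample.
retained_iterations <- function(burn.in = 0, iteration = 2000, thin = 1) {
  stopifnot(burn.in >= 0, iteration >= 1, thin >= 1)
  n.kept <- iteration %/% thin          # number of samples that survive thinning
  burn.in + thin * seq_len(n.kept)      # absolute indices of retained iterations
}

# Settings used in the Examples section
kept <- retained_iterations(burn.in = 2000, iteration = 1000, thin = 100)
length(kept)   # 10
range(kept)    # 2100 3000
```

With the defaults (burn.in = 0, thin = 1) every one of the 2000 iterations would be retained under this assumption.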

seed

integer, optional
Seed used to initialize the random number generator:
- 0: uses the system time.
- Not 0: uses the specified seed.
Defaults to 0.
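The seed rule can be mirrored on the R side as below. This is only an analogy built on base R's set.seed(); the random number generator used by HANA is separate and is not affected by it. init_rng is a hypothetical helper.

```r
# Hypothetical helper mirroring the seed rule: 0 -> system time, otherwise fixed.
init_rng <- function(seed = 0) {
  if (seed == 0) {
    # Derive a seed from the current time, so runs started at different times differ
    seed <- as.integer(as.numeric(Sys.time()) %% .Machine$integer.max)
  }
  set.seed(seed)
  invisible(seed)
}

init_rng(1); a <- sample.int(6, 4, replace = TRUE)
init_rng(1); b <- sample.int(6, 4, replace = TRUE)
identical(a, b)  # TRUE: a non-zero seed makes the draws reproducible
```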

gibbs.init

character, optional
Specifies the initialization method for Gibbs sampling:
- 'uniform': assigns each word in each document a topic drawn from a uniform distribution, so every topic is equally likely for each word.
- 'gibbs': assigns each word in each document a topic by one round of Gibbs sampling, using the document-topic and topic-word prior distributions given by the parameters ALPHA and BETA.
Defaults to 'uniform'.
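For intuition, the two initialization schemes can be sketched for a single document given as word indices. The 'gibbs' variant samples each word's topic sequentially with probability proportional to (n_dk + alpha) * phi[k, w], where n_dk counts the topics already assigned in the document. It assumes a fixed topic-word matrix phi (as a fitted model would supply) and a prior alpha; it illustrates the idea rather than reproducing PAL's exact procedure, and init_uniform, init_gibbs and all values are hypothetical.

```r
# Hypothetical sketch of the two gibbs.init schemes for one document.
# words: integer word indices (1..V); K: number of topics.
init_uniform <- function(words, K) {
  # Every topic is equally likely for every word
  sample.int(K, length(words), replace = TRUE)
}

# Assumption: phi is a K x V topic-word probability matrix (e.g. from a
# fitted model) and alpha is the document-topic prior.
init_gibbs <- function(words, phi, alpha) {
  K <- nrow(phi)
  n.dk <- numeric(K)               # topic counts of words assigned so far
  z <- integer(length(words))
  for (i in seq_along(words)) {
    p <- (n.dk + alpha) * phi[, words[i]]   # unnormalized conditional
    z[i] <- sample.int(K, 1, prob = p)      # sample.int normalizes prob
    n.dk[z[i]] <- n.dk[z[i]] + 1
  }
  z
}

set.seed(1)
phi <- matrix(c(0.7, 0.2, 0.1,
                0.1, 0.3, 0.6), nrow = 2, byrow = TRUE)  # K = 2, V = 3
words <- c(1L, 1L, 2L, 3L)
init_uniform(words, K = 2)
init_gibbs(words, phi, alpha = 0.1)
```

Unlike the uniform scheme, the Gibbs round lets earlier assignments in the same document pull later words toward the same topics.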

delimiters

list of character, optional
Specifies the characters used to separate words in a document.
For example, if the words are separated by both ',' and ':', specify both characters, e.g. list(',', ':').
Defaults to a single space (' ').
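A small base-R sketch of multi-delimiter tokenization can help check how a text would be split before it is sent to HANA. It assumes each delimiter is a single literal character and that empty tokens produced by adjacent delimiters are dropped; split_words is hypothetical, and PAL performs its own tokenization in the database.

```r
# Hypothetical helper: split text on any of several literal delimiters.
split_words <- function(text, delimiters = list(" ")) {
  tokens <- text
  for (d in delimiters) {
    # fixed = TRUE treats the delimiter literally, not as a regex
    tokens <- unlist(strsplit(tokens, d, fixed = TRUE))
  }
  tokens[nzchar(tokens)]  # drop empty strings from adjacent delimiters
}

split_words("toy,toy:spoon cpu", list(",", ":", " "))
# "toy" "toy" "spoon" "cpu"
```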

output.word.assignment

logical, optional
Controls whether to output the word-topic assignment table. If set to TRUE, the procedure takes longer to return because it must also write the WORD_TOPIC_ASSIGNMENT table.
Defaults to FALSE.

Value

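Predicted values are returned as a list of three DataFrames, structured as follows:

DataFrame 1:
Document-topic distribution table, structured as follows:
- Document ID column (DOCUMENT_ID in the example below).
- TOPIC_ID: topic ID.
- PROBABILITY: probability of the topic given the document.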

DataFrame 2:
Word-topic assignment table, structured as follows:

DataFrame 3:
Statistics table, structured as follows:

See Also

hanaml.LatentDirichletAllocation

Examples

## Not run: 
Apply the fitted "LatentDirichletAllocation" object LDA to DataFrame data1:
> data1$Collect()
  DOCUMENT_ID                   TEXT
 1          10     toy toy spoon cpu

> result <- transform(LDA, data1, key = "DOCUMENT_ID",
                        document = "TEXT", burn.in = 2000,
                        iteration = 1000, thin = 100,
                        seed = 1, output.word.assignment = TRUE)

> result[[1]]$Collect()
   DOCUMENT_ID  TOPIC_ID   PROBABILITY
1       10         0       0.23913043478260873
2       10         1       0.4565217391304348
3       10         2       0.02173913043478261
4       10         3       0.02173913043478261
5       10         4       0.23913043478260873
6       10         5       0.02173913043478261

## End(Not run)
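The probabilities in result[[1]] are consistent with the usual smoothed estimate theta_k = (n_k + alpha) / (N + K * alpha), where n_k is the number of the document's N words assigned to topic k. Assuming K = 6 topics, a document-topic prior alpha = 0.1 and per-topic counts c(1, 2, 0, 0, 1, 0) for the four words of document 10 (all inferred from the printed numbers, not stated on this page), base R reproduces the table. topic_distribution is a hypothetical helper.

```r
# Smoothed document-topic estimate from per-topic word counts.
# Assumed values, inferred from the example output above.
topic_distribution <- function(n.k, alpha) {
  (n.k + alpha) / (sum(n.k) + length(n.k) * alpha)
}

n.k <- c(1, 2, 0, 0, 1, 0)    # how many of the 4 words fall in topics 0..5
theta <- topic_distribution(n.k, alpha = 0.1)
data.frame(DOCUMENT_ID = 10, TOPIC_ID = 0:5, PROBABILITY = theta)
```

Topics with no assigned words still receive alpha / (N + K * alpha) = 0.1 / 4.6, which is the 0.0217... seen for topics 2, 3 and 5.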

[Package hana.ml.r version 1.0.8 Index]