R: Make Inference from a "LatentDirichletAllocation" Object

transform.LatentDirichletAllocation {hana.ml.r}

R Documentation

Make Inference from a "LatentDirichletAllocation" Object

Description

Like other predict methods, this function makes inference on new documents, returning their topic distributions from a fitted "LatentDirichletAllocation" object.

Usage

## S3 method for class 'LatentDirichletAllocation'
transform(model, data, key,
  document = NULL, burn.in = NULL, iteration = NULL, thin = NULL,
  seed = NULL, gibbs.init = NULL, delimiters = NULL,
  output.word.assignment = NULL)

Arguments

`model`	`R6Class object` "LatentDirichletAllocation" object for prediction.
`data`	`DataFrame` data for prediction.
`key`	`character` Name of the ID column.
`document`	`character, optional` Names of the document columns.
`burn.in`	`integer, optional` Number of omitted Gibbs iterations at the beginning. Defaults to 0.
`iteration`	`integer, optional` Number of Gibbs iterations. Defaults to 2000.
`thin`	`integer, optional` Number of omitted in-between Gibbs iterations. Defaults to 1.
`seed`	`integer, optional` Seed used to initialize the random number generator. 0: uses the system time. Not 0: uses the specified seed. Defaults to 0.
`gibbs.init`	`character, optional` Initialization method for Gibbs sampling. 'uniform': assigns each word in each document a topic drawn from a uniform distribution, so every topic is equally likely for each word. 'gibbs': assigns each word in each document a topic by one round of Gibbs sampling, using the document-topic and topic-word prior distributions given by the parameters ALPHA and BETA. Defaults to 'uniform'.
`delimiters`	`list of character, optional` Delimiters that separate words in a document. For example, if the words are separated by , and :, then the delimiters should be ',' and ':'. Defaults to a single space (' ').
`output.word.assignment`	`logical, optional` Controls whether to output the word-topic assignment. Note that if this parameter is set to TRUE, the procedure takes more time to return because it has to write the WORD_TOPIC_ASSIGNMENT table. Defaults to FALSE.
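
To make the effect of `delimiters` concrete, the following is a minimal local sketch of splitting a document on several delimiters. The helper `split.words` is hypothetical and is not part of hana.ml.r; it only illustrates the idea and makes no claim about how the underlying procedure tokenizes text internally.

```r
# Hypothetical helper (not part of hana.ml.r): split a document into words
# using any of the given delimiters. Illustration only.
split.words <- function(text, delimiters = list(" ")) {
  # Replace every delimiter with a newline, matching it literally
  for (d in delimiters) {
    text <- gsub(d, "\n", text, fixed = TRUE)
  }
  # Split on the newline and drop empty strings left by adjacent delimiters
  words <- strsplit(text, "\n", fixed = TRUE)[[1]]
  words[nzchar(words)]
}

# Words separated by ',' and ':'
split.words("toy,toy:spoon,cpu", delimiters = list(",", ":"))

# Default delimiter: a single space
split.words("toy toy spoon cpu")
```

Both calls yield the same four words, which is the behavior one would expect from passing the matching delimiters to `transform`.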

Value

Predicted values are returned as a list of DataFrames, structured as follows:

DataFrame 1:
Document-topic distribution table, structured as follows:

Document ID column: with same name and type as data's document ID column.
TOPIC_ID: type INTEGER, topic ID.
PROBABILITY: type DOUBLE, probability of topic given document.

DataFrame 2:
Word-topic assignment table, structured as follows:

Document ID column: with same name and type as data's document ID column.
WORD_ID: type INTEGER, word ID.
TOPIC_ID: type INTEGER, topic ID.

DataFrame 3:
Statistics table, structured as follows:

STAT_NAME: type NVARCHAR(256), statistic name.
STAT_VALUE: type NVARCHAR(1000), statistic value.

Examples

## Not run: 
Perform prediction on DataFrame data1 using the "LatentDirichletAllocation" object LDA:
> data1$Collect()
  DOCUMENT_ID              TEXT
1          10 toy toy spoon cpu

> result <- transform(LDA, data1, key = "DOCUMENT_ID",
                      document = "TEXT", burn.in = 2000,
                      iteration = 1000, thin = 100,
                      seed = 1, output.word.assignment = TRUE)

> result[[1]]$Collect()
  DOCUMENT_ID TOPIC_ID         PROBABILITY
1          10        0 0.23913043478260873
2          10        1  0.4565217391304348
3          10        2 0.02173913043478261
4          10        3 0.02173913043478261
5          10        4 0.23913043478260873
6          10        5 0.02173913043478261

## End(Not run)
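
Once collected, the document-topic table can be post-processed locally. The sketch below assumes that `Collect()` returns an ordinary R data.frame, as the printed output above suggests. To stay runnable without a database connection, it rebuilds that output by hand; the names `doc.topic`, `prob.sums`, and `top.topic` are illustrative only.

```r
# Local copy of the document-topic table shown in the example output
doc.topic <- data.frame(
  DOCUMENT_ID = rep(10L, 6),
  TOPIC_ID    = 0:5,
  PROBABILITY = c(0.23913043478260873, 0.4565217391304348,
                  0.02173913043478261, 0.02173913043478261,
                  0.23913043478260873, 0.02173913043478261)
)

# Sanity check: topic probabilities of each document should sum to 1
prob.sums <- tapply(doc.topic$PROBABILITY, doc.topic$DOCUMENT_ID, sum)
stopifnot(all(abs(prob.sums - 1) < 1e-8))

# Keep the most probable topic for each document
top.topic <- do.call(rbind, lapply(
  split(doc.topic, doc.topic$DOCUMENT_ID),
  function(d) d[which.max(d$PROBABILITY), ]
))
print(top.topic)
```

For document 10 this selects topic 1, the row with the largest PROBABILITY in the table above.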