Make Inference from a "LatentDirichletAllocation" Object

Similar to other predict methods, this function uses a fitted "LatentDirichletAllocation" object to infer the topic distribution of each document in new data.

# S3 method for LatentDirichletAllocation
transform(
  model,
  data,
  key,
  document = NULL,
  burn.in = NULL,
  iteration = NULL,
  thin = NULL,
  seed = NULL,
  gibbs.init = NULL,
  delimiters = NULL,
  output.word.assignment = NULL
)

Arguments

model	`R6Class object` A "LatentDirichletAllocation" object for prediction.
data	`DataFrame` DataFrame containing the data.
key	`character` Name of the ID column.
document	`character, optional` Names of the document columns. Defaults to the first non-ID column.
burn.in	`integer, optional` Number of omitted Gibbs iterations at the beginning. Defaults to 0.
iteration	`integer, optional` Number of Gibbs iterations. Defaults to 2000.
thin	`integer, optional` Number of omitted in-between Gibbs iterations. Defaults to 1.
seed	`integer, optional` Indicates the seed used to initialize the random number generator. 0: uses the system time. Not 0: uses the specified seed. Defaults to 0.
gibbs.init	`character, optional` Specifies initialization method for Gibbs sampling. This value takes precedence over the corresponding one in the general information table. 'uniform': Assigns each word in each document a topic by a uniform distribution. Each topic has the same probability to be assigned for each word. 'gibbs': Initialization by Gibbs sampling. Defaults to 'uniform'.
delimiters	`list of character, optional` Specifies the delimiters that separate words in a document. For example, if words are separated by both , and :, the delimiters can be given as ,:. If words are separated by only , or only :, the delimiter should be ',' or ':' respectively. Defaults to ''.
output.word.assignment	`logical, optional` Controls whether to output the word-topic assignment. Note that setting this parameter to TRUE makes the procedure take longer to return, because it must also write the WORD_TOPIC_ASSIGNMENT table. Defaults to FALSE.
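
To make the delimiters argument more concrete, here is a minimal sketch in base R of how a set of single-character delimiters splits a document into words. It only illustrates the idea with strsplit; it is not the tokenizer used by the underlying procedure, and the sample string doc is a made-up example.

```r
# Hypothetical document whose words are separated by "," and ":".
doc <- "toy,toy:spoon,cpu"

# Delimiters given as single characters, as in the delimiters argument.
delims <- c(",", ":")

# Build a regex character class such as "[,:]" from the delimiters.
# Assumption: none of the delimiters is special inside a character
# class (for example "]", "^" or "-"); those would need escaping.
pattern <- paste0("[", paste(delims, collapse = ""), "]")

# Split the document into words at every delimiter.
words <- strsplit(doc, pattern)[[1]]
print(words)
```

With both delimiters supplied, the string is split at every comma and colon; supplying only one of them would leave the other character inside the words.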

Value

Predicted values are returned as a list of three DataFrames, structured as follows:

DataFrame 1, document-topic distribution:
Document ID column: with same name and type as data's document ID column.
TOPIC_ID: type INTEGER, topic ID.
PROBABILITY: type DOUBLE, probability of topic given document.

DataFrame 2, word-topic assignment:
Document ID column: with same name and type as data's document ID column.
WORD_ID: type INTEGER, word ID.
TOPIC_ID: type INTEGER, topic ID.

DataFrame 3, statistics:
STAT_NAME: type NVARCHAR(256), statistic name.
STAT_VALUE: type NVARCHAR(1000), statistic value.

Examples

Run prediction on DataFrame data1 using the "LatentDirichletAllocation" object LDA:

> data1$Collect()
   DOCUMENT_ID                   TEXT
 1          10      toy toy spoon cpu

> result <- transform(LDA, data1, key = "DOCUMENT_ID",
                      document = "TEXT", burn.in = 2000,
                      iteration = 1000, thin = 100,
                      seed = 1, output.word.assignment = TRUE)

Output:

> result[[1]]$Collect()
    DOCUMENT_ID  TOPIC_ID               PROBABILITY
1            10         0       0.23913043478260873
2            10         1       0.4565217391304348
3            10         2       0.02173913043478261
4            10         3       0.02173913043478261
5            10         4       0.23913043478260873
6            10         5       0.02173913043478261
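
As a follow-up to the output above, the sketch below shows one way to pick the most probable topic per document once the first DataFrame has been collected. To stay self-contained it rebuilds the collected result as a local data.frame named doc.topics (a name chosen here for illustration) with the values printed above; in a live session this would be the return value of result[[1]]$Collect().

```r
# Document-topic distribution copied from the output above.
# Stand-in for the data.frame returned by result[[1]]$Collect().
doc.topics <- data.frame(
  DOCUMENT_ID = rep(10L, 6),
  TOPIC_ID    = 0:5,
  PROBABILITY = c(0.23913043478260873, 0.4565217391304348,
                  0.02173913043478261, 0.02173913043478261,
                  0.23913043478260873, 0.02173913043478261)
)

# Sanity check: the topic probabilities of each document sum to 1.
prob.sums <- tapply(doc.topics$PROBABILITY, doc.topics$DOCUMENT_ID, sum)

# Keep the row with the highest probability for each document.
by.doc <- split(doc.topics, doc.topics$DOCUMENT_ID)
dominant <- do.call(
  rbind,
  lapply(by.doc, function(d) d[which.max(d$PROBABILITY), ])
)
print(dominant)
```

For document 10 this selects topic 1, the topic with the largest probability in the table above.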
