transform.LatentDirichletAllocation.Rd
Similar to other predict methods, this function predicts the topic distribution of new documents from a fitted "LatentDirichletAllocation" object.
# S3 method for LatentDirichletAllocation
transform(
model,
data,
key,
document = NULL,
burn.in = NULL,
iteration = NULL,
thin = NULL,
seed = NULL,
gibbs.init = NULL,
delimiters = NULL,
output.word.assignment = NULL
)
R6Class object
A "LatentDirichletAllocation" object for prediction.
DataFrame
DataFrame containing the data.
character
Name of the ID column.
character, optional
Names of the document columns.
Defaults to the first non-ID column.
integer, optional
Number of omitted Gibbs iterations at the beginning.
Defaults to 0.
integer, optional
Number of Gibbs iterations.
Defaults to 2000.
integer, optional
Number of omitted in-between Gibbs iterations.
Defaults to 1.
integer, optional
Indicates the seed used to initialize the random number generator.
0: uses the system time.
Not 0: uses the specified seed.
Defaults to 0.
character, optional
Specifies initialization method for Gibbs sampling.
This value takes precedence over the corresponding one in the general information table.
'uniform': Assigns a topic to each word in each document by drawing from a uniform distribution, so every topic is equally likely to be assigned to a given word.
'gibbs': Initialization by Gibbs sampling.
Defaults to 'uniform'.
list of characters, optional
Specifies the delimiters that separate words in a document.
For example, if the words are separated by , and :, then the delimiters should be
',' and ':'.
Defaults to ''.
logical, optional
Controls whether to output the word-topic assignment or not.
Note that if this parameter is set to TRUE, the procedure takes
more time to return because it also writes the WORD_TOPIC_ASSIGNMENT table.
Defaults to FALSE.
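To illustrate what passing several delimiters means for a document string, here is a minimal base-R sketch. The helper split_words is hypothetical and not part of this package, and the actual tokenization is done by the underlying procedure, which may differ in detail (for example in how it treats empty tokens).

```r
# Hypothetical helper, for illustration only: split a document into words
# on every delimiter in `delimiters`, treating each one as a literal string.
split_words <- function(text, delimiters) {
  words <- text
  for (d in delimiters) {
    # fixed = TRUE avoids interpreting the delimiter as a regular expression
    words <- unlist(strsplit(words, d, fixed = TRUE))
  }
  # Drop empty strings produced by adjacent delimiters
  words[nzchar(words)]
}

words <- split_words("toy,toy:spoon,cpu", c(",", ":"))
print(words)
```

With both ',' and ':' supplied, the sample string is split into the four words toy, toy, spoon and cpu.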
Predicted values are returned as a list of DataFrames, structured as follows:
DataFrame 1, document-topic distribution:
Document ID column
: with same name and type as data's document ID column.
TOPIC_ID
: type INTEGER, topic ID.
PROBABILITY
: type DOUBLE, probability of topic given document.
DataFrame 2, word-topic assignment:
Document ID column
: with same name and type as data's document ID column.
WORD_ID
: type INTEGER, word ID.
TOPIC_ID
: type INTEGER, topic ID.
DataFrame 3, statistics:
STAT_NAME
: type NVARCHAR(256), statistic name.
STAT_VALUE
: type NVARCHAR(1000), statistic value.
Run the prediction on DataFrame data1 using the "LatentDirichletAllocation" object LDA:
> data1$Collect()
DOCUMENT_ID TEXT
1 10 toy toy spoon cpu
> result <- transform(LDA, data1, key = "DOCUMENT_ID",
document = "TEXT", burn.in = 2000,
iteration = 1000, thin = 100,
seed = 1, output.word.assignment = TRUE)
Output:
> result[[1]]$Collect()
DOCUMENT_ID TOPIC_ID PROBABILITY
1 10 0 0.23913043478260873
2 10 1 0.4565217391304348
3 10 2 0.02173913043478261
4 10 3 0.02173913043478261
5 10 4 0.23913043478260873
6 10 5 0.02173913043478261
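A common next step is to pick the most probable topic for each document. The sketch below uses only base R and a local copy of the values printed above. It assumes that Collect() returns an ordinary data.frame, as the printed output suggests. The names topic.dist, prob.sums and dominant are illustrative.

```r
# Local copy of the document-topic distribution shown in the output above.
topic.dist <- data.frame(
  DOCUMENT_ID = rep(10L, 6),
  TOPIC_ID = 0:5,
  PROBABILITY = c(0.23913043478260873, 0.4565217391304348,
                  0.02173913043478261, 0.02173913043478261,
                  0.23913043478260873, 0.02173913043478261)
)

# Sanity check: the topic probabilities of each document should sum to 1.
prob.sums <- tapply(topic.dist$PROBABILITY, topic.dist$DOCUMENT_ID, sum)
stopifnot(all(abs(prob.sums - 1) < 1e-8))

# For each document, keep the row with the highest probability.
by.doc <- split(topic.dist, topic.dist$DOCUMENT_ID)
dominant <- do.call(rbind, lapply(by.doc, function(d) {
  d[which.max(d$PROBABILITY), ]
}))
print(dominant)
```

For the sample document, the row kept is the one for TOPIC_ID 1, which has the largest probability in the output above.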