new Session(p) → {$.text.mining.Session}
This constructor function creates a Text Mining session object linked to the given reference table and column. Text Mining functions can subsequently be invoked using this object and they will use this linked reference data and the configuration parameters it was initialized with. Multiple such objects can be created to handle multiple sets of reference data.
This function does not initialize Text Mining for the reference table and column. Initialization is done separately, typically when the full text index is created for the table and column.
Note that it is possible to use a custom XS SQL connection to access the Text Mining reference table as a different user (see "Creating Custom XS SQL Connections" in the SAP HANA Developer Guide).
Parameters:
Name | Type | Description
---|---|---
p | object | Encapsulates constructor parameters. The example below passes the referenceTable and referenceColumn properties.
Returns:
Type:
- $.text.mining.Session
Example
// Link a session to the FILECONTENT column of the SYSTEM.TMDOCUMENTS table.
var TM = new $.text.mining.Session({
    referenceTable: "SYSTEM.TMDOCUMENTS",
    referenceColumn: "FILECONTENT"
});
Methods
categorizeKNN(p) → {Array.<$.text.mining.Session~CategoryResult>}
Given an input document, this function returns the top-ranked category values for the given category set columns in the reference data, using the KNN (K Nearest Neighbors) method.
! Security note
This note applies to this function and to the subsequent Text Mining functions.
The following four parameters are SQL expressions. The calling application is responsible for blocking any potentially malicious values from being used:
- inputDocumentSubquery
- inputDocumentCondition
- documentRestriction
- termTypeRestriction
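One way to meet this responsibility is to build each SQL expression from validated pieces instead of passing user input through unchanged. The following is a minimal sketch under stated assumptions, not part of the $.text.mining API: the helper names buildTweetSubquery and buildTermTypeRestriction are hypothetical, the TWEETS table is borrowed from the categorizeKNN example, and the list of allowed term types must be supplied by the application.

```javascript
// Hypothetical helpers; not part of the $.text.mining API.

// Builds the subquery used in the categorizeKNN example, accepting only
// a non-negative integer ID so that no SQL text can be injected.
function buildTweetSubquery(id) {
    var isInteger = typeof id === "number" && isFinite(id) && Math.floor(id) === id;
    if (!isInteger || id < 0) {
        throw new Error("Tweet ID must be a non-negative integer");
    }
    return "SELECT CONTENT FROM TWEETS WHERE ID = " + id;
}

// Builds a comma-separated termTypeRestriction value, rejecting any
// requested type that is not in the caller-supplied list of allowed types.
function buildTermTypeRestriction(requested, allowed) {
    requested.forEach(function (type) {
        if (allowed.indexOf(type) === -1) {
            throw new Error("Term type not allowed: " + type);
        }
    });
    return requested.join(",");
}
```

The returned strings can then be assigned to inputDocumentSubquery and termTypeRestriction. A value such as "132 OR 1=1" is rejected because it is not a number.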
Parameters:
Name | Type | Description
---|---|---
p | object | Encapsulates categorizeKNN parameters.
Properties
Name | Type | Argument | Description
---|---|---|---
inputDocumentText / inputDocumentSubquery / inputDocumentCondition / inputDocumentIDs | string | | Input document to process. One and only one of the following: inputDocumentText (this literal text); inputDocumentSubquery (text returned by this SQL subquery); inputDocumentCondition (text returned from reference table rows for which this SQL "where" clause is true); inputDocumentIDs (text returned from reference table rows with this (these) internal document ID number(s)).
language | string | <optional> | Language code of input text, e.g. "EN", "DE" ("" for all).
mimeType | string | <optional> | Mime type of input text, e.g. "text/plain" ("" for unspecified).
categorySets | Array.<string> | | Category set column names in the reference table that have been assigned category values.
kNN | integer | <optional> | The number of nearest neighbors to be considered.
top | integer | <optional> | Maximum number of returned results.
threshold | number | <optional> | Restricts the returned results to those displaying a score greater than or equal to this numeric value in the range [0,1] (0 to allow all).
documentRestriction | string | <optional> | Specified condition (SQL "where" clause) to be met for reference document rows to be considered in the computation ("" for all).
termTypeRestriction | string | <optional> | Comma-separated list of term types to consider ("" for all).
Throws:
Throws an error if the parameters object is not valid or the execution fails.
Returns:
Array of CategoryResult objects.
Type:
- Array.<$.text.mining.Session~CategoryResult>
Example
var categoryResults = TM.categorizeKNN({
    inputDocumentSubquery: "SELECT CONTENT FROM TWEETS WHERE ID = 132",
    categorySets: ["SUBJECT", "REGION"],
    top: 15
});
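Because one call can return several category values for each category set, a caller often wants just the highest-scoring value per set. The sketch below shows one possible way to do that. The helper name bestCategoryPerSet is hypothetical, the sample values are made up, and the code assumes each result carries the categorySet, category, and score properties described under CategoryResult.

```javascript
// Hypothetical helper; not part of the $.text.mining API.
// Keeps the highest-scoring category value for each category set.
function bestCategoryPerSet(categoryResults) {
    var best = Object.create(null); // no inherited keys to collide with
    categoryResults.forEach(function (r) {
        var current = best[r.categorySet];
        if (current === undefined || r.score > current.score) {
            best[r.categorySet] = { category: r.category, score: r.score };
        }
    });
    return best;
}

// Mock data shaped like CategoryResult objects (values are made up).
var sampleCategoryResults = [
    { categorySet: "SUBJECT", category: "sports", documentCount: 3, score: 0.6 },
    { categorySet: "SUBJECT", category: "politics", documentCount: 1, score: 0.2 },
    { categorySet: "REGION", category: "europe", documentCount: 2, score: 0.5 }
];
var bestBySet = bestCategoryPerSet(sampleCategoryResults);
```

In an XS application the array returned by TM.categorizeKNN would take the place of the mock data.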
getRelatedDocuments(p) → {Array.<$.text.mining.Session~DocumentResult>}
Given an input document, this function returns the top-ranked related documents from the reference data, based on co-occurrence statistics of terms.
Parameters:
Name | Type | Description
---|---|---
p | object | Encapsulates getRelatedDocuments parameters.
Properties
Name | Type | Argument | Description
---|---|---|---
inputDocumentText / inputDocumentSubquery / inputDocumentCondition / inputDocumentIDs | string | | Input document to process. One and only one of the following: inputDocumentText (this literal text); inputDocumentSubquery (text returned by this SQL subquery); inputDocumentCondition (text returned from reference table rows for which this SQL "where" clause is true); inputDocumentIDs (text returned from reference table rows with this (these) internal document ID number(s)).
language | string | <optional> | Language code of input text, e.g. "EN", "DE" ("" for all).
mimeType | string | <optional> | Mime type of input text, e.g. "text/plain" ("" for unspecified).
top | integer | <optional> | Maximum number of returned results.
threshold | number | <optional> | Restricts the returned results to those displaying a score greater than or equal to this numeric value in the range [0,1] (0 to allow all).
documentRestriction | string | <optional> | Specified condition (SQL "where" clause) to be met for reference document rows to be considered in the computation ("" for all).
termTypeRestriction | string | <optional> | Comma-separated list of term types to consider ("" for all).
includeColumns | Array.<string> | <optional> | Specifies columns from the reference table that are to be included in the result table. This provides a way to obtain the content or other data belonging to returned documents.
correlationMatrix | boolean | <optional> | If specified and true, the returned result includes a document correlation matrix.
principalComponents | integer | <optional> | If specified and non-zero, the returned result includes the specified number of principal components of the correlation matrix. Must be in the range from 0 to 3.
clustering | string | <optional> | If specified, the returned result includes hierarchical clusters computed with the method indicated. The possible values "COMPLETE_LINKAGE", "SINGLE_LINKAGE", "AVG_DISTANCE_WITHIN", "AVG_DISTANCE_BETWEEN", and "WARD" stand respectively for the methods Complete Linkage, Single Linkage, Average Distance Within, Average Distance Between, and Ward's Method.
Throws:
Throws an error if the parameters object is not valid or the execution fails.
Returns:
Array of DocumentResult objects.
Type:
- Array.<$.text.mining.Session~DocumentResult>
Example
var documentResults = TM.getRelatedDocuments({
    top: 16,
    inputDocumentText: "animals",
    includeColumns: ["KEY", "FILECONTENT"]
});
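When correlationMatrix is set to true, the matrix arrives spread across the result rows rather than as one structure. The sketch below shows one way it might be reassembled into a two-dimensional array. It assumes that each returned object exposes properties named correlation1 through correlationN, as the DocumentResult description suggests. The helper name documentCorrelationMatrix is hypothetical and the sample values are made up.

```javascript
// Hypothetical helper; not part of the $.text.mining API.
// Collects the correlation1..correlationN properties of each result row
// into an N x N array, where N is the number of returned documents.
function documentCorrelationMatrix(documentResults) {
    var n = documentResults.length;
    return documentResults.map(function (doc) {
        var row = [];
        for (var j = 1; j <= n; j++) {
            row.push(doc["correlation" + j]);
        }
        return row;
    });
}

// Mock data shaped like DocumentResult objects (values are made up).
var sampleDocumentResults = [
    { id: 1, score: 1.0, correlation1: 1, correlation2: 0.4 },
    { id: 2, score: 0.4, correlation1: 0.4, correlation2: 1 }
];
var matrix = documentCorrelationMatrix(sampleDocumentResults);
```

The same approach would apply to the term correlation matrix in TermResult objects, under the same naming assumption.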
getRelatedTerms(p) → {Array.<$.text.mining.Session~TermResult>}
Given an input term, this function returns the top-ranked related terms from the reference data, based on co-occurrence statistics.
Parameters:
Name | Type | Description
---|---|---
p | object | Encapsulates getRelatedTerms parameters.
Properties
Name | Type | Argument | Description
---|---|---|---
inputTermText / inputTermIDs | string | | Input term to process. One and only one of the following: inputTermText (this literal text; typically a single term, but can be multiple terms with optional term types and wildcarding; see SAP HANA SQL and System Views Reference for details); inputTermIDs (text associated with reference table columns with this (these) internal term ID number(s)).
top | integer | <optional> | Maximum number of returned results.
threshold | number | <optional> | Restricts the returned results to those displaying a score greater than or equal to this numeric value in the range [0,1] (0 to allow all).
documentRestriction | string | <optional> | Specified condition (SQL "where" clause) to be met for reference document rows to be considered in the computation ("" for all).
termTypeRestriction | string | <optional> | Comma-separated list of term types to consider ("" for all).
correlationMatrix | boolean | <optional> | If specified and true, the returned result includes a term correlation matrix.
principalComponents | integer | <optional> | If specified and non-zero, the returned result includes the specified number of principal components of the correlation matrix. Must be in the range from 0 to 3.
clustering | string | <optional> | If specified, the returned result includes hierarchical clusters computed with the method indicated. The possible values "COMPLETE_LINKAGE", "SINGLE_LINKAGE", "AVG_DISTANCE_WITHIN", "AVG_DISTANCE_BETWEEN", and "WARD" stand respectively for the methods Complete Linkage, Single Linkage, Average Distance Within, Average Distance Between, and Ward's Method.
Throws:
Throws an error if the parameters object is not valid or the execution fails.
Returns:
Array of TermResult objects.
Type:
- Array.<$.text.mining.Session~TermResult>
Example
var termResults = TM.getRelatedTerms({
    top: 16,
    inputTermText: "animals"
});
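The function throws when the parameters object is not valid, so an application may prefer to check the documented constraints itself and report a friendlier message. The sketch below covers only the constraints stated in the parameter descriptions above. The helper name checkRelatedTermsParams is hypothetical, and the actual validation performed by the API may differ.

```javascript
// Hypothetical helper; not part of the $.text.mining API.
var CLUSTERING_METHODS = [
    "COMPLETE_LINKAGE", "SINGLE_LINKAGE",
    "AVG_DISTANCE_WITHIN", "AVG_DISTANCE_BETWEEN", "WARD"
];

// Returns a list of problems found in a getRelatedTerms parameters object.
// An empty list means the documented constraints are satisfied.
function checkRelatedTermsParams(p) {
    var errors = [];
    var hasText = p.inputTermText !== undefined;
    var hasIDs = p.inputTermIDs !== undefined;
    if (hasText === hasIDs) {
        errors.push("Specify exactly one of inputTermText or inputTermIDs");
    }
    if (p.threshold !== undefined && !(p.threshold >= 0 && p.threshold <= 1)) {
        errors.push("threshold must be in the range [0,1]");
    }
    if (p.principalComponents !== undefined) {
        var pc = p.principalComponents;
        if (Math.floor(pc) !== pc || pc < 0 || pc > 3) {
            errors.push("principalComponents must be an integer from 0 to 3");
        }
    }
    if (p.clustering !== undefined && CLUSTERING_METHODS.indexOf(p.clustering) === -1) {
        errors.push("clustering must be one of: " + CLUSTERING_METHODS.join(", "));
    }
    return errors;
}
```

A caller could run this check first and invoke TM.getRelatedTerms only when the returned list is empty.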
getRelevantDocuments(p) → {Array.<$.text.mining.Session~DocumentResult>}
Given an input term, this function returns the top-ranked documents from the reference data that are deemed relevant to the term.
Parameters:
Name | Type | Description
---|---|---
p | object | Encapsulates getRelevantDocuments parameters.
Properties
Name | Type | Argument | Description
---|---|---|---
inputTermText / inputTermIDs | string | | Input term to process. One and only one of the following: inputTermText (this literal text; typically a single term, but can be multiple terms with optional term types and wildcarding; see SAP HANA SQL and System Views Reference for details); inputTermIDs (text associated with reference table columns with this (these) internal term ID number(s)).
top | integer | <optional> | Maximum number of returned results.
threshold | number | <optional> | Restricts the returned results to those displaying a score greater than or equal to this numeric value in the range [0,1] (0 to allow all).
documentRestriction | string | <optional> | Specified condition (SQL "where" clause) to be met for reference document rows to be considered in the computation ("" for all).
termTypeRestriction | string | <optional> | Comma-separated list of term types to consider ("" for all).
includeColumns | Array.<string> | <optional> | Specifies columns from the reference table that are to be included in the result table. This provides a way to obtain the content or other data belonging to returned documents.
correlationMatrix | boolean | <optional> | If specified and true, the returned result includes a document correlation matrix.
principalComponents | integer | <optional> | If specified and non-zero, the returned result includes the specified number of principal components of the correlation matrix. Must be in the range from 0 to 3.
clustering | string | <optional> | If specified, the returned result includes hierarchical clusters computed with the method indicated. The possible values "COMPLETE_LINKAGE", "SINGLE_LINKAGE", "AVG_DISTANCE_WITHIN", "AVG_DISTANCE_BETWEEN", and "WARD" stand respectively for the methods Complete Linkage, Single Linkage, Average Distance Within, Average Distance Between, and Ward's Method.
Throws:
Throws an error if the parameters object is not valid or the execution fails.
Returns:
Array of DocumentResult objects.
Type:
- Array.<$.text.mining.Session~DocumentResult>
Example
var documentResults = TM.getRelevantDocuments({
    top: 16,
    inputTermText: "animals",
    includeColumns: ["KEY", "FILECONTENT"]
});
getRelevantTerms(p) → {Array.<$.text.mining.Session~TermResult>}
Given an input document, this function returns the top-ranked keyphrases or relevant terms from the reference data, i.e., the terms that saliently describe the document.
Keyphrases are used to summarize, characterize and provide thematic access to data.
Parameters:
Name | Type | Description
---|---|---
p | object | Encapsulates getRelevantTerms parameters.
Properties
Name | Type | Argument | Description
---|---|---|---
inputDocumentText / inputDocumentSubquery / inputDocumentCondition / inputDocumentIDs | string | | Input document to process. One and only one of the following: inputDocumentText (this literal text); inputDocumentSubquery (text returned by this SQL subquery); inputDocumentCondition (text returned from reference table rows for which this SQL "where" clause is true); inputDocumentIDs (text returned from reference table rows with this (these) internal document ID number(s)).
language | string | <optional> | Language code of input text, e.g. "EN", "DE" ("" for all).
mimeType | string | <optional> | Mime type of input text, e.g. "text/plain" ("" for unspecified).
top | integer | <optional> | Maximum number of returned results.
threshold | number | <optional> | Restricts the returned results to those displaying a score greater than or equal to this numeric value in the range [0,1] (0 to allow all).
documentRestriction | string | <optional> | Specified condition (SQL "where" clause) to be met for reference document rows to be considered in the computation ("" for all).
termTypeRestriction | string | <optional> | Comma-separated list of term types to consider ("" for all).
correlationMatrix | boolean | <optional> | If specified and true, the returned result includes a term correlation matrix.
principalComponents | integer | <optional> | If specified and non-zero, the returned result includes the specified number of principal components of the correlation matrix. Must be in the range from 0 to 3.
clustering | string | <optional> | If specified, the returned result includes hierarchical clusters computed with the method indicated. The possible values "COMPLETE_LINKAGE", "SINGLE_LINKAGE", "AVG_DISTANCE_WITHIN", "AVG_DISTANCE_BETWEEN", and "WARD" stand respectively for the methods Complete Linkage, Single Linkage, Average Distance Within, Average Distance Between, and Ward's Method.
Throws:
Throws an error if the parameters object is not valid or the execution fails.
Returns:
Array of TermResult objects.
Type:
- Array.<$.text.mining.Session~TermResult>
Example
var termResults = TM.getRelevantTerms({
    top: 16,
    inputDocumentText: "animals"
});
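Since keyphrases are meant to summarize a document, a common next step is to turn the returned terms into a short display string. The sketch below shows one way to do that, assuming each result carries the termNormalized and score properties described under TermResult. The helper name summarizeKeyphrases is hypothetical and the sample values are made up. The explicit sort is a precaution, because the ordering of the returned array is not stated here.

```javascript
// Hypothetical helper; not part of the $.text.mining API.
// Returns the normalized text of the highest-scoring terms as one string.
function summarizeKeyphrases(termResults, limit) {
    return termResults
        .slice() // copy so the caller's array is not reordered
        .sort(function (a, b) { return b.score - a.score; })
        .slice(0, limit)
        .map(function (t) { return t.termNormalized; })
        .join(", ");
}

// Mock data shaped like TermResult objects (values are made up).
var sampleTermResults = [
    { term: "Cat", termNormalized: "cat", score: 0.3 },
    { term: "Animal", termNormalized: "animal", score: 0.9 },
    { term: "Dog", termNormalized: "dog", score: 0.5 }
];
var summary = summarizeKeyphrases(sampleTermResults, 2);
```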
getSuggestedTerms(p) → {Array.<$.text.mining.Session~TermResult>}
Given an input term initial substring, this function returns the top-ranked terms from the reference data that complete that initial substring.
Term suggestion is used to present a user with likely search terms as the user enters characters within a search application.
Parameters:
Name | Type | Description
---|---|---
p | object | Encapsulates getSuggestedTerms parameters.
Properties
Name | Type | Argument | Description
---|---|---|---
inputTermText / inputTermIDs | string | | Input term to process. One and only one of the following: inputTermText (this literal text); inputTermIDs (text associated with reference table columns with this (these) internal term ID number(s)).
top | integer | <optional> | Maximum number of returned results.
threshold | number | <optional> | Restricts the returned results to those displaying a score greater than or equal to this numeric value in the range [0,1] (0 to allow all).
documentRestriction | string | <optional> | Specified condition (SQL "where" clause) to be met for reference document rows to be considered in the computation ("" for all).
termTypeRestriction | string | <optional> | Comma-separated list of term types to consider ("" for all).
Throws:
Throws an error if the parameters object is not valid or the execution fails.
Returns:
Array of TermResult objects.
Type:
- Array.<$.text.mining.Session~TermResult>
Example
var termResults = TM.getSuggestedTerms({
    top: 16,
    inputTermText: "a"
});
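In a search-as-you-type scenario, the call is usually wrapped so that the user interface receives only the suggested strings. The sketch below illustrates such a wrapper. The helper name suggestTerms and the stand-in fakeSession object are hypothetical. The wrapper takes the session as an argument so it can be tried outside XS, and it assumes each result has the term property described under TermResult.

```javascript
// Hypothetical helper; not part of the $.text.mining API.
// Returns the suggested term strings for the characters typed so far.
function suggestTerms(session, prefix, limit) {
    if (typeof prefix !== "string" || prefix.length === 0) {
        return []; // nothing typed yet, so skip the call
    }
    var results = session.getSuggestedTerms({ inputTermText: prefix, top: limit });
    return results.map(function (r) { return r.term; });
}

// Stand-in session for trying the helper outside XS. A real call would
// pass the TM object created with new $.text.mining.Session(...).
var fakeSession = {
    getSuggestedTerms: function (p) {
        var terms = ["animal", "apple", "banana"];
        return terms
            .filter(function (t) { return t.indexOf(p.inputTermText) === 0; })
            .slice(0, p.top)
            .map(function (t) { return { term: t, score: 1 }; });
    }
};
var suggestions = suggestTerms(fakeSession, "a", 16);
```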
initialize(p)
This function initializes (or re-initializes) Text Mining for the reference table and column linked to this Session object. Initialization creates the Term-Document matrix and the other configuration context data that Text Mining functions need. The Text Mining context is specific to the reference data, but it is persistent and global for all users. The configuration context specified at initialization time later supplies the defaults for any parameters left unspecified when Text Mining functions are invoked on the given reference data.
! Advanced function
This function is typically not used. Initialization of Text Mining is normally done separately when the full text index is created for a given reference table and column. Since the Text Mining context is persistent and global for all users, that is the best way to ensure consistent results and avoid confusion.
This initialize() function provides a way to directly initialize Text Mining for development purposes or special customer applications. If this function is used carelessly, it can unexpectedly affect other running applications.
Parameters:
Name | Type | Description
---|---|---
p | object | Encapsulates initialize parameters.
Properties
Name | Type | Argument | Description
---|---|---|---
configuration | string | <optional> | Repository path to the configuration. If omitted, the default configuration is used.
list of parameters... | * | <optional> <repeatable> | Text Mining parameters and defaults to use for this reference table and column that override what is specified in the configuration. See the SAP HANA Text Mining Developer Guide for details.
Throws:
Throws an error if the parameters object is not valid or the execution fails.
Example
TM.initialize({
    configuration: "acme.textmining::defaults.textminingconfig",
    minTermFrequency: 3,
    maxTermFrequency: 100
});
Type Definitions
CategoryResult
Represents a single category value result from Text Mining categorization.
Type:
- object
Properties:
Name | Type | Description
---|---|---
categorySet | string | The name of the category set column in which this category value occurs.
category | string | The category value. One or more reference documents in the group of K nearest neighbors were assigned this category.
documentCount | integer | The number of reference documents in the group of K nearest neighbors that were assigned this category value.
score | number | The score of this category value in the range [0,1].
DocumentResult
Represents a single document result from certain Text Mining methods.
Type:
- object
Properties:
Name | Type | Description
---|---|---
includeColumns... | * | The requested columns from the reference table to be included in the result table, as specified via the includeColumns input parameter. These are returned as separate columns with the same names and types as the original specified include columns.
id | integer | The document ID number used internally by Text Mining. This can be used in subsequent Text Mining method calls in the inputDocumentIDs parameter for faster performance.
termCountTotal | integer | The total number of terms in this document, including duplicates.
termCount | integer | The number of different terms in this document.
correlation1...correlationN | number | These appear if the document correlation matrix was requested. The columns of the document correlation matrix contain the correlation values for this document and each of the other returned documents. N is the number of returned documents. The document correlation matrix is a square matrix where the rows and the columns each list all the returned documents in order. The matrix portrays every combination of the returned document pairs, with a duplicate reflection across the diagonal of the matrix. Each cell of the matrix contains the correlation value for the two documents at that row and column, based on the co-occurrence of their terms in the reference documents.
factor1...factorN, rotation1...rotationN | number | These appear if principal components analysis was requested. The factor and rotation values from principal component analysis (dimensionality reduction) for this document. N is the number of principal components requested via the principalComponents input parameter.
clusteringLevel | number | This appears if clustering was requested. The clustering level for this document.
clusteringLeft, clusteringRight | integer | These appear if clustering was requested. The clustering left value and right value for this document.
score | number | The score of this document in the range [0,1].
TermResult
Represents a single term result from certain Text Mining methods.
Type:
- object
Properties:
Name | Type | Description
---|---|---
term | string | The term.
termNormalized | string | The normalized version of this term. Text Mining terms are normalized with respect to capitalization, whitespace, and accentuation.
termType | string | The type of this term, an entity type or part-of-speech.
id | integer | The term ID number used internally by Text Mining. This can be used in subsequent Text Mining method calls in the inputTermIDs parameter for faster performance.
frequencyTotal | integer | The total number of times this term occurs in the reference documents.
frequencyDocumentCount | integer | The number of reference documents in which this term occurs.
correlation1...correlationN | number | These appear if the term correlation matrix was requested (not available with the getSuggestedTerms method). The columns of the term correlation matrix contain the correlation values for this term and each of the other returned terms. N is the number of returned terms. The term correlation matrix is a square matrix where the rows and the columns each list all the returned terms in order. The matrix portrays every combination of the returned term pairs, with a duplicate reflection across the diagonal of the matrix. Each cell of the matrix contains the correlation value for the two terms at that row and column, based on their co-occurrence in the reference documents.
factor1...factorN, rotation1...rotationN | number | These appear if principal components analysis was requested (not available with the getSuggestedTerms method). The factor and rotation values from principal component analysis (dimensionality reduction) for this term. N is the number of principal components requested via the principalComponents input parameter.
clusteringLevel | number | This appears if clustering was requested (not available with the getSuggestedTerms method). The clustering level for this term.
clusteringLeft, clusteringRight | integer | These appear if clustering was requested (not available with the getSuggestedTerms method). The clustering left value and right value for this term.
score | number | The score of this term in the range [0,1].