Class: Session

$.text.mining.Session

$.text.mining.Session represents a Text Mining session.

Caution
The information in this section applies to a feature that is included in the component SAP HANA Advanced Data Processing. To make use of this feature, you must have purchased a license to use SAP HANA Advanced Data Processing.

new Session(p) → {$.text.mining.Session}

The Session object represents a Text Mining session.
This constructor function creates a Text Mining session object linked to the given reference table and column. Text Mining functions can subsequently be invoked using this object and they will use this linked reference data and the configuration parameters it was initialized with. Multiple such objects can be created to handle multiple sets of reference data.
This function does not initialize Text Mining for the reference table and column. Initialization is done separately, typically when the full text index is created for the table and column.
Note that it is possible to use a custom XS SQL connection to access the Text Mining reference table as a different user (see "Creating Custom XS SQL Connections" in the SAP HANA Developer Guide).
Parameters:
Name Type Description
p object Encapsulates constructor parameters.
Properties
Name Type Argument Description
referenceTable string The table in which the reference documents are stored.
referenceColumn string The column in which the reference documents' text content is stored.
connection $.db.Connection <optional>
A database connection object that will be used to authenticate Text Mining database access. By default the credentials of the caller are used.
Returns:
A Text Mining session object that holds context for the session and is used to call the Text Mining method functions.
Type
$.text.mining.Session
Example
var TM = new $.text.mining.Session({
    referenceTable: "SYSTEM.TMDOCUMENTS",
    referenceColumn: "FILECONTENT"
});
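
Because each session object is linked to one reference table and column, an application that works with several sets of reference data can keep one session per data set. The following is a minimal sketch of that pattern, not part of the API: the helper name createSessions and the labels, tables, and columns passed to it are hypothetical, and the XS "$" object is passed in as a parameter so that the constructor call is the only part that relies on the interface described here.

```javascript
// Hypothetical helper: creates one Text Mining session per reference data set.
// `xs` is expected to be the XS JavaScript `$` object. `sources` maps a label
// to an object of the form { table: "...", column: "..." }.
function createSessions(xs, sources) {
    var sessions = {};
    Object.keys(sources).forEach(function (label) {
        sessions[label] = new xs.text.mining.Session({
            referenceTable: sources[label].table,
            referenceColumn: sources[label].column
        });
    });
    return sessions;
}

// Possible call inside an XS JavaScript service (table names are assumptions):
// var sessions = createSessions($, {
//     documents: { table: "SYSTEM.TMDOCUMENTS", column: "FILECONTENT" }
// });
```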

Methods

categorizeKNN(p) → {Array.<$.text.mining.Session~CategoryResult>}

Given an input document, this function returns the top-ranked category values for the given category set columns in the reference data, using the KNN (K Nearest Neighbors) method.

!    Security note

This note applies to this function and to the subsequent Text Mining functions.

The following four parameters are SQL expressions. The calling application is responsible for preventing potentially malicious values from being passed in them:

  • inputDocumentSubquery
  • inputDocumentCondition
  • documentRestriction
  • termTypeRestriction
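
One way to meet this responsibility is to never pass user input through verbatim, and instead to assemble the SQL expression from parts that have each been checked against a strict allow-list. The sketch below illustrates the idea for inputDocumentSubquery. The helper name buildDocumentSubquery and its pattern are assumptions made for illustration; the pattern deliberately rejects anything other than simple unqualified identifiers and integer keys, so schema-qualified names would need an extended check.

```javascript
// Hypothetical helper: builds a value for inputDocumentSubquery from parts
// that have each been validated against a strict allow-list pattern.
var SIMPLE_IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

function buildDocumentSubquery(table, textColumn, keyColumn, key) {
    [table, textColumn, keyColumn].forEach(function (name) {
        if (typeof name !== "string" || !SIMPLE_IDENTIFIER.test(name)) {
            throw new Error("Rejected identifier: " + name);
        }
    });
    // Accept only integer keys, so no quoting or escaping is required.
    if (typeof key !== "number" || !isFinite(key) || Math.floor(key) !== key) {
        throw new Error("Rejected key: " + key);
    }
    return "SELECT " + textColumn + " FROM " + table +
        " WHERE " + keyColumn + " = " + key;
}
```

The returned string can then be passed as inputDocumentSubquery. Values that fail validation cause an error before any Text Mining function is called.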

Parameters:
Name Type Description
p object Encapsulates categorizeKNN parameters.
Properties
Name Type Argument Description
inputDocumentText|
inputDocumentSubquery|
inputDocumentCondition|
inputDocumentIDs
string Input document to process. One and only one of the following:
inputDocumentText This literal text
inputDocumentSubquery Text returned by this SQL subquery
inputDocumentCondition Text returned from reference table rows for which this SQL "where" clause is true
inputDocumentIDs Text returned from reference table rows with this (these) internal document ID number(s)
language string <optional>
Language code of input text, e.g. "EN", "DE" ("" for all).
mimeType string <optional>
Mime type of input text, e.g. "text/plain" ("" for unspecified).
categorySets Array.<string> Category set column names in the reference table that have been assigned category values.
kNN integer <optional>
The number of nearest neighbors to be considered.
top integer <optional>
Maximum number of returned results.
threshold number <optional>
Restricts the returned results to those displaying a score greater than or equal to this numeric value in the range [0,1] (0 to allow all).
documentRestriction string <optional>
Specified condition (SQL "where" clause) to be met for reference document rows to be considered in the computation ("" for all).
termTypeRestriction string <optional>
Comma-separated list of term types to consider ("" for all).
Throws:
Throws an error if the parameters object is not valid or the execution fails.
Returns:
Array of CategoryResult objects.
Type
Array.<$.text.mining.Session~CategoryResult>
Example
var categoryResults = TM.categorizeKNN({
    inputDocumentSubquery: "SELECT CONTENT FROM TWEETS WHERE ID = 132",
    categorySets: ["SUBJECT", "REGION"],
    top: 15
});
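
When several category sets are requested, the returned array mixes results from all of them. The following sketch shows one possible way to pick the highest-scoring category value for each category set. It assumes that the result behaves like an ordinary JavaScript array of objects carrying the CategoryResult properties categorySet, category, and score; the helper name bestCategoryPerSet is hypothetical.

```javascript
// Hypothetical helper: returns an object that maps each categorySet name
// to the CategoryResult with the highest score in that set.
function bestCategoryPerSet(categoryResults) {
    var best = {};
    categoryResults.forEach(function (result) {
        var current = best[result.categorySet];
        if (!current || result.score > current.score) {
            best[result.categorySet] = result;
        }
    });
    return best;
}
```

With the example above, bestCategoryPerSet(categoryResults).SUBJECT would hold the top-scoring SUBJECT value, provided at least one was returned.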

getRelatedDocuments(p) → {Array.<$.text.mining.Session~DocumentResult>}

Given an input document, this function returns the top-ranked related documents from the reference data, based on co-occurrence statistics of terms.
Parameters:
Name Type Description
p object Encapsulates getRelatedDocuments parameters.
Properties
Name Type Argument Description
inputDocumentText|
inputDocumentSubquery|
inputDocumentCondition|
inputDocumentIDs
string Input document to process. One and only one of the following:
inputDocumentText This literal text
inputDocumentSubquery Text returned by this SQL subquery
inputDocumentCondition Text returned from reference table rows for which this SQL "where" clause is true
inputDocumentIDs Text returned from reference table rows with this (these) internal document ID number(s)
language string <optional>
Language code of input text, e.g. "EN", "DE" ("" for all).
mimeType string <optional>
Mime type of input text, e.g. "text/plain" ("" for unspecified).
top integer <optional>
Maximum number of returned results.
threshold number <optional>
Restricts the returned results to those displaying a score greater than or equal to this numeric value in the range [0,1] (0 to allow all).
documentRestriction string <optional>
Specified condition (SQL "where" clause) to be met for reference document rows to be considered in the computation ("" for all).
termTypeRestriction string <optional>
Comma-separated list of term types to consider ("" for all).
includeColumns Array.<string> <optional>
Specifies columns from the reference table that are to be included in the result table. This provides a way to obtain the content or other data belonging to returned documents.
correlationMatrix boolean <optional>
If specified and true, the returned result includes a document correlation matrix.
principalComponents integer <optional>
If specified and non-zero, the returned result includes the specified number of principal components of the correlation matrix. Must be in the range from 0 to 3.
clustering string <optional>
If specified, the returned result includes hierarchical clusters computed with the method indicated. The possible values "COMPLETE_LINKAGE", "SINGLE_LINKAGE", "AVG_DISTANCE_WITHIN", "AVG_DISTANCE_BETWEEN", and "WARD" stand respectively for the methods Complete Linkage, Single Linkage, Average Distance Within, Average Distance Between, and Ward's Method.
Throws:
Throws an error if the parameters object is not valid or the execution fails.
Returns:
Array of DocumentResult objects.
Type
Array.<$.text.mining.Session~DocumentResult>
Example
var documentResults = TM.getRelatedDocuments({
    top: 16,
    inputDocumentText: "animals",
    includeColumns: ["KEY", "FILECONTENT"]
});
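
The parameter list above requires that one and only one of the four input document properties is supplied. An application that assembles the parameters object dynamically may want to check this before making the call. The following is a minimal sketch of such a check; the helper name singleInputDocumentKey is hypothetical, and the function only inspects a plain object, so it makes no assumption about how the Text Mining functions themselves report invalid parameters.

```javascript
var INPUT_DOCUMENT_KEYS = [
    "inputDocumentText",
    "inputDocumentSubquery",
    "inputDocumentCondition",
    "inputDocumentIDs"
];

// Hypothetical helper: returns the name of the single input document
// property that is set, or throws if none or more than one is present.
function singleInputDocumentKey(params) {
    var present = INPUT_DOCUMENT_KEYS.filter(function (key) {
        return params[key] !== undefined && params[key] !== null;
    });
    if (present.length !== 1) {
        throw new Error("Expected exactly one of " +
            INPUT_DOCUMENT_KEYS.join(", ") + " but found " + present.length);
    }
    return present[0];
}
```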

getRelatedTerms(p) → {Array.<$.text.mining.Session~TermResult>}

Given an input term, this function returns the top-ranked related terms from the reference data, based on co-occurrence statistics.
Parameters:
Name Type Description
p object Encapsulates getRelatedTerms parameters.
Properties
Name Type Argument Description
inputTermText|
inputTermIDs
string Input term to process. One and only one of the following:
inputTermText This literal text. Typically a single term, but can be multiple terms with optional term types and wildcarding. See SAP HANA SQL and System Views Reference for details.
inputTermIDs Text associated with reference table columns with this (these) internal term ID number(s)
top integer <optional>
Maximum number of returned results.
threshold number <optional>
Restricts the returned results to those displaying a score greater than or equal to this numeric value in the range [0,1] (0 to allow all).
documentRestriction string <optional>
Specified condition (SQL "where" clause) to be met for reference document rows to be considered in the computation ("" for all).
termTypeRestriction string <optional>
Comma-separated list of term types to consider ("" for all).
correlationMatrix boolean <optional>
If specified and true, the returned result includes a term correlation matrix.
principalComponents integer <optional>
If specified and non-zero, the returned result includes the specified number of principal components of the correlation matrix. Must be in the range from 0 to 3.
clustering string <optional>
If specified, the returned result includes hierarchical clusters computed with the method indicated. The possible values "COMPLETE_LINKAGE", "SINGLE_LINKAGE", "AVG_DISTANCE_WITHIN", "AVG_DISTANCE_BETWEEN", and "WARD" stand respectively for the methods Complete Linkage, Single Linkage, Average Distance Within, Average Distance Between, and Ward's Method.
Throws:
Throws an error if the parameters object is not valid or the execution fails.
Returns:
Array of TermResult objects.
Type
Array.<$.text.mining.Session~TermResult>
Example
var termResults = TM.getRelatedTerms({
    top: 16,
    inputTermText: "animals"
});

getRelevantDocuments(p) → {Array.<$.text.mining.Session~DocumentResult>}

Given an input term, this function returns the top-ranked documents from the reference data that are deemed relevant to the term.
Parameters:
Name Type Description
p object Encapsulates getRelevantDocuments parameters.
Properties
Name Type Argument Description
inputTermText|
inputTermIDs
string Input term to process. One and only one of the following:
inputTermText This literal text. Typically a single term, but can be multiple terms with optional term types and wildcarding. See SAP HANA SQL and System Views Reference for details.
inputTermIDs Text associated with reference table columns with this (these) internal term ID number(s)
top integer <optional>
Maximum number of returned results.
threshold number <optional>
Restricts the returned results to those displaying a score greater than or equal to this numeric value in the range [0,1] (0 to allow all).
documentRestriction string <optional>
Specified condition (SQL "where" clause) to be met for reference document rows to be considered in the computation ("" for all).
termTypeRestriction string <optional>
Comma-separated list of term types to consider ("" for all).
includeColumns Array.<string> <optional>
Specifies columns from the reference table that are to be included in the result table. This provides a way to obtain the content or other data belonging to returned documents.
correlationMatrix boolean <optional>
If specified and true, the returned result includes a document correlation matrix.
principalComponents integer <optional>
If specified and non-zero, the returned result includes the specified number of principal components of the correlation matrix. Must be in the range from 0 to 3.
clustering string <optional>
If specified, the returned result includes hierarchical clusters computed with the method indicated. The possible values "COMPLETE_LINKAGE", "SINGLE_LINKAGE", "AVG_DISTANCE_WITHIN", "AVG_DISTANCE_BETWEEN", and "WARD" stand respectively for the methods Complete Linkage, Single Linkage, Average Distance Within, Average Distance Between, and Ward's Method.
Throws:
Throws an error if the parameters object is not valid or the execution fails.
Returns:
Array of DocumentResult objects.
Type
Array.<$.text.mining.Session~DocumentResult>
Example
var documentResults = TM.getRelevantDocuments({
    top: 16,
    inputTermText: "animals",
    includeColumns: ["KEY", "FILECONTENT"]
});

getRelevantTerms(p) → {Array.<$.text.mining.Session~TermResult>}

Given an input document, this function returns the top-ranked keyphrases or relevant terms from the reference data, i.e., the terms that saliently describe the document.

Keyphrases are used to summarize, characterize and provide thematic access to data.

Parameters:
Name Type Description
p object Encapsulates getRelevantTerms parameters.
Properties
Name Type Argument Description
inputDocumentText|
inputDocumentSubquery|
inputDocumentCondition|
inputDocumentIDs
string Input document to process. One and only one of the following:
inputDocumentText This literal text
inputDocumentSubquery Text returned by this SQL subquery
inputDocumentCondition Text returned from reference table rows for which this SQL "where" clause is true
inputDocumentIDs Text returned from reference table rows with this (these) internal document ID number(s)
language string <optional>
Language code of input text, e.g. "EN", "DE" ("" for all).
mimeType string <optional>
Mime type of input text, e.g. "text/plain" ("" for unspecified).
top integer <optional>
Maximum number of returned results.
threshold number <optional>
Restricts the returned results to those displaying a score greater than or equal to this numeric value in the range [0,1] (0 to allow all).
documentRestriction string <optional>
Specified condition (SQL "where" clause) to be met for reference document rows to be considered in the computation ("" for all).
termTypeRestriction string <optional>
Comma-separated list of term types to consider ("" for all).
correlationMatrix boolean <optional>
If specified and true, the returned result includes a term correlation matrix.
principalComponents integer <optional>
If specified and non-zero, the returned result includes the specified number of principal components of the correlation matrix. Must be in the range from 0 to 3.
clustering string <optional>
If specified, the returned result includes hierarchical clusters computed with the method indicated. The possible values "COMPLETE_LINKAGE", "SINGLE_LINKAGE", "AVG_DISTANCE_WITHIN", "AVG_DISTANCE_BETWEEN", and "WARD" stand respectively for the methods Complete Linkage, Single Linkage, Average Distance Within, Average Distance Between, and Ward's Method.
Throws:
Throws an error if the parameters object is not valid or the execution fails.
Returns:
Array of TermResult objects.
Type
Array.<$.text.mining.Session~TermResult>
Example
var termResults = TM.getRelevantTerms({
    top: 16,
    inputDocumentText: "animals"
});

getSuggestedTerms(p) → {Array.<$.text.mining.Session~TermResult>}

Given an input term initial substring, this function returns the top-ranked terms from the reference data that complete that initial substring.

Term suggestion is used to present a user with likely search terms as the user enters characters within a search application.

Parameters:
Name Type Description
p object Encapsulates getSuggestedTerms parameters.
Properties
Name Type Argument Description
inputTermText|
inputTermIDs
string Input term to process. One and only one of the following:
inputTermText This literal text
inputTermIDs Text associated with reference table columns with this (these) internal term ID number(s)
top integer <optional>
Maximum number of returned results.
threshold number <optional>
Restricts the returned results to those displaying a score greater than or equal to this numeric value in the range [0,1] (0 to allow all).
documentRestriction string <optional>
Specified condition (SQL "where" clause) to be met for reference document rows to be considered in the computation ("" for all).
termTypeRestriction string <optional>
Comma-separated list of term types to consider ("" for all).
Throws:
Throws an error if the parameters object is not valid or the execution fails.
Returns:
Array of TermResult objects.
Type
Array.<$.text.mining.Session~TermResult>
Example
var termResults = TM.getSuggestedTerms({
    top: 16,
    inputTermText: "a"
});
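
In a search application, the returned TermResult objects usually need to be reduced to a short list of display strings. The sketch below shows one possible way to do that. It assumes that the result behaves like an ordinary JavaScript array, that it is already ordered by rank, and that the same normalized term may appear more than once (for example with different term types); the helper name toSuggestions is hypothetical.

```javascript
// Hypothetical helper: maps TermResult objects to display strings, dropping
// repeats of the same normalized term and keeping the original order.
function toSuggestions(termResults, limit) {
    var seen = {};
    var suggestions = [];
    termResults.forEach(function (result) {
        // Prefix the key so that terms such as "constructor" cannot collide
        // with built-in object property names.
        var key = "term:" + result.termNormalized;
        if (!seen[key] && suggestions.length < limit) {
            seen[key] = true;
            suggestions.push(result.term);
        }
    });
    return suggestions;
}
```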

initialize(p)

This function initializes (or re-initializes) Text Mining for the reference table and column linked to the Session object. It creates the term-document matrix and other configuration context data needed by the Text Mining functions. The Text Mining context is specific to the reference data, but it is persistent and global for all users. The configuration context specified at initialization time later supplies the defaults for any parameters left unspecified when Text Mining functions are invoked on the given reference data.

!    Advanced function

This function is typically not used. Text Mining is normally initialized separately, when the full text index is created for a given reference table and column. Because the Text Mining context is persistent and global for all users, that is the best way to ensure consistent results and avoid confusion.

This initialize() function provides a way to directly initialize Text Mining for development purposes or special customer applications. If this function is used carelessly, it can unexpectedly affect other running applications.

Parameters:
Name Type Description
p object Encapsulates initialize parameters.
Properties
Name Type Argument Description
configuration string <optional>
Repository path to the configuration. If omitted, the default configuration is used.
list of parameters... * <optional>
<repeatable>
Text Mining parameters and defaults to use for this reference table and column that override what is specified in the configuration. See the SAP HANA Text Mining Developer Guide for details.
Throws:
Throws an error if the parameters object is not valid or the execution fails.
Example
TM.initialize({
    configuration: "acme.textmining::defaults.textminingconfig",
    minTermFrequency: 3,
    maxTermFrequency: 100
});

Type Definitions

CategoryResult

Represents a single category value result from Text Mining categorization.
Type:
  • object
Properties:
Name Type Description
categorySet string The name of the category set column in which this category value occurs.
category string The category value. One or more reference documents in the group of K nearest neighbors were assigned this category.
documentCount integer The number of reference documents in the group of K nearest neighbors that were assigned this category value.
score number The score of this category value in the range [0,1].

DocumentResult

Represents a single document result from certain Text Mining methods.
Type:
  • object
Properties:
Name Type Description
includeColumns... * The requested columns from the reference table to be included in the result table, as specified via the includeColumns input parameter. These are returned as separate columns with the same names and types as the original specified include columns.
id integer The document ID number used internally by Text Mining. This can be used in subsequent Text Mining method calls in the inputDocumentIDs parameter for faster performance.
termCountTotal integer The total number of terms in this document, including duplicates.
termCount integer The number of different terms in this document.
correlation1...correlationN number These appear if the document correlation matrix was requested. The columns of the document correlation matrix contain the correlation values for this document and each of the other returned documents. N is the number of returned documents.

The document correlation matrix is a square matrix where the rows and the columns each list all the returned documents in order. The matrix portrays every combination of the returned document pairs, with a duplicate reflection across the diagonal of the matrix. Each cell of the matrix contains the correlation value for the two documents at that row and column, based on the co-occurrence of their terms in the reference documents.

factor1...factorN
rotation1...rotationN
number These appear if principal components analysis was requested. The factor and rotation values from principal component analysis (dimensionality reduction) for this document. N is the number of principal components requested via the principalComponents input parameter.
clusteringLevel number This appears if clustering was requested. The clustering level for this document.
clusteringLeft
clusteringRight
integer These appear if clustering was requested. The clustering left value and right value for this document.
score number The score of this document in the range [0,1].
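
When the correlation matrix is requested, each result object carries one row of it, spread over the properties correlation1 to correlationN. The following sketch gathers those properties into a square array of arrays. It assumes that the property names are literally "correlation" followed by a 1-based index, as the description above suggests, and that N equals the number of returned results; the helper name correlationMatrix is hypothetical.

```javascript
// Hypothetical helper: collects correlation1..correlationN from each result
// into a square array of arrays, one inner array per returned document.
function correlationMatrix(results) {
    var n = results.length;
    return results.map(function (row) {
        var values = [];
        for (var j = 1; j <= n; j++) {
            values.push(row["correlation" + j]);
        }
        return values;
    });
}
```

Because the matrix is mirrored across its diagonal, the entry at row i, column j of the returned array is expected to equal the entry at row j, column i.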

TermResult

Represents a single term result from certain Text Mining methods.
Type:
  • object
Properties:
Name Type Description
term string The term.
termNormalized string The normalized version of this term. Text Mining terms are normalized with respect to capitalization, whitespace, and accentuation.
termType string The type of this term, an entity type or part-of-speech.
id integer The term ID number used internally by Text Mining. This can be used in subsequent Text Mining method calls in the inputTermIDs parameter for faster performance.
frequencyTotal integer The total number of times this term occurs in the reference documents.
frequencyDocumentCount integer The number of reference documents in which this term occurs.
correlation1...correlationN number These appear if the term correlation matrix was requested (not available with the getSuggestedTerms method). The columns of the term correlation matrix contain the correlation values for this term and each of the other returned terms. N is the number of returned terms.

The term correlation matrix is a square matrix where the rows and the columns each list all the returned terms in order. The matrix portrays every combination of the returned term pairs, with a duplicate reflection across the diagonal of the matrix. Each cell of the matrix contains the correlation value for the two terms at that row and column, based on their co-occurrence in the reference documents.

factor1...factorN
rotation1...rotationN
number These appear if principal components analysis was requested (not available with the getSuggestedTerms method). The factor and rotation values from principal component analysis (dimensionality reduction) for this term. N is the number of principal components requested via the principalComponents input parameter.
clusteringLevel number This appears if clustering was requested (not available with the getSuggestedTerms method). The clustering level for this term.
clusteringLeft
clusteringRight
integer These appear if clustering was requested (not available with the getSuggestedTerms method). The clustering left value and right value for this term.
score number The score of this term in the range [0,1].
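
If principal components were requested, the factor values can be collected into coordinate arrays, for example to plot the returned terms in two or three dimensions. The sketch below assumes that the properties are named "factor" followed by a 1-based index, as listed above, and that the caller passes the same component count that was given in the principalComponents input parameter. The helper name factorCoordinates is hypothetical, and the interpretation of the values themselves is not covered here.

```javascript
// Hypothetical helper: extracts factor1..factorN from each TermResult and
// returns them as a coordinate array alongside the term text.
function factorCoordinates(termResults, componentCount) {
    return termResults.map(function (result) {
        var coordinates = [];
        for (var k = 1; k <= componentCount; k++) {
            coordinates.push(result["factor" + k]);
        }
        return { term: result.term, coordinates: coordinates };
    });
}
```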