pos_tag
- hana_ml.text.ta.pos_tag(data, lang=None, thread_ratio=None, timeout=None)
Part of Speech (POS) tagging is a natural language processing technique that involves assigning specific grammatical categories or labels (such as nouns, verbs, adjectives, adverbs, pronouns, etc.) to individual words within a sentence. This process provides insights into the syntactic structure of the text, aiding in understanding word relationships, disambiguating word meanings, and facilitating various linguistic and computational analyses of textual data.
- data : DataFrame
The input data for text analysis. It should be a DataFrame structured as follows:
1st column: ID of the input text, of type INT, VARCHAR or NVARCHAR.
2nd column: Text content, of type VARCHAR, NVARCHAR or NCLOB.
3rd column (optional): Language of the text content; one of 'en', 'de', 'fr', 'es', 'pt', or NULL (the language is then detected automatically).
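Before uploading data to SAP HANA, it can help to check rows against the layout above on the client. The sketch below is a hypothetical, stdlib-only helper (`check_pos_tag_rows` and `SUPPORTED_LANGS` are not part of hana_ml); it only mirrors the documented constraints: an ID, a text, and an optional language column that is either a supported code or NULL (`None`).

```python
from typing import Sequence

# Language codes listed for the optional 3rd column; None stands for NULL (auto-detect).
SUPPORTED_LANGS = {"en", "de", "fr", "es", "pt"}


def check_pos_tag_rows(rows: Sequence[tuple]) -> int:
    """Validate rows against the documented input layout and return the column count.

    Hypothetical client-side helper, not part of hana_ml.
    """
    if not rows:
        raise ValueError("input must contain at least one row")
    width = len(rows[0])
    if width not in (2, 3):
        raise ValueError("expected 2 or 3 columns: ID, text[, language]")
    for row in rows:
        if len(row) != width:
            raise ValueError("all rows must have the same number of columns")
        doc_id, text = row[0], row[1]
        # IDs may be INT, VARCHAR or NVARCHAR, i.e. an int or a str on the client side.
        if not isinstance(doc_id, (int, str)):
            raise TypeError(f"ID {doc_id!r} must be an int or a string")
        if not isinstance(text, str):
            raise TypeError(f"text for ID {doc_id!r} must be a string")
        # The optional language must be a supported code or None (NULL).
        if width == 3 and row[2] is not None and row[2] not in SUPPORTED_LANGS:
            raise ValueError(f"unsupported language {row[2]!r} for ID {doc_id!r}")
    return width
```

For example, `check_pos_tag_rows([(1, "The cat sat.", "en"), (2, "Der Hund bellt.", None)])` returns `3`; with a 2-column result, the `lang` parameter is the one that applies.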
- lang : {'en', 'de', 'fr', 'es', 'pt'}, optional
Specifies the language of the input texts in data. Effective only when the language column in data is not provided (i.e. data has two columns).
- thread_ratio : float, optional
Specifies the ratio of threads that can be used by this function, with valid range from 0 to 1, where
0 means only using a single thread.
1 means using at most all the currently available threads.
Values outside valid range are ignored (no error thrown), and in such case the function heuristically determines the number of threads to use.
Defaults to 0.0.
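The range rule above can be summarized in a small client-side check. This is a hypothetical helper (`effective_thread_ratio` is not part of hana_ml): it returns the ratio the documentation says is honoured, the documented default `0.0` when no value is given, and `None` when the value is out of range and the server is expected to choose the thread count heuristically. The actual thread allocation always happens on the SAP HANA server.

```python
from typing import Optional


def effective_thread_ratio(thread_ratio: Optional[float]) -> Optional[float]:
    """Hypothetical sketch of the documented thread_ratio rule (not a hana_ml API)."""
    if thread_ratio is None:
        return 0.0  # documented default: a single thread
    if 0.0 <= thread_ratio <= 1.0:
        return float(thread_ratio)  # within the valid range, honoured as given
    # Out-of-range values (and NaN, which fails the comparison) are ignored:
    # the server then picks the number of threads heuristically.
    return None
```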
- timeout : int, optional
Specifies the maximum amount of time (in seconds) the client will wait for a response from the server.
Defaults to 10.
- Returns:
- A tuple of DataFrames:
DataFrame 1 : The POS result table
DataFrame 2 : Sentences result table
DataFrame 3 : Extra result table
Examples
>>> from hana_ml.text.ta import pos_tag
>>> pos, sentences, extra = pos_tag(data=df, thread_ratio=0.5, timeout=20)