PALEmbeddings
- class hana_ml.text.pal_embeddings.PALEmbeddings(model_version=None, max_token_num=None)
Embeds input documents into vectors.
- Parameters:
- model_version: {'SAP_NEB.20240715', 'SAP_GXY.20250407'}, optional
Model version to use.
Options:
'SAP_NEB.20240715'
'SAP_GXY.20250407'
Defaults to None, in which case 'SAP_NEB.20240715' is used.
- Attributes:
- result_: DataFrame
The embedding result.
- stat_: DataFrame
The statistics.
Methods
fit_transform(data, key, target[, ...]) Embed input documents into vectors.
Examples
Suppose you have a HANA DataFrame df with columns 'ID' and 'TEXT'. To embed the documents into vectors, create a PALEmbeddings instance and call fit_transform:
>>> from hana_ml.text.pal_embeddings import PALEmbeddings
>>> embedder = PALEmbeddings(model_version='SAP_GXY.20250407')
>>> result = embedder.fit_transform(data=df, key='ID', target='TEXT')
>>> # The result is a DataFrame with the original data and embedding columns
>>> print(result.collect())
To embed several text columns at once, pass a list of column names as target. This example assumes df has two text columns, 'TEXT1' and 'TEXT2':
>>> embedder = PALEmbeddings(model_version='SAP_GXY.20250407')
>>> result = embedder.fit_transform(data=df, key='ID', target=['TEXT1', 'TEXT2'])
>>> print(result.collect())
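Embeddings like those produced above are commonly consumed by a similarity search on the client side. The following is a minimal sketch of cosine-similarity ranking in plain Python. It does not call hana_ml: the helper names cosine_similarity and rank_documents are hypothetical, and the three-dimensional vectors are toy values made up for illustration, not real model output. How the embedding columns are named in the result DataFrame is not specified here, so extracting vectors from result.collect() is left out.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return document keys sorted from most to least similar to the query."""
    scores = {key: cosine_similarity(query_vec, vec)
              for key, vec in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy vectors keyed by document ID (illustrative values only).
doc_vecs = {
    1: [1.0, 0.0, 0.0],
    2: [0.0, 1.0, 0.0],
    3: [0.7, 0.7, 0.0],
}
query_vec = [1.0, 0.1, 0.0]

ranking = rank_documents(query_vec, doc_vecs)
print(ranking)
```

In a real workflow, the document vectors and the query vector would both come from fit_transform calls using the same model version.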
- fit_transform(data, key, target, thread_number=None, batch_size=None, is_query=None, max_token_num=None)
Embed input documents into vectors.
- Parameters:
- data: DataFrame
Input data containing the documents to embed.
- key: str
Name of the key column.
- target: str or list of str
Name(s) of the text column(s) to embed.
- thread_number: int, optional
Number of HTTP connections to the backend embedding service (1-10).
Defaults to 6.
- batch_size: int, optional
Number of documents batched per request (1-50).
Defaults to 10.
- is_query: bool, optional
If True, use query embedding for Asymmetric Semantic Search.
Defaults to False.
- max_token_num: int, optional
Maximum number of tokens per document. The maximum depends on the embedding model:
'SAP_NEB.20240715': 1024 (default is 256 if not set)
'SAP_GXY.20250407': 1024 (default is 512 if not set)
Defaults to None, in which case the default value of the selected model version is used.
- Returns:
- DataFrame
DataFrame containing the original data and embedding columns.
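The documented ranges can be checked on the client before a request is sent. The sketch below is a hypothetical helper, not part of hana_ml: build_fit_transform_kwargs and _check_range are names introduced here for illustration. It assumes that the value 1024 listed for each model is the upper bound for max_token_num and that the lower bound is 1. Whether fit_transform performs equivalent checks itself is not stated in this reference.

```python
# Hypothetical client-side helper; not part of hana_ml.
# Ranges and limits are copied from the parameter descriptions above.
MODEL_MAX_TOKENS = {
    "SAP_NEB.20240715": 1024,
    "SAP_GXY.20250407": 1024,
}
DEFAULT_MODEL_VERSION = "SAP_NEB.20240715"


def _check_range(name, value, low, high):
    """Raise ValueError unless value is None or an int in [low, high]."""
    if value is None:
        return
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"{name} must be an int, got {value!r}")
    if not low <= value <= high:
        raise ValueError(f"{name} must be in [{low}, {high}], got {value}")


def build_fit_transform_kwargs(model_version=None, thread_number=None,
                               batch_size=None, is_query=None,
                               max_token_num=None):
    """Validate the optional arguments and return only those that were set."""
    model = DEFAULT_MODEL_VERSION if model_version is None else model_version
    if model not in MODEL_MAX_TOKENS:
        raise ValueError(f"unknown model_version: {model!r}")
    _check_range("thread_number", thread_number, 1, 10)
    _check_range("batch_size", batch_size, 1, 50)
    # Assumption: the smallest accepted token limit is 1.
    _check_range("max_token_num", max_token_num, 1, MODEL_MAX_TOKENS[model])
    if is_query is not None and not isinstance(is_query, bool):
        raise ValueError(f"is_query must be a bool, got {is_query!r}")
    candidates = {
        "thread_number": thread_number,
        "batch_size": batch_size,
        "is_query": is_query,
        "max_token_num": max_token_num,
    }
    # Unset arguments are omitted so the library defaults apply.
    return {name: value for name, value in candidates.items()
            if value is not None}


kwargs = build_fit_transform_kwargs(
    model_version="SAP_GXY.20250407",
    thread_number=4,
    batch_size=20,
    is_query=True,
    max_token_num=512,
)
print(kwargs)
```

The returned dictionary can then be unpacked into the call, for example embedder.fit_transform(data=df, key='ID', target='TEXT', **kwargs).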
Inherited Methods from PALBase
In addition to the methods described above, the PALEmbeddings class inherits methods from the PALBase class. Refer to PAL Base for details.