ANNSModel
- class hana_ml.text.anns_model.ANNSModel(state_id=None, by_doc=False)
ANNS (approximate nearest neighbor search) model created with IVF (inverted file) indexing.
- Parameters:
- state_id : str, optional
The state id of the ANNS model.
Defaults to None.
- by_doc : bool, optional
Whether to use documents (True) or vectors (False) as input.
Defaults to False.
Examples
Assume we have a HANA DataFrame df with an 'ID' column and a 'TEXT' column, and a query DataFrame query_df with an 'ID' column and a 'QUERY' column. We can then create an ANNSModel object:
>>> anns = ANNSModel(by_doc=True)
Then, invoke fit():
>>> anns.fit(data=df, key='ID', target='TEXT')
Then, invoke predict() to get the nearest neighbours:
>>> query_res = anns.predict(data=query_df, key='ID', target='QUERY', is_query=True, k_nearest_neighbours=10)
>>> query_res.collect()
- Attributes:
- state_ : DataFrame
The state information, available after fit() has been invoked.
- If the parameter 'by_doc' is True and fit() has been invoked:
- embedding_result_ : DataFrame
The embedding result.
- stat_ : DataFrame
The statistics, available after predict() has been invoked.
Methods
delete_model([state_id, connection_context, ...])  Deletes the model.
delete_models(state_ids[, ...])  Deletes the models.
fit(data, key, target[, thread_ratio, ...])  Fits the model.
predict(data, key, target[, thread_ratio, ...])  Predicts the model.
- fit(data, key, target, thread_ratio=None, group_number=None, init_type=None, max_iteration=None, exit_threshold=None, comment=None, model_version=None)
Fits the model.
- Parameters:
- data : DataFrame
Input data.
- key : str
Key column name.
- target : str
Vector/doc column name.
- thread_ratio : int, optional
The ratio of the number of threads to the number of logical processors.
Defaults to 1.0.
- group_number : int, optional
Number of groups (k). The value range is from 1 to the number of training records. The function splits the vectors into group_number clusters, and at search time only k_cluster clusters are searched. If set to 1, ANNS behaves just like KNN.
Defaults to 1.
- init_type : {'first_k', 'replace', 'no_replace', 'patent'}, optional
Governs the selection of initial cluster centers:
'first_k': First k observations.
'replace': Random with replacement.
'no_replace': Random without replacement.
'patent': The center-selection method described in patent US 6,882,998 B1.
Defaults to 'no_replace'.
- max_iteration : int, optional
Maximum number of iterations when doing IVF clustering.
Only valid when group_number is greater than 1.
Defaults to 100.
- exit_threshold : float, optional
Threshold (actual value) for exiting the iterations when doing IVF clustering.
Only valid when group_number is greater than 1.
Defaults to 1e-6.
- comment : str, optional
An optional comment to store with the model.
Defaults to None.
- model_version : str, optional
Specifies which embedding model version to use.
Defaults to the latest embedding model.
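The clustering parameters above (group_number, init_type, max_iteration, exit_threshold) can be illustrated with a small pure-Python k-means sketch. This is not the PAL implementation: the helper names init_centers, nearest_center and build_ivf are hypothetical, Euclidean distance is assumed, exit_threshold is interpreted here as the largest center movement within one iteration, and the 'patent' initialization is deliberately left out.

```python
import math
import random


def init_centers(vectors, k, init_type="no_replace", seed=0):
    """Pick k initial centers (hypothetical helper; 'patent' is not sketched)."""
    rng = random.Random(seed)
    if init_type == "first_k":
        picked = vectors[:k]                              # first k observations
    elif init_type == "replace":
        picked = [rng.choice(vectors) for _ in range(k)]  # random, duplicates possible
    elif init_type == "no_replace":
        picked = rng.sample(vectors, k)                   # random, all distinct records
    else:
        raise ValueError(f"unsupported init_type in this sketch: {init_type!r}")
    return [list(v) for v in picked]


def nearest_center(vector, centers):
    """Index of the center closest to vector (Euclidean distance assumed)."""
    return min(range(len(centers)), key=lambda i: math.dist(vector, centers[i]))


def build_ivf(vectors, group_number=1, init_type="no_replace",
              max_iteration=100, exit_threshold=1e-6, seed=0):
    """Cluster vectors with plain k-means; return (centers, assignment)."""
    if not 1 <= group_number <= len(vectors):
        raise ValueError("group_number must be between 1 and the number of records")
    centers = init_centers(vectors, group_number, init_type, seed)
    for _ in range(max_iteration):
        assignment = [nearest_center(v, centers) for v in vectors]
        shift = 0.0
        for c in range(group_number):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if not members:
                continue  # an empty cluster keeps its previous center
            new_center = [sum(col) / len(members) for col in zip(*members)]
            shift = max(shift, math.dist(new_center, centers[c]))
            centers[c] = new_center
        if shift <= exit_threshold:
            break  # centers have stopped moving
    # Final assignment against the final centers.
    assignment = [nearest_center(v, centers) for v in vectors]
    return centers, assignment


vectors = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, assignment = build_ivf(vectors, group_number=2, init_type="first_k")
print(assignment)  # [0, 0, 1, 1]
```

With group_number=1 every record lands in the same cluster, so a later search scans all records, which matches the note that ANNS then behaves like KNN.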
- predict(data, key, target, thread_ratio=None, k_cluster=None, k_nearest_neighbours=None, batch_size=None, is_query=None, state_id=None)
Predicts the model.
- Parameters:
- data : DataFrame
Input data.
- key : str
Key column name.
- target : str
Vector/doc column name.
- thread_ratio : int, optional
The ratio of the number of threads to the number of logical processors.
Defaults to 1.0.
- k_cluster : int, optional
Number of groups (clusters) to search. The value range is from 1 to the group_number used when the model was created.
Defaults to 1.
- k_nearest_neighbours : int, optional
The number of nearest neighbours to return (k).
Defaults to 1.
- batch_size : int, optional
The batch size. Only available when by_doc=True.
Defaults to 10.
- is_query : bool, optional
Whether to use the query embedding. Only available when by_doc=True.
True: Use query embedding.
False: Use normal embedding.
Defaults to False.
- state_id : str, optional
The state id of the ANNS model.
Defaults to None.
- Returns:
- DataFrames
The result.
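How k_cluster and k_nearest_neighbours interact at search time can be sketched in pure Python. This is an illustration under stated assumptions, not the PAL implementation: the function name ivf_search, the tiny hand-written index (vectors, centers, assignment) and the use of Euclidean distance are all hypothetical.

```python
import math


def ivf_search(query, vectors, centers, assignment,
               k_cluster=1, k_nearest_neighbours=1):
    """Return [(record_index, distance), ...] from the probed clusters only."""
    if not 1 <= k_cluster <= len(centers):
        raise ValueError("k_cluster must be between 1 and group_number")
    # Rank clusters by the distance from the query to each center
    # and keep the k_cluster closest ones.
    ranked = sorted(range(len(centers)),
                    key=lambda c: math.dist(query, centers[c]))
    probed = set(ranked[:k_cluster])
    # Scan only the records assigned to a probed cluster.
    candidates = [(i, math.dist(query, v))
                  for i, (v, a) in enumerate(zip(vectors, assignment))
                  if a in probed]
    candidates.sort(key=lambda pair: pair[1])
    return candidates[:k_nearest_neighbours]


# A hand-built index with group_number = 2 (illustrative data).
vectors = [(0.0, 0.0), (4.0, 0.0), (6.0, 0.0), (10.0, 0.0)]
centers = [(2.0, 0.0), (8.0, 0.0)]
assignment = [0, 0, 1, 1]
query = (4.9, 0.0)

approx = ivf_search(query, vectors, centers, assignment,
                    k_cluster=1, k_nearest_neighbours=2)
exact = ivf_search(query, vectors, centers, assignment,
                   k_cluster=2, k_nearest_neighbours=2)
print([i for i, _ in approx])  # [1, 0]
print([i for i, _ in exact])   # [1, 2]
```

Probing a single cluster misses record 2, which sits just across the cluster boundary. Raising k_cluster to the full group_number scans every record and recovers the exact neighbours at a higher search cost.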
- delete_models(state_ids, connection_context=None, force_status=None)
Deletes the models.
- Parameters:
- state_ids : list of str
The state IDs.
- connection_context : ConnectionContext, optional
The connection context.
Defaults to self.connection_context.
- force_status : bool, optional
Whether to raise an error or force deletion when a state id is invalid.
False: Does not delete the element and raises an error.
True: Forces the element to be deleted.
Defaults to False.
- delete_model(state_id=None, connection_context=None, force_status=None)
Deletes the model.
- Parameters:
- state_id : str, optional
The state id of the ANNS model.
Defaults to self.state_id.
- connection_context : ConnectionContext, optional
The connection context.
Defaults to self.connection_context.
- force_status : bool, optional
Whether to raise an error or force deletion when the state id is invalid.
False: Does not delete the element and raises an error.
True: Forces the element to be deleted.
Defaults to False.
- Returns:
- DataFrames
The table containing the model information.
Inherited Methods from PALBase
In addition to the methods described above, the ANNSModel class inherits methods from the PALBase class. Please refer to PAL Base for more details.