ANNSModel¶
- class hana_ml.text.anns_model.ANNSModel(state_id=None, by_doc=False)¶
ANNS model created with IVF indexing.
- Parameters
- state_idstr, optional
The state id of the ANNS model.
Defaults to None.
- by_docbool, optional
Whether to use document or vector as input.
Defaults to False.
- Attributes
- state_DataFrame
The state information, available after fit() has been invoked.
- If the parameter 'by_doc' is True and fit() has been invoked:
- embedding_result_DataFrame
The embedding result.
- stat_DataFrame
The statistics after predict() has been invoked.
Methods
delete_model([state_id, connection_context, ...])Deletes the model.
delete_models([state_ids, ...])Deletes the models.
fit(data, key, target[, thread_ratio, ...])Fits the model.
predict(data, key, target[, thread_ratio, ...])Predicts the model.
Examples
Assume we have a HANA dataframe df with an 'ID' column and a 'TEXT' column, and a query dataframe query_df. We can then create an ANNSModel object:
>>> anns = ANNSModel(by_doc=True)
Then, invoke fit():
>>> anns.fit(data=df, key='ID', target='TEXT')
Then, invoke predict() to get the nearest neighbours:
>>> query_res = anns.predict(data=query_df, key='ID', target='QUERY', is_query=True, k_nearest_neighbours=10)
>>> query_res.collect()
- fit(data, key, target, thread_ratio=None, group_number=None, init_type=None, max_iteration=None, exit_threshold=None, comment=None, model_version=None)¶
Fits the model.
- Parameters
- dataDataFrame
Input data.
- keystr
Key column name.
- targetstr
Vector/doc column name.
- thread_ratiofloat, optional
The ratio of the number of threads to the number of logical processors.
Defaults to 1.0.
- group_numberint, optional
Number of groups (k). The value ranges from 1 to the number of training records. The vectors are split into group_number clusters, and at search time only K_CLUSTER clusters are searched. If set to 1, ANNS performs just like KNN.
Defaults to 1.
- init_type{'first_k', 'replace', 'no_replace', 'patent'}, optional
Governs the selection of initial cluster centers:
'first_k': First k observations.
'replace': Random with replacement.
'no_replace': Random without replacement.
'patent': Patent of selecting the init center (US 6,882,998 B1).
Defaults to 'no_replace'.
- max_iterationint, optional
Maximum iterations when doing IVF clustering.
Only valid when group_number is greater than 1.
Defaults to 100.
- exit_thresholdfloat, optional
Threshold (actual value) for exiting the iterations when doing IVF clustering.
Only valid when group_number is greater than 1.
Defaults to 1e-6.
- commentstr, optional
Some extra comments for that model.
Defaults to None.
- model_versionstr, optional
Indicates which embedding model version to use.
Defaults to the latest embedding model.
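The clustering parameters above can be pictured with a small standalone sketch. It is not the PAL implementation: it is plain k-means in pure Python, assuming 'first_k' initialisation and assuming that exit_threshold bounds the largest center movement between iterations. The function name build_ivf_index and the sample vectors are hypothetical.

```python
import math


def assign(vectors, centers):
    """Return one inverted list per center, holding the indices of its members."""
    lists = [[] for _ in centers]
    for idx, vec in enumerate(vectors):
        nearest = min(range(len(centers)), key=lambda c: math.dist(vec, centers[c]))
        lists[nearest].append(idx)
    return lists


def build_ivf_index(vectors, group_number, max_iteration=100, exit_threshold=1e-6):
    """Cluster vectors into group_number groups (illustrative k-means only)."""
    # 'first_k'-style initialisation: the first k observations become the centers.
    centers = [list(v) for v in vectors[:group_number]]
    for _ in range(max_iteration):
        lists = assign(vectors, centers)
        shift = 0.0
        for c, members in enumerate(lists):
            if not members:
                continue  # an empty cluster keeps its previous center
            dim = len(centers[c])
            new_center = [sum(vectors[i][d] for i in members) / len(members)
                          for d in range(dim)]
            shift = max(shift, math.dist(new_center, centers[c]))
            centers[c] = new_center
        # Assumed stopping rule: no center moved by more than exit_threshold.
        if shift <= exit_threshold:
            break
    return centers, assign(vectors, centers)


vectors = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, inverted_lists = build_ivf_index(vectors, group_number=2)
# inverted_lists == [[0, 1, 2], [3, 4, 5]]
```

With group_number=1 every vector lands in a single list, which is why the search then degenerates to an exhaustive KNN scan.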
- predict(data, key, target, thread_ratio=None, k_cluster=None, k_nearest_neighbours=None, batch_size=None, is_query=None, state_id=None)¶
Searches the fitted model for the nearest neighbours of the input data.
- Parameters
- dataDataFrame
Input data.
- keystr
Key column name.
- targetstr
Vector/doc column name.
- thread_ratiofloat, optional
The ratio of the number of threads to the number of logical processors.
Defaults to 1.0.
- k_clusterint, optional
Number of groups to search (k). The value ranges from 1 to the group_number used when the model was created.
Defaults to 1.
- k_nearest_neighboursint, optional
The number of nearest neighbors (k).
Defaults to 1.
- batch_sizeint, optional
The batch size. Only available when by_doc=True.
Defaults to 10.
- is_query: bool, optional
Whether to use query embedding. Only available when by_doc=True.
True: Use query embedding.
False: Use normal embedding.
Defaults to False.
- state_idstr, optional
The state id of the ANNS model.
Defaults to None.
- Returns
- DataFrame
The result.
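The interplay of k_cluster and k_nearest_neighbours can be illustrated with a minimal pure-Python sketch of an IVF lookup. This is an assumption about how such an index is typically searched, not the PAL code; ivf_search, the centers and the inverted lists are hypothetical sample values.

```python
import math


def ivf_search(query, vectors, centers, inverted_lists,
               k_cluster=1, k_nearest_neighbours=1):
    """Return indices of the nearest vectors found in the k_cluster closest groups."""
    # Rank the groups by the distance from the query to each center.
    ranked = sorted(range(len(centers)), key=lambda c: math.dist(query, centers[c]))
    probed = ranked[:k_cluster]
    # Only members of the probed groups are compared against the query.
    candidates = [i for c in probed for i in inverted_lists[c]]
    candidates.sort(key=lambda i: math.dist(query, vectors[i]))
    return candidates[:k_nearest_neighbours]


vectors = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.6, 0.0), (6.0, 0.0), (8.4, 0.0)]
centers = [(1.0, 0.0), (6.0, 0.0)]
inverted_lists = [[0, 1, 2], [3, 4, 5]]
query = (3.3, 0.0)

approx = ivf_search(query, vectors, centers, inverted_lists, k_cluster=1)
exact = ivf_search(query, vectors, centers, inverted_lists, k_cluster=2)
# approx == [2]: only the first group is probed, so the true neighbour is missed.
# exact == [3]: probing every group gives the same answer as a full KNN scan.
```

A larger k_cluster trades speed for recall: the query sits closer to the first center, yet its true nearest neighbour belongs to the second group.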
- delete_models(state_ids=None, connection_context=None, force_status=None)¶
Deletes the models.
- Parameters
- state_idslist of str, optional
The state IDs.
- connection_contextConnectionContext, optional
The connection context.
Defaults to self.connection_context.
- force_statusbool, optional
Whether to force deletion or raise an error when the state id is invalid.
False: Does not delete the element and raises an error.
True: Forces the element to be deleted.
Defaults to False.
- delete_model(state_id=None, connection_context=None, force_status=None)¶
Deletes the model.
- Parameters
- state_idstr, optional
The state id of the ANNS model.
Defaults to self.state_id.
- connection_contextConnectionContext, optional
The connection context.
Defaults to self.connection_context.
- force_statusbool, optional
Whether to force deletion or raise an error when the state id is invalid.
False: Does not delete the element and raises an error.
True: Forces the element to be deleted.
Defaults to False.
- Returns
- DataFrames
The table containing the model information.
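The methods above could be combined so that a model state is always removed once a search finishes. The sketch below uses only the calls and keyword arguments documented on this page. The helper name search_then_cleanup is hypothetical, the column names 'ID', 'TEXT' and 'QUERY' are taken from the Examples section, and it assumes the model is not needed after the query.

```python
def search_then_cleanup(anns, df, query_df, k=10):
    """Fit, query and delete an ANNS model; returns the collected search result.

    `anns` is expected to be an ANNSModel created with by_doc=True, and
    `df` / `query_df` are HANA DataFrames (assumptions, not checked here).
    """
    anns.fit(data=df, key='ID', target='TEXT')
    try:
        result = anns.predict(data=query_df, key='ID', target='QUERY',
                              is_query=True, k_nearest_neighbours=k)
        # collect() fetches the rows before the model state is removed.
        return result.collect()
    finally:
        # With no arguments, delete_model() falls back to self.state_id.
        anns.delete_model()
```

Placing delete_model() in a finally clause means the state is removed even when predict() raises.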