ANNSModel

class hana_ml.text.anns_model.ANNSModel(state_id=None, by_doc=False)

Approximate nearest neighbor search (ANNS) model created with IVF (inverted file) indexing.

Parameters:
state_id : str, optional

The state id of the ANNS model.

Defaults to None.

by_doc : bool, optional

Whether the input is documents (True) or vectors (False).

Defaults to False.

Examples

Assume we have a HANA DataFrame df with an 'ID' column and a 'TEXT' column, and a query DataFrame query_df with an 'ID' column and a 'QUERY' column. First, create an ANNSModel object:

>>> anns = ANNSModel(by_doc=True)

Then, invoke fit():

>>> anns.fit(data=df, key='ID', target='TEXT')

Then, invoke predict() to get the nearest neighbours:

>>> query_res = anns.predict(data=query_df, key='ID', target='QUERY',
                             is_query=True, k_nearest_neighbours=10)
>>> query_res.collect()
Attributes:
state_ : DataFrame

The state information when fit() has been invoked.

If the parameter 'by_doc' is True and fit() has been invoked:
embedding_result_ : DataFrame

The embedding result.

stat_ : DataFrame

The statistics after predict() has been invoked.

Methods

delete_model([state_id, connection_context, ...])

Deletes the model.

delete_models(state_ids[, ...])

Deletes the models.

fit(data, key, target[, thread_ratio, ...])

Fits the model.

predict(data, key, target[, thread_ratio, ...])

Searches the fitted model for the nearest neighbours of the input data.

fit(data, key, target, thread_ratio=None, group_number=None, init_type=None, max_iteration=None, exit_threshold=None, comment=None, model_version=None)

Fits the model.

Parameters:
data : DataFrame

Input data.

key : str

Key column name.

target : str

Vector/doc column name.

thread_ratio : float, optional

The ratio of the number of threads to the number of logical processors.

Defaults to 1.0.

group_number : int, optional

Number of groups (k). The value ranges from 1 to the number of training records. The vectors are split into group_number clusters, and at search time only k_cluster clusters are searched. If set to 1, ANNS behaves just like KNN.

Defaults to 1.
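The relationship between group_number (at fit time) and k_cluster (at search time) can be illustrated with a small pure-Python sketch of IVF-style search. This is not the PAL implementation: build_ivf and ivf_search are hypothetical helper names, Euclidean distance and plain k-means clustering are assumptions, and the code only mirrors the behaviour described above.

```python
import math
import random


def _assign(vectors, centers):
    """Group vector indices by their nearest center (the inverted lists)."""
    groups = [[] for _ in centers]
    for idx, vec in enumerate(vectors):
        nearest = min(range(len(centers)), key=lambda c: math.dist(vec, centers[c]))
        groups[nearest].append(idx)
    return groups


def build_ivf(vectors, group_number, max_iteration=100, seed=0):
    """Cluster `vectors` (a list of tuples) into `group_number` groups with k-means."""
    rng = random.Random(seed)
    centers = rng.sample(vectors, group_number)  # random start, without replacement
    for _ in range(max_iteration):
        groups = _assign(vectors, centers)
        updated = [
            tuple(sum(col) / len(members) for col in zip(*(vectors[i] for i in members)))
            if members else center  # keep the old center for an empty group
            for members, center in zip(groups, centers)
        ]
        if updated == centers:
            break
        centers = updated
    return centers, _assign(vectors, centers)


def ivf_search(query, vectors, centers, groups, k_cluster=1, k_nearest_neighbours=1):
    """Scan only the `k_cluster` groups whose centers are closest to `query`."""
    probe = sorted(range(len(centers)), key=lambda c: math.dist(query, centers[c]))[:k_cluster]
    candidates = [i for c in probe for i in groups[c]]
    candidates.sort(key=lambda i: math.dist(query, vectors[i]))
    return candidates[:k_nearest_neighbours]
```

With group_number=1 there is a single group, so every vector is scanned and the result matches an exact KNN search. With more groups, a smaller k_cluster scans fewer candidates at the cost of possibly missing true neighbours that fall in groups that were not probed.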

init_type : {'first_k', 'replace', 'no_replace', 'patent'}, optional

Governs the selection of initial cluster centers:

  • 'first_k': First k observations.

  • 'replace': Random with replacement.

  • 'no_replace': Random without replacement.

  • 'patent': Patented method for selecting the initial centers (US 6,882,998 B1).

Defaults to 'no_replace'.
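As a rough illustration of the first three options, the following sketch picks k initial centers from a list of points. The function initial_centers is a hypothetical helper, not part of hana_ml, and the 'patent' method is deliberately left out because its details are not described here.

```python
import random


def initial_centers(points, k, init_type="no_replace", seed=None):
    """Pick k starting centers from `points` (illustrative only)."""
    rng = random.Random(seed)
    if init_type == "first_k":
        # Take the first k observations in input order.
        return list(points[:k])
    if init_type == "replace":
        # Draw k times independently; the same observation may be drawn twice.
        return [rng.choice(points) for _ in range(k)]
    if init_type == "no_replace":
        # Draw k distinct observations.
        return rng.sample(points, k)
    raise ValueError(f"init_type {init_type!r} is not covered by this sketch")
```

Note that 'replace' can yield duplicate centers, which is one reason sampling without replacement is a common default for clustering.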

max_iteration : int, optional

Maximum iterations when doing IVF clustering.

Only valid when group_number is greater than 1.

Defaults to 100.

exit_threshold : float, optional

Threshold (actual value) for exiting the iterations when doing IVF clustering.

Only valid when group_number is greater than 1. Defaults to 1e-6.
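How max_iteration and exit_threshold might interact can be sketched with a plain k-means refinement loop. The quantity compared against exit_threshold is an assumption here (the largest distance any center moved in one iteration); the reference does not state what PAL actually measures. refine_centers is a hypothetical helper name.

```python
import math


def refine_centers(points, centers, max_iteration=100, exit_threshold=1e-6):
    """Run k-means updates until the centers settle or the iteration cap is hit."""
    centers = [tuple(c) for c in centers]
    iteration = 0
    for iteration in range(1, max_iteration + 1):
        # Assign every point to its nearest center.
        members = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            members[j].append(p)
        # Move each center to the mean of its members (unchanged if it has none).
        updated = [
            tuple(sum(col) / len(m) for col in zip(*m)) if m else c
            for m, c in zip(members, centers)
        ]
        # Assumed exit criterion: largest center movement in this iteration.
        shift = max(math.dist(a, b) for a, b in zip(centers, updated))
        centers = updated
        if shift <= exit_threshold:
            break
    return centers, iteration
```

The function returns the final centers together with the number of iterations actually run, which makes it easy to see whether the loop stopped on the threshold or on the cap.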

comment : str, optional

Extra comment to attach to the model.

Defaults to None.

model_version : str, optional

The version of the embedding model to use.

Defaults to the latest embedding model.

predict(data, key, target, thread_ratio=None, k_cluster=None, k_nearest_neighbours=None, batch_size=None, is_query=None, state_id=None)

Searches the fitted model for the nearest neighbours of the input data.

Parameters:
data : DataFrame

Input data.

key : str

Key column name.

target : str

Vector/doc column name.

thread_ratio : float, optional

The ratio of the number of threads to the number of logical processors.

Defaults to 1.0.

k_cluster : int, optional

Number of groups to search (k). The value ranges from 1 to the group_number used when the model was created.

Defaults to 1.

k_nearest_neighbours : int, optional

The number of nearest neighbors (k).

Defaults to 1.

batch_size : int, optional

The batch size. Only available when by_doc=True.

Defaults to 10.
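A minimal sketch of what batching means in general, assuming batch_size controls how many documents are handed to the embedding step at a time (the reference does not spell this out). The generator batches is a hypothetical helper, not part of hana_ml.

```python
def batches(items, batch_size=10):
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

For example, 25 documents with the default batch size would be processed as two full batches of 10 followed by one batch of 5.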

is_query : bool, optional

Whether to use query embedding. Only available when by_doc=True.

  • True: Use query embedding.

  • False: Use normal embedding.

Defaults to False.

state_id : str, optional

The state id of the ANNS model.

Defaults to None.

Returns:
DataFrames

The result.

delete_models(state_ids, connection_context=None, force_status=None)

Deletes the models.

Parameters:
state_ids : list of str

The state IDs.

connection_context : ConnectionContext, optional

The connection context.

Defaults to self.connection_context.

force_status : bool, optional

Whether to raise an error or force deletion when the state ID is invalid.

  • False: Does not delete the element and raises an error.

  • True: Forces the element to be deleted.

Defaults to False.
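A small wrapper can make cleanup calls more predictable. The sketch below assumes `model` is an ANNSModel instance and uses only the delete_models() signature documented above; delete_states is a hypothetical helper name, and the de-duplication step is a convenience choice, not something the library requires.

```python
def delete_states(model, state_ids, force=False):
    """Delete the given state IDs through model.delete_models().

    `model` is assumed to expose delete_models(state_ids, ..., force_status)
    as documented for ANNSModel.
    """
    # Drop duplicate IDs while keeping the original order.
    unique_ids = list(dict.fromkeys(state_ids))
    if not unique_ids:
        return []  # nothing to delete, so skip the call entirely
    model.delete_models(state_ids=unique_ids, force_status=force)
    return unique_ids
```

Passing force=True corresponds to force_status=True, which forces deletion instead of raising an error for an invalid state ID.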

delete_model(state_id=None, connection_context=None, force_status=None)

Deletes the model.

Parameters:
state_id : str, optional

The state id of the ANNS model.

Defaults to self.state_id.

connection_context : ConnectionContext, optional

The connection context.

Defaults to self.connection_context.

force_status : bool, optional

Whether to raise an error or force deletion when the state ID is invalid.

  • False: Does not delete the element and raises an error.

  • True: Forces the element to be deleted.

Defaults to False.

Returns:
DataFrames

The table containing the model information.

Inherited Methods from PALBase

Besides the methods described above, the ANNSModel class also inherits methods from the PALBase class. Refer to PAL Base for more details.