gen_ai_hub.document_grounding package

Document Grounding package for SAP Generative AI Hub.

This package provides APIs for document grounding capabilities, including:
  • Pipeline management for document vectorization from various data sources

  • Vector store operations for semantic search

  • Retrieval operations for querying document repositories

The package includes three main API clients:
  • PipelineAPIClient: Manages document vectorization pipelines

  • VectorAPIClient: Manages vector collections and semantic search

  • RetrievalAPIClient: Performs retrieval operations across data repositories

class BaseDocument

Bases: BaseModel

chunks: List[TextOnlyBaseChunk]
metadata: List[VectorKeyValueListPair]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class BasePipelineResponse

Bases: BaseModel

id: str
metadata: MetaData | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

type: str
class Collection

Bases: BaseModel

embeddingConfig: EmbeddingConfig
id: str
metadata: List[VectorKeyValueListPair] | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

title: str | None
class CollectionCreateRequest

Bases: BaseModel

embeddingConfig: EmbeddingConfig
metadata: List[VectorKeyValueListPair] | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

title: str | None
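The sketch below shows the shape of a CollectionCreateRequest payload. It assembles a plain dict from the documented field names: embeddingConfig.modelName, title, and metadata as a list of key/value-list pairs. The helper name, model name, title, and metadata keys are hypothetical. If the SDK is installed, a dict like this could presumably be converted with pydantic v2's CollectionCreateRequest.model_validate(...) and then passed to VectorAPIClient.create_collection.

```python
import json
from typing import Dict, List, Optional


def build_collection_create_request(
    model_name: Optional[str],
    title: Optional[str] = None,
    metadata: Optional[Dict[str, List[str]]] = None,
) -> dict:
    """Build a dict shaped like CollectionCreateRequest (field names as documented)."""
    payload: dict = {"embeddingConfig": {"modelName": model_name}}
    if title is not None:
        payload["title"] = title
    if metadata:
        # Each entry follows the VectorKeyValueListPair shape: a key plus a list of values.
        payload["metadata"] = [{"key": k, "value": list(v)} for k, v in metadata.items()]
    return payload


request = build_collection_create_request(
    model_name="text-embedding-3-small",  # hypothetical embedding model name
    title="product-docs",                 # hypothetical collection title
    metadata={"team": ["support"]},       # hypothetical metadata
)
print(json.dumps(request, indent=2))
```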
class CollectionCreatedResponse

Bases: BaseModel

collectionURL: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

status: Literal['CREATED']
class CollectionDeletedResponse

Bases: BaseModel

collectionURL: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

status: Literal['DELETED']
class CollectionPendingResponse

Bases: BaseModel

Location: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

status: Literal['PENDING']
class CollectionsListResponse

Bases: BaseModel

count: int | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

resources: List[Collection]
class CommonConfiguration

Bases: BaseModel

destination: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class DataRepositories

Bases: BaseModel

count: int | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

resources: List[DataRepository]
class DataRepository

Bases: BaseModel

id: str
metadata: List[RetrievalKeyValueListPair] | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

title: str
type: Literal['vector', 'help.sap.com'] | str
class DataRepositoryMetadataItem

Bases: BaseModel

key: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

value: List[str]
class DataRepositoryWithDocuments

Bases: BaseModel

documents: List[RetrievalDocument]
id: str
metadata: List[RetrievalKeyValueListPair] | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

title: str
class Document

Bases: BaseModel

absoluteUrl: str | None
createdTimestamp: datetime | None
downloadLocation: str | None
id: str
lastUpdatedTimestamp: datetime | None
metadataId: str | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

status: DocumentStatus | None
title: str | None
viewLocation: str | None
class DocumentOutput

Bases: BaseModel

chunks: List[VectorChunk]
id: str
metadata: List[VectorKeyValueListPair] | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class DocumentStatus

Bases: str, Enum

__new__(value)
DEINDEXED = 'DEINDEXED'
FAILED = 'FAILED'
FAILED_TO_BE_RETRIED = 'FAILED_TO_BE_RETRIED'
INDEXED = 'INDEXED'
REINDEXED = 'REINDEXED'
TO_BE_PROCESSED = 'TO_BE_PROCESSED'
TO_BE_SCHEDULED = 'TO_BE_SCHEDULED'
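DocumentStatus derives from both str and Enum, so its members compare equal to the raw strings found in JSON responses. The sketch below uses a local stand-in for the enum, not the SDK class, to tally the statuses of documents returned by calls such as get_pipeline_documents. The helper name is hypothetical. Document.status is optional, so the helper skips None values.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable, Optional


class DocumentStatus(str, Enum):
    """Local stand-in mirroring the documented members (not the SDK class)."""

    DEINDEXED = "DEINDEXED"
    FAILED = "FAILED"
    FAILED_TO_BE_RETRIED = "FAILED_TO_BE_RETRIED"
    INDEXED = "INDEXED"
    REINDEXED = "REINDEXED"
    TO_BE_PROCESSED = "TO_BE_PROCESSED"
    TO_BE_SCHEDULED = "TO_BE_SCHEDULED"


def status_counts(raw_statuses: Iterable[Optional[str]]) -> Dict[DocumentStatus, int]:
    """Tally status values, skipping None; unknown strings raise ValueError."""
    return dict(Counter(DocumentStatus(s) for s in raw_statuses if s is not None))


counts = status_counts(["INDEXED", "INDEXED", None, "FAILED"])
```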
class DocumentWithoutChunks

Bases: BaseModel

id: str
metadata: List[VectorKeyValueListPair]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class DocumentsChunk

Bases: BaseModel

documents: List[DocumentOutput]
id: str
metadata: List[VectorKeyValueListPair] | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

title: str
class DocumentsCreateRequest

Bases: BaseModel

documents: List[BaseDocument]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
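In a DocumentsCreateRequest, each BaseDocument nests its chunks (TextOnlyBaseChunk) and its metadata (VectorKeyValueListPair). The sketch below builds that nested shape as a plain dict from simpler input. The input format `{"metadata": {...}, "chunks": [str, ...]}`, the helper names, and the sample values are all hypothetical.

```python
from typing import Dict, List


def kv_pairs(metadata: Dict[str, List[str]]) -> List[dict]:
    """Convert {"key": [values]} into the documented key/value-list shape."""
    return [{"key": key, "value": list(values)} for key, values in metadata.items()]


def build_documents_create_request(docs: List[dict]) -> dict:
    """Build a dict shaped like DocumentsCreateRequest."""
    return {
        "documents": [
            {
                # BaseDocument.metadata: List[VectorKeyValueListPair]
                "metadata": kv_pairs(doc.get("metadata", {})),
                # BaseDocument.chunks: List[TextOnlyBaseChunk]; chunk metadata left empty
                "chunks": [{"content": text, "metadata": []} for text in doc["chunks"]],
            }
            for doc in docs
        ]
    }


payload = build_documents_create_request(
    [{"metadata": {"source": ["handbook.pdf"]}, "chunks": ["First chunk.", "Second chunk."]}]
)
```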

class DocumentsListResponse

Bases: BaseModel

documents: List[DocumentWithoutChunks]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class DocumentsResponse

Bases: BaseModel

count: int | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

resources: List[DocumentWithoutChunks]
class DocumentsStatusResponse

Bases: BaseModel

count: int | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

resources: List[Document]
class DocumentsUpdateRequest

Bases: BaseModel

documents: List[Document]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class EmbeddingConfig

Bases: BaseModel

modelName: str | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class GetPipelineExecutionsResponse

Bases: BaseModel

count: int | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

resources: List[PipelineExecution]
class GetPipelineStatusResponse

Bases: BaseModel

lastStarted: str | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

status: str | None
class GetPipelinesResponse

Bases: BaseModel

count: int | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

resources: List[MSSharePointPipelineGetResponse | S3PipelineGetResponse | SFTPPipelineGetResponse] (a discriminated union; the type field selects the model)
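GetPipelinesResponse.resources can contain different pipeline types, and pydantic uses the type field to decide which of the three response models to build. If you are working with the raw JSON instead, for example while debugging, you can do the same dispatch by hand. The handler functions, sample IDs, and destination names below are hypothetical.

```python
from typing import Callable, Dict, List


def describe_sharepoint(pipeline: dict) -> str:
    site = pipeline["configuration"]["sharePoint"]["site"]["name"]
    return f"SharePoint site {site}"


def describe_common(pipeline: dict) -> str:
    # S3 and SFTP pipelines share CommonConfiguration, which only has a destination.
    return f"{pipeline['type']} via destination {pipeline['configuration']['destination']}"


# The 'type' value picks the handler, mirroring discriminator='type'.
DESCRIBERS: Dict[str, Callable[[dict], str]] = {
    "MSSharePoint": describe_sharepoint,
    "S3": describe_common,
    "SFTP": describe_common,
}


def describe_pipelines(resources: List[dict]) -> List[str]:
    lines = []
    for pipeline in resources:
        describer = DESCRIBERS.get(pipeline["type"])
        if describer is None:
            raise ValueError(f"unknown pipeline type: {pipeline['type']!r}")
        lines.append(f"{pipeline['id']}: {describer(pipeline)}")
    return lines


sample = [
    {"id": "p1", "type": "S3", "configuration": {"destination": "s3-dest"}},
    {
        "id": "p2",
        "type": "MSSharePoint",
        "configuration": {"destination": "sp-dest", "sharePoint": {"site": {"name": "HR"}}},
    },
]
```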
class MSSharePointConfiguration

Bases: BaseModel

destination: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

sharePoint: SharePointConfig
class MSSharePointConfigurationGetResponse

Bases: BaseModel

destination: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

sharePoint: SharePointConfig
class MSSharePointPipelineCreateRequest

Bases: BaseModel

configuration: MSSharePointConfiguration
metadata: MetaData | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

type: Literal['MSSharePoint']
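The three create-request models differ mainly in their configuration. MSSharePoint nests a site (name, optional includePaths), while S3 and SFTP use CommonConfiguration, which has only a destination. The sketch below builds dicts with these documented shapes. create_pipeline's parameter type is given as CreatePipelineRequest, which this reference does not define; it presumably accepts any of these three models, but that is an assumption. The helper names, destination names, site name, and path are hypothetical.

```python
from typing import List, Optional


def sharepoint_pipeline_request(
    destination: str,
    site_name: str,
    include_paths: Optional[List[str]] = None,
    metadata_destination: Optional[str] = None,
) -> dict:
    """Dict shaped like MSSharePointPipelineCreateRequest."""
    site: dict = {"name": site_name}
    if include_paths is not None:
        site["includePaths"] = include_paths
    request: dict = {
        "type": "MSSharePoint",
        "configuration": {"destination": destination, "sharePoint": {"site": site}},
    }
    if metadata_destination is not None:
        request["metadata"] = {"destination": metadata_destination}  # MetaData model
    return request


def common_pipeline_request(kind: str, destination: str) -> dict:
    """Dict shaped like S3PipelineCreateRequest or SFTPPipelineCreateRequest."""
    if kind not in ("S3", "SFTP"):
        raise ValueError("kind must be 'S3' or 'SFTP'")
    return {"type": kind, "configuration": {"destination": destination}}


sp = sharepoint_pipeline_request("sp-destination", "Engineering", include_paths=["/Shared Documents"])
s3 = common_pipeline_request("S3", "s3-destination")
```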
class MSSharePointPipelineGetResponse

Bases: BasePipelineResponse

configuration: MSSharePointConfigurationGetResponse
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

type: Literal['MSSharePoint']
class ManualPipelineTrigger

Bases: BaseModel

metadataOnly: bool | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

pipelineId: str
class MetaData

Bases: BaseModel

destination: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class PipelineAPIClient

Bases: object

The Pipelines API creates and manages vector stores based on documents from user data repositories: S3, SFTP, and Microsoft SharePoint. Each pipeline represents a configured end-to-end process including the following steps:

  • Fetches documents from a supported data source

  • Preprocesses and chunks the document content, and generates semantic embeddings. Semantic embeddings are multidimensional representations of textual information.

  • Stores semantic embeddings into the HANA Vector Store

The Pipeline API is compatible with the following data repositories:

  • Microsoft SharePoint

  • AWS S3

  • SFTP

See https://api.sap.com/api/DOCUMENT_GROUNDING_API/resource/Pipelines

__init__(proxy_client=None)

Initializes the PipelineAPIClient

Parameters:

proxy_client (Optional[GenAIHubProxyClient], optional) -- proxy client to use for requests, defaults to None

create_pipeline(pipeline_request)

Create a document vectorization pipeline

Parameters:

pipeline_request (CreatePipelineRequest) -- The object containing the pipeline configuration.

Returns:

ID of the created pipeline

Return type:

PipelineIdResponse

delete_pipeline_by_id(pipeline_id)

Delete a pipeline by pipeline id

Parameters:

pipeline_id (str) -- ID of the pipeline to delete

Returns:

Response of the delete operation

Return type:

requests.Response

get_execution_document_by_id(pipeline_id, execution_id, document_id)

Get Document by ID for a Pipeline Execution

Parameters:
  • pipeline_id (str) -- Pipeline ID

  • execution_id (str) -- Execution ID

  • document_id (str) -- Document ID

Returns:

Document for the Pipeline Execution

Return type:

Document

get_execution_documents(pipeline_id, execution_id, top=None, skip=None, count=None)

Get Documents for a Pipeline Execution

Parameters:
  • pipeline_id (str) -- Pipeline ID

  • execution_id (str) -- Execution ID

  • top (Optional[int], optional) -- the maximum number of documents to return, defaults to None

  • skip (Optional[int], optional) -- number of documents to skip, defaults to None

  • count (Optional[bool], optional) -- flag to include count of total documents, defaults to None

Returns:

Documents for the Pipeline Execution

Return type:

DocumentsStatusResponse

get_pipeline_by_id(pipeline_id)

Get details of a pipeline by pipeline id.

Parameters:

pipeline_id (str) -- Pipeline ID

Returns:

Details of the pipeline

Return type:

BasePipelineResponse

get_pipeline_document_by_id(pipeline_id, document_id)

Get Document by ID for a Pipeline

Parameters:
  • pipeline_id (str) -- Pipeline ID

  • document_id (str) -- Document ID

Returns:

Document for the Pipeline

Return type:

Document

get_pipeline_documents(pipeline_id, top=None, skip=None, count=None)

Get Documents for a Pipeline

Parameters:
  • pipeline_id (str) -- Pipeline ID

  • top (Optional[int], optional) -- the maximum number of documents to return, defaults to None

  • skip (Optional[int], optional) -- number of documents to skip, defaults to None

  • count (Optional[bool], optional) -- flag to include count of total documents, defaults to None

Returns:

Documents for the Pipeline

Return type:

DocumentsStatusResponse
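Several list methods in this package share the top/skip/count paging parameters. The generator below is a minimal sketch that walks every page. It assumes the service honors top and skip and that a page shorter than top (or an empty page) means no more results remain. The helper name is hypothetical. Another approach would be to request count=True and stop once the total is reached.

```python
from typing import Callable, Iterator, List


def iter_all(fetch_page: Callable[..., object], page_size: int = 100) -> Iterator[object]:
    """Yield every resource from a top/skip paginated endpoint.

    fetch_page is called as fetch_page(top=..., skip=...) and must return an
    object with a .resources list, such as DocumentsStatusResponse.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    skip = 0
    while True:
        page: List[object] = fetch_page(top=page_size, skip=skip).resources
        yield from page
        if len(page) < page_size:
            break  # a short (or empty) page means there is nothing left
        skip += page_size
```

With a client instance, a call such as list(iter_all(functools.partial(client.get_pipeline_documents, pipeline_id))) would presumably collect every document of one pipeline.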

get_pipeline_execution_by_id(pipeline_id, execution_id)

Get Pipeline Execution by ID

Parameters:
  • pipeline_id (str) -- Pipeline ID

  • execution_id (str) -- Execution ID

Returns:

Pipeline Execution

Return type:

PipelineExecution

get_pipeline_executions(pipeline_id, last_execution=None, top=None, skip=None, count=None)

Get Pipeline Executions

Parameters:
  • pipeline_id (str) -- Pipeline ID

  • last_execution (Optional[bool], optional) -- flag to get only the last execution, defaults to None

  • top (Optional[int], optional) -- number of executions to retrieve, defaults to None

  • skip (Optional[int], optional) -- number of executions to skip, defaults to None

  • count (Optional[bool], optional) -- flag to include count of total executions, defaults to None

Returns:

Pipeline Executions

Return type:

GetPipelineExecutionsResponse
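After trigger_pipeline you can follow a run by repeatedly asking for the last execution. The sketch below polls get_pipeline_executions(pipeline_id, last_execution=True) and reads the status of the first resource. It treats FINISHED, FINISHEDWITHERRORS, and TIMEOUT as terminal; that interpretation is an assumption, because the enum members are not documented here. The helper name and polling defaults are hypothetical. `sleep` can be injected so the code is easy to test.

```python
import time
from typing import Callable, Optional

# Assumption: these PipelineExecutionStatus values mean the run has stopped.
TERMINAL = {"FINISHED", "FINISHEDWITHERRORS", "TIMEOUT"}


def wait_for_last_execution(
    client,
    pipeline_id: str,
    poll_seconds: float = 10.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll the last execution until a terminal status; return the last status seen."""
    status = None
    for _ in range(max_polls):
        response = client.get_pipeline_executions(pipeline_id, last_execution=True)
        if response.resources:
            raw = response.resources[0].status
            # Normalize enum members to their plain string value.
            status = getattr(raw, "value", raw)
            if status in TERMINAL:
                return status
        sleep(poll_seconds)
    return status
```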

get_pipeline_status(pipeline_id)

Get pipeline status by pipeline id

Parameters:

pipeline_id (str) -- Pipeline ID

Returns:

Status of the pipeline

Return type:

GetPipelineStatusResponse

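get_pipelines(top=None, skip=None, count=None)

Get all pipelines.

Parameters:
  • top (Optional[int], optional) -- the maximum number of pipelines to return, defaults to None

  • skip (Optional[int], optional) -- number of pipelines to skip, defaults to None

  • count (Optional[bool], optional) -- flag to include count of total pipelines, defaults to None

Returns:

All pipelines

Return type:

GetPipelinesResponse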

search_pipelines(body)

Pipeline Search by Metadata

Parameters:

body (SearchPipelineRequest) -- The search request object containing metadata filters.

Returns:

Search results containing matching pipelines.

Return type:

SearchPipelinesResponse
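search_pipelines takes a SearchPipelineRequest, which is a list of DataRepositoryMetadataItem key/value-list entries, and returns pipeline IDs in SearchPipelinesResponse.resources. The sketch below builds the request shape and reads the IDs from a response-shaped dict. The helper names and the metadata keys and values are hypothetical, and this reference does not describe how the service matches the values.

```python
from typing import Dict, List


def build_search_pipeline_request(metadata: Dict[str, List[str]]) -> dict:
    """Dict shaped like SearchPipelineRequest: one DataRepositoryMetadataItem per key."""
    return {
        "dataRepositoryMetadata": [
            {"key": key, "value": list(values)} for key, values in metadata.items()
        ]
    }


def pipeline_ids(search_response: dict) -> List[str]:
    """Extract pipelineId values from a SearchPipelinesResponse-shaped dict."""
    return [item["pipelineId"] for item in search_response.get("resources", [])]


request = build_search_pipeline_request({"department": ["finance"]})
ids = pipeline_ids({"count": 2, "resources": [{"pipelineId": "p1"}, {"pipelineId": "p2"}]})
```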

trigger_pipeline(request)

Trigger Pipeline Manually

Parameters:

request (ManualPipelineTrigger) -- The manual trigger request object.

Returns:

Response of the trigger operation

Return type:

requests.Response

class PipelineExecution

Bases: BaseModel

createdAt: datetime | None
id: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

modifiedAt: datetime | None
status: PipelineExecutionStatus | None
class PipelineExecutionStatus

Bases: str, Enum

__new__(value)
FINISHED = 'FINISHED'
FINISHED_WITH_ERRORS = 'FINISHEDWITHERRORS'
INPROGRESS = 'INPROGRESS'
NEW = 'NEW'
TIMEOUT = 'TIMEOUT'
UNKNOWN = 'UNKNOWN'
class PipelineIdResponse

Bases: BaseModel

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

pipelineId: str
class RetrievalAPIClient

Bases: object

The Retrieval API enables querying and retrieving relevant content from configured data repositories, such as vector or external document sources (e.g., help.sap.com).

Retrieval combines semantic search with metadata filtering at the repository, document, and chunk level. Each filter can also carry a search configuration that caps the number of chunks and documents returned.

Reference: https://api.sap.com/api/DOCUMENT_GROUNDING_API/resource/Retrieval

__init__(proxy_client=None)

Initialize the RetrievalAPIClient.

Parameters:

proxy_client (Optional[GenAIHubProxyClient], optional) -- Optional proxy client for making API requests.

get_data_repositories(top=None, skip=None, count=None)

List all data repositories available to the tenant.

Parameters:
  • top (Optional[int], optional) -- the number of items to return, defaults to None

  • skip (Optional[int], optional) -- the number of items to skip, defaults to None

  • count (Optional[bool], optional) -- whether to include a count of total items, defaults to None

Returns:

DataRepositories model containing the list of data repositories

Return type:

DataRepositories

get_data_repository_by_id(repository_id)

Get a single data repository by its unique ID.

Parameters:

repository_id (str) -- the unique identifier of the data repository

Returns:

DataRepository model representing the data repository

Return type:

DataRepository

search(search_input)

Perform a retrieval search for relevant content.

Parameters:

search_input (RetrievalSearchInput) -- RetrievalSearchInput model defining the query and filters.

Returns:

RetrievalSearchResults model containing repositories, documents, and chunks.

Return type:

RetrievalSearchResults
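A RetrievalSearchInput pairs one query with one or more RetrievalSearchFilter entries. Each filter needs an id and a dataRepositoryType and can optionally narrow the search to specific repositories or cap the chunk count. The sketch below builds a single-filter payload as a dict. The helper name, the default filter ID, and the repository ID are hypothetical. With the SDK, such a dict could presumably be validated with RetrievalSearchInput.model_validate(...) and passed to RetrievalAPIClient.search.

```python
from typing import List, Optional


def build_retrieval_search_input(
    query: str,
    repository_ids: Optional[List[str]] = None,
    repository_type: str = "vector",  # documented literals: 'vector', 'help.sap.com'
    max_chunks: Optional[int] = None,
    filter_id: str = "filter-1",
) -> dict:
    """Dict shaped like RetrievalSearchInput with a single RetrievalSearchFilter."""
    search_filter: dict = {"id": filter_id, "dataRepositoryType": repository_type}
    if repository_ids is not None:
        search_filter["dataRepositories"] = repository_ids
    if max_chunks is not None:
        search_filter["searchConfiguration"] = {"maxChunkCount": max_chunks}
    return {"query": query, "filters": [search_filter]}


payload = build_retrieval_search_input("How do I reset my password?", ["repo-123"], max_chunks=5)
```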

class RetrievalChunk

Bases: BaseModel

content: str
id: str
metadata: List[RetrievalKeyValueListPair] | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class RetrievalDataRepositorySearchResult

Bases: BaseModel

dataRepository: DataRepositoryWithDocuments
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class RetrievalDocument

Bases: BaseModel

chunks: List[RetrievalChunk]
id: str
metadata: List[RetrievalDocumentKeyValueListPair] | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class RetrievalDocumentKeyValueListPair

Bases: RetrievalKeyValueListPair

matchMode: str | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class RetrievalKeyValueListPair

Bases: BaseModel

key: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

value: List[str]
class RetrievalPerFilterSearchResult

Bases: BaseModel

filterId: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

results: List[RetrievalDataRepositorySearchResult]
class RetrievalPerFilterSearchResultError

Bases: BaseModel

message: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class RetrievalPerFilterSearchResultWithError

Bases: BaseModel

error: RetrievalPerFilterSearchResultError
filterId: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class RetrievalSearchConfiguration

Bases: BaseModel

maxChunkCount: int | None
maxDocumentCount: int | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class RetrievalSearchDocumentKeyValueListPair

Bases: BaseModel

key: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

selectMode: List[str] | None
value: List[str]
class RetrievalSearchFilter

Bases: BaseModel

chunkMetadata: List[RetrievalKeyValueListPair] | None
dataRepositories: List[str] | None
dataRepositoryMetadata: List[RetrievalKeyValueListPair] | None
dataRepositoryType: Literal['vector', 'help.sap.com'] | str
documentMetadata: List[RetrievalSearchDocumentKeyValueListPair] | None
id: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

searchConfiguration: RetrievalSearchConfiguration | None
class RetrievalSearchInput

Bases: BaseModel

filters: List[RetrievalSearchFilter]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

query: str
class RetrievalSearchResults

Bases: BaseModel

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

results: List[RetrievalPerFilterSearchResult | RetrievalPerFilterSearchResultWithError]
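Each entry in RetrievalSearchResults.results is either a successful per-filter result or an error object keyed by the same filterId. The content is nested through data repository, then document, then chunk. The sketch below flattens a response-shaped dict into chunk texts and collects per-filter errors separately. It detects an error by the presence of an "error" key, which is an assumption based on the field names. The helper name and sample data are hypothetical.

```python
from typing import Dict, List, Tuple


def collect_retrieval_chunks(results: dict) -> Tuple[List[str], Dict[str, str]]:
    """Return (chunk contents in order, {filterId: error message})."""
    contents: List[str] = []
    errors: Dict[str, str] = {}
    for per_filter in results["results"]:
        if "error" in per_filter:  # RetrievalPerFilterSearchResultWithError
            errors[per_filter["filterId"]] = per_filter["error"]["message"]
            continue
        for repo_result in per_filter["results"]:  # RetrievalDataRepositorySearchResult
            for document in repo_result["dataRepository"]["documents"]:  # RetrievalDocument
                for chunk in document["chunks"]:  # RetrievalChunk
                    contents.append(chunk["content"])
    return contents, errors


sample = {
    "results": [
        {"filterId": "f1", "results": [{"dataRepository": {"id": "r1", "title": "Repo", "documents": [
            {"id": "d1", "chunks": [{"id": "c1", "content": "alpha"}, {"id": "c2", "content": "beta"}]}
        ]}}]},
        {"filterId": "f2", "error": {"message": "repository not found"}},
    ]
}
contents, errors = collect_retrieval_chunks(sample)
```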
class S3PipelineCreateRequest

Bases: BaseModel

configuration: CommonConfiguration
metadata: MetaData | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

type: Literal['S3']
class S3PipelineGetResponse

Bases: BasePipelineResponse

configuration: CommonConfiguration
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

type: Literal['S3']
class SFTPPipelineCreateRequest

Bases: BaseModel

configuration: CommonConfiguration
metadata: MetaData | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

type: Literal['SFTP']
class SFTPPipelineGetResponse

Bases: BasePipelineResponse

configuration: CommonConfiguration
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

type: Literal['SFTP']
class SearchPipelineData

Bases: BaseModel

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

pipelineId: str
class SearchPipelineRequest

Bases: BaseModel

dataRepositoryMetadata: List[DataRepositoryMetadataItem]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class SearchPipelinesResponse

Bases: BaseModel

count: int | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

resources: List[SearchPipelineData]
class SharePointConfig

Bases: BaseModel

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

site: SharePointSite
class SharePointSite

Bases: BaseModel

includePaths: List[str] | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: str
class TextOnlyBaseChunk

Bases: BaseModel

content: str
metadata: List[VectorKeyValueListPair] | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class TextSearchRequest

Bases: BaseModel

filters: List[VectorSearchFilter]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

query: str
class VectorAPIClient

Bases: object

The Vector API provides management and search capabilities for vector-based document collections.

It enables creating, retrieving, updating, and deleting collections, as well as managing documents and performing semantic vector searches within those collections.

Reference: https://api.sap.com/api/DOCUMENT_GROUNDING_API/resource/Vector

__init__(proxy_client=None)

Initializes the VectorAPIClient

Parameters:

proxy_client (Optional[GenAIHubProxyClient], optional) -- Optional proxy client to use for requests

create_collection(collection_request)

Create a new collection.

Parameters:

collection_request (CollectionCreateRequest) -- The object containing the collection configuration.

Returns:

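An empty requests.Response with status code 202 (Accepted). Creation completes asynchronously; use get_collection_creation_status to follow its progress.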

Return type:

requests.Response

create_documents(collection_id, request)

Create documents in a collection.

Parameters:
  • collection_id (str) -- The ID of the collection to add documents to.

  • request (DocumentsCreateRequest) -- The object containing the documents to create.

Returns:

A DocumentsListResponse object containing the created documents

Return type:

DocumentsListResponse

delete_collection(collection_id)

Delete collection by ID.

Parameters:

collection_id (str) -- The ID of the collection to delete.

Returns:

An empty requests.Response with status code 204 (No Content)

Return type:

requests.Response

delete_document(collection_id, document_id)

Delete a document from a collection.

Parameters:
  • collection_id (str) -- The ID of the collection to delete the document from.

  • document_id (str) -- The ID of the document to delete.

Returns:

An empty requests.Response with status code 204 (No Content)

Return type:

requests.Response

get_collection_by_id(collection_id)

Get collection details by ID.

Parameters:

collection_id (str) -- The ID of the collection to retrieve.

Returns:

A Collection object containing the collection details

Return type:

Collection

get_collection_creation_status(collection_id)

Get creation status for a collection.

Parameters:

collection_id (str) -- The ID of the collection to retrieve the creation status for.

Returns:

A CollectionCreationStatusResponse object containing the creation status

Return type:

CollectionCreationStatusResponse

get_collection_deletion_status(collection_id)

Get deletion status for a collection.

Parameters:

collection_id (str) -- The ID of the collection to retrieve the deletion status for.

Returns:

A CollectionDeletionStatusResponse object containing the deletion status

Return type:

CollectionDeletionStatusResponse
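create_collection returns 202, and the collection status models include a PENDING state. Together these suggest that collection creation, and possibly deletion, completes asynchronously. The sketch below is a minimal polling loop under that assumption. It calls a status getter until the returned object's .status is no longer "PENDING". This reference does not say how to obtain the new collection's ID from the 202 response (CollectionPendingResponse has a Location field that may carry it), so the helper takes the ID as an argument. The helper name and defaults are hypothetical.

```python
import time
from typing import Callable


def wait_while_pending(
    get_status: Callable[[str], object],
    collection_id: str,
    poll_seconds: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the collection leaves PENDING; return the final status string.

    Assumes the returned object has a .status attribute, as the
    CollectionPendingResponse, CollectionCreatedResponse, and
    CollectionDeletedResponse models do.
    """
    for _ in range(max_polls):
        status = get_status(collection_id).status
        if status != "PENDING":
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"collection {collection_id} still PENDING after {max_polls} polls")
```

A call such as wait_while_pending(client.get_collection_creation_status, collection_id) would then presumably return "CREATED".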

get_collections(top=None, skip=None, count=None)

Get all collections.

Parameters:
  • top (Optional[int], optional) -- the number of collections to retrieve, defaults to None

  • skip (Optional[int], optional) -- the number of collections to skip, defaults to None

  • count (Optional[bool], optional) -- whether to include the total count of collections, defaults to None

Returns:

A CollectionsListResponse object containing the list of collections

Return type:

CollectionsListResponse

get_document_by_id(collection_id, document_id)

Get a document by ID from a collection.

Parameters:
  • collection_id (str) -- The ID of the collection to retrieve the document from.

  • document_id (str) -- The ID of the document to retrieve.

Returns:

A Document object containing the document details

Return type:

Document

get_documents(collection_id, top=None, skip=None, count=None)

Get documents from a collection.

Parameters:
  • collection_id (str) -- The ID of the collection to retrieve documents from.

  • top (Optional[int], optional) -- the number of documents to retrieve, defaults to None

  • skip (Optional[int], optional) -- the number of documents to skip, defaults to None

  • count (Optional[bool], optional) -- whether to include the total count of documents, defaults to None

Returns:

A DocumentsResponse object containing the list of documents

Return type:

DocumentsResponse

search(request)

Perform semantic search in vector collections.

Parameters:

request (TextSearchRequest) -- The object containing the search parameters.

Returns:

A VectorSearchResults object containing the search results

Return type:

VectorSearchResults

update_documents(collection_id, request)

Update documents in a collection.

Parameters:
  • collection_id (str) -- The ID of the collection to update documents in.

  • request (DocumentsUpdateRequest) -- The object containing the documents to update.

Returns:

A DocumentsListResponse object containing the updated documents

Return type:

DocumentsListResponse

class VectorChunk

Bases: BaseModel

content: str
id: str
metadata: List[VectorKeyValueListPair] | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

VectorDocument

alias of Document

class VectorKeyValueListPair

Bases: BaseModel

key: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

value: List[str]
class VectorPerFilterSearchResult

Bases: BaseModel

filterId: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

results: List[DocumentsChunk]
class VectorSearchConfiguration

Bases: BaseModel

maxChunkCount: int | None
maxDocumentCount: int | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class VectorSearchDocumentKeyValueListPair

Bases: BaseModel

key: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

selectMode: List[str] | None
value: List[str]
class VectorSearchFilter

Bases: BaseModel

chunkMetadata: List[VectorKeyValueListPair] | None
collectionIds: List[str]
collectionMetadata: List[VectorKeyValueListPair] | None
configuration: VectorSearchConfiguration
documentMetadata: List[VectorSearchDocumentKeyValueListPair] | None
id: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
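A TextSearchRequest for VectorAPIClient.search combines a query with VectorSearchFilter entries. Unlike the retrieval filter, this filter lists collectionIds and a configuration field whose type is not optional, although the configuration's own fields are optional. The sketch below builds a single-filter payload as a dict. The helper name, filter ID, and collection ID are hypothetical.

```python
from typing import List, Optional


def build_text_search_request(
    query: str,
    collection_ids: List[str],
    max_chunks: Optional[int] = None,
    filter_id: str = "filter-1",
) -> dict:
    """Dict shaped like TextSearchRequest with a single VectorSearchFilter."""
    # VectorSearchConfiguration fields (maxChunkCount, maxDocumentCount) are optional.
    configuration = {} if max_chunks is None else {"maxChunkCount": max_chunks}
    return {
        "query": query,
        "filters": [
            {"id": filter_id, "collectionIds": collection_ids, "configuration": configuration}
        ],
    }


request = build_text_search_request("vacation policy", ["collection-abc"], max_chunks=3)
```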

class VectorSearchResults

Bases: BaseModel

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

results: List[VectorPerFilterSearchResult]
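VectorSearchResults nests its hits several levels deep: per-filter result, then DocumentsChunk (id, title, documents), then DocumentOutput, then VectorChunk. The sketch below flattens a response-shaped dict into tuples. Reading the DocumentsChunk id as the matched collection is an assumption based on its fields. The helper name and sample data are hypothetical.

```python
from typing import List, Tuple


def vector_hits(results: dict) -> List[Tuple[str, str, str, str]]:
    """Return (filterId, DocumentsChunk id, document id, chunk content) tuples."""
    hits: List[Tuple[str, str, str, str]] = []
    for per_filter in results["results"]:  # VectorPerFilterSearchResult
        for group in per_filter["results"]:  # DocumentsChunk, presumably one collection
            for document in group["documents"]:  # DocumentOutput
                for chunk in document["chunks"]:  # VectorChunk
                    hits.append((per_filter["filterId"], group["id"], document["id"], chunk["content"]))
    return hits


sample = {
    "results": [
        {"filterId": "f1", "results": [
            {"id": "col-1", "title": "Policies", "documents": [
                {"id": "doc-1", "chunks": [{"id": "ch-1", "content": "Employees get 30 days."}]}
            ]}
        ]}
    ]
}
hits = vector_hits(sample)
```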

Subpackages

Submodules

gen_ai_hub.document_grounding.client module

Client module for Document Grounding API.

This module provides convenient imports for all Document Grounding API clients and their associated constants. It serves as the main entry point for accessing Pipeline, Retrieval, and Vector API functionality.

Exported clients:
  • PipelineAPIClient: Client for managing document vectorization pipelines

  • RetrievalAPIClient: Client for retrieval operations across data repositories

  • VectorAPIClient: Client for vector collection management and semantic search

Exported constants:
  • PATH_DOCUMENT_GROUNDING: Base path for document grounding endpoints

  • PATH_DOCUMENT_GROUNDING_PIPELINES: Path for pipeline endpoints

  • PATH_DOCUMENT_GROUNDING_RETRIEVAL: Path for retrieval endpoints

  • PATH_DOCUMENT_GROUNDING_VECTOR: Path for vector endpoints

class PipelineAPIClient

Bases: object

The Pipelines API creates and manages vector stores based on documents from user data repositories: S3, SFTP, and Microsoft SharePoint. Each pipeline represents a configured end-to-end process including the following steps:

  • Fetches documents from a supported data source

  • Preprocesses and chunks the document content, and generates semantic embeddings. Semantic embeddings are multidimensional representations of textual information.

  • Stores semantic embeddings into the HANA Vector Store

The Pipeline API is compatible with the following data repositories:

  • Microsoft SharePoint

  • AWS S3

  • SFTP

See https://api.sap.com/api/DOCUMENT_GROUNDING_API/resource/Pipelines

__init__(proxy_client=None)

Initializes the PipelineAPIClient

Parameters:

proxy_client (Optional[GenAIHubProxyClient], optional) -- proxy client to use for requests, defaults to None

create_pipeline(pipeline_request)

Create a document vectorization pipeline

Parameters:

pipeline_request (CreatePipelineRequest) -- The object containing the pipeline configuration.

Returns:

ID of the created pipeline

Return type:

PipelineIdResponse

delete_pipeline_by_id(pipeline_id)

Delete a pipeline by pipeline id

Parameters:

pipeline_id (str) -- ID of the pipeline to delete

Returns:

Response of the delete operation

Return type:

requests.Response

get_execution_document_by_id(pipeline_id, execution_id, document_id)

Get Document by ID for a Pipeline Execution

Parameters:
  • pipeline_id (str) -- Pipeline ID

  • execution_id (str) -- Execution ID

  • document_id (str) -- Document ID

Returns:

Document for the Pipeline Execution

Return type:

Document

get_execution_documents(pipeline_id, execution_id, top=None, skip=None, count=None)

Get Documents for a Pipeline Execution

Parameters:
  • pipeline_id (str) -- Pipeline ID

  • execution_id (str) -- Execution ID

  • top (Optional[int], optional) -- the maximum number of documents to return, defaults to None

  • skip (Optional[int], optional) -- number of documents to skip, defaults to None

  • count (Optional[bool], optional) -- flag to include count of total documents, defaults to None

Returns:

Documents for the Pipeline Execution

Return type:

DocumentsStatusResponse

get_pipeline_by_id(pipeline_id)

Get the details of a pipeline by its ID.

Parameters:

pipeline_id (str) -- Pipeline ID

Returns:

Details of the pipeline

Return type:

BasePipelineResponse

get_pipeline_document_by_id(pipeline_id, document_id)

Get Document by ID for a Pipeline

Parameters:
  • pipeline_id (str) -- Pipeline ID

  • document_id (str) -- Document ID

Returns:

Document for the Pipeline

Return type:

Document

get_pipeline_documents(pipeline_id, top=None, skip=None, count=None)

Get Documents for a Pipeline

Parameters:
  • pipeline_id (str) -- Pipeline ID

  • top (Optional[int], optional) -- the maximum number of documents to return, defaults to None

  • skip (Optional[int], optional) -- number of documents to skip, defaults to None

  • count (Optional[bool], optional) -- flag to include count of total documents, defaults to None

Returns:

Documents for the Pipeline

Return type:

DocumentsStatusResponse
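
The list methods in this reference page through results with `top` and `skip`. These include get_pipelines, get_pipeline_documents, get_execution_documents and get_pipeline_executions, as well as get_collections and get_documents on the VectorAPIClient. Below is a minimal sketch of a generic pager. It assumes the service returns fewer than `top` items only on the last page. `iter_pages` is hypothetical. The attribute that holds the items on each response model is not documented here, so you pass an `items_of` function that extracts the list.

```python
from typing import Any, Callable, Iterator, List, Optional


def iter_pages(
    fetch: Callable[..., Any],
    items_of: Callable[[Any], List[Any]],
    page_size: int = 50,
    max_items: Optional[int] = None,
) -> Iterator[Any]:
    """Hypothetical helper: yield items from a top/skip paginated list method.

    `fetch(top=..., skip=...)` must call the list method; `items_of(page)`
    must return the items contained in one response.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    skip = 0
    yielded = 0
    while max_items is None or yielded < max_items:
        page_items = list(items_of(fetch(top=page_size, skip=skip)))
        for item in page_items:
            if max_items is not None and yielded >= max_items:
                return
            yield item
            yielded += 1
        # A short (or empty) page means there is nothing left to fetch.
        if len(page_items) < page_size:
            return
        skip += page_size
```

For example, `iter_pages(lambda top, skip: client.get_pipeline_documents(pipeline_id, top=top, skip=skip), items_of=lambda page: page.resources)` would walk every document of a pipeline. Here `resources` is an assumed attribute name that should be checked against the DocumentsStatusResponse model.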

get_pipeline_execution_by_id(pipeline_id, execution_id)

Get Pipeline Execution by ID

Parameters:
  • pipeline_id (str) -- Pipeline ID

  • execution_id (str) -- Execution ID

Returns:

Pipeline Execution

Return type:

PipelineExecution

get_pipeline_executions(pipeline_id, last_execution=None, top=None, skip=None, count=None)

Get Pipeline Executions

Parameters:
  • pipeline_id (str) -- Pipeline ID

  • last_execution (Optional[bool], optional) -- flag to get only the last execution, defaults to None

  • top (Optional[int], optional) -- number of executions to retrieve, defaults to None

  • skip (Optional[int], optional) -- number of executions to skip, defaults to None

  • count (Optional[bool], optional) -- flag to include count of total executions, defaults to None

Returns:

Pipeline Executions

Return type:

GetPipelineExecutionsResponse

get_pipeline_status(pipeline_id)

Get the status of a pipeline by its ID.

Parameters:

pipeline_id (str) -- Pipeline ID

Returns:

Status of the pipeline

Return type:

GetPipelineStatusResponse
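
A pipeline runs asynchronously, so callers usually poll get_pipeline_status until the run ends. The following is a minimal polling sketch. It assumes the status is exposed as a `status` attribute and that FINISHED and FAILED are the terminal states. Neither assumption is confirmed by this reference, so both can be overridden. `wait_until_done` is hypothetical. Because it takes a zero-argument `fetch`, it can also poll get_collection_creation_status or get_collection_deletion_status once you pass suitable `done_states`.

```python
import time
from typing import Any, Callable, Collection, Optional


def wait_until_done(
    fetch: Callable[[], Any],
    status_of: Callable[[Any], Optional[str]] = lambda r: getattr(r, "status", None),
    done_states: Collection[str] = ("FINISHED", "FAILED"),
    interval_s: float = 10.0,
    timeout_s: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Hypothetical helper: call `fetch()` until its status is in `done_states`.

    ASSUMPTIONS: the status lives in a `status` attribute and the terminal
    state names are FINISHED/FAILED; override `status_of`/`done_states` if not.
    """
    deadline = clock() + timeout_s
    while True:
        status = status_of(fetch())
        if status in done_states:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s} s")
        sleep(interval_s)
```

For example, `wait_until_done(lambda: pipeline_client.get_pipeline_status(pipeline_id))` blocks until the assumed terminal state is reached or 30 minutes pass. Injecting `sleep` and `clock` keeps the helper testable without real waiting.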

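get_pipelines(top=None, skip=None, count=None)

Get all pipelines.

Parameters:
  • top (Optional[int], optional) -- the maximum number of pipelines to return, defaults to None

  • skip (Optional[int], optional) -- number of pipelines to skip, defaults to None

  • count (Optional[bool], optional) -- flag to include count of total pipelines, defaults to None

Returns:

The list of pipelines

Return type:

GetPipelinesResponse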

search_pipelines(body)

Pipeline Search by Metadata

Parameters:

body (SearchPipelineRequest) -- The search request object containing metadata filters.

Returns:

Search results containing matching pipelines.

Return type:

SearchPipelinesResponse

trigger_pipeline(request)

Trigger Pipeline Manually

Parameters:

request (ManualPipelineTrigger) -- The manual trigger request object.

Returns:

Response of the trigger operation

Return type:

requests.Response

class RetrievalAPIClient

Bases: object

The Retrieval API enables querying and retrieving relevant content from configured data repositories, such as vector stores or external document sources (e.g., help.sap.com).

Retrieval combines semantic search with repository metadata filtering and supports custom retrieval configurations for chunk/document granularity.

Reference: https://api.sap.com/api/DOCUMENT_GROUNDING_API/resource/Retrieval

__init__(proxy_client=None)

Initialize the RetrievalAPIClient.

Parameters:

proxy_client (Optional[GenAIHubProxyClient], optional) -- Optional proxy client for making API requests, defaults to None

get_data_repositories(top=None, skip=None, count=None)

List all data repositories available to the tenant.

Parameters:
  • top (Optional[int], optional) -- the number of items to return, defaults to None

  • skip (Optional[int], optional) -- the number of items to skip, defaults to None

  • count (Optional[bool], optional) -- whether to include a count of total items, defaults to None

Returns:

DataRepositories model containing the list of data repositories

Return type:

DataRepositories

get_data_repository_by_id(repository_id)

Get a single data repository by its unique ID.

Parameters:

repository_id (str) -- the unique identifier of the data repository

Returns:

DataRepository model representing the data repository

Return type:

DataRepository

search(search_input)

Perform a retrieval search for relevant content.

Parameters:

search_input (RetrievalSearchInput) -- RetrievalSearchInput model defining the query and filters.

Returns:

RetrievalSearchResults model containing repositories, documents, and chunks.

Return type:

RetrievalSearchResults
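
RetrievalSearchResults nests chunks inside documents inside repositories. Callers often want a flat list of chunk texts instead. The following is a minimal sketch, assuming the result has been converted to a plain dict with pydantic's `model_dump()` and that the nesting is results → results → dataRepository → documents → chunks. That key layout is an assumption and is not documented in this reference. `iter_retrieved_chunks` is hypothetical.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_retrieved_chunks(results: Dict[str, Any]) -> Iterator[Tuple[str, str, str]]:
    """Hypothetical helper: yield (repository_id, document_id, chunk_content).

    ASSUMPTION: `results` is RetrievalSearchResults.model_dump(), nested as
    results[] -> results[] -> dataRepository -> documents[] -> chunks[].
    Missing or None levels are treated as empty.
    """
    for per_filter in results.get("results") or []:
        for entry in per_filter.get("results") or []:
            repo = entry.get("dataRepository") or {}
            for doc in repo.get("documents") or []:
                for chunk in doc.get("chunks") or []:
                    yield repo.get("id", ""), doc.get("id", ""), chunk.get("content", "")
```

For example, `for repo_id, doc_id, text in iter_retrieved_chunks(retrieval_client.search(search_input).model_dump()): ...` walks every retrieved chunk, provided the assumed key names match the actual model.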

class VectorAPIClient

Bases: object

The Vector API provides management and search capabilities for vector-based document collections.

It enables creating, retrieving, updating, and deleting collections, as well as managing documents and performing semantic vector searches within those collections.

Reference: https://api.sap.com/api/DOCUMENT_GROUNDING_API/resource/Vector

__init__(proxy_client=None)

Initializes the VectorAPIClient

Parameters:

proxy_client (Optional[GenAIHubProxyClient], optional) -- Optional proxy client to use for requests, defaults to None

create_collection(collection_request)

Create a new collection.

Parameters:

collection_request (CollectionCreateRequest) -- The object containing the collection configuration.

Returns:

An empty requests.Response with status code 202

Return type:

requests.Response

create_documents(collection_id, request)

Create documents in a collection.

Parameters:
  • collection_id (str) -- The ID of the collection to add documents to.

  • request (DocumentsCreateRequest) -- The object containing the documents to create.

Returns:

A DocumentsListResponse object containing the created documents

Return type:

DocumentsListResponse
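
Unlike the Pipelines API, create_documents takes documents that are already chunked. The BaseDocument model carries `chunks` and `metadata` lists. The following sketch uses simple fixed-size, overlapping character windows to build one document entry. This chunking strategy is a generic illustration, not the one the service uses. `chunk_text` and `build_document` are hypothetical. The inner key names (`key`/`value` on metadata pairs, `content`/`metadata` on chunks) are assumptions about VectorKeyValueListPair and TextOnlyBaseChunk.

```python
from typing import Any, Dict, List


def chunk_text(text: str, size: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into character windows of `size` that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # this window already reaches the end
            break
    return chunks


def build_document(text: str, source: str, size: int = 1000, overlap: int = 100) -> Dict[str, Any]:
    """Hypothetical helper: one document entry shaped like BaseDocument.

    ASSUMPTION: metadata pairs are {"key": str, "value": [str, ...]} and
    each chunk is {"content": str, "metadata": [...]}.
    """
    return {
        "metadata": [{"key": "source", "value": [source]}],
        "chunks": [
            {"content": chunk, "metadata": [{"key": "index", "value": [str(i)]}]}
            for i, chunk in enumerate(chunk_text(text, size, overlap))
        ],
    }
```

With pydantic v2, something like `DocumentsCreateRequest.model_validate({"documents": [build_document(text, "manual.txt")]})` could then be passed to create_documents. The `documents` field name is also an assumption to check against the model.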

delete_collection(collection_id)

Delete collection by ID.

Parameters:

collection_id (str) -- The ID of the collection to delete.

Returns:

An empty requests.Response with status code 204

Return type:

requests.Response

delete_document(collection_id, document_id)

Delete a document from a collection.

Parameters:
  • collection_id (str) -- The ID of the collection to delete the document from.

  • document_id (str) -- The ID of the document to delete.

Returns:

An empty requests.Response with status code 204

Return type:

requests.Response

get_collection_by_id(collection_id)

Get collection details by ID.

Parameters:

collection_id (str) -- The ID of the collection to retrieve.

Returns:

A Collection object containing the collection details

Return type:

Collection

get_collection_creation_status(collection_id)

Get creation status for a collection.

Parameters:

collection_id (str) -- The ID of the collection to retrieve the creation status for.

Returns:

A CollectionCreationStatusResponse object containing the creation status

Return type:

CollectionCreationStatusResponse

get_collection_deletion_status(collection_id)

Get deletion status for a collection.

Parameters:

collection_id (str) -- The ID of the collection to retrieve the deletion status for.

Returns:

A CollectionDeletionStatusResponse object containing the deletion status

Return type:

CollectionDeletionStatusResponse

get_collections(top=None, skip=None, count=None)

Get all collections.

Parameters:
  • top (Optional[int], optional) -- the number of collections to retrieve, defaults to None

  • skip (Optional[int], optional) -- the number of collections to skip, defaults to None

  • count (Optional[bool], optional) -- whether to include the total count of collections, defaults to None

Returns:

A CollectionsListResponse object containing the list of collections

Return type:

CollectionsListResponse

get_document_by_id(collection_id, document_id)

Get a document by ID from a collection.

Parameters:
  • collection_id (str) -- The ID of the collection to retrieve the document from.

  • document_id (str) -- The ID of the document to retrieve.

Returns:

A Document object containing the document details

Return type:

Document

get_documents(collection_id, top=None, skip=None, count=None)

Get documents from a collection.

Parameters:
  • collection_id (str) -- The ID of the collection to retrieve documents from.

  • top (Optional[int], optional) -- the number of documents to retrieve, defaults to None

  • skip (Optional[int], optional) -- the number of documents to skip, defaults to None

  • count (Optional[bool], optional) -- whether to include the total count of documents, defaults to None

Returns:

A DocumentsResponse object containing the list of documents

Return type:

DocumentsResponse

search(request)

Perform semantic search in vector collections.

Parameters:

request (TextSearchRequest) -- The object containing the search parameters.

Returns:

A VectorSearchResults object containing the search results

Return type:

VectorSearchResults
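
In a grounding workflow, the chunk texts returned by a semantic search are usually concatenated into the context of a prompt. The following is a minimal sketch of that step, assuming you have already extracted the chunk texts from VectorSearchResults in ranking order. The nesting of that model is not documented here. `build_context` is a hypothetical helper that enforces a character budget and drops duplicate chunks.

```python
from typing import Iterable, List


def build_context(chunks: Iterable[str], max_chars: int = 4000, sep: str = "\n\n") -> str:
    """Hypothetical helper: join chunk texts in order until a character budget is hit.

    Blank and duplicate chunks are skipped. Joining stops at the first chunk
    that would overflow the budget, so lower-ranked chunks are dropped first
    and no chunk is ever cut in half.
    """
    picked: List[str] = []
    seen = set()
    used = 0
    for text in chunks:
        text = text.strip()
        if not text or text in seen:
            continue
        extra = len(text) + (len(sep) if picked else 0)
        if used + extra > max_chars:
            break
        picked.append(text)
        seen.add(text)
        used += extra
    return sep.join(picked)
```

Stopping at the first overflowing chunk keeps the ranking order intact. Skipping it and trying smaller later chunks would pack the budget more tightly, at the cost of favouring lower-ranked results.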

update_documents(collection_id, request)

Update documents in a collection.

Parameters:
  • collection_id (str) -- The ID of the collection to update documents in.

  • request (DocumentsUpdateRequest) -- The object containing the documents to update.

Returns:

A DocumentsListResponse object containing the updated documents

Return type:

DocumentsListResponse