gen_ai_hub.document_grounding package
Document Grounding package for SAP Generative AI Hub.
This package provides APIs for document grounding capabilities, including:
- Pipeline management for document vectorization from various data sources
- Vector store operations for semantic search
- Retrieval operations for querying document repositories
The package includes three main API clients:
- PipelineAPIClient: Manages document vectorization pipelines
- VectorAPIClient: Manages vector collections and semantic search
- RetrievalAPIClient: Performs retrieval operations across data repositories
- class BaseDocument
Bases:
BaseModel
- chunks: List[TextOnlyBaseChunk]
- metadata: List[VectorKeyValueListPair]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class BasePipelineResponse
Bases:
BaseModel
- id: str
- metadata: MetaData | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- type: str
- class Collection
Bases:
BaseModel
- embeddingConfig: EmbeddingConfig
- id: str
- metadata: List[VectorKeyValueListPair] | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- title: str | None
- class CollectionCreateRequest
Bases:
BaseModel
- embeddingConfig: EmbeddingConfig
- metadata: List[VectorKeyValueListPair] | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- title: str | None
- class CollectionCreatedResponse
Bases:
BaseModel
- collectionURL: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- status: Literal['CREATED']
- class CollectionDeletedResponse
Bases:
BaseModel
- collectionURL: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- status: Literal['DELETED']
- class CollectionPendingResponse
Bases:
BaseModel
- Location: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- status: Literal['PENDING']
- class CollectionsListResponse
Bases:
BaseModel
- count: int | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- resources: List[Collection]
- class CommonConfiguration
Bases:
BaseModel
- destination: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class DataRepositories
Bases:
BaseModel
- count: int | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- resources: List[DataRepository]
- class DataRepository
Bases:
BaseModel
- id: str
- metadata: List[RetrievalKeyValueListPair] | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- title: str
- type: Literal['vector', 'help.sap.com'] | str
- class DataRepositoryMetadataItem
Bases:
BaseModel
- key: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- value: List[str]
- class DataRepositoryWithDocuments
Bases:
BaseModel
- documents: List[RetrievalDocument]
- id: str
- metadata: List[RetrievalKeyValueListPair] | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- title: str
- class Document
Bases:
BaseModel
- absoluteUrl: str | None
- createdTimestamp: datetime | None
- downloadLocation: str | None
- id: str
- lastUpdatedTimestamp: datetime | None
- metadataId: str | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- status: DocumentStatus | None
- title: str | None
- viewLocation: str | None
- class DocumentOutput
Bases:
BaseModel
- chunks: List[VectorChunk]
- id: str
- metadata: List[VectorKeyValueListPair] | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class DocumentStatus
Bases:
str, Enum
- __new__(value)
- DEINDEXED = 'DEINDEXED'
- FAILED = 'FAILED'
- FAILED_TO_BE_RETRIED = 'FAILED_TO_BE_RETRIED'
- INDEXED = 'INDEXED'
- REINDEXED = 'REINDEXED'
- TO_BE_PROCESSED = 'TO_BE_PROCESSED'
- TO_BE_SCHEDULED = 'TO_BE_SCHEDULED'
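Because DocumentStatus derives from both str and Enum, its members compare equal to their raw string values and can be rebuilt from the strings found in API payloads. The sketch below uses a local stand-in enum that copies the members listed above (it is not imported from the package). The SETTLED grouping and the is_settled helper are assumptions made for illustration, not documented behavior.

```python
from enum import Enum


class DocumentStatus(str, Enum):
    """Local stand-in mirroring the members listed above (not the package class)."""

    DEINDEXED = "DEINDEXED"
    FAILED = "FAILED"
    FAILED_TO_BE_RETRIED = "FAILED_TO_BE_RETRIED"
    INDEXED = "INDEXED"
    REINDEXED = "REINDEXED"
    TO_BE_PROCESSED = "TO_BE_PROCESSED"
    TO_BE_SCHEDULED = "TO_BE_SCHEDULED"


# Assumption for illustration: statuses after which no further work seems queued.
SETTLED = {
    DocumentStatus.INDEXED,
    DocumentStatus.REINDEXED,
    DocumentStatus.DEINDEXED,
    DocumentStatus.FAILED,
}


def is_settled(raw_status: str) -> bool:
    """Parse a raw status string and report whether it is in the SETTLED group."""
    return DocumentStatus(raw_status) in SETTLED
```

An unknown string passed to DocumentStatus(...) raises ValueError, which is the standard Enum lookup behavior.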
- class DocumentWithoutChunks
Bases:
BaseModel
- id: str
- metadata: List[VectorKeyValueListPair]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class DocumentsChunk
Bases:
BaseModel
- documents: List[DocumentOutput]
- id: str
- metadata: List[VectorKeyValueListPair] | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- title: str
- class DocumentsCreateRequest
Bases:
BaseModel
- documents: List[BaseDocument]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class DocumentsListResponse
Bases:
BaseModel
- documents: List[DocumentWithoutChunks]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class DocumentsResponse
Bases:
BaseModel
- count: int | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- resources: List[DocumentWithoutChunks]
- class DocumentsStatusResponse
Bases:
BaseModel
- count: int | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- resources: List[Document]
- class DocumentsUpdateRequest
Bases:
BaseModel
- documents: List[Document]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class EmbeddingConfig
Bases:
BaseModel
- modelName: str | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class GetPipelineExecutionsResponse
Bases:
BaseModel
- count: int | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- resources: List[PipelineExecution]
- class GetPipelineStatusResponse
Bases:
BaseModel
- lastStarted: str | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- status: str | None
- class GetPipelinesResponse
Bases:
BaseModel
- count: int | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- resources: List[Annotated[MSSharePointPipelineGetResponse | S3PipelineGetResponse | SFTPPipelineGetResponse, FieldInfo(annotation=NoneType, required=True, discriminator='type')]]
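The discriminator='type' annotation means each entry in resources is resolved to a concrete response class according to its type field. The following is a minimal, library-free sketch of that dispatch idea using plain dataclasses. The stand-in classes and the parse_pipeline helper are hypothetical, and only the 'S3' and 'SFTP' literals are taken from the class listings in this reference.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class S3Pipeline:  # hypothetical stand-in for S3PipelineGetResponse
    id: str
    destination: str


@dataclass
class SFTPPipeline:  # hypothetical stand-in for SFTPPipelineGetResponse
    id: str
    destination: str


# Map each discriminator value to the class that should represent it.
_BY_TYPE = {"S3": S3Pipeline, "SFTP": SFTPPipeline}


def parse_pipeline(payload: Dict[str, Any]):
    """Pick the concrete class from payload['type'], as a discriminated union does."""
    pipeline_type = payload.get("type")
    if pipeline_type not in _BY_TYPE:
        raise ValueError(f"unsupported pipeline type: {pipeline_type!r}")
    cls = _BY_TYPE[pipeline_type]
    return cls(id=payload["id"], destination=payload["configuration"]["destination"])
```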
Bases:
BaseModel
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
Bases:
BaseModel
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
Bases:
BaseModel
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
Bases:
BasePipelineResponse
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class ManualPipelineTrigger
Bases:
BaseModel
- metadataOnly: bool | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- pipelineId: str
- class MetaData
Bases:
BaseModel
- destination: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class PipelineAPIClient
Bases:
object
The Pipelines API creates and manages vector stores based on documents from user data repositories: S3, SFTP, and Microsoft SharePoint. Each pipeline represents a configured end-to-end process including the following steps:
Fetches documents from a supported data source
Preprocesses and chunks the document content, and generates semantic embeddings. Semantic embeddings are multidimensional representations of textual information.
Stores semantic embeddings into the HANA Vector Store
The Pipeline API is compatible with the following data repositories:
Microsoft SharePoint
AWS S3
SFTP
See https://api.sap.com/api/DOCUMENT_GROUNDING_API/resource/Pipelines
- __init__(proxy_client=None)
Initializes the PipelineAPIClient
- Parameters:
proxy_client (Optional[GenAIHubProxyClient], optional) -- proxy client to use for requests, defaults to None
- create_pipeline(pipeline_request)
Create a document vectorization pipeline
- Parameters:
pipeline_request (CreatePipelineRequest) -- The object containing the pipeline configuration.
- Returns:
ID of the created pipeline
- Return type:
- delete_pipeline_by_id(pipeline_id)
Delete a pipeline by pipeline id
- Parameters:
pipeline_id (str) -- ID of the pipeline to delete
- Returns:
Response of the delete operation
- Return type:
requests.Response
- get_execution_document_by_id(pipeline_id, execution_id, document_id)
Get Document by ID for a Pipeline Execution
- Returns:
Document for the Pipeline Execution
- Return type:
- Parameters:
pipeline_id (str)
execution_id (str)
document_id (str)
- get_execution_documents(pipeline_id, execution_id, top=None, skip=None, count=None)
Get Documents for a Pipeline Execution
- Parameters:
pipeline_id (str) -- Pipeline ID
execution_id (str) -- Execution ID
top (Optional[int], optional) -- the maximum number of documents to return, defaults to None
skip (Optional[int], optional) -- number of documents to skip, defaults to None
count (Optional[bool], optional) -- flag to include count of total documents, defaults to None
- Returns:
Documents for the Pipeline Execution
- Return type:
- get_pipeline_by_id(pipeline_id)
Get details of a pipeline by pipeline id.
- Parameters:
pipeline_id (str) -- Pipeline ID
- Returns:
Details of the pipeline
- Return type:
- get_pipeline_document_by_id(pipeline_id, document_id)
Get Document by ID for a Pipeline
- Parameters:
pipeline_id (str) -- Pipeline ID
document_id (str) -- Document ID
- Returns:
Document for the Pipeline
- Return type:
- get_pipeline_documents(pipeline_id, top=None, skip=None, count=None)
Get Documents for a Pipeline
- Parameters:
pipeline_id (str) -- Pipeline ID
top (Optional[int], optional) -- the maximum number of documents to return, defaults to None
skip (Optional[int], optional) -- number of documents to skip, defaults to None
count (Optional[bool], optional) -- flag to include count of total documents, defaults to None
- Returns:
Documents for the Pipeline
- Return type:
- get_pipeline_execution_by_id(pipeline_id, execution_id)
Get Pipeline Execution by ID
- Parameters:
pipeline_id (str) -- Pipeline ID
execution_id (str) -- Execution ID
- Returns:
Pipeline Execution
- Return type:
- get_pipeline_executions(pipeline_id, last_execution=None, top=None, skip=None, count=None)
Get Pipeline Executions
- Parameters:
pipeline_id (str) -- Pipeline ID
last_execution (Optional[bool], optional) -- flag to get only the last execution, defaults to None
top (Optional[int], optional) -- number of executions to retrieve, defaults to None
skip (Optional[int], optional) -- number of executions to skip, defaults to None
count (Optional[bool], optional) -- flag to include count of total executions, defaults to None
- Returns:
Pipeline Executions
- Return type:
- get_pipeline_status(pipeline_id)
Get pipeline status by pipeline id
- Parameters:
pipeline_id (str) -- Pipeline ID
- Returns:
Status of the pipeline
- Return type:
- get_pipelines(top=None, skip=None, count=None)
Get all pipelines.
- Returns:
All pipelines
- Return type:
- Parameters:
top (int | None)
skip (int | None)
count (bool | None)
- search_pipelines(body)
Pipeline Search by Metadata
- Parameters:
body (SearchPipelineRequest) -- The search request object containing metadata filters.
- Returns:
Search results containing matching pipelines.
- Return type:
- trigger_pipeline(request)
Trigger Pipeline Manually
- Parameters:
request (ManualPipelineTrigger) -- The manual trigger request object.
- Returns:
Response of the trigger operation
- Return type:
requests.Response
- class PipelineExecution
Bases:
BaseModel
- createdAt: datetime | None
- id: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- modifiedAt: datetime | None
- status: PipelineExecutionStatus | None
- class PipelineExecutionStatus
Bases:
str, Enum
- __new__(value)
- FINISHED = 'FINISHED'
- FINISHED_WITH_ERRORS = 'FINISHEDWITHERRORS'
- INPROGRESS = 'INPROGRESS'
- NEW = 'NEW'
- TIMEOUT = 'TIMEOUT'
- UNKNOWN = 'UNKNOWN'
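A common use of these statuses is waiting for a pipeline run to settle. Below is a hedged sketch built on get_pipeline_executions(pipeline_id, last_execution=True). It assumes the call returns an object with a resources list whose items expose status, as GetPipelineExecutionsResponse and PipelineExecution suggest. Treating FINISHED, FINISHEDWITHERRORS, and TIMEOUT as final is an assumption rather than documented behavior, and the wait_for_last_execution helper and its parameters are hypothetical.

```python
import time
from typing import Callable, Optional

# Assumption: these raw status values mean the execution will not change any more.
FINAL_STATUSES = {"FINISHED", "FINISHEDWITHERRORS", "TIMEOUT"}


def wait_for_last_execution(
    client,
    pipeline_id: str,
    poll_seconds: float = 10.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll the latest execution until its status is final and return that status.

    Returns None if no final status was seen within max_polls attempts.
    `client` is expected to behave like PipelineAPIClient.
    """
    for _ in range(max_polls):
        response = client.get_pipeline_executions(pipeline_id, last_execution=True)
        executions = response.resources or []
        if executions:
            raw = executions[0].status
            # A str-based Enum exposes .value; a plain string (or None) is used as-is.
            status = getattr(raw, "value", raw)
            if status in FINAL_STATUSES:
                return status
        sleep(poll_seconds)
    return None
```

Passing sleep as a parameter keeps the helper easy to exercise without real waiting, for example by supplying a no-op function.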
- class PipelineIdResponse
Bases:
BaseModel
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- pipelineId: str
- class RetrievalAPIClient
Bases:
object
The Retrieval API enables querying and retrieving relevant content from configured data repositories, such as vector or external document sources (e.g., help.sap.com).
Retrieval combines semantic search with repository metadata filtering and supports custom retrieval configurations for chunk/document granularity.
Reference: https://api.sap.com/api/DOCUMENT_GROUNDING_API/resource/Retrieval
- __init__(proxy_client=None)
Initialize the RetrievalAPIClient.
- Parameters:
proxy_client (Optional[GenAIHubProxyClient], optional) -- Optional proxy client for making API requests.
- get_data_repositories(top=None, skip=None, count=None)
List all data repositories available to the tenant.
- Parameters:
top (Optional[int], optional) -- the number of items to return, defaults to None
skip (Optional[int], optional) -- the number of items to skip, defaults to None
count (Optional[bool], optional) -- whether to include a count of total items, defaults to None
- Returns:
DataRepositories model containing the list of data repositories
- Return type:
DataRepositories
- get_data_repository_by_id(repository_id)
Get a single data repository by its unique ID.
- Parameters:
repository_id (str) -- the unique identifier of the data repository
- Returns:
DataRepository model representing the data repository
- Return type:
DataRepository
- search(search_input)
Perform a retrieval search for relevant content.
- Parameters:
search_input (RetrievalSearchInput) -- RetrievalSearchInput model defining the query and filters.
- Returns:
RetrievalSearchResults model containing repositories, documents, and chunks.
- Return type:
RetrievalSearchResults
- class RetrievalChunk
Bases:
BaseModel
- content: str
- id: str
- metadata: List[RetrievalKeyValueListPair] | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class RetrievalDataRepositorySearchResult
Bases:
BaseModel
- dataRepository: DataRepositoryWithDocuments
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class RetrievalDocument
Bases:
BaseModel
- chunks: List[RetrievalChunk]
- id: str
- metadata: List[RetrievalDocumentKeyValueListPair] | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class RetrievalDocumentKeyValueListPair
Bases:
RetrievalKeyValueListPair
- matchMode: str | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class RetrievalKeyValueListPair
Bases:
BaseModel
- key: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- value: List[str]
- class RetrievalPerFilterSearchResult
Bases:
BaseModel
- filterId: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- results: List[RetrievalDataRepositorySearchResult]
- class RetrievalPerFilterSearchResultError
Bases:
BaseModel
- message: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class RetrievalPerFilterSearchResultWithError
Bases:
BaseModel
- filterId: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class RetrievalSearchConfiguration
Bases:
BaseModel
- maxChunkCount: int | None
- maxDocumentCount: int | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class RetrievalSearchDocumentKeyValueListPair
Bases:
BaseModel
- key: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- selectMode: List[str] | None
- value: List[str]
- class RetrievalSearchFilter
Bases:
BaseModel
- chunkMetadata: List[RetrievalKeyValueListPair] | None
- dataRepositories: List[str] | None
- dataRepositoryMetadata: List[RetrievalKeyValueListPair] | None
- dataRepositoryType: Literal['vector', 'help.sap.com'] | str
- documentMetadata: List[RetrievalSearchDocumentKeyValueListPair] | None
- id: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- searchConfiguration: RetrievalSearchConfiguration | None
- class RetrievalSearchInput
Bases:
BaseModel
- filters: List[RetrievalSearchFilter]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- query: str
- class RetrievalSearchResults
Bases:
BaseModel
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- results: List[RetrievalPerFilterSearchResult | RetrievalPerFilterSearchResultWithError]
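The retrieval result models nest four levels deep: per-filter results, data repositories, documents, and chunks. As an illustration, the sketch below flattens that structure into a list of chunk texts. It works on a plain dict shaped like RetrievalSearchResults, for example the output of pydantic's model_dump(), assuming the package uses pydantic v2 models as the model_config attributes suggest. The collect_chunk_texts helper is hypothetical.

```python
from typing import Any, Dict, List


def collect_chunk_texts(search_results: Dict[str, Any]) -> List[str]:
    """Flatten results -> dataRepository -> documents -> chunks into chunk contents."""
    texts: List[str] = []
    for per_filter in search_results.get("results", []):
        # Entries shaped like RetrievalPerFilterSearchResultWithError list no
        # "results" field in this reference, so they contribute nothing here.
        for repo_result in per_filter.get("results") or []:
            repository = repo_result["dataRepository"]
            for document in repository.get("documents", []):
                for chunk in document.get("chunks", []):
                    texts.append(chunk["content"])
    return texts
```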
- class S3PipelineCreateRequest
Bases:
BaseModel
- configuration: CommonConfiguration
- metadata: MetaData | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- type: Literal['S3']
- class S3PipelineGetResponse
Bases:
BasePipelineResponse
- configuration: CommonConfiguration
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- type: Literal['S3']
- class SFTPPipelineCreateRequest
Bases:
BaseModel
- configuration: CommonConfiguration
- metadata: MetaData | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- type: Literal['SFTP']
- class SFTPPipelineGetResponse
Bases:
BasePipelineResponse
- configuration: CommonConfiguration
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- type: Literal['SFTP']
- class SearchPipelineData
Bases:
BaseModel
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- pipelineId: str
- class SearchPipelineRequest
Bases:
BaseModel
- dataRepositoryMetadata: List[DataRepositoryMetadataItem]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class SearchPipelinesResponse
Bases:
BaseModel
- count: int | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- resources: List[SearchPipelineData]
Bases:
BaseModel
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
Bases:
BaseModel
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class TextOnlyBaseChunk
Bases:
BaseModel
- content: str
- metadata: List[VectorKeyValueListPair] | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class TextSearchRequest
Bases:
BaseModel
- filters: List[VectorSearchFilter]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- query: str
- class VectorAPIClient
Bases:
object
The Vector API provides management and search capabilities for vector-based document collections.
It enables creating, retrieving, updating, and deleting collections, as well as managing documents and performing semantic vector searches within those collections.
Reference: https://api.sap.com/api/DOCUMENT_GROUNDING_API/resource/Vector
- __init__(proxy_client=None)
Initializes the VectorAPIClient
- Parameters:
proxy_client (Optional[GenAIHubProxyClient], optional) -- Optional proxy client to use for requests
- create_collection(collection_request)
Create a new collection.
- Parameters:
collection_request (CollectionCreateRequest) -- The object containing the collection configuration.
- Returns:
requests.Response empty object with 202 status code
- Return type:
requests.Response
- create_documents(collection_id, request)
Create documents in a collection.
- Parameters:
collection_id (str) -- The ID of the collection to add documents to.
request (DocumentsCreateRequest) -- The object containing the documents to create.
- Returns:
A DocumentsListResponse object containing the created documents
- Return type:
DocumentsListResponse
- delete_collection(collection_id)
Delete collection by ID.
- Parameters:
collection_id (str) -- The ID of the collection to delete.
- Returns:
requests.Response empty object with 204 status code
- Return type:
requests.Response
- delete_document(collection_id, document_id)
Delete a document from a collection.
- Parameters:
collection_id (str) -- The ID of the collection to delete the document from.
document_id (str) -- The ID of the document to delete.
- Returns:
requests.Response empty object with 204 status code
- Return type:
requests.Response
- get_collection_by_id(collection_id)
Get collection details by ID.
- Parameters:
collection_id (str) -- The ID of the collection to retrieve.
- Returns:
A Collection object containing the collection details
- Return type:
Collection
- get_collection_creation_status(collection_id)
Get creation status for a collection.
- Parameters:
collection_id (str) -- The ID of the collection to retrieve the creation status for.
- Returns:
A CollectionCreationStatusResponse object containing the creation status
- Return type:
CollectionCreationStatusResponse
- get_collection_deletion_status(collection_id)
Get deletion status for a collection.
- Parameters:
collection_id (str) -- The ID of the collection to retrieve the deletion status for.
- Returns:
A CollectionDeletionStatusResponse object containing the deletion status
- Return type:
CollectionDeletionStatusResponse
- get_collections(top=None, skip=None, count=None)
Get all collections.
- Parameters:
top (Optional[int], optional) -- the number of collections to retrieve, defaults to None
skip (Optional[int], optional) -- the number of collections to skip, defaults to None
count (Optional[bool], optional) -- whether to include the total count of collections, defaults to None
- Returns:
A CollectionsListResponse object containing the list of collections
- Return type:
CollectionsListResponse
- get_document_by_id(collection_id, document_id)
Get a document by ID from a collection.
- Parameters:
collection_id (str) -- The ID of the collection to retrieve the document from.
document_id (str) -- The ID of the document to retrieve.
- Returns:
A Document object containing the document details
- Return type:
- get_documents(collection_id, top=None, skip=None, count=None)
Get documents from a collection.
- Parameters:
collection_id (str) -- The ID of the collection to retrieve documents from.
top (Optional[int], optional) -- the number of documents to retrieve, defaults to None
skip (Optional[int], optional) -- the number of documents to skip, defaults to None
count (Optional[bool], optional) -- whether to include the total count of documents, defaults to None
- Returns:
A DocumentsResponse object containing the list of documents
- Return type:
DocumentsResponse
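The top and skip parameters allow paging through a collection. The sketch below shows one way to iterate over all documents, assuming top acts as a page size and skip as an offset, and that an empty resources list marks the end. Both are assumptions about the service rather than documented guarantees. The iter_collection_documents helper is hypothetical. It relies only on get_documents(collection_id, top, skip) and the resources field of DocumentsResponse.

```python
from typing import Any, Iterator


def iter_collection_documents(
    client, collection_id: str, page_size: int = 50
) -> Iterator[Any]:
    """Yield every document in a collection, requesting page_size items per call.

    `client` is expected to behave like VectorAPIClient.
    """
    skip = 0
    while True:
        page = client.get_documents(collection_id, top=page_size, skip=skip)
        if not page.resources:
            return  # an empty page is treated as the end of the collection
        yield from page.resources
        # Advance by what was actually returned, in case the service caps `top`.
        skip += len(page.resources)
```

Stopping on an empty page costs one extra request but avoids relying on count, which is optional in DocumentsResponse.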
- search(request)
Perform semantic search in vector collections.
- Parameters:
request (TextSearchRequest) -- The object containing the search parameters.
- Returns:
A VectorSearchResults object containing the search results
- Return type:
VectorSearchResults
- update_documents(collection_id, request)
Update documents in a collection.
- Parameters:
collection_id (str) -- The ID of the collection to update documents in.
request (DocumentsUpdateRequest) -- The object containing the documents to update.
- Returns:
A DocumentsListResponse object containing the updated documents
- Return type:
DocumentsListResponse
- class VectorChunk
Bases:
BaseModel
- content: str
- id: str
- metadata: List[VectorKeyValueListPair] | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- VectorDocument
alias of
Document
- class VectorKeyValueListPair
Bases:
BaseModel
- key: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- value: List[str]
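Metadata throughout the Vector models is expressed as a list of key/value-list pairs rather than a mapping. As a small illustration, the hypothetical helper below converts an ordinary dict into dicts with the key and value fields listed above. It produces plain dicts only and does not construct the package's model class.

```python
from typing import Dict, Iterable, List, Union


def to_key_value_pairs(
    metadata: Dict[str, Union[str, Iterable[str]]]
) -> List[dict]:
    """Convert {"k": "v"} or {"k": ["v1", "v2"]} into key/value-list pair dicts."""
    pairs = []
    for key, value in metadata.items():
        # A single string becomes a one-element list; other iterables are copied.
        values = [value] if isinstance(value, str) else list(value)
        pairs.append({"key": key, "value": values})
    return pairs
```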
- class VectorPerFilterSearchResult
Bases:
BaseModel
- filterId: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- results: List[DocumentsChunk]
- class VectorSearchConfiguration
Bases:
BaseModel
- maxChunkCount: int | None
- maxDocumentCount: int | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class VectorSearchDocumentKeyValueListPair
Bases:
BaseModel
- key: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- selectMode: List[str] | None
- value: List[str]
- class VectorSearchFilter
Bases:
BaseModel
- chunkMetadata: List[VectorKeyValueListPair] | None
- collectionIds: List[str]
- collectionMetadata: List[VectorKeyValueListPair] | None
- configuration: VectorSearchConfiguration
- documentMetadata: List[VectorSearchDocumentKeyValueListPair] | None
- id: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
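To show how TextSearchRequest, VectorSearchFilter, and VectorSearchConfiguration fit together, here is a sketch that assembles a plain dict with the field names listed in this reference. The build_text_search_payload helper and the default filter id are hypothetical. Whether the service accepts null counts, and whether the optional fields may be omitted entirely, is not stated here and would need checking against the API.

```python
from typing import Any, Dict, List, Optional


def build_text_search_payload(
    query: str,
    collection_ids: List[str],
    max_chunk_count: Optional[int] = None,
    max_document_count: Optional[int] = None,
    filter_id: str = "filter-1",  # hypothetical default identifier
) -> Dict[str, Any]:
    """Return a dict shaped like TextSearchRequest with a single VectorSearchFilter."""
    return {
        "query": query,
        "filters": [
            {
                "id": filter_id,
                "collectionIds": list(collection_ids),
                # Shaped like VectorSearchConfiguration; both counts may be None.
                "configuration": {
                    "maxChunkCount": max_chunk_count,
                    "maxDocumentCount": max_document_count,
                },
            }
        ],
    }
```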
- class VectorSearchResults
Bases:
BaseModel
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- results: List[VectorPerFilterSearchResult]
Subpackages
Submodules
gen_ai_hub.document_grounding.client module
Client module for Document Grounding API.
This module provides convenient imports for all Document Grounding API clients and their associated constants. It serves as the main entry point for accessing Pipeline, Retrieval, and Vector API functionality.
- Exported clients:
PipelineAPIClient: Client for managing document vectorization pipelines
RetrievalAPIClient: Client for retrieval operations across data repositories
VectorAPIClient: Client for vector collection management and semantic search
- Exported constants:
PATH_DOCUMENT_GROUNDING: Base path for document grounding endpoints
PATH_DOCUMENT_GROUNDING_PIPELINES: Path for pipeline endpoints
PATH_DOCUMENT_GROUNDING_RETRIEVAL: Path for retrieval endpoints
PATH_DOCUMENT_GROUNDING_VECTOR: Path for vector endpoints
- class PipelineAPIClient
Bases:
object
The Pipelines API creates and manages vector stores based on documents from user data repositories: S3, SFTP, and Microsoft SharePoint. Each pipeline represents a configured end-to-end process including the following steps:
Fetches documents from a supported data source
Preprocesses and chunks the document content, and generates semantic embeddings. Semantic embeddings are multidimensional representations of textual information.
Stores semantic embeddings into the HANA Vector Store
The Pipeline API is compatible with the following data repositories:
Microsoft SharePoint
AWS S3
SFTP
See https://api.sap.com/api/DOCUMENT_GROUNDING_API/resource/Pipelines
- __init__(proxy_client=None)
Initializes the PipelineAPIClient
- Parameters:
proxy_client (Optional[GenAIHubProxyClient], optional) -- proxy client to use for requests, defaults to None
- create_pipeline(pipeline_request)
Create a document vectorization pipeline
- Parameters:
pipeline_request (CreatePipelineRequest) -- The object containing the pipeline configuration.
- Returns:
ID of the created pipeline
- Return type:
- delete_pipeline_by_id(pipeline_id)
Delete a pipeline by pipeline id
- Parameters:
pipeline_id (str) -- ID of the pipeline to delete
- Returns:
Response of the delete operation
- Return type:
requests.Response
- get_execution_document_by_id(pipeline_id, execution_id, document_id)
Get Document by ID for a Pipeline Execution
- Returns:
Document for the Pipeline Execution
- Return type:
- Parameters:
pipeline_id (str)
execution_id (str)
document_id (str)
- get_execution_documents(pipeline_id, execution_id, top=None, skip=None, count=None)
Get Documents for a Pipeline Execution
- Parameters:
pipeline_id (str) -- Pipeline ID
execution_id (str) -- Execution ID
top (Optional[int], optional) -- the maximum number of documents to return, defaults to None
skip (Optional[int], optional) -- number of documents to skip, defaults to None
count (Optional[bool], optional) -- flag to include count of total documents, defaults to None
- Returns:
Documents for the Pipeline Execution
- Return type:
- get_pipeline_by_id(pipeline_id)
Get details of a pipeline by pipeline id.
- Parameters:
pipeline_id (str) -- Pipeline ID
- Returns:
Details of the pipeline
- Return type:
- get_pipeline_document_by_id(pipeline_id, document_id)
Get Document by ID for a Pipeline
- Parameters:
pipeline_id (str) -- Pipeline ID
document_id (str) -- Document ID
- Returns:
Document for the Pipeline
- Return type:
- get_pipeline_documents(pipeline_id, top=None, skip=None, count=None)
Get Documents for a Pipeline
- Parameters:
pipeline_id (str) -- Pipeline ID
top (Optional[int], optional) -- the maximum number of documents to return, defaults to None
skip (Optional[int], optional) -- number of documents to skip, defaults to None
count (Optional[bool], optional) -- flag to include count of total documents, defaults to None
- Returns:
Documents for the Pipeline
- Return type:
- get_pipeline_execution_by_id(pipeline_id, execution_id)
Get Pipeline Execution by ID
- Parameters:
pipeline_id (str) -- Pipeline ID
execution_id (str) -- Execution ID
- Returns:
Pipeline Execution
- Return type:
- get_pipeline_executions(pipeline_id, last_execution=None, top=None, skip=None, count=None)
Get Pipeline Executions
- Parameters:
pipeline_id (str) -- Pipeline ID
last_execution (Optional[bool], optional) -- flag to get only the last execution, defaults to None
top (Optional[int], optional) -- number of executions to retrieve, defaults to None
skip (Optional[int], optional) -- number of executions to skip, defaults to None
count (Optional[bool], optional) -- flag to include count of total executions, defaults to None
- Returns:
Pipeline Executions
- Return type:
- get_pipeline_status(pipeline_id)
Get pipeline status by pipeline id
- Parameters:
pipeline_id (str) -- Pipeline ID
- Returns:
Status of the pipeline
- Return type:
- get_pipelines(top=None, skip=None, count=None)
Get all pipelines.
- Returns:
All pipelines
- Return type:
- Parameters:
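top (int | None) -- the maximum number of pipelines to return, defaults to None
skip (int | None) -- number of pipelines to skip, defaults to None
count (bool | None) -- flag to include count of total pipelines, defaults to None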
- search_pipelines(body)
Pipeline Search by Metadata
- Parameters:
body (SearchPipelineRequest) -- The search request object containing metadata filters.
- Returns:
Search results containing matching pipelines.
- Return type:
- trigger_pipeline(request)
Trigger Pipeline Manually
- Parameters:
request (ManualPipelineTrigger) -- The manual trigger request object.
- Returns:
Response of the trigger operation
- Return type:
requests.Response
- class RetrievalAPIClient
Bases:
object
The Retrieval API enables querying and retrieving relevant content from configured data repositories, such as vector or external document sources (e.g., help.sap.com).
Retrieval combines semantic search with repository metadata filtering and supports custom retrieval configurations for chunk/document granularity.
Reference: https://api.sap.com/api/DOCUMENT_GROUNDING_API/resource/Retrieval
- __init__(proxy_client=None)
Initialize the RetrievalAPIClient.
- Parameters:
proxy_client (Optional[GenAIHubProxyClient], optional) -- Optional proxy client for making API requests.
- get_data_repositories(top=None, skip=None, count=None)
List all data repositories available to the tenant.
- Parameters:
top (Optional[int], optional) -- the number of items to return, defaults to None
skip (Optional[int], optional) -- the number of items to skip, defaults to None
count (Optional[bool], optional) -- whether to include a count of total items, defaults to None
- Returns:
DataRepositories model containing the list of data repositories
- Return type:
- get_data_repository_by_id(repository_id)
Get a single data repository by its unique ID.
- Parameters:
repository_id (str) -- the unique identifier of the data repository
- Returns:
DataRepository model representing the data repository
- Return type:
- search(search_input)
Perform a retrieval search for relevant content.
- Parameters:
search_input (RetrievalSearchInput) -- RetrievalSearchInput model defining the query and filters.
- Returns:
RetrievalSearchResults model containing repositories, documents, and chunks.
- Return type:
- class VectorAPIClient
Bases:
object
The Vector API provides management and search capabilities for vector-based document collections.
It enables creating, retrieving, updating, and deleting collections, as well as managing documents and performing semantic vector searches within those collections.
Reference: https://api.sap.com/api/DOCUMENT_GROUNDING_API/resource/Vector
- __init__(proxy_client=None)
Initializes the VectorAPIClient
- Parameters:
proxy_client (Optional[GenAIHubProxyClient], optional) -- Optional proxy client to use for requests
- create_collection(collection_request)
Create a new collection.
- Parameters:
collection_request (CollectionCreateRequest) -- The object containing the collection configuration.
- Returns:
An empty requests.Response with status code 202
- Return type:
requests.Response
- create_documents(collection_id, request)
Create documents in a collection.
- Parameters:
collection_id (str) -- The ID of the collection to add documents to.
request (DocumentsCreateRequest) -- The object containing the documents to create.
- Returns:
A DocumentsListResponse object containing the created documents
- Return type:
- delete_collection(collection_id)
Delete collection by ID.
- Parameters:
collection_id (str) -- The ID of the collection to delete.
- Returns:
An empty requests.Response with status code 204
- Return type:
requests.Response
- delete_document(collection_id, document_id)
Delete a document from a collection.
- Parameters:
collection_id (str) -- The ID of the collection to delete the document from.
document_id (str) -- The ID of the document to delete.
- Returns:
An empty requests.Response with status code 204
- Return type:
requests.Response
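The methods above that return a bare requests.Response document an expected status code (202 for create_collection, 204 for delete_collection and delete_document). A caller may want to verify that code explicitly. The helper below is a minimal sketch: ensure_status is a hypothetical name, not part of the package, and it relies only on the status_code attribute that requests.Response provides. A SimpleNamespace stands in for the response so the sketch runs offline.

```python
from types import SimpleNamespace
from typing import Any


def ensure_status(response: Any, expected: int) -> Any:
    """Return `response` unchanged if its status code matches, else raise."""
    actual = response.status_code
    if actual != expected:
        raise RuntimeError(f"expected HTTP {expected}, got {actual}")
    return response


# Stand-in for the requests.Response a delete call would return.
fake_deleted = SimpleNamespace(status_code=204)
checked = ensure_status(fake_deleted, 204)
```

Whether the client itself already raises on unexpected status codes is not stated in this reference, so a check like this may be redundant in practice.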
- get_collection_by_id(collection_id)
Get collection details by ID.
- Parameters:
collection_id (str) -- The ID of the collection to retrieve.
- Returns:
A Collection object containing the collection details
- Return type:
- get_collection_creation_status(collection_id)
Get creation status for a collection.
- Parameters:
collection_id (str) -- The ID of the collection to retrieve the creation status for.
- Returns:
A CollectionCreationStatusResponse object containing the creation status
- Return type:
CollectionCreationStatusResponse
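Because create_collection returns with status code 202 before the collection necessarily exists, a caller will typically poll get_collection_creation_status until creation has finished. The following is a minimal sketch under stated assumptions: wait_for_status is a hypothetical helper, the status response is assumed to expose a status attribute (as CollectionCreatedResponse does, with the value 'CREATED'), and "PENDING" is a placeholder for whatever intermediate value the service reports. The status call is passed in as a callable so that the sketch runs without a service.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable


def wait_for_status(
    get_status: Callable[[], Any],
    done_status: str = "CREATED",
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call `get_status` until its `.status` equals `done_status`.

    Raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        response = get_status()
        if getattr(response, "status", None) == done_status:
            return response
        if time.monotonic() >= deadline:
            raise TimeoutError(f"status did not reach {done_status!r}")
        sleep(interval_s)


# Stand-in for repeated get_collection_creation_status(collection_id) calls.
# "PENDING" is a placeholder value, not taken from the API.
_states = iter(["PENDING", "PENDING", "CREATED"])


def fake_get_status() -> SimpleNamespace:
    return SimpleNamespace(status=next(_states))


result = wait_for_status(fake_get_status, sleep=lambda _: None)
```

With a real client, get_status could be a lambda that closes over the collection ID and calls get_collection_creation_status. The same loop would apply to get_collection_deletion_status with a different done_status, whose value would have to be taken from the corresponding response model.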
- get_collection_deletion_status(collection_id)
Get deletion status for a collection.
- Parameters:
collection_id (str) -- The ID of the collection to retrieve the deletion status for.
- Returns:
A CollectionDeletionStatusResponse object containing the deletion status
- Return type:
CollectionDeletionStatusResponse
- get_collections(top=None, skip=None, count=None)
Get all collections.
- Parameters:
top (Optional[int], optional) -- the number of collections to retrieve, defaults to None
skip (Optional[int], optional) -- the number of collections to skip, defaults to None
count (Optional[bool], optional) -- whether to include the total count of collections, defaults to None
- Returns:
A CollectionsListResponse object containing the list of collections
- Return type:
- get_document_by_id(collection_id, document_id)
Get a document by ID from a collection.
- Parameters:
collection_id (str) -- The ID of the collection to retrieve the document from.
document_id (str) -- The ID of the document to retrieve.
- Returns:
A Document object containing the document details
- Return type:
- get_documents(collection_id, top=None, skip=None, count=None)
Get documents from a collection.
- Parameters:
collection_id (str) -- The ID of the collection to retrieve documents from.
top (Optional[int], optional) -- the number of documents to retrieve, defaults to None
skip (Optional[int], optional) -- the number of documents to skip, defaults to None
count (Optional[bool], optional) -- whether to include the total count of documents, defaults to None
- Returns:
A DocumentsResponse object containing the list of documents
- Return type:
- search(request)
Perform semantic search in vector collections.
- Parameters:
request (TextSearchRequest) -- The object containing the search parameters.
- Returns:
A VectorSearchResults object containing the search results
- Return type:
- update_documents(collection_id, request)
Update documents in a collection.
- Parameters:
collection_id (str) -- The ID of the collection to update documents in.
request (DocumentsUpdateRequest) -- The object containing the documents to update.
- Returns:
A DocumentsListResponse object containing the updated documents
- Return type: