gen_ai_hub.proxy.langchain package

class ChatBedrock

Bases: AICoreBedrockBaseModel, ChatBedrock

Drop-in replacement for LangChain ChatBedrock.

__init__(*args, **kwargs)
Initializes the AICoreBedrockBaseModel with AI Core-specific parameters.

Extends the constructor of the base class with AI Core-specific parameters.

Parameters:
  • model_id (str, optional) -- the model identifier, defaults to ""

  • deployment_id (str, optional) -- the deployment identifier, defaults to ""

  • model_name (str, optional) -- the model name, defaults to ""

  • config_id (str, optional) -- the configuration identifier, defaults to ""

  • config_name (str, optional) -- the configuration name, defaults to ""

  • proxy_client (Optional[BaseProxyClient], optional) -- the proxy client to use, defaults to None
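A minimal usage sketch of the constructor above. The model name is a placeholder for whatever Bedrock-served deployment is available through your proxy; it is not prescribed by this reference.

```python
from gen_ai_hub.proxy.langchain.amazon import ChatBedrock

# "anthropic--claude-3-haiku" is a placeholder deployment/model name; adjust to your landscape.
# A BaseProxyClient can be passed via proxy_client; if omitted, a default client is resolved.
llm = ChatBedrock(model_name="anthropic--claude-3-haiku")

response = llm.invoke("Say hello in one sentence.")
print(response.content)
```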

beta_use_converse_api: bool

Use the new Bedrock converse API which provides a standardized interface to all Bedrock models. Support still in beta. See ChatBedrockConverse docs for more.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

stop_sequences: List[str] | None

Stop sequence inference parameter from new Bedrock converse API providing a sequence of characters that causes a model to stop generating a response. See https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_InferenceConfiguration.html for more.

system_prompt_with_tools: str
class ChatGoogleGenerativeAI

Bases: _BaseGoogleGenerativeAI, ChatGoogleGenerativeAI

Drop-in replacement for langchain_google_genai.ChatGoogleGenerativeAI.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'ignore', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
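A minimal sketch of using the drop-in class through the proxy. The model name is a placeholder for a Gemini deployment known to your AI Core instance, and deployment resolution details depend on your setup; the init_llm helper documented below is an alternative entry point.

```python
from gen_ai_hub.proxy.langchain.google_genai import ChatGoogleGenerativeAI

# "gemini-1.5-flash" is a placeholder; use the model name of your deployment.
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")
print(llm.invoke("Summarize LangChain in one sentence.").content)
```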

class ChatOpenAI

Bases: ProxyOpenAI, ChatOpenAI

ChatOpenAI model using a proxy.

Parameters:
  • ProxyOpenAI (class) -- Base class for OpenAI models using a proxy

  • ChatOpenAI (class) -- ChatOpenAI class from langchain_openai
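A minimal usage sketch. The model name is a placeholder, and the get_proxy_client helper from gen_ai_hub.proxy is an assumption of this example rather than part of this reference; passing no proxy_client falls back to the default client.

```python
from gen_ai_hub.proxy import get_proxy_client  # assumed helper returning a BaseProxyClient
from gen_ai_hub.proxy.langchain.openai import ChatOpenAI

proxy_client = get_proxy_client("gen-ai-hub")

# "gpt-4o" is a placeholder deployment/model name.
llm = ChatOpenAI(proxy_model_name="gpt-4o", proxy_client=proxy_client, temperature=0.0)
print(llm.invoke("Write a haiku about proxies.").content)
```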

classmethod validate_environment(values)

Validates the environment.

Parameters:

values (Dict) -- The input values

Raises:

ValueError -- n must be at least 1.

Returns:

The validated values

Return type:

Dict

static __new__(cls, **data)

Initialize the OpenAI object.

Parameters:

data (Any) -- Additional data to initialize the object

Returns:

The initialized OpenAI object

Return type:

OpenAIBase

__init__(*args, **kwargs)

Initialize the ChatOpenAI object.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_name: str | None

Model name to use.

openai_api_version: str | None
class GoogleGenerativeAIEmbeddings

Bases: _BaseGoogleGenerativeAI, GoogleGenerativeAIEmbeddings

Drop-in replacement for langchain_google_genai.GoogleGenerativeAIEmbeddings.
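A minimal sketch. The model name follows the example given further below in this reference ('gemini-embedding-001'); whether that exact deployment exists depends on your AI Core setup.

```python
from gen_ai_hub.proxy.langchain.google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="gemini-embedding-001")
vector = embeddings.embed_query("What is SAP AI Core?")
print(len(vector))
```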

model_config: ClassVar[ConfigDict] = {'extra': 'allow', 'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class OpenAI

Bases: ProxyOpenAI, OpenAI

OpenAI model using a proxy.

classmethod validate_environment(values)

Validates the environment.

Parameters:

values (Dict) -- The input values

Returns:

The validated values

Return type:

Dict

static __new__(cls, **data)

Initialize the OpenAI object.

Parameters:

data (Any)

__init__(*args, **kwargs)

Initialize the OpenAI object.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_name: str | None

Model name to use.

openai_api_version: str | None
class OpenAIEmbeddings

Bases: ProxyOpenAI, OpenAIEmbeddings

OpenAI Embeddings model using a proxy.
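A minimal sketch, assuming the proxy resolves the deployment from proxy_model_name; the embedding deployment name below is a placeholder.

```python
from gen_ai_hub.proxy.langchain.openai import OpenAIEmbeddings

# "text-embedding-ada-002" is a placeholder embedding deployment name.
embeddings = OpenAIEmbeddings(proxy_model_name="text-embedding-ada-002")

vectors = embeddings.embed_documents(["first document", "second document"])
print(len(vectors), len(vectors[0]))
```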

classmethod validate_environment(values)

Validates the environment.

Parameters:

values (Dict) -- The input values

Returns:

The validated values

Return type:

Dict

__init__(*args, **kwargs)

Initialize the OpenAIEmbeddings object.

chunk_size: int

Maximum number of texts to embed in each batch

input_type: str | None
model: str | None
model_config: ClassVar[ConfigDict] = {'extra': 'allow', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

openai_api_version: str | None

Version of the OpenAI API to use.

Automatically inferred from env var OPENAI_API_VERSION if not provided.

tiktoken_model_name: str | None

The model name to pass to tiktoken when using this class.

Tiktoken is used to count the number of tokens in documents to constrain them to be under a certain limit.

By default, when set to None, this will be the same as the embedding model name. However, there are some cases where you may want to use this Embedding class with a model name not supported by tiktoken. This can include when using Azure embeddings or when using one of the many model providers that expose an OpenAI-like API but with different models. In those cases, in order to avoid erroring when tiktoken is called, you can specify a model name to use here.

init_embedding_model(*args, proxy_client=None, init_func=None, model_id='', **kwargs)

Initializes an embedding model using the specified parameters.

Parameters:
  • proxy_client (BaseProxyClient) -- The proxy client to use for the model (optional)

  • init_func (Callable) -- Function to call for initializing the model, optional

  • model_id (str) -- ID of the Amazon Bedrock model, needed in case a custom Amazon Bedrock model is being initialized (optional)

Returns:

The initialized embedding model

Return type:

Embeddings
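A minimal sketch of this convenience initializer. The positional model name is a placeholder and must match one of your deployments.

```python
from gen_ai_hub.proxy.langchain.init_models import init_embedding_model

# "text-embedding-ada-002" is a placeholder deployment/model name.
embeddings = init_embedding_model("text-embedding-ada-002")
print(embeddings.embed_query("hello world")[:5])
```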

init_llm(*args, proxy_client=None, temperature=0.0, max_tokens=256, top_k=None, top_p=1.0, init_func=None, model_id='', **kwargs)

Initializes a language model using the specified parameters.

Parameters:
  • proxy_client (ProxyClient) -- The proxy client to use for the model (optional)

  • temperature (float) -- The temperature parameter for model generation (default: 0.0)

  • max_tokens (int) -- The maximum number of tokens to generate (default: 256)

  • top_k (int) -- The top-k parameter for model generation (optional)

  • top_p (float) -- The top-p parameter for model generation (default: 1.0)

  • init_func (Callable) -- Function to call for initializing the model, optional

  • model_id (str) -- ID of the Amazon Bedrock model, needed in case a custom Amazon Bedrock model is being initialized (optional)

Returns:

The initialized language model

Return type:

BaseLanguageModel
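A minimal sketch, again with a placeholder model name; any chat or completion deployment known to the proxy can be passed.

```python
from gen_ai_hub.proxy.langchain.init_models import init_llm

# "gpt-4o" is a placeholder deployment/model name.
llm = init_llm("gpt-4o", temperature=0.0, max_tokens=256)
print(llm.invoke("What is Generative AI Hub?").content)
```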

Submodules

gen_ai_hub.proxy.langchain.amazon module

class AICoreBedrockBaseModel

Bases: BaseModel

AICoreBedrockBaseModel provides all adjustments to boto3 based LangChain classes to enable communication with SAP AI Core.

classmethod get_corresponding_model_id(model_name)

Gets the corresponding model ID for a given model name.

Parameters:

model_name (str) -- the model name

Raises:

ValueError -- if the model name is not supported

Returns:

the corresponding model ID

Return type:

str

classmethod validate_environment(values)

Validates and sets up the environment for the model.

Parameters:

values (Dict) -- the input values

Returns:

the validated values

Return type:

Dict

__init__(*args, model_id='', deployment_id='', model_name='', config_id='', config_name='', proxy_client=None, **kwargs)
Initializes the AICoreBedrockBaseModel with AI Core-specific parameters.

Extends the constructor of the base class with AI Core-specific parameters.

Parameters:
  • model_id (str, optional) -- the model identifier, defaults to ""

  • deployment_id (str, optional) -- the deployment identifier, defaults to ""

  • model_name (str, optional) -- the model name, defaults to ""

  • config_id (str, optional) -- the configuration identifier, defaults to ""

  • config_name (str, optional) -- the configuration name, defaults to ""

  • proxy_client (Optional[BaseProxyClient], optional) -- the proxy client to use, defaults to None

model_config: ClassVar[ConfigDict] = {'extra': 'allow'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class BedrockEmbeddings

Bases: AICoreBedrockBaseModel, BedrockEmbeddings

Drop-in replacement for LangChain BedrockEmbeddings.

__init__(*args, **kwargs)
Initializes the AICoreBedrockBaseModel with AI Core-specific parameters.

Extends the constructor of the base class with AI Core-specific parameters.

Parameters:
  • model_id (str, optional) -- the model identifier, defaults to ""

  • deployment_id (str, optional) -- the deployment identifier, defaults to ""

  • model_name (str, optional) -- the model name, defaults to ""

  • config_id (str, optional) -- the configuration identifier, defaults to ""

  • config_name (str, optional) -- the configuration name, defaults to ""

  • proxy_client (Optional[BaseProxyClient], optional) -- the proxy client to use, defaults to None

model_config: ClassVar[ConfigDict] = {'extra': 'allow', 'protected_namespaces': ()}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
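A minimal sketch; the model name is a placeholder for an Amazon Titan embedding deployment and must be adjusted to your landscape.

```python
from gen_ai_hub.proxy.langchain.amazon import BedrockEmbeddings

# "amazon--titan-embed-text" is a placeholder deployment/model name.
embeddings = BedrockEmbeddings(model_name="amazon--titan-embed-text")
print(len(embeddings.embed_query("hello")))
```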

class ChatBedrock

Bases: AICoreBedrockBaseModel, ChatBedrock

Drop-in replacement for LangChain ChatBedrock.

__init__(*args, **kwargs)
Initializes the AICoreBedrockBaseModel with AI Core-specific parameters.

Extends the constructor of the base class with AI Core-specific parameters.

Parameters:
  • model_id (str, optional) -- the model identifier, defaults to ""

  • deployment_id (str, optional) -- the deployment identifier, defaults to ""

  • model_name (str, optional) -- the model name, defaults to ""

  • config_id (str, optional) -- the configuration identifier, defaults to ""

  • config_name (str, optional) -- the configuration name, defaults to ""

  • proxy_client (Optional[BaseProxyClient], optional) -- the proxy client to use, defaults to None

aws_access_key_id: SecretStr | None

AWS access key id.

If provided, aws_secret_access_key must also be provided.

If not specified, the default credential profile or, if on an EC2 instance, credentials from IMDS will be used.

See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html

If not provided, will be read from AWS_ACCESS_KEY_ID environment variable.

aws_secret_access_key: SecretStr | None

AWS secret_access_key.

If provided, aws_access_key_id must also be provided.

If not specified, the default credential profile or, if on an EC2 instance, credentials from IMDS will be used.

See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html

If not provided, will be read from AWS_SECRET_ACCESS_KEY environment variable.

aws_session_token: SecretStr | None

AWS session token.

If provided, aws_access_key_id and aws_secret_access_key must also be provided.

Not required unless using temporary credentials.

See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html

If not provided, will be read from AWS_SESSION_TOKEN environment variable.

base_model_id: str | None

An optional field to pass the base model id. If provided, this will be used over the value of model_id to identify the base model.

bedrock_api_key: SecretStr | None

Bedrock API key.

Enables authentication using Bedrock API keys instead of standard AWS credentials. When provided, the key is set as the AWS_BEARER_TOKEN_BEDROCK environment variable.

See: https://docs.aws.amazon.com/bedrock/latest/userguide/api-keys-use.html

If not provided, will be read from AWS_BEARER_TOKEN_BEDROCK environment variable.

If both an API key and AWS credentials are present, the API key takes precedence.

bedrock_client: Any

The bedrock client for making control plane API calls

beta_use_converse_api: bool

Use the new Bedrock converse API which provides a standardized interface to all Bedrock models. Support still in beta. See ChatBedrockConverse docs for more.

cache: BaseCache | bool | None

Whether to cache the response.

  • If True, will use the global cache.

  • If False, will not use a cache

  • If None, will use the global cache if it's set, otherwise no cache.

  • If instance of BaseCache, will use the provided cache.

Caching is not currently supported for streaming methods of models.

callbacks: Callbacks

Callbacks to add to the run trace.

client: Any

The bedrock runtime client for making data plane API calls

config: Any

An optional botocore.config.Config instance to pass to the client.

credentials_profile_name: str | None

The name of the profile in the ~/.aws/credentials or ~/.aws/config files, which has either access keys or role information specified.

If not specified, the default credential profile or, if on an EC2 instance, credentials from IMDS will be used.

See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html

custom_get_token_ids: Callable[[str], list[int]] | None

Optional encoder to use for counting tokens.

disable_streaming: bool | Literal['tool_calling']

Whether to disable streaming for this model.

If streaming is bypassed, then stream/astream/astream_events will defer to invoke/ainvoke.

  • If True, will always bypass streaming case.

  • If 'tool_calling', will bypass streaming case only when the model is called

    with a tools keyword argument. In other words, LangChain will automatically switch to non-streaming behavior (invoke) only when the tools argument is provided. This offers the best of both worlds.

  • If False (Default), will always use streaming case if available.

The main reason for this flag is that code might be written using stream and a user may want to swap out a given model for another model whose implementation does not properly support streaming.

endpoint_url: str | None

Needed if you don't want to default to the 'us-east-1' endpoint.

guardrails: Mapping[str, Any] | None

An optional dictionary to configure guardrails for Bedrock.

This guardrails field consists of two keys, 'guardrailId' and 'guardrailVersion', whose values should be strings but are initialized to None.

It's used to determine if specific guardrails are enabled and properly set.

Type:

Optional[Mapping[str, str]]: A mapping with 'guardrailId' and 'guardrailVersion' keys.

Example:

```python
llm = BedrockLLM(
    model_id="<model_id>",
    client=<bedrock_client>,
    model_kwargs={},
    guardrails={
        "guardrailId": "<guardrail_id>",
        "guardrailVersion": "<guardrail_version>",
    },
)
```

To enable tracing for guardrails, set the 'trace' key to True and pass a callback handler to the 'run_manager' parameter of the 'generate', '_call' methods.

Example:

```python
llm = BedrockLLM(
    model_id="<model_id>",
    client=<bedrock_client>,
    model_kwargs={},
    guardrails={
        "guardrailId": "<guardrail_id>",
        "guardrailVersion": "<guardrail_version>",
        "trace": True,
    },
    callbacks=[BedrockAsyncCallbackHandler()],
)
```

See https://python.langchain.com/docs/concepts/callbacks/ for more information on callback handlers.

```python
class BedrockAsyncCallbackHandler(AsyncCallbackHandler):
    async def on_llm_error(
        self,
        error: BaseException,
        **kwargs: Any,
    ) -> Any:
        reason = kwargs.get("reason")
        if reason == "GUARDRAIL_INTERVENED":
            ...  # Logic to handle guardrail intervention
```

max_tokens: int | None

Maximum number of tokens to generate.

When using Anthropic models with InvokeModel API, if not set, defaults to 1024.

metadata: dict[str, Any] | None

Metadata to add to the run trace.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_id: str

Id of the model to call, e.g., 'amazon.titan-text-express-v1', this is equivalent to the modelId property in the list-foundation-models api. For custom and provisioned models, an ARN value is expected.

model_kwargs: Dict[str, Any] | None

Keyword arguments to pass to the model.

name: str | None

The name of the Runnable.

Used for debugging and tracing.

output_version: str | None

Version of AIMessage output format to store in message content.

AIMessage.content_blocks will lazily parse the contents of content into a standard format. This flag can be used to additionally store the standard format in message content, e.g., for serialization purposes.

Supported values:

  • 'v0': provider-specific format in content (can lazily-parse with

    content_blocks)

  • 'v1': standardized format in content (consistent with content_blocks)

Partner packages (e.g., [langchain-openai](https://pypi.org/project/langchain-openai)) can also use this field to roll out new content formats in a backward-compatible way.

!!! version-added "Added in langchain-core 1.0.0"

profile: ModelProfile | None

Profile detailing model capabilities.

!!! warning "Beta feature"

This is a beta feature. The format of model profiles is subject to change.

If not specified, automatically loaded from the provider package on initialization if data is available.

Example profile data includes context window sizes, supported modalities, or support for tool calling, structured output, and other features.

!!! version-added "Added in langchain-core 1.1.0"

provider: str | None

The model provider, e.g., 'amazon', 'cohere', 'ai21', etc. When not supplied, provider is extracted from the first part of the model_id e.g. 'amazon' in 'amazon.titan-text-express-v1'. This value should be provided for model IDs that do not have the provider in them, e.g., custom and provisioned models that have an ARN associated with them.

provider_stop_reason_key_map: Mapping[str, str]
provider_stop_sequence_key_name_map: Mapping[str, str]
rate_limiter: BaseRateLimiter | None

An optional rate limiter to use for limiting the number of requests.

region_name: str | None

The AWS region, e.g., us-west-2. Falls back to the AWS_REGION or AWS_DEFAULT_REGION env variable, or the region specified in ~/.aws/config, if not provided here.

service_tier: Literal['priority', 'default', 'flex', 'reserved'] | None

Service tier for model invocation.

Specifies the processing tier type used for serving the request. Supported values are 'priority', 'default', 'flex', and 'reserved'.

  • 'priority': Prioritized processing for lower latency

  • 'default': Standard processing tier

  • 'flex': Flexible processing tier with lower cost

  • 'reserved': Reserved capacity for consistent performance

If not provided, AWS uses the default tier.

stop_sequences: List[str] | None

Stop sequence inference parameter from new Bedrock converse API providing a sequence of characters that causes a model to stop generating a response. See https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_InferenceConfiguration.html for more.

streaming: bool

Whether to stream the results.

system_prompt_with_tools: str
tags: list[str] | None

Tags to add to the run trace.

temperature: float | None
verbose: bool

Whether to print out response text.

class ChatBedrockConverse

Bases: AICoreBedrockBaseModel, ChatBedrockConverse

Drop-in replacement for LangChain ChatBedrockConverse.

__init__(*args, **kwargs)
Initializes the AICoreBedrockBaseModel with AI Core-specific parameters.

Extends the constructor of the base class with AI Core-specific parameters.

Parameters:
  • model_id (str, optional) -- the model identifier, defaults to ""

  • deployment_id (str, optional) -- the deployment identifier, defaults to ""

  • model_name (str, optional) -- the model name, defaults to ""

  • config_id (str, optional) -- the configuration identifier, defaults to ""

  • config_name (str, optional) -- the configuration name, defaults to ""

  • proxy_client (Optional[BaseProxyClient], optional) -- the proxy client to use, defaults to None
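A minimal sketch with a placeholder model name; the Converse wrapper is the natural choice when tool calling or the standardized Converse request/response structure is needed.

```python
from gen_ai_hub.proxy.langchain.amazon import ChatBedrockConverse

# "anthropic--claude-3-sonnet" is a placeholder deployment/model name.
llm = ChatBedrockConverse(model_name="anthropic--claude-3-sonnet", temperature=0.0)
print(llm.invoke("Hello via the Converse API.").content)
```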

extract_model_kwargs_parameters(kwargs)

Extracts specific parameters from model_kwargs and moves them to the top level of kwargs.

Parameters:

kwargs (Dict) -- the input keyword arguments

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

init_chat_converse_model(proxy_client, deployment, temperature=0.0, max_tokens=256, top_k=None, top_p=1.0, stop_sequences=None, model_id='', config=None)

Initializes a chat model using the newer Bedrock Converse API (ChatBedrockConverse). The Converse API offers several advantages over the older Invoke API:

  • Unified interface for different models and modalities.

  • Native support for tool use (function calling).

  • Standardized request/response structure.

Parameters:
  • proxy_client (BaseProxyClient) -- the proxy client to use

  • deployment (Deployment) -- the deployment information

  • temperature (float, optional) -- the temperature for the model, defaults to 0.0

  • max_tokens (int, optional) -- the maximum number of tokens to generate, defaults to 256

  • top_k (Optional[int], optional) -- the top-k sampling parameter, defaults to None

  • top_p (float, optional) -- the top-p sampling parameter, defaults to 1.0

  • stop_sequences (List[str], optional) -- the stop sequences for the model, defaults to None

  • model_id (Optional[str], optional) -- the model identifier, defaults to ''

  • config (Optional[Config], optional) -- the botocore configuration, defaults to None

Returns:

the initialized chat model

Return type:

ChatBedrockConverse

init_chat_model(proxy_client, deployment, temperature=0.0, max_tokens=256, top_k=None, top_p=1.0, stop_sequences=None, model_id='', config=None)

Initializes a chat model using the legacy Bedrock Invoke API (ChatBedrock).

Parameters:
  • proxy_client (BaseProxyClient) -- the proxy client to use

  • deployment (Deployment) -- the deployment information

  • temperature (float, optional) -- the temperature for the model, defaults to 0.0

  • max_tokens (int, optional) -- the maximum number of tokens to generate, defaults to 256

  • top_k (Optional[int], optional) -- the top-k sampling parameter, defaults to None

  • top_p (float, optional) -- the top-p sampling parameter, defaults to 1.0

  • stop_sequences (List[str], optional) -- the stop sequences for the model, defaults to None

  • model_id (Optional[str], optional) -- the model identifier, defaults to ''

  • config (Optional[Config], optional) -- the botocore configuration, defaults to None

Returns:

the initialized chat model

Return type:

ChatBedrock

init_embedding_model(proxy_client, deployment, model_id='')

Initializes an embedding model using BedrockEmbeddings.

Parameters:
  • proxy_client (BaseProxyClient) -- the proxy client to use

  • deployment (Deployment) -- the deployment information

  • model_id (Optional[str], optional) -- the model identifier, defaults to ''

Returns:

the initialized embedding model

Return type:

BedrockEmbeddings

gen_ai_hub.proxy.langchain.base module

class BaseAuth

Bases: BaseModel

Base class for authentication models.

Parameters:

BaseModel (pydantic.BaseModel) -- The base model class to inherit from.

Returns:

An instance of the BaseAuth class.

Return type:

BaseAuth

config_id: str | None
config_name: str | None
deployment_id: str | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

proxy_model_name: str | None

gen_ai_hub.proxy.langchain.google_genai module

Drop-in replacements for langchain_google_genai models with SAP AI Core integration.

class ChatGoogleGenerativeAI

Bases: _BaseGoogleGenerativeAI, ChatGoogleGenerativeAI

Drop-in replacement for langchain_google_genai.ChatGoogleGenerativeAI.

additional_headers: dict[str, str] | None

Additional HTTP headers to include in API requests.

Passed as headers to HttpOptions when creating the client.

!!! example

```python
llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    additional_headers={
        "X-Custom-Header": "value",
    },
)
```

base_url: str | dict | None

Custom base URL for the API client.

If not provided, defaults depend on the API being used:

  • Gemini Developer API (api_key / google_api_key): https://generativelanguage.googleapis.com/

  • Vertex AI (credentials): https://{location}-aiplatform.googleapis.com/

!!! note "Backwards compatibility"

Typed to accept dict to support backwards compatibility for the (now removed) client_options param.

If a dict is passed in, it will only extract the 'api_endpoint' key.

cache: BaseCache | bool | None

Whether to cache the response.

  • If True, will use the global cache.

  • If False, will not use a cache

  • If None, will use the global cache if it's set, otherwise no cache.

  • If instance of BaseCache, will use the provided cache.

Caching is not currently supported for streaming methods of models.

cached_content: str | None

The name of the cached content used as context to serve the prediction.

!!! note

Only used in explicit caching, where users can have control over caching (e.g. what content to cache) and enjoy guaranteed cost savings. Format: cachedContents/{cachedContent}.

callbacks: Callbacks

Callbacks to add to the run trace.

client: Client | None
client_args: dict[str, Any] | None

Additional arguments to pass to the underlying HTTP client.

Applied to both sync and async clients.

!!! example "SOCKS5 proxy"

```python
llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    client_args={"proxy": "socks5://user:pass@host:port"},
)
```

convert_system_message_to_human: bool

Whether to merge any leading SystemMessage into the following HumanMessage.

Gemini does not support system messages; any unsupported messages will raise an error.

credentials: Any

Custom credentials for Vertex AI authentication.

When provided, forces Vertex AI backend (regardless of API key presence in google_api_key/api_key).

Accepts a [google.auth.credentials.Credentials](https://googleapis.dev/python/google-auth/latest/reference/google.auth.credentials.html#google.auth.credentials.Credentials) object.

If omitted and no API key is found, the SDK uses [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials).

!!! example "Service account credentials"

```python
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    "path/to/service-account.json",
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)

llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    credentials=credentials,
    project="my-project-id",
)
```

custom_get_token_ids: Callable[[str], list[int]] | None

Optional encoder to use for counting tokens.

default_metadata: Sequence[tuple[str, str]] | None
disable_streaming: bool | Literal['tool_calling']

Whether to disable streaming for this model.

If streaming is bypassed, then stream/astream/astream_events will defer to invoke/ainvoke.

  • If True, will always bypass streaming case.

  • If 'tool_calling', will bypass streaming case only when the model is called

    with a tools keyword argument. In other words, LangChain will automatically switch to non-streaming behavior (invoke) only when the tools argument is provided. This offers the best of both worlds.

  • If False (Default), will always use streaming case if available.

The main reason for this flag is that code might be written using stream and a user may want to swap out a given model for another model whose implementation does not properly support streaming.

google_api_key: SecretStr | None

API key for authentication.

If not specified, will check the env vars GOOGLE_API_KEY and GEMINI_API_KEY with precedence given to GOOGLE_API_KEY.

!!! tip "Vertex AI with API key"

You can now use Vertex AI with API key authentication instead of service account credentials. Set GOOGLE_GENAI_USE_VERTEXAI=true or vertexai=True along with your API key and project.

image_config: dict[str, Any] | None

Configuration for image generation.

Provides control over generated image dimensions and quality for image generation models.

See [genai.types.ImageConfig](https://googleapis.github.io/python-genai/genai.html#genai.types.ImageConfig) for a list of supported fields and their values.

!!! note "Model compatibility"

This parameter only applies to image generation models. Supported parameters vary by model and backend (Gemini Developer API and Vertex AI each support different subsets of parameters and models).

See [the docs](https://docs.langchain.com/oss/python/integrations/chat/google_generative_ai#image-generation) for more details and examples.

include_thoughts: bool | None

Indicates whether to include thoughts in the response.

!!! note

This parameter is only applicable for models that support thinking.

This does not disable thinking; to disable thinking for supported models, set thinking_budget to 0. See the thinking_budget parameter for more details.

labels: dict[str, str] | None

User-defined key-value metadata for organizing and filtering billing reports.

Attach labels to categorize API usage by team, environment, or feature.

Can be overridden per-request via invoke kwargs.

See: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/add-labels-to-api-calls

location: str | None

Google Cloud region (Vertex AI only).

If not provided, falls back to the GOOGLE_CLOUD_LOCATION env var, then 'global'.

max_output_tokens: int | None

Maximum number of tokens to include in a candidate.

Must be greater than zero.

If unset, will use the model's default value, which varies by model.

See [docs](https://ai.google.dev/gemini-api/docs/models) for model-specific limits.

To constrain the number of thinking tokens to use when generating a response, see the thinking_budget parameter.

max_retries: int

The maximum number of retries to make when generating.

!!! warning "Disabling retries"

To disable retries, set max_retries=1 (not 0) due to a quirk in the underlying Google SDK. max_retries=0 is interpreted as "use the (Google) default" (5 retries).

Setting max_retries=1 means only the initial request is made with no retries.

!!! warning "Handling rate limits (429 errors)"

When you exceed quota limits, the API returns a 429 error with a suggested retry_delay. The SDK's built-in retry logic ignores this value and uses fixed exponential backoff instead. This is a known issue in Google's SDK and an issue has been [raised upstream](https://github.com/googleapis/python-genai/issues/1875). We plan to implement proper handling once it's supported.

If you need to respect the server's suggested retry delay, disable SDK retries with max_retries=1 and implement custom retry logic:

```python
import re
import time

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_google_genai.chat_models import ChatGoogleGenerativeAIError

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash", max_retries=1)

try:
    response = llm.invoke("Hello")
except ChatGoogleGenerativeAIError as e:
    if "429" in str(e):
        # Parse retry_delay from error: "[retry_delay { seconds: N }]"
        match = re.search(r"retry_delay\s*{\s*seconds:\s*(\d+)", str(e))
        delay = int(match.group(1)) if match else 60
        time.sleep(delay)
        # Retry...
```

media_resolution: MediaResolution | None

Media resolution for the input media.

May be defined at the individual part level, allowing for mixed-resolution requests (e.g., images and videos of different resolutions in the same request).

May be 'low', 'medium', or 'high'.

Can be set either per-part or globally for all media inputs in the request. To set globally, set in the generation_config.

!!! warning "Model compatibility"

Setting per-part media resolution requests to Gemini 2.5 models is not supported.

metadata: dict[str, Any] | None

Metadata to add to the run trace.

model: str

Model name to use.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'ignore', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_kwargs: dict[str, Any]

Holds any unexpected initialization parameters.

n: int

Number of chat completions to generate for each prompt.

Note that the API may not return the full n completions if duplicates are generated.

name: str | None

The name of the Runnable.

Used for debugging and tracing.

output_version: str | None

Version of AIMessage output format to store in message content.

AIMessage.content_blocks will lazily parse the contents of content into a standard format. This flag can be used to additionally store the standard format in message content, e.g., for serialization purposes.

Supported values:

  • 'v0': provider-specific format in content (can lazily-parse with

    content_blocks)

  • 'v1': standardized format in content (consistent with content_blocks)

Partner packages (e.g., [langchain-openai](https://pypi.org/project/langchain-openai)) can also use this field to roll out new content formats in a backward-compatible way.

!!! version-added "Added in langchain-core 1.0.0"

profile: ModelProfile | None

Profile detailing model capabilities.

!!! warning "Beta feature"

This is a beta feature. The format of model profiles is subject to change.

If not specified, automatically loaded from the provider package on initialization if data is available.

Example profile data includes context window sizes, supported modalities, or support for tool calling, structured output, and other features.

!!! version-added "Added in langchain-core 1.1.0"

project: str | None

Google Cloud project ID (Vertex AI only).

Required when using Vertex AI.

Falls back to GOOGLE_CLOUD_PROJECT env var if not provided.

rate_limiter: BaseRateLimiter | None

An optional rate limiter to use for limiting the number of requests.

response_mime_type: str | None

Output response MIME type of the generated candidate text.

Supported MIME types:
  • 'text/plain': (default) Text output.

  • 'application/json': JSON response in the candidates.

  • 'text/x.enum': Enum in plain text. (legacy; use JSON schema output instead)

!!! note

The model also needs to be prompted to output the appropriate response type, otherwise the behavior is undefined.

(In other words, simply setting this param doesn't force the model to comply; it only tells the model the kind of output expected. You still need to prompt it correctly.)

response_modalities: list[Modality] | None

A list of modalities of the response

response_schema: dict[str, Any] | None

Enforce a schema to the output.

The format of the dictionary should follow JSON Schema specification.

!!! note "Schema Transformation"

The Google GenAI SDK automatically transforms schemas for Gemini compatibility:

  • Inlines $defs definitions (enables Union types with anyOf)

  • Resolves $ref pointers for nested/recursive schemas

  • Preserves property ordering

  • Supports constraints like minimum/maximum, minItems/maxItems

!!! tip "Using Union Types"

Union types in Pydantic models (e.g., field: Union[TypeA, TypeB]) are automatically converted to anyOf schemas and work correctly with the json_schema method.

Refer to the Gemini API [docs](https://ai.google.dev/gemini-api/docs/structured-output) for more details on supported JSON Schema features.
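A hedged sketch of constraining output with response_schema together with response_mime_type; the schema fields and the model name below are illustrative only.

```python
from gen_ai_hub.proxy.langchain.google_genai import ChatGoogleGenerativeAI

# Illustrative JSON Schema; adapt the structure to your own use case.
recipe_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "ingredients": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "ingredients"],
}

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",  # placeholder deployment/model name
    response_mime_type="application/json",
    response_schema=recipe_schema,
)
print(llm.invoke("Return a simple pancake recipe as JSON.").content)
```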

safety_settings: SafetySettingDict | None

Default safety settings to use for all generations.

!!! example

```python
from google.genai.types import HarmBlockThreshold, HarmCategory

safety_settings = {
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_ONLY_HIGH,
    HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
}
```

seed: int | None

Seed used in decoding for reproducible generations.

By default, a random number is used.

!!! note

Using the same seed does not guarantee identical outputs, but makes them more deterministic. Reproducibility is "best effort" based on the model and infrastructure.

stop: list[str] | None

Stop sequences for the model.

streaming: bool | None

Whether to stream responses from the model.

tags: list[str] | None

Tags to add to the run trace.

temperature: float

Run inference with this temperature.

Must be within [0.0, 2.0].

!!! note "Automatic override for Gemini 3.0+ models"

If temperature is not explicitly set and the model is Gemini 3.0 or later, it will be automatically set to 1.0 instead of the default 0.7, per Google GenAI API best practices, because lower temperatures can cause infinite loops, degraded reasoning performance, and failures on complex tasks.

thinking_budget: int | None

Indicates the thinking budget in tokens.

Used to disable thinking for supported models (when set to 0) or to constrain the number of tokens used for thinking.

Dynamic thinking (allowing the model to decide how many tokens to use) is enabled when set to -1.

More information, including per-model limits, can be found in the [Gemini API docs](https://ai.google.dev/gemini-api/docs/thinking#set-budget).

thinking_level: Literal['minimal', 'low', 'medium', 'high'] | None

Indicates the thinking level.

Supported values:
  • 'low': Minimizes latency and cost.

  • 'medium': Balances latency/cost with reasoning depth.

  • 'high': Maximizes reasoning depth.

!!! note "Replaces thinking_budget"

thinking_budget is deprecated for Gemini 3+ models. If both parameters are provided, thinking_level takes precedence.

If left unspecified, the model's default thinking level is used. For Gemini 3+, this defaults to 'high'.

timeout: float | None

The maximum number of seconds to wait for a response.

top_k: int | None

Decode using top-k sampling: consider the set of top_k most probable tokens.

Must be positive.

top_p: float | None

Decode using nucleus sampling.

Consider the smallest set of tokens whose probability sum is at least top_p.

Must be within [0.0, 1.0].

verbose: bool

Whether to print out response text.

vertexai: bool | None

Whether to use Vertex AI backend.

If None (default), backend is automatically determined as follows:

  1. If the GOOGLE_GENAI_USE_VERTEXAI env var is set, uses Vertex AI

  2. If the credentials parameter is provided, uses Vertex AI

  3. If the project parameter is provided, uses Vertex AI

  4. Otherwise, uses Gemini Developer API

Set explicitly to True or False to override auto-detection.

!!! tip "Vertex AI with API key"

You can use Vertex AI with API key authentication by setting:

```bash
export GEMINI_API_KEY='your-api-key'
export GOOGLE_GENAI_USE_VERTEXAI=true
export GOOGLE_CLOUD_PROJECT='your-project-id'
```

Or programmatically:

```python
llm = ChatGoogleGenerativeAI(
    model="gemini-3-pro-preview",
    api_key="your-api-key",
    project="your-project-id",
    vertexai=True,
)
```

This allows for simpler authentication compared to service account JSON files.

class GoogleGenerativeAIEmbeddings

Bases: _BaseGoogleGenerativeAI, GoogleGenerativeAIEmbeddings

Drop-in replacement for langchain_google_genai.GoogleGenerativeAIEmbeddings.

additional_headers: dict[str, str] | None

Additional HTTP headers to include in API requests.

base_url: str | None

The base URL to use for the API client.

client: Any

The Google GenAI client instance.

client_args: dict[str, Any] | None

Additional arguments to pass to the underlying HTTP client.

Applied to both sync and async clients.

credentials: Any

Custom credentials for Vertex AI authentication.

When provided, forces Vertex AI backend.

Accepts a google.auth.credentials.Credentials object.

google_api_key: SecretStr | None

The Google API key to use.

If not provided, will check the env vars GOOGLE_API_KEY and GEMINI_API_KEY.

location: str | None

Google Cloud region (Vertex AI only).

Defaults to GOOGLE_CLOUD_LOCATION env var, then 'us-central1'.

model: str

The name of the embedding model to use.

Example: 'gemini-embedding-001'

model_config: ClassVar[ConfigDict] = {'extra': 'allow', 'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

output_dimensionality: int | None

Default output dimensionality for embeddings.

If set, all embed calls use this dimension unless explicitly overridden.

project: str | None

Google Cloud project ID (Vertex AI only).

Falls back to GOOGLE_CLOUD_PROJECT env var if not provided.

request_options: dict | None

A dictionary of request options to pass to the Google API client.

Example: {'timeout': 10}

task_type: str | None

The task type.

Valid options include:

  • 'TASK_TYPE_UNSPECIFIED'

  • 'RETRIEVAL_QUERY'

  • 'RETRIEVAL_DOCUMENT'

  • 'SEMANTIC_SIMILARITY'

  • 'CLASSIFICATION'

  • 'CLUSTERING'

  • 'QUESTION_ANSWERING'

  • 'FACT_VERIFICATION'

  • 'CODE_RETRIEVAL_QUERY'

See [TaskType](https://ai.google.dev/api/embeddings#tasktype) for details.
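A short sketch showing task_type with the example model name from above; whether that exact deployment exists depends on your AI Core setup.

```python
from gen_ai_hub.proxy.langchain.google_genai import GoogleGenerativeAIEmbeddings

doc_embedder = GoogleGenerativeAIEmbeddings(
    model="gemini-embedding-001",
    task_type="RETRIEVAL_DOCUMENT",  # tune embeddings for documents to be retrieved
)
query_embedder = GoogleGenerativeAIEmbeddings(
    model="gemini-embedding-001",
    task_type="RETRIEVAL_QUERY",  # tune embeddings for the search query side
)

doc_vectors = doc_embedder.embed_documents(["LangChain talks to SAP AI Core via the proxy."])
query_vector = query_embedder.embed_query("How does LangChain reach SAP AI Core?")
```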

vertexai: bool | None

Whether to use Vertex AI backend.

If None (default), backend is automatically determined:

  1. If GOOGLE_GENAI_USE_VERTEXAI env var is set, uses that value

  2. If credentials parameter is provided, uses Vertex AI

  3. If project parameter is provided, uses Vertex AI

  4. Otherwise, uses Gemini Developer API

init_chat_model(proxy_client, deployment, temperature=0.0, max_tokens=256, top_k=None, top_p=1.0)

Initialize a ChatGoogleGenerativeAI model with the given parameters.

Parameters:
  • proxy_client (BaseProxyClient) -- proxy client to use for the model

  • deployment (Deployment) -- deployment information for the model

  • temperature (float, optional) -- sampling temperature, defaults to 0.0

  • max_tokens (int, optional) -- maximum number of tokens to generate, defaults to 256

  • top_k (Optional[int], optional) -- k for top-k sampling, defaults to None

  • top_p (float, optional) -- p for nucleus sampling, defaults to 1.0

Returns:

initialized ChatGoogleGenerativeAI model

Return type:

ChatGoogleGenerativeAI

init_embedding_model(proxy_client, deployment)
Parameters:
  • proxy_client (BaseProxyClient) -- proxy client to use for the model

  • deployment (Deployment) -- deployment information for the model

gen_ai_hub.proxy.langchain.init_models module

class Catalog

Bases: object

Catalog for registering and retrieving model deployments.

__init__()
all_embedding_models(proxy_client=None)

Retrieves all registered embedding models for the specified proxy client.

Parameters:

proxy_client (Optional[Union[str, BaseProxyClient]], optional) -- the proxy client to retrieve models for, defaults to None

Raises:

TypeError -- if the proxy client is invalid

Returns:

A dictionary of model names and their corresponding embedding model instances

Return type:

Dict[str, Embeddings]

all_llms(proxy_client=None)

Retrieves all registered language models for the specified proxy client.

Parameters:

proxy_client (Optional[Union[str, BaseProxyClient]], optional) -- the proxy client to retrieve models for, defaults to None

Raises:

TypeError -- if the proxy client is invalid

Returns:

A dictionary of model names and their corresponding language model instances

Return type:

Dict[str, BaseLanguageModel]

register(proxy_client, base_class, *model_names, f_select_deployment=None)

Registers a model deployment in the catalog.

Parameters:
  • proxy_client (Union[str, BaseProxyClient]) -- the proxy client to register the model for

  • base_class (Type[Union[BaseLanguageModel, Embeddings]]) -- the base class of the model (LLM or Embeddings)

  • f_select_deployment (Optional[Callable], optional) -- function to select the deployment, defaults to None

Raises:

TypeError -- if the base class is not supported

Returns:

Decorator function for registering the model

Return type:

Callable

retrieve(proxy_client=None, args=None, kwargs=None, model_type=None)

Retrieves a model deployment from the catalog.

Parameters:
  • proxy_client (Optional[BaseProxyClient], optional) -- the proxy client to use for retrieving the model

  • args (List[str], optional) -- the positional arguments for model identification, defaults to None

  • kwargs (Dict[str, str], optional) -- the keyword arguments for model identification, defaults to None

  • model_type (Union[str, ModelType], optional) -- the type of the model to retrieve, defaults to None

Returns:

The retrieval result containing the proxy client, deployment, and registry entry

Return type:

RetrievalResult

class ModelType

Bases: Enum

EMBEDDINGS = 2
LLM = 1
class RegisterDeployment

Bases: object

Registry entry for a model deployment.

__init__(model, init_func, f_select_deployment=None)
Parameters:
  • model (BaseLanguageModel | Embeddings)

  • init_func (Callable)

  • f_select_deployment (Callable[[BaseProxyClient, Dict[str, str]], BaseDeployment] | None)

Return type:

None

f_select_deployment: Callable[[BaseProxyClient, Dict[str, str]], BaseDeployment] | None = None
init_func: Callable
model: BaseLanguageModel | Embeddings
class RetrievalResult

Bases: object

Result of retrieving a model from the catalog.

__init__(proxy_client, deployment, registry_entry)
Parameters:
  • proxy_client (BaseProxyClient)

  • deployment (BaseDeployment)

  • registry_entry (RegisterDeployment)

Return type:

None

deployment: BaseDeployment
proxy_client: BaseProxyClient
registry_entry: RegisterDeployment
default_f_select_deployment(proxy_client, **model_identification_kwargs)

Default function to select a deployment based on model identification kwargs.

Parameters:
  • proxy_client (BaseProxyClient) -- The proxy client to use for selecting the deployment

  • model_identification_kwargs (Dict[str, str])

Returns:

The selected deployment

Return type:

BaseDeployment

get_model_class(*args, model_type=None, proxy_client=None, **kwargs)

Retrieves the model class for the specified model.

Parameters:
  • model_type (Union[str, ModelType]) -- The type of the model to retrieve (optional)

  • proxy_client (BaseProxyClient) -- The proxy client to use for the model (optional)

Returns:

The model class

Return type:

Union[BaseLanguageModel, Embeddings]

handle_model_args_kwargs(proxy_client, args, kwargs)

Handles model identification arguments and keyword arguments.

Parameters:
  • proxy_client (_type_) -- the proxy client to use for model identification

  • args (List[Any]) -- list of positional arguments

  • kwargs (Dict[str, Any]) -- dictionary of keyword arguments

Raises:

ValueError -- if no model identification argument is provided

Returns:

A tuple containing the model name, model identification kwargs, and remaining kwargs

Return type:

Tuple[str, Dict[str, str], Dict[str, Any]]

init_embedding_model(*args, proxy_client=None, init_func=None, model_id='', **kwargs)

Initializes an embedding model using the specified parameters.

Parameters:
  • proxy_client (BaseProxyClient) -- The proxy client to use for the model (optional)

  • init_func (Callable) -- Function to call for initializing the model, optional

  • model_id (str) -- ID of the Amazon Bedrock model, needed in case a custom Amazon Bedrock model is being initialized (optional)

Returns:

The initialized embedding model

Return type:

Embeddings

init_llm(*args, proxy_client=None, temperature=0.0, max_tokens=256, top_k=None, top_p=1.0, init_func=None, model_id='', **kwargs)

Initializes a language model using the specified parameters.

Parameters:
  • proxy_client (ProxyClient) -- The proxy client to use for the model (optional)

  • temperature (float) -- The temperature parameter for model generation (default: 0.0)

  • max_tokens (int) -- The maximum number of tokens to generate (default: 256)

  • top_k (int) -- The top-k parameter for model generation (optional)

  • top_p (float) -- The top-p parameter for model generation (default: 1.0)

  • init_func (Callable) -- Function to call for initializing the model, optional

  • model_id (str) -- ID of the Amazon Bedrock model, needed in case a custom Amazon Bedrock model is being initialized (optional)

Returns:

The initialized language model

Return type:

BaseLanguageModel

gen_ai_hub.proxy.langchain.openai module

LangChain wrappers for OpenAI models via Generative AI Hub.

class ChatOpenAI

Bases: ProxyOpenAI, ChatOpenAI

ChatOpenAI model using a proxy.

Parameters:
  • ProxyOpenAI (class) -- Base class for OpenAI models using a proxy

  • ChatOpenAI (class) -- ChatOpenAI class from langchain_openai

classmethod validate_environment(values)

Validates the environment.

Parameters:

values (Dict) -- The input values

Raises:

ValueError -- n must be at least 1.

Returns:

The validated values

Return type:

Dict

static __new__(cls, **data)

Initialize the OpenAI object.

Parameters:

data (Any) -- Additional data to initialize the object

Returns:

The initialized OpenAI object

Return type:

OpenAIBase

__init__(*args, **kwargs)

Initialize the ChatOpenAI object.

async_client: Any
cache: BaseCache | bool | None

Whether to cache the response.

  • If True, will use the global cache.

  • If False, will not use a cache

  • If None, will use the global cache if it's set, otherwise no cache.

  • If instance of BaseCache, will use the provided cache.

Caching is not currently supported for streaming methods of models.

callbacks: Callbacks

Callbacks to add to the run trace.

client: Any
config_id: str | None
config_name: str | None
context_management: list[dict[str, Any]] | None

Configuration for [context management](https://developers.openai.com/api/docs/guides/compaction).

custom_get_token_ids: Callable[[str], list[int]] | None

Optional encoder to use for counting tokens.

default_headers: Mapping[str, str] | None
default_query: Mapping[str, object] | None
deployment_id: str | None
disable_streaming: bool | Literal['tool_calling']

Whether to disable streaming for this model.

If streaming is bypassed, then stream/astream/astream_events will defer to invoke/ainvoke.

  • If True, will always bypass streaming case.

  • If 'tool_calling', will bypass streaming case only when the model is called

    with a tools keyword argument. In other words, LangChain will automatically switch to non-streaming behavior (invoke) only when the tools argument is provided. This offers the best of both worlds.

  • If False (Default), will always use streaming case if available.

The main reason for this flag is that code might be written using stream and a user may want to swap out a given model for another model whose implementation does not properly support streaming.

disabled_params: dict[str, Any] | None

Parameters of the OpenAI client or chat.completions endpoint that should be disabled for the given model.

Should be specified as {"param": None | ['val1', 'val2']} where the key is the parameter and the value is either None, meaning that parameter should never be used, or it's a list of disabled values for the parameter.

For example, older models may not support the 'parallel_tool_calls' parameter at all, in which case disabled_params={"parallel_tool_calls": None} can be passed in.

If a parameter is disabled then it will not be used by default in any methods, e.g. in with_structured_output. However, this does not prevent a user from directly passing in the parameter during invocation.

extra_body: Mapping[str, Any] | None

Optional additional JSON properties to include in the request parameters when making requests to OpenAI compatible APIs, such as vLLM, LM Studio, or other providers.

This is the recommended way to pass custom parameters that are specific to your OpenAI-compatible API provider but not part of the standard OpenAI API.

Examples:

  • [LM Studio](https://lmstudio.ai/) TTL parameter: extra_body={"ttl": 300}

  • [vLLM](https://github.com/vllm-project/vllm) custom parameters: extra_body={"use_beam_search": True}

  • Any other provider-specific parameters

!!! warning

Do not use model_kwargs for custom parameters that are not part of the standard OpenAI API, as this will cause errors when making API calls. Use extra_body instead.

frequency_penalty: float | None

Penalizes repeated tokens according to frequency.

http_async_client: Any | None

Optional httpx.AsyncClient.

Only used for async invocations. Must specify http_client as well if you'd like a custom client for sync invocations.

http_client: Any | None

Optional httpx.Client.

Only used for sync invocations. Must specify http_async_client as well if you'd like a custom client for async invocations.

include: list[str] | None

Additional fields to include in generations from Responses API.

Supported values:

  • 'file_search_call.results'

  • 'message.input_image.image_url'

  • 'computer_call_output.output.image_url'

  • 'reasoning.encrypted_content'

  • 'code_interpreter_call.outputs'

!!! version-added "Added in langchain-openai 0.3.24"

include_response_headers: bool

Whether to include response headers in the output message response_metadata.

logit_bias: dict[int, int] | None

Modify the likelihood of specified tokens appearing in the completion.

logprobs: bool | None

Whether to return logprobs.

max_retries: int | None

Maximum number of retries to make when generating.

max_tokens: int | None

Maximum number of tokens to generate.

metadata: dict[str, Any] | None

Metadata to add to the run trace.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_kwargs: dict[str, Any]

Holds any model parameters valid for create call not explicitly specified.

model_name: str | None

Model name to use.

n: int | None

Number of chat completions to generate for each prompt.

name: str | None

The name of the Runnable.

Used for debugging and tracing.

openai_api_base: str | None

Base URL path for API requests, leave blank if not using a proxy or service emulator.

openai_api_key: SecretStr | None | Callable[[], str] | Callable[[], Awaitable[str]]

API key to use.

Can be inferred from the OPENAI_API_KEY environment variable, or specified as a string, or sync or async callable that returns a string.

??? example "Specify with environment variable"

```bash
export OPENAI_API_KEY=...
```

```python
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-5-nano")
```

??? example "Specify with a string"

```python
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-5-nano", api_key="...")
```

??? example "Specify with a sync callable"

```python
from langchain_openai import ChatOpenAI

def get_api_key() -> str:
    # Custom logic to retrieve API key
    return "..."

model = ChatOpenAI(model="gpt-5-nano", api_key=get_api_key)
```

??? example "Specify with an async callable"

```python
from langchain_openai import ChatOpenAI

async def get_api_key() -> str:
    # Custom async logic to retrieve API key
    return "..."

model = ChatOpenAI(model="gpt-5-nano", api_key=get_api_key)
```

openai_api_version: str | None
openai_organization: str | None

Automatically inferred from env var OPENAI_ORG_ID if not provided.

openai_proxy: str | None
output_version: str | None

Version of AIMessage output format to use.

This field is used to roll-out new output formats for chat model AIMessage responses in a backwards-compatible way.

Supported values:

  • 'v0': AIMessage format as of langchain-openai 0.3.x.

  • 'responses/v1': Formats Responses API output items into AIMessage content blocks

    (Responses API only)

  • 'v1': v1 of LangChain cross-provider standard.

!!! warning "Behavior changed in langchain-openai 1.0.0"

Default updated to "responses/v1".
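
A minimal sketch of opting in to a specific output format, assuming the installed langchain-openai version supports it; the model name is a placeholder.

```python
from langchain_openai import ChatOpenAI

# Request the cross-provider v1 message format instead of the default.
model = ChatOpenAI(model="gpt-5-nano", output_version="v1")
msg = model.invoke("Hello")
```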

presence_penalty: float | None

Penalizes repeated tokens.

profile: ModelProfile | None

Profile detailing model capabilities.

!!! warning "Beta feature"

This is a beta feature. The format of model profiles is subject to change.

If not specified, automatically loaded from the provider package on initialization if data is available.

Example profile data includes context window sizes, supported modalities, or support for tool calling, structured output, and other features.

!!! version-added "Added in langchain-core 1.1.0"

proxy_model_name: str | None
rate_limiter: BaseRateLimiter | None

An optional rate limiter to use for limiting the number of requests.

reasoning: dict[str, Any] | None

Reasoning parameters for reasoning models.

For use with the Responses API.

```python
reasoning={
    "effort": "medium",  # Can be "low", "medium", or "high"
    "summary": "auto",   # Can be "auto", "concise", or "detailed"
}
```

!!! version-added "Added in langchain-openai 0.3.24"

reasoning_effort: str | None

Constrains effort on reasoning for reasoning models.

For use with the Chat Completions API. Reasoning models only.

Currently supported values are 'minimal', 'low', 'medium', and 'high'. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
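
A minimal sketch for the Chat Completions API, assuming the deployed model is a reasoning model; the model name is a placeholder.

```python
from langchain_openai import ChatOpenAI

# Lower effort trades reasoning depth for latency and token usage.
model = ChatOpenAI(model="o4-mini", reasoning_effort="low")
msg = model.invoke("What is 17 * 24?")
```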

request_timeout: float | tuple[float, float] | Any | None

Timeout for requests to OpenAI completion API.

Can be float, httpx.Timeout or None.

root_async_client: Any
root_client: Any
seed: int | None

Seed for generation

service_tier: str | None

Latency tier for request.

Options are 'auto', 'default', or 'flex'.

Relevant for users of OpenAI's scale tier service.

stop: list[str] | str | None

Default stop sequences.

store: bool | None

If True, OpenAI may store response data for future use.

Defaults to True for the Responses API and False for the Chat Completions API.

!!! version-added "Added in langchain-openai 0.3.24"

stream_usage: bool | None

Whether to include usage metadata in streaming output.

If enabled, an additional message chunk will be generated during the stream including usage metadata.

This parameter is enabled unless openai_api_base is set or the model is initialized with a custom client, as many chat completions APIs do not support streaming token usage.

!!! version-added "Added in langchain-openai 0.3.9"

!!! warning "Behavior changed in langchain-openai 0.3.35"

Enabled for default base URL and client.
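
A minimal sketch of reading usage metadata from a stream, assuming the default client and base URL; the model name is a placeholder.

```python
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini", stream_usage=True)

aggregate = None
for chunk in model.stream("Hello"):
    # Chunks can be added together; the final aggregate carries usage_metadata.
    aggregate = chunk if aggregate is None else aggregate + chunk

print(aggregate.usage_metadata)
```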

streaming: bool

Whether to stream the results or not.

tags: list[str] | None

Tags to add to the run trace.

temperature: float | None

What sampling temperature to use.

tiktoken_model_name: str | None

The model name to pass to tiktoken when using this class.

Tiktoken is used to count the number of tokens in documents to constrain them to be under a certain limit.

By default, when set to None, this will be the same as the embedding model name. However, there are some cases where you may want to use this Embedding class with a model name not supported by tiktoken. This can include when using Azure embeddings or when using one of the many model providers that expose an OpenAI-like API but with different models. In those cases, in order to avoid erroring when tiktoken is called, you can specify a model name to use here.

top_logprobs: int | None

Number of most likely tokens to return at each token position, each with an associated log probability.

logprobs must be set to true if this parameter is used.
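
A minimal sketch combining logprobs and top_logprobs, assuming the fields are forwarded unchanged; the model name is a placeholder.

```python
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini", logprobs=True, top_logprobs=3)
msg = model.invoke("Say hi")
# Token log probabilities are typically surfaced in the response metadata.
print(msg.response_metadata.get("logprobs"))
```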

top_p: float | None

Total probability mass of tokens to consider at each step.

truncation: str | None

Truncation strategy (Responses API).

Can be 'auto' or 'disabled' (default).

If 'auto', model may drop input items from the middle of the message sequence to fit the context window.

!!! version-added "Added in langchain-openai 0.3.24"

use_previous_response_id: bool

If True, always pass previous_response_id using the ID of the most recent response. Responses API only.

Input messages up to the most recent response will be dropped from request payloads.

For example, the following two are equivalent:

```python
model = ChatOpenAI(
    model="...",
    use_previous_response_id=True,
)
model.invoke(
    [
        HumanMessage("Hello"),
        AIMessage("Hi there!", response_metadata={"id": "resp_123"}),
        HumanMessage("How are you?"),
    ]
)
```

```python
model = ChatOpenAI(model="...", use_responses_api=True)
model.invoke([HumanMessage("How are you?")], previous_response_id="resp_123")
```

!!! version-added "Added in langchain-openai 0.3.26"

use_responses_api: bool | None

Whether to use the Responses API instead of the Chat API.

If not specified then will be inferred based on invocation params.

!!! version-added "Added in langchain-openai 0.3.9"
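
A minimal sketch of forcing the Responses API even when no Responses-only parameter is set; the model name is a placeholder.

```python
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-5-nano", use_responses_api=True)
model.invoke("How are you?")
```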

verbose: bool

Whether to print out response text.

verbosity: str | None

Controls the verbosity level of responses for reasoning models.

For use with the Responses API.

Currently supported values are 'low', 'medium', and 'high'.

!!! version-added "Added in langchain-openai 0.3.28"

class OpenAI

Bases: ProxyOpenAI, OpenAI

OpenAI model using a proxy.

classmethod validate_environment(values)

Validates the environment.

Parameters:

values (Dict) -- The input values

Returns:

The validated values

Return type:

Dict

static __new__(cls, **data)

Initialize the OpenAI object.

Parameters:

data (Any)

__init__(*args, **kwargs)

Initialize the OpenAI object.

allowed_special: Literal['all'] | set[str]

Set of special tokens that are allowed.

async_client: Any
batch_size: int

Batch size to use when passing multiple documents to generate.

best_of: int

Generates best_of completions server-side and returns the "best".

cache: BaseCache | bool | None

Whether to cache the response.

  • If True, will use the global cache.

  • If False, will not use a cache

  • If None, will use the global cache if it's set, otherwise no cache.

  • If instance of BaseCache, will use the provided cache.

Caching is not currently supported for streaming methods of models.
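
A minimal sketch of a per-instance cache, assuming the completion endpoint is available under the placeholder model name.

```python
from langchain_core.caches import InMemoryCache
from langchain_openai import OpenAI

model = OpenAI(model="gpt-3.5-turbo-instruct", cache=InMemoryCache())
model.invoke("2 + 2 =")
model.invoke("2 + 2 =")  # identical prompt, served from the cache
```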

callbacks: Callbacks

Callbacks to add to the run trace.

client: Any
config_id: str | None
config_name: str | None
custom_get_token_ids: Callable[[str], list[int]] | None

Optional encoder to use for counting tokens.

default_headers: Mapping[str, str] | None
default_query: Mapping[str, object] | None
deployment_id: str | None
disallowed_special: Literal['all'] | Collection[str]

Set of special tokens that are not allowed.

extra_body: Mapping[str, Any] | None

Optional additional JSON properties to include in the request parameters when making requests to OpenAI compatible APIs, such as vLLM.

frequency_penalty: float

Penalizes repeated tokens according to frequency.

http_async_client: Any | None

Optional httpx.AsyncClient.

Only used for async invocations. Must specify http_client as well if you'd like a custom client for sync invocations.

http_client: Any | None

Optional httpx.Client.

Only used for sync invocations. Must specify http_async_client as well if you'd like a custom client for async invocations.

logit_bias: dict[str, float] | None

Adjust the probability of specific tokens being generated.

logprobs: int | None

Include the log probabilities on the logprobs most likely output tokens, as well the chosen tokens.

max_retries: int

Maximum number of retries to make when generating.

max_tokens: int

The maximum number of tokens to generate in the completion. -1 returns as many tokens as possible given the prompt and the model's maximal context size.

metadata: dict[str, Any] | None

Metadata to add to the run trace.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_kwargs: dict[str, Any]

Holds any model parameters valid for create call not explicitly specified.

model_name: str | None

Model name to use.

n: int

How many completions to generate for each prompt.

name: str | None

The name of the Runnable.

Used for debugging and tracing.

openai_api_base: str | None

Base URL path for API requests, leave blank if not using a proxy or service emulator.

openai_api_key: SecretStr | None | Callable[[], str]

Automatically inferred from env var OPENAI_API_KEY if not provided.

openai_api_version: str | None
openai_organization: str | None

Automatically inferred from env var OPENAI_ORG_ID if not provided.

openai_proxy: str | None
presence_penalty: float

Penalizes repeated tokens.

proxy_model_name: str | None
request_timeout: float | tuple[float, float] | Any | None

Timeout for requests to OpenAI completion API. Can be float, httpx.Timeout or None.

seed: int | None

Seed for generation

streaming: bool

Whether to stream the results or not.

tags: list[str] | None

Tags to add to the run trace.

temperature: float

What sampling temperature to use.

tiktoken_model_name: str | None

The model name to pass to tiktoken when using this class.

Tiktoken is used to count the number of tokens in documents to constrain them to be under a certain limit.

By default, when set to None, this will be the same as the embedding model name. However, there are some cases where you may want to use this Embedding class with a model name not supported by tiktoken. This can include when using Azure embeddings or when using one of the many model providers that expose an OpenAI-like API but with different models. In those cases, in order to avoid erroring when tiktoken is called, you can specify a model name to use here.

top_p: float

Total probability mass of tokens to consider at each step.

verbose: bool

Whether to print out response text.

class OpenAIEmbeddings

Bases: ProxyOpenAI, OpenAIEmbeddings

OpenAI Embeddings model using a proxy.

classmethod validate_environment(values)

Validates the environment.

Parameters:

values (Dict) -- The input values

Returns:

The validated values

Return type:

Dict

__init__(*args, **kwargs)

Initialize the OpenAIEmbeddings object.

allowed_special: Literal['all'] | set[str] | None
async_client: Any
check_embedding_ctx_length: bool

Whether to check the token length of inputs and automatically split inputs longer than embedding_ctx_length.

Set to False to send raw text strings directly to the API instead of tokenizing. Useful for many non-OpenAI providers (e.g. OpenRouter, Ollama, vLLM).
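
A minimal sketch of sending raw strings to an OpenAI-compatible embeddings endpoint; the base URL, API key, and model name are placeholders.

```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="some-embedding-model",                # placeholder model name
    openai_api_base="http://localhost:8000/v1",  # placeholder endpoint
    api_key="EMPTY",                             # placeholder key
    check_embedding_ctx_length=False,            # skip tokenization, send raw text
)
vectors = embeddings.embed_documents(["hello world"])
```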

chunk_size: int

Maximum number of texts to embed in each batch.

client: Any
config_id: str | None
config_name: str | None
default_headers: Mapping[str, str] | None
default_query: Mapping[str, object] | None
deployment: str | None
deployment_id: str | None
dimensions: int | None

The number of dimensions the resulting output embeddings should have.

Only supported in 'text-embedding-3' and later models.
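
A minimal sketch of requesting shortened vectors, assuming a 'text-embedding-3' family deployment; the model name is a placeholder.

```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large", dimensions=1024)
vector = embeddings.embed_query("hello world")
print(len(vector))  # 1024
```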

disallowed_special: Literal['all'] | set[str] | Sequence[str] | None
embedding_ctx_length: int

The maximum number of tokens to embed at once.

headers: Any
http_async_client: Any | None

Optional httpx.AsyncClient.

Only used for async invocations. Must specify http_client as well if you'd like a custom client for sync invocations.

http_client: Any | None

Optional httpx.Client.

Only used for sync invocations. Must specify http_async_client as well if you'd like a custom client for async invocations.

input_type: str | None
max_retries: int

Maximum number of retries to make when generating.

model: str | None
model_config: ClassVar[ConfigDict] = {'extra': 'allow', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_kwargs: dict[str, Any]

Holds any model parameters valid for create call not explicitly specified.

openai_api_base: str | None

Base URL path for API requests, leave blank if not using a proxy or service emulator.

Automatically inferred from env var OPENAI_API_BASE if not provided.

openai_api_key: SecretStr | None | Callable[[], str] | Callable[[], Awaitable[str]]

API key to use for API calls.

Automatically inferred from env var OPENAI_API_KEY if not provided.

openai_api_type: str | None
openai_api_version: str | None

Version of the OpenAI API to use.

Automatically inferred from env var OPENAI_API_VERSION if not provided.

openai_organization: str | None

OpenAI organization ID to use for API calls.

Automatically inferred from env var OPENAI_ORG_ID if not provided.

openai_proxy: str | None
proxy_model_name: str | None
request_timeout: float | tuple[float, float] | Any | None

Timeout for requests to OpenAI completion API.

Can be float, httpx.Timeout or None.

retry_max_seconds: int

Max number of seconds to wait between retries

retry_min_seconds: int

Min number of seconds to wait between retries

show_progress_bar: bool

Whether to show a progress bar when embedding.

skip_empty: bool

Whether to skip empty strings when embedding or raise an error.

tiktoken_enabled: bool

Set this to False to use HuggingFace transformers tokenization.

For non-OpenAI providers (OpenRouter, Ollama, vLLM, etc.), consider setting check_embedding_ctx_length=False instead, as it bypasses tokenization entirely.

tiktoken_model_name: str | None

The model name to pass to tiktoken when using this class.

Tiktoken is used to count the number of tokens in documents to constrain them to be under a certain limit.

By default, when set to None, this will be the same as the embedding model name. However, there are some cases where you may want to use this Embedding class with a model name not supported by tiktoken. This can include when using Azure embeddings or when using one of the many model providers that expose an OpenAI-like API but with different models. In those cases, in order to avoid erroring when tiktoken is called, you can specify a model name to use here.

class ProxyOpenAI

Bases: BaseAuth

Base class for OpenAI models using a proxy.

Parameters:

BaseAuth (class) -- Base authentication class

Returns:

The ProxyOpenAI class

Return type:

class

classmethod validate_clients(values)

Validate and initialize OpenAI clients.

Parameters:

values (Dict) -- The input values

Returns:

The validated values

Return type:

Dict

config_id: str | None
config_name: str | None
deployment_id: str | None
model_config: ClassVar[ConfigDict] = {'extra': 'allow'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

proxy_model_name: str | None
get_client_params(values)

Get the client parameters.

Parameters:

values (Dict) -- The client values

Returns:

The client values extended with the proxy client

init_chat_model(proxy_client, deployment, temperature=0.0, max_tokens=256, top_k=None, top_p=1.0)

Initialize the ChatOpenAI model.

Parameters:
  • proxy_client (BaseProxyClient) -- the proxy client

  • deployment (BaseDeployment) -- the deployment

  • temperature (float, optional) -- the temperature, defaults to 0.0

  • max_tokens (int, optional) -- the maximum tokens, defaults to 256

  • top_k (Optional[int], optional) -- the top k, defaults to None

  • top_p (float, optional) -- the top p, defaults to 1.0

Returns:

the ChatOpenAI model

Return type:

ChatOpenAI
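
A minimal sketch of calling this factory directly. Obtaining the proxy client and deployment is shown with assumed helpers (get_proxy_client and select_deployment), which may be named differently in your SDK version; the module path and model name are assumptions as well.

```python
from gen_ai_hub.proxy import get_proxy_client                   # assumed helper
from gen_ai_hub.proxy.langchain.openai import init_chat_model   # assumed module path

proxy_client = get_proxy_client("gen-ai-hub")
deployment = proxy_client.select_deployment(model_name="gpt-4o")  # assumed lookup

llm = init_chat_model(proxy_client, deployment, temperature=0.2, max_tokens=512)
print(llm.invoke("Hello").content)
```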

init_embedding_model(proxy_client, deployment)

Initialize the OpenAIEmbeddings model.

Parameters:
  • proxy_client (BaseProxyClient) -- the proxy client

  • deployment (BaseDeployment) -- the deployment

Returns:

the OpenAIEmbeddings model

Return type:

OpenAIEmbeddings