gen_ai_hub.proxy.langchain package

class ChatBedrock

Bases: AICoreBedrockBaseModel, ChatBedrock

Drop-in replacement for LangChain ChatBedrock.

__init__(*args, **kwargs)
Initializes the AICoreBedrockBaseModel with AI Core-specific parameters.

Extends the constructor of the base class with AI Core-specific parameters.

Parameters:
  • model_id (str, optional) -- the model identifier, defaults to ""

  • deployment_id (str, optional) -- the deployment identifier, defaults to ""

  • model_name (str, optional) -- the model name, defaults to ""

  • config_id (str, optional) -- the configuration identifier, defaults to ""

  • config_name (str, optional) -- the configuration name, defaults to ""

  • proxy_client (Optional[BaseProxyClient], optional) -- the proxy client to use, defaults to None
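A minimal usage sketch of the constructor above. The model name is a placeholder for whatever Bedrock-served deployment is available through your proxy; it is not prescribed by this reference.

```python
from gen_ai_hub.proxy.langchain.amazon import ChatBedrock

# "anthropic--claude-3-haiku" is a placeholder deployment/model name; adjust to your landscape.
# A BaseProxyClient can be passed via proxy_client; if omitted, a default client is resolved.
llm = ChatBedrock(model_name="anthropic--claude-3-haiku")

response = llm.invoke("Say hello in one sentence.")
print(response.content)
```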

beta_use_converse_api: bool

Use the new Bedrock converse API which provides a standardized interface to all Bedrock models. Support still in beta. See ChatBedrockConverse docs for more.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

stop_sequences: List[str] | None

Stop sequence inference parameter from new Bedrock converse API providing a sequence of characters that causes a model to stop generating a response. See https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_InferenceConfiguration.html for more.

system_prompt_with_tools: str
class ChatGoogleGenerativeAI

Bases: _BaseGoogleGenerativeAI, ChatGoogleGenerativeAI

Drop-in replacement for langchain_google_genai.ChatGoogleGenerativeAI.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'ignore', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
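A minimal sketch of using the drop-in class through the proxy. The model name is a placeholder for a Gemini deployment known to your AI Core instance, and deployment resolution details depend on your setup; the init_llm helper documented below is an alternative entry point.

```python
from gen_ai_hub.proxy.langchain.google_genai import ChatGoogleGenerativeAI

# "gemini-1.5-flash" is a placeholder; use the model name of your deployment.
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")
print(llm.invoke("Summarize LangChain in one sentence.").content)
```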

class ChatOpenAI

Bases: ProxyOpenAI, ChatOpenAI

ChatOpenAI model using a proxy.

Parameters:
  • ProxyOpenAI (class) -- Base class for OpenAI models using a proxy

  • ChatOpenAI (class) -- ChatOpenAI class from langchain_openai
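A minimal usage sketch. The model name is a placeholder, and the get_proxy_client helper from gen_ai_hub.proxy is an assumption of this example rather than part of this reference; passing no proxy_client falls back to the default client.

```python
from gen_ai_hub.proxy import get_proxy_client  # assumed helper returning a BaseProxyClient
from gen_ai_hub.proxy.langchain.openai import ChatOpenAI

proxy_client = get_proxy_client("gen-ai-hub")

# "gpt-4o" is a placeholder deployment/model name.
llm = ChatOpenAI(proxy_model_name="gpt-4o", proxy_client=proxy_client, temperature=0.0)
print(llm.invoke("Write a haiku about proxies.").content)
```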

classmethod validate_environment(values)

Validates the environment.

Parameters:

values (Dict) -- The input values

Raises:

ValueError -- n must be at least 1.

Returns:

The validated values

Return type:

Dict

static __new__(cls, **data)

Initialize the OpenAI object.

Parameters:

data (Any) -- Additional data to initialize the object

Returns:

The initialized OpenAI object

Return type:

OpenAIBase

__init__(*args, **kwargs)

Initialize the ChatOpenAI object.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_name: str | None

Model name to use.

openai_api_version: str | None
class GoogleGenerativeAIEmbeddings

Bases: _BaseGoogleGenerativeAI, GoogleGenerativeAIEmbeddings

Drop-in replacement for langchain_google_genai.GoogleGenerativeAIEmbeddings.
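A minimal sketch. The model name follows the example given further below in this reference ('gemini-embedding-001'); whether that exact deployment exists depends on your AI Core setup.

```python
from gen_ai_hub.proxy.langchain.google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="gemini-embedding-001")
vector = embeddings.embed_query("What is SAP AI Core?")
print(len(vector))
```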

model_config: ClassVar[ConfigDict] = {'extra': 'allow', 'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class OpenAI

Bases: ProxyOpenAI, OpenAI

OpenAI model using a proxy.

classmethod validate_environment(values)

Validates the environment.

Parameters:

values (Dict) -- The input values

Returns:

The validated values

Return type:

Dict

static __new__(cls, **data)

Initialize the OpenAI object.

Parameters:

data (Any)

__init__(*args, **kwargs)

Initialize the OpenAI object.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_name: str | None

Model name to use.

openai_api_version: str | None
class OpenAIEmbeddings

Bases: ProxyOpenAI, OpenAIEmbeddings

OpenAI Embeddings model using a proxy.
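A minimal sketch, assuming the proxy resolves the deployment from proxy_model_name; the embedding deployment name below is a placeholder.

```python
from gen_ai_hub.proxy.langchain.openai import OpenAIEmbeddings

# "text-embedding-ada-002" is a placeholder embedding deployment name.
embeddings = OpenAIEmbeddings(proxy_model_name="text-embedding-ada-002")

vectors = embeddings.embed_documents(["first document", "second document"])
print(len(vectors), len(vectors[0]))
```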

classmethod validate_environment(values)

Validates the environment.

Parameters:

values (Dict) -- The input values

Returns:

The validated values

Return type:

Dict

__init__(*args, **kwargs)

Initialize the OpenAIEmbeddings object.

chunk_size: int

Maximum number of texts to embed in each batch

input_type: str | None
model: str | None
model_config: ClassVar[ConfigDict] = {'extra': 'allow', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

openai_api_version: str | None

Version of the OpenAI API to use.

Automatically inferred from env var OPENAI_API_VERSION if not provided.

tiktoken_model_name: str | None

The model name to pass to tiktoken when using this class.

Tiktoken is used to count the number of tokens in documents to constrain them to be under a certain limit.

By default, when set to None, this will be the same as the embedding model name. However, there are some cases where you may want to use this Embedding class with a model name not supported by tiktoken. This can include when using Azure embeddings or when using one of the many model providers that expose an OpenAI-like API but with different models. In those cases, in order to avoid erroring when tiktoken is called, you can specify a model name to use here.

init_embedding_model(*args, proxy_client=None, init_func=None, model_id='', **kwargs)

Initializes an embedding model using the specified parameters.

Parameters:
  • proxy_client (BaseProxyClient) -- The proxy client to use for the model (optional)

  • init_func (Callable) -- Function to call for initializing the model, optional

  • model_id (str) -- ID of the Amazon Bedrock model, needed in case a custom Amazon Bedrock model is being initialized (optional)

Returns:

The initialized embedding model

Return type:

Embeddings
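A minimal sketch of this convenience initializer. The positional model name is a placeholder and must match one of your deployments.

```python
from gen_ai_hub.proxy.langchain.init_models import init_embedding_model

# "text-embedding-ada-002" is a placeholder deployment/model name.
embeddings = init_embedding_model("text-embedding-ada-002")
print(embeddings.embed_query("hello world")[:5])
```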

init_llm(*args, proxy_client=None, temperature=0.0, max_tokens=256, top_k=None, top_p=1.0, init_func=None, model_id='', **kwargs)

Initializes a language model using the specified parameters.

Parameters:
  • proxy_client (ProxyClient) -- The proxy client to use for the model (optional)

  • temperature (float) -- The temperature parameter for model generation (default: 0.0)

  • max_tokens (int) -- The maximum number of tokens to generate (default: 256)

  • top_k (int) -- The top-k parameter for model generation (optional)

  • top_p (float) -- The top-p parameter for model generation (default: 1.0)

  • init_func (Callable) -- Function to call for initializing the model, optional

  • model_id (str) -- ID of the Amazon Bedrock model, needed in case a custom Amazon Bedrock model is being initialized (optional)

Returns:

The initialized language model

Return type:

BaseLanguageModel
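A minimal sketch, again with a placeholder model name; any chat or completion deployment known to the proxy can be passed.

```python
from gen_ai_hub.proxy.langchain.init_models import init_llm

# "gpt-4o" is a placeholder deployment/model name.
llm = init_llm("gpt-4o", temperature=0.0, max_tokens=256)
print(llm.invoke("What is Generative AI Hub?").content)
```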

Submodules

gen_ai_hub.proxy.langchain.amazon module

class AICoreBedrockBaseModel

Bases: BaseModel

AICoreBedrockBaseModel provides all adjustments to boto3 based LangChain classes to enable communication with SAP AI Core.

classmethod get_corresponding_model_id(model_name)

Gets the corresponding model ID for a given model name.

Parameters:

model_name (str) -- the model name

Raises:

ValueError -- if the model name is not supported

Returns:

the corresponding model ID

Return type:

str

classmethod validate_environment(values)

Validates and sets up the environment for the model.

Parameters:

values (Dict) -- the input values

Returns:

the validated values

Return type:

Dict

__init__(*args, model_id='', deployment_id='', model_name='', config_id='', config_name='', proxy_client=None, **kwargs)
Initializes the AICoreBedrockBaseModel with AI Core-specific parameters.

Extends the constructor of the base class with AI Core-specific parameters.

Parameters:
  • model_id (str, optional) -- the model identifier, defaults to ""

  • deployment_id (str, optional) -- the deployment identifier, defaults to ""

  • model_name (str, optional) -- the model name, defaults to ""

  • config_id (str, optional) -- the configuration identifier, defaults to ""

  • config_name (str, optional) -- the configuration name, defaults to ""

  • proxy_client (Optional[BaseProxyClient], optional) -- the proxy client to use, defaults to None

model_config: ClassVar[ConfigDict] = {'extra': 'allow'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class BedrockEmbeddings

Bases: AICoreBedrockBaseModel, BedrockEmbeddings

Drop-in replacement for LangChain BedrockEmbeddings.

__init__(*args, **kwargs)
Initializes the AICoreBedrockBaseModel with AI Core-specific parameters.

Extends the constructor of the base class with AI Core-specific parameters.

Parameters:
  • model_id (str, optional) -- the model identifier, defaults to ""

  • deployment_id (str, optional) -- the deployment identifier, defaults to ""

  • model_name (str, optional) -- the model name, defaults to ""

  • config_id (str, optional) -- the configuration identifier, defaults to ""

  • config_name (str, optional) -- the configuration name, defaults to ""

  • proxy_client (Optional[BaseProxyClient], optional) -- the proxy client to use, defaults to None

model_config: ClassVar[ConfigDict] = {'extra': 'allow', 'protected_namespaces': ()}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
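A minimal sketch; the model name is a placeholder for an Amazon Titan embedding deployment and must be adjusted to your landscape.

```python
from gen_ai_hub.proxy.langchain.amazon import BedrockEmbeddings

# "amazon--titan-embed-text" is a placeholder deployment/model name.
embeddings = BedrockEmbeddings(model_name="amazon--titan-embed-text")
print(len(embeddings.embed_query("hello")))
```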

class ChatBedrock

Bases: AICoreBedrockBaseModel, ChatBedrock

Drop-in replacement for LangChain ChatBedrock.

__init__(*args, **kwargs)
Initializes the AICoreBedrockBaseModel with AI Core-specific parameters.

Extends the constructor of the base class with AI Core-specific parameters.

Parameters:
  • model_id (str, optional) -- the model identifier, defaults to ""

  • deployment_id (str, optional) -- the deployment identifier, defaults to ""

  • model_name (str, optional) -- the model name, defaults to ""

  • config_id (str, optional) -- the configuration identifier, defaults to ""

  • config_name (str, optional) -- the configuration name, defaults to ""

  • proxy_client (Optional[BaseProxyClient], optional) -- the proxy client to use, defaults to None

aws_access_key_id: SecretStr | None

AWS access key id.

If provided, aws_secret_access_key must also be provided.

If not specified, the default credential profile or, if on an EC2 instance, credentials from IMDS will be used.

See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html

If not provided, will be read from AWS_ACCESS_KEY_ID environment variable.

aws_secret_access_key: SecretStr | None

AWS secret_access_key.

If provided, aws_access_key_id must also be provided.

If not specified, the default credential profile or, if on an EC2 instance, credentials from IMDS will be used.

See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html

If not provided, will be read from AWS_SECRET_ACCESS_KEY environment variable.

aws_session_token: SecretStr | None

AWS session token.

If provided, aws_access_key_id and aws_secret_access_key must also be provided.

Not required unless using temporary credentials.

See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html

If not provided, will be read from AWS_SESSION_TOKEN environment variable.

base_model_id: str | None

An optional field to pass the base model id. If provided, this will be used over the value of model_id to identify the base model.

bedrock_api_key: SecretStr | None

Bedrock API key.

Enables authentication using Bedrock API keys instead of standard AWS credentials. When provided, the key is set as the AWS_BEARER_TOKEN_BEDROCK environment variable.

See: https://docs.aws.amazon.com/bedrock/latest/userguide/api-keys-use.html

If not provided, will be read from AWS_BEARER_TOKEN_BEDROCK environment variable.

If both an API key and AWS credentials are present, the API key takes precedence.

bedrock_client: Any

The bedrock client for making control plane API calls

beta_use_converse_api: bool

Use the new Bedrock converse API which provides a standardized interface to all Bedrock models. Support still in beta. See ChatBedrockConverse docs for more.

cache: BaseCache | bool | None

Whether to cache the response.

  • If True, will use the global cache.

  • If False, will not use a cache

  • If None, will use the global cache if it's set, otherwise no cache.

  • If instance of BaseCache, will use the provided cache.

Caching is not currently supported for streaming methods of models.

callbacks: Callbacks

Callbacks to add to the run trace.

client: Any

The bedrock runtime client for making data plane API calls

config: Any

An optional botocore.config.Config instance to pass to the client.

credentials_profile_name: str | None

The name of the profile in the ~/.aws/credentials or ~/.aws/config files, which has either access keys or role information specified.

If not specified, the default credential profile or, if on an EC2 instance, credentials from IMDS will be used.

See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html

custom_get_token_ids: Callable[[str], list[int]] | None

Optional encoder to use for counting tokens.

disable_streaming: bool | Literal['tool_calling']

Whether to disable streaming for this model.

If streaming is bypassed, then stream/astream/astream_events will defer to invoke/ainvoke.

  • If True, will always bypass streaming case.

  • If 'tool_calling', will bypass streaming case only when the model is called

    with a tools keyword argument. In other words, LangChain will automatically switch to non-streaming behavior (invoke) only when the tools argument is provided. This offers the best of both worlds.

  • If False (Default), will always use streaming case if available.

The main reason for this flag is that code might be written using stream and a user may want to swap out a given model for another model whose implementation does not properly support streaming.

endpoint_url: str | None

Needed if you don't want to default to the 'us-east-1' endpoint.

guardrails: Mapping[str, Any] | None

An optional dictionary to configure guardrails for Bedrock.

This guardrails field consists of two keys, 'guardrailId' and 'guardrailVersion', whose values should be strings but are initialized to None.

It's used to determine if specific guardrails are enabled and properly set.

Type:

Optional[Mapping[str, str]]: A mapping with 'guardrailId' and 'guardrailVersion' keys.

Example:

```python
llm = BedrockLLM(
    model_id="<model_id>",
    client=<bedrock_client>,
    model_kwargs={},
    guardrails={
        "guardrailId": "<guardrail_id>",
        "guardrailVersion": "<guardrail_version>",
    },
)
```

To enable tracing for guardrails, set the 'trace' key to True and pass a callback handler to the 'run_manager' parameter of the 'generate', '_call' methods.

Example:

```python
llm = BedrockLLM(
    model_id="<model_id>",
    client=<bedrock_client>,
    model_kwargs={},
    guardrails={
        "guardrailId": "<guardrail_id>",
        "guardrailVersion": "<guardrail_version>",
        "trace": True,
    },
    callbacks=[BedrockAsyncCallbackHandler()],
)
```

See https://python.langchain.com/docs/concepts/callbacks/ for more information on callback handlers.

```python
class BedrockAsyncCallbackHandler(AsyncCallbackHandler):
    async def on_llm_error(
        self,
        error: BaseException,
        **kwargs: Any,
    ) -> Any:
        reason = kwargs.get("reason")
        if reason == "GUARDRAIL_INTERVENED":
            ...  # Logic to handle guardrail intervention
```

max_tokens: int | None

Maximum number of tokens to generate.

When using Anthropic models with InvokeModel API, if not set, defaults to 1024.

metadata: dict[str, Any] | None

Metadata to add to the run trace.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_id: str

Id of the model to call, e.g., 'amazon.titan-text-express-v1', this is equivalent to the modelId property in the list-foundation-models api. For custom and provisioned models, an ARN value is expected.

model_kwargs: Dict[str, Any] | None

Keyword arguments to pass to the model.

name: str | None

The name of the Runnable.

Used for debugging and tracing.

output_version: str | None

Version of AIMessage output format to store in message content.

AIMessage.content_blocks will lazily parse the contents of content into a standard format. This flag can be used to additionally store the standard format in message content, e.g., for serialization purposes.

Supported values:

  • 'v0': provider-specific format in content (can lazily-parse with

    content_blocks)

  • 'v1': standardized format in content (consistent with content_blocks)

Partner packages (e.g., [langchain-openai](https://pypi.org/project/langchain-openai)) can also use this field to roll out new content formats in a backward-compatible way.

!!! version-added "Added in langchain-core 1.0.0"

profile: ModelProfile | None

Profile detailing model capabilities.

!!! warning "Beta feature"

This is a beta feature. The format of model profiles is subject to change.

If not specified, automatically loaded from the provider package on initialization if data is available.

Example profile data includes context window sizes, supported modalities, or support for tool calling, structured output, and other features.

!!! version-added "Added in langchain-core 1.1.0"

provider: str | None

The model provider, e.g., 'amazon', 'cohere', 'ai21', etc. When not supplied, provider is extracted from the first part of the model_id e.g. 'amazon' in 'amazon.titan-text-express-v1'. This value should be provided for model IDs that do not have the provider in them, e.g., custom and provisioned models that have an ARN associated with them.

provider_stop_reason_key_map: Mapping[str, str]
provider_stop_sequence_key_name_map: Mapping[str, str]
rate_limiter: BaseRateLimiter | None

An optional rate limiter to use for limiting the number of requests.

region_name: str | None

The AWS region, e.g., us-west-2. Falls back to the AWS_REGION or AWS_DEFAULT_REGION env variable, or the region specified in ~/.aws/config, if not provided here.

service_tier: Literal['priority', 'default', 'flex', 'reserved'] | None

Service tier for model invocation.

Specifies the processing tier type used for serving the request. Supported values are 'priority', 'default', 'flex', and 'reserved'.

  • 'priority': Prioritized processing for lower latency

  • 'default': Standard processing tier

  • 'flex': Flexible processing tier with lower cost

  • 'reserved': Reserved capacity for consistent performance

If not provided, AWS uses the default tier.

stop_sequences: List[str] | None

Stop sequence inference parameter from new Bedrock converse API providing a sequence of characters that causes a model to stop generating a response. See https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_InferenceConfiguration.html for more.

streaming: bool

Whether to stream the results.

system_prompt_with_tools: str
tags: list[str] | None

Tags to add to the run trace.

temperature: float | None
verbose: bool

Whether to print out response text.

class ChatBedrockConverse

Bases: AICoreBedrockBaseModel, ChatBedrockConverse

Drop-in replacement for LangChain ChatBedrockConverse.

__init__(*args, **kwargs)
Initializes the AICoreBedrockBaseModel with AI Core-specific parameters.

Extends the constructor of the base class with AI Core-specific parameters.

Parameters:
  • model_id (str, optional) -- the model identifier, defaults to ""

  • deployment_id (str, optional) -- the deployment identifier, defaults to ""

  • model_name (str, optional) -- the model name, defaults to ""

  • config_id (str, optional) -- the configuration identifier, defaults to ""

  • config_name (str, optional) -- the configuration name, defaults to ""

  • proxy_client (Optional[BaseProxyClient], optional) -- the proxy client to use, defaults to None
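A minimal sketch with a placeholder model name; the Converse wrapper is the natural choice when tool calling or the standardized Converse request/response structure is needed.

```python
from gen_ai_hub.proxy.langchain.amazon import ChatBedrockConverse

# "anthropic--claude-3-sonnet" is a placeholder deployment/model name.
llm = ChatBedrockConverse(model_name="anthropic--claude-3-sonnet", temperature=0.0)
print(llm.invoke("Hello via the Converse API.").content)
```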

extract_model_kwargs_parameters(kwargs)

Extracts specific parameters from model_kwargs and moves them to the top level of kwargs.

Parameters:

kwargs (Dict) -- the input keyword arguments

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

init_chat_converse_model(proxy_client, deployment, temperature=0.0, max_tokens=256, top_k=None, top_p=1.0, stop_sequences=None, model_id='', config=None)

Initializes a chat model using the newer Bedrock Converse API (ChatBedrockConverse). The Converse API offers several advantages over the older Invoke API:

  • Unified interface for different models and modalities.

  • Native support for tool use (function calling).

  • Standardized request/response structure.

Parameters:
  • proxy_client (BaseProxyClient) -- the proxy client to use

  • deployment (Deployment) -- the deployment information

  • temperature (float, optional) -- the temperature for the model, defaults to 0.0

  • max_tokens (int, optional) -- the maximum number of tokens to generate, defaults to 256

  • top_k (Optional[int], optional) -- the top-k sampling parameter, defaults to None

  • top_p (float, optional) -- the top-p sampling parameter, defaults to 1.0

  • stop_sequences (List[str], optional) -- the stop sequences for the model, defaults to None

  • model_id (Optional[str], optional) -- the model identifier, defaults to ''

  • config (Optional[Config], optional) -- the botocore configuration, defaults to None

Returns:

the initialized chat model

Return type:

ChatBedrockConverse

init_chat_model(proxy_client, deployment, temperature=0.0, max_tokens=256, top_k=None, top_p=1.0, stop_sequences=None, model_id='', config=None)

Initializes a chat model using the legacy Bedrock Invoke API (ChatBedrock).

Parameters:
  • proxy_client (BaseProxyClient) -- the proxy client to use

  • deployment (Deployment) -- the deployment information

  • temperature (float, optional) -- the temperature for the model, defaults to 0.0

  • max_tokens (int, optional) -- the maximum number of tokens to generate, defaults to 256

  • top_k (Optional[int], optional) -- the top-k sampling parameter, defaults to None

  • top_p (float, optional) -- the top-p sampling parameter, defaults to 1.0

  • stop_sequences (List[str], optional) -- the stop sequences for the model, defaults to None

  • model_id (Optional[str], optional) -- the model identifier, defaults to ''

  • config (Optional[Config], optional) -- the botocore configuration, defaults to None

Returns:

the initialized chat model

Return type:

ChatBedrock

init_embedding_model(proxy_client, deployment, model_id='')

Initializes an embedding model using BedrockEmbeddings.

Parameters:
  • proxy_client (BaseProxyClient) -- the proxy client to use

  • deployment (Deployment) -- the deployment information

  • model_id (Optional[str], optional) -- the model identifier, defaults to ''

Returns:

the initialized embedding model

Return type:

BedrockEmbeddings

gen_ai_hub.proxy.langchain.base module

class BaseAuth

Bases: BaseModel

Base class for authentication models.

Parameters:

BaseModel (pydantic.BaseModel) -- The base model class to inherit from.

Returns:

An instance of the BaseAuth class.

Return type:

BaseAuth

config_id: str | None
config_name: str | None
deployment_id: str | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

proxy_model_name: str | None

gen_ai_hub.proxy.langchain.google_genai module

Drop-in replacements for langchain_google_genai models with SAP AI Core integration.

class ChatGoogleGenerativeAI

Bases: _BaseGoogleGenerativeAI, ChatGoogleGenerativeAI

Drop-in replacement for langchain_google_genai.ChatGoogleGenerativeAI.

additional_headers: dict[str, str] | None

Additional HTTP headers to include in API requests.

Passed as headers to HttpOptions when creating the client.

!!! example

```python
llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    additional_headers={
        "X-Custom-Header": "value",
    },
)
```

base_url: str | dict | None

Custom base URL for the API client.

If not provided, defaults depend on the API being used:

  • Gemini Developer API (api_key / google_api_key): https://generativelanguage.googleapis.com/

  • Vertex AI (credentials): https://{location}-aiplatform.googleapis.com/

!!! note "Backwards compatibility"

Typed to accept dict to support backwards compatibility for the (now removed) client_options param.

If a dict is passed in, it will only extract the 'api_endpoint' key.

cache: BaseCache | bool | None

Whether to cache the response.

  • If True, will use the global cache.

  • If False, will not use a cache

  • If None, will use the global cache if it's set, otherwise no cache.

  • If instance of BaseCache, will use the provided cache.

Caching is not currently supported for streaming methods of models.

cached_content: str | None

The name of the cached content used as context to serve the prediction.

!!! note

Only used in explicit caching, where users can have control over caching (e.g. what content to cache) and enjoy guaranteed cost savings. Format: cachedContents/{cachedContent}.

callbacks: Callbacks

Callbacks to add to the run trace.

client: Client | None
client_args: dict[str, Any] | None

Additional arguments to pass to the underlying HTTP client.

Applied to both sync and async clients.

!!! example "SOCKS5 proxy"

```python
llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    client_args={"proxy": "socks5://user:pass@host:port"},
)
```

convert_system_message_to_human: bool

Whether to merge any leading SystemMessage into the following HumanMessage.

Gemini does not support system messages; any unsupported messages will raise an error.

credentials: Any

Custom credentials for Vertex AI authentication.

When provided, forces Vertex AI backend (regardless of API key presence in google_api_key/api_key).

Accepts a [google.auth.credentials.Credentials](https://googleapis.dev/python/google-auth/latest/reference/google.auth.credentials.html#google.auth.credentials.Credentials) object.

If omitted and no API key is found, the SDK uses [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials).

!!! example "Service account credentials"

```python
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    "path/to/service-account.json",
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)

llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    credentials=credentials,
    project="my-project-id",
)
```

custom_get_token_ids: Callable[[str], list[int]] | None

Optional encoder to use for counting tokens.

default_metadata: Sequence[tuple[str, str]] | None
disable_streaming: bool | Literal['tool_calling']

Whether to disable streaming for this model.

If streaming is bypassed, then stream/astream/astream_events will defer to invoke/ainvoke.

  • If True, will always bypass streaming case.

  • If 'tool_calling', will bypass streaming case only when the model is called

    with a tools keyword argument. In other words, LangChain will automatically switch to non-streaming behavior (invoke) only when the tools argument is provided. This offers the best of both worlds.

  • If False (Default), will always use streaming case if available.

The main reason for this flag is that code might be written using stream and a user may want to swap out a given model for another model whose implementation does not properly support streaming.

google_api_key: SecretStr | None

API key for authentication.

If not specified, will check the env vars GOOGLE_API_KEY and GEMINI_API_KEY with precedence given to GOOGLE_API_KEY.

!!! tip "Vertex AI with API key"

You can now use Vertex AI with API key authentication instead of service account credentials. Set GOOGLE_GENAI_USE_VERTEXAI=true or vertexai=True along with your API key and project.

image_config: dict[str, Any] | None

Configuration for image generation.

Provides control over generated image dimensions and quality for image generation models.

See [genai.types.ImageConfig](https://googleapis.github.io/python-genai/genai.html#genai.types.ImageConfig) for a list of supported fields and their values.

!!! note "Model compatibility"

This parameter only applies to image generation models. Supported parameters vary by model and backend (Gemini Developer API and Vertex AI each support different subsets of parameters and models).

See [the docs](https://docs.langchain.com/oss/python/integrations/chat/google_generative_ai#image-generation) for more details and examples.

include_thoughts: bool | None

Indicates whether to include thoughts in the response.

!!! note

This parameter is only applicable for models that support thinking.

This does not disable thinking; to disable thinking for supported models, set thinking_budget to 0. See the thinking_budget parameter for more details.

labels: dict[str, str] | None

User-defined key-value metadata for organizing and filtering billing reports.

Attach labels to categorize API usage by team, environment, or feature.

Can be overridden per-request via invoke kwargs.

See: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/add-labels-to-api-calls

location: str | None

Google Cloud region (Vertex AI only).

If not provided, falls back to the GOOGLE_CLOUD_LOCATION env var, then 'global'.

max_output_tokens: int | None

Maximum number of tokens to include in a candidate.

Must be greater than zero.

If unset, will use the model's default value, which varies by model.

See [docs](https://ai.google.dev/gemini-api/docs/models) for model-specific limits.

To constrain the number of thinking tokens to use when generating a response, see the thinking_budget parameter.

max_retries: int

The maximum number of retries to make when generating.

!!! warning "Disabling retries"

To disable retries, set max_retries=1 (not 0) due to a quirk in the underlying Google SDK. max_retries=0 is interpreted as "use the (Google) default" (5 retries).

Setting max_retries=1 means only the initial request is made with no retries.

!!! warning "Handling rate limits (429 errors)"

When you exceed quota limits, the API returns a 429 error with a suggested retry_delay. The SDK's built-in retry logic ignores this value and uses fixed exponential backoff instead. This is a known issue in Google's SDK and an issue has been [raised upstream](https://github.com/googleapis/python-genai/issues/1875). We plan to implement proper handling once it's supported.

If you need to respect the server's suggested retry delay, disable SDK retries with max_retries=1 and implement custom retry logic:

```python
import re
import time

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_google_genai.chat_models import ChatGoogleGenerativeAIError

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash", max_retries=1)

try:
    response = llm.invoke("Hello")
except ChatGoogleGenerativeAIError as e:
    if "429" in str(e):
        # Parse retry_delay from error: "[retry_delay { seconds: N }]"
        match = re.search(r"retry_delay\s*{\s*seconds:\s*(\d+)", str(e))
        delay = int(match.group(1)) if match else 60
        time.sleep(delay)
        # Retry...
```

media_resolution: MediaResolution | None

Media resolution for the input media.

May be defined at the individual part level, allowing for mixed-resolution requests (e.g., images and videos of different resolutions in the same request).

May be 'low', 'medium', or 'high'.

Can be set either per-part or globally for all media inputs in the request. To set globally, set in the generation_config.

!!! warning "Model compatibility"

Setting per-part media resolution requests to Gemini 2.5 models is not supported.

metadata: dict[str, Any] | None

Metadata to add to the run trace.

model: str

Model name to use.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'ignore', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_kwargs: dict[str, Any]

Holds any unexpected initialization parameters.

n: int

Number of chat completions to generate for each prompt.

Note that the API may not return the full n completions if duplicates are generated.

name: str | None

The name of the Runnable.

Used for debugging and tracing.

output_version: str | None

Version of AIMessage output format to store in message content.

AIMessage.content_blocks will lazily parse the contents of content into a standard format. This flag can be used to additionally store the standard format in message content, e.g., for serialization purposes.

Supported values:

  • 'v0': provider-specific format in content (can lazily-parse with

    content_blocks)

  • 'v1': standardized format in content (consistent with content_blocks)

Partner packages (e.g., [langchain-openai](https://pypi.org/project/langchain-openai)) can also use this field to roll out new content formats in a backward-compatible way.

!!! version-added "Added in langchain-core 1.0.0"

profile: ModelProfile | None

Profile detailing model capabilities.

!!! warning "Beta feature"

This is a beta feature. The format of model profiles is subject to change.

If not specified, automatically loaded from the provider package on initialization if data is available.

Example profile data includes context window sizes, supported modalities, or support for tool calling, structured output, and other features.

!!! version-added "Added in langchain-core 1.1.0"

project: str | None

Google Cloud project ID (Vertex AI only).

Required when using Vertex AI.

Falls back to GOOGLE_CLOUD_PROJECT env var if not provided.

rate_limiter: BaseRateLimiter | None

An optional rate limiter to use for limiting the number of requests.

response_mime_type: str | None

Output response MIME type of the generated candidate text.

Supported MIME types:
  • 'text/plain': (default) Text output.

  • 'application/json': JSON response in the candidates.

  • 'text/x.enum': Enum in plain text. (legacy; use JSON schema output instead)

!!! note

The model also needs to be prompted to output the appropriate response type, otherwise the behavior is undefined.

(In other words, simply setting this param doesn't force the model to comply; it only tells the model the kind of output expected. You still need to prompt it correctly.)

response_modalities: list[Modality] | None

A list of modalities of the response

response_schema: dict[str, Any] | None

Enforce a schema to the output.

The format of the dictionary should follow JSON Schema specification.

!!! note "Schema Transformation"

The Google GenAI SDK automatically transforms schemas for Gemini compatibility:

  • Inlines $defs definitions (enables Union types with anyOf)

  • Resolves $ref pointers for nested/recursive schemas

  • Preserves property ordering

  • Supports constraints like minimum/maximum, minItems/maxItems

!!! tip "Using Union Types"

Union types in Pydantic models (e.g., field: Union[TypeA, TypeB]) are automatically converted to anyOf schemas and work correctly with the json_schema method.

Refer to the Gemini API [docs](https://ai.google.dev/gemini-api/docs/structured-output) for more details on supported JSON Schema features.
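A hedged sketch of constraining output with response_schema together with response_mime_type; the schema fields and the model name below are illustrative only.

```python
from gen_ai_hub.proxy.langchain.google_genai import ChatGoogleGenerativeAI

# Illustrative JSON Schema; adapt the structure to your own use case.
recipe_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "ingredients": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "ingredients"],
}

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",  # placeholder deployment/model name
    response_mime_type="application/json",
    response_schema=recipe_schema,
)
print(llm.invoke("Return a simple pancake recipe as JSON.").content)
```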

safety_settings: SafetySettingDict | None

Default safety settings to use for all generations.

!!! example

```python
from google.genai.types import HarmBlockThreshold, HarmCategory

safety_settings = {
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_ONLY_HIGH,
    HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
}
```

seed: int | None

Seed used in decoding for reproducible generations.

By default, a random number is used.

!!! note

Using the same seed does not guarantee identical outputs, but makes them more deterministic. Reproducibility is "best effort" based on the model and infrastructure.

stop: list[str] | None

Stop sequences for the model.

streaming: bool | None

Whether to stream responses from the model.

tags: list[str] | None

Tags to add to the run trace.

temperature: float

Run inference with this temperature.

Must be within [0.0, 2.0].

!!! note "Automatic override for Gemini 3.0+ models"

If temperature is not explicitly set and the model is Gemini 3.0 or later, it will be automatically set to 1.0 instead of the default 0.7, per Google GenAI API best practices, because lower temperatures can cause infinite loops, degraded reasoning performance, and failures on complex tasks.

thinking_budget: int | None

Indicates the thinking budget in tokens.

Used to disable thinking for supported models (when set to 0) or to constrain the number of tokens used for thinking.

Dynamic thinking (allowing the model to decide how many tokens to use) is enabled when set to -1.

More information, including per-model limits, can be found in the [Gemini API docs](https://ai.google.dev/gemini-api/docs/thinking#set-budget).

thinking_level: Literal['minimal', 'low', 'medium', 'high'] | None

Indicates the thinking level.

Supported values:
  • 'low': Minimizes latency and cost.

  • 'medium': Balances latency/cost with reasoning depth.

  • 'high': Maximizes reasoning depth.

!!! note "Replaces thinking_budget"

thinking_budget is deprecated for Gemini 3+ models. If both parameters are provided, thinking_level takes precedence.

If left unspecified, the model's default thinking level is used. For Gemini 3+, this defaults to 'high'.

timeout: float | None

The maximum number of seconds to wait for a response.

top_k: int | None

Decode using top-k sampling: consider the set of top_k most probable tokens.

Must be positive.

top_p: float | None

Decode using nucleus sampling.

Consider the smallest set of tokens whose probability sum is at least top_p.

Must be within [0.0, 1.0].

verbose: bool

Whether to print out response text.

vertexai: bool | None

Whether to use Vertex AI backend.

If None (default), backend is automatically determined as follows:

  1. If the GOOGLE_GENAI_USE_VERTEXAI env var is set, uses Vertex AI

  2. If the credentials parameter is provided, uses Vertex AI

  3. If the project parameter is provided, uses Vertex AI

  4. Otherwise, uses Gemini Developer API

Set explicitly to True or False to override auto-detection.

!!! tip "Vertex AI with API key"

You can use Vertex AI with API key authentication by setting:

```bash
export GEMINI_API_KEY='your-api-key'
export GOOGLE_GENAI_USE_VERTEXAI=true
export GOOGLE_CLOUD_PROJECT='your-project-id'
```

Or programmatically:

```python
llm = ChatGoogleGenerativeAI(
    model="gemini-3-pro-preview",
    api_key="your-api-key",
    project="your-project-id",
    vertexai=True,
)
```

This allows for simpler authentication compared to service account JSON files.

class GoogleGenerativeAIEmbeddings

Bases: _BaseGoogleGenerativeAI, GoogleGenerativeAIEmbeddings

Drop-in replacement for langchain_google_genai.GoogleGenerativeAIEmbeddings.

additional_headers: dict[str, str] | None

Additional HTTP headers to include in API requests.

base_url: str | None

The base URL to use for the API client.

client: Any

The Google GenAI client instance.

client_args: dict[str, Any] | None

Additional arguments to pass to the underlying HTTP client.

Applied to both sync and async clients.

credentials: Any

Custom credentials for Vertex AI authentication.

When provided, forces Vertex AI backend.

Accepts a google.auth.credentials.Credentials object.

google_api_key: SecretStr | None

The Google API key to use.

If not provided, will check the env vars GOOGLE_API_KEY and GEMINI_API_KEY.

location: str | None

Google Cloud region (Vertex AI only).

Defaults to GOOGLE_CLOUD_LOCATION env var, then 'us-central1'.

model: str

The name of the embedding model to use.

Example: 'gemini-embedding-001'

model_config: ClassVar[ConfigDict] = {'extra': 'allow', 'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

output_dimensionality: int | None

Default output dimensionality for embeddings.

If set, all embed calls use this dimension unless explicitly overridden.

project: str | None

Google Cloud project ID (Vertex AI only).

Falls back to GOOGLE_CLOUD_PROJECT env var if not provided.

request_options: dict | None

A dictionary of request options to pass to the Google API client.

Example: {'timeout': 10}

task_type: str | None

The task type.

Valid options include:

  • 'TASK_TYPE_UNSPECIFIED'

  • 'RETRIEVAL_QUERY'

  • 'RETRIEVAL_DOCUMENT'

  • 'SEMANTIC_SIMILARITY'

  • 'CLASSIFICATION'

  • 'CLUSTERING'

  • 'QUESTION_ANSWERING'

  • 'FACT_VERIFICATION'

  • 'CODE_RETRIEVAL_QUERY'

See [TaskType](https://ai.google.dev/api/embeddings#tasktype) for details.
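A short sketch showing task_type with the example model name from above; whether that exact deployment exists depends on your AI Core setup.

```python
from gen_ai_hub.proxy.langchain.google_genai import GoogleGenerativeAIEmbeddings

doc_embedder = GoogleGenerativeAIEmbeddings(
    model="gemini-embedding-001",
    task_type="RETRIEVAL_DOCUMENT",  # tune embeddings for documents to be retrieved
)
query_embedder = GoogleGenerativeAIEmbeddings(
    model="gemini-embedding-001",
    task_type="RETRIEVAL_QUERY",  # tune embeddings for the search query side
)

doc_vectors = doc_embedder.embed_documents(["LangChain talks to SAP AI Core via the proxy."])
query_vector = query_embedder.embed_query("How does LangChain reach SAP AI Core?")
```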

vertexai: bool | None

Whether to use Vertex AI backend.

If None (default), backend is automatically determined:

  1. If GOOGLE_GENAI_USE_VERTEXAI env var is set, uses that value

  2. If credentials parameter is provided, uses Vertex AI

  3. If project parameter is provided, uses Vertex AI

  4. Otherwise, uses Gemini Developer API

init_chat_model(proxy_client, deployment, temperature=0.0, max_tokens=256, top_k=None, top_p=1.0)

Initialize a ChatGoogleGenerativeAI model with the given parameters.

Parameters:
  • proxy_client (BaseProxyClient) -- proxy client to use for the model

  • deployment (Deployment) -- deployment information for the model

  • temperature (float, optional) -- sampling temperature, defaults to 0.0

  • max_tokens (int, optional) -- maximum number of tokens to generate, defaults to 256

  • top_k (Optional[int], optional) -- k for top-k sampling, defaults to None

  • top_p (float, optional) -- p for nucleus sampling, defaults to 1.0

Returns:

initialized ChatGoogleGenerativeAI model

Return type:

ChatGoogleGenerativeAI

init_embedding_model(proxy_client, deployment)
Parameters:
  • proxy_client (BaseProxyClient) -- proxy client to use for the model

  • deployment (Deployment) -- deployment information for the model

gen_ai_hub.proxy.langchain.init_models module

class Catalog

Bases: object

Catalog for registering and retrieving model deployments.

__init__()
all_embedding_models(proxy_client=None)

Retrieves all registered embedding models for the specified proxy client.

Parameters:

proxy_client (Optional[Union[str, BaseProxyClient]], optional) -- the proxy client to retrieve models for, defaults to None

Raises:

TypeError -- if the proxy client is invalid

Returns:

A dictionary of model names and their corresponding embedding model instances

Return type:

Dict[str, Embeddings]

all_llms(proxy_client=None)

Retrieves all registered language models for the specified proxy client.

Parameters:

proxy_client (Optional[Union[str, BaseProxyClient]], optional) -- the proxy client to retrieve models for, defaults to None

Raises:

TypeError -- if the proxy client is invalid

Returns:

A dictionary of model names and their corresponding language model instances

Return type:

Dict[str, BaseLanguageModel]

register(proxy_client, base_class, *model_names, f_select_deployment=None)

Registers a model deployment in the catalog.

Parameters:
  • proxy_client (Union[str, BaseProxyClient]) -- the proxy client to register the model for

  • base_class (Type[Union[BaseLanguageModel, Embeddings]]) -- the base class of the model (LLM or Embeddings)

  • f_select_deployment (Optional[Callable], optional) -- function to select the deployment, defaults to None

Raises:

TypeError -- if the base class is not supported

Returns:

Decorator function for registering the model

Return type:

Callable

retrieve(proxy_client=None, args=None, kwargs=None, model_type=None)

Retrieves a model deployment from the catalog.

Parameters:
  • proxy_client (Optional[BaseProxyClient], optional) -- the proxy client to use for retrieving the model

  • args (List[str], optional) -- the positional arguments for model identification, defaults to None

  • kwargs (Dict[str, str], optional) -- the keyword arguments for model identification, defaults to None

  • model_type (Union[str, ModelType], optional) -- the type of the model to retrieve, defaults to None

Returns:

The retrieval result containing the proxy client, deployment, and registry entry

Return type:

RetrievalResult

class ModelType

Bases: Enum

EMBEDDINGS = 2
LLM = 1
class RegisterDeployment

Bases: object

Registry entry for a model deployment.

__init__(model, init_func, f_select_deployment=None)
Parameters:
  • model (BaseLanguageModel | Embeddings)

  • init_func (Callable)

  • f_select_deployment (Callable[[BaseProxyClient, Dict[str, str]], BaseDeployment] | None)

Return type:

None

f_select_deployment: Callable[[BaseProxyClient, Dict[str, str]], BaseDeployment] | None = None
init_func: Callable
model: BaseLanguageModel | Embeddings
class RetrievalResult

Bases: object

Result of retrieving a model from the catalog.

__init__(proxy_client, deployment, registry_entry)
Parameters:
  • proxy_client (BaseProxyClient)

  • deployment (BaseDeployment)

  • registry_entry (RegisterDeployment)

Return type:

None

deployment: BaseDeployment
proxy_client: BaseProxyClient
registry_entry: RegisterDeployment
default_f_select_deployment(proxy_client, **model_identification_kwargs)

Default function to select a deployment based on model identification kwargs.

Parameters:
  • proxy_client (BaseProxyClient) -- The proxy client to use for selecting the deployment

  • model_identification_kwargs (Dict[str, str])

Returns:

The selected deployment

Return type:

BaseDeployment

get_model_class(*args, model_type=None, proxy_client=None, **kwargs)

Retrieves the model class for the specified model.

Parameters:
  • model_type (Union[str, ModelType]) -- The type of the model to retrieve (optional)

  • proxy_client (BaseProxyClient) -- The proxy client to use for the model (optional)

Returns:

The model class

Return type:

Union[BaseLanguageModel, Embeddings]

handle_model_args_kwargs(proxy_client, args, kwargs)

Handles model identification arguments and keyword arguments.

Parameters:
  • proxy_client (_type_) -- the proxy client to use for model identification

  • args (List[Any]) -- list of positional arguments

  • kwargs (Dict[str, Any]) -- dictionary of keyword arguments

Raises:

ValueError -- if no model identification argument is provided

Returns:

A tuple containing the model name, model identification kwargs, and remaining kwargs

Return type:

Tuple[str, Dict[str, str], Dict[str, Any]]

init_embedding_model(*args, proxy_client=None, init_func=None, model_id='', **kwargs)

Initializes an embedding model using the specified parameters.

Parameters:
  • proxy_client (BaseProxyClient) -- The proxy client to use for the model (optional)

  • init_func (Callable) -- Function to call for initializing the model, optional

  • model_id (str) -- ID of the Amazon Bedrock model, needed in case a custom Amazon Bedrock model is being initialized (optional)

Returns:

The initialized embedding model

Return type:

Embeddings

init_llm(*args, proxy_client=None, temperature=0.0, max_tokens=256, top_k=None, top_p=1.0, init_func=None, model_id='', **kwargs)

Initializes a language model using the specified parameters.

Parameters:
  • proxy_client (ProxyClient) -- The proxy client to use for the model (optional)

  • temperature (float) -- The temperature parameter for model generation (default: 0.0)

  • max_tokens (int) -- The maximum number of tokens to generate (default: 256)

  • top_k (int) -- The top-k parameter for model generation (optional)

  • top_p (float) -- The top-p parameter for model generation (default: 1.0)

  • init_func (Callable) -- Function to call for initializing the model, optional

  • model_id (str) -- ID of the Amazon Bedrock model, needed in case a custom Amazon Bedrock model is being initialized (optional)

Returns:

The initialized language model

Return type:

BaseLanguageModel

gen_ai_hub.proxy.langchain.openai module

LangChain wrappers for OpenAI models via Generative AI Hub.

class ChatOpenAI

Bases: ProxyOpenAI, ChatOpenAI

ChatOpenAI model using a proxy.

Parameters:
  • ProxyOpenAI (class) -- Base class for OpenAI models using a proxy

  • ChatOpenAI (class) -- ChatOpenAI class from langchain_openai

classmethod validate_environment(values)

Validates the environment.

Parameters:

values (Dict) -- The input values

Raises:

ValueError -- n must be at least 1.

Returns:

The validated values

Return type:

Dict

static __new__(cls, **data)

Initialize the OpenAI object.

Parameters:

data (Any) -- Additional data to initialize the object

Returns:

The initialized OpenAI object

Return type:

OpenAIBase

__init__(*args, **kwargs)

Initialize the ChatOpenAI object.

async_client: Any
cache: BaseCache | bool | None

Whether to cache the response.

  • If True, will use the global cache.

  • If False, will not use a cache

  • If None, will use the global cache if it's set, otherwise no cache.

  • If instance of BaseCache, will use the provided cache.

Caching is not currently supported for streaming methods of models.

callbacks: Callbacks

Callbacks to add to the run trace.

client: Any
config_id: str | None
config_name: str | None
context_management: list[dict[str, Any]] | None

Configuration for [context management](https://developers.openai.com/api/docs/guides/compaction).

custom_get_token_ids: Callable[[str], list[int]] | None

Optional encoder to use for counting tokens.

default_headers: Mapping[str, str] | None
default_query: Mapping[str, object] | None
deployment_id: str | None
disable_streaming: bool | Literal['tool_calling']

Whether to disable streaming for this model.

If streaming is bypassed, then stream/astream/astream_events will defer to invoke/ainvoke.

  • If True, will always bypass streaming case.

  • If 'tool_calling', will bypass streaming case only when the model is called

    with a tools keyword argument. In other words, LangChain will automatically switch to non-streaming behavior (invoke) only when the tools argument is provided. This offers the best of both worlds.

  • If False (Default), will always use streaming case if available.

The main reason for this flag is that code might be written using stream and a user may want to swap out a given model for another model whose implementation does not properly support streaming.

disabled_params: dict[str, Any] | None

Parameters of the OpenAI client or chat.completions endpoint that should be disabled for the given model.

Should be specified as {"param": None | ['val1', 'val2']} where the key is the parameter and the value is either None, meaning that parameter should never be used, or it's a list of disabled values for the parameter.

For example, older models may not support the 'parallel_tool_calls' parameter at all, in which case disabled_params={"parallel_tool_calls": None} can be passed in.

If a parameter is disabled then it will not be used by default in any methods, e.g. in with_structured_output. However, this does not prevent a user from directly passing in the parameter during invocation.

extra_body: Mapping[str, Any] | None

Optional additional JSON properties to include in the request parameters when making requests to OpenAI compatible APIs, such as vLLM, LM Studio, or other providers.

This is the recommended way to pass custom parameters that are specific to your OpenAI-compatible API provider but not part of the standard OpenAI API.

Examples:

  • [LM Studio](https://lmstudio.ai/) TTL parameter: extra_body={"ttl": 300}

  • [vLLM](https://github.com/vllm-project/vllm) custom parameters: extra_body={"use_beam_search": True}

  • Any other provider-specific parameters

!!! warning

Do not use model_kwargs for custom parameters that are not part of the standard OpenAI API, as this will cause errors when making API calls. Use extra_body instead.

frequency_penalty: float | None

Penalizes repeated tokens according to frequency.

http_async_client: Any | None

Optional httpx.AsyncClient.

Only used for async invocations. Must specify http_client as well if you'd like a custom client for sync invocations.

http_client: Any | None

Optional httpx.Client.

Only used for sync invocations. Must specify http_async_client as well if you'd like a custom client for async invocations.

include: list[str] | None

Additional fields to include in generations from Responses API.

Supported values:

  • 'file_search_call.results'

  • 'message.input_image.image_url'

  • 'computer_call_output.output.image_url'

  • 'reasoning.encrypted_content'

  • 'code_interpreter_call.outputs'

!!! version-added "Added in langchain-openai 0.3.24"

include_response_headers: bool

Whether to include response headers in the output message response_metadata.

logit_bias: dict[int, int] | None

Modify the likelihood of specified tokens appearing in the completion.

logprobs: bool | None

Whether to return logprobs.

max_retries: int | None

Maximum number of retries to make when generating.

max_tokens: int | None

Maximum number of tokens to generate.

metadata: dict[str, Any] | None

Metadata to add to the run trace.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_kwargs: dict[str, Any]

Holds any model parameters valid for create call not explicitly specified.

model_name: str | None

Model name to use.

n: int | None

Number of chat completions to generate for each prompt.

name: str | None

The name of the Runnable.

Used for debugging and tracing.

openai_api_base: str | None

Base URL path for API requests, leave blank if not using a proxy or service emulator.

openai_api_key: SecretStr | None | Callable[[], str] | Callable[[], Awaitable[str]]

API key to use.

Can be inferred from the OPENAI_API_KEY environment variable, or specified as a string, or sync or async callable that returns a string.

??? example "Specify with environment variable"

```bash
export OPENAI_API_KEY=...
```

```python
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-5-nano")
```

??? example "Specify with a string"

```python
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-5-nano", api_key="...")
```

??? example "Specify with a sync callable"

```python
from langchain_openai import ChatOpenAI

def get_api_key() -> str:
    # Custom logic to retrieve API key
    return "..."

model = ChatOpenAI(model="gpt-5-nano", api_key=get_api_key)
```

??? example "Specify with an async callable"

```python
from langchain_openai import ChatOpenAI

async def get_api_key() -> str:
    # Custom async logic to retrieve API key
    return "..."

model = ChatOpenAI(model="gpt-5-nano", api_key=get_api_key)
```

openai_api_version: str | None
openai_organization: str | None

Automatically inferred from env var OPENAI_ORG_ID if not provided.

openai_proxy: str | None
output_version: str | None

Version of AIMessage output format to use.

This field is used to roll-out new output formats for chat model AIMessage responses in a backwards-compatible way.

Supported values:

  • 'v0': AIMessage format as of langchain-openai 0.3.x.

  • 'responses/v1': Formats Responses API output items into AIMessage content blocks

    (Responses API only)

  • 'v1': v1 of LangChain cross-provider standard.

!!! warning "Behavior changed in langchain-openai 1.0.0"

Default updated to "responses/v1".
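
A minimal sketch of opting in to a specific output format, assuming the installed langchain-openai version supports it; the model name is a placeholder.

```python
from langchain_openai import ChatOpenAI

# Request the cross-provider v1 message format instead of the default.
model = ChatOpenAI(model="gpt-5-nano", output_version="v1")
msg = model.invoke("Hello")
```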

presence_penalty: float | None

Penalizes repeated tokens.

profile: ModelProfile | None

Profile detailing model capabilities.

!!! warning "Beta feature"

This is a beta feature. The format of model profiles is subject to change.

If not specified, automatically loaded from the provider package on initialization if data is available.

Example profile data includes context window sizes, supported modalities, or support for tool calling, structured output, and other features.

!!! version-added "Added in langchain-core 1.1.0"

proxy_model_name: str | None
rate_limiter: BaseRateLimiter | None

An optional rate limiter to use for limiting the number of requests.

reasoning: dict[str, Any] | None

Reasoning parameters for reasoning models.

For use with the Responses API.

```python
reasoning={
    "effort": "medium",  # Can be "low", "medium", or "high"
    "summary": "auto",   # Can be "auto", "concise", or "detailed"
}
```

!!! version-added "Added in langchain-openai 0.3.24"

reasoning_effort: str | None

Constrains effort on reasoning for reasoning models.

For use with the Chat Completions API. Reasoning models only.

Currently supported values are 'minimal', 'low', 'medium', and 'high'. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
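
A minimal sketch for the Chat Completions API, assuming the deployed model is a reasoning model; the model name is a placeholder.

```python
from langchain_openai import ChatOpenAI

# Lower effort trades reasoning depth for latency and token usage.
model = ChatOpenAI(model="o4-mini", reasoning_effort="low")
msg = model.invoke("What is 17 * 24?")
```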

request_timeout: float | tuple[float, float] | Any | None

Timeout for requests to OpenAI completion API.

Can be float, httpx.Timeout or None.

root_async_client: Any
root_client: Any
seed: int | None

Seed for generation

service_tier: str | None

Latency tier for request.

Options are 'auto', 'default', or 'flex'.

Relevant for users of OpenAI's scale tier service.

stop: list[str] | str | None

Default stop sequences.

store: bool | None

If True, OpenAI may store response data for future use.

Defaults to True for the Responses API and False for the Chat Completions API.

!!! version-added "Added in langchain-openai 0.3.24"

stream_usage: bool | None

Whether to include usage metadata in streaming output.

If enabled, an additional message chunk will be generated during the stream including usage metadata.

This parameter is enabled unless openai_api_base is set or the model is initialized with a custom client, as many chat completions APIs do not support streaming token usage.

!!! version-added "Added in langchain-openai 0.3.9"

!!! warning "Behavior changed in langchain-openai 0.3.35"

Enabled for default base URL and client.
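
A minimal sketch of reading usage metadata from a stream, assuming the default client and base URL; the model name is a placeholder.

```python
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini", stream_usage=True)

aggregate = None
for chunk in model.stream("Hello"):
    # Chunks can be added together; the final aggregate carries usage_metadata.
    aggregate = chunk if aggregate is None else aggregate + chunk

print(aggregate.usage_metadata)
```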

streaming: bool

Whether to stream the results or not.

tags: list[str] | None

Tags to add to the run trace.

temperature: float | None

What sampling temperature to use.

tiktoken_model_name: str | None

The model name to pass to tiktoken when using this class.

Tiktoken is used to count the number of tokens in documents to constrain them to be under a certain limit.

By default, when set to None, this will be the same as the embedding model name. However, there are some cases where you may want to use this Embedding class with a model name not supported by tiktoken. This can include when using Azure embeddings or when using one of the many model providers that expose an OpenAI-like API but with different models. In those cases, in order to avoid erroring when tiktoken is called, you can specify a model name to use here.

top_logprobs: int | None

Number of most likely tokens to return at each token position, each with an associated log probability.

logprobs must be set to true if this parameter is used.
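
A minimal sketch combining logprobs and top_logprobs, assuming the fields are forwarded unchanged; the model name is a placeholder.

```python
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini", logprobs=True, top_logprobs=3)
msg = model.invoke("Say hi")
# Token log probabilities are typically surfaced in the response metadata.
print(msg.response_metadata.get("logprobs"))
```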

top_p: float | None

Total probability mass of tokens to consider at each step.

truncation: str | None

Truncation strategy (Responses API).

Can be 'auto' or 'disabled' (default).

If 'auto', model may drop input items from the middle of the message sequence to fit the context window.

!!! version-added "Added in langchain-openai 0.3.24"

use_previous_response_id: bool

If True, always pass previous_response_id using the ID of the most recent response. Responses API only.

Input messages up to the most recent response will be dropped from request payloads.

For example, the following two are equivalent:

```python
model = ChatOpenAI(
    model="...",
    use_previous_response_id=True,
)
model.invoke(
    [
        HumanMessage("Hello"),
        AIMessage("Hi there!", response_metadata={"id": "resp_123"}),
        HumanMessage("How are you?"),
    ]
)
```

```python
model = ChatOpenAI(model="...", use_responses_api=True)
model.invoke([HumanMessage("How are you?")], previous_response_id="resp_123")
```

!!! version-added "Added in langchain-openai 0.3.26"

use_responses_api: bool | None

Whether to use the Responses API instead of the Chat API.

If not specified then will be inferred based on invocation params.

!!! version-added "Added in langchain-openai 0.3.9"
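
A minimal sketch of forcing the Responses API even when no Responses-only parameter is set; the model name is a placeholder.

```python
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-5-nano", use_responses_api=True)
model.invoke("How are you?")
```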

verbose: bool

Whether to print out response text.

verbosity: str | None

Controls the verbosity level of responses for reasoning models.

For use with the Responses API.

Currently supported values are 'low', 'medium', and 'high'.

!!! version-added "Added in langchain-openai 0.3.28"

class OpenAI

Bases: ProxyOpenAI, OpenAI

OpenAI model using a proxy.

classmethod validate_environment(values)

Validates the environment.

Parameters:

values (Dict) -- The input values

Returns:

The validated values

Return type:

Dict

static __new__(cls, **data)

Initialize the OpenAI object.

Parameters:

data (Any)

__init__(*args, **kwargs)

Initialize the OpenAI object.

allowed_special: Literal['all'] | set[str]

Set of special tokens that are allowed.

async_client: Any
batch_size: int

Batch size to use when passing multiple documents to generate.

best_of: int

Generates best_of completions server-side and returns the "best".

cache: BaseCache | bool | None

Whether to cache the response.

  • If True, will use the global cache.

  • If False, will not use a cache

  • If None, will use the global cache if it's set, otherwise no cache.

  • If instance of BaseCache, will use the provided cache.

Caching is not currently supported for streaming methods of models.
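
A minimal sketch of a per-instance cache, assuming the completion endpoint is available under the placeholder model name.

```python
from langchain_core.caches import InMemoryCache
from langchain_openai import OpenAI

model = OpenAI(model="gpt-3.5-turbo-instruct", cache=InMemoryCache())
model.invoke("2 + 2 =")
model.invoke("2 + 2 =")  # identical prompt, served from the cache
```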

callbacks: Callbacks

Callbacks to add to the run trace.

client: Any
config_id: str | None
config_name: str | None
custom_get_token_ids: Callable[[str], list[int]] | None

Optional encoder to use for counting tokens.

default_headers: Mapping[str, str] | None
default_query: Mapping[str, object] | None
deployment_id: str | None
disallowed_special: Literal['all'] | Collection[str]

Set of special tokens that are not allowed.

extra_body: Mapping[str, Any] | None

Optional additional JSON properties to include in the request parameters when making requests to OpenAI compatible APIs, such as vLLM.

frequency_penalty: float

Penalizes repeated tokens according to frequency.

http_async_client: Any | None

Optional httpx.AsyncClient.

Only used for async invocations. Must specify http_client as well if you'd like a custom client for sync invocations.

http_client: Any | None

Optional httpx.Client.

Only used for sync invocations. Must specify http_async_client as well if you'd like a custom client for async invocations.

logit_bias: dict[str, float] | None

Adjust the probability of specific tokens being generated.

logprobs: int | None

Include the log probabilities on the logprobs most likely output tokens, as well the chosen tokens.

max_retries: int

Maximum number of retries to make when generating.

max_tokens: int

The maximum number of tokens to generate in the completion. -1 returns as many tokens as possible given the prompt and the model's maximal context size.

metadata: dict[str, Any] | None

Metadata to add to the run trace.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_kwargs: dict[str, Any]

Holds any model parameters valid for create call not explicitly specified.

model_name: str | None

Model name to use.

n: int

How many completions to generate for each prompt.

name: str | None

The name of the Runnable.

Used for debugging and tracing.

openai_api_base: str | None

Base URL path for API requests, leave blank if not using a proxy or service emulator.

openai_api_key: SecretStr | None | Callable[[], str]

Automatically inferred from env var OPENAI_API_KEY if not provided.

openai_api_version: str | None
openai_organization: str | None

Automatically inferred from env var OPENAI_ORG_ID if not provided.

openai_proxy: str | None
presence_penalty: float

Penalizes repeated tokens.

proxy_model_name: str | None
request_timeout: float | tuple[float, float] | Any | None

Timeout for requests to OpenAI completion API. Can be float, httpx.Timeout or None.

seed: int | None

Seed for generation

streaming: bool

Whether to stream the results or not.

tags: list[str] | None

Tags to add to the run trace.

temperature: float

What sampling temperature to use.

tiktoken_model_name: str | None

The model name to pass to tiktoken when using this class.

Tiktoken is used to count the number of tokens in documents to constrain them to be under a certain limit.

By default, when set to None, this will be the same as the embedding model name. However, there are some cases where you may want to use this Embedding class with a model name not supported by tiktoken. This can include when using Azure embeddings or when using one of the many model providers that expose an OpenAI-like API but with different models. In those cases, in order to avoid erroring when tiktoken is called, you can specify a model name to use here.

top_p: float

Total probability mass of tokens to consider at each step.

verbose: bool

Whether to print out response text.

class OpenAIEmbeddings

Bases: ProxyOpenAI, OpenAIEmbeddings

OpenAI Embeddings model using a proxy.

classmethod validate_environment(values)

Validates the environment.

Parameters:

values (Dict) -- The input values

Returns:

The validated values

Return type:

Dict

__init__(*args, **kwargs)

Initialize the OpenAIEmbeddings object.

allowed_special: Literal['all'] | set[str] | None
async_client: Any
check_embedding_ctx_length: bool

Whether to check the token length of inputs and automatically split inputs longer than embedding_ctx_length.

Set to False to send raw text strings directly to the API instead of tokenizing. Useful for many non-OpenAI providers (e.g. OpenRouter, Ollama, vLLM).
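
A minimal sketch of sending raw strings to an OpenAI-compatible embeddings endpoint; the base URL, API key, and model name are placeholders.

```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="some-embedding-model",                # placeholder model name
    openai_api_base="http://localhost:8000/v1",  # placeholder endpoint
    api_key="EMPTY",                             # placeholder key
    check_embedding_ctx_length=False,            # skip tokenization, send raw text
)
vectors = embeddings.embed_documents(["hello world"])
```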

chunk_size: int

Maximum number of texts to embed in each batch.

client: Any
config_id: str | None
config_name: str | None
default_headers: Mapping[str, str] | None
default_query: Mapping[str, object] | None
deployment: str | None
deployment_id: str | None
dimensions: int | None

The number of dimensions the resulting output embeddings should have.

Only supported in 'text-embedding-3' and later models.
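
A minimal sketch of requesting shortened vectors, assuming a 'text-embedding-3' family deployment; the model name is a placeholder.

```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large", dimensions=1024)
vector = embeddings.embed_query("hello world")
print(len(vector))  # 1024
```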

disallowed_special: Literal['all'] | set[str] | Sequence[str] | None
embedding_ctx_length: int

The maximum number of tokens to embed at once.

headers: Any
http_async_client: Any | None

Optional httpx.AsyncClient.

Only used for async invocations. Must specify http_client as well if you'd like a custom client for sync invocations.

http_client: Any | None

Optional httpx.Client.

Only used for sync invocations. Must specify http_async_client as well if you'd like a custom client for async invocations.

input_type: str | None
max_retries: int

Maximum number of retries to make when generating.

model: str | None
model_config: ClassVar[ConfigDict] = {'extra': 'allow', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_kwargs: dict[str, Any]

Holds any model parameters valid for create call not explicitly specified.

openai_api_base: str | None

Base URL path for API requests, leave blank if not using a proxy or service emulator.

Automatically inferred from env var OPENAI_API_BASE if not provided.

openai_api_key: SecretStr | None | Callable[[], str] | Callable[[], Awaitable[str]]

API key to use for API calls.

Automatically inferred from env var OPENAI_API_KEY if not provided.

openai_api_type: str | None
openai_api_version: str | None

Version of the OpenAI API to use.

Automatically inferred from env var OPENAI_API_VERSION if not provided.

openai_organization: str | None

OpenAI organization ID to use for API calls.

Automatically inferred from env var OPENAI_ORG_ID if not provided.

openai_proxy: str | None
proxy_model_name: str | None
request_timeout: float | tuple[float, float] | Any | None

Timeout for requests to OpenAI completion API.

Can be float, httpx.Timeout or None.

retry_max_seconds: int

Max number of seconds to wait between retries

retry_min_seconds: int

Min number of seconds to wait between retries

show_progress_bar: bool

Whether to show a progress bar when embedding.

skip_empty: bool

Whether to skip empty strings when embedding or raise an error.

tiktoken_enabled: bool

Set this to False to use HuggingFace transformers tokenization.

For non-OpenAI providers (OpenRouter, Ollama, vLLM, etc.), consider setting check_embedding_ctx_length=False instead, as it bypasses tokenization entirely.

tiktoken_model_name: str | None

The model name to pass to tiktoken when using this class.

Tiktoken is used to count the number of tokens in documents to constrain them to be under a certain limit.

By default, when set to None, this will be the same as the embedding model name. However, there are some cases where you may want to use this Embedding class with a model name not supported by tiktoken. This can include when using Azure embeddings or when using one of the many model providers that expose an OpenAI-like API but with different models. In those cases, in order to avoid erroring when tiktoken is called, you can specify a model name to use here.

class ProxyOpenAI

Bases: BaseAuth

Base class for OpenAI models using a proxy.

Parameters:

BaseAuth (class) -- Base authentication class

Returns:

The ProxyOpenAI class

Return type:

class

classmethod validate_clients(values)

Validate and initialize OpenAI clients.

Parameters:

values (Dict) -- The input values

Returns:

The validated values

Return type:

Dict

config_id: str | None
config_name: str | None
deployment_id: str | None
model_config: ClassVar[ConfigDict] = {'extra': 'allow'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

proxy_model_name: str | None
get_client_params(values)

Get the client parameters.

Parameters:

values (Dict) -- The client values

Returns:

The client values extended with the proxy client

init_chat_model(proxy_client, deployment, temperature=0.0, max_tokens=256, top_k=None, top_p=1.0)

Initialize the ChatOpenAI model.

Parameters:
  • proxy_client (BaseProxyClient) -- the proxy client

  • deployment (BaseDeployment) -- the deployment

  • temperature (float, optional) -- the temperature, defaults to 0.0

  • max_tokens (int, optional) -- the maximum tokens, defaults to 256

  • top_k (Optional[int], optional) -- the top k, defaults to None

  • top_p (float, optional) -- the top p, defaults to 1.0

Returns:

the ChatOpenAI model

Return type:

ChatOpenAI
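
A minimal sketch of calling this factory directly. Obtaining the proxy client and deployment is shown with assumed helpers (get_proxy_client and select_deployment), which may be named differently in your SDK version; the module path and model name are assumptions as well.

```python
from gen_ai_hub.proxy import get_proxy_client                   # assumed helper
from gen_ai_hub.proxy.langchain.openai import init_chat_model   # assumed module path

proxy_client = get_proxy_client("gen-ai-hub")
deployment = proxy_client.select_deployment(model_name="gpt-4o")  # assumed lookup

llm = init_chat_model(proxy_client, deployment, temperature=0.2, max_tokens=512)
print(llm.invoke("Hello").content)
```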

init_embedding_model(proxy_client, deployment)

Initialize the OpenAIEmbeddings model.

Parameters:
  • proxy_client (BaseProxyClient) -- the proxy client

  • deployment (BaseDeployment) -- the deployment

Returns:

the OpenAIEmbeddings model

Return type:

OpenAIEmbeddings