gen_ai_hub.proxy.langchain package
- class ChatBedrock
Bases: AICoreBedrockBaseModel, ChatBedrock
Drop-in replacement for LangChain ChatBedrock.
- __init__(*args, **kwargs)
- Initializes the AICoreBedrockBaseModel with AI Core-specific parameters.
Extends the constructor of the base class with AI Core-specific parameters.
- Parameters:
model_id (str, optional) -- the model identifier, defaults to ""
deployment_id (str, optional) -- the deployment identifier, defaults to ""
model_name (str, optional) -- the model name, defaults to ""
config_id (str, optional) -- the configuration identifier, defaults to ""
config_name (str, optional) -- the configuration name, defaults to ""
proxy_client (Optional[BaseProxyClient], optional) -- the proxy client to use, defaults to None
- beta_use_converse_api: bool
Use the new Bedrock converse API which provides a standardized interface to all Bedrock models. Support still in beta. See ChatBedrockConverse docs for more.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- stop_sequences: List[str] | None
Stop sequence inference parameter from new Bedrock converse API providing a sequence of characters that causes a model to stop generating a response. See https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_InferenceConfiguration.html for more.
- system_prompt_with_tools: str
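Because the class is a drop-in replacement, usage mirrors LangChain's ChatBedrock, with deployment resolution handled by the proxy. A minimal sketch, assuming an AI Core deployment for a Claude model and the get_proxy_client helper from gen_ai_hub.proxy; the model name is a placeholder:
```python
# Sketch: chat with a Bedrock model through SAP AI Core.
# 'anthropic--claude-3-sonnet' is a placeholder deployment name.
from gen_ai_hub.proxy import get_proxy_client
from gen_ai_hub.proxy.langchain import ChatBedrock

proxy_client = get_proxy_client("gen-ai-hub")  # AI Core credentials come from the environment
llm = ChatBedrock(model_name="anthropic--claude-3-sonnet", proxy_client=proxy_client)
print(llm.invoke("What is SAP AI Core?").content)
```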
- class ChatGoogleGenerativeAI
Bases: _BaseGoogleGenerativeAI, ChatGoogleGenerativeAI
Drop-in replacement for langchain_google_genai.ChatGoogleGenerativeAI.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'ignore', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class ChatOpenAI
Bases: ProxyOpenAI, ChatOpenAI
ChatOpenAI model using a proxy.
- Parameters:
ProxyOpenAI (class) -- Base class for OpenAI models using a proxy
ChatOpenAI (class) -- ChatOpenAI class from langchain_openai
- classmethod validate_environment(values)
Validates the environment.
- Parameters:
values (Dict) -- The input values
- Raises:
ValueError -- n must be at least 1.
- Returns:
The validated values
- Return type:
Dict
- static __new__(cls, **data)
Initialize the OpenAI object.
- Parameters:
data (Any) -- Additional data to initialize the object
- Returns:
The initialized OpenAI object
- Return type:
OpenAIBase
- __init__(*args, **kwargs)
Initialize the ChatOpenAI object.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_name: str | None
Model name to use.
- openai_api_version: str | None
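A minimal usage sketch; the deployment is resolved by model name through the proxy, and 'gpt-4o' is a placeholder for whatever OpenAI model is deployed in your tenant:
```python
# Sketch: ChatOpenAI routed through the Generative AI Hub proxy.
from gen_ai_hub.proxy.langchain import ChatOpenAI

llm = ChatOpenAI(proxy_model_name="gpt-4o", temperature=0.0)
print(llm.invoke("Hello!").content)
```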
- class GoogleGenerativeAIEmbeddings
Bases: _BaseGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
Drop-in replacement for langchain_google_genai.GoogleGenerativeAIEmbeddings.
- model_config: ClassVar[ConfigDict] = {'extra': 'allow', 'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class OpenAI
Bases: ProxyOpenAI, OpenAI
OpenAI model using a proxy.
- classmethod validate_environment(values)
Validates the environment.
- Parameters:
values (Dict) -- The input values
- Returns:
The validated values
- Return type:
Dict
- static __new__(cls, **data)
Initialize the OpenAI object.
- Parameters:
data (Any)
- __init__(*args, **kwargs)
Initialize the OpenAI object.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_name: str | None
Model name to use.
- openai_api_version: str | None
- class OpenAIEmbeddings
Bases: ProxyOpenAI, OpenAIEmbeddings
OpenAI Embeddings model using a proxy.
- classmethod validate_environment(values)
Validates the environment.
- Parameters:
values (Dict) -- The input values
- Returns:
The validated values
- Return type:
Dict
- __init__(*args, **kwargs)
Initialize the OpenAIEmbeddings object.
- chunk_size: int
Maximum number of texts to embed in each batch
- input_type: str | None
- model: str | None
- model_config: ClassVar[ConfigDict] = {'extra': 'allow', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- openai_api_version: str | None
Version of the OpenAI API to use.
Automatically inferred from env var OPENAI_API_VERSION if not provided.
- tiktoken_model_name: str | None
The model name to pass to tiktoken when using this class.
Tiktoken is used to count the number of tokens in documents to constrain them to be under a certain limit.
By default, when set to None, this will be the same as the embedding model name. However, there are some cases where you may want to use this Embedding class with a model name not supported by tiktoken. This can include when using Azure embeddings or when using one of the many model providers that expose an OpenAI-like API but with different models. In those cases, in order to avoid erroring when tiktoken is called, you can specify a model name to use here.
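A minimal usage sketch, assuming an embedding deployment; 'text-embedding-ada-002' is a placeholder name:
```python
# Sketch: OpenAI embeddings through the proxy.
from gen_ai_hub.proxy.langchain import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(proxy_model_name="text-embedding-ada-002")
print(len(embeddings.embed_query("SAP AI Core")))
```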
- init_embedding_model(*args, proxy_client=None, init_func=None, model_id='', **kwargs)
Initializes an embedding model using the specified parameters.
- Parameters:
proxy_client (BaseProxyClient) -- The proxy client to use for the model (optional)
init_func (Callable) -- Function to call for initializing the model, optional
model_id (str) -- id of the Amazon Bedrock model, needed in case a custom Amazon Bedrock model is being initiated (optional)
- Returns:
The initialized embedding model
- Return type:
Embeddings
- init_llm(*args, proxy_client=None, temperature=0.0, max_tokens=256, top_k=None, top_p=1.0, init_func=None, model_id='', **kwargs)
Initializes a language model using the specified parameters.
- Parameters:
proxy_client (ProxyClient) -- The proxy client to use for the model (optional)
temperature (float) -- The temperature parameter for model generation (default: 0.0)
max_tokens (int) -- The maximum number of tokens to generate (default: 256)
top_k (int) -- The top-k parameter for model generation (optional)
top_p (float) -- The top-p parameter for model generation (default: 1.0)
init_func (Callable) -- Function to call for initializing the model, optional
model_id (str) -- id of the Amazon Bedrock model, needed in case a custom Amazon Bedrock model is being initiated (optional)
- Returns:
The initialized language model
- Return type:
BaseLanguageModel
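Both factories resolve the deployment from the model name and pick the matching wrapper class, so callers don't need to know which provider backs a given model. A hedged sketch; both model names are placeholders:
```python
# Sketch: name-based initialization; the factory selects the wrapper
# class (OpenAI, Bedrock, Google GenAI, ...) for the deployed model.
from gen_ai_hub.proxy.langchain.init_models import init_embedding_model, init_llm

llm = init_llm("gpt-4o", temperature=0.0, max_tokens=256)
embeddings = init_embedding_model("text-embedding-ada-002")

print(llm.invoke("Ping?").content)
print(len(embeddings.embed_query("Ping?")))
```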
Submodules
gen_ai_hub.proxy.langchain.amazon module
- class AICoreBedrockBaseModel
Bases: BaseModel
AICoreBedrockBaseModel provides all adjustments to boto3-based LangChain classes to enable communication with SAP AI Core.
- classmethod get_corresponding_model_id(model_name)
Gets the corresponding model ID for a given model name.
- Parameters:
model_name (str) -- the model name
- Raises:
ValueError -- if the model name is not supported
- Returns:
the corresponding model ID
- Return type:
str
- classmethod validate_environment(values)
Validates and sets up the environment for the model.
- Parameters:
values (Dict) -- the input values
- Returns:
the validated values
- Return type:
Dict
- __init__(*args, model_id='', deployment_id='', model_name='', config_id='', config_name='', proxy_client=None, **kwargs)
- Initializes the AICoreBedrockBaseModel with AI Core-specific parameters.
Extends the constructor of the base class with AI Core-specific parameters.
- Parameters:
model_id (str, optional) -- the model identifier, defaults to ""
deployment_id (str, optional) -- the deployment identifier, defaults to ""
model_name (str, optional) -- the model name, defaults to ""
config_id (str, optional) -- the configuration identifier, defaults to ""
config_name (str, optional) -- the configuration name, defaults to ""
proxy_client (Optional[BaseProxyClient], optional) -- the proxy client to use, defaults to None
- model_config: ClassVar[ConfigDict] = {'extra': 'allow'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class BedrockEmbeddings
Bases: AICoreBedrockBaseModel, BedrockEmbeddings
Drop-in replacement for LangChain BedrockEmbeddings.
- __init__(*args, **kwargs)
- Initializes the AICoreBedrockBaseModel with AI Core-specific parameters.
Extends the constructor of the base class with AI Core-specific parameters.
- Parameters:
model_id (str, optional) -- the model identifier, defaults to ""
deployment_id (str, optional) -- the deployment identifier, defaults to ""
model_name (str, optional) -- the model name, defaults to ""
config_id (str, optional) -- the configuration identifier, defaults to ""
config_name (str, optional) -- the configuration name, defaults to ""
proxy_client (Optional[BaseProxyClient], optional) -- the proxy client to use, defaults to None
- model_config: ClassVar[ConfigDict] = {'extra': 'allow', 'protected_namespaces': ()}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class ChatBedrock
Bases: AICoreBedrockBaseModel, ChatBedrock
Drop-in replacement for LangChain ChatBedrock.
- __init__(*args, **kwargs)
- Initializes the AICoreBedrockBaseModel with AI Core-specific parameters.
Extends the constructor of the base class with AI Core-specific parameters.
- Parameters:
model_id (str, optional) -- the model identifier, defaults to ""
deployment_id (str, optional) -- the deployment identifier, defaults to ""
model_name (str, optional) -- the model name, defaults to ""
config_id (str, optional) -- the configuration identifier, defaults to ""
config_name (str, optional) -- the configuration name, defaults to ""
proxy_client (Optional[BaseProxyClient], optional) -- the proxy client to use, defaults to None
- aws_access_key_id: SecretStr | None
AWS access key id.
If provided, aws_secret_access_key must also be provided.
If not specified, the default credential profile or, if on an EC2 instance, credentials from IMDS will be used.
See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html
If not provided, will be read from AWS_ACCESS_KEY_ID environment variable.
- aws_secret_access_key: SecretStr | None
AWS secret_access_key.
If provided, aws_access_key_id must also be provided.
If not specified, the default credential profile or, if on an EC2 instance, credentials from IMDS will be used.
See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html
If not provided, will be read from AWS_SECRET_ACCESS_KEY environment variable.
- aws_session_token: SecretStr | None
AWS session token.
If provided, aws_access_key_id and aws_secret_access_key must also be provided.
Not required unless using temporary credentials.
See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html
If not provided, will be read from AWS_SESSION_TOKEN environment variable.
- base_model_id: str | None
An optional field to pass the base model id. If provided, this will be used over the value of model_id to identify the base model.
- bedrock_api_key: SecretStr | None
Bedrock API key.
Enables authentication using Bedrock API keys instead of standard AWS credentials. When provided, the key is set as the AWS_BEARER_TOKEN_BEDROCK environment variable.
See: https://docs.aws.amazon.com/bedrock/latest/userguide/api-keys-use.html
If not provided, will be read from AWS_BEARER_TOKEN_BEDROCK environment variable.
If both an API key and AWS credentials are present, the API key takes precedence.
- bedrock_client: Any
The bedrock client for making control plane API calls
- beta_use_converse_api: bool
Use the new Bedrock converse API which provides a standardized interface to all Bedrock models. Support still in beta. See ChatBedrockConverse docs for more.
- cache: BaseCache | bool | None
Whether to cache the response.
If True, will use the global cache.
If False, will not use a cache
If None, will use the global cache if it's set, otherwise no cache.
If instance of BaseCache, will use the provided cache.
Caching is not currently supported for streaming methods of models.
- callbacks: Callbacks
Callbacks to add to the run trace.
- client: Any
The bedrock runtime client for making data plane API calls
- config: Any
An optional botocore.config.Config instance to pass to the client.
- credentials_profile_name: str | None
The name of the profile in the ~/.aws/credentials or ~/.aws/config files, which has either access keys or role information specified.
If not specified, the default credential profile or, if on an EC2 instance, credentials from IMDS will be used.
See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html
- custom_get_token_ids: Callable[[str], list[int]] | None
Optional encoder to use for counting tokens.
- disable_streaming: bool | Literal['tool_calling']
Whether to disable streaming for this model.
If streaming is bypassed, then stream/astream/astream_events will defer to invoke/ainvoke.
If True, will always bypass streaming case.
If 'tool_calling', will bypass streaming case only when the model is called with a tools keyword argument. In other words, LangChain will automatically switch to non-streaming behavior (invoke) only when the tools argument is provided. This offers the best of both worlds.
If False (default), will always use streaming case if available.
The main reason for this flag is that code might be written using stream and a user may want to swap out a given model for another model whose implementation does not properly support streaming.
- endpoint_url: str | None
Needed if you don't want to default to the 'us-east-1' endpoint.
- guardrails: Mapping[str, Any] | None
An optional dictionary to configure guardrails for Bedrock.
This field consists of two keys: 'guardrailId' and 'guardrailVersion', which should be strings, but are initialized to None.
It's used to determine if specific guardrails are enabled and properly set.
- Type:
Optional[Mapping[str, str]]: A mapping with 'guardrailId' and 'guardrailVersion' keys.
- Example:
```python
llm = BedrockLLM(
    model_id="<model_id>",
    client=<bedrock_client>,
    model_kwargs={},
    guardrails={
        "guardrailId": "<guardrail_id>",
        "guardrailVersion": "<guardrail_version>",
    },
)
```
To enable tracing for guardrails, set the 'trace' key to True and pass a callback handler to the 'run_manager' parameter of the 'generate', '_call' methods.
- Example:
```python
llm = BedrockLLM(
    model_id="<model_id>",
    client=<bedrock_client>,
    model_kwargs={},
    guardrails={
        "guardrailId": "<guardrail_id>",
        "guardrailVersion": "<guardrail_version>",
        "trace": True,
    },
    callbacks=[BedrockAsyncCallbackHandler()],
)

class BedrockAsyncCallbackHandler(AsyncCallbackHandler):
    async def on_llm_error(
        self,
        error: BaseException,
        **kwargs: Any,
    ) -> Any:
        reason = kwargs.get("reason")
        if reason == "GUARDRAIL_INTERVENED":
            ...  # Logic to handle guardrail intervention.
```
See https://python.langchain.com/docs/concepts/callbacks/ for more information on callback handlers.
- max_tokens: int | None
Maximum number of tokens to generate.
When using Anthropic models with InvokeModel API, if not set, defaults to 1024.
- metadata: dict[str, Any] | None
Metadata to add to the run trace.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_id: str
Id of the model to call, e.g., 'amazon.titan-text-express-v1'; this is equivalent to the modelId property in the list-foundation-models api. For custom and provisioned models, an ARN value is expected.
- model_kwargs: Dict[str, Any] | None
Keyword arguments to pass to the model.
- name: str | None
The name of the Runnable.
Used for debugging and tracing.
- output_version: str | None
Version of AIMessage output format to store in message content.
AIMessage.content_blocks will lazily parse the contents of content into a standard format. This flag can be used to additionally store the standard format in message content, e.g., for serialization purposes.
Supported values:
'v0': provider-specific format in content (can lazily parse with content_blocks)
'v1': standardized format in content (consistent with content_blocks)
Partner packages (e.g., [langchain-openai](https://pypi.org/project/langchain-openai)) can also use this field to roll out new content formats in a backward-compatible way.
!!! version-added "Added in langchain-core 1.0.0"
- profile: ModelProfile | None
Profile detailing model capabilities.
!!! warning "Beta feature"
This is a beta feature. The format of model profiles is subject to change.
If not specified, automatically loaded from the provider package on initialization if data is available.
Example profile data includes context window sizes, supported modalities, or support for tool calling, structured output, and other features.
!!! version-added "Added in langchain-core 1.1.0"
- provider: str | None
The model provider, e.g., 'amazon', 'cohere', 'ai21', etc. When not supplied, provider is extracted from the first part of the model_id e.g. 'amazon' in 'amazon.titan-text-express-v1'. This value should be provided for model IDs that do not have the provider in them, e.g., custom and provisioned models that have an ARN associated with them.
- provider_stop_reason_key_map: Mapping[str, str]
- provider_stop_sequence_key_name_map: Mapping[str, str]
- rate_limiter: BaseRateLimiter | None
An optional rate limiter to use for limiting the number of requests.
- region_name: str | None
The AWS region, e.g., us-west-2. Falls back to the AWS_REGION or AWS_DEFAULT_REGION env variable or region specified in ~/.aws/config in case it is not provided here.
- service_tier: Literal['priority', 'default', 'flex', 'reserved'] | None
Service tier for model invocation.
Specifies the processing tier type used for serving the request. Supported values are 'priority', 'default', 'flex', and 'reserved'.
'priority': Prioritized processing for lower latency
'default': Standard processing tier
'flex': Flexible processing tier with lower cost
'reserved': Reserved capacity for consistent performance
If not provided, AWS uses the default tier.
- stop_sequences: List[str] | None
Stop sequence inference parameter from new Bedrock converse API providing a sequence of characters that causes a model to stop generating a response. See https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_InferenceConfiguration.html for more.
- streaming: bool
Whether to stream the results.
- system_prompt_with_tools: str
- tags: list[str] | None
Tags to add to the run trace.
- temperature: float | None
- verbose: bool
Whether to print out response text.
- class ChatBedrockConverse
Bases: AICoreBedrockBaseModel, ChatBedrockConverse
Drop-in replacement for LangChain ChatBedrockConverse.
- __init__(*args, **kwargs)
- Initializes the AICoreBedrockBaseModel with AI Core-specific parameters.
Extends the constructor of the base class with AI Core-specific parameters.
- Parameters:
model_id (str, optional) -- the model identifier, defaults to ""
deployment_id (str, optional) -- the deployment identifier, defaults to ""
model_name (str, optional) -- the model name, defaults to ""
config_id (str, optional) -- the configuration identifier, defaults to ""
config_name (str, optional) -- the configuration name, defaults to ""
proxy_client (Optional[BaseProxyClient], optional) -- the proxy client to use, defaults to None
- extract_model_kwargs_parameters(kwargs)
Extracts specific parameters from model_kwargs and moves them to the top level of kwargs.
- Parameters:
kwargs (Dict) -- the input keyword arguments
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- init_chat_converse_model(proxy_client, deployment, temperature=0.0, max_tokens=256, top_k=None, top_p=1.0, stop_sequences=None, model_id='', config=None)
Initializes a chat model using the newer Bedrock Converse API (ChatBedrockConverse). The Converse API offers several advantages over the older Invoke API:
Unified interface for different models and modalities.
Native support for tool use (function calling).
Standardized request/response structure.
- Parameters:
proxy_client (BaseProxyClient) -- the proxy client to use
deployment (Deployment) -- the deployment information
temperature (float, optional) -- the temperature for the model, defaults to 0.0
max_tokens (int, optional) -- the maximum number of tokens to generate, defaults to 256
top_k (Optional[int], optional) -- the top-k sampling parameter, defaults to None
top_p (float, optional) -- the top-p sampling parameter, defaults to 1.0
stop_sequences (List[str], optional) -- the stop sequences for the model, defaults to None
model_id (Optional[str], optional) -- the model identifier, defaults to ''
config (Optional[Config], optional) -- the botocore configuration, defaults to None
- Returns:
the initialized chat model
- Return type:
ChatBedrockConverse
- init_chat_model(proxy_client, deployment, temperature=0.0, max_tokens=256, top_k=None, top_p=1.0, stop_sequences=None, model_id='', config=None)
Initializes a chat model using the legacy Bedrock Invoke API (ChatBedrock).
- Parameters:
proxy_client (BaseProxyClient) -- the proxy client to use
deployment (Deployment) -- the deployment information
temperature (float, optional) -- the temperature for the model, defaults to 0.0
max_tokens (int, optional) -- the maximum number of tokens to generate, defaults to 256
top_k (Optional[int], optional) -- the top-k sampling parameter, defaults to None
top_p (float, optional) -- the top-p sampling parameter, defaults to 1.0
stop_sequences (List[str], optional) -- the stop sequences for the model, defaults to None
model_id (Optional[str], optional) -- the model identifier, defaults to ''
config (Optional[Config], optional) -- the botocore configuration, defaults to None
- Returns:
the initialized chat model
- Return type:
ChatBedrock
- init_embedding_model(proxy_client, deployment, model_id='')
Initializes an embedding model using BedrockEmbeddings.
- Parameters:
proxy_client (BaseProxyClient) -- the proxy client to use
deployment (Deployment) -- the deployment information
model_id (Optional[str], optional) -- the model identifier, defaults to ''
- Returns:
the initialized embedding model
- Return type:
BedrockEmbeddings
gen_ai_hub.proxy.langchain.base module
- class BaseAuth
Bases: BaseModel
Base class for authentication models.
- Parameters:
BaseModel (pydantic.BaseModel) -- The base model class to inherit from.
- Returns:
An instance of the BaseAuth class.
- Return type:
BaseAuth
- config_id: str | None
- config_name: str | None
- deployment_id: str | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- proxy_model_name: str | None
gen_ai_hub.proxy.langchain.google_genai module
Drop-in replacements for langchain_google_genai models with SAP AI Core integration.
- class ChatGoogleGenerativeAI
Bases: _BaseGoogleGenerativeAI, ChatGoogleGenerativeAI
Drop-in replacement for langchain_google_genai.ChatGoogleGenerativeAI.
- additional_headers: dict[str, str] | None
Additional HTTP headers to include in API requests.
Passed as headers to HttpOptions when creating the client.
- base_url: str | dict | None
Custom base URL for the API client.
If not provided, defaults depend on the API being used:
Gemini Developer API ([api_key][langchain_google_genai.ChatGoogleGenerativeAI.google_api_key] / [google_api_key][langchain_google_genai.ChatGoogleGenerativeAI.google_api_key]): https://generativelanguage.googleapis.com/
Vertex AI ([credentials][langchain_google_genai.ChatGoogleGenerativeAI.credentials]): https://{location}-aiplatform.googleapis.com/
!!! note "Backwards compatibility"
Typed to accept dict to support backwards compatibility for the (now removed) client_options param.
If a dict is passed in, it will only extract the 'api_endpoint' key.
- cache: BaseCache | bool | None
Whether to cache the response.
If True, will use the global cache.
If False, will not use a cache
If None, will use the global cache if it's set, otherwise no cache.
If instance of BaseCache, will use the provided cache.
Caching is not currently supported for streaming methods of models.
- cached_content: str | None
The name of the cached content used as context to serve the prediction.
!!! note
Only used in explicit caching, where users can have control over caching (e.g. what content to cache) and enjoy guaranteed cost savings. Format: cachedContents/{cachedContent}.
- callbacks: Callbacks
Callbacks to add to the run trace.
- client: Client | None
- client_args: dict[str, Any] | None
Additional arguments to pass to the underlying HTTP client.
Applied to both sync and async clients.
- convert_system_message_to_human: bool
Whether to merge any leading SystemMessage into the following HumanMessage.
Gemini does not support system messages; any unsupported messages will raise an error.
- credentials: Any
Custom credentials for Vertex AI authentication.
When provided, forces Vertex AI backend (regardless of API key presence in google_api_key/api_key).
Accepts a [google.auth.credentials.Credentials](https://googleapis.dev/python/google-auth/latest/reference/google.auth.credentials.html#google.auth.credentials.Credentials) object.
If omitted and no API key is found, the SDK uses [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials).
!!! example "Service account credentials"
```python
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    "path/to/service-account.json",
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)

llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    credentials=credentials,
    project="my-project-id",
)
```
- custom_get_token_ids: Callable[[str], list[int]] | None
Optional encoder to use for counting tokens.
- default_metadata: Sequence[tuple[str, str]] | None
- disable_streaming: bool | Literal['tool_calling']
Whether to disable streaming for this model.
If streaming is bypassed, then stream/astream/astream_events will defer to invoke/ainvoke.
If True, will always bypass streaming case.
If 'tool_calling', will bypass streaming case only when the model is called with a tools keyword argument. In other words, LangChain will automatically switch to non-streaming behavior (invoke) only when the tools argument is provided. This offers the best of both worlds.
If False (default), will always use streaming case if available.
The main reason for this flag is that code might be written using stream and a user may want to swap out a given model for another model whose implementation does not properly support streaming.
- google_api_key: SecretStr | None
API key for authentication.
If not specified, will check the env vars GOOGLE_API_KEY and GEMINI_API_KEY with precedence given to GOOGLE_API_KEY.
Gemini Developer API: API key is required (default when no project is set)
Vertex AI: API key is optional (set vertexai=True or provide project)
If provided, uses API key for authentication
If not provided, uses [Application Default Credentials (ADC)](https://docs.cloud.google.com/docs/authentication/application-default-credentials) or the credentials parameter
!!! tip "Vertex AI with API key"
You can now use Vertex AI with API key authentication instead of service account credentials. Set GOOGLE_GENAI_USE_VERTEXAI=true or vertexai=True along with your API key and project.
- image_config: dict[str, Any] | None
Configuration for image generation.
Provides control over generated image dimensions and quality for image generation models.
See [genai.types.ImageConfig](https://googleapis.github.io/python-genai/genai.html#genai.types.ImageConfig) for a list of supported fields and their values.
!!! note "Model compatibility"
This parameter only applies to image generation models. Supported parameters vary by model and backend (Gemini Developer API and Vertex AI each support different subsets of parameters and models).
See [the docs](https://docs.langchain.com/oss/python/integrations/chat/google_generative_ai#image-generation) for more details and examples.
- include_thoughts: bool | None
Indicates whether to include thoughts in the response.
!!! note
This parameter is only applicable for models that support thinking.
This does not disable thinking; to disable thinking for supported models, set thinking_budget to 0. See the thinking_budget parameter for more details.
- labels: dict[str, str] | None
User-defined key-value metadata for organizing and filtering billing reports.
Attach labels to categorize API usage by team, environment, or feature.
Can be overridden per-request via invoke kwargs.
See: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/add-labels-to-api-calls
- location: str | None
Google Cloud region (Vertex AI only).
If not provided, falls back to the GOOGLE_CLOUD_LOCATION env var, then 'global'.
- max_output_tokens: int | None
Maximum number of tokens to include in a candidate.
Must be greater than zero.
If unset, will use the model's default value, which varies by model.
See [docs](https://ai.google.dev/gemini-api/docs/models) for model-specific limits.
To constrain the number of thinking tokens to use when generating a response, see the thinking_budget parameter.
- max_retries: int
The maximum number of retries to make when generating.
!!! warning "Disabling retries"
To disable retries, set max_retries=1 (not 0) due to a quirk in the underlying Google SDK. max_retries=0 is interpreted as "use the (Google) default" (5 retries).
Setting max_retries=1 means only the initial request is made with no retries.
!!! warning "Handling rate limits (429 errors)"
When you exceed quota limits, the API returns a 429 error with a suggested retry_delay. The SDK's built-in retry logic ignores this value and uses fixed exponential backoff instead. This is a known issue in Google's SDK and an issue has been [raised upstream](https://github.com/googleapis/python-genai/issues/1875). We plan to implement proper handling once it's supported.
If you need to respect the server's suggested retry delay, disable SDK retries with max_retries=1 and implement custom retry logic:
```python
import re
import time

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_google_genai.chat_models import ChatGoogleGenerativeAIError

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash", max_retries=1)

try:
    response = llm.invoke("Hello")
except ChatGoogleGenerativeAIError as e:
    if "429" in str(e):
        # Parse retry_delay from error: "[retry_delay { seconds: N }]"
        match = re.search(r"retry_delay\s*{\s*seconds:\s*(\d+)", str(e))
        delay = int(match.group(1)) if match else 60
        time.sleep(delay)
        # Retry...
```
- media_resolution: MediaResolution | None
Media resolution for the input media.
May be defined at the individual part level, allowing for mixed-resolution requests (e.g., images and videos of different resolutions in the same request).
May be 'low', 'medium', or 'high'.
Can be set either per-part or globally for all media inputs in the request. To set globally, set in the generation_config.
!!! warning "Model compatibility"
Setting per-part media resolution in requests to Gemini 2.5 models is not supported.
- metadata: dict[str, Any] | None
Metadata to add to the run trace.
- model: str
Model name to use.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'ignore', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_kwargs: dict[str, Any]
Holds any unexpected initialization parameters.
- n: int
Number of chat completions to generate for each prompt.
Note that the API may not return the full n completions if duplicates are generated.
- name: str | None
The name of the Runnable.
Used for debugging and tracing.
- output_version: str | None
Version of AIMessage output format to store in message content.
AIMessage.content_blocks will lazily parse the contents of content into a standard format. This flag can be used to additionally store the standard format in message content, e.g., for serialization purposes.
Supported values:
'v0': provider-specific format in content (can lazily parse with content_blocks)
'v1': standardized format in content (consistent with content_blocks)
Partner packages (e.g., [langchain-openai](https://pypi.org/project/langchain-openai)) can also use this field to roll out new content formats in a backward-compatible way.
!!! version-added "Added in langchain-core 1.0.0"
- profile: ModelProfile | None
Profile detailing model capabilities.
!!! warning "Beta feature"
This is a beta feature. The format of model profiles is subject to change.
If not specified, automatically loaded from the provider package on initialization if data is available.
Example profile data includes context window sizes, supported modalities, or support for tool calling, structured output, and other features.
!!! version-added "Added in langchain-core 1.1.0"
- project: str | None
Google Cloud project ID (Vertex AI only).
Required when using Vertex AI.
Falls back to GOOGLE_CLOUD_PROJECT env var if not provided.
- rate_limiter: BaseRateLimiter | None
An optional rate limiter to use for limiting the number of requests.
- response_mime_type: str | None
Output response MIME type of the generated candidate text.
- Supported MIME types:
'text/plain': (default) Text output.
'application/json': JSON response in the candidates.
'text/x.enum': Enum in plain text. (legacy; use JSON schema output instead)
!!! note
The model also needs to be prompted to output the appropriate response type, otherwise the behavior is undefined.
(In other words, simply setting this param doesn't force the model to comply; it only tells the model the kind of output expected. You still need to prompt it correctly.)
- response_modalities: list[Modality] | None
A list of modalities of the response
- response_schema: dict[str, Any] | None
Enforce a schema to the output.
The format of the dictionary should follow JSON Schema specification.
!!! note "Schema Transformation"
The Google GenAI SDK automatically transforms schemas for Gemini compatibility:
Inlines $defs definitions (enables Union types with anyOf)
Resolves $ref pointers for nested/recursive schemas
Preserves property ordering
Supports constraints like minimum/maximum, minItems/maxItems
!!! tip "Using Union Types"
Union types in Pydantic models (e.g., field: Union[TypeA, TypeB]) are automatically converted to anyOf schemas and work correctly with the json_schema method.
Refer to the Gemini API [docs](https://ai.google.dev/gemini-api/docs/structured-output) for more details on supported JSON Schema features.
- safety_settings: SafetySettingDict | None
Default safety settings to use for all generations.
!!! example
```python
from google.genai.types import HarmBlockThreshold, HarmCategory

safety_settings = {
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_ONLY_HIGH,
    HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
}
```
- seed: int | None
Seed used in decoding for reproducible generations.
By default, a random number is used.
!!! note
Using the same seed does not guarantee identical outputs, but makes them more deterministic. Reproducibility is "best effort" based on the model and infrastructure.
- stop: list[str] | None
Stop sequences for the model.
- streaming: bool | None
Whether to stream responses from the model.
- tags: list[str] | None
Tags to add to the run trace.
- temperature: float
Run inference with this temperature.
Must be within [0.0, 2.0].
!!! note "Automatic override for Gemini 3.0+ models"
If temperature is not explicitly set and the model is Gemini 3.0 or later, it will be automatically set to 1.0 instead of the default 0.7, per Google GenAI API best practices, since lower temperatures can cause infinite loops, degraded reasoning performance, and failure on complex tasks.
- thinking_budget: int | None
Indicates the thinking budget in tokens.
Used to disable thinking for supported models (when set to 0) or to constrain the number of tokens used for thinking.
Dynamic thinking (allowing the model to decide how many tokens to use) is enabled when set to -1.
More information, including per-model limits, can be found in the [Gemini API docs](https://ai.google.dev/gemini-api/docs/thinking#set-budget).
- thinking_level: Literal['minimal', 'low', 'medium', 'high'] | None
Indicates the thinking level.
- Supported values:
'low': Minimizes latency and cost.
'medium': Balances latency/cost with reasoning depth.
'high': Maximizes reasoning depth.
!!! note "Replaces thinking_budget"
thinking_budget is deprecated for Gemini 3+ models. If both parameters are provided, thinking_level takes precedence.
If left unspecified, the model's default thinking level is used. For Gemini 3+, this defaults to 'high'.
- timeout: float | None
The maximum number of seconds to wait for a response.
- top_k: int | None
Decode using top-k sampling: consider the set of top_k most probable tokens.
Must be positive.
- top_p: float | None
Decode using nucleus sampling.
Consider the smallest set of tokens whose probability sum is at least top_p.
Must be within [0.0, 1.0].
- verbose: bool
Whether to print out response text.
- vertexai: bool | None
Whether to use Vertex AI backend.
If None (default), backend is automatically determined as follows:
If the GOOGLE_GENAI_USE_VERTEXAI env var is set, uses Vertex AI
If the [credentials][langchain_google_genai.ChatGoogleGenerativeAI.credentials] parameter is provided, uses Vertex AI
If the [project][langchain_google_genai.ChatGoogleGenerativeAI.project] parameter is provided, uses Vertex AI
Otherwise, uses Gemini Developer API
Set explicitly to True or False to override auto-detection.
!!! tip "Vertex AI with API key"
You can use Vertex AI with API key authentication by setting:
```bash
export GEMINI_API_KEY='your-api-key'
export GOOGLE_GENAI_USE_VERTEXAI=true
export GOOGLE_CLOUD_PROJECT='your-project-id'
```
Or programmatically:
```python
llm = ChatGoogleGenerativeAI(
    model="gemini-3-pro-preview",
    api_key="your-api-key",
    project="your-project-id",
    vertexai=True,
)
```
This allows for simpler authentication compared to service account JSON files.
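When routed through Generative AI Hub, authentication and routing come from the AI Core deployment rather than Google API keys or ADC. A minimal sketch; 'gemini-1.5-flash' is a placeholder deployment name:
```python
# Sketch: Gemini chat through the SAP AI Core proxy.
from gen_ai_hub.proxy.langchain.google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0.0)
print(llm.invoke("Hello!").content)
```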
- class GoogleGenerativeAIEmbeddings
Bases: _BaseGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
Drop-in replacement for langchain_google_genai.GoogleGenerativeAIEmbeddings.
- additional_headers: dict[str, str] | None
Additional HTTP headers to include in API requests.
- base_url: str | None
The base URL to use for the API client.
- client: Any
The Google GenAI client instance.
- client_args: dict[str, Any] | None
Additional arguments to pass to the underlying HTTP client.
Applied to both sync and async clients.
- credentials: Any
Custom credentials for Vertex AI authentication.
When provided, forces Vertex AI backend.
Accepts a google.auth.credentials.Credentials object.
- google_api_key: SecretStr | None
The Google API key to use.
If not provided, will check the env vars GOOGLE_API_KEY and GEMINI_API_KEY.
- location: str | None
Google Cloud region (Vertex AI only).
Defaults to GOOGLE_CLOUD_LOCATION env var, then 'us-central1'.
- model: str
The name of the embedding model to use.
Example: 'gemini-embedding-001'
- model_config: ClassVar[ConfigDict] = {'extra': 'allow', 'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- output_dimensionality: int | None
Default output dimensionality for embeddings.
If set, all embed calls use this dimension unless explicitly overridden.
- project: str | None
Google Cloud project ID (Vertex AI only).
Falls back to GOOGLE_CLOUD_PROJECT env var if not provided.
- request_options: dict | None
A dictionary of request options to pass to the Google API client.
Example: {'timeout': 10}
- task_type: str | None
The task type.
Valid options include:
'TASK_TYPE_UNSPECIFIED'
'RETRIEVAL_QUERY'
'RETRIEVAL_DOCUMENT'
'SEMANTIC_SIMILARITY'
'CLASSIFICATION'
'CLUSTERING'
'QUESTION_ANSWERING'
'FACT_VERIFICATION'
'CODE_RETRIEVAL_QUERY'
See [TaskType](https://ai.google.dev/api/embeddings#tasktype) for details.
- vertexai: bool | None
Whether to use Vertex AI backend.
If None (default), backend is automatically determined:
If GOOGLE_GENAI_USE_VERTEXAI env var is set, uses that value
If credentials parameter is provided, uses Vertex AI
If project parameter is provided, uses Vertex AI
Otherwise, uses Gemini Developer API
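A minimal usage sketch; the model name follows the 'gemini-embedding-001' example given for the model field above:
```python
# Sketch: Gemini embeddings through the proxy.
from gen_ai_hub.proxy.langchain.google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="gemini-embedding-001")
print(len(embeddings.embed_query("SAP AI Core")))
```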
- init_chat_model(proxy_client, deployment, temperature=0.0, max_tokens=256, top_k=None, top_p=1.0)
Initialize a ChatGoogleGenerativeAI model with the given parameters.
- Parameters:
proxy_client (BaseProxyClient) -- proxy client to use for the model
deployment (Deployment) -- deployment information for the model
temperature (float, optional) -- sampling temperature, defaults to 0.0
max_tokens (int, optional) -- maximum number of tokens to generate, defaults to 256
top_k (Optional[int], optional) -- k for top-k sampling, defaults to None
top_p (float, optional) -- p for nucleus sampling, defaults to 1.0
- Returns:
initialized ChatGoogleGenerativeAI model
- Return type:
ChatGoogleGenerativeAI
- init_embedding_model(proxy_client, deployment)
- Parameters:
proxy_client (BaseProxyClient)
deployment (Deployment)
gen_ai_hub.proxy.langchain.init_models module
- class Catalog
Bases: object
Catalog for registering and retrieving model deployments.
- __init__()
- all_embedding_models(proxy_client=None)
Retrieves all registered embedding models for the specified proxy client.
- Parameters:
proxy_client (Optional[Union[str, BaseProxyClient]], optional) -- the proxy client to retrieve models for, defaults to None
- Raises:
TypeError -- if the proxy client is invalid
- Returns:
A dictionary of model names and their corresponding embedding model instances
- Return type:
Dict[str, Embeddings]
- all_llms(proxy_client=None)
Retrieves all registered language models for the specified proxy client.
- Parameters:
proxy_client (Optional[Union[str, BaseProxyClient]], optional) -- the proxy client to retrieve models for, defaults to None
- Raises:
TypeError -- if the proxy client is invalid
- Returns:
A dictionary of model names and their corresponding language model instances
- Return type:
Dict[str, BaseLanguageModel]
- register(proxy_client, base_class, *model_names, f_select_deployment=None)
Registers a model deployment in the catalog.
- Parameters:
proxy_client (Union[str, BaseProxyClient]) -- the proxy client to register the model for
base_class (Type[Union[BaseLanguageModel, Embeddings]]) -- the base class of the model (LLM or Embeddings)
f_select_deployment (Optional[Callable], optional) -- function to select the deployment, defaults to None
- Raises:
TypeError -- if the base class is not supported
- Returns:
Decorator function for registering the model
- Return type:
Callable
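Since register returns a decorator, wiring a custom wrapper into the catalog plausibly follows the pattern below. This is a hypothetical sketch: MyChatModel, the model name, and the init function signature are illustrative placeholders, not shipped APIs.
```python
# Hypothetical sketch of the registration pattern implied by register().
catalog = Catalog()

@catalog.register("gen-ai-hub", MyChatModel, "my-model-name")  # placeholders
def init_my_chat_model(proxy_client, deployment, **kwargs):
    # Build the wrapper for the selected deployment; MyChatModel is hypothetical.
    return MyChatModel(deployment_id=deployment.deployment_id,
                       proxy_client=proxy_client, **kwargs)
```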
- retrieve(proxy_client=None, args=None, kwargs=None, model_type=None)
Retrieves a model deployment from the catalog.
- Parameters:
proxy_client (Optional[BaseProxyClient], optional) -- the proxy client to use for retrieving the model
args (List[str], optional) -- the positional arguments for model identification, defaults to None
kwargs (Dict[str, str], optional) -- the keyword arguments for model identification, defaults to None
model_type (Union[str, ModelType], optional) -- the type of the model to retrieve, defaults to None
- Returns:
The retrieval result containing the proxy client, deployment, and registry entry
- Return type:
RetrievalResult
- class ModelType
Bases: Enum
- EMBEDDINGS = 2
- LLM = 1
- class RegisterDeployment
Bases: object
Registry entry for a model deployment.
- __init__(model, init_func, f_select_deployment=None)
- Parameters:
model (BaseLanguageModel | Embeddings)
init_func (Callable)
f_select_deployment (Callable[[BaseProxyClient, Dict[str, str]], BaseDeployment] | None)
- Return type:
None
- f_select_deployment: Callable[[BaseProxyClient, Dict[str, str]], BaseDeployment] | None = None
- init_func: Callable
- model: BaseLanguageModel | Embeddings
- class RetrievalResult
Bases: object
Result of retrieving a model from the catalog.
- __init__(proxy_client, deployment, registry_entry)
- Parameters:
proxy_client (BaseProxyClient)
deployment (BaseDeployment)
registry_entry (RegisterDeployment)
- Return type:
None
- deployment: BaseDeployment
- proxy_client: BaseProxyClient
- registry_entry: RegisterDeployment
- default_f_select_deployment(proxy_client, **model_identification_kwargs)
Default function to select a deployment based on model identification kwargs.
- Parameters:
proxy_client (BaseProxyClient) -- The proxy client to use for selecting the deployment
model_identification_kwargs (Dict[str, str])
- Returns:
The selected deployment
- Return type:
BaseDeployment
- get_model_class(*args, model_type=None, proxy_client=None, **kwargs)
Retrieves the model class for the specified model.
- Parameters:
model_type (Union[str, ModelType]) -- The type of the model to retrieve (optional)
proxy_client (BaseProxyClient) -- The proxy client to use for the model (optional)
- Returns:
The model class
- Return type:
Union[BaseLanguageModel, Embeddings]
- handle_model_args_kwargs(proxy_client, args, kwargs)
Handles model identification arguments and keyword arguments.
- Parameters:
proxy_client (BaseProxyClient) -- the proxy client to use for model identification
args (List[Any]) -- list of positional arguments
kwargs (Dict[str, Any]) -- dictionary of keyword arguments
- Raises:
ValueError -- if no model identification argument is provided
- Returns:
A tuple containing the model name, model identification kwargs, and remaining kwargs
- Return type:
Tuple[str, Dict[str, str], Dict[str, Any]]
- init_embedding_model(*args, proxy_client=None, init_func=None, model_id='', **kwargs)
Initializes an embedding model using the specified parameters.
- Parameters:
proxy_client (BaseProxyClient) -- The proxy client to use for the model (optional)
init_func (Callable) -- Function to call for initializing the model, optional
model_id (str) -- id of the Amazon Bedrock model, needed in case a custom Amazon Bedrock model is being initiated (optional)
- Returns:
The initialized embedding model
- Return type:
Embeddings
- init_llm(*args, proxy_client=None, temperature=0.0, max_tokens=256, top_k=None, top_p=1.0, init_func=None, model_id='', **kwargs)
Initializes a language model using the specified parameters.
- Parameters:
proxy_client (ProxyClient) -- The proxy client to use for the model (optional)
temperature (float) -- The temperature parameter for model generation (default: 0.0)
max_tokens (int) -- The maximum number of tokens to generate (default: 256)
top_k (int) -- The top-k parameter for model generation (optional)
top_p (float) -- The top-p parameter for model generation (default: 1.0)
init_func (Callable) -- Function to call for initializing the model, optional
model_id (str) -- id of the Amazon Bedrock model, needed in case a custom Amazon Bedrock model is being initiated (optional)
- Returns:
The initialized language model
- Return type:
BaseLanguageModel
gen_ai_hub.proxy.langchain.openai module
LangChain wrappers for OpenAI models via Generative AI Hub.
- class ChatOpenAI
Bases: ProxyOpenAI, ChatOpenAI
ChatOpenAI model using a proxy.
- Parameters:
ProxyOpenAI (class) -- Base class for OpenAI models using a proxy
ChatOpenAI (class) -- ChatOpenAI class from langchain_openai
- classmethod validate_environment(values)
Validates the environment.
- Parameters:
values (Dict) -- The input values
- Raises:
ValueError -- n must be at least 1.
- Returns:
The validated values
- Return type:
Dict
- static __new__(cls, **data)
Initialize the OpenAI object.
- Parameters:
data (Any) -- Additional data to initialize the object
- Returns:
The initialized OpenAI object
- Return type:
OpenAIBase
- __init__(*args, **kwargs)
Initialize the ChatOpenAI object.
- async_client: Any
- cache: BaseCache | bool | None
Whether to cache the response.
If True, will use the global cache.
If False, will not use a cache
If None, will use the global cache if it's set, otherwise no cache.
If instance of BaseCache, will use the provided cache.
Caching is not currently supported for streaming methods of models.
- callbacks: Callbacks
Callbacks to add to the run trace.
- client: Any
- config_id: str | None
- config_name: str | None
- context_management: list[dict[str, Any]] | None
Configuration for [context management](https://developers.openai.com/api/docs/guides/compaction).
- custom_get_token_ids: Callable[[str], list[int]] | None
Optional encoder to use for counting tokens.
- default_headers: Mapping[str, str] | None
- default_query: Mapping[str, object] | None
- deployment_id: str | None
- disable_streaming: bool | Literal['tool_calling']
Whether to disable streaming for this model.
If streaming is bypassed, then stream/astream/astream_events will defer to invoke/ainvoke.
If True, will always bypass streaming case.
If 'tool_calling', will bypass streaming case only when the model is called with a tools keyword argument. In other words, LangChain will automatically switch to non-streaming behavior (invoke) only when the tools argument is provided. This offers the best of both worlds.
If False (default), will always use streaming case if available.
The main reason for this flag is that code might be written using stream and a user may want to swap out a given model for another model whose implementation does not properly support streaming.
- disabled_params: dict[str, Any] | None
Parameters of the OpenAI client or chat.completions endpoint that should be disabled for the given model.
Should be specified as {"param": None | ['val1', 'val2']} where the key is the parameter and the value is either None, meaning that parameter should never be used, or it's a list of disabled values for the parameter.
For example, older models may not support the 'parallel_tool_calls' parameter at all, in which case disabled_params={"parallel_tool_calls": None} can be passed in.
If a parameter is disabled then it will not be used by default in any methods, e.g. in with_structured_output. However, this does not prevent a user from directly passing in the parameter during invocation.
- extra_body: Mapping[str, Any] | None
Optional additional JSON properties to include in the request parameters when making requests to OpenAI compatible APIs, such as vLLM, LM Studio, or other providers.
This is the recommended way to pass custom parameters that are specific to your OpenAI-compatible API provider but not part of the standard OpenAI API.
Examples:
[LM Studio](https://lmstudio.ai/) TTL parameter: extra_body={"ttl": 300}
[vLLM](https://github.com/vllm-project/vllm) custom parameters: extra_body={"use_beam_search": True}
Any other provider-specific parameters
!!! warning
Do not use model_kwargs for custom parameters that are not part of the standard OpenAI API, as this will cause errors when making API calls. Use extra_body instead.
- frequency_penalty: float | None
Penalizes repeated tokens according to frequency.
- http_async_client: Any | None
Optional httpx.AsyncClient.
Only used for async invocations. Must specify http_client as well if you'd like a custom client for sync invocations.
- http_client: Any | None
Optional httpx.Client.
Only used for sync invocations. Must specify http_async_client as well if you'd like a custom client for async invocations.
- include: list[str] | None
Additional fields to include in generations from Responses API.
Supported values:
'file_search_call.results'
'message.input_image.image_url'
'computer_call_output.output.image_url'
'reasoning.encrypted_content'
'code_interpreter_call.outputs'
!!! version-added "Added in langchain-openai 0.3.24"
- include_response_headers: bool
Whether to include response headers in the output message response_metadata.
- logit_bias: dict[int, int] | None
Modify the likelihood of specified tokens appearing in the completion.
- logprobs: bool | None
Whether to return logprobs.
- max_retries: int | None
Maximum number of retries to make when generating.
- max_tokens: int | None
Maximum number of tokens to generate.
- metadata: dict[str, Any] | None
Metadata to add to the run trace.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_kwargs: dict[str, Any]
Holds any model parameters valid for create call not explicitly specified.
- model_name: str | None
Model name to use.
- n: int | None
Number of chat completions to generate for each prompt.
- name: str | None
The name of the Runnable.
Used for debugging and tracing.
- openai_api_base: str | None
Base URL path for API requests, leave blank if not using a proxy or service emulator.
- openai_api_key: SecretStr | None | Callable[[], str] | Callable[[], Awaitable[str]]
API key to use.
Can be inferred from the OPENAI_API_KEY environment variable, or specified as a string, or sync or async callable that returns a string.
- openai_api_version: str | None
- openai_organization: str | None
Automatically inferred from env var OPENAI_ORG_ID if not provided.
- openai_proxy: str | None
- output_version: str | None
Version of AIMessage output format to use.
This field is used to roll out new output formats for chat model AIMessage responses in a backwards-compatible way.
Supported values:
'v0': AIMessage format as of langchain-openai 0.3.x.
'responses/v1': Formats Responses API output items into AIMessage content blocks (Responses API only)
'v1': v1 of LangChain cross-provider standard.
!!! warning "Behavior changed in langchain-openai 1.0.0"
Default updated to "responses/v1".
- presence_penalty: float | None
Penalizes repeated tokens.
- profile: ModelProfile | None
Profile detailing model capabilities.
!!! warning "Beta feature"
This is a beta feature. The format of model profiles is subject to change.
If not specified, automatically loaded from the provider package on initialization if data is available.
Example profile data includes context window sizes, supported modalities, or support for tool calling, structured output, and other features.
!!! version-added "Added in langchain-core 1.1.0"
- proxy_model_name: str | None
- rate_limiter: BaseRateLimiter | None
An optional rate limiter to use for limiting the number of requests.
- reasoning: dict[str, Any] | None
Reasoning parameters for reasoning models.
For use with the Responses API.
"effort": "medium", # Can be "low", "medium", or "high" "summary": "auto", # Can be "auto", "concise", or "detailed"
}
!!! version-added "Added in langchain-openai 0.3.24"
- reasoning_effort: str | None
Constrains effort on reasoning for reasoning models.
For use with the Chat Completions API. Reasoning models only.
Currently supported values are 'minimal', 'low', 'medium', and 'high'. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
- request_timeout: float | tuple[float, float] | Any | None
Timeout for requests to OpenAI completion API.
Can be float, httpx.Timeout or None.
- root_async_client: Any
- root_client: Any
- seed: int | None
Seed for generation.
- service_tier: str | None
Latency tier for request.
Options are 'auto', 'default', or 'flex'.
Relevant for users of OpenAI's scale tier service.
- stop: list[str] | str | None
Default stop sequences.
- store: bool | None
If True, OpenAI may store response data for future use.
Defaults to True for the Responses API and False for the Chat Completions API.
!!! version-added "Added in langchain-openai 0.3.24"
- stream_usage: bool | None
Whether to include usage metadata in streaming output.
If enabled, an additional message chunk will be generated during the stream including usage metadata.
This parameter is enabled unless openai_api_base is set or the model is initialized with a custom client, as many chat completions APIs do not support streaming token usage.
!!! version-added "Added in langchain-openai 0.3.9"
!!! warning "Behavior changed in langchain-openai 0.3.35"
Enabled for default base URL and client.
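A sketch of reading the trailing usage chunk from a stream (model name illustrative):
```python
from gen_ai_hub.proxy.langchain.openai import ChatOpenAI

llm = ChatOpenAI(proxy_model_name="gpt-4o", stream_usage=True)
for chunk in llm.stream("Hello"):
    # The final chunk carries usage_metadata when stream_usage is enabled.
    if chunk.usage_metadata:
        print(chunk.usage_metadata)
```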
- streaming: bool
Whether to stream the results or not.
- tags: list[str] | None
Tags to add to the run trace.
- temperature: float | None
What sampling temperature to use.
- tiktoken_model_name: str | None
The model name to pass to tiktoken when using this class.
Tiktoken is used to count the number of tokens in documents to constrain them to be under a certain limit.
By default, when set to None, this will be the same as the model name. However, there are some cases where you may want to use this class with a model name not supported by tiktoken. This can include when using Azure OpenAI or one of the many model providers that expose an OpenAI-like API but with different models. In those cases, to avoid errors when tiktoken is called, you can specify a model name to use here.
- top_logprobs: int | None
Number of most likely tokens to return at each token position, each with an associated log probability.
logprobs must be set to true if this parameter is used.
- top_p: float | None
Total probability mass of tokens to consider at each step.
- truncation: str | None
Truncation strategy (Responses API).
Can be 'auto' or 'disabled' (default).
If 'auto', model may drop input items from the middle of the message sequence to fit the context window.
!!! version-added "Added in langchain-openai 0.3.24"
- use_previous_response_id: bool
If True, always pass previous_response_id using the ID of the most recent response. Responses API only.
Input messages up to the most recent response will be dropped from request payloads.
For example, the following two are equivalent:
```python
model = ChatOpenAI(
    model="...",
    use_previous_response_id=True,
)
model.invoke(
    [
        HumanMessage("Hello"),
        AIMessage("Hi there!", response_metadata={"id": "resp_123"}),
        HumanMessage("How are you?"),
    ]
)
```
```python
model = ChatOpenAI(model="...", use_responses_api=True)
model.invoke([HumanMessage("How are you?")], previous_response_id="resp_123")
```
!!! version-added "Added in langchain-openai 0.3.26"
- use_responses_api: bool | None
Whether to use the Responses API instead of the Chat API.
If not specified, it will be inferred from the invocation parameters.
!!! version-added "Added in langchain-openai 0.3.9"
- verbose: bool
Whether to print out response text.
- verbosity: str | None
Controls the verbosity level of responses for reasoning models.
For use with the Responses API.
Currently supported values are 'low', 'medium', and 'high'.
!!! version-added "Added in langchain-openai 0.3.28"
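Taken together, a minimal end-to-end sketch for this proxied ChatOpenAI (the get_proxy_client wiring and model name are assumptions based on this package's conventions, not a documented call sequence):
```python
from gen_ai_hub.proxy import get_proxy_client
from gen_ai_hub.proxy.langchain.openai import ChatOpenAI

proxy_client = get_proxy_client("gen-ai-hub")  # assumption: adapt to your AI Core setup
llm = ChatOpenAI(proxy_model_name="gpt-4o", proxy_client=proxy_client)
print(llm.invoke("Say hello.").content)
```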
- class OpenAI
Bases:
ProxyOpenAI, OpenAI
OpenAI model using a proxy.
- classmethod validate_environment(values)
Validates the environment.
- Parameters:
values (Dict) -- The input values
- Returns:
The validated values
- Return type:
Dict
- static __new__(cls, **data)
Initialize the OpenAI object.
- Parameters:
data (Any)
- __init__(*args, **kwargs)
Initialize the OpenAI object.
- allowed_special: Literal['all'] | set[str]
Set of special tokens that are allowed.
- async_client: Any
- batch_size: int
Batch size to use when passing multiple documents to generate.
- best_of: int
Generates best_of completions server-side and returns the "best".
- cache: BaseCache | bool | None
Whether to cache the response.
If True, will use the global cache.
If False, will not use a cache.
If None, will use the global cache if it's set, otherwise no cache.
If instance of BaseCache, will use the provided cache.
Caching is not currently supported for streaming methods of models.
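A hedged sketch of per-instance caching with langchain-core's InMemoryCache (model name elided as in the examples above):
```python
from langchain_core.caches import InMemoryCache
from gen_ai_hub.proxy.langchain.openai import OpenAI

# Identical repeated prompts are answered from the cache rather than the API.
llm = OpenAI(proxy_model_name="...", cache=InMemoryCache())
```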
- callbacks: Callbacks
Callbacks to add to the run trace.
- client: Any
- config_id: str | None
- config_name: str | None
- custom_get_token_ids: Callable[[str], list[int]] | None
Optional encoder to use for counting tokens.
- default_headers: Mapping[str, str] | None
- default_query: Mapping[str, object] | None
- deployment_id: str | None
- disallowed_special: Literal['all'] | Collection[str]
Set of special tokens that are not allowed.
- extra_body: Mapping[str, Any] | None
Optional additional JSON properties to include in the request parameters when making requests to OpenAI compatible APIs, such as vLLM.
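A sketch against an OpenAI-compatible server; the endpoint and the extra parameter name are hypothetical:
```python
from gen_ai_hub.proxy.langchain.openai import OpenAI

llm = OpenAI(
    model_name="...",
    openai_api_base="http://localhost:8000/v1",  # e.g. a local vLLM endpoint
    extra_body={"custom_server_param": True},  # hypothetical vendor parameter
)
```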
- frequency_penalty: float
Penalizes repeated tokens according to frequency.
- http_async_client: Any | None
Optional httpx.AsyncClient.
Only used for async invocations. Must specify http_client as well if you'd like a custom client for sync invocations.
- http_client: Any | None
Optional httpx.Client.
Only used for sync invocations. Must specify http_async_client as well if you'd like a custom client for async invocations.
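A sketch supplying both clients so sync and async calls share one proxy (assumes a recent httpx with the proxy keyword; the URL is illustrative):
```python
import httpx
from gen_ai_hub.proxy.langchain.openai import OpenAI

llm = OpenAI(
    model_name="...",
    http_client=httpx.Client(proxy="http://proxy.example:8080"),
    http_async_client=httpx.AsyncClient(proxy="http://proxy.example:8080"),
)
```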
- logit_bias: dict[str, float] | None
Adjust the probability of specific tokens being generated.
- logprobs: int | None
Include the log probabilities on the logprobs most likely output tokens, as well the chosen tokens.
- max_retries: int
Maximum number of retries to make when generating.
- max_tokens: int
The maximum number of tokens to generate in the completion. -1 returns as many tokens as possible given the prompt and the model's maximal context size.
- metadata: dict[str, Any] | None
Metadata to add to the run trace.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_kwargs: dict[str, Any]
Holds any model parameters valid for create call not explicitly specified.
- model_name: str | None
Model name to use.
- n: int
How many completions to generate for each prompt.
- name: str | None
The name of the Runnable.
Used for debugging and tracing.
- openai_api_base: str | None
Base URL path for API requests, leave blank if not using a proxy or service emulator.
- openai_api_key: SecretStr | None | Callable[[], str]
Automatically inferred from env var OPENAI_API_KEY if not provided.
- openai_api_version: str | None
- openai_organization: str | None
Automatically inferred from env var OPENAI_ORG_ID if not provided.
- openai_proxy: str | None
- presence_penalty: float
Penalizes repeated tokens.
- proxy_model_name: str | None
- request_timeout: float | tuple[float, float] | Any | None
Timeout for requests to OpenAI completion API. Can be float, httpx.Timeout or None.
- seed: int | None
Seed for generation.
- streaming: bool
Whether to stream the results or not.
- tags: list[str] | None
Tags to add to the run trace.
- temperature: float
What sampling temperature to use.
- tiktoken_model_name: str | None
The model name to pass to tiktoken when using this class.
Tiktoken is used to count the number of tokens in documents to constrain them to be under a certain limit.
By default, when set to None, this will be the same as the model name. However, there are some cases where you may want to use this class with a model name not supported by tiktoken. This can include when using Azure OpenAI or one of the many model providers that expose an OpenAI-like API but with different models. In those cases, to avoid errors when tiktoken is called, you can specify a model name to use here.
- top_p: float
Total probability mass of tokens to consider at each step.
- verbose: bool
Whether to print out response text.
- class OpenAIEmbeddings
Bases:
ProxyOpenAI, OpenAIEmbeddings
OpenAI Embeddings model using a proxy.
- classmethod validate_environment(values)
Validates the environment.
- Parameters:
values (Dict) -- The input values
- Returns:
The validated values
- Return type:
Dict
- __init__(*args, **kwargs)
Initialize the OpenAIEmbeddings object.
- allowed_special: Literal['all'] | set[str] | None
- async_client: Any
- check_embedding_ctx_length: bool
Whether to check the token length of inputs and automatically split inputs longer than embedding_ctx_length.
Set to False to send raw text strings directly to the API instead of tokenizing. Useful for many non-OpenAI providers (e.g. OpenRouter, Ollama, vLLM).
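A sketch for a non-OpenAI endpoint (the URL is illustrative):
```python
from gen_ai_hub.proxy.langchain.openai import OpenAIEmbeddings

emb = OpenAIEmbeddings(
    model="...",
    openai_api_base="http://localhost:11434/v1",  # e.g. a local Ollama server
    check_embedding_ctx_length=False,  # send raw strings, skip tokenization
)
```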
- chunk_size: int
Maximum number of texts to embed in each batch.
- client: Any
- config_id: str | None
- config_name: str | None
- default_headers: Mapping[str, str] | None
- default_query: Mapping[str, object] | None
- deployment: str | None
- deployment_id: str | None
- dimensions: int | None
The number of dimensions the resulting output embeddings should have.
Only supported in 'text-embedding-3' and later models.
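A minimal sketch (the deployment/model name is illustrative):
```python
from gen_ai_hub.proxy.langchain.openai import OpenAIEmbeddings

# Request 256-dimensional vectors instead of the model default.
emb = OpenAIEmbeddings(proxy_model_name="text-embedding-3-small", dimensions=256)
vec = emb.embed_query("hello world")
assert len(vec) == 256
```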
- disallowed_special: Literal['all'] | set[str] | Sequence[str] | None
- embedding_ctx_length: int
The maximum number of tokens to embed at once.
- headers: Any
- http_async_client: Any | None
Optional httpx.AsyncClient.
Only used for async invocations. Must specify http_client as well if you'd like a custom client for sync invocations.
- http_client: Any | None
Optional httpx.Client.
Only used for sync invocations. Must specify http_async_client as well if you'd like a custom client for async invocations.
- input_type: str | None
- max_retries: int
Maximum number of retries to make when generating.
- model: str | None
- model_config: ClassVar[ConfigDict] = {'extra': 'allow', 'populate_by_name': True, 'protected_namespaces': (), 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_kwargs: dict[str, Any]
Holds any model parameters valid for create call not explicitly specified.
- openai_api_base: str | None
Base URL path for API requests, leave blank if not using a proxy or service emulator.
Automatically inferred from env var OPENAI_API_BASE if not provided.
- openai_api_key: SecretStr | None | Callable[[], str] | Callable[[], Awaitable[str]]
API key to use for API calls.
Automatically inferred from env var OPENAI_API_KEY if not provided.
- openai_api_type: str | None
- openai_api_version: str | None
Version of the OpenAI API to use.
Automatically inferred from env var OPENAI_API_VERSION if not provided.
- openai_organization: str | None
OpenAI organization ID to use for API calls.
Automatically inferred from env var OPENAI_ORG_ID if not provided.
- openai_proxy: str | None
- proxy_model_name: str | None
- request_timeout: float | tuple[float, float] | Any | None
Timeout for requests to OpenAI completion API.
Can be float, httpx.Timeout or None.
- retry_max_seconds: int
Max number of seconds to wait between retries.
- retry_min_seconds: int
Min number of seconds to wait between retries.
- show_progress_bar: bool
Whether to show a progress bar when embedding.
- skip_empty: bool
Whether to skip empty strings when embedding or raise an error.
- tiktoken_enabled: bool
Set this to False to use HuggingFace transformers tokenization.
For non-OpenAI providers (OpenRouter, Ollama, vLLM, etc.), consider setting check_embedding_ctx_length=False instead, as it bypasses tokenization entirely.
- tiktoken_model_name: str | None
The model name to pass to tiktoken when using this class.
Tiktoken is used to count the number of tokens in documents to constrain them to be under a certain limit.
By default, when set to None, this will be the same as the embedding model name. However, there are some cases where you may want to use this Embedding class with a model name not supported by tiktoken. This can include when using Azure embeddings or when using one of the many model providers that expose an OpenAI-like API but with different models. In those cases, to avoid errors when tiktoken is called, you can specify a model name to use here.
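Taken together, a minimal sketch for this proxied embeddings class (the get_proxy_client wiring and model name are assumptions based on this package's conventions):
```python
from gen_ai_hub.proxy import get_proxy_client
from gen_ai_hub.proxy.langchain.openai import OpenAIEmbeddings

proxy_client = get_proxy_client("gen-ai-hub")  # assumption: adapt to your setup
emb = OpenAIEmbeddings(
    proxy_model_name="text-embedding-ada-002",  # illustrative deployment
    proxy_client=proxy_client,
)
vectors = emb.embed_documents(["first text", "second text"])
```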
- class ProxyOpenAI
Bases:
BaseAuth
Base class for OpenAI models using a proxy.
- Parameters:
BaseAuth (class) -- Base authentication class
- Returns:
The ProxyOpenAI class
- Return type:
class
- classmethod validate_clients(values)
Validate and initialize OpenAI clients.
- Parameters:
values (Dict) -- The input values
- Returns:
The validated values
- Return type:
Dict
- config_id: str | None
- config_name: str | None
- deployment_id: str | None
- model_config: ClassVar[ConfigDict] = {'extra': 'allow'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- proxy_model_name: str | None
- get_client_params(values)
Get the client parameters.
- Parameters:
values (Dict) -- The client values
- Returns:
client values + proxy_client
- init_chat_model(proxy_client, deployment, temperature=0.0, max_tokens=256, top_k=None, top_p=1.0)
Initialize the ChatOpenAI model.
- Parameters:
proxy_client (BaseProxyClient) -- the proxy client
deployment (BaseDeployment) -- the deployment
temperature (float, optional) -- the temperature, defaults to 0.0
max_tokens (int, optional) -- the maximum tokens, defaults to 256
top_k (Optional[int], optional) -- the top k, defaults to None
top_p (float, optional) -- the top p, defaults to 1.0
- Returns:
the ChatOpenAI model
- Return type:
ChatOpenAI
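A hedged usage sketch; how the proxy client and deployment are obtained below is an assumption about the surrounding package, not a documented call sequence:
```python
from gen_ai_hub.proxy import get_proxy_client
from gen_ai_hub.proxy.langchain.openai import init_chat_model

proxy_client = get_proxy_client("gen-ai-hub")  # assumption
deployment = proxy_client.deployments[0]  # assumption: pick a deployment
llm = init_chat_model(proxy_client, deployment, temperature=0.2, max_tokens=512)
```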
- init_embedding_model(proxy_client, deployment)
Initialize the OpenAIEmbeddings model.
- Parameters:
proxy_client (BaseProxyClient) -- the proxy client
deployment (BaseDeployment) -- the deployment
- Returns:
the OpenAIEmbeddings model
- Return type:
OpenAIEmbeddings