gen_ai_hub.evaluations package
- class ArtifactSource
Bases: object
Extends the Artifact object with a relative path the user can provide, to be used in an EvaluationConfig.
Example usage:
>>> ArtifactSource(
...     artifact={
...         "id": "xyfz-rtyu-2456-ojns-yu6s",
...         "name": "dataset-artifact",
...         "url": "ai://default/eval_dataset"
...     },
...     path="rootfolder/data.csv",
...     file_type="csv"
... )
>>> ArtifactSource(
...     artifact="xyfz-rtyu-2456-ojns-yu6s",
...     path="rootfolder/data.json",
...     file_type="json"
... )
- __init__(file_type, artifact, path=None)
- Parameters:
file_type (Literal["csv", "json", "jsonl"]) -- One of the supported file types.
artifact (Union[str, Artifact]) -- Either an artifact ID string or an Artifact object from the AI API client SDK.
path (Optional[str]) -- Relative path within the provided artifact path; must point to a single file.
- class Dataset
Bases: object
Dataset object for the evaluations flow.
The Dataset class accepts various source types for evaluation datasets including local file paths (as strings or Path objects) or AI Core artifacts.
- Parameters:
source (Union[str, Path, ArtifactSource]) -- Source of the dataset - can be a file path string, Path object, or ArtifactSource
Examples:
Using a Path object:
>>> Dataset(Path("data/sample.json"))
Using a string path:
>>> Dataset("data/sample.json")
Using an ArtifactSource with an artifact dictionary:
>>> Dataset(
...     ArtifactSource(
...         artifact={
...             "id": "xyfz-rtyu-2456-ojns-yu6s",
...             "name": "dataset-artifact",
...             "url": "ai://default/eval_dataset"
...         },
...         path="rootfolder/data.csv",
...         file_type="csv"
...     )
... )
Using an ArtifactSource with an artifact ID:
>>> Dataset(
...     ArtifactSource(
...         artifact="xyfz-rtyu-2456-ojns-yu6s",
...         path="rootfolder/data.csv",
...         file_type="csv"
...     )
... )
- __init__(source)
Initialize a Dataset instance.
- Parameters:
source (Union[str, Path, ArtifactSource]) -- Source of the dataset - can be a file path string, Path object, or ArtifactSource
- property file_type: str | None
Infer the file type from the source.
For ArtifactSource, returns the explicitly set file_type. For file paths, infers the type from the file extension.
- Returns:
File type (e.g., "json", "jsonl", "csv"), or None if it cannot be determined
- Return type:
Optional[str]
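The extension-based inference described above can be sketched as a small helper. This is a hypothetical illustration, not the library's actual implementation; in particular, case-insensitive matching is an assumption.

```python
from pathlib import Path
from typing import Optional, Union

# Supported types per the ArtifactSource file_type literal
SUPPORTED_TYPES = {"csv", "json", "jsonl"}

def infer_file_type(source: Union[str, Path],
                    explicit: Optional[str] = None) -> Optional[str]:
    """Return the explicitly set file_type if given (the ArtifactSource case),
    else infer the type from the file extension (the path case)."""
    if explicit is not None:
        return explicit
    suffix = Path(source).suffix.lstrip(".").lower()
    return suffix if suffix in SUPPORTED_TYPES else None
```

For example, `infer_file_type("data/sample.json")` yields `"json"`, while an unrecognized extension yields `None`, mirroring the Optional[str] return type documented above.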
- class EvaluationClient
Bases: object
Base client for the Evaluations service.
- static from_env(profile_name=None, **kwargs)
Alternative way to create an EvaluationClient object.
Parameter resolution precedence:
1. Explicit keyword arguments
2. Environment variables
3. Configuration file
4. VCAP_SERVICES environment variable
- Parameters:
profile_name (str, optional) -- Profile name defined in configuration.
kwargs -- Additional parameters passed to constructor.
- Returns:
Configured EvaluationClient instance.
- Return type:
EvaluationClient
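The four-level precedence used by from_env can be illustrated with a generic resolver. This is an illustrative sketch only: the `AICORE_` environment-variable prefix and the dictionary shapes for the config file and VCAP_SERVICES sources are assumptions, not the SDK's internals.

```python
import os
from typing import Any, Dict, Optional

def resolve_param(name: str,
                  kwargs: Dict[str, Any],
                  env_prefix: str = "AICORE_",   # assumed prefix
                  config: Optional[Dict[str, Any]] = None,
                  vcap: Optional[Dict[str, Any]] = None) -> Optional[Any]:
    """Return the first value found, in precedence order:
    explicit kwargs > environment variables > config file > VCAP_SERVICES."""
    if kwargs.get(name) is not None:
        return kwargs[name]
    env_value = os.environ.get(env_prefix + name.upper())
    if env_value is not None:
        return env_value
    for source in (config or {}, vcap or {}):
        if source.get(name) is not None:
            return source[name]
    return None
```

An explicit keyword argument always wins; only when it is absent does the resolver fall through to the lower-precedence sources.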
- __init__(base_url, auth_url=None, client_id=None, client_secret=None, cert_str=None, key_str=None, cert_file_path=None, key_file_path=None, resource_group=None, aws_access_key_id=None, aws_secret_access_key=None, ai_core_client=None, orchestration_url=None, input_object_store_secret_name=None, provider_name='aws')
EvaluationClient root object used for evaluations.
- Parameters:
base_url (str) -- Base URL of the AI Core instance (must include /v2 suffix).
auth_url (str, optional) -- Authentication URL used to retrieve access tokens.
client_id (str, optional) -- OAuth client ID.
client_secret (str, optional) -- OAuth client secret.
cert_str (str, optional) -- X.509 certificate content as a string.
key_str (str, optional) -- X.509 private key content as a string.
cert_file_path (str, optional) -- File path to X.509 certificate.
key_file_path (str, optional) -- File path to X.509 private key.
resource_group (str, optional) -- Resource group name within the AI Core instance.
aws_access_key_id (str, optional) -- AWS access key ID.
aws_secret_access_key (str, optional) -- AWS secret access key.
ai_core_client (AICoreV2Client, optional) -- Pre-configured AI Core client instance.
orchestration_url (str, optional) -- Pre-existing orchestration deployment URL.
input_object_store_secret_name (str, optional) -- Name of input object store secret.
provider_name (str, optional) -- Hyperscaler provider name (e.g., "aws").
- Raises:
ValueError -- If required hyperscaler provider parameters are missing.
- create_or_update_object_store_secret(*, context, secret_body, is_default, result_key, attr_name, creator_mapping, replace_existing, result)
- Parameters:
secret_body (dict)
is_default (bool)
result_key (str)
attr_name (str)
creator_mapping (dict)
replace_existing (bool)
result (dict)
- evaluate(evaluation_configs)
Main entry point for creating the evaluation job.
- Parameters:
evaluation_configs (List[EvaluationConfig]) -- A list of one or more EvaluationConfig objects.
- Returns:
A list of EvaluationRun objects, one for each EvaluationConfig provided.
- Return type:
List[EvaluationRun]
- get_system_supported_metrics()
Helper method that returns the list of all supported metric IDs.
- Return type:
List[str]
- list_available_models()
List all the available LLM models.
- resolve_orchestration_deployment_url()
Resolves the orchestration deployment URL.
For non-default resource groups, creates a new deployment. For default resource group, attempts to discover existing deployment with the default config name using the orchestration service, or creates one if not found.
- Returns:
The orchestration deployment URL.
- Return type:
str
- setup(input_secret_body=None, default_secret_body=None, replace_existing=False)
One-time setup function that creates the object store secrets and, if not already provided, the orchestration deployment URL.
- Parameters:
input_secret_body (dict | None)
default_secret_body (dict | None)
replace_existing (bool)
- validate_secret_type(secret_type, creator_mapping)
- Parameters:
secret_type (str)
creator_mapping (dict)
- class EvaluationConfig
Bases: object
Defines the evaluation configuration object for the evaluations flow.
This class encapsulates all configuration parameters needed to run an evaluation job, including the model/template configuration, dataset, metrics, and execution settings.
At least one of the following must be provided:
- the llm and template combination (using orchestration_v2 models)
- orchestration_registry_reference (UUID of a registered orchestration configuration)
- Parameters:
dataset_config (Dataset) -- Dataset configuration object specifying the evaluation dataset
metrics (List[MetricConfig]) -- List of metric configurations for evaluation
llm (Optional[LLM]) -- LLM configuration from orchestration_v2 (LLMModelDetails)
template (Optional[Union[str, PromptTemplateSpec, TemplateRef]]) -- Prompt template as string, PromptTemplateSpec, or TemplateRef
orchestration_registry_reference (Optional[str]) -- UUID of registered orchestration configuration
template_variable_mapping (Optional[dict]) -- Variable mapping for the prompt template
test_row_count (Optional[int]) -- Number of rows to sample from dataset (-1 for all rows), defaults to -1
repetitions (Optional[int]) -- Number of times to repeat evaluation over the dataset, defaults to 1
tags (Optional[dict]) -- User-defined metadata as key-value pairs, defaults to "{}"
debug_mode (Optional[bool]) -- Enable debug logs in hyperscaler output path, defaults to False
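The semantics of test_row_count and repetitions can be sketched as follows. This is a simplified illustration, not the service's actual implementation: the subset is shown here as a head slice, while the service may sample rows differently.

```python
from typing import Any, List

def rows_to_evaluate(rows: List[Any],
                     test_row_count: int = -1,
                     repetitions: int = 1) -> List[Any]:
    """Select the rows one evaluation pass would cover: take
    test_row_count rows (-1 means all rows), then repeat the
    pass `repetitions` times (minimum 1)."""
    if repetitions < 1:
        raise ValueError("repetitions must be at least 1")
    selected = rows if test_row_count == -1 else rows[:test_row_count]
    return selected * repetitions
```

With the defaults (test_row_count=-1, repetitions=1) every row is evaluated exactly once.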
Note
This module uses orchestration_v2 models directly.
Example using TemplateRef with ID:
>>> from gen_ai_hub.evaluations.models import EvaluationConfig, Dataset, MetricConfig
>>> from gen_ai_hub.orchestration_v2.models.llm_model_details import LLMModelDetails as LLM
>>> from gen_ai_hub.orchestration_v2.models.template_ref import TemplateRef, TemplateRefByID
>>> config = EvaluationConfig(
...     dataset_config=Dataset("data/test.jsonl"),
...     metrics=[MetricConfig(name="accuracy")],
...     llm=LLM(name="gpt-4", version="latest"),
...     template=TemplateRef(template_ref=TemplateRefByID(id="template-id-here")),
...     test_row_count=100
... )
Example using TemplateRef with scenario/name/version:
>>> from gen_ai_hub.orchestration_v2.models.template_ref import TemplateRefByScenarioNameVersion
>>> config = EvaluationConfig(
...     dataset_config=Dataset("data/test.jsonl"),
...     metrics=[MetricConfig(name="accuracy")],
...     llm=LLM(name="gpt-4", version="latest", params={"temperature": 0.7}),
...     template=TemplateRef(template_ref=TemplateRefByScenarioNameVersion(
...         scenario="foundation-models", name="prompt1", version="1.0"
...     )),
...     test_row_count=100
... )
- __init__(dataset_config, metrics, llm=None, template=None, orchestration_registry_reference=None, template_variable_mapping=None, test_row_count=-1, repetitions=1, tags='{}', debug_mode=False)
Initialize an EvaluationConfig instance.
- Parameters:
dataset_config (Dataset) -- Dataset configuration object
metrics (List[MetricConfig]) -- List of metric configurations
llm (Optional[LLM]) -- LLM object from orchestration_v2 (LLMModelDetails), defaults to None
template (Optional[Union[str, PromptTemplateSpec, TemplateRef]]) -- Prompt template (string, PromptTemplateSpec, or TemplateRef), defaults to None
orchestration_registry_reference (Optional[str]) -- UUID of orchestration config, defaults to None
template_variable_mapping (Optional[dict]) -- Variable mapping for prompt template, defaults to None
test_row_count (Optional[int]) -- Number of dataset rows to sample (-1 for all), defaults to -1
repetitions (Optional[int]) -- Number of evaluation repetitions (minimum: 1), defaults to 1
tags (Optional[dict]) -- Key-value metadata pairs applied to all runs, defaults to "{}"
debug_mode (Optional[bool]) -- Enable debug logging, defaults to False
- Raises:
ValueError -- If neither (llm, template) nor orchestration_registry_reference is provided
- class EvaluationRun
Bases: object
Represents an individual EvaluationRun object and its associated context.
- Parameters:
run_id (str) -- Unique identifier for the evaluation run
execution_id (str) -- ID of the AI Core execution
ai_core_client (AICoreV2Client) -- AI Core client instance
configuration_id (str) -- ID of the configuration, defaults to None
artifact_id (str) -- ID of the artifact, defaults to None
resource_group (str) -- Resource group name, defaults to None
object_store_credentials (_AWSObjectStoreData) -- Object store credentials, defaults to None
metrics_list (List[str]) -- List of metrics to evaluate, defaults to None
- __init__(run_id, execution_id, ai_core_client, configuration_id=None, artifact_id=None, resource_group=None, object_store_credentials=None, metrics_list=None)
- Parameters:
run_id (str)
execution_id (str)
ai_core_client (AICoreV2Client)
configuration_id (str)
artifact_id (str)
resource_group (str)
object_store_credentials (_AWSObjectStoreData)
metrics_list (List[str])
- get_current_status()
Get the current status of the evaluation run.
- Returns:
Current status of the run
- Return type:
Status
- Raises:
ValueError -- If failed to retrieve the current status
- get_debug_info()
Provide debug information when execution status is FAILED or DEAD.
- Returns:
Execution status details including failed pod information
- Return type:
ExecutionStatusDetails
- get_debug_logs()
Get the complete trace of execution logs.
- Returns:
List of log entries as dictionaries
- Return type:
list
- load_results_tables()
Download results from S3 and load the required table data.
- Returns:
Dictionary containing completions and metrics table data
- Return type:
dict
- Raises:
RuntimeError -- If failed to download results
- results()
Get the results of the evaluation run.
- Returns:
Results object for accessing completion and metric results
- Return type:
Results
- Raises:
ValueError -- If execution is not completed
- set_cached_results_data(data)
Set the cached results data from the child results class.
- Parameters:
data (Any) -- Results data to cache
- wait_for_completion(timeout=None)
Wait for the evaluation run to complete by polling status.
- Parameters:
timeout (Optional[int]) -- Maximum time to wait in seconds, defaults to 3600 (1 hour)
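Status polling with a timeout, as performed by wait_for_completion, follows this general pattern. This is an illustrative sketch: the poll interval and the set of terminal statuses are assumptions (FAILED and DEAD appear above under get_debug_info; COMPLETED is assumed).

```python
import time
from typing import Callable

TERMINAL_STATES = {"COMPLETED", "FAILED", "DEAD"}  # assumed terminal statuses

def wait_until_done(get_status: Callable[[], str],
                    timeout: int = 3600,
                    poll_interval: float = 1.0) -> str:
    """Poll get_status until a terminal state is reached or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still in state {status!r} after {timeout}s")
        time.sleep(poll_interval)
```

The default timeout of 3600 seconds matches the one-hour default documented for wait_for_completion.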
- class MetricConfig
Bases: object
Defines the metric configuration for the evaluation flow.
- Parameters:
reference (MetricRef) -- Reference to the metric to be evaluated; identified by name, UUID (id), or scenario/name/version.
variable_mapping (Optional[dict]) -- Any variable mapping associated with the metric.
- __init__(reference, variable_mapping=None)
- Parameters:
reference (MetricRef)
variable_mapping (dict)
- class MetricRef
Bases: object
Represents a reference to a specific metric definition.
A metric can be identified in multiple ways:
- By its UUID from the metric management service (id)
- By name (name)
- By a combination of scenario, name, and version (scenario, name, version)
- __init__(scenario=None, name=None, version=None, id=None)
- Parameters:
scenario (str)
name (str)
version (str)
id (str)
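The three identification modes listed above imply a simple classification rule, sketched here as a hypothetical helper (not part of the SDK); the precedence among modes when several fields are set is an assumption.

```python
from typing import Optional

def metric_ref_mode(scenario: Optional[str] = None,
                    name: Optional[str] = None,
                    version: Optional[str] = None,
                    id: Optional[str] = None) -> str:
    """Classify how a MetricRef identifies its metric, mirroring the
    three documented modes: by UUID, by name, or by scenario/name/version."""
    if id is not None:
        return "by_id"
    if scenario is not None and name is not None and version is not None:
        return "by_scenario_name_version"
    if name is not None:
        return "by_name"
    raise ValueError("MetricRef needs an id, a name, or scenario/name/version")
```

A reference with none of these fields set cannot resolve to a metric, hence the ValueError in the fallback case.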
- class Results
Bases: object
Represents the Results handler for an EvaluationRun object.
This class provides methods to access completion results, metric results, and aggregated results for a specific evaluation run.
- Parameters:
run (EvaluationRun) -- The parent EvaluationRun object
- __init__(run)
- Parameters:
run (EvaluationRun)
- aggregations()
Get the aggregated results for the run from the tracking service.
- Returns:
JSON response containing aggregated metric results
- Return type:
dict
- Raises:
ValueError -- If error occurs while fetching aggregation results
- completions()
Get the completion results for the run.
- Returns:
DataFrame containing completion results for the run
- Return type:
pd.DataFrame
- Raises:
ValueError -- If error occurs while fetching completions
- metrics()
Get the metric-level results for the run.
- Returns:
DataFrame containing metric results for the run
- Return type:
pd.DataFrame
- Raises:
ValueError -- If error occurs while fetching metric results
Subpackages
Submodules
gen_ai_hub.evaluations.client module
- class EvaluationClient
Bases: object
Base client for the Evaluations service.
- static from_env(profile_name=None, **kwargs)
Alternative way to create an EvaluationClient object.
Parameter resolution precedence:
1. Explicit keyword arguments
2. Environment variables
3. Configuration file
4. VCAP_SERVICES environment variable
- Parameters:
profile_name (str, optional) -- Profile name defined in configuration.
kwargs -- Additional parameters passed to constructor.
- Returns:
Configured EvaluationClient instance.
- Return type:
EvaluationClient
- __init__(base_url, auth_url=None, client_id=None, client_secret=None, cert_str=None, key_str=None, cert_file_path=None, key_file_path=None, resource_group=None, aws_access_key_id=None, aws_secret_access_key=None, ai_core_client=None, orchestration_url=None, input_object_store_secret_name=None, provider_name='aws')
EvaluationClient root object used for evaluations.
- Parameters:
base_url (str) -- Base URL of the AI Core instance (must include /v2 suffix).
auth_url (str, optional) -- Authentication URL used to retrieve access tokens.
client_id (str, optional) -- OAuth client ID.
client_secret (str, optional) -- OAuth client secret.
cert_str (str, optional) -- X.509 certificate content as a string.
key_str (str, optional) -- X.509 private key content as a string.
cert_file_path (str, optional) -- File path to X.509 certificate.
key_file_path (str, optional) -- File path to X.509 private key.
resource_group (str, optional) -- Resource group name within the AI Core instance.
aws_access_key_id (str, optional) -- AWS access key ID.
aws_secret_access_key (str, optional) -- AWS secret access key.
ai_core_client (AICoreV2Client, optional) -- Pre-configured AI Core client instance.
orchestration_url (str, optional) -- Pre-existing orchestration deployment URL.
input_object_store_secret_name (str, optional) -- Name of input object store secret.
provider_name (str, optional) -- Hyperscaler provider name (e.g., "aws").
- Raises:
ValueError -- If required hyperscaler provider parameters are missing.
- create_or_update_object_store_secret(*, context, secret_body, is_default, result_key, attr_name, creator_mapping, replace_existing, result)
- Parameters:
secret_body (dict)
is_default (bool)
result_key (str)
attr_name (str)
creator_mapping (dict)
replace_existing (bool)
result (dict)
- evaluate(evaluation_configs)
Main entry point for creating the evaluation job.
- Parameters:
evaluation_configs (List[EvaluationConfig]) -- A list of one or more EvaluationConfig objects.
- Returns:
A list of EvaluationRun objects, one for each EvaluationConfig provided.
- Return type:
List[EvaluationRun]
- get_system_supported_metrics()
Helper method that returns the list of all supported metric IDs.
- Return type:
List[str]
- list_available_models()
List all the available LLM models.
- resolve_orchestration_deployment_url()
Resolves the orchestration deployment URL.
For non-default resource groups, creates a new deployment. For default resource group, attempts to discover existing deployment with the default config name using the orchestration service, or creates one if not found.
- Returns:
The orchestration deployment URL.
- Return type:
str
- setup(input_secret_body=None, default_secret_body=None, replace_existing=False)
One-time setup function that creates the object store secrets and, if not already provided, the orchestration deployment URL.
- Parameters:
input_secret_body (dict | None)
default_secret_body (dict | None)
replace_existing (bool)
- validate_secret_type(secret_type, creator_mapping)
- Parameters:
secret_type (str)
creator_mapping (dict)
gen_ai_hub.evaluations.constants module
gen_ai_hub.evaluations.credentials module
- class CredentialsValue
Bases: object
CredentialsValue(name: 'str', vcap_key: 'Optional[Tuple[str, ...]]' = None, transform_fn: 'Optional[Callable]' = None)
- __init__(name, vcap_key=None, transform_fn=None)
- Parameters:
name (str)
vcap_key (Tuple[str, ...] | None)
transform_fn (Callable | None)
- Return type:
None
- name: str
- transform_fn: Callable | None = None
- vcap_key: Tuple[str, ...] | None = None
- class Service
Bases: object
- __init__(env)
- Parameters:
env (Dict[str, Any])
- get(key, default=<object object>)
- property label: str | None
- property name: str | None
- class Source
Bases: object
Source(name: 'str', get: 'Callable[[CredentialsValue], Optional[str]]')
- __init__(name, get)
- Parameters:
name (str)
get (Callable[[CredentialsValue], str | None])
- Return type:
None
- get: Callable[[CredentialsValue], str | None]
- name: str
- class VCAPEnvironment
Bases: object
VCAPEnvironment(services: 'List[Service]')
- classmethod from_dict(env)
- Parameters:
env (Dict[str, Any])
- classmethod from_env(env_var=None)
- Parameters:
env_var (str | None)
- __init__(services)
- Parameters:
services (List[Service])
- Return type:
None
- get_service(label, exactly_one=True)
- Parameters:
exactly_one (bool)
- Return type:
- get_service_by_name(name, exactly_one=True)
- Parameters:
exactly_one (bool)
- Return type:
- services: List[Service]
- extract_credentials(source, exclude=None)
Extract all credentials from a source.
- Parameters:
source (Source)
exclude (List[str])
- Return type:
Dict[str, str]
- fetch_credentials(profile=None, **kwargs)
Fetch credentials from a single source based on precedence.
Precedence order: kwargs > environment variables > config file > VCAP service
Once a source is selected (first one with any credential), all credentials come from that source only. Resource group is an exception and follows precedence independently.
- Parameters:
profile (str)
- Return type:
Dict[str, str]
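The single-source rule described above (the first source with any credential supplies all of them, while resource_group follows precedence independently) can be sketched as follows. This is illustrative only; the real fetch_credentials builds its sources from kwargs, environment variables, the config file, and VCAP internally.

```python
from typing import Dict, List, Optional, Tuple

# Each source is (name, values), ordered by precedence: kwargs first, VCAP last.
SourceValues = Tuple[str, Dict[str, Optional[str]]]

def pick_credentials(sources: List[SourceValues]) -> Dict[str, str]:
    """All credentials come from the first source defining any of them;
    resource_group is resolved separately across all sources."""
    result: Dict[str, str] = {}
    for _, values in sources:
        creds = {k: v for k, v in values.items()
                 if v is not None and k != "resource_group"}
        if creds:                     # first source with any credential wins
            result.update(creds)
            break
    for _, values in sources:         # resource_group: independent precedence
        rg = values.get("resource_group")
        if rg is not None:
            result["resource_group"] = rg
            break
    return result
```

Note how a source that defines only resource_group does not "win" the credential selection, but its resource_group still takes precedence.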
- get_home()
- Return type:
str
- get_nested_value(data_dict, keys)
Retrieve a nested value from a dictionary using a list of strings.
- Parameters:
data_dict -- The dictionary to search.
keys (List[str]) -- A list of strings representing nested keys.
- Returns:
The value associated with the nested keys, or None if not found.
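The lookup behavior described above can be sketched as an illustrative implementation (not necessarily the library's own):

```python
from typing import Any, Dict, List, Optional

def get_nested_value(data_dict: Dict[str, Any], keys: List[str]) -> Optional[Any]:
    """Walk the dictionary along `keys`; return the nested value,
    or None as soon as any step is missing or not a dict."""
    current: Any = data_dict
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current
```

For instance, looking up `["credentials", "uaa", "clientid"]` in a VCAP-style service dictionary returns the client ID, or None if any intermediate key is absent.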
- init_conf(profile=None)
- Parameters:
profile (str)
- resolve_credentials(sources)
Extract credentials from the first source that has any defined.
- Parameters:
sources (List[Source])
- Return type:
Dict[str, str]
- resolve_resource_group(sources)
Find resource_group from the first source that defines it.
- Parameters:
sources (List[Source])
- Return type:
str | None
- validate_credentials(credentials)
Validate that we have a complete authentication method.
- Parameters:
credentials (Dict[str, str])
- Return type:
None