Real-time Serving

These dataclasses are used in the SDK to represent API requests and responses for services in the databricks.sdk.service.serving module.

class databricks.sdk.service.serving.Ai21LabsConfig(ai21labs_api_key: 'Optional[str]' = None, ai21labs_api_key_plaintext: 'Optional[str]' = None)
ai21labs_api_key: str | None = None

The Databricks secret key reference for an AI21 Labs API key. If you prefer to paste your API key directly, see ai21labs_api_key_plaintext. You must provide an API key using one of the following fields: ai21labs_api_key or ai21labs_api_key_plaintext.

ai21labs_api_key_plaintext: str | None = None

An AI21 Labs API key provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see ai21labs_api_key. You must provide an API key using one of the following fields: ai21labs_api_key or ai21labs_api_key_plaintext.

as_dict() dict

Serializes the Ai21LabsConfig into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the Ai21LabsConfig into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) Ai21LabsConfig

Deserializes the Ai21LabsConfig from a dictionary.
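
The "one of the following fields" requirement can be checked before a request is built. The sketch below works on a plain dictionary rather than on the SDK class. `require_one_of` is a hypothetical helper, not part of the SDK, and it assumes that "one of" means exactly one; the text above does not say how the service treats a request with both fields set. The secret reference value is a placeholder.

```python
from typing import Any, Dict


def require_one_of(body: Dict[str, Any], *fields: str) -> str:
    """Return the single field from `fields` that is set in `body`.

    Hypothetical helper: raises ValueError unless exactly one field is set.
    """
    present = [name for name in fields if body.get(name) is not None]
    if len(present) != 1:
        raise ValueError(f"expected exactly one of {fields}, got {present}")
    return present[0]


# The secret reference is set and the plaintext field is left out.
config = {"ai21labs_api_key": "placeholder-secret-reference"}
chosen = require_one_of(config, "ai21labs_api_key", "ai21labs_api_key_plaintext")
```

The same check applies to the other provider configs that pair a secret reference field with a `_plaintext` field.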

class databricks.sdk.service.serving.AiGatewayConfig(fallback_config: 'Optional[FallbackConfig]' = None, guardrails: 'Optional[AiGatewayGuardrails]' = None, inference_table_config: 'Optional[AiGatewayInferenceTableConfig]' = None, rate_limits: 'Optional[List[AiGatewayRateLimit]]' = None, usage_tracking_config: 'Optional[AiGatewayUsageTrackingConfig]' = None)
fallback_config: FallbackConfig | None = None

Configuration for traffic fallback, which automatically falls back to other served entities if a request to a served entity fails with certain error codes, to increase availability.

guardrails: AiGatewayGuardrails | None = None

Configuration for AI Guardrails to prevent unwanted and unsafe data in requests and responses.

inference_table_config: AiGatewayInferenceTableConfig | None = None

Configuration for payload logging using inference tables. Use these tables to monitor and audit data being sent to and received from model APIs and to improve model quality.

rate_limits: List[AiGatewayRateLimit] | None = None

Configuration for rate limits which can be set to limit endpoint traffic.

usage_tracking_config: AiGatewayUsageTrackingConfig | None = None

Configuration to enable usage tracking using system tables. These tables allow you to monitor operational usage on endpoints and their associated costs.

as_dict() dict

Serializes the AiGatewayConfig into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the AiGatewayConfig into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) AiGatewayConfig

Deserializes the AiGatewayConfig from a dictionary.

class databricks.sdk.service.serving.AiGatewayGuardrailParameters(invalid_keywords: 'Optional[List[str]]' = None, pii: 'Optional[AiGatewayGuardrailPiiBehavior]' = None, safety: 'Optional[bool]' = None, valid_topics: 'Optional[List[str]]' = None)
invalid_keywords: List[str] | None = None

List of invalid keywords. The AI guardrail uses keyword or string matching to decide whether a keyword appears in the request or response content.

pii: AiGatewayGuardrailPiiBehavior | None = None

Configuration for guardrail PII filter.

safety: bool | None = None

Indicates whether the safety filter is enabled.

valid_topics: List[str] | None = None

The list of allowed topics. Given a chat request, this guardrail flags the request if its topic is not in the allowed topics.

as_dict() dict

Serializes the AiGatewayGuardrailParameters into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the AiGatewayGuardrailParameters into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) AiGatewayGuardrailParameters

Deserializes the AiGatewayGuardrailParameters from a dictionary.

class databricks.sdk.service.serving.AiGatewayGuardrailPiiBehavior(behavior: 'Optional[AiGatewayGuardrailPiiBehaviorBehavior]' = None)
behavior: AiGatewayGuardrailPiiBehaviorBehavior | None = None

The behavior of the PII filter. See AiGatewayGuardrailPiiBehaviorBehavior for the available values (BLOCK, MASK, NONE).

as_dict() dict

Serializes the AiGatewayGuardrailPiiBehavior into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the AiGatewayGuardrailPiiBehavior into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) AiGatewayGuardrailPiiBehavior

Deserializes the AiGatewayGuardrailPiiBehavior from a dictionary.

class databricks.sdk.service.serving.AiGatewayGuardrailPiiBehaviorBehavior
BLOCK = "BLOCK"
MASK = "MASK"
NONE = "NONE"

class databricks.sdk.service.serving.AiGatewayGuardrails(input: 'Optional[AiGatewayGuardrailParameters]' = None, output: 'Optional[AiGatewayGuardrailParameters]' = None)
input: AiGatewayGuardrailParameters | None = None

Configuration for input guardrail filters.

output: AiGatewayGuardrailParameters | None = None

Configuration for output guardrail filters.

as_dict() dict

Serializes the AiGatewayGuardrails into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the AiGatewayGuardrails into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) AiGatewayGuardrails

Deserializes the AiGatewayGuardrails from a dictionary.
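
The difference between the two serializers is easiest to see on an object that holds another object. The sketch below uses local stand-in classes; `Params` and `Guardrails` are hypothetical names, not the SDK classes. It assumes that the deep form converts nested objects recursively, that the shallow form leaves them as objects, and that unset fields are omitted from both.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Params:
    """Local stand-in for a nested config object (not the SDK class)."""

    invalid_keywords: Optional[List[str]] = None

    def as_dict(self) -> Dict[str, Any]:
        body: Dict[str, Any] = {}
        if self.invalid_keywords is not None:
            body["invalid_keywords"] = list(self.invalid_keywords)
        return body


@dataclass
class Guardrails:
    """Local stand-in for a parent object that holds a nested object."""

    input: Optional[Params] = None

    def as_dict(self) -> Dict[str, Any]:
        # Deep form: nested objects are converted to dictionaries as well.
        body: Dict[str, Any] = {}
        if self.input is not None:
            body["input"] = self.input.as_dict()
        return body

    def as_shallow_dict(self) -> Dict[str, Any]:
        # Shallow form: nested objects are kept as objects.
        body: Dict[str, Any] = {}
        if self.input is not None:
            body["input"] = self.input
        return body


guardrails = Guardrails(input=Params(invalid_keywords=["secret"]))
deep = guardrails.as_dict()
shallow = guardrails.as_shallow_dict()
```

Under these assumptions, `deep` contains only dictionaries and lists and can be passed to `json.dumps`, while `shallow["input"]` is still a `Params` instance.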

class databricks.sdk.service.serving.AiGatewayInferenceTableConfig(catalog_name: 'Optional[str]' = None, enabled: 'Optional[bool]' = None, schema_name: 'Optional[str]' = None, table_name_prefix: 'Optional[str]' = None)
catalog_name: str | None = None

The name of the catalog in Unity Catalog. Required when enabling inference tables. NOTE: On update, you must disable the inference table first to change the catalog name.

enabled: bool | None = None

Indicates whether the inference table is enabled.

schema_name: str | None = None

The name of the schema in Unity Catalog. Required when enabling inference tables. NOTE: On update, you must disable the inference table first to change the schema name.

table_name_prefix: str | None = None

The prefix of the table in Unity Catalog. NOTE: On update, you must disable the inference table first to change the prefix name.

as_dict() dict

Serializes the AiGatewayInferenceTableConfig into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the AiGatewayInferenceTableConfig into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) AiGatewayInferenceTableConfig

Deserializes the AiGatewayInferenceTableConfig from a dictionary.
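
Because catalog_name and schema_name are required when inference tables are enabled, a configuration can be checked locally before it is sent. The sketch below works on a plain dictionary, and `missing_inference_table_fields` is a hypothetical helper, not part of the SDK.

```python
from typing import Any, Dict, List


def missing_inference_table_fields(config: Dict[str, Any]) -> List[str]:
    """Return required field names that are missing when `enabled` is true."""
    if not config.get("enabled"):
        # Nothing is required while inference tables are disabled.
        return []
    required = ("catalog_name", "schema_name")
    return [name for name in required if not config.get(name)]


config = {"enabled": True, "catalog_name": "main"}
missing = missing_inference_table_fields(config)
```

Here `missing` lists `schema_name`, the one required field that the example leaves out.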

class databricks.sdk.service.serving.AiGatewayRateLimit(renewal_period: 'AiGatewayRateLimitRenewalPeriod', calls: 'Optional[int]' = None, key: 'Optional[AiGatewayRateLimitKey]' = None, principal: 'Optional[str]' = None, tokens: 'Optional[int]' = None)
renewal_period: AiGatewayRateLimitRenewalPeriod

Renewal period field for a rate limit. Currently, only ‘minute’ is supported.

calls: int | None = None

Used to specify how many calls are allowed for a key within the renewal_period.

key: AiGatewayRateLimitKey | None = None

Key field for a rate limit. Currently, ‘user’, ‘user_group’, ‘service_principal’, and ‘endpoint’ are supported, with ‘endpoint’ being the default if not specified.

principal: str | None = None

Principal field for a user, user group, or service principal to apply rate limiting to. Accepts a user email, group name, or service principal application ID.

tokens: int | None = None

Used to specify how many tokens are allowed for a key within the renewal_period.

as_dict() dict

Serializes the AiGatewayRateLimit into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the AiGatewayRateLimit into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) AiGatewayRateLimit

Deserializes the AiGatewayRateLimit from a dictionary.
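
The required and defaulted fields of a rate limit can be made explicit with a small helper. The sketch below works on a plain dictionary that uses the lowercase values quoted in the field descriptions. `normalize_rate_limit` is a hypothetical helper, not part of the SDK, and filling in the default on the client side is an illustration only; the service applies its own default.

```python
from typing import Any, Dict


def normalize_rate_limit(limit: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `limit` with the documented default key filled in."""
    if "renewal_period" not in limit:
        raise ValueError("renewal_period is required")
    normalized = dict(limit)
    # 'endpoint' is the documented default when no key is given.
    normalized.setdefault("key", "endpoint")
    return normalized


limit = normalize_rate_limit({"renewal_period": "minute", "calls": 100})
```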

class databricks.sdk.service.serving.AiGatewayRateLimitKey
ENDPOINT = "ENDPOINT"
SERVICE_PRINCIPAL = "SERVICE_PRINCIPAL"
USER = "USER"
USER_GROUP = "USER_GROUP"

class databricks.sdk.service.serving.AiGatewayRateLimitRenewalPeriod
MINUTE = "MINUTE"

class databricks.sdk.service.serving.AiGatewayUsageTrackingConfig(enabled: 'Optional[bool]' = None)
enabled: bool | None = None

Whether to enable usage tracking.

as_dict() dict

Serializes the AiGatewayUsageTrackingConfig into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the AiGatewayUsageTrackingConfig into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) AiGatewayUsageTrackingConfig

Deserializes the AiGatewayUsageTrackingConfig from a dictionary.

class databricks.sdk.service.serving.AmazonBedrockConfig(aws_region: 'str', bedrock_provider: 'AmazonBedrockConfigBedrockProvider', aws_access_key_id: 'Optional[str]' = None, aws_access_key_id_plaintext: 'Optional[str]' = None, aws_secret_access_key: 'Optional[str]' = None, aws_secret_access_key_plaintext: 'Optional[str]' = None, instance_profile_arn: 'Optional[str]' = None)
aws_region: str

The AWS region to use. Amazon Bedrock must be enabled in that region.

bedrock_provider: AmazonBedrockConfigBedrockProvider

The underlying provider in Amazon Bedrock. Supported values (case insensitive) include: Anthropic, Cohere, AI21Labs, Amazon.

aws_access_key_id: str | None = None

The Databricks secret key reference for an AWS access key ID with permissions to interact with Bedrock services. If you prefer to paste your access key ID directly, see aws_access_key_id_plaintext. You must provide an access key ID using one of the following fields: aws_access_key_id or aws_access_key_id_plaintext.

aws_access_key_id_plaintext: str | None = None

An AWS access key ID, provided as a plaintext string, with permissions to interact with Bedrock services. If you prefer to reference your access key ID using Databricks Secrets, see aws_access_key_id. You must provide an access key ID using one of the following fields: aws_access_key_id or aws_access_key_id_plaintext.

aws_secret_access_key: str | None = None

The Databricks secret key reference for an AWS secret access key that is paired with the access key ID and has permissions to interact with Bedrock services. If you prefer to paste your secret access key directly, see aws_secret_access_key_plaintext. You must provide a secret access key using one of the following fields: aws_secret_access_key or aws_secret_access_key_plaintext.

aws_secret_access_key_plaintext: str | None = None

An AWS secret access key, provided as a plaintext string, that is paired with the access key ID and has permissions to interact with Bedrock services. If you prefer to reference your secret access key using Databricks Secrets, see aws_secret_access_key. You must provide a secret access key using one of the following fields: aws_secret_access_key or aws_secret_access_key_plaintext.

instance_profile_arn: str | None = None

ARN of the instance profile that the external model will use to access AWS resources. You must authenticate using an instance profile or access keys. If you prefer to authenticate using access keys, see aws_access_key_id, aws_access_key_id_plaintext, aws_secret_access_key and aws_secret_access_key_plaintext.

as_dict() dict

Serializes the AmazonBedrockConfig into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the AmazonBedrockConfig into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) AmazonBedrockConfig

Deserializes the AmazonBedrockConfig from a dictionary.
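
The two authentication options can be told apart with a local check. The sketch below works on a plain dictionary, and `bedrock_auth_mode` is a hypothetical helper, not part of the SDK. It assumes that an instance profile and access keys are mutually exclusive and that access-key authentication needs both the key ID and the secret; the text above does not state how the service handles a mix. The region, provider, and ARN values are placeholders.

```python
from typing import Any, Dict


def bedrock_auth_mode(config: Dict[str, Any]) -> str:
    """Classify a config dictionary as instance-profile or access-key auth."""
    has_profile = bool(config.get("instance_profile_arn"))
    has_key_id = bool(
        config.get("aws_access_key_id") or config.get("aws_access_key_id_plaintext")
    )
    has_secret = bool(
        config.get("aws_secret_access_key")
        or config.get("aws_secret_access_key_plaintext")
    )
    if has_profile and not (has_key_id or has_secret):
        return "instance_profile"
    if has_key_id and has_secret and not has_profile:
        return "access_keys"
    raise ValueError("provide an instance profile, or an access key ID with its secret")


mode = bedrock_auth_mode(
    {
        "aws_region": "us-east-1",
        "bedrock_provider": "anthropic",
        "instance_profile_arn": "arn:aws:iam::123456789012:instance-profile/example",
    }
)
```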

class databricks.sdk.service.serving.AmazonBedrockConfigBedrockProvider
AI21LABS = "AI21LABS"
AMAZON = "AMAZON"
ANTHROPIC = "ANTHROPIC"
COHERE = "COHERE"

class databricks.sdk.service.serving.AnthropicConfig(anthropic_api_key: 'Optional[str]' = None, anthropic_api_key_plaintext: 'Optional[str]' = None)
anthropic_api_key: str | None = None

The Databricks secret key reference for an Anthropic API key. If you prefer to paste your API key directly, see anthropic_api_key_plaintext. You must provide an API key using one of the following fields: anthropic_api_key or anthropic_api_key_plaintext.

anthropic_api_key_plaintext: str | None = None

The Anthropic API key provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see anthropic_api_key. You must provide an API key using one of the following fields: anthropic_api_key or anthropic_api_key_plaintext.

as_dict() dict

Serializes the AnthropicConfig into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the AnthropicConfig into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) AnthropicConfig

Deserializes the AnthropicConfig from a dictionary.

class databricks.sdk.service.serving.ApiKeyAuth(key: 'str', value: 'Optional[str]' = None, value_plaintext: 'Optional[str]' = None)
key: str

The name of the API key parameter used for authentication.

value: str | None = None

The Databricks secret key reference for an API key. If you prefer to paste your API key directly, see value_plaintext.

value_plaintext: str | None = None

The API key provided as a plaintext string. If you prefer to reference your API key using Databricks Secrets, see value.

as_dict() dict

Serializes the ApiKeyAuth into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the ApiKeyAuth into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) ApiKeyAuth

Deserializes the ApiKeyAuth from a dictionary.

class databricks.sdk.service.serving.AutoCaptureConfigInput(catalog_name: str | None = None, enabled: bool | None = None, schema_name: str | None = None, table_name_prefix: str | None = None)

Deprecated: legacy inference table configuration. Please use AI Gateway inference tables instead. See https://docs.databricks.com/aws/en/ai-gateway/inference-tables.

catalog_name: str | None = None

The name of the catalog in Unity Catalog. NOTE: On update, you cannot change the catalog name if the inference table is already enabled.

enabled: bool | None = None

Indicates whether the inference table is enabled.

schema_name: str | None = None

The name of the schema in Unity Catalog. NOTE: On update, you cannot change the schema name if the inference table is already enabled.

table_name_prefix: str | None = None

The prefix of the table in Unity Catalog. NOTE: On update, you cannot change the prefix name if the inference table is already enabled.

as_dict() dict

Serializes the AutoCaptureConfigInput into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the AutoCaptureConfigInput into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) AutoCaptureConfigInput

Deserializes the AutoCaptureConfigInput from a dictionary.

class databricks.sdk.service.serving.AutoCaptureConfigOutput(catalog_name: str | None = None, enabled: bool | None = None, schema_name: str | None = None, state: AutoCaptureState | None = None, table_name_prefix: str | None = None)

Deprecated: legacy inference table configuration. Please use AI Gateway inference tables instead. See https://docs.databricks.com/aws/en/ai-gateway/inference-tables.

catalog_name: str | None = None

The name of the catalog in Unity Catalog. NOTE: On update, you cannot change the catalog name if the inference table is already enabled.

enabled: bool | None = None

Indicates whether the inference table is enabled.

schema_name: str | None = None

The name of the schema in Unity Catalog. NOTE: On update, you cannot change the schema name if the inference table is already enabled.

state: AutoCaptureState | None = None

table_name_prefix: str | None = None

The prefix of the table in Unity Catalog. NOTE: On update, you cannot change the prefix name if the inference table is already enabled.

as_dict() dict

Serializes the AutoCaptureConfigOutput into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the AutoCaptureConfigOutput into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) AutoCaptureConfigOutput

Deserializes the AutoCaptureConfigOutput from a dictionary.

class databricks.sdk.service.serving.AutoCaptureState(payload_table: 'Optional[PayloadTable]' = None)
payload_table: PayloadTable | None = None

as_dict() dict

Serializes the AutoCaptureState into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the AutoCaptureState into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) AutoCaptureState

Deserializes the AutoCaptureState from a dictionary.

class databricks.sdk.service.serving.BearerTokenAuth(token: 'Optional[str]' = None, token_plaintext: 'Optional[str]' = None)
token: str | None = None

The Databricks secret key reference for a token. If you prefer to paste your token directly, see token_plaintext.

token_plaintext: str | None = None

The token provided as a plaintext string. If you prefer to reference your token using Databricks Secrets, see token.

as_dict() dict

Serializes the BearerTokenAuth into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the BearerTokenAuth into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) BearerTokenAuth

Deserializes the BearerTokenAuth from a dictionary.

class databricks.sdk.service.serving.BuildLogsResponse(logs: 'str')
logs: str

The logs associated with building the served entity’s environment.

as_dict() dict

Serializes the BuildLogsResponse into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the BuildLogsResponse into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) BuildLogsResponse

Deserializes the BuildLogsResponse from a dictionary.

class databricks.sdk.service.serving.ChatMessage(content: 'Optional[str]' = None, role: 'Optional[ChatMessageRole]' = None)
content: str | None = None

The content of the message.

role: ChatMessageRole | None = None

The role of the message. One of [system, user, assistant].

as_dict() dict

Serializes the ChatMessage into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the ChatMessage into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) ChatMessage

Deserializes the ChatMessage from a dictionary.
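
A message list can be assembled and checked against the three documented roles before it is sent. The sketch below builds plain dictionaries with the lowercase role names quoted above. `chat_message` and `ALLOWED_ROLES` are hypothetical names, not part of the SDK.

```python
from typing import Dict

# The three roles named in the field description.
ALLOWED_ROLES = {"system", "user", "assistant"}


def chat_message(role: str, content: str) -> Dict[str, str]:
    """Build one message dictionary, rejecting roles outside the documented set."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    return {"role": role, "content": content}


messages = [
    chat_message("system", "You are a helpful assistant."),
    chat_message("user", "Summarize the release notes."),
]
```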

class databricks.sdk.service.serving.ChatMessageRole

The role of the message. One of [system, user, assistant].

ASSISTANT = "ASSISTANT"
SYSTEM = "SYSTEM"
USER = "USER"

class databricks.sdk.service.serving.CohereConfig(cohere_api_base: 'Optional[str]' = None, cohere_api_key: 'Optional[str]' = None, cohere_api_key_plaintext: 'Optional[str]' = None)
cohere_api_base: str | None = None

An optional custom base URL for the Cohere API. If left unspecified, the standard Cohere base URL is used.

cohere_api_key: str | None = None

The Databricks secret key reference for a Cohere API key. If you prefer to paste your API key directly, see cohere_api_key_plaintext. You must provide an API key using one of the following fields: cohere_api_key or cohere_api_key_plaintext.

cohere_api_key_plaintext: str | None = None

The Cohere API key provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see cohere_api_key. You must provide an API key using one of the following fields: cohere_api_key or cohere_api_key_plaintext.

as_dict() dict

Serializes the CohereConfig into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the CohereConfig into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) CohereConfig

Deserializes the CohereConfig from a dictionary.

class databricks.sdk.service.serving.CustomProviderConfig(custom_provider_url: str, api_key_auth: ApiKeyAuth | None = None, bearer_token_auth: BearerTokenAuth | None = None)

Configs needed to create a custom provider model route.

custom_provider_url: str

The URL of the custom provider API.

api_key_auth: ApiKeyAuth | None = None

API key authentication for the custom provider API. You can specify only one authentication method.

bearer_token_auth: BearerTokenAuth | None = None

Bearer token authentication for the custom provider API. You can specify only one authentication method.

as_dict() dict

Serializes the CustomProviderConfig into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the CustomProviderConfig into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) CustomProviderConfig

Deserializes the CustomProviderConfig from a dictionary.

class databricks.sdk.service.serving.DataPlaneInfo(authorization_details: str | None = None, endpoint_url: str | None = None)

Details necessary to query this object’s API through the DataPlane APIs.

authorization_details: str | None = None

Authorization details as a string.

endpoint_url: str | None = None

The URL of the endpoint for this operation in the dataplane.

as_dict() dict

Serializes the DataPlaneInfo into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the DataPlaneInfo into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) DataPlaneInfo

Deserializes the DataPlaneInfo from a dictionary.

class databricks.sdk.service.serving.DatabricksModelServingConfig(databricks_workspace_url: 'str', databricks_api_token: 'Optional[str]' = None, databricks_api_token_plaintext: 'Optional[str]' = None)
databricks_workspace_url: str

The URL of the Databricks workspace containing the model serving endpoint pointed to by this external model.

databricks_api_token: str | None = None

The Databricks secret key reference for a Databricks API token that corresponds to a user or service principal with Can Query access to the model serving endpoint pointed to by this external model. If you prefer to paste your API token directly, see databricks_api_token_plaintext. You must provide an API token using one of the following fields: databricks_api_token or databricks_api_token_plaintext.

databricks_api_token_plaintext: str | None = None

A Databricks API token, provided as a plaintext string, that corresponds to a user or service principal with Can Query access to the model serving endpoint pointed to by this external model. If you prefer to reference your API token using Databricks Secrets, see databricks_api_token. You must provide an API token using one of the following fields: databricks_api_token or databricks_api_token_plaintext.

as_dict() dict

Serializes the DatabricksModelServingConfig into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the DatabricksModelServingConfig into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) DatabricksModelServingConfig

Deserializes the DatabricksModelServingConfig from a dictionary.

class databricks.sdk.service.serving.DataframeSplitInput(columns: 'Optional[List[Any]]' = None, data: 'Optional[List[Any]]' = None, index: 'Optional[List[int]]' = None)
columns: List[Any] | None = None

Columns array for the dataframe.

data: List[Any] | None = None

Data array for the dataframe.

index: List[int] | None = None

Index array for the dataframe.

as_dict() dict

Serializes the DataframeSplitInput into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the DataframeSplitInput into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) DataframeSplitInput

Deserializes the DataframeSplitInput from a dictionary.
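
The three arrays describe a table in "split" form: one list of column names, one list of rows, and one list of row labels. The sketch below converts a list of row dictionaries into that shape as a plain dictionary. `to_split` is a hypothetical helper, not part of the SDK, and using 0-based positions as the index is an assumption.

```python
from typing import Any, Dict, List


def to_split(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Convert row dictionaries into columns/data/index arrays."""
    # Collect column names in first-seen order.
    columns: List[Any] = []
    for row in records:
        for name in row:
            if name not in columns:
                columns.append(name)
    # One inner list per row, ordered like `columns`; missing cells become None.
    data = [[row.get(name) for name in columns] for row in records]
    index = list(range(len(records)))
    return {"columns": columns, "data": data, "index": index}


split = to_split([{"a": 1, "b": 2}, {"a": 3, "b": 4}])
```

For the two example rows, `split["columns"]` is `["a", "b"]` and `split["data"]` is `[[1, 2], [3, 4]]`.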

class databricks.sdk.service.serving.EmailNotifications(on_update_failure: 'Optional[List[str]]' = None, on_update_success: 'Optional[List[str]]' = None)
on_update_failure: List[str] | None = None

A list of email addresses to be notified when an endpoint fails to update its configuration or state.

on_update_success: List[str] | None = None

A list of email addresses to be notified when an endpoint successfully updates its configuration or state.

as_dict() dict

Serializes the EmailNotifications into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the EmailNotifications into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) EmailNotifications

Deserializes the EmailNotifications from a dictionary.

class databricks.sdk.service.serving.EmbeddingsV1ResponseEmbeddingElement(embedding: 'Optional[List[float]]' = None, index: 'Optional[int]' = None, object: 'Optional[EmbeddingsV1ResponseEmbeddingElementObject]' = None)
embedding: List[float] | None = None

The embedding vector.

index: int | None = None

The index of the embedding in the response.

object: EmbeddingsV1ResponseEmbeddingElementObject | None = None

This will always be ‘embedding’.

as_dict() dict

Serializes the EmbeddingsV1ResponseEmbeddingElement into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the EmbeddingsV1ResponseEmbeddingElement into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) EmbeddingsV1ResponseEmbeddingElement

Deserializes the EmbeddingsV1ResponseEmbeddingElement from a dictionary.
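
When a response carries several embedding elements, the index field can be used to put the vectors in a stable order. The sketch below works on plain dictionaries shaped like the fields above. `vectors_in_index_order` is a hypothetical helper, not part of the SDK, and it assumes every element has both index and embedding set.

```python
from typing import Any, Dict, List


def vectors_in_index_order(elements: List[Dict[str, Any]]) -> List[List[float]]:
    """Return the embedding vectors sorted by their `index` field."""
    ordered = sorted(elements, key=lambda element: element["index"])
    return [element["embedding"] for element in ordered]


elements = [
    {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
    {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
]
vectors = vectors_in_index_order(elements)
```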

class databricks.sdk.service.serving.EmbeddingsV1ResponseEmbeddingElementObject

This will always be ‘embedding’.

EMBEDDING = "EMBEDDING"

class databricks.sdk.service.serving.EndpointCoreConfigInput(name: 'str', auto_capture_config: 'Optional[AutoCaptureConfigInput]' = None, served_entities: 'Optional[List[ServedEntityInput]]' = None, served_models: 'Optional[List[ServedModelInput]]' = None, traffic_config: 'Optional[TrafficConfig]' = None)
name: str

The name of the serving endpoint to update. This field is required.

auto_capture_config: AutoCaptureConfigInput | None = None

Configuration for legacy Inference Tables which automatically log requests and responses to Unity Catalog. Deprecated: please use AI Gateway inference tables instead. See https://docs.databricks.com/aws/en/ai-gateway/inference-tables.

served_entities: List[ServedEntityInput] | None = None

The list of served entities under the serving endpoint config.

served_models: List[ServedModelInput] | None = None

(Deprecated, use served_entities instead) The list of served models under the serving endpoint config.

traffic_config: TrafficConfig | None = None

The traffic configuration associated with the serving endpoint config.

as_dict() dict

Serializes the EndpointCoreConfigInput into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the EndpointCoreConfigInput into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) EndpointCoreConfigInput

Deserializes the EndpointCoreConfigInput from a dictionary.

class databricks.sdk.service.serving.EndpointCoreConfigOutput(auto_capture_config: 'Optional[AutoCaptureConfigOutput]' = None, config_version: 'Optional[int]' = None, served_entities: 'Optional[List[ServedEntityOutput]]' = None, served_models: 'Optional[List[ServedModelOutput]]' = None, traffic_config: 'Optional[TrafficConfig]' = None)
auto_capture_config: AutoCaptureConfigOutput | None = None

Configuration for legacy Inference Tables which automatically log requests and responses to Unity Catalog. Deprecated: please use AI Gateway inference tables instead. See https://docs.databricks.com/aws/en/ai-gateway/inference-tables.

config_version: int | None = None

The config version that the serving endpoint is currently serving.

served_entities: List[ServedEntityOutput] | None = None

The list of served entities under the serving endpoint config.

served_models: List[ServedModelOutput] | None = None

(Deprecated, use served_entities instead) The list of served models under the serving endpoint config.

traffic_config: TrafficConfig | None = None

The traffic configuration associated with the serving endpoint config.

as_dict() dict

Serializes the EndpointCoreConfigOutput into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the EndpointCoreConfigOutput into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) EndpointCoreConfigOutput

Deserializes the EndpointCoreConfigOutput from a dictionary.

class databricks.sdk.service.serving.EndpointCoreConfigSummary(served_entities: 'Optional[List[ServedEntitySpec]]' = None, served_models: 'Optional[List[ServedModelSpec]]' = None)
served_entities: List[ServedEntitySpec] | None = None

The list of served entities under the serving endpoint config.

served_models: List[ServedModelSpec] | None = None

(Deprecated, use served_entities instead) The list of served models under the serving endpoint config.

as_dict() dict

Serializes the EndpointCoreConfigSummary into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the EndpointCoreConfigSummary into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) EndpointCoreConfigSummary

Deserializes the EndpointCoreConfigSummary from a dictionary.

class databricks.sdk.service.serving.EndpointPendingConfig(auto_capture_config: 'Optional[AutoCaptureConfigOutput]' = None, config_version: 'Optional[int]' = None, served_entities: 'Optional[List[ServedEntityOutput]]' = None, served_models: 'Optional[List[ServedModelOutput]]' = None, start_time: 'Optional[int]' = None, traffic_config: 'Optional[TrafficConfig]' = None)
auto_capture_config: AutoCaptureConfigOutput | None = None

Configuration for legacy Inference Tables which automatically log requests and responses to Unity Catalog. Deprecated: please use AI Gateway inference tables instead. See https://docs.databricks.com/aws/en/ai-gateway/inference-tables.

config_version: int | None = None

The config version that the serving endpoint is currently serving.

served_entities: List[ServedEntityOutput] | None = None

The list of served entities belonging to the last issued update to the serving endpoint.

served_models: List[ServedModelOutput] | None = None

(Deprecated, use served_entities instead) The list of served models belonging to the last issued update to the serving endpoint.

start_time: int | None = None

The timestamp when the update to the pending config started.

traffic_config: TrafficConfig | None = None

The traffic config defining how invocations to the serving endpoint should be routed.

as_dict() dict

Serializes the EndpointPendingConfig into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the EndpointPendingConfig into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) EndpointPendingConfig

Deserializes the EndpointPendingConfig from a dictionary.

class databricks.sdk.service.serving.EndpointState(config_update: 'Optional[EndpointStateConfigUpdate]' = None, ready: 'Optional[EndpointStateReady]' = None)
config_update: EndpointStateConfigUpdate | None = None

The state of an endpoint’s config update. This informs the user whether the pending_config is in progress, the update failed, or no update is in progress. Note that if the endpoint’s config_update state value is IN_PROGRESS, another update cannot be made until the current update completes or fails.

ready: EndpointStateReady | None = None

The state of an endpoint, indicating whether or not the endpoint is queryable. An endpoint is READY if all of the served entities in its active configuration are ready. If any of the actively served entities are in a non-ready state, the endpoint state will be NOT_READY.

as_dict() dict

Serializes the EndpointState into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the EndpointState into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) EndpointState

Deserializes the EndpointState from a dictionary.

class databricks.sdk.service.serving.EndpointStateConfigUpdate
IN_PROGRESS = "IN_PROGRESS"
NOT_UPDATING = "NOT_UPDATING"
UPDATE_CANCELED = "UPDATE_CANCELED"
UPDATE_FAILED = "UPDATE_FAILED"
class databricks.sdk.service.serving.EndpointStateReady
NOT_READY = "NOT_READY"
READY = "READY"
class databricks.sdk.service.serving.EndpointTag(key: 'str', value: 'Optional[str]' = None)
key: str

Key field for a serving endpoint tag.

value: str | None = None

Optional value field for a serving endpoint tag.

as_dict() dict

Serializes the EndpointTag into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the EndpointTag into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) EndpointTag

Deserializes the EndpointTag from a dictionary.
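As a rough picture of what the as_dict and from_dict pair does for a small class such as EndpointTag, here is a standard-library sketch. TagSketch is a hypothetical stand-in, and dropping unset fields from the body is an assumption about the serialization, not something this page states.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class TagSketch:
    """Hypothetical stand-in for EndpointTag; not the SDK class."""

    key: str
    value: Optional[str] = None

    def as_dict(self) -> Dict[str, Any]:
        # Assumption: fields left as None are omitted from the JSON body.
        body: Dict[str, Any] = {"key": self.key}
        if self.value is not None:
            body["value"] = self.value
        return body

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "TagSketch":
        # Missing optional keys fall back to None.
        return cls(key=d["key"], value=d.get("value"))


tag = TagSketch(key="team", value="ml-platform")
round_tripped = TagSketch.from_dict(tag.as_dict())
```

The round trip returns an equal object, which is the property the serialize/deserialize pair is meant to preserve.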

class databricks.sdk.service.serving.EndpointTags(tags: 'Optional[List[EndpointTag]]' = None)
tags: List[EndpointTag] | None = None
as_dict() dict

Serializes the EndpointTags into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the EndpointTags into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) EndpointTags

Deserializes the EndpointTags from a dictionary.

class databricks.sdk.service.serving.ExportMetricsResponse(contents: 'Optional[BinaryIO]' = None)
contents: BinaryIO | None = None
as_dict() dict

Serializes the ExportMetricsResponse into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the ExportMetricsResponse into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) ExportMetricsResponse

Deserializes the ExportMetricsResponse from a dictionary.

class databricks.sdk.service.serving.ExternalFunctionRequestHttpMethod
DELETE = "DELETE"
GET = "GET"
PATCH = "PATCH"
POST = "POST"
PUT = "PUT"
class databricks.sdk.service.serving.ExternalModel(provider: 'ExternalModelProvider', name: 'str', task: 'str', ai21labs_config: 'Optional[Ai21LabsConfig]' = None, amazon_bedrock_config: 'Optional[AmazonBedrockConfig]' = None, anthropic_config: 'Optional[AnthropicConfig]' = None, cohere_config: 'Optional[CohereConfig]' = None, custom_provider_config: 'Optional[CustomProviderConfig]' = None, databricks_model_serving_config: 'Optional[DatabricksModelServingConfig]' = None, google_cloud_vertex_ai_config: 'Optional[GoogleCloudVertexAiConfig]' = None, openai_config: 'Optional[OpenAiConfig]' = None, palm_config: 'Optional[PaLmConfig]' = None)
provider: ExternalModelProvider

The name of the provider for the external model. Currently, the supported providers are ‘ai21labs’, ‘anthropic’, ‘amazon-bedrock’, ‘cohere’, ‘databricks-model-serving’, ‘google-cloud-vertex-ai’, ‘openai’, ‘palm’, and ‘custom’.

name: str

The name of the external model.

task: str

The task type of the external model.

ai21labs_config: Ai21LabsConfig | None = None

AI21Labs Config. Only required if the provider is ‘ai21labs’.

amazon_bedrock_config: AmazonBedrockConfig | None = None

Amazon Bedrock Config. Only required if the provider is ‘amazon-bedrock’.

anthropic_config: AnthropicConfig | None = None

Anthropic Config. Only required if the provider is ‘anthropic’.

cohere_config: CohereConfig | None = None

Cohere Config. Only required if the provider is ‘cohere’.

custom_provider_config: CustomProviderConfig | None = None

Custom Provider Config. Only required if the provider is ‘custom’.

databricks_model_serving_config: DatabricksModelServingConfig | None = None

Databricks Model Serving Config. Only required if the provider is ‘databricks-model-serving’.

google_cloud_vertex_ai_config: GoogleCloudVertexAiConfig | None = None

Google Cloud Vertex AI Config. Only required if the provider is ‘google-cloud-vertex-ai’.

openai_config: OpenAiConfig | None = None

OpenAI Config. Only required if the provider is ‘openai’.

palm_config: PaLmConfig | None = None

PaLM Config. Only required if the provider is ‘palm’.
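Each provider name listed above pairs with exactly one *_config field. The sketch below checks that pairing on a plain dictionary shaped like a request body. The helper names are hypothetical, and the naming rule (dashes to underscores plus a _config suffix, with 'custom' as the one exception) is inferred from the field names on this page.

```python
from typing import Any, Dict

# Provider names as listed in the provider description above.
PROVIDERS = (
    "ai21labs", "anthropic", "amazon-bedrock", "cohere",
    "databricks-model-serving", "google-cloud-vertex-ai",
    "openai", "palm", "custom",
)


def config_field_for(provider: str) -> str:
    """Return the name of the config field a provider requires."""
    if provider not in PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    if provider == "custom":
        return "custom_provider_config"  # the only name that breaks the pattern
    return provider.replace("-", "_") + "_config"


def check_external_model(body: Dict[str, Any]) -> None:
    """Raise ValueError if the provider's matching config is missing."""
    field = config_field_for(body["provider"])
    if body.get(field) is None:
        raise ValueError(f"provider {body['provider']!r} requires {field}")
```

Running a check like this before sending a request gives a local error message instead of a round trip to the service.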

as_dict() dict

Serializes the ExternalModel into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the ExternalModel into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) ExternalModel

Deserializes the ExternalModel from a dictionary.

class databricks.sdk.service.serving.ExternalModelProvider
AI21LABS = "AI21LABS"
AMAZON_BEDROCK = "AMAZON_BEDROCK"
ANTHROPIC = "ANTHROPIC"
COHERE = "COHERE"
CUSTOM = "CUSTOM"
DATABRICKS_MODEL_SERVING = "DATABRICKS_MODEL_SERVING"
GOOGLE_CLOUD_VERTEX_AI = "GOOGLE_CLOUD_VERTEX_AI"
OPENAI = "OPENAI"
PALM = "PALM"
class databricks.sdk.service.serving.ExternalModelUsageElement(completion_tokens: 'Optional[int]' = None, prompt_tokens: 'Optional[int]' = None, total_tokens: 'Optional[int]' = None)
completion_tokens: int | None = None

The number of tokens in the chat/completions response.

prompt_tokens: int | None = None

The number of tokens in the prompt.

total_tokens: int | None = None

The total number of tokens in the prompt and response.
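Because all three counters are optional, code that aggregates usage across responses has to tolerate missing values. A minimal sketch over plain dictionaries follows; total_usage is a hypothetical helper, and treating a missing counter as zero is an assumption.

```python
from typing import Dict, Iterable, Mapping, Optional


def total_usage(usages: Iterable[Mapping[str, Optional[int]]]) -> Dict[str, int]:
    """Sum token counters across usage objects, treating None as 0."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    for usage in usages:
        for field in totals:
            totals[field] += usage.get(field) or 0
    return totals
```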

as_dict() dict

Serializes the ExternalModelUsageElement into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the ExternalModelUsageElement into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) ExternalModelUsageElement

Deserializes the ExternalModelUsageElement from a dictionary.

class databricks.sdk.service.serving.FallbackConfig(enabled: 'bool')
enabled: bool

Whether to enable traffic fallback. When a served entity in the serving endpoint returns specific error codes (e.g. 500), the request is automatically retried against the other served entities in the same endpoint, in round-robin fashion following the order of the served entity list, until a successful response is returned. If all attempts fail, the last response is returned with its error code.
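The fallback behavior described above can be pictured as a loop over the served entities in list order. This is a client-side sketch of the idea only, not the service's implementation: query_with_fallback is a hypothetical name, each entity is modeled as a callable returning a (status, body) pair, and the set of codes that trigger fallback is an assumption (the description gives 500 only as an example).

```python
from typing import Callable, Sequence, Tuple

Response = Tuple[int, str]  # (status code, body)

# Assumption: only 500 triggers fallback here; the real set is not listed.
FALLBACK_CODES = frozenset({500})


def query_with_fallback(entities: Sequence[Callable[[], Response]]) -> Response:
    """Try served entities in order until one does not return a fallback code."""
    if not entities:
        raise ValueError("at least one served entity is required")
    response = entities[0]()
    for call_entity in entities[1:]:
        if response[0] not in FALLBACK_CODES:
            break  # success, or an error that should not trigger fallback
        response = call_entity()
    # If every attempt failed, this is the last response with its error code.
    return response
```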

as_dict() dict

Serializes the FallbackConfig into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the FallbackConfig into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) FallbackConfig

Deserializes the FallbackConfig from a dictionary.

class databricks.sdk.service.serving.FoundationModel(description: str | None = None, display_name: str | None = None, docs: str | None = None, name: str | None = None)

None of the fields are sensitive, as they are hard-coded in the system and made available to customers.

description: str | None = None
display_name: str | None = None
docs: str | None = None
name: str | None = None
as_dict() dict

Serializes the FoundationModel into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the FoundationModel into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) FoundationModel

Deserializes the FoundationModel from a dictionary.

class databricks.sdk.service.serving.GetOpenApiResponse(contents: 'Optional[BinaryIO]' = None)
contents: BinaryIO | None = None
as_dict() dict

Serializes the GetOpenApiResponse into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the GetOpenApiResponse into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) GetOpenApiResponse

Deserializes the GetOpenApiResponse from a dictionary.

class databricks.sdk.service.serving.GetServingEndpointPermissionLevelsResponse(permission_levels: 'Optional[List[ServingEndpointPermissionsDescription]]' = None)
permission_levels: List[ServingEndpointPermissionsDescription] | None = None

Specific permission levels

as_dict() dict

Serializes the GetServingEndpointPermissionLevelsResponse into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the GetServingEndpointPermissionLevelsResponse into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) GetServingEndpointPermissionLevelsResponse

Deserializes the GetServingEndpointPermissionLevelsResponse from a dictionary.

class databricks.sdk.service.serving.GoogleCloudVertexAiConfig(project_id: 'str', region: 'str', private_key: 'Optional[str]' = None, private_key_plaintext: 'Optional[str]' = None)
project_id: str

The Google Cloud project ID that the service account is associated with.

region: str

This is the region for the Google Cloud Vertex AI Service. See [supported regions] for more details. Some models are only available in specific regions.

[supported regions]: https://cloud.google.com/vertex-ai/docs/general/locations

private_key: str | None = None

The Databricks secret key reference for a private key for the service account which has access to the Google Cloud Vertex AI Service. See [Best practices for managing service account keys]. If you prefer to paste your API key directly, see private_key_plaintext. You must provide an API key using one of the following fields: private_key or private_key_plaintext.

[Best practices for managing service account keys]: https://cloud.google.com/iam/docs/best-practices-for-managing-service-account-keys

private_key_plaintext: str | None = None

The private key for the service account which has access to the Google Cloud Vertex AI Service provided as a plaintext secret. See [Best practices for managing service account keys]. If you prefer to reference your key using Databricks Secrets, see private_key. You must provide an API key using one of the following fields: private_key or private_key_plaintext.

[Best practices for managing service account keys]: https://cloud.google.com/iam/docs/best-practices-for-managing-service-account-keys
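The "one of the following fields" rule for private_key and private_key_plaintext recurs across the provider configs on this page. A small pre-flight check might look like the sketch below. pick_credential is a hypothetical helper, and reading "one of" as "exactly one" is an assumption; the page does not say what happens when both are set.

```python
from typing import Optional


def pick_credential(reference: Optional[str], plaintext: Optional[str]) -> str:
    """Return which of the two credential fields is set.

    Raises ValueError when neither or both are provided.
    """
    if (reference is None) == (plaintext is None):
        raise ValueError("set exactly one of private_key or private_key_plaintext")
    return "reference" if reference is not None else "plaintext"
```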

as_dict() dict

Serializes the GoogleCloudVertexAiConfig into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the GoogleCloudVertexAiConfig into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) GoogleCloudVertexAiConfig

Deserializes the GoogleCloudVertexAiConfig from a dictionary.

class databricks.sdk.service.serving.HttpRequestResponse(contents: 'Optional[BinaryIO]' = None)
contents: BinaryIO | None = None
as_dict() dict

Serializes the HttpRequestResponse into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the HttpRequestResponse into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) HttpRequestResponse

Deserializes the HttpRequestResponse from a dictionary.

class databricks.sdk.service.serving.ListEndpointsResponse(endpoints: 'Optional[List[ServingEndpoint]]' = None)
endpoints: List[ServingEndpoint] | None = None

The list of endpoints.

as_dict() dict

Serializes the ListEndpointsResponse into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the ListEndpointsResponse into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) ListEndpointsResponse

Deserializes the ListEndpointsResponse from a dictionary.

class databricks.sdk.service.serving.ModelDataPlaneInfo(query_info: DataPlaneInfo | None = None)

A representation of all DataPlaneInfo for operations that can be done on a model through Data Plane APIs.

query_info: DataPlaneInfo | None = None

Information required to query DataPlane API ‘query’ endpoint.

as_dict() dict

Serializes the ModelDataPlaneInfo into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the ModelDataPlaneInfo into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) ModelDataPlaneInfo

Deserializes the ModelDataPlaneInfo from a dictionary.

class databricks.sdk.service.serving.OpenAiConfig(microsoft_entra_client_id: str | None = None, microsoft_entra_client_secret: str | None = None, microsoft_entra_client_secret_plaintext: str | None = None, microsoft_entra_tenant_id: str | None = None, openai_api_base: str | None = None, openai_api_key: str | None = None, openai_api_key_plaintext: str | None = None, openai_api_type: str | None = None, openai_api_version: str | None = None, openai_deployment_name: str | None = None, openai_organization: str | None = None)

Configs needed to create an OpenAI model route.

microsoft_entra_client_id: str | None = None

This field is only required for Azure AD OpenAI and is the Microsoft Entra Client ID.

microsoft_entra_client_secret: str | None = None

The Databricks secret key reference for a client secret used for Microsoft Entra ID authentication. If you prefer to paste your client secret directly, see microsoft_entra_client_secret_plaintext. You must provide a client secret using one of the following fields: microsoft_entra_client_secret or microsoft_entra_client_secret_plaintext.

microsoft_entra_client_secret_plaintext: str | None = None

The client secret used for Microsoft Entra ID authentication provided as a plaintext string. If you prefer to reference your client secret using Databricks Secrets, see microsoft_entra_client_secret. You must provide a client secret using one of the following fields: microsoft_entra_client_secret or microsoft_entra_client_secret_plaintext.

microsoft_entra_tenant_id: str | None = None

This field is only required for Azure AD OpenAI and is the Microsoft Entra Tenant ID.

openai_api_base: str | None = None

This is a field to provide a customized base URL for the OpenAI API. For Azure OpenAI, this field is required, and is the base URL for the Azure OpenAI API service provided by Azure. For other OpenAI API types, this field is optional, and if left unspecified, the standard OpenAI base URL is used.

openai_api_key: str | None = None

The Databricks secret key reference for an OpenAI API key using the OpenAI or Azure service. If you prefer to paste your API key directly, see openai_api_key_plaintext. You must provide an API key using one of the following fields: openai_api_key or openai_api_key_plaintext.

openai_api_key_plaintext: str | None = None

The OpenAI API key using the OpenAI or Azure service provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see openai_api_key. You must provide an API key using one of the following fields: openai_api_key or openai_api_key_plaintext.

openai_api_type: str | None = None

This is an optional field to specify the type of OpenAI API to use. For Azure OpenAI, this field is required; set it to the preferred security access validation protocol. For access token validation, use azure. For authentication using Azure Active Directory (Azure AD), use azuread.

openai_api_version: str | None = None

This is an optional field to specify the OpenAI API version. For Azure OpenAI, this field is required, and is the version of the Azure OpenAI service to utilize, specified by a date.

openai_deployment_name: str | None = None

This field is only required for Azure OpenAI and is the name of the deployment resource for the Azure OpenAI service.

openai_organization: str | None = None

This is an optional field to specify the organization in OpenAI or Azure OpenAI.
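The Azure-specific requirements scattered through the field descriptions above can be collected into one check. The sketch below works on a plain dictionary keyed by the field names on this page. missing_openai_fields is a hypothetical helper, it assumes that openai_api_type values azure and azuread are what mark a config as Azure OpenAI, and it does not cover the API key or client secret fields.

```python
from typing import Dict, List, Optional

# Fields described above as required for Azure OpenAI.
AZURE_REQUIRED = ("openai_api_base", "openai_api_version", "openai_deployment_name")
# Fields described above as required only for Azure AD authentication.
ENTRA_REQUIRED = ("microsoft_entra_client_id", "microsoft_entra_tenant_id")


def missing_openai_fields(cfg: Dict[str, Optional[str]]) -> List[str]:
    """List required fields that are unset for the chosen openai_api_type."""
    api_type = cfg.get("openai_api_type")
    required: List[str] = []
    if api_type in ("azure", "azuread"):
        required.extend(AZURE_REQUIRED)
    if api_type == "azuread":
        required.extend(ENTRA_REQUIRED)
    return [name for name in required if not cfg.get(name)]
```

An empty list means the config passes this particular check; a plain OpenAI config (no openai_api_type) has no extra required fields under these rules.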

as_dict() dict

Serializes the OpenAiConfig into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the OpenAiConfig into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) OpenAiConfig

Deserializes the OpenAiConfig from a dictionary.

class databricks.sdk.service.serving.PaLmConfig(palm_api_key: 'Optional[str]' = None, palm_api_key_plaintext: 'Optional[str]' = None)
palm_api_key: str | None = None

The Databricks secret key reference for a PaLM API key. If you prefer to paste your API key directly, see palm_api_key_plaintext. You must provide an API key using one of the following fields: palm_api_key or palm_api_key_plaintext.

palm_api_key_plaintext: str | None = None

The PaLM API key provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see palm_api_key. You must provide an API key using one of the following fields: palm_api_key or palm_api_key_plaintext.

as_dict() dict

Serializes the PaLmConfig into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the PaLmConfig into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) PaLmConfig

Deserializes the PaLmConfig from a dictionary.

class databricks.sdk.service.serving.PayloadTable(name: 'Optional[str]' = None, status: 'Optional[str]' = None, status_message: 'Optional[str]' = None)
name: str | None = None
status: str | None = None
status_message: str | None = None
as_dict() dict

Serializes the PayloadTable into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the PayloadTable into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) PayloadTable

Deserializes the PayloadTable from a dictionary.

class databricks.sdk.service.serving.PtEndpointCoreConfig(served_entities: 'Optional[List[PtServedModel]]' = None, traffic_config: 'Optional[TrafficConfig]' = None)
served_entities: List[PtServedModel] | None = None

The list of served entities under the serving endpoint config.

traffic_config: TrafficConfig | None = None
as_dict() dict

Serializes the PtEndpointCoreConfig into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the PtEndpointCoreConfig into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) PtEndpointCoreConfig

Deserializes the PtEndpointCoreConfig from a dictionary.

class databricks.sdk.service.serving.PtServedModel(entity_name: 'str', provisioned_model_units: 'int', burst_scaling_enabled: 'Optional[bool]' = None, entity_version: 'Optional[str]' = None, name: 'Optional[str]' = None)
entity_name: str

The name of the entity to be served. The entity may be a model in the Databricks Model Registry, a model in the Unity Catalog (UC), or a function of type FEATURE_SPEC in the UC. If it is a UC object, the full name of the object should be given in the form of catalog_name.schema_name.model_name.

provisioned_model_units: int

The number of model units to be provisioned.

burst_scaling_enabled: bool | None = None

Whether burst scaling is enabled. When enabled (default), the endpoint can automatically scale up beyond provisioned capacity to handle traffic spikes. When disabled, the endpoint maintains fixed capacity at provisioned_model_units.

entity_version: str | None = None
name: str | None = None

The name of a served entity. It must be unique across an endpoint. A served entity name can consist of alphanumeric characters, dashes, and underscores. If not specified for an external model, this field defaults to external_model.name, with ‘.’ and ‘:’ replaced with ‘-’, and if not specified for other entities, it defaults to entity_name-entity_version.

as_dict() dict

Serializes the PtServedModel into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the PtServedModel into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) PtServedModel

Deserializes the PtServedModel from a dictionary.

class databricks.sdk.service.serving.PutAiGatewayResponse(fallback_config: 'Optional[FallbackConfig]' = None, guardrails: 'Optional[AiGatewayGuardrails]' = None, inference_table_config: 'Optional[AiGatewayInferenceTableConfig]' = None, rate_limits: 'Optional[List[AiGatewayRateLimit]]' = None, usage_tracking_config: 'Optional[AiGatewayUsageTrackingConfig]' = None)
fallback_config: FallbackConfig | None = None

Configuration for traffic fallback which auto fallbacks to other served entities if the request to a served entity fails with certain error codes, to increase availability.

guardrails: AiGatewayGuardrails | None = None

Configuration for AI Guardrails to prevent unwanted data and unsafe data in requests and responses.

inference_table_config: AiGatewayInferenceTableConfig | None = None

Configuration for payload logging using inference tables. Use these tables to monitor and audit data being sent to and received from model APIs and to improve model quality.

rate_limits: List[AiGatewayRateLimit] | None = None

Configuration for rate limits which can be set to limit endpoint traffic.

usage_tracking_config: AiGatewayUsageTrackingConfig | None = None

Configuration to enable usage tracking using system tables. These tables allow you to monitor operational usage on endpoints and their associated costs.

as_dict() dict

Serializes the PutAiGatewayResponse into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the PutAiGatewayResponse into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) PutAiGatewayResponse

Deserializes the PutAiGatewayResponse from a dictionary.

class databricks.sdk.service.serving.PutResponse(rate_limits: 'Optional[List[RateLimit]]' = None)
rate_limits: List[RateLimit] | None = None

The list of endpoint rate limits.

as_dict() dict

Serializes the PutResponse into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the PutResponse into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) PutResponse

Deserializes the PutResponse from a dictionary.

class databricks.sdk.service.serving.QueryEndpointResponse(choices: 'Optional[List[V1ResponseChoiceElement]]' = None, created: 'Optional[int]' = None, data: 'Optional[List[EmbeddingsV1ResponseEmbeddingElement]]' = None, id: 'Optional[str]' = None, model: 'Optional[str]' = None, object: 'Optional[QueryEndpointResponseObject]' = None, outputs: 'Optional[List[any]]' = None, predictions: 'Optional[List[Any]]' = None, served_model_name: 'Optional[str]' = None, usage: 'Optional[ExternalModelUsageElement]' = None)
choices: List[V1ResponseChoiceElement] | None = None

The list of choices returned by the chat or completions external/foundation model serving endpoint.

created: int | None = None

The Unix timestamp, in seconds, at which the query was created, as returned by a completions or chat external/foundation model serving endpoint.

data: List[EmbeddingsV1ResponseEmbeddingElement] | None = None

The list of the embeddings returned by the embeddings external/foundation model serving endpoint.

id: str | None = None

The ID of the query that may be returned by a completions or chat external/foundation model serving endpoint.

model: str | None = None

The name of the external/foundation model used for querying. This is the name of the model that was specified in the endpoint config.

object: QueryEndpointResponseObject | None = None

The type of object returned by the external/foundation model serving endpoint, one of [text_completion, chat.completion, list (of embeddings)].

outputs: List[any] | None = None

The outputs of the feature serving endpoint.

predictions: List[Any] | None = None

The predictions returned by the serving endpoint.

served_model_name: str | None = None

The name of the served model that served the request. This is useful when there are multiple models behind the same endpoint with traffic split.

usage: ExternalModelUsageElement | None = None

The usage object that may be returned by the external/foundation model serving endpoint. This contains information about the number of tokens used in the prompt and response.
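Since one response class covers several endpoint types, a caller usually has to work out which fields are populated. The sketch below does this on a plain dictionary keyed by the field names above. response_kind and its return labels are hypothetical, and the order of the checks is an arbitrary choice.

```python
from typing import Any, Mapping


def response_kind(body: Mapping[str, Any]) -> str:
    """Classify a response by which result field is populated."""
    if body.get("choices") is not None:
        return "chat-or-completions"  # external/foundation model choices
    if body.get("data") is not None:
        return "embeddings"  # external/foundation model embeddings
    if body.get("outputs") is not None:
        return "feature-serving"  # feature serving endpoint outputs
    if body.get("predictions") is not None:
        return "predictions"  # predictions returned by the serving endpoint
    return "unknown"
```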

as_dict() dict

Serializes the QueryEndpointResponse into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the QueryEndpointResponse into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) QueryEndpointResponse

Deserializes the QueryEndpointResponse from a dictionary.

class databricks.sdk.service.serving.QueryEndpointResponseObject

The type of object returned by the external/foundation model serving endpoint, one of [text_completion, chat.completion, list (of embeddings)].

CHAT_COMPLETION = "CHAT_COMPLETION"
LIST = "LIST"
TEXT_COMPLETION = "TEXT_COMPLETION"
class databricks.sdk.service.serving.RateLimit(calls: 'int', renewal_period: 'RateLimitRenewalPeriod', key: 'Optional[RateLimitKey]' = None)
calls: int

Used to specify how many calls are allowed for a key within the renewal_period.

renewal_period: RateLimitRenewalPeriod

Renewal period field for a serving endpoint rate limit. Currently, only ‘minute’ is supported.

key: RateLimitKey | None = None

Key field for a serving endpoint rate limit. Currently, only ‘user’ and ‘endpoint’ are supported, with ‘endpoint’ being the default if not specified.
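To make calls, renewal_period and key concrete, here is a small in-memory limiter. It is an illustration only: this page does not say which algorithm the service uses, so the fixed 60-second window is an assumption, and FixedWindowLimiter with its injectable clock parameter is a hypothetical class.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `calls` requests per key in each 60-second window."""

    def __init__(self, calls: int, key: str = "endpoint",
                 clock: Callable[[], float] = time.monotonic) -> None:
        if key not in ("user", "endpoint"):
            raise ValueError("key must be 'user' or 'endpoint'")
        self.calls = calls
        self.key = key
        self._clock = clock
        self._counts: Dict[Tuple[str, int], int] = defaultdict(int)

    def allow(self, user: str) -> bool:
        # With key='endpoint' all users share one bucket.
        bucket = user if self.key == "user" else "*"
        window = int(self._clock() // 60)  # the 'minute' renewal period
        slot = (bucket, window)
        if self._counts[slot] >= self.calls:
            return False
        self._counts[slot] += 1
        return True
```

With key set to user, each user gets a separate budget; with the default endpoint key, every caller draws from the same one.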

as_dict() dict

Serializes the RateLimit into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the RateLimit into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) RateLimit

Deserializes the RateLimit from a dictionary.

class databricks.sdk.service.serving.RateLimitKey
ENDPOINT = "ENDPOINT"
USER = "USER"
class databricks.sdk.service.serving.RateLimitRenewalPeriod
MINUTE = "MINUTE"
class databricks.sdk.service.serving.Route(traffic_percentage: 'int', served_entity_name: 'Optional[str]' = None, served_model_name: 'Optional[str]' = None)
traffic_percentage: int

The percentage of endpoint traffic to send to this route. It must be an integer between 0 and 100 inclusive.

served_entity_name: str | None = None
served_model_name: str | None = None

The name of the served model this route configures traffic for.
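A traffic split can be pictured as a weighted choice among routes. The sketch below validates the 0 to 100 integer range stated above and then picks a route in proportion to its percentage. pick_route is a hypothetical helper working on a name-to-percentage dictionary; how the service itself distributes requests is not described here.

```python
import random
from typing import Dict, Optional


def pick_route(routes: Dict[str, int], rng: Optional[random.Random] = None) -> str:
    """Pick a served model name with probability proportional to its percentage."""
    for name, pct in routes.items():
        if not isinstance(pct, int) or not 0 <= pct <= 100:
            raise ValueError(f"traffic_percentage for {name!r} must be an integer in 0..100")
    if rng is None:
        rng = random.Random()
    names = list(routes)
    weights = [routes[name] for name in names]
    # random.choices raises ValueError if every weight is zero.
    return rng.choices(names, weights=weights, k=1)[0]
```

Passing a seeded random.Random makes the choice reproducible in tests.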

as_dict() dict

Serializes the Route into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the Route into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) Route

Deserializes the Route from a dictionary.

class databricks.sdk.service.serving.ServedEntityInput(burst_scaling_enabled: 'Optional[bool]' = None, entity_name: 'Optional[str]' = None, entity_version: 'Optional[str]' = None, environment_vars: 'Optional[Dict[str, str]]' = None, external_model: 'Optional[ExternalModel]' = None, instance_profile_arn: 'Optional[str]' = None, max_provisioned_concurrency: 'Optional[int]' = None, max_provisioned_throughput: 'Optional[int]' = None, min_provisioned_concurrency: 'Optional[int]' = None, min_provisioned_throughput: 'Optional[int]' = None, name: 'Optional[str]' = None, provisioned_model_units: 'Optional[int]' = None, scale_to_zero_enabled: 'Optional[bool]' = None, workload_size: 'Optional[str]' = None, workload_type: 'Optional[ServingModelWorkloadType]' = None)
burst_scaling_enabled: bool | None = None

Whether burst scaling is enabled. When enabled (default), the endpoint can automatically scale up beyond provisioned capacity to handle traffic spikes. When disabled, the endpoint maintains fixed capacity at provisioned_model_units.

entity_name: str | None = None

The name of the entity to be served. The entity may be a model in the Databricks Model Registry, a model in the Unity Catalog (UC), or a function of type FEATURE_SPEC in the UC. If it is a UC object, the full name of the object should be given in the form of catalog_name.schema_name.model_name.

entity_version: str | None = None
environment_vars: Dict[str, str] | None = None

An object containing a set of optional, user-specified environment variable key-value pairs used for serving this entity. Note: this is an experimental feature and subject to change. Example entity environment variables that refer to Databricks secrets: {"OPENAI_API_KEY": "{{secrets/my_scope/my_key}}", "DATABRICKS_TOKEN": "{{secrets/my_scope2/my_key2}}"}
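The secret references in the example above follow a {{secrets/scope/key}} shape. A sketch for telling references apart from literal values is shown below. parse_secret_ref is a hypothetical helper, and the regular expression is inferred from that one example rather than from a published grammar.

```python
import re
from typing import Optional, Tuple

# Inferred from the example: {{secrets/<scope>/<key>}}
SECRET_REF = re.compile(r"^\{\{secrets/([^/{}]+)/([^/{}]+)\}\}$")


def parse_secret_ref(value: str) -> Optional[Tuple[str, str]]:
    """Return (scope, key) for a secret reference, or None for a literal value."""
    match = SECRET_REF.match(value)
    return (match.group(1), match.group(2)) if match else None


env = {
    "OPENAI_API_KEY": "{{secrets/my_scope/my_key}}",
    "LOG_LEVEL": "info",
}
# Names of variables whose values are secret references.
secret_backed = sorted(name for name, value in env.items() if parse_secret_ref(value))
```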

external_model: ExternalModel | None = None

The external model to be served. NOTE: Only one of external_model and (entity_name, entity_version, workload_size, workload_type, and scale_to_zero_enabled) can be specified with the latter set being used for custom model serving for a Databricks registered model. For an existing endpoint with external_model, it cannot be updated to an endpoint without external_model. If the endpoint is created without external_model, users cannot update it to add external_model later. The task type of all external models within an endpoint must be the same.
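The mutual-exclusion rule in the note above is easy to check before a request is sent. The sketch below inspects a plain dictionary keyed by the field names on this page; conflicting_fields is a hypothetical helper, and it covers only this one rule.

```python
from typing import Any, List, Mapping

# Fields used for custom model serving, per the note above.
CUSTOM_MODEL_FIELDS = (
    "entity_name", "entity_version", "workload_size",
    "workload_type", "scale_to_zero_enabled",
)


def conflicting_fields(entity: Mapping[str, Any]) -> List[str]:
    """Return custom-model fields that are set alongside external_model."""
    if entity.get("external_model") is None:
        return []
    return [name for name in CUSTOM_MODEL_FIELDS if entity.get(name) is not None]
```

An empty result means the entity does not mix the two styles of configuration.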

instance_profile_arn: str | None = None

ARN of the instance profile that the served entity uses to access AWS resources.

max_provisioned_concurrency: int | None = None

The maximum provisioned concurrency that the endpoint can scale up to. Do not use if workload_size is specified.

max_provisioned_throughput: int | None = None

The maximum tokens per second that the endpoint can scale up to.

min_provisioned_concurrency: int | None = None

The minimum provisioned concurrency that the endpoint can scale down to. Do not use if workload_size is specified.

min_provisioned_throughput: int | None = None

The minimum tokens per second that the endpoint can scale down to.

name: str | None = None

The name of a served entity. It must be unique across an endpoint. A served entity name can consist of alphanumeric characters, dashes, and underscores. If not specified for an external model, this field defaults to external_model.name, with ‘.’ and ‘:’ replaced with ‘-’, and if not specified for other entities, it defaults to entity_name-entity_version.
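The naming rules above translate into two small helpers. Both names are hypothetical, the regular expression encodes only the "alphanumeric characters, dashes, and underscores" rule, and the sample model names are made up.

```python
import re
from typing import Optional

NAME_PATTERN = re.compile(r"^[A-Za-z0-9_-]+$")


def is_valid_served_name(name: str) -> bool:
    """True if the name uses only alphanumerics, dashes, and underscores."""
    return bool(NAME_PATTERN.match(name))


def default_served_name(entity_name: Optional[str] = None,
                        entity_version: Optional[str] = None,
                        external_model_name: Optional[str] = None) -> str:
    """Derive the default name described above."""
    if external_model_name is not None:
        # External models: '.' and ':' are replaced with '-'.
        return external_model_name.replace(".", "-").replace(":", "-")
    # Other entities: entity_name-entity_version.
    return f"{entity_name}-{entity_version}"
```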

provisioned_model_units: int | None = None

The number of model units provisioned.

scale_to_zero_enabled: bool | None = None

Whether the compute resources for the served entity should scale down to zero.

workload_size: str | None = None

The workload size of the served entity. The workload size corresponds to a range of provisioned concurrency that the compute autoscales between. A single unit of provisioned concurrency can process one request at a time. Valid workload sizes are “Small” (4 - 4 provisioned concurrency), “Medium” (8 - 16 provisioned concurrency), and “Large” (16 - 64 provisioned concurrency). Additional custom workload sizes can also be used when available in the workspace. If scale-to-zero is enabled, the lower bound of the provisioned concurrency for each workload size is 0. Do not use if min_provisioned_concurrency and max_provisioned_concurrency are specified.
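The relationship between workload_size, scale-to-zero, and the explicit concurrency bounds can be sketched as a lookup. concurrency_range is a hypothetical helper. It knows only the three named sizes, so a custom workload size raises an error here, and it does not apply scale-to-zero to explicit bounds because the page does not say how the two combine.

```python
from typing import Optional, Tuple

# Ranges quoted in the workload_size description above.
SIZES = {"Small": (4, 4), "Medium": (8, 16), "Large": (16, 64)}


def concurrency_range(workload_size: Optional[str] = None,
                      scale_to_zero: bool = False,
                      min_concurrency: Optional[int] = None,
                      max_concurrency: Optional[int] = None) -> Tuple[int, int]:
    """Return the (low, high) provisioned concurrency the entity scales between."""
    explicit = min_concurrency is not None or max_concurrency is not None
    if workload_size is not None and explicit:
        raise ValueError("use workload_size or explicit bounds, not both")
    if explicit:
        if min_concurrency is None or max_concurrency is None:
            raise ValueError("set both min and max provisioned concurrency")
        return (min_concurrency, max_concurrency)
    if workload_size not in SIZES:
        raise ValueError(f"unknown workload size: {workload_size!r}")
    low, high = SIZES[workload_size]
    # With scale-to-zero enabled, the lower bound drops to 0.
    return (0 if scale_to_zero else low, high)
```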

workload_type: ServingModelWorkloadType | None = None

The workload type of the served entity. The workload type selects which type of compute to use in the endpoint. The default value for this parameter is “CPU”. For deep learning workloads, GPU acceleration is available by selecting a GPU workload type such as GPU_SMALL. See the available [GPU types].

[GPU types]: https://docs.databricks.com/en/machine-learning/model-serving/create-manage-serving-endpoints.html#gpu-workload-types

as_dict() dict

Serializes the ServedEntityInput into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the ServedEntityInput into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) ServedEntityInput

Deserializes the ServedEntityInput from a dictionary.

class databricks.sdk.service.serving.ServedEntityOutput(burst_scaling_enabled: 'Optional[bool]' = None, creation_timestamp: 'Optional[int]' = None, creator: 'Optional[str]' = None, entity_name: 'Optional[str]' = None, entity_version: 'Optional[str]' = None, environment_vars: 'Optional[Dict[str, str]]' = None, external_model: 'Optional[ExternalModel]' = None, foundation_model: 'Optional[FoundationModel]' = None, instance_profile_arn: 'Optional[str]' = None, max_provisioned_concurrency: 'Optional[int]' = None, max_provisioned_throughput: 'Optional[int]' = None, min_provisioned_concurrency: 'Optional[int]' = None, min_provisioned_throughput: 'Optional[int]' = None, name: 'Optional[str]' = None, provisioned_model_units: 'Optional[int]' = None, scale_to_zero_enabled: 'Optional[bool]' = None, state: 'Optional[ServedModelState]' = None, workload_size: 'Optional[str]' = None, workload_type: 'Optional[ServingModelWorkloadType]' = None)
burst_scaling_enabled: bool | None = None

Whether burst scaling is enabled. When enabled (default), the endpoint can automatically scale up beyond provisioned capacity to handle traffic spikes. When disabled, the endpoint maintains fixed capacity at provisioned_model_units.

creation_timestamp: int | None = None
creator: str | None = None
entity_name: str | None = None

The name of the entity to be served. The entity may be a model in the Databricks Model Registry, a model in the Unity Catalog (UC), or a function of type FEATURE_SPEC in the UC. If it is a UC object, the full name of the object should be given in the form of catalog_name.schema_name.model_name.

entity_version: str | None = None
environment_vars: Dict[str, str] | None = None

An object containing a set of optional, user-specified environment variable key-value pairs used for serving this entity. Note: this is an experimental feature and subject to change. Example entity environment variables that refer to Databricks secrets: {"OPENAI_API_KEY": "{{secrets/my_scope/my_key}}", "DATABRICKS_TOKEN": "{{secrets/my_scope2/my_key2}}"}
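To see which variables in such a mapping point at Databricks secrets, the reference syntax from the example above can be matched with a regular expression. This is a sketch under assumptions: parse_secret_ref is a hypothetical helper, the pattern is inferred only from the example shown here, and LOG_LEVEL is an invented plain-text variable.

```python
import re
from typing import Dict, Optional, Tuple

# Pattern inferred from the example: {{secrets/<scope>/<key>}}
_SECRET_REF = re.compile(r"\{\{secrets/([^/{}]+)/([^/{}]+)\}\}")


def parse_secret_ref(value: str) -> Optional[Tuple[str, str]]:
    """Return (scope, key) if value is a secret reference, else None."""
    match = _SECRET_REF.fullmatch(value)
    return (match.group(1), match.group(2)) if match else None


environment_vars: Dict[str, str] = {
    "OPENAI_API_KEY": "{{secrets/my_scope/my_key}}",
    "LOG_LEVEL": "info",  # hypothetical plain-text variable
}

# Keep only the variables whose values are secret references.
secret_backed: Dict[str, Tuple[str, str]] = {}
for name, value in environment_vars.items():
    ref = parse_secret_ref(value)
    if ref is not None:
        secret_backed[name] = ref
```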

external_model: ExternalModel | None = None

The external model to be served. NOTE: Specify either external_model or the set (entity_name, entity_version, workload_size, workload_type, and scale_to_zero_enabled), not both; the latter set is used for custom model serving of a Databricks registered model. An existing endpoint with external_model cannot be updated to an endpoint without external_model, and an endpoint created without external_model cannot be updated to add external_model later. The task type of all external models within an endpoint must be the same.

foundation_model: FoundationModel | None = None
instance_profile_arn: str | None = None

ARN of the instance profile that the served entity uses to access AWS resources.

max_provisioned_concurrency: int | None = None

The maximum provisioned concurrency that the endpoint can scale up to. Do not use if workload_size is specified.

max_provisioned_throughput: int | None = None

The maximum tokens per second that the endpoint can scale up to.

min_provisioned_concurrency: int | None = None

The minimum provisioned concurrency that the endpoint can scale down to. Do not use if workload_size is specified.

min_provisioned_throughput: int | None = None

The minimum tokens per second that the endpoint can scale down to.

name: str | None = None

The name of a served entity. It must be unique across an endpoint. A served entity name can consist of alphanumeric characters, dashes, and underscores. If not specified for an external model, this field defaults to external_model.name, with ‘.’ and ‘:’ replaced with ‘-’, and if not specified for other entities, it defaults to entity_name-entity_version.
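The defaulting rule described above can be read literally as follows. This is only a sketch of that description: default_served_entity_name is a hypothetical helper, and the service computes the real default itself.

```python
from typing import Optional


def default_served_entity_name(
    entity_name: Optional[str] = None,
    entity_version: Optional[str] = None,
    external_model_name: Optional[str] = None,
) -> str:
    """Mirror the documented default for an unspecified served entity name.

    Hypothetical helper; a literal reading of the field description.
    """
    if external_model_name is not None:
        # For external models, '.' and ':' are replaced with '-'.
        return external_model_name.replace(".", "-").replace(":", "-")
    # For other entities, the default is entity_name-entity_version.
    return f"{entity_name}-{entity_version}"
```

Under this reading, an external model named "provider.model:v1" (an invented name) maps to "provider-model-v1", and entity "my_model" at version "3" maps to "my_model-3".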

provisioned_model_units: int | None = None

The number of model units provisioned.

scale_to_zero_enabled: bool | None = None

Whether the compute resources for the served entity should scale down to zero.

state: ServedModelState | None = None
workload_size: str | None = None

The workload size of the served entity. The workload size corresponds to a range of provisioned concurrency that the compute autoscales between. A single unit of provisioned concurrency can process one request at a time. Valid workload sizes are “Small” (4 - 4 provisioned concurrency), “Medium” (8 - 16 provisioned concurrency), and “Large” (16 - 64 provisioned concurrency). Additional custom workload sizes can also be used when available in the workspace. If scale-to-zero is enabled, the lower bound of the provisioned concurrency for each workload size is 0. Do not use if min_provisioned_concurrency and max_provisioned_concurrency are specified.

workload_type: ServingModelWorkloadType | None = None

The workload type of the served entity. The workload type selects which type of compute to use in the endpoint. The default value for this parameter is “CPU”. For deep learning workloads, GPU acceleration is available by selecting workload types like GPU_SMALL and others. See the available [GPU types].

[GPU types]: https://docs.databricks.com/en/machine-learning/model-serving/create-manage-serving-endpoints.html#gpu-workload-types

as_dict() dict

Serializes the ServedEntityOutput into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the ServedEntityOutput into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) ServedEntityOutput

Deserializes the ServedEntityOutput from a dictionary.

class databricks.sdk.service.serving.ServedEntitySpec(entity_name: 'Optional[str]' = None, entity_version: 'Optional[str]' = None, external_model: 'Optional[ExternalModel]' = None, foundation_model: 'Optional[FoundationModel]' = None, name: 'Optional[str]' = None)
entity_name: str | None = None
entity_version: str | None = None
external_model: ExternalModel | None = None
foundation_model: FoundationModel | None = None
name: str | None = None
as_dict() dict

Serializes the ServedEntitySpec into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the ServedEntitySpec into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) ServedEntitySpec

Deserializes the ServedEntitySpec from a dictionary.

class databricks.sdk.service.serving.ServedModelInput(scale_to_zero_enabled: 'bool', model_name: 'str', model_version: 'str', burst_scaling_enabled: 'Optional[bool]' = None, environment_vars: 'Optional[Dict[str, str]]' = None, instance_profile_arn: 'Optional[str]' = None, max_provisioned_concurrency: 'Optional[int]' = None, max_provisioned_throughput: 'Optional[int]' = None, min_provisioned_concurrency: 'Optional[int]' = None, min_provisioned_throughput: 'Optional[int]' = None, name: 'Optional[str]' = None, provisioned_model_units: 'Optional[int]' = None, workload_size: 'Optional[str]' = None, workload_type: 'Optional[ServedModelInputWorkloadType]' = None)
scale_to_zero_enabled: bool

Whether the compute resources for the served entity should scale down to zero.

model_name: str
model_version: str
burst_scaling_enabled: bool | None = None

Whether burst scaling is enabled. When enabled (default), the endpoint can automatically scale up beyond provisioned capacity to handle traffic spikes. When disabled, the endpoint maintains fixed capacity at provisioned_model_units.

environment_vars: Dict[str, str] | None = None

An object containing a set of optional, user-specified environment variable key-value pairs used for serving this entity. Note: this is an experimental feature and subject to change. Example entity environment variables that refer to Databricks secrets: {"OPENAI_API_KEY": "{{secrets/my_scope/my_key}}", "DATABRICKS_TOKEN": "{{secrets/my_scope2/my_key2}}"}

instance_profile_arn: str | None = None

ARN of the instance profile that the served entity uses to access AWS resources.

max_provisioned_concurrency: int | None = None

The maximum provisioned concurrency that the endpoint can scale up to. Do not use if workload_size is specified.

max_provisioned_throughput: int | None = None

The maximum tokens per second that the endpoint can scale up to.

min_provisioned_concurrency: int | None = None

The minimum provisioned concurrency that the endpoint can scale down to. Do not use if workload_size is specified.

min_provisioned_throughput: int | None = None

The minimum tokens per second that the endpoint can scale down to.

name: str | None = None

The name of a served entity. It must be unique across an endpoint. A served entity name can consist of alphanumeric characters, dashes, and underscores. If not specified for an external model, this field defaults to external_model.name, with ‘.’ and ‘:’ replaced with ‘-’, and if not specified for other entities, it defaults to entity_name-entity_version.

provisioned_model_units: int | None = None

The number of model units provisioned.

workload_size: str | None = None

The workload size of the served entity. The workload size corresponds to a range of provisioned concurrency that the compute autoscales between. A single unit of provisioned concurrency can process one request at a time. Valid workload sizes are “Small” (4 - 4 provisioned concurrency), “Medium” (8 - 16 provisioned concurrency), and “Large” (16 - 64 provisioned concurrency). Additional custom workload sizes can also be used when available in the workspace. If scale-to-zero is enabled, the lower bound of the provisioned concurrency for each workload size is 0. Do not use if min_provisioned_concurrency and max_provisioned_concurrency are specified.
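Because workload_size and the explicit min_provisioned_concurrency / max_provisioned_concurrency fields are documented as alternatives, a request body can be checked for a mix of the two before it is sent. The sketch below works on a plain dictionary shaped like the output of as_dict(); scaling_field_conflicts is a hypothetical helper, not an SDK function.

```python
from typing import Any, Dict, List

_CONCURRENCY_FIELDS = ("min_provisioned_concurrency", "max_provisioned_concurrency")


def scaling_field_conflicts(served_model: Dict[str, Any]) -> List[str]:
    """List explicit concurrency fields that are set alongside workload_size."""
    if served_model.get("workload_size") is None:
        # Without workload_size, the explicit bounds do not conflict with anything.
        return []
    return [name for name in _CONCURRENCY_FIELDS if served_model.get(name) is not None]
```

An empty list means the body uses only one of the two ways to express scaling bounds.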

workload_type: ServedModelInputWorkloadType | None = None

The workload type of the served entity. The workload type selects which type of compute to use in the endpoint. The default value for this parameter is “CPU”. For deep learning workloads, GPU acceleration is available by selecting workload types like GPU_SMALL and others. See the available [GPU types].

[GPU types]: https://docs.databricks.com/en/machine-learning/model-serving/create-manage-serving-endpoints.html#gpu-workload-types

as_dict() dict

Serializes the ServedModelInput into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the ServedModelInput into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) ServedModelInput

Deserializes the ServedModelInput from a dictionary.
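The as_dict() / from_dict() pair is intended for round-tripping between objects and JSON-style dictionaries. The following stand-in illustrates the pattern without importing the SDK. It is a simplified sketch: ServedModelSketch is a hypothetical class with only four of the fields above, and leaving unset fields out of the body is an assumption about the serialization behavior.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ServedModelSketch:
    """Hypothetical stand-in for ServedModelInput with a reduced field set."""

    scale_to_zero_enabled: bool
    model_name: str
    model_version: str
    workload_size: Optional[str] = None

    def as_dict(self) -> Dict[str, Any]:
        # Assumption: fields that are None are left out of the request body.
        return {key: value for key, value in vars(self).items() if value is not None}

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "ServedModelSketch":
        # Required fields are read directly; optional ones fall back to None.
        return cls(
            scale_to_zero_enabled=d["scale_to_zero_enabled"],
            model_name=d["model_name"],
            model_version=d["model_version"],
            workload_size=d.get("workload_size"),
        )


original = ServedModelSketch(scale_to_zero_enabled=True, model_name="my_model", model_version="1")
body = original.as_dict()
restored = ServedModelSketch.from_dict(body)
```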

class databricks.sdk.service.serving.ServedModelInputWorkloadType

Please keep this in sync with workload types in InferenceEndpointEntities.scala

CPU = "CPU"
GPU_LARGE = "GPU_LARGE"
GPU_MEDIUM = "GPU_MEDIUM"
GPU_SMALL = "GPU_SMALL"
GPU_XLARGE = "GPU_XLARGE"
MULTIGPU_MEDIUM = "MULTIGPU_MEDIUM"
class databricks.sdk.service.serving.ServedModelOutput(burst_scaling_enabled: 'Optional[bool]' = None, creation_timestamp: 'Optional[int]' = None, creator: 'Optional[str]' = None, environment_vars: 'Optional[Dict[str, str]]' = None, instance_profile_arn: 'Optional[str]' = None, max_provisioned_concurrency: 'Optional[int]' = None, min_provisioned_concurrency: 'Optional[int]' = None, model_name: 'Optional[str]' = None, model_version: 'Optional[str]' = None, name: 'Optional[str]' = None, provisioned_model_units: 'Optional[int]' = None, scale_to_zero_enabled: 'Optional[bool]' = None, state: 'Optional[ServedModelState]' = None, workload_size: 'Optional[str]' = None, workload_type: 'Optional[ServingModelWorkloadType]' = None)
burst_scaling_enabled: bool | None = None

Whether burst scaling is enabled. When enabled (default), the endpoint can automatically scale up beyond provisioned capacity to handle traffic spikes. When disabled, the endpoint maintains fixed capacity at provisioned_model_units.

creation_timestamp: int | None = None
creator: str | None = None
environment_vars: Dict[str, str] | None = None

An object containing a set of optional, user-specified environment variable key-value pairs used for serving this entity. Note: this is an experimental feature and subject to change. Example entity environment variables that refer to Databricks secrets: {"OPENAI_API_KEY": "{{secrets/my_scope/my_key}}", "DATABRICKS_TOKEN": "{{secrets/my_scope2/my_key2}}"}

instance_profile_arn: str | None = None

ARN of the instance profile that the served entity uses to access AWS resources.

max_provisioned_concurrency: int | None = None

The maximum provisioned concurrency that the endpoint can scale up to. Do not use if workload_size is specified.

min_provisioned_concurrency: int | None = None

The minimum provisioned concurrency that the endpoint can scale down to. Do not use if workload_size is specified.

model_name: str | None = None
model_version: str | None = None
name: str | None = None

The name of a served entity. It must be unique across an endpoint. A served entity name can consist of alphanumeric characters, dashes, and underscores. If not specified for an external model, this field defaults to external_model.name, with ‘.’ and ‘:’ replaced with ‘-’, and if not specified for other entities, it defaults to entity_name-entity_version.

provisioned_model_units: int | None = None

The number of model units provisioned.

scale_to_zero_enabled: bool | None = None

Whether the compute resources for the served entity should scale down to zero.

state: ServedModelState | None = None
workload_size: str | None = None

The workload size of the served entity. The workload size corresponds to a range of provisioned concurrency that the compute autoscales between. A single unit of provisioned concurrency can process one request at a time. Valid workload sizes are “Small” (4 - 4 provisioned concurrency), “Medium” (8 - 16 provisioned concurrency), and “Large” (16 - 64 provisioned concurrency). Additional custom workload sizes can also be used when available in the workspace. If scale-to-zero is enabled, the lower bound of the provisioned concurrency for each workload size is 0. Do not use if min_provisioned_concurrency and max_provisioned_concurrency are specified.

workload_type: ServingModelWorkloadType | None = None

The workload type of the served entity. The workload type selects which type of compute to use in the endpoint. The default value for this parameter is “CPU”. For deep learning workloads, GPU acceleration is available by selecting workload types like GPU_SMALL and others. See the available [GPU types].

[GPU types]: https://docs.databricks.com/en/machine-learning/model-serving/create-manage-serving-endpoints.html#gpu-workload-types

as_dict() dict

Serializes the ServedModelOutput into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the ServedModelOutput into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) ServedModelOutput

Deserializes the ServedModelOutput from a dictionary.

class databricks.sdk.service.serving.ServedModelSpec(model_name: 'Optional[str]' = None, model_version: 'Optional[str]' = None, name: 'Optional[str]' = None)
model_name: str | None = None

Only one of model_name and entity_name should be populated.

model_version: str | None = None

Only one of model_version and entity_version should be populated.

name: str | None = None
as_dict() dict

Serializes the ServedModelSpec into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the ServedModelSpec into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) ServedModelSpec

Deserializes the ServedModelSpec from a dictionary.

class databricks.sdk.service.serving.ServedModelState(deployment: 'Optional[ServedModelStateDeployment]' = None, deployment_state_message: 'Optional[str]' = None)
deployment: ServedModelStateDeployment | None = None
deployment_state_message: str | None = None
as_dict() dict

Serializes the ServedModelState into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the ServedModelState into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) ServedModelState

Deserializes the ServedModelState from a dictionary.

class databricks.sdk.service.serving.ServedModelStateDeployment
DEPLOYMENT_ABORTED = "DEPLOYMENT_ABORTED"
DEPLOYMENT_CREATING = "DEPLOYMENT_CREATING"
DEPLOYMENT_FAILED = "DEPLOYMENT_FAILED"
DEPLOYMENT_READY = "DEPLOYMENT_READY"
DEPLOYMENT_RECOVERING = "DEPLOYMENT_RECOVERING"
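When polling a served model, it can help to separate states that are still in progress from states that will not change on their own. The sketch below mirrors the values listed above in a local enum. Which states count as settled is an assumption made for illustration; the reference does not state it, and is_settled is a hypothetical helper.

```python
from enum import Enum


class DeploymentState(Enum):
    """Local mirror of the ServedModelStateDeployment values."""

    DEPLOYMENT_ABORTED = "DEPLOYMENT_ABORTED"
    DEPLOYMENT_CREATING = "DEPLOYMENT_CREATING"
    DEPLOYMENT_FAILED = "DEPLOYMENT_FAILED"
    DEPLOYMENT_READY = "DEPLOYMENT_READY"
    DEPLOYMENT_RECOVERING = "DEPLOYMENT_RECOVERING"


# Assumption: ready, failed, and aborted deployments no longer progress on their own.
_SETTLED = frozenset(
    {
        DeploymentState.DEPLOYMENT_READY,
        DeploymentState.DEPLOYMENT_FAILED,
        DeploymentState.DEPLOYMENT_ABORTED,
    }
)


def is_settled(raw_state: str) -> bool:
    """Return True if polling can stop; raises ValueError for unknown values."""
    return DeploymentState(raw_state) in _SETTLED
```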
class databricks.sdk.service.serving.ServerLogsResponse(logs: 'str')
logs: str

The most recent log lines of the model server processing invocation requests.

as_dict() dict

Serializes the ServerLogsResponse into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the ServerLogsResponse into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) ServerLogsResponse

Deserializes the ServerLogsResponse from a dictionary.

class databricks.sdk.service.serving.ServingEndpoint(ai_gateway: 'Optional[AiGatewayConfig]' = None, budget_policy_id: 'Optional[str]' = None, config: 'Optional[EndpointCoreConfigSummary]' = None, creation_timestamp: 'Optional[int]' = None, creator: 'Optional[str]' = None, description: 'Optional[str]' = None, id: 'Optional[str]' = None, last_updated_timestamp: 'Optional[int]' = None, name: 'Optional[str]' = None, state: 'Optional[EndpointState]' = None, tags: 'Optional[List[EndpointTag]]' = None, task: 'Optional[str]' = None, usage_policy_id: 'Optional[str]' = None)
ai_gateway: AiGatewayConfig | None = None

The AI Gateway configuration for the serving endpoint. NOTE: External model, provisioned throughput, and pay-per-token endpoints are fully supported; agent endpoints currently only support inference tables.

budget_policy_id: str | None = None

The budget policy associated with the endpoint.

config: EndpointCoreConfigSummary | None = None

The config that is currently being served by the endpoint.

creation_timestamp: int | None = None

The timestamp when the endpoint was created in Unix time.

creator: str | None = None

The email of the user who created the serving endpoint.

description: str | None = None

Description of the endpoint.

id: str | None = None

System-generated ID of the endpoint, included to be used by the Permissions API.

last_updated_timestamp: int | None = None

The timestamp when the endpoint was last updated by a user in Unix time.

name: str | None = None

The name of the serving endpoint.

state: EndpointState | None = None

Information corresponding to the state of the serving endpoint.

tags: List[EndpointTag] | None = None

Tags attached to the serving endpoint.

task: str | None = None

The task type of the serving endpoint.

usage_policy_id: str | None = None

The usage policy associated with the serving endpoint.

as_dict() dict

Serializes the ServingEndpoint into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the ServingEndpoint into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) ServingEndpoint

Deserializes the ServingEndpoint from a dictionary.

class databricks.sdk.service.serving.ServingEndpointAccessControlRequest(group_name: 'Optional[str]' = None, permission_level: 'Optional[ServingEndpointPermissionLevel]' = None, service_principal_name: 'Optional[str]' = None, user_name: 'Optional[str]' = None)
group_name: str | None = None

Name of the group.

permission_level: ServingEndpointPermissionLevel | None = None
service_principal_name: str | None = None

Application ID of a service principal.

user_name: str | None = None

Name of the user.

as_dict() dict

Serializes the ServingEndpointAccessControlRequest into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the ServingEndpointAccessControlRequest into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) ServingEndpointAccessControlRequest

Deserializes the ServingEndpointAccessControlRequest from a dictionary.

class databricks.sdk.service.serving.ServingEndpointAccessControlResponse(all_permissions: 'Optional[List[ServingEndpointPermission]]' = None, display_name: 'Optional[str]' = None, group_name: 'Optional[str]' = None, service_principal_name: 'Optional[str]' = None, user_name: 'Optional[str]' = None)
all_permissions: List[ServingEndpointPermission] | None = None

All permissions.

display_name: str | None = None

Display name of the user or service principal.

group_name: str | None = None

Name of the group.

service_principal_name: str | None = None

Name of the service principal.

user_name: str | None = None

Name of the user.

as_dict() dict

Serializes the ServingEndpointAccessControlResponse into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the ServingEndpointAccessControlResponse into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) ServingEndpointAccessControlResponse

Deserializes the ServingEndpointAccessControlResponse from a dictionary.

class databricks.sdk.service.serving.ServingEndpointDetailed(ai_gateway: 'Optional[AiGatewayConfig]' = None, budget_policy_id: 'Optional[str]' = None, config: 'Optional[EndpointCoreConfigOutput]' = None, creation_timestamp: 'Optional[int]' = None, creator: 'Optional[str]' = None, data_plane_info: 'Optional[ModelDataPlaneInfo]' = None, description: 'Optional[str]' = None, email_notifications: 'Optional[EmailNotifications]' = None, endpoint_url: 'Optional[str]' = None, id: 'Optional[str]' = None, last_updated_timestamp: 'Optional[int]' = None, name: 'Optional[str]' = None, pending_config: 'Optional[EndpointPendingConfig]' = None, permission_level: 'Optional[ServingEndpointDetailedPermissionLevel]' = None, route_optimized: 'Optional[bool]' = None, state: 'Optional[EndpointState]' = None, tags: 'Optional[List[EndpointTag]]' = None, task: 'Optional[str]' = None)
ai_gateway: AiGatewayConfig | None = None

The AI Gateway configuration for the serving endpoint. NOTE: External model, provisioned throughput, and pay-per-token endpoints are fully supported; agent endpoints currently only support inference tables.

budget_policy_id: str | None = None

The budget policy associated with the endpoint.

config: EndpointCoreConfigOutput | None = None

The config that is currently being served by the endpoint.

creation_timestamp: int | None = None

The timestamp when the endpoint was created in Unix time.

creator: str | None = None

The email of the user who created the serving endpoint.

data_plane_info: ModelDataPlaneInfo | None = None

Information required to query DataPlane APIs.

description: str | None = None

Description of the serving model.

email_notifications: EmailNotifications | None = None

Email notification settings.

endpoint_url: str | None = None

Endpoint invocation URL, if route optimization is enabled for the endpoint.

id: str | None = None

System-generated ID of the endpoint. This is used to refer to the endpoint in the Permissions API.

last_updated_timestamp: int | None = None

The timestamp when the endpoint was last updated by a user in Unix time.

name: str | None = None

The name of the serving endpoint.

pending_config: EndpointPendingConfig | None = None

The config that the endpoint is attempting to update to.

permission_level: ServingEndpointDetailedPermissionLevel | None = None

The permission level of the principal making the request.

route_optimized: bool | None = None

Whether route optimization has been enabled for the endpoint.

state: EndpointState | None = None

Information corresponding to the state of the serving endpoint.

tags: List[EndpointTag] | None = None

Tags attached to the serving endpoint.

task: str | None = None

The task type of the serving endpoint.

as_dict() dict

Serializes the ServingEndpointDetailed into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the ServingEndpointDetailed into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) ServingEndpointDetailed

Deserializes the ServingEndpointDetailed from a dictionary.

class databricks.sdk.service.serving.ServingEndpointDetailedPermissionLevel
CAN_MANAGE = "CAN_MANAGE"
CAN_QUERY = "CAN_QUERY"
CAN_VIEW = "CAN_VIEW"
class databricks.sdk.service.serving.ServingEndpointPermission(inherited: 'Optional[bool]' = None, inherited_from_object: 'Optional[List[str]]' = None, permission_level: 'Optional[ServingEndpointPermissionLevel]' = None)
inherited: bool | None = None
inherited_from_object: List[str] | None = None
permission_level: ServingEndpointPermissionLevel | None = None
as_dict() dict

Serializes the ServingEndpointPermission into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the ServingEndpointPermission into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) ServingEndpointPermission

Deserializes the ServingEndpointPermission from a dictionary.

class databricks.sdk.service.serving.ServingEndpointPermissionLevel

Permission level

CAN_MANAGE = "CAN_MANAGE"
CAN_QUERY = "CAN_QUERY"
CAN_VIEW = "CAN_VIEW"
class databricks.sdk.service.serving.ServingEndpointPermissions(access_control_list: 'Optional[List[ServingEndpointAccessControlResponse]]' = None, object_id: 'Optional[str]' = None, object_type: 'Optional[str]' = None)
access_control_list: List[ServingEndpointAccessControlResponse] | None = None
object_id: str | None = None
object_type: str | None = None
as_dict() dict

Serializes the ServingEndpointPermissions into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the ServingEndpointPermissions into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) ServingEndpointPermissions

Deserializes the ServingEndpointPermissions from a dictionary.
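A permissions payload nests per-principal entries (access_control_list) that each carry a list of permissions (all_permissions). The sketch below flattens a dictionary shaped like the output of as_dict() into a mapping from principal to permission levels. It uses only the field names documented above; levels_by_principal and the sample values are hypothetical.

```python
from typing import Any, Dict, List


def levels_by_principal(permissions: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each user, group, or service principal to its permission levels."""
    result: Dict[str, List[str]] = {}
    for entry in permissions.get("access_control_list", []):
        # Use whichever principal field is populated on this entry.
        principal = (
            entry.get("user_name")
            or entry.get("group_name")
            or entry.get("service_principal_name")
        )
        if principal is None:
            continue
        result[principal] = [
            permission["permission_level"]
            for permission in entry.get("all_permissions", [])
            if "permission_level" in permission
        ]
    return result


# Invented sample payload for illustration.
sample = {
    "object_type": "serving-endpoint",
    "access_control_list": [
        {
            "user_name": "a@example.com",
            "all_permissions": [{"permission_level": "CAN_MANAGE", "inherited": False}],
        },
        {
            "group_name": "analysts",
            "all_permissions": [{"permission_level": "CAN_QUERY"}, {"permission_level": "CAN_VIEW"}],
        },
    ],
}
summary = levels_by_principal(sample)
```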

class databricks.sdk.service.serving.ServingEndpointPermissionsDescription(description: 'Optional[str]' = None, permission_level: 'Optional[ServingEndpointPermissionLevel]' = None)
description: str | None = None
permission_level: ServingEndpointPermissionLevel | None = None
as_dict() dict

Serializes the ServingEndpointPermissionsDescription into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the ServingEndpointPermissionsDescription into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) ServingEndpointPermissionsDescription

Deserializes the ServingEndpointPermissionsDescription from a dictionary.

class databricks.sdk.service.serving.ServingModelWorkloadType

Please keep this in sync with workload types in InferenceEndpointEntities.scala

CPU = "CPU"
GPU_LARGE = "GPU_LARGE"
GPU_MEDIUM = "GPU_MEDIUM"
GPU_SMALL = "GPU_SMALL"
GPU_XLARGE = "GPU_XLARGE"
MULTIGPU_MEDIUM = "MULTIGPU_MEDIUM"
class databricks.sdk.service.serving.TrafficConfig(routes: 'Optional[List[Route]]' = None)
routes: List[Route] | None = None

The list of routes that define traffic to each served entity.

as_dict() dict

Serializes the TrafficConfig into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the TrafficConfig into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) TrafficConfig

Deserializes the TrafficConfig from a dictionary.

class databricks.sdk.service.serving.UpdateInferenceEndpointNotificationsResponse(email_notifications: 'Optional[EmailNotifications]' = None, name: 'Optional[str]' = None)
email_notifications: EmailNotifications | None = None
name: str | None = None
as_dict() dict

Serializes the UpdateInferenceEndpointNotificationsResponse into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the UpdateInferenceEndpointNotificationsResponse into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) UpdateInferenceEndpointNotificationsResponse

Deserializes the UpdateInferenceEndpointNotificationsResponse from a dictionary.

class databricks.sdk.service.serving.V1ResponseChoiceElement(finish_reason: 'Optional[str]' = None, index: 'Optional[int]' = None, logprobs: 'Optional[int]' = None, message: 'Optional[ChatMessage]' = None, text: 'Optional[str]' = None)
finish_reason: str | None = None

The finish reason returned by the endpoint.

index: int | None = None

The index of the choice in the chat or completions response.

logprobs: int | None = None

The logprobs returned only by the completions endpoint.

message: ChatMessage | None = None

The message response from the chat endpoint.

text: str | None = None

The text response from the completions endpoint.
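Since message is populated by the chat endpoint and text by the completions endpoint, code that handles both can check the two fields in turn. This is a minimal sketch over a plain dictionary shaped like the output of as_dict(); choice_content is a hypothetical helper, and the "content" key of the message is an assumption about the ChatMessage shape.

```python
from typing import Any, Dict, Optional


def choice_content(choice: Dict[str, Any]) -> Optional[str]:
    """Return the generated text of a choice, whichever endpoint produced it."""
    message = choice.get("message")
    if message is not None:
        # Chat responses carry a message; "content" is an assumed ChatMessage field.
        return message.get("content")
    # Completions responses carry the generated text directly.
    return choice.get("text")
```

A choice with neither field populated yields None.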

as_dict() dict

Serializes the V1ResponseChoiceElement into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the V1ResponseChoiceElement into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) V1ResponseChoiceElement

Deserializes the V1ResponseChoiceElement from a dictionary.