Real-time Serving¶
These dataclasses are used in the SDK to represent API requests and responses for services in the databricks.sdk.service.serving module.
- class databricks.sdk.service.serving.Ai21LabsConfig(ai21labs_api_key: 'Optional[str]' = None, ai21labs_api_key_plaintext: 'Optional[str]' = None)¶
- ai21labs_api_key: str | None = None¶
The Databricks secret key reference for an AI21 Labs API key. If you prefer to paste your API key directly, see ai21labs_api_key_plaintext. You must provide an API key using one of the following fields: ai21labs_api_key or ai21labs_api_key_plaintext.
- ai21labs_api_key_plaintext: str | None = None¶
An AI21 Labs API key provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see ai21labs_api_key. You must provide an API key using one of the following fields: ai21labs_api_key or ai21labs_api_key_plaintext.
- as_dict() dict¶
Serializes the Ai21LabsConfig into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the Ai21LabsConfig into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) Ai21LabsConfig¶
Deserializes the Ai21LabsConfig from a dictionary.
- class databricks.sdk.service.serving.AiGatewayConfig(fallback_config: 'Optional[FallbackConfig]' = None, guardrails: 'Optional[AiGatewayGuardrails]' = None, inference_table_config: 'Optional[AiGatewayInferenceTableConfig]' = None, rate_limits: 'Optional[List[AiGatewayRateLimit]]' = None, usage_tracking_config: 'Optional[AiGatewayUsageTrackingConfig]' = None)¶
- fallback_config: FallbackConfig | None = None¶
Configuration for traffic fallback which auto fallbacks to other served entities if the request to a served entity fails with certain error codes, to increase availability.
- guardrails: AiGatewayGuardrails | None = None¶
Configuration for AI Guardrails to prevent unwanted data and unsafe data in requests and responses.
- inference_table_config: AiGatewayInferenceTableConfig | None = None¶
Configuration for payload logging using inference tables. Use these tables to monitor and audit data being sent to and received from model APIs and to improve model quality.
- rate_limits: List[AiGatewayRateLimit] | None = None¶
Configuration for rate limits which can be set to limit endpoint traffic.
- usage_tracking_config: AiGatewayUsageTrackingConfig | None = None¶
Configuration to enable usage tracking using system tables. These tables allow you to monitor operational usage on endpoints and their associated costs.
- as_dict() dict¶
Serializes the AiGatewayConfig into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the AiGatewayConfig into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) AiGatewayConfig¶
Deserializes the AiGatewayConfig from a dictionary.
- class databricks.sdk.service.serving.AiGatewayGuardrailParameters(invalid_keywords: 'Optional[List[str]]' = None, pii: 'Optional[AiGatewayGuardrailPiiBehavior]' = None, safety: 'Optional[bool]' = None, valid_topics: 'Optional[List[str]]' = None)¶
- invalid_keywords: List[str] | None = None¶
List of invalid keywords. AI guardrail uses keyword or string matching to decide if the keyword exists in the request or response content.
- pii: AiGatewayGuardrailPiiBehavior | None = None¶
Configuration for guardrail PII filter.
- safety: bool | None = None¶
Indicates whether the safety filter is enabled.
- valid_topics: List[str] | None = None¶
The list of allowed topics. Given a chat request, this guardrail flags the request if its topic is not in the allowed topics.
- as_dict() dict¶
Serializes the AiGatewayGuardrailParameters into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the AiGatewayGuardrailParameters into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) AiGatewayGuardrailParameters¶
Deserializes the AiGatewayGuardrailParameters from a dictionary.
- class databricks.sdk.service.serving.AiGatewayGuardrailPiiBehavior(behavior: 'Optional[AiGatewayGuardrailPiiBehaviorBehavior]' = None)¶
- behavior: AiGatewayGuardrailPiiBehaviorBehavior | None = None¶
Configuration for input guardrail filters.
- as_dict() dict¶
Serializes the AiGatewayGuardrailPiiBehavior into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the AiGatewayGuardrailPiiBehavior into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) AiGatewayGuardrailPiiBehavior¶
Deserializes the AiGatewayGuardrailPiiBehavior from a dictionary.
- class databricks.sdk.service.serving.AiGatewayGuardrailPiiBehaviorBehavior¶
- BLOCK = "BLOCK"¶
- MASK = "MASK"¶
- NONE = "NONE"¶
- class databricks.sdk.service.serving.AiGatewayGuardrails(input: 'Optional[AiGatewayGuardrailParameters]' = None, output: 'Optional[AiGatewayGuardrailParameters]' = None)¶
- input: AiGatewayGuardrailParameters | None = None¶
Configuration for input guardrail filters.
- output: AiGatewayGuardrailParameters | None = None¶
Configuration for output guardrail filters.
- as_dict() dict¶
Serializes the AiGatewayGuardrails into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the AiGatewayGuardrails into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) AiGatewayGuardrails¶
Deserializes the AiGatewayGuardrails from a dictionary.
- class databricks.sdk.service.serving.AiGatewayInferenceTableConfig(catalog_name: 'Optional[str]' = None, enabled: 'Optional[bool]' = None, schema_name: 'Optional[str]' = None, table_name_prefix: 'Optional[str]' = None)¶
- catalog_name: str | None = None¶
The name of the catalog in Unity Catalog. Required when enabling inference tables. NOTE: On update, you have to disable inference table first in order to change the catalog name.
- enabled: bool | None = None¶
Indicates whether the inference table is enabled.
- schema_name: str | None = None¶
The name of the schema in Unity Catalog. Required when enabling inference tables. NOTE: On update, you have to disable inference table first in order to change the schema name.
- table_name_prefix: str | None = None¶
The prefix of the table in Unity Catalog. NOTE: On update, you have to disable inference table first in order to change the prefix name.
- as_dict() dict¶
Serializes the AiGatewayInferenceTableConfig into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the AiGatewayInferenceTableConfig into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) AiGatewayInferenceTableConfig¶
Deserializes the AiGatewayInferenceTableConfig from a dictionary.
- class databricks.sdk.service.serving.AiGatewayRateLimit(renewal_period: 'AiGatewayRateLimitRenewalPeriod', calls: 'Optional[int]' = None, key: 'Optional[AiGatewayRateLimitKey]' = None, principal: 'Optional[str]' = None, tokens: 'Optional[int]' = None)¶
- renewal_period: AiGatewayRateLimitRenewalPeriod¶
Renewal period field for a rate limit. Currently, only ‘minute’ is supported.
- calls: int | None = None¶
Used to specify how many calls are allowed for a key within the renewal_period.
- key: AiGatewayRateLimitKey | None = None¶
Key field for a rate limit. Currently, ‘user’, ‘user_group, ‘service_principal’, and ‘endpoint’ are supported, with ‘endpoint’ being the default if not specified.
- principal: str | None = None¶
Principal field for a user, user group, or service principal to apply rate limiting to. Accepts a user email, group name, or service principal application ID.
- tokens: int | None = None¶
Used to specify how many tokens are allowed for a key within the renewal_period.
- as_dict() dict¶
Serializes the AiGatewayRateLimit into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the AiGatewayRateLimit into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) AiGatewayRateLimit¶
Deserializes the AiGatewayRateLimit from a dictionary.
- class databricks.sdk.service.serving.AiGatewayRateLimitKey¶
- ENDPOINT = "ENDPOINT"¶
- SERVICE_PRINCIPAL = "SERVICE_PRINCIPAL"¶
- USER = "USER"¶
- USER_GROUP = "USER_GROUP"¶
- class databricks.sdk.service.serving.AiGatewayUsageTrackingConfig(enabled: 'Optional[bool]' = None)¶
- enabled: bool | None = None¶
Whether to enable usage tracking.
- as_dict() dict¶
Serializes the AiGatewayUsageTrackingConfig into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the AiGatewayUsageTrackingConfig into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) AiGatewayUsageTrackingConfig¶
Deserializes the AiGatewayUsageTrackingConfig from a dictionary.
- class databricks.sdk.service.serving.AmazonBedrockConfig(aws_region: 'str', bedrock_provider: 'AmazonBedrockConfigBedrockProvider', aws_access_key_id: 'Optional[str]' = None, aws_access_key_id_plaintext: 'Optional[str]' = None, aws_secret_access_key: 'Optional[str]' = None, aws_secret_access_key_plaintext: 'Optional[str]' = None, instance_profile_arn: 'Optional[str]' = None)¶
- aws_region: str¶
The AWS region to use. Bedrock has to be enabled there.
- bedrock_provider: AmazonBedrockConfigBedrockProvider¶
The underlying provider in Amazon Bedrock. Supported values (case insensitive) include: Anthropic, Cohere, AI21Labs, Amazon.
- aws_access_key_id: str | None = None¶
The Databricks secret key reference for an AWS access key ID with permissions to interact with Bedrock services. If you prefer to paste your API key directly, see aws_access_key_id_plaintext. You must provide an API key using one of the following fields: aws_access_key_id or aws_access_key_id_plaintext.
- aws_access_key_id_plaintext: str | None = None¶
An AWS access key ID with permissions to interact with Bedrock services provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see aws_access_key_id. You must provide an API key using one of the following fields: aws_access_key_id or aws_access_key_id_plaintext.
- aws_secret_access_key: str | None = None¶
The Databricks secret key reference for an AWS secret access key paired with the access key ID, with permissions to interact with Bedrock services. If you prefer to paste your API key directly, see aws_secret_access_key_plaintext. You must provide an API key using one of the following fields: aws_secret_access_key or aws_secret_access_key_plaintext.
- aws_secret_access_key_plaintext: str | None = None¶
An AWS secret access key paired with the access key ID, with permissions to interact with Bedrock services provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see aws_secret_access_key. You must provide an API key using one of the following fields: aws_secret_access_key or aws_secret_access_key_plaintext.
- instance_profile_arn: str | None = None¶
ARN of the instance profile that the external model will use to access AWS resources. You must authenticate using an instance profile or access keys. If you prefer to authenticate using access keys, see aws_access_key_id, aws_access_key_id_plaintext, aws_secret_access_key and aws_secret_access_key_plaintext.
- as_dict() dict¶
Serializes the AmazonBedrockConfig into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the AmazonBedrockConfig into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) AmazonBedrockConfig¶
Deserializes the AmazonBedrockConfig from a dictionary.
- class databricks.sdk.service.serving.AmazonBedrockConfigBedrockProvider¶
- AI21LABS = "AI21LABS"¶
- AMAZON = "AMAZON"¶
- ANTHROPIC = "ANTHROPIC"¶
- COHERE = "COHERE"¶
- class databricks.sdk.service.serving.AnthropicConfig(anthropic_api_key: 'Optional[str]' = None, anthropic_api_key_plaintext: 'Optional[str]' = None)¶
- anthropic_api_key: str | None = None¶
The Databricks secret key reference for an Anthropic API key. If you prefer to paste your API key directly, see anthropic_api_key_plaintext. You must provide an API key using one of the following fields: anthropic_api_key or anthropic_api_key_plaintext.
- anthropic_api_key_plaintext: str | None = None¶
The Anthropic API key provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see anthropic_api_key. You must provide an API key using one of the following fields: anthropic_api_key or anthropic_api_key_plaintext.
- as_dict() dict¶
Serializes the AnthropicConfig into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the AnthropicConfig into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) AnthropicConfig¶
Deserializes the AnthropicConfig from a dictionary.
- class databricks.sdk.service.serving.ApiKeyAuth(key: 'str', value: 'Optional[str]' = None, value_plaintext: 'Optional[str]' = None)¶
- key: str¶
The name of the API key parameter used for authentication.
- value: str | None = None¶
The Databricks secret key reference for an API Key. If you prefer to paste your token directly, see value_plaintext.
- value_plaintext: str | None = None¶
The API Key provided as a plaintext string. If you prefer to reference your token using Databricks Secrets, see value.
- as_dict() dict¶
Serializes the ApiKeyAuth into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the ApiKeyAuth into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) ApiKeyAuth¶
Deserializes the ApiKeyAuth from a dictionary.
- class databricks.sdk.service.serving.AutoCaptureConfigInput(catalog_name: str | None = None, enabled: bool | None = None, schema_name: str | None = None, table_name_prefix: str | None = None)¶
Deprecated: legacy inference table configuration. Please use AI Gateway inference tables instead. See https://docs.databricks.com/aws/en/ai-gateway/inference-tables.
- catalog_name: str | None = None¶
The name of the catalog in Unity Catalog. NOTE: On update, you cannot change the catalog name if the inference table is already enabled.
- enabled: bool | None = None¶
Indicates whether the inference table is enabled.
- schema_name: str | None = None¶
The name of the schema in Unity Catalog. NOTE: On update, you cannot change the schema name if the inference table is already enabled.
- table_name_prefix: str | None = None¶
The prefix of the table in Unity Catalog. NOTE: On update, you cannot change the prefix name if the inference table is already enabled.
- as_dict() dict¶
Serializes the AutoCaptureConfigInput into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the AutoCaptureConfigInput into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) AutoCaptureConfigInput¶
Deserializes the AutoCaptureConfigInput from a dictionary.
- class databricks.sdk.service.serving.AutoCaptureConfigOutput(catalog_name: str | None = None, enabled: bool | None = None, schema_name: str | None = None, state: AutoCaptureState | None = None, table_name_prefix: str | None = None)¶
Deprecated: legacy inference table configuration. Please use AI Gateway inference tables instead. See https://docs.databricks.com/aws/en/ai-gateway/inference-tables.
- catalog_name: str | None = None¶
The name of the catalog in Unity Catalog. NOTE: On update, you cannot change the catalog name if the inference table is already enabled.
- enabled: bool | None = None¶
Indicates whether the inference table is enabled.
- schema_name: str | None = None¶
The name of the schema in Unity Catalog. NOTE: On update, you cannot change the schema name if the inference table is already enabled.
- state: AutoCaptureState | None = None¶
- table_name_prefix: str | None = None¶
The prefix of the table in Unity Catalog. NOTE: On update, you cannot change the prefix name if the inference table is already enabled.
- as_dict() dict¶
Serializes the AutoCaptureConfigOutput into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the AutoCaptureConfigOutput into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) AutoCaptureConfigOutput¶
Deserializes the AutoCaptureConfigOutput from a dictionary.
- class databricks.sdk.service.serving.AutoCaptureState(payload_table: 'Optional[PayloadTable]' = None)¶
- payload_table: PayloadTable | None = None¶
- as_dict() dict¶
Serializes the AutoCaptureState into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the AutoCaptureState into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) AutoCaptureState¶
Deserializes the AutoCaptureState from a dictionary.
- class databricks.sdk.service.serving.BearerTokenAuth(token: 'Optional[str]' = None, token_plaintext: 'Optional[str]' = None)¶
- token: str | None = None¶
The Databricks secret key reference for a token. If you prefer to paste your token directly, see token_plaintext.
- token_plaintext: str | None = None¶
The token provided as a plaintext string. If you prefer to reference your token using Databricks Secrets, see token.
- as_dict() dict¶
Serializes the BearerTokenAuth into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the BearerTokenAuth into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) BearerTokenAuth¶
Deserializes the BearerTokenAuth from a dictionary.
- class databricks.sdk.service.serving.BuildLogsResponse(logs: 'str')¶
- logs: str¶
The logs associated with building the served entity’s environment.
- as_dict() dict¶
Serializes the BuildLogsResponse into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the BuildLogsResponse into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) BuildLogsResponse¶
Deserializes the BuildLogsResponse from a dictionary.
- class databricks.sdk.service.serving.ChatMessage(content: 'Optional[str]' = None, role: 'Optional[ChatMessageRole]' = None)¶
- content: str | None = None¶
The content of the message.
- role: ChatMessageRole | None = None¶
The role of the message. One of [system, user, assistant].
- as_dict() dict¶
Serializes the ChatMessage into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the ChatMessage into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) ChatMessage¶
Deserializes the ChatMessage from a dictionary.
- class databricks.sdk.service.serving.ChatMessageRole¶
The role of the message. One of [system, user, assistant].
- ASSISTANT = "ASSISTANT"¶
- SYSTEM = "SYSTEM"¶
- USER = "USER"¶
- class databricks.sdk.service.serving.CohereConfig(cohere_api_base: 'Optional[str]' = None, cohere_api_key: 'Optional[str]' = None, cohere_api_key_plaintext: 'Optional[str]' = None)¶
- cohere_api_base: str | None = None¶
This is an optional field to provide a customized base URL for the Cohere API. If left unspecified, the standard Cohere base URL is used.
- cohere_api_key: str | None = None¶
The Databricks secret key reference for a Cohere API key. If you prefer to paste your API key directly, see cohere_api_key_plaintext. You must provide an API key using one of the following fields: cohere_api_key or cohere_api_key_plaintext.
- cohere_api_key_plaintext: str | None = None¶
The Cohere API key provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see cohere_api_key. You must provide an API key using one of the following fields: cohere_api_key or cohere_api_key_plaintext.
- as_dict() dict¶
Serializes the CohereConfig into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the CohereConfig into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) CohereConfig¶
Deserializes the CohereConfig from a dictionary.
- class databricks.sdk.service.serving.CustomProviderConfig(custom_provider_url: str, api_key_auth: ApiKeyAuth | None = None, bearer_token_auth: BearerTokenAuth | None = None)¶
Configs needed to create a custom provider model route.
- custom_provider_url: str¶
This is a field to provide the URL of the custom provider API.
- api_key_auth: ApiKeyAuth | None = None¶
This is a field to provide API key authentication for the custom provider API. You can only specify one authentication method.
- bearer_token_auth: BearerTokenAuth | None = None¶
This is a field to provide bearer token authentication for the custom provider API. You can only specify one authentication method.
- as_dict() dict¶
Serializes the CustomProviderConfig into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the CustomProviderConfig into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) CustomProviderConfig¶
Deserializes the CustomProviderConfig from a dictionary.
- class databricks.sdk.service.serving.DataPlaneInfo(authorization_details: str | None = None, endpoint_url: str | None = None)¶
Details necessary to query this object’s API through the DataPlane APIs.
- authorization_details: str | None = None¶
Authorization details as a string.
- endpoint_url: str | None = None¶
The URL of the endpoint for this operation in the dataplane.
- as_dict() dict¶
Serializes the DataPlaneInfo into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the DataPlaneInfo into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) DataPlaneInfo¶
Deserializes the DataPlaneInfo from a dictionary.
- class databricks.sdk.service.serving.DatabricksModelServingConfig(databricks_workspace_url: 'str', databricks_api_token: 'Optional[str]' = None, databricks_api_token_plaintext: 'Optional[str]' = None)¶
- databricks_workspace_url: str¶
The URL of the Databricks workspace containing the model serving endpoint pointed to by this external model.
- databricks_api_token: str | None = None¶
The Databricks secret key reference for a Databricks API token that corresponds to a user or service principal with Can Query access to the model serving endpoint pointed to by this external model. If you prefer to paste your API key directly, see databricks_api_token_plaintext. You must provide an API key using one of the following fields: databricks_api_token or databricks_api_token_plaintext.
- databricks_api_token_plaintext: str | None = None¶
The Databricks API token that corresponds to a user or service principal with Can Query access to the model serving endpoint pointed to by this external model provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see databricks_api_token. You must provide an API key using one of the following fields: databricks_api_token or databricks_api_token_plaintext.
- as_dict() dict¶
Serializes the DatabricksModelServingConfig into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the DatabricksModelServingConfig into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) DatabricksModelServingConfig¶
Deserializes the DatabricksModelServingConfig from a dictionary.
- class databricks.sdk.service.serving.DataframeSplitInput(columns: 'Optional[List[Any]]' = None, data: 'Optional[List[Any]]' = None, index: 'Optional[List[int]]' = None)¶
- columns: List[Any] | None = None¶
Columns array for the dataframe
- data: List[Any] | None = None¶
Data array for the dataframe
- index: List[int] | None = None¶
Index array for the dataframe
- as_dict() dict¶
Serializes the DataframeSplitInput into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the DataframeSplitInput into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) DataframeSplitInput¶
Deserializes the DataframeSplitInput from a dictionary.
- class databricks.sdk.service.serving.EmailNotifications(on_update_failure: 'Optional[List[str]]' = None, on_update_success: 'Optional[List[str]]' = None)¶
- on_update_failure: List[str] | None = None¶
A list of email addresses to be notified when an endpoint fails to update its configuration or state.
- on_update_success: List[str] | None = None¶
A list of email addresses to be notified when an endpoint successfully updates its configuration or state.
- as_dict() dict¶
Serializes the EmailNotifications into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the EmailNotifications into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) EmailNotifications¶
Deserializes the EmailNotifications from a dictionary.
- class databricks.sdk.service.serving.EmbeddingsV1ResponseEmbeddingElement(embedding: 'Optional[List[float]]' = None, index: 'Optional[int]' = None, object: 'Optional[EmbeddingsV1ResponseEmbeddingElementObject]' = None)¶
- embedding: List[float] | None = None¶
The embedding vector
- index: int | None = None¶
The index of the embedding in the response.
- object: EmbeddingsV1ResponseEmbeddingElementObject | None = None¶
This will always be ‘embedding’.
- as_dict() dict¶
Serializes the EmbeddingsV1ResponseEmbeddingElement into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the EmbeddingsV1ResponseEmbeddingElement into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) EmbeddingsV1ResponseEmbeddingElement¶
Deserializes the EmbeddingsV1ResponseEmbeddingElement from a dictionary.
- class databricks.sdk.service.serving.EmbeddingsV1ResponseEmbeddingElementObject¶
This will always be ‘embedding’.
- EMBEDDING = "EMBEDDING"¶
- class databricks.sdk.service.serving.EndpointCoreConfigInput(name: 'str', auto_capture_config: 'Optional[AutoCaptureConfigInput]' = None, served_entities: 'Optional[List[ServedEntityInput]]' = None, served_models: 'Optional[List[ServedModelInput]]' = None, traffic_config: 'Optional[TrafficConfig]' = None)¶
- name: str¶
The name of the serving endpoint to update. This field is required.
- auto_capture_config: AutoCaptureConfigInput | None = None¶
Configuration for legacy Inference Tables which automatically log requests and responses to Unity Catalog. Deprecated: please use AI Gateway inference tables instead. See https://docs.databricks.com/aws/en/ai-gateway/inference-tables.
- served_entities: List[ServedEntityInput] | None = None¶
The list of served entities under the serving endpoint config.
- served_models: List[ServedModelInput] | None = None¶
(Deprecated, use served_entities instead) The list of served models under the serving endpoint config.
- traffic_config: TrafficConfig | None = None¶
The traffic configuration associated with the serving endpoint config.
- as_dict() dict¶
Serializes the EndpointCoreConfigInput into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the EndpointCoreConfigInput into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) EndpointCoreConfigInput¶
Deserializes the EndpointCoreConfigInput from a dictionary.
- class databricks.sdk.service.serving.EndpointCoreConfigOutput(auto_capture_config: 'Optional[AutoCaptureConfigOutput]' = None, config_version: 'Optional[int]' = None, served_entities: 'Optional[List[ServedEntityOutput]]' = None, served_models: 'Optional[List[ServedModelOutput]]' = None, traffic_config: 'Optional[TrafficConfig]' = None)¶
- auto_capture_config: AutoCaptureConfigOutput | None = None¶
Configuration for legacy Inference Tables which automatically log requests and responses to Unity Catalog. Deprecated: please use AI Gateway inference tables instead. See https://docs.databricks.com/aws/en/ai-gateway/inference-tables.
- config_version: int | None = None¶
The config version that the serving endpoint is currently serving.
- served_entities: List[ServedEntityOutput] | None = None¶
The list of served entities under the serving endpoint config.
- served_models: List[ServedModelOutput] | None = None¶
(Deprecated, use served_entities instead) The list of served models under the serving endpoint config.
- traffic_config: TrafficConfig | None = None¶
The traffic configuration associated with the serving endpoint config.
- as_dict() dict¶
Serializes the EndpointCoreConfigOutput into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the EndpointCoreConfigOutput into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) EndpointCoreConfigOutput¶
Deserializes the EndpointCoreConfigOutput from a dictionary.
- class databricks.sdk.service.serving.EndpointCoreConfigSummary(served_entities: 'Optional[List[ServedEntitySpec]]' = None, served_models: 'Optional[List[ServedModelSpec]]' = None)¶
- served_entities: List[ServedEntitySpec] | None = None¶
The list of served entities under the serving endpoint config.
- served_models: List[ServedModelSpec] | None = None¶
(Deprecated, use served_entities instead) The list of served models under the serving endpoint config.
- as_dict() dict¶
Serializes the EndpointCoreConfigSummary into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the EndpointCoreConfigSummary into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) EndpointCoreConfigSummary¶
Deserializes the EndpointCoreConfigSummary from a dictionary.
- class databricks.sdk.service.serving.EndpointPendingConfig(auto_capture_config: 'Optional[AutoCaptureConfigOutput]' = None, config_version: 'Optional[int]' = None, served_entities: 'Optional[List[ServedEntityOutput]]' = None, served_models: 'Optional[List[ServedModelOutput]]' = None, start_time: 'Optional[int]' = None, traffic_config: 'Optional[TrafficConfig]' = None)¶
- auto_capture_config: AutoCaptureConfigOutput | None = None¶
Configuration for legacy Inference Tables which automatically log requests and responses to Unity Catalog. Deprecated: please use AI Gateway inference tables instead. See https://docs.databricks.com/aws/en/ai-gateway/inference-tables.
- config_version: int | None = None¶
The config version that the serving endpoint is currently serving.
- served_entities: List[ServedEntityOutput] | None = None¶
The list of served entities belonging to the last issued update to the serving endpoint.
- served_models: List[ServedModelOutput] | None = None¶
(Deprecated, use served_entities instead) The list of served models belonging to the last issued update to the serving endpoint.
- start_time: int | None = None¶
The timestamp when the update to the pending config started.
- traffic_config: TrafficConfig | None = None¶
The traffic config defining how invocations to the serving endpoint should be routed.
- as_dict() dict¶
Serializes the EndpointPendingConfig into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the EndpointPendingConfig into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) EndpointPendingConfig¶
Deserializes the EndpointPendingConfig from a dictionary.
- class databricks.sdk.service.serving.EndpointState(config_update: 'Optional[EndpointStateConfigUpdate]' = None, ready: 'Optional[EndpointStateReady]' = None)¶
- config_update: EndpointStateConfigUpdate | None = None¶
The state of an endpoint’s config update. This informs the user if the pending_config is in progress, if the update failed, or if there is no update in progress. Note that if the endpoint’s config_update state value is IN_PROGRESS, another update can not be made until the update completes or fails.
- ready: EndpointStateReady | None = None¶
The state of an endpoint, indicating whether or not the endpoint is queryable. An endpoint is READY if all of the served entities in its active configuration are ready. If any of the actively served entities are in a non-ready state, the endpoint state will be NOT_READY.
- as_dict() dict¶
Serializes the EndpointState into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the EndpointState into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) EndpointState¶
Deserializes the EndpointState from a dictionary.
- class databricks.sdk.service.serving.EndpointStateConfigUpdate¶
- IN_PROGRESS = "IN_PROGRESS"¶
- NOT_UPDATING = "NOT_UPDATING"¶
- UPDATE_CANCELED = "UPDATE_CANCELED"¶
- UPDATE_FAILED = "UPDATE_FAILED"¶
- class databricks.sdk.service.serving.EndpointTag(key: 'str', value: 'Optional[str]' = None)¶
- key: str¶
Key field for a serving endpoint tag.
- value: str | None = None¶
Optional value field for a serving endpoint tag.
- as_dict() dict¶
Serializes the EndpointTag into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the EndpointTag into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) EndpointTag¶
Deserializes the EndpointTag from a dictionary.
- class databricks.sdk.service.serving.EndpointTags(tags: 'Optional[List[EndpointTag]]' = None)¶
- tags: List[EndpointTag] | None = None¶
- as_dict() dict¶
Serializes the EndpointTags into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the EndpointTags into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) EndpointTags¶
Deserializes the EndpointTags from a dictionary.
- class databricks.sdk.service.serving.ExportMetricsResponse(contents: 'Optional[BinaryIO]' = None)¶
- contents: BinaryIO | None = None¶
- as_dict() dict¶
Serializes the ExportMetricsResponse into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the ExportMetricsResponse into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) ExportMetricsResponse¶
Deserializes the ExportMetricsResponse from a dictionary.
- class databricks.sdk.service.serving.ExternalFunctionRequestHttpMethod¶
- DELETE = "DELETE"¶
- GET = "GET"¶
- PATCH = "PATCH"¶
- POST = "POST"¶
- PUT = "PUT"¶
- class databricks.sdk.service.serving.ExternalModel(provider: 'ExternalModelProvider', name: 'str', task: 'str', ai21labs_config: 'Optional[Ai21LabsConfig]' = None, amazon_bedrock_config: 'Optional[AmazonBedrockConfig]' = None, anthropic_config: 'Optional[AnthropicConfig]' = None, cohere_config: 'Optional[CohereConfig]' = None, custom_provider_config: 'Optional[CustomProviderConfig]' = None, databricks_model_serving_config: 'Optional[DatabricksModelServingConfig]' = None, google_cloud_vertex_ai_config: 'Optional[GoogleCloudVertexAiConfig]' = None, openai_config: 'Optional[OpenAiConfig]' = None, palm_config: 'Optional[PaLmConfig]' = None)¶
- provider: ExternalModelProvider¶
The name of the provider for the external model. Currently, the supported providers are ‘ai21labs’, ‘anthropic’, ‘amazon-bedrock’, ‘cohere’, ‘databricks-model-serving’, ‘google-cloud-vertex-ai’, ‘openai’, ‘palm’, and ‘custom’.
- name: str¶
The name of the external model.
- task: str¶
The task type of the external model.
- ai21labs_config: Ai21LabsConfig | None = None¶
AI21Labs Config. Only required if the provider is ‘ai21labs’.
- amazon_bedrock_config: AmazonBedrockConfig | None = None¶
Amazon Bedrock Config. Only required if the provider is ‘amazon-bedrock’.
- anthropic_config: AnthropicConfig | None = None¶
Anthropic Config. Only required if the provider is ‘anthropic’.
- cohere_config: CohereConfig | None = None¶
Cohere Config. Only required if the provider is ‘cohere’.
- custom_provider_config: CustomProviderConfig | None = None¶
Custom Provider Config. Only required if the provider is ‘custom’.
- databricks_model_serving_config: DatabricksModelServingConfig | None = None¶
Databricks Model Serving Config. Only required if the provider is ‘databricks-model-serving’.
- google_cloud_vertex_ai_config: GoogleCloudVertexAiConfig | None = None¶
Google Cloud Vertex AI Config. Only required if the provider is ‘google-cloud-vertex-ai’.
- openai_config: OpenAiConfig | None = None¶
OpenAI Config. Only required if the provider is ‘openai’.
- palm_config: PaLmConfig | None = None¶
PaLM Config. Only required if the provider is ‘palm’.
- as_dict() dict¶
Serializes the ExternalModel into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the ExternalModel into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) ExternalModel¶
Deserializes the ExternalModel from a dictionary.
- class databricks.sdk.service.serving.ExternalModelProvider¶
- AI21LABS = "AI21LABS"¶
- AMAZON_BEDROCK = "AMAZON_BEDROCK"¶
- ANTHROPIC = "ANTHROPIC"¶
- COHERE = "COHERE"¶
- CUSTOM = "CUSTOM"¶
- DATABRICKS_MODEL_SERVING = "DATABRICKS_MODEL_SERVING"¶
- GOOGLE_CLOUD_VERTEX_AI = "GOOGLE_CLOUD_VERTEX_AI"¶
- OPENAI = "OPENAI"¶
- PALM = "PALM"¶
- class databricks.sdk.service.serving.ExternalModelUsageElement(completion_tokens: 'Optional[int]' = None, prompt_tokens: 'Optional[int]' = None, total_tokens: 'Optional[int]' = None)¶
- completion_tokens: int | None = None¶
The number of tokens in the chat/completions response.
- prompt_tokens: int | None = None¶
The number of tokens in the prompt.
- total_tokens: int | None = None¶
The total number of tokens in the prompt and response.
- as_dict() dict¶
Serializes the ExternalModelUsageElement into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the ExternalModelUsageElement into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) ExternalModelUsageElement¶
Deserializes the ExternalModelUsageElement from a dictionary.
- class databricks.sdk.service.serving.FallbackConfig(enabled: 'bool')¶
- enabled: bool¶
Whether to enable traffic fallback. When a served entity in the serving endpoint returns specific error codes (e.g. 500), the request will automatically be round-robin attempted with other served entities in the same endpoint, following the order of served entity list, until a successful response is returned. If all attempts fail, return the last response with the error code.
- as_dict() dict¶
Serializes the FallbackConfig into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the FallbackConfig into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) FallbackConfig¶
Deserializes the FallbackConfig from a dictionary.
- class databricks.sdk.service.serving.FoundationModel(description: str | None = None, display_name: str | None = None, docs: str | None = None, name: str | None = None)¶
All fields are not sensitive as they are hard-coded in the system and made available to customers.
- description: str | None = None¶
- display_name: str | None = None¶
- docs: str | None = None¶
- name: str | None = None¶
- as_dict() dict¶
Serializes the FoundationModel into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the FoundationModel into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) FoundationModel¶
Deserializes the FoundationModel from a dictionary.
- class databricks.sdk.service.serving.GetOpenApiResponse(contents: 'Optional[BinaryIO]' = None)¶
- contents: BinaryIO | None = None¶
- as_dict() dict¶
Serializes the GetOpenApiResponse into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the GetOpenApiResponse into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) GetOpenApiResponse¶
Deserializes the GetOpenApiResponse from a dictionary.
- class databricks.sdk.service.serving.GetServingEndpointPermissionLevelsResponse(permission_levels: 'Optional[List[ServingEndpointPermissionsDescription]]' = None)¶
- permission_levels: List[ServingEndpointPermissionsDescription] | None = None¶
Specific permission levels
- as_dict() dict¶
Serializes the GetServingEndpointPermissionLevelsResponse into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the GetServingEndpointPermissionLevelsResponse into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) GetServingEndpointPermissionLevelsResponse¶
Deserializes the GetServingEndpointPermissionLevelsResponse from a dictionary.
- class databricks.sdk.service.serving.GoogleCloudVertexAiConfig(project_id: 'str', region: 'str', private_key: 'Optional[str]' = None, private_key_plaintext: 'Optional[str]' = None)¶
- project_id: str¶
This is the Google Cloud project id that the service account is associated with.
- region: str¶
This is the region for the Google Cloud Vertex AI Service. See [supported regions] for more details. Some models are only available in specific regions.
[supported regions]: https://cloud.google.com/vertex-ai/docs/general/locations
- private_key: str | None = None¶
The Databricks secret key reference for a private key for the service account which has access to the Google Cloud Vertex AI Service. See [Best practices for managing service account keys]. If you prefer to paste your API key directly, see private_key_plaintext. You must provide an API key using one of the following fields: private_key or private_key_plaintext
[Best practices for managing service account keys]: https://cloud.google.com/iam/docs/best-practices-for-managing-service-account-keys
- private_key_plaintext: str | None = None¶
The private key for the service account which has access to the Google Cloud Vertex AI Service provided as a plaintext secret. See [Best practices for managing service account keys]. If you prefer to reference your key using Databricks Secrets, see private_key. You must provide an API key using one of the following fields: private_key or private_key_plaintext.
[Best practices for managing service account keys]: https://cloud.google.com/iam/docs/best-practices-for-managing-service-account-keys
- as_dict() dict¶
Serializes the GoogleCloudVertexAiConfig into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the GoogleCloudVertexAiConfig into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) GoogleCloudVertexAiConfig¶
Deserializes the GoogleCloudVertexAiConfig from a dictionary.
- class databricks.sdk.service.serving.HttpRequestResponse(contents: 'Optional[BinaryIO]' = None)¶
- contents: BinaryIO | None = None¶
- as_dict() dict¶
Serializes the HttpRequestResponse into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the HttpRequestResponse into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) HttpRequestResponse¶
Deserializes the HttpRequestResponse from a dictionary.
- class databricks.sdk.service.serving.ListEndpointsResponse(endpoints: 'Optional[List[ServingEndpoint]]' = None)¶
- endpoints: List[ServingEndpoint] | None = None¶
The list of endpoints.
- as_dict() dict¶
Serializes the ListEndpointsResponse into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the ListEndpointsResponse into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) ListEndpointsResponse¶
Deserializes the ListEndpointsResponse from a dictionary.
- class databricks.sdk.service.serving.ModelDataPlaneInfo(query_info: DataPlaneInfo | None = None)¶
A representation of all DataPlaneInfo for operations that can be done on a model through Data Plane APIs.
- query_info: DataPlaneInfo | None = None¶
Information required to query DataPlane API ‘query’ endpoint.
- as_dict() dict¶
Serializes the ModelDataPlaneInfo into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the ModelDataPlaneInfo into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) ModelDataPlaneInfo¶
Deserializes the ModelDataPlaneInfo from a dictionary.
- class databricks.sdk.service.serving.OpenAiConfig(microsoft_entra_client_id: str | None = None, microsoft_entra_client_secret: str | None = None, microsoft_entra_client_secret_plaintext: str | None = None, microsoft_entra_tenant_id: str | None = None, openai_api_base: str | None = None, openai_api_key: str | None = None, openai_api_key_plaintext: str | None = None, openai_api_type: str | None = None, openai_api_version: str | None = None, openai_deployment_name: str | None = None, openai_organization: str | None = None)¶
Configs needed to create an OpenAI model route.
- microsoft_entra_client_id: str | None = None¶
This field is only required for Azure AD OpenAI and is the Microsoft Entra Client ID.
- microsoft_entra_client_secret: str | None = None¶
The Databricks secret key reference for a client secret used for Microsoft Entra ID authentication. If you prefer to paste your client secret directly, see microsoft_entra_client_secret_plaintext. You must provide an API key using one of the following fields: microsoft_entra_client_secret or microsoft_entra_client_secret_plaintext.
- microsoft_entra_client_secret_plaintext: str | None = None¶
The client secret used for Microsoft Entra ID authentication provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see microsoft_entra_client_secret. You must provide an API key using one of the following fields: microsoft_entra_client_secret or microsoft_entra_client_secret_plaintext.
- microsoft_entra_tenant_id: str | None = None¶
This field is only required for Azure AD OpenAI and is the Microsoft Entra Tenant ID.
- openai_api_base: str | None = None¶
This is a field to provide a customized base URl for the OpenAI API. For Azure OpenAI, this field is required, and is the base URL for the Azure OpenAI API service provided by Azure. For other OpenAI API types, this field is optional, and if left unspecified, the standard OpenAI base URL is used.
- openai_api_key: str | None = None¶
The Databricks secret key reference for an OpenAI API key using the OpenAI or Azure service. If you prefer to paste your API key directly, see openai_api_key_plaintext. You must provide an API key using one of the following fields: openai_api_key or openai_api_key_plaintext.
- openai_api_key_plaintext: str | None = None¶
The OpenAI API key using the OpenAI or Azure service provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see openai_api_key. You must provide an API key using one of the following fields: openai_api_key or openai_api_key_plaintext.
- openai_api_type: str | None = None¶
This is an optional field to specify the type of OpenAI API to use. For Azure OpenAI, this field is required, and adjust this parameter to represent the preferred security access validation protocol. For access token validation, use azure. For authentication using Azure Active Directory (Azure AD) use, azuread.
- openai_api_version: str | None = None¶
This is an optional field to specify the OpenAI API version. For Azure OpenAI, this field is required, and is the version of the Azure OpenAI service to utilize, specified by a date.
- openai_deployment_name: str | None = None¶
This field is only required for Azure OpenAI and is the name of the deployment resource for the Azure OpenAI service.
- openai_organization: str | None = None¶
This is an optional field to specify the organization in OpenAI or Azure OpenAI.
- as_dict() dict¶
Serializes the OpenAiConfig into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the OpenAiConfig into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) OpenAiConfig¶
Deserializes the OpenAiConfig from a dictionary.
- class databricks.sdk.service.serving.PaLmConfig(palm_api_key: 'Optional[str]' = None, palm_api_key_plaintext: 'Optional[str]' = None)¶
- palm_api_key: str | None = None¶
The Databricks secret key reference for a PaLM API key. If you prefer to paste your API key directly, see palm_api_key_plaintext. You must provide an API key using one of the following fields: palm_api_key or palm_api_key_plaintext.
- palm_api_key_plaintext: str | None = None¶
The PaLM API key provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see palm_api_key. You must provide an API key using one of the following fields: palm_api_key or palm_api_key_plaintext.
- as_dict() dict¶
Serializes the PaLmConfig into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the PaLmConfig into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) PaLmConfig¶
Deserializes the PaLmConfig from a dictionary.
- class databricks.sdk.service.serving.PayloadTable(name: 'Optional[str]' = None, status: 'Optional[str]' = None, status_message: 'Optional[str]' = None)¶
- name: str | None = None¶
- status: str | None = None¶
- status_message: str | None = None¶
- as_dict() dict¶
Serializes the PayloadTable into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the PayloadTable into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) PayloadTable¶
Deserializes the PayloadTable from a dictionary.
- class databricks.sdk.service.serving.PtEndpointCoreConfig(served_entities: 'Optional[List[PtServedModel]]' = None, traffic_config: 'Optional[TrafficConfig]' = None)¶
- served_entities: List[PtServedModel] | None = None¶
The list of served entities under the serving endpoint config.
- traffic_config: TrafficConfig | None = None¶
- as_dict() dict¶
Serializes the PtEndpointCoreConfig into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the PtEndpointCoreConfig into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) PtEndpointCoreConfig¶
Deserializes the PtEndpointCoreConfig from a dictionary.
- class databricks.sdk.service.serving.PtServedModel(entity_name: 'str', provisioned_model_units: 'int', burst_scaling_enabled: 'Optional[bool]' = None, entity_version: 'Optional[str]' = None, name: 'Optional[str]' = None)¶
- entity_name: str¶
The name of the entity to be served. The entity may be a model in the Databricks Model Registry, a model in the Unity Catalog (UC), or a function of type FEATURE_SPEC in the UC. If it is a UC object, the full name of the object should be given in the form of catalog_name.schema_name.model_name.
- provisioned_model_units: int¶
The number of model units to be provisioned.
- burst_scaling_enabled: bool | None = None¶
Whether burst scaling is enabled. When enabled (default), the endpoint can automatically scale up beyond provisioned capacity to handle traffic spikes. When disabled, the endpoint maintains fixed capacity at provisioned_model_units.
- entity_version: str | None = None¶
- name: str | None = None¶
The name of a served entity. It must be unique across an endpoint. A served entity name can consist of alphanumeric characters, dashes, and underscores. If not specified for an external model, this field defaults to external_model.name, with ‘.’ and ‘:’ replaced with ‘-’, and if not specified for other entities, it defaults to entity_name-entity_version.
- as_dict() dict¶
Serializes the PtServedModel into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the PtServedModel into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) PtServedModel¶
Deserializes the PtServedModel from a dictionary.
- class databricks.sdk.service.serving.PutAiGatewayResponse(fallback_config: 'Optional[FallbackConfig]' = None, guardrails: 'Optional[AiGatewayGuardrails]' = None, inference_table_config: 'Optional[AiGatewayInferenceTableConfig]' = None, rate_limits: 'Optional[List[AiGatewayRateLimit]]' = None, usage_tracking_config: 'Optional[AiGatewayUsageTrackingConfig]' = None)¶
- fallback_config: FallbackConfig | None = None¶
Configuration for traffic fallback which auto fallbacks to other served entities if the request to a served entity fails with certain error codes, to increase availability.
- guardrails: AiGatewayGuardrails | None = None¶
Configuration for AI Guardrails to prevent unwanted data and unsafe data in requests and responses.
- inference_table_config: AiGatewayInferenceTableConfig | None = None¶
Configuration for payload logging using inference tables. Use these tables to monitor and audit data being sent to and received from model APIs and to improve model quality.
- rate_limits: List[AiGatewayRateLimit] | None = None¶
Configuration for rate limits which can be set to limit endpoint traffic.
- usage_tracking_config: AiGatewayUsageTrackingConfig | None = None¶
Configuration to enable usage tracking using system tables. These tables allow you to monitor operational usage on endpoints and their associated costs.
- as_dict() dict¶
Serializes the PutAiGatewayResponse into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the PutAiGatewayResponse into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) PutAiGatewayResponse¶
Deserializes the PutAiGatewayResponse from a dictionary.
- class databricks.sdk.service.serving.PutResponse(rate_limits: 'Optional[List[RateLimit]]' = None)¶
-
- as_dict() dict¶
Serializes the PutResponse into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the PutResponse into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) PutResponse¶
Deserializes the PutResponse from a dictionary.
- class databricks.sdk.service.serving.QueryEndpointResponse(choices: 'Optional[List[V1ResponseChoiceElement]]' = None, created: 'Optional[int]' = None, data: 'Optional[List[EmbeddingsV1ResponseEmbeddingElement]]' = None, id: 'Optional[str]' = None, model: 'Optional[str]' = None, object: 'Optional[QueryEndpointResponseObject]' = None, outputs: 'Optional[List[any]]' = None, predictions: 'Optional[List[Any]]' = None, served_model_name: 'Optional[str]' = None, usage: 'Optional[ExternalModelUsageElement]' = None)¶
- choices: List[V1ResponseChoiceElement] | None = None¶
The list of choices returned by the __chat or completions external/foundation model__ serving endpoint.
- created: int | None = None¶
The timestamp in seconds when the query was created in Unix time returned by a __completions or chat external/foundation model__ serving endpoint.
- data: List[EmbeddingsV1ResponseEmbeddingElement] | None = None¶
The list of the embeddings returned by the __embeddings external/foundation model__ serving endpoint.
- id: str | None = None¶
The ID of the query that may be returned by a __completions or chat external/foundation model__ serving endpoint.
- model: str | None = None¶
The name of the __external/foundation model__ used for querying. This is the name of the model that was specified in the endpoint config.
- object: QueryEndpointResponseObject | None = None¶
The type of object returned by the __external/foundation model__ serving endpoint, one of [text_completion, chat.completion, list (of embeddings)].
- outputs: List[any] | None = None¶
The outputs of the feature serving endpoint.
- predictions: List[Any] | None = None¶
The predictions returned by the serving endpoint.
- served_model_name: str | None = None¶
The name of the served model that served the request. This is useful when there are multiple models behind the same endpoint with traffic split.
- usage: ExternalModelUsageElement | None = None¶
The usage object that may be returned by the __external/foundation model__ serving endpoint. This contains information about the number of tokens used in the prompt and response.
- as_dict() dict¶
Serializes the QueryEndpointResponse into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the QueryEndpointResponse into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) QueryEndpointResponse¶
Deserializes the QueryEndpointResponse from a dictionary.
- class databricks.sdk.service.serving.QueryEndpointResponseObject¶
The type of object returned by the __external/foundation model__ serving endpoint, one of [text_completion, chat.completion, list (of embeddings)].
- CHAT_COMPLETION = "CHAT_COMPLETION"¶
- LIST = "LIST"¶
- TEXT_COMPLETION = "TEXT_COMPLETION"¶
- class databricks.sdk.service.serving.RateLimit(calls: 'int', renewal_period: 'RateLimitRenewalPeriod', key: 'Optional[RateLimitKey]' = None)¶
- calls: int¶
Used to specify how many calls are allowed for a key within the renewal_period.
- renewal_period: RateLimitRenewalPeriod¶
Renewal period field for a serving endpoint rate limit. Currently, only ‘minute’ is supported.
- key: RateLimitKey | None = None¶
Key field for a serving endpoint rate limit. Currently, only ‘user’ and ‘endpoint’ are supported, with ‘endpoint’ being the default if not specified.
- as_dict() dict¶
Serializes the RateLimit into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the RateLimit into a shallow dictionary of its immediate attributes.
- class databricks.sdk.service.serving.Route(traffic_percentage: 'int', served_entity_name: 'Optional[str]' = None, served_model_name: 'Optional[str]' = None)¶
- traffic_percentage: int¶
The percentage of endpoint traffic to send to this route. It must be an integer between 0 and 100 inclusive.
- served_entity_name: str | None = None¶
- served_model_name: str | None = None¶
The name of the served model this route configures traffic for.
- as_dict() dict¶
Serializes the Route into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the Route into a shallow dictionary of its immediate attributes.
- class databricks.sdk.service.serving.ServedEntityInput(burst_scaling_enabled: 'Optional[bool]' = None, entity_name: 'Optional[str]' = None, entity_version: 'Optional[str]' = None, environment_vars: 'Optional[Dict[str, str]]' = None, external_model: 'Optional[ExternalModel]' = None, instance_profile_arn: 'Optional[str]' = None, max_provisioned_concurrency: 'Optional[int]' = None, max_provisioned_throughput: 'Optional[int]' = None, min_provisioned_concurrency: 'Optional[int]' = None, min_provisioned_throughput: 'Optional[int]' = None, name: 'Optional[str]' = None, provisioned_model_units: 'Optional[int]' = None, scale_to_zero_enabled: 'Optional[bool]' = None, workload_size: 'Optional[str]' = None, workload_type: 'Optional[ServingModelWorkloadType]' = None)¶
- burst_scaling_enabled: bool | None = None¶
Whether burst scaling is enabled. When enabled (default), the endpoint can automatically scale up beyond provisioned capacity to handle traffic spikes. When disabled, the endpoint maintains fixed capacity at provisioned_model_units.
- entity_name: str | None = None¶
The name of the entity to be served. The entity may be a model in the Databricks Model Registry, a model in the Unity Catalog (UC), or a function of type FEATURE_SPEC in the UC. If it is a UC object, the full name of the object should be given in the form of catalog_name.schema_name.model_name.
- entity_version: str | None = None¶
- environment_vars: Dict[str, str] | None = None¶
An object containing a set of optional, user-specified environment variable key-value pairs used for serving this entity. Note: this is an experimental feature and subject to change. Example entity environment variables that refer to Databricks secrets: {“OPENAI_API_KEY”: “{{secrets/my_scope/my_key}}”, “DATABRICKS_TOKEN”: “{{secrets/my_scope2/my_key2}}”}
- external_model: ExternalModel | None = None¶
The external model to be served. NOTE: Only one of external_model and (entity_name, entity_version, workload_size, workload_type, and scale_to_zero_enabled) can be specified with the latter set being used for custom model serving for a Databricks registered model. For an existing endpoint with external_model, it cannot be updated to an endpoint without external_model. If the endpoint is created without external_model, users cannot update it to add external_model later. The task type of all external models within an endpoint must be the same.
- instance_profile_arn: str | None = None¶
ARN of the instance profile that the served entity uses to access AWS resources.
- max_provisioned_concurrency: int | None = None¶
The maximum provisioned concurrency that the endpoint can scale up to. Do not use if workload_size is specified.
- max_provisioned_throughput: int | None = None¶
The maximum tokens per second that the endpoint can scale up to.
- min_provisioned_concurrency: int | None = None¶
The minimum provisioned concurrency that the endpoint can scale down to. Do not use if workload_size is specified.
- min_provisioned_throughput: int | None = None¶
The minimum tokens per second that the endpoint can scale down to.
- name: str | None = None¶
The name of a served entity. It must be unique across an endpoint. A served entity name can consist of alphanumeric characters, dashes, and underscores. If not specified for an external model, this field defaults to external_model.name, with ‘.’ and ‘:’ replaced with ‘-’, and if not specified for other entities, it defaults to entity_name-entity_version.
- provisioned_model_units: int | None = None¶
The number of model units provisioned.
- scale_to_zero_enabled: bool | None = None¶
Whether the compute resources for the served entity should scale down to zero.
- workload_size: str | None = None¶
The workload size of the served entity. The workload size corresponds to a range of provisioned concurrency that the compute autoscales between. A single unit of provisioned concurrency can process one request at a time. Valid workload sizes are “Small” (4 - 4 provisioned concurrency), “Medium” (8 - 16 provisioned concurrency), and “Large” (16 - 64 provisioned concurrency). Additional custom workload sizes can also be used when available in the workspace. If scale-to-zero is enabled, the lower bound of the provisioned concurrency for each workload size is 0. Do not use if min_provisioned_concurrency and max_provisioned_concurrency are specified.
- workload_type: ServingModelWorkloadType | None = None¶
The workload type of the served entity. The workload type selects which type of compute to use in the endpoint. The default value for this parameter is “CPU”. For deep learning workloads, GPU acceleration is available by selecting workload types like GPU_SMALL and others. See the available [GPU types].
[GPU types]: https://docs.databricks.com/en/machine-learning/model-serving/create-manage-serving-endpoints.html#gpu-workload-types
- as_dict() dict¶
Serializes the ServedEntityInput into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the ServedEntityInput into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) ServedEntityInput¶
Deserializes the ServedEntityInput from a dictionary.
- class databricks.sdk.service.serving.ServedEntityOutput(burst_scaling_enabled: 'Optional[bool]' = None, creation_timestamp: 'Optional[int]' = None, creator: 'Optional[str]' = None, entity_name: 'Optional[str]' = None, entity_version: 'Optional[str]' = None, environment_vars: 'Optional[Dict[str, str]]' = None, external_model: 'Optional[ExternalModel]' = None, foundation_model: 'Optional[FoundationModel]' = None, instance_profile_arn: 'Optional[str]' = None, max_provisioned_concurrency: 'Optional[int]' = None, max_provisioned_throughput: 'Optional[int]' = None, min_provisioned_concurrency: 'Optional[int]' = None, min_provisioned_throughput: 'Optional[int]' = None, name: 'Optional[str]' = None, provisioned_model_units: 'Optional[int]' = None, scale_to_zero_enabled: 'Optional[bool]' = None, state: 'Optional[ServedModelState]' = None, workload_size: 'Optional[str]' = None, workload_type: 'Optional[ServingModelWorkloadType]' = None)¶
- burst_scaling_enabled: bool | None = None¶
Whether burst scaling is enabled. When enabled (default), the endpoint can automatically scale up beyond provisioned capacity to handle traffic spikes. When disabled, the endpoint maintains fixed capacity at provisioned_model_units.
- creation_timestamp: int | None = None¶
- creator: str | None = None¶
- entity_name: str | None = None¶
The name of the entity to be served. The entity may be a model in the Databricks Model Registry, a model in the Unity Catalog (UC), or a function of type FEATURE_SPEC in the UC. If it is a UC object, the full name of the object should be given in the form of catalog_name.schema_name.model_name.
- entity_version: str | None = None¶
- environment_vars: Dict[str, str] | None = None¶
An object containing a set of optional, user-specified environment variable key-value pairs used for serving this entity. Note: this is an experimental feature and subject to change. Example entity environment variables that refer to Databricks secrets: {“OPENAI_API_KEY”: “{{secrets/my_scope/my_key}}”, “DATABRICKS_TOKEN”: “{{secrets/my_scope2/my_key2}}”}
- external_model: ExternalModel | None = None¶
The external model to be served. NOTE: Only one of external_model and (entity_name, entity_version, workload_size, workload_type, and scale_to_zero_enabled) can be specified with the latter set being used for custom model serving for a Databricks registered model. For an existing endpoint with external_model, it cannot be updated to an endpoint without external_model. If the endpoint is created without external_model, users cannot update it to add external_model later. The task type of all external models within an endpoint must be the same.
- foundation_model: FoundationModel | None = None¶
- instance_profile_arn: str | None = None¶
ARN of the instance profile that the served entity uses to access AWS resources.
- max_provisioned_concurrency: int | None = None¶
The maximum provisioned concurrency that the endpoint can scale up to. Do not use if workload_size is specified.
- max_provisioned_throughput: int | None = None¶
The maximum tokens per second that the endpoint can scale up to.
- min_provisioned_concurrency: int | None = None¶
The minimum provisioned concurrency that the endpoint can scale down to. Do not use if workload_size is specified.
- min_provisioned_throughput: int | None = None¶
The minimum tokens per second that the endpoint can scale down to.
- name: str | None = None¶
The name of a served entity. It must be unique across an endpoint. A served entity name can consist of alphanumeric characters, dashes, and underscores. If not specified for an external model, this field defaults to external_model.name, with ‘.’ and ‘:’ replaced with ‘-’, and if not specified for other entities, it defaults to entity_name-entity_version.
- provisioned_model_units: int | None = None¶
The number of model units provisioned.
- scale_to_zero_enabled: bool | None = None¶
Whether the compute resources for the served entity should scale down to zero.
- state: ServedModelState | None = None¶
- workload_size: str | None = None¶
The workload size of the served entity. The workload size corresponds to a range of provisioned concurrency that the compute autoscales between. A single unit of provisioned concurrency can process one request at a time. Valid workload sizes are “Small” (4 - 4 provisioned concurrency), “Medium” (8 - 16 provisioned concurrency), and “Large” (16 - 64 provisioned concurrency). Additional custom workload sizes can also be used when available in the workspace. If scale-to-zero is enabled, the lower bound of the provisioned concurrency for each workload size is 0. Do not use if min_provisioned_concurrency and max_provisioned_concurrency are specified.
- workload_type: ServingModelWorkloadType | None = None¶
The workload type of the served entity. The workload type selects which type of compute to use in the endpoint. The default value for this parameter is “CPU”. For deep learning workloads, GPU acceleration is available by selecting workload types like GPU_SMALL and others. See the available [GPU types].
[GPU types]: https://docs.databricks.com/en/machine-learning/model-serving/create-manage-serving-endpoints.html#gpu-workload-types
- as_dict() dict¶
Serializes the ServedEntityOutput into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the ServedEntityOutput into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) ServedEntityOutput¶
Deserializes the ServedEntityOutput from a dictionary.
- class databricks.sdk.service.serving.ServedEntitySpec(entity_name: 'Optional[str]' = None, entity_version: 'Optional[str]' = None, external_model: 'Optional[ExternalModel]' = None, foundation_model: 'Optional[FoundationModel]' = None, name: 'Optional[str]' = None)¶
- entity_name: str | None = None¶
- entity_version: str | None = None¶
- external_model: ExternalModel | None = None¶
- foundation_model: FoundationModel | None = None¶
- name: str | None = None¶
- as_dict() dict¶
Serializes the ServedEntitySpec into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the ServedEntitySpec into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) ServedEntitySpec¶
Deserializes the ServedEntitySpec from a dictionary.
- class databricks.sdk.service.serving.ServedModelInput(scale_to_zero_enabled: 'bool', model_name: 'str', model_version: 'str', burst_scaling_enabled: 'Optional[bool]' = None, environment_vars: 'Optional[Dict[str, str]]' = None, instance_profile_arn: 'Optional[str]' = None, max_provisioned_concurrency: 'Optional[int]' = None, max_provisioned_throughput: 'Optional[int]' = None, min_provisioned_concurrency: 'Optional[int]' = None, min_provisioned_throughput: 'Optional[int]' = None, name: 'Optional[str]' = None, provisioned_model_units: 'Optional[int]' = None, workload_size: 'Optional[str]' = None, workload_type: 'Optional[ServedModelInputWorkloadType]' = None)¶
- scale_to_zero_enabled: bool¶
Whether the compute resources for the served entity should scale down to zero.
- model_name: str¶
- model_version: str¶
- burst_scaling_enabled: bool | None = None¶
Whether burst scaling is enabled. When enabled (default), the endpoint can automatically scale up beyond provisioned capacity to handle traffic spikes. When disabled, the endpoint maintains fixed capacity at provisioned_model_units.
- environment_vars: Dict[str, str] | None = None¶
An object containing a set of optional, user-specified environment variable key-value pairs used for serving this entity. Note: this is an experimental feature and subject to change. Example entity environment variables that refer to Databricks secrets: {“OPENAI_API_KEY”: “{{secrets/my_scope/my_key}}”, “DATABRICKS_TOKEN”: “{{secrets/my_scope2/my_key2}}”}
- instance_profile_arn: str | None = None¶
ARN of the instance profile that the served entity uses to access AWS resources.
- max_provisioned_concurrency: int | None = None¶
The maximum provisioned concurrency that the endpoint can scale up to. Do not use if workload_size is specified.
- max_provisioned_throughput: int | None = None¶
The maximum tokens per second that the endpoint can scale up to.
- min_provisioned_concurrency: int | None = None¶
The minimum provisioned concurrency that the endpoint can scale down to. Do not use if workload_size is specified.
- min_provisioned_throughput: int | None = None¶
The minimum tokens per second that the endpoint can scale down to.
- name: str | None = None¶
The name of a served entity. It must be unique across an endpoint. A served entity name can consist of alphanumeric characters, dashes, and underscores. If not specified for an external model, this field defaults to external_model.name, with ‘.’ and ‘:’ replaced with ‘-’, and if not specified for other entities, it defaults to entity_name-entity_version.
- provisioned_model_units: int | None = None¶
The number of model units provisioned.
- workload_size: str | None = None¶
The workload size of the served entity. The workload size corresponds to a range of provisioned concurrency that the compute autoscales between. A single unit of provisioned concurrency can process one request at a time. Valid workload sizes are “Small” (4 - 4 provisioned concurrency), “Medium” (8 - 16 provisioned concurrency), and “Large” (16 - 64 provisioned concurrency). Additional custom workload sizes can also be used when available in the workspace. If scale-to-zero is enabled, the lower bound of the provisioned concurrency for each workload size is 0. Do not use if min_provisioned_concurrency and max_provisioned_concurrency are specified.
- workload_type: ServedModelInputWorkloadType | None = None¶
The workload type of the served entity. The workload type selects which type of compute to use in the endpoint. The default value for this parameter is “CPU”. For deep learning workloads, GPU acceleration is available by selecting workload types like GPU_SMALL and others. See the available [GPU types].
[GPU types]: https://docs.databricks.com/en/machine-learning/model-serving/create-manage-serving-endpoints.html#gpu-workload-types
- as_dict() dict¶
Serializes the ServedModelInput into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the ServedModelInput into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) ServedModelInput¶
Deserializes the ServedModelInput from a dictionary.
- class databricks.sdk.service.serving.ServedModelInputWorkloadType¶
Please keep this in sync with with workload types in InferenceEndpointEntities.scala
- CPU = "CPU"¶
- GPU_LARGE = "GPU_LARGE"¶
- GPU_MEDIUM = "GPU_MEDIUM"¶
- GPU_SMALL = "GPU_SMALL"¶
- GPU_XLARGE = "GPU_XLARGE"¶
- MULTIGPU_MEDIUM = "MULTIGPU_MEDIUM"¶
- class databricks.sdk.service.serving.ServedModelOutput(burst_scaling_enabled: 'Optional[bool]' = None, creation_timestamp: 'Optional[int]' = None, creator: 'Optional[str]' = None, environment_vars: 'Optional[Dict[str, str]]' = None, instance_profile_arn: 'Optional[str]' = None, max_provisioned_concurrency: 'Optional[int]' = None, min_provisioned_concurrency: 'Optional[int]' = None, model_name: 'Optional[str]' = None, model_version: 'Optional[str]' = None, name: 'Optional[str]' = None, provisioned_model_units: 'Optional[int]' = None, scale_to_zero_enabled: 'Optional[bool]' = None, state: 'Optional[ServedModelState]' = None, workload_size: 'Optional[str]' = None, workload_type: 'Optional[ServingModelWorkloadType]' = None)¶
- burst_scaling_enabled: bool | None = None¶
Whether burst scaling is enabled. When enabled (default), the endpoint can automatically scale up beyond provisioned capacity to handle traffic spikes. When disabled, the endpoint maintains fixed capacity at provisioned_model_units.
- creation_timestamp: int | None = None¶
- creator: str | None = None¶
- environment_vars: Dict[str, str] | None = None¶
An object containing a set of optional, user-specified environment variable key-value pairs used for serving this entity. Note: this is an experimental feature and subject to change. Example entity environment variables that refer to Databricks secrets: {“OPENAI_API_KEY”: “{{secrets/my_scope/my_key}}”, “DATABRICKS_TOKEN”: “{{secrets/my_scope2/my_key2}}”}
- instance_profile_arn: str | None = None¶
ARN of the instance profile that the served entity uses to access AWS resources.
- max_provisioned_concurrency: int | None = None¶
The maximum provisioned concurrency that the endpoint can scale up to. Do not use if workload_size is specified.
- min_provisioned_concurrency: int | None = None¶
The minimum provisioned concurrency that the endpoint can scale down to. Do not use if workload_size is specified.
- model_name: str | None = None¶
- model_version: str | None = None¶
- name: str | None = None¶
The name of a served entity. It must be unique across an endpoint. A served entity name can consist of alphanumeric characters, dashes, and underscores. If not specified for an external model, this field defaults to external_model.name, with ‘.’ and ‘:’ replaced with ‘-’, and if not specified for other entities, it defaults to entity_name-entity_version.
- provisioned_model_units: int | None = None¶
The number of model units provisioned.
- scale_to_zero_enabled: bool | None = None¶
Whether the compute resources for the served entity should scale down to zero.
- state: ServedModelState | None = None¶
- workload_size: str | None = None¶
The workload size of the served entity. The workload size corresponds to a range of provisioned concurrency that the compute autoscales between. A single unit of provisioned concurrency can process one request at a time. Valid workload sizes are “Small” (4 - 4 provisioned concurrency), “Medium” (8 - 16 provisioned concurrency), and “Large” (16 - 64 provisioned concurrency). Additional custom workload sizes can also be used when available in the workspace. If scale-to-zero is enabled, the lower bound of the provisioned concurrency for each workload size is 0. Do not use if min_provisioned_concurrency and max_provisioned_concurrency are specified.
- workload_type: ServingModelWorkloadType | None = None¶
The workload type of the served entity. The workload type selects which type of compute to use in the endpoint. The default value for this parameter is “CPU”. For deep learning workloads, GPU acceleration is available by selecting workload types like GPU_SMALL and others. See the available [GPU types].
[GPU types]: https://docs.databricks.com/en/machine-learning/model-serving/create-manage-serving-endpoints.html#gpu-workload-types
- as_dict() dict¶
Serializes the ServedModelOutput into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the ServedModelOutput into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) ServedModelOutput¶
Deserializes the ServedModelOutput from a dictionary.
- class databricks.sdk.service.serving.ServedModelSpec(model_name: 'Optional[str]' = None, model_version: 'Optional[str]' = None, name: 'Optional[str]' = None)¶
- model_name: str | None = None¶
Only one of model_name and entity_name should be populated
- model_version: str | None = None¶
Only one of model_version and entity_version should be populated
- name: str | None = None¶
- as_dict() dict¶
Serializes the ServedModelSpec into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the ServedModelSpec into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) ServedModelSpec¶
Deserializes the ServedModelSpec from a dictionary.
- class databricks.sdk.service.serving.ServedModelState(deployment: 'Optional[ServedModelStateDeployment]' = None, deployment_state_message: 'Optional[str]' = None)¶
- deployment: ServedModelStateDeployment | None = None¶
- deployment_state_message: str | None = None¶
- as_dict() dict¶
Serializes the ServedModelState into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the ServedModelState into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) ServedModelState¶
Deserializes the ServedModelState from a dictionary.
- class databricks.sdk.service.serving.ServedModelStateDeployment¶
- DEPLOYMENT_ABORTED = "DEPLOYMENT_ABORTED"¶
- DEPLOYMENT_CREATING = "DEPLOYMENT_CREATING"¶
- DEPLOYMENT_FAILED = "DEPLOYMENT_FAILED"¶
- DEPLOYMENT_READY = "DEPLOYMENT_READY"¶
- DEPLOYMENT_RECOVERING = "DEPLOYMENT_RECOVERING"¶
- class databricks.sdk.service.serving.ServerLogsResponse(logs: 'str')¶
- logs: str¶
The most recent log lines of the model server processing invocation requests.
- as_dict() dict¶
Serializes the ServerLogsResponse into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the ServerLogsResponse into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) ServerLogsResponse¶
Deserializes the ServerLogsResponse from a dictionary.
- class databricks.sdk.service.serving.ServingEndpoint(ai_gateway: 'Optional[AiGatewayConfig]' = None, budget_policy_id: 'Optional[str]' = None, config: 'Optional[EndpointCoreConfigSummary]' = None, creation_timestamp: 'Optional[int]' = None, creator: 'Optional[str]' = None, description: 'Optional[str]' = None, id: 'Optional[str]' = None, last_updated_timestamp: 'Optional[int]' = None, name: 'Optional[str]' = None, state: 'Optional[EndpointState]' = None, tags: 'Optional[List[EndpointTag]]' = None, task: 'Optional[str]' = None, usage_policy_id: 'Optional[str]' = None)¶
- ai_gateway: AiGatewayConfig | None = None¶
The AI Gateway configuration for the serving endpoint. NOTE: External model, provisioned throughput, and pay-per-token endpoints are fully supported; agent endpoints currently only support inference tables.
- budget_policy_id: str | None = None¶
The budget policy associated with the endpoint.
- config: EndpointCoreConfigSummary | None = None¶
The config that is currently being served by the endpoint.
- creation_timestamp: int | None = None¶
The timestamp when the endpoint was created in Unix time.
- creator: str | None = None¶
The email of the user who created the serving endpoint.
- description: str | None = None¶
Description of the endpoint
- id: str | None = None¶
System-generated ID of the endpoint, included to be used by the Permissions API.
- last_updated_timestamp: int | None = None¶
The timestamp when the endpoint was last updated by a user in Unix time.
- name: str | None = None¶
The name of the serving endpoint.
- state: EndpointState | None = None¶
Information corresponding to the state of the serving endpoint.
- tags: List[EndpointTag] | None = None¶
Tags attached to the serving endpoint.
- task: str | None = None¶
The task type of the serving endpoint.
- usage_policy_id: str | None = None¶
The usage policy associated with serving endpoint.
- as_dict() dict¶
Serializes the ServingEndpoint into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the ServingEndpoint into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) ServingEndpoint¶
Deserializes the ServingEndpoint from a dictionary.
- class databricks.sdk.service.serving.ServingEndpointAccessControlRequest(group_name: 'Optional[str]' = None, permission_level: 'Optional[ServingEndpointPermissionLevel]' = None, service_principal_name: 'Optional[str]' = None, user_name: 'Optional[str]' = None)¶
- group_name: str | None = None¶
name of the group
- permission_level: ServingEndpointPermissionLevel | None = None¶
- service_principal_name: str | None = None¶
application ID of a service principal
- user_name: str | None = None¶
name of the user
- as_dict() dict¶
Serializes the ServingEndpointAccessControlRequest into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the ServingEndpointAccessControlRequest into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) ServingEndpointAccessControlRequest¶
Deserializes the ServingEndpointAccessControlRequest from a dictionary.
- class databricks.sdk.service.serving.ServingEndpointAccessControlResponse(all_permissions: 'Optional[List[ServingEndpointPermission]]' = None, display_name: 'Optional[str]' = None, group_name: 'Optional[str]' = None, service_principal_name: 'Optional[str]' = None, user_name: 'Optional[str]' = None)¶
- all_permissions: List[ServingEndpointPermission] | None = None¶
All permissions.
- display_name: str | None = None¶
Display name of the user or service principal.
- group_name: str | None = None¶
name of the group
- service_principal_name: str | None = None¶
Name of the service principal.
- user_name: str | None = None¶
name of the user
- as_dict() dict¶
Serializes the ServingEndpointAccessControlResponse into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the ServingEndpointAccessControlResponse into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) ServingEndpointAccessControlResponse¶
Deserializes the ServingEndpointAccessControlResponse from a dictionary.
- class databricks.sdk.service.serving.ServingEndpointDetailed(ai_gateway: 'Optional[AiGatewayConfig]' = None, budget_policy_id: 'Optional[str]' = None, config: 'Optional[EndpointCoreConfigOutput]' = None, creation_timestamp: 'Optional[int]' = None, creator: 'Optional[str]' = None, data_plane_info: 'Optional[ModelDataPlaneInfo]' = None, description: 'Optional[str]' = None, email_notifications: 'Optional[EmailNotifications]' = None, endpoint_url: 'Optional[str]' = None, id: 'Optional[str]' = None, last_updated_timestamp: 'Optional[int]' = None, name: 'Optional[str]' = None, pending_config: 'Optional[EndpointPendingConfig]' = None, permission_level: 'Optional[ServingEndpointDetailedPermissionLevel]' = None, route_optimized: 'Optional[bool]' = None, state: 'Optional[EndpointState]' = None, tags: 'Optional[List[EndpointTag]]' = None, task: 'Optional[str]' = None)¶
- ai_gateway: AiGatewayConfig | None = None¶
The AI Gateway configuration for the serving endpoint. NOTE: External model, provisioned throughput, and pay-per-token endpoints are fully supported; agent endpoints currently only support inference tables.
- budget_policy_id: str | None = None¶
The budget policy associated with the endpoint.
- config: EndpointCoreConfigOutput | None = None¶
The config that is currently being served by the endpoint.
- creation_timestamp: int | None = None¶
The timestamp when the endpoint was created in Unix time.
- creator: str | None = None¶
The email of the user who created the serving endpoint.
- data_plane_info: ModelDataPlaneInfo | None = None¶
Information required to query DataPlane APIs.
- description: str | None = None¶
Description of the serving model
- email_notifications: EmailNotifications | None = None¶
Email notification settings.
- endpoint_url: str | None = None¶
Endpoint invocation url if route optimization is enabled for endpoint
- id: str | None = None¶
System-generated ID of the endpoint. This is used to refer to the endpoint in the Permissions API
- last_updated_timestamp: int | None = None¶
The timestamp when the endpoint was last updated by a user in Unix time.
- name: str | None = None¶
The name of the serving endpoint.
- pending_config: EndpointPendingConfig | None = None¶
The config that the endpoint is attempting to update to.
- permission_level: ServingEndpointDetailedPermissionLevel | None = None¶
The permission level of the principal making the request.
- route_optimized: bool | None = None¶
Boolean representing if route optimization has been enabled for the endpoint
- state: EndpointState | None = None¶
Information corresponding to the state of the serving endpoint.
- tags: List[EndpointTag] | None = None¶
Tags attached to the serving endpoint.
- task: str | None = None¶
The task type of the serving endpoint.
- as_dict() dict¶
Serializes the ServingEndpointDetailed into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the ServingEndpointDetailed into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) ServingEndpointDetailed¶
Deserializes the ServingEndpointDetailed from a dictionary.
- class databricks.sdk.service.serving.ServingEndpointDetailedPermissionLevel¶
- CAN_MANAGE = "CAN_MANAGE"¶
- CAN_QUERY = "CAN_QUERY"¶
- CAN_VIEW = "CAN_VIEW"¶
- class databricks.sdk.service.serving.ServingEndpointPermission(inherited: 'Optional[bool]' = None, inherited_from_object: 'Optional[List[str]]' = None, permission_level: 'Optional[ServingEndpointPermissionLevel]' = None)¶
- inherited: bool | None = None¶
- inherited_from_object: List[str] | None = None¶
- permission_level: ServingEndpointPermissionLevel | None = None¶
- as_dict() dict¶
Serializes the ServingEndpointPermission into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the ServingEndpointPermission into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) ServingEndpointPermission¶
Deserializes the ServingEndpointPermission from a dictionary.
- class databricks.sdk.service.serving.ServingEndpointPermissionLevel¶
Permission level
- CAN_MANAGE = "CAN_MANAGE"¶
- CAN_QUERY = "CAN_QUERY"¶
- CAN_VIEW = "CAN_VIEW"¶
- class databricks.sdk.service.serving.ServingEndpointPermissions(access_control_list: 'Optional[List[ServingEndpointAccessControlResponse]]' = None, object_id: 'Optional[str]' = None, object_type: 'Optional[str]' = None)¶
- access_control_list: List[ServingEndpointAccessControlResponse] | None = None¶
- object_id: str | None = None¶
- object_type: str | None = None¶
- as_dict() dict¶
Serializes the ServingEndpointPermissions into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the ServingEndpointPermissions into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) ServingEndpointPermissions¶
Deserializes the ServingEndpointPermissions from a dictionary.
- class databricks.sdk.service.serving.ServingEndpointPermissionsDescription(description: 'Optional[str]' = None, permission_level: 'Optional[ServingEndpointPermissionLevel]' = None)¶
- description: str | None = None¶
- permission_level: ServingEndpointPermissionLevel | None = None¶
- as_dict() dict¶
Serializes the ServingEndpointPermissionsDescription into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the ServingEndpointPermissionsDescription into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) ServingEndpointPermissionsDescription¶
Deserializes the ServingEndpointPermissionsDescription from a dictionary.
- class databricks.sdk.service.serving.ServingModelWorkloadType¶
Please keep this in sync with with workload types in InferenceEndpointEntities.scala
- CPU = "CPU"¶
- GPU_LARGE = "GPU_LARGE"¶
- GPU_MEDIUM = "GPU_MEDIUM"¶
- GPU_SMALL = "GPU_SMALL"¶
- GPU_XLARGE = "GPU_XLARGE"¶
- MULTIGPU_MEDIUM = "MULTIGPU_MEDIUM"¶
- class databricks.sdk.service.serving.TrafficConfig(routes: 'Optional[List[Route]]' = None)¶
-
- as_dict() dict¶
Serializes the TrafficConfig into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the TrafficConfig into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) TrafficConfig¶
Deserializes the TrafficConfig from a dictionary.
- class databricks.sdk.service.serving.UpdateInferenceEndpointNotificationsResponse(email_notifications: 'Optional[EmailNotifications]' = None, name: 'Optional[str]' = None)¶
- email_notifications: EmailNotifications | None = None¶
- name: str | None = None¶
- as_dict() dict¶
Serializes the UpdateInferenceEndpointNotificationsResponse into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the UpdateInferenceEndpointNotificationsResponse into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) UpdateInferenceEndpointNotificationsResponse¶
Deserializes the UpdateInferenceEndpointNotificationsResponse from a dictionary.
- class databricks.sdk.service.serving.V1ResponseChoiceElement(finish_reason: 'Optional[str]' = None, index: 'Optional[int]' = None, logprobs: 'Optional[int]' = None, message: 'Optional[ChatMessage]' = None, text: 'Optional[str]' = None)¶
- finish_reason: str | None = None¶
The finish reason returned by the endpoint.
- index: int | None = None¶
The index of the choice in the __chat or completions__ response.
- logprobs: int | None = None¶
The logprobs returned only by the __completions__ endpoint.
- message: ChatMessage | None = None¶
The message response from the __chat__ endpoint.
- text: str | None = None¶
The text response from the __completions__ endpoint.
- as_dict() dict¶
Serializes the V1ResponseChoiceElement into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() dict¶
Serializes the V1ResponseChoiceElement into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) V1ResponseChoiceElement¶
Deserializes the V1ResponseChoiceElement from a dictionary.