w.serving_endpoints: Serving endpoints

class databricks.sdk.service.serving.ServingEndpointsExt

The Serving Endpoints API allows you to create, update, and delete model serving endpoints.

You can use a serving endpoint to serve models from the Databricks Model Registry or from Unity Catalog. Endpoints expose the underlying models as scalable REST API endpoints using serverless compute. This means the endpoints and associated compute resources are fully managed by Databricks and will not appear in your cloud account. A serving endpoint can consist of one or more MLflow models from the Databricks Model Registry, called served entities. A serving endpoint can have at most ten served entities. You can configure traffic settings to define how requests should be routed to your served entities behind an endpoint. Additionally, you can configure the scale of resources that should be applied to each served entity.
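
All of the methods below are called through a WorkspaceClient. A minimal sketch of reaching this API (authentication details depend on your environment):

    from databricks.sdk import WorkspaceClient

    # Credentials are resolved from the environment or a Databricks config profile.
    w = WorkspaceClient()

    # The Serving Endpoints API is exposed as w.serving_endpoints.
    serving = w.serving_endpoints

The examples in the sections below reuse this client w.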

build_logs(name: str, served_model_name: str) BuildLogsResponse

Get build logs for a served model.

Retrieves the build logs associated with the provided served model.

Parameters:
  • name – str The name of the serving endpoint that the served model belongs to. This field is required.

  • served_model_name – str The name of the served model that build logs will be retrieved for. This field is required.

Returns:

BuildLogsResponse
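
A minimal sketch, reusing the WorkspaceClient w from the overview; the endpoint and served model names are hypothetical:

    # BuildLogsResponse carries the raw build log text in its `logs` field.
    resp = w.serving_endpoints.build_logs(
        name="my-endpoint",
        served_model_name="my-model",
    )
    print(resp.logs)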

create(name: str [, ai_gateway: Optional[AiGatewayConfig], budget_policy_id: Optional[str], config: Optional[EndpointCoreConfigInput], rate_limits: Optional[List[RateLimit]], route_optimized: Optional[bool], tags: Optional[List[EndpointTag]]]) Wait[ServingEndpointDetailed]

Create a new serving endpoint.

Parameters:
  • name – str The name of the serving endpoint. This field is required and must be unique across a Databricks workspace. An endpoint name can consist of alphanumeric characters, dashes, and underscores.

  • ai_gateway – AiGatewayConfig (optional) The AI Gateway configuration for the serving endpoint. NOTE: External model, provisioned throughput, and pay-per-token endpoints are fully supported; agent endpoints currently only support inference tables.

  • budget_policy_id – str (optional) The budget policy to be applied to the serving endpoint.

  • config – EndpointCoreConfigInput (optional) The core config of the serving endpoint.

  • rate_limits – List[RateLimit] (optional) Rate limits to be applied to the serving endpoint. NOTE: this field is deprecated; please use AI Gateway to manage rate limits instead.

  • route_optimized – bool (optional) Enable route optimization for the serving endpoint.

  • tags – List[EndpointTag] (optional) Tags to be attached to the serving endpoint and automatically propagated to billing logs.

Returns:

Long-running operation waiter for ServingEndpointDetailed. See wait_get_serving_endpoint_not_updating for more details.
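
A sketch of creating an endpoint that serves one Unity Catalog model, assuming the WorkspaceClient w from the overview; the model name and version are hypothetical:

    from databricks.sdk.service.serving import (
        EndpointCoreConfigInput,
        ServedEntityInput,
    )

    # create() returns a waiter; create_and_wait() (below) blocks until the
    # endpoint leaves the updating state.
    endpoint = w.serving_endpoints.create_and_wait(
        name="my-endpoint",
        config=EndpointCoreConfigInput(
            served_entities=[
                ServedEntityInput(
                    entity_name="main.default.my_model",  # hypothetical UC model
                    entity_version="1",
                    workload_size="Small",
                    scale_to_zero_enabled=True,
                )
            ]
        ),
    )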

create_and_wait(name: str [, ai_gateway: Optional[AiGatewayConfig], budget_policy_id: Optional[str], config: Optional[EndpointCoreConfigInput], rate_limits: Optional[List[RateLimit]], route_optimized: Optional[bool], tags: Optional[List[EndpointTag]], timeout: datetime.timedelta = 0:20:00]) ServingEndpointDetailed
delete(name: str)

Delete a serving endpoint.

Parameters:

name – str The name of the serving endpoint. This field is required.

export_metrics(name: str) ExportMetricsResponse

Get metrics of a serving endpoint.

Retrieves the metrics associated with the provided serving endpoint in either Prometheus or OpenMetrics exposition format.

Parameters:

name – str The name of the serving endpoint to retrieve metrics for. This field is required.

Returns:

ExportMetricsResponse
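
A sketch, assuming the WorkspaceClient w from the overview and a hypothetical endpoint name; the exact shape of the response is documented on ExportMetricsResponse, and the `contents` stream below is an assumption:

    # The response wraps the Prometheus/OpenMetrics payload; here we assume
    # its `contents` field is a readable binary stream.
    resp = w.serving_endpoints.export_metrics(name="my-endpoint")
    print(resp.contents.read().decode("utf-8"))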

get(name: str) ServingEndpointDetailed

Get a single serving endpoint.

Retrieves the details for a single serving endpoint.

Parameters:

name – str The name of the serving endpoint. This field is required.

Returns:

ServingEndpointDetailed
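
A sketch, assuming the WorkspaceClient w from the overview; the endpoint name is hypothetical:

    endpoint = w.serving_endpoints.get(name="my-endpoint")
    print(endpoint.state, endpoint.config)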

get_langchain_chat_open_ai_client(model)

Returns a LangChain ChatOpenAI client configured to call this workspace's serving endpoints (requires the langchain-openai package).

get_open_ai_client()

Returns an OpenAI SDK client configured to authenticate against this workspace's serving endpoints (requires the openai package).
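
A sketch of the OpenAI-client helper, assuming the WorkspaceClient w from the overview; the model name is a hypothetical serving endpoint:

    # The returned client is a regular `openai` SDK client whose base URL and
    # auth point at this workspace's serving endpoints.
    client = w.serving_endpoints.get_open_ai_client()
    completion = client.chat.completions.create(
        model="my-chat-endpoint",  # a serving endpoint name, hypothetical
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(completion.choices[0].message.content)
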
get_open_api(name: str) GetOpenApiResponse

Get the schema for a serving endpoint.

Get the query schema of the serving endpoint in OpenAPI format. The schema contains information on the supported paths and the input and output formats and data types.

Parameters:

name – str The name of the serving endpoint that the served model belongs to. This field is required.

Returns:

GetOpenApiResponse

get_permission_levels(serving_endpoint_id: str) GetServingEndpointPermissionLevelsResponse

Get serving endpoint permission levels.

Gets the permission levels that a user can have on an object.

Parameters:

serving_endpoint_id – str The serving endpoint for which to get or manage permissions.

Returns:

GetServingEndpointPermissionLevelsResponse

get_permissions(serving_endpoint_id: str) ServingEndpointPermissions

Get serving endpoint permissions.

Gets the permissions of a serving endpoint. Serving endpoints can inherit permissions from their root object.

Parameters:

serving_endpoint_id – str The serving endpoint for which to get or manage permissions.

Returns:

ServingEndpointPermissions

http_request(conn: str, method: ExternalFunctionRequestHttpMethod, path: str [, headers: typing.Dict[str, str], json: typing.Dict[str, str], params: typing.Dict[str, str]]) Response

Make calls to external services using the credentials stored in a UC Connection. NOTE: Experimental: This API may change or be removed in a future release without warning.

Parameters:
  • conn – str The connection name to use. This is required to identify the external connection.

  • method – ExternalFunctionRequestHttpMethod The HTTP method to use (e.g., ‘GET’, ‘POST’). This is required.

  • path – str The relative path for the API endpoint. This is required.

  • headers – Dict[str,str] (optional) Additional headers for the request. If not provided, only the auth headers from the connection are passed.

  • json – Dict[str,str] (optional) JSON payload for the request.

  • params – Dict[str,str] (optional) Query parameters for the request.

Returns:

Response
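
A sketch, assuming the WorkspaceClient w from the overview; the connection name and path are hypothetical:

    from databricks.sdk.service.serving import ExternalFunctionRequestHttpMethod

    resp = w.serving_endpoints.http_request(
        conn="my_uc_connection",  # hypothetical UC Connection
        method=ExternalFunctionRequestHttpMethod.GET,
        path="/api/v1/resource",
        params={"limit": "10"},
    )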

list() Iterator[ServingEndpoint]

Get all serving endpoints.

Returns:

Iterator over ServingEndpoint
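
A sketch, assuming the WorkspaceClient w from the overview:

    # Iterates over every serving endpoint in the workspace.
    for endpoint in w.serving_endpoints.list():
        print(endpoint.name)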

logs(name: str, served_model_name: str) ServerLogsResponse

Get the latest logs for a served model.

Retrieves the service logs associated with the provided served model.

Parameters:
  • name – str The name of the serving endpoint that the served model belongs to. This field is required.

  • served_model_name – str The name of the served model that logs will be retrieved for. This field is required.

Returns:

ServerLogsResponse
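
A sketch mirroring the build_logs example above, assuming the WorkspaceClient w; names are hypothetical:

    resp = w.serving_endpoints.logs(
        name="my-endpoint",
        served_model_name="my-model",
    )
    print(resp.logs)  # ServerLogsResponse exposes the log text in `logs`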

patch(name: str [, add_tags: Optional[List[EndpointTag]], delete_tags: Optional[List[str]]]) EndpointTags

Update tags of a serving endpoint.

Used to batch add and delete tags from a serving endpoint with a single API call.

Parameters:
  • name – str The name of the serving endpoint whose tags are to be patched. This field is required.

  • add_tags – List[EndpointTag] (optional) List of endpoint tags to add

  • delete_tags – List[str] (optional) List of tag keys to delete

Returns:

EndpointTags
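
A sketch of batching tag changes in one call, assuming the WorkspaceClient w; the tag keys and values are hypothetical:

    from databricks.sdk.service.serving import EndpointTag

    tags = w.serving_endpoints.patch(
        name="my-endpoint",
        add_tags=[EndpointTag(key="team", value="ml-platform")],
        delete_tags=["deprecated-key"],
    )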

put(name: str [, rate_limits: Optional[List[RateLimit]]]) PutResponse

Update rate limits of a serving endpoint.

Deprecated: Please use AI Gateway to manage rate limits instead.

Parameters:
  • name – str The name of the serving endpoint whose rate limits are being updated. This field is required.

  • rate_limits – List[RateLimit] (optional) The list of endpoint rate limits.

Returns:

PutResponse

put_ai_gateway(name: str [, fallback_config: Optional[FallbackConfig], guardrails: Optional[AiGatewayGuardrails], inference_table_config: Optional[AiGatewayInferenceTableConfig], rate_limits: Optional[List[AiGatewayRateLimit]], usage_tracking_config: Optional[AiGatewayUsageTrackingConfig]]) PutAiGatewayResponse

Update AI Gateway of a serving endpoint.

Used to update the AI Gateway of a serving endpoint. NOTE: External model, provisioned throughput, and pay-per-token endpoints are fully supported; agent endpoints currently only support inference tables.

Parameters:
  • name – str The name of the serving endpoint whose AI Gateway is being updated. This field is required.

  • fallback_config – FallbackConfig (optional) Configuration for traffic fallback, which automatically falls back to other served entities if a request to a served entity fails with certain error codes, increasing availability.

  • guardrails – AiGatewayGuardrails (optional) Configuration for AI Guardrails to prevent unwanted and unsafe data in requests and responses.

  • inference_table_config – AiGatewayInferenceTableConfig (optional) Configuration for payload logging using inference tables. Use these tables to monitor and audit data being sent to and received from model APIs and to improve model quality.

  • rate_limits – List[AiGatewayRateLimit] (optional) Configuration for rate limits which can be set to limit endpoint traffic.

  • usage_tracking_config – AiGatewayUsageTrackingConfig (optional) Configuration to enable usage tracking using system tables. These tables allow you to monitor operational usage on endpoints and their associated costs.

Returns:

PutAiGatewayResponse
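
A sketch enabling usage tracking and a per-user rate limit, assuming the WorkspaceClient w; the endpoint name and limit values are hypothetical:

    from databricks.sdk.service.serving import (
        AiGatewayRateLimit,
        AiGatewayRateLimitKey,
        AiGatewayRateLimitRenewalPeriod,
        AiGatewayUsageTrackingConfig,
    )

    w.serving_endpoints.put_ai_gateway(
        name="my-endpoint",
        usage_tracking_config=AiGatewayUsageTrackingConfig(enabled=True),
        rate_limits=[
            AiGatewayRateLimit(
                calls=100,  # hypothetical limit
                key=AiGatewayRateLimitKey.USER,
                renewal_period=AiGatewayRateLimitRenewalPeriod.MINUTE,
            )
        ],
    )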

query(name: str [, dataframe_records: Optional[List[Any]], dataframe_split: Optional[DataframeSplitInput], extra_params: Optional[Dict[str, str]], input: Optional[Any], inputs: Optional[Any], instances: Optional[List[Any]], max_tokens: Optional[int], messages: Optional[List[ChatMessage]], n: Optional[int], prompt: Optional[Any], stop: Optional[List[str]], stream: Optional[bool], temperature: Optional[float]]) QueryEndpointResponse

Query a serving endpoint.

Parameters:
  • name – str The name of the serving endpoint. This field is required.

  • dataframe_records – List[Any] (optional) Pandas Dataframe input in the records orientation.

  • dataframe_split – DataframeSplitInput (optional) Pandas Dataframe input in the split orientation.

  • extra_params – Dict[str,str] (optional) The extra parameters field used ONLY for __completions, chat,__ and __embeddings external & foundation model__ serving endpoints. This is a map of strings and should only be used with other external/foundation model query fields.

  • input – Any (optional) The input string (or array of strings) field used ONLY for __embeddings external & foundation model__ serving endpoints and is the only field (along with extra_params if needed) used by embeddings queries.

  • inputs – Any (optional) Tensor-based input in columnar format.

  • instances – List[Any] (optional) Tensor-based input in row format.

  • max_tokens – int (optional) The max tokens field used ONLY for __completions__ and __chat external & foundation model__ serving endpoints. This is an integer and should only be used with other chat/completions query fields.

  • messages – List[ChatMessage] (optional) The messages field used ONLY for __chat external & foundation model__ serving endpoints. This is a list of chat messages and should only be used with other chat query fields.

  • n – int (optional) The n (number of candidates) field used ONLY for __completions__ and __chat external & foundation model__ serving endpoints. This is an integer between 1 and 5 with a default of 1 and should only be used with other chat/completions query fields.

  • prompt – Any (optional) The prompt string (or array of strings) field used ONLY for __completions external & foundation model__ serving endpoints and should only be used with other completions query fields.

  • stop – List[str] (optional) The stop sequences field used ONLY for __completions__ and __chat external & foundation model__ serving endpoints. This is a list of strings and should only be used with other chat/completions query fields.

  • stream – bool (optional) The stream field used ONLY for __completions__ and __chat external & foundation model__ serving endpoints. This is a boolean defaulting to false and should only be used with other chat/completions query fields.

  • temperature – float (optional) The temperature field used ONLY for __completions__ and __chat external & foundation model__ serving endpoints. This is a float between 0.0 and 2.0 with a default of 1.0 and should only be used with other chat/completions query fields.

Returns:

QueryEndpointResponse
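
A sketch of a chat-style query, assuming the WorkspaceClient w from the overview; the endpoint is a hypothetical chat model:

    from databricks.sdk.service.serving import ChatMessage, ChatMessageRole

    response = w.serving_endpoints.query(
        name="my-chat-endpoint",
        messages=[ChatMessage(role=ChatMessageRole.USER, content="Hello!")],
        max_tokens=128,
        temperature=0.7,
    )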

set_permissions(serving_endpoint_id: str [, access_control_list: Optional[List[ServingEndpointAccessControlRequest]]]) ServingEndpointPermissions

Set serving endpoint permissions.

Sets permissions on an object, replacing existing permissions if they exist. Deletes all direct permissions if none are specified. Objects can inherit permissions from their root object.

Parameters:
  • serving_endpoint_id – str The serving endpoint for which to get or manage permissions.

  • access_control_list – List[ServingEndpointAccessControlRequest] (optional)

Returns:

ServingEndpointPermissions
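
A sketch granting a group query access, assuming the WorkspaceClient w; the endpoint ID and group name are hypothetical:

    from databricks.sdk.service.serving import (
        ServingEndpointAccessControlRequest,
        ServingEndpointPermissionLevel,
    )

    perms = w.serving_endpoints.set_permissions(
        serving_endpoint_id="abc123",  # hypothetical endpoint ID
        access_control_list=[
            ServingEndpointAccessControlRequest(
                group_name="data-scientists",
                permission_level=ServingEndpointPermissionLevel.CAN_QUERY,
            )
        ],
    )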

update_config(name: str [, auto_capture_config: Optional[AutoCaptureConfigInput], served_entities: Optional[List[ServedEntityInput]], served_models: Optional[List[ServedModelInput]], traffic_config: Optional[TrafficConfig]]) Wait[ServingEndpointDetailed]

Update config of a serving endpoint.

Updates any combination of the serving endpoint’s served entities, the compute configuration of those served entities, and the endpoint’s traffic config. An endpoint that already has an update in progress cannot be updated until the current update completes or fails.

Parameters:
  • name – str The name of the serving endpoint to update. This field is required.

  • auto_capture_config – AutoCaptureConfigInput (optional) Configuration for Inference Tables, which automatically logs requests and responses to Unity Catalog. Note: this field is deprecated for creating new provisioned throughput endpoints, or for updating existing provisioned throughput endpoints that have never had inference tables configured; in these cases, please use AI Gateway to manage inference tables.

  • served_entities – List[ServedEntityInput] (optional) The list of served entities under the serving endpoint config.

  • served_models – List[ServedModelInput] (optional) (Deprecated, use served_entities instead) The list of served models under the serving endpoint config.

  • traffic_config – TrafficConfig (optional) The traffic configuration associated with the serving endpoint config.

Returns:

Long-running operation waiter for ServingEndpointDetailed. See wait_get_serving_endpoint_not_updating for more details.
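
A sketch rolling the endpoint to a new model version, assuming the WorkspaceClient w; the entity name and version are hypothetical:

    from databricks.sdk.service.serving import ServedEntityInput

    # update_config_and_wait (below) blocks until the update completes.
    endpoint = w.serving_endpoints.update_config_and_wait(
        name="my-endpoint",
        served_entities=[
            ServedEntityInput(
                entity_name="main.default.my_model",
                entity_version="2",
                workload_size="Small",
                scale_to_zero_enabled=True,
            )
        ],
    )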

update_config_and_wait(name: str [, auto_capture_config: Optional[AutoCaptureConfigInput], served_entities: Optional[List[ServedEntityInput]], served_models: Optional[List[ServedModelInput]], traffic_config: Optional[TrafficConfig], timeout: datetime.timedelta = 0:20:00]) ServingEndpointDetailed
update_permissions(serving_endpoint_id: str [, access_control_list: Optional[List[ServingEndpointAccessControlRequest]]]) ServingEndpointPermissions

Update serving endpoint permissions.

Updates the permissions on a serving endpoint. Serving endpoints can inherit permissions from their root object.

Parameters:
  • serving_endpoint_id – str The serving endpoint for which to get or manage permissions.

  • access_control_list – List[ServingEndpointAccessControlRequest] (optional)

Returns:

ServingEndpointPermissions

wait_get_serving_endpoint_not_updating(name: str, timeout: datetime.timedelta = 0:20:00, callback: Optional[Callable[[ServingEndpointDetailed], None]]) ServingEndpointDetailed