w.serving_endpoints: Serving endpoints

class databricks.sdk.service.serving.ServingEndpointsAPI

The Serving Endpoints API allows you to create, update, and delete model serving endpoints.

You can use a serving endpoint to serve models from the Databricks Model Registry or from Unity Catalog. Endpoints expose the underlying models as scalable REST API endpoints using serverless compute, so the endpoints and associated compute resources are fully managed by Databricks and do not appear in your cloud account.

A serving endpoint can consist of one or more MLflow models from the Databricks Model Registry, called served entities, and can have at most ten served entities. You can configure traffic settings to define how requests should be routed to the served entities behind an endpoint. Additionally, you can configure the scale of resources that should be applied to each served entity.
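Conceptually, the traffic settings describe a weighted split of requests across served entities. As a rough illustration of that behavior (not the service's actual implementation), with hypothetical entity names:

```python
import random

# Hypothetical traffic split: served entity name -> traffic percentage.
# Percentages across all routes must sum to 100.
traffic_split = {"model_a-1": 90, "model_a-2": 10}

def route_request(split, rng):
    """Pick a served entity for one request, weighted by its traffic percentage."""
    names = list(split)
    return rng.choices(names, weights=[split[n] for n in names], k=1)[0]

# A seeded RNG makes the demonstration reproducible.
rng = random.Random(0)
counts = {name: 0 for name in traffic_split}
for _ in range(1000):
    counts[route_request(traffic_split, rng)] += 1
```

Over many requests, roughly 90% land on `model_a-1` and 10% on `model_a-2`.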

build_logs(name: str, served_model_name: str) BuildLogsResponse

Get build logs for a served model.

Retrieves the build logs associated with the provided served model.

Parameters:
  • name – str The name of the serving endpoint that the served model belongs to. This field is required.

  • served_model_name – str The name of the served model that build logs will be retrieved for. This field is required.

Returns:

BuildLogsResponse

create(name: str, config: EndpointCoreConfigInput [, rate_limits: Optional[List[RateLimit]], tags: Optional[List[EndpointTag]]]) Wait[ServingEndpointDetailed]

Create a new serving endpoint.

Parameters:
  • name – str The name of the serving endpoint. This field is required and must be unique across a Databricks workspace. An endpoint name can consist of alphanumeric characters, dashes, and underscores.

  • config – EndpointCoreConfigInput The core config of the serving endpoint.

  • rate_limits – List[RateLimit] (optional) Rate limits to be applied to the serving endpoint. NOTE: currently, only external and foundation model endpoints are supported.

  • tags – List[EndpointTag] (optional) Tags to be attached to the serving endpoint and automatically propagated to billing logs.

Returns:

Long-running operation waiter for ServingEndpointDetailed. See :method:wait_get_serving_endpoint_not_updating for more details.
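For illustration, the core config can be sketched as a plain dict mirroring the fields of EndpointCoreConfigInput; the catalog, model name, and version below are hypothetical:

```python
# Hypothetical create() payload; keys mirror EndpointCoreConfigInput fields.
endpoint_spec = {
    "name": "my-endpoint",
    "config": {
        "served_entities": [
            {
                "entity_name": "my_catalog.my_schema.my_model",  # hypothetical UC model
                "entity_version": "1",
                "workload_size": "Small",
                "scale_to_zero_enabled": True,
            }
        ],
    },
    "tags": [{"key": "team", "value": "ml-platform"}],
}
# With the SDK this maps to:
#   w.serving_endpoints.create(name=..., config=EndpointCoreConfigInput(...), tags=[...])
```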

create_and_wait(name: str, config: EndpointCoreConfigInput [, rate_limits: Optional[List[RateLimit]], tags: Optional[List[EndpointTag]], timeout: datetime.timedelta = 0:20:00]) ServingEndpointDetailed
delete(name: str)

Delete a serving endpoint.

Parameters:

name – str The name of the serving endpoint. This field is required.

export_metrics(name: str)

Get metrics of a serving endpoint.

Retrieves the metrics associated with the provided serving endpoint in either Prometheus or OpenMetrics exposition format.

Parameters:

name – str The name of the serving endpoint to retrieve metrics for. This field is required.

get(name: str) ServingEndpointDetailed

Get a single serving endpoint.

Retrieves the details for a single serving endpoint.

Parameters:

name – str The name of the serving endpoint. This field is required.

Returns:

ServingEndpointDetailed

get_open_api(name: str)

Get the schema for a serving endpoint.

Gets the query schema of the serving endpoint in OpenAPI format. The schema contains information on the supported paths, input and output formats, and datatypes.

Parameters:

name – str The name of the serving endpoint that the served model belongs to. This field is required.

get_permission_levels(serving_endpoint_id: str) GetServingEndpointPermissionLevelsResponse

Get serving endpoint permission levels.

Gets the permission levels that a user can have on an object.

Parameters:

serving_endpoint_id – str The serving endpoint for which to get or manage permissions.

Returns:

GetServingEndpointPermissionLevelsResponse

get_permissions(serving_endpoint_id: str) ServingEndpointPermissions

Get serving endpoint permissions.

Gets the permissions of a serving endpoint. Serving endpoints can inherit permissions from their root object.

Parameters:

serving_endpoint_id – str The serving endpoint for which to get or manage permissions.

Returns:

ServingEndpointPermissions

list() Iterator[ServingEndpoint]

Get all serving endpoints.

Returns:

Iterator over ServingEndpoint

logs(name: str, served_model_name: str) ServerLogsResponse

Get the latest logs for a served model.

Retrieves the service logs associated with the provided served model.

Parameters:
  • name – str The name of the serving endpoint that the served model belongs to. This field is required.

  • served_model_name – str The name of the served model that logs will be retrieved for. This field is required.

Returns:

ServerLogsResponse

patch(name: str [, add_tags: Optional[List[EndpointTag]], delete_tags: Optional[List[str]]]) Iterator[EndpointTag]

Update tags of a serving endpoint.

Used to batch add and delete tags from a serving endpoint with a single API call.

Parameters:
  • name – str The name of the serving endpoint whose tags are to be patched. This field is required.

  • add_tags – List[EndpointTag] (optional) List of endpoint tags to add

  • delete_tags – List[str] (optional) List of tag keys to delete

Returns:

Iterator over EndpointTag
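The batch semantics can be modeled locally: delete_tags removes tags by key, and add_tags attaches new key/value pairs. One plausible model (assuming deletes apply first and adds upsert by key, which the doc does not specify):

```python
def apply_tag_patch(tags, add_tags=None, delete_tags=None):
    """Model patch() tag semantics: delete by key first, then upsert adds."""
    by_key = {t["key"]: t.get("value") for t in tags}
    for key in delete_tags or []:
        by_key.pop(key, None)
    for tag in add_tags or []:
        by_key[tag["key"]] = tag.get("value")
    return [{"key": k, "value": v} for k, v in by_key.items()]

# Hypothetical current tags on an endpoint.
current = [{"key": "env", "value": "dev"}, {"key": "team", "value": "ml"}]
updated = apply_tag_patch(
    current,
    add_tags=[{"key": "env", "value": "prod"}],
    delete_tags=["team"],
)
```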

put(name: str [, rate_limits: Optional[List[RateLimit]]]) PutResponse

Update rate limits of a serving endpoint.

Used to update the rate limits of a serving endpoint. NOTE: currently, only external and foundation model endpoints are supported.

Parameters:
  • name – str The name of the serving endpoint whose rate limits are being updated. This field is required.

  • rate_limits – List[RateLimit] (optional) The list of endpoint rate limits.

Returns:

PutResponse
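A RateLimit entry pairs a number of calls with a renewal period and the key the limit applies to. Sketched as plain dicts mirroring the RateLimit fields; the values here are illustrative:

```python
# Illustrative rate-limit payload for put(); keys mirror RateLimit fields.
rate_limits = [
    {"calls": 100, "key": "endpoint", "renewal_period": "minute"},
    {"calls": 10, "key": "user", "renewal_period": "minute"},
]
# With the SDK:
#   w.serving_endpoints.put(name="my-endpoint", rate_limits=[RateLimit(...), ...])
```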

query(name: str [, dataframe_records: Optional[List[Any]], dataframe_split: Optional[DataframeSplitInput], extra_params: Optional[Dict[str, str]], input: Optional[Any], inputs: Optional[Any], instances: Optional[List[Any]], max_tokens: Optional[int], messages: Optional[List[ChatMessage]], n: Optional[int], prompt: Optional[Any], stop: Optional[List[str]], stream: Optional[bool], temperature: Optional[float]]) QueryEndpointResponse

Query a serving endpoint.

Parameters:
  • name – str The name of the serving endpoint. This field is required.

  • dataframe_records – List[Any] (optional) Pandas Dataframe input in the records orientation.

  • dataframe_split – DataframeSplitInput (optional) Pandas Dataframe input in the split orientation.

  • extra_params – Dict[str,str] (optional) The extra parameters field used ONLY for __completions, chat,__ and __embeddings external & foundation model__ serving endpoints. This is a map of strings and should only be used with other external/foundation model query fields.

  • input – Any (optional) The input string (or array of strings) field used ONLY for __embeddings external & foundation model__ serving endpoints and is the only field (along with extra_params if needed) used by embeddings queries.

  • inputs – Any (optional) Tensor-based input in columnar format.

  • instances – List[Any] (optional) Tensor-based input in row format.

  • max_tokens – int (optional) The max tokens field used ONLY for __completions__ and __chat external & foundation model__ serving endpoints. This is an integer and should only be used with other chat/completions query fields.

  • messages – List[ChatMessage] (optional) The messages field used ONLY for __chat external & foundation model__ serving endpoints. This is a map of strings and should only be used with other chat query fields.

  • n – int (optional) The n (number of candidates) field used ONLY for __completions__ and __chat external & foundation model__ serving endpoints. This is an integer between 1 and 5 with a default of 1 and should only be used with other chat/completions query fields.

  • prompt – Any (optional) The prompt string (or array of strings) field used ONLY for __completions external & foundation model__ serving endpoints and should only be used with other completions query fields.

  • stop – List[str] (optional) The stop sequences field used ONLY for __completions__ and __chat external & foundation model__ serving endpoints. This is a list of strings and should only be used with other chat/completions query fields.

  • stream – bool (optional) The stream field used ONLY for __completions__ and __chat external & foundation model__ serving endpoints. This is a boolean defaulting to false and should only be used with other chat/completions query fields.

  • temperature – float (optional) The temperature field used ONLY for __completions__ and __chat external & foundation model__ serving endpoints. This is a float between 0.0 and 2.0 with a default of 1.0 and should only be used with other chat/completions query fields.

Returns:

QueryEndpointResponse
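For a chat-style external or foundation model endpoint, a query combines messages with the chat/completions fields above. Sketched as the plain payload the parameters map to; the endpoint name and message contents are hypothetical:

```python
# Illustrative chat query payload; keys mirror the query() parameters above.
chat_query = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize model serving in one sentence."},
    ],
    "max_tokens": 128,
    "temperature": 0.2,  # float in [0.0, 2.0], default 1.0
    "stream": False,
}
# With the SDK:
#   w.serving_endpoints.query(name="my-chat-endpoint",
#                             messages=[ChatMessage(role=..., content=...), ...],
#                             max_tokens=128, temperature=0.2)
```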

set_permissions(serving_endpoint_id: str [, access_control_list: Optional[List[ServingEndpointAccessControlRequest]]]) ServingEndpointPermissions

Set serving endpoint permissions.

Sets permissions on a serving endpoint. Serving endpoints can inherit permissions from their root object.

Parameters:
  • serving_endpoint_id – str The serving endpoint for which to get or manage permissions. This field is required.

  • access_control_list – List[ServingEndpointAccessControlRequest] (optional)

Returns:

ServingEndpointPermissions
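An access control entry pairs a principal with a permission level. Sketched as plain dicts mirroring ServingEndpointAccessControlRequest fields; the user and group names are hypothetical:

```python
# Illustrative access control list for set_permissions().
access_control_list = [
    {"user_name": "someone@example.com", "permission_level": "CAN_QUERY"},
    {"group_name": "ml-admins", "permission_level": "CAN_MANAGE"},
]
# With the SDK:
#   w.serving_endpoints.set_permissions(
#       serving_endpoint_id=...,
#       access_control_list=[ServingEndpointAccessControlRequest(...), ...])
```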

update_config(name: str [, auto_capture_config: Optional[AutoCaptureConfigInput], served_entities: Optional[List[ServedEntityInput]], served_models: Optional[List[ServedModelInput]], traffic_config: Optional[TrafficConfig]]) Wait[ServingEndpointDetailed]

Update config of a serving endpoint.

Updates any combination of the serving endpoint’s served entities, the compute configuration of those served entities, and the endpoint’s traffic config. An endpoint that already has an update in progress cannot be updated until the current update completes or fails.

Parameters:
  • name – str The name of the serving endpoint to update. This field is required.

  • auto_capture_config – AutoCaptureConfigInput (optional) Configuration for Inference Tables which automatically logs requests and responses to Unity Catalog.

  • served_entities – List[ServedEntityInput] (optional) A list of served entities for the endpoint to serve. A serving endpoint can have up to 15 served entities.

  • served_models – List[ServedModelInput] (optional) (Deprecated, use served_entities instead) A list of served models for the endpoint to serve. A serving endpoint can have up to 15 served models.

  • traffic_config – TrafficConfig (optional) The traffic config defining how invocations to the serving endpoint should be routed.

Returns:

Long-running operation waiter for ServingEndpointDetailed. See :method:wait_get_serving_endpoint_not_updating for more details.
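A traffic config assigns each served entity a percentage of incoming requests, with the percentages across routes summing to 100. Sketched as a plain dict with hypothetical served model names:

```python
# Illustrative traffic config for update_config(); keys mirror TrafficConfig routes.
traffic_config = {
    "routes": [
        {"served_model_name": "my_model-1", "traffic_percentage": 90},
        {"served_model_name": "my_model-2", "traffic_percentage": 10},
    ]
}
total = sum(r["traffic_percentage"] for r in traffic_config["routes"])
# With the SDK:
#   w.serving_endpoints.update_config(name="my-endpoint",
#                                     traffic_config=TrafficConfig(routes=[Route(...), ...]))
```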

update_config_and_wait(name: str [, auto_capture_config: Optional[AutoCaptureConfigInput], served_entities: Optional[List[ServedEntityInput]], served_models: Optional[List[ServedModelInput]], traffic_config: Optional[TrafficConfig], timeout: datetime.timedelta = 0:20:00]) ServingEndpointDetailed
update_permissions(serving_endpoint_id: str [, access_control_list: Optional[List[ServingEndpointAccessControlRequest]]]) ServingEndpointPermissions

Update serving endpoint permissions.

Updates the permissions on a serving endpoint. Serving endpoints can inherit permissions from their root object.

Parameters:
  • serving_endpoint_id – str The serving endpoint for which to get or manage permissions. This field is required.

  • access_control_list – List[ServingEndpointAccessControlRequest] (optional)

Returns:

ServingEndpointPermissions

wait_get_serving_endpoint_not_updating(name: str, timeout: datetime.timedelta = 0:20:00, callback: Optional[Callable[[ServingEndpointDetailed], None]]) ServingEndpointDetailed