``w.serving_endpoints``: Serving endpoints
==========================================
.. currentmodule:: databricks.sdk.service.serving

.. py:class:: ServingEndpointsExt

    The Serving Endpoints API allows you to create, update, and delete model serving endpoints.

    You can use a serving endpoint to serve models from the Databricks Model Registry or from Unity Catalog.
    Endpoints expose the underlying models as scalable REST API endpoints using serverless compute. This means
    the endpoints and associated compute resources are fully managed by Databricks and will not appear in your
    cloud account. A serving endpoint can consist of one or more MLflow models from the Databricks Model
    Registry, called served entities. A serving endpoint can have at most ten served entities. You can
    configure traffic settings to define how requests should be routed to your served entities behind an
    endpoint. Additionally, you can configure the scale of resources that should be applied to each served
    entity.

    .. py:method:: build_logs(name: str, served_model_name: str) -> BuildLogsResponse

        Retrieves the build logs associated with the provided served model.

        :param name: str
          The name of the serving endpoint that the served model belongs to. This field is required.
        :param served_model_name: str
          The name of the served model that build logs will be retrieved for. This field is required.

        :returns: :class:`BuildLogsResponse`
        

    .. py:method:: create(name: str [, ai_gateway: Optional[AiGatewayConfig], budget_policy_id: Optional[str], config: Optional[EndpointCoreConfigInput], description: Optional[str], email_notifications: Optional[EmailNotifications], rate_limits: Optional[List[RateLimit]], route_optimized: Optional[bool], tags: Optional[List[EndpointTag]]]) -> Wait[ServingEndpointDetailed]

        Create a new serving endpoint.

        :param name: str
          The name of the serving endpoint. This field is required and must be unique across a Databricks
          workspace. An endpoint name can consist of alphanumeric characters, dashes, and underscores.
        :param ai_gateway: :class:`AiGatewayConfig` (optional)
          The AI Gateway configuration for the serving endpoint. NOTE: External model, provisioned throughput,
          and pay-per-token endpoints are fully supported; agent endpoints currently only support inference
          tables.
        :param budget_policy_id: str (optional)
          The budget policy to be applied to the serving endpoint.
        :param config: :class:`EndpointCoreConfigInput` (optional)
          The core config of the serving endpoint.
        :param description: str (optional)
        :param email_notifications: :class:`EmailNotifications` (optional)
          Email notification settings.
        :param rate_limits: List[:class:`RateLimit`] (optional)
          Rate limits to be applied to the serving endpoint. NOTE: this field is deprecated, please use AI
          Gateway to manage rate limits.
        :param route_optimized: bool (optional)
          Enable route optimization for the serving endpoint.
        :param tags: List[:class:`EndpointTag`] (optional)
          Tags to be attached to the serving endpoint and automatically propagated to billing logs.

        :returns:
          Long-running operation waiter for :class:`ServingEndpointDetailed`.
          See :method:wait_get_serving_endpoint_not_updating for more details.
        

    .. py:method:: create_and_wait(name: str [, ai_gateway: Optional[AiGatewayConfig], budget_policy_id: Optional[str], config: Optional[EndpointCoreConfigInput], description: Optional[str], email_notifications: Optional[EmailNotifications], rate_limits: Optional[List[RateLimit]], route_optimized: Optional[bool], tags: Optional[List[EndpointTag]], timeout: datetime.timedelta = 0:20:00]) -> ServingEndpointDetailed


    .. py:method:: create_provisioned_throughput_endpoint(name: str, config: PtEndpointCoreConfig [, ai_gateway: Optional[AiGatewayConfig], budget_policy_id: Optional[str], email_notifications: Optional[EmailNotifications], tags: Optional[List[EndpointTag]]]) -> Wait[ServingEndpointDetailed]

        Create a new PT serving endpoint.

        :param name: str
          The name of the serving endpoint. This field is required and must be unique across a Databricks
          workspace. An endpoint name can consist of alphanumeric characters, dashes, and underscores.
        :param config: :class:`PtEndpointCoreConfig`
          The core config of the serving endpoint.
        :param ai_gateway: :class:`AiGatewayConfig` (optional)
          The AI Gateway configuration for the serving endpoint.
        :param budget_policy_id: str (optional)
          The budget policy associated with the endpoint.
        :param email_notifications: :class:`EmailNotifications` (optional)
          Email notification settings.
        :param tags: List[:class:`EndpointTag`] (optional)
          Tags to be attached to the serving endpoint and automatically propagated to billing logs.

        :returns:
          Long-running operation waiter for :class:`ServingEndpointDetailed`.
          See :method:wait_get_serving_endpoint_not_updating for more details.
        

    .. py:method:: create_provisioned_throughput_endpoint_and_wait(name: str, config: PtEndpointCoreConfig [, ai_gateway: Optional[AiGatewayConfig], budget_policy_id: Optional[str], email_notifications: Optional[EmailNotifications], tags: Optional[List[EndpointTag]], timeout: datetime.timedelta = 0:20:00]) -> ServingEndpointDetailed


    .. py:method:: delete(name: str)

        Delete a serving endpoint.

        :param name: str


    .. py:method:: export_metrics(name: str) -> ExportMetricsResponse

        Retrieves the metrics associated with the provided serving endpoint in either Prometheus or
        OpenMetrics exposition format.

        :param name: str
          The name of the serving endpoint to retrieve metrics for. This field is required.

        :returns: :class:`ExportMetricsResponse`
        

    .. py:method:: get(name: str) -> ServingEndpointDetailed

        Retrieves the details for a single serving endpoint.

        :param name: str
          The name of the serving endpoint. This field is required.

        :returns: :class:`ServingEndpointDetailed`
        

    .. py:method:: get_langchain_chat_open_ai_client(model)

        Create a LangChain ChatOpenAI client configured for Databricks Model Serving.

        .. deprecated::
            This method is deprecated. Please install the `databricks-langchain` package
            and use `from databricks_langchain import ChatDatabricks` instead.
            See https://api-docs.databricks.com/python/databricks-ai-bridge/latest/databricks_langchain.html for more information.
        

    .. py:method:: get_open_ai_client()

        Create an OpenAI client configured for Databricks Model Serving.

        .. deprecated::
            This method is deprecated. Please install the `databricks-openai` package
            and use `from databricks_openai import DatabricksOpenAI` instead.
            See https://api-docs.databricks.com/python/databricks-ai-bridge/latest/databricks_openai.html for more information.

        Returns an OpenAI client instance that is pre-configured to send requests to
        Databricks Model Serving endpoints. The client uses Databricks authentication
        to query endpoints within the workspace associated with the current WorkspaceClient
        instance.

        Args:
            **kwargs: Additional parameters to pass to the OpenAI client constructor.
                Common parameters include:
                - timeout (float): Request timeout in seconds (e.g., 30.0)
                - max_retries (int): Maximum number of retries for failed requests (e.g., 3)
                - default_headers (dict): Additional headers to include with requests
                - default_query (dict): Additional query parameters to include with requests

                Any parameter accepted by the OpenAI client constructor can be passed here,
                except for the following parameters which are reserved for Databricks integration:
                base_url, api_key, http_client

        Returns:
            OpenAI: An OpenAI client instance configured for Databricks Model Serving.

        Raises:
            ImportError: If the OpenAI library is not installed.
            ValueError: If any reserved Databricks parameters are provided in kwargs.

        Example:
            >>> client = workspace_client.serving_endpoints.get_open_ai_client()
            >>> # With custom timeout and retries
            >>> client = workspace_client.serving_endpoints.get_open_ai_client(
            ...     timeout=30.0,
            ...     max_retries=5
            ... )
        

    .. py:method:: get_open_api(name: str) -> GetOpenApiResponse

        Get the query schema of the serving endpoint in OpenAPI format. The schema contains information for
        the supported paths, input and output format and datatypes.

        :param name: str
          The name of the serving endpoint that the served model belongs to. This field is required.

        :returns: :class:`GetOpenApiResponse`
        

    .. py:method:: get_permission_levels(serving_endpoint_id: str) -> GetServingEndpointPermissionLevelsResponse

        Gets the permission levels that a user can have on an object.

        :param serving_endpoint_id: str
          The serving endpoint for which to get or manage permissions.

        :returns: :class:`GetServingEndpointPermissionLevelsResponse`
        

    .. py:method:: get_permissions(serving_endpoint_id: str) -> ServingEndpointPermissions

        Gets the permissions of a serving endpoint. Serving endpoints can inherit permissions from their root
        object.

        :param serving_endpoint_id: str
          The serving endpoint for which to get or manage permissions.

        :returns: :class:`ServingEndpointPermissions`
        

    .. py:method:: http_request(conn: str, method: ExternalFunctionRequestHttpMethod, path: str [, headers: typing.Dict[str, str], json: typing.Dict[str, str], params: typing.Dict[str, str]]) -> Response

        Make external services call using the credentials stored in UC Connection.
        **NOTE:** Experimental: This API may change or be removed in a future release without warning.
        :param conn: str
          The connection name to use. This is required to identify the external connection.
        :param method: :class:`ExternalFunctionRequestHttpMethod`
          The HTTP method to use (e.g., 'GET', 'POST'). This is required.
        :param path: str
          The relative path for the API endpoint. This is required.
        :param headers: Dict[str,str] (optional)
          Additional headers for the request. If not provided, only auth headers from connections would be
          passed.
        :param json: Dict[str,str] (optional)
          JSON payload for the request.
        :param params: Dict[str,str] (optional)
          Query parameters for the request.
        :returns: :class:`Response`
        

    .. py:method:: list() -> Iterator[ServingEndpoint]

        Get all serving endpoints.


        :returns: Iterator over :class:`ServingEndpoint`
        

    .. py:method:: logs(name: str, served_model_name: str) -> ServerLogsResponse

        Retrieves the service logs associated with the provided served model.

        :param name: str
          The name of the serving endpoint that the served model belongs to. This field is required.
        :param served_model_name: str
          The name of the served model that logs will be retrieved for. This field is required.

        :returns: :class:`ServerLogsResponse`
        

    .. py:method:: patch(name: str [, add_tags: Optional[List[EndpointTag]], delete_tags: Optional[List[str]]]) -> EndpointTags

        Used to batch add and delete tags from a serving endpoint with a single API call.

        :param name: str
          The name of the serving endpoint who's tags to patch. This field is required.
        :param add_tags: List[:class:`EndpointTag`] (optional)
          List of endpoint tags to add
        :param delete_tags: List[str] (optional)
          List of tag keys to delete

        :returns: :class:`EndpointTags`
        

    .. py:method:: put(name: str [, rate_limits: Optional[List[RateLimit]]]) -> PutResponse

        Deprecated: Please use AI Gateway to manage rate limits instead.

        :param name: str
          The name of the serving endpoint whose rate limits are being updated. This field is required.
        :param rate_limits: List[:class:`RateLimit`] (optional)
          The list of endpoint rate limits.

        :returns: :class:`PutResponse`
        

    .. py:method:: put_ai_gateway(name: str [, fallback_config: Optional[FallbackConfig], guardrails: Optional[AiGatewayGuardrails], inference_table_config: Optional[AiGatewayInferenceTableConfig], rate_limits: Optional[List[AiGatewayRateLimit]], usage_tracking_config: Optional[AiGatewayUsageTrackingConfig]]) -> PutAiGatewayResponse

        Used to update the AI Gateway of a serving endpoint. NOTE: External model, provisioned throughput, and
        pay-per-token endpoints are fully supported; agent endpoints currently only support inference tables.

        :param name: str
          The name of the serving endpoint whose AI Gateway is being updated. This field is required.
        :param fallback_config: :class:`FallbackConfig` (optional)
          Configuration for traffic fallback which auto fallbacks to other served entities if the request to a
          served entity fails with certain error codes, to increase availability.
        :param guardrails: :class:`AiGatewayGuardrails` (optional)
          Configuration for AI Guardrails to prevent unwanted data and unsafe data in requests and responses.
        :param inference_table_config: :class:`AiGatewayInferenceTableConfig` (optional)
          Configuration for payload logging using inference tables. Use these tables to monitor and audit data
          being sent to and received from model APIs and to improve model quality.
        :param rate_limits: List[:class:`AiGatewayRateLimit`] (optional)
          Configuration for rate limits which can be set to limit endpoint traffic.
        :param usage_tracking_config: :class:`AiGatewayUsageTrackingConfig` (optional)
          Configuration to enable usage tracking using system tables. These tables allow you to monitor
          operational usage on endpoints and their associated costs.

        :returns: :class:`PutAiGatewayResponse`
        

    .. py:method:: query(name: str [, client_request_id: Optional[str], dataframe_records: Optional[List[Any]], dataframe_split: Optional[DataframeSplitInput], extra_params: Optional[Dict[str, str]], input: Optional[Any], inputs: Optional[Any], instances: Optional[List[Any]], max_tokens: Optional[int], messages: Optional[List[ChatMessage]], n: Optional[int], prompt: Optional[Any], stop: Optional[List[str]], stream: Optional[bool], temperature: Optional[float], usage_context: Optional[Dict[str, str]]]) -> QueryEndpointResponse

        Query a serving endpoint

        :param name: str
          The name of the serving endpoint. This field is required and is provided via the path parameter.
        :param client_request_id: str (optional)
          Optional user-provided request identifier that will be recorded in the inference table and the usage
          tracking table.
        :param dataframe_records: List[Any] (optional)
          Pandas Dataframe input in the records orientation.
        :param dataframe_split: :class:`DataframeSplitInput` (optional)
          Pandas Dataframe input in the split orientation.
        :param extra_params: Dict[str,str] (optional)
          The extra parameters field used ONLY for __completions, chat,__ and __embeddings external &
          foundation model__ serving endpoints. This is a map of strings and should only be used with other
          external/foundation model query fields.
        :param input: Any (optional)
          The input string (or array of strings) field used ONLY for __embeddings external & foundation
          model__ serving endpoints and is the only field (along with extra_params if needed) used by
          embeddings queries.
        :param inputs: Any (optional)
          Tensor-based input in columnar format.
        :param instances: List[Any] (optional)
          Tensor-based input in row format.
        :param max_tokens: int (optional)
          The max tokens field used ONLY for __completions__ and __chat external & foundation model__ serving
          endpoints. This is an integer and should only be used with other chat/completions query fields.
        :param messages: List[:class:`ChatMessage`] (optional)
          The messages field used ONLY for __chat external & foundation model__ serving endpoints. This is an
          array of ChatMessage objects and should only be used with other chat query fields.
        :param n: int (optional)
          The n (number of candidates) field used ONLY for __completions__ and __chat external & foundation
          model__ serving endpoints. This is an integer between 1 and 5 with a default of 1 and should only be
          used with other chat/completions query fields.
        :param prompt: Any (optional)
          The prompt string (or array of strings) field used ONLY for __completions external & foundation
          model__ serving endpoints and should only be used with other completions query fields.
        :param stop: List[str] (optional)
          The stop sequences field used ONLY for __completions__ and __chat external & foundation model__
          serving endpoints. This is a list of strings and should only be used with other chat/completions
          query fields.
        :param stream: bool (optional)
          The stream field used ONLY for __completions__ and __chat external & foundation model__ serving
          endpoints. This is a boolean defaulting to false and should only be used with other chat/completions
          query fields.
        :param temperature: float (optional)
          The temperature field used ONLY for __completions__ and __chat external & foundation model__ serving
          endpoints. This is a float between 0.0 and 2.0 with a default of 1.0 and should only be used with
          other chat/completions query fields.
        :param usage_context: Dict[str,str] (optional)
          Optional user-provided context that will be recorded in the usage tracking table.

        :returns: :class:`QueryEndpointResponse`
        

    .. py:method:: set_permissions(serving_endpoint_id: str [, access_control_list: Optional[List[ServingEndpointAccessControlRequest]]]) -> ServingEndpointPermissions

        Sets permissions on an object, replacing existing permissions if they exist. Deletes all direct
        permissions if none are specified. Objects can inherit permissions from their root object.

        :param serving_endpoint_id: str
          The serving endpoint for which to get or manage permissions.
        :param access_control_list: List[:class:`ServingEndpointAccessControlRequest`] (optional)

        :returns: :class:`ServingEndpointPermissions`
        

    .. py:method:: update_config(name: str [, auto_capture_config: Optional[AutoCaptureConfigInput], served_entities: Optional[List[ServedEntityInput]], served_models: Optional[List[ServedModelInput]], traffic_config: Optional[TrafficConfig]]) -> Wait[ServingEndpointDetailed]

        Updates any combination of the serving endpoint's served entities, the compute configuration of those
        served entities, and the endpoint's traffic config. An endpoint that already has an update in progress
        can not be updated until the current update completes or fails.

        :param name: str
          The name of the serving endpoint to update. This field is required.
        :param auto_capture_config: :class:`AutoCaptureConfigInput` (optional)
          Configuration for legacy Inference Tables which automatically log requests and responses to Unity
          Catalog. Deprecated: please use AI Gateway inference tables instead. See
          https://docs.databricks.com/aws/en/ai-gateway/inference-tables.
        :param served_entities: List[:class:`ServedEntityInput`] (optional)
          The list of served entities under the serving endpoint config.
        :param served_models: List[:class:`ServedModelInput`] (optional)
          (Deprecated, use served_entities instead) The list of served models under the serving endpoint
          config.
        :param traffic_config: :class:`TrafficConfig` (optional)
          The traffic configuration associated with the serving endpoint config.

        :returns:
          Long-running operation waiter for :class:`ServingEndpointDetailed`.
          See :method:wait_get_serving_endpoint_not_updating for more details.
        

    .. py:method:: update_config_and_wait(name: str [, auto_capture_config: Optional[AutoCaptureConfigInput], served_entities: Optional[List[ServedEntityInput]], served_models: Optional[List[ServedModelInput]], traffic_config: Optional[TrafficConfig], timeout: datetime.timedelta = 0:20:00]) -> ServingEndpointDetailed


    .. py:method:: update_notifications(name: str [, email_notifications: Optional[EmailNotifications]]) -> UpdateInferenceEndpointNotificationsResponse

        Updates the email and webhook notification settings for an endpoint.

        :param name: str
          The name of the serving endpoint whose notifications are being updated. This field is required.
        :param email_notifications: :class:`EmailNotifications` (optional)
          The email notification settings to update. Specify email addresses to notify when endpoint state
          changes occur.

        :returns: :class:`UpdateInferenceEndpointNotificationsResponse`
        

    .. py:method:: update_permissions(serving_endpoint_id: str [, access_control_list: Optional[List[ServingEndpointAccessControlRequest]]]) -> ServingEndpointPermissions

        Updates the permissions on a serving endpoint. Serving endpoints can inherit permissions from their
        root object.

        :param serving_endpoint_id: str
          The serving endpoint for which to get or manage permissions.
        :param access_control_list: List[:class:`ServingEndpointAccessControlRequest`] (optional)

        :returns: :class:`ServingEndpointPermissions`
        

    .. py:method:: update_provisioned_throughput_endpoint_config(name: str, config: PtEndpointCoreConfig) -> Wait[ServingEndpointDetailed]

        Updates any combination of the pt endpoint's served entities, the compute configuration of those
        served entities, and the endpoint's traffic config. Updates are instantaneous and endpoint should be
        updated instantly

        :param name: str
          The name of the pt endpoint to update. This field is required.
        :param config: :class:`PtEndpointCoreConfig`

        :returns:
          Long-running operation waiter for :class:`ServingEndpointDetailed`.
          See :method:wait_get_serving_endpoint_not_updating for more details.
        

    .. py:method:: update_provisioned_throughput_endpoint_config_and_wait(name: str, config: PtEndpointCoreConfig, timeout: datetime.timedelta = 0:20:00) -> ServingEndpointDetailed


    .. py:method:: wait_get_serving_endpoint_not_updating(name: str, timeout: datetime.timedelta = 0:20:00, callback: Optional[Callable[[ServingEndpointDetailed], None]]) -> ServingEndpointDetailed