w.instance_pools: Instance Pools

class databricks.sdk.service.compute.InstancePoolsAPI

The Instance Pools API is used to create, edit, delete, and list instance pools. Pools maintain ready-to-use cloud instances, which reduces cluster start and auto-scaling times.

Databricks pools reduce cluster start and auto-scaling times by maintaining a set of idle, ready-to-use instances. When a cluster is attached to a pool, cluster nodes are created using the pool’s idle instances. If the pool has no idle instances, the pool expands by allocating a new instance from the instance provider in order to accommodate the cluster’s request. When a cluster releases an instance, it returns to the pool and is free for another cluster to use. Only clusters attached to a pool can use that pool’s idle instances.

You can specify a different pool for the driver node and worker nodes, or use the same pool for both.

Databricks does not charge DBUs while instances are idle in the pool. Instance provider billing does apply. See pricing.
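
To illustrate the behavior above, a minimal sketch of attaching a cluster to pools through the Clusters API. The pool IDs are placeholders you would obtain from create or list; instance_pool_id covers the worker nodes and driver_instance_pool_id the driver:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# placeholder pool IDs; create pools first with w.instance_pools.create(...)
worker_pool_id = '<worker-pool-id>'
driver_pool_id = '<driver-pool-id>'

# nodes are drawn from the pools' idle instances when available;
# no node_type_id is set on the cluster because the pools determine it
cluster = w.clusters.create(cluster_name='pool-backed-cluster',
                            spark_version=w.clusters.select_spark_version(latest=True),
                            instance_pool_id=worker_pool_id,
                            driver_instance_pool_id=driver_pool_id,
                            autotermination_minutes=15,
                            num_workers=1).result()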

create(instance_pool_name: str, node_type_id: str [, aws_attributes: Optional[InstancePoolAwsAttributes], azure_attributes: Optional[InstancePoolAzureAttributes], custom_tags: Optional[Dict[str, str]], disk_spec: Optional[DiskSpec], enable_elastic_disk: Optional[bool], gcp_attributes: Optional[InstancePoolGcpAttributes], idle_instance_autotermination_minutes: Optional[int], max_capacity: Optional[int], min_idle_instances: Optional[int], preloaded_docker_images: Optional[List[DockerImage]], preloaded_spark_versions: Optional[List[str]]]) → CreateInstancePoolResponse

Usage:

import time

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

smallest = w.clusters.select_node_type(local_disk=True)

created = w.instance_pools.create(instance_pool_name=f'sdk-{time.time_ns()}', node_type_id=smallest)

# cleanup
w.instance_pools.delete(instance_pool_id=created.instance_pool_id)

Create a new instance pool.

Creates a new instance pool using idle and ready-to-use cloud instances.

Parameters:
  • instance_pool_name – str Pool name requested by the user. Pool name must be unique. Length must be between 1 and 100 characters.

  • node_type_id – str This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory or compute intensive workloads. A list of available node types can be retrieved by using the :method:clusters/listNodeTypes API call.

  • aws_attributes – InstancePoolAwsAttributes (optional) Attributes related to instance pools running on Amazon Web Services. If not specified at pool creation, a set of default values will be used.

  • azure_attributes – InstancePoolAzureAttributes (optional) Attributes related to instance pools running on Azure. If not specified at pool creation, a set of default values will be used.

  • custom_tags – Dict[str,str] (optional) Additional tags for pool resources. Databricks will tag all pool resources (e.g., AWS instances and EBS volumes) with these tags in addition to default_tags. Notes:

    • Currently, Databricks allows at most 45 custom tags

  • disk_spec – DiskSpec (optional) Defines the specification of the disks that will be attached to all Spark containers.

  • enable_elastic_disk – bool (optional) Autoscaling Local Storage: when enabled, instances in this pool will dynamically acquire additional disk space when their Spark workers are running low on disk space. In AWS, this feature requires specific AWS permissions to function correctly - refer to the User Guide for more details.

  • gcp_attributes – InstancePoolGcpAttributes (optional) Attributes related to instance pools running on Google Cloud Platform. If not specified at pool creation, a set of default values will be used.

  • idle_instance_autotermination_minutes – int (optional) Automatically terminates the extra instances in the pool cache after they are inactive for this time in minutes if the min_idle_instances requirement is already met. If not set, the extra pool instances will be automatically terminated after a default timeout. If specified, the threshold must be between 0 and 10000 minutes. Users can also set this value to 0 to instantly remove idle instances from the cache, as long as the minimum cache size is still satisfied.

  • max_capacity – int (optional) Maximum number of outstanding instances to keep in the pool, including both instances used by clusters and idle instances. Clusters that require further instance provisioning will fail during upsize requests.

  • min_idle_instances – int (optional) Minimum number of idle instances to keep in the instance pool

  • preloaded_docker_images – List[DockerImage] (optional) Custom Docker images (BYOC) to preload on the pool's instances.

  • preloaded_spark_versions – List[str] (optional) A list containing at most one preloaded Spark image version for the pool. Pool-backed clusters started with the preloaded Spark version will start faster. A list of available Spark versions can be retrieved by using the :method:clusters/sparkVersions API call.

Returns:

CreateInstancePoolResponse
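
For illustration, a sketch of create that exercises some of the optional fields above; the tag value is hypothetical:

import time

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

smallest = w.clusters.select_node_type(local_disk=True)

# keep one warm instance, cap the pool at four instances,
# and reclaim extra idle instances after 20 minutes
created = w.instance_pools.create(instance_pool_name=f'sdk-{time.time_ns()}',
                                  node_type_id=smallest,
                                  min_idle_instances=1,
                                  max_capacity=4,
                                  idle_instance_autotermination_minutes=20,
                                  custom_tags={'team': 'data-eng'})

# cleanup
w.instance_pools.delete(instance_pool_id=created.instance_pool_id)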

delete(instance_pool_id: str)

Delete an instance pool.

Deletes the instance pool permanently. The idle instances in the pool are terminated asynchronously.

Parameters:

instance_pool_id – str The instance pool to be terminated.

edit(instance_pool_id: str, instance_pool_name: str, node_type_id: str [, custom_tags: Optional[Dict[str, str]], idle_instance_autotermination_minutes: Optional[int], max_capacity: Optional[int], min_idle_instances: Optional[int]])

Usage:

import time

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

smallest = w.clusters.select_node_type(local_disk=True)

created = w.instance_pools.create(instance_pool_name=f'sdk-{time.time_ns()}', node_type_id=smallest)

w.instance_pools.edit(instance_pool_id=created.instance_pool_id,
                      instance_pool_name=f'sdk-{time.time_ns()}',
                      node_type_id=smallest)

# cleanup
w.instance_pools.delete(instance_pool_id=created.instance_pool_id)

Edit an existing instance pool.

Modifies the configuration of an existing instance pool.

Parameters:
  • instance_pool_id – str Instance pool ID

  • instance_pool_name – str Pool name requested by the user. Pool name must be unique. Length must be between 1 and 100 characters.

  • node_type_id – str This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory or compute intensive workloads. A list of available node types can be retrieved by using the :method:clusters/listNodeTypes API call.

  • custom_tags – Dict[str,str] (optional) Additional tags for pool resources. Databricks will tag all pool resources (e.g., AWS instances and EBS volumes) with these tags in addition to default_tags. Notes:

    • Currently, Databricks allows at most 45 custom tags

  • idle_instance_autotermination_minutes – int (optional) Automatically terminates the extra instances in the pool cache after they are inactive for this time in minutes if the min_idle_instances requirement is already met. If not set, the extra pool instances will be automatically terminated after a default timeout. If specified, the threshold must be between 0 and 10000 minutes. Users can also set this value to 0 to instantly remove idle instances from the cache, as long as the minimum cache size is still satisfied.

  • max_capacity – int (optional) Maximum number of outstanding instances to keep in the pool, including both instances used by clusters and idle instances. Clusters that require further instance provisioning will fail during upsize requests.

  • min_idle_instances – int (optional) Minimum number of idle instances to keep in the instance pool
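
For illustration, a sketch of edit that tunes the capacity-related fields above; the pool is created first only so the example is self-contained:

import time

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

smallest = w.clusters.select_node_type(local_disk=True)

created = w.instance_pools.create(instance_pool_name=f'sdk-{time.time_ns()}', node_type_id=smallest)

# keep at least one warm instance, cap the pool at ten instances,
# and reclaim extra idle instances after 15 minutes
w.instance_pools.edit(instance_pool_id=created.instance_pool_id,
                      instance_pool_name=f'sdk-{time.time_ns()}',
                      node_type_id=smallest,
                      min_idle_instances=1,
                      max_capacity=10,
                      idle_instance_autotermination_minutes=15)

# cleanup
w.instance_pools.delete(instance_pool_id=created.instance_pool_id)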

get(instance_pool_id: str) → GetInstancePool

Usage:

import time

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

smallest = w.clusters.select_node_type(local_disk=True)

created = w.instance_pools.create(instance_pool_name=f'sdk-{time.time_ns()}', node_type_id=smallest)

by_id = w.instance_pools.get(instance_pool_id=created.instance_pool_id)

# cleanup
w.instance_pools.delete(instance_pool_id=created.instance_pool_id)

Get instance pool information.

Retrieve the information for an instance pool based on its identifier.

Parameters:

instance_pool_id – str The canonical unique identifier for the instance pool.

Returns:

GetInstancePool

get_permission_levels(instance_pool_id: str) → GetInstancePoolPermissionLevelsResponse

Get instance pool permission levels.

Gets the permission levels that a user can have on an object.

Parameters:

instance_pool_id – str The instance pool for which to get or manage permissions.

Returns:

GetInstancePoolPermissionLevelsResponse
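
A minimal sketch of inspecting the available levels; it assumes the workspace already has at least one pool:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# assumes at least one pool exists in the workspace
pool = next(iter(w.instance_pools.list()))

levels = w.instance_pools.get_permission_levels(instance_pool_id=pool.instance_pool_id)
for level in levels.permission_levels or []:
    print(level.permission_level, level.description)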

get_permissions(instance_pool_id: str) → InstancePoolPermissions

Get instance pool permissions.

Gets the permissions of an instance pool. Instance pools can inherit permissions from their root object.

Parameters:

instance_pool_id – str The instance pool for which to get or manage permissions.

Returns:

InstancePoolPermissions
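
A minimal sketch of reading a pool's permissions; it assumes the workspace already has at least one pool:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# assumes at least one pool exists in the workspace
pool = next(iter(w.instance_pools.list()))

permissions = w.instance_pools.get_permissions(instance_pool_id=pool.instance_pool_id)
for acl in permissions.access_control_list or []:
    print(acl.user_name or acl.group_name or acl.service_principal_name, acl.all_permissions)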

list() → Iterator[InstancePoolAndStats]

Usage:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

all = w.instance_pools.list()

List instance pool info.

Gets a list of instance pools with their statistics.

Returns:

Iterator over InstancePoolAndStats
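
Each element pairs the pool configuration with its usage statistics; a sketch printing the idle and used counts per pool (field names per InstancePoolAndStats and InstancePoolStats):

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

for pool in w.instance_pools.list():
    if pool.stats is not None:
        print(pool.instance_pool_name, pool.stats.idle_count, pool.stats.used_count)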

set_permissions(instance_pool_id: str [, access_control_list: Optional[List[InstancePoolAccessControlRequest]]]) → InstancePoolPermissions

Set instance pool permissions.

Sets permissions on an instance pool. Instance pools can inherit permissions from their root object.

Parameters:
  • instance_pool_id – str The instance pool for which to get or manage permissions.

  • access_control_list – List[InstancePoolAccessControlRequest] (optional)

Returns:

InstancePoolPermissions
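
A sketch granting a group CAN_ATTACH_TO on a pool; the group name is hypothetical, and the pool is taken from list only so the example is self-contained:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute

w = WorkspaceClient()

# assumes at least one pool exists; 'data-eng' is a hypothetical group name
pool = next(iter(w.instance_pools.list()))

w.instance_pools.set_permissions(
    instance_pool_id=pool.instance_pool_id,
    access_control_list=[
        compute.InstancePoolAccessControlRequest(
            group_name='data-eng',
            permission_level=compute.InstancePoolPermissionLevel.CAN_ATTACH_TO)
    ])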

update_permissions(instance_pool_id: str [, access_control_list: Optional[List[InstancePoolAccessControlRequest]]]) → InstancePoolPermissions

Update instance pool permissions.

Updates the permissions on an instance pool. Instance pools can inherit permissions from their root object.

Parameters:
  • instance_pool_id – str The instance pool for which to get or manage permissions.

  • access_control_list – List[InstancePoolAccessControlRequest] (optional)

Returns:

InstancePoolPermissions
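
A sketch granting a single user CAN_MANAGE; the user name is hypothetical, and the pool is taken from list only so the example is self-contained. Unlike set_permissions, update typically merges with the existing direct permissions rather than replacing them:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute

w = WorkspaceClient()

# assumes at least one pool exists; the user name is hypothetical
pool = next(iter(w.instance_pools.list()))

w.instance_pools.update_permissions(
    instance_pool_id=pool.instance_pool_id,
    access_control_list=[
        compute.InstancePoolAccessControlRequest(
            user_name='someone@example.com',
            permission_level=compute.InstancePoolPermissionLevel.CAN_MANAGE)
    ])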