w.instance_pools: Instance Pools
- class databricks.sdk.service.compute.InstancePoolsAPI
The Instance Pools API is used to create, edit, delete, and list instance pools. Pools keep ready-to-use cloud instances available, which reduces cluster start and auto-scaling times.
Databricks pools reduce cluster start and auto-scaling times by maintaining a set of idle, ready-to-use instances. When a cluster is attached to a pool, cluster nodes are created using the pool’s idle instances. If the pool has no idle instances, the pool expands by allocating a new instance from the instance provider in order to accommodate the cluster’s request. When a cluster releases an instance, it returns to the pool and is free for another cluster to use. Only clusters attached to a pool can use that pool’s idle instances.
You can specify a different pool for the driver node and worker nodes, or use the same pool for both.
Databricks does not charge DBUs while instances are idle in the pool. Instance provider billing does apply. See pricing.
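To illustrate the choice between one pool for both node roles and separate pools for driver and workers, here is a minimal sketch that builds the pool-related fields of a cluster spec. It assumes the `instance_pool_id` and `driver_instance_pool_id` fields of the Clusters API; the helper name `pool_cluster_kwargs` and the pool IDs are hypothetical.

```python
from typing import Dict, Optional


def pool_cluster_kwargs(worker_pool_id: str, driver_pool_id: Optional[str] = None) -> Dict[str, str]:
    """Build the pool-related fields of a cluster spec (hypothetical helper)."""
    if not worker_pool_id:
        raise ValueError("worker_pool_id must be a non-empty pool ID")
    kwargs = {"instance_pool_id": worker_pool_id}
    # Only name a driver pool when it differs from the worker pool; otherwise
    # the driver is assumed to be drawn from the same pool as the workers.
    if driver_pool_id and driver_pool_id != worker_pool_id:
        kwargs["driver_instance_pool_id"] = driver_pool_id
    return kwargs


# Placeholder pool IDs, for illustration only.
same_pool = pool_cluster_kwargs("pool-workers")
split_pools = pool_cluster_kwargs("pool-workers", driver_pool_id="pool-driver")
```

The resulting dict could be passed alongside the other cluster settings, for example `w.clusters.create(..., **split_pools)`.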
- create(instance_pool_name: str, node_type_id: str [, aws_attributes: Optional[InstancePoolAwsAttributes], azure_attributes: Optional[InstancePoolAzureAttributes], custom_tags: Optional[Dict[str, str]], disk_spec: Optional[DiskSpec], enable_elastic_disk: Optional[bool], gcp_attributes: Optional[InstancePoolGcpAttributes], idle_instance_autotermination_minutes: Optional[int], max_capacity: Optional[int], min_idle_instances: Optional[int], node_type_flexibility: Optional[NodeTypeFlexibility], preloaded_docker_images: Optional[List[DockerImage]], preloaded_spark_versions: Optional[List[str]], remote_disk_throughput: Optional[int], total_initial_remote_disk_size: Optional[int]]) → CreateInstancePoolResponse
Usage:
    import time

    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()

    smallest = w.clusters.select_node_type(local_disk=True)

    created = w.instance_pools.create(instance_pool_name=f"sdk-{time.time_ns()}", node_type_id=smallest)

    # cleanup
    w.instance_pools.delete(instance_pool_id=created.instance_pool_id)
Creates a new instance pool using idle and ready-to-use cloud instances.
- Parameters:
instance_pool_name – str Pool name requested by the user. Pool name must be unique. Length must be between 1 and 100 characters.
node_type_id – str This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory or compute intensive workloads. A list of available node types can be retrieved by using the :method:clusters/listNodeTypes API call.
aws_attributes – InstancePoolAwsAttributes (optional) Attributes related to instance pools running on Amazon Web Services. If not specified at pool creation, a set of default values will be used.
azure_attributes – InstancePoolAzureAttributes (optional) Attributes related to instance pools running on Azure. If not specified at pool creation, a set of default values will be used.
custom_tags – Dict[str,str] (optional) Additional tags for pool resources. Databricks will tag all pool resources (e.g., AWS instances and EBS volumes) with these tags in addition to default_tags. Notes:
Currently, Databricks allows at most 45 custom tags.
disk_spec – DiskSpec (optional) Defines the specification of the disks that will be attached to all Spark containers.
enable_elastic_disk – bool (optional) Autoscaling Local Storage: when enabled, the instances in this pool dynamically acquire additional disk space when their Spark workers are running low on disk space. In AWS, this feature requires specific AWS permissions to function correctly; refer to the User Guide for more details.
gcp_attributes – InstancePoolGcpAttributes (optional) Attributes related to instance pools running on Google Cloud Platform. If not specified at pool creation, a set of default values will be used.
idle_instance_autotermination_minutes – int (optional) Automatically terminates the extra instances in the pool cache after they have been inactive for this many minutes, provided the min_idle_instances requirement is already met. If not set, the extra pool instances are automatically terminated after a default timeout. If specified, the threshold must be between 0 and 10000 minutes. Set this value to 0 to remove idle instances from the cache immediately, as long as the minimum cache size is still maintained.
max_capacity – int (optional) Maximum number of outstanding instances to keep in the pool, including both instances used by clusters and idle instances. Clusters that require further instance provisioning will fail during upsize requests.
min_idle_instances – int (optional) Minimum number of idle instances to keep in the instance pool.
node_type_flexibility – NodeTypeFlexibility (optional) Flexible node type configuration for the pool.
preloaded_docker_images – List[DockerImage] (optional) Custom Docker Image BYOC.
preloaded_spark_versions – List[str] (optional) A list containing at most one preloaded Spark image version for the pool. Pool-backed clusters started with the preloaded Spark version will start faster. A list of available Spark versions can be retrieved by using the :method:clusters/sparkVersions API call.
remote_disk_throughput – int (optional) If set, the configurable throughput (in Mb/s) of the remote disk. Currently only supported for GCP HYPERDISK_BALANCED types.
total_initial_remote_disk_size – int (optional) If set, the total initial volume size (in GB) of the remote disks. Currently only supported for GCP HYPERDISK_BALANCED types.
- Returns:
CreateInstancePoolResponse
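The documented limits on the create parameters can be checked locally before any request is sent. The following is a minimal sketch, not part of the SDK: `build_create_args` is a hypothetical helper, and the check that `min_idle_instances` does not exceed `max_capacity` is an added sanity check rather than a rule stated in this reference.

```python
from typing import Any, Dict, List, Optional


def build_create_args(
    instance_pool_name: str,
    node_type_id: str,
    custom_tags: Optional[Dict[str, str]] = None,
    idle_instance_autotermination_minutes: Optional[int] = None,
    min_idle_instances: Optional[int] = None,
    max_capacity: Optional[int] = None,
    preloaded_spark_versions: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Validate documented limits and return keyword arguments for create()."""
    if not 1 <= len(instance_pool_name) <= 100:
        raise ValueError("instance_pool_name must be 1 to 100 characters long")
    if custom_tags is not None and len(custom_tags) > 45:
        raise ValueError("at most 45 custom tags are allowed")
    minutes = idle_instance_autotermination_minutes
    if minutes is not None and not 0 <= minutes <= 10000:
        raise ValueError("idle_instance_autotermination_minutes must be between 0 and 10000")
    if preloaded_spark_versions is not None and len(preloaded_spark_versions) > 1:
        raise ValueError("preloaded_spark_versions may contain at most one version")
    # Added sanity check (an assumption, not a documented rule): max_capacity
    # counts idle instances too, so a larger minimum could never be satisfied.
    if min_idle_instances is not None and max_capacity is not None and min_idle_instances > max_capacity:
        raise ValueError("min_idle_instances cannot exceed max_capacity")
    args = {
        "instance_pool_name": instance_pool_name,
        "node_type_id": node_type_id,
        "custom_tags": custom_tags,
        "idle_instance_autotermination_minutes": minutes,
        "min_idle_instances": min_idle_instances,
        "max_capacity": max_capacity,
        "preloaded_spark_versions": preloaded_spark_versions,
    }
    # Drop unset options so the service defaults apply; 0 is a valid value and is kept.
    return {key: value for key, value in args.items() if value is not None}
```

The returned dict is meant to be unpacked into the call, for example `w.instance_pools.create(**build_create_args(...))`.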
- delete(instance_pool_id: str)
Deletes the instance pool permanently. The idle instances in the pool are terminated asynchronously.
- Parameters:
instance_pool_id – str The instance pool to be terminated.
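The usage examples in this reference end with a manual cleanup call. One way to make that cleanup robust is a context manager that deletes the pool even when the body raises. This is a sketch under the assumption that `w` is a `WorkspaceClient`; `temporary_pool` is a hypothetical helper, not an SDK function.

```python
from contextlib import contextmanager


@contextmanager
def temporary_pool(w, **create_kwargs):
    """Create a pool, yield its ID, and always delete it afterwards."""
    created = w.instance_pools.create(**create_kwargs)
    try:
        yield created.instance_pool_id
    finally:
        # Deletion is permanent; idle instances are terminated asynchronously.
        w.instance_pools.delete(instance_pool_id=created.instance_pool_id)
```

A caller would write `with temporary_pool(w, instance_pool_name=..., node_type_id=...) as pool_id:` and use `pool_id` inside the block.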
- edit(instance_pool_id: str, instance_pool_name: str, node_type_id: str [, custom_tags: Optional[Dict[str, str]], idle_instance_autotermination_minutes: Optional[int], max_capacity: Optional[int], min_idle_instances: Optional[int], remote_disk_throughput: Optional[int], total_initial_remote_disk_size: Optional[int]])
Usage:
    import time

    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()

    smallest = w.clusters.select_node_type(local_disk=True)

    created = w.instance_pools.create(instance_pool_name=f"sdk-{time.time_ns()}", node_type_id=smallest)

    w.instance_pools.edit(
        instance_pool_id=created.instance_pool_id,
        instance_pool_name=f"sdk-{time.time_ns()}",
        node_type_id=smallest,
    )

    # cleanup
    w.instance_pools.delete(instance_pool_id=created.instance_pool_id)
Modifies the configuration of an existing instance pool.
- Parameters:
instance_pool_id – str Instance pool ID
instance_pool_name – str Pool name requested by the user. Pool name must be unique. Length must be between 1 and 100 characters.
node_type_id – str This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory or compute intensive workloads. A list of available node types can be retrieved by using the :method:clusters/listNodeTypes API call.
custom_tags – Dict[str,str] (optional) Additional tags for pool resources. Databricks will tag all pool resources (e.g., AWS instances and EBS volumes) with these tags in addition to default_tags. Notes:
Currently, Databricks allows at most 45 custom tags.
idle_instance_autotermination_minutes – int (optional) Automatically terminates the extra instances in the pool cache after they have been inactive for this many minutes, provided the min_idle_instances requirement is already met. If not set, the extra pool instances are automatically terminated after a default timeout. If specified, the threshold must be between 0 and 10000 minutes. Set this value to 0 to remove idle instances from the cache immediately, as long as the minimum cache size is still maintained.
max_capacity – int (optional) Maximum number of outstanding instances to keep in the pool, including both instances used by clusters and idle instances. Clusters that require further instance provisioning will fail during upsize requests.
min_idle_instances – int (optional) Minimum number of idle instances to keep in the instance pool.
remote_disk_throughput – int (optional) If set, the configurable throughput (in Mb/s) of the remote disk. Currently only supported for GCP HYPERDISK_BALANCED types.
total_initial_remote_disk_size – int (optional) If set, the total initial volume size (in GB) of the remote disks. Currently only supported for GCP HYPERDISK_BALANCED types.
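Because `edit` requires `instance_pool_name` and `node_type_id` even when only one setting changes, a small wrapper can read the current configuration first and send it back with the changes applied. This reference does not say whether omitted optional fields are reset, so the sketch conservatively carries them over. `edit_pool` is a hypothetical helper, and it assumes the object returned by `get` exposes attributes named after the edit parameters.

```python
_CARRIED_OVER = (
    "custom_tags",
    "idle_instance_autotermination_minutes",
    "max_capacity",
    "min_idle_instances",
)


def edit_pool(w, instance_pool_id, **changes):
    """Apply `changes` to a pool while resending its current settings."""
    current = w.instance_pools.get(instance_pool_id=instance_pool_id)
    args = {
        "instance_pool_id": instance_pool_id,
        # Required by edit() even when unchanged.
        "instance_pool_name": current.instance_pool_name,
        "node_type_id": current.node_type_id,
    }
    for field in _CARRIED_OVER:
        value = getattr(current, field, None)
        if value is not None:
            args[field] = value
    # Explicit changes win over the values read from the service.
    args.update(changes)
    w.instance_pools.edit(**args)
    return args
```

For example, `edit_pool(w, pool_id, max_capacity=20)` would raise the capacity limit and leave the other carried-over settings as they were.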
- get(instance_pool_id: str) → GetInstancePool
Usage:
    import time

    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()

    smallest = w.clusters.select_node_type(local_disk=True)

    created = w.instance_pools.create(instance_pool_name=f"sdk-{time.time_ns()}", node_type_id=smallest)

    by_id = w.instance_pools.get(instance_pool_id=created.instance_pool_id)

    # cleanup
    w.instance_pools.delete(instance_pool_id=created.instance_pool_id)
Retrieve the information for an instance pool based on its identifier.
- Parameters:
instance_pool_id – str The canonical unique identifier for the instance pool.
- Returns:
GetInstancePool
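A pool description can be used to estimate how many more instances the pool may hold before cluster upsize requests start to fail. The sketch below assumes the returned object exposes `max_capacity` and a `stats` object with `used_count` and `idle_count`; `remaining_capacity` is a hypothetical helper, and it ignores instances that are still pending.

```python
from typing import Optional


def remaining_capacity(pool) -> Optional[int]:
    """Return how many more instances fit under max_capacity, or None if unlimited."""
    max_capacity = getattr(pool, "max_capacity", None)
    if max_capacity is None:
        return None  # no limit configured on this pool
    stats = getattr(pool, "stats", None)
    used = (getattr(stats, "used_count", None) or 0) if stats else 0
    idle = (getattr(stats, "idle_count", None) or 0) if stats else 0
    # max_capacity counts both in-use and idle instances.
    return max(max_capacity - used - idle, 0)
```

With a real client this could be called as `remaining_capacity(w.instance_pools.get(instance_pool_id=pool_id))`.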
- get_permission_levels(instance_pool_id: str) → GetInstancePoolPermissionLevelsResponse
Gets the permission levels that a user can have on an object.
- Parameters:
instance_pool_id – str The instance pool for which to get or manage permissions.
- Returns:
GetInstancePoolPermissionLevelsResponse
- get_permissions(instance_pool_id: str) → InstancePoolPermissions
Gets the permissions of an instance pool. Instance pools can inherit permissions from their root object.
- Parameters:
instance_pool_id – str The instance pool for which to get or manage permissions.
- Returns:
InstancePoolPermissions
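To inspect who holds which permission, and whether it is direct or inherited, the returned object can be flattened into rows. This is a sketch that assumes the result has an `access_control_list` whose entries carry a principal (`user_name`, `group_name`, or `service_principal_name`) and an `all_permissions` list with `permission_level` and `inherited`; `permission_rows` is a hypothetical helper.

```python
def permission_rows(permissions):
    """Flatten a permissions object into (principal, level, inherited) tuples."""
    rows = []
    for entry in getattr(permissions, "access_control_list", None) or []:
        # Exactly one of these is expected to be set per entry.
        principal = (
            getattr(entry, "user_name", None)
            or getattr(entry, "group_name", None)
            or getattr(entry, "service_principal_name", None)
        )
        for perm in getattr(entry, "all_permissions", None) or []:
            rows.append((principal, perm.permission_level, bool(perm.inherited)))
    return rows
```

With a real client, the input would come from `w.instance_pools.get_permissions(instance_pool_id=pool_id)`.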
- list() → Iterator[InstancePoolAndStats]
Usage:
    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()

    # named all_pools to avoid shadowing the built-in all()
    all_pools = w.instance_pools.list()
Gets a list of instance pools with their statistics.
- Returns:
Iterator over InstancePoolAndStats
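Since pool names are unique, the listing can be used to look a pool up by name or to summarize idle capacity. The sketch below assumes each listed item exposes `instance_pool_name` and a `stats` object with `idle_count`; both helper names are hypothetical.

```python
def find_pool_by_name(pools, name):
    """Return the first pool with the given name, or None if absent."""
    for pool in pools:
        if pool.instance_pool_name == name:
            return pool
    return None


def idle_by_pool(pools):
    """Map each pool name to its current number of idle instances."""
    summary = {}
    for pool in pools:
        stats = getattr(pool, "stats", None)
        summary[pool.instance_pool_name] = (getattr(stats, "idle_count", None) or 0) if stats else 0
    return summary
```

Because `list()` returns an iterator, materialize it once with `pools = list(w.instance_pools.list())` before passing it to more than one helper.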
- set_permissions(instance_pool_id: str [, access_control_list: Optional[List[InstancePoolAccessControlRequest]]]) → InstancePoolPermissions
Sets permissions on an object, replacing existing permissions if they exist. Deletes all direct permissions if none are specified. Objects can inherit permissions from their root object.
- Parameters:
instance_pool_id – str The instance pool for which to get or manage permissions.
access_control_list – List[InstancePoolAccessControlRequest] (optional)
- Returns:
InstancePoolPermissions
- update_permissions(instance_pool_id: str [, access_control_list: Optional[List[InstancePoolAccessControlRequest]]]) → InstancePoolPermissions
Updates the permissions on an instance pool. Instance pools can inherit permissions from their root object.
- Parameters:
instance_pool_id – str The instance pool for which to get or manage permissions.
access_control_list – List[InstancePoolAccessControlRequest] (optional)
- Returns:
InstancePoolPermissions
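The difference between `set_permissions` and `update_permissions` can be pictured with a toy model over plain dicts. This is not the service implementation: the replace behavior of set follows the description above, while the merge behavior of update is an assumption about how updates are applied. Inherited permissions are not modeled, and the principals are placeholders.

```python
from typing import Dict, Optional

Acl = Dict[str, str]  # principal -> permission level (direct permissions only)


def apply_set(existing: Acl, requested: Optional[Acl]) -> Acl:
    """Model of set_permissions: the request replaces all direct permissions."""
    # An empty or missing request deletes every direct permission.
    return dict(requested or {})


def apply_update(existing: Acl, requested: Optional[Acl]) -> Acl:
    """Model of update_permissions (assumed merge): only listed principals change."""
    merged = dict(existing)
    merged.update(requested or {})
    return merged


direct = {"alice@example.com": "CAN_MANAGE", "data-eng": "CAN_ATTACH_TO"}
change = {"data-eng": "CAN_MANAGE"}

after_set = apply_set(direct, change)        # alice's direct permission is dropped
after_update = apply_update(direct, change)  # alice's direct permission is kept
after_clear = apply_set(direct, None)        # all direct permissions removed
```

Under this model, `set_permissions` suits declaring the complete access list, and `update_permissions` suits adjusting a single principal.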