w.clusters: Clusters

class databricks.sdk.service.compute.ClustersExt

The Clusters API allows you to create, start, edit, list, terminate, and delete clusters.

Databricks maps cluster node instance types to compute units known as DBUs. See the instance type pricing page for a list of the supported instance types and their corresponding DBUs.

A Databricks cluster is a set of computation resources and configurations on which you run data engineering, data science, and data analytics workloads, such as production ETL pipelines, streaming analytics, ad-hoc analytics, and machine learning.

You run these workloads as a set of commands in a notebook or as an automated job. Databricks makes a distinction between all-purpose clusters and job clusters. You use all-purpose clusters to analyze data collaboratively using interactive notebooks. You use job clusters to run fast and robust automated jobs.

You can create an all-purpose cluster using the UI, CLI, or REST API. You can manually terminate and restart an all-purpose cluster. Multiple users can share such clusters to do collaborative interactive analysis.

IMPORTANT: Databricks retains cluster configuration information for up to 200 all-purpose clusters terminated in the last 30 days and up to 30 job clusters recently terminated by the job scheduler. To keep an all-purpose cluster configuration even after it has been terminated for more than 30 days, an administrator can pin a cluster to the cluster list.

change_owner(cluster_id: str, owner_username: str)

Usage:

import os
import time

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

latest = w.clusters.select_spark_version(latest=True, long_term_support=True)

cluster_name = f'sdk-{time.time_ns()}'

other_owner = w.users.create(user_name=f'sdk-{time.time_ns()}@example.com')

clstr = w.clusters.create(cluster_name=cluster_name,
                          spark_version=latest,
                          instance_pool_id=os.environ["TEST_INSTANCE_POOL_ID"],
                          autotermination_minutes=15,
                          num_workers=1).result()

w.clusters.change_owner(cluster_id=clstr.cluster_id, owner_username=other_owner.user_name)

# cleanup
w.users.delete(id=other_owner.id)
w.clusters.permanent_delete(cluster_id=clstr.cluster_id)

Change cluster owner.

Change the owner of the cluster. You must be an admin and the cluster must be terminated to perform this operation. The service principal application ID can be supplied as an argument to owner_username.

Parameters:
  • cluster_id – str The cluster whose owner should be changed.

  • owner_username – str New owner of the cluster_id after this RPC.

create(spark_version: str [, apply_policy_default_values: Optional[bool], autoscale: Optional[AutoScale], autotermination_minutes: Optional[int], aws_attributes: Optional[AwsAttributes], azure_attributes: Optional[AzureAttributes], clone_from: Optional[CloneCluster], cluster_log_conf: Optional[ClusterLogConf], cluster_name: Optional[str], cluster_source: Optional[ClusterSource], custom_tags: Optional[Dict[str, str]], data_security_mode: Optional[DataSecurityMode], docker_image: Optional[DockerImage], driver_instance_pool_id: Optional[str], driver_node_type_id: Optional[str], enable_elastic_disk: Optional[bool], enable_local_disk_encryption: Optional[bool], gcp_attributes: Optional[GcpAttributes], init_scripts: Optional[List[InitScriptInfo]], instance_pool_id: Optional[str], node_type_id: Optional[str], num_workers: Optional[int], policy_id: Optional[str], runtime_engine: Optional[RuntimeEngine], single_user_name: Optional[str], spark_conf: Optional[Dict[str, str]], spark_env_vars: Optional[Dict[str, str]], ssh_public_keys: Optional[List[str]], workload_type: Optional[WorkloadType]]) Wait[ClusterDetails]

Usage:

import os
import time

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

latest = w.clusters.select_spark_version(latest=True, long_term_support=True)

cluster_name = f'sdk-{time.time_ns()}'

clstr = w.clusters.create(cluster_name=cluster_name,
                          spark_version=latest,
                          instance_pool_id=os.environ["TEST_INSTANCE_POOL_ID"],
                          autotermination_minutes=15,
                          num_workers=1).result()

# cleanup
w.clusters.permanent_delete(cluster_id=clstr.cluster_id)

Create new cluster.

Creates a new Spark cluster. This method will acquire new instances from the cloud provider if necessary. Note: Databricks may not be able to acquire some of the requested nodes, due to cloud provider limitations (account limits, spot price, etc.) or transient network issues.

If Databricks acquires at least 85% of the requested on-demand nodes, cluster creation will succeed. Otherwise the cluster will terminate with an informative error message.

Parameters:
  • spark_version – str The Spark version of the cluster, e.g. 3.3.x-scala2.11. A list of available Spark versions can be retrieved by using the :method:clusters/sparkVersions API call.

  • apply_policy_default_values – bool (optional)

  • autoscale – AutoScale (optional) Parameters needed in order to automatically scale clusters up and down based on load. Note: autoscaling works best with DB runtime versions 3.0 or later.

  • autotermination_minutes – int (optional) Automatically terminates the cluster after it is inactive for this time in minutes. If not set, this cluster will not be automatically terminated. If specified, the threshold must be between 10 and 10000 minutes. Users can also set this value to 0 to explicitly disable automatic termination.

  • aws_attributes – AwsAttributes (optional) Attributes related to clusters running on Amazon Web Services. If not specified at cluster creation, a set of default values will be used.

  • azure_attributes – AzureAttributes (optional) Attributes related to clusters running on Microsoft Azure. If not specified at cluster creation, a set of default values will be used.

  • clone_from – CloneCluster (optional) When specified, this clones libraries from a source cluster during the creation of a new cluster.

  • cluster_log_conf – ClusterLogConf (optional) The configuration for delivering spark logs to a long-term storage destination. Two kinds of destinations (dbfs and s3) are supported. Only one destination can be specified for one cluster. If the conf is given, the logs will be delivered to the destination every 5 mins. The destination of driver logs is $destination/$clusterId/driver, while the destination of executor logs is $destination/$clusterId/executor.

  • cluster_name – str (optional) Cluster name requested by the user. This doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.

  • cluster_source – ClusterSource (optional) Determines whether the cluster was created by a user through the UI, created by the Databricks Jobs Scheduler, or through an API request. This is the same as cluster_creator, but read only.

  • custom_tags

    Dict[str,str] (optional) Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS instances and EBS volumes) with these tags in addition to default_tags. Notes:

    • Currently, Databricks allows at most 45 custom tags

    • Clusters can only reuse cloud resources if the resources’ tags are a subset of the cluster tags

  • data_security_mode

    DataSecurityMode (optional) Data security mode decides what data governance model to use when accessing data from a cluster.

    • NONE: No security isolation for multiple users sharing the cluster. Data governance features are not available in this mode.

    • SINGLE_USER: A secure cluster that can only be exclusively used by a single user specified in single_user_name. Most programming languages, cluster features and data governance features are available in this mode.

    • USER_ISOLATION: A secure cluster that can be shared by multiple users. Cluster users are fully isolated so that they cannot see each other’s data and credentials. Most data governance features are supported in this mode. But programming languages and cluster features might be limited.

    • LEGACY_TABLE_ACL: This mode is for users migrating from legacy Table ACL clusters.

    • LEGACY_PASSTHROUGH: This mode is for users migrating from legacy Passthrough on high concurrency clusters.

    • LEGACY_SINGLE_USER: This mode is for users migrating from legacy Passthrough on standard clusters.

  • docker_image – DockerImage (optional)

  • driver_instance_pool_id – str (optional) The optional ID of the instance pool to which the cluster's driver belongs. If a driver pool is not assigned, the driver uses the instance pool with id (instance_pool_id).

  • driver_node_type_id – str (optional) The node type of the Spark driver. Note that this field is optional; if unset, the driver node type will be set as the same value as node_type_id defined above.

  • enable_elastic_disk – bool (optional) Autoscaling Local Storage: when enabled, this cluster will dynamically acquire additional disk space when its Spark workers are running low on disk space. This feature requires specific AWS permissions to function correctly - refer to the User Guide for more details.

  • enable_local_disk_encryption – bool (optional) Whether to enable LUKS on cluster VMs’ local disks

  • gcp_attributes – GcpAttributes (optional) Attributes related to clusters running on Google Cloud Platform. If not specified at cluster creation, a set of default values will be used.

  • init_scripts – List[InitScriptInfo] (optional) The configuration for storing init scripts. Any number of destinations can be specified. The scripts are executed sequentially in the order provided. If cluster_log_conf is specified, init script logs are sent to <destination>/<cluster-ID>/init_scripts.

  • instance_pool_id – str (optional) The optional ID of the instance pool to which the cluster belongs.

  • node_type_id – str (optional) This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory or compute intensive workloads. A list of available node types can be retrieved by using the :method:clusters/listNodeTypes API call.

  • num_workers

    int (optional) Number of worker nodes that this cluster should have. A cluster has one Spark Driver and num_workers Executors for a total of num_workers + 1 Spark nodes.

    Note: When reading the properties of a cluster, this field reflects the desired number of workers rather than the actual current number of workers. For instance, if a cluster is resized from 5 to 10 workers, this field will immediately be updated to reflect the target size of 10 workers, whereas the workers listed in spark_info will gradually increase from 5 to 10 as the new nodes are provisioned.

  • policy_id – str (optional) The ID of the cluster policy used to create the cluster if applicable.

  • runtime_engine – RuntimeEngine (optional) Decides which runtime engine to use, e.g. Standard vs. Photon. If unspecified, the runtime engine is inferred from spark_version.

  • single_user_name – str (optional) Single user name if data_security_mode is SINGLE_USER

  • spark_conf – Dict[str,str] (optional) An object containing a set of optional, user-specified Spark configuration key-value pairs. Users can also pass in a string of extra JVM options to the driver and the executors via spark.driver.extraJavaOptions and spark.executor.extraJavaOptions respectively.

  • spark_env_vars

    Dict[str,str] (optional) An object containing a set of optional, user-specified environment variable key-value pairs. Please note that a key-value pair of the form (X,Y) will be exported as is (i.e., export X='Y') while launching the driver and workers.

    In order to specify an additional set of SPARK_DAEMON_JAVA_OPTS, we recommend appending them to $SPARK_DAEMON_JAVA_OPTS as shown in the example below. This ensures that all default Databricks-managed environment variables are included as well.

    Example Spark environment variables: {"SPARK_WORKER_MEMORY": "28000m", "SPARK_LOCAL_DIRS": "/local_disk0"} or {"SPARK_DAEMON_JAVA_OPTS": "$SPARK_DAEMON_JAVA_OPTS -Dspark.shuffle.service.enabled=true"}

  • ssh_public_keys – List[str] (optional) SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to login with the user name ubuntu on port 2200. Up to 10 keys can be specified.

  • workload_type – WorkloadType (optional)

Returns:

Long-running operation waiter for ClusterDetails. See :method:wait_get_cluster_running for more details.

create_and_wait(spark_version: str [, apply_policy_default_values: Optional[bool], autoscale: Optional[AutoScale], autotermination_minutes: Optional[int], aws_attributes: Optional[AwsAttributes], azure_attributes: Optional[AzureAttributes], clone_from: Optional[CloneCluster], cluster_log_conf: Optional[ClusterLogConf], cluster_name: Optional[str], cluster_source: Optional[ClusterSource], custom_tags: Optional[Dict[str, str]], data_security_mode: Optional[DataSecurityMode], docker_image: Optional[DockerImage], driver_instance_pool_id: Optional[str], driver_node_type_id: Optional[str], enable_elastic_disk: Optional[bool], enable_local_disk_encryption: Optional[bool], gcp_attributes: Optional[GcpAttributes], init_scripts: Optional[List[InitScriptInfo]], instance_pool_id: Optional[str], node_type_id: Optional[str], num_workers: Optional[int], policy_id: Optional[str], runtime_engine: Optional[RuntimeEngine], single_user_name: Optional[str], spark_conf: Optional[Dict[str, str]], spark_env_vars: Optional[Dict[str, str]], ssh_public_keys: Optional[List[str]], workload_type: Optional[WorkloadType], timeout: datetime.timedelta = 0:20:00]) ClusterDetails
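
create_and_wait blocks until the cluster reaches a RUNNING state and returns the final ClusterDetails, so no separate call to .result() is needed. A minimal sketch, assuming the TEST_INSTANCE_POOL_ID environment variable names an existing instance pool:

import datetime
import os
import time

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

latest = w.clusters.select_spark_version(latest=True, long_term_support=True)

# Blocks until the cluster is RUNNING, or fails once the timeout expires.
clstr = w.clusters.create_and_wait(cluster_name=f'sdk-{time.time_ns()}',
                                   spark_version=latest,
                                   instance_pool_id=os.environ["TEST_INSTANCE_POOL_ID"],
                                   autotermination_minutes=15,
                                   num_workers=1,
                                   timeout=datetime.timedelta(minutes=30))

# cleanup
w.clusters.permanent_delete(cluster_id=clstr.cluster_id)
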
delete(cluster_id: str) Wait[ClusterDetails]

Usage:

import os
import time

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

latest = w.clusters.select_spark_version(latest=True, long_term_support=True)

cluster_name = f'sdk-{time.time_ns()}'

clstr = w.clusters.create(cluster_name=cluster_name,
                          spark_version=latest,
                          instance_pool_id=os.environ["TEST_INSTANCE_POOL_ID"],
                          autotermination_minutes=15,
                          num_workers=1).result()

_ = w.clusters.delete(cluster_id=clstr.cluster_id).result()

# cleanup
w.clusters.permanent_delete(cluster_id=clstr.cluster_id)

Terminate cluster.

Terminates the Spark cluster with the specified ID. The cluster is removed asynchronously. Once the termination has completed, the cluster will be in a TERMINATED state. If the cluster is already in a TERMINATING or TERMINATED state, nothing will happen.

Parameters:

cluster_id – str The cluster to be terminated.

Returns:

Long-running operation waiter for ClusterDetails. See :method:wait_get_cluster_terminated for more details.

delete_and_wait(cluster_id: str, timeout: datetime.timedelta = 0:20:00) ClusterDetails
edit(cluster_id: str, spark_version: str [, apply_policy_default_values: Optional[bool], autoscale: Optional[AutoScale], autotermination_minutes: Optional[int], aws_attributes: Optional[AwsAttributes], azure_attributes: Optional[AzureAttributes], clone_from: Optional[CloneCluster], cluster_log_conf: Optional[ClusterLogConf], cluster_name: Optional[str], cluster_source: Optional[ClusterSource], custom_tags: Optional[Dict[str, str]], data_security_mode: Optional[DataSecurityMode], docker_image: Optional[DockerImage], driver_instance_pool_id: Optional[str], driver_node_type_id: Optional[str], enable_elastic_disk: Optional[bool], enable_local_disk_encryption: Optional[bool], gcp_attributes: Optional[GcpAttributes], init_scripts: Optional[List[InitScriptInfo]], instance_pool_id: Optional[str], node_type_id: Optional[str], num_workers: Optional[int], policy_id: Optional[str], runtime_engine: Optional[RuntimeEngine], single_user_name: Optional[str], spark_conf: Optional[Dict[str, str]], spark_env_vars: Optional[Dict[str, str]], ssh_public_keys: Optional[List[str]], workload_type: Optional[WorkloadType]]) Wait[ClusterDetails]

Usage:

import os
import time

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

cluster_name = f'sdk-{time.time_ns()}'

latest = w.clusters.select_spark_version(latest=True, long_term_support=True)

clstr = w.clusters.create(cluster_name=cluster_name,
                          spark_version=latest,
                          instance_pool_id=os.environ["TEST_INSTANCE_POOL_ID"],
                          autotermination_minutes=15,
                          num_workers=1).result()

_ = w.clusters.edit(cluster_id=clstr.cluster_id,
                    spark_version=latest,
                    cluster_name=cluster_name,
                    instance_pool_id=os.environ["TEST_INSTANCE_POOL_ID"],
                    autotermination_minutes=10,
                    num_workers=2).result()

# cleanup
w.clusters.permanent_delete(cluster_id=clstr.cluster_id)

Update cluster configuration.

Updates the configuration of a cluster to match the provided attributes and size. A cluster can be updated if it is in a RUNNING or TERMINATED state.

If a cluster is updated while in a RUNNING state, it will be restarted so that the new attributes can take effect.

If a cluster is updated while in a TERMINATED state, it will remain TERMINATED. The next time it is started using the clusters/start API, the new attributes will take effect. Any attempt to update a cluster in any other state will be rejected with an INVALID_STATE error code.

Clusters created by the Databricks Jobs service cannot be edited.

Parameters:
  • cluster_id – str ID of the cluster

  • spark_version – str The Spark version of the cluster, e.g. 3.3.x-scala2.11. A list of available Spark versions can be retrieved by using the :method:clusters/sparkVersions API call.

  • apply_policy_default_values – bool (optional)

  • autoscale – AutoScale (optional) Parameters needed in order to automatically scale clusters up and down based on load. Note: autoscaling works best with DB runtime versions 3.0 or later.

  • autotermination_minutes – int (optional) Automatically terminates the cluster after it is inactive for this time in minutes. If not set, this cluster will not be automatically terminated. If specified, the threshold must be between 10 and 10000 minutes. Users can also set this value to 0 to explicitly disable automatic termination.

  • aws_attributes – AwsAttributes (optional) Attributes related to clusters running on Amazon Web Services. If not specified at cluster creation, a set of default values will be used.

  • azure_attributes – AzureAttributes (optional) Attributes related to clusters running on Microsoft Azure. If not specified at cluster creation, a set of default values will be used.

  • clone_from – CloneCluster (optional) When specified, this clones libraries from a source cluster during the creation of a new cluster.

  • cluster_log_conf – ClusterLogConf (optional) The configuration for delivering spark logs to a long-term storage destination. Two kinds of destinations (dbfs and s3) are supported. Only one destination can be specified for one cluster. If the conf is given, the logs will be delivered to the destination every 5 mins. The destination of driver logs is $destination/$clusterId/driver, while the destination of executor logs is $destination/$clusterId/executor.

  • cluster_name – str (optional) Cluster name requested by the user. This doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.

  • cluster_source – ClusterSource (optional) Determines whether the cluster was created by a user through the UI, created by the Databricks Jobs Scheduler, or through an API request. This is the same as cluster_creator, but read only.

  • custom_tags

    Dict[str,str] (optional) Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS instances and EBS volumes) with these tags in addition to default_tags. Notes:

    • Currently, Databricks allows at most 45 custom tags

    • Clusters can only reuse cloud resources if the resources’ tags are a subset of the cluster tags

  • data_security_mode

    DataSecurityMode (optional) Data security mode decides what data governance model to use when accessing data from a cluster.

    • NONE: No security isolation for multiple users sharing the cluster. Data governance features are not available in this mode.

    • SINGLE_USER: A secure cluster that can only be exclusively used by a single user specified in single_user_name. Most programming languages, cluster features and data governance features are available in this mode.

    • USER_ISOLATION: A secure cluster that can be shared by multiple users. Cluster users are fully isolated so that they cannot see each other’s data and credentials. Most data governance features are supported in this mode. But programming languages and cluster features might be limited.

    • LEGACY_TABLE_ACL: This mode is for users migrating from legacy Table ACL clusters.

    • LEGACY_PASSTHROUGH: This mode is for users migrating from legacy Passthrough on high concurrency clusters.

    • LEGACY_SINGLE_USER: This mode is for users migrating from legacy Passthrough on standard clusters.

  • docker_image – DockerImage (optional)

  • driver_instance_pool_id – str (optional) The optional ID of the instance pool to which the cluster's driver belongs. If a driver pool is not assigned, the driver uses the instance pool with id (instance_pool_id).

  • driver_node_type_id – str (optional) The node type of the Spark driver. Note that this field is optional; if unset, the driver node type will be set as the same value as node_type_id defined above.

  • enable_elastic_disk – bool (optional) Autoscaling Local Storage: when enabled, this cluster will dynamically acquire additional disk space when its Spark workers are running low on disk space. This feature requires specific AWS permissions to function correctly - refer to the User Guide for more details.

  • enable_local_disk_encryption – bool (optional) Whether to enable LUKS on cluster VMs’ local disks

  • gcp_attributes – GcpAttributes (optional) Attributes related to clusters running on Google Cloud Platform. If not specified at cluster creation, a set of default values will be used.

  • init_scripts – List[InitScriptInfo] (optional) The configuration for storing init scripts. Any number of destinations can be specified. The scripts are executed sequentially in the order provided. If cluster_log_conf is specified, init script logs are sent to <destination>/<cluster-ID>/init_scripts.

  • instance_pool_id – str (optional) The optional ID of the instance pool to which the cluster belongs.

  • node_type_id – str (optional) This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory or compute intensive workloads. A list of available node types can be retrieved by using the :method:clusters/listNodeTypes API call.

  • num_workers

    int (optional) Number of worker nodes that this cluster should have. A cluster has one Spark Driver and num_workers Executors for a total of num_workers + 1 Spark nodes.

    Note: When reading the properties of a cluster, this field reflects the desired number of workers rather than the actual current number of workers. For instance, if a cluster is resized from 5 to 10 workers, this field will immediately be updated to reflect the target size of 10 workers, whereas the workers listed in spark_info will gradually increase from 5 to 10 as the new nodes are provisioned.

  • policy_id – str (optional) The ID of the cluster policy used to create the cluster if applicable.

  • runtime_engine – RuntimeEngine (optional) Decides which runtime engine to use, e.g. Standard vs. Photon. If unspecified, the runtime engine is inferred from spark_version.

  • single_user_name – str (optional) Single user name if data_security_mode is SINGLE_USER

  • spark_conf – Dict[str,str] (optional) An object containing a set of optional, user-specified Spark configuration key-value pairs. Users can also pass in a string of extra JVM options to the driver and the executors via spark.driver.extraJavaOptions and spark.executor.extraJavaOptions respectively.

  • spark_env_vars

    Dict[str,str] (optional) An object containing a set of optional, user-specified environment variable key-value pairs. Please note that a key-value pair of the form (X,Y) will be exported as is (i.e., export X='Y') while launching the driver and workers.

    In order to specify an additional set of SPARK_DAEMON_JAVA_OPTS, we recommend appending them to $SPARK_DAEMON_JAVA_OPTS as shown in the example below. This ensures that all default Databricks-managed environment variables are included as well.

    Example Spark environment variables: {"SPARK_WORKER_MEMORY": "28000m", "SPARK_LOCAL_DIRS": "/local_disk0"} or {"SPARK_DAEMON_JAVA_OPTS": "$SPARK_DAEMON_JAVA_OPTS -Dspark.shuffle.service.enabled=true"}

  • ssh_public_keys – List[str] (optional) SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to login with the user name ubuntu on port 2200. Up to 10 keys can be specified.

  • workload_type – WorkloadType (optional)

Returns:

Long-running operation waiter for ClusterDetails. See :method:wait_get_cluster_running for more details.

edit_and_wait(cluster_id: str, spark_version: str [, apply_policy_default_values: Optional[bool], autoscale: Optional[AutoScale], autotermination_minutes: Optional[int], aws_attributes: Optional[AwsAttributes], azure_attributes: Optional[AzureAttributes], clone_from: Optional[CloneCluster], cluster_log_conf: Optional[ClusterLogConf], cluster_name: Optional[str], cluster_source: Optional[ClusterSource], custom_tags: Optional[Dict[str, str]], data_security_mode: Optional[DataSecurityMode], docker_image: Optional[DockerImage], driver_instance_pool_id: Optional[str], driver_node_type_id: Optional[str], enable_elastic_disk: Optional[bool], enable_local_disk_encryption: Optional[bool], gcp_attributes: Optional[GcpAttributes], init_scripts: Optional[List[InitScriptInfo]], instance_pool_id: Optional[str], node_type_id: Optional[str], num_workers: Optional[int], policy_id: Optional[str], runtime_engine: Optional[RuntimeEngine], single_user_name: Optional[str], spark_conf: Optional[Dict[str, str]], spark_env_vars: Optional[Dict[str, str]], ssh_public_keys: Optional[List[str]], workload_type: Optional[WorkloadType], timeout: datetime.timedelta = 0:20:00]) ClusterDetails
ensure_cluster_is_running(cluster_id: str)

Usage:

import os

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute

w = WorkspaceClient()

cluster_id = os.environ["TEST_DEFAULT_CLUSTER_ID"]

context = w.command_execution.create(cluster_id=cluster_id, language=compute.Language.PYTHON).result()

w.clusters.ensure_cluster_is_running(cluster_id)

# cleanup
w.command_execution.destroy(cluster_id=cluster_id, context_id=context.id)

Ensures that the given cluster is running, regardless of its current state.

events(cluster_id: str [, end_time: Optional[int], event_types: Optional[List[EventType]], limit: Optional[int], offset: Optional[int], order: Optional[GetEventsOrder], start_time: Optional[int]]) Iterator[ClusterEvent]

Usage:

import os
import time

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

latest = w.clusters.select_spark_version(latest=True, long_term_support=True)

cluster_name = f'sdk-{time.time_ns()}'

clstr = w.clusters.create(cluster_name=cluster_name,
                          spark_version=latest,
                          instance_pool_id=os.environ["TEST_INSTANCE_POOL_ID"],
                          autotermination_minutes=15,
                          num_workers=1).result()

events = w.clusters.events(cluster_id=clstr.cluster_id)

# cleanup
w.clusters.permanent_delete(cluster_id=clstr.cluster_id)

List cluster activity events.

Retrieves a list of events about the activity of a cluster. This API is paginated. If there are more events to read, the response includes all the parameters necessary to request the next page of events.

Parameters:
  • cluster_id – str The ID of the cluster to retrieve events about.

  • end_time – int (optional) The end time in epoch milliseconds. If empty, returns events up to the current time.

  • event_types – List[EventType] (optional) An optional set of event types to filter on. If empty, all event types are returned.

  • limit – int (optional) The maximum number of events to include in a page of events. Defaults to 50, and maximum allowed value is 500.

  • offset – int (optional) The offset in the result set. Defaults to 0 (no offset). When an offset is specified and the results are requested in descending order, the end_time field is required.

  • order – GetEventsOrder (optional) The order to list events in; either “ASC” or “DESC”. Defaults to “DESC”.

  • start_time – int (optional) The start time in epoch milliseconds. If empty, returns events starting from the beginning of time.

Returns:

Iterator over ClusterEvent
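
The returned iterator requests additional pages transparently as it is consumed. A minimal sketch that prints recent lifecycle events, assuming the TEST_DEFAULT_CLUSTER_ID environment variable names an existing cluster:

import os

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute

w = WorkspaceClient()

# Iterate over starting/terminating events, newest first; paging is handled by the SDK.
for event in w.clusters.events(cluster_id=os.environ["TEST_DEFAULT_CLUSTER_ID"],
                               event_types=[compute.EventType.STARTING, compute.EventType.TERMINATING],
                               order=compute.GetEventsOrder.DESC,
                               limit=50):
    print(event.timestamp, event.type, event.details)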

get(cluster_id: str) ClusterDetails

Usage:

import os
import time

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

latest = w.clusters.select_spark_version(latest=True, long_term_support=True)

cluster_name = f'sdk-{time.time_ns()}'

clstr = w.clusters.create(cluster_name=cluster_name,
                          spark_version=latest,
                          instance_pool_id=os.environ["TEST_INSTANCE_POOL_ID"],
                          autotermination_minutes=15,
                          num_workers=1).result()

by_id = w.clusters.get(cluster_id=clstr.cluster_id)

# cleanup
w.clusters.permanent_delete(cluster_id=clstr.cluster_id)

Get cluster info.

Retrieves the information for a cluster given its identifier. Clusters can be described while they are running, or up to 60 days after they are terminated.

Parameters:

cluster_id – str The cluster about which to retrieve information.

Returns:

ClusterDetails

get_permission_levels(cluster_id: str) GetClusterPermissionLevelsResponse

Get cluster permission levels.

Gets the permission levels that a user can have on an object.

Parameters:

cluster_id – str The cluster for which to get or manage permissions.

Returns:

GetClusterPermissionLevelsResponse
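
A minimal sketch, assuming the TEST_DEFAULT_CLUSTER_ID environment variable names an existing cluster and that the response exposes a permission_levels list:

import os

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

levels = w.clusters.get_permission_levels(cluster_id=os.environ["TEST_DEFAULT_CLUSTER_ID"])

# Each entry describes one permission level, e.g. CAN_ATTACH_TO, CAN_RESTART or CAN_MANAGE.
for level in levels.permission_levels:
    print(level.permission_level, level.description)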

get_permissions(cluster_id: str) ClusterPermissions

Get cluster permissions.

Gets the permissions of a cluster. Clusters can inherit permissions from their root object.

Parameters:

cluster_id – str The cluster for which to get or manage permissions.

Returns:

ClusterPermissions
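
A minimal sketch that inspects the current access control list, assuming the TEST_DEFAULT_CLUSTER_ID environment variable names an existing cluster:

import os

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

perms = w.clusters.get_permissions(cluster_id=os.environ["TEST_DEFAULT_CLUSTER_ID"])

# Each entry names a principal (user, group or service principal) and its permissions,
# including permissions inherited from the root object.
for acl in perms.access_control_list:
    print(acl.user_name or acl.group_name or acl.service_principal_name, acl.all_permissions)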

list([, can_use_client: Optional[str]]) Iterator[ClusterDetails]

Usage:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

all_clusters = w.clusters.list()

List all clusters.

Return information about all pinned clusters, active clusters, up to 200 of the most recently terminated all-purpose clusters in the past 30 days, and up to 30 of the most recently terminated job clusters in the past 30 days.

For example, if there is 1 pinned cluster, 4 active clusters, 45 terminated all-purpose clusters in the past 30 days, and 50 terminated job clusters in the past 30 days, then this API returns the 1 pinned cluster, 4 active clusters, all 45 terminated all-purpose clusters, and the 30 most recently terminated job clusters.

Parameters:

can_use_client – str (optional) Filter clusters based on what type of client they can be used for: either NOTEBOOKS or JOBS. If this field is not set, all clusters in the workspace are returned without filtering on supported client.

Returns:

Iterator over ClusterDetails
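
A minimal sketch that applies the can_use_client filter to list only clusters that notebooks can attach to:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Only clusters usable for interactive notebooks; pagination is handled by the iterator.
for c in w.clusters.list(can_use_client="NOTEBOOKS"):
    print(c.cluster_id, c.cluster_name, c.state)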

list_node_types() ListNodeTypesResponse

Usage:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

nodes = w.clusters.list_node_types()

List node types.

Returns a list of supported Spark node types. These node types can be used to launch a cluster.

Returns:

ListNodeTypesResponse

list_zones() ListAvailableZonesResponse

List availability zones.

Returns a list of availability zones where clusters can be created (for example, us-west-2a). These zones can be used to launch a cluster.

Returns:

ListAvailableZonesResponse
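
A minimal sketch, assuming the workspace runs on a cloud where availability zones are exposed (for example, AWS):

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

zones = w.clusters.list_zones()

# Zones are returned as plain strings, e.g. "us-west-2a"; default_zone is the workspace default.
print(zones.default_zone)
print(zones.zones)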

permanent_delete(cluster_id: str)

Permanently delete cluster.

Permanently deletes a Spark cluster. This cluster is terminated and resources are asynchronously removed.

In addition, users will no longer see permanently deleted clusters in the cluster list, and API users can no longer perform any action on permanently deleted clusters.

Parameters:

cluster_id – str The cluster to be deleted.
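
permanent_delete already appears as the cleanup step of the examples above; a standalone minimal sketch, assuming cluster_id holds the ID of a cluster that is no longer needed:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

cluster_id = "..."  # hypothetical placeholder for an existing cluster ID

# Terminates the cluster if necessary and removes it from the cluster list for good.
w.clusters.permanent_delete(cluster_id=cluster_id)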

pin(cluster_id: str)

Usage:

import os
import time

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

latest = w.clusters.select_spark_version(latest=True, long_term_support=True)

cluster_name = f'sdk-{time.time_ns()}'

clstr = w.clusters.create(cluster_name=cluster_name,
                          spark_version=latest,
                          instance_pool_id=os.environ["TEST_INSTANCE_POOL_ID"],
                          autotermination_minutes=15,
                          num_workers=1).result()

w.clusters.pin(cluster_id=clstr.cluster_id)

# cleanup
w.clusters.permanent_delete(cluster_id=clstr.cluster_id)

Pin cluster.

Pinning a cluster ensures that the cluster will always be returned by the ListClusters API. Pinning a cluster that is already pinned will have no effect. This API can only be called by workspace admins.

Parameters:

cluster_id – str The cluster to pin.

resize(cluster_id: str [, autoscale: Optional[AutoScale], num_workers: Optional[int]]) Wait[ClusterDetails]

Usage:

import os
import time

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

latest = w.clusters.select_spark_version(latest=True, long_term_support=True)

cluster_name = f'sdk-{time.time_ns()}'

clstr = w.clusters.create(cluster_name=cluster_name,
                          spark_version=latest,
                          instance_pool_id=os.environ["TEST_INSTANCE_POOL_ID"],
                          autotermination_minutes=15,
                          num_workers=1).result()

by_id = w.clusters.resize(cluster_id=clstr.cluster_id, num_workers=1).result()

# cleanup
w.clusters.permanent_delete(cluster_id=clstr.cluster_id)

Resize cluster.

Resizes a cluster to have a desired number of workers. This will fail unless the cluster is in a RUNNING state.

Parameters:
  • cluster_id – str The cluster to be resized.

  • autoscale – AutoScale (optional) Parameters needed in order to automatically scale clusters up and down based on load. Note: autoscaling works best with DB runtime versions 3.0 or later.

  • num_workers

    int (optional) Number of worker nodes that this cluster should have. A cluster has one Spark Driver and num_workers Executors for a total of num_workers + 1 Spark nodes.

    Note: When reading the properties of a cluster, this field reflects the desired number of workers rather than the actual current number of workers. For instance, if a cluster is resized from 5 to 10 workers, this field will immediately be updated to reflect the target size of 10 workers, whereas the workers listed in spark_info will gradually increase from 5 to 10 as the new nodes are provisioned.

Returns:

Long-running operation waiter for ClusterDetails. See :method:wait_get_cluster_running for more details.

resize_and_wait(cluster_id: str [, autoscale: Optional[AutoScale], num_workers: Optional[int], timeout: datetime.timedelta = 0:20:00]) ClusterDetails
restart(cluster_id: str [, restart_user: Optional[str]]) Wait[ClusterDetails]

Usage:

import os
import time

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

latest = w.clusters.select_spark_version(latest=True, long_term_support=True)

cluster_name = f'sdk-{time.time_ns()}'

clstr = w.clusters.create(cluster_name=cluster_name,
                          spark_version=latest,
                          instance_pool_id=os.environ["TEST_INSTANCE_POOL_ID"],
                          autotermination_minutes=15,
                          num_workers=1).result()

_ = w.clusters.restart(cluster_id=clstr.cluster_id).result()

# cleanup
w.clusters.permanent_delete(cluster_id=clstr.cluster_id)

Restart cluster.

Restarts a Spark cluster with the supplied ID. If the cluster is not currently in a RUNNING state, nothing will happen.

Parameters:
  • cluster_id – str The cluster to be started.

  • restart_user – str (optional) <needs content added>

Returns:

Long-running operation waiter for ClusterDetails. See :method:wait_get_cluster_running for more details.

restart_and_wait(cluster_id: str [, restart_user: Optional[str], timeout: datetime.timedelta = 0:20:00]) ClusterDetails
select_node_type(min_memory_gb: int, gb_per_core: int, min_cores: int, min_gpus: int, local_disk: bool, local_disk_min_size: int, category: str, photon_worker_capable: bool, photon_driver_capable: bool, graviton: bool, is_io_cache_enabled: bool, support_port_forwarding: bool, fleet: str) str

Usage:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

smallest = w.clusters.select_node_type(local_disk=True)

Selects the smallest available node type that satisfies the given constraints.

Parameters:
  • min_memory_gb – int

  • gb_per_core – int

  • min_cores – int

  • min_gpus – int

  • local_disk – bool

  • local_disk_min_size – int

  • category – str

  • photon_worker_capable – bool

  • photon_driver_capable – bool

  • graviton – bool

  • is_io_cache_enabled – bool

  • support_port_forwarding – bool

  • fleet – str

Returns:

node_type compatible string

select_spark_version(long_term_support: bool = False, beta: bool = False, latest: bool = True, ml: bool = False, genomics: bool = False, gpu: bool = False, scala: str = 2.12, spark_version: str, photon: bool = False, graviton: bool = False) str

Usage:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

latest = w.clusters.select_spark_version(latest=True, long_term_support=True)

Selects the latest Databricks Runtime version matching the given criteria.

Parameters:
  • long_term_support – bool

  • beta – bool

  • latest – bool

  • ml – bool

  • genomics – bool

  • gpu – bool

  • scala – str

  • spark_version – str

  • photon – bool

  • graviton – bool

Returns:

spark_version compatible string

set_permissions(cluster_id: str [, access_control_list: Optional[List[ClusterAccessControlRequest]]]) ClusterPermissions

Set cluster permissions.

Sets permissions on a cluster. Clusters can inherit permissions from their root object.

Parameters:
  • cluster_id – str The cluster for which to get or manage permissions.

  • access_control_list – List[ClusterAccessControlRequest] (optional)

Returns:

ClusterPermissions
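
A minimal sketch that grants attach-only access to a hypothetical user someone@example.com, assuming the TEST_DEFAULT_CLUSTER_ID environment variable names an existing cluster:

import os

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute

w = WorkspaceClient()

_ = w.clusters.set_permissions(
    cluster_id=os.environ["TEST_DEFAULT_CLUSTER_ID"],
    access_control_list=[
        compute.ClusterAccessControlRequest(
            user_name="someone@example.com",  # hypothetical user
            permission_level=compute.ClusterPermissionLevel.CAN_ATTACH_TO)
    ])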

spark_versions() GetSparkVersionsResponse

List available Spark versions.

Returns the list of available Spark versions. These versions can be used to launch a cluster.

Returns:

GetSparkVersionsResponse
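
A minimal sketch that lists the available runtime versions:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

versions = w.clusters.spark_versions()

# Each entry pairs a human-readable name with the key to pass as spark_version when creating a cluster.
for v in versions.versions:
    print(v.key, v.name)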

start(cluster_id: str) Wait[ClusterDetails]

Usage:

import os
import time

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

latest = w.clusters.select_spark_version(latest=True, long_term_support=True)

cluster_name = f'sdk-{time.time_ns()}'

clstr = w.clusters.create(cluster_name=cluster_name,
                          spark_version=latest,
                          instance_pool_id=os.environ["TEST_INSTANCE_POOL_ID"],
                          autotermination_minutes=15,
                          num_workers=1).result()

_ = w.clusters.start(cluster_id=clstr.cluster_id).result()

# cleanup
w.clusters.permanent_delete(cluster_id=clstr.cluster_id)

Start terminated cluster.

Starts a terminated Spark cluster with the supplied ID. This works similarly to createCluster, except:

  • The previous cluster id and attributes are preserved.

  • The cluster starts with the last specified cluster size.

  • If the previous cluster was an autoscaling cluster, the current cluster starts with the minimum number of nodes.

  • If the cluster is not currently in a TERMINATED state, nothing will happen.

  • Clusters launched to run a job cannot be started.

Parameters:

cluster_id – str The cluster to be started.

Returns:

Long-running operation waiter for ClusterDetails. See :method:wait_get_cluster_running for more details.

start_and_wait(cluster_id: str, timeout: datetime.timedelta = 0:20:00) ClusterDetails
unpin(cluster_id: str)

Usage:

import os
import time

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

latest = w.clusters.select_spark_version(latest=True, long_term_support=True)

cluster_name = f'sdk-{time.time_ns()}'

clstr = w.clusters.create(cluster_name=cluster_name,
                          spark_version=latest,
                          instance_pool_id=os.environ["TEST_INSTANCE_POOL_ID"],
                          autotermination_minutes=15,
                          num_workers=1).result()

w.clusters.unpin(cluster_id=clstr.cluster_id)

# cleanup
w.clusters.permanent_delete(cluster_id=clstr.cluster_id)

Unpin cluster.

Unpinning a cluster will allow the cluster to eventually be removed from the ListClusters API. Unpinning a cluster that is not pinned will have no effect. This API can only be called by workspace admins.

Parameters:

cluster_id – str The cluster to unpin.

update_permissions(cluster_id: str [, access_control_list: Optional[List[ClusterAccessControlRequest]]]) ClusterPermissions

Update cluster permissions.

Updates the permissions on a cluster. Clusters can inherit permissions from their root object.

Parameters:
  • cluster_id – str The cluster for which to get or manage permissions.

  • access_control_list – List[ClusterAccessControlRequest] (optional)

Returns:

ClusterPermissions
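
A minimal sketch that adjusts a single entry, assuming the TEST_DEFAULT_CLUSTER_ID environment variable names an existing cluster and that a group called data-engineers exists (both hypothetical):

import os

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute

w = WorkspaceClient()

_ = w.clusters.update_permissions(
    cluster_id=os.environ["TEST_DEFAULT_CLUSTER_ID"],
    access_control_list=[
        compute.ClusterAccessControlRequest(
            group_name="data-engineers",  # hypothetical group
            permission_level=compute.ClusterPermissionLevel.CAN_RESTART)
    ])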

wait_get_cluster_running(cluster_id: str, timeout: datetime.timedelta = 0:20:00, callback: Optional[Callable[[ClusterDetails], None]]) ClusterDetails
wait_get_cluster_terminated(cluster_id: str, timeout: datetime.timedelta = 0:20:00, callback: Optional[Callable[[ClusterDetails], None]]) ClusterDetails
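
Both waiters poll the cluster until it reaches the target state or the timeout expires; the optional callback receives the intermediate ClusterDetails on each poll. A minimal sketch, assuming the TEST_DEFAULT_CLUSTER_ID environment variable names a cluster that is currently starting:

import datetime
import os

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute

w = WorkspaceClient()

def report(details: compute.ClusterDetails):
    # Invoked on each poll while waiting; prints the intermediate state.
    print(details.state, details.state_message)

running = w.clusters.wait_get_cluster_running(cluster_id=os.environ["TEST_DEFAULT_CLUSTER_ID"],
                                              timeout=datetime.timedelta(minutes=30),
                                              callback=report)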