Getting Started

Installation

To install the Databricks SDK for Python, simply run:

pip install databricks-sdk

Starting with version 13.1, Databricks Runtime includes a bundled version of the Python SDK. It is highly recommended to upgrade to the latest version, which you can do by running the following in a notebook cell:

%pip install --upgrade databricks-sdk

Then restart the Python process so that the upgraded package is loaded:

dbutils.library.restartPython()
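After restarting Python, one way to confirm which SDK version is active is to query the installed package metadata with the standard library. This is a small sketch; the helper name installed_sdk_version is our own and is not part of the SDK.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_sdk_version(dist_name: str = "databricks-sdk") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    installed_sdk_version is a hypothetical helper, not an SDK function.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current Python environment.
        return None


print(installed_sdk_version() or "databricks-sdk is not installed")
```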

Usage Overview

At its core, the SDK exposes two primary clients: databricks.sdk.WorkspaceClient and databricks.sdk.AccountClient. The WorkspaceClient is tailored for interacting with resources within a Databricks workspace, such as notebooks, jobs, and clusters, while the AccountClient focuses on account-level functionality, including user and group management, billing, and workspace provisioning and management.

To use the SDK to call an API, first find the API in either the Workspace API Reference or the Account API Reference. Then call the corresponding method on the appropriate client. Workspace-level calls, made on a WorkspaceClient named w, have the form

w.<SERVICE>.<METHOD>(<parameters>)

and account-level calls, made on an AccountClient named a, have the form

a.<SERVICE>.<METHOD>(<parameters>)

For example, to list all SQL queries in the workspace, run:

# Authentication options are described in the Authentication section
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
for query in w.queries.list():
    print(f'query {query.name} was created at {query.created_at}')

To list all workspaces in the account, run:

# Authentication options are described in the Authentication section
from databricks.sdk import AccountClient
a = AccountClient()
for workspace in a.workspaces.list():
    print(f'workspace {workspace.workspace_name} was created at {workspace.creation_time}')
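Because every call follows the same client.<SERVICE>.<METHOD> shape, a call can also be resolved by name at runtime with getattr, which may help when building small generic tools. This is a minimal sketch: call_api is a hypothetical helper, and a SimpleNamespace stand-in replaces a real WorkspaceClient so the example runs without credentials. With a real client, the same getattr chain would resolve, for example, w.clusters.list.

```python
from types import SimpleNamespace
from typing import Any


def call_api(client: Any, service: str, method: str, **params: Any) -> Any:
    """Resolve client.<service>.<method> by name and call it with keyword params.

    call_api is a hypothetical helper, not part of the SDK.
    """
    service_api = getattr(client, service)       # e.g. w.clusters
    bound_method = getattr(service_api, method)  # e.g. w.clusters.list
    return bound_method(**params)


# Hypothetical offline stand-in for a WorkspaceClient; not the real SDK object.
fake_w = SimpleNamespace(
    clusters=SimpleNamespace(list=lambda: iter(["cluster-a", "cluster-b"])),
)

for name in call_api(fake_w, "clusters", "list"):
    print(name)
```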

Authentication

There are two primary entry points to the Databricks SDK:

  • databricks.sdk.WorkspaceClient for working with workspace-level APIs, and

  • databricks.sdk.AccountClient for working with account-level APIs.

To work with APIs, client instances need to authenticate. This can be done in several ways, following the Unified Authentication approach used by all Databricks SDKs and related tools:

  • When a WorkspaceClient is created in a Databricks notebook without any authentication parameters, it automatically pulls authentication information from the execution context. This implicit authentication doesn’t work for AccountClient. Simply use the following code fragment:

from databricks.sdk import WorkspaceClient
w = WorkspaceClient()

  • When a WorkspaceClient or AccountClient is created outside a notebook, it reads authentication information from environment variables and your .databrickscfg file. This method is recommended when writing command-line tools, as it allows more flexible configuration of authentication.

  • Authentication parameters may also be explicitly passed when creating WorkspaceClient or AccountClient. For example:

from databricks.sdk import WorkspaceClient
w = WorkspaceClient(
    host='https://my-databricks-instance.com',
    username='my-user',
    password='my-password')

It’s also possible to provide a custom authentication implementation if necessary. For more details, see Authentication.
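To illustrate what "environment variables and your .databrickscfg file" means in practice, here is a simplified, stdlib-only sketch that resolves a host and personal access token from DATABRICKS_HOST and DATABRICKS_TOKEN, falling back to a named profile in an INI-style .databrickscfg. It is an assumption-laden illustration, not the SDK's actual resolution logic, which supports many more authentication types and its own precedence rules; read_profile, resolve_host_and_token, and the example values are hypothetical.

```python
import configparser
import os
from pathlib import Path
from typing import Dict, Mapping, Optional, Tuple


def read_profile(cfg_text: str, profile: str = "DEFAULT") -> Dict[str, str]:
    """Parse INI-style .databrickscfg text and return one profile's keys (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    if profile not in parser:
        return {}
    # Note: configparser merges [DEFAULT] keys into every other profile.
    return dict(parser[profile])


def resolve_host_and_token(
    profile: str = "DEFAULT",
    cfg_text: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Simplified lookup (hypothetical): environment variables win, then the profile."""
    env = os.environ if env is None else env
    if cfg_text is None:
        cfg_path = Path.home() / ".databrickscfg"
        cfg_text = cfg_path.read_text() if cfg_path.exists() else ""
    values = read_profile(cfg_text, profile)
    host = env.get("DATABRICKS_HOST") or values.get("host")
    token = env.get("DATABRICKS_TOKEN") or values.get("token")
    return host, token


# Hypothetical config contents with placeholder values.
example_cfg = """
[DEFAULT]
host = https://example.cloud.databricks.com
token = dapi-example

[dev]
host = https://dev.example.cloud.databricks.com
"""

# The [dev] profile overrides host; token is inherited from [DEFAULT] by configparser.
print(resolve_host_and_token("dev", example_cfg, env={}))
```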

Listing resources

One of the key advantages of the Databricks SDK is that it provides a consistent interface for working with APIs returning paginated responses. The pagination details are hidden from the caller. Instead, the SDK returns a generator that yields each item in the list. For example, to list all clusters in the workspace, you can use the following code:

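# Assumes authentication is configured as described in the Authentication section
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
for cluster in w.clusters.list():
    print(f'cluster {cluster.cluster_name} has {cluster.num_workers} workers')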

See Pagination for more details.
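The SDK's own pagination code is not reproduced here. As a minimal sketch of the general idea, assuming an API that returns a page of items plus a next-page token, a generator can loop over page tokens and yield items one at a time, so the caller never sees the tokens. iterate_all, fake_list_clusters, and the in-memory pages are hypothetical; real Databricks APIs may paginate differently.

```python
from typing import Callable, Iterator, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def iterate_all(
    fetch_page: Callable[[Optional[str]], Tuple[List[T], Optional[str]]],
) -> Iterator[T]:
    """Yield every item across all pages, hiding page tokens from the caller.

    fetch_page(token) returns (items_on_this_page, next_token_or_None);
    a token of None requests the first page. Hypothetical helper.
    """
    token: Optional[str] = None
    while True:
        items, token = fetch_page(token)
        yield from items
        if not token:
            break


# Hypothetical in-memory "API" with three pages, keyed by page token.
_PAGES = {
    None: (["cluster-1", "cluster-2"], "p2"),
    "p2": (["cluster-3"], "p3"),
    "p3": (["cluster-4", "cluster-5"], None),
}


def fake_list_clusters(page_token: Optional[str]) -> Tuple[List[str], Optional[str]]:
    return _PAGES[page_token]


for name in iterate_all(fake_list_clusters):
    print(name)
```

Because iterate_all is a generator, later pages are only fetched when the caller keeps iterating.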

Using Data Classes & Enums

The Databricks SDK for Python uses Python’s data classes and enums to represent API data. This makes code more readable and type-safe, and easier to work with than untyped dicts.

Specific data classes are organized into separate packages under databricks.sdk.service. For example, databricks.sdk.service.jobs has definitions for data classes & enums related to the Jobs API.

For more information, consult the Dataclasses API Reference.
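To show why typed records are easier to work with than raw dicts, here is a small stdlib sketch of the same pattern: an enum for a fixed set of string values and a data class with conversion helpers to and from a JSON-style dict. ClusterState, ClusterInfo, from_dict, and as_dict here are hypothetical stand-ins written for this example; the SDK's real classes live in the databricks.sdk.service packages described above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Optional


class ClusterState(Enum):
    """Hypothetical enum; values mirror the strings a JSON payload would carry."""
    RUNNING = "RUNNING"
    TERMINATED = "TERMINATED"


@dataclass
class ClusterInfo:
    """Hypothetical, trimmed-down data class for a cluster record."""
    cluster_name: str
    num_workers: Optional[int] = None
    state: Optional[ClusterState] = None

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "ClusterInfo":
        # Convert the raw state string into the enum, if present.
        state = d.get("state")
        return cls(
            cluster_name=d["cluster_name"],
            num_workers=d.get("num_workers"),
            state=ClusterState(state) if state is not None else None,
        )

    def as_dict(self) -> Dict[str, Any]:
        # Emit only the fields that are set, using plain JSON-friendly values.
        out: Dict[str, Any] = {"cluster_name": self.cluster_name}
        if self.num_workers is not None:
            out["num_workers"] = self.num_workers
        if self.state is not None:
            out["state"] = self.state.value
        return out


raw = {"cluster_name": "etl", "num_workers": 4, "state": "RUNNING"}
info = ClusterInfo.from_dict(raw)
# Comparing against an enum member catches typos that a raw string comparison would not.
if info.state is ClusterState.RUNNING:
    print(f"{info.cluster_name} is running with {info.num_workers} workers")
```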

Examples

The Databricks SDK for Python comes with a number of examples demonstrating how to use the library for various common use cases.

These examples and more are located in the examples/ directory of the GitHub repository.

Some other examples of using the SDK include: