Getting Started
Installation
To install the Databricks SDK for Python, simply run:
pip install databricks-sdk
Databricks Runtime 13.1 and later include a bundled version of the Python SDK. Upgrading to the latest version is highly recommended; to do so, run the following in a notebook cell:
%pip install --upgrade databricks-sdk
followed by
dbutils.library.restartPython()
Usage Overview
At its core, the SDK exposes two primary clients: databricks.sdk.WorkspaceClient and databricks.sdk.AccountClient. The WorkspaceClient is tailored for interacting with resources within the Databricks workspace, such as notebooks, jobs, and clusters, while the AccountClient focuses on account-level functionality, including user and group management, billing, and workspace provisioning and management.
To call an API with the SDK, first find it in either the Workspace API Reference or the Account API Reference, then call the corresponding method on the appropriate client. All API calls have the form
w.<SERVICE>.<METHOD>(<parameters>)
or
a.<SERVICE>.<METHOD>(<parameters>)
For example, to list all SQL queries in the workspace, run:
# Authenticate as described in the Authentication section below
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
for query in w.queries.list():
    print(f'query {query.name} was created at {query.created_at}')
To list all workspaces in the account, run:
# Authenticate as described in the Authentication section below
from databricks.sdk import AccountClient
a = AccountClient()
for workspace in a.workspaces.list():
    print(f'workspace {workspace.workspace_name} was created at {workspace.creation_time}')
Authentication
There are two primary entry points to the Databricks SDK: databricks.sdk.WorkspaceClient for working with workspace-level APIs, and databricks.sdk.AccountClient for working with account-level APIs.
To work with APIs, client instances need to authenticate. This can be done in several ways, following the Unified Authentication approach used by all Databricks SDKs and related tools:
When WorkspaceClient is created in a notebook and no authentication parameters are provided, it automatically pulls authentication information from the execution context. This implicit authentication doesn’t work for AccountClient. Simply use the following code fragment:
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
When WorkspaceClient or AccountClient is created outside of a notebook, it reads authentication information from environment variables and your .databrickscfg file. This method is recommended when writing command-line tools, as it allows more flexible configuration of authentication.
Authentication parameters may also be passed explicitly when creating WorkspaceClient or AccountClient. For example:
from databricks.sdk import WorkspaceClient
w = WorkspaceClient(
    host='https://my-databricks-instance.com',
    username='my-user',
    password='my-password')
It’s also possible to provide a custom authentication implementation if necessary. For more details, see Authentication.
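The environment-variable and .databrickscfg route described above can be pictured with a small standard-library sketch. It is only an illustration of the idea, not the SDK's actual resolution logic, which handles many more settings and authentication types. It assumes an INI-style file with a DEFAULT profile holding host and token keys, and environment variables named DATABRICKS_HOST and DATABRICKS_TOKEN that take precedence over the file. The helpers read_profile and resolve are hypothetical names introduced here, and all values are placeholders.

```python
import configparser
import os
import tempfile

# Placeholder values only; a real file would hold your workspace URL and token.
PROFILE_TEXT = """\
[DEFAULT]
host = https://my-databricks-instance.com
token = my-personal-access-token
"""


def read_profile(path, profile="DEFAULT"):
    """Hypothetical helper: return one profile of an INI-style file as a dict."""
    parser = configparser.ConfigParser()
    parser.read(path)
    if profile == "DEFAULT":
        return dict(parser.defaults())
    return dict(parser[profile])


def resolve(name, env, profile_values):
    """Hypothetical helper: prefer DATABRICKS_<NAME> from env, else the profile."""
    return env.get("DATABRICKS_" + name.upper()) or profile_values.get(name)


# Write the sample profile to a temporary location instead of the home directory.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = os.path.join(tmp, ".databrickscfg")
    with open(cfg_path, "w") as f:
        f.write(PROFILE_TEXT)
    profile_values = read_profile(cfg_path)

# Simulated environment: only the host is overridden here (assumed precedence).
fake_env = {"DATABRICKS_HOST": "https://other-instance.example.com"}

host = resolve("host", fake_env, profile_values)
token = resolve("token", fake_env, profile_values)
print(host)
print(token)
```

With settings like these in place, a client created with no arguments outside a notebook would be expected to pick them up without any credentials appearing in the code.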
Listing resources
One of the key advantages of the Databricks SDK is that it provides a consistent interface for working with APIs returning paginated responses. The pagination details are hidden from the caller. Instead, the SDK returns a generator that yields each item in the list. For example, to list all clusters in the workspace, you can use the following code:
for cluster in w.clusters.list():
    print(f'cluster {cluster.cluster_name} has {cluster.num_workers} workers')
See Pagination for more details.
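To make the idea of hidden pagination concrete, here is a minimal standard-library sketch of a generator that walks token-based pages and yields items one at a time. It is not the SDK's implementation: the in-memory PAGES table, the fetch_page function, and the items and next_page_token field names are all assumptions made up for this illustration.

```python
from typing import Callable, Dict, Iterator, Optional

# Hypothetical in-memory "server": three pages keyed by the token that requests them.
PAGES: Dict[Optional[str], dict] = {
    None: {"items": ["cluster-a", "cluster-b"], "next_page_token": "p2"},
    "p2": {"items": ["cluster-c", "cluster-d"], "next_page_token": "p3"},
    "p3": {"items": ["cluster-e"], "next_page_token": None},
}


def fetch_page(page_token: Optional[str]) -> dict:
    """Stand-in for one HTTP request that returns a single page of results."""
    return PAGES[page_token]


def list_all(fetch: Callable[[Optional[str]], dict]) -> Iterator[str]:
    """Yield every item across all pages; the caller never sees a page token."""
    token: Optional[str] = None
    while True:
        page = fetch(token)
        yield from page["items"]
        token = page.get("next_page_token")
        if not token:
            break


names = list(list_all(fetch_page))
print(names)
```

Because the generator is lazy, a caller that stops iterating early never triggers requests for the remaining pages.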
Using Data Classes & Enums
The Databricks SDK for Python uses Python’s data classes and enums to represent API data. This makes code more readable and type-safe, and easier to work with than untyped dicts.
Specific data classes are organized into separate packages under databricks.sdk.service. For example, databricks.sdk.service.jobs has definitions for data classes & enums related to the Jobs API.
For more information, consult the Dataclasses API Reference.
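The benefit over untyped dicts can be sketched with stand-in types built from the standard library. The State enum and ClusterSummary data class below are hypothetical and are not the SDK's own classes; they only mirror the general pattern of typed fields and enum-valued states.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    """Stand-in enum for illustration; NOT a class from the SDK."""
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    TERMINATED = "TERMINATED"


@dataclass
class ClusterSummary:
    """Stand-in data class for illustration; NOT a class from the SDK."""
    cluster_name: str
    num_workers: int
    state: Optional[State] = None


# The same information as an untyped dict, as a raw JSON response might carry it.
raw = {"cluster_name": "etl", "num_workers": 4, "state": "RUNNING"}

summary = ClusterSummary(
    cluster_name=raw["cluster_name"],
    num_workers=raw["num_workers"],
    state=State(raw["state"]),  # an unknown state string would raise ValueError
)

# Enum members compare by identity, so no string literals are scattered around.
is_running = summary.state is State.RUNNING

# A misspelled field fails loudly on the data class...
try:
    summary.num_wokers
    typo_detected = False
except AttributeError:
    typo_detected = True

# ...whereas the same typo on the dict silently yields None.
silent_result = raw.get("num_wokers")

print(is_running, typo_detected, silent_result)
```

Editors and type checkers can also flag such a misspelling before the code runs, which a dict lookup does not allow.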
Examples
The Databricks SDK for Python comes with a number of examples demonstrating how to use the library for common use cases. These examples and more are located in the examples/ directory of the GitHub repository.
Some other examples of using the SDK include:
Unity Catalog Automated Migration relies heavily on the Python SDK for working with Databricks APIs.
ip-access-list-analyzer checks & prunes invalid entries from IP Access Lists.