Authentication

If you use Databricks configuration profiles or Databricks-specific environment variables for authentication, the following snippet is all the code required to start working with a Databricks workspace. It instructs the Databricks SDK for Python to use its default authentication flow:

from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
w. # press <TAB> for autocompletion

The conventional name for the variable that holds the workspace-level client of the Databricks SDK for Python is w, which is shorthand for workspace.

Notebook-native authentication

If you initialize WorkspaceClient without any arguments, credentials are picked up automatically from the notebook context. If the same code runs outside the notebook environment, such as in a CI/CD pipeline, you must supply environment variables for authentication to work.


databricks.sdk.AccountClient does not support notebook-native authentication.
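Because AccountClient cannot read credentials from the notebook context, account-level code must be given credentials explicitly. As a sketch, using the basic authentication fields described below with placeholder values (substitute your own account ID and user):

```python
import os

# Placeholder values -- per the basic authentication section below, account-level
# operations need host, account_id, username, and password (or their equivalents).
os.environ["DATABRICKS_HOST"] = "https://accounts.cloud.databricks.com"
os.environ["DATABRICKS_ACCOUNT_ID"] = "00000000-0000-0000-0000-000000000000"
os.environ["DATABRICKS_USERNAME"] = "someone@example.com"
os.environ["DATABRICKS_PASSWORD"] = "example-password"

# from databricks.sdk import AccountClient
# a = AccountClient()  # would now resolve the variables above via the default flow
```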

Default authentication flow

If you run the Databricks Terraform Provider, the Databricks SDK for Go, the Databricks CLI, or applications that target the Databricks SDKs for other languages, most likely they will all interoperate nicely together. By default, the Databricks SDK for Python tries the following authentication methods, in the following order, until it succeeds:

  1. Databricks native authentication

  2. Azure native authentication

  3. If the SDK is unsuccessful at this point, it returns an authentication error and stops running.

You can instruct the Databricks SDK for Python to use a specific authentication method by setting the auth_type argument as described in the following sections.

For each authentication method, the SDK searches for compatible authentication credentials in the following locations, in the following order. Once the SDK finds a compatible set of credentials that it can use, it stops searching:

  1. Credentials that are hard-coded into configuration arguments.

    :warning: Caution: Databricks does not recommend hard-coding credentials into arguments, as they can be exposed in plain text in version control systems. Use environment variables or configuration profiles instead.

  2. Credentials in Databricks-specific environment variables.

  3. For Databricks native authentication, credentials in the .databrickscfg file’s DEFAULT configuration profile from its default file location (~ for Linux or macOS, and %USERPROFILE% for Windows).

  4. For Azure native authentication, the SDK searches for credentials through the Azure CLI as needed.
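As a sketch, a minimal .databrickscfg with a DEFAULT profile for Databricks token authentication might look like this (both values are placeholders):

```ini
[DEFAULT]
host  = https://my-workspace.cloud.databricks.com
token = dapi-placeholder-token
```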

Depending on the Databricks authentication method, the SDK uses the following information. The tables below list the WorkspaceClient and AccountClient arguments (which have corresponding .databrickscfg file fields), their descriptions, and any corresponding environment variables.

Databricks native authentication

By default, the Databricks SDK for Python initially tries Databricks token authentication (auth_type='pat' argument). If the SDK is unsuccessful, it then tries Databricks basic (username/password) authentication (auth_type='basic' argument).

  • For Databricks token authentication, you must provide host and token; or their environment variable or .databrickscfg file field equivalents.

  • For Databricks basic authentication, you must provide host, username, and password (for AWS workspace-level operations); or host, account_id, username, and password (for AWS, Azure, or GCP account-level operations); or their environment variable or .databrickscfg file field equivalents.

| Argument | Description | Environment variable |
| --- | --- | --- |
| host | (String) The Databricks host URL for either the Databricks workspace endpoint or the Databricks accounts endpoint. | DATABRICKS_HOST |
| account_id | (String) The Databricks account ID for the Databricks accounts endpoint. Only has effect when host is https://accounts.cloud.databricks.com/ (AWS), https://accounts.azuredatabricks.net/ (Azure), or https://accounts.gcp.databricks.com/ (GCP). | DATABRICKS_ACCOUNT_ID |
| token | (String) The Databricks personal access token (PAT) (AWS, Azure, and GCP) or Azure Active Directory (Azure AD) token (Azure). | DATABRICKS_TOKEN |
| username | (String) The Databricks username part of basic authentication. Only possible when host is *.cloud.databricks.com (AWS). | DATABRICKS_USERNAME |
| password | (String) The Databricks password part of basic authentication. Only possible when host is *.cloud.databricks.com (AWS). | DATABRICKS_PASSWORD |

For example, to use Databricks token authentication:

from databricks.sdk import WorkspaceClient
w = WorkspaceClient(host=input('Databricks Workspace URL: '), token=input('Token: '))

Azure native authentication

By default, the Databricks SDK for Python first tries Azure client secret authentication (auth_type='azure-client-secret' argument). If the SDK is unsuccessful, it then tries Azure CLI authentication (auth_type='azure-cli' argument). See Manage service principals.

The Databricks SDK for Python picks up an Azure CLI token if you’ve previously authenticated as an Azure user by running az login on your machine. See Get Azure AD tokens for users by using the Azure CLI.
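For example, once az login has succeeded, you can pin the SDK to Azure CLI authentication via environment variables. A sketch (the workspace URL is a placeholder):

```python
import os

# Placeholder workspace URL; setting DATABRICKS_AUTH_TYPE restricts the default
# flow to the Azure CLI method only.
os.environ["DATABRICKS_HOST"] = "https://adb-1234567890123456.7.azuredatabricks.net"
os.environ["DATABRICKS_AUTH_TYPE"] = "azure-cli"

# from databricks.sdk import WorkspaceClient
# w = WorkspaceClient()  # would now exchange the Azure CLI token
```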

To authenticate as an Azure Active Directory (Azure AD) service principal, you must provide one of the following. See also Add a service principal to your Azure Databricks account:

  • azure_workspace_resource_id, azure_client_secret, azure_client_id, and azure_tenant_id; or their environment variable or .databrickscfg file field equivalents.

  • azure_workspace_resource_id and azure_use_msi; or their environment variable or .databrickscfg file field equivalents.
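For the managed-identity variant, a sketch of a .databrickscfg profile using the azure_workspace_resource_id and azure_use_msi fields (the resource ID components are placeholders):

```ini
[azure-msi]
azure_workspace_resource_id = /subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Databricks/workspaces/<workspace-name>
azure_use_msi = true
```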

| Argument | Description | Environment variable |
| --- | --- | --- |
| azure_workspace_resource_id | (String) The Azure Resource Manager ID for the Azure Databricks workspace, which is exchanged for a Databricks host URL. | DATABRICKS_AZURE_RESOURCE_ID |
| azure_use_msi | (Boolean) Set to true to use the Azure Managed Service Identity passwordless authentication flow for service principals. | ARM_USE_MSI |
| azure_client_secret | (String) The Azure AD service principal’s client secret. | ARM_CLIENT_SECRET |
| azure_client_id | (String) The Azure AD service principal’s application ID. | ARM_CLIENT_ID |
| azure_tenant_id | (String) The Azure AD service principal’s tenant ID. | ARM_TENANT_ID |
| azure_environment | (String) The Azure environment type (such as Public, UsGov, China, and Germany) for a specific set of API endpoints. Defaults to PUBLIC. | ARM_ENVIRONMENT |

For example, to use Azure client secret authentication:

from databricks.sdk import WorkspaceClient
w = WorkspaceClient(host=input('Databricks Workspace URL: '),
                    azure_workspace_resource_id=input('Azure Resource ID: '),
                    azure_tenant_id=input('AAD Tenant ID: '),
                    azure_client_id=input('AAD Client ID: '),
                    azure_client_secret=input('AAD Client Secret: '))

Overriding .databrickscfg

For Databricks native authentication, you can override the default behavior for using .databrickscfg as follows:

| Argument | Description | Environment variable |
| --- | --- | --- |
| profile | (String) A connection profile specified within .databrickscfg to use instead of DEFAULT. | DATABRICKS_CONFIG_PROFILE |
| config_file | (String) A non-default location of the Databricks CLI credentials file. | DATABRICKS_CONFIG_FILE |

For example, to use a profile named MYPROFILE instead of DEFAULT:

from databricks.sdk import WorkspaceClient
w = WorkspaceClient(profile='MYPROFILE')
# Now call the Databricks workspace APIs as desired...
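The environment-variable equivalent of the profile argument is DATABRICKS_CONFIG_PROFILE. A sketch:

```python
import os

# Equivalent to WorkspaceClient(profile='MYPROFILE'): the default authentication
# flow reads DATABRICKS_CONFIG_PROFILE when no profile argument is given.
os.environ["DATABRICKS_CONFIG_PROFILE"] = "MYPROFILE"
```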

Additional configuration options

For all authentication methods, you can override the default behavior in client arguments as follows:

| Argument | Description | Environment variable |
| --- | --- | --- |
| auth_type | (String) When multiple authentication attributes are available in the environment, use the authentication type specified by this argument. This argument also holds the currently selected authentication type. | DATABRICKS_AUTH_TYPE |
| http_timeout_seconds | (Integer) Number of seconds for the HTTP timeout. Default is 60. | (None) |
| retry_timeout_seconds | (Integer) Number of seconds to keep retrying HTTP requests. Default is 300 (5 minutes). | (None) |
| debug_truncate_bytes | (Integer) Truncate JSON fields in debug logs above this limit. Default is 96. | DATABRICKS_DEBUG_TRUNCATE_BYTES |
| debug_headers | (Boolean) Set to true to debug the HTTP headers of requests made by the application. Default is false, as headers contain sensitive data such as access tokens. | DATABRICKS_DEBUG_HEADERS |
| rate_limit | (Integer) Maximum number of requests per second made to the Databricks REST API. | DATABRICKS_RATE_LIMIT |

For example, to turn on debug HTTP headers:

from databricks.sdk import WorkspaceClient
w = WorkspaceClient(debug_headers=True)
# Now call the Databricks workspace APIs as desired...