Data Quality

These dataclasses are used in the SDK to represent API requests and responses for services in the databricks.sdk.service.dataquality module.

class databricks.sdk.service.dataquality.AggregationGranularity

The granularity for aggregating data into time windows based on their timestamp.

AGGREGATION_GRANULARITY_1_DAY = "AGGREGATION_GRANULARITY_1_DAY"
AGGREGATION_GRANULARITY_1_HOUR = "AGGREGATION_GRANULARITY_1_HOUR"
AGGREGATION_GRANULARITY_1_MONTH = "AGGREGATION_GRANULARITY_1_MONTH"
AGGREGATION_GRANULARITY_1_WEEK = "AGGREGATION_GRANULARITY_1_WEEK"
AGGREGATION_GRANULARITY_1_YEAR = "AGGREGATION_GRANULARITY_1_YEAR"
AGGREGATION_GRANULARITY_2_WEEKS = "AGGREGATION_GRANULARITY_2_WEEKS"
AGGREGATION_GRANULARITY_30_MINUTES = "AGGREGATION_GRANULARITY_30_MINUTES"
AGGREGATION_GRANULARITY_3_WEEKS = "AGGREGATION_GRANULARITY_3_WEEKS"
AGGREGATION_GRANULARITY_4_WEEKS = "AGGREGATION_GRANULARITY_4_WEEKS"
AGGREGATION_GRANULARITY_5_MINUTES = "AGGREGATION_GRANULARITY_5_MINUTES"
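
To illustrate what aggregating rows into time windows means, here is a minimal sketch that floors a timestamp to the start of its window. It covers only the fixed-width granularities and assumes windows are aligned to the Unix epoch in UTC; that alignment, the FIXED_WINDOWS table, and the window_start helper are assumptions made for illustration and are not part of the SDK.

```python
from datetime import datetime, timedelta, timezone

# Fixed-width granularities only. Month and year windows vary in length
# and are deliberately left out of this sketch.
FIXED_WINDOWS = {
    "AGGREGATION_GRANULARITY_5_MINUTES": timedelta(minutes=5),
    "AGGREGATION_GRANULARITY_30_MINUTES": timedelta(minutes=30),
    "AGGREGATION_GRANULARITY_1_HOUR": timedelta(hours=1),
    "AGGREGATION_GRANULARITY_1_DAY": timedelta(days=1),
}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts: datetime, granularity: str) -> datetime:
    """Floor a timezone-aware timestamp to its window start (assumed epoch-aligned)."""
    width = FIXED_WINDOWS[granularity]
    elapsed = ts - EPOCH
    # timedelta // timedelta yields the number of whole windows elapsed.
    return EPOCH + (elapsed // width) * width


ts = datetime(2024, 3, 5, 14, 47, 12, tzinfo=timezone.utc)
half_hour_window = window_start(ts, "AGGREGATION_GRANULARITY_30_MINUTES")
print(half_hour_window.isoformat())
```

Rows whose timestamps floor to the same value fall into the same window under this assumption.
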
class databricks.sdk.service.dataquality.AnomalyDetectionConfig(excluded_table_full_names: List[str] | None = None)

Anomaly Detection Configurations.

excluded_table_full_names: List[str] | None = None

List of fully qualified table names to exclude from anomaly detection.

as_dict() dict

Serializes the AnomalyDetectionConfig into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the AnomalyDetectionConfig into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) AnomalyDetectionConfig

Deserializes the AnomalyDetectionConfig from a dictionary.
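
The as_dict() and from_dict() pair can be pictured as a round trip between an object and a JSON-style dictionary. The sketch below uses a local stand-in class, AnomalyDetectionConfigSketch, rather than the SDK class, so it runs without the SDK installed. Omitting unset fields from the dictionary is an assumption about the serialization behavior, and the table name is a made-up example.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class AnomalyDetectionConfigSketch:
    """Local stand-in that mirrors the documented signature; not the SDK class."""

    excluded_table_full_names: Optional[List[str]] = None

    def as_dict(self) -> Dict[str, Any]:
        body: Dict[str, Any] = {}
        # Assumption: unset or empty fields are left out of the body.
        if self.excluded_table_full_names:
            body["excluded_table_full_names"] = list(self.excluded_table_full_names)
        return body

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "AnomalyDetectionConfigSketch":
        return cls(excluded_table_full_names=d.get("excluded_table_full_names"))


config = AnomalyDetectionConfigSketch(excluded_table_full_names=["main.sales.tmp_orders"])
payload = config.as_dict()
restored = AnomalyDetectionConfigSketch.from_dict(payload)
print(payload, restored == config)
```
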

class databricks.sdk.service.dataquality.CancelRefreshResponse(refresh: Refresh | None = None)

Response to cancelling a refresh.

refresh: Refresh | None = None

The refresh to cancel.

as_dict() dict

Serializes the CancelRefreshResponse into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the CancelRefreshResponse into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) CancelRefreshResponse

Deserializes the CancelRefreshResponse from a dictionary.

class databricks.sdk.service.dataquality.CronSchedule(quartz_cron_expression: str, timezone_id: str, pause_status: CronSchedulePauseStatus | None = None)

The data quality monitoring workflow cron schedule.

quartz_cron_expression: str

The expression that determines when to run the monitor. See [examples].

[examples]: https://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html

timezone_id: str

A Java timezone ID (for example, America/Los_Angeles) in which the Quartz expression is evaluated. The schedule is resolved with respect to this timezone. See Java TimeZone for details.

pause_status: CronSchedulePauseStatus | None = None

Read only field that indicates whether the schedule is paused or not.

as_dict() dict

Serializes the CronSchedule into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the CronSchedule into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) CronSchedule

Deserializes the CronSchedule from a dictionary.
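
A Quartz cron expression has six whitespace-separated fields (seconds, minutes, hours, day of month, month, day of week) plus an optional seventh field for the year. The sketch below labels the fields of an expression as a quick sanity check before building a schedule body. The split_quartz helper and the schedule_body dictionary are hypothetical illustrations, not SDK functions, and the check covers only the field count, not the full Quartz grammar.

```python
from typing import Dict

QUARTZ_FIELDS = [
    "seconds", "minutes", "hours",
    "day_of_month", "month", "day_of_week", "year",
]


def split_quartz(expression: str) -> Dict[str, str]:
    """Label the fields of a Quartz expression; checks the field count only."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    # zip stops at the shorter sequence, so "year" appears only when given.
    return dict(zip(QUARTZ_FIELDS, parts))


# Hypothetical request-body fragment using the two required fields.
schedule_body = {
    "quartz_cron_expression": "0 0 12 * * ?",  # fires at 12:00 every day
    "timezone_id": "America/Los_Angeles",
}
fields = split_quartz(schedule_body["quartz_cron_expression"])
print(fields)
```

A five-field Unix cron string such as "0 12 * * *" is rejected by this check, which is a common mistake when moving from crontab syntax to Quartz.
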

class databricks.sdk.service.dataquality.CronSchedulePauseStatus

The data quality monitoring workflow cron schedule pause status.

CRON_SCHEDULE_PAUSE_STATUS_PAUSED = "CRON_SCHEDULE_PAUSE_STATUS_PAUSED"
CRON_SCHEDULE_PAUSE_STATUS_UNPAUSED = "CRON_SCHEDULE_PAUSE_STATUS_UNPAUSED"
class databricks.sdk.service.dataquality.DataProfilingConfig(output_schema_id: str, assets_dir: str | None = None, baseline_table_name: str | None = None, custom_metrics: List[DataProfilingCustomMetric] | None = None, dashboard_id: str | None = None, drift_metrics_table_name: str | None = None, effective_warehouse_id: str | None = None, inference_log: InferenceLogConfig | None = None, latest_monitor_failure_message: str | None = None, monitor_version: int | None = None, monitored_table_name: str | None = None, notification_settings: NotificationSettings | None = None, profile_metrics_table_name: str | None = None, schedule: CronSchedule | None = None, skip_builtin_dashboard: bool | None = None, slicing_exprs: List[str] | None = None, snapshot: SnapshotConfig | None = None, status: DataProfilingStatus | None = None, time_series: TimeSeriesConfig | None = None, warehouse_id: str | None = None)

Data Profiling Configurations.

output_schema_id: str

ID of the schema where output tables are created.

assets_dir: str | None = None

Absolute path to a custom directory in which to store data-monitoring assets. The UI and Python APIs normally prepopulate it with a default user location.

baseline_table_name: str | None = None

Baseline table name. Baseline data is used to compute drift from the data in the monitored table. The baseline table and the monitored table must have the same schema.

custom_metrics: List[DataProfilingCustomMetric] | None = None

Custom metrics.

dashboard_id: str | None = None

ID of the dashboard that visualizes the computed metrics. This can be empty if the monitor is in the PENDING state.

drift_metrics_table_name: str | None = None

Table that stores drift metrics data. Format: catalog.schema.table_name.

effective_warehouse_id: str | None = None

The warehouse used for dashboard creation.

inference_log: InferenceLogConfig | None = None

Analysis Configuration for monitoring inference log tables.

latest_monitor_failure_message: str | None = None

The latest error message for a monitor failure.

monitor_version: int | None = None

The monitor configuration version currently in use, represented as an integer (1, 2, 3, …). Negative values are possible and can indicate a corrupted monitor_version.

monitored_table_name: str | None = None

Unity Catalog table to monitor. Format: catalog.schema.table_name.

notification_settings: NotificationSettings | None = None

Notification settings for the monitor.

profile_metrics_table_name: str | None = None

Table that stores profile metrics data. Format: catalog.schema.table_name.

schedule: CronSchedule | None = None

The cron schedule.

skip_builtin_dashboard: bool | None = None

Whether to skip creating a default dashboard summarizing data quality metrics.

slicing_exprs: List[str] | None = None

List of column expressions to slice data with for targeted analysis. The data is grouped by each expression independently, resulting in a separate slice for each predicate and its complement. For example, slicing_exprs=["col_1", "col_2 > 10"] generates the following slices: two slices for col_2 > 10 (True and False), and one slice per unique value in col_1. For high-cardinality columns, only the top 100 unique values by frequency generate slices.

snapshot: SnapshotConfig | None = None

Analysis Configuration for monitoring snapshot tables.

status: DataProfilingStatus | None = None

The data profiling monitor status.

time_series: TimeSeriesConfig | None = None

Analysis Configuration for monitoring time series tables.

warehouse_id: str | None = None

Optional argument to specify the warehouse for dashboard creation. If not specified, the first running warehouse will be used.

as_dict() dict

Serializes the DataProfilingConfig into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the DataProfilingConfig into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) DataProfilingConfig

Deserializes the DataProfilingConfig from a dictionary.
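
The slicing_exprs description above can be made concrete with a small local emulation. Each expression groups the rows independently: a bare column yields one slice per distinct value, and a predicate yields a True slice and a False slice. The sample rows are invented, and each SQL expression is modeled by a hypothetical Python callable; this only illustrates the documented grouping and says nothing about how the service evaluates expressions.

```python
from collections import Counter
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Invented sample data for illustration.
rows: List[Row] = [
    {"col_1": "a", "col_2": 5},
    {"col_1": "a", "col_2": 20},
    {"col_1": "b", "col_2": 11},
    {"col_1": "c", "col_2": 3},
]

# Each slicing expression is modeled by a hypothetical Python callable.
slicers: Dict[str, Callable[[Row], Any]] = {
    "col_1": lambda r: r["col_1"],
    "col_2 > 10": lambda r: r["col_2"] > 10,
}


def slice_counts(data: List[Row], exprs: Dict[str, Callable[[Row], Any]]) -> Dict[str, Counter]:
    """Group rows by each expression independently and count rows per slice."""
    return {name: Counter(fn(row) for row in data) for name, fn in exprs.items()}


counts = slice_counts(rows, slicers)
for name, per_slice in counts.items():
    print(name, dict(per_slice))
```

Here col_1 produces three slices (a, b, c) and col_2 > 10 produces two (True and False); the slices are not crossed with each other.
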

class databricks.sdk.service.dataquality.DataProfilingCustomMetric(name: str, definition: str, input_columns: List[str], output_data_type: str, type: DataProfilingCustomMetricType)

Custom metric definition.

name: str

Name of the metric in the output tables.

definition: str

Jinja template for a SQL expression that specifies how to compute the metric. See [create metric definition].

[create metric definition]: https://docs.databricks.com/en/lakehouse-monitoring/custom-metrics.html#create-definition

input_columns: List[str]

A list of column names in the input table the metric should be computed for. Can use ":table" to indicate that the metric needs information from multiple columns.

output_data_type: str

The output type of the custom metric.

type: DataProfilingCustomMetricType

The type of the custom metric.

as_dict() dict

Serializes the DataProfilingCustomMetric into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the DataProfilingCustomMetric into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) DataProfilingCustomMetric

Deserializes the DataProfilingCustomMetric from a dictionary.
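
As a rough picture of how a templated definition relates to input_columns, the sketch below substitutes each column name into a definition string. It assumes the {{input_column}} placeholder described on the linked custom-metrics page and uses plain string replacement instead of a Jinja engine. The expand helper, the metric name, the column names, and the output_data_type value are all hypothetical.

```python
from typing import Any, Dict, List

# Assumed placeholder name: {{input_column}} (see the linked custom-metrics page).
definition = "avg(`{{input_column}}` * `{{input_column}}`)"


def expand(template: str, input_columns: List[str]) -> List[str]:
    """Substitute each column into the template (plain replace, not real Jinja)."""
    return [template.replace("{{input_column}}", col) for col in input_columns]


# Hypothetical request-body fragment; field names follow the documented signature.
metric_body: Dict[str, Any] = {
    "name": "squared_avg",
    "definition": definition,
    "input_columns": ["f1", "f2"],
    # Placeholder value; the accepted format is not specified in this section.
    "output_data_type": "double",
    "type": "DATA_PROFILING_CUSTOM_METRIC_TYPE_AGGREGATE",
}

expressions = expand(metric_body["definition"], metric_body["input_columns"])
print(expressions)
```
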

class databricks.sdk.service.dataquality.DataProfilingCustomMetricType

The custom metric type.

DATA_PROFILING_CUSTOM_METRIC_TYPE_AGGREGATE = "DATA_PROFILING_CUSTOM_METRIC_TYPE_AGGREGATE"
DATA_PROFILING_CUSTOM_METRIC_TYPE_DERIVED = "DATA_PROFILING_CUSTOM_METRIC_TYPE_DERIVED"
DATA_PROFILING_CUSTOM_METRIC_TYPE_DRIFT = "DATA_PROFILING_CUSTOM_METRIC_TYPE_DRIFT"
class databricks.sdk.service.dataquality.DataProfilingStatus

The status of the data profiling monitor.

DATA_PROFILING_STATUS_ACTIVE = "DATA_PROFILING_STATUS_ACTIVE"
DATA_PROFILING_STATUS_DELETE_PENDING = "DATA_PROFILING_STATUS_DELETE_PENDING"
DATA_PROFILING_STATUS_ERROR = "DATA_PROFILING_STATUS_ERROR"
DATA_PROFILING_STATUS_FAILED = "DATA_PROFILING_STATUS_FAILED"
DATA_PROFILING_STATUS_PENDING = "DATA_PROFILING_STATUS_PENDING"
class databricks.sdk.service.dataquality.InferenceLogConfig(problem_type: InferenceProblemType, timestamp_column: str, granularities: List[AggregationGranularity], prediction_column: str, model_id_column: str, label_column: str | None = None)

Inference log configuration.

problem_type: InferenceProblemType

Problem type the model aims to solve.

timestamp_column: str

Column for the timestamp.

granularities: List[AggregationGranularity]

List of granularities to use when aggregating data into time windows based on their timestamp.

prediction_column: str

Column for the prediction.

model_id_column: str

Column for the model identifier.

label_column: str | None = None

Column for the label.

as_dict() dict

Serializes the InferenceLogConfig into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the InferenceLogConfig into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) InferenceLogConfig

Deserializes the InferenceLogConfig from a dictionary.

class databricks.sdk.service.dataquality.InferenceProblemType

Inference problem type the model aims to solve.

INFERENCE_PROBLEM_TYPE_CLASSIFICATION = "INFERENCE_PROBLEM_TYPE_CLASSIFICATION"
INFERENCE_PROBLEM_TYPE_REGRESSION = "INFERENCE_PROBLEM_TYPE_REGRESSION"
class databricks.sdk.service.dataquality.ListMonitorResponse(monitors: List[Monitor] | None = None, next_page_token: str | None = None)

Response for listing Monitors.

monitors: List[Monitor] | None = None

next_page_token: str | None = None

as_dict() dict

Serializes the ListMonitorResponse into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the ListMonitorResponse into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) ListMonitorResponse

Deserializes the ListMonitorResponse from a dictionary.

class databricks.sdk.service.dataquality.ListRefreshResponse(next_page_token: str | None = None, refreshes: List[Refresh] | None = None)

Response for listing refreshes.

next_page_token: str | None = None

refreshes: List[Refresh] | None = None

as_dict() dict

Serializes the ListRefreshResponse into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the ListRefreshResponse into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) ListRefreshResponse

Deserializes the ListRefreshResponse from a dictionary.
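
Both list responses carry a next_page_token, which suggests the usual token-based pagination loop. The sketch below walks pages until no token is returned. The fetch_page function and the in-memory PAGES table are hypothetical stand-ins for a real service call, and treating a missing or empty token as the last page is an assumption.

```python
from typing import Any, Dict, Iterator, Optional

# Hypothetical in-memory pages standing in for service responses.
PAGES: Dict[Optional[str], Dict[str, Any]] = {
    None: {"refreshes": [{"refresh_id": 1}, {"refresh_id": 2}], "next_page_token": "t1"},
    "t1": {"refreshes": [{"refresh_id": 3}]},
}


def fetch_page(page_token: Optional[str]) -> Dict[str, Any]:
    """Stand-in for a list call; returns one response dictionary."""
    return PAGES[page_token]


def iter_refreshes() -> Iterator[Dict[str, Any]]:
    """Yield refreshes across all pages."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        # The list field may be absent or None on an empty page.
        yield from page.get("refreshes") or []
        token = page.get("next_page_token")
        if not token:  # assumption: no token means this was the last page
            break


ids = [r["refresh_id"] for r in iter_refreshes()]
print(ids)
```
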

class databricks.sdk.service.dataquality.Monitor(object_type: str, object_id: str, anomaly_detection_config: AnomalyDetectionConfig | None = None, data_profiling_config: DataProfilingConfig | None = None)

Monitor for the data quality of Unity Catalog entities such as schemas or tables.

object_type: str

The type of the monitored object. Can be one of the following: schema or table.

object_id: str

The UUID of the request object. It is schema_id for schema, and table_id for table.

Find the schema_id from either:
1. The [schema_id] of the Schemas resource.
2. In [Catalog Explorer] > select the schema > go to the Details tab > the Schema ID field.

Find the table_id from either:
1. The [table_id] of the Tables resource.
2. In [Catalog Explorer] > select the table > go to the Details tab > the Table ID field.

[Catalog Explorer]: https://docs.databricks.com/aws/en/catalog-explorer/
[schema_id]: https://docs.databricks.com/api/workspace/schemas/get#schema_id
[table_id]: https://docs.databricks.com/api/workspace/tables/get#table_id

anomaly_detection_config: AnomalyDetectionConfig | None = None

Anomaly Detection Configuration, applicable to schema object types.

data_profiling_config: DataProfilingConfig | None = None

Data Profiling Configuration, applicable to table object types. Exactly one Analysis Configuration must be present.

as_dict() dict

Serializes the Monitor into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the Monitor into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) Monitor

Deserializes the Monitor from a dictionary.
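
Because anomaly_detection_config applies to schema monitors and data_profiling_config applies to table monitors, a client-side check can catch a mismatched body before it is sent. The sketch below is one possible check. The check_monitor_body helper is hypothetical, the UUID is a placeholder, and the rule it enforces is only a reading of the field descriptions above; the service may validate differently.

```python
from typing import Any, Dict

# Which configuration field goes with which object type, per the descriptions above.
CONFIG_FOR_TYPE = {
    "schema": "anomaly_detection_config",
    "table": "data_profiling_config",
}


def check_monitor_body(body: Dict[str, Any]) -> None:
    """Raise ValueError if the body mixes an object type with the other type's config."""
    object_type = body.get("object_type")
    if object_type not in CONFIG_FOR_TYPE:
        raise ValueError(f"object_type must be 'schema' or 'table', got {object_type!r}")
    if not body.get("object_id"):
        raise ValueError("object_id is required")
    for other_type, config_field in CONFIG_FOR_TYPE.items():
        # Use 'in' rather than truthiness: an empty config dict is still a config.
        if other_type != object_type and config_field in body:
            raise ValueError(f"{config_field} does not apply to {object_type} monitors")


schema_monitor = {
    "object_type": "schema",
    "object_id": "00000000-0000-0000-0000-000000000000",  # placeholder UUID
    "anomaly_detection_config": {},
}
check_monitor_body(schema_monitor)  # passes silently
print("schema monitor body looks consistent")
```
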

class databricks.sdk.service.dataquality.NotificationDestination(email_addresses: List[str] | None = None)

Destination of the data quality monitoring notification.

email_addresses: List[str] | None = None

The list of email addresses to send the notification to. A maximum of 5 email addresses is supported.

as_dict() dict

Serializes the NotificationDestination into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the NotificationDestination into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) NotificationDestination

Deserializes the NotificationDestination from a dictionary.

class databricks.sdk.service.dataquality.NotificationSettings(on_failure: NotificationDestination | None = None)

Settings for sending notifications on the data quality monitoring.

on_failure: NotificationDestination | None = None

Destinations to send notifications on failure/timeout.

as_dict() dict

Serializes the NotificationSettings into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the NotificationSettings into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) NotificationSettings

Deserializes the NotificationSettings from a dictionary.
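
The nesting of NotificationSettings and NotificationDestination, together with the five-address limit, can be sketched as a small builder. The failure_notifications helper is hypothetical, the addresses are invented, and the length check is a local convenience that mirrors the documented maximum; it is not SDK behavior.

```python
from typing import Dict, List

MAX_EMAIL_ADDRESSES = 5  # documented maximum for a notification destination


def failure_notifications(emails: List[str]) -> Dict[str, Dict[str, List[str]]]:
    """Build a notification_settings-style dictionary with an on_failure destination."""
    if len(emails) > MAX_EMAIL_ADDRESSES:
        raise ValueError(
            f"at most {MAX_EMAIL_ADDRESSES} email addresses are supported, got {len(emails)}"
        )
    return {"on_failure": {"email_addresses": list(emails)}}


settings_body = failure_notifications(["oncall@example.com", "data-team@example.com"])
print(settings_body)
```
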

class databricks.sdk.service.dataquality.Refresh(object_type: str, object_id: str, end_time_ms: int | None = None, message: str | None = None, refresh_id: int | None = None, start_time_ms: int | None = None, state: RefreshState | None = None, trigger: RefreshTrigger | None = None)

The Refresh object gives information on a refresh of the data quality monitoring pipeline.

object_type: str

The type of the monitored object. Can be one of the following: schema or table.

object_id: str

The UUID of the request object. It is schema_id for schema, and table_id for table.

Find the schema_id from either:
1. The [schema_id] of the Schemas resource.
2. In [Catalog Explorer] > select the schema > go to the Details tab > the Schema ID field.

Find the table_id from either:
1. The [table_id] of the Tables resource.
2. In [Catalog Explorer] > select the table > go to the Details tab > the Table ID field.

[Catalog Explorer]: https://docs.databricks.com/aws/en/catalog-explorer/
[schema_id]: https://docs.databricks.com/api/workspace/schemas/get#schema_id
[table_id]: https://docs.databricks.com/api/workspace/tables/get#table_id

end_time_ms: int | None = None

Time when the refresh ended (milliseconds since 1/1/1970 UTC).

message: str | None = None

An optional message to give insight into the current state of the refresh (e.g. FAILURE messages).

refresh_id: int | None = None

Unique id of the refresh operation.

start_time_ms: int | None = None

Time when the refresh started (milliseconds since 1/1/1970 UTC).

state: RefreshState | None = None

The current state of the refresh.

trigger: RefreshTrigger | None = None

What triggered the refresh.

as_dict() dict

Serializes the Refresh into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the Refresh into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) Refresh

Deserializes the Refresh from a dictionary.
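
Since start_time_ms and end_time_ms are epoch milliseconds and both are optional, a small conversion helper is often handy. The sketch below turns them into timezone-aware datetimes and a duration. The from_epoch_ms and refresh_duration helpers and the sample values are hypothetical illustrations, not part of the SDK.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def from_epoch_ms(ms: int) -> datetime:
    """Convert milliseconds since 1970-01-01 UTC to an aware datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def refresh_duration(start_time_ms: Optional[int], end_time_ms: Optional[int]) -> Optional[timedelta]:
    """Return the elapsed time, or None while either timestamp is unset."""
    if start_time_ms is None or end_time_ms is None:
        return None
    return timedelta(milliseconds=end_time_ms - start_time_ms)


# Invented sample values shaped like the documented fields.
refresh = {"start_time_ms": 1_700_000_000_000, "end_time_ms": 1_700_000_090_500}
started = from_epoch_ms(refresh["start_time_ms"])
duration = refresh_duration(refresh["start_time_ms"], refresh["end_time_ms"])
print(started.isoformat(), duration)
```
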

class databricks.sdk.service.dataquality.RefreshState

The state of the refresh.

MONITOR_REFRESH_STATE_CANCELED = "MONITOR_REFRESH_STATE_CANCELED"
MONITOR_REFRESH_STATE_FAILED = "MONITOR_REFRESH_STATE_FAILED"
MONITOR_REFRESH_STATE_PENDING = "MONITOR_REFRESH_STATE_PENDING"
MONITOR_REFRESH_STATE_RUNNING = "MONITOR_REFRESH_STATE_RUNNING"
MONITOR_REFRESH_STATE_SUCCESS = "MONITOR_REFRESH_STATE_SUCCESS"
MONITOR_REFRESH_STATE_UNKNOWN = "MONITOR_REFRESH_STATE_UNKNOWN"
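
When waiting on a refresh, it helps to separate states that can still change from states that are final. The sketch below treats CANCELED, FAILED, and SUCCESS as terminal; that grouping is inferred from the state names and is an assumption, as is the first_terminal_state helper.

```python
from typing import Iterable, Optional

# Assumption: these three states are final; PENDING, RUNNING, and UNKNOWN are not.
TERMINAL_STATES = frozenset({
    "MONITOR_REFRESH_STATE_CANCELED",
    "MONITOR_REFRESH_STATE_FAILED",
    "MONITOR_REFRESH_STATE_SUCCESS",
})


def first_terminal_state(observed: Iterable[str]) -> Optional[str]:
    """Return the first terminal state in a sequence of observed states, if any."""
    for state in observed:
        if state in TERMINAL_STATES:
            return state
    return None


# Invented sequence of states as a poller might observe them.
observed_states = [
    "MONITOR_REFRESH_STATE_PENDING",
    "MONITOR_REFRESH_STATE_RUNNING",
    "MONITOR_REFRESH_STATE_SUCCESS",
]
outcome = first_terminal_state(observed_states)
print(outcome)
```
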
class databricks.sdk.service.dataquality.RefreshTrigger

The trigger of the refresh.

MONITOR_REFRESH_TRIGGER_DATA_CHANGE = "MONITOR_REFRESH_TRIGGER_DATA_CHANGE"
MONITOR_REFRESH_TRIGGER_MANUAL = "MONITOR_REFRESH_TRIGGER_MANUAL"
MONITOR_REFRESH_TRIGGER_SCHEDULE = "MONITOR_REFRESH_TRIGGER_SCHEDULE"
MONITOR_REFRESH_TRIGGER_UNKNOWN = "MONITOR_REFRESH_TRIGGER_UNKNOWN"
class databricks.sdk.service.dataquality.SnapshotConfig

Snapshot analysis configuration.

as_dict() dict

Serializes the SnapshotConfig into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the SnapshotConfig into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) SnapshotConfig

Deserializes the SnapshotConfig from a dictionary.

class databricks.sdk.service.dataquality.TimeSeriesConfig(timestamp_column: str, granularities: List[AggregationGranularity])

Time series analysis configuration.

timestamp_column: str

Column for the timestamp.

granularities: List[AggregationGranularity]

List of granularities to use when aggregating data into time windows based on their timestamp.

as_dict() dict

Serializes the TimeSeriesConfig into a dictionary suitable for use as a JSON request body.

as_shallow_dict() dict

Serializes the TimeSeriesConfig into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) TimeSeriesConfig

Deserializes the TimeSeriesConfig from a dictionary.
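
For a class whose fields include enum values, the dictionary form would plausibly carry the enum's string value rather than the enum object. The sketch below shows that shape with local stand-ins, GranularitySketch and TimeSeriesConfigSketch, which mirror the documented signature but are not the SDK classes. Serializing enums by value is an assumption, and the column name is invented.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List


class GranularitySketch(Enum):
    """Stand-in for two of the AggregationGranularity members."""

    ONE_HOUR = "AGGREGATION_GRANULARITY_1_HOUR"
    ONE_DAY = "AGGREGATION_GRANULARITY_1_DAY"


@dataclass
class TimeSeriesConfigSketch:
    """Local stand-in that mirrors the documented signature; not the SDK class."""

    timestamp_column: str
    granularities: List[GranularitySketch]

    def as_dict(self) -> Dict[str, Any]:
        # Assumption: enum members are written out as their string values.
        return {
            "timestamp_column": self.timestamp_column,
            "granularities": [g.value for g in self.granularities],
        }

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "TimeSeriesConfigSketch":
        return cls(
            timestamp_column=d["timestamp_column"],
            granularities=[GranularitySketch(v) for v in d.get("granularities", [])],
        )


ts_config = TimeSeriesConfigSketch("event_ts", [GranularitySketch.ONE_HOUR, GranularitySketch.ONE_DAY])
ts_body = ts_config.as_dict()
ts_restored = TimeSeriesConfigSketch.from_dict(ts_body)
print(ts_body)
```
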