Data Quality¶

These dataclasses are used in the SDK to represent API requests and responses for services in the databricks.sdk.service.dataquality module.

class databricks.sdk.service.dataquality.AggregationGranularity¶

The granularity for aggregating data into time windows based on their timestamp.

AGGREGATION_GRANULARITY_1_DAY = "AGGREGATION_GRANULARITY_1_DAY"¶

AGGREGATION_GRANULARITY_1_HOUR = "AGGREGATION_GRANULARITY_1_HOUR"¶

AGGREGATION_GRANULARITY_1_MONTH = "AGGREGATION_GRANULARITY_1_MONTH"¶

AGGREGATION_GRANULARITY_1_WEEK = "AGGREGATION_GRANULARITY_1_WEEK"¶

AGGREGATION_GRANULARITY_1_YEAR = "AGGREGATION_GRANULARITY_1_YEAR"¶

AGGREGATION_GRANULARITY_2_WEEKS = "AGGREGATION_GRANULARITY_2_WEEKS"¶

AGGREGATION_GRANULARITY_30_MINUTES = "AGGREGATION_GRANULARITY_30_MINUTES"¶

AGGREGATION_GRANULARITY_3_WEEKS = "AGGREGATION_GRANULARITY_3_WEEKS"¶

AGGREGATION_GRANULARITY_4_WEEKS = "AGGREGATION_GRANULARITY_4_WEEKS"¶

AGGREGATION_GRANULARITY_5_MINUTES = "AGGREGATION_GRANULARITY_5_MINUTES"¶
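As a rough illustration of what aggregating into time windows means, the sketch below floors a timestamp to the start of its window for the fixed-length granularities. The epoch-aligned window boundaries and the helper name `window_start` are assumptions made for this example; the service may align windows differently. Month and year granularities are omitted because their length follows the calendar.

```python
from datetime import datetime, timedelta, timezone

# Fixed-length granularities only. The keys are the enum values listed above;
# the timedelta mapping is an illustration, not part of the SDK.
FIXED_WINDOW_LENGTHS = {
    "AGGREGATION_GRANULARITY_5_MINUTES": timedelta(minutes=5),
    "AGGREGATION_GRANULARITY_30_MINUTES": timedelta(minutes=30),
    "AGGREGATION_GRANULARITY_1_HOUR": timedelta(hours=1),
    "AGGREGATION_GRANULARITY_1_DAY": timedelta(days=1),
    "AGGREGATION_GRANULARITY_1_WEEK": timedelta(weeks=1),
    "AGGREGATION_GRANULARITY_2_WEEKS": timedelta(weeks=2),
    "AGGREGATION_GRANULARITY_3_WEEKS": timedelta(weeks=3),
    "AGGREGATION_GRANULARITY_4_WEEKS": timedelta(weeks=4),
}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts: datetime, granularity: str) -> datetime:
    """Floor a timezone-aware timestamp to the start of its window.

    Assumption: windows are aligned to the Unix epoch in UTC.
    """
    length = FIXED_WINDOW_LENGTHS[granularity]
    whole_windows = (ts - EPOCH) // length  # integer number of full windows
    return EPOCH + whole_windows * length


ts = datetime(2024, 1, 1, 10, 47, 30, tzinfo=timezone.utc)
hour_bucket = window_start(ts, "AGGREGATION_GRANULARITY_1_HOUR")
five_minute_bucket = window_start(ts, "AGGREGATION_GRANULARITY_5_MINUTES")
```

Rows whose timestamps share the same `window_start` would fall into the same aggregation window under this assumption.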

class databricks.sdk.service.dataquality.AnomalyDetectionConfig(excluded_table_full_names: List[str] | None = None)¶

Anomaly Detection Configurations.

excluded_table_full_names: List[str] | None = None¶: List of fully qualified table names to exclude from anomaly detection.

as_dict() → dict¶: Serializes the AnomalyDetectionConfig into a dictionary suitable for use as a JSON request body.

as_shallow_dict() → dict¶: Serializes the AnomalyDetectionConfig into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) → AnomalyDetectionConfig¶: Deserializes the AnomalyDetectionConfig from a dictionary.

class databricks.sdk.service.dataquality.CancelRefreshResponse(refresh: Refresh | None = None)¶

Response to cancelling a refresh.

refresh: Refresh | None = None¶: The refresh to cancel.

as_dict() → dict¶: Serializes the CancelRefreshResponse into a dictionary suitable for use as a JSON request body.

as_shallow_dict() → dict¶: Serializes the CancelRefreshResponse into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) → CancelRefreshResponse¶: Deserializes the CancelRefreshResponse from a dictionary.

class databricks.sdk.service.dataquality.CronSchedule(quartz_cron_expression: str, timezone_id: str, pause_status: CronSchedulePauseStatus | None = None)¶

The data quality monitoring workflow cron schedule.

quartz_cron_expression: str¶: The expression that determines when to run the monitor. See examples.

timezone_id: str¶: A Java timezone ID (e.g., America/Los_Angeles) in which the Quartz expression is evaluated. The schedule is resolved with respect to this timezone. See Java TimeZone for details.

pause_status: CronSchedulePauseStatus | None = None¶: Read-only field that indicates whether the schedule is paused.

as_dict() → dict¶: Serializes the CronSchedule into a dictionary suitable for use as a JSON request body.

as_shallow_dict() → dict¶: Serializes the CronSchedule into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) → CronSchedule¶: Deserializes the CronSchedule from a dictionary.
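A minimal sketch of a schedule payload, written as a plain dictionary so it runs without the SDK installed. It assumes the dictionary keys match the attribute names above. The helper `quartz_field_count` is hypothetical and only checks the shape of the expression: Quartz cron expressions have six required fields (seconds, minutes, hours, day of month, month, day of week) and an optional seventh (year).

```python
def quartz_field_count(expression: str) -> int:
    """Count whitespace-separated fields in a Quartz cron expression."""
    return len(expression.split())


def looks_like_quartz(expression: str) -> bool:
    """Shape check only: six required fields plus an optional year field."""
    return quartz_field_count(expression) in (6, 7)


# Assumed payload shape, mirroring the CronSchedule attributes.
schedule = {
    "quartz_cron_expression": "0 0 8 * * ?",  # every day at 08:00
    "timezone_id": "America/Los_Angeles",
}

schedule_is_well_formed = looks_like_quartz(schedule["quartz_cron_expression"])
```

pause_status is left out because it is described as read-only.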

class databricks.sdk.service.dataquality.CronSchedulePauseStatus¶

The data quality monitoring workflow cron schedule pause status.

CRON_SCHEDULE_PAUSE_STATUS_PAUSED = "CRON_SCHEDULE_PAUSE_STATUS_PAUSED"¶

CRON_SCHEDULE_PAUSE_STATUS_UNPAUSED = "CRON_SCHEDULE_PAUSE_STATUS_UNPAUSED"¶

class databricks.sdk.service.dataquality.DataProfilingConfig(output_schema_id: str, assets_dir: str | None = None, baseline_table_name: str | None = None, custom_metrics: List[DataProfilingCustomMetric] | None = None, dashboard_id: str | None = None, drift_metrics_table_name: str | None = None, effective_warehouse_id: str | None = None, inference_log: InferenceLogConfig | None = None, latest_monitor_failure_message: str | None = None, monitor_version: int | None = None, monitored_table_name: str | None = None, notification_settings: NotificationSettings | None = None, profile_metrics_table_name: str | None = None, schedule: CronSchedule | None = None, skip_builtin_dashboard: bool | None = None, slicing_exprs: List[str] | None = None, snapshot: SnapshotConfig | None = None, status: DataProfilingStatus | None = None, time_series: TimeSeriesConfig | None = None, warehouse_id: str | None = None)¶

Data Profiling Configurations.

output_schema_id: str¶: ID of the schema where output tables are created.

assets_dir: str | None = None¶: Field for specifying the absolute path to a custom directory to store data-monitoring assets. Normally prepopulated to a default user location via UI and Python APIs.

baseline_table_name: str | None = None¶: Baseline table name. Baseline data is used to compute drift from the data in the monitored table. The baseline table and the monitored table must have the same schema.

custom_metrics: List[DataProfilingCustomMetric] | None = None¶: Custom metrics.

dashboard_id: str | None = None¶: ID of the dashboard that visualizes the computed metrics. This can be empty if the monitor is in PENDING state.

drift_metrics_table_name: str | None = None¶: Table that stores drift metrics data. Format: catalog.schema.table_name.

effective_warehouse_id: str | None = None¶: The warehouse for dashboard creation.

inference_log: InferenceLogConfig | None = None¶: Analysis Configuration for monitoring inference log tables.

latest_monitor_failure_message: str | None = None¶: The latest error message for a monitor failure.

monitor_version: int | None = None¶: The monitor configuration version currently in use. Versions are numbered 1, 2, 3, and so on. The field can also take negative values, which may indicate a corrupted monitor_version.

monitored_table_name: str | None = None¶: Unity Catalog table to monitor. Format: catalog.schema.table_name.

notification_settings: NotificationSettings | None = None¶: Field for specifying notification settings.

profile_metrics_table_name: str | None = None¶: Table that stores profile metrics data. Format: catalog.schema.table_name.

schedule: CronSchedule | None = None¶: The cron schedule.

skip_builtin_dashboard: bool | None = None¶: Whether to skip creating a default dashboard summarizing data quality metrics.

slicing_exprs: List[str] | None = None¶: List of column expressions to slice data with for targeted analysis. The data is grouped by each expression independently, resulting in a separate slice for each predicate and its complement. For example, slicing_exprs=["col_1", "col_2 > 10"] generates the following slices: two slices for col_2 > 10 (True and False), and one slice per unique value in col_1. For high-cardinality columns, only the top 100 unique values by frequency generate slices.

snapshot: SnapshotConfig | None = None¶: Analysis Configuration for monitoring snapshot tables.

status: DataProfilingStatus | None = None¶: The data profiling monitor status.

time_series: TimeSeriesConfig | None = None¶: Analysis Configuration for monitoring time series tables.

warehouse_id: str | None = None¶: Optional argument to specify the warehouse for dashboard creation. If not specified, the first running warehouse will be used.

as_dict() → dict¶: Serializes the DataProfilingConfig into a dictionary suitable for use as a JSON request body.

as_shallow_dict() → dict¶: Serializes the DataProfilingConfig into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) → DataProfilingConfig¶: Deserializes the DataProfilingConfig from a dictionary.
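To make the slicing_exprs description concrete, the following sketch reproduces the grouping for slicing_exprs=["col_1", "col_2 > 10"] on three in-memory rows. The sample rows and the helper names are made up for this example, and the predicate is written as a Python lambda rather than evaluated from the SQL string. The real service computes slices on the monitored table.

```python
from collections import defaultdict

# Hypothetical sample data standing in for the monitored table.
rows = [
    {"col_1": "a", "col_2": 5},
    {"col_1": "a", "col_2": 20},
    {"col_1": "b", "col_2": 15},
]


def slices_for_column(rows, column):
    """One slice per unique value of a plain column expression."""
    groups = defaultdict(list)
    for row in rows:
        groups[f"{column}={row[column]}"].append(row)
    return dict(groups)


def slices_for_predicate(rows, label, predicate):
    """Two slices for a boolean expression: the predicate and its complement."""
    groups = {f"{label}=True": [], f"{label}=False": []}
    for row in rows:
        groups[f"{label}={predicate(row)}"].append(row)
    return groups


# Each expression is grouped independently; the slices are not crossed.
slices = {}
slices.update(slices_for_column(rows, "col_1"))
slices.update(slices_for_predicate(rows, "col_2 > 10", lambda r: r["col_2"] > 10))

slice_sizes = {name: len(members) for name, members in slices.items()}
```

Here `slice_sizes` has four entries: two for the values of col_1 and two for the predicate on col_2.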

class databricks.sdk.service.dataquality.DataProfilingCustomMetric(name: str, definition: str, input_columns: List[str], output_data_type: str, type: DataProfilingCustomMetricType)¶

Custom metric definition.

name: str¶: Name of the metric in the output tables.

definition: str¶: Jinja template for a SQL expression that specifies how to compute the metric. See create metric definition.

input_columns: List[str]¶: A list of column names in the input table the metric should be computed for. Can use ":table" to indicate that the metric needs information from multiple columns.

output_data_type: str¶: The output type of the custom metric.

type: DataProfilingCustomMetricType¶: The type of the custom metric.

as_dict() → dict¶: Serializes the DataProfilingCustomMetric into a dictionary suitable for use as a JSON request body.

as_shallow_dict() → dict¶: Serializes the DataProfilingCustomMetric into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) → DataProfilingCustomMetric¶: Deserializes the DataProfilingCustomMetric from a dictionary.
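The sketch below shows how a templated definition might expand once per entry in input_columns. It assumes the template exposes an `{{input_column}}` placeholder, and it uses plain string replacement as a stand-in for Jinja rendering. The metric and column names are hypothetical.

```python
# Assumed template variable: {{input_column}}. Backticks quote the column in SQL.
definition = "avg(`{{input_column}}` * `{{input_column}}`)"
input_columns = ["f1", "f2"]  # hypothetical column names


def render(definition: str, column: str) -> str:
    """Stand-in for Jinja rendering: substitute the column name into the template."""
    return definition.replace("{{input_column}}", column)


# One SQL expression per input column.
rendered = {column: render(definition, column) for column in input_columns}
```

Under these assumptions, the same definition yields one SQL expression per listed column.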

class databricks.sdk.service.dataquality.DataProfilingCustomMetricType¶

The custom metric type.

DATA_PROFILING_CUSTOM_METRIC_TYPE_AGGREGATE = "DATA_PROFILING_CUSTOM_METRIC_TYPE_AGGREGATE"¶

DATA_PROFILING_CUSTOM_METRIC_TYPE_DERIVED = "DATA_PROFILING_CUSTOM_METRIC_TYPE_DERIVED"¶

DATA_PROFILING_CUSTOM_METRIC_TYPE_DRIFT = "DATA_PROFILING_CUSTOM_METRIC_TYPE_DRIFT"¶

class databricks.sdk.service.dataquality.DataProfilingStatus¶

The status of the data profiling monitor.

DATA_PROFILING_STATUS_ACTIVE = "DATA_PROFILING_STATUS_ACTIVE"¶

DATA_PROFILING_STATUS_DELETE_PENDING = "DATA_PROFILING_STATUS_DELETE_PENDING"¶

DATA_PROFILING_STATUS_ERROR = "DATA_PROFILING_STATUS_ERROR"¶

DATA_PROFILING_STATUS_FAILED = "DATA_PROFILING_STATUS_FAILED"¶

DATA_PROFILING_STATUS_PENDING = "DATA_PROFILING_STATUS_PENDING"¶

class databricks.sdk.service.dataquality.InferenceLogConfig(problem_type: InferenceProblemType, timestamp_column: str, granularities: List[AggregationGranularity], prediction_column: str, model_id_column: str, label_column: str | None = None)¶

Inference log configuration.

problem_type: InferenceProblemType¶: Problem type the model aims to solve.

timestamp_column: str¶: Column for the timestamp.

granularities: List[AggregationGranularity]¶: List of granularities to use when aggregating data into time windows based on their timestamp.

prediction_column: str¶: Column for the prediction.

model_id_column: str¶: Column for the model identifier.

label_column: str | None = None¶: Column for the label.

as_dict() → dict¶: Serializes the InferenceLogConfig into a dictionary suitable for use as a JSON request body.

as_shallow_dict() → dict¶: Serializes the InferenceLogConfig into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) → InferenceLogConfig¶: Deserializes the InferenceLogConfig from a dictionary.

class databricks.sdk.service.dataquality.InferenceProblemType¶

Inference problem type the model aims to solve.

INFERENCE_PROBLEM_TYPE_CLASSIFICATION = "INFERENCE_PROBLEM_TYPE_CLASSIFICATION"¶

INFERENCE_PROBLEM_TYPE_REGRESSION = "INFERENCE_PROBLEM_TYPE_REGRESSION"¶

class databricks.sdk.service.dataquality.ListMonitorResponse(monitors: List[Monitor] | None = None, next_page_token: str | None = None)¶

Response for listing Monitors.

monitors: List[Monitor] | None = None¶

next_page_token: str | None = None¶

as_dict() → dict¶: Serializes the ListMonitorResponse into a dictionary suitable for use as a JSON request body.

as_shallow_dict() → dict¶: Serializes the ListMonitorResponse into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) → ListMonitorResponse¶: Deserializes the ListMonitorResponse from a dictionary.

class databricks.sdk.service.dataquality.ListRefreshResponse(next_page_token: str | None = None, refreshes: List[Refresh] | None = None)¶

Response for listing refreshes.

next_page_token: str | None = None¶

refreshes: List[Refresh] | None = None¶

as_dict() → dict¶: Serializes the ListRefreshResponse into a dictionary suitable for use as a JSON request body.

as_shallow_dict() → dict¶: Serializes the ListRefreshResponse into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) → ListRefreshResponse¶: Deserializes the ListRefreshResponse from a dictionary.
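Both list responses carry a next_page_token, which suggests the usual token-based pagination loop. The sketch below works on plain dictionaries and a fake `fetch_page` function, both invented for this example. It assumes a missing or empty token marks the last page.

```python
from typing import Callable, Dict, Iterator, Optional

# Fake two-page backend keyed by page token; purely illustrative.
PAGES: Dict[Optional[str], dict] = {
    None: {"refreshes": [{"refresh_id": 1}, {"refresh_id": 2}], "next_page_token": "t1"},
    "t1": {"refreshes": [{"refresh_id": 3}]},
}


def fetch_page(page_token: Optional[str]) -> dict:
    """Hypothetical stand-in for one list call returning a response dict."""
    return PAGES[page_token]


def iter_items(fetch: Callable[[Optional[str]], dict], items_key: str) -> Iterator[dict]:
    """Yield items from every page until no next_page_token is returned."""
    token: Optional[str] = None
    while True:
        page = fetch(token)
        yield from page.get(items_key) or []
        token = page.get("next_page_token")
        if not token:  # assumption: absent or empty token means last page
            return


refresh_ids = [item["refresh_id"] for item in iter_items(fetch_page, "refreshes")]
```

The same loop applies to ListMonitorResponse by passing "monitors" as the items key.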

class databricks.sdk.service.dataquality.Monitor(object_type: str, object_id: str, anomaly_detection_config: AnomalyDetectionConfig | None = None, data_profiling_config: DataProfilingConfig | None = None)¶

Monitor for the data quality of Unity Catalog entities such as schemas or tables.

object_type: str¶: The type of the monitored object. Can be one of the following: schema or table.

object_id: str¶

The UUID of the request object. It is schema_id for schema, and table_id for table.

Find the schema_id from either:

The schema_id of the Schemas resource.
In Catalog Explorer > select the schema > go to the Details tab > the Schema ID field.

Find the table_id from either:

The table_id of the Tables resource.
In Catalog Explorer > select the table > go to the Details tab > the Table ID field.

anomaly_detection_config: AnomalyDetectionConfig | None = None¶: Anomaly Detection Configuration, applicable to schema object types.

data_profiling_config: DataProfilingConfig | None = None¶: Data Profiling Configuration, applicable to table object types. Exactly one Analysis Configuration must be present.

as_dict() → dict¶: Serializes the Monitor into a dictionary suitable for use as a JSON request body.

as_shallow_dict() → dict¶: Serializes the Monitor into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) → Monitor¶: Deserializes the Monitor from a dictionary.
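The pairing described above (anomaly detection for schemas, data profiling for tables) can be checked before a request is sent. This is a sketch on plain dictionaries with a hypothetical `check_monitor` helper. It is not validation performed by the SDK, and the object_id shown is a placeholder.

```python
# Which configuration key applies to which object type, per the field descriptions.
EXPECTED_CONFIG = {
    "schema": "anomaly_detection_config",
    "table": "data_profiling_config",
}


def check_monitor(monitor: dict) -> None:
    """Raise ValueError if the payload contradicts the documented pairing."""
    object_type = monitor.get("object_type")
    if object_type not in EXPECTED_CONFIG:
        raise ValueError(f"object_type must be 'schema' or 'table', got {object_type!r}")
    if not monitor.get("object_id"):
        raise ValueError("object_id is required")
    applicable = EXPECTED_CONFIG[object_type]
    for key in EXPECTED_CONFIG.values():
        if key != applicable and monitor.get(key) is not None:
            raise ValueError(f"{key} does not apply to object_type {object_type!r}")


schema_monitor = {
    "object_type": "schema",
    "object_id": "00000000-0000-0000-0000-000000000000",  # placeholder UUID
    "anomaly_detection_config": {"excluded_table_full_names": []},
}
check_monitor(schema_monitor)  # passes silently
```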

class databricks.sdk.service.dataquality.NotificationDestination(email_addresses: List[str] | None = None)¶

Destination of the data quality monitoring notification.

email_addresses: List[str] | None = None¶: The list of email addresses to send the notification to. A maximum of 5 email addresses is supported.

as_dict() → dict¶: Serializes the NotificationDestination into a dictionary suitable for use as a JSON request body.

as_shallow_dict() → dict¶: Serializes the NotificationDestination into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) → NotificationDestination¶: Deserializes the NotificationDestination from a dictionary.

class databricks.sdk.service.dataquality.NotificationSettings(on_failure: NotificationDestination | None = None)¶

Settings for sending notifications on the data quality monitoring.

on_failure: NotificationDestination | None = None¶: Destinations to send notifications on failure/timeout.

as_dict() → dict¶: Serializes the NotificationSettings into a dictionary suitable for use as a JSON request body.

as_shallow_dict() → dict¶: Serializes the NotificationSettings into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) → NotificationSettings¶: Deserializes the NotificationSettings from a dictionary.
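A small sketch of building the nested notification payload while enforcing the five-address limit stated for email_addresses. The function name and the dictionary shape (keys matching the attribute names) are assumptions for illustration.

```python
from typing import Dict, List

MAX_EMAIL_ADDRESSES = 5  # limit stated in the email_addresses description


def failure_notification_settings(email_addresses: List[str]) -> Dict[str, dict]:
    """Build an assumed NotificationSettings-shaped dict for failure notifications."""
    if len(email_addresses) > MAX_EMAIL_ADDRESSES:
        raise ValueError(
            f"at most {MAX_EMAIL_ADDRESSES} email addresses are supported, "
            f"got {len(email_addresses)}"
        )
    return {"on_failure": {"email_addresses": list(email_addresses)}}


settings = failure_notification_settings(["oncall@example.com"])
```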

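class databricks.sdk.service.dataquality.Refresh(object_type: str, object_id: str, end_time_ms: int | None = None, message: str | None = None, refresh_id: int | None = None, start_time_ms: int | None = None, state: RefreshState | None = None, trigger: RefreshTrigger | None = None)¶

The Refresh object gives information on a refresh of the data quality monitoring pipeline.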

object_type: str¶: The type of the monitored object. Can be one of the following: schema or table.

object_id: str¶

The UUID of the request object. It is schema_id for schema, and table_id for table.

Find the schema_id from either:

The schema_id of the Schemas resource.
In Catalog Explorer > select the schema > go to the Details tab > the Schema ID field.

Find the table_id from either:

The table_id of the Tables resource.
In Catalog Explorer > select the table > go to the Details tab > the Table ID field.

end_time_ms: int | None = None¶: Time when the refresh ended (milliseconds since 1/1/1970 UTC).

message: str | None = None¶: An optional message to give insight into the current state of the refresh (e.g. FAILURE messages).

refresh_id: int | None = None¶: Unique id of the refresh operation.

start_time_ms: int | None = None¶: Time when the refresh started (milliseconds since 1/1/1970 UTC).

state: RefreshState | None = None¶: The current state of the refresh.

trigger: RefreshTrigger | None = None¶: What triggered the refresh.

as_dict() → dict¶: Serializes the Refresh into a dictionary suitable for use as a JSON request body.

as_shallow_dict() → dict¶: Serializes the Refresh into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) → Refresh¶: Deserializes the Refresh from a dictionary.
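Because start_time_ms and end_time_ms are milliseconds since 1/1/1970 UTC, they convert to datetimes and durations with the standard library alone. The sketch below uses a plain dictionary with made-up values in place of a real Refresh.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_datetime(ms: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


def refresh_duration(refresh: dict) -> Optional[timedelta]:
    """Return the elapsed time, or None while either timestamp is missing."""
    start = refresh.get("start_time_ms")
    end = refresh.get("end_time_ms")
    if start is None or end is None:
        return None
    return timedelta(milliseconds=end - start)


# Made-up values for illustration.
refresh = {"start_time_ms": 1_700_000_000_000, "end_time_ms": 1_700_000_090_000}
started_at = ms_to_datetime(refresh["start_time_ms"])
duration = refresh_duration(refresh)
```

Both timestamps are optional, so the duration helper returns None for a refresh that has not started or finished.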

class databricks.sdk.service.dataquality.RefreshState¶

The state of the refresh.

MONITOR_REFRESH_STATE_CANCELED = "MONITOR_REFRESH_STATE_CANCELED"¶

MONITOR_REFRESH_STATE_FAILED = "MONITOR_REFRESH_STATE_FAILED"¶

MONITOR_REFRESH_STATE_PENDING = "MONITOR_REFRESH_STATE_PENDING"¶

MONITOR_REFRESH_STATE_RUNNING = "MONITOR_REFRESH_STATE_RUNNING"¶

MONITOR_REFRESH_STATE_SUCCESS = "MONITOR_REFRESH_STATE_SUCCESS"¶

MONITOR_REFRESH_STATE_UNKNOWN = "MONITOR_REFRESH_STATE_UNKNOWN"¶
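When polling a refresh, it helps to separate states that can still change from states that cannot. The sketch below treats SUCCESS, FAILED, and CANCELED as terminal. That classification, and the handling of UNKNOWN as non-terminal, are assumptions. The `get_state` callable is a hypothetical stand-in for whatever call returns the current state string.

```python
import time
from typing import Callable

# Assumed terminal states; PENDING, RUNNING, and UNKNOWN are treated as in progress.
TERMINAL_STATES = frozenset(
    {
        "MONITOR_REFRESH_STATE_SUCCESS",
        "MONITOR_REFRESH_STATE_FAILED",
        "MONITOR_REFRESH_STATE_CANCELED",
    }
)


def wait_until_terminal(
    get_state: Callable[[], str],
    poll_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state until it returns a terminal state or max_polls is reached."""
    for _ in range(max_polls):
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)
    raise TimeoutError(f"refresh did not finish after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the loop testable without real delays.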

class databricks.sdk.service.dataquality.RefreshTrigger¶

The trigger of the refresh.

MONITOR_REFRESH_TRIGGER_DATA_CHANGE = "MONITOR_REFRESH_TRIGGER_DATA_CHANGE"¶

MONITOR_REFRESH_TRIGGER_MANUAL = "MONITOR_REFRESH_TRIGGER_MANUAL"¶

MONITOR_REFRESH_TRIGGER_SCHEDULE = "MONITOR_REFRESH_TRIGGER_SCHEDULE"¶

MONITOR_REFRESH_TRIGGER_UNKNOWN = "MONITOR_REFRESH_TRIGGER_UNKNOWN"¶

class databricks.sdk.service.dataquality.SnapshotConfig¶

Snapshot analysis configuration.

as_dict() → dict¶: Serializes the SnapshotConfig into a dictionary suitable for use as a JSON request body.

as_shallow_dict() → dict¶: Serializes the SnapshotConfig into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) → SnapshotConfig¶: Deserializes the SnapshotConfig from a dictionary.

class databricks.sdk.service.dataquality.TimeSeriesConfig(timestamp_column: str, granularities: List[AggregationGranularity])¶

Time series analysis configuration.

timestamp_column: str¶: Column for the timestamp.

granularities: List[AggregationGranularity]¶: List of granularities to use when aggregating data into time windows based on their timestamp.

as_dict() → dict¶: Serializes the TimeSeriesConfig into a dictionary suitable for use as a JSON request body.

as_shallow_dict() → dict¶: Serializes the TimeSeriesConfig into a shallow dictionary of its immediate attributes.

classmethod from_dict(d: Dict[str, Any]) → TimeSeriesConfig¶: Deserializes the TimeSeriesConfig from a dictionary.
