Data Quality
These dataclasses are used in the SDK to represent API requests and responses for services in the databricks.sdk.service.dataquality module.
- class databricks.sdk.service.dataquality.AggregationGranularity
The granularity for aggregating data into time windows based on their timestamp.
- AGGREGATION_GRANULARITY_1_DAY = "AGGREGATION_GRANULARITY_1_DAY"
- AGGREGATION_GRANULARITY_1_HOUR = "AGGREGATION_GRANULARITY_1_HOUR"
- AGGREGATION_GRANULARITY_1_MONTH = "AGGREGATION_GRANULARITY_1_MONTH"
- AGGREGATION_GRANULARITY_1_WEEK = "AGGREGATION_GRANULARITY_1_WEEK"
- AGGREGATION_GRANULARITY_1_YEAR = "AGGREGATION_GRANULARITY_1_YEAR"
- AGGREGATION_GRANULARITY_2_WEEKS = "AGGREGATION_GRANULARITY_2_WEEKS"
- AGGREGATION_GRANULARITY_30_MINUTES = "AGGREGATION_GRANULARITY_30_MINUTES"
- AGGREGATION_GRANULARITY_3_WEEKS = "AGGREGATION_GRANULARITY_3_WEEKS"
- AGGREGATION_GRANULARITY_4_WEEKS = "AGGREGATION_GRANULARITY_4_WEEKS"
- AGGREGATION_GRANULARITY_5_MINUTES = "AGGREGATION_GRANULARITY_5_MINUTES"
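When sanity-checking aggregated output, it can help to know how wide each time window is. The sketch below maps the documented AggregationGranularity string values to `datetime.timedelta`. The mapping and the helper name `window_width` are hypothetical, not part of the SDK, and how Databricks aligns window boundaries is not described here. Month and year windows vary in length, so they map to `None` rather than to a fixed width.

```python
from datetime import timedelta
from typing import Dict, Optional

# Hypothetical mapping from documented AggregationGranularity values to window widths.
# Month and year windows are calendar-dependent, so no fixed timedelta is assigned.
GRANULARITY_WIDTHS: Dict[str, Optional[timedelta]] = {
    "AGGREGATION_GRANULARITY_5_MINUTES": timedelta(minutes=5),
    "AGGREGATION_GRANULARITY_30_MINUTES": timedelta(minutes=30),
    "AGGREGATION_GRANULARITY_1_HOUR": timedelta(hours=1),
    "AGGREGATION_GRANULARITY_1_DAY": timedelta(days=1),
    "AGGREGATION_GRANULARITY_1_WEEK": timedelta(weeks=1),
    "AGGREGATION_GRANULARITY_2_WEEKS": timedelta(weeks=2),
    "AGGREGATION_GRANULARITY_3_WEEKS": timedelta(weeks=3),
    "AGGREGATION_GRANULARITY_4_WEEKS": timedelta(weeks=4),
    "AGGREGATION_GRANULARITY_1_MONTH": None,
    "AGGREGATION_GRANULARITY_1_YEAR": None,
}


def window_width(granularity: str) -> Optional[timedelta]:
    """Return the fixed window width, or None if the window is calendar-dependent.

    Raises ValueError for strings that are not documented granularity values.
    """
    if granularity not in GRANULARITY_WIDTHS:
        raise ValueError(f"unknown granularity: {granularity!r}")
    return GRANULARITY_WIDTHS[granularity]
```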
- class databricks.sdk.service.dataquality.AnomalyDetectionConfig(excluded_table_full_names: List[str] | None = None)
Anomaly detection configuration.
- excluded_table_full_names: List[str] | None = None
List of fully qualified table names to exclude from anomaly detection.
- as_dict() → dict
Serializes the AnomalyDetectionConfig into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() → dict
Serializes the AnomalyDetectionConfig into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) → AnomalyDetectionConfig
Deserializes the AnomalyDetectionConfig from a dictionary.
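`excluded_table_full_names` expects fully qualified names. Assuming these are Unity Catalog three-level names of the form `catalog.schema.table` (the format documented for `monitored_table_name`), a small pre-flight check can catch typos before a request is sent. The helper `invalid_table_names` and the table names are hypothetical, and the dictionary is only assumed to mirror `AnomalyDetectionConfig.as_dict()`. The check splits on dots, so it does not handle backtick-quoted identifiers that themselves contain dots.

```python
from typing import List


def invalid_table_names(names: List[str]) -> List[str]:
    """Return entries that are not three non-empty, dot-separated parts."""
    bad = []
    for name in names:
        parts = name.split(".")
        if len(parts) != 3 or any(not part.strip() for part in parts):
            bad.append(name)
    return bad


# Request-body shape assumed to mirror AnomalyDetectionConfig.as_dict();
# the table names are hypothetical.
anomaly_detection_config = {
    "excluded_table_full_names": ["main.sales.staging_orders", "main.sales.tmp_events"],
}
problems = invalid_table_names(anomaly_detection_config["excluded_table_full_names"])
```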
- class databricks.sdk.service.dataquality.CancelRefreshResponse(refresh: Refresh | None = None)
Response to cancelling a refresh.
- as_dict() → dict
Serializes the CancelRefreshResponse into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() → dict
Serializes the CancelRefreshResponse into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) → CancelRefreshResponse
Deserializes the CancelRefreshResponse from a dictionary.
- class databricks.sdk.service.dataquality.CronSchedule(quartz_cron_expression: str, timezone_id: str, pause_status: CronSchedulePauseStatus | None = None)
The data quality monitoring workflow cron schedule.
- quartz_cron_expression: str
The Quartz cron expression that determines when to run the monitor. See [examples].
[examples]: https://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html
- timezone_id: str
A Java timezone ID (e.g., America/Los_Angeles) in which to evaluate the Quartz expression; the schedule is resolved with respect to this timezone. See Java TimeZone for details.
- pause_status: CronSchedulePauseStatus | None = None
Read-only field that indicates whether the schedule is paused.
- as_dict() → dict
Serializes the CronSchedule into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() → dict
Serializes the CronSchedule into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) → CronSchedule
Deserializes the CronSchedule from a dictionary.
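A Quartz cron expression has six or seven space-separated fields (seconds, minutes, hours, day-of-month, month, day-of-week, and an optional year), unlike the five-field Unix cron format. The sketch below builds a CronSchedule-shaped dictionary (keys assumed to match `CronSchedule.as_dict()`) and rejects expressions with the wrong field count. `cron_schedule_body` is a hypothetical helper that only counts fields; it is not full Quartz validation. `pause_status` is left out because it is documented as read-only.

```python
from typing import Dict


def cron_schedule_body(quartz_cron_expression: str, timezone_id: str) -> Dict[str, str]:
    """Build a CronSchedule-shaped dict after checking the Quartz field count."""
    fields = quartz_cron_expression.split()
    if len(fields) not in (6, 7):
        raise ValueError(
            f"expected 6 or 7 Quartz fields, got {len(fields)}: {quartz_cron_expression!r}"
        )
    # pause_status is read-only, so it is not set in the request body.
    return {"quartz_cron_expression": quartz_cron_expression, "timezone_id": timezone_id}


# Every day at 12:00 in the given timezone; "?" means "no specific day-of-week".
daily_noon = cron_schedule_body("0 0 12 * * ?", "America/Los_Angeles")
```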
- class databricks.sdk.service.dataquality.CronSchedulePauseStatus
The data quality monitoring workflow cron schedule pause status.
- CRON_SCHEDULE_PAUSE_STATUS_PAUSED = "CRON_SCHEDULE_PAUSE_STATUS_PAUSED"
- CRON_SCHEDULE_PAUSE_STATUS_UNPAUSED = "CRON_SCHEDULE_PAUSE_STATUS_UNPAUSED"
- class databricks.sdk.service.dataquality.DataProfilingConfig(output_schema_id: str, assets_dir: str | None = None, baseline_table_name: str | None = None, custom_metrics: List[DataProfilingCustomMetric] | None = None, dashboard_id: str | None = None, drift_metrics_table_name: str | None = None, effective_warehouse_id: str | None = None, inference_log: InferenceLogConfig | None = None, latest_monitor_failure_message: str | None = None, monitor_version: int | None = None, monitored_table_name: str | None = None, notification_settings: NotificationSettings | None = None, profile_metrics_table_name: str | None = None, schedule: CronSchedule | None = None, skip_builtin_dashboard: bool | None = None, slicing_exprs: List[str] | None = None, snapshot: SnapshotConfig | None = None, status: DataProfilingStatus | None = None, time_series: TimeSeriesConfig | None = None, warehouse_id: str | None = None)
Data profiling configuration.
- output_schema_id: str
ID of the schema where output tables are created.
- assets_dir: str | None = None
Absolute path to a custom directory in which to store data-monitoring assets. Normally prepopulated with a default user location by the UI and Python APIs.
- baseline_table_name: str | None = None
Baseline table name. Baseline data is used to compute drift from the data in the monitored table. The baseline table and the monitored table must have the same schema.
- custom_metrics: List[DataProfilingCustomMetric] | None = None
Custom metrics.
- dashboard_id: str | None = None
ID of the dashboard that visualizes the computed metrics. This can be empty if the monitor is in the PENDING state.
- drift_metrics_table_name: str | None = None
Table that stores drift metrics data. Format: catalog.schema.table_name.
- effective_warehouse_id: str | None = None
The warehouse used for dashboard creation.
- inference_log: InferenceLogConfig | None = None
Analysis configuration for monitoring inference log tables.
- latest_monitor_failure_message: str | None = None
The latest error message for a monitor failure.
- monitor_version: int | None = None
The monitor configuration version currently in use, represented as an integer (1, 2, 3, …). The field can also take negative values, which may indicate a corrupted monitor_version.
- monitored_table_name: str | None = None
Unity Catalog table to monitor. Format: catalog.schema.table_name.
- notification_settings: NotificationSettings | None = None
Notification settings.
- profile_metrics_table_name: str | None = None
Table that stores profile metrics data. Format: catalog.schema.table_name.
- schedule: CronSchedule | None = None
The cron schedule.
- skip_builtin_dashboard: bool | None = None
Whether to skip creating a default dashboard summarizing data quality metrics.
- slicing_exprs: List[str] | None = None
List of column expressions to slice data with for targeted analysis. The data is grouped by each expression independently, resulting in a separate slice for each predicate and its complement. For example, slicing_exprs=["col_1", "col_2 > 10"] generates the following slices: two slices for col_2 > 10 (True and False), and one slice per unique value in col_1. For high-cardinality columns, only the top 100 unique values by frequency generate slices.
- snapshot: SnapshotConfig | None = None
Analysis configuration for monitoring snapshot tables.
- status: DataProfilingStatus | None = None
The data profiling monitor status.
- time_series: TimeSeriesConfig | None = None
Analysis configuration for monitoring time series tables.
- warehouse_id: str | None = None
Optional warehouse for dashboard creation. If not specified, the first running warehouse is used.
- as_dict() → dict
Serializes the DataProfilingConfig into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() → dict
Serializes the DataProfilingConfig into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) → DataProfilingConfig
Deserializes the DataProfilingConfig from a dictionary.
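The slicing behaviour described for `slicing_exprs` can be modelled in plain Python. This is a simplified local simulation, not how Databricks evaluates slices. Real slicing expressions are SQL strings, so here a predicate such as `col_2 > 10` is represented by a Python callable paired with its label. Each expression is handled independently, and column expressions are capped at the 100 most frequent values. `enumerate_slices` is a hypothetical name.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Tuple, Union

Row = Dict[str, Any]
# Stand-in for one slicing expression: a bare column name, or a
# (label, predicate) pair in place of a SQL boolean expression.
SliceExpr = Union[str, Tuple[str, Callable[[Row], bool]]]

MAX_VALUES_PER_COLUMN = 100  # documented cap for high-cardinality columns


def enumerate_slices(rows: List[Row], exprs: List[SliceExpr]) -> List[Tuple[str, Any, int]]:
    """Return (expression, slice value, row count) for every slice of every expression."""
    slices: List[Tuple[str, Any, int]] = []
    for expr in exprs:
        if isinstance(expr, str):
            # Column expression: one slice per unique value, top 100 by frequency.
            counts = Counter(row[expr] for row in rows)
            for value, n in counts.most_common(MAX_VALUES_PER_COLUMN):
                slices.append((expr, value, n))
        else:
            # Predicate expression: one slice where it holds, one for its complement.
            label, predicate = expr
            n_true = sum(1 for row in rows if predicate(row))
            slices.append((label, True, n_true))
            slices.append((label, False, len(rows) - n_true))
    return slices


rows = [
    {"col_1": "a", "col_2": 5},
    {"col_1": "b", "col_2": 20},
    {"col_1": "a", "col_2": 15},
]
slices = enumerate_slices(rows, ["col_1", ("col_2 > 10", lambda r: r["col_2"] > 10)])
```

For the three sample rows this yields two slices for `col_1` (`a` with 2 rows, `b` with 1) and the True/False pair for `col_2 > 10` (2 rows and 1 row).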
- class databricks.sdk.service.dataquality.DataProfilingCustomMetric(name: str, definition: str, input_columns: List[str], output_data_type: str, type: DataProfilingCustomMetricType)
Custom metric definition.
- name: str
Name of the metric in the output tables.
- definition: str
Jinja template for a SQL expression that specifies how to compute the metric. See [create metric definition].
[create metric definition]: https://docs.databricks.com/en/lakehouse-monitoring/custom-metrics.html#create-definition
- input_columns: List[str]
A list of column names in the input table the metric should be computed for. Use ":table" to indicate that the metric needs information from multiple columns.
- output_data_type: str
The output type of the custom metric.
- type: DataProfilingCustomMetricType
The type of the custom metric.
- as_dict() → dict
Serializes the DataProfilingCustomMetric into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() → dict
Serializes the DataProfilingCustomMetric into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) → DataProfilingCustomMetric
Deserializes the DataProfilingCustomMetric from a dictionary.
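The `definition` field is a Jinja template that Databricks renders into SQL. This sketch assumes the `{{input_column}}` placeholder convention from the linked custom-metrics documentation, where the placeholder is replaced by each entry of `input_columns`. It previews the rendered SQL with a regex substitution (not real Jinja) and builds a dictionary whose keys are assumed to match `DataProfilingCustomMetric.as_dict()`. The metric name, the column names, and the `output_data_type` value are placeholders. Check the linked docs for the exact type format the API expects.

```python
import re
from typing import Any, Dict, List

# Matches {{input_column}} with optional inner whitespace (assumed placeholder name).
_PLACEHOLDER = re.compile(r"\{\{\s*input_column\s*\}\}")


def preview_definition(definition: str, column: str) -> str:
    """Substitute one column name for the placeholder; a preview, not Jinja rendering."""
    return _PLACEHOLDER.sub(lambda _match: column, definition)


def custom_metric_body(name: str, definition: str, input_columns: List[str],
                       output_data_type: str, metric_type: str) -> Dict[str, Any]:
    """Build a dict with the documented DataProfilingCustomMetric field names."""
    return {
        "name": name,
        "definition": definition,
        "input_columns": input_columns,
        "output_data_type": output_data_type,
        "type": metric_type,
    }


metric = custom_metric_body(
    name="log_avg",  # hypothetical metric name
    definition="avg(log(abs({{input_column}}) + 1))",
    input_columns=["price", "quantity"],  # hypothetical columns
    output_data_type="double",  # placeholder value; exact format is an assumption
    metric_type="DATA_PROFILING_CUSTOM_METRIC_TYPE_AGGREGATE",
)
previews = [preview_definition(metric["definition"], col) for col in metric["input_columns"]]
```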
- class databricks.sdk.service.dataquality.DataProfilingCustomMetricType
The custom metric type.
- DATA_PROFILING_CUSTOM_METRIC_TYPE_AGGREGATE = "DATA_PROFILING_CUSTOM_METRIC_TYPE_AGGREGATE"
- DATA_PROFILING_CUSTOM_METRIC_TYPE_DERIVED = "DATA_PROFILING_CUSTOM_METRIC_TYPE_DERIVED"
- DATA_PROFILING_CUSTOM_METRIC_TYPE_DRIFT = "DATA_PROFILING_CUSTOM_METRIC_TYPE_DRIFT"
- class databricks.sdk.service.dataquality.DataProfilingStatus
The status of the data profiling monitor.
- DATA_PROFILING_STATUS_ACTIVE = "DATA_PROFILING_STATUS_ACTIVE"
- DATA_PROFILING_STATUS_DELETE_PENDING = "DATA_PROFILING_STATUS_DELETE_PENDING"
- DATA_PROFILING_STATUS_ERROR = "DATA_PROFILING_STATUS_ERROR"
- DATA_PROFILING_STATUS_FAILED = "DATA_PROFILING_STATUS_FAILED"
- DATA_PROFILING_STATUS_PENDING = "DATA_PROFILING_STATUS_PENDING"
- class databricks.sdk.service.dataquality.InferenceLogConfig(problem_type: InferenceProblemType, timestamp_column: str, granularities: List[AggregationGranularity], prediction_column: str, model_id_column: str, label_column: str | None = None)
Inference log configuration.
- problem_type: InferenceProblemType
Problem type the model aims to solve.
- timestamp_column: str
Column for the timestamp.
- granularities: List[AggregationGranularity]
List of granularities to use when aggregating data into time windows based on their timestamp.
- prediction_column: str
Column for the prediction.
- model_id_column: str
Column for the model identifier.
- label_column: str | None = None
Column for the label.
- as_dict() → dict
Serializes the InferenceLogConfig into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() → dict
Serializes the InferenceLogConfig into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) → InferenceLogConfig
Deserializes the InferenceLogConfig from a dictionary.
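Every column named in an InferenceLogConfig has to exist in the monitored table. A hypothetical pre-flight helper can report which configured columns are missing. It assumes the table's column names are already known, for example from Unity Catalog table metadata. `label_column` is optional and is skipped when unset. The dictionary uses the documented field names, and its column names are placeholders.

```python
from typing import Any, Dict, Iterable, List

# Fields of InferenceLogConfig that name a column in the monitored table.
COLUMN_FIELDS = ("timestamp_column", "prediction_column", "model_id_column", "label_column")


def missing_columns(inference_log: Dict[str, Any], table_columns: Iterable[str]) -> List[str]:
    """Return configured column names that are not present in table_columns."""
    available = set(table_columns)
    missing = []
    for field in COLUMN_FIELDS:
        column = inference_log.get(field)
        if column is not None and column not in available:
            missing.append(column)
    return missing


inference_log = {
    "problem_type": "INFERENCE_PROBLEM_TYPE_CLASSIFICATION",
    "timestamp_column": "ts",
    "granularities": ["AGGREGATION_GRANULARITY_1_DAY"],
    "prediction_column": "prediction",
    "model_id_column": "model_version",
    "label_column": "label",
}
```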
- class databricks.sdk.service.dataquality.InferenceProblemType
Inference problem type the model aims to solve.
- INFERENCE_PROBLEM_TYPE_CLASSIFICATION = "INFERENCE_PROBLEM_TYPE_CLASSIFICATION"
- INFERENCE_PROBLEM_TYPE_REGRESSION = "INFERENCE_PROBLEM_TYPE_REGRESSION"
- class databricks.sdk.service.dataquality.ListMonitorResponse(monitors: List[Monitor] | None = None, next_page_token: str | None = None)
Response for listing Monitors.
- next_page_token: str | None = None
- as_dict() → dict
Serializes the ListMonitorResponse into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() → dict
Serializes the ListMonitorResponse into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) → ListMonitorResponse
Deserializes the ListMonitorResponse from a dictionary.
- class databricks.sdk.service.dataquality.ListRefreshResponse(next_page_token: str | None = None, refreshes: List[Refresh] | None = None)
Response for listing refreshes.
- next_page_token: str | None = None
- as_dict() → dict
Serializes the ListRefreshResponse into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() → dict
Serializes the ListRefreshResponse into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) → ListRefreshResponse
Deserializes the ListRefreshResponse from a dictionary.
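ListMonitorResponse and ListRefreshResponse both carry a `next_page_token`. The sketch below is a generic loop over such pages. It assumes a hypothetical `fetch_page(page_token)` callable that returns a dictionary shaped like the response's `as_dict()`, and it assumes that a missing or empty token marks the last page. This reference does not say whether the SDK's own list methods already paginate for you, so treat this as an illustration of the token protocol rather than required client code.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iterate_items(fetch_page: Callable[[Optional[str]], Page], items_key: str) -> Iterator[Any]:
    """Yield items from every page, following next_page_token until it is absent or empty."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.get(items_key, [])
        token = page.get("next_page_token")
        if not token:
            break


# Fake two-page backend standing in for a real API call.
_PAGES = {
    None: {"refreshes": [{"refresh_id": 1}, {"refresh_id": 2}], "next_page_token": "p2"},
    "p2": {"refreshes": [{"refresh_id": 3}]},
}
all_refreshes = list(iterate_items(lambda token: _PAGES[token], "refreshes"))
```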
- class databricks.sdk.service.dataquality.Monitor(object_type: str, object_id: str, anomaly_detection_config: AnomalyDetectionConfig | None = None, data_profiling_config: DataProfilingConfig | None = None)
Monitor for the data quality of Unity Catalog entities such as schemas or tables.
- object_type: str
The type of the monitored object. Can be one of the following: schema or table.
- object_id: str
The UUID of the request object: the schema_id for a schema, or the table_id for a table.
Find the schema_id in either of these places: 1. The [schema_id] of the Schemas resource. 2. In [Catalog Explorer], select the schema, go to the Details tab, and read the Schema ID field.
Find the table_id in either of these places: 1. The [table_id] of the Tables resource. 2. In [Catalog Explorer], select the table, go to the Details tab, and read the Table ID field.
[Catalog Explorer]: https://docs.databricks.com/aws/en/catalog-explorer/
[schema_id]: https://docs.databricks.com/api/workspace/schemas/get#schema_id
[table_id]: https://docs.databricks.com/api/workspace/tables/get#table_id
- anomaly_detection_config: AnomalyDetectionConfig | None = None
Anomaly detection configuration, applicable to schema object types.
- data_profiling_config: DataProfilingConfig | None = None
Data profiling configuration, applicable to table object types. Exactly one analysis configuration (inference_log, snapshot, or time_series) must be present.
- as_dict() → dict
Serializes the Monitor into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() → dict
Serializes the Monitor into a shallow dictionary of its immediate attributes.
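The Monitor fields tie each configuration to an object type: `anomaly_detection_config` applies to schemas and `data_profiling_config` applies to tables. Reading "analysis configuration" as one of `inference_log`, `snapshot`, or `time_series`, a data profiling configuration should carry exactly one of them. The hypothetical validator below enforces only those stated rules on a Monitor-shaped dictionary, not the full server-side validation. The object IDs are placeholders. An empty `snapshot` (`{}`) still counts as present, because SnapshotConfig has no fields.

```python
from typing import Any, Dict, List

ANALYSIS_KEYS = ("inference_log", "snapshot", "time_series")


def monitor_problems(monitor: Dict[str, Any]) -> List[str]:
    """Return human-readable rule violations for a Monitor-shaped dict (empty if none)."""
    problems: List[str] = []
    object_type = monitor.get("object_type")
    if object_type not in ("schema", "table"):
        problems.append(f"object_type must be 'schema' or 'table', got {object_type!r}")
    if monitor.get("anomaly_detection_config") is not None and object_type != "schema":
        problems.append("anomaly_detection_config applies to schema objects")
    profiling = monitor.get("data_profiling_config")
    if profiling is not None:
        if object_type != "table":
            problems.append("data_profiling_config applies to table objects")
        # Use "is not None" so that an empty SnapshotConfig ({}) counts as present.
        present = [key for key in ANALYSIS_KEYS if profiling.get(key) is not None]
        if len(present) != 1:
            problems.append(f"exactly one analysis configuration required, found {present}")
    return problems


table_monitor = {
    "object_type": "table",
    "object_id": "00000000-0000-0000-0000-000000000000",  # placeholder table_id
    "data_profiling_config": {
        "output_schema_id": "11111111-1111-1111-1111-111111111111",  # placeholder
        "snapshot": {},
    },
}
```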
- class databricks.sdk.service.dataquality.NotificationDestination(email_addresses: List[str] | None = None)
Destination of the data quality monitoring notification.
- email_addresses: List[str] | None = None
The list of email addresses to send the notification to. A maximum of 5 email addresses is supported.
- as_dict() → dict
Serializes the NotificationDestination into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() → dict
Serializes the NotificationDestination into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) → NotificationDestination
Deserializes the NotificationDestination from a dictionary.
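Because at most five email addresses are supported, a client can reject oversized lists before sending a request. The sketch below builds a NotificationSettings-shaped dictionary that notifies the given addresses on failure. The nested keys follow the documented field names and are assumed to match `as_dict()`. The helper name and the addresses are hypothetical.

```python
from typing import Dict, List

MAX_EMAIL_ADDRESSES = 5  # documented limit for NotificationDestination.email_addresses


def notification_settings_body(emails: List[str]) -> Dict[str, Dict[str, List[str]]]:
    """Build a NotificationSettings-shaped dict that notifies `emails` on failure."""
    if len(emails) > MAX_EMAIL_ADDRESSES:
        raise ValueError(
            f"at most {MAX_EMAIL_ADDRESSES} email addresses are supported, got {len(emails)}"
        )
    return {"on_failure": {"email_addresses": list(emails)}}


settings = notification_settings_body(["data-team@example.com"])
```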
- class databricks.sdk.service.dataquality.NotificationSettings(on_failure: NotificationDestination | None = None)
Settings for sending data quality monitoring notifications.
- on_failure: NotificationDestination | None = None
Destinations to notify on failure or timeout.
- as_dict() → dict
Serializes the NotificationSettings into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() → dict
Serializes the NotificationSettings into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) → NotificationSettings
Deserializes the NotificationSettings from a dictionary.
- class databricks.sdk.service.dataquality.Refresh(object_type: str, object_id: str, end_time_ms: int | None = None, message: str | None = None, refresh_id: int | None = None, start_time_ms: int | None = None, state: RefreshState | None = None, trigger: RefreshTrigger | None = None)
The Refresh object gives information on a refresh of the data quality monitoring pipeline.
- object_type: str
The type of the monitored object. Can be one of the following: schema or table.
- object_id: str
The UUID of the request object: the schema_id for a schema, or the table_id for a table.
Find the schema_id in either of these places: 1. The [schema_id] of the Schemas resource. 2. In [Catalog Explorer], select the schema, go to the Details tab, and read the Schema ID field.
Find the table_id in either of these places: 1. The [table_id] of the Tables resource. 2. In [Catalog Explorer], select the table, go to the Details tab, and read the Table ID field.
[Catalog Explorer]: https://docs.databricks.com/aws/en/catalog-explorer/
[schema_id]: https://docs.databricks.com/api/workspace/schemas/get#schema_id
[table_id]: https://docs.databricks.com/api/workspace/tables/get#table_id
- end_time_ms: int | None = None
Time when the refresh ended (milliseconds since 1/1/1970 UTC).
- message: str | None = None
An optional message giving insight into the current state of the refresh (e.g., failure messages).
- refresh_id: int | None = None
Unique ID of the refresh operation.
- start_time_ms: int | None = None
Time when the refresh started (milliseconds since 1/1/1970 UTC).
- state: RefreshState | None = None
The current state of the refresh.
- trigger: RefreshTrigger | None = None
What triggered the refresh.
- as_dict() → dict
Serializes the Refresh into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() → dict
Serializes the Refresh into a shallow dictionary of its immediate attributes.
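`start_time_ms` and `end_time_ms` are epoch milliseconds in UTC. The sketch below converts them to timezone-aware datetimes and computes a refresh's duration from a Refresh-shaped dictionary. Both helper names are hypothetical. A refresh that has not finished yet has no end time, so the duration is `None` in that case.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def ms_to_utc(ms: int) -> datetime:
    """Convert milliseconds since 1970-01-01 UTC to an aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def refresh_duration(refresh: Dict[str, Any]) -> Optional[timedelta]:
    """Return end minus start for a Refresh-shaped dict, or None while either is unset."""
    start, end = refresh.get("start_time_ms"), refresh.get("end_time_ms")
    if start is None or end is None:
        return None
    return timedelta(milliseconds=end - start)


finished = {"refresh_id": 7, "start_time_ms": 1_700_000_000_000, "end_time_ms": 1_700_000_090_000}
```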
- class databricks.sdk.service.dataquality.RefreshState
The state of the refresh.
- MONITOR_REFRESH_STATE_CANCELED = "MONITOR_REFRESH_STATE_CANCELED"
- MONITOR_REFRESH_STATE_FAILED = "MONITOR_REFRESH_STATE_FAILED"
- MONITOR_REFRESH_STATE_PENDING = "MONITOR_REFRESH_STATE_PENDING"
- MONITOR_REFRESH_STATE_RUNNING = "MONITOR_REFRESH_STATE_RUNNING"
- MONITOR_REFRESH_STATE_SUCCESS = "MONITOR_REFRESH_STATE_SUCCESS"
- MONITOR_REFRESH_STATE_UNKNOWN = "MONITOR_REFRESH_STATE_UNKNOWN"
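A common pattern is to poll a refresh until it reaches a final state. This sketch treats SUCCESS, FAILED, and CANCELED as terminal. It assumes UNKNOWN is non-terminal, so polling continues, because this reference does not say otherwise. `get_state` is a hypothetical callable standing in for whatever call fetches the current Refresh.state value. The polling interval and attempt limit are arbitrary defaults.

```python
import time
from typing import Callable

# Terminal refresh states (assumption: UNKNOWN is treated as non-terminal).
TERMINAL_STATES = frozenset({
    "MONITOR_REFRESH_STATE_SUCCESS",
    "MONITOR_REFRESH_STATE_FAILED",
    "MONITOR_REFRESH_STATE_CANCELED",
})


def wait_for_refresh(get_state: Callable[[], str], interval_s: float = 30.0,
                     max_polls: int = 120) -> str:
    """Poll get_state() until it returns a terminal state; raise TimeoutError otherwise."""
    for attempt in range(max_polls):
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if attempt < max_polls - 1:
            time.sleep(interval_s)  # wait before the next poll, but not after the last
    raise TimeoutError(f"refresh not finished after {max_polls} polls")
```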
- class databricks.sdk.service.dataquality.RefreshTrigger
The trigger of the refresh.
- MONITOR_REFRESH_TRIGGER_DATA_CHANGE = "MONITOR_REFRESH_TRIGGER_DATA_CHANGE"
- MONITOR_REFRESH_TRIGGER_MANUAL = "MONITOR_REFRESH_TRIGGER_MANUAL"
- MONITOR_REFRESH_TRIGGER_SCHEDULE = "MONITOR_REFRESH_TRIGGER_SCHEDULE"
- MONITOR_REFRESH_TRIGGER_UNKNOWN = "MONITOR_REFRESH_TRIGGER_UNKNOWN"
- class databricks.sdk.service.dataquality.SnapshotConfig
Snapshot analysis configuration.
- as_dict() → dict
Serializes the SnapshotConfig into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() → dict
Serializes the SnapshotConfig into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) → SnapshotConfig
Deserializes the SnapshotConfig from a dictionary.
- class databricks.sdk.service.dataquality.TimeSeriesConfig(timestamp_column: str, granularities: List[AggregationGranularity])
Time series analysis configuration.
- timestamp_column: str
Column for the timestamp.
- granularities: List[AggregationGranularity]
List of granularities to use when aggregating data into time windows based on their timestamp.
- as_dict() → dict
Serializes the TimeSeriesConfig into a dictionary suitable for use as a JSON request body.
- as_shallow_dict() → dict
Serializes the TimeSeriesConfig into a shallow dictionary of its immediate attributes.
- classmethod from_dict(d: Dict[str, Any]) → TimeSeriesConfig
Deserializes the TimeSeriesConfig from a dictionary.
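A TimeSeriesConfig sits inside a DataProfilingConfig as its `time_series` analysis configuration. The sketch below assembles that nesting as a plain dictionary, with keys assumed to mirror `DataProfilingConfig.as_dict()`, and rejects granularity strings outside the documented AggregationGranularity values. Requiring at least one granularity is an assumption of this sketch, not a documented rule. The helper name and the schema ID are hypothetical.

```python
from typing import Any, Dict, List

DOCUMENTED_GRANULARITIES = frozenset({
    "AGGREGATION_GRANULARITY_5_MINUTES",
    "AGGREGATION_GRANULARITY_30_MINUTES",
    "AGGREGATION_GRANULARITY_1_HOUR",
    "AGGREGATION_GRANULARITY_1_DAY",
    "AGGREGATION_GRANULARITY_1_WEEK",
    "AGGREGATION_GRANULARITY_2_WEEKS",
    "AGGREGATION_GRANULARITY_3_WEEKS",
    "AGGREGATION_GRANULARITY_4_WEEKS",
    "AGGREGATION_GRANULARITY_1_MONTH",
    "AGGREGATION_GRANULARITY_1_YEAR",
})


def time_series_profiling_config(output_schema_id: str, timestamp_column: str,
                                 granularities: List[str]) -> Dict[str, Any]:
    """Build a DataProfilingConfig-shaped dict whose analysis configuration is time_series."""
    if not granularities:
        raise ValueError("at least one granularity is required (assumption of this sketch)")
    unknown = [g for g in granularities if g not in DOCUMENTED_GRANULARITIES]
    if unknown:
        raise ValueError(f"unknown granularities: {unknown}")
    return {
        "output_schema_id": output_schema_id,
        "time_series": {"timestamp_column": timestamp_column, "granularities": list(granularities)},
    }


config = time_series_profiling_config(
    "11111111-1111-1111-1111-111111111111",  # placeholder schema ID
    "event_time",
    ["AGGREGATION_GRANULARITY_1_HOUR", "AGGREGATION_GRANULARITY_1_DAY"],
)
```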