w.vector_search_indexes: Indexes

class databricks.sdk.service.vectorsearch.VectorSearchIndexesAPI

Index: An efficient representation of your embedding vectors that supports real-time and efficient approximate nearest neighbor (ANN) search queries.

There are two types of AI Search indexes:

  • Delta Sync Index: An index that automatically syncs with a source Delta Table, automatically and incrementally updating the index as the underlying data in the Delta Table changes.

  • Direct Vector Access Index: An index that supports direct read and write of vectors and metadata through our REST and SDK APIs. With this model, the user manages index updates.

create_index(name: str, endpoint_name: str, primary_key: str, index_type: VectorIndexType [, delta_sync_index_spec: Optional[DeltaSyncVectorIndexSpecRequest], direct_access_index_spec: Optional[DirectAccessVectorIndexSpec], index_subtype: Optional[IndexSubtype]]) VectorIndex

Create a new index.

Parameters:
  • name – str Name of the index

  • endpoint_name – str Name of the endpoint to be used for serving the index

  • primary_key – str Primary key of the index

  • index_type – VectorIndexType

  • delta_sync_index_spec – DeltaSyncVectorIndexSpecRequest (optional) Specification for Delta Sync Index. Required if index_type is DELTA_SYNC.

  • direct_access_index_spec – DirectAccessVectorIndexSpec (optional) Specification for Direct Vector Access Index. Required if index_type is DIRECT_ACCESS.

  • index_subtype – IndexSubtype (optional) The subtype of the index. Use HYBRID or FULL_TEXT. VECTOR is not supported.

Returns:

VectorIndex
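The pairing rule above (each index_type requires its matching spec) can be checked on the client side before calling create_index. The following is a minimal sketch, not part of the SDK: check_index_spec is a hypothetical helper, and it assumes the index type is given either as the string "DELTA_SYNC" / "DIRECT_ACCESS" or as an enum whose value attribute holds that string.

```python
from typing import Any, Optional


def check_index_spec(
    index_type: Any,
    delta_sync_index_spec: Optional[Any] = None,
    direct_access_index_spec: Optional[Any] = None,
) -> None:
    """Raise ValueError if the spec required for index_type is missing.

    Hypothetical helper: accepts a plain string or an enum-like object
    exposing the type name through a `value` attribute.
    """
    kind = getattr(index_type, "value", index_type)
    if kind == "DELTA_SYNC" and delta_sync_index_spec is None:
        raise ValueError("delta_sync_index_spec is required when index_type is DELTA_SYNC")
    if kind == "DIRECT_ACCESS" and direct_access_index_spec is None:
        raise ValueError("direct_access_index_spec is required when index_type is DIRECT_ACCESS")
```

Calling the helper just before create_index surfaces a missing spec as a local ValueError instead of a failed request.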

delete_data_vector_index(index_name: str, primary_keys: List[str]) DeleteDataVectorIndexResponse

Handles the deletion of data from a specified vector index.

Parameters:
  • index_name – str Name of the vector index where data is to be deleted. Must be a Direct Vector Access Index.

  • primary_keys – List[str] List of primary keys for the data to be deleted.

Returns:

DeleteDataVectorIndexResponse

delete_index(index_name: str)

Delete an index.

Parameters:

index_name – str Name of the index

get_index(index_name: str [, ensure_reranker_compatible: Optional[bool]]) VectorIndex

Get an index.

Parameters:
  • index_name – str Name of the index

  • ensure_reranker_compatible – bool (optional) If true, the URL returned for the index is guaranteed to be compatible with the reranker. Currently this means we return the CP URL regardless of how the index is being accessed. If not set or set to false, the URL may still be compatible with the reranker depending on what URL we return.

Returns:

VectorIndex

list_indexes(endpoint_name: str [, page_token: Optional[str]]) Iterator[MiniVectorIndex]

List all indexes in the given endpoint.

Parameters:
  • endpoint_name – str Name of the endpoint

  • page_token – str (optional) Token for pagination

Returns:

Iterator over MiniVectorIndex

query_index(index_name: str, columns: List[str] [, columns_to_rerank: Optional[List[str]], facets: Optional[List[str]], filters_json: Optional[str], num_results: Optional[int], query_columns: Optional[List[str]], query_text: Optional[str], query_type: Optional[str], query_vector: Optional[List[float]], reranker: Optional[RerankerConfig], score_threshold: Optional[float], sort_columns: Optional[List[str]]]) QueryVectorIndexResponse

Query the specified vector index.

Parameters:
  • index_name – str Name of the vector index to query.

  • columns – List[str] List of column names to include in the response.

  • columns_to_rerank – List[str] (optional) Column names used to retrieve data to send to the reranker.

  • facets

    List[str] (optional) Facets to compute over the matched results. Each entry has one of these forms:

    • “<column>”: top 10 distinct values by count

    • “<column> TOP <n>”: top n distinct values, where n > 0

    • “<column> BUCKETS [[from,to],…]”: inclusive numeric ranges

    TOP and BUCKETS are case-insensitive. A column may appear at most once.

  • filters_json

    str (optional) JSON string representing query filters.

    Example filters:

    • {"id <": 5}: Filter for id less than 5.

    • {"id >": 5}: Filter for id greater than 5.

    • {"id <=": 5}: Filter for id less than or equal to 5.

    • {"id >=": 5}: Filter for id greater than or equal to 5.

    • {"id": 5}: Filter for id equal to 5.

  • num_results – int (optional) Number of results to return. Defaults to 10.

  • query_columns – List[str] (optional) Text columns to search for query_text. When empty, all text columns are searched.

  • query_text – str (optional) Query text. Required for Delta Sync Index using model endpoint.

  • query_type – str (optional) The query type to use. Choices are ANN, HYBRID, and FULL_TEXT. Defaults to ANN.

  • query_vector – List[float] (optional) Query vector. Required for Direct Vector Access Index and Delta Sync Index using self-managed vectors.

  • reranker – RerankerConfig (optional) If set, the top 50 results are reranked with the Databricks Reranker model before returning the num_results results to the user. The setting columns_to_rerank selects which columns are used for reranking. For each datapoint, the columns selected are concatenated before being sent to the reranking model. See https://docs.databricks.com/aws/en/vector-search/query-vector-search#rerank for more information.

  • score_threshold – float (optional) Threshold for the approximate nearest neighbor search. Defaults to 0.0.

  • sort_columns – List[str] (optional) Sort results by column values instead of the default relevance ordering. Each clause has the form “<column> ASC” or “<column> DESC”, for example [“rating DESC”, “price ASC”].

Returns:

QueryVectorIndexResponse
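Because filters_json is a string and each facets entry follows a small grammar, it can help to build them programmatically rather than by hand. The sketch below uses only the standard library; facet_top and facet_buckets are hypothetical helpers, and the column names (price, category, rating) are placeholders, not part of the API.

```python
import json
from typing import List


def facet_top(column: str, n: int) -> str:
    """Build a '<column> TOP <n>' facet entry; n must be positive."""
    if n <= 0:
        raise ValueError("n must be greater than 0")
    return f"{column} TOP {n}"


def facet_buckets(column: str, ranges: List[List[float]]) -> str:
    """Build a '<column> BUCKETS [[from,to],...]' facet entry."""
    compact = json.dumps(ranges, separators=(",", ":"))
    return f"{column} BUCKETS {compact}"


# Serialize the filter dict to the JSON string expected by filters_json.
filters_json = json.dumps({"price <=": 100})

# Each column appears at most once across the facet entries.
facets = [facet_top("category", 3), facet_buckets("price", [[0, 50], [50, 100]])]

# Sort clauses are plain strings of the form "<column> ASC" or "<column> DESC".
sort_columns = ["rating DESC", "price ASC"]
```

These values would then be passed to query_index as the filters_json, facets, and sort_columns arguments. Whether a particular combination of options is accepted for a given index or query type is not covered here.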

query_next_page(index_name: str [, endpoint_name: Optional[str], page_token: Optional[str]]) QueryVectorIndexResponse

Use the next_page_token returned from a previous QueryVectorIndex or QueryVectorIndexNextPage request to fetch the next page of results.

Parameters:
  • index_name – str Name of the vector index to query.

  • endpoint_name – str (optional) Name of the endpoint.

  • page_token – str (optional) Page token returned from previous QueryVectorIndex or QueryVectorIndexNextPage API.

Returns:

QueryVectorIndexResponse
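The paging flow described above can be wrapped in a generator. This is a sketch under stated assumptions: iter_query_rows is a hypothetical helper, w is any object exposing vector_search_indexes.query_next_page with the signature shown above, and each response is assumed to expose its rows as result.data_array alongside next_page_token.

```python
from typing import Any, Iterator, List


def iter_query_rows(
    w: Any, index_name: str, endpoint_name: str, first_page: Any
) -> Iterator[List[Any]]:
    """Yield rows from first_page, then follow next_page_token until it is empty."""
    page = first_page
    while True:
        # Assumption: rows live under page.result.data_array.
        result = getattr(page, "result", None)
        for row in getattr(result, "data_array", None) or []:
            yield row
        token = getattr(page, "next_page_token", None)
        if not token:
            break
        page = w.vector_search_indexes.query_next_page(
            index_name=index_name,
            endpoint_name=endpoint_name,
            page_token=token,
        )
```

The first page comes from a regular query_index call; the helper only issues the follow-up requests.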

scan_index(index_name: str [, last_primary_key: Optional[str], num_results: Optional[int]]) ScanVectorIndexResponse

Scan the specified vector index and return the first num_results entries after last_primary_key (exclusive).

Parameters:
  • index_name – str Name of the vector index to scan.

  • last_primary_key – str (optional) Primary key of the last entry returned in the previous scan.

  • num_results – int (optional) Number of results to return. Defaults to 10.

Returns:

ScanVectorIndexResponse
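A full scan can be expressed as a loop that feeds each response's last key back into the next call. The following is a sketch, not SDK code: scan_all is a hypothetical helper, and it assumes the response exposes the returned entries as data and the resume key as last_primary_key.

```python
from typing import Any, Iterator


def scan_all(w: Any, index_name: str, page_size: int = 100) -> Iterator[Any]:
    """Yield every entry of the index by repeatedly calling scan_index."""
    last_key = None
    while True:
        resp = w.vector_search_indexes.scan_index(
            index_name=index_name,
            last_primary_key=last_key,
            num_results=page_size,
        )
        # Assumption: entries are under resp.data, resume key under resp.last_primary_key.
        rows = getattr(resp, "data", None) or []
        yield from rows
        next_key = getattr(resp, "last_primary_key", None)
        # Stop on an empty page, a missing key, or a key that did not advance.
        if not rows or not next_key or next_key == last_key:
            break
        last_key = next_key
```

The guard against a non-advancing key is a defensive choice in this sketch to avoid an endless loop; it is not behavior documented by the API.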

sync_index(index_name: str)

Triggers a synchronization process for a specified vector index.

Parameters:

index_name – str Name of the vector index to synchronize. Must be a Delta Sync Index.

upsert_data_vector_index(index_name: str, inputs_json: str) UpsertDataVectorIndexResponse

Handles the upserting of data into a specified vector index.

Parameters:
  • index_name – str Name of the vector index where data is to be upserted. Must be a Direct Vector Access Index.

  • inputs_json – str JSON string representing the data to be upserted.

Returns:

UpsertDataVectorIndexResponse
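Since inputs_json is a string, rows are typically serialized before the call. The sketch below assumes the payload is a JSON array of row objects whose keys match the index schema; that shape and the column names (id, text, text_vector) are assumptions for illustration, not taken from the reference above.

```python
import json
from typing import Any, Dict, List

# Hypothetical rows; the column names are placeholders for the index schema.
rows: List[Dict[str, Any]] = [
    {"id": "1", "text": "first document", "text_vector": [0.1, 0.2, 0.3]},
    {"id": "2", "text": "second document", "text_vector": [0.4, 0.5, 0.6]},
]

# Serialize once and pass the resulting string as inputs_json.
inputs_json = json.dumps(rows)

# The same primary keys can later be passed to delete_data_vector_index.
primary_keys = [row["id"] for row in rows]

# With a configured client `w` (not created here), the calls would look like:
# w.vector_search_indexes.upsert_data_vector_index(index_name="...", inputs_json=inputs_json)
# w.vector_search_indexes.delete_data_vector_index(index_name="...", primary_keys=primary_keys)
```

Both calls require a Direct Vector Access Index, as noted in the parameter descriptions.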