w.vector_search_indexes: Indexes
- class databricks.sdk.service.vectorsearch.VectorSearchIndexesAPI
Index: An efficient representation of your embedding vectors that supports real-time approximate nearest neighbor (ANN) search queries.
There are two types of AI Search indexes:
- Delta Sync Index: An index that automatically syncs with a source Delta Table, incrementally updating the index as the underlying data in the Delta Table changes.
- Direct Vector Access Index: An index that supports direct read and write of vectors and metadata through the REST and SDK APIs. With this model, the user manages index updates.
- create_index(name: str, endpoint_name: str, primary_key: str, index_type: VectorIndexType [, delta_sync_index_spec: Optional[DeltaSyncVectorIndexSpecRequest], direct_access_index_spec: Optional[DirectAccessVectorIndexSpec], index_subtype: Optional[IndexSubtype]]) → VectorIndex
Create a new index.
- Parameters:
name – str Name of the index
endpoint_name – str Name of the endpoint to be used for serving the index
primary_key – str Primary key of the index
index_type – VectorIndexType
delta_sync_index_spec – DeltaSyncVectorIndexSpecRequest (optional) Specification for Delta Sync Index. Required if index_type is DELTA_SYNC.
direct_access_index_spec – DirectAccessVectorIndexSpec (optional) Specification for Direct Vector Access Index. Required if index_type is DIRECT_ACCESS.
index_subtype – IndexSubtype (optional) The subtype of the index. Use HYBRID or FULL_TEXT. VECTOR is not supported.
- Returns:
VectorIndex
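The parameter rules for create_index (each index type requires its matching spec, and VECTOR is not an accepted subtype) can be checked before the call is made. The following is a minimal sketch, not part of the SDK: check_create_index_args is a hypothetical helper, and plain strings stand in for the VectorIndexType and IndexSubtype enum members.

```python
from typing import Any, Optional


def check_create_index_args(
    index_type: str,
    delta_sync_index_spec: Optional[Any] = None,
    direct_access_index_spec: Optional[Any] = None,
    index_subtype: Optional[str] = None,
) -> None:
    """Raise ValueError if the arguments contradict the documented rules.

    Hypothetical pre-flight helper: strings stand in for the enum members
    (DELTA_SYNC, DIRECT_ACCESS, HYBRID, FULL_TEXT, VECTOR) named in the docs.
    """
    if index_type == "DELTA_SYNC" and delta_sync_index_spec is None:
        raise ValueError("delta_sync_index_spec is required for DELTA_SYNC")
    if index_type == "DIRECT_ACCESS" and direct_access_index_spec is None:
        raise ValueError("direct_access_index_spec is required for DIRECT_ACCESS")
    if index_subtype == "VECTOR":
        raise ValueError("index_subtype VECTOR is not supported; use HYBRID or FULL_TEXT")
```

Running a check like this locally only surfaces the documented constraints earlier; the service remains the authority on what it accepts.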
- delete_data_vector_index(index_name: str, primary_keys: List[str]) → DeleteDataVectorIndexResponse
Handles the deletion of data from a specified vector index.
- Parameters:
index_name – str Name of the vector index where data is to be deleted. Must be a Direct Vector Access Index.
primary_keys – List[str] List of primary keys for the data to be deleted.
- Returns:
DeleteDataVectorIndexResponse
- delete_index(index_name: str)
Delete an index.
- Parameters:
index_name – str Name of the index
- get_index(index_name: str [, ensure_reranker_compatible: Optional[bool]]) → VectorIndex
Get an index.
- Parameters:
index_name – str Name of the index
ensure_reranker_compatible – bool (optional) If true, the URL returned for the index is guaranteed to be compatible with the reranker. Currently this means we return the CP URL regardless of how the index is being accessed. If not set or set to false, the URL may still be compatible with the reranker depending on what URL we return.
- Returns:
VectorIndex
- list_indexes(endpoint_name: str [, page_token: Optional[str]]) → Iterator[MiniVectorIndex]
List all indexes in the given endpoint.
- Parameters:
endpoint_name – str Name of the endpoint
page_token – str (optional) Token for pagination
- Returns:
Iterator over MiniVectorIndex
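Because list_indexes returns an iterator, a plain loop or comprehension is enough to consume it. The sketch below collects index names for one endpoint. It assumes `w` is a configured databricks.sdk.WorkspaceClient and that each MiniVectorIndex exposes a `name` attribute; index_names is a hypothetical helper.

```python
from typing import Any, List


def index_names(w: Any, endpoint_name: str) -> List[str]:
    """Collect the names of all indexes served by one endpoint.

    `w` is expected to be a databricks.sdk.WorkspaceClient. The `name`
    attribute on each MiniVectorIndex is an assumption of this sketch.
    """
    return [
        idx.name
        for idx in w.vector_search_indexes.list_indexes(endpoint_name=endpoint_name)
    ]
```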
- query_index(index_name: str, columns: List[str] [, columns_to_rerank: Optional[List[str]], facets: Optional[List[str]], filters_json: Optional[str], num_results: Optional[int], query_columns: Optional[List[str]], query_text: Optional[str], query_type: Optional[str], query_vector: Optional[List[float]], reranker: Optional[RerankerConfig], score_threshold: Optional[float], sort_columns: Optional[List[str]]]) → QueryVectorIndexResponse
Query the specified vector index.
- Parameters:
index_name – str Name of the vector index to query.
columns – List[str] List of column names to include in the response.
columns_to_rerank – List[str] (optional) Column names used to retrieve data to send to the reranker.
facets – List[str] (optional) Facets to compute over the matched results. Each entry has one of these forms:
- "<column>": top 10 distinct values by count
- "<column> TOP <n>": top n distinct values, where n > 0
- "<column> BUCKETS [[from,to],…]": inclusive numeric ranges
TOP and BUCKETS are case-insensitive. A column may appear at most once.
filters_json – str (optional) JSON string representing query filters.
Example filters:
- {"id <": 5}: Filter for id less than 5.
- {"id >": 5}: Filter for id greater than 5.
- {"id <=": 5}: Filter for id less than or equal to 5.
- {"id >=": 5}: Filter for id greater than or equal to 5.
- {"id": 5}: Filter for id equal to 5.
num_results – int (optional) Number of results to return. Defaults to 10.
query_columns – List[str] (optional) Text columns to search for query_text. When empty, all text columns are searched.
query_text – str (optional) Query text. Required for a Delta Sync Index that uses a model endpoint.
query_type – str (optional) The query type to use. Choices are ANN, HYBRID, and FULL_TEXT. Defaults to ANN.
query_vector – List[float] (optional) Query vector. Required for a Direct Vector Access Index and for a Delta Sync Index that uses self-managed vectors.
reranker – RerankerConfig (optional) If set, the top 50 results are reranked with the Databricks Reranker model before returning the num_results results to the user. The setting columns_to_rerank selects which columns are used for reranking. For each datapoint, the columns selected are concatenated before being sent to the reranking model. See https://docs.databricks.com/aws/en/vector-search/query-vector-search#rerank for more information.
score_threshold – float (optional) Threshold for the approximate nearest neighbor search. Defaults to 0.0.
sort_columns – List[str] (optional) Sort results by column values instead of the default relevance ordering. Each clause has the form "<column> ASC" or "<column> DESC", for example ["rating DESC", "price ASC"].
- Returns:
QueryVectorIndexResponse
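Since filters_json is a JSON string and each facet is a small formatted string, building them programmatically avoids quoting mistakes. The sketch below is one way to do that. The helper names (filters_to_json, facet_top, facet_buckets) and the index and column names are hypothetical, and the final call is left as a comment because it needs a configured client.

```python
import json
from typing import Any, Dict, Sequence, Tuple


def filters_to_json(filters: Dict[str, Any]) -> str:
    """Serialize a filter mapping such as {"id >=": 5} for filters_json."""
    return json.dumps(filters)


def facet_top(column: str, n: int) -> str:
    """Build a "<column> TOP <n>" facet entry; n must be positive."""
    if n <= 0:
        raise ValueError("n must be > 0")
    return f"{column} TOP {n}"


def facet_buckets(column: str, ranges: Sequence[Tuple[float, float]]) -> str:
    """Build a "<column> BUCKETS [[from,to],...]" facet entry."""
    body = ",".join(f"[{lo},{hi}]" for lo, hi in ranges)
    return f"{column} BUCKETS [{body}]"


# Hypothetical index and column names, used only for illustration.
query_kwargs = {
    "index_name": "main.default.products_index",
    "columns": ["id", "title"],
    "query_text": "waterproof hiking boots",
    "query_type": "HYBRID",
    "num_results": 5,
    "filters_json": filters_to_json({"id >=": 5}),
    # Each column appears in at most one facet entry.
    "facets": [facet_top("brand", 3), facet_buckets("price", [(0, 50), (50, 200)])],
}
# With a configured client `w`:
# response = w.vector_search_indexes.query_index(**query_kwargs)
```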
- query_next_page(index_name: str [, endpoint_name: Optional[str], page_token: Optional[str]]) → QueryVectorIndexResponse
Use the next_page_token returned from a previous QueryVectorIndex or QueryVectorIndexNextPage request to fetch the next page of results.
- Parameters:
index_name – str Name of the vector index to query.
endpoint_name – str (optional) Name of the endpoint.
page_token – str (optional) Page token returned from previous QueryVectorIndex or QueryVectorIndexNextPage API.
- Returns:
QueryVectorIndexResponse
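The token hand-off between query_index and query_next_page can be wrapped in a generator. This is a sketch under assumptions: `w` is a configured databricks.sdk.WorkspaceClient, and the response exposes `next_page_token` and `result.data_array`. The helper name iter_query_rows is hypothetical.

```python
from typing import Any, Iterator, List


def iter_query_rows(
    w: Any, index_name: str, columns: List[str], **query_kwargs: Any
) -> Iterator[Any]:
    """Yield result rows across all pages of one query.

    Assumes the response carries `result.data_array` (rows) and
    `next_page_token` (empty or None on the last page).
    """
    resp = w.vector_search_indexes.query_index(
        index_name=index_name, columns=columns, **query_kwargs
    )
    while True:
        rows = resp.result.data_array if resp.result else None
        yield from rows or []
        if not resp.next_page_token:
            break
        resp = w.vector_search_indexes.query_next_page(
            index_name=index_name, page_token=resp.next_page_token
        )
```

A generator keeps only one page in memory at a time, which matters when a query matches many rows.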
- scan_index(index_name: str [, last_primary_key: Optional[str], num_results: Optional[int]]) → ScanVectorIndexResponse
Scan the specified vector index and return the first num_results entries after last_primary_key (exclusive).
- Parameters:
index_name – str Name of the vector index to scan.
last_primary_key – str (optional) Primary key of the last entry returned in the previous scan.
num_results – int (optional) Number of results to return. Defaults to 10.
- Returns:
ScanVectorIndexResponse
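A full scan can be expressed as a loop that feeds each response's last primary key into the next call. The sketch below assumes `w` is a configured databricks.sdk.WorkspaceClient and that ScanVectorIndexResponse exposes `data` and `last_primary_key`. The helper name scan_all and the page size are illustrative.

```python
from typing import Any, Iterator, Optional


def scan_all(w: Any, index_name: str, page_size: int = 100) -> Iterator[Any]:
    """Yield every entry of an index by repeated scan_index calls.

    Assumes each response carries `data` (a list of entries) and
    `last_primary_key` (the key to resume after).
    """
    last_key: Optional[str] = None
    while True:
        resp = w.vector_search_indexes.scan_index(
            index_name=index_name,
            last_primary_key=last_key,
            num_results=page_size,
        )
        entries = resp.data or []
        yield from entries
        # Stop when a page is empty or no resume key is provided.
        if not entries or not resp.last_primary_key:
            break
        last_key = resp.last_primary_key
```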
- sync_index(index_name: str)
Triggers a synchronization process for a specified vector index.
- Parameters:
index_name – str Name of the vector index to synchronize. Must be a Delta Sync Index.
- upsert_data_vector_index(index_name: str, inputs_json: str) → UpsertDataVectorIndexResponse
Handles the upserting of data into a specified vector index.
- Parameters:
index_name – str Name of the vector index where data is to be upserted. Must be a Direct Vector Access Index.
inputs_json – str JSON string representing the data to be upserted.
- Returns:
UpsertDataVectorIndexResponse
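Because inputs_json is a string, rows are typically serialized with json.dumps before the call. The sketch below assumes `w` is a configured databricks.sdk.WorkspaceClient and that the service expects a JSON array of row objects keyed by column name; that payload shape is an assumption, and upsert_rows is a hypothetical helper.

```python
import json
from typing import Any, Dict, List


def upsert_rows(w: Any, index_name: str, rows: List[Dict[str, Any]]) -> Any:
    """Serialize rows to JSON and upsert them into a Direct Vector Access Index.

    The payload shape (a JSON array of objects keyed by column name) is an
    assumption of this sketch.
    """
    return w.vector_search_indexes.upsert_data_vector_index(
        index_name=index_name,
        inputs_json=json.dumps(rows),
    )
```

For example, `upsert_rows(w, "main.default.my_index", [{"id": "1", "vector": [0.1, 0.2]}])` would send one row, where the index and column names are placeholders.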