w.files: Files¶
- class databricks.sdk.service.files.FilesExt¶
The Files API is a standard HTTP API that allows you to read, write, list, and delete files and directories by referring to their URI. The API makes working with file content as raw bytes easier and more efficient.
The API supports [Unity Catalog volumes], where files and directories to operate on are specified using their volume URI path, which follows the format /Volumes/<catalog_name>/<schema_name>/<volume_name>/<path_to_file>.
The Files API has two distinct endpoints, one for working with files (/fs/files) and another one for working with directories (/fs/directories). Both endpoints use the standard HTTP methods GET, HEAD, PUT, and DELETE to manage files and directories specified using their URI path. The path is always absolute.
Use of Files API may incur Databricks data transfer charges.
[Unity Catalog volumes]: https://docs.databricks.com/en/connect/unity-catalog/volumes.html
- create_directory(directory_path: str)¶
Creates an empty directory. If necessary, also creates any parent directories of the new, empty directory (like the shell command mkdir -p). If called on an existing directory, returns a success response; this method is idempotent (it will succeed if the directory already exists).
- Parameters:
directory_path – str The absolute path of a directory.
- delete(file_path: str)¶
Deletes a file. If the request is successful, there is no response body.
- Parameters:
file_path – str The absolute path of the file.
- delete_directory(directory_path: str)¶
Deletes an empty directory.
To delete a non-empty directory, first delete all of its contents. This can be done by listing the directory contents and deleting each file and subdirectory recursively.
- Parameters:
directory_path – str The absolute path of a directory.
- download(file_path: str) DownloadResponse¶
Download a file.
Downloads a file as a stream into memory.
Use this when you want to process the downloaded file in memory or pipe it into another system. Supports files of any size in SDK v0.72.0+. Earlier versions have a 5 GB file size limit.
If the download is successful, the function returns the downloaded file result. If the download is unsuccessful, the function raises an exception.
- Parameters:
file_path – str The remote path of the file, e.g. /Volumes/path/to/your/file
- Returns:
- download_to(file_path: str, destination: str [, overwrite: bool = True, use_parallel: bool = False, parallelism: Optional[int]]) DownloadFileResult¶
Downloads a file directly to a local file path.
Use this when you want to write the file straight to disk instead of holding it in memory. Supports files of any size in SDK v0.72.0+. Earlier versions have a 5 GB file size limit.
Supports parallel download (use_parallel=True), which may improve performance for large files. This is available on all operating systems except Windows.
- Parameters:
file_path – str The remote path of the file, e.g. /Volumes/path/to/your/file
destination – str The local path where the file will be saved.
overwrite – bool If true, an existing file will be overwritten. When not specified, defaults to True.
use_parallel – bool If true, the download will be performed using multiple threads.
parallelism – int The number of parallel threads to use for downloading. If not specified, defaults to the number of CPU cores.
- Returns:
DownloadFileResult
- get_directory_metadata(directory_path: str)¶
Get the metadata of a directory. The response HTTP headers contain the metadata. There is no response body.
This method is useful to check if a directory exists and the caller has access to it.
If you wish to ensure the directory exists, you can instead use PUT, which will create the directory if it does not exist, and is idempotent (it will succeed if the directory already exists).
- Parameters:
directory_path – str The absolute path of a directory.
- get_metadata(file_path: str) GetMetadataResponse¶
Get the metadata of a file. The response HTTP headers contain the metadata. There is no response body.
- Parameters:
file_path – str The absolute path of the file.
- Returns:
- list_directory_contents(directory_path: str [, page_size: Optional[int], page_token: Optional[str]]) Iterator[DirectoryEntry]¶
Returns the contents of a directory. If there is no directory at the specified path, the API returns a HTTP 404 error.
- Parameters:
directory_path – str The absolute path of a directory.
page_size –
int (optional) The maximum number of directory entries to return. The response may contain fewer entries. If the response contains a next_page_token, there may be more entries, even if fewer than page_size entries are in the response.
We recommend not to set this value unless you are intentionally listing less than the complete directory contents.
If unspecified, at most 1000 directory entries will be returned. The maximum value is 1000. Values above 1000 will be coerced to 1000.
page_token – str (optional) An opaque page token which was the next_page_token in the response of the previous request to list the contents of this directory. Provide this token to retrieve the next page of directory entries. When providing a page_token, all other parameters provided to the request must match the previous request. To list all of the entries in a directory, it is necessary to continue requesting pages of entries until the response contains no next_page_token. Note that the number of entries returned must not be used to determine when the listing is complete.
- Returns:
Iterator over
DirectoryEntry
- upload(file_path: str, contents: BinaryIO [, overwrite: Optional[bool], part_size: Optional[int], use_parallel: bool = True, parallelism: Optional[int]]) UploadStreamResult¶
Uploads a file from memory or a stream interface.
Use this when you want to upload data already in memory or piped from another system. Supports files of any size in SDK v0.72.0+. Earlier versions have a 5 GB file size limit.
Limitations: If the storage account is on Azure and has firewall enabled, the maximum file size is 5GB.
- Parameters:
file_path – str The absolute remote path of the target file, e.g. /Volumes/path/to/your/file
contents – BinaryIO The contents of the file to upload. This must be a BinaryIO stream.
overwrite – bool (optional) If true, an existing file will be overwritten. When not specified, defaults to True.
part_size – int (optional) If set, multipart upload will use the value as its size per uploading part. If not set, an appropriate value will be automatically used.
use_parallel – bool (optional) If true, the upload will be performed using multiple threads. Note that this will consume more memory because multiple parts will be buffered in memory before being uploaded. The amount of memory used is proportional to parallelism * part_size. If false, the upload will be performed in a single thread. Default is True.
parallelism – int (optional) The number of threads to use for parallel uploads. This is only used if use_parallel is True.
- Returns:
UploadStreamResult
- upload_from(file_path: str, source_path: str [, overwrite: Optional[bool], part_size: Optional[int], use_parallel: bool = True, parallelism: Optional[int]]) UploadFileResult¶
Uploads a file from a local file path.
Use this when your data already exists on disk and you want to upload it directly without manually opening it yourself. Supports files of any size in SDK v0.72.0+. Earlier versions have a 5 GB file size limit.
- Parameters:
file_path – str The absolute remote path of the target file.
source_path – str The local path of the file to upload. This must be a path to a local file.
part_size – int (optional) If set, multipart upload will use the value as its size per uploading part. If not set, an appropriate default value will be automatically used.
overwrite – bool (optional) If true, an existing file will be overwritten. When not specified, defaults True.
use_parallel – bool (optional) If true, the upload will be performed using multiple threads. Default is True.
parallelism – int (optional) The number of threads to use for parallel uploads. This is only used if use_parallel is True. If not specified, the default parallelism will be set to config.multipart_upload_default_parallelism
- Returns:
UploadFileResult