
Interacting with the Databricks API using PowerShell

Hasan Gural

In this post, we will explore how to work with the Databricks API through PowerShell. I will cover the following topics:

[✔️] Prerequisites: We'll start with the basics, ensuring you have all the necessary setup done. This includes having the right PowerShell modules and permissions in place, and an overview of the Databricks environment we'll be interacting with.

[✔️] Authenticate to the Databricks API via Azure Access Token: Security is our top priority, so we'll walk through the process of securely authenticating to the Databricks API. We'll see how to obtain and use an Azure access token, which is essential for getting started.

[✔️] Retrieve the Databricks Resources Using the Databricks API: Once we're in, it's all about getting the information you need. We'll go over how to send requests to the API to retrieve details on your Databricks resources. Whether it's clusters, jobs, or notebooks, you'll learn how to pull the data you're searching for.

⚙️ Prerequisites

Before diving in, ensure you have the essentials ready:

  • PowerShell Modules: Install the Az.Accounts module, crucial for interacting with Azure resources.
  • Azure Access Token: An Azure access token is a must for Databricks API authentication. This is obtained from Azure Active Directory (AAD). Consider using a Service Principal, Managed Identity, or your personal identity for this.
  • Databricks Environment: A Databricks workspace should be up and running. Ensure you have data plane access so you can interact with the Databricks environment.
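
A quick way to confirm the first prerequisite is to check whether the module is already available before installing it. The sketch below is one possible approach; the helper name Test-ModuleAvailable is hypothetical and not part of any module.

```powershell
# Hypothetical helper: returns $true when a module with the given name
# can be found on this machine, $false otherwise.
function Test-ModuleAvailable {
    param (
        [Parameter(Mandatory)]
        [string]$Name
    )

    return [bool](Get-Module -ListAvailable -Name $Name)
}

if (-not (Test-ModuleAvailable -Name 'Az.Accounts')) {
    Write-Host 'Az.Accounts is not installed.'
    # Uncomment to install for the current user only (no elevation needed):
    # Install-Module -Name Az.Accounts -Scope CurrentUser
}
```

Installing with -Scope CurrentUser is a deliberate choice here, since it avoids the need for an elevated session.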

🔑 Authenticate to the Databricks API via Azure Access Token

For Databricks API access, obtaining an Azure access token is the first step. This token is what authenticates you to the API, and it is issued by Azure Active Directory (AAD). You can get it using a Service Principal, a Managed Identity, or your individual identity. There is another way to authenticate to the Databricks API, which is a Personal Access Token (PAT). However, in this post, we will focus on the Azure access token.
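
The three identity options differ mainly in how you sign in to Azure before requesting the token. The sketch below shows one way to assemble the arguments for Connect-AzAccount for each option. The helper name Get-AzLoginParameter and its parameters are hypothetical; the Connect-AzAccount switches it produces (-ServicePrincipal, -Credential, -Tenant, -Identity) come from the Az.Accounts module.

```powershell
# Hypothetical helper: builds a hashtable that can be splatted onto
# Connect-AzAccount, depending on which identity type is used.
function Get-AzLoginParameter {
    param (
        [Parameter(Mandatory)]
        [ValidateSet('User', 'ServicePrincipal', 'ManagedIdentity')]
        [string]$IdentityType,

        [string]$TenantId,

        # For a service principal: application (client) ID as the user name,
        # client secret as the password.
        [pscredential]$Credential
    )

    $loginParams = @{}
    if ($TenantId) { $loginParams['Tenant'] = $TenantId }

    switch ($IdentityType) {
        'ServicePrincipal' {
            if (-not $Credential) {
                throw 'A credential is required for a service principal.'
            }
            $loginParams['ServicePrincipal'] = $true
            $loginParams['Credential'] = $Credential
        }
        'ManagedIdentity' {
            # Only works on Azure resources that have an identity assigned.
            $loginParams['Identity'] = $true
        }
        # 'User' needs nothing extra: Connect-AzAccount signs in interactively.
    }

    return $loginParams
}

# Usage (requires Az.Accounts and Azure access, so it is not executed here):
# $loginParams = Get-AzLoginParameter -IdentityType ManagedIdentity
# Connect-AzAccount @loginParams
```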

The first example uses a user identity to get the Azure access token. The PowerShell script below relies on the following values:

  • Databricks Instance URL: https://<databricks-instance>.azuredatabricks.net
  • Azure Active Directory (AAD) Tenant ID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  • Databricks Resource ID: 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d (the well-known application ID of the Azure Databricks service, identical in every tenant)

$dataBricksInstanceUrl = "https://adb-3289562874852353.13.azuredatabricks.net"
$tenantId = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
$dataBricksResourceId = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"

# Assumes you have already signed in to Azure using Connect-AzAccount
# Connect-AzAccount -Tenant $tenantId

# Request an Azure AD token for the Azure Databricks resource
$azureADToken = Get-AzAccessToken -ResourceUrl $dataBricksResourceId | Select-Object -ExpandProperty Token

# Request body for the Databricks token endpoint
$body = @{
    "lifetime_seconds" = 3600 # Token validity period in seconds
    "comment"          = "Token for cluster creation"
} | ConvertTo-Json

$headers = @{
    "Authorization" = "Bearer $azureADToken"
    "Content-Type"  = "application/json"
}

# Create a Databricks token through the Token API
$uri = "$dataBricksInstanceUrl/api/2.0/token/create"
$response = Invoke-RestMethod -Uri $uri -Method POST -Headers $headers -Body $body

return $response.token_value

When you execute the script mentioned above, you'll receive a Databricks token highlighted in the red box in the image below. This token is required to execute API calls securely within the Databricks environment. Keep this token safe, as it's the key to your Databricks access.

image
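
One detail worth guarding against: depending on the installed Az.Accounts version, Get-AzAccessToken may return the Token property as a SecureString rather than a plain string. Interpolating a SecureString directly into the Authorization header would not produce a usable header. The sketch below is one way to handle both cases; the helper name ConvertTo-PlainTextToken is hypothetical.

```powershell
# Hypothetical helper: accepts a token as either a plain string or a
# SecureString and always returns a plain string.
function ConvertTo-PlainTextToken {
    param (
        [Parameter(Mandatory)]
        [object]$Token
    )

    if ($Token -is [securestring]) {
        # NetworkCredential exposes the decrypted value through .Password.
        return [System.Net.NetworkCredential]::new('', $Token).Password
    }

    return [string]$Token
}

# Usage (requires an Azure session, so it is not executed here):
# $rawToken     = (Get-AzAccessToken -ResourceUrl $dataBricksResourceId).Token
# $azureADToken = ConvertTo-PlainTextToken -Token $rawToken
```

Keep the plain-text value in memory only for as long as the request needs it, and avoid writing it to logs or transcripts.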

🔍 Retrieve the Databricks Resources Using the Databricks API

With the generated Databricks token in hand, we're ready to begin interacting with the Databricks API. This token lets us retrieve information about various Databricks resources, including clusters, jobs, and notebooks. Below is a PowerShell function that I developed to fetch details about the clusters within your Databricks environment using the Databricks API.


# Function to get the list of clusters from the Databricks environment
function Get-DatabricksCluster {
    param (
        [string]$databricksInstanceUrl,
        [string]$token
    )

    $headers = @{
        "Authorization" = "Bearer $token"
        "Content-Type"  = "application/json"
    }

    # Get the list of clusters from the Databricks environment - API Endpoint: /api/2.0/clusters/list
    $uri = "$databricksInstanceUrl/api/2.0/clusters/list"
    $response = Invoke-RestMethod -Uri $uri -Method Get -Headers $headers
    $getClusters = $response.clusters

    if ($getClusters) {
        $output = @()

        foreach ($cluster in $getClusters) {
            $obj = [PSCustomObject]@{
                cluster_id   = $cluster.cluster_id
                cluster_name = $cluster.cluster_name
                state        = $cluster.state
            }

            $output += $obj
        }

        return $output
    }
    else {
        return $null
    }
}
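
The same request pattern carries over to other endpoints, such as jobs. The sketch below factors the shared URL and header handling into a small helper whose result can be splatted onto Invoke-RestMethod. The helper name New-DatabricksRequest is hypothetical, and the commented usage assumes the Jobs API list endpoint at /api/2.1/jobs/list, which returns a jobs array; check the API reference for the version your workspace supports.

```powershell
# Hypothetical helper: builds the arguments for Invoke-RestMethod so that
# every endpoint shares the same headers and URL handling.
function New-DatabricksRequest {
    param (
        [Parameter(Mandatory)][string]$DatabricksInstanceUrl,
        [Parameter(Mandatory)][string]$Token,
        [Parameter(Mandatory)][string]$ApiPath
    )

    # Normalize slashes so the base URL and path join cleanly.
    $baseUrl = $DatabricksInstanceUrl.TrimEnd('/')
    $path    = $ApiPath.TrimStart('/')

    return @{
        Uri     = "$baseUrl/$path"
        Method  = 'Get'
        Headers = @{
            'Authorization' = "Bearer $Token"
            'Content-Type'  = 'application/json'
        }
    }
}

# Usage (requires a reachable workspace, so it is not executed here):
# $request = New-DatabricksRequest -DatabricksInstanceUrl $dataBricksInstanceUrl `
#     -Token $response.token_value -ApiPath 'api/2.1/jobs/list'
# try {
#     (Invoke-RestMethod @request).jobs |
#         Select-Object job_id, @{ Name = 'name'; Expression = { $_.settings.name } }
# }
# catch {
#     Write-Error "Request failed: $($_.Exception.Message)"
# }
```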

With the Get-DatabricksCluster function defined, load it into your session and call it to get the list of clusters from the Databricks environment.


Get-DatabricksCluster -databricksInstanceUrl $databricksInstanceUrl -token $response.token_value

cluster_id           cluster_name                 state
----------           ------------                 -----
0310-143737-1c3vorwg Hasan Gural's PowerShell API PENDING
0303-120959-4j5dagcd Hasan Gural's Cluster        TERMINATED
0310-143737-1c3vorwg Hasan Gural's PowerShell v2  PENDING

The output of Get-DatabricksCluster will look similar to the image below, listing each cluster in the Databricks environment with its ID, name, and state.

image
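
Because the function returns objects rather than formatted text, the result can be filtered and summarized with the usual pipeline cmdlets. The sketch below uses made-up sample objects with the same three properties (cluster_id, cluster_name, state) in place of a live API call.

```powershell
# Sample objects with the same shape as the Get-DatabricksCluster output.
# The IDs and names are made up for illustration.
$clusters = @(
    [PSCustomObject]@{ cluster_id = 'id-001'; cluster_name = 'Demo A'; state = 'PENDING' }
    [PSCustomObject]@{ cluster_id = 'id-002'; cluster_name = 'Demo B'; state = 'TERMINATED' }
    [PSCustomObject]@{ cluster_id = 'id-003'; cluster_name = 'Demo C'; state = 'PENDING' }
)

# Keep only the clusters that are not terminated.
$activeClusters = @($clusters | Where-Object { $_.state -ne 'TERMINATED' })

# Count clusters per state.
$stateSummary = $clusters | Group-Object -Property state | Select-Object Name, Count

$activeClusters | Format-Table -AutoSize
$stateSummary | Format-Table -AutoSize
```

In a real session, replace the sample array with the result of Get-DatabricksCluster.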

As we come to the end of this article, I hope it proves to be a useful resource for those looking to use PowerShell to automate Databricks workspace operations. Whether you're just starting out or looking to refine your existing processes, I believe the simplification it offers can be a game changer for your automation. Happy scripting and more automated Databricks management!