How to enable system tables on Databricks
This post covers two related topics: first, how to enable system tables (beyond the schema that is enabled by default), and second, how to use these system tables to view data about workflow runs and costs.
Please note that Unity Catalog is required to use these features, and a premium workspace is required for dashboards.
What are system tables?
Databricks shares important data with its users that can be used for monitoring and management. For example, the billing schema, which is enabled by default, contains data about usage and costs in two tables: usage and list_prices.
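For instance, here is a quick look at daily usage from the billing schema (a minimal sketch; the column names usage_date, sku_name, and usage_quantity come from the billing.usage table, but verify them in your own workspace):
# Sketch: daily DBU usage per SKU from the default billing schema
display(spark.sql("""
    SELECT usage_date, sku_name, SUM(usage_quantity) AS total_dbus
    FROM system.billing.usage
    GROUP BY usage_date, sku_name
    ORDER BY usage_date DESC
"""))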
We can enable more tables (we are actually enabling schemas, each containing one or more tables) to get data about other areas for monitoring. See the full list here (note that some of these might be in preview): https://learn.microsoft.com/en-us/azure/databricks/admin/system-tables/
The link above contains the steps to enable system tables using bash commands. Since this can also be done inside a Databricks notebook, I thought many people would prefer a simple notebook for this, so I created one with sample code. You can find a link to it at the end of this post.
Let’s go through the main steps:
Create an access token for Databricks. See instructions here: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/auth/pat#azure-databricks-personal-access-tokens-for-workspace-users
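If you prefer not to paste the token into the notebook in plain text, you can store it in a Databricks secret scope and read it with dbutils.secrets (the scope and key names below are just placeholders, use your own):
# Read the personal access token from a secret scope instead of hardcoding it
# ("my-scope" and "pat" are placeholder names)
token = dbutils.secrets.get(scope="my-scope", key="pat")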
Get the workspace URL and metastore ID:
token = "*************************"
# Get the current workspace URL
workspace_url = spark.conf.get("spark.databricks.workspaceUrl")
# Get the metastore ID: current_metastore() returns a string like
# "<cloud>:<region>:<metastore-id>", so keep everything after the second colon
metastore = spark.sql("select current_metastore()").first()[0]
metastoreid = metastore.split(":", 2)[-1]
See the list of system schemas available to you:
import requests

headers = {"Authorization": f"Bearer {token}"}
url = f"https://{workspace_url}/api/2.0/unity-catalog/metastores/{metastoreid}/systemschemas"
resp = requests.get(url, headers=headers)
print(resp.text)
Here is an example of the response:
{"schemas":
[
{"schema":"storage","state":"AVAILABLE"},
{"schema":"operational_data","state":"UNAVAILABLE"},
{"schema":"access","state":"AVAILABLE"},
{"schema":"billing","state":"ENABLE_COMPLETED"},
{"schema":"compute","state":"AVAILABLE"},
{"schema":"lakeflow","state":"AVAILABLE"}
]}
Possible statuses:
- UNAVAILABLE = not available for you
- AVAILABLE = available, but you need to enable it before you can use it
- ENABLE_COMPLETED = enabled and ready for use
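To make the response easier to read, you can parse the JSON and group the schemas by state (a small sketch that reuses the resp object from the request above):
# Group the schemas from the API response by their state
schemas = resp.json()["schemas"]
for state in ["ENABLE_COMPLETED", "AVAILABLE", "UNAVAILABLE"]:
    names = [s["schema"] for s in schemas if s["state"] == state]
    print(f"{state}: {', '.join(names) if names else '(none)'}")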
We can see that the “access” schema is available for us, but it’s not enabled yet. To enable it, we need to send a PUT request to the API:
schema_name = "access"
put_url = f"https://{workspace_url}/api/2.0/unity-catalog/metastores/{metastoreid}/systemschemas/{schema_name}"
put_resp = requests.put(put_url, headers=headers)
print(put_resp.status_code)
If the response status code is 200, the schema was enabled successfully.
Now let’s verify that the access schema appears in our catalog:
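One simple way to check is to list the schemas under the system catalog (assuming the metastore is attached to the current workspace):
# The enabled schema should now show up under the "system" catalog
display(spark.sql("SHOW SCHEMAS IN system"))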
Installing the Jobs System Tables Dashboard
As an example of using system tables, let’s try the new “Jobs System Tables Dashboard”.
First, we need to enable the three system schemas required for the dashboard. Use the code above, or the example notebook (link at the end of the post), to enable these three schemas (a loop that enables all of them is sketched after this list):
- lakeflow
- access
- compute
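Here is a minimal sketch of that loop, reusing the workspace_url, metastoreid, and headers variables defined earlier:
# Enable the three schemas required by the dashboard
for schema_name in ["lakeflow", "access", "compute"]:
    put_url = f"https://{workspace_url}/api/2.0/unity-catalog/metastores/{metastoreid}/systemschemas/{schema_name}"
    put_resp = requests.put(put_url, headers=headers)
    print(schema_name, put_resp.status_code)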
Go to the dashboard page on GitHub, and download the JSON file with the dashboard settings: https://github.com/databricks/tmm/blob/main/System-Tables-Demo/Jobs-PrPr/Jobs%20System%20Tables%20Dashboard.lvdash.json
In the Databricks workspace, go to Dashboards, click the dropdown next to “Create dashboard”, and choose “Import dashboard from file”. Select the file we downloaded in the previous step.
This dashboard presents information about workflow (job) run statuses (successes vs. failures) and running costs (the image below shows only the first graphs).
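If you prefer querying the underlying data directly, something like the following gives a similar success/failure breakdown (a sketch assuming the lakeflow schema exposes a job_run_timeline table with a result_state column; check the system tables reference linked above for the exact names in your workspace):
# Sketch: count job runs per result state from the lakeflow schema
display(spark.sql("""
    SELECT result_state, COUNT(*) AS runs
    FROM system.lakeflow.job_run_timeline
    WHERE result_state IS NOT NULL
    GROUP BY result_state
    ORDER BY runs DESC
"""))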
See the full notebook with code on GitHub: https://github.com/MadeiraData/microsoft-data-engineers-club/tree/main/Databricks/Enable%20System%20Tables