Monitor Databricks costs with the new Dashboard and budgets
In the good old days, I used to work as a BI developer in a big insurance company. I had to write code and build solutions for ETL and data manipulation, but I never really cared how much running my code cost.
The server and its software were purchased by the company and belonged to it. I did, of course, have to make sure my code ran efficiently and did not cause the server to hang, but I didn’t care how much the physical server cost, or how much the operating system and software licenses cost.
Now, in the cloud era, things are different. I have amazing tools to do my job, but where costs are charged per usage (e.g. Databricks), I have to be very careful. Inefficient code runs slower and takes more resources, so it costs more money. And if I leave a service running for a long time without a purpose, the company might get a large bill from the cloud provider, and I might find myself in an embarrassing situation.
Databricks recently released some new features to help us monitor our spending:
- Account usage dashboard
- Budget monitoring and alerts
- The billing schema
Before we start reviewing those features, 2 notes:
- You must have Unity Catalog
- The usage these tools show does not cover all Databricks-related spending. It includes only the costs charged by Databricks itself (DBUs). It does not include the cost of the virtual machines behind the clusters, nor the cost of data stored in the customer's own storage (e.g. Azure Storage Accounts)
Account usage dashboard
The Account usage dashboard can show you details about Databricks usage and costs in your organization.
To enable it, you need to be an account admin. Go to the Databricks account console (https://accounts.azuredatabricks.net/) and then go to Usage. Under Consumption, you can enable the dashboard. You need to choose the workspace in which it will be created.
After you enable the dashboard, go to the selected workspace and then to SQL -> Dashboard. Open the “Account Usage Dashboard”.
The dashboard is created with Databricks’ new dashboarding capability. You can filter the usage data by workspace, date range, types of usage, and tags defined on clusters.
Since this dashboard is created on your workspace, its source code is open to you, and you can edit and adjust it to your needs.
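To give a rough idea of the kind of breakdown you could add, here is a minimal sketch of a query that groups usage by workspace and by a cluster tag. It is not the dashboard's actual source. It reads the system.billing.usage table described later in this post, and it assumes that table exposes the workspace_id, custom_tags, usage_unit and usage_quantity columns. The tag key 'team' and the 30-day window are made up for illustration, so replace them with your own.

```sql
-- Sketch only: 'team' is a hypothetical tag key; use a tag you actually set on your clusters.
SELECT
  workspace_id,
  custom_tags['team'] AS team,        -- NULL when the cluster has no such tag
  usage_unit,                         -- kept in the grouping so different units are not summed together
  SUM(usage_quantity) AS total_usage
FROM
  system.billing.usage
WHERE
  usage_date >= date_sub(current_date(), 30)   -- arbitrary 30-day window
GROUP BY
  workspace_id,
  custom_tags['team'],
  usage_unit
ORDER BY
  total_usage DESC
```

Rows where team is NULL are usage without that tag, which is often a useful finding in itself.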
Budgets
Budgets help you keep track of how much you spend each month compared to a predefined target, and raise an alert if you go over the limit. Please note that a budget does not stop or limit your costs. It only shows the status and, if configured, sends an alert by email.
Budgets are still in preview.
Budgets can also be set up on the account console under Usage, under the Budgets tab.
Click on “Add Budget”.
Set the target amount, and add filters by workspace(s) and by tags (the ones you set on the clusters). To receive a notification when you go over the limit, add one or more email addresses.
After creation, you can view the current status of the budget.
The billing schema
Behind the scenes, both the dashboard and budgets use the new billing schema in the system catalog. You can find it in your catalog list if you have the right permissions (and remember, you need Unity Catalog). It contains 2 tables:
- list_prices – has the prices for each use type
- usage – contains the actual usage details: start and end times, type of use, and more
We can query these tables if we want to build our own reports on consumption.
Here is an example of a query that shows how many DBUs jobs consumed on each of the 10 most recent usage dates.
SELECT
  usage_date AS `Date`,
  SUM(usage_quantity) AS `DBUs Consumed`
FROM
  system.billing.usage
WHERE
  sku_name LIKE '%JOB%'
GROUP BY
  usage_date
ORDER BY
  usage_date DESC
LIMIT 10
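DBUs are not money, so a natural next step is to multiply the usage by the price from list_prices. The following is a sketch rather than a definitive billing calculation. It assumes that list_prices holds one row per SKU and price period (price_start_time, price_end_time, with a NULL end time for the current price) and that the price sits in the pricing.default field. Those names come from my reading of the documented schema, so check them against your own catalog.

```sql
-- Sketch: estimated list-price cost of jobs per day (column names assumed from the documented schema).
SELECT
  u.usage_date,
  lp.currency_code,
  SUM(u.usage_quantity) AS dbus_consumed,
  SUM(u.usage_quantity * lp.pricing.default) AS estimated_list_cost
FROM
  system.billing.usage u
  JOIN system.billing.list_prices lp
    ON u.sku_name = lp.sku_name
    AND u.cloud = lp.cloud
    AND u.usage_unit = lp.usage_unit
    -- pick the price that was in effect when the usage started
    AND u.usage_start_time >= lp.price_start_time
    AND (lp.price_end_time IS NULL OR u.usage_start_time < lp.price_end_time)
WHERE
  u.sku_name LIKE '%JOB%'
GROUP BY
  u.usage_date,
  lp.currency_code
ORDER BY
  u.usage_date DESC
LIMIT 10
```

Keep in mind that this is the list price, so it will not reflect any discount your company has negotiated. As noted at the beginning, it also does not include the virtual machine and storage costs.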
For more query examples, see here: https://learn.microsoft.com/en-us/azure/databricks/admin/system-tables/billing#sample-queries
Conclusion
Databricks is a pay-for-what-you-use service, so monitoring your costs is very important. With the new dashboard and budgets, based on system tables, you can do it easily.