Chen Hirsh's Data Engineering Blog
Exploring the Databricks Debugger: Writing flawless code on the first try is a dream, but debugging is a reality for most developers. In this post, I dive into the new Databricks code cell debugger, sharing my first impressions and tips for getting started with this powerful tool.
System tables on Databricks can help us monitor and manage our Data Warehouse. In this post I’ll show how to enable them and how to install the Jobs Dashboard based on system tables.
Cleaning data is a very common task for data professionals. In this post, I demonstrate a few common data cleaning tasks with Spark in Python and SQL.
As Data Engineers, we need to monitor the usage and costs of our data solutions. Databricks recently released tools to help us do that: the Account Usage Dashboard and Budgets, both based on the “Billing” system schema.
How using SQL window functions with a non-unique order column can cause nondeterministic results.
Databricks recently added a for-each task to their workflow capability. How does it work and what are its limitations?
Cloning tables in Databricks is a fast way to create replicated data for testing purposes or archiving. Explore the different types of table cloning, each with its pros and cons.
Excel is one of the most common data file formats, and, as data engineers, we are required to read data from it on almost every project. Working in Databricks, you can read and write Excel files, but you need to pay attention to some pitfalls.
A simple source control solution for SQL Server code objects such as views and stored procedures.
A Python variable has vanished! Can you help the confused Data Engineer find out why?
Google Analytics is a popular tool for measuring your website traffic. In this post I will show how to read data from Google Analytics into Databricks to be used in your own reports or data warehouse.