Category: Data Engineering
Exporting data to a CSV file in Databricks can sometimes result in multiple files, odd filenames, and unnecessary metadata—issues that aren’t ideal when sharing data externally. This guide explores two practical solutions: using Pandas for small datasets and leveraging Spark’s coalesce to consolidate partitions into a single, clean file. Learn how to choose the right approach for your use case and ensure your CSV exports are efficient, shareable, and hassle-free.
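The two approaches boil down to a few lines of PySpark. Here's a minimal sketch, assuming a hypothetical source table and an export path on a Unity Catalog volume (both names are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.table("my_catalog.my_schema.my_table")  # hypothetical source table

# Option 1: small datasets — collect to Pandas and write one clean file.
# toPandas() pulls everything to the driver, so keep this for data that
# comfortably fits in driver memory.
df.toPandas().to_csv("/Volumes/my_catalog/my_schema/exports/report.csv", index=False)

# Option 2: larger datasets — coalesce to a single partition so Spark
# writes one part file (still inside a directory, alongside _SUCCESS
# and commit metadata files).
(df.coalesce(1)
   .write.mode("overwrite")
   .option("header", True)
   .csv("/Volumes/my_catalog/my_schema/exports/report_dir"))
```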
System tables on Databricks can help us monitor and manage our data warehouse. In this post, I'll show how to enable them and how to install the Jobs Dashboard built on system tables.
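As a taste of what system tables enable, here's a hedged example querying billing usage from a notebook. `system.billing.usage` is one of the standard system tables; it must be enabled on the metastore first, and the exact columns available can vary by workspace:

```python
# `spark` is the session Databricks notebooks provide out of the box.
# Aggregate DBU consumption per day and SKU from the billing system table.
usage = spark.sql("""
    SELECT usage_date,
           sku_name,
           SUM(usage_quantity) AS total_dbus
    FROM system.billing.usage
    GROUP BY usage_date, sku_name
    ORDER BY usage_date DESC
""")
usage.show()
```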
Cleaning data is a very common task for data professionals. In this post, I demonstrate a few common data cleaning tasks with Spark, using both Python and SQL.
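A minimal sketch of the kind of cleaning the post covers, in both PySpark and SQL; the table and column names here are hypothetical:

```python
from pyspark.sql import functions as F

# Hypothetical raw dataset with typical quality issues.
raw = spark.table("my_catalog.bronze.customers")

cleaned = (
    raw
    .dropDuplicates(["customer_id"])                  # drop duplicate rows by key
    .withColumn("email", F.lower(F.trim("email")))    # normalize whitespace and case
    .fillna({"country": "unknown"})                   # fill missing values
    .filter(F.col("signup_date").isNotNull())         # drop rows missing a required field
)

# Roughly the same cleaning expressed in SQL (DISTINCT here dedupes
# whole rows rather than by key, so it's an approximation):
raw.createOrReplaceTempView("customers_raw")
cleaned_sql = spark.sql("""
    SELECT DISTINCT customer_id,
           lower(trim(email)) AS email,
           coalesce(country, 'unknown') AS country,
           signup_date
    FROM customers_raw
    WHERE signup_date IS NOT NULL
""")
```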
Databricks recently added a for-each task to its Workflows capability. How does it work, and what are its limitations?
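For context, the for-each task wraps an inner task and runs it once per input value. Here's a hedged sketch of a Jobs API 2.1 task definition, written as the Python dict you'd send to the REST API; the notebook path and input values are hypothetical:

```python
# Sketch of a for-each task definition for the Databricks Jobs 2.1 API.
for_each_task = {
    "task_key": "process_tables",
    "for_each_task": {
        # The values to iterate over, serialized as a JSON string.
        "inputs": '["orders", "customers", "products"]',
        # How many iterations may run concurrently.
        "concurrency": 2,
        # The inner task executed once per input value.
        "task": {
            "task_key": "process_one_table",
            "notebook_task": {
                "notebook_path": "/Workspace/etl/process_table",
                # {{input}} resolves to the current iteration's value.
                "base_parameters": {"table_name": "{{input}}"},
            },
        },
    },
}
```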
Excel is one of the most common data file formats, and as data engineers we need to read data from it on almost every project. In Databricks you can read and write Excel files, but you need to watch out for a few pitfalls.
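One common pattern, sketched below under the assumption that the workbook fits in driver memory: read it with Pandas (using the openpyxl engine for .xlsx), then convert to a Spark DataFrame. The path, sheet, and column names are hypothetical:

```python
import pandas as pd

# Read the Excel file with Pandas; openpyxl must be installed for .xlsx.
pdf = pd.read_excel(
    "/Volumes/my_catalog/my_schema/raw/sales.xlsx",
    sheet_name="Sheet1",
    engine="openpyxl",
)

# Pitfall: Pandas may infer mixed or object dtypes that Spark can't map
# cleanly; casting columns explicitly before conversion avoids surprises.
pdf["order_id"] = pdf["order_id"].astype(str)

# Convert to Spark and persist as a table.
sdf = spark.createDataFrame(pdf)
sdf.write.mode("overwrite").saveAsTable("my_catalog.bronze.sales")
```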
In the era of cloud computing, it's easy to create and change data services, so every project involves architecture decisions, and every developer has to weigh these considerations.
This is a short summary of a meetup talk I gave about Data Architecture in the “Microsoft Data Engineers Club” community.