The mystery of the vanishing variable
It’s kind of a funny story (in retrospect, anyhow).
A friend called to ask for my help with a weird issue. In a Databricks notebook using Python, he declares and assigns a variable in the first cell. Something like this:
my_var = 1
He then runs the rest of the notebook, and somewhere along the way, tries to use this variable, and gets this message:
NameError: name 'my_var' is not defined
Going back to cell 1, and checking the value of my_var, he gets the same error.
I have to admit that it took me some time to figure it out.
Going through the cells between the first cell (variable declaration) and the last cell (using the variable), I finally found a cell that installs a library:
%pip install some_library
And in the output of the cell, in plain English:
Python interpreter will be restarted
It’s embarrassing to admit that I had missed it before, but the issue was now obvious. When the Python interpreter is restarted (or the Databricks cluster, but that’s another issue), all the variables defined so far are wiped out, including my_var from the first cell.
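To see the effect outside of Databricks, here is a minimal sketch in plain Python. It is only a simulation: the run_cell helper and the namespace dictionary are hypothetical stand-ins for notebook cells and the interpreter state, not anything Databricks actually exposes.

```python
def run_cell(source, namespace):
    """Execute one 'cell' of code against a shared namespace, roughly like a notebook."""
    exec(source, namespace)


# Hypothetical stand-in for the interpreter state shared by all cells.
namespace = {}

run_cell("my_var = 1", namespace)
run_cell("result = my_var + 1", namespace)  # works: my_var is still defined

# Simulate the restart: the old state is thrown away entirely.
namespace = {}

try:
    run_cell("result = my_var + 1", namespace)
    error_message = None
except NameError as exc:
    error_message = str(exc)

print(error_message)  # name 'my_var' is not defined
```

Re-running the first "cell" after the simulated restart brings my_var back, which is the same thing you have to do in the real notebook.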
So, what did we learn today?
- Always keep cells with pip install at the beginning of the notebook, or even better, install the libraries on the cluster.
- When you have an issue with variables, use the right-side Variable Explorer to view the value of the variable while running cells one by one.
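If you prefer a check in code, a small guard can report which variables are missing before a later cell tries to use them. This is just a sketch: missing_names is a hypothetical helper of my own, and in a real notebook cell you would pass globals() instead of the plain dictionary used here.

```python
def missing_names(required, namespace):
    """Return the names from `required` that are not defined in `namespace`."""
    return [name for name in required if name not in namespace]


# A plain dict stands in for the notebook's globals() in this sketch.
namespace = {"my_var": 1}

print(missing_names(["my_var", "other_var"], namespace))  # ['other_var']
```

An empty list means everything is still defined. A non-empty one is a hint that something, such as an interpreter restart, cleared the state along the way.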