… or as I like to call it, "A lesson on the pitfalls of doing anything remotely complicated with batch scripting". To preface all of this, we have a fairly primitive method of managing database schema and data updates for our main product here at Day Job. The database has a UDF that returns a double-value for the database version. We then have a template for update scripts that handles the plumbing of checking the version, determining if the update should be applied, applying the update, and then updating the UDF to the correct version number. This means that you can run an update script, and if the scripts version number is greater than the database’s version number, the script will apply itself, otherwise it will say "no thanks" and politely exit. The entire update process is driven by a simple batch script that loops over the update scripts in the current directory and executes them in order. All in all, this solution is not pretty, but it actually works really well most of the time.
Unfortunately, our batch script was a bit too simplistic, and it recently bit me in the butt. I ran the script on an old database, and it correctly began iterating through the update SQL scripts, executing each in turn. Unfortunately, one of the scripts in the middle failed (completely different can of worms that I won’t get into here), but the batch script kept on going to the next script. Suddenly, we had a database updated to the latest version, but it had completely skipped a revision! This is *bad*. The problem was that the script wasn’t checking the return code from sqlcmd.exe. I thought "hey, adding a check on the return and breaking out of the for loop should be easy!" Sadly, nothing could have been further from the truth.
In batch scripts, you can access the return code from a previous command using the ERRORLEVEL variable. So, here’s what I did:
Seems simple, right? If ERRORLEVEL is not 0, then something went wrong, so break out of the loop and go to the error handler. Except ERRORLEVEL was always 0. ALWAYS. I could replace TARGET and DATABASE (defined elsewhere in the script) with garbage, and sqlcmd.exe would indeed crap out, but ERRORLEVEL was still 0. I ran sqlcmd.exe outside of a script and verified that it was indeed setting ERRORLEVEL, but it obviously wasn’t working in my script.
After a lot of Googling, I managed to piece together this important tidbit: ERRORLEVEL does not behave as you would expect it to behave inside of a FOR loop. Inside the loop, ERRORLEVEL is not modified by any of the commands within the loop, so it will have whatever value it had prior to entering the loop. The solution? Well, it’s so obvious that I don’t know why I didn’t think of it the first time (<— sarcasm)!
At the top of the script, I added ‘setlocal enabledelayedexpansion’, then in the FOR loop, I changed from ‘%ERRORLEVEL%’ to ‘!ERRORLEVEL!’, and like magic, everything STILL DIDN’T WORK! Not completely, anyway. The solution would work as long as I did something like pass in an invalid database name, but ERRORLEVEL still wasn’t being set if a problem occurred in the SQL statements that were being executed. Fortunately, that was much easier to fix as sqlcmd.exe has decent documentation: just add the ‘-b’ flag:
Finally, the script works, and the update process is aborted if a script encounters an error. The moral of this story is: batch scripting sucks, use PowerShell, and be very careful if you are mixing ERRORLEVEL with FOR loops.