When debugging errors in Chef recipes, I usually log on to the problematic machine and run “shef -z”. If there was a previous successful run, the registered run_list is automatically loaded with all the relevant attributes and recipes, so I can jump right in and try the offending lines.
The problem is that if chef-client failed on the first run, the run_list is empty, and shef even deletes the (much needed) recipes from the cache saying “its cookbook is no longer needed on this client”. Is there some way to force shef to let me use my run_list?
The official wiki mentions:
“The run list will be set in the same way as normal chef client runs.”
but trying “-j” doesn’t seem to help.
I also tried running in solo mode (“-s -j”) instead, as explained in this blog post. It loads the run_list (without expanding it, which is weird), but when I try to do anything, I get stuff like:
“Cookbook tomcat not found. If you’re loading tomcat from another cookbook, make sure you configure the dependency in your metadata”.
Note that my recipe has a couple of dozen dependencies 3 levels deep, so pasting all the code manually to shef is not really an option. My current workaround is to upload a modified recipe with all possibly offending parts commented-out (and continue manually from there), but I would love to have a better alternative.
The recipes aren’t included by default when you start shef in client or solo modes, because a use case of shef is to determine why a recipe may not be loading at all, such as a compile time bug that causes some exception.
Therefore, you need to use include_recipe to include the recipes you wish to use. You can do this automatically for all the roles/recipes in the node’s run list with:
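A minimal sketch of such a loop, assuming a Chef 0.10-era API where `node.run_list.expand` takes the environment name and returns an expansion object that responds to `recipes` (the signature differs in other Chef versions, so check yours). The helper name `run_list_recipes` is made up for illustration and is not part of Chef:

```ruby
# Hypothetical helper (not part of Chef): returns the names of all recipes
# in the node's run list, with roles expanded into the recipes they contain.
# Assumption: RunList#expand accepts the environment name as its first
# argument, as in Chef 0.10+; older releases may use a different signature.
def run_list_recipes(node)
  node.run_list.expand(node.chef_environment).recipes
end
```

At the shef prompt, switch to recipe mode by typing `recipe`, then run `run_list_recipes(node).each { |r| include_recipe r }`. If a recipe has a compile-time bug, the exception should surface at the corresponding `include_recipe` call, which narrows down where to look.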
For more information on this, and a bunch of other great tips for using Shef to debug chef-client runs, see Steven Danna’s blog post.