I agree we need to work out some issues with the load order of cookbook segments. We'll talk more at Opscode about engineering a short-term solution for this and the related ordering issues.
The Chef server does not calculate a DAG. Our issue with a DAG is that it does not guarantee a single order. Unless there is a fixed path through all the cookbooks, the number of possible solutions grows as you add cookbooks, and so does the likelihood that one Chef run will use a different order than the one preceding it.
Consider an oversimplified example with three cookbooks, A, B, and C. Cookbook B depends on A, while A and C have no dependencies. There are three possible solutions:
A, B, C
A, C, B
C, A, B
Now let's say that cookbook B has a side effect that causes cookbook C to fail. However, this only happens in one of the solutions, where B is run before C. Sometimes your run will succeed, other times it will fail.
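To make the ambiguity concrete, here is a rough Ruby sketch that brute-forces every order consistent with "B depends on A". The DEPENDENCIES hash and both helper methods are made up for illustration; nothing here is Chef code. Enumerated exhaustively, the valid orders are A, B, C; A, C, B; and C, A, B, and only the first of these runs B before C.

```ruby
# Hypothetical dependency map: cookbook name => cookbooks it depends on.
DEPENDENCIES = {
  "A" => [],
  "B" => ["A"],
  "C" => []
}.freeze

# An order is valid when every cookbook appears after all of its dependencies.
def valid_order?(order, dependencies)
  order.each_with_index.all? do |cookbook, index|
    dependencies.fetch(cookbook).all? { |dep| order.index(dep) < index }
  end
end

# Brute force is fine for three cookbooks; it is factorial in general.
def valid_orders(dependencies)
  dependencies.keys.permutation.select { |order| valid_order?(order, dependencies) }
end

valid_orders(DEPENDENCIES).each { |order| puts order.join(", ") }
```

A topological sort is free to return any one of these, which is exactly the problem.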
If cookbooks cared about any other ordering, it would be their responsibility to declare it in their metadata.
Consider another tool where instead of just cookbooks having dependencies, each individual resource has dependencies. The apache2 cookbook has ~57 resources [1] (although you're not going to use all of them). It isn't hard to get into the thousands of resources, as we've done in the past. In this situation, it is likely that the topological sort will produce a solution that will cause a resource to fail because of ordering, and consequently runs will occasionally fail. Additionally, that failure will be difficult to reproduce because you can't reason about the order without a visualization of the DAG, and even then it is hard. It is easy to say, "it is your responsibility to declare all of your dependencies," but this doesn't solve the user's problem.
Because many of the cookbooks in an expanded run list will have no dependencies at all, or too few to form a single directed path, each run is likely to be different.
We currently handle recipe dependencies by 1) requiring the other cookbook as a dependency in the metadata so we have it and 2) using include_recipe to ensure that the recipe has already been run, and if it hasn't, to run it.
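As a rough illustration of the second point, here is a toy model of "run it if it hasn't been run yet". ToyRunContext, its resource method, and the recipe names are all hypothetical; this is not how Chef implements include_recipe, only a sketch of the behaviour described above.

```ruby
require "set"

# Toy model only: recipes are blocks, and "resources" are just strings
# appended in the order they are declared.
class ToyRunContext
  attr_reader :resources

  def initialize(recipes)
    @recipes   = recipes   # recipe name => block to evaluate
    @loaded    = Set.new   # recipes that have already been run
    @resources = []        # declaration order of resources
  end

  # Stand-in for declaring a resource inside a recipe.
  def resource(name)
    @resources << name
  end

  # Evaluate the recipe the first time it is requested; skip it afterwards.
  def include_recipe(name)
    return false unless @loaded.add?(name)

    instance_eval(&@recipes.fetch(name))
    true
  end
end

RECIPES = {
  "apache2::default" => proc { resource "package[apache2]" },
  "myapp::default"   => proc do
    include_recipe "apache2::default" # guarantees apache2 ran first
    resource "template[myapp.conf]"
  end
}.freeze

context = ToyRunContext.new(RECIPES)
%w[myapp::default apache2::default].each { |recipe| context.include_recipe(recipe) }
puts context.resources.inspect
```

Even though myapp comes first in this made-up run list, the apache2 package is declared before the template, and the later apache2::default entry is skipped because it has already run.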
I would expect that we could use the same solution with cookbooks in the cookbook loader. We could consider loading all the segments for a dependent cookbook, but sooner or later you'll find an edge case that any reasonably simple solution fails for.
I'd like to see a set of cookbooks where a library requires an LWRP from another cookbook. I suspect this is possibly an edge case that could be solved by fixing the parameters of the cookbooks to fit the [future] loader, rather than vice versa.
[1]
for r in `grep resource_name ~/src/chef/chef/lib/chef/resource/* | awk -F: '{ print $3}'` ; do grep "^$r \"\| $r \"" ~/devel/opscode-cookbooks/apache2/recipes/* ; done | awk -F: '{print $2}' | sed -e "s/^\s*
Chef holds a philosophy that you should be able to reason about order. Some objects are not being loaded in a consistent order, such as attributes in CHEF-2903, and we should fix that. However, we differed philosophically from the start from other projects that use topological sorts, precisely because of order. As more cookbooks are added, ensuring there is only one solution to the topological sort becomes difficult and expensive for the user. The alternative is that the load order has many solutions and thus is not consistent across sequential runs. Order matters, but we should not give up order to add order.
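For a sense of what "only one solution" demands, here is a small sketch using Ruby's stdlib TSort. The DependencyGraph class and its unique_order? method are hypothetical names of mine. It relies on a known property of DAGs: the sorted order is unique only when every cookbook directly depends on the one immediately before it, i.e. the dependencies form one unbroken chain.

```ruby
require "tsort"

# Hypothetical wrapper so a plain Hash can be sorted with stdlib TSort.
class DependencyGraph
  include TSort

  def initialize(dependencies)
    @dependencies = dependencies # cookbook => cookbooks it depends on
  end

  def tsort_each_node(&block)
    @dependencies.each_key(&block)
  end

  def tsort_each_child(node, &block)
    @dependencies.fetch(node).each(&block)
  end

  # True only when each cookbook directly depends on its predecessor
  # in the sorted order, so no two cookbooks can be swapped.
  def unique_order?
    tsort.each_cons(2).all? do |earlier, later|
      @dependencies.fetch(later).include?(earlier)
    end
  end
end

loose = DependencyGraph.new({ "A" => [], "B" => ["A"], "C" => [] })
chain = DependencyGraph.new({ "A" => [], "B" => ["A"], "C" => ["B"] })

puts loose.unique_order? # C is unconstrained, so several orders exist
puts chain.unique_order? # every cookbook is pinned by a dependency
```

Getting from the first graph to the second means the user has to declare a dependency for every cookbook, whether or not one really exists, which is the cost I'm describing.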
Some history exists on the mailing list, such as here.
One option would be to load cookbook segments in the order of the expanded run list first. This would give the user the ability to ensure a definition or LWRP was loaded before using it in another cookbook. However, there could be situations where you end up adding a no-op recipe to the run list purely for this purpose.
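Roughly, that ordering could be derived as below. This is only a sketch: cookbook_load_order is a name I made up, the run list entries are invented, and it assumes the expanded run list is available as an array of "cookbook::recipe" strings.

```ruby
# Cookbook names in the order they first appear in the expanded run list.
# Array#uniq keeps the first occurrence, so the result is the same on every run.
def cookbook_load_order(expanded_run_list)
  expanded_run_list.map { |recipe| recipe.split("::").first }.uniq
end

# Hypothetical run list; "lwrp_provider::noop" exists only to force load order.
run_list = %w[lwrp_provider::noop apache2::mod_ssl myapp::default apache2::default]
puts cookbook_load_order(run_list).inspect
```

The no-op entry at the front of the example is the kind of workaround I'd expect people to reach for.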
I think we could walk cookbook.metadata.dependencies in Chef::RunContext#foreach_cookbook_load_segment. I've thought about this and played around a little and I'm not sure. This doesn't work yet, but I've got to run for the day: https://gist.github.com/5332542a89c6ca14b825
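Independent of the gist, the general shape I have in mind is a depth-first walk that loads a cookbook's dependencies before the cookbook itself, while keeping run list order as the tie-breaker. The sketch below is not Chef code: load_order and visit are hypothetical names, and a plain hash of cookbook names stands in for whatever cookbook.metadata.dependencies actually returns.

```ruby
# Returns cookbooks in a deterministic load order: run list order first,
# with each cookbook's dependencies placed ahead of it.
def load_order(run_list_cookbooks, dependencies)
  ordered = []
  run_list_cookbooks.each { |cookbook| visit(cookbook, dependencies, ordered, []) }
  ordered
end

# Depth-first walk; `path` tracks the current chain so cycles are reported
# instead of recursing forever.
def visit(cookbook, dependencies, ordered, path)
  return if ordered.include?(cookbook)
  if path.include?(cookbook)
    raise ArgumentError, "dependency cycle: #{(path + [cookbook]).join(' -> ')}"
  end

  # Sorting the dependency names keeps the walk stable between runs.
  dependencies.fetch(cookbook, []).sort.each do |dep|
    visit(dep, dependencies, ordered, path + [cookbook])
  end
  ordered << cookbook
end

# Hypothetical data: B depends on A, and the run list names C before B.
puts load_order(%w[C B], { "B" => ["A"] }).inspect
```

With the same inputs this always gives the same answer, unlike an unconstrained topological sort, but I'd expect edge cases once segments from different cookbooks depend on each other.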