Maintaining unit tests is difficult. I am sure that we have all experienced a time when a seemingly small change to the system under test (SUT) caused dozens of unit tests to fail. Sometimes these failures reveal bugs in the SUT, but often the tests are out of date and no longer reflect the correct behavior of the SUT. In these cases, it is necessary to fix the broken tests.
Have you encountered this situation? Does it happen often? What change did you introduce and how did the failures manifest? Did you fix the broken tests or simply delete them? If the former, how? If the latter, why? How does the fear of failures affect your desire to write tests?
I would also like to find specific examples of broken tests. Do you know of any open-source applications that evolved in ways that caused tests to fail?
True enough. The co-evolution of production code and test code is hard. Both kinds of code share similarities (e.g. naming conventions), but they still differ in nature. For instance, the DRY principle can be violated in test code if necessary: duplicated code is easy to find there, because the tests break in both places. The tension between test and production code sometimes results in specific design trade-offs (e.g. dependency injection) to ease testability. Again, these tensions are relatively new, and the relationship between the design of production code and the maintenance effort is not well understood. The article “On the interplay between software testing and evolution” is great (I was not able to find it in PDF, but didn’t google for a long time).
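To make the dependency injection trade-off concrete, here is a minimal sketch. The `Greeter` class and its `now` parameter are hypothetical names invented for illustration; the point is only that the production code accepts its clock from outside, which costs a little extra design but lets a test pin the time.

```python
from datetime import datetime


class Greeter:
    """Hypothetical production class; the clock is injected, not hard-coded."""

    def __init__(self, now=datetime.now):
        # `now` is any callable returning a datetime; production code uses the default.
        self._now = now

    def greeting(self):
        return "Good morning" if self._now().hour < 12 else "Good afternoon"


def test_morning():
    # The test injects a fixed clock instead of depending on the real time.
    greeter = Greeter(now=lambda: datetime(2020, 1, 1, 9, 0))
    assert greeter.greeting() == "Good morning"


def test_afternoon():
    greeter = Greeter(now=lambda: datetime(2020, 1, 1, 15, 0))
    assert greeter.greeting() == "Good afternoon"
```

Without the injected clock, the same tests would pass or fail depending on the time of day at which they run.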
Defect localization (the ability of a test suite to pinpoint a defect precisely) is also only partially understood. It is not clear which strategies for designing test suites lead to high defect localization. Most tests overlap to some extent, which lowers defect localization: a single defect makes several tests fail at once. Ordering tests so that they depend on each other improves this aspect, but at the same time goes against the principle of having isolated tests. We see a growing awareness of such tensions, but there is no definitive solution to address these issues. Here is an article about exploiting dependencies between tests.
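As a small illustration of how overlap hurts defect localization, consider the sketch below. The functions `parse_price` and `total` are hypothetical, made up for this example. A bug in `parse_price` would fail both `test_parse_price` and `test_total_overlapping`, so the failures alone do not say where the defect is, whereas the isolated variant fails only when the summing logic itself is wrong.

```python
def parse_price(text):
    """Hypothetical helper: ' $1.50 ' -> 1.5"""
    return float(text.strip().lstrip("$"))


def total(prices, parse=parse_price):
    """Sum of parsed prices; the parser is a parameter so a test can replace it."""
    return sum(parse(p) for p in prices)


def test_parse_price():
    assert parse_price(" $1.50 ") == 1.5


def test_total_overlapping():
    # Goes through the real parse_price, so a parser bug fails this test too.
    assert total(["$1.50", "$2.50"]) == 4.0


def test_total_isolated():
    # Uses a stub parser, so it is unaffected by bugs in parse_price.
    assert total(["a", "b"], parse=lambda _: 2.0) == 4.0
```

The isolated test localizes better, but it no longer checks that `total` and `parse_price` work together, which is the tension described above.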
Awareness of the problem of outdated or irrelevant tests (those that, in the end, don’t cover anything) is also growing. Test coverage is not enough, and a high-quality test suite requires experience or, at least, some education. See this article about the 100% coverage myth.
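A minimal sketch of why coverage alone says little, assuming a hypothetical `apply_discount` function: both tests below execute every line of it, so a coverage tool would report the same figure for each, but only the second can ever detect a wrong result.

```python
def apply_discount(price, percent):
    """Hypothetical function under test."""
    return price - price * percent / 100


def test_runs_but_checks_nothing():
    # Executes all of apply_discount, yet no wrong return value can make it fail.
    apply_discount(200, 10)


def test_checks_the_result():
    # Same coverage as above, but this fails if the arithmetic is wrong.
    assert apply_discount(200, 10) == 180
```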
You have to find a balance between (1) the initial time invested in the test suite, (2) the maintenance effort, and (3) the effectiveness of the test suite. I mostly write what I call “inflection point tests”, and here is my view on the subject.