Test Double Troubles



  • Inhibited Refactoring. Mock-based tests must have special knowledge of a class's interactions with its collaborators. We gladly accept this special role of tests, as it allows us to test otherwise impenetrable code. By testing interactions, however, we create dependencies on design structure. Refactoring changes the structure of code, which breaks naïve structure-dependent tests. Tests that extensively use test doubles can exhibit structure-sensitive breakages that dissuade programmers from refactoring. Fear of refactoring is death to system evolution.
    A proper unit test should have a clear intent, signaled by the name, and should read as though it is explaining the code. To a certain extent, though, the test will rephrase or paraphrase the system under test. If the system under test is not structure-shy, then the mock-enabled tests will also not be structure-shy. In general, the tests should be as simple as you want the code to be.
    A test with extensive mock setup signals a class with many dependencies or too little encapsulation. A class designed with the Law of Demeter in mind will be structure-shy, which makes it cleaner and easier to test with partial mocks.
  • Tool complexity. Third-party mock libraries like Rhino Mocks and Mockito are getting better at allowing you to write expressive tests, but they all introduce complexity. You must first learn a tool's extensive API and its particular view of mock usage, which can include subtleties around things like partial mocks and void methods. Some tools even support multiple styles of mocking, each with its own special syntax. You must also learn how to read the often-idiomatic code required to implement these various mock recipes.
    Refactor tests to eliminate redundant test double detail*. Good tests require high levels of abstraction, emphasizing readability and de-emphasizing mock implementation details. Extract the idiomatic, tool-dependent code to small, declaratively-named helper methods.
  • Passing tests that indicate nothing. A naive mock-based test may tell you that function X is called twice and function Y is called once, but it does not tell you whether you have a good design or even whether you will get a good result. A heavily mocked test suite may not behave the way the underlying code will behave in production. It may make assumptions about the handling of nulls or exceptions that are inconsistent with production code. Worse yet, a test generated via the 'tracer bullet' method may exercise a class or method but provide no useful information or evidence of correctness.
    A proper unit test is more than function counting. To combat unrealistic mock scenarios, examine the code as written to find weaknesses that can be simulated and explained in further tests. Avoid writing tests that merely exercise code and have no clear intent.
  • Testing mocks rather than the SUT. In a maze of test doubles, stubs, mocks, and partial mocks, one can become lost in the entanglement between tests and production code. A feature may appear to be covered by tests, all passing, yet fail in production. Deeper exploration reveals that someone unwittingly replaced the method being tested with a stub. This always happens in subtle and indirect ways, and always results in face-palming.
    Be careful when mock setup has been extracted from test methods. You may not be aware that the method under test has been replaced by a mock in some shared setup method. Avoid stubs that are set up at a distance from the class a test directly targets.
  • Low readability. Tests can require significant amounts of detailed setup ("record") and verification ("expect"). Such clutter makes it hard to tell which expectations are simulation-enabling and which are the crucial assertions of the test.
    The primary goal of test doubles is to emulate collaborators in as simple a fashion as possible. Well-designed test doubles have virtually no logic, and well-designed classes interact directly with only a few collaborators. When interactions are many and span many classes, it is because the system under test is too structure-sensitive. Refactoring the system under test to be structure-shy will help reduce the number of collaborators that demand mocking, which in turn will simplify its tests. Extracting methods that interact with other classes or APIs will allow effective use of partial mocking.
  • Ambitious mock implementations. Fakes--objects that completely emulate all aspects of a collaborator--require implementing redundant behavior, which sometimes requires involved logic. The problem with real logic, as opposed to simple stubbed methods, is that it's easy to screw up. Recently both of us have been working with code that uses a massive faking scheme, and both of us have wasted considerable time in implementing, deciphering, and debugging the fakes.
    If you have many tests that require variant test double behaviors for a single collaborator, absolutely resist the temptation to combine these into a "mock mother." One mock, one behavior. Combining behaviors into a single test double class will quickly lead you down the divergent path of maintaining mocks for a living. Keep your test doubles simple and discrete!
  • Vendor dependency. You'd think we would have learned our lesson years ago. We created Java systems that rampantly interacted with JDBC (an almost direct mapping to SQL statements). Most of us moved to APIs that provided higher levels of abstraction, such as JDO, entity beans, and Hibernate. That transition was painful, mostly because of the highly redundant, highly vendor-dependent code that we allowed to seep into hundreds of classes and thousands of methods.
    Mock tools are no different. Some of you chose RMock several years ago, and some of you probably feel that you're stuck with it due to its pervasive use in your system's tests. Too bad. Mockito is a great tool, but we imagine a better one will come along when Java finally sports closures (Java 8? 9? ...). We want to transition to this new tool without so much pain.
    The recommendation is, once again: devise small, cohesive methods that encapsulate mock tool details.
* We wanted to abbreviate the phrase "Test Double Detail," but we think TDD might mean something else.
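
The recurring advice above (small, declaratively named helper methods; one test double, one behavior) can be sketched roughly as follows. This is only an illustration under stated assumptions, not a prescribed implementation: RateService, PriceConverter, and both helper methods are hypothetical names invented for the example, and the doubles are hand-rolled lambdas rather than calls to any particular mock tool. If a mock library were used, its idiomatic setup code would live inside the two helpers, so that switching tools later would touch the helpers and not every test.

```java
public class ConverterTestSketch {

    // Hypothetical collaborator: looks up an exchange rate for a currency.
    public interface RateService {
        double rateFor(String currency);
    }

    // Hypothetical system under test: depends on exactly one collaborator.
    public static class PriceConverter {
        private final RateService rates;

        public PriceConverter(RateService rates) {
            this.rates = rates;
        }

        // Converts an amount; falls back to the unconverted amount
        // when the rate lookup fails.
        public double convert(double amount, String currency) {
            try {
                return amount * rates.rateFor(currency);
            } catch (IllegalStateException e) {
                return amount;
            }
        }
    }

    // Helper: a stub with one behavior, always answering with a fixed rate.
    // Tool-specific setup, if any, would be hidden here.
    public static RateService rateServiceReturning(double rate) {
        return currency -> rate;
    }

    // Helper: a separate stub with a separate behavior, always failing.
    // Kept apart from the stub above rather than merged into one
    // configurable "mock mother".
    public static RateService unavailableRateService() {
        return currency -> {
            throw new IllegalStateException("rate service unavailable");
        };
    }

    // Each test reads as intent; no double implementation detail leaks in.
    static void convertsUsingTheStubbedRate() {
        PriceConverter converter = new PriceConverter(rateServiceReturning(2.0));
        check(converter.convert(10.0, "EUR") == 20.0, "should apply the rate");
    }

    static void fallsBackWhenRatesAreUnavailable() {
        PriceConverter converter = new PriceConverter(unavailableRateService());
        check(converter.convert(10.0, "EUR") == 10.0, "should fall back");
    }

    // Minimal assertion so the sketch runs without a test framework.
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        convertsUsingTheStubbedRate();
        fallsBackWhenRatesAreUnavailable();
        System.out.println("all checks passed");
    }
}
```

Note that the tests construct the real PriceConverter and replace only its collaborator, so the method under test is never itself stubbed.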