Chapter 4 - How to Write Tests

In the previous chapters we covered some of the philosophical reasons why software testing is generally a good idea. Now we’ll dive into specifically how one can create effective automated software testing systems.

Rules of the road

We’ll start with some “rules of the road” to be mindful of when writing automated tests.

Rule #1 Don’t confuse unit tests with integration tests

When approaching software testing many developers start by creating tests which assert the boundary conditions of their code. That is, send some data to a section of their code and expect some data back. This is what might classically be called a “black box” test, or often referred to as an “integration test”. The key assumption is that the test does not know (or care) about the inner workings of the code but simply cares whether the actual output produced matches the expected output given known inputs.

This type of testing is often thought of as “unit testing” because the developer is testing a single “unit” (e.g. a method) of code, but in reality they are often testing several units of code because of the dependent nature of the application.

Although this approach may seem sensible, it turns out it can be extremely problematic for a few good reasons:

Dependencies on other code or systems become difficult to orchestrate

When testing a unit of code that depends on other components or systems, these dependent systems must be orchestrated to simulate the conditions of the test.

Consider the following function:

public void doSomething(String data) {

    DatabaseWriter writer = new DatabaseWriter();

    if(data != null) {

        Emailer emailer = new Emailer();

        try {
            Serializer serializer = new Serializer();
            Object value = serializer.deserialize(data);
            writer.writeValueToDatabase(value);
            emailer.sendConfirmationEmail();
        }
        catch (Exception e) {
            writer.logErrorToDatabase(e);
            emailer.sendErrorEmail();
        }
    }
    else {
        writer.logErrorToDatabase("No data");
    }
}

Testing a method like this in a traditional black box approach presents several challenges. A successful execution of this code would result in two external systems changing state, namely a database being written to and an email being sent. Asserting that these events actually took place would require the creation of additional systems to check the state of these external systems. Although possible, this often leads to significant problems, with entire systems being created simply to facilitate complete automation of tests. The danger here is that if these additional systems are themselves complex then they too will contain bugs and may require tests to ensure their long term viability. An even less desirable outcome is where the systems required to assert test outcomes become more complex than the systems they are testing.

Simulating failure is equally difficult

Robust applications will often employ mechanisms by which failures in external systems are gracefully handled so as not to lead to unrecoverable problems or errors. Terms like failover and redundancy are common in these discussions, but in many cases there aren’t any tests for this behavior because such tests can be extremely difficult to create.

Consider that the code written for an application is specifically designed for the case where an external system becomes unavailable because (perhaps) the external system is particularly fragile or often unreachable. A traditional integration test would commonly only test the “success case”. That is, the case in which the external system (or a test version thereof) is available, but it will often not test the “fail case” because doing so would require the orchestration of a real failure. This might involve the simulation of a network failure, disk failure, DNS failure, etc., and while this can easily be done in a one-off manual way (e.g. pulling out a network cable), it can be extremely difficult to reliably reproduce failure cases with external systems in an automated test.

Rule #2 Unit tests should be deterministic

Related to Rule #1 is the concept that the only reason for a test failure should be a failure in the code being tested. If a test relies on non-deterministic components in order to succeed then failures in these components will cause a failure in the test which does not accurately reflect the validity of the code under test.

Common examples of this are:

  • Depending on external systems which may fail for reasons that are unrelated to the code under test

    • Mitigation: Mock dependencies


  • Tests involving asynchronous processes that take variable amounts of time to complete may lead to timeout failures

    • Mitigation: Where possible test the components of the asynchronous process independently


  • Test code that retains state, making the order in which tests are run significant and potentially causing failures due to unexpected state when subsequent tests begin.

    • Mitigation: Either avoid storing state outside the test, or ensure external state is cleared after each test (a sketch follows below).
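
As a sketch of that last mitigation, assuming JUnit 4 and a purely hypothetical SharedCache singleton that holds state shared between tests:

import org.junit.After;
import org.junit.Assert;
import org.junit.Test;

public class CacheDependentTest {

    @After
    public void tearDown() {
        // Reset the shared state so neither test order nor re-runs can affect outcomes
        SharedCache.getInstance().clear();
    }

    @Test
    public void storesValueUnderKey() {
        SharedCache.getInstance().put("key", "value");
        Assert.assertEquals("value", SharedCache.getInstance().get("key"));
    }
}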

Rule #3 Inject dependencies where possible

Wherever possible, inject dependencies. If you’re already familiar with dependency injection as a pattern then this will probably make sense to you. If not, take a moment to review the Dependency Injection Primer.
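
As a minimal sketch of the idea, using a hypothetical saveValue method: the DatabaseWriter is passed in by the caller rather than constructed inside the method, so a test can hand in a mock instead of a real writer.

// The dependency is injected as a parameter rather than allocated with "new"
public void saveValue(String value, DatabaseWriter writer) {
    writer.writeValueToDatabase(value);
}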

Duplicating tests of the same code

External classes which are called within the contents of a method are likely to also be called from elsewhere (otherwise they probably wouldn’t be an external class). This means that the tests which run in a black box fashion will often be calling the same code several times in several different places. This is not necessarily wrong (more testing is better right?!) but it’s a waste of resources, the most precious of which is time.

Tests can take a long time to execute

If a single method is making several calls out to external systems (like databases or email systems), these calls can be time consuming, and the total run time grows as the number of tests increases. A tightly run development team will typically want to ensure ALL tests are passing before deploying code to a staging or production environment, but if tests are taking a long time to complete (in some cases hours) then the productivity of the team can really suffer. This is compounded even more by the previous note regarding duplication of tests. If a write to a database is slow, and that write is being called from several methods, then the entire test suite can be held up performing unnecessary database writes.

Testing 3rd party code

If your application is making use of any 3rd party libraries then you will inevitably be calling these libraries as part of the execution of your code. A black box test will therefore inevitably also be calling these 3rd party libraries and although this is not wrong as such, it is again unnecessary and will lead to both increases in test times as well as unnecessary complexity.

Rule #4 Don’t test external code

Write tests for your own code but when you are calling external code (i.e. 3rd party libraries) mock them out.
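
For example (a sketch using Mockito, with a hypothetical third-party Geocoder library class, its hypothetical Coordinates value type, and a hypothetical StoreLocator class of our own that depends on it):

Geocoder geocoder = Mockito.mock(Geocoder.class);
Mockito.when(geocoder.lookup("Berlin")).thenReturn(new Coordinates(52.5, 13.4));

StoreLocator locator = new StoreLocator(geocoder);
Assert.assertEquals("Berlin Mitte", locator.nearestStore("Berlin"));

// The real Geocoder (and any network calls it makes) is never executed here;
// verifying that the library itself works is the library author's concern.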

Diagnosing the cause of a failed test can be difficult

If the test you are running executes code several levels deep and a failure occurs at one of these deeper levels, the test can report an error against code in which no error actually exists.

Consider the following example:

public void doSomething(String data) {

    DatabaseWriter writer = new DatabaseWriter();

    if(data != null) {

        Emailer emailer = new Emailer();

        try {
            Serializer serializer = new Serializer();
            Object value = serializer.deserialize(data);
            writer.writeValueToDatabase(value);
            emailer.sendConfirmationEmail();
        }
        catch (Exception e) {
            writer.logErrorToDatabase(e);
            emailer.sendErrorEmail();
        }
    }
    else {
        writer.logErrorToDatabase("No data");
    }
}

Imagine we wanted to create a test that ensured the data provided to the method was indeed written to the database.

We might have a test like this:

DatabaseReader reader = new DatabaseReader();

String testData = "test";

MyClass instance = new MyClass(); // the class containing the doSomething method above
instance.doSomething(testData);

String dataAfter = reader.readData();

assertEquals(testData, dataAfter);

Now imagine that the code inside the DatabaseWriter is incorrect and does not correctly write the data to the expected table. The test above would fail, but the code in the method is actually correct. This may lead the developer on a “wild goose chase” looking for the failure in the wrong place.

You might argue that a subsequent test which verifies the correct behavior of the DatabaseWriter would pick up the error, and that fixing that problem would fix both tests, but this argument starts to waver when the number of tests is large and the complexity of the code is high. Unless there is a clear indication to the developer as to the exact reason for the failure, there is no guarantee they will start looking in the right place. They may end up checking several places in which failures were indicated before finally realizing the cause. This is all time and resources wasted.

Rule #5 Unit tests should only test a single unit

When creating unit tests it is often the case that the test will simply call a method with prescribed inputs and expect certain outputs. This is really a black box test because the test code does not know anything about the internals of the method, and that is actually fine. The problem arises when dependencies, or other methods, are required by the method under test but are not visible to the test code.

Consider the following example:

public class Foo {

    public int methodA(int value) {
        return methodB(value) + 10;
    }

    public int methodB(int value) {
        return value + 5;
    }
}

In this case a test written to test “methodA” will have the effect of calling “methodB”. While this is not wrong per se, it is not ideal for two reasons:

  1. The code in methodB is (presumably) already tested in its own test so executing it a second time in the test for methodA is unnecessary
  2. If tests were written for methodA and methodB and a failure occurred in methodB, both tests would fail. In this trivial example this may not be a problem, but in more complex applications it creates a situation in which the cause of a failure may be unclear, which in turn leads to developers spending more time tracking down the source of the failure.

The preferred way to test this would be to mock methodB in the above code and simply assert that methodA called the mock. More details on creating mock objects are covered below.
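
A minimal sketch of what this could look like, using a Mockito spy (a partial mock) of the Foo class above:

Foo foo = Mockito.spy(new Foo());

// Orchestrate: stub methodB so its real implementation is not executed
Mockito.doReturn(100).when(foo).methodB(7);

// Execute and assert methodA's own logic in isolation (100 + 10)
Assert.assertEquals(110, foo.methodA(7));

// Verify that methodA delegated to methodB with the expected argument
Mockito.verify(foo).methodB(7);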

Conversely consider the following variation where methodB is marked as private:

public class Foo2 {

    public int methodA(int value) {
        return methodB(value) + 10;
    }

    private int methodB(int value) {
        return value + 5;
    }
}

In this example “methodB” is not callable by the test and hence will not typically have its own test, so the test for “methodA” that calls “methodB” could be considered ok. Although there is still the problem that the test is testing two logically independent units of work, the limitations/features of the language mean it may not be feasible to create independent tests [1].

Rule #6 Don’t mix test code and production code

One of the common mistakes made when engineers start down a path of testing is to invoke conditions within their code specifically intended for testing, commonly referred to as a “Test Mode”. Running an application in Test Mode means that the executor expects the application will not perform “real” operations, but rather will behave in a predictable, consistent manner for the purposes of testing.

This in itself is not a problem and having applications that run in Test Mode is not necessarily a bad thing, but the most common way this is implemented is through conditional statements within the production code.

Consider the following:

public class Foo {
	public float methodA(float value) {
		float result;

		if(Globals.TEST_MODE) {
			result = value;
		}
		else {
			result = methodB(value) + 10.0f;
		}

		return result;
	}
	public float methodB(float value) {
		return 10.0f / value;
	}
}

In this example a global (static) variable called “TEST_MODE” is used to determine the result of “methodA”. If the application is in “TEST_MODE” then “methodB” will not be called. While this satisfies our objective of not combining more than one unit, it presents several other problems:

Maintaining test conditions can be arduous

As features are added and the codebase grows, maintaining “test mode” across the application becomes more and more difficult. At every point the engineer needs to consider whether the feature they are creating needs a “test mode” condition. This inevitably leads to human error, as code that should not be executed within tests is executed because the requisite test conditions weren’t provided.

Test conditions may leak into production

Whenever any condition is introduced into code there is a chance that this condition may assume a value that wasn’t originally expected. Any time there is “test code” introduced into production code there is a chance that this test code will execute in a production environment. The only way to eliminate this chance with complete confidence is to completely remove any and all test code from the production codebase.

You may also fall victim to the “I’ll just quickly comment this out” problem. While debugging an unrelated issue, or while creating/changing tests an engineer may be tempted to temporarily switch the toggle on a particular TEST_MODE condition. Forgetting to toggle this back may have catastrophic effects when the code goes into production.

You’re never really testing the real code

Because the inclusion of the TEST_MODE necessarily omits certain code from executing, or indeed includes the execution of non-production code, the tests are never really testing real world conditions. If there is code that falls outside the TEST_MODE condition (and is therefore not subject to tests) it is possible this code itself contains a bug that will never be uncovered because it is not actually executed during testing.

Code coverage metrics lose value

Code coverage metrics are often a good tool in determining whether the tests written provide a sufficient indication that the software is functioning correctly. TEST_MODE conditions will (by definition) only ever be half covered by tests, which reduces the value of code coverage metrics.
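
As a sketch of the alternative, the production class can be stripped of the TEST_MODE branch entirely, with methodB controlled from the test side instead (for example with a Mockito spy):

// Production code contains no test conditions at all
public class Foo {
    public float methodA(float value) {
        return methodB(value) + 10.0f;
    }

    public float methodB(float value) {
        return 10.0f / value;
    }
}

// In the test, methodB is stubbed on a spy rather than toggled by a global flag
Foo foo = Mockito.spy(new Foo());
Mockito.doReturn(2.0f).when(foo).methodB(4.0f);
Assert.assertEquals(12.0f, foo.methodA(4.0f), 0.0001f);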

Rule #7 Integration tests shouldn’t contribute to code coverage

In an effort to increase code coverage, engineers may be tempted to write integration (black box) tests simply to lift the coverage percentage. This usually means a single test that exercises a fairly trivial case but which executes a large section of code. Depending on the coverage system used, this will often report that the code has been covered by tests, but this may not actually be the case. It’s true that the code was executed during the test, but that doesn’t mean that the behavior of the code was “tested”.

The simplest mitigation to this is to separate unit tests from integration tests and ensure that code coverage is not recorded for the integration tests.

Rule #8 Write tests before fixing bugs

Even in the hardiest testing system bugs may still exist in production code, and when a bug is reported the first instinct of the engineer is often to locate the cause of the problem. Once located, the next most compelling temptation is to fix the problem, particularly if the fix is trivial. This temptation should be avoided wherever possible in preference to first creating a test that exhibits the problem.

The only thing worse than a customer experiencing a bug is a customer experiencing a bug that was fixed but somehow was reintroduced in a subsequent release. The temptation to simply fix problems as they arise exposes the system to the potential for bugs to reemerge without anyone knowing.
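
As a sketch, assuming JUnit 4 and a hypothetical PriceCalculator with a reported rounding bug, the regression test is written first, confirmed to fail, and only then is the fix applied:

// Hypothetical regression test; it stays in the suite so the bug cannot silently return
@Test
public void totalIsRoundedToTwoDecimalPlaces() {
    PriceCalculator calculator = new PriceCalculator();
    Assert.assertEquals(10.35, calculator.total(3, 3.449), 0.001);
}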

Rule #9 Don’t use software tests to verify the environment

A common argument in favor of integration tests is that without an integration test we don’t truly know if the system is behaving as expected. This can be indicative of a misunderstanding of the purpose of the software test.

If there are concerns about whether the environment in which the software is to be run is correct then there should be dedicated “environment tests” which verify the environment independent of the software tests. Merging these two distinct activities can lead to software tests that fail because of environment conditions which doesn’t necessarily mean the “software” is faulty.

Rule #10 Ensure tests are isolated and idempotent

When creating unit tests it may be necessary to maintain state outside the test, or to use components that are shared by tests and which themselves maintain state. Although it is preferable to avoid this, in some cases it cannot be avoided. In these cases it is important to assert preconditions before, and postconditions after, the test is run.

Tests should be both idempotent and isolated. Changing the order in which tests are run, or re-running the same tests multiple times, should not affect the outcome. If state is maintained outside the test and not restored to its original state it may affect subsequent tests, leading to either false positive or false negative results.

If an integration test is deemed to be the simplest or best solution for a given software test (this should be a rare occurrence!) then it’s important to ensure that any external systems are checked before the test is run and reverted after the test is complete. Problems can often arise when a test persists state in an external system which then affects subsequent tests.

For example:

A test that writes files to a file system should assert both that the files written during the test do not exist before the test is run, and that they are removed before the test completes.
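
A sketch of what this might look like, assuming JUnit 4 and a hypothetical ReportWriter (the file path is purely illustrative):

import java.io.File;

import org.junit.After;
import org.junit.Assert;
import org.junit.Before;
import org.junit.Test;

public class ReportWriterTest {

    private final File report = new File("/tmp/report.txt");

    @Before
    public void assertPreconditions() {
        // Fail fast if a previous run (or anything else) left the file behind
        Assert.assertFalse("Stale report file found", report.exists());
    }

    @Test
    public void writesReportToDisk() {
        new ReportWriter().write(report);
        Assert.assertTrue(report.exists());
    }

    @After
    public void cleanUp() {
        // Restore the file system so subsequent tests and re-runs start clean
        report.delete();
    }
}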

Guidebook for writing tests

Based on the earlier conclusions drawn about how and why black box tests are problematic, or why mixing test and production code can be a slippery slope, let’s review what we’re trying to achieve:

We want a system in which:

  1. Code is tested as single units of work
  2. We avoid testing the same code twice
  3. We avoid testing 3rd party code
  4. External systems are not required or are trivial to orchestrate
  5. Production code is free of test code

One of the most common solutions to this is the use of Dependency Injection together with Mock Objects.

Mocking dependencies to create isolated tests

A mock object is an object that implements the same signature as the object it is mocking, but does not possess any of the actual implementation. That is, as far as the application is concerned it “looks” like the real thing, but it’s actually a fake.

The behavior of the mock object is orchestrated as part of the test so that it produces predictable outputs.

Consider the following example:

public void doSomething(String data,
                        DatabaseWriter writer,
                        Emailer emailer) throws Exception {

    if(data != null) {

        try {
            writer.writeValueToDatabase(data);
        }
        catch (Exception e) {
            emailer.sendErrorEmail(e);
        }
    }
    else {
        writer.logErrorToDatabase("No data");
    }
}

If we assume all the prerequisites to this method are met (i.e. all the parameters passed to the method are valid), there are 5 possible outcomes:

  1. “data” is null in which case we expect an error is written to the database
  2. “data” is null but the write operation to the database for the error fails in which case we expect the method to throw an Exception.
  3. “data” is NOT null but the database write operation fails in which case we expect an email to be sent
  4. “data” is NOT null, the database write operation fails AND the email fails in which case we expect the method to throw an Exception.
  5. “data” is NOT null and the database write operation succeeds in which case we expect “data” is written to the database

Ideally we should provide tests for all of these possible scenarios [2], but for this to work we’d need to somehow orchestrate the states in the external systems (database and email server) to simulate these scenarios. That is, for each of the above variations we’d need to create a state in which:

  1. Database is available and working as expected
  2. Database is not available
  3. Database is not available, but email server is
  4. Database is not available and email server is not available
  5. Database is available and working as expected

Creating a test environment to simulate these conditions presents several challenges:

  • It is difficult to automate the enabling/disabling of external resources like databases/email servers in order to simulate a failure
  • Tests are naturally fragile as external systems may go down for reasons unrelated to the test
  • External systems that maintain state (e.g. databases) may need to be set up in a predictable state before each test. Depending on how the database is accessed, this often means the creation of custom queries and/or API endpoints specifically for use in tests, which are then themselves subject to errors/failure as database schemas change etc.

A far simpler approach is to mock these external systems and configure the mocks with the predictable state you need.

While there are many ways to do this, using a dedicated mocking framework is often the easiest as they will provide several key benefits over a roll-your-own solution:

  1. Mocks of real objects are easily created without needing to implement all required methods [3]
  2. Test conditions are easily orchestrated by setting expectations on mocks
  3. Mocks allow assertions of methods called and method parameters as well as methods NOT called.

The following is an example of a test for the above code using Mockito, a Java based mocking framework:

DatabaseWriter writer = Mockito.mock(DatabaseWriter.class);
Emailer emailer = Mockito.mock(Emailer.class);
DatabaseError error = Mockito.mock(DatabaseError.class);
String data = "foobar";

// Orchestrate
Mockito.doThrow(error).when(writer).writeValueToDatabase(data);

// Execute
MyClass instance = new MyClass();
instance.doSomething(data, writer, emailer);

// Assert
Mockito.verify(writer, Mockito.times(1)).writeValueToDatabase(data);
Mockito.verify(emailer, Mockito.times(1)).sendErrorEmail(error);

In this example the “MyClass” instance (the class with the “doSomething” method) is provided with mock implementations of the DatabaseWriter and the Emailer that have been orchestrated to create the conditions needed for the test.

After the code is executed we verify that the methods we expected to be called were in fact called with the parameters we expect.

Importantly, this test does not actually require a database or an email server at all and can be run in complete isolation from these external systems.

Using a Dependency Injection framework (where applicable)

As may now be apparent from the previous section on mocks, use of a Dependency Injection design pattern is critical to successful implementation of mocked tests in a compiled language like Java.

If the application under test has been created in such a way that all object dependencies are injected it becomes far simpler to inject mocks into these objects when running tests. This has the crucial benefit of not requiring changes to production code solely to facilitate the creation of tests for that code.
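
For example (a sketch with a hypothetical OrderService class whose dependencies are supplied through its constructor, whether by hand or by a DI framework):

public class OrderService {

    private final DatabaseWriter writer;
    private final Emailer emailer;

    // Dependencies are provided by the caller (or wired up by a DI framework)
    public OrderService(DatabaseWriter writer, Emailer emailer) {
        this.writer = writer;
        this.emailer = emailer;
    }

    public void placeOrder(String order) {
        writer.writeValueToDatabase(order);
        emailer.sendConfirmationEmail();
    }
}

In a test, mocks can then be injected without any change to the production code:

OrderService service = new OrderService(
        Mockito.mock(DatabaseWriter.class),
        Mockito.mock(Emailer.class));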

Detecting code duplication and invalid code paths

When method calls are idempotent (they can be called many times without changing the outcome) it is possible that multiple calls will be made without the caller knowing. Additionally, code that executes within a method but which does not significantly alter the outcome of the method may be executed without the caller’s knowledge. Some may claim this is correct and that the caller shouldn’t need to know what goes on inside the method, and to a large extent they’d be right, but there is critical information being lost here that may lead to bugs of a particularly insidious nature.

Imagine that we have a method that, among other things, updates a row in the database:

public void updateData(
        String data,
        DatabaseWriter writer) {

    writer.update(data);
    writer.update(data);
    writer.flush();
}

A traditional black box test for this might look something like this:

String data = "foobar";

MyClass instance = new MyClass();
DatabaseWriter writer = new DatabaseWriter();

instance.updateData(data, writer);

DatabaseReader reader = new DatabaseReader();

String dataAfter = reader.readData();

Assert.assertEquals(data, dataAfter);

In this test we want to make sure that after we call the update method the database has been correctly updated with the new data.

This test passes and we are happy the code is working, but there’s an obvious problem. The “update” call on the DatabaseWriter was executed twice, but because this call is idempotent the test does not know. Although the outcome of this is somewhat meaningless from a behavioral standpoint, the actual implementation has a bug which could prove disastrous. If this method is called several times (or millions/billions of times) then the overall performance of the system will suffer, and this may be an extremely difficult bug to track down.

Conversely consider the mocked variation:

public void duplicateCheck() {
    String data = "foobar";

    MyClass instance = new MyClass();
    DatabaseWriter writer = Mockito.mock(DatabaseWriter.class);

    instance.updateData(data, writer);

    Mockito.verify(writer, Mockito.times(1)).update(data);
}

The first thing to note is that the test has considerably less code, which is quite common when using mocks. The more important point is that this test not only verifies that the database “would have” been written to, but crucially that this would happen only once. This is something not typically caught by a coarser black box test.

Similarly the execution of unexpected code can also be detected. In the previous example the call to “flush()” may not be required by the method but may in fact be a cause of significant performance delay. Again this may not be detected by the black box test as it does not have a meaningful impact on the outcome of the method, but would be detected by the mocked example as a matter of course.

There is of course a natural criticism of this approach: asserting that (for example) the database “would have been” written to is not the same as asserting that it actually was written to. But this is a case of appropriate separation of tests. We assume that the DatabaseWriter class in the above example is tested elsewhere and that those tests verify it does what it is supposed to do. Even in those tests we may not be testing whether the database was actually written to, but simply following the execution path up to the point that we no longer have control over the outcome. We don’t, for example, need to test that MySQL knows how to deal with an “UPDATE” SQL statement; we assume this works.

Misuse of (live) integration tests

As highlighted at the beginning there are several classifications of tests and even with a comprehensive automated testing policy in place there are certainly conditions in which production systems will fail with little or no warning. This however is still not a valid case for arguing the need for integration tests.

Here’s a simple example of why:

Consider a method of a class that writes a file to the local disk. The integration test approach would have the class write the file then the test would assert that the file was actually written. But in this case we are actually testing several things:

  1. Our code correctly calls the appropriate file system operations to write a file
  2. The programming language knows how to call the appropriate operating system commands to write to disk.
  3. The operating system knows how to correctly access the disk hardware to write data
  4. The disk is operating correctly and is able to have data written to it.

Of all of these things we only truly have control over the first one. Or to put it another way, “our code changes cannot affect anything beyond our code”. So the question becomes, “why do we need to test this?”, to which the answer is “we don’t”.

Now the important implication here is that we are never actually testing that files are written to disk and there could indeed be a failure at some point that is outside our code but which affects the behavior of the method such that it fails. The key point is that tests are required to ensure external systems are available and working correctly but these tests should not be the same as the tests which verify that code is working as expected.

Indeed testing all these components in the one place can create problems for testers and engineers:

  • Over time tests become fragile. When a test includes dependencies on external systems, changes in state or availability of these external systems will often cause tests to fail when in fact there is nothing wrong with the code being tested. Code that is heavily dependent on many external systems will be sensitive to changes in any of them and may be in a fail state more often than not. This leads to the problem of “Test Nonchalance”: a situation in which team members are not particularly worried by tests failing because they “are always failing” and it’s rarely due to the code.
  • Verifying the state of external systems after a test can be problematic (or impossible) as already discussed.
  • Performing integration tests on production systems can be risky, and testing on non-production systems is meaningless. If we want to make sure that the external systems are behaving correctly in production, but we don’t test the production systems because we are concerned about polluting production data, then one could say the test is meaningless.

It turns out that the answer to the question, “when is integration testing required?” is surprisingly, “not very often at all”. Code is tested in unit tests that utilize mocked versions of external dependencies. External systems are tested independently and monitored with appropriate hardware and/or device monitoring systems.

There are however some cases where integration tests are useful. Consider the case where a database schema change is made which introduces a new field, or removes an existing field from a table in the database. This change causes a catastrophic failure in production because the unit tests that cover the code never verify that data can actually be written to the database, and the schema change means the code as written will no longer function. The database server is up and running and is accessible over the network. In this case the production system will fail and yet all the tests are passing. So what went wrong?

There are broadly two approaches to answering this:

  1. The system needs some integration tests to verify that data is getting all the way to the database
  2. The system currently has a gap that is not tested, specifically the relationship between the code and the data model. This gap needs to be plugged.

With all the information presented it is probably no surprise that the preferred option is the second one. In this case we have a disconnect between what the code expects the database to look like and what it actually looks like. The correct fix for this is to create a set of tests that assert that the expectations of the code are correct, and a separate set of tests that assert the database meets these expectations. Creation of an intermediate representation of the database to which both sets of tests refer would be one possible way to do this (although there may be many).

Correct use of (live) integration tests

There are however occasions where having a fully working copy of a production system in a controlled environment can be of great benefit and where mocking dependencies is not an appropriate solution.

Keeping your unit tests in check

One of the problematic by-products of unit tests is that they are not guaranteed to be testing anything meaningful. That is, the unit test really just asserts that the code does what the test expects, but this may not have any meaning in terms of whether the application as a whole is behaving as expected. This issue will often arise shortly after a code change that leads to a test failure.

Consider the situation where an engineer makes a change in response to a reported bug or feature request. In the event that this change causes another test to fail (which could mean the tests are doing their job!), the engineer will presumably investigate this failure to determine the cause. At this point a very risky and fragile situation is created. There is nothing preventing the engineer from simply altering the test so that it passes. If this were to happen, a regression bug could have been introduced that will not be detected by any of the unit tests.

In these scenarios it is often useful (critical) to have some level of end-to-end integration testing in place to ensure the application is behaving as expected. The integration test would (ideally) exercise the fail case hidden by the modified test and reveal the bug. Naturally it is possible that this hidden bug may not be exercised by the integration test (which is itself another problem with integration testing more broadly); no testing regime will eliminate all bugs, but a good one will act to significantly reduce the overall bug count.

To avoid the pitfalls already mentioned in relation to integration tests, a common approach is to have these end-to-end integration tests run as a final step prior to release rather than running for every commit of code.

Re-creating end user problems

One of the stark limitations of any testing approach is that it will only ever verify what the engineer knows to be true. This means there may be particular code paths, or particular combinations of actions, that result in unexpected behavior and which are not covered by tests. In these situations the first and most crucial stage in diagnosis is the ability to re-create the problem in a controlled environment, and a full duplicate of a production system can be useful in gaining a rapid understanding of the end user problem without risking pollution of production data or state. Whether the time gained by having this duplicate standing by outweighs the cost of creating and maintaining such a system will depend on the particulars of the application. However, once all of the common “misuses” of live integration testing are removed, it may turn out that the number of reported problems that cannot easily be solved without a duplicate system is actually quite small.

Unleashing the Havoc Monkey

In extremely sophisticated systems or systems which demand extremely high levels of reliability and uptime it is often the case that engineers will want to verify that the measures put in place to handle catastrophic system failures are actually working. In this case a duplicate system is necessary to ensure that things like network failures, database failures or complete machine blackouts are simulated accurately without affecting production systems.

Pitfalls and pro tips

When embarking down a path of effective software testing, engineers may encounter situations which make creating tests difficult. Here are some tips and recommendations to make writing effective automated tests simpler.

Avoid using class scope (“static”)

In languages like Java the static keyword is used to indicate a variable or method is scoped to the class level and is available without the need to create an instance of the class. This can be particularly useful for utility methods and/or objects which act in a singleton-like manner; however, such methods are notoriously difficult to test.

Although some mocking frameworks (e.g. PowerMock in Java) do allow for the mocking of static methods, other environments (e.g. Android) may not afford the developer such luxuries, leaving static methods unmockable.

Generally, statically scoped fields are not problematic (either they never change, in which case tests should use them as they are, or they can change, in which case the test can set them); however, statically scoped methods may make it impossible to create unit tests because the static method may be unmockable.

Key point:

Question whether a statically scoped method is being used in place of a singleton, and if so, replace the static method with a singleton class implementing the same method at the instance level (i.e. not static).
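
For example (a sketch with a hypothetical Clock utility):

// Before: a statically scoped method, unmockable in many environments
public class Clock {
    public static long now() {
        return System.currentTimeMillis();
    }
}

// After: the same behavior behind an instance method on a singleton, so callers
// that are handed a SystemClock can be handed a mock in their tests instead
public class SystemClock {

    private static final SystemClock INSTANCE = new SystemClock();

    public static SystemClock getInstance() {
        return INSTANCE;
    }

    public long now() {
        return System.currentTimeMillis();
    }
}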

Avoid using the “final” keyword

The final keyword in Java is used to prevent alteration of a field or method via either overriding a method in a subclass, or changing the value of a field. In most cases this is done either for performance reasons (calling final methods can be faster than calling non final methods depending on the compiler/runtime environment) or for “security” to ensure the class behaves in a consistent manner regardless of whether it is extended.

Engineers are often tempted to “throw in” the final keyword wherever they feel a method should not ever be changed, however this can make the creation of tests difficult.

As with static methods, final methods are often unmockable (PowerMock does allow mocking of final methods).

This is particularly problematic for engineers using 3rd party libraries in which classes or methods are declared final because they may be unable to mock out these 3rd party dependencies.

Key point:

Question whether the use of the final keyword is necessary, or merely convenient, and how it may impact the creation of unit tests.
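
One common workaround (a sketch, assuming a hypothetical third-party PaymentClient class that is declared final) is to hide the final class behind a thin, non-final adapter of your own and mock the adapter wherever it is used:

// Our own non-final wrapper around the hypothetical final third-party class
public class PaymentGateway {

    private final PaymentClient client = new PaymentClient();

    public void charge(String account, long amountInCents) {
        // Delegates straight to the library; the wrapper holds no logic of its own,
        // but unlike the final PaymentClient it can be mocked in callers' tests
        client.submitCharge(account, amountInCents);
    }
}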

Use factories rather than “new”

The standard way to allocate an instance of an object in a language like Java (or C++) is to use the new keyword. It may seem ridiculous to suggest this is wrong, but liberal use of the new keyword can make it impossible to create isolated unit tests.

Consider the following method:

public void doSomething(boolean someValue) {

    Emailer emailer = new Emailer(); // <== Unmockable

    if(someValue) {
        emailer.sendConfirmationEmail();
    }
    else {
        emailer.sendErrorEmail();
    }
}

If we wanted to create a unit test for this and mock out the Emailer instance, we couldn’t [4]. This means we need to change the way we allocate the Emailer instance to make it mockable.

One way of doing this is to create a Factory class that externalizes the allocation of the Emailer:

private EmailerFactory factory;

public void doSomethingWithFactory(boolean someValue) {

    Emailer emailer = factory.create(); // <== Mockable

    if(someValue) {
        emailer.sendConfirmationEmail();
    }
    else {
        emailer.sendErrorEmail();
    }
}
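
Assuming the factory field can be supplied from outside the class (shown here via a hypothetical constructor), the test can then substitute the factory with a mock:

Emailer emailer = Mockito.mock(Emailer.class);
EmailerFactory factory = Mockito.mock(EmailerFactory.class);
Mockito.when(factory.create()).thenReturn(emailer);

MyClass instance = new MyClass(factory); // hypothetical constructor accepting the factory
instance.doSomethingWithFactory(true);

Mockito.verify(emailer).sendConfirmationEmail();
Mockito.verify(emailer, Mockito.never()).sendErrorEmail();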

Alternatively a quick and easy solution is just to make the allocation mockable by creating an internal method to do the allocation:

public void doSomething(boolean someValue) {

    Emailer emailer = newEmailer(); // <== Mockable

    if(someValue) {
        emailer.sendConfirmationEmail();
    }
    else {
        emailer.sendErrorEmail();
    }
}

// Mockable
protected Emailer newEmailer() {
    return new Emailer();
}
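
A test for this variation might then look something like the following sketch, which uses a Mockito spy to intercept the newEmailer() call (this assumes the test class sits in the same package so the protected method is visible):

Emailer emailer = Mockito.mock(Emailer.class);
MyClass instance = Mockito.spy(new MyClass());

// Orchestrate: have the overridable allocation method return our mock
Mockito.doReturn(emailer).when(instance).newEmailer();

// Execute
instance.doSomething(true);

// Assert
Mockito.verify(emailer).sendConfirmationEmail();
Mockito.verify(emailer, Mockito.never()).sendErrorEmail();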

Use “sneaky spies” when all else fails

Sometimes you can run into situations where you want to assert the behavior of a class without resorting to a Black Box test but you don’t have access to the internals of the class. Under normal circumstances we should not be concerned about the internals of a class insofar as unit testing is concerned, however there are a few instances where this becomes relevant.

Consider this example:

public class MyClass {
    private final List<Object> objects = new LinkedList<Object>();

    public void add(Object value) {
        objects.add(value);
    }

    public void clear() {
        objects.clear();
    }
}

If we wanted to test the behavior of this class we’d ideally want to test that both the add and clear methods do in fact change the underlying data structure (objects), but this class does not provide a way for us to assert this because there is no publicly visible access to the internal objects field.

In situations like this it is impossible to create “black box” tests without changing the underlying definition of the class (which may or may not be the right thing to do).

To overcome this we can use a combination of reflection and mocks to assert the conditions within the test.

For example, we could create some utility classes:

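// Replaces the named (possibly private) field on "container" with a Mockito spy
// wrapping the original value, and returns the spy so the test can verify calls
// on it. These helpers are assumed to live in the MockUtils class used below.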
@SuppressWarnings("unchecked")
public static <T extends Object> T spy(Object container, String field) {
    try {
        Field declaredField = locateField(container.getClass(), field);
        declaredField.setAccessible(true);
        Object value = declaredField.get(container);
        Object spy = Mockito.spy(value);
        declaredField.set(container, spy);
        return (T) spy;
    }
    catch (Exception e) {
        throw new RuntimeException("Cannot spy on field [" +
                field +
                "] in object [" +
                container +
                "]", e);
    }
}

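// Walks the class hierarchy until the named declared field is found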
static Field locateField(Class<?> container, String field) throws NoSuchFieldException {

    try {
        return container.getDeclaredField(field);
    }
    catch (NoSuchFieldException e) {

        if(container.getSuperclass() != null) {
            return locateField(container.getSuperclass(), field);
        }
        else {
            throw e;
        }
    }
}

These utilities allow us to simply assert the changes in the internal object:

MyClass instance = new MyClass();

List<Object> objects = MockUtils.spy(instance, "objects");

instance.add("foobar");

Mockito.verify(objects).add("foobar");

Note

Care should be taken when adopting this technique. You don’t want to start indiscriminately checking the internal workings of components unless the specific case demands it.



[1]There is a case to be made that private methods should be used sparingly because of this issue, but that’s somewhat out of the scope of this discussion.
[2]Often only the most common cases are explicitly tested. This is generally a judgement call made by the members of the team to balance time spent vs. value vs. risk
[3]Typically only needed for compiled languages.
[4]Dynamic languages like python and javascript would allow this.