BDD: Behavior-Driven Development

Software Development Challenges

If it ain’t tested, it’s broke. This is a quote from an avionics testing engineer. Testing cannot prove software works, but all you know about untested code is that it will definitely fail somewhere. However, developers often rely on undocumented, hand-executed tests. This makes testing unrepeatable. If the tests are documented, they are still tedious and rarely re-run. This sort of manual testing does little to promote stability. Stability is needed: code that worked yesterday often fails tomorrow. Errors creep in. It is important to repeat tests. Together, these facts mean that tests must be thorough and automated. That is, comprehensive, automated tests must be part of the delivered solution.

The most common solution is to write module-level tests with tools like JUnit or NUnit and to augment them with higher-level tests for a few key features. While this sort of testing can be effective, it is not readable by non-technical clients. It is also too detailed for technical clients who have other responsibilities. To provide confidence, clients need thorough, executable tests that they can read. Interestingly, the basis for the tests should already be available to you: every product backlog item (PBI) should have at least one detailed acceptance criterion (AC). It would be great if the AC could be used to write the tests.

This is the idea behind Behavior-Driven Development (BDD): we use tools to translate AC into executable tests. This gives a direct path from AC to testing, putting the product owner in charge of what is being tested. At a minimum this provides executable acceptance tests: the client accepts the software once it passes all tests. It also documents required behavior in an executable, readable format. People who are interested in details about system behavior can simply read the tests. BDD makes system requirements explicit and testable in a way that cannot be achieved by standard requirements documents and standard testing.

Of course, writing thorough AC is challenging. Often clients and product owners do not have the skills. They might choose to hire a team of specialists to write them, but that is also not always feasible. This means the development team may need to write the AC instead. But the win of (automated) BDD is that the tests can at least be reviewed with clients using language they understand. (And if a client cannot be bothered, one wonders about whether they have any commitment to the project!) This means that the AC document expected behavior as approved by clients; the AC become an executable requirements document. If an error is found, the solution is simple: write AC revealing the error and create a PBI to get those AC to pass. BDD closes the loop between project concept and tested implementation. If you ask a competent software developer about whether they use some form of BDD, you are likely to get the response “who wouldn’t?”

An example

BDD is illustrated by the simple project at https://gitlab.com/hasker/cucumber-phonebook. As described in README.md, this system is a simple, command-line implementation of a phonebook. The names and numbers are stored in a text file, and the program uses command-line arguments to list, add, and remove entries. The project uses Cucumber to implement BDD. Cucumber includes a suite of tools supporting many development languages and over 70 spoken languages. Thus you can write your code in C or Go and your AC in Česky (Czech) or Tiếng Việt (Vietnamese). It is possible to integrate Cucumber with any development framework, so it works for web-based projects as well as GUI applications. We will use this project to cover the basics of BDD with Cucumber.

As an aside, review Build | Pipelines and note this project dates to 2022. One of the strengths of using Docker is that tests that ran years ago can still run today with minimal updates. Executing the tests through CI (continuous integration) lowers barriers further.

To see how Cucumber works, navigate to add_phone.feature in

    src / test / resources / pbook_command_tests

The very top of the file is an introductory section describing what is being tested in the file. The first line starts with the keyword Feature; this line is written to the output to document which tests are being executed. The text between the Feature line and the first scenario is treated as commentary (that is, ignored by Cucumber). It provides further context for human readers. The bulk of the file is a collection of testing scenarios. Each scenario begins with the keyword Scenario followed by a description of an individual test. The body of the scenario consists of lines of Given, When, Then, and And statements. These statements capture the AC that would be in a PBI. Note they use actual names and numbers to make the AC concrete, that is, to make them fully testable. The Given portion describes the setup, the When portion describes the action taken by the user, and the Then portion describes the result.

Review the first scenario:

Scenario: add a person and list
  Given an empty phonebook
  When I add Xander with number 123
  Then Xander's number is 123
  And the number of entries is 1

This starts with an empty phonebook, adds Xander with the number 123, and confirms that when the phonebook is listed it has just one entry and that entry is as expected. And you likely figured that out simply by reading the scenario! That is the key: this is a test that anyone fluent in English can read. The syntax is also very simple; whitespace at the starts of lines is ignored, case is ignored, scenarios are separated by blank lines, and each line starts with one of the four step keywords. In fact, the step keywords are essentially ignored; they make the scenario more readable, but you could rewrite it using all Then statements. Cucumber also allows stars:

Scenario: add a person and list
  * an empty phonebook
  * I add Xander with number 123
  * Xander's number is 123
  * the number of entries is 1

This often makes it easier to read through thousands of scenarios, but you should use Given/When/Then/And for smaller projects.

To check these, there must be a way to link natural (spoken) language statements to executable code. This is done through regular expressions. These allow the developer to describe patterns of strings and numbers so they can be captured by the code and turned into actions. See the short tutorial at https://www.regexone.com/ to become familiar with regular expressions. While regular expressions can become very complex, the RegexOne site covers the regular expressions you will see the most often: ^, $, ?, [a-zA-Z]+, and \\d+:

^: matches the start of the input; this ensures there is no extra text before the target phrase
$: matches the end of the input; this ensures there is no extra text after the target phrase
?: marks the previous character as optional; numbers? means that the s at the end is optional, allowing the text to be either number or numbers
[a-zA-Z]+: matches a word of any length made up of upper and lower case letters; the + means “one or more of the preceding”
\\d+: \\d matches any digit (0–9), and the + again means “one or more”, so this matches any non-negative integer; the backslash is doubled because the pattern is written inside a Java string

The parentheses allow the system to capture the matched word so you can use it in your code. This is also explained on the RegexOne site. For example, the regular expression

    ^I add ([a-zA-Z]+) with number (\\d+)$

matches text starting with “I add”, containing a word followed by “with number” and a numeric value. The numeric value must end the text. Thus the test step

    When I add Xander with number 123

matches this with “Xander” as the first item captured and “123” as the second.
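The capture behavior can be checked with plain Java, without Cucumber. The following is a minimal sketch using java.util.regex; the class name StepRegexDemo and the method capture are hypothetical and do not appear in the phonebook repository. It assumes the step keyword (When) has already been removed, since the pattern itself begins with “I add”.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the phonebook repository.
class StepRegexDemo {
    // Same pattern as in the text. In Java source the backslash is doubled,
    // so the regular expression engine sees \d+.
    static final Pattern ADD_STEP =
            Pattern.compile("^I add ([a-zA-Z]+) with number (\\d+)$");

    // Returns {name, number} if the step text matches, or null otherwise.
    static String[] capture(String stepText) {
        Matcher m = ADD_STEP.matcher(stepText);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] groups = capture("I add Xander with number 123");
        System.out.println(groups[0] + " / " + groups[1]);
        // Extra text after the number defeats the $ anchor, so nothing is captured.
        System.out.println(capture("I add Xander with number 123 please") == null);
    }
}
```

Note that both captured items arrive as strings; converting “123” to an integer is a separate step.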

As discussed above, a test specification consists of its description followed by a number of individual tests, and each test consists of a series of test steps starting with Given, When, Then, or And. Another input is a collection of step definition files that define the action taken for each test step. Cucumber reads the tests, finds the matching step definition (using regular expression processing), and executes the code associated with that step definition. Each step either succeeds or fails, and a test passes if all of its steps pass. Cucumber reports on any failed steps. A test suite passes if all of the tests pass. Since the tests reflect the AC, Cucumber effectively confirms that all AC are implemented.
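To make the matching process concrete, here is a simplified model of it in plain Java. This is a sketch of the idea only, not Cucumber's actual implementation: the class MiniStepRunner, its methods, and the in-memory map standing in for the phonebook are all hypothetical. Each step definition pairs a regular expression with code to run; a step passes, fails, or is reported as having no matching definition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified, hypothetical model of step matching; not Cucumber's actual code.
class MiniStepRunner {
    enum Result { PASSED, FAILED, UNDEFINED }

    // One step definition: a compiled pattern plus the code to run on a match.
    private static final class Definition {
        final Pattern pattern;
        final Consumer<Matcher> body;

        Definition(String regex, Consumer<Matcher> body) {
            this.pattern = Pattern.compile(regex);
            this.body = body;
        }
    }

    private final List<Definition> definitions = new ArrayList<>();

    void define(String regex, Consumer<Matcher> body) {
        definitions.add(new Definition(regex, body));
    }

    Result runStep(String line) {
        // The keyword (or *) only aids readability, so drop it before matching.
        String text = line.trim().replaceFirst("^(Given|When|Then|And|\\*)\\s+", "");
        for (Definition d : definitions) {
            Matcher m = d.pattern.matcher(text);
            if (m.matches()) {
                try {
                    d.body.accept(m);
                    return Result.PASSED;
                } catch (AssertionError | RuntimeException e) {
                    return Result.FAILED;
                }
            }
        }
        return Result.UNDEFINED; // no step definition matched
    }

    // Builds a runner whose steps act on an in-memory stand-in for the phonebook.
    static MiniStepRunner phonebookRunner() {
        Map<String, Integer> book = new HashMap<>();
        MiniStepRunner runner = new MiniStepRunner();
        runner.define("^an empty phonebook$", m -> book.clear());
        runner.define("^I add ([a-zA-Z]+) with number (\\d+)$",
                m -> book.put(m.group(1), Integer.parseInt(m.group(2))));
        runner.define("^the number of entries is (\\d+)$", m -> {
            if (book.size() != Integer.parseInt(m.group(1))) {
                throw new AssertionError("unexpected number of entries");
            }
        });
        return runner;
    }

    public static void main(String[] args) {
        MiniStepRunner runner = phonebookRunner();
        String[] scenario = {
            "Given an empty phonebook",
            "When I add Xander with number 123",
            "Then the number of entries is 1",
            "And an emptie phonebook"
        };
        for (String step : scenario) {
            System.out.println(runner.runStep(step) + ": " + step);
        }
    }
}
```

The misspelled last step matches no pattern, so the sketch reports it as UNDEFINED rather than FAILED; a step whose code throws is the FAILED case.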

See

    src / test / java / pbook_tests / AddStepDefinitions.java

in the repository for example step definitions. The file starts with library imports and a class name. The remainder of the file gives the step definitions. For example,

    @Given("^I reset the phonebook$")
    public void do_reset() {
      // Run the program with the command-line argument "reset"; the captured output is not needed.
      String[] ignored = PhoneCommandCapture.instance().outputLines(new Phones(), "reset", "");
    }

matches when the test step is “When I reset the phonebook”, and the code resets the phonebook by running the program with the command line argument reset. In this case the output is ignored. Another step definition in the file specifies how to check how many entries are in the phonebook, so the test simply confirms that the phonebook is empty after it has been reset.

You may note that there are specifications associated with @Given, @When, and @Then. Just as in the test specifications, each of these terms is treated as equivalent; the value of using one or the other is to make the specification more readable. As discussed above, the step definition for

    ^I add ([a-zA-Z]+) with number (\\d+)$

matches I add Xander with number 123, capturing Xander as the first item and 123 as the second. Cucumber calls the corresponding code, do_add, with name set to the person’s name and number set to the person’s (integer) phone number. do_add then executes the command

    java Phones add

and submits the name and number when prompted. Note the procedure check_phone_number later in the file; this executes

    java Phones list

and confirms the specified name and number are in the list. If either assertion fails, Cucumber records an error. It would be wrong if add inserted additional records, so there is a step definition check_entry_count that confirms the phonebook has the right length. This file contains sufficient step definitions to implement all of the scenarios.

The provided tests check only some of the add and list functionality. We’d also want to check remove and add more checks for add. You could put all of the steps into the same step definition file (step definition file names do not need to match scenario file names), but you would likely want to organize step definitions in a reasonable way.

A successful run is all green. That is, the output should be “green like a cuke.”

Remember the phrase “if it ain’t tested, it’s broke”? We should test our test environment as well! Clone the repository and change the add operation in some reasonable way. For example, you might delete the body of the unexecute method. Use Docker to re-run the tests, confirming that one or more show up in red. Undo this error, then modify the regular expression for one of the step definitions so that it is incorrect; say, change “empty” to “emptie” in AddStepDefinitions.java. Now one or more tests will show as yellow, indicating that the test could not be run to completion because of a missing step definition.

This gives you the pattern for supporting the lines in acceptance criteria. Assuming you have the framework built out (we will provide examples for other languages as well), start by editing a .feature file to introduce one or more AC. Run Cucumber, noting which steps have no matching regular expressions. Either fix the steps to match an existing one or create an empty step definition with the appropriate regular expression. Use assertTrue(false); for the body. Once all steps are marked as red, implement each body. Iterate until all tests are green and you and your client are convinced that all features are adequately tested.

Tables

Suppose a video game displays a graphic based on the player’s performance: a black hole if the score is below 10, a sun if the score is 10 or above, and a supernova in place of the sun if the score is above 10 with no lives lost. You could write AC like

    Given the score is 5
    When the user reaches the end of the game
    Then the system displays a black hole

But this is going to get tedious if you add just a bit more complexity, and it will be difficult to ensure all cases have been covered. Cucumber supports tables:

    Scenario Outline: game ending results
      Given the score is <displayed_score>
      And the number of lives lost is <lost_lives>
      When the game ends
      Then the system displays <graphic>

      Examples:
        | displayed_score | lost_lives | graphic    |
        |               3 |          9 | black hole |
        |               3 |          0 | black hole |
        |              11 |          5 | sun        |
        |              11 |          0 | supernova  |
        |              10 |          5 | sun        |
        |             500 |         89 | sun        |
        |              -1 |          2 | black hole |

Some things to note:

This uses Scenario Outline instead of Scenario.
The keyword Examples: precedes the table.
Columns are separated by vertical bars (|). The whitespace around the bars is just for readability; Cucumber ignores this spacing.
The first row gives the name of each column, and the corresponding values are substituted for those names (when they appear between angle brackets) in the steps.
The result is essentially expanded into the appropriate number of scenarios, and each scenario is run in turn. So the above scenario represents 7 complete scenarios with all of the values substituted.
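The expansion can be pictured as simple text substitution. The sketch below is a hypothetical illustration in plain Java, not how Cucumber is implemented: the class OutlineExpander and its methods are invented for this example, and it assumes every row starts and ends with a bar and has no empty cells.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of outline expansion; not Cucumber's actual code.
class OutlineExpander {
    // Splits "| a | b |" into ["a", "b"], discarding the padding around each cell.
    static List<String> cells(String row) {
        String trimmed = row.trim();
        // Remove the leading and trailing bars before splitting on the inner ones.
        String inner = trimmed.substring(1, trimmed.length() - 1);
        List<String> out = new ArrayList<>();
        for (String cell : inner.split("\\|")) {
            out.add(cell.trim());
        }
        return out;
    }

    // Produces one list of concrete steps per data row of the table.
    static List<List<String>> expand(List<String> steps, List<String> table) {
        List<String> names = cells(table.get(0)); // first row holds the column names
        List<List<String>> scenarios = new ArrayList<>();
        for (String row : table.subList(1, table.size())) {
            List<String> values = cells(row);
            List<String> concrete = new ArrayList<>();
            for (String step : steps) {
                String s = step;
                for (int i = 0; i < names.size(); i++) {
                    s = s.replace("<" + names.get(i) + ">", values.get(i));
                }
                concrete.add(s);
            }
            scenarios.add(concrete);
        }
        return scenarios;
    }

    public static void main(String[] args) {
        List<String> steps = List.of(
                "Given the score is <displayed_score>",
                "And the number of lives lost is <lost_lives>",
                "When the game ends",
                "Then the system displays <graphic>");
        List<String> table = List.of(
                "| displayed_score | lost_lives | graphic    |",
                "|               3 |          9 | black hole |",
                "|              11 |          0 | supernova  |");
        for (List<String> scenario : expand(steps, table)) {
            scenario.forEach(System.out::println);
            System.out.println();
        }
    }
}
```

Each data row yields a complete, independent scenario, which is why the seven rows in the table above count as seven tests.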

Best Practices

Stateless Scenarios

Just as each unit test must be separate from the others, each scenario must be complete. For example, if your system implements a shopping cart, do not write one scenario to fill the cart and another to compute the total bill. Instead, write the bill totaling scenario to create an empty cart, add items, then compute the total. Linking scenarios makes it complicated to debug the tests (“oh, you forgot to run Scenario X before Y!”) and almost impossible to understand them. If you have a large number of scenarios that depend on the same setup - say, a cart with 5 items in it - then write a step definition for that setup:

    @Given("cart has standard small order")

It is also possible to write before hooks that standardize the setup, but using step definitions to achieve this is ultimately more flexible.
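As a rough sketch of such a shared setup step, the plain-Java class below builds a fresh cart every time the step runs, so no scenario depends on state left by another. Everything here is hypothetical: the Cart class, the item prices, and the method names are assumptions for illustration, and the Cucumber annotations appear only in comments so the example runs without the library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shopping cart and step class, for illustration only.
class CartSteps {
    static final class Cart {
        private final List<Integer> pricesInCents = new ArrayList<>();

        void add(int priceInCents) { pricesInCents.add(priceInCents); }

        int itemCount() { return pricesInCents.size(); }

        int totalInCents() {
            int total = 0;
            for (int price : pricesInCents) {
                total += price;
            }
            return total;
        }
    }

    private Cart cart;

    // In a Cucumber project this method would carry the annotation
    // @Given("cart has standard small order").
    void cartHasStandardSmallOrder() {
        cart = new Cart();            // always start from a fresh cart
        for (int i = 0; i < 5; i++) {
            cart.add(250);            // assumed contents: five items at 2.50 each
        }
    }

    // A corresponding Then step would compare against the expected total.
    void totalIs(int expectedCents) {
        if (cart.totalInCents() != expectedCents) {
            throw new AssertionError(
                    "expected " + expectedCents + " but was " + cart.totalInCents());
        }
    }

    int itemCount() { return cart.itemCount(); }

    public static void main(String[] args) {
        CartSteps steps = new CartSteps();
        steps.cartHasStandardSmallOrder();
        steps.totalIs(1250);
        System.out.println("items: " + steps.itemCount());
    }
}
```

Because the setup step replaces the cart rather than adding to it, running it twice still leaves exactly five items.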

Name steps by initial states

When requirements change, the changes usually show up in the Then clauses. Name your steps by the Given and When clauses to minimize test maintenance.

Use personas and other recognizable items

Name different users and use those names consistently throughout the scenarios. Maybe Drew is the type of shopper who buys just one item on a visit, while Terry is the type who will buy five items but return one or two. You can then have standard setups based on the different types of shoppers to make scenarios more understandable. You might even add some adjectives: “StingyDrew” and “ExtravagantTerry”. The goal is to make scenarios as readable as possible.

Wrap-up

If your client is not going to review AC, then there is little point in using a framework like Cucumber. But then you have to ask why the client does not care about what the system does. Combining Cucumber with Docker and CI results in a project that can be tested all the time. Of course, there are other tools that do the same thing, possibly even better. But trying to run a project without BDD and CI is like trying to race in mud: you might make forward progress eventually, but it will be too little too late.