If it ain’t tested, it’s broke. This is a quote from an avionics testing engineer. Testing cannot prove software works, but all you know about untested code is that it will definitely fail somewhere. However, developers often rely on undocumented, hand-executed tests. This makes testing unrepeatable. If the tests are documented, they are still tedious and rarely re-run. This sort of manual testing does little to promote stability. Stability is needed: code that worked yesterday often fails tomorrow. Errors creep in. It is important to repeat tests. Together, these facts mean that tests must be thorough and automated. That is, comprehensive, automated tests must be part of the delivered solution.
The most common solution is to write module-level tests with tools like JUnit or NUnit and to augment them with higher-level tests for a few key features. While this sort of testing can be effective, it is not readable by non-technical clients. It is also too detailed for technical clients who have other responsibilities. To provide confidence, clients need thorough, executable tests that they can read. Interestingly, the basis for the tests should already be available to you: every product backlog item (PBI) should have at least one detailed acceptance criterion (AC). It would be great if the AC could be used to write the tests.
This is the idea behind Behavior-Driven Development (BDD): we use tools to translate AC into executable tests. This gives a direct path from AC to testing, putting the product owner in charge of what is being tested. At a minimum this provides executable acceptance tests: the client accepts the software once it passes all tests. It also documents required behavior in an executable, readable format. People who are interested in details about system behavior can simply read the tests. BDD makes system requirements explicit and testable in a way that cannot be achieved by standard requirements documents and standard testing.
Of course, writing thorough AC is challenging. Often clients and product owners do not have the skills. They might choose to hire a team of specialists to write them, but that is not always feasible. This means the development team may need to write the AC instead. But the win of (automated) BDD is that the tests can at least be reviewed with clients using language they understand. (And if a client cannot be bothered, one wonders whether they have any commitment to the project!) This means that the AC document expected behavior as approved by clients; the AC become an executable requirements document. If an error is found, the solution is simple: write AC revealing the error and create a PBI to get those AC to pass. BDD closes the loop between project concept and tested implementation. If you ask a competent software developer whether they use some form of BDD, you are likely to get the response “who wouldn’t?”
BDD is illustrated by the simple project at
https://gitlab.com/hasker/cucumber-phonebook. As described in README.md,
this system is a simple, command-line implementation of a phonebook. The
names and numbers are stored in a text file, and the program uses
command-line arguments to list, add, and remove entries. The project uses
Cucumber to implement BDD. Cucumber includes a suite of tools supporting
many development languages and over 70 spoken languages. Thus you can write
your code in C or Go and your AC in Česky (Czech) or Tiếng Việt
(Vietnamese). It is possible to integrate Cucumber with any development
framework, so it works for web-based projects as well as GUI applications.
We will use this project to cover the basics of BDD with Cucumber.
As an aside, review Build | Pipelines and note this project dates to 2022. One of the strengths of using Docker is that tests that ran years ago can still run today with minimal updates. Executing the tests through CI (continuous integration) lowers barriers further.
To see how Cucumber works, navigate to add_phone.feature in
src / test / resources / pbook_command_tests. The very top of the file is an
introductory section describing what is being tested in the file. The first
line starts with the keyword Feature; this line is written to the output to
document which tests are being executed. The free text between the Feature
line and the first scenario is treated as commentary (that is, ignored by
Cucumber). It provides further context for human readers. The bulk of the
file is a collection of testing scenarios. Each scenario begins with the
keyword Scenario followed by a description of an individual test. The
description consists of lines of Given, When, Then, and And statements.
These statements capture the AC that would be in a PBI. Note they use actual
names and numbers to make the AC concrete, that is, to make them fully
testable. The Given portion describes the setup, the When portion describes
the action taken by the user, and the Then portion describes the result.
Review the first scenario:
Scenario: add a person and list
Given an empty phonebook
When I add Xander with number 123
Then Xander's number is 123
And the number of entries is 1
This starts with an empty phonebook, adds Xander with the number 123, and
confirms that when the phonebook is listed it has just one entry and that
entry is as expected. And you likely figured that out simply by reading the
scenario! That is the key: this is a test that anyone fluent in English
can read. The syntax is also very simple; whitespace at the starts of lines
is ignored, case is ignored, scenarios are separated by blank lines, and
each line starts with one of the four step keywords. In fact, the step
keywords are essentially ignored; they make the scenario more readable, but
you could rewrite it using all Then statements. Cucumber also allows stars:
Scenario: add a person and list
* an empty phonebook
* I add Xander with number 123
* Xander's number is 123
* the number of entries is 1
This often makes it easier to read through thousands of scenarios, but you
should use Given/When/Then/And for smaller projects.
To check these, there must be a way to link natural (spoken) language
statements to executable code. This is done through regular expressions.
These allow the developer to describe patterns of strings and numbers so
they can be captured by the code and turned into actions. See the short
tutorial at https://www.regexone.com/ to become familiar with regular
expressions. While regular expressions can become very complex, the
RegexOne site covers the regular expressions you will see the most often:
^, ?, $, [a-zA-Z]+, and \\d+:
^: matches the start of the input; this ensures there is no extra text at the start of the target phrase
$: matches the end of the input; this ensures there is no extra text after the target phrase
?: marks the previous character as optional; numbers? means that the s at the end is optional, allowing the text to be either number or numbers
[a-zA-Z]+: matches a word of any length made up of upper and lower case letters; the + in this means “one or more of the preceding”
\\d+: \\d matches any digit (0–9), and the + again means “one or more”, so this matches any non-negative integer; the backslash is doubled because the pattern is written inside a Java string, and the regular expression itself is \d+
Parentheses around part of a pattern allow the system to capture the matched
text so you can use it in your code. This is also explained on the RegexOne
site.
For example, the regular expression ^I add ([a-zA-Z]+) with number (\\d+)$
matches text starting with “I add”, containing a word followed by “with number” and a numeric value. The numeric value must end the text. Thus the test step
When I add Xander with number 123
matches this with “Xander” as the first item captured and “123” as the second. The leading keyword (When) is not part of the text that is matched.
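The capture behavior described above can be tried directly with the JDK's java.util.regex classes. The following is a minimal sketch, not code from the repository: the class name StepPatternDemo and the method capture are hypothetical, and the step text is passed without its keyword on the assumption that the keyword is dropped before matching.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StepPatternDemo {
    // Same pattern as in the text; "\\d" in Java source is the regex \d.
    static final Pattern ADD_STEP =
            Pattern.compile("^I add ([a-zA-Z]+) with number (\\d+)$");

    /** Returns {name, number} if the step text matches, or null otherwise. */
    static String[] capture(String stepText) {
        Matcher m = ADD_STEP.matcher(stepText);
        if (!m.find()) {
            return null;
        }
        // group(1) and group(2) are the two parenthesized parts of the pattern.
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] groups = capture("I add Xander with number 123");
        System.out.println(groups[0] + " / " + groups[1]); // prints: Xander / 123

        // The $ anchor rejects extra text after the number.
        System.out.println(capture("I add Xander with number 123 twice") == null); // prints: true
    }
}
```

Note that both captured items are strings at this point; turning "123" into an integer is a separate conversion step.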
As discussed above, a test specification consists of its description
followed by a number of individual tests, and each test consists of a
series of test steps starting with Given, When, Then, or And. Another input
is a collection of step definition files that define the action taken for
each test step. Cucumber reads the tests, finds the matching step
definition (using regular expression processing), and executes the code
associated with that step definition. Each step either succeeds or fails,
and a test passes if all of its steps pass. Cucumber reports on any failed
steps. A test suite passes if all of the tests pass. Since the tests
reflect the AC, Cucumber effectively confirms that all AC are implemented.
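To make the read, match, execute cycle concrete, here is a toy model of it in plain Java. It is a sketch under stated assumptions, not Cucumber's implementation: the names ToyStepRunner, Status, define, runStep, and runScenario are hypothetical, an in-memory map stands in for the phonebook program, and the runner simply takes the first matching pattern (real Cucumber, as far as I know, reports an error when more than one definition matches).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ToyStepRunner {
    enum Status { PASSED, FAILED, UNDEFINED }

    // Step definitions: a pattern plus the code to run with its captured groups.
    private final Map<Pattern, Consumer<Matcher>> definitions = new LinkedHashMap<>();

    void define(String regex, Consumer<Matcher> body) {
        definitions.put(Pattern.compile(regex), body);
    }

    /** Runs one step: find the first matching definition and execute its body. */
    Status runStep(String stepText) {
        for (Map.Entry<Pattern, Consumer<Matcher>> def : definitions.entrySet()) {
            Matcher m = def.getKey().matcher(stepText);
            if (m.find()) {
                try {
                    def.getValue().accept(m);
                    return Status.PASSED;
                } catch (AssertionError | RuntimeException e) {
                    return Status.FAILED;
                }
            }
        }
        return Status.UNDEFINED; // no step definition matched this text
    }

    /** A scenario passes only if every one of its steps passes. */
    boolean runScenario(List<String> steps) {
        for (String step : steps) {
            if (runStep(step) != Status.PASSED) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Integer> phonebook = new LinkedHashMap<>(); // stand-in for the real program
        ToyStepRunner runner = new ToyStepRunner();
        runner.define("^an empty phonebook$", m -> phonebook.clear());
        runner.define("^I add ([a-zA-Z]+) with number (\\d+)$",
                m -> phonebook.put(m.group(1), Integer.parseInt(m.group(2))));
        runner.define("^the number of entries is (\\d+)$", m -> {
            if (phonebook.size() != Integer.parseInt(m.group(1))) {
                throw new AssertionError("unexpected entry count");
            }
        });
        System.out.println(runner.runScenario(List.of(
                "an empty phonebook",
                "I add Xander with number 123",
                "the number of entries is 1")));
    }
}
```

The three statuses separate a step whose code ran and succeeded, a step whose code ran and raised an error, and a step for which no definition could be found at all.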
See src / src / test / java / pbook_tests / AddStepDefinitions.java in the repository for example step definitions. The file starts with library imports and a class name. The remainder of the file gives the step definitions. For example,
@Given("^I reset the phonebook$")
public void do_reset() {
    // Run the program with the "reset" argument; the output is not needed here.
    String[] ignored = PhoneCommandCapture.instance().outputLines(new Phones(), "reset", "");
}
matches the test step “When I reset the phonebook” (the leading keyword
does not take part in the match), and the code resets the phonebook by
running the program with the command line argument reset. In this case the
output is ignored. Another step definition in the file specifies how to
check how many entries are in the phonebook, so the test simply confirms
that the phonebook is empty after it has been reset.
You may note that there are specifications associated with @Given, @When,
and @Then. Just as in the test specifications, each of these terms is
treated as equivalent; the value of using one or the other is to make the
specification more readable. As discussed above, the step definition for
^I add ([a-zA-Z]+) with number (\\d+)$ matches I add Xander with number 123,
capturing Xander as the first item and 123 as the second. Cucumber calls
the corresponding code, do_add, with name set to the person’s name and
number set to the person’s (integer) phone number. do_add then executes the
command java Phones add and submits the name and number when prompted. Note
the method check_phone_number later in the file; this executes
java Phones list and confirms the specified name and number are in the
list. If either assertion fails, Cucumber records an error. It would be
wrong if add inserted additional records, so there is a step definition
check_entry_count that confirms the phonebook has the right length.
This file contains sufficient step definitions to implement all of the
scenarios.
The provided tests check only some of the add and list functionality. We’d also want to check remove and add more checks for add. You could put all of the steps into the same step definition file - step definition file names do not need to match scenario file names - but you would likely want to organize step definitions in a reasonable way.
A successful run shows every step in green.
That is, the output should be “green like a cuke.”
Remember the phrase “if it ain’t tested, it’s broke”? We should test our
test environment as well! Clone the repository and change the add operation
in some reasonable way. For example, you might delete the body of the
unexecute method. Use Docker to re-run the tests, confirming that one or
more show up in red. Undo this error, then modify the regular expression
for one of the step definitions so that it is incorrect - say change
“empty” to “emptie” in AddStepDefinitions.java. Now one or more tests will
show as yellow, indicating that the test could not be run to completion
because of a missing step definition.
This gives you the pattern for supporting the lines in acceptance criteria.
Assuming you have the framework built out (we will provide examples for
other languages as well), start by editing a .feature file to introduce one
or more AC. Run Cucumber, noting which steps have no matching regular
expressions. Either fix the steps to match an existing one or create an
empty step definition with the appropriate regular expression. Use
assertTrue(false); for the body. Once all steps are marked as red,
implement each body. Iterate until all tests are green and you and your
client are convinced that all features are adequately tested.
Suppose a video game displays a graphic based on the player’s performance: a black hole if the score is below 10, a sun if the score is 10 or above, and a supernova if the score is 10 or above with no lives lost. You could write AC like
Given the score is 5
When the user reaches the end of the game
Then the system displays a black hole
But this is going to get tedious if you add just a bit more complexity, and it will be difficult to ensure all cases have been covered. Cucumber supports tables:
Scenario Outline: game ending results
Given the score is <displayed_score>
And the number of lives lost is <lost_lives>
When the game ends
Then the system displays <graphic>
Examples:
| displayed_score | lost_lives | graphic |
| 3 | 9 | black hole |
| 3 | 0 | black hole |
| 11 | 5 | sun |
| 11 | 0 | supernova |
| 10 | 5 | sun |
| 500 | 89 | sun |
| -1 | 2 | black hole |
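The rule that the table exercises can be written as a few lines of Java, which also shows why boundary rows such as a score of 10 are worth including. This is only a sketch: the class EndingGraphic and method graphicFor are hypothetical names, and it takes the Examples table as authoritative, so a score of exactly 10 with lives lost yields a sun. The table has no row for a score of exactly 10 with no lives lost; treating that case as a supernova is an assumption.

```java
final class EndingGraphic {
    /** Picks the end-of-game graphic from the score and the number of lives lost. */
    static String graphicFor(int score, int livesLost) {
        if (score < 10) {
            return "black hole";
        }
        // Score is 10 or more here (assumed boundary, taken from the table).
        return livesLost == 0 ? "supernova" : "sun";
    }

    public static void main(String[] args) {
        // Rows copied from the Examples table: displayed_score, lost_lives, graphic.
        Object[][] rows = {
            {3, 9, "black hole"}, {3, 0, "black hole"}, {11, 5, "sun"},
            {11, 0, "supernova"}, {10, 5, "sun"}, {500, 89, "sun"},
            {-1, 2, "black hole"},
        };
        for (Object[] row : rows) {
            String actual = graphicFor((Integer) row[0], (Integer) row[1]);
            String verdict = actual.equals(row[2]) ? "ok" : "MISMATCH";
            System.out.println(row[0] + ", " + row[1] + " -> " + actual + " (" + verdict + ")");
        }
    }
}
```

Each table row becomes one run of the scenario, so adding a row for a newly discovered edge case costs one line rather than a whole new scenario.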
Some things to note about this scenario outline:
The scenario starts with Scenario Outline instead of Scenario.
The keyword Examples: precedes the table.
Table columns are separated by vertical bars (|). The whitespace around the bars is just for readability; Cucumber ignores this spacing.
Just as each unit test must be separate from the others, each scenario must be complete. For example, if your system implements a shopping cart, do not write one scenario to fill the cart and another to compute the total bill. Instead, write the bill totaling scenario to create an empty cart, add items, then compute the total. Linking scenarios makes it complicated to debug the tests (“oh, you forgot to run Scenario X before Y!”) and almost impossible to understand them. If you have a large number of scenarios that depend on the same setup - say, a cart with 5 items in it - then write a step definition for that setup:
@Given("cart has standard small order")
It is also possible to write before hooks that standardize the setup, but using step definitions to achieve this is ultimately more flexible.
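One way to keep scenarios independent while sharing setup is to have the setup step build a fresh object every time it runs. The sketch below shows only that helper in plain Java, with no Cucumber dependency; Cart, CartFixtures, standardSmallOrder, and the prices are all hypothetical, and a step definition such as the one above would simply call the helper.

```java
import java.util.ArrayList;
import java.util.List;

final class CartFixtures {
    /** Minimal cart used only for this illustration. */
    static final class Cart {
        private final List<Integer> pricesInCents = new ArrayList<>();

        void add(int priceInCents) { pricesInCents.add(priceInCents); }

        int itemCount() { return pricesInCents.size(); }

        int totalInCents() {
            return pricesInCents.stream().mapToInt(Integer::intValue).sum();
        }
    }

    /** Builds a brand-new five-item cart; nothing is shared between callers. */
    static Cart standardSmallOrder() {
        Cart cart = new Cart();
        for (int price : new int[] {199, 250, 1000, 75, 476}) {
            cart.add(price);
        }
        return cart;
    }

    public static void main(String[] args) {
        // Two "scenarios", each starting from its own standard cart.
        Cart first = standardSmallOrder();
        first.add(500);
        Cart second = standardSmallOrder();

        System.out.println(first.totalInCents());  // prints: 2500
        System.out.println(second.totalInCents()); // prints: 2000
    }
}
```

Because the second cart is unaffected by the item added to the first, the scenarios can run in any order, or alone, and still produce the same result.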
When requirements change, these are often reflected in changes to Then
clauses. Name your steps by the Given and When clauses to minimize test
maintenance.
Name different users and use those names consistently throughout the scenarios. Maybe Drew is the type of shopper who buys just one item on a visit, while Terry is the type who will buy five items but return one or two. You can then have standard setups based on the different types of shoppers to make scenarios more understandable. You might even add some adjectives: “StingyDrew” and “ExtravagantTerry”. The goal is to make scenarios as readable as possible.
If your client is not going to review AC, then there is little point in using a framework like Cucumber. But then you have to ask why the client does not care about what the system does. Combining Cucumber with Docker and CI results in a project that can be tested all the time. Of course, there are other tools that do the same thing, possibly even better. But trying to run a project without BDD and CI is like trying to race in mud: you might make forward progress eventually, but it will be too little, too late.