How to test a cognitive system (and why it’s so important)

Part 1 of the Cognitive System Testing series, originally posted in 2016 on IBM Developer.

Introduction

I joined IBM Watson in 2012 and immediately became interested in how cognitive solutions, such as those based on Watson, are tested. Watson solutions are probabilistic systems, generally based on machine learning algorithms involving hundreds or thousands of variables, and as such are a far cry from the deterministic systems I was used to. This chapter and the several that follow chronicle my experiences testing these systems and share some lessons learned along the way.

Manual testing

The first type of testing I encountered was “traditional”, manual testing. Most software developers are familiar with the pros and cons of manual testing, so let’s not dwell too long here. When all else fails, even Watson can be tested manually. As with manually testing any other software, this approach can find bugs, but at a very high cost. In our infancy, manual testing made up the bulk of our testing efforts. Today, we still do some manual testing, but sparingly, as we rely more on automated tests.

Automated testing

I much prefer using automated tests to test my software. Write once, run many times! But where do you begin with a software system that does not behave like other software systems?

Testing triangle

It turns out that the “testing triangle” (see Martin Fowler’s Test Pyramid post) provides a useful guidepost.

In summary, there are three layers in the triangle, and as you climb the triangle, the tests cover more function, cost more to write, and are slower to run.

Unit test – a unit test ideally tests one class, and since there are so many classes, you should have the most tests in the unit test layer

Functional test – a test that tests one component or a logical group of classes

UI test – a test that tests the entire application, just like a user uses the application. If you don’t have a graphical UI, this can just represent your system tests
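To make the difference between the bottom two layers concrete, here is a minimal sketch using Python's standard unittest module. The functions normalize and tokenize are hypothetical stand-ins invented for this illustration, not part of Watson: the unit test exercises one small function in isolation, while the functional test exercises a small component built on top of it.

```python
import unittest


def normalize(text):
    """Hypothetical unit: lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def tokenize(text):
    """Hypothetical component: normalize the text, then split it into tokens."""
    cleaned = normalize(text)
    return cleaned.split(" ") if cleaned else []


class NormalizeUnitTest(unittest.TestCase):
    """Unit layer: one small piece of code, tested on its own."""

    def test_collapses_whitespace_and_case(self):
        self.assertEqual(normalize("  What IS   Watson? "), "what is watson?")


class TokenizeFunctionalTest(unittest.TestCase):
    """Functional layer: a logical group of code working together."""

    def test_tokens_from_messy_input(self):
        self.assertEqual(tokenize("What IS  Watson?"), ["what", "is", "watson?"])

    def test_empty_input_yields_no_tokens(self):
        self.assertEqual(tokenize("   "), [])
```

Saved as a module, these tests can be run with python -m unittest. In a real project the unit layer would hold many more tests than the functional layer, matching the shape of the triangle.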

This triangle approach applies nicely to cognitive systems, and we have adapted a version of it within Watson. Most of the interesting lessons from my Watson experience came from the functional layer.

Before the triangle

We needed a transitional step before converting our test plan entirely to the testing triangle. There are enough components in a cognitive solution that some form of smoke testing is required before passing the system to other tests (or testers). In our question and answer (QA) systems, the smoke test was simply to send a question through the system and verify that an answer came out. Notice I said “an answer”, not “the answer”, as the purpose of this test is to verify that the system has basic functionality. Our smoke testing philosophies will be explored in later chapters.
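A smoke test of this kind can be sketched in a few lines. The example below is only an illustration under stated assumptions: ask is any callable that takes a question string and returns an answer string, and stub_pipeline is a hypothetical stand-in for a deployed QA system. Neither name comes from Watson.

```python
from typing import Callable


def smoke_test(ask: Callable[[str], str], question: str) -> bool:
    """Return True if the system produces any non-empty answer.

    The check is deliberately weak: it looks for "an answer", not
    "the answer", so it says nothing about correctness.
    """
    try:
        answer = ask(question)
    except Exception:
        # Any failure while answering means the basic flow is broken.
        return False
    return isinstance(answer, str) and bool(answer.strip())


def stub_pipeline(question: str) -> str:
    """Hypothetical stand-in for a real question-answering system."""
    return "Some answer about: " + question


if __name__ == "__main__":
    print(smoke_test(stub_pipeline, "What is a cognitive system?"))
```

In practice, ask would wrap a call to the running system. Keeping the check this loose is what makes it useful as a gate: when it fails, there is no point in running the slower, more precise tests.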

Components of a cognitive system

Cognitive systems typically have at least four components:

Ingestion – retrieve raw data, transform into a format suitable for processing by the rest of the system

Natural Language Processing (NLP) – extract concepts and meaning from plain, natural language text

Question pipeline – given a question/query from a user, traverse a knowledge graph and provide a useful response to the user

REST API – facilitate communication between components and interface elements
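One reason this decomposition matters for testing is that each component has its own inputs and outputs, so each can be exercised separately. The toy sketch below illustrates that idea only. Every function in it is a hypothetical, heavily simplified stand-in: the NLP step just picks out longer words, the question pipeline matches on word overlap instead of traversing a knowledge graph, and the REST API layer is omitted.

```python
def ingest(raw_documents):
    """Ingestion stand-in: trim whitespace and drop empty documents."""
    return [doc.strip() for doc in raw_documents if doc.strip()]


def extract_concepts(text):
    """NLP stand-in: treat lowercase words longer than 3 letters as concepts."""
    words = (word.strip(".,?!").lower() for word in text.split())
    return {word for word in words if len(word) > 3}


def answer(question, documents):
    """Question pipeline stand-in: return the document sharing the most
    concepts with the question, or None if there are no documents."""
    wanted = extract_concepts(question)
    return max(
        documents,
        key=lambda doc: len(wanted & extract_concepts(doc)),
        default=None,
    )


docs = ingest([
    "  Watson is a question answering system. ",
    "",
    "Paris is the capital of France.",
])

# Each stage can be checked on its own...
assert docs == [
    "Watson is a question answering system.",
    "Paris is the capital of France.",
]
assert extract_concepts("Paris is the capital of France.") == {
    "paris", "capital", "france"
}

# ...and the stages can also be checked together.
assert answer("What is the capital of France?", docs) == "Paris is the capital of France."
```

The first two assertions sit in the lower layers of the triangle, and the last one is closer to a system test because it depends on all three stand-ins working together.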

I’ll cover testing each of these components, as well as the full system and the smallest units, in future chapters. These chapters are shown with their relation to the testing triangle.

There’s a time and place for manual testing, but even in cognitive systems, automated testing is best.