Cognitive System Testing: Concluding remarks

Part 7 of the Cognitive System Testing series, originally posted in 2016 on IBM Developer.

Introduction

This blog series has focused on a set of techniques needed for testing a cognitive system. As previously discussed, the cognitive system itself may be probabilistic and non-deterministic, but it can still be tested. How are you supposed to know which tests to apply and which tests to spend your precious energy on? There are some guiding principles you can use in your testing journey.

Don’t lose sight of quality

It can be fun to inject lots of different testing techniques, frameworks, and ideas into your test plan. Don’t get carried away with testing for testing’s sake. The whole reason you are writing tests for various parts of the system is to ensure quality of the system as a whole. If you find yourself working on tests that are not contributing to the end-result quality of the system, abandon these tests!

Write the tests you need

Time is a precious resource, and you only have so much of it to write tests for your system. Tests are subject to diminishing returns – at some point adding a new test does not catch any new bugs. Furthermore, consider how many tests each part of the system already has, and avoid piling more tests onto parts that are working well. If you are about to write your 200th XML parsing test and your NLP code is full of bugs, set down the XML test and fix your NLP! Similarly, don’t spend too much effort writing additional tests for rarely used components of your system if other, more heavily used components are lightly tested.

Find where your system performs the worst and iterate, iterate, iterate. Your system will have many moving parts and the interactions between all of them may be unclear when you start testing. No matter which level of testing you are doing (system, functional, unit), find the part where the most problems are and fix that first (while adding new tests, of course). Just keep repeating this process. This ensures you are focusing your testing effort where you need it most.
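One way to make "find the worst part first" concrete is to rank components by how often their tests fail. The sketch below is only an illustration: the component names, the test counts, and the helper rank_by_failure_rate are hypothetical, and failure rate is just one possible signal (bug counts or accuracy per component would work equally well).

```python
from dataclasses import dataclass


@dataclass
class ComponentResult:
    """Test outcome summary for one part of the system (hypothetical data)."""
    name: str
    tests_run: int
    tests_failed: int

    @property
    def failure_rate(self) -> float:
        # A component with no tests reports 0.0 here; in practice you may
        # want to flag untested components separately.
        return self.tests_failed / self.tests_run if self.tests_run else 0.0


def rank_by_failure_rate(results):
    """Return components ordered from worst to best failure rate."""
    return sorted(results, key=lambda r: r.failure_rate, reverse=True)


# Made-up numbers, chosen only to illustrate the ranking.
results = [
    ComponentResult("xml_parser", tests_run=200, tests_failed=1),
    ComponentResult("nlp_annotator", tests_run=40, tests_failed=12),
    ComponentResult("api_layer", tests_run=80, tests_failed=4),
]

ranked = rank_by_failure_rate(results)
worst = ranked[0]
print(f"Fix next: {worst.name} ({worst.failure_rate:.0%} of tests failing)")
```

Re-running a ranking like this after each round of fixes gives you the next target, which is the iterate, iterate, iterate loop in miniature.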

Know when to quit testing

A cognitive system is never going to achieve 100% accuracy, and accuracy improvements are subject to diminishing returns. Consider how much any accuracy improvement will help your system and how much it will cost. When building natural language processing code, a final F1 score of 80% is very good. Higher scores are possible but at very high cost. Other subsystems (such as parsers and API layers) may require 100% or near-100% accuracy – use your judgement as to what level of accuracy you need and how much you are willing to pay for it.
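For reference, F1 is the harmonic mean of precision and recall. The sketch below computes it from raw counts and adds a simple stopping check based on the gain between the last two iterations. The function should_keep_improving and its min_gain threshold are assumptions made for illustration; the right threshold depends on what each improvement costs you.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """F1 = 2 * precision * recall / (precision + recall)."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def should_keep_improving(f1_history, min_gain=0.01):
    """Hypothetical stopping rule: continue only while the latest
    iteration improved F1 by at least min_gain."""
    if len(f1_history) < 2:
        return True  # not enough data to judge yet
    return f1_history[-1] - f1_history[-2] >= min_gain


# Made-up counts: 80 correct annotations, 20 spurious, 20 missed.
score = f1_score(true_pos=80, false_pos=20, false_neg=20)
print(f"F1 = {score:.2f}")

# Made-up history showing the gains flattening out.
history = [0.70, 0.78, 0.80, 0.803]
print("Keep improving?", should_keep_improving(history))
```

A rule this simple will not capture cost directly, but it makes the "know when to quit" decision explicit instead of leaving it to momentum.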

Conclusion

Follow the testing pyramid when thinking about how to test a cognitive system. Remember why you are writing tests – for a high-quality system – and write only the tests you need. Focus on the worst-performing parts of your system and iteratively improve them. Stop testing and improving the system when you have reached a “good enough” accuracy and further improvement is prohibitively expensive.
