{"id":445,"date":"2020-03-03T02:40:11","date_gmt":"2020-03-03T02:40:11","guid":{"rendered":"http:\/\/freedville.com\/blog\/?p=445"},"modified":"2020-03-03T02:40:44","modified_gmt":"2020-03-03T02:40:44","slug":"cognitive-system-testing-testing-at-the-beginning-with-ingestion-verification-test","status":"publish","type":"post","link":"https:\/\/freedville.com\/blog\/2020\/03\/03\/cognitive-system-testing-testing-at-the-beginning-with-ingestion-verification-test\/","title":{"rendered":"Cognitive system testing: Testing at the beginning with ingestion verification test"},"content":{"rendered":"\n<p>Part 3 of the\u00a0<a href=\"http:\/\/freedville.com\/blog\/2016\/12\/04\/cognitive-system-testing-from-a-to-z\/\"><strong>Cognitive System Testing<\/strong>\u00a0<\/a>series, originally posted in 2016 on <a href=\"https:\/\/developer.ibm.com\/\">IBM Developer<\/a>. <\/p>\n\n\n\n<p>\nA cognitive system is only as good as the data loaded into the\nsystem. Loading data into a cognitive system is often referred to as\nan \u201cingestion\u201d phase. Some systems do a single large ingestion,\nsome do continuous ingestion, and some do a \u201crip and replace\u201d\nseries of ingestions (where ingestion n is deployed, ingestion n+1 is\nrun later, and when ingestion n+1 is finished it replaces ingestion\nn). Since ingestion is the first part of a cognitive solution, it\u2019s\nimportant to get it right. 
Hence the need for a functional test of the ingestion layer, which I call an ingestion verification test.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Definition<\/h2>\n\n\n\n<p>\nAn ingestion verification test suite should cover every step involved in getting raw data from a source system and converting it into an output format usable by the target system. It should also include basic verification that the output supports the interactions the rest of the solution needs to perform on it.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Types of ingestion processes<\/h2>\n\n\n\n<p>\nCognitive systems use several types of ingestion processes. Here are a couple that we have used, along with a description of how we tested them.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Local document extract\/transform\/load (ETL)<\/h3>\n\n\n\n<p>\nOne solution read XML files from a local disk, used a series of parsers to extract key data, and stored this data in an output structure (ultimately in a database). The test suite included JUnit tests of the parsers (given an input file, did the output have field values X, Y, and Z?). Most of the tests used <a href=\"http:\/\/mockito.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">Mockito<\/a> to stub out the database, but a handful of tests verified output at the database level. (Even at the component level, we take advantage of the <a href=\"http:\/\/martinfowler.com\/bliki\/TestPyramid.html\" target=\"_blank\" rel=\"noreferrer noopener\">testing pyramid<\/a>.)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Remote service ETL<\/h3>\n\n\n\n<p>\nThis solution was like the local document solution above, except that documents were read from remote systems via web service calls. In addition to the parsing tests above (with and without a stubbed database), we added tests to verify the web service calls. 
Most of these tests used a stub of the remote web service calls, to verify that the right URLs were accessed and the right parsers called, but we also used a stub of the web service itself (which simply returned static documents) to verify that our usage of the <a href=\"https:\/\/hc.apache.org\/httpcomponents-client-ga\/\" target=\"_blank\" rel=\"noreferrer noopener\">HttpClient<\/a> libraries was correct.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Building a text index<\/h2>\n\n\n\n<p>\nSome of our solutions have a <a href=\"http:\/\/lucene.apache.org\/solr\/\" target=\"_blank\" rel=\"noreferrer noopener\">Solr<\/a> index at their heart. A Solr index lets you run a variety of search queries against a collection of documents (generally referred to as a corpus). Our test suite for this index ingested a small batch of well-known, curated documents and then executed a series of Solr queries to verify which documents were returned. The queries covered all the interesting searches used by our application, including case-insensitive search (does a search for \u201candrew\u201d return documents containing \u201cAndrew\u201d?), numeric attributes (a search for documents with more than 50 citations), and entity search (does a search for \u201canimals\u201d return documents containing \u201ccat\u201d and \u201cdog\u201d?).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Smoke testing the ingestion<\/h2>\n\n\n\n<p>\nIn keeping with the test pyramid, it is very useful to have a smoke test suite as part of your ingestion testing. The smoke tests I have used in ingestion are very simple. Ingest one document (or a few). Was a database produced, and does it contain any rows? Or, was a text index produced, and does it contain any documents? As with our other smoke test suites, the important thing is a quick, non-brittle test. 
Test for a non-zero number of outputs, not a specific number; other tests can verify the exact count.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Setting up a good ingestion verification test suite<\/h2>\n\n\n\n<p>\nAs an ingestion developer, you may enjoy talking about your ability to ingest millions of documents, or about the blazing speed of your ingestion process, but the ingestion process is no good if it does not support the needs of the application. The ingestion test suite should certainly prove that the ingestion worked functionally, but it must also exercise the ingested output the way the full application will at runtime. Talk with the rest of the solution team, find out what kinds of queries they want to run against your ingestion output, and verify that the output supports those queries.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">In case of errors<\/h2>\n\n\n\n<p>\nThe severity of an ingestion test suite failure depends on the type of ingestion process you have. With continuous ingestion, a failure may be a Severity 1 \u2013 System Down situation. With a rip-and-replace ingestion process, a failure simply means you stay on the previous successful ingestion a little longer. Whatever the severity, the troubleshooting techniques described in our smoke test chapter apply here: scan the logs and notify the right people.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>\nThe ingestion process runs at the beginning of any cognitive system, and errors in ingestion may prevent the rest of the solution from accessing data. Treat ingestion verification testing as a critical part of your testing plan. 
The ingestion verification test needs to focus not just on the functionality of the ingestion process but also on whether that process produces output the rest of the cognitive system can use, supporting all the major query types that system is known to execute.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Part 3 of the\u00a0Cognitive System Testing\u00a0series, originally posted in 2016 on IBM Developer. A cognitive system is only as good as the data loaded into the system. Loading data into a cognitive system is often&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts\/445"}],"collection":[{"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/comments?post=445"}],"version-history":[{"count":3,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts\/445\/revisions"}],"predecessor-version":[{"id":450,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts\/445\/revisions\/450"}],"wp:attachment":[{"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/media?parent=445"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/categories?post=445"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/tags?post=445"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}