{"id":499,"date":"2022-03-01T17:36:59","date_gmt":"2022-03-01T22:36:59","guid":{"rendered":"https:\/\/freedville.com\/blog\/?p=499"},"modified":"2022-03-01T17:37:01","modified_gmt":"2022-03-01T22:37:01","slug":"book-review-designing-data-intensive-applications","status":"publish","type":"post","link":"https:\/\/freedville.com\/blog\/2022\/03\/01\/book-review-designing-data-intensive-applications\/","title":{"rendered":"Book Review: Designing Data-Intensive Applications"},"content":{"rendered":"\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" loading=\"lazy\" width=\"700\" height=\"524\" src=\"https:\/\/freedville.com\/blog\/wp-content\/uploads\/2022\/03\/book-cover-700x524.png\" alt=\"Book cover: Designing Data-Intensive Applications by Martin Kleppmann\" class=\"wp-image-500\" srcset=\"https:\/\/freedville.com\/blog\/wp-content\/uploads\/2022\/03\/book-cover-700x524.png 700w, https:\/\/freedville.com\/blog\/wp-content\/uploads\/2022\/03\/book-cover-300x224.png 300w, https:\/\/freedville.com\/blog\/wp-content\/uploads\/2022\/03\/book-cover-768x574.png 768w, https:\/\/freedville.com\/blog\/wp-content\/uploads\/2022\/03\/book-cover.png 936w\" sizes=\"(max-width: 700px) 100vw, 700px\" \/><figcaption>I\u2019m so glad I read <a href=\"https:\/\/www.oreilly.com\/library\/view\/designing-data-intensive-applications\/9781491903063\/\">Designing Data-Intensive Applications<\/a> by Martin Kleppmann!<\/figcaption><\/figure>\n\n\n\n<p><a href=\"https:\/\/www.oreilly.com\/library\/view\/designing-data-intensive-applications\/9781491903063\/\">Designing Data-Intensive Applications<\/a> by Martin Kleppmann is a tour de force, a wonderful book for anyone interested in learning about the kinds of applications where data is the key constraint.&nbsp; Kleppmann\u2019s central thesis is that applications are rarely CPU-bound, especially given that CPUs are not getting much faster, and parallelism is on 
the rise.&nbsp; You\u2019ve probably heard that Internet companies and hyperscalers achieve their amazing scale via horizontal scaling rather than vertical scaling (adding more servers instead of making each server bigger).&nbsp; This book lays out in detail how that is possible.<\/p>\n\n\n\n<p>As a university student, I did not fully appreciate why relational databases worked so well until the instructor taught a lesson on B-trees.&nbsp; I knew a primary law of computing: \u201c\u2019never\u2019 go to the disk\u201d, but could not fathom how that was possible.&nbsp; On seeing a B-tree I first thought \u201cwhat an amazingly odd data structure\u201d.&nbsp; Then I had a flash of insight into how this data structure spared you from scanning the disk to find a database record (assuming you had enough RAM to keep the B-tree in memory).&nbsp; That one lesson made four years of computer theory make sense.&nbsp; <a href=\"https:\/\/www.oreilly.com\/library\/view\/designing-data-intensive-applications\/9781491903063\/\">Designing Data-Intensive Applications<\/a> had the same impact on me.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" loading=\"lazy\" width=\"700\" height=\"467\" src=\"https:\/\/freedville.com\/blog\/wp-content\/uploads\/2022\/03\/niko-photos-tGTVxeOr_Rs-unsplash-700x467.jpeg\" alt=\"Wide tree\" class=\"wp-image-501\" srcset=\"https:\/\/freedville.com\/blog\/wp-content\/uploads\/2022\/03\/niko-photos-tGTVxeOr_Rs-unsplash-700x467.jpeg 700w, https:\/\/freedville.com\/blog\/wp-content\/uploads\/2022\/03\/niko-photos-tGTVxeOr_Rs-unsplash-300x200.jpeg 300w, https:\/\/freedville.com\/blog\/wp-content\/uploads\/2022\/03\/niko-photos-tGTVxeOr_Rs-unsplash-768x512.jpeg 768w, https:\/\/freedville.com\/blog\/wp-content\/uploads\/2022\/03\/niko-photos-tGTVxeOr_Rs-unsplash-1536x1024.jpeg 1536w, https:\/\/freedville.com\/blog\/wp-content\/uploads\/2022\/03\/niko-photos-tGTVxeOr_Rs-unsplash-2048x1365.jpeg 2048w, 
https:\/\/freedville.com\/blog\/wp-content\/uploads\/2022\/03\/niko-photos-tGTVxeOr_Rs-unsplash-175x117.jpeg 175w\" sizes=\"(max-width: 700px) 100vw, 700px\" \/><figcaption>B-trees are even wider than this! Photo by <a href=\"https:\/\/unsplash.com\/@niko_photos?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\">niko photos<\/a> on <a href=\"https:\/\/unsplash.com\/s\/photos\/tree?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\">Unsplash<\/a><\/figcaption><\/figure>\n\n\n\n<p>The book starts with a definition of scalability: a system\u2019s ability to deal with increased load.&nbsp; Kleppmann urges the reader never to say \u201cX doesn\u2019t scale\u201d but instead to ask \u201cif a system grows in a certain way, how can we deal with the growth?\u201d.&nbsp; (This is a much more thoughtful question than you often see on Internet forums!)&nbsp; That key point is worth repeating: scaling is not a binary question.<\/p>\n\n\n\n<p>Kleppmann then builds up from the fundamentals.&nbsp; The book dives into the data structures commonly used even in single-machine applications.&nbsp; I even got to see B-trees again!&nbsp; After introducing each structure, he describes how it can work when more machines are added.&nbsp; \u201cHorizontal scaling\u201d seems easy to talk about, but Kleppmann describes how it actually works.<\/p>\n\n\n\n<p>A key concern in data-intensive applications is how to keep the data consistent.&nbsp; Scaling is generally achieved by adding more copies of things, such as application servers, worker nodes, or database replicas, but it\u2019s no small feat to keep those in sync. 
&nbsp;When you know that anything can fail and you don\u2019t want single points of failure, you\u2019re inclined to add redundant copies.&nbsp; But if two independent writes update the same record, which one should win?&nbsp; That question is challenging on a single server and even harder when multiple servers are involved.&nbsp; Kleppmann walks through the major scenarios, the common solutions, and the tradeoffs involved.<\/p>\n\n\n\n<p>A delightful bonus is that each chapter is adorned with a Tolkien-like map.&nbsp; Seeing \u201cThe Land of the B-Trees\u201d and the \u201cLog Structured Storage\u201d in relation to the \u201cKingdom of Analytics\u201d was a fun way to conceptualize the different kinds of databases one may encounter.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" loading=\"lazy\" width=\"700\" height=\"522\" src=\"https:\/\/freedville.com\/blog\/wp-content\/uploads\/2022\/03\/book-maps-700x522.png\" alt=\"Map from chapter 3 of the book.\" class=\"wp-image-502\" srcset=\"https:\/\/freedville.com\/blog\/wp-content\/uploads\/2022\/03\/book-maps-700x522.png 700w, https:\/\/freedville.com\/blog\/wp-content\/uploads\/2022\/03\/book-maps-300x224.png 300w, https:\/\/freedville.com\/blog\/wp-content\/uploads\/2022\/03\/book-maps-768x573.png 768w, https:\/\/freedville.com\/blog\/wp-content\/uploads\/2022\/03\/book-maps.png 936w\" sizes=\"(max-width: 700px) 100vw, 700px\" \/><figcaption>Map for Chapter 3: Storage and Retrieval<\/figcaption><\/figure>\n\n\n\n<p>This book is an excellent primer for any study of system design and has been a wonderful complement to the microservices books I\u2019ve been reading (<a href=\"https:\/\/freedville.com\/blog\/2022\/01\/18\/book-review-microservices-in-action\/\">Microservices In Action<\/a>, <a href=\"https:\/\/freedville.com\/blog\/2021\/12\/21\/book-review-microservices-patterns\/\">Microservices Patterns<\/a>, and Microservices: A Practical Guide).&nbsp; I\u2019m glad 
I got to revisit many concepts from those books in a different context.&nbsp; Designing Data-Intensive Applications reflects a lifetime\u2019s worth of knowledge, and it\u2019s easy to see how many of its concepts have been with us since the dawn of computing.&nbsp; This book is well worth the read!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Designing Data-Intensive Applications by Martin Kleppmann is a tour de force, a wonderful book for anyone interested in learning about the kinds of applications where data is the key constraint.&nbsp; Kleppmann\u2019s central thesis is that&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts\/499"}],"collection":[{"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/comments?post=499"}],"version-history":[{"count":2,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts\/499\/revisions"}],"predecessor-version":[{"id":504,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/posts\/499\/revisions\/504"}],"wp:attachment":[{"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/media?parent=499"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/categories?post=499"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/freedville.com\/blog\/wp-json\/wp\/v2\/tags?post=499"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}