Interview: Downsides of packages, upsides of jUnit (with Elisabeth Hendrickson and Chris McMahon) ("Packages", Part 4)
Brian Marick 0:02
Welcome to Oddly Influenced, a podcast about how people have applied ideas from outside software to software. Episode Five: bandwagons can squeeze out good ideas.
Brian Marick 0:23
This is the last of four episodes on Joan Fujimura's idea of packages that spread scientific theories and technology together. Nothing is purely good, and the proto-oncogene package Fujimura documents is no exception. Previous episodes concentrated on the good. This one will explain some problems that arose exactly because recombinant DNA was so successful. As discussed before, the TDD-plus-jUnit package was not successful the way the proto-oncogene package was. However, jUnit did reinforce the idea that programmers should be testing more, and they had fewer excuses not to, given how widespread programmer testing frameworks became across languages and development environments. It seems to me that jUnit's mini-bandwagon produced harms that echo those of recombinant DNA.
Brian Marick 1:13
... record scratch noise ...
Brian Marick 1:14
But it turns out I was wrong. And the bulk of this episode is Elisabeth Hendrickson and Chris McMahon telling me how. Before we get to that, though, let's talk about the negative results of the recombinant DNA bandwagon.
Brian Marick 1:28
One result was that many people shifted their research to incorporate recombinant DNA technologies. And it wasn't just people who were already working on DNA; it was people working on other things. As someone said, "It's fun to do DNA work after you've worked on a protein for a while and are frustrated with it. DNA does very consistent things. Proteins are much more changeable. There are more different kinds of proteins, and they move around."
Brian Marick 1:58
So the field got a lot of newbies, and it's probably inevitable that those people would perform shallower research. That's especially true because it seems to me the pressure to publish, so as not to perish, ramped up during this time. Recombinant DNA tech was a way to live with that heightened pressure, because it allowed the quicker completion of what's sometimes called "a least publishable unit." A researcher who didn't jump on the bandwagon was likely to produce results more slowly. And as an old-style and privately funded scientist said to Fujimura, "Therefore, a new scientist doesn't go to the fundamental problems that are very difficult, that take a long time. He's not going to study how to grow epithelial cells for 30 years; he'd be fired long before that. So he has to pick a surefire project that's going to yield results quickly."
Brian Marick 2:53
Even established researchers sometimes had to incorporate recombinant DNA work into their labs in order to keep the funding flowing. The exceptions were people like a Berkeley emeritus professor who told Fujimura, "Not many people are interested in this problem because it's a stinker to work with. There are a lot more profitable things to do. You have to be a brave soul to do it, or somebody my age, so it doesn't matter. I don't have to prove anything anymore. I don't have to be 'good enough' to be at the University of California Berkeley."
Brian Marick 3:28
The result was a bias toward exploiting recombinant DNA over working on other problems. As Varmus, one of the inventors of the proto-oncogene theory, put it, "In the areas of light, there is so much to do. And with so much light, it's difficult to be attracted to areas of darkness, areas where the most remarkable discoveries are waiting to be made."
Brian Marick 3:51
Bandwagons like the proto-oncogene package tend to steamroll people who are outsiders, or who want to maintain less fashionable ways of approaching the problem.
Brian Marick 4:04
So I was ready at this point to have Elisabeth and Chris tell stories about how exploratory testing and end-to-end testing got pushed aside by a focus on programmer testing with jUnit, meaning these other kinds of testing had their development stunted. As you'll hear, they disagree. What got pushed aside were styles of testing that were always a bad idea. Not doing the bad kind of testing freed up time to do more of the good kind. The problems I was thinking of were either temporary bubbles or problems for companies that had much larger problems. So let's listen to these industry veterans school me.
Brian Marick 4:43
First, introductions. I invited Elisabeth Hendrickson because I knew her as an expert in exploratory testing during the jUnit bandwagon. I recommend her book *Explore It!*. She moved on from testing around then and got real jobs while I stayed a consultant. Her newest venture, Curious Duck Digital Laboratory, helps leaders reason about trade-offs in software development. She cares deeply about people, delivering value, fast feedback cycles, and interesting technology, in that order.
Brian Marick 5:15
I invited Chris McMahon because he was and is one of the more thoughtful people doing whole-system testing, especially through web browsers. He used to do a fair amount of writing and influencing, though he was never the sort of independent consultant that Elizabeth and I were. He's abandoned thought leadering and asks you, should you meet him, to greet him with "Say, didn't you used to be somebody?"
Brian Marick 5:38
You can find both of their Twitter handles and other info in the show notes.
Brian Marick 5:42
After I listened to the recording of the interview, I realized I have a lot of work to do before I'm even an okay interviewer. I'll spare you most of my slow rambling questions and put the interview in a better order than the one I originally gave it.
Brian Marick 5:58
As the saying goes, this interview has been edited for length and clarity.
Brian Marick 6:02
So let's begin. For those of you not familiar with it, exploratory testing is a style that combines deciding what to test, how to test, and doing the testing – in a very rapid feedback loop. It's often, but not necessarily, manual. In a typical session, you might see an exploratory tester try something, notice something odd, then try variations to amplify the oddness into a legitimate bug.
Brian Marick 6:31
In keeping with the theme of the episode, I first gave Elisabeth a chance to say that the jUnit bandwagon had hurt exploratory testing. But she disappointed me.
Elisabeth Hendrickson 6:41
The existing practice... the fact was that the norm was these big QA departments. And part of the reason was that somebody had to actually run through the checklists. So much of the testing was manual, scripted manual, and it was scripted because that was the safety net. And that meant the big controversy at the time was: oh, exploratory testing... it's irresponsible to spend that valuable time doing exploratory testing, because you don't know what you're going to come up with, and there's all of this regression testing that needs to happen. So testers should be focused on regression testing.
Elisabeth Hendrickson 7:23
And in hindsight, looking at that 20 years later, of course, the same almost-rage goes off in my head: what are you talking about, this is insane that you would not want... please forgive my language there. It's not insane. It's... it's... suboptimal that you would not want the most information that you could possibly get about your software. But if we flash forward to the enablement for developers to actually do unit testing, and code is now well composed, very cleanly organized, you can test the units, and you've used... pick your favorite flavor of xUnit. If you've got that, at least, there is so much less low-hanging fruit to be found with a rigorous, manually executed regression test script. That gave rise to the possibility of exploratory testing.
Elisabeth Hendrickson 8:14
And I actually think Ward Cunningham put it best, because he said the thing about XP is the code is always ready to explore. And so, you know, I credit xUnit and that whole discipline of writing software very well with organizations' ability to actually get value out of exploratory testing, where before they couldn't.
Brian Marick 8:38
Well. So I gave Chris his chance to explain how jUnit ruined everything.
Chris McMahon 8:44
In my experience, TDD is not the value; it's the knock-on effects. What TDD makes possible is continuous integration and continuous deployment. And so in my experience... I specialize in doing system-wide end-to-end testing. And for that, I need a test environment that has a user interface, that has APIs, has a data store, blah, blah, blah. This is my niche that I work in: these really big, large-scale end-to-end system tests. And so what unit tests make possible is continuous integration. And if I can get continuous integration, then I can deploy to a test environment anytime I want to. And then I can explore in there, and I have a large regression suite of end-to-end automated tests.
Chris McMahon 9:41
But what this changes... what continuous integration and continuous deployment change for QA and testing is: if I find a bug, I can talk to a developer and say, "Yo, I found a bug." And the developer can say, "Oh, wow, good catch. Let me check. Let me change the code." And they'll fix the bug, and they'll run it through continuous integration, and that takes seconds. And we'll do a deploy to the test environment, and that takes minutes. And I can check that the fix is good, and that takes maybe a minute, and then I can run the regression suite, and that takes a few minutes. Elisabeth talks about feedback loops a lot; this was her theme for a long, long time. But when your feedback loop is "find a bug, fix the bug, check that the fix is good, deploy to a system-wide test environment, run a regression suite against the system," and that cycle is a matter of minutes... that changed the world.
Brian Marick 10:45
But surely, there must be some downsides to programmer testing.
Elisabeth Hendrickson 10:50
If you're looking for the bad side, overconfidence, I think, is one of the bad sides. I think it's easier to fix overconfidence than it is to fix the inherent mess of not having developers testing at all. But I've encountered many, many, many companies and projects where there was nobody in a formal test, QA, exploratory, whatever role – plus there weren't really those skills on the team. And the developers were basically saying, "Well, we TDD all the things, and we've got this tiny, appropriately sized handful of automated acceptance tests, and we've got a product manager; everything should be fine. We don't need any of that there QA."
Elisabeth Hendrickson 11:40
And sometimes they're right. One of the best projects that I was ever on had zero QA people, but we had a team of mixed skills. I was on that team. I was the worst programmer on the team. But I was the only person on the team that had that background in testing and a sense of the kinds of things that could go wrong. And so during development, just as part of how we worked, we were exploring to find risks and fixing them.
Elisabeth Hendrickson 12:06
But I've also walked into other teams where they didn't have anybody doing that. And they basically took the attitude of: well, we're developers and we're TDDing, so it's all fine. And, spoiler, it's not fine. It then ships to customers and has severely value-limiting behavior problems. Not just something that might bug somebody, like a character walking through a wall in an early internet game, but the kinds of bugs that prevent customers from giving you money, for example.
Brian Marick 12:42
At this point, it's reasonable for me to admit that I was one of those overconfident programmers.
Brian Marick 12:49
(recorded earlier) ... I was a consultant at that time, and for some reason I needed to keep track of my hours. So I TDDed up a program I could use to keep track of my hours. Then there was some sort of testing workshop of the sort that Elisabeth and I used to go to a lot back in those days. And we set aside an extra half day in which Elisabeth would test my program.
Brian Marick 13:17
She sat down at the command-line interface, which was what I used, and it was no more than five minutes... she did this dramatic push back from the desk and said, "Look!" And somehow or other, she had gotten it to start tracking the same time for two completely different tasks at once, which is probably a feature if you're a lawyer doing billable hours.
Brian Marick 13:49
But I so distinctly remember my reaction, as a typical programmer, was "you... you must have cheated." And I immediately got all defensive.
Brian Marick 14:03
(return to narration) So perhaps a downside of a tool like jUnit is that it becomes the hammer that makes everyone see nothing but nails. But that seems to have been temporary, at least in non-screwed-up organizations.
Brian Marick 14:18
I broached another possible problem: that the TDD approach – present an example that needs to be made to work – might have inspired people to think the same thing should be done at the whole-system level. That is, tests running against the user interface should make up the top-level goals for a user story.
Chris McMahon 14:38
No, no, no, I would not. I think that way lies madness. I actually work for an org right now that has automated end-to-end tests for every single user story. And it's madness. It's crazy. It is unbelievably... it's nonsense.
Brian Marick 14:56
Gosh, Chris seems to have some strong opinions. But why is it nonsense? And what does he do instead?
Chris McMahon 15:03
It is unbelievably expensive, in terms of both creating the tests and maintaining the tests. It's just crazy pants. So, you know, we can talk about BDD and acceptance-test-driven development and stuff like that, but I think you can craft your acceptance criteria to encompass entire swaths of user stories, right? You don't have to have a one-to-one correspondence of a user story to an end-to-end test. I think that way is crazy. You will just crash and burn in very short order if you try it.
Elisabeth Hendrickson 15:48
That's my experience as well. It just becomes so unwieldy so quickly. There is a tremendous amount of value in a small subset.
Elisabeth Hendrickson 15:56
One big win that I once got from a system-level test automation kind of thing was to take the demo script from a piece of software that was being developed, and I just automated that. So it's not everything that a user could do. It doesn't even cover all the capabilities of the software. It was just: if something's going to go wrong [in the demo] that would make things really, really, really bad, what is the thing that would go wrong? What would be the total nightmare? It's that the demo doesn't work anymore. And so I automated that. And there was big payback for a relatively small investment.
Elisabeth Hendrickson 16:30
But trying to do that level of automation for absolutely everything...? It's not going to yield the kind of benefits that people imagine. [It] takes too long to run; it's flaky.
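[Editor's sketch: to make Elisabeth's demo-script tactic concrete, here is roughly what automating a single demo path might look like with Selenium WebDriver in Java. Everything specific – the URL, the element IDs, the expected heading – is invented for illustration; this is not her actual script.]

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

// A smoke test for the demo path only: it walks the exact steps the demo
// walks and nothing more. All names and URLs are hypothetical.
public class DemoSmokeTest {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            // Step 1 of the demo: log in.
            driver.get("https://demo.example.com/login");
            driver.findElement(By.id("username")).sendKeys("demo-user");
            driver.findElement(By.id("password")).sendKeys("demo-pass");
            driver.findElement(By.id("log-in")).click();

            // Step 2 of the demo: open the report screen and check it loaded.
            driver.findElement(By.linkText("New report")).click();
            String heading = driver.findElement(By.tagName("h1")).getText();
            if (!heading.equals("Create a report")) {
                throw new AssertionError("Demo path broken: unexpected heading: " + heading);
            }
            System.out.println("Demo path OK");
        } finally {
            driver.quit();
        }
    }
}
```

[The point isn't coverage; it's that the one scenario whose failure would be the total nightmare gets checked on every deploy, for a small investment.]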
Brian Marick 16:43
That reminded me of Paul Czyzewski. People were always trying to get him to write test plans, test plans that were in some sense systematic. And his response to that was, "Why would I want to test where the bug isn't?"
Chris McMahon 17:01
I just quoted Wayne Gretzky the other day: "I try to skate to where the puck is going to be and not where it's been."
Elisabeth Hendrickson 17:09
I love "Why would I want to test where the bug isn't." And one of the big criticisms of xUnit-style tests early on, coming from the testing community, was essentially the Crocodile Dundee criticism, "That's not a test. *This* is a test." And it was very much "those tests are not going to actually find bugs. So they're not valuable." And I think that missed the value of TDD style, very, very fast test suites that do tell you to what extent does the code still meet the expectations that we have for the code? It doesn't tell you anything at all about the user experience. But it does tell you if you broke some fundamental assumption in the codebase.
Brian Marick 17:53
That's a lot about the tool. But what about the theory? Chris and Elisabeth said something I thought was interesting.
Chris McMahon 18:00
Elisabeth had a talk long ago that I believe still exists, but I haven't tried to find it in a while. And the title was something along the lines of "A Place to Put Our Stuff."
Elisabeth Hendrickson 18:10
It doesn't exist. It doesn't exist!
Chris McMahon 18:12
I'm so sorry. But it was brilliant. Because the thing is that, in the late 90s and early 2000s, there had been this talk of unit testing. But nobody really knew what a unit was, or what a test for a unit would look like. And what jUnit did was bring people together with a consensus about "this is a unit, and this is what a test for a unit looks like." And I think Elisabeth was actually one of the first people to really get hip to this, to really understand that jUnit was going to be the consensus about what a unit is and what a unit test is, and so much followed from that.
Elisabeth Hendrickson 18:12
Yeah, totally. You know, that talk was a five-minute lightning talk at one of the peer conferences. It was "A Place to Put Things," and it was all about how jUnit gives you: here's your setup, here is your single little method that's going to test your one method. So it gave you a very clear place to put each of the things that you would need. And by contrast, all of the test automation tools that I had used up to that point, up to the point where I became aware of the xUnit family of test tools... all of them were just these gigantic, bloated, convoluted messes.
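[Editor's sketch: for readers who never saw it, here is what that "place to put things" looked like in the JUnit 3 style of the era. The TimeTracker class under test is invented for illustration.]

```java
import junit.framework.TestCase;

// JUnit 3 style: the framework dictates where everything goes.
// setUp() is the place for the fixture; each testXxx() method is the
// place for one small check.
public class TimeTrackerTest extends TestCase {

    // Hypothetical class under test, inlined so the sketch is self-contained.
    static class TimeTracker {
        private String current;
        void start(String task) { current = task; }
        String currentTask() { return current; }
    }

    private TimeTracker tracker;

    protected void setUp() {
        tracker = new TimeTracker(); // a fresh fixture before every test method
    }

    public void testStartingATaskMakesItCurrent() {
        tracker.start("write report");
        assertEquals("write report", tracker.currentTask());
    }

    public void testStartingASecondTaskReplacesTheFirst() {
        tracker.start("write report");
        tracker.start("answer email");
        assertEquals("answer email", tracker.currentTask());
    }
}
```

[Compared to the test drivers Elisabeth describes, there is almost nothing here to get wrong: the structure itself tells you where the fixture, the action, and the check each belong.]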
Brian Marick 19:43
In reaction, I'm tempted to reach for boundary objects again, but jUnit is maybe a bit too hard-edged to be one of those. Instead, I'm going to use this to foreshadow Galison's "trading zones," which are conceptual places where different cultures meet, typically using some sort of pidgin or creole language to smooth cooperation.
Brian Marick 20:04
As an example, let's go back to the early 1950s, when there was a debate among particle physicists over whether tau mesons and the kappa particle were in fact the same particle. Theorists and experimentalists had to cooperate to figure this out. But they very much spoke different languages.
Brian Marick 20:22
To the experimentalists, the theorists' talk of spin, parity, pseudoscalars, vectors, and pseudovectors was, to quote Elisabeth, "a bloated, convoluted mess." They came from a tradition that was pictorial rather than symbolic. They looked at pictures of particle tracks – at that time, in exposed photographic plates – and measured things like angles. They focused on so-called "golden events," new shapes of tracks indicating new particles, with only very limited use of statistics. A guy named Dalitz came up with an ingenious way to plot individual events, a way that incorporated spin and parity without requiring experimentalists to absorb the details: they only needed to plot points and look to see whether they clustered or not.
Brian Marick 21:13
The Dalitz plot is an example of a pidgin language that supports cooperation. It is only as complicated as it needs to be. It's my hunch that the words associated with jUnit, and the procedures followed by programmers, were a pidgin. A lot of the bloated and convoluted nature of Elisabeth's test drivers came from the inherent difficulty of testing through the user interface, be it native or a browser. jUnit worked at a simpler level, so it allowed a simpler language. And then that language fed back into the already-existing world of testing and provided a simplifying structure that affected both tools and how test automators used the tools.
Brian Marick 21:56
Anyway, that's a theory.
Brian Marick 22:00
Back to the topic at hand. Last episode, I suggested that TDD versus non-TDD gets at fundamental issues of identity. Chris suggests that perhaps that doesn't matter.
Chris McMahon 22:13
The first time I saw unit testing... the first time I saw test-driven development... it was brilliant. It was absolutely brilliant. The code was so usable, so malleable. Of course, I still found bugs. There were still oversights, still, you know, things we didn't think of; there were bugs to be found, for sure.
Chris McMahon 22:34
But since that time, I have worked with test-driven development. But I've also worked in places where unit tests were written sort of ad hoc in the process of development. And I've also worked at places where unit tests were written after the fact, as documentation and a sort of sanity-check regression test. And my experience is that all of this code is very high quality.
Chris McMahon 23:04
Test-driven development is not the essential marker of quality code. In my experience, the fact that you have units that are testable at any point in the development process is the marker of quality code.
Brian Marick 23:22
However, maybe I can take some comfort that the kind of testing Chris does is heavily exploratory, which I associate with being reactive, and being always ready to take concrete feedback from events in the world.
Chris McMahon 23:38
I think it's really impossible to design a good end-to-end system test without doing exploratory testing. In order to design an informative, useful, automated end-to-end test, you have to understand the software stack, from the user interface down through every API layer to the data store, whatever that might be: file system, SQL, whatever. And then all the APIs that bring that back and display it in the user interface. So what I do... it's almost error guessing at this point; it's almost an intuitive sense of how to create these kinds of tests.
Chris McMahon 24:25
And what I want to do is make something that will navigate that path, all the way down, and all the way back, and then come back and tell me that everything is doing what I expect it to do. And in order to do that, in order to even create one of these, you have to explore all that other stuff. You have to dig into everything, the entire software architecture, from the user interface all the way down to the data store and back and explore all that and figure out what it's all doing. And then you can write a single test that, if it fails, will give you important information about a flaw in the system.
Chris McMahon 25:06
But it doesn't necessarily have to be fast. I don't really care so much about speed the way TDD people do. For them that's super-important: your unit test suite has to run in a second or less, because you need feedback in that kind of timeframe. The sort of tests that I specialize in run much more slowly than that. But I still want... a test suite that runs in under 20 minutes is ideal. Under an hour is OK, you know, if I've got a really complex situation. I always run them overnight, but I also want to run them on demand, and I want to be able to run a subset whenever I need it. So the design aspect of these end-to-end tests becomes super important, even once you have malleable, testable software.
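[Editor's sketch: one way to get Chris's "run a subset whenever I need it" is test tagging. This uses JUnit 5's @Tag annotation, which postdates the period under discussion; the test names and tags are invented.]

```java
import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;

// End-to-end tests sorted into subsets by tag, so a fast "smoke" slice
// can run on every deploy while the exhaustive slice runs overnight.
public class CheckoutEndToEndTest {

    @Test
    @Tag("smoke") // the handful of tests worth running on every deploy
    void demoPathStillWorks() {
        // drive the UI through the critical path, as sketched earlier
    }

    @Test
    @Tag("nightly") // slow and exhaustive; scheduled overnight, runnable on demand
    void everyPaymentMethodRoundTrips() {
        // walk each payment method from the UI to the data store and back
    }
}
```

[A build tool can then select by tag – for example, Gradle's useJUnitPlatform { includeTags 'smoke' } or Maven Surefire's mvn test -Dgroups=smoke – so the same suite serves both the minutes-long on-demand run and the overnight run.]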
Brian Marick 25:59
Toward the end of the interview, I asked my guests if they had advice for programmers.
Elisabeth Hendrickson 26:04
Oh, can I take this one first? Oh, please: open your mind to the possibility of things that you think are impossible, that maybe you think are unimportant. Learn how to do exploratory testing from the outside in. It's not just user testing; it's taking that huge step back from the system that you're looking at.
Elisabeth Hendrickson 26:32
Brian, you told the story about our interaction and how you became defensive. And I remember sitting next to you and feeling really bad at the time, because I didn't mean to burst your bubble. But by stepping back, I was seeing your system in a way that I think you were not seeing it. And I see this over and over and over again with developers who are just so inside the system that they're not looking at it from that external perspective.
Elisabeth Hendrickson 27:02
And so the advice that I would have is really to begin to adopt that beginner's mind, and look at the system that you're building from the outside as though you have never seen it, and then use all of those exploratory and analytical skills to discover the implications of what you've built.
Brian Marick 27:23
Because I'm weird, one of my standard pieces of advice is due to Oliver Cromwell: "I beseech you, in the bowels of Christ, think it possible you may be mistaken." Elisabeth here, I think, is expanding that to beseech you to consider that there are so many things about your software that you haven't even thought about. You haven't yet *gotten* to the point of being mistaken.
Brian Marick 27:51
Chris answered my closing question by looking at when a separate QA function is needed.
Chris McMahon 27:57
Yeah, I think, for my closing thought... [my] year at ThoughtWorks was really, really influential. It really influenced my thinking and my career. ThoughtWorks specialized in projects [with] the combination of high complexity and high risk. And when you have a project that is both high-complexity and high-risk, that's when you really need a kind of QA function, a system test function, because you simply cannot apprehend the project at a TDD level.
Brian Marick 28:33
That raises the question: when might a project do completely without independent QA? Chris has two theories. They're not super relevant to all this highfalutin' Fujimura packages stuff, but they're still interesting and maybe useful to you.
Chris McMahon 28:50
The first example is if you just don't care. And there's a great example of this from the early days of continuous deployment. There was this company that made this silly little internet role-playing game: you could get an avatar, and you could interact with other avatars online in your browser. And they were *vilified*, because this company was a pioneer of continuous deployment. They were deploying all the time, and the software was, like, incredibly buggy. There were just glitches everywhere. I downloaded it a few times, and the glitches were amusing, too: you could walk through walls, and you could do all sorts of goofy stuff. But it didn't matter. I mean, the glitches were kind of part of the charm of the application. And it was simply a vehicle for advertisements. You know, there was no competitive nature to the game. And so there was a loud and vocal reactionary component in the QA world at the time, who were very much against continuous delivery because they couldn't test it in their traditional ways. They would point to this thing and say, "Oh, look at all the bugs. They didn't test it." And the rest of us who were doing serious work were like: of course not. Why would they?
Elisabeth Hendrickson 30:11
I remember the whole controversy. And I remember that the people who were complaining about the glitches clearly had no idea of the business value that had been delivered with the software, and the extent to which the software met all of the business needs. Any additional investment in the kinds of activities they were promoting would not have earned the company one more dollar.
Chris McMahon 30:38
So the other way that you can release software... now this is really interesting. The other way that you can release software with no testing is if you are very, very, very good at reverting. And this was the situation that I walked into when I was hired to create the QA test practice at Wikipedia. So I walked in on my first day of the job at Wikipedia, and I'm like, "Well, so I understand that you need some software tested. Where is it?" And they said, "Well, you tell us." I said, "Okay, well, where's your test environment?" And they said, "Well, we don't have one." And I boggled. It's like, "How can you run Wikipedia... you know, what, a top-five website? How can you run Wikipedia for 11 years with no testing and no test environment?"
Chris McMahon 31:31
And it turns out that they had a culture of reverting code in prod. They had really, really top-notch system administrators and operations staff, and top-notch monitoring and observability. They own all their own servers, they own all their own hardware, and they've got their fingers deep into second-by-second operations. And so for 11 years, what they did was: they would deploy code to prod, and if anything got weird, they would just simply revert. And this absolutely boggled my mind.
Chris McMahon 32:12
And so one of the first things that I did – and I did this with Zeljko Filipin, a brilliant, brilliant tester in Croatia – was build a test environment for Wikipedia. And as you can imagine, it took a while. Wikipedia is a very, very complicated piece of software, and having a model of Wikipedia standing in as a proxy for production, about which you can reason, took us a while to create.
Chris McMahon 32:46
I still remember, though, the first bug we found with it, when I knew I was on the right track. I saw the bug, I showed it to other people, everybody saw the bug. And everybody said, "Oh, it's just, you know, the test environment. It's just a test environment." So we released it, and it corrupted article headers on hundreds and hundreds and hundreds of Wikipedia articles that we had to fix later on. It took weeks to fix them all. So that's when I knew I was on the right track with this test environment: when we actually found a bug, identified it, mislabeled it, released it anyway, and then had to fix it. That's when I knew I was on the right track.
Chris McMahon 33:22
But yeah, you can absolutely release software with no testing if you're prepared to revert, and you know when and why.
Brian Marick 33:30
Here, I pointed out that whereas Wikipedia could revert their code, they apparently, in this case, couldn't revert their data.
Chris McMahon 33:39
Yeah, that one bug was kind of an exceptional bug. Again, Wikipedia is very well-designed software. This is not a trivial application, and they go through intense code review. But yeah, it was an unusual bug, and it happened to make it through all of our filters. Basically, because the test environment was still novel and still faulty at the time, we didn't believe it was a bug until it actually got to production and corrupted a bunch of data.
Brian Marick 34:16
That's all. Thank you for listening. Next week, if all goes well, we'll have James Shore talking about how he's used the idea of boundary objects in his work.
Transcribed by https://otter.ai