Oddly Influenced | Transcript: E36: BONUS: One circle-style history of Context-Driven Testing

E36: BONUS: One circle-style history of Context-Driven Testing

August 3, 2023 / 47:52 / E36

Welcome to Oddly Influenced, a podcast about how people have applied ideas from *outside* software *to* software.

BONUS: The Context-Driven Testing collaborative circle

According to context-driven-testing.com, the context-driven school of software testing was developed by Cem Kaner, James Bach, Bret Pettichord, and me. I’d say it’s the only collaborative circle I’ve been a core member of. As such, I thought it might be another example to give you more of a feel for what a collaborative circle is like from the inside. So: this is another episode in a series based on Michael P. Farrell’s 2001 book /Collaborative Circles/.

My topics are: what the context-driven circle was reacting against; the nature of the reaction and the resulting shared vision; how geographically-distributed circles work (including the first-wave feminist Ultras and the Freud/Fliess collaboration); two meeting formats you may want to copy; why I value shared techniques over shared vision; how circles develop a shared *tone* and stereotyped reactions, not just a shared vision; and, the nature of “going public” with the vision.

Farrell says that after a circle is successful in “going public”, it frequently falls apart. As context-driven-testing.com says, all four of us have gone our different ways, and – as Farrell says often happens – the “disintegration” (Farrell’s phrase) left bad feelings.

Farrell describes disintegrated circles that – as the former rebels become something like elder statesmen – get back together for reunion and reminiscence. I’d hoped that the four of us might once again collaborate and, to quote my email, “provide a sketch of a history, noting how it maps onto Farrell’s schema.” However, I seem to be on good terms only with Bret Pettichord, who doesn’t want to get into all that again, so you’re stuck with my solo reminiscences.

I end this story before the disintegration of the circle. I think a description of our circle’s end would have some value – perhaps add onto Farrell’s account of *why* collaborative circles that go public tend to end badly – I claim the end is foreshadowed by the beginning – but I haven’t worked out how to say it yet. My attitude, two decades on, is – I believe – one of bemusement. “Lord, what fools these mortals be” about sums it up, with “mortals” most definitely including me. That attitude is – let’s just say – not universally shared. I am not constituted such that I can shrug off claims of bad faith, and I don’t want to start those up again.

So: I offer you this analytical memoir. As with any memoir, you should be extra-special wary of subjectivity and bias.

≤ music ≥

First, the – ahem! – context in which the context-driven circle appeared during the mid-1990s. In the beginning it was most closely associated with the “black-box” testing of retail or semi-retail software. The central example of retail software was the sort delivered on a floppy disk or compact disk, typically packaged inside a box secured by a plastic film shrink-wrapped around it: hence, shrinkwrap software. Think of Excel which, at the time, anyone with money could easily buy at an office supply store.

Semi-retail software had narrower markets. Think of tax preparation software purchased by accountants. Or scheduling and billing software marketed at hair salons.

Nowadays, the word “retail” goes naturally with software. But, at the time, that market was still fairly new. What most set it apart from previous generations of software was the sheer number of users and the lack of any central organization they belonged to. Military software was used by soldiers, but they were part of a hierarchy – and someone in the hierarchy could be assigned to speak for them. The same was true of, say, banking software. It might be developed in-house, or it might be built under contract, but there was presumed to be a single voice to say what to build.

That was not the case for software bought in an office supply store or shipped to beauty salons. Requirements – what customers are willing to pay for – became more unknowable. The sources of information were indirect: you could read reviews in magazines, but how much do reviewers know about what users really want? You could notice shifts in sales in response to competitor moves, but it’s not easy to understand the reasons. And so on.

The status quo on how software should be developed was slow to adapt to the new market, which made various people grumpy in the way that the Impressionists and first-wave feminists were grumpy.

When it came to testing, “black box testing” meant that the software was tested from the outside, typically with little or no supportive instrumentation inside the code. To a first approximation, think of sitting down to a copy of Excel, clicking with the mouse, typing, and checking what happened. There were tools to automate that but they were expensive and flaky. In addition, they were very fragile in the face of changes to the user interface and, sometimes, to internal changes that had no effect on human use. Consequently, a lot of testing was manual.

I think it’s fair to say that the software testing status quo believed more strongly in last episode’s document-centric software engineering than did the construction side – that is, the programmers and their management. If you didn’t listen to that episode, it’s enough to know that development was to be done by producing, roughly in order, successively more detailed documents. A typical set of documents would be the requirements, the specification, the architectural design, various detailed designs, and then the code – which was the most detailed.

To see how testing fits in, imagine a rectangle labeled “requirements”. Now imagine another labeled “specification”. Place that one below and to the right of the “requirements” rectangle. Now do the same for each of “architectural design”, “detailed design”, and “code”. If you were making a presentation, you could decorate the resulting staggered row of rectangles with an image of water flowing from the “requirements” rectangle onto the “specification” rectangle and then onto the “architectural design” and so on. That decoration was not uncommon, because this was often called “the waterfall model”. That’s not a realistic name because water going down a waterfall doesn’t reverse course to revisit a previous step, while it was not unusual for decisions made in earlier documents to have to be revised because of later discoveries.

If, instead of water, you draw a straight line through all the rectangles, you have the left-hand leg of the letter “v”. You can now make the image symmetrical by drawing the other leg of the “v”. Place a new rectangle across from the “detailed design”, at the same height, and label it “unit tests”. Across from “architectural design” put “integration tests”. Across from “specification” put “system tests” (those are the black box tests), and across from “requirements” put “acceptance tests”.
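The pairing just described can be written down as a small lookup table. This is only a sketch of the picture, in Python; the names `V_MODEL` and `unlocked_test_design` are made up for illustration and don't come from any standard or tool.

```python
# Left leg of the "V", top to bottom, each paired with the test level
# drawn across from it on the right leg.
V_MODEL = {
    "requirements": "acceptance tests",
    "specification": "system tests",  # the black-box tests
    "architectural design": "integration tests",
    "detailed design": "unit tests",
    "code": None,  # the point of the "V": nothing sits across from it
}


def unlocked_test_design(approved_document: str):
    """Return the test level whose design can start once the document is approved."""
    return V_MODEL[approved_document]


print(unlocked_test_design("specification"))
```

The table shows the symmetry of the drawing; it says nothing about whether the left-hand documents stay up to date.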

Now, isn’t that a pretty mental image? It was called the “V model for testing”. The idea is that whenever one document on the left-hand leg is approved, testers can start work on a corresponding test design document. The core content of that document is the specific tests to be run. These tests are ideally complete enough that any normal user of the software could follow them precisely. Essentially, a human being – call them a tester – was turned into a computer running a program. For example, I once talked to a person who tested police radios in a radiation-sealed room. He had a big stack of paper, one test script per page. He’d take the top page, turn the knobs the way it told him to, and make the checks it told him to check. Then he’d put the page to the side and do the same thing with the next page. All day. Five days a week.

Now, testers couldn’t actually run their tests until at least a good chunk of the code was finished. Until then, they could keep busy writing test plans and test scripts (or doing the manual script-following for some other project).

In my experience with shrink-wrap and early web companies, the acceptance tests and integration tests were fairly hand-wavey. My impression is that the acceptance tests were a subset of the black-box system tests with maybe more emphasis on fairly complicated user scenarios. And, as is still the case, nobody agreed on what integration tests even are. So what the testers did were the black-box system tests. The programmers may or may not have done unit tests – more often “not”, I’d say – but those were essentially invisible to the testers.

While the V model is all pretty and symmetrical, it was frustrating for everyone.

The testers were to design their tests from the specification. Except: the specification often started out not so great and degraded from there. As the programmers did their design and coding, they’d discover errors and omissions in the specification and seek clarification. When they got it, would they hasten to update the official specification the way that they should? They would not. It didn’t provide them any benefit – they knew the answer without it – and the project was always already behind schedule.

It was the *testers* who needed the updated specification, so they were put in the position of whining that people weren’t following the process, which is always a sure way to gain popularity.

Moreover, testers were low status. I’m reminded of a story my wife, a large animal veterinarian, told me. She was treating a client’s llama. This client was a human physician. As he watched her work, he remarked something like, “You’re really good. You could be a real doctor.”

That went over well.

She retaliates, to this day, by referring to what such doctors do as “single-species medicine”.

You see, everybody’s prone to lording their profession over everyone else’s. Sometimes, you’ll be shocked to hear, even programmers do that. Testers couldn’t program, so what good were they? Seeing testers do rote work – like that police radio tester did – didn’t help. Testers were also more likely to have degrees in “worthless” fields like the liberal arts: history and the like. (Yes, that prejudice, which today I most associate with “tech oligarchs”, has a long, *long* pedigree.)

Moreover, these low-status testers were constantly finding fault with other people’s work, another sure-fire way to win friends. And because the systems weren’t generally built in a way that facilitated early testing, the bulk of testing happened close to the deadline. That meant a late flood of bugs during the dreaded “stabilization phase”, the period during which the product was “feature complete” and all that was supposedly left was bug-finding and fixing. It was easy for testers to become the villains who caused the schedule to slip. Not logical – after all, it wasn’t the testers who put the bugs in the product – but easy.

As a result of all this, testers tended to come in two categories: the beaten down and the defiant. The people who formed the context-driven circle tended toward the defiant side.

Let me give one last example of the plight of testers. Let’s suppose there’s a project in the stabilization phase. There are a few automated tests but a larger number of manual scripted tests. The programmers are busily fixing bugs. In the process, they’re sometimes breaking things that used to work.

To find those new bugs, tests are repeated. Sensible, right? (If expensive.) Suppose the product finally ships. It will ship with known bugs – that was simply accepted as inevitable in those days. Suppose, though, that a really serious bug is discovered after it ships. That was a big problem in the days before users could just download an update.

There might be a postmortem to find out what went wrong. A test manager might be asked why the bug wasn’t caught. Here are two possible answers:

1. The bug wasn’t caught because there was never a test written that might have caught it.
2. There *was* a test that could have caught it, but the last time it was run was before the bug was introduced.

From a purely economic point of view, I’m hard-pressed to think there’s much of a difference between the two. But from a *blame* point of view, the “never retested” case is much, much worse: “Wait. You’re telling me you had the test that could have saved us many tens of thousands of dollars, but you *never got around to running it*?”

It’s much easier to defend never having had the test. For example, you could remind people that the specification was never kept up to date, which meant you lacked the information to design the right tests. You could point out that you’d been complaining about understaffing since the very beginning of the project. Or: there’s an immense academic literature about how many tests would be required to “completely cover” the code, so gosh darn it you were doing the best you could. (Fortunately, only a few marginal academic testing researchers – and me – pointed out that the combinatorial explosion of possible tests has little to do with actually finding bugs.)

≤ music ≥

The classicist Edith Hamilton characterized the ancient Greeks’ understanding of happiness as “the exercise of vital powers, along lines of excellence, in a life affording them scope.” In less flowery terms, moderns prone to join collaborative circles want to do work that draws on their special talents, they want to do it well, and they want to be limited only by their own abilities. Testing, as a profession, wasn’t that.

Although Farrell emphasizes that collaborative circles are collections of peers, he at the same time gives examples of circles that form around – or near – a somewhat older, somewhat more established person that the circle starts out seeing as – not exactly a mentor – but as someone to look up to. For the Impressionists, that was Édouard Manet (not Monet), who had caused a scandal with the painting now known as Le Déjeuner sur l’herbe, which was considered outrageously crude and unfinished and garish at the time. For the feminist Ultras, that person was Quaker minister Lucretia Mott. To the Quakers of that time, the word “minister” didn’t have connotations of a job or authority, but rather acknowledged someone with a gift for spoken persuasion. (The Quakers were unusual among Christian congregations at the time in that they allowed women to speak in church.) Mott had been speaking against slavery for some time when the younger pre-Ultras began to see the need to take anti-slavery and anti-drunkenness techniques – like speech-making – and apply them to women’s issues. So Mott became something of an older mentor.

For the context-driven circle, the equivalent figure was Cem Kaner. His 1993 book (along with Jack Falk and Hung Quoc Nguyen) titled /Testing Computer Software/ was the equivalent of Le Déjeuner sur l’herbe. It was based on his experience as a tester and test manager for retail software. Right on the first page of text, it had a clear statement of purpose, highlighted prominently, that rejected a key part of the status quo: “This book is about doing testing when your coworkers don’t, won’t, and don’t have to follow the rules.” (Like: following an up-to-date specification.)

That shot across the bows was swiftly followed, in chapter two, with “The purpose of testing a program is to find problems in it”. It’s not to make sure you’ve tested everything the specification says – in the jargon, that you’ve “covered” the specification. It’s not to rerun an old test if a new test is more likely to find a bug. (And new tests almost always are.) As Paul Czyzewski, one of the best exploratory testers I’ve ever worked with, once said in a mock-puzzled way, “Why would I want to test where the bugs aren’t?”

The book immediately followed “finding bugs” with another goal: “The purpose of finding bugs is to get them fixed.” That very practical attitude was something of a novelty. I’d been involved in open-source software before open source was cool – for real, I released my first open source app in 1992 and had been the maintainer for Emacs on the Gould PowerNode hardware since sometime in the 1980s – and I was used to programmers writing little essays on “how to report a bug so that it gets fixed”, but those were directed to other programmers. I’d never encountered similar advice directed at *testers* (though I’m sure some existed).

This book was the seed for the context-driven school.

I don’t think any of Farrell’s circles had as much of a ready-made shared vision so early. The first-wave feminists come closest. Many of them got their start in the anti-slavery movements, and analogies between the plight of the slave and the plight of women were ripe to be plucked.

That’s not to say that the shared vision didn’t evolve. It did, via the conversations of the circle. Which brings us to process.

≤ short music ≥

I joined the circle via the swtest-discuss mailing list. Because it was an open-enrollment mailing list, there was less of the membership filtering that Farrell assigns to the Gatekeeper role. (I’ll have more to say about roles in another episode.) I recall Kaner acting in what I think of as a combination mentor / gatekeeper / charismatic leader role. Certainly, I came to the mailing list with a lot of habits of mind born of my experience as a programmer-tester, and he exceedingly patiently explained that there was more in heaven and earth than was dreamt of in my philosophy. Later the context-driven circle shifted its focus to an invitation-only mailing list that was much closer to the sort of meetings that Farrell describes. And there was also direct mail between all or subsets of the core members.

Most of Farrell’s circles were of people who all lived in the same place, but not all. The Ultras all lived in the Northeastern US, but travel was much more difficult than it is now, so much work was done via letters. Freud and Fliess had a multi-year written correspondence. When the context-driven circle formed, Pettichord lived in Texas, I lived in Illinois, Kaner lived in Silicon Valley, and I believe Bach lived in Seattle. As Zoom didn’t exist and teleconferencing was still an awkward technology, most work was done via email.

I think the circle could not have formed in our post-email era. Email afforded two things: it allowed people to explain their ideas at some length and, importantly, the habit of what was later called “fisking” allowed conversations to have the detail and depth of, say, the Fugitive Poets’ reviews of each other’s poems. “Fisking” is blogger slang meaning a line-by-line rebuttal. In email, you do it by quote-replying the entire previous email, then proceeding through it, typically paragraph by paragraph, commenting inline just after individual quoted paragraphs, or deleting quoted paragraphs you don’t feel like addressing.
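For anyone who never did it by hand, the mechanics of a quote-reply can be sketched in a few lines of Python. This is an illustration only: the function name `fisk`, the `"> "` quote prefix, and the sample text are assumptions, not a reconstruction of any particular mail client.

```python
def fisk(original: str, comments: dict[int, str]) -> str:
    """Build a reply that quotes paragraphs and answers each one inline.

    `comments` maps a paragraph's position (0-based) to the reply for it.
    Paragraphs without a comment are deleted from the reply, as described above.
    """
    paragraphs = [p for p in original.split("\n\n") if p.strip()]
    parts = []
    for index, paragraph in enumerate(paragraphs):
        if index not in comments:
            continue  # a quoted paragraph we don't feel like addressing
        quoted = "\n".join("> " + line for line in paragraph.splitlines())
        parts.append(quoted + "\n\n" + comments[index])
    return "\n\n".join(parts)


# Hypothetical exchange, invented for this example.
email = "Best practices exist.\n\nTesters should follow scripts.\n\nSee you at the conference."
reply = fisk(email, {0: "Only in context.", 1: "Why test where the bugs aren't?"})
print(reply)
```

The point of the format is that every claim sits directly above its rebuttal, which is what gave those email threads their depth.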

I think all the core members were, from the start, independent consultants or coaches. If not, they became so fairly quickly. As such, a lot of our personal marketing involved giving talks at testing conferences two or three times a year. Both the Quality Week and Software Testing, Analysis, and Review conferences were unusual in that they distributed printed proceedings, so giving a talk usually also meant writing a paper. And after the paper, there was the hallway track, if some of us were attending the same conference. There were also some opportunities to publish in the less academic magazines like IEEE Software or SQE’s Better Software. Trading drafts of such papers also helped formulate the shared vision.

Looking back on those papers, I’m struck by how well they show that, to quote Farrell, “each person’s work is an expression of the circle’s shared vision filtered through his or her own personality”. Bach’s 1995 article “Enough About Process: What We Need Are Heroes” and my 1999 article “New Models for Test Development” both express discontent with the document-centric approach, but – rereading them for the first time in many years – I was struck by how *characteristic* each is of its author. Bach’s is a self-assured call to action; mine is a typical Marick dissection of the implications of a model, plus a tentative suggestion of the “shape” of a replacement. (I doubt it would surprise anyone to find the author of “New Models” went on to host this podcast.) Bach’s article promotes an *attitude*; mine attempts to convey *information*. That’s not to say Bach doesn’t care about conveying information – he very much does: it’s notable that his shorter article has more citations than my longer paper. But I attempt to persuade through plodding analysis, and Bach attempts to persuade through, well, *persuasion*. I also noticed that – again, I think characteristically – Bach talks in terms of gains from doing things better, and I talk more in terms of choosing among costs in an almost economic way.

Another interesting pairing is Pettichord’s 2000 article “Testers and Developers Think Differently” and Bach’s 2001 “Explaining Testing to THEM”. I’d say both articles clearly come from people who have a shared vision about what testers *are* and what they’re *for*, and how they’re commonly misunderstood. But it seems to me Pettichord’s article is closer to mine: a relatively systematic laying-out of claims. It’s a “what” article. Bach’s article is about *how* to persuade a non-tester – a good part of it is an annotated hallway conversation.

(Note: links to all four articles I just mentioned are in the show notes. I’d be curious to know if you think they support the contrasts I make, or if I’m reading something into them that’s not there.)

I belabor those articles because they remind me that you can rarely separate content from form. It’s easy (at least for me) to think of Farrell’s “shared vision” as being a set of propositions, like the four values of the Agile Manifesto or the seven principles of the Context-Driven School, but it also contains things like tone, style, default stances, and habitual reactions. During the development of a circle, those become shared just like the vision of what to work on is shared. Let me give an example.

The first two principles of the context-driven school are:

1. The value of any practice depends on its context.
2. There are good practices in context, but there are no best practices.

It’s not clear why both of those are needed, unless you know history. At the time of the context-driven circle, there was a lot of talk about “best practices”, not just in software. It seemed to be an era in which “The One True Way” had unusually strong appeal.

Certainly all of us agreed that “Best Practice” (implicitly independent of context) is a silly notion. The question was, what to do about it?

My approach in consulting was to avoid the issue, reasoning that usually it’s sloppy language, not so different from “speed limit”. There is no actual *limit* on speed. It is, instead, a number that lets the experienced driver know roughly how fast you can go before there’s some unacceptable (to you) risk of being pulled over by police, all things being equal. But all things aren’t equal. I exceed the speed limit on Chicago-area highways by a greater margin than I do where I live in sleepy downstate Illinois. That’s because if I kept to my normal 7ish miles-per-hour over the speed limit, cars would be zipping past me on both sides. That’s pretty good evidence that driving faster won’t get me a ticket. And also, I’m inclined to credit the folk wisdom that it’s safest to drive at the prevailing speed.

So my answer to the “what to do about it” question was “ignore it”. In the relatively rare case that I was working with someone dogmatic about a best practice, I found it most effective to divert to talking about the details of implementing that practice on this project, considering especially risks and constraints. Very few people think there’s *no* room for customization.

Bach, I think, feels (or felt) strongly that you have to get your thinking on the big picture straight, else you will continue to err and err and err. From that point of view, my approach is a cowardly evasion: I leave my partner still under the illusion that “best practices” are a real thing. So what happens in their next project, and the next, when I’m not around to trick them into being driven by the context? Which is a fair point.

Bach’s approach won out in the context-driven circle. Its style on this and similar issues was to come out swinging, which in turn heightened the felt importance of the issue, which is why there’s an explicit principle about “best practices” on context-driven-testing.com.

It might have been better for us four to talk more directly about such matters of style and default reaction, rather than mostly talking about techniques and propositions, because I believe this and other default reactions helped lead to the rupture between context-driven testing and Agile, and thus to my ejection from the circle. More about that in a hypothetical later episode. For now, back to process.

≤ short music ≥

Both Freud/Fliess and the Ultras found that writing isn’t enough. Even if infrequent, in-person meetings seem to be essential. Freud and Fliess had a yearly “congress” in some place distant from their respective home bases. There, they spent a few days hiking and talking. “Throughout the rest of the decade, the ritual of the congresses had an energizing effect on Freud. He repeatedly reported that his ability to write was restored after a meeting with Fliess. On numerous occasions, Freud returned from their meetings in an almost manic state and poured out his thoughts in writing.”

Something similar happened with the Ultras. In 1848, Lucretia Mott (along with her husband) attended the Genesee Region Quaker Yearly Meeting in Western New York, and then they stopped by to visit Lucretia’s sister, Martha. While there, Jane Hunt (another Quaker) invited them to tea in the nearby town of Waterloo, along with two other Quaker women who’d also been to the Yearly Meeting.

This would prove to be a consequential “tea”.

One more person was invited, who proved crucial. Lucretia Mott had met Elizabeth Cady Stanton when the two of them were attending the World Anti-Slavery Convention (the one I mentioned last episode, where Mott and the other women were allowed to attend, but only if they kept quiet and stayed behind a gossamer screen so as not to distract the menfolk). Mott and Stanton had kept in touch, so Mott knew how Stanton had been suffering since moving from Boston (where there were reformist people and groups) to small Seneca Falls, New York. As Stanton put it, she “now fully understood the practical difficulties most women had to contend with in the isolated household, and the impossibility of [a] woman’s best development if in contact, [for] the chief part of her life, with [only] servants and children”. Mott prevailed on Hunt to invite Stanton to the tea.

No one knows exactly how the “tea” went. Because the Quakers were reformists, interested in the plight of local Native Americans and also the escaped slaves who’d settled in the area, I like to imagine the conversation was high-minded but ladylike until Stanton let down her guard. “My experience at the World’s Antislavery Convention, all I had read of the legal status of women, and the oppression I saw everywhere, together swept across my soul, intensified now by my personal experience. I poured out […] the torrent of my long-accumulating discontent, and with such vehemence and indignation that I stirred myself.”

She stirred the others, too, and her discontent “moved us all to prompt action, and we decided, then and there, to call a Woman’s Rights Convention.” They advertised it the next day, giving only five days’ notice. Surprisingly, it was well-attended and went well, and first-wave feminism became a thing. There were more conventions, and more in-person meetings, to follow.

In the context-driven school, the equivalent remote get-together was the first Los Altos Workshop on Software Testing (abbreviated LAWST), organized by Cem Kaner and Brian Lawrence, an experienced meeting facilitator. It, too, was a peak experience, was a prototype for meetings that happened around twice a year for some years, and spawned similar get-togethers like Pettichord’s Austin Workshop on Test Automation.

The way I characterized the first LAWST to myself was that there were these tools called “capture/replay” tools. They were sold as a way for a manual tester to execute a test script and check what they saw in the normal way, but also capture some form of snapshots of the display along the way. Think of the benefits:

* If the tester found a bug, the programmer could rerun the script through the tool and *see* the bug happen.
* If not, then the script could be automatically rerun at periodic intervals. It would re-do what the tester had done, take snapshots at the same points, and compare those to the “known good” snapshots – all much more cheaply than having a person do it.

Some early versions of such tools took the “snapshot” idea literally and captured a bitmap (a pixel-by-pixel image) of the screen. That worked as well as you’d expect: when some clown moves the login prompt two pixels right, all the tests that require login start failing. Some tools grew “fuzzy comparisons” for bitmaps. Others used operating system features to query the underlying “widgets” corresponding to buttons, text boxes, and whatnot on the screen (much the way you can do that today with any web browser’s developer tools). But those also had problems: for example, I remember one person reporting a case where something like a login widget got relocated to some ridiculous X,Y coordinate, way away from the visible screen. All the tests passed – because they talked directly to the widget – but this time it was real users who couldn’t locate the login box, which was discovered embarrassingly late.
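Those three failure modes are easy to show in miniature. The following Python sketch is a toy, not the behavior of any real capture/replay product: the tiny grids stand in for screen bitmaps, and `exact_match`, `fuzzy_match`, `Widget`, and the 20% tolerance are all hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass

# A "bitmap" here is just rows of pixel values (assumed to be the same size).
Bitmap = list[list[int]]


def exact_match(expected: Bitmap, actual: Bitmap) -> bool:
    """Literal snapshot comparison: any differing pixel fails the test."""
    return expected == actual


def fuzzy_match(expected: Bitmap, actual: Bitmap, tolerance: float = 0.2) -> bool:
    """Pass if the fraction of differing pixels is at most `tolerance`."""
    pairs = [(e, a) for erow, arow in zip(expected, actual) for e, a in zip(erow, arow)]
    differing = sum(1 for e, a in pairs if e != a)
    return differing / len(pairs) <= tolerance


def shift_right(bitmap: Bitmap, pixels: int) -> Bitmap:
    """Simulate 'some clown' moving everything a few pixels to the right."""
    return [[0] * pixels + row[: len(row) - pixels] for row in bitmap]


@dataclass
class Widget:
    name: str
    x: int
    y: int


def widget_test_passes(widgets: list[Widget], name: str) -> bool:
    """Widget-level check: asks only whether the widget exists, not where it is."""
    return any(w.name == name for w in widgets)


def is_on_screen(widget: Widget, width: int, height: int) -> bool:
    return 0 <= widget.x < width and 0 <= widget.y < height


known_good = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],  # the "login prompt"
    [0, 0, 0, 0, 0, 0, 0, 0],
]
moved = shift_right(known_good, 2)
lost_login = Widget("login", x=-5000, y=-5000)

print(exact_match(known_good, moved))             # False: the literal snapshot breaks
print(fuzzy_match(known_good, moved))             # True: 4 of 24 pixels differ
print(widget_test_passes([lost_login], "login"))  # True: the test still passes...
print(is_on_screen(lost_login, 1024, 768))        # False: ...but no user can find it
```

Each comparison strategy trades one kind of false alarm for one kind of missed bug, which is roughly the trouble the tools kept running into.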

The idea of LAWST 1 was to get a bunch of people with expertise with such tools together and have them talk about their bes- I mean, about practices that worked well for them in their context, and so the state of the craft would be advanced.

What happened – again in my retelling of the story to myself, so maybe overdramatized – was that each person had been having trouble getting value from the tools and was hoping some *other* expert would have found the secret to making them work. But no one had. It was a real “Oh, you too? I thought I was the only one!” moment.

And it led to the publication of a position statement (linked in the show notes) that essentially said, “Look, we’re a bunch of experts in this topic (and also there’s that Marick guy who knows practically nothing about it), and *we* can’t make these tools work. A big rethink is in order.” It was influential in an “emperor’s new clothes” sort of way.

≤ short music ≥

The LAWST workshops were highly structured meetings for 15-20 people. The show notes point to two process descriptions, but here are my highlights.

A fairly narrow topic is to be discussed. The first day is devoted to “war stories” or case studies relevant to the topic. After the end of the first day, someone produces a list of propositions for the second day. They are discussed and voted on. In the later workshops (not the first), consensus was required for a proposition to make it into the final, published statement.

People sign up to tell war stories during the meeting. A war story is, I think almost always, someone’s direct experience. Most of the time is spent on questions, and there’s no time limit. Questions are at first restricted to “clarifying questions,” intended to get the experience understood as well as possible. Next comes what I remember being called “open game”, in which challenges are offered. I liked to think of this as being mostly about understanding context: “Yes, that worked for you, but would it work in my situation, which I will now describe?” For example, in contrast to most software conferences, someone from Google would have had to spend a lot of time talking about which of their solutions only make sense at huge scale.

There is a moderator who calls on people to speak, favoring those who’ve spoken less. There are hand signals to indicate “I want to speak on the current subtopic” or “I want to start a new subtopic”. There’s a recorder who takes notes on big sheets of paper taped to the walls. (This is surprisingly effective at reducing repetition from people who aren’t sure they’ve been heard or understood correctly.)

I want to emphasize again that groups have a particular tone. Testers are *all about* probing for mistakes, so the challenging questions feel natural. (Indeed, the distinct “clarifying questions only – and I mean *only*” phase is probably necessary to keep people from jumping immediately into critiques.) By contrast, a different meeting style was imported into the patterns community from creative writing, where the ethic is less “if you can dish it out, you better be able to take it” and where personal vulnerability is more tolerated. I like that format, too, so let me summarize it.

The “work product” was a software design pattern that describes a solution to a common problem. Patterns need to be understandable, persuasive, and adaptable to new situations. In what’s called “a writers’ workshop”, a particular writing is distributed in advance and the participants are expected to have read it beforehand. Unlike the LAWST format, the author does not participate. The catchphrase is that they are to be “a fly on the wall”, listening to others discuss their text. After all, in real life, an author can’t scurry around to all their readers and say, “no, what I really meant was…”

Like the LAWST format, there are two phases to the discussion. In the first phase, people say what they *like* about the piece. When I’m introducing this meeting format, I say something like, “Look, you’re going to be suggesting a lot of changes soon. This is your chance to say what *works*, so that the author won’t delete it or break it while making changes.”

Yes, the praise is sometimes awkward – “most of the words are spelled correctly!” – but people get better with practice.

The second phase is criticisms. In the variant I’m most familiar with, they are typically phrased as “suggestions for improvement”. That is, they are of the form “thing X that the author does doesn’t work for reasons A, B, and C, so I’d suggest trying Y instead”. This is a bit controversial: noticing a problem doesn’t necessarily mean you have any useful insight into a solution. And if you can’t think of a solution, you should – what? not say anything? On the other hand, a proposed fix often really clarifies what the actual problem is, and it reduces the author’s defensiveness. The vibe is of people helping each other, not of testing to destruction.

Anyway, that’s another format to try. I introduce it because different groups work better with different meeting structures. I once tried to introduce the patterns style into a context-driven testing get-together. It was a flop, partly for individual reasons – some people don’t *like* being a quiet fly on the wall – but partly because a suggestion for how to fix a bug is, for a tester, explicitly *not part of the job* and, unless you carefully build up your credibility, likely to be counterproductive. Different cultures, different formats.

≤ music ≥

So let’s talk about “going public”. Going public means pushing the shared vision and the techniques out into the world. “Whereas in the previous stages, the objective was to create a new vision, in this stage the objective becomes to win support for the vision from authorities and from the public. Up to this point, the journey has been inward to mine the creative process; now it turns outward.”

The centerpiece of going public is the group project. For the Impressionists, it was staging their own public show, competitive with the Salon that had rejected them. For the Fugitive Poets, it was publishing a literary magazine, called “The Fugitive”. For the first-wave feminists, it was public conferences and speeches. For the context-driven circle, it was the publication of Kaner, Bach, and Pettichord’s 2001 /Lessons Learned in Software Testing: a Context-Driven Approach/, which is a blend of vision statement, high-level techniques, and some detailed techniques. There are 293 lessons in a 286-page book, so necessarily a lot of the techniques are of the form “develop a technique to address concern X *in your context*, perhaps considering A, B, and C.” It is the cumulative effect of those listed concerns that works best to convey the vision, in my opinion.

To give you a feel for the book, here are the headings on pages 84 and 85 in the chapter on Bug Advocacy. (Remember that, according to Kaner et al in 1993, “The purpose of finding bugs is to get them fixed.”)

Lesson 85: Report the problem clearly, but don’t try to solve it.
Lesson 86: Be careful of your tone. Every person you criticize will see the report.
Lesson 87: Make your reports readable, even to people who are exhausted and cranky.

That last contains a bullet list of suggestions like “Number each step”, “Use whitespace to make the reports easier to scan”, “Indicate what happened and what you expected to happen”, and “Don’t make jokes; they’ll be misunderstood.”

I love that practicality and engagement with how real people really work. Another good example of that is Jonathan Bach’s 2000 article “Session-Based Test Management.” The article is about exploratory testing, which had come to be a major concern of the context-driven school. Exploratory testing is frequently defined by saying that, in it, test planning, design, and execution are near-simultaneous activities. That is, you prototypically design a test just before you execute it, and the results of executing a test can influence both the design of the next test and your overall plan for testing the software you’re working with. (People familiar with test-driven design will recognize the dynamic.) The design is typically in your head, supplemented with notes: there’s not an expectation the test will ever be repeated in exactly the same way.

Here’s a bit of the introduction of the article:

“Unlike traditional scripted testing, exploratory testing is an ad hoc process. Everything we do is optimized to find bugs fast, so we continually adjust our plans to refocus on the most promising risk areas; we follow hunches; we minimize the time spent on documentation. That leaves us with some problems. For one thing, keeping track of each tester’s progress can be like herding snakes into a burlap bag. Every day I need to know what we tested, what we found, and what our priorities are for further testing. To get that information, I need each tester on the team to be a disciplined, efficient communicator. Then, I need some way to summarize that information to Management and other internal clients.”

The technique described serves several purposes, but I want to highlight the one that calls back to the article James Bach, Jonathan’s brother, wrote, the one titled “Explaining testing to THEM”. Here, the “them” is, in particular, management. The raw materials of software development include people, their actions and reactions. Ignoring that would be like Impressionists ignoring the properties of paint and canvas. My bias is that any fool can make a vision; what matters is: can you make *techniques* that *realize* the vision?

(Speaking of techniques, if you’re interested in exploratory testing, check out Elisabeth Hendrickson’s 2012 book /Explore It!/. Hendrickson joined the context-driven circle later. *I* kind of think she became a core member; *she* kind of thinks she didn’t. Eh, there aren’t really hard boundaries between “core” and “periphery” in collaborative circles.)

Another aspect of going public is branding. If a circle hasn’t already gotten a proper name – a capitalized name – like “Impressionists” or “Fugitive Poets”, they almost invariably get it now. (I say “almost invariably” because – as far as I can tell – the feminist Ultras didn’t use that name externally. The public knew them as one notable group of the lower-case “suffragettes”.) The phrase “a context-driven approach” appears as the subtitle of the /Lessons Learned/ book, but I don’t remember how much before then it was coined. I know it was an explicit choice and topic of discussion, not just something one of us picked out of the air while the others shrugged and adopted it.

Farrell also says “the circle takes on features of a formal organization. For example, the Impressionists wrote a formal charter for their group during this stage.” It’s not a charter, but the publication of the short-and-catchy Agile Manifesto seems typical of the “going public” stage, and it pretty directly led to the creation of a formal organization, the Agile Alliance. The context-driven testing school put out its seven principles around the time of /Lessons Learned/ (I think) – that is, 2001 – and the Association for Software Testing followed not long after (2004, with its first conference in 2006).

≤ short music ≥

Here’s what Farrell says about the end of collaborative circles.

“As the members of a circle develop their skills and expertise in a discipline, they become less dependent upon one another for support. They may begin to receive recognition outside of the circle. Some may feel that, rather than freeing them to be creative, the circle’s vision is now constraining them. If they are highly ambitious and seeking individual recognition, they may even decide to sharpen the differences between themselves and other members of the circle. First some members then others may break away from the circle and attempt more individualized projects. Those left behind may feel abandoned or betrayed, and those who leave may feel guilty for their “disloyalty.” Earlier, ideas seemed to flow out of the members' interactions, and the ownership of ideas was unimportant. But during the separation stage, members become concerned about who gets recognition for what. Concerns about plagiarism and equity in recognition may divide the group further. As the negative feelings accumulate, the disintegration of the group accelerates.”

As I said in the introduction, I’ve been wrestling for a good while with how to talk about the disintegration of the Context-Driven circle. I don’t know about all possible pairwise combinations of its four original core members, but at least four pairs are characterized by strong negative feelings, and I’m only sure of one cordial relationship. I myself was accused of betrayal and something just short of plagiarism (by taking someone else’s ideas as my own). Which I deny. Were dueling still in fashion, we might have had a scene like this, from Patrick O’Brian’s /Post Captain/.

“You have said enough, sir”, said [Brian], standing up. “Too much by far: you must withdraw.”

“I shall not withdraw,” cried [Person X], very pale. […] “I will stand by [it], and I am perfectly willing to give you any satisfaction you may choose to ask for.”

An interesting character and plot development in that novel, but gossip in the real world. The world has enough gossip, but there are aspects of the disintegration that I think might enrich what Farrell says about the topic.

I will think more on this. I’d welcome your opinion. In the meantime, thank you for listening.
