eWorkshop on Software Inspections and Pair Programming
CeBASE and Visek
conducted a joint eWorkshop on December 16, 2003, to discuss the
practices of pair programming and software inspections. Although in
many ways dissimilar, both practices have the common aim of
supporting the development of quality software, with minimal
defects, through structured collaboration among
developers/reviewers. In fact, one of the motivating factors behind
the development of pair programming was to increase the
effectiveness of code inspections by moving them earlier in the
development lifecycle and doing them “all the time.” We were very
interested in investigating whether pair programming (PP) achieves its
goal of providing the same benefits as inspections, or greater ones,
at what cost, and, more generally, whether the two practices are
complementary and under which circumstances each makes the most sense.
Tom Gilb may have summed up the goals of the discussion best
when he said, “My position is that they are two different and
complementary techniques. We need to understand their costs and
benefits quantitatively, and their best practice modes” [Gilb 78,
agreed to by Denger, Basili, Arisholm, Wiegers]. To achieve this, we
were happy to have the input of a very lively group of over 20
participants from 5 different countries and 6 different time zones.
Summary: There was a lot of good discussion but little consensus
among participants. Although many suggestions were made as to
features of both practices, few participants supported or refuted
each other’s statements. Our discussion seemed very effective at
raising the important points of comparison between the two
practices, but there may not yet be enough data or experiences to
permit a useful evaluation along all these dimensions. There was at
least an informal consensus that, under the right circumstances, the
two practices are complementary rather than mutually exclusive, but
more experience is necessary to refine their exact contributions.
Details:
During the discussion, the following important points were raised as
areas of comparison:
Effects on quality (defect slippage)
One of the main claims of both PP and inspections is that
they raise the quality of the product. One direct measure of quality
is the number of defects in a product.
- Participants agreed that both
practices focus on improving quality, which in this discussion
seemed to be measured by the number of software defects that slip
through the practice.
- Premeeting feedback summarized by Dieter
Rombach seemed to indicate that inspection allowed higher quality
(lower defect slippage) to be achieved, but at a higher cost than
PP. Conversely, PP could achieve some level of software quality
very easily, but it was very hard to get very high quality using
this practice.
- This was one question for which
there was a large body of data, at least regarding inspections.
Inspection benefits are well documented [Gilb 197],
especially with respect to reduced defect slippage [confirmed by
vote 260]. There is even data regarding which types of defects
inspections are better at detecting [confirmed by vote 269]. For
example, inspections are effective at finding defects such as
programming blunders, logic errors, interface errors, and omissions,
but not so effective for errors of timing, program dynamics, and
numerical approximation [Boehm 242].
- In contrast, while PP is
commonly hypothesized to lead to lower defect injection
rates and reduced defect slippage [confirmed by vote 295], its benefits
seem not to be well documented, especially with respect to reduced
defect slippage [confirmed by vote 273].
- There was some evidence that
PP has a positive effect on aspects of quality other than
correctness; in particular, that it leads to more maintainable
code. In a small pilot experiment conducted in industry, pairs
produced solutions that were assessed as more maintainable than
those of solo developers [Arisholm 192].
Feedback cycle
The feedback cycle is the amount of time between introducing a
defect during software development and detecting and removing it.
- Participants argued that PP has a shorter feedback cycle.
  o This is a strength because the feedback has a greater
‘present value,’ i.e., it is more cost-effective overall to correct
something as soon as it enters the system than to correct it once
it has spent some time in the system and possibly led to further
problems [Krebs 111, Gilb 139, Ambler 164, McConnell 314].
  o As a result, the feedback cycle is “much more personal and
individual” [Manzo 166]. [However, it wasn’t clear whether this is a
strength or a weakness: people may learn better from their own
mistakes, or relying on personal feedback may slow down team
learning.]
- Inspections have a longer feedback cycle than PP, which is a
weakness because:
  o Inspection might wait until “too much damage is done,” while PP
gives immediate feedback during development [Gilb 139].
  o Moreover, “if people think the work product is done, they can be
psychologically resistant to making changes suggested by
inspection.” Therefore, it’s better to remove defects quickly, when
it’s clear the product is still under construction, either by PP or
by early incremental peer review [Wiegers 399].
- The implication seemed to be that this weakness stems from the
significant cost and time that inspections require, so that they
have to wait until a product is stable [Rumpe 378], whereas PP can
be applied earlier in the development process.
- There are ways to mitigate the long feedback cycle of an
inspection. A good heuristic is to start inspections as soon as 10%
of the document is available, rather than waiting until the whole
document is done, at which point it may be more costly to repair all
the defects [Wiegers 159]. [However, this bears the risk that the
review might address defects that are no longer important in a later
iteration of the product: by the time another 10% of the document is
done, the defect might have been resolved anyway.]
- Because the PP feedback cycle is so short, there was a side
discussion about whether to describe the PP contribution as very
fast defect detection and removal, or as defect prevention.
  o Some participants focused on one person in the pair detecting
his/her partner’s defects: “PP… shortens the cycle time in detecting
the defect to practically zero because the other person in the pair
sees the error quickly” [McConnell 314].
  o Other participants focused on PP as a collaborative effort: “When
a pair works together, they brainstorm and negotiate quite a bit
(for a few seconds/a few minutes) before the keyboard is touched. I
feel that is defect prevention” [Williams 321].
Third-party perspective
The third-party perspective is an additional view of the document
under inspection: people who were not directly involved in producing
the document provide another view of the product’s quality.
- The premeeting feedback indicated that a perceived strength of
inspections is that they can be more objective because they provide
a third-party perspective, i.e., they provide feedback on system
quality from technical personnel who may not have been responsible
for its construction.
- A related point is that third-party inspectors can be chosen to
focus quality checking on particular attributes. In this way,
inspections can also incorporate multiple quality foci or
perspectives: “One advantage of inspections is that you can work on
multiple qualities. Perspective-based inspections enable artifacts
to be reviewed by experts in safety, usability, performance, etc.”
[Boehm 213, agreement from: Denger, Rumpe, Wiegers, Ambler].
- However, an associated danger is that developers may be afraid of
being embarrassed in front of outside inspectors, and so spend too
much energy perfecting the product before asking for an outside
perspective [Wiegers 421].
- PP, conversely, lacks the external, third-party perspective of a
reviewer who isn’t “immersed in the product, and [hasn’t] absorbed
all of its assumptions.” An outside perspective, although slower,
often reveals insights that people too close to the work didn’t spot
[Wiegers 172].
- Some of the benefits of outside, objective, or focused reviewers
can be achieved via pair rotation and collective ownership [Maurer
351, Ambler 170, 180], or by augmenting PP with other agile methods
like “proving it with code” [Ambler 356, 360].
- There was some discussion over how serious this shortcoming of PP
is. One participant felt that without the involvement of multiple
important quality viewpoints, converging on system requirements is
likely to be problematic: “From a stakeholder win-win perspective,
just getting two people to determine the correctness of requirements
is very risky, as it excludes success-critical stakeholders from the
process” [Boehm 523, agreed by Lanubile, Wiegers, Basili, Rombach].
Other participants agreed, but said that this is why PP should never
be implemented without more than one pair on the team, collective
ownership of code, and the rotation of people through pairs/groups
[Ambler].
Learning/sharing knowledge
Another subject that participants concentrated
on was the contribution made by the practices to increasing the
skills and knowledge of people on the team.
- Participants felt that both
practices are good for mentoring junior people, who can quickly
learn what other team members classify as good and bad practices
[Inspections: Denger 165, Lanubile 155, Basili 157; PP: Boehm 152].
- Participants also felt that both
practices help disseminate tacit knowledge among team members
[PP: Boehm 117; Inspections: Basili 133] and therefore
support learning [Lanubile 140].
- There was some disagreement
among participants as to whether either practice was better suited
to learning. Some felt that PP’s shorter feedback loop made it more
effective for learning (“the learning effect is better when
discussing decisions when made instead of seeing results,” Rumpe 88),
while others strongly disagreed (Wiegers 82: “You can learn how to do
better work any time you look over someone else's shoulder, via PP or
peer review,” also Basili 77).
- The only data on the subject
was for inspections: “Our industrial data is that, at the
individual person level, once the defect-found feedback has
worked… the individual can systematically inject two orders of
magnitude fewer defects in their daily work” [Gilb
336].
Repeatability
The lead discussant, Dieter Rombach, asked which of the
practices was more repeatable. That is, assuming that a particular
development team performed a similar task again, what is the
likelihood that they deliver the same quality in the same
timeframe/effort using solo programming and inspections, versus
using pair programming?
- Vic Basili noted that this question amounts to asking whether there
would ever be sufficient predictive capability for the practices to
support accurate estimates of cost, quality, schedule, etc., based on
past history.
- Ambler felt that PP is more repeatable, since if you keep many of
the team members together, they will have built a common culture and
a way of working that is much more effective [Ambler 305].
- Other participants felt that inspections are more repeatable, and
that the use of PP made it harder to build predictive models [Rumpe,
Williams, Basili 297, Gilb 304; confirmed by vote 323 on “PP more
repeatable than CI”: 1 yes, 6 no, 6 not sure]. One reason is that
the effects of PP are harder to quantify [Williams 297]. Another
reason might be that data collection is often not performed in Agile
projects (see the next discussion point). However, no further
evidence regarding repeatability was presented.
Measurement
Participants discussed which of the two practices was more amenable
to measurement that could quantify its effect on the development
process.
- Participants agreed that inspections do leave
a clear audit trail describing their results [Krebs 148].
- Tom Gilb felt that this audit trail
for inspections gives a statistical basis for managing the whole
software engineering process: “I now believe that Inspection
should be used to sample and measure, not to try to clean up.” The
measurement of major defects found during inspection should be
used to decide on appropriate next steps for the development of
the product, and to motivate people to follow the standards used
to judge the specification [Gilb 72].
Acceptance
One particularly important aspect of a
technology is how well the developers accept it; that is, how likely
it is that the developers adopt the technology and keep it in place
over time.
- Although nobody knew of any systematic
studies, there was anecdotal evidence that developers are more
accepting of PP than of inspections (i.e., find it easier to keep
the practice going on its own merits).
- “I rarely talked to a developer who was keen
on doing an inspection. I met many developers who love pair
programming. So, why not simply accept these preferences?”
[Maurer 95]
- “When I was in industry, I
found that people did not prepare as well as they should prior
to an inspection. The inspection seemed more of a technicality
-- something that needed to be ‘checked off.’ I'm sure our
results wouldn't have matched most of the research
studies.” [Williams
76]
- Pairing may motivate people to tackle more
difficult challenges [Krebs 386], as they do not feel that they
need to come up with a solution to a difficult problem on their
own.
- As Basili pointed out, however, both practices can surely be found
being done badly in industry [Basili 87], so basing conclusions on
anecdotal evidence is dangerous.
Cost
“Cost” here is mainly a function of developer
effort: the number of hours required for developing the system,
finding and reworking defects, etc.
- Outside of defect reduction, most data so far indicate that pair
programming is also a way to trade extra cost for reduced schedule,
which is often valuable in itself [Boehm 99, 179].
- Inspections can be a bottleneck in the development process: “Also,
I’ve found Inspections to be too inefficient (defect yield per staff
hour) and perhaps even worse, they slow the project’s natural rhythm
requiring much staff energy to regain momentum.” [Manzo 103]
(Formality of) the process
Process formality refers to the level of specificity at which
developer activities are defined; a more formal practice is expected
to have more process steps that are described in greater detail,
while a less formal practice would have fewer steps and rely more on
developers’ own expertise to fill in the
gaps.
- The premeeting consensus was that PP is less formal than
inspections, but it would be a mistake to assume that PP has no
formality at all. As Bill Krebs said: “There is some formality to
pairing in that there is a set 'algorithm' for doing it per Laurie
Williams's book. Also, we rotate pairs, iterate, and refactor to
address the risk that the first two folks will miss a bug.”
Comparing the practices
The above topics raised some important points
about the dimensions where it may be useful to think about comparing
the practices.
- Arisholm reminded the participants that a
comparison of inspections versus pair programming based on only
some of these attributes could easily be seen as biased, because it
might not account for the areas in which one practice or the other
offers the most potential benefit [Arisholm 587].
- Some suggested comparisons:
  o Boehm suggested that each practice has tradeoffs between cost and
schedule.
  o Gilb suggested using inspection as a way to measure the difference
between groups using PP and control groups [Gilb].
Existing Data
Relatively little data on PP has been published; that is, only a
few publications document the benefits of PP regarding the issues
discussed above. Most of the data regarding PP seems to be anecdotal
[Rumpe 238, Manzo 229].
- In design [Williams; Arisholm 230, 257; Gilb 233]:
  o “In my initial PP study, I have a break out of use by phase. Use of
PP was highest in design. However, I don't have any evidence of
design isolated -- just by phase for all of development.” [Williams]
  o We have some evidence on the perception of developers/students that
they create better designs using pair programming [Maurer].
  o For other evidence, see Flor and Hutchins, “Analysing Distributed
Cognition in Software Teams: A Case Study of Team Programming During
Perfective Software Maintenance,” Proc. 4th Workshop on Empirical
Studies of Programmers, pp. 36-64, 1991 [Arisholm].
- General evidence:
  o “We told our small team you must either pair, or use inspections, or
justify why you did solo programming. We got 48% of the changes
paired, 50% solo, 2% with informal multi-person review. Little unit
test. 2x improvement in quality as compared to earlier days of lower
pairing frequency.” [Krebs 199]
  o “The first PP experiment I did showed statistically significant
higher quality resulting from PP than from solo desk checking (not
inspections).” [Williams 209]
- Concerning inspections, most of the participants agreed that their
benefits (their value) are well documented in a number of
publications. In addition, some participants mentioned projects in
which they had documented the benefits of inspections.
  o “One of my clients did requirements inspections for 5 years and
measured a sustained ROI of 10:1.” [Wiegers]
  o “A good paper in latest issue of Software Quality Professional
showing inspections reducing defect leakage to customers from
10.6/KLOC to 0.9/KLOC.” [Wiegers 203]
  o “…in rough terms the effectiveness of Inspections (if properly done
at proper rates of checking like one page/hour) are about 75-90%
effective for requirements and design but are in the 15-60% range
for code. I have seen higher numbers from Capers Jones than 60% but
I am not sure I trust them. (People don't seem to be good at
understanding remaining defects downstream).” [Gilb 472]
  o “A recent Banking client using requirements inspections has about 88
majors/page before motivation and measurement and after a few months
is at about 11 majors/page, and we expect this to drop to less than 1
major/page, within months.” [Gilb 341]
  o “Our results for inspections at TRW were about the same: in the 60%
effectiveness range for both design and code.” [Boehm 482]
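Two of the figures quoted above can be compared more easily when expressed as relative reductions. This is only arithmetic on the numbers reported in [Wiegers 203] (defect leakage to customers, per KLOC) and [Gilb 341] (major defects per page, before and after a few months of measurement); it is not additional data:

$$
\frac{10.6 - 0.9}{10.6} = \frac{9.7}{10.6} \approx 0.915
\qquad\qquad
\frac{88 - 11}{88} = \frac{77}{88} = 0.875
$$

That is, roughly a 91.5% reduction in leakage and an 87.5% (8-fold) reduction in major defects per page. The two are not directly comparable: the first counts defects escaping to customers, while the second counts defects found per page during inspection.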
Another issue that was briefly discussed in the eWorkshop was the use
of one practice in the homeground of the other. The participants
discussed the circumstances and conditions under which it might be
valuable to apply inspections in an agile project in addition to PP,
and to apply PP in a CMMI context instead of code inspections.
PP/Inspection “homegrounds”
The participants agreed that in some cases it is valuable to perform
extra inspections after PP. However, more research is needed to
define a process that gives explicit guidance on the circumstances
under which it is valuable to do so.
- PP usually resides in a
process with a rather high level of evolution (refactoring,
rework). Only when a module becomes stable can we inspect it
[Rumpe 378; Rombach+, Ambler+, Wiegers+, Krebs+].
- I see this as an economic issue. If the risk
exposure due to unfound defects is high, doing the inspection is
worthwhile. If the risk exposure is low, then it's not worthwhile.
[Boehm 385]
- Ray Madachy's calibrated system dynamics model indicated that the
net payoff for inspections went negative as the defect density of the
inspected artifact decreased [Boehm 607].
- I would like to see a process in which the team (leader) is able
to decide rather dynamically where to use extra inspections, based
on complexity, criticality, and degree of innovation [Rumpe 412].
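Boehm's point that an inspection is worthwhile only when the risk exposure from unfound defects is high, and Madachy's finding that the net payoff turns negative as defect density falls, can be illustrated with a deliberately simple break-even calculation. The sketch below is not Madachy's calibrated system dynamics model. It is a hypothetical linear model in which every parameter name and value is an assumption chosen only to show the shape of the argument: artifact size, inspection effort, the fraction of defects an inspection finds, and the later rework avoided per defect found.

```python
from dataclasses import dataclass


@dataclass
class InspectionParams:
    """Hypothetical parameters for one inspection; none come from the workshop."""
    size_kloc: float               # size of the inspected artifact, in KLOC
    inspection_hours: float        # total effort spent on preparation and meeting
    effectiveness: float           # assumed fraction of present defects found (0..1)
    hours_saved_per_defect: float  # assumed later rework avoided per defect found now


def net_payoff(defects_per_kloc: float, p: InspectionParams) -> float:
    """Rework hours avoided minus inspection hours spent (positive = worthwhile)."""
    defects_present = defects_per_kloc * p.size_kloc
    defects_found = p.effectiveness * defects_present
    return defects_found * p.hours_saved_per_defect - p.inspection_hours


def break_even_density(p: InspectionParams) -> float:
    """Defect density below which this simple model says the inspection does not pay."""
    return p.inspection_hours / (p.effectiveness * p.size_kloc * p.hours_saved_per_defect)


if __name__ == "__main__":
    # Illustrative numbers only.
    params = InspectionParams(size_kloc=2.0, inspection_hours=40.0,
                              effectiveness=0.6, hours_saved_per_defect=5.0)
    for density in (20.0, 10.0, 5.0, 2.0):
        print(f"{density:5.1f} defects/KLOC -> net payoff "
              f"{net_payoff(density, params):7.1f} h")
    print(f"break-even at {break_even_density(params):.2f} defects/KLOC")
```

With these made-up numbers, each defect per KLOC is worth 6 staff hours of avoided rework. The inspection therefore breaks even at about 6.67 defects/KLOC, and below that density its net payoff is negative, matching the direction of Madachy's result. A real decision, as Rumpe suggests, would also weigh complexity, criticality, and degree of innovation, which this model ignores.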
PP in a CMM(I) environment
The participants also discussed the potential application of PP in a
CMM or CMMI context. Although no specific experiences were discussed,
most comments seemed to indicate that there was no reason the two
approaches would not be compatible:
- There isn’t any overt connection between PP
and plan-driven or CMM approaches. PP is simply a "good practice"
that could and should be selectively applied in an appropriate
environment after some thought. [Wiegers 528]
- CMMI generalized the Peer Reviews process area
to "Verification" (CMMI level 3), so theoretically PP can help
address this key area [Boehm 584].
- Pairing improves an organization’s
learning culture and may thus be an interesting element of an
optimizing organization (CMMI level 5) [Rumpe 591].
- We have a theory-based study that says XP
meets the CMM’s level 2 requirements. But that one is rather
conservative; Paul himself is more optimistic [Rumpe 602].
- Applying PP does not mean that
process-driven practices like inspections wouldn’t also be useful:
“The point here is I think that if people pair
100% pairing is at its limit. The resulting defect rate cannot be
reduced through further pairing (alone) -- but through additional
inspections, as they are done post-construction.” [Rumpe
582]
Topics for future eWorkshops or Studies
One main result of the eWorkshop
is the identification of potential research fields where a more
detailed analysis of the practices should be performed. Based on the
discussion during the eWorkshop and the pre-meeting feedback, the
following ideas were generated:
- When comparing the techniques, one should also consider the
complexity of the module being developed or inspected. If there is
any truth in existing results from group dynamics, complex tasks are
best performed solo (i.e., inspections), whereas simpler tasks are
performed efficiently in groups (i.e., PP) [Arisholm 127].
- In future eWorkshops we should consider feasible strategies for how
the two practices can be usefully combined.
  o Start with pairing, then formally inspect key artifacts [Krebs 100].
  o Use PP in general and inspection in critical cases (complex
modules, high quality necessary, etc.) [Rumpe 102].
  o PP is better for "tactical" quality improvement (how do I most
efficiently use Eclipse to debug this particular NPE), while
inspection is better for "strategic" quality improvement (look for
concurrency/synchronization errors in this package) [Johnson 169].
    § Hypothesis Rumpe (Pre-Meeting Feedback): With PP you can reach a
certain level of quality more efficiently, but you cannot go beyond
that level. With inspections it’s more tedious to reach that level,
but possible to go beyond it. (You cannot add a third person to PP,
but you can have more inspections with regard to more viewpoints.)
This might mean that it’s interesting to combine both techniques: PP
for the first quality level and, beyond that, inspections (when
necessary).
- Conduct an experiment on the effect of swapping pairs [Williams
189, Ambler 174]. The concrete research question still needs to be
defined. Based on the eWorkshop discussion, one interesting question
is whether swapping pairs can replace a third person’s perspective.
- Future eWorkshops or studies should try to refine the defect types
that can be more easily addressed with PP versus those that are
better suited to inspections [Boehm 242, Denger 247, Wiegers 249,
Williams 262].
  o Hypothesis Arisholm (Pre-Meeting Feedback): I suspect that the two
alternative techniques may be useful for detecting different kinds
of defects. For example, formal inspections might detect defects
caused by integration issues better than pair programming. Clearly,
such claims need to be investigated empirically.
- We need more information on the effects of pairing on development
activities other than programming [Ambler 285]. Is there evidence
that pairing is useful for requirements and design as well? (The
vote results [499] show no consensus among the participants; most of
them were not sure.)
  o Hypothesis by Wiegers (Pre-Meeting Feedback): PP works well for
developing code but not for developing other types of software work
products, which also need to be reviewed.