Harris Cooper and Larry V. Hedges (Eds.)
The handbook of research synthesis.
New York: Russell Sage Foundation, 1994.
573 pp. ISBN 0-87154-226-9. $49.95

Review by
Gene V Glass

     The Handbook of research synthesis is the third volume of a
coordinated publication program on meta-analysis sponsored by the
Russell Sage Foundation. Starting in 1987 under the direction of a
Research Synthesis Committee (Harris Cooper, Thomas Cook, David
Cordray, Heidi Hartmann, Larry Hedges, Richard Light, Thomas Louis
and Frederick Mosteller), the project has previously produced The
future of meta-analysis (Wachter and Straf, 1990) and Meta-analysis
for explanation (Cook et al., 1992). The Handbook is by far the
largest and most comprehensive publication of this project. It
means to be the "definitive vade mecum for behavioral and medical
scientists intent on applying the synthesis craft" (p. 7). At
nearly 600 pages and three pounds, researchers will have to leave
their laptops behind. 

     Although the editors and many of the chapter authors eschew
the term "meta-analysis" in favor of the broader "research
synthesis," potential readers should understand that the former
(statistical analysis of summary statistics from published reports)
is the subject of the Handbook and not the more general concerns of
theory commensurability or the planning of coordinated
investigations suggested by the latter. 

     The organization of the Handbook follows the common logic of
producing a meta-analysis: formulate the question, search the
literature, code the information, analyze it, write a report. Some
of the chapters are unremarkable, since much of the craft of doing
research is routine; this only speaks to the completeness of the
work. Chapter 6, "Research Registers" by Kay Dickersin, points to
new possibilities. Medicine has databases of prospective, on-going
and completed studies; Dickersin identifies 26 of them. Expand them
slightly to include the actual data from clinical trials and other
forms of study and many of the more vexing problems of meta-
analysis (which arise from the telescoping of primary data into
summary statistics--and the discarding of the former) will be
solved. It is past time for behavioral research, both on-going and
completed, to be catalogued and archived. Telecommunications has
driven the costs of information storage and retrieval to near zero.
Who will create the Internet Behavioral Research Archives? 

     Two themes imparted by the editors and the committee, one
presumes, give the Handbook of research synthesis its distinctive
character. Chapter 1 by the editors, Harris Cooper and Larry
Hedges, is entitled "Research Synthesis as a Scientific
Enterprise." Research synthesis is likened to doing science itself:
both are seen as involving problem formulation, data collection,
data evaluation, analysis and publication. These stages in both the
pursuit of science and the conduct of research synthesis give the
Handbook its section titles, and perhaps its entire bent. Although
these stages might reasonably describe the stages in carrying out
a meta-analysis, they do not capture what is distinctive about
science. The stages describe equally well how one might conduct the
evaluation of a device, a drug, a program, or what-have-you. In
effect, the Handbook draws no clear or convincing line between the
pursuit of scientific theory and the evaluation of technology. This
line is quite important and must be drawn.  

     To cast meta-analysis as dedicated to the construction of
science inclines the discussion toward the classical statistical
methods that evolved alongside quantitative science in
the 20th century. In particular, the methods of statistical
hypothesis testing have come to be associated with the scientific
enterprise. The unwholesome effects of this association are the
subject of a brilliant article by Paul Meehl (1990) on the progress
of "soft psychology"; see particularly the Appendix where Meehl
briefly addresses meta-analysis. Just as scientists bring forth
hypotheses to be accepted or rejected by data, so do statisticians
devise the strategies by which data are judged to be in accord with
or at odds with the hypotheses. This view of statistics gives the
Handbook its other defining theme: meta-analyses involve the
testing of statistical hypotheses about parameters in populations
of research studies.  

      The appropriate role for inferential statistics in meta-
analysis is not merely unclear; it is seen quite differently by
different methodologists. These differences are not reflected in
the Handbook. In 1981, in the first extended discussion of the
topic, McGaw, Smith and I raised doubts about the applicability of
inferential statistics in meta-analysis. Inference at the level of
persons within studies (of the type addressed by Becker in Chapter
15, "Combining Significance Levels") seemed quite unnecessary to
us, since even a modest-sized synthesis will involve a few hundred
persons (nested within studies) and lead to nearly automatic
rejection of null hypotheses. Moreover the chances are remote that
these persons or subjects within studies were drawn from defined
populations with anything approaching probabilistic techniques;
hence, probabilistic calculations advanced as if subjects had been
randomly selected are dubious. At the level of "studies," the
question of the appropriateness of inferential statistics can be
asked again, and the answer again seems to be negative. There are
two instances in which common inferential methods are clearly
appropriate: when a defined population has been randomly sampled
and when subjects have been randomly assigned to conditions in a
controlled experiment. In the latter case, Fisher showed how the
permutation test can be used to make inferences to the universe of
all possible permutations. But this case is of little interest to
meta-analysts, who never assign units to treatments. The typical
meta-analysis virtually never meets the condition of probabilistic
sampling of a population (though in one instance (Smith, Glass &
Miller, 1980), the available population of drug treatment
experiments was so large that it was in fact randomly sampled for
the meta-analysis). Inferential statistics has little role to play
in meta-analysis: "The probability conclusions of inferential
statistics depend on something like probabilistic sampling, or else
they make no sense." (p. 199) 
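
     To make the point about near-automatic rejection concrete,
consider a minimal sketch (in Python, with purely hypothetical
numbers) of Stouffer's combined-z procedure, one method of the
combining-significance-levels sort that Becker surveys. Ten studies
of 40 persons each, every one showing the same trivially small
effect, yield a "significant" combined result:

    # A hypothetical illustration: Stouffer's combined-z method applied
    # to ten small studies. With a few hundred persons nested in
    # studies, even a trivially small effect rejects the null hypothesis.
    from math import sqrt
    from statistics import NormalDist

    k, n = 10, 40       # 10 studies, 40 persons each: 400 persons in all
    d = 0.10            # a trivially small standardized effect in every study

    z_i = d * sqrt(n)   # expected one-sample z statistic in each study

    # Stouffer's method: Z = (z_1 + ... + z_k) / sqrt(k)
    Z = k * z_i / sqrt(k)
    p = 1 - NormalDist().cdf(Z)   # one-tailed p value

    print(f"per-study z = {z_i:.2f} (no single study is 'significant')")
    print(f"combined  Z = {Z:.2f}, one-tailed p = {p:.3f}")  # Z = 2.00, p = .023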

      It is common to acknowledge that many data sets fail to meet
probabilistic sampling conditions, but to argue that one might well
treat the data in hand "as if" it were a random sample of some
hypothetical population.  Under this supposition, inferential
techniques are applied and the results inspected. The direction
taken by the Handbook editors and authors mirrors the earliest
published opinion on this problem, expressed by Mosteller and his
colleagues in 1977: "One might expect that if our MEDLARS approach
were perfect and produced all the papers we would have a census
rather than a sample of the papers. To adopt this model would be to
misunderstand our purpose. We think of a process producing these
research studies through time, and we think of our sample--even if
it were a census--as a sample in time from the process. Thus, our
inference would still be to the general process, even if we did
have all appropriate papers from a time period." (Gilbert, McPeek
and Mosteller, 1977, p. 127; quoted in Cook et al., 1992, p. 291)
This position is repeated in slightly different language by Hedges
in Chapter 3, "Statistical Considerations": "The universe is the
hypothetical collection of studies that could be conducted in
principle and about which we wish to generalize. The study sample
is the ensemble of studies that are used in the review and that
provide the effect size data used in the research synthesis." (p.
30)  

     These notions appear to be circular. If the sample is fixed
and the population is allowed to be hypothetical, then surely the
data analyst will imagine a population that resembles the sample of
data. Or as Gilbert, McPeek and Mosteller viewed it, the future
will resemble the past if the past is all one has to go on. Hence
all of these "hypothetical populations" will be merely reflections
of the samples in hand and there will be no need for inferential
statistics. Or put another way, if the population of inference is
not defined by considerations separate from the characterization of
the sample, then the population is merely a large version of the
sample. With what confidence is one able to generalize the
character of this sample to a population that looks like the sample
writ large? Well, with a great deal of confidence, obviously. But
then, the population is nothing but the sample.   

      Hedges and Olkin have developed inferential techniques that
bypass the pro forma testing (because of large N) of null hypotheses
and focus instead on estimating regression functions that describe
effects at different levels of study characteristics; nearly all of
these techniques appear in the Handbook. They worry about both
sources of statistical instability: that arising from persons
within studies and that which arises from variation between
studies. As they properly point out, the study based on 500 persons
deserves greater weight than the study based on 5 persons in
determining how treatment effects respond to changes in study
conditions. The techniques they present are based on traditional
assumptions of random sampling and independence. It is, of course,
unclear precisely how the validity of their methods is compromised
by the failure to achieve probabilistic sampling of persons and
studies. 
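
     For concreteness, here is a minimal sketch of the inverse-
variance weighting at issue, using two hypothetical effect sizes and
the textbook large-sample variance of a standardized mean difference
(the numbers are illustrations, not anything from the Handbook):

    # Hypothetical illustration of fixed-effect, inverse-variance
    # weighting: each study's effect size is weighted by 1/v_i, so the
    # 500-person study dominates the 5-person study in the pooled estimate.
    import numpy as np

    d = np.array([0.60, 0.30])   # effect sizes from two hypothetical studies
    n = np.array([5, 500])       # per-group sample sizes (two equal groups each)

    # Standard large-sample variance of a standardized mean difference:
    # v = 2/n + d^2/(4n) for two groups of n persons each
    v = 2 / n + d**2 / (4 * n)

    w = 1 / v                    # inverse-variance weights
    d_bar = np.sum(w * d) / np.sum(w)
    se = np.sqrt(1 / np.sum(w))

    print(f"weights: {np.round(w, 1)}")   # the large study gets ~100x the weight
    print(f"pooled effect = {d_bar:.3f} (SE = {se:.3f})")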

     The irony of traditional hypothesis testing approaches applied
to meta-analysis is that whereas consideration of sampling error at
the level of persons always leads to a pro forma rejection of "null
hypotheses" (of zero correlation or zero average effect size),
consideration of sampling error at the level of study
characteristics (the study, not the person as the unit of analysis)
leads to too few rejections (too many Type II errors, one might
say). Hedges's homogeneity test of the hypothesis that all studies
in a group estimate the same population parameter is the focus of
much attention in the Handbook. Once a hypothesis of homogeneity is
accepted by Hedges's test, one is advised to treat all studies
within the ensemble as the same. Experienced data analysts know,
however, that there is typically a good deal of meaningful
covariation between study characteristics and study findings even
within ensembles where Hedges's test cannot reject the homogeneity
hypothesis. The situation closely parallels the experience of
psychometricians who discovered that they could easily interpret
several more factors than inferential solutions (maximum
likelihood; LISREL) could confirm. The best data exploration and
discovery is more complex and credible than the most exact
inferential test. In short, classical statistics seems unable to
reproduce the complex cognitive processes that data analysts
commonly apply.
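
     A small hypothetical example makes the last point concrete:
effect sizes that rise steadily with some study characteristic can
still leave Hedges's Q statistic comfortably short of significance:

    # Hypothetical illustration of Hedges's homogeneity test:
    # Q = sum of w_i * (d_i - d_bar)^2, referred to chi-square with
    # k-1 df. Effects rise steadily across the five studies, yet the
    # homogeneity hypothesis is not rejected.
    import numpy as np
    from scipy.stats import chi2

    d = np.array([0.10, 0.20, 0.30, 0.40, 0.50])  # effects graded with a study characteristic
    v = np.full(5, 0.04)                          # equal within-study variances (assumed)
    w = 1 / v

    d_bar = np.sum(w * d) / np.sum(w)
    Q = np.sum(w * (d - d_bar) ** 2)
    p = chi2.sf(Q, df=len(d) - 1)

    print(f"Q = {Q:.2f}, p = {p:.2f}")   # Q = 2.50, p = .64: "homogeneous" by the test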


      Rubin (1990) addressed most of these issues squarely and
staked out a radical position that appeals to the author of this
review: "...consider the idea that sampling and representativeness
of the studies in a meta-analysis are important. I will claim that
this is nonsense--we don't have to worry about representing a
population but rather about other far more important things." (p.
155)  These more important things to Rubin are the estimation of
treatment effects under a set of standard or ideal study
conditions. This process, as he outlined it, involves the fitting
of response surfaces (a form of quantitative model building)
between study effects (Y) and study conditions (X, W, Z etc.). Of
the 32 chapters in the Handbook, only the contribution of Light,
Singer and Willett, Chapter 28, "The Visual Presentation and
Interpretation of Meta-Analyses," comes close to illustrating what
Rubin has in mind. The great majority of meta-analyses are
undertaken in pursuit not of scientific theory but of technological
evaluation. The
evaluation question is never whether some hypothesis or model is
accepted or rejected but rather how "outputs" or "benefits" or
"effect sizes" vary from one set of circumstances to another; and
the meta-analysis rarely works on a collection of data that can
sensibly be described as a probability sample from anything. 
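
     A minimal sketch, again with hypothetical numbers, of the
response-surface idea Rubin describes: fit a weighted regression of
effect size on a study condition, then read off the estimated effect
under an "ideal" condition rather than test any hypothesis about a
population of studies:

    # Hypothetical illustration of response-surface estimation: weighted
    # least squares regression of effect size (Y) on a study condition
    # (X), evaluated at an "ideal" condition beyond most studies at hand.
    import numpy as np

    X = np.array([10., 20., 30., 40., 50.])      # hypothetical dose in each study
    Y = np.array([0.12, 0.22, 0.28, 0.41, 0.49]) # observed effect sizes
    w = np.array([30., 120., 60., 200., 90.])    # weights, e.g. study sample sizes

    # Weighted least squares fit of Y = b0 + b1 * X
    A = np.column_stack([np.ones_like(X), X])
    W = np.diag(w)
    b = np.linalg.solve(A.T @ W @ A, A.T @ W @ Y)

    ideal = 60.0   # an "ideal" dose at which to estimate the effect
    print(f"fitted surface: Y = {b[0]:.3f} + {b[1]:.4f} * dose")
    print(f"estimated effect at dose {ideal:.0f}: {b[0] + b[1] * ideal:.2f}")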

     Rubin's view of the meta-analysis enterprise would have
produced a volume substantially different from that which Cooper
and Hedges edited. So we can expect the Handbook of research
synthesis to be not the last word on the subject, but one important
word on meta-analysis. 


 REFERENCES 

Cook, T.D.; Cooper, H.; Cordray, D.S.; Hartmann, H.; Hedges, L.V.;
Light, R.J.; Louis, T.A. & Mosteller, F. (1992). Meta-analysis for
explanation: A casebook. New York: Russell Sage Foundation.

Gilbert, J.P.; McPeek, B. & Mosteller, F. (1977). Progress in
surgery and anesthesia: Benefits and risks of innovative surgery.
In J.P. Bunker, B.A. Barnes & F. Mosteller (Eds.), Costs, risks and
benefits of surgery. New York: Oxford University Press.

Glass, G.V; McGaw, B. & Smith, M.L. (1981). Meta-analysis in social
research. Beverly Hills, CA: Sage.

Meehl, P.E. (1990). Why summaries of research on psychological
theories are often uninterpretable. Psychological Reports, 66,
195-244. (Monograph Supplement 1-V66)

Rosenthal, R. (1984). Meta-analytic procedures for social research.
Beverly Hills, CA: Sage.

Rubin, D.B. (1990). A new perspective. Chapter 14 (pp. 155-165) in
Wachter, K.W. & Straf, M.L. (Eds.), The future of meta-analysis.
New York: Russell Sage Foundation.

Smith, M.L.; Glass, G.V & Miller, T.I. (1980). Benefits of
psychotherapy. Baltimore, MD: Johns Hopkins University Press.

Wachter, K.W. & Straf, M.L. (Eds.) (1990). The future of
meta-analysis. New York: Russell Sage Foundation.