In principle, clinicians and researchers have a common purpose: they want practice to be as effective, relevant, and safe as possible. One of the most important contributions that research can make to this end is to minimize bias. Clinicians, however experienced, have a relatively small pool of experience from which to draw conclusions that they hope will improve their practice. The range of patients (clients) they see is limited in variety and severity of problem, stage of life, and situation; their ideas on what therapy can achieve are narrowed by restrictions in the type or types of therapy practiced and the constraining imperatives of the clinical settings in which they work. Any judgment is likely to be based on partial evidence and heavily influenced by recent selective experience. Research can help individual clinicians, researchers, and other stakeholders stand to one side of their inevitable bias and take a fresh look at what is revealed. Research can also enhance the validity of impression, elucidate processes, and provide evidence to confirm or disconfirm received clinical wisdom.
Research can help one think again about what actually happens in therapy—the outcomes and processes—and how and why they happen. All psychotherapy is an experiment. Interactions occur; clinicians make their best judgment on what to do next; what happens is reviewed in the light of experience and new judgments are made; the effects of what is done or not done are reviewed in turn. On a pragmatic level, it is an experiment influenced by feedback and judged against the markers of progress, e.g., enhanced alliance, deepened empathy, fuller understanding, problem resolution, and patient satisfaction. Research is a systematic form of experiment in which the significance of elements in the clinical story is tested by being held constant, varied, or brought into prominence for detailed scrutiny. Finally, research is an essential element in reliable communication between colleagues; it brings to the table the discipline of clear definition, the benefit of transparent methodology, the value of access to the experience of others through shared results, and the potential for replication and further testing of conclusions.
Of course, this ‘best of all possible worlds’ ideal is difficult to achieve. Most of the time, the answers provided by research are small, sometimes contradictory, increments in knowledge, a few of which add to our understanding of complex issues. Rarely are results definitive; studies generally breed more questions than answers. Grand epiphanous ideas have to be scaled down to what can be achieved in the available time and resources. For good methodological reasons, what is studied may not be representative of everyday clinical practice. Results may take years to arrive and have limited generalizability: they have to be interpreted in light of clinical and social context, researcher bias, and allegiance. Sometimes, however, research can meet the seven desiderata of being representative, relevant, rigorous, refined, realizable, resourced, and revelatory (Aveline et al., 1995). Then the labor of doing research can bring great rewards.
For the practitioner, research begins with pressing clinical questions whose answers may improve practice. Research may focus on qualitative questions, e.g.:
What is the effect of making this or that intervention in a session?
Does the effect persist from one session to the next?
What is the relationship between sessions that go well or badly and outcome at termination?
What are the contributory factors and how do they interact?
In what way do patients, therapists, and significant others perceive sessions or the therapy differently?
How may a patient's intrapsychic or interpersonal conflicts be formulated? Can this be done reliably? Can such formulations benefit clinical practice?
How do patients’ narratives alter over time and what relationship does this have to psychotherapy theory?
Or on quantitative questions, e.g.:
Is this treatment more effective than other treatments?
How does the efficacy of new therapies conducted under experimental conditions translate into clinical effectiveness in everyday work?
What extra gain can a patient expect to derive from a therapy that lasts for 50 as opposed to 25 sessions?
Does the gain justify the cost of the larger investment in time and duration?
Do different forms of gain accrue with different durations of therapy?
Do different therapies have different effects?
Which patients with what conditions do best with what therapy?
What training is necessary to maximize gain or minimize harm from therapy?
Studying process is the subject of qualitative research. In focus, it is in the same domain as that considered by a clinician in internal or external supervision; the difference is in the degree of systematization. Quantitative research provides an empirical, often controlled, means of validating and refining psychotherapy theory and practice.
Having set out a rationale for doing research, we consider basic principles of methodology and give a brief history of the field before summarizing what we know about outcomes and process. Then we discuss how to implement evidence into clinical practice. Finally, we anticipate future directions. An appendix contains guidance on how to read a research paper.
In researching clinical problems in psychotherapy, investigators can call upon a wide range of methodologies, some well established in classical empirical research in medicine and psychology and some breaking new ground in their exploration of subjectivity and individual meaning. Each has its own potential and limitations in providing significant answers.
The method chosen depends on where one is in the cycle of developing or refining a therapy. New ideas for practice begin with clinical observation or theoretical inference. The hypothesized therapeutic effect might be evaluated through observational single case studies. Should these generate large effect sizes (see Roth and Fonagy, 1996, pp. 379–8), this would indicate that there could be something worthwhile in the innovation. The next step would be small-scale single group designs, i.e., uncontrolled naturalistic studies. A major step up in rigor would be to move to a randomized controlled trial (RCT); this is an acid test of efficacy (see below). Results under controlled conditions, however, do not necessarily generalize to everyday clinical practice. Effectiveness has to be established through field trials; these establish generalizability (see Implementing evidence into clinical practice). Finally, dismantling studies tease out the effective ingredients of the practice being studied. Qualitative studies at each stage can be a rich source of ideas about the process of change.
Going through the sequence once is not enough. New perspectives arising from the findings at various stages prompt new paths through the cycle; studies need to be repeated to test the robustness of the findings. All this has to be evaluated against a standard of clinically significant change (see Outcome section), a more stringent and relevant standard than simple statistical significance (Jacobson and Truax, 1991; Ogles et al., 2001).
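As a concrete illustration of the reliable change criterion, here is a minimal sketch of the Jacobson and Truax (1991) reliable change index (RCI). The scores, standard deviation, and reliability below are illustrative assumptions; in practice they would come from the measure's norms.

```python
import math

def reliable_change_index(pre: float, post: float, sd_pre: float, reliability: float) -> float:
    """RCI = (post - pre) / SE_diff, where SE_diff reflects measurement error."""
    se_measurement = sd_pre * math.sqrt(1.0 - reliability)  # standard error of measurement
    se_diff = math.sqrt(2.0 * se_measurement ** 2)          # SE of a difference score
    return (post - pre) / se_diff

# Hypothetical example: a symptom score falls from 30 to 18 on a scale with
# SD 7.5 and test-retest reliability .80; |RCI| > 1.96 counts as reliable change.
rci = reliable_change_index(pre=30, post=18, sd_pre=7.5, reliability=0.80)
print(f"RCI = {rci:.2f}; reliable change: {abs(rci) > 1.96}")
```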
The difference between efficacy and effectiveness is crucial in understanding the divergence between research and service planning.
Outcome studies are commonly divided into studies determining the efficacy of a treatment versus studies focusing on a treatment's effectiveness (Seligman, 1995; Strauss and Kächele, 1998; Lambert and Ogles, 2004).
Efficacy is determined by (randomized) clinical trials in which as many variables as possible are controlled in order to demonstrate unambiguously the relationship between treatment and outcome, and potentially infer causal relationships from the findings (Strauss and Wittmann, 1999).
Efficacy studies emphasize the internal validity of the experimental design through random assignment to treatments, controlling the types of patients included with respect to their diagnosis (commonly excluding patients with comorbid disorders), using manualized treatments, pretraining the therapists in the study treatments, and monitoring adherence to the treatment manual. These parameters ensure uniformity of therapy and enable other researchers to replicate the investigation. The price of high internal validity is usually poor external validity; the nature of the intervention is clear and consistent but unrepresentative of everyday practice and, thus, the findings of the study may not generalize. An example of an important efficacy study is the NIMH Collaborative Depression Study (Elkin, 1994; Krupnick et al., 1996; Ogles et al., 2001) in which patients with major depressive disorder were randomly assigned to four treatments: imipramine plus clinical management, placebo plus clinical management, cognitive-behavior therapy (CBT), and interpersonal psychotherapy (IPT). One surprising result was that there was little evidence for the superiority of any one treatment over the placebo condition. At least two explanations have been put forward to account for this ‘negative result’. One is that the ‘placebo’ was not inert—it involved frequent contact with therapists, albeit of a supportive nature. It is possible that ‘nonspecific’ therapy effects may have been making a contribution to good outcomes even in this arm. Second, as many other studies have shown that CBT is superior to placebo, it has been suggested that this result may have reflected a fault of ‘quality control’ in this study, and that CBT was poorly delivered in one center. All of which goes to show how complex a business it is to mount a large-scale psychotherapy evaluation study of this sort.
Effectiveness studies, on the other hand, focus on clinical situations and the implementation of a treatment in clinical settings. Such studies emphasize the external validity of the experimental design: patients usually are not preselected, treatments commonly are not manualized, and the duration of the treatment and other setting-related characteristics are not controlled. These clinically representative studies show how interventions perform in routine clinical practice (Shadish et al., 1997). Their weakness is the converse of the strength of efficacy studies: it is difficult to know what was done, when, and how. The variability inherent in effectiveness studies makes it much harder to disentangle what the therapeutic elements were and to replicate the work in other settings. An example of an effectiveness study is the German multisite study on inpatient psychotherapy for patients with eating disorders (Kächele et al., 2001). Questions examined in this prospective, naturalistic design included: What is the effectiveness of inpatient psychodynamic therapy for eating disorders? What factors determine the length of treatment? How do treatment duration and intensity contribute to effectiveness? Can such effects be attributed to specific patient characteristics?
Naturalistic or effectiveness studies are the principal research approach for the assessment of outcome in treatments that are hard to assess within a controlled clinical trial, either because of formal characteristics (e.g., treatment length) or for ethical reasons (e.g., impracticalities in randomizing subjects to treatments), such as inpatient treatments or long-term psychoanalysis. Examples of representative effectiveness studies from the psychoanalytical field include the Menninger Psychotherapy Research Project (Wallerstein, 1986), the Heidelberg Psychosomatic Clinic Study (Fonagy, 2001), and the Berlin Multicenter Study on psychoanalytically oriented treatments (Rudolf, 1991) (see the ‘open door review of outcome studies in psychoanalysis’, Fonagy, 2001).
The first question usually asked about any psychotherapy is: ‘does it work?’ The most widely respected way to answer this question is by a randomized controlled trial (RCT).
RCTs are an adaptation of the experimental method, which is the closest science has come to a means for demonstrating causality. The logic of the experimental method is that if all prior conditions except one (the independent variable) are held constant (controlled), then any differences in the outcome (the dependent variable) must have been caused by the one condition that varied. For example, if one patient is given psychotherapy and another identical patient is not, but is treated identically in all other respects, then any differences in their outcomes must have been caused by the therapy. Difficulties arise in applying the experimental method to study psychotherapy because no two people are identical and because it is impossible to treat two people identically in all respects except for the theoretically specified treatment (Haaga and Stiles, 2000).
RCTs address the differences among patients statistically. Rather than comparing single patients, investigators randomly assign patients to groups that are to receive the different treatments, on the assumption that any prior differences that might affect the outcomes will be more-or-less evenly distributed across the groups. Even though individuals’ outcomes might vary within groups (because patients are not identical), any mean differences between groups beyond those due to chance should be attributable to the different treatments.
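The same logic can be expressed as a minimal simulation, a sketch rather than a real trial: random assignment balances unknown patient differences on average, so the between-group mean difference estimates the treatment effect. All numbers below are invented for illustration.

```python
import random
import statistics

random.seed(1)
patients = list(range(40))
random.shuffle(patients)                       # random assignment to groups
treated, control = set(patients[:20]), set(patients[20:])

# Hypothetical outcomes: treated patients are drawn from a distribution with a
# higher mean improvement; individual variation remains within both groups.
outcome = {p: random.gauss(10 if p in treated else 6, 4) for p in patients}

diff = (statistics.mean(outcome[p] for p in treated)
        - statistics.mean(outcome[p] for p in control))
print(f"mean difference (treated - control): {diff:.2f}")
```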
Researchers have attempted to standardize psychotherapeutic treatments by constructing treatment manuals (e.g., Beck et al., 1979; Elliott et al., 2004) and by assessing treatment delivery via studies of adherence and competence (e.g., Shapiro and Startup, 1992; Startup and Shapiro, 1993; Waltz et al., 1993).
Some investigators speak of quasi-experimental designs (T. D. Cook and Campbell, 1979), which refer to comparisons between groups of patients who were not randomly assigned—for example, groups of patients who seem generally comparable but were assigned to different treatments on some other basis, perhaps because they appeared before or after the introduction of a new program or because of scheduling constraints or because they were treated at different sites. Such designs are often more feasible than strict RCTs; indeed they may appear as natural experiments, in which apparently similar groups happen to receive contrasting treatments. In such cases, however, there are always variables that were confounded with the variable of interest, so the evidence of causality is, to some degree, ambiguous.
Another major genre in psychotherapy research is the process-outcome study, which uses a correlational approach. Correlational studies are those in which two (or more) variables are observed, and the degree to which they covary is assessed.
In a widely cited article, Yeaton and Sechrest (1981) argued that effective psychotherapeutic treatments should contain large amounts of helpful change ingredients (strength) and should be delivered in a pure manner (integrity). If the theory underlying the treatment is correct, then delivering interventions with strength and integrity should be effective in producing client change. This view of process-outcome relations has been called the drug metaphor (Stiles and Shapiro, 1989; Stiles and Shapiro, 1994). This logic suggests that clients who receive a larger quantity or greater intensity of the helpful ingredients (process variables) should show greater improvement (outcome variables), so that process and outcome should be positively correlated across patients. Much process-outcome research has adopted this drug metaphor and sought to assess the relationship of process ingredients with outcome by correlating the process and outcome measures. It has been assumed that this method would allow researchers to determine which process components are the active ingredients, which should be positively correlated with outcome, and which are merely inert flavors and fillers, uncorrelated with outcome (Orlinsky et al., 1994). Some, however, including ourselves, suggest that this reasoning may be misleading (e.g., Stiles, 1988; Stiles et al., 1998).
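Under the drug metaphor, the analysis is essentially correlational: measure the amount of a putative ingredient for each patient and correlate it with outcome. A minimal sketch with invented data follows; as discussed later under responsiveness, such correlations can be misleading.

```python
import statistics

# Hypothetical per-patient data: frequency of a process ingredient
# (e.g., interpretations per session) and an outcome change score.
ingredient = [3, 7, 2, 9, 5, 6, 4, 8]
improvement = [1.0, 2.5, 0.8, 3.1, 1.9, 2.2, 1.4, 2.8]

r = statistics.correlation(ingredient, improvement)  # Pearson r (Python 3.10+)
print(f"process-outcome correlation r = {r:.2f}")
```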
Since long before Freud, case studies have been a standard tool for investigating the theory and practice of psychotherapy. Although they are vulnerable to significant bias and distortion, as investigators unintentionally (or intentionally) perceive and report data selectively, case studies have always been a principal source of ideas and theories about psychotherapy (Aveline, in press).
Theoretically based case studies can be confirmatory as well as exploratory. Interpretive and hypothesis-testing research are alternative strategies for scientific quality control on theory (Stiles, 1993, 2003). In hypothesis-testing research, scientists extract or derive one statement (or a few statements) from a theory and compare this statement with observations. If the observations match the statement (that is, if the scientists’ experience of the observed events resembles their experience of the statement), then people's confidence in the statement is substantially increased, and this, in turn, yields a small increment of confidence in the theory as a whole. In case studies, however, investigators compare a large number of observations based on a particular individual with a correspondingly large number of theoretical statements. Such studies ask, in effect, how well the theory describes the details of a particular case. The increment or decrement in confidence in any one statement may be very small. Nevertheless, because many statements are examined, the increment (or decrement) in people's confidence in the whole theory may be comparable with that stemming from a statistical hypothesis-testing study. A few systematically analyzed therapy cases that match a clinical theory in precise or unexpected detail may strongly support a theory, even though each component assertion may remain tentative when considered separately.
Qualitative research differs from traditional quantitative research on human experience in several ways. Results are typically reported in words rather than primarily in numbers. This may take the form of narratives (e.g., case studies) and typically includes a rich array of descriptive terms, rather than focusing on a few common dimensions or scales. Investigators use their (imperfect) empathic understanding of participants’ inner experiences as data. Events are understood and reported in their unique context; theory is generated from data. Materials may be chosen for study because they are good examples rather than because they are representative of some larger population. Sample size and composition may be informed by emerging results (e.g., cases chosen to fill gaps; data gathering continued until new cases seem redundant). One well-known form of qualitative research is grounded theory (Glaser and Strauss, 1967).
Grounded theory starts not from a pre-existing theory or hypothesis, but ‘bottom up’ from experience-near observations. It tries to derive theoretical categories from the commonalities that emerge from a multitude of such observations. These categories are then ‘back-tested’ against the raw experiential data; if they stand up, this gives confidence that the theoretical principles that emerge are based in reality, not in the prior preconceptions of the researcher or clinician. The whole thrust is therefore an attempt to circumvent the inherent observer bias found in psychotherapy, in which Kleinian therapists see ‘Kleinian’ material in their clients, Jungians find ‘Jungian’ themes, and so on.
Emancipation or enhancement of the lives of participants may be considered a legitimate purpose of the research. As a consequence of these characteristics, interpretations are always tentative and bound by context (Stiles, 2003).
A scientific theory can be understood not as an organized edifice of facts but as an understanding that is shared to varying degrees by those who have propounded it or been exposed to it. In this view, research is cumulative not because each new observation adds a fact to an edifice but because each new observation that enters a theory changes it in some way. The change may be manifested, for example, as a greater or lesser confidence in theoretical assertions, as the introduction or revised meanings of terms, or as differences in the way particular ideas are phrased or introduced. In this view, theory can be considered as the principal product of science and the work of scientists as quality control—ensuring that the theories are good ones by comparing them with observations (Stiles, 2003). If science is understood in this way, theory is just as central in interpretative (qualitative) research as it is in hypothesis-testing research.
Not all qualitative investigators of psychotherapy see quality control on scientific theory as their main activity. Some instead use alternative forms of discourse that can be described as hermeneutic, after Hermes, the messenger (e.g., Rennie, 1994a, b; Rhodes et al., 1994; McLeod and Lynch, 2000). This alternative discourse form represents a distinct sort of intellectual activity, entails different goals and procedures, and yields distinct products. The goal of hermeneutic discourse can be described as deepening. The activity consists in understanding what the target material, such as some text or concept, has meant or could mean to other people. Put another way, it is unpacking the experiences that have been or could be embodied in the words and other signs of the target material. Insofar as most words have very long histories, this process is potentially endless. Packer and Addison (1989) and Rhodes et al. (1994) described this process of unpacking as the hermeneutic circle: observing, interpreting, reviewing through the new interpretation, revising, and so forth. The product is thus a series of reinterpretations, leading to ever-deeper understandings but not necessarily to a unified synthesis (Hillman, 1983; Woolfolk et al., 1988). The exploration of alternatives is itself the product of the activity rather than a means of developing a particular theory. The understanding achieved is valued for its depth—the richer appreciation—not necessarily because it is simpler or more unified.
Reductionism. The trade-off between the grand idea and a do-able study is simplification. Problems arise when the essence of the natural complexity of human problems is fractured by the pragmatics of doing research. The large picture can be lost in attending to the micro-focus; the fascinating minute process may be irrelevant to the overall outcome.
Nonrepresentativeness. In order to control variables, or simply to live within the constraints of the available resources, selective choices are made of type and intensity of disorder, duration of therapy, and experience and competence of therapists; these simplify the clinical field and result in nonrepresentative findings with limited generalizability.
Context. RCTs are snapshots in time, capturing the performance of sets of patients and therapists. Generalizing conclusions from one set to another needs to be done cautiously as much of the variance lies with the particularity of each set. Even within a service that evaluates its performance over time periods that are appropriate for the practiced therapy and demonstrates effectiveness, the healing therapists who contributed to the success may long since have gone.
Mistaking what is studied for what is important. It is easy to assume that what is positively correlated is causally related. Unappreciated important intervening (confounding) variables may lurk out of sight, yet to be discovered.
False positivism. By virtue of objectified methodology, there is a risk of giving false certainty to the external world when the inner world is essentially subjective and idiosyncratic.
Emphasis on mental disorders. Categorical diagnosis implies that disorders are discrete entities with possibly different etiologies and treatments. The categorical view, which is an import from medicine, does not necessarily fit the ‘problems in living’ presentations that are the province of psychotherapy and a major part of the work of psychiatry. It overemphasizes difference between conditions and underplays the alternative view that the great range of nonpsychotic symptomatology is better seen as a single manifestation of disturbance whose origins need to be understood and formulated (Aveline, 1999).
Comorbidity. On grounds of practicality or an intention to concentrate on ‘pure’ disorders, many studies specifically exclude comorbidity and, in particular, Axis II disorders. This is not representative of the real world. Also, from a psychodynamic perspective, how a person reacts to the world is a function of their personality; Axis II depicts exaggerated forms of personality dimensions.
Therapeutic change is not linear. Early in the development of physics, sequential rules were thought to govern processes; outcomes were the predictable consequence of interactions; cause and effect were linked in a pas de deux. Just as Heisenberg's uncertainty principle guides—if that is the right word—modern physics, so does uncertainty rule in psychotherapeutic interactions. Progress may be followed by regression, the gain of symptom reduction blunts the spur of discomfort, life choices are perceived in new light as fresh insights are gained, intentionality alters, and significant others in the subject's life have their own influential agendas. Similarly, therapy interventions are not linear but hermeneutic.
Manualization prioritizes internal validity over external validity. It increases reliability and replicability, but may decrease reflexivity. If the manual is highly prescriptive, therapist responsiveness may be limited, thereby restricting a key factor in successful therapy. Except in training, clinicians rarely follow manuals to the letter in everyday clinical practice.
Randomization may conflict with subject preference for therapy. It is also difficult to provide comparable control interventions.
Measures. Symptoms are easier to measure than problems in relationships and, in their generic form, are a good marker of distress. They need to be supplemented by domain-specific measures, including measures of interpersonal functioning.
Statistical problems. In order to have the possibility of significant results, trials must have sufficient power (Cohen, 1977). For example, a comparison of two treatments with an 80% chance of detecting a true difference between groups (a ‘medium effect’ in Cohen's terms) would require 64 subjects in each group (Shapiro et al., 1995). All studies have subject attrition, but often the attrition is selective and, if not allowed for, biases the results. Studies should report results based on intention to treat, i.e., include all potential clients, encompassing those who fail to start or drop out at an early stage, as well as ‘finishers’. In addition, simple pre-post testing does not give robust results; what is needed is a measure of clinically significant change. Statistical significance alone is a poor guide for clinical practice. Better statistics are relative risk, confidence intervals, and numbers needed to treat (NNT; R. J. Cook and Sackett, 1995; Altman, 1998; Jacobson et al., 1999). The latter refers to the number of patients who would need to be treated in order to produce one additional good outcome compared with an untreated (i.e., spontaneously recovering) group—the smaller the NNT, the more useful a therapy is considered to be.
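To make two of these statistics concrete, here is a minimal sketch of an approximate power calculation and of the NNT; the effect size and improvement rates are illustrative assumptions.

```python
import math

def n_per_group(d: float) -> int:
    """Approximate n per group for a two-tailed test at alpha = .05, power = .80."""
    z_alpha, z_beta = 1.96, 0.8416   # normal quantiles for .975 and .80
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

def nnt(rate_treated: float, rate_control: float) -> float:
    """Number needed to treat = 1 / absolute difference in improvement rates."""
    return 1.0 / (rate_treated - rate_control)

print(n_per_group(0.5))   # ~63; Cohen's tables give 64 for a medium effect
print(nnt(0.60, 0.40))    # 5.0: treat five patients for one extra good outcome
```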
Allegiance effects. Clinicians and researchers often have loyalties to the therapy being studied. This introduces significant, systematic bias in favor of the preferred approach. Allegiance should be declared in write-ups.
Group results do not predict individual reaction. Research findings can inform practice but cannot be an absolute guide to what therapy to recommend at assessment or, in clinical practice, what to do for the best in a therapy session. This is an important caveat to set against the enthusiasm of health purchasers and planners for what they may see as the hard facts of empirical research.
Therapy is not the only change factor in patients’ lives. Unpredictable negative or positive events in someone's life may lead to change unrelated to therapy.
Research has to be ethical. Subjects should be seen as equals with a vital interest in process and outcome. Their interest is considered explicitly when approval is sought from the relevant Ethical Committee. If subjects are to be randomized to treatments, the clinicians have to be confident that the alternatives are of equal value, i.e., there is equipoise (Lilford and Jackson, 1995); they also need to consider how the patient might differentially value what is on offer (Lilford, 2003). The possibility of doing harm must be minimized. This does not mean that interventions have to be risk-free; this would be impossible with an active intervention such as psychotherapy, but the risks need to be anticipated and the subject given sufficient information to make an informed choice. Some research designs specifically allow for subject preference.
Traditionally, detailed case accounts have been the source of theory and the way for clinicians to illustrate their work. However, regulatory bodies in medicine, psychology, psychotherapy, and counseling are increasingly restrictive in allowing case material to be published without the written consent of the subject. One can imagine that consent might not be given in the very cases that would be the most valuable for learning but which were problematic in some way for the original therapy dyad. A solution needs to be found that balances privacy and the legitimate needs of the field. Fortunately, for the most part, subjects give consent when the purpose is explained or when they see a draft of what is to be written and have the opportunity to comment.
The British Association for Counselling and Psychotherapy has adopted as policy an excellent framework for ethical practice in general (Bond et al., 2002) which has now been supplemented by specific research guidance (Bond, 2004).
Orlinsky and Russell (1994) divided the history of psychotherapy research into four phases, marked by the publication of distinct sets of synthetic reviews of the field and distinct types of major research projects.
Phase I (c. 1927–54) was a pioneering period, characterized as establishing a role for scientific research, in which investigators began tabulating therapeutic outcomes and (in the 1940s) recording psychotherapy sessions for process research.
Phase II (c. 1955–69), characterized as searching for scientific rigor, was marked by investigators ‘developing objective methods for measuring the events of recorded therapy sessions’ and ‘demonstrating effectiveness in controlled experiments’ (p. 193).
Phase III (c. 1970–83), characterized as expansion, differentiation, and organization, was marked by the growth of scientific organizations devoted to psychotherapy research and by increasing conceptual and methodological sophistication, as well as innovation, illustrated in comparative outcome studies, phenomenological and task-analytic approaches to process, and the use of meta-analytic reviewing techniques.
Phase IV (c. 1984–94 and beyond), characterized as consolidation, dissatisfaction, and reformulation, has been a period in which continuing growth in the sophistication of methods that now seem traditional has been accompanied by fundamental doubts about their appropriate application to the human enterprise of psychotherapy, and the proposal of alternatives. Some of these issues are touched upon in our section on Methodology.
Barkham (2002) sees the progression moving from justification (is psychotherapy effective?) to specificity (which psychotherapy is effective?) to efficacy and cost-effectiveness (how can therapies be made more effective?) to effectiveness and clinical significance (how can the quality and delivery of therapy be improved?).
The term outcome describes all aspects of changes that patients can make during psychotherapy. The specific definition of outcome depends on the perspective of the stakeholder assessing the outcome (i.e., the patient, his or her social group, the therapist, representatives of the healthcare system, such as insurance companies, or the society as a whole). It also depends on the specific goals of a treatment or a treatment model (Ambühl and Strauss, 1999).
Ideally, outcome should be measured using multiple criteria, dimensions, measures, and modes, all on multiple occasions. Outcome should be related to the circumstances of a problem, the specific symptoms associated with the problem, and long-term consequences of a treatment. Schulte (1995) has proposed a classification system for the assessment of treatment success that differentiates content and methodological dimensions (see Table 38.1).
Measures related to the causes of a problem (the ‘defect’ such as impaired ego-functions, a discrepancy between perceived and ideal self, or specific cognitive strategies) mostly reflect the theoretical basis of the treatment model and are therefore school specific. On the level of symptoms, a wide variety of disorder-specific measures, independent of the theoretical model, are available. Finally, on the level of consequences, Schulte (1995) proposes outcome measures that are related to the ‘sick role’ (i.e., the utilization of healthcare services, or the subjective experience of the sick role) and to the impairment of normal roles (i.e., related to work, social activities, social relationships).
Methodological design structures the investigation and determines the generalizations that can be made across time, settings, behaviors, and subjects. An essential component is operationalization, i.e., the decisions that have to be made about the specific methods or instruments used to measure change and about the definition of outcome criteria, e.g., the amount of change or the degree of goal attainment that has to be reached in order to count as significant.
Table 38.1 Conceptual and methodological aspects of outcome measurement (according to Schulte, 1995)
Historically, outcome research dates back to the 1930s, when clinicians started to tabulate systematically the benefits achieved by their patients, e.g., Fenichel at the Berlin Psychoanalytical Institute (Fenichel, 1930). Many outcome studies were stimulated by a provocative article that Eysenck published in 1952, in which he drew the conclusion that psychotherapy was no more effective than spontaneous remission (Eysenck, 1952). It took considerable time and numerous research efforts until McNeilly and Howard (1991), using Eysenck's original data set, were able to show that psychotherapy produced the same recovery rate after 15 sessions as spontaneous remission did after 2 years!
Our knowledge about the outcomes of psychotherapy is based on numerous studies of comparative treatment efficacy as well as effectiveness studies. After several decades of research, the controversy about the general outcome of psychotherapy has largely been ended through the use of meta-analyses. Meta-analyses provide a tool for summarizing single studies of the efficacy and effectiveness of psychotherapeutic treatment by applying the methods and principles of empirical research to the process of reviewing the literature. This procedure usually results in a summary statistic, the effect size, which quantifies the cumulative effects demonstrated within the single studies included in the review.
In a recent summary of the outcome literature, Lambert and Ogles (2004) concluded: ‘While the methods of primary research studies and meta-analytic reviews can be improved, the pervasive theme of this large body of psychotherapy research must remain the same—psychotherapy is beneficial. This consistent finding across thousands of studies and hundreds of meta-analyses is seemingly undebatable.’ (p. 148).
One milestone in the development of the meta-analytic methodology was the publication of M. L. Smith et al.'s (1980) article summarizing 475 single studies on the outcome of psychotherapy. The authors reported an average effect size of 0.85 for the comparison of treated and untreated groups. The statistic indicates that the average person treated in psychotherapy is better off than 80% of untreated people.
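A minimal sketch of the two computations involved: pooling study-level effect sizes into a weighted mean, and translating an effect size into a percentile statement. The three study values and weights below are invented; only the 0.85 figure comes from the text.

```python
from statistics import NormalDist

# Hypothetical study-level effect sizes (d) with illustrative weights
# (in practice, inverse-variance or sample-size weights would be used).
studies = [(0.9, 50.0), (0.7, 80.0), (1.0, 30.0)]
pooled_d = sum(d * w for d, w in studies) / sum(w for _, w in studies)
print(f"pooled effect size d = {pooled_d:.2f}")

# Interpreting d = 0.85: the average treated patient scores above
# Phi(0.85) ~ 80% of the untreated comparison distribution.
print(f"percentile of average treated patient: {NormalDist().cdf(0.85):.2f}")
```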
Following M. L. Smith et al.'s report, a large number of meta-analyses have been conducted that summarize the general effects of psychotherapy as well as the effects of treatments for specific disorders (e.g., anxiety disorders or depression) and of specific treatment models (e.g., CBT or psychodynamic therapy). In a review of a total of 302 meta-analyses of different treatments, Lipsey and Wilson (1993) concluded that ‘the evidence from meta-analysis indicates that psychological, educational, and behavioral treatments studied by meta-analysts generally have positive effects’ (p. 1198). Similar and consistent results showing that psychological treatments were superior to control conditions have been obtained in numerous reviews focusing on specific disorders and specific treatment settings, such as small group treatment (Burlingame et al., 2004). In addition, reviews of outcome studies support the cost-effectiveness of psychotherapy (Chiles et al., 1999; Gabbard et al., 1997) and show that treatment effects are maintained for several years after treatment (Stanton and Shadish, 1997). It is important to note that a small number of patients (5–10%) get worse during psychotherapeutic treatments (deterioration effect) (Mohr, 1995).
On the way to reaching this favorable position, outcome research has passed several important milestones, characterized by increased methodological sophistication. Outcome measurement has been differentiated by the specific goals of treatments and standardized measures have been developed for quality assurance, including core batteries targeted on specific issues (Strupp et al., 1997). Starting in the 1970s, the development of approaches to determine and test individual changes and their clinical meaning became increasingly important. One of these is the determination of the social validity of individual outcome. Social validity is based upon social comparisons, i.e., the evaluation of changes related to a normal reference group, or on subjective evaluation ‘by gathering data about clients by individuals who are likely to have contact with the client or are in a position of expertise’ (Kazdin, 1998, p. 387). In addition, several statistical methods have been developed to determine the clinical significance of treatment interventions. Clinical significance is usually based upon the stringent definition that (1) treated clients make statistically reliable improvements as a result of treatment, and (2) treated clients are empirically indistinguishable from ‘normal’ peers following their treatment (Jacobson et al., 1999; Kendall et al., 1999; Lambert and Ogles, 2004).
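Criterion (2) is often operationalized with a cutoff score placed between the dysfunctional and the ‘normal’ distributions. Here is a minimal sketch of the Jacobson-style weighted-midpoint cutoff, with invented means and SDs rather than real norms.

```python
def clinical_cutoff(m_dys: float, sd_dys: float, m_norm: float, sd_norm: float) -> float:
    """Cutoff c: the point between the two distributions, weighted by their SDs."""
    return (sd_norm * m_dys + sd_dys * m_norm) / (sd_norm + sd_dys)

# Hypothetical symptom scale: dysfunctional mean 30 (SD 8), normal mean 12 (SD 6).
c = clinical_cutoff(m_dys=30, sd_dys=8, m_norm=12, sd_norm=6)
print(f"cutoff c = {c:.1f}; post-treatment scores below c fall within the normal range")
```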
As well as investigating the general efficacy of psychotherapy, outcome research has focused on the relative effectiveness of different treatments. Different treatment modes such as psychodynamic, behavioral, cognitive, or humanistic approaches have been tested in comparative studies. Reviews of comparative studies can be divided into several phases (Lambert and Ogles, in press). Many older reviews reached the startling conclusion that the outcomes of alternative psychotherapies are equivalent. The equivalence paradox (Stiles et al., 1986) points to the puzzle that the outcomes of varied psychotherapies appear more-or-less equivalently positive even though their treatment techniques are very different (Luborsky et al., 1975; Lipsey and Wilson, 1993; Lambert and Bergin, 1994; Norcross, 1995). The evidence is often summarized as the Dodo verdict: ‘Everybody has won, and all must have prizes’ (Carroll, 1946, p. 28; original work published 1865; italics in original). The Dodo verdict may be an overstatement (Beutler, 1991; Chambless, 2002); there are exceptions, e.g., in vivo exposure for phobias and other anxiety disorders has consistently been found more effective than other behavioral procedures (Emmelkamp, 1994); and ultimately no two psychological procedures have exactly equivalent effects, i.e., the null hypothesis is never really true (Meehl, 1978). Nevertheless, the substantial degree of outcome equivalence relative to the technical diversity of treatments has long puzzled observers (Rosenzweig, 1936; Stiles et al., 1986; Luborsky et al., 2002).
Meta-analytic reviews conducted in the 1980s and 1990s generally showed an appreciable advantage for cognitive-behavioral treatment models over psychodynamic, process-oriented and interpersonal therapies (Svartberg and Stiles, 1991; Grawe et al., 1993).
On the other hand, several meta-analyses have shown that comparative studies yield equivalent results, when factors such as investigator allegiance and case severity are controlled (Wampold et al., 1997; Luborsky et al., 1999; Wampold, 2001). In view of this some investigators (e.g., Shoham and Rohrbaugh, 1999) have come to the conclusion that ‘the Dodo Bird verdict has been fortified by the allegiance effect bias’.
With the increasing development of outcome research, interest has shifted from testing specific psychotherapeutic theories to teasing out the relative contribution of specific components of various treatments. Such component analyses or dismantling studies have been advocated as an important alternative to the usual comparative treatment approach (Borkovec, 1993). Neo-Dodo-bird proponents such as Ahn and Wampold (2001) summarized dismantling studies from an 18-year period in a meta-analysis of component analyses and found that these studies reveal ‘little evidence that specific ingredients are necessary to produce psychotherapeutic change’ (Wampold, 2001, p. 126). From a similar ideological position, Lambert and Ogles (2004) stress this finding as an argument against the identification of empirically supported therapies, as ‘decades of research have not resulted in support for one superior treatment or set of techniques for specific disorders’.
It should, however, be noted that across the range of disorders in which a specific therapy has been shown to be effective, cognitive-behavioral therapies consistently show the greatest versatility and efficacy. Against drawing hasty conclusions from this, however, stands the aphorism that ‘absence of evidence does not denote evidence of absence’. In other words, psychodynamic and systemic therapies may well be effective in a range of disorders, but the disinclination of their supporters to mount appropriate trials, together with the logistical difficulties and expense of doing so, means that the results are currently not to hand.
Besides the general question of ‘how efficacious is (what kind of) psychotherapy?’ outcome research has dealt with a variety of more specific problems such as the dosage that is necessary to reach positive outcomes. In addition, outcome research has been increasingly linked with questions of specific and nonspecific ingredients of psychotherapeutic interventions and the amount of variance that can be explained by these factors.
In a classical meta-analysis of 2431 patients in psychotherapy in studies published over a period of three decades, Howard, Kopta, Krause, and Orlinsky (Howard et al., 1986) concluded that the relationship between the number of sessions (‘dosage’) and client improvement ‘took the form of a positive relationship characterized by a negatively accelerated curve; that is, the more psychotherapy, the greater the probability of improvement, with diminishing returns at higher doses’ (Kopta et al., 1994, p. 1009). This study also clearly supported the view that treatment produces benefits that surpass spontaneous remission rates.
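The shape of such a curve can be sketched with a logistic function of the logarithm of the number of sessions; the coefficients below are invented purely so that the probability of improvement rises steeply at first and flattens at higher doses, as the text describes.

```python
import math

def prob_improved(sessions: int, a: float = -0.5, b: float = 0.6) -> float:
    """Negatively accelerated dose-response: logistic in log(sessions)."""
    return 1.0 / (1.0 + math.exp(-(a + b * math.log(sessions))))

for s in (1, 2, 4, 8, 16, 26, 52, 104):
    print(f"{s:>3} sessions -> P(improved) ~ {prob_improved(s):.2f}")
```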
Following this classical dose-effectiveness study, several other investigations were carried out to answer the question: How much therapy would be enough? In summarizing these studies, Lambert and Ogles (2004) conclude: ‘Research suggests that a sizeable portion of patients reliably improve after 10 sessions and that 75% of patients will meet more rigorous criteria for success after about 50 sessions of treatment. Limiting treatment sessions to less than 20 will mean that about 50% of the patients will not achieve a substantial benefit from therapy’.
As is generally the case in outcome research, dose-effectiveness functions reveal differential responses to treatment depending on the level of measurement. Howard et al. (1993) reported an attempt to support empirically the phase model of psychotherapeutic change originally conceptualized by Frank (1973b). This model postulates that the process of psychological restitution reverses the order in which psychopathology develops, i.e., failing functioning in different areas, the development of psychological symptoms, and the failure of individual coping strategies, resulting in demoralization. According to the phase model, therapeutic change should first occur as a restitution of well-being (remoralization), followed by a relief of symptoms (remediation), and finally result in an improvement of functioning (rehabilitation). Although the empirical studies related to this model are still equivocal, it is evident that different aspects of functioning respond differentially to treatment; psychological symptoms respond faster than personality and interpersonal aspects of functioning. It is obvious that results like this are of considerable importance in the discussion of the usefulness of long-term psychotherapy such as psychoanalysis or some forms of psychodynamic treatment.
An important obstacle in the way of explaining specific psychotherapeutic effects is the placebo problem. In pharmacological research, placebos do not contain the curative substance. It is evident that ubiquitous psychological factors play an important role in the placebo phenomenon. These factors include the instillation of hope, a decrease in demoralization, the experience of self-efficacy, and the belief in the manageability of a problem. In contrast to pharmacological research, in psychotherapy these factors are supposed to play an active role in patient improvement and are known as common curative factors (Frank, 1973a; Strauss and Wittmann, 1999).
There is a long tradition in psychotherapy research of studies dealing with the relative benefit of therapies when compared with placebo controls. Recent meta-analyses show that the efficacy of specific treatments is superior to both no treatment and placebo treatments (Lipsey and Wilson, 1993). Grissom (1996) concludes on the basis of a meta-analysis that the ‘results are consistent with the view that the ranking for therapeutic success is generally therapy, placebo, and control (do nothing or wait)’ (p. 979). In Grissom's analysis, the ‘probability of superiority’ was 0.70 for the therapy versus control comparison, 0.66 for the therapy versus placebo comparison, and 0.62 for the placebo versus control comparison, with the latter indicating that placebo conditions, which usually emphasize nonspecific or common therapeutic factors such as therapist warmth, attention, or expectations for change, contribute to positive outcome, although the effects of these factors are smaller than those of specific psychotherapy.
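Under normality, the probability of superiority has a simple relationship to the effect size d: PS = Phi(d / sqrt(2)), the chance that a randomly chosen patient from the better condition outscores one from the other. A minimal sketch follows; the d values are back-calculated by us to match Grissom's three PS figures, not taken from his paper.

```python
from math import sqrt
from statistics import NormalDist

def probability_of_superiority(d: float) -> float:
    """PS = Phi(d / sqrt(2)) for two normal distributions separated by d."""
    return NormalDist().cdf(d / sqrt(2))

# Roughly therapy vs control, therapy vs placebo, placebo vs control.
for d in (0.74, 0.58, 0.43):
    print(f"d = {d:.2f} -> PS = {probability_of_superiority(d):.2f}")
```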
One possible resolution to the equivalence paradox runs as follows: yes, psychotherapies differ in their theories and techniques, but these factors are not the important ones. There are many features that all psychotherapies have in common, and some of these common factors may be responsible for different treatments’ equivalent effectiveness—most famously, Rogers's (1957) ‘necessary and sufficient conditions,’ which included genuineness, unconditional positive regard, and accurate empathy. Process research on common factors has looked at: (1) therapist-provided common factors, including the Rogerian conditions, warm involvement with the patient, and the communication of a new perspective on the patient's person and situation, (2) patient-provided common factors, such as patient self-disclosure (Stiles, 1995) and experiencing (Klein et al., 1986); and (3) the therapeutic alliance or the interaction between the therapist and the patient (Horvath and Bedi, 2002).
Thus one possible explanation for the general finding of only relatively small differences between treatments with respect to several outcome criteria is the assumption that different treatment modalities are characterized by common curative factors that are active ingredients of all particular schools (although sometimes not an explicit part of the formal change theory) and that these common factors go beyond those that might be important in explaining the placebo phenomenon.
Meanwhile, there is ample evidence for the relationship of common factors and improvement and even some evidence that common factors are superior to unique factors in explaining the variance of treatment outcome (Castonguay et al., 1996). In their recent review, Lambert and Ogles (2004) group common factors into three categories: support factors, such as catharsis, therapeutic alliance, therapist warmth, respect, and empathy; learning factors, such as insight, corrective emotional experiences, or assimilating problematic experiences; and action factors, such as mastery, reality testing, or behavioral regulation.
These common factors ‘loom large as mediators of treatment outcome’ (Lambert and Ogles, 2004), but are not sufficient to explain fully psychotherapeutic change. Other sources of variance such as unique interventions, patient and therapist related variables and their interaction have equally to be considered as factors that explain therapeutic improvement. The determination of the influence of such factors is a crucial issue in psychotherapeutic process research.
It is worth noting that an emphasis on common factors does not necessarily contradict the overall finding of the general superiority of CBT over other modalities, especially in specific disorders. It is possible that CBT is simply more efficient in marshalling the key common factors in its training and delivery. This kind of conclusion, emerging from the research literature, is an example of the way in which research can help illuminate pressing clinical and training issues.
The aims of psychotherapy process research can be conveyed as a series of questions: What happens in psychotherapy? How do therapies differ? How do patients act and think differently as a result of therapy? What are the common factors across different therapies? Which are the effective ingredients? What happens as patients improve?
Much of this section is drawn from a chapter by Stiles et al. (1999), to which readers are referred for elaboration and further references concerning this material.
Treatment process research is characterized by a profusion of measures. Researchers have developed thousands of categories and scales, and they have organized these into hundreds of measuring instruments and systems of classification (for some compilations of examples, see Kiesler, 1973; Greenberg and Pinsof, 1986; Beck and Lewis, 2000). So many systems of process classification have been developed that there is even a literature on meta-classification—that is, classification of classifications (Russell and Stiles, 1979; Greenberg, 1986; Russell and Staszewski, 1988; Elliott, 1991; Elliott and Anderson, 1994; Lambert and Hill, 1994). Table 38.2 lists some meta-classificatory principles—ways in which process categories and measures differ.
As an illustration, consider the Working Alliance Inventory, patient form (WAI; Horvath and Greenberg, 1989), in which patients rate their agreement with 36 statements about their relationship with their therapist. It yields three scores reflecting the quality of the Bond, Agreement about Tasks, and Agreement about Goals. In terms of the characteristics listed in Table 38.2, the WAI uses the patient's perspective. Its target is the dyad. The scoring unit is usually the session or a sequence of sessions. It refers to all communication channels. It is a rating measure that is evaluative. It is based on the respondent's personal experience, accessed directly. It uses a pragmatic strategy. It is applicable to treatment of any theoretical orientation and has been used mainly in adult individual therapy, though versions have been developed for other modalities (as well as for therapist and observer perspectives).
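A minimal sketch of how such an inventory turns item ratings into subscale scores. The item-to-subscale assignment here is hypothetical, invented for illustration; the real WAI has its own scoring key, including reverse-scored items.

```python
# 36 items rated on a 1-7 scale; all set to 5 here as placeholder data.
ratings = {item: 5 for item in range(1, 37)}

# Hypothetical assignment of 12 items to each subscale.
subscales = {
    "Bond":  range(1, 13),
    "Tasks": range(13, 25),
    "Goals": range(25, 37),
}

scores = {name: sum(ratings[i] for i in items) for name, items in subscales.items()}
print(scores)   # e.g., {'Bond': 60, 'Tasks': 60, 'Goals': 60}
```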
Why are there so many measures? We think that informed researchers develop new measures because the old measures have failed to answer their questions or because they are interested in some previously unassessed aspect. Thus, although it may be tempting to advocate arbitrary standardization, this is probably not in the long-term interest of the field.
Process research has led the way in trying to unravel the equivalence paradox. To assess differences in treatment processes, investigators have applied process measures to contrasting treatments or conditions and compared the results. They have repeatedly identified systematic differences in therapists’ techniques across different orientations (Strupp, 1957; Stiles, 1979; DeRubeis et al., 1982; Elliott et al., 1987; Stiles et al., 1988; Hill et al., 1992; Startup and Shapiro, 1993). The empirically demonstrated process differences have generally been consistent with the theoretical differences between treatments.
Treatment differences are also important in comparative research. To ensure treatment integrity in clinical trials comparing different treatments, researchers have tried to standardize the treatments using detailed treatment manuals (DeRubeis et al., 1982; Luborsky et al., 1982). This step has led researchers to assess therapists’ adherence to therapeutic protocols. The logic is that if treatments are to be compared, they must be delivered according to protocol; if an adherence check shows that the therapists were not following the manual, then the treatment was not delivered correctly and the clinical trial cannot be interpreted. For example, Hill et al. (1992) tested therapists’ adherence to their respective treatment approaches in the National Institute of Mental Health Treatment of Depression Collaborative Research Program (TDCRP; Elkin, 1994) using a 96-item rating scale, which discriminated between the three different treatments very well. Therapists used more techniques consistent with their respective treatment modality, and fewer techniques appropriate to the other treatments.
The frequent implicit assumption that all patients with the same diagnosed disorder compose a homogeneous group is certainly false. People differ in all sorts of ways that may be manifested in the therapeutic process. These differences affect the ways patients are treated (Hardy et al., 1998). Process researchers have applied their measures to assess ways in which patients are internally consistent (e.g., self-similar from session to session) but different from other patients. For example, computer-based analyses of the text of psychotherapy sessions have demonstrated consistent patient differences in the frequencies of particular words, phrases, or categories in verbatim transcripts of sessions (Hölzer et al., 1996).
Table 38.2 Ways in which process categories and measures differ (after Stiles et al., 1999)
Trying to capture the uniqueness of the individual, while at the same time yielding reproducible categorical data is a central task for psychotherapy researchers, especially those in the psychodynamic orientation. The Core Conflictual Relationship Theme method (Luborsky, 1976; Luborsky and Crits-Christoph, 1990) has been developed to assess treatment-relevant transference themes shown by patient narratives in psychotherapy; these commonly focus on interactions with other people in the patient's life, including the therapist (Barber et al., 1995). The Structural Analysis of Social Behavior (Benjamin, 1996; Henry, 1996) uses a complex circumplex coding scheme, in which three underlying dimensions (dominance, affiliation, and individuation) are used to describe patients’ interactions with self and others; this approach has been used to distinguish between good and poor outcomes of brief psychodynamic therapy (Henry et al., 1986). In the CBT tradition, researchers interested in the psychotherapy of depression have emphasized such components of patient cognitive processes as causal attributions and depressive schemata—specific knowledge structures that contain undesirable biases, which are a target of interventions (Beck et al., 1979).
Much process research has been driven by a search for the curative factors in psychotherapy. However, from a psychodynamic perspective the results of this search have been oddly disappointing. For example, Orlinsky et al. (1994) concluded that there was evidence for differential effectiveness of some therapeutic operations, including interpretation (along with paradoxical intention and experiential confrontation). A review of psychodynamic approaches in the same volume by Henry et al. (1994), however, concluded that ‘transference interpretations do not elicit differentially greater affective response or necessarily increase depth of experiencing when compared with nontransference interpretations or other interventions’ (p. 475). It may be noted that transference interpretations are not equivalent to interpretations, but insofar as the former are a subset of the latter, the contrasting conclusions were striking.
Yeaton and Sechrest (1981) urged therapists and investigators to attend to the strength, integrity, and effectiveness of treatment. Effective treatments, they argued, should contain large amounts of helpful change ingredients (strength) and should be delivered in a pure manner (integrity). If the theory underlying the treatment is correct, then delivering interventions with strength and integrity should be effective in producing patient change. This view of process-outcome relations has been called the drug metaphor (Stiles and Shapiro, 1989, 1994). However, the reasoning may be misleading (Stiles, 1988; Stiles et al., 1998).
The drug metaphor logic assumes that the cause-effect relations of process and outcome variables run in a single direction (i.e., that process variables cause outcome variables). However, this reasoning neglects therapists’ and patients’ appropriate responsiveness—the tendency of both therapists and patients to make appropriate adjustments in their behavior in response to ongoing changes in their own and each other's requirements (Stiles et al., 1998). In human interaction, participants are responsive to each other's behavior on time scales that range down to a few tens of milliseconds; for this reason, linear statistical descriptions of process-outcome relations can fail to reflect the value of a psychotherapy process component (Stiles, 1988).
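To see why responsiveness undermines the correlational logic, consider a minimal simulation (ours, not the cited authors'; all parameter values are invented). A treatment component is genuinely helpful, but therapists responsively give more of it to the patients who need it more, so the raw dose-outcome correlation comes out negative even though the component works:

```python
import random

random.seed(0)

def pearson(xs, ys):
    """Plain Pearson correlation, to avoid external dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return sxy / (sx * sy)

doses, outcomes = [], []
for _ in range(1000):
    need = random.gauss(0, 1)                     # patient's initial need
    dose = max(0.0, need + random.gauss(0, 0.3))  # responsive delivery:
                                                  # needier patients get more
    # The component genuinely helps (+0.5 per unit of dose), but needier
    # patients still end up worse off overall.
    outcome = -need + 0.5 * dose + random.gauss(0, 0.3)
    doses.append(dose)
    outcomes.append(outcome)

# Despite the component's real positive effect, the raw process-outcome
# correlation is negative.
print(round(pearson(doses, outcomes), 2))
```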
Dismantling studies, at first glance, seem to offer a way around the responsiveness problem. They employ experimental methods to identify which components of a treatment package are responsible for facilitating change. Two or more treatment groups that vary in only one or a few of the treatment's techniques are compared. One group typically receives a complete treatment, whereas other groups receive only a portion of the treatment (Nezu and Perri, 1989). Dismantling studies represent a valuable tool, but interpreting them requires caution. They assume that components are self-contained modules that can be added and removed independently, which may not be the case in human interaction.
The difficulties in establishing linear links between process and outcome have encouraged interest in more descriptive studies—including qualitative studies—of what has been called the process of outcome or change process research. Researchers have tried to study sessions or episodes in which it appears that change is occurring and to describe what they believe to be good therapeutic process.
The events paradigm (Rice and Greenberg, 1984) focuses on the intensive analysis of significant events in psychotherapy—recurring categories of events that have a common structure and are important for change. Brief passages sharing some specified common feature are collected and examined using microanalytic techniques and close attention to context. Task analysis is a method for studying a particular type of significant event in therapy (a task) and describing the process of change. Rice and Saperia (1984) illustrated task analysis with their description of a problematic reaction point (PRP) as a significant event in therapy. The marker of a PRP is a statement by the patient that he or she finds his or her own behavior problematic (e.g., ‘I overreacted but I don't know why; it was unlike me’). The therapist's task at a PRP is systematic evocative unfolding (Rice, 1974). The therapist directs the patient to reenter the scene of the original stimulus situation vividly and to explore his or her own understanding of the situation at the time of the problematic reaction. The therapist tries to get the patient to focus on either the stimulus or the inner reaction, but not on both at the same time. According to Rice and Saperia (1984), a PRP marker followed by therapist use of systematic evocative unfolding led to resolution more frequently than if the therapist responded with empathic caring.
Qualitative approaches bearing such names as discourse analysis (Madill and Barkham, 1997), grounded theory (Rennie et al., 1988), consensual qualitative research (Hill et al., 1997), and assimilation analysis (Stiles and Angus, 2001) have offered a nonlinear approach that seeks to describe the therapeutic process more discursively and thoroughly. Typically, these approaches study only one or a few cases at a time, but in far more detail than traditional hypothesis-testing process research. The intended yield of such studies is a richly descriptive understanding of particular processes rather than a specific generalizable finding based on a large sample of different cases.
The goal of a qualitative, descriptive study is often to elaborate a theory rather than to test a particular consequence. For example, the goal of task analysis is often explained as the development of a model of psychotherapeutic change. This gives qualitative studies a greater openness to new information, but their conclusions are correspondingly more tentative than those of hypothesis-testing research (Stiles, 1993).
Clinical effectiveness is only one dimension in planning psychotherapy services. In addition, services need to meet the criteria of being comprehensive, coordinated, user-friendly, safe, and cost-effective (Parry, 1996). Research evidence is at the center of the drive by governments and health strategists in many countries to base practice on robust evidence. Optimally, clinicians would routinely and systematically review the research literature and come to conclusions about best practice. This, of course, is a mammoth task. Fortunately, commissioned and individually generated reviews fill the gap. In the UK, the Cochrane database is open to all; it uses a hierarchy of evidence with RCTs at its pinnacle.
Another source of summary information is the aptly named What works for whom? (Roth and Fonagy, 1996). Concentrating largely on RCTs, the authors review the evidence for benefit in different diagnostic groups, predominantly Axis I. Each chapter ends with a summary and implications for service delivery and future research. When, as now, RCTs are not fully representative of the range of therapies or types of presentation in clinical practice, it has to be recognized, as already stated, that absence of evidence is not evidence of ineffectiveness. Furthermore, as previously noted, there are considerable problems in extrapolating from efficacy studies to clinical practice.
In the USA, there has been a move to favor empirically supported therapies [see the special section of Psychotherapy Research (1998, Vol. 8, pp. 115–70) for a critique]. This has the advantage of concentrating the minds of therapists, patients, and those responsible for paying for a treatment, but it also has a downside. Concentrating on brand names may overemphasize the differences between approaches and risks fossilizing the field when there is still much innovation to come. As the person of the therapist and his or her allegiance contribute significantly to outcome, it has been suggested, not entirely tongue in cheek, that we should speak of empirically supported therapists (Wampold, 2001).
Empirical research evidence from RCTs tells us what can be achieved under optimal conditions. The evidence is complementary to clinical judgment. For this reason, we welcome the ‘Guideline’ subtitle to the useful Department of Health report on treatment choice in psychological therapies and counseling (Parry, 2001).
Research (can) tell us what to do; audit tells us whether we are doing it right (R. Smith, 1992). Audit is the systematic review of the delivery of health care in order to identify deficiencies so that they can be remedied (Crombie et al., 1993). Audit measures performance against standards and is part of the process of ensuring that evidence-based practice is delivered in practice. Each audit cycle of observing current practice, setting standards of care, comparing practice with the standards, and implementing change initiates the next pass through the cycle (Fonagy and Higgitt, 1989; Aveline and Watson, 2000).
A new paradigm of practice-based evidence is well established (Margison et al., 2000). Inferences are drawn from naturalistic, unselected clinical populations. The samples may be large, particularly when services pool routinely collected data through locally organized practice research networks (PRNs). Typically, clinic work is with complex cases, where therapist competence may be more important than therapy adherence. Here the clinician comes out of the planning and research shadows and is a stakeholder in the form of the service and its delivery. Routine monitoring of outcome is an essential component, with performance feedback to the clinicians and the service as a whole. This facilitates quality management by charting the expected and actual course of patients with various conditions in the service. Benchmarks allow one service to compare and review its outcomes against those of similar services. Several reliable, relevant, and sensitive psychometric systems for routine use have been developed, of which one of the most promising is CORE (Evans et al., 2002).
Once an individual's dose-response curve has been determined, predictions can be made about likely outcome (Lueger et al., 2001). This is the patient-focused outcome paradigm. There is good evidence that outcome can be enhanced by signaling to clinicians that the clinical course of a particular patient is problematic. Typically, a traffic-light metaphor is used: red signaling clinically significant deterioration, yellow being a lesser alert, and green indicating that the therapy is on its expected beneficial course. Clinical decision making is enhanced and there is an opportunity for timely corrective action (Kordy et al., 2001; Lambert et al., 2001).
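As an illustration of the traffic-light logic, here is a minimal sketch. The exponential expected-recovery curve, the band widths, and the helper names (`expected_score`, `traffic_light`) are hypothetical stand-ins: the actual patient-focused systems fit expected-treatment-response curves to large archival samples rather than assuming a functional form.

```python
def expected_score(initial, session, rate=0.12):
    """Hypothetical expected symptom score: exponential improvement.
    Real patient-focused systems derive such curves empirically."""
    return initial * (1 - rate) ** session

def traffic_light(initial, session, observed,
                  yellow_band=0.10, red_band=0.25):
    """Flag a patient whose observed score departs from the expected course.
    Band widths here are illustrative, not empirically derived."""
    expected = expected_score(initial, session)
    if observed > expected * (1 + red_band):
        return "red"     # clinically significant negative deviation
    if observed > expected * (1 + yellow_band):
        return "yellow"  # lesser alert: progress slower than expected
    return "green"       # therapy on its expected beneficial course

# A patient starting at 24 on a symptom measure, rated 22 at session 6,
# is well above the expected trajectory and triggers a red signal.
print(traffic_light(initial=24, session=6, observed=22))  # "red"
```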
Evidence, audit, and quality management are essential complements to clinical judgment (and supervision) in maintaining good practice. The appendix offers guidance on the critical questions in evaluating research studies.
Neuroscience is making huge strides in understanding how the brain functions. Static models of localized function are being replaced by that of an integrated, collaborative whole brain, which reacts plastically to new experience, modeling that experience through new, ever-changing arrangements of synapses. The importance of pattern recognition and preconscious processing is coming to the fore (Pally, 1997a,b; Gabbard, 2000). The convergence with basic science offers a rich opportunity for collaborative research as the explanations offered by neuroscience come close to the level of observed process in clinical work.
The user perspective, coming from the opposite end of the spectrum, is also likely to be highly influential in research design and focus. Users will help determine outcome criteria and shape the form of therapies by voicing their experience of what is helpful and what outcomes they particularly value. Self-help therapies are appearing, especially in primary health care.
Naturalistic effectiveness studies will help translate the lessons of efficacy studies into practice. Instead of pure models of therapy, which often feature for pragmatic reasons in RCTs, there is great scope for the evolution and testing of more complex therapy models, spanning both Axis I and II disorders, and resulting in optimal integration and better principles for their eclectic application. This will have implications for training that we predict will emphasize selection based on the personal qualities shown by effective therapists, the best use of the common therapeutic factors, and the application of phase-specific integrated therapies. Stepped care provides an interesting model of repeated review and deployment of different interventions as the patient progresses through a course of health care. The value of these new approaches will need to be tested in a new round of comparative and hermeneutic studies.
Now that there are many established symptom measures, there is a great need to develop usable relationship measures that address the interpersonal, interactive intersubjectivity that exists between people and is central to psychotherapy practice (Hobson, 2003). More work needs to be done on the optimal duration, frequency, and techniques for both brief and long-term therapy. Finally, cultural and sociological aspects of psychotherapy need to be investigated to see what is novel and valuable and how approaches may have to be modified to work well in local contexts.
Research is one way of knowing the world. Methods that facilitate precision in application and communication are applied to questions of clinical import; the precision helps colleagues understand what was done, assess its significance, and replicate the study. In short, research is part of discovery. Inevitably, the findings, or even the process of doing the research, raise unexpected questions. Taking new insights forward requires flexibility in attitude and assumptions. The results can benefit clinical practice, especially if the design is close to practice and involves clinicians from the outset (Hardy, 1995). Worthwhile research is possible at all levels of complexity of investigation but generally needs teamwork and funding. The path from clinical insight to ‘laboratory’ studies to clinic is satisfying but long.
The research literature is vast and time is limited. Published work varies in quality and significance. How can the busy clinician sift the wheat from the chaff?
The following questions apply to all studies.
1. What is the study about? What hypotheses are being tested?
2. What is being ‘done’ between whom and whom? Can you understand the context?
(a) type, duration, frequency, and setting of intervention. Adequacy of the intervention. Degree of standardization.
(b) real or quasi-patients, diagnosis (type, homogeneity, comorbidity), severity of disturbance, exclusion and inclusion criteria.
(c) representative exemplars in quantitative research, informative exemplars in qualitative studies.
(d) novice or experienced therapists, degree of competence in and commitment to interventions.
3. Are the change measures convincing?
(a) relevance.
(b) validity.
(c) sensitivity.
(d) reliability.
(e) multiperson perspective and dimension.
(f) multi-time point.
(g) in common usage (allowing comparison with other studies).
4. Is the research ethical?
(a) informed consent.
For quantitative studies:
1. How well has bias been excluded?
(a) randomization.
(b) stratification.
(c) representativeness.
(d) blindness.
(e) independent raters.
(f) practice distortion.
(g) practice bias.
2. Is the study powerful enough to yield significant results? What assumptions for clinically significant effects have been made and do you agree with them?
(a) size of sample and power analysis (a worked sketch follows this list).
3. Are the results invalidated by attrition? Intention-to-treat numbers should be reported.
4. Are the statistics valid?
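As a worked example for the power question above, here is a minimal sketch using the Python statsmodels library (an assumption about tooling; any power calculator yields the same numbers). It finds the patients needed per arm of a two-arm trial at conventional effect sizes.

```python
from statsmodels.stats.power import TTestIndPower

# Patients required per arm of a two-arm trial (two-sided alpha = .05,
# power = .80) at small, medium, and large effect sizes (Cohen's d).
analysis = TTestIndPower()
for d in (0.2, 0.5, 0.8):
    n = analysis.solve_power(effect_size=d, alpha=0.05, power=0.80)
    print(f"d = {d}: about {round(n)} patients per group")
```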
For qualitative studies:
1. How permeable is the study, i.e., does it show a capacity for understanding to be changed by encounters with observations?
2. Validity of an interpretation is always in relation to some person, and criteria for assessing validity depend on whom that person is, e.g., reader, investigator, research participant. Is this explicit?
3. Have sample size and composition been informed by emerging results, e.g., cases chosen to fill gaps, or data gathering continued until new cases appear redundant?
4. Are the methods for gathering and analyzing observations clearly described to the point where you could replicate them?
5. Is permeability enhanced by:
(a) Engagement with material.
(b) Grounding.
(c) Asking ‘what,’ not ‘why’.
6. Can you as reader make adjustments for the author's forestructure and assess how well the observations permeate the interpretations? Look for:
(a) Disclosure of the author's forestructure, e.g., initial theories, relevant personal experience, preconceptions, and biases.
(b) Explication of social and cultural context, e.g., shared assumptions between investigators and participants, relevant cultural values, data-gathering circumstances, meaning of the research to the participants.
(c) Description of investigators’ internal processes.
7. Is there convergence across several perspectives and types of validity, i.e., triangulation?
8. In making your own assessment of validity, look for:
(a) Coherence.
(b) Uncovering; self-evidence.
(c) Testimonial validity.
(d) Catalytic validity.
Implications for practice:
1. Is the author's selection of positive findings and interpretation of the results justified by the evidence? Do you agree with them?
2. How representative is the study of your clinical practice (what is being done between whom and whom)?
3. If the results are sufficiently robust, representative, and significant, what are the implications for your practice?
4. What further evidence do you require before changing or confirming practice?
5. What further questions does the study raise?
6. If you change your practice, how are you going to audit the implementation?