David Hill readings reprinted from The Politics of Schizophrenia: Psychiatric Oppression in the United States (University Press of America, 1983) with the permission of the author (currently out of print).
Chapter 17
INADEQUATE RELIABILITY: THE DATA
The purpose of the preceding historical analysis has been to enable us to better evaluate our current situation. Specifically, we should now be in a better position to consider whether the 'schizophrenia' notion is just one in an apparently endless series of attempts to understand certain rather unsettling modes of thinking and behaving. I ask you to bear in mind, therefore, while reading further, all those previous conceptualizations which have had their day, were rejected, and now seem so simplistic and even, if it were possible to overlook the suffering that almost invariably accompanied them, entertaining. Perhaps we can thereby minimize our natural tendency to overestimate the virtues of contemporary viewpoints. If history has any predictive validity at all, we can assume that our current ideas in this area are transitory and will eventually meet the same fate as their predecessors.
The remainder of this book represents both a documentation of reasons for abandoning the 'schizophrenia' notion and the premises on which it is based, and an estimation of the likelihood of this occurring within the foreseeable future. The former, Parts 4, 5 and 6, will take the form of arguments based on the utility of the construct; the latter, Part 7, will proceed by an examination of those forces which either prevent acceptance of the arguments or prevent action being taken to implement the changes which the arguments, if accepted, necessitate.
I will adopt two distinct approaches to the task of estimating utility. The first will take advantage of our new-found expertise in evaluating scientific constructs. The twentieth century has seen the development of methodologies for deciding whether a new construct represents something that actually exists and can be recognized by experts (reliability) and whether it represents what it is thought to represent (validity). The second approach to evaluating utility will document the suffering caused by the application of the 'schizophrenia' label itself, and by the 'treatments' administered as a result of this diagnosis. These two approaches can be contrasted in other ways. The former is an evaluation according to the scientific standards of those who created, believe in and utilize the construct; the latter employs my own, subjective, set of ethical standards regarding what is beneficent and what is punitive. The difference between the two can also be characterized as the difference between what 'schizophrenia' means to the mental health professions and what it means to the 'patient'.
I begin, then, with a discussion of the research that has assessed the reliability of the 'schizophrenia' diagnosis. Reliability refers, essentially, to the ability of independent observers to recognize the construct or, in this case, to agree on who has 'schizophrenia' and who does not. I have previously argued that Kraepelin and Bleuler created nothing more than a conglomerate of behaviors that were anxiety-provoking because of their disregard for social norms and which were particularly aggravating to psychiatrists because they had been unable to make any sense of them. From this premise one could only hypothesize that, because of their extreme heterogeneity and the absence of a sound theoretical base, evidence of reliability would be hard to come by. It was not too long, indeed, before this problem presented itself to researchers. What follows is a documentation of the many admissions of inadequate reliability. The fact that most of these admissions came from individuals dedicated to the survival of the construct will be evidenced by their tendency to respond to their findings by attempting to control for the sources of variance which, in their opinion, accounted for the disappointing results. Despite the fact that the purpose of reliability research is, supposedly, to determine whether a construct 'exists', it was rare indeed for a researcher to express the opinion that such consistently negative findings should result in abandonment of the construct. As we will see, those who did were ignored.
Early Warnings
Concern about the reliability of 'schizophrenia' is hardly a recent phenomenon. We have already seen that Kraepelin and, to an even greater extent, Bleuler acknowledged the heterogeneity of their creation. Although the problem did not begin to receive the attention it deserved until the second half of the century, there had been some individuals willing to confront the failure of the construct, and of our entire classification system, to deal with the issues for which it had been designed. In 1922, James May wrote, in his textbook Mental Diseases:
No adequate reason for classification of mental disease for any other than statistical purposes has ever been advanced. ... They do not contribute anything of value whatever to our knowledge of symptomatology, diagnosis or treatment. Practically the only point on which the writers of our textbooks agree is that there is no fundamental principle upon which a satisfactory classification can be based.
Such early admissions of failure preceded a growing disillusionment with the Kraepelinian diagnostic system in the thirties and forties. In 1948, Wittman and Sheldon acknowledged "the present-day lack of interest in psychiatric classification and skepticism as to its value" (p.124).
Remembering that classification in general, and the notion of 'schizophrenia' in particular, were necessary for the survival of psychiatry as a science, we realize that such acknowledgments of failure were, if you will excuse the analogy, a bitter pill to swallow. The extreme reluctance of researchers to act on the empirical findings documented below, to abandon their diagnostic labels, is easily understood in this light. Wittman and Sheldon put it succinctly when they point out that to "go so far as to focus on individual human beings":
... appears to be giving up scientific methodology in the fields of psychiatry and abnormal psychology. ... It not only gives up classification but in effect says that these fields can never be developed into a science. For if there is no agreement on classification there can be no discovery of truths or laws. (p. 124)
Nevertheless, they go on to point out "the extreme lack of homogeneity within any given Kraepelinian diagnostic grouping" and to make the following statement:
At present we frequently have cases within a given diagnostic grouping that differs as much as the group as a whole does from some other diagnostic group. (p. 124)
In 1949, Anne Roe, calling for the integration of personality theory and clinical practice, discusses the proliferation of psychological tests designed to assist in the process of diagnosis.
I suggest that much of this research is not only waste of time, and a perpetuation of errors, but is actually preventing advance in the field. There are many reasons why this is true, but one of the most potent is that it involves clinging to a classification which has long since been outlived. I submit that using techniques which are not too precisely validated, if they are validated at all, to place patients in psychiatric categories, the inadequacy of which is admitted by all concerned, is a treadmill procedure guaranteed to keep us moving in circles... The psychiatric taxonomy which psychologists have been constrained to adopt is so inadequate, even for psychiatry, that no patching can fix it up. (p.38)
Like Wittman and Sheldon, she also contributes to our understanding of the response with which such statements, and subsequent empirical findings, were met, with her aside that "columns of figures and involved mathematical procedures have great anxiety reducing potentials" (p.36).
In the same year, Philip Ash was suggesting that the paucity of studies actually examining the reliability of diagnoses was not due to lack of concern but to the fact that the psychiatric profession was unfamiliar with the methodology and statistical procedures necessary for such research. Responding to his own call for psychologists to undertake the task, he found that:
In the case of three participating psychiatrists, total agreement with respect to specific diagnostic categories was obtained in only 20.0 per cent of the cases, while total disagreement occurred in 31.4 per cent of the cases. When considered in pairs, it was found that two psychiatrists agreed more than one-third but less than one-half the time. (pp.273,274)
He concluded that his findings were "generally consistent" with those of the small number of studies that had been conducted at that point.
His call for greater research efforts in this area was soon heeded. Studies of the reliability of 'schizophrenia' and other diagnostic categories began to make regular appearances in the research journals of the mental health professions. Apparently the problem had become too serious to ignore altogether, although we shall see that its implications, theoretical and practical, are still being ignored by many of us to this day. I will first survey what has since become an extensive body of research literature, and then consider the response generated by the consistent finding that psychiatric classification was, as predicted by the commentators mentioned above, sadly lacking in reliability.
Recent Facts
Four major empirical approaches to determining reliability evolved during this period. They can be characterized as "test-retest," "frequency," "cross-cultural," and "inter-rater." Some studies focussed only on major categories such as "psychosis," "neurosis" and "personality disorder." Others examined the reliability of specific diagnoses such as "schizophrenia" and its various sub-types.
Analysis of the data generated by the U.S. Navy's screening programme during World War II, undertaken by Hunt et al. (1953), is characterized by Zubin (1967) as "one of the most persistent attempts at investigating the consistency of diagnoses." This study generated findings for both general and specific categories. The degrees of test-retest agreement were, respectively, 54.1% and 32.6%. The degree of agreement for total "schizophrenia" was 37%. In summarizing the literature on the consistency of diagnoses over time, up to 1967, Zubin characterizes the reliability of specific diagnoses within the functional psychoses as being of "a low order" (1967, p. 386).
The second approach to estimating the reliability of diagnostic constructs, "frequency," involves comparing the distributions of diagnoses in samples drawn from the same population. "If the two distributions differ no more than what would be expected from sampling errors, it may be concluded that the diagnostic process which underlies the classification is reliable" (Zubin, 1967, p. 387). This would appear to be the weakest of the four approaches, being the most likely to produce false evidence of reliability because it tells us nothing of the actual agreement between independent observers of the same 'patient'. I would certainly question Zubin's assertion that "even if they differ significantly by statistical tests, but do not differ widely numerically, or, if categories maintain the same rank order in the two samples, some evidence of reliability may be inferred." Few studies of this type have been undertaken. One, conducted by Mehlman in 1952, found differences that were significantly greater than chance in the frequency with which different psychiatrists employed the diagnoses 'schizophrenia' and 'manic-depressive psychosis'. The same finding emerged for both male (n = 799) and female (n = 597) samples. Mehlman concludes:
Since there is no reason to believe that the medical personnel of Toledo State Hospital are select in regard to medical training, diagnostic ability, or any other essential and relevant attribute from the personnel of other state hospitals, these results would appear to represent an addition to the body of evidence indicating the inadequacy of current nosological practices. (p. 578)
Zubin asserted that the few studies undertaken "do not yield a consistent picture regarding reliability" (p.388). This would appear to be a rather conservative estimate if applied to 'schizophrenia.' In the only two studies which cited specific categories as having been particularly responsible for the discrepancies in the distribution of diagnoses, 'schizophrenia' is the only category cited by both studies.
The cross-cultural approach involves comparisons of diagnosticians from two or more geographical locations. For instance, Kramer (1961), examining the national statistics of admissions to psychiatric hospitals in Britain and the United States, noted a much higher incidence of the 'schizophrenic' diagnosis in the U.S. than in Britain. Subsequent research efforts determined whether this difference was located in the 'patients' or in the diagnosticians by having British and American mental health professionals diagnose a single set of 'patients.' It was consistently found (Copeland et al., 1971; Sandifer et al., 1968; Sharpe et al., 1974) that the national discrepancies in diagnosing 'schizophrenia' continued to occur, indicating that "the reported differences in national statistics are based on systematic variances in the diagnostic criteria used by British and American psychiatrists, rather than in differences in the patients being admitted to hospitals" (Sharpe et al., 1974, p. 235). Copeland et al. (1971) report, for example, the discrepancies between 194 British and 134 U.S. psychiatrists in diagnosing "Mr. F", who was described as:
An unmarried American who had never held a steady job. He had a history of intermittent drug and alcohol abuse, and a brief paralysis of one arm which the patient said his doctor had called hysteria. He had undergone an unsuccessful course of psychotherapy. He also described how he had spent a lot of time lying in bed, drinking wine occasionally and watching television. He complained that his face often swelled and that his appearance changed. He gave graphic and colorful descriptions of his feelings, including his anxiety at the thought of work, and concern over his inability to keep friends. (p. 632)
Among the groups of psychiatrists representing seven different locations in Britain, "diagnostic agreement for this tape is high and no significant differences are apparent." Seventy-five percent of all British psychiatrists diagnosed Mr. F as having a "personality disorder" (mostly "hysterical personality"). Only 2% made the "schizophrenia" diagnosis. "Schizophrenia", however, was the diagnosis of 69% of the American psychiatrists.
U.S. psychiatrists generally perceive more pathology than do British psychiatrists (Sharpe et al., 1974; Kendell et al., 1971). Furthermore, as suggested by Kramer's initial survey, the British tend to make greater use of the "Manic-Depressive" diagnosis than their American counterparts.
While these researchers concur that their findings reflect the inconsistency of diagnostic criteria, and thereby indicate, yet again, the poor reliability of their constructs, they are remarkably unwilling to postulate cultural differences to explain the nature of the differences they had uncovered. While it was permissible to wonder if cultural differences affect the incidence of the various 'mental disorders,' it was not acceptable to consider the possibility that cultural differences affect the perceptions of diagnosticians, who, we like to suppose, are relatively free from such influences. Having spent considerable portions of my life in the two cultures in question, I am willing to run the risk of indulging in national stereotypes by offering a tentative explanation of these discrepancies.
The "eccentricity" of the British may not be a total myth, and the relative unwillingness of British psychiatrists to perceive unusual behavior as pathological may well reflect a cultural difference in the level of acceptance of norm-breaking behavior. The differential tendencies to diagnose "schizophrenia" and "manic depressive psychosis" may reflect differences in the specific types of social norms to which one is expected to adhere in the two countries. Let us take the case of an individual emotionally expressing radical political beliefs in public. It might well be that what would be perceived as most disturbing to British psychiatrists would be the affective component, since it breaks the British norm that feelings are to be expressed only between intimate friends. On the other hand, the same individual might not be perceived as over-emotional in the United States; instead, unusual ways of thinking might be of more interest to Americans searching for pathology. In the U.S., there appear to be more rigidly defined norms for permissible belief systems, evidenced, for instance, by the pervasive fear of anything resembling communism.
Before returning to the slightly more objective world of reliability research, it should be noted that the unwillingness of these researchers to speculate about their findings in sociological terms may well be related to their failure to consider the social control framework. We shall see, soon, how their response to their disappointing findings remained well within the medical paradigm and their obvious belief in the existence of its constructs. It was the same response as that of researchers finding similarly disappointing results in the first two approaches. We must first, however, discuss the fourth and most pervasively employed of the approaches, that of "inter-rater reliability."
I have already mentioned one early study (Ash, 1949) of inter-rater reliability, and the low levels of agreement revealed therein. There are, in fact, relatively few studies which have applied this approach to real-life situations. (As we shall see, the more typical study is undertaken under artificial conditions designed explicitly to improve the level of reliability.) Foulds' (1955) study did employ "the usual diagnostic procedure" except, of course, for the fact that at least two psychiatrists diagnosed each patient following their admission to a psychiatric hospital. The degree of inter-rater agreement was assessed by a "Diagnostic Agreement Scale" which measured the similarity of both the primary diagnosis (e.g. "schizophrenia") and the more specific characteristics (e.g. "without sessional features"). The average agreement was found to be "in the region of 4" on the 0-6 scale. A "4" was obtainable either by an identical diagnosis with completely different features, or by "similar" diagnoses (e.g. different sub-types of "schizophrenia") with identical features. Like many researchers in this area, Foulds is clearly committed to the process of diagnosis and to preserving the extant nomenclature. He attacks, for instance, Roe's previously cited statement that psychiatric classification has been outlived, with the unconvincing argument that "such classification is still in fact very widely used" (p. 851). He argues that even if the nomenclature is "inadequate" it is not "unsuitable", adding that, in any case, "inadequacy is no reason for abandoning unless something more adequate can be substituted" (p. 852). Furthermore, he critiques those studies that have demonstrated poor reliability using such arguments as the fact that they "involved ordinary workaday diagnoses" and that "categories were used which were not mutually exclusive" (p. 853). In other words, these studies are inadequate because they accurately reflect reality!
Despite this clear bias, the strongest claim Foulds feels justified in making for his own findings is that "they did not appear to be too discouraging".
Zubin's review of studies employing the inter-rater approach leads him to the conclusion that:
The degree of overall agreement between different observers with regard to specific diagnoses is too low for individual diagnosis. The overall agreement on general categories of diagnosis, although somewhat higher, still leaves much to be desired. The evidence for low agreement across specific diagnostic categories is all the more surprising since, for the most part, the observers in any one study were usually quite similar in orientation, training and background. (p. 383)