Chapter 4

Rise of Group Ability Testing:

W.W.I, School Tracking, and Early Vocational Guidance (1918-1932).

Paul F. Ballantyne

This chapter covers two overlapping periods in which both the applied subdiscipline of mental and vocational testing, and the academic issue of the supposed hereditary origins of human intelligence, became persistent features of the psychological and educational professions.  Prior to World War I, psychometric ability testing was still in its infancy and was largely confined to rough clinical and applied settings.  Binet-style individual mental tests, for instance, had been tentatively used in urban schools as a means of discovering either mentally unfit or gifted elementary or high school children.  No organized program of group testing in either school or vocational settings had yet been worked out.  Standardized group ability testing techniques would soon become a wider concern in military, academic, public school, and vocational settings.

Chapter Overview

In the first period (1918-1925), scientific racism was still openly condoned or at least acquiesced to by social scientists, especially psychologists.  Section one provides the details of the use of group testing techniques during World War I by Robert Yerkes and others.  During the war, the "Army Alpha and Beta tests" were developed and used to aid both the selection of officers and the assignment of inductees to specific military occupations or units.  Specific attention is paid to the contrast between the practical military function of these tests and their more dubious post-war academic function (as psychometric exemplars for investigating latent intelligence).  The shared additive nature of the underlying assumptions regarding human intellect in both contexts, however, is also noted.

In the second period (1920-1932), the empirical tools of psychology and the very institutions which were teaching those techniques were being openly used as unabashed instruments for American industry.  Section two, therefore, introduces the ongoing political, economic, and higher educational backdrop for the rise of standardized group testing.  The years immediately after the war saw the rise of mental and vocational testing applications in schools and in business.  Psychometric "servants of power," however, did not function free of internal disciplinary strife.  In particular, this was a period of heightened debate between "nativist vs. environmental" interpretations of ability testing data.  Sections three and four, therefore, cover the nationwide application of group testing in urban school settings by nativist Lewis M. Terman (and other Stanford University-based researchers) but also highlight the less well known work of Bird T. Baldwin (and the Iowa Child Welfare Research Station) in rural school settings.

The ostensible motives for the latter, largely "interactionist," account of human intellect, and of the similar attempts to describe the "social" nature of vocational workgroups in the Hawthorne industrial plant experiments (in section five), are applauded but also given some critical historical consideration in the light of the social/societal distinction already made in chapters 1-3.  This was a period in which the former ontological concerns of mental testers were being largely replaced with the statistical concerns of test reliability and the professional issues of test sales and the marketability of vocational services in the industrial workplace.  These concerns pervaded both the ontological lessons and the very empirical structure of contemporary research and theory.

Section One:

War Committees and Motives for Army Testing (Yerkes vs. Scott)

When the assassination of a minor European noble in 1914 plunged that continent into military conflict, the initial American political reaction was mild.  Even as the European conflict spread, Americans remained essentially isolationist, believing this to be a conflict between anachronistic imperial powers with little relevance to U.S. domestic affairs.  In the 1916 presidential election, for instance, both parties pledged to keep the nation out of the war.[43]  Despite the initial proclamations of neutrality, however, the American Anglo-Saxon heritage manifested itself in public opinion which favored the Allies (Great Britain, Canada, France, and imperial Russia) rather than the so-called Central Powers (Germany and Austria-Hungary).

Only when the steady expansion of the economy was threatened by both persistent naval restriction of American overseas travel and by hostile military blockade of international trade routes did the peace-loving president Woodrow Wilson allow America to officially declare war on Germany (Noble, 1971).  By the time U.S. troops actually entered combat, the European nations were largely exhausted from the war effort.  American participation lasted for little more than one year, and this meant that the loss of life (as compared to that of other nations) was minimal.  Waiting to enter the war also allowed American industrialists to receive a double benefit.  They gained initially from huge war munitions contracts (employing one-fourth of the civilian work force in war industries), and they would gain later because the U.S. was now guaranteed to become a global creditor nation during the post-war reconstruction period (Baritz, 1960; Urban & Wagoner, 1996).

Psychologists Prepare for War

Once Congress approved Wilson's request for a declaration of war (on April 6, 1917), the incumbent APA president Robert M. Yerkes set out immediately to convince the military of the utility of mental testing for the war effort (Yerkes, 1918).  Of the 24,000,000 men registered by the nation's draft boards, almost 3,000,000 were inducted into the armed forces.  To equip these men was a gigantic problem because the U.S. had on hand supplies for an army of only 500,000 men.  To train these men and assign them to appropriate army units was also a gigantic task.  It was here that psychologists, especially those with pre-war applied personnel management experience, might make a contribution to the war effort (Baritz, 1960).

During the initial APA war planning meeting, however, Yerkes quarreled openly with Walter Dill Scott (the first American professor of applied psychology) over the proposed goals and research design for wartime testing of recruits.  While Yerkes initially proposed a brief 10-minute individual test intended to select out the so-called "mentally unfit," Scott objected strenuously and advocated a much more practical group testing design to assess the familiarity of recruits with particular aspects of military work (von Mayrhauser, 1989, 1992; Carson, 1993).  This fundamental clash between the self-serving evolutionary elitist psychology (of Yerkes) and the more task-oriented approach (of Scott) was only kept in check by the active administrative intervention of both Walter Bingham (Scott's former employer at the Carnegie Institute of Technology) and E.L. Thorndike (who, while holding the hereditarian views of Yerkes, also defined "intellect" as a complex of diverse capacities).[44]

Scott's personnel assessment form

The main result of the above disagreement was that Scott withdrew from the Yerkes-dominated Committee on the Psychological Examination of Recruits and assumed leadership of a specially formed Committee on the Classification of Personnel in the Army (CCPA).  There, he applied his pre-war Rating Scale for Selecting Salesmen, with minimal modification, as a Rating Scale for Selecting Captains and also constructed proficiency tests for 83 military jobs (see fig 25).

Figure 25 Rating scale for the assessment of subordinate officers. Walter Dill Scott's assessment form for officers (above panel) was a compromise solution resulting from his disagreement with the Yerkes group. While it includes a practically defined criterion of "intelligence," the form was primarily designed to reflect a prewar military concern: to assess the "character" of the officer (i.e., his ability to lead and discipline his men, the skill and energy with which he fulfilled his duties, his judgment, his appearance, and whether he had any problems with money or drink). Similar assessment and questionnaire forms for other military occupations were worked out after extensive interviews with union, business, and military leaders as to essential requirements of each job. By the end of the war Scott's committee had grown from 20 to over 175 members and his assessment forms had provided an example of the practical worth of psychology in the area of vocational assessment (photo from Schultz & Schultz, 1996).

Not overly concerned with contemporaneous academic debates about intelligence (e.g., general intelligence vs. a complex of diverse capabilities), Scott designed practical assessment devices that would simply "be of use" to his clientele.  As Carson (1993) points out, "character," not intelligence, also loomed large in official army documents regarding the acceptability of recruits.  Army regulations concerning recruit eligibility, for example, were based solely on the following criteria. The candidate had to be: (1) older than 18 and younger than 35; (2) a citizen or intending to become one; (3) able to read, write, and speak English; and (4) not insane, intoxicated, or a convicted felon or deserter (p. 280).  However, by November 11, 1918 -Armistice Day- "intelligence" had become established as a new selection criterion for both entrance and occupational placement in the American military.  The Yerkes committee has been widely credited for this change.

Yerkes Committee and the Vineland Meeting.

After his dispute with Scott, Yerkes organized a famed testing conference (running from 28 May - 9 June and again from 25 June - 7 July, 1917), held at the Vineland Training School for Feebleminded Girls and Boys, New Jersey.  It was at these meetings that the details of the Army "Alpha and Beta" mental tests were worked out.[45]  Walter Bingham was an administrative presence at those meetings, and he helped convince Yerkes of the urgent need for group testing along the lines of Scott's earlier work for business clientele.  The three other senior figures at the Vineland meeting shared Yerkes' hereditarian motive for legitimizing mental testing technology in the public mind.  Henry Goddard advocated mental testing as a valid measure of both educational fitness and of racial recapitulationism.  Lewis Terman was also a recapitulationist but was more concerned about the appropriate streaming of "exceptional" children into appropriate occupations (see Minton, 1988).  Even Bingham's position, however, was elitist by modern standards.  He held that superior manual skill followed from inferior general intelligence.  Those present also agreed with Yerkes (and disagreed with both Thorndike and Scott) that intelligence was a unitary phenomenon that could be administratively summed up into one number or grade for each recruit.  The main issue under consideration, therefore, was simply one of working out which technical and statistical devices could be employed so as to best balance so-called psychometric interests with the needs of the Army:

"Within less than two weeks, the committee produced a [technically novel] format for its tests that combined the mass administration of school examinations and the standardization of individual intelligence tests: they found [various ways] to transform the examinee's answers from highly variable, ...and always time-consuming oral or written responses into easily marked choices among fixed alternatives, quickly scorable by clerical workers with the aid of superimposed stencils" (Samelson, 1987, p. 116).

Ironically, adapting the design of the testing program to the needs of the Army meant that the test batteries eventually developed by the Yerkes Committee were not ostensibly intended to test for general intelligence.  Indeed, Thorndike himself had come out strongly against the feasibility or need for such detailed testing.  Instead, the Army "Alpha and Beta" tests were marketed as a means to: (1) screen out those who could not understand or follow orders (i.e., the mentally unfit); (2) assess the trainability potential of recruits (i.e., which rank a given soldier might be expected to attain); and (3) administratively balance the average intellectual level of military units.

As Carson (1993) indicates, the first role was traditionally part of the standard Army medical examination (where lack of understanding was sufficient grounds for discharge).  However, the sheer number of inductees to be processed and the preponderance of immigrant and illiterate recruits made this traditional mode of assessment obsolete because too many men would be sent home.  The second role (selection of officer trainees) had been the duty of Army officers during the basic training of recruits, and this turned out to be one source of Army resistance to mental testing.  That is, many of the officers agreed with Scott that "superior mental ability" by no means guaranteed competent leadership.  The third purely "administrative" role, however, was entirely new and it became a second effective rationale for testing during the war (see fig 26).

Figure 26 Army "intelligence" testing. The above panel shows mental testing on groups of American Army recruits during W.W.I (photo from Engle, 1946). The Alpha test consisted of a printed booklet containing a variety of questions and tasks, all of which could be answered or carried out on the printed page. The recruits have their hands raised so that, when instructed to do so, they will all start the test at the same moment. After each successive question, their hands are again raised to ensure that prescribed time-limits are obeyed. The lower panel shows a prototype for the Army Alpha (called "Examination A") being hand scored by so-called military psychologists (former education and psychology students) at Camp Lee (photo from DuBois, 1970).

The main rationale for acceptance of the group testing procedure was simply that it would rapidly evaluate the relative performance of each man against the performance of the rest of the men (Spring, 1972;  Minton, 1988).  As Yerkes would put it: "Speed counts in a war that cost fifty million dollars per day."  However, the particular interpretation and utility of the test results was an ongoing matter of debate.  Only after the war would the so-called genetic-evolutionary interpretation of the mass testing data bring the ostensibly applied context of Army mental testing back into line with Yerkes' pre-war eugenics position.

Testing details

The eight subscales on the Alpha test involved the ability to read English and were as follows: Oral directions (e.g., "Draw a line from circle 1 to circle 4 that will pass below circle 2 and above circle 3"); arithmetical problems; practical judgment; synonym-antonym; disarranged sentences; number series completion; analogies; and information (e.g., "France is in Europe Asia Africa Australia").  The seven subscales on the Beta test were designed to be taken by recruits who could not speak or read English.  They did, however, assume that recruits could use pencils to answer the questions.  These subscales included: Maze tasks; cube analysis (i.e., estimation of their number from a series of printed pictures); X-O series (i.e., series completion); digit-symbol substitution; number checking (i.e., same-different judgments); picture completion (i.e., finding the missing item in various drawings); and geometrical construction (i.e., a cardboard puzzle fitting task).  The instructions for the Beta test were given in pantomime from the front of the examination hall (see Brigham, 1923).

Individual examinations were of two types: (a) those involving the "use of English," the Stanford revision of the Binet-Simon scale and the Yerkes point scale; and (b) those "involving no English," consisting of construction puzzles, etc. -the instruction being given by gestures (Brigham, 1923, xxii).  This latter "performance scale" was a "composite" -using tasks originally designed for deaf subjects by Rudolph Pintner & Donald Paterson, 1915; maze tasks used in Australia by Porteus, 1915; and formboard tasks used at Ellis Island by Goddard but developed even earlier by H.A. Knox and William Healy (Pintner & Paterson, 1923).

Unit assignment, testers, and statistical techniques

The allocation of men to military duties was traditionally the task of the officer class (see White, 1968), but Yerkes persuaded the army to establish a school of military psychology at Fort Oglethorpe, Georgia, where "military psychology" staff would now be trained to take over that function.  Most of these trainees were former graduate students in psychology or education (Boring, 1961).  Upon completion of their two-month training program in mental testing, 100 officers and over 300 enlisted men were dispersed to "Examining Units" in thirty-five army camps, where over 1.7 million recruits were hurriedly subjected to group mental tests (Alpha or Beta).  Approximately 83,000 borderline cases were also given additional individual tests.  As the testing program progressed, more than 500 additional clerks were assigned to the staff (Baritz, 1960).[46]

With regard to recording the comparative ranking of test performance, Brigham (1923) reports the following: For categorical "army purposes" the three different scales (Alpha, Beta, Individual) were converted into one "general scale" of letter grades (A, B, C+, C, C-, D, D-, E) to categorize each recruit.  On the assumption that a high test score meant not only ability to learn specialized skills, but also resourcefulness and adaptability to changing circumstances, the following military occupation assignments were recommended: Those with the lowest letter grade ("E" men) were designated unfit for service and dismissed or recommended for additional individual tests.  The second and third lowest groups (about 15% of those accepted) were assigned to simple work under careful supervision.  The middle groups (C- and C men) were assigned to regular soldier duties.  Men in the third group from the top (15%) were trained as noncommissioned officers, responsible for overseeing routine work details in their Companies.  Finally, the top two groups (10%) were recommended for Officer Training (Terman, 1918).  These alphabetical categories were used by Yerkes to justify the face validity of the W.W. I tests.  That is, they showed the Army that (on average) performance on the test could be used to assign men to military roles.

For within-test statistical purposes, Alpha tests were first scored by finding the score on each of the eight tests, adding these to get a total, and then converting the total into the letter grade.  Beta tests were similarly scored.  However, recognizing that some tests on the Beta (and performance scales) might be easier than any test in Alpha meant that some adjustment was necessary if these test scores were to be compared.  The army statisticians (including R.S. Woodworth and E.G. Boring) therefore worked out a numerical "combined [Alpha-Beta] point scale" of "theoretical intelligence" (running from 0 to 25 points) so that "one [standardized] measurement instead of three" could be placed into the files of each tested recruit for future statistical analysis (see Brigham, 1923; Klineberg, 1933; Montague, 1945).  This numerical combined score was used by Brigham (1923) to carry out so-called "racial comparisons" of performance (see fig 27).
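The two-stage scoring procedure described above (sum the eight Alpha subtest scores into a total, then convert that total into a categorical army letter grade) can be sketched in a few lines.  This is only an illustration of the mechanics: the cutoff values below are hypothetical placeholders, not the army's actual conversion norms, which are tabulated in Brigham (1923).

```python
# Sketch of the two-stage Army Alpha scoring procedure described above.
# NOTE: the grade cutoffs here are hypothetical illustrations only;
# they are not the army's actual conversion tables.

def alpha_total(subtest_scores):
    """Sum the eight Alpha subtest scores into a single total."""
    assert len(subtest_scores) == 8, "Alpha had exactly eight subtests"
    return sum(subtest_scores)

# Hypothetical total-score cutoffs for the categorical letter grades,
# checked from highest grade to lowest.
GRADE_CUTOFFS = [
    (135, "A"), (105, "B"), (75, "C+"), (45, "C"),
    (25, "C-"), (15, "D"), (10, "D-"), (0, "E"),
]

def letter_grade(total):
    """Convert a total score into the army's letter-grade category."""
    for cutoff, grade in GRADE_CUTOFFS:
        if total >= cutoff:
            return grade
    return "E"

scores = [15, 12, 10, 9, 14, 11, 8, 7]  # one recruit's eight subtest scores
print(letter_grade(alpha_total(scores)))  # total 86 -> "C+"
```

The same categorical step was then repeated for Beta and the individual scales, which is why the statisticians' "combined point scale" was needed: the letter grades alone could not be compared across tests of unequal difficulty.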

Figure 27 Categorical and Combined scale analysis of Army test data. The upper left panel shows the categorical distribution of intelligence ratings (A through D-) for those tested as organized by army rank. This distribution was used by Yerkes and others to "show" the value of the testing program in "identifying immediately the groups in which suitable officer material will be found and...those men whose mental inferiority warrants their elimination from regular units in order to prevent the retardation of training" (Seashore, 1923, p.393). The other panels indicate how combined scale statistics were utilized to empirically "support" Brigham's (1923) race hypothesis -that differences of latent intelligence exist between: (1) so-called "native vs. foreign born" drafted Whites (right); and (2) between the overall "White" vs. the "Negro Draft" (photos from Brigham, 1923).

Although all of these data are consistent with an environmental interpretation, the concluding two paragraphs of Brigham's A Study of American Intelligence (1923), one of the benchmarks in the history of "scientific racism," speak volumes about the overwhelmingly hereditarian disciplinary interpretation of the World War I testing data:

"According to all evidence available, then, American intelligence is declining, and will proceed with an accelerating rate as the racial admixture becomes more and more extensive. The decline of American intelligence will be more rapid than the decline of the intelligence of European national groups, owing to the presence here of the Negro. These are the plain, if somewhat ugly, facts that our study shows. The deterioration of American intelligence is not inevitable, however, if public action can be aroused to prevent it. There is no reason why legal steps should not be taken which would insure a continuously progressive upward evolution.... The steps that should be taken to preserve or increase our present intellectual capacity must of course be dictated by science and not by political expediency. Immigration should not only be restrictive but highly selective. And the revision of the immigration and naturalization laws will only afford a slight relief from our present difficulty. The really important steps are those looking toward the prevention of the continued propagation of defective strains in the present population. If all immigration were stopped now, the decline of American intelligence would still be inevitable. This is the problem which must be met, and our manner of meeting it will determine the future course of our national life" (Brigham, 1923, p. 210).

Yerkes and comparative data interpretation

Was Brigham out on a limb with such proclamations?  Not really; Robert Yerkes wholeheartedly agreed with him.  Until Yerkes' own ongoing embrace of eugenics is recognized as fundamental to his involvement in any form of comparative research (animal or human; mental or vocational), it is tempting for reviewers to overemphasize his success in demonstrating the continuity and discontinuity of mental evolution.[47]  In any case, Yerkes' support of the post-war hereditarian interpretation of standardized mental testing data was very much less of a "contradiction" than is often claimed within comparative psychology circles.  Yerkes & Yoakum (1920), for instance, declared openly that the army tests were definitely known to measure native intellectual ability.  Similarly, the open endorsement of Carl Brigham's A Study of American Intelligence (1923), and his own laudatory testaments as to the "statistical reliability" of the war data (see Yerkes, 1921, 1932), were completely in line with his prewar views on the similarities between animal and human mentality.  Thus, he would brook no self-contradiction in emphasizing the "vital" importance of Brigham's race hypothesis: "[N]o one of us as a citizen can afford to ignore the menace of race deterioration or the evident relations of immigration to national progress and welfare" (Yerkes in Brigham, 1922).  The fact that the Yerkes account did not gain complete hegemony in this regard is a testament to the ongoing contributions of Bingham, Scott, and Thorndike.[48]

Fortunately, Samelson (1987) touches nicely on the underlying methodological similarities between American animal research (including Yerkes) and the rise of human intelligence testing.  They all subscribed, he suggests, to the American "cult of efficiency," which tended to rule out individuality and creativity within experimental situations.  The very use of pencil and paper multiple choice questions in the Army Alpha, for instance, was designed to rule out creativity or originality of subject answers as error variance.  The identical role had already been served in animal experiments by the Yerkes-based multiple choice testing apparatus, which treated species differences as error variance.  This efficiency, says Samelson, was as "American as the assembly line, and perhaps as alienating" (p. 123; see also Callahan, 1962).[49]

As pointed out in chapters 2 and 3, the search in both the animal and human realms of comparative investigation has traditionally been for some overall abstract (i.e., decontextualized) knowledge of sensory discrimination, habit formation, drive, or general intelligence.  All development in this abstractly general approach is conceived of as continuous and is usually described as biological maturation.  All change is conceived of as quantitative and is often described as horizontal or vertical "growth."  World War I inductees, for instance, had been classified into grades A through E, representing mental ages ranging from nineteen years down to ten years.  Those in category C were deemed rarely capable of finishing a high school course.  Twenty-five percent of the draftees were placed in this category and 45 percent in lower ones.  Yerkes agreed with Terman and Goddard that Americans should be concerned over the estimated "average mental age" of recruits being only 13 years (just above the edge of Goddard's label of "moronity").  But Yerkes himself abruptly pulled out of direct involvement with the collection of mental testing data after 1921, when he engaged in a debate with Terman over the validity of National Intelligence Test norms.

Section Two:

The Postwar Context for Testing: Political unrest, College funding paradox, and Uniform Entrance Exams

This section briefly introduces the wider political, economic, technical, and higher educational context for post-W.W.I ability testing.  Special attention is given to both the contemporary "College funding paradox" (where the interests of modern benefactors and academic ideals collided) and to the related gradual rise of the College Entrance Examination Board (which was attempting to establish a system of national "uniform" entrance examinations for higher institutions).

Economic Recession, Political Conflict, and Racial unrest

Initial postwar inflation and labor unrest, caused by the cancellation of wartime Defense Department contracts and by the abrupt return of 5,000,000 men to the civilian labor pool, were felt equally by the middle, lower, and working classes.[50]  The middle classes saw their pre-war savings (already worn down by wartime restrictions) further reduced by the economic slump of 1921 to 1923.  They began adopting the traditional upper class suspicion of striking workers and of labor unions in general.  The lower classes (particularly farmers, who had spent the years 1914-1919 learning ways to increase per acre yields) now experienced successively fewer orders for produce as the European countries started up their own peacetime agricultural programs.  The organized working class (which had made modest gains from the tight wartime labor market by joining up with national unions such as the American Federation of Labor) now experienced a severe backlash.  Business-bankrolled organizations like the American Protective League, the National Security League, and the American Defense Society, which had devoted themselves to "superpatriotism" in wartime, were also ideal vehicles for anti-labor propaganda in peacetime (Cook, 1964, p. 81).

"Determined to counteract the trifling gains that labor had won during the first decade of the twentieth century, employers, led by the National Association of Manufacturers, waged a generally successful war on the closed [union] shop... Employers' associations broke strikes repeatedly and...prevented progressive labor legislation in most States" (Baritz, 1960, p.12).

When the Soviet Russian leaders declared the birth of an International Communist Movement (in 1919), the American business and educational community embarked upon an all-out anti-Red purging campaign.[51]  Ironically, those in corporate America who had benefited most from the war economy and even from its aftermath (i.e., a less regulated domestic economy) were initially spared any significant scrutiny.  This economic elite soon sought to consolidate their control over one of the nation's largest peacetime businesses: education (see fig 28).

Figure 28 Stemming the tide of "Educational Bolshevism." The left and right panels show how the red scare was openly mobilized in 1919 as a way to sell "standard" public school textbooks. In the first of these full-page ads, which graced the Journal of Education, "Bolshevism" (read any form of Socialism) is portrayed as the subjugation of "intellectuals" by the "jealous" masses. The second ad suggests that "intelligent patriotism" must train young Americans to be the "right kind of citizens." Textbooks in history and social studies would now be routinely scrutinized for unpatriotic and un-American passages (see FitzGerald, 1979).

While the "Red Scare" eventually subsided, a residual fear of foreigners and of "un-American" doctrines (such as Socialism) in business and educational circles remained throughout the 20s and 30s.  Similarly, racial unrest over ongoing demographic shifts of "Negro" Americans also marked the immediate postwar years.  During and after the war, a great migration of southern Negroes into the northern industrial cities (such as New York, Chicago, Detroit, and Philadelphia) was under way.  Lynching in the South increased, and anti-Black sentiment was now widely shared in northern cities which had received large numbers of Negro migrant workers and their school-age children.  Black soldiers too, returned to a society that continued to systematically discriminate against them.  When they spoke out, they were quickly labeled as political militants (Daniel, 1980).

The fear and intolerance that was then feeding conservative America's red scare was also indicated by the growth enjoyed by the Ku Klux Klan during the 1920s.  Klan membership also grew as a result of adding Catholics and Jews to its roster of "un-American" groups (along with Negroes, Mexicans, and Socialists) which needed to be "purged" in the interests of national health.[52]

President Harding's election in 1920 ended the eight-year presidency of Wilson, a progressive Democrat.  The Republicans then controlled the White House for the next twelve years.[53]  Harding's successor Calvin Coolidge, who (as governor of Massachusetts) had called out troops to break up Boston's police strike in 1919, subsequently cut government regulation in many quarters while supporting the development of business through corporate tax reductions.  The former fears of economic collusion and domestic monopoly (i.e., the long-promised antitrust laws) were now conspicuously absent.  On the contrary, mass production of consumer goods such as Ford's affordable Model A meant (by mid-decade) further domestic expansion of supporting industries such as steel, glass, and oil.

Technological change and youth culture

Under such cultural and economic circumstances, no progressive grass-roots workers' revolution in the United States was forthcoming.  What did happen during the "roaring twenties," however, was a successive widening of the geographical and intellectual horizons of the average Anglo-American.  One aspect of the geographical expansion came as the result of the mass production of the automobile.  For one thing, this made busing of urban and then rural school children more viable.  Private automobiles were also increasingly used for leisure activities.  In the very act of travel, successively more middle-class Americans began to see how the other half lived.  The ongoing tension between small-town America (with its fundamentalist religious values and fear of foreigners) and big-city America (with its ethnic populations and progressive political machines), however, continued to simmer below the increasingly uniform surface of American society.  It was this very contrast between the two-sidedness of American life that was highlighted in the national media frenzy surrounding the Scopes "Monkey Trial" of 1925 in Dayton, Tennessee.[54]

One of the main homogenizing influences on American culture was the ongoing advances in mass media and entertainment industries.  National networks of "Radio Broadcasters" were beginning to disseminate entertainment, news, and commercial jingles simultaneously to an audience of more than 10 million families within the decade.  Newspaper chains, however, both retained and actually expanded their readership during the 1920s.  Similarly, the film industry was booming, with the "Big Eight" film companies forming the Motion Picture Producers and Distributors of America (MPPDA) in 1922.  Larger cities now averaged one movie seat for every five to seven people in the population.  In contrast to the predominantly working class audience of the former vaudeville and silent film industries, "talkies" were now capturing an increasing share of the leisure hours of the middle-class audience (May, 1980).

This high level of audience exposure to the movie medium, an influential but informal means of learning, was a perfect example of what William Ogburn (1922) had called "cultural lag."  That theory states that specific modernizations in technology have typically advanced at a pace that exceeded the general ability of the public to adjust their ideals, values, and beliefs.  Hence, ongoing religious concerns, combined with professional concerns that movies posed a direct challenge to the supremacy of public schools in shaping young minds, led the MPPDA to adopt a "voluntary" content code.  This was intended to assure the new middle-class audience that their movie going children would receive a suitably benign vision of the workings of the world (Moley, 1945; Randall, 1965).  But, in fact, it was already too late to turn back the tide of cultural modernity.

During the 1920s, youths, having been increasingly displaced from the labor market, began attending school and college in unprecedented numbers.  As a distinctly modern teenage "Flapper" culture (including carefree forms of dress, jazz age music, and conspicuous consumption) began taking shape, the reactionary middle-class popular press began questioning the "morality" of the motor car, of film, and especially of racial mixing (Spring, 1994, pp. 304-345).

Defenders of the older, more sedate culture did manage to bring in prohibition laws (making the manufacture and sale of alcohol illegal for more than a decade) but the ideals of technological, scientific, and cultural advance were to have an irreversible effect on the psychological life of the nation.  Emphasis on finding ways to measure and improve educational efficiency, ability test reliability, and so-called worker productivity was simply part of this irreversible tide of modernity.

College Funding Paradox (Spiritual Capitalism vs. Socialist critiques)

The higher institutions of learning featured a decade earlier in Slosson's Great American Universities (1910) continued, despite the war, to flourish and expand due to the annuities garnered both from their older perpetual endowment grants and from newer funds set up just prior to the war by the aging, though still politically savvy, capitalists such as J.P. Morgan, John D. Rockefeller, and Andrew Carnegie (Lagemann, 1983).  Slosson's laudatory analysis covered nine endowed universities (Harvard, Columbia, Chicago, Yale, Cornell, Princeton, Pennsylvania, Stanford, and Johns Hopkins), and five State Universities (Michigan, Minnesota, Wisconsin, California, and Illinois).  These fourteen had been named in a 1908 list prepared by the Carnegie Foundation for the Advancement of Teaching which ranked them according to the money annually spent for instruction.

In the wake of World War I, however, the ongoing political contradictions between contemporary national interests and long-term international stability caused some of the more astute domestic observers to begin questioning the intimate fiscal relationship between ongoing "exploitive" Capitalist interests and the nation's higher educational system which was ostensibly intended for the "betterment" of human kind (Rudy, 1991).

Appeal to spiritually augmented capitalism

One common means of dealing with this educational funding paradox was to fall back on the old liberal argument that each student, alumnus, professor, and administrator must augment their daily material pursuits with a spiritually guided sensitivity to others.  This approach portrayed the issue at hand as one of spirituality vs. modern material culture.  That portrayal was commonly utilized both in the year-end alumni speeches made by university presidents and in numerous NEA delegate speeches during the 1920s and 1930s (Knight, 1951).  The vocational raison d'etre of college and university presidents, of course, was always to smooth relations between the public, the alumni, and the ongoing institutional benefactors.  Appeals to spirituality provided a common ground for all listeners.  In fact, though, college presidents were also keenly aware of where their own immediate loyalties lay: not with students, professors, or the taxpaying public, but with the Board of Regents (made up of those under the direct employment of the self-same corporate benefactors).

Similarly, the administrative role of the NEA delegate (under the newly established delegate system adopted in 1921) was to attend the annual conference and work out ways to make their own constituency a more "harmonious" work environment for students, teachers, and administrators.  The spiritual augmentation argument, therefore, was cleverly designed to: (1) individualize the issues of harmony and justice (essentially placing the impetus for change on the concerned individual rather than on the host institution); and (2) avoid mention of (or skirt around) the uncomfortable societal fact that the vocational existence of those giving the advice depended upon a harmonious relationship with the groups now controlling the institutional purse.

The hypocrisy of early 20th century university presidents in this regard knew no bounds.  For example, shortly after the stock market crash of 1929, long-time Columbia University president Nicholas Butler had the audacity to deliver the following two-pronged address on an international alumni radio broadcast.  In the most plausible part of the address, Butler suggests that the "great universities" of the world had "more power than any government, more power than armies and forts and ships" -the power of "organized intelligence."  He cleverly accompanies this proclamation, however, with a pious admonishment of alumni and of the American public for their aptness to confine themselves "too much to the political, economic, and financial aspects of modern life."  "Mind and character," he asserts, are more important.  And similarly, in June of 1933, while conferring more than five thousand academic degrees in the presence of an audience of twenty thousand, Butler further claimed that the American "social order rested upon a moral rather than an economic foundation," and that "unless the gain-seeking motives of Americans were subordinated to the ideal of human service, grief and disaster would continue to becloud and distress the world" (Butler; as paraphrased in Knight, 1951, pp. 585-586).[55]

Appeal to Socialism

Another less common (and more controversial) means of inquiry into the practical paradox of educational funding was to look not to spirituality but toward some form of fiscal Socialism as a possible replacement for the contemporary reliance of educational institutions upon exploitive capitalists.  A prominent exemplar of this approach is the series of works by Upton Sinclair, who, between 1921 and 1924, openly addressed the issue of the intellectual price being paid for the American institutional reliance on the economics of greed.[56]

Sinclair's The Brass Check (1920) had already exposed the systematic information management (a.k.a., propaganda) and power brokery implied by the recent consolidation of local daily newspapers into national syndicates.  He also exposed the ownership relationships of the various popular magazines of the day that were now preaching the ethical capitalist lifestyle to formerly untapped segments of the American population.  Popular culture, once diverse and many-sided, was becoming increasingly homogenized by a new technological elite with deep pockets (Kaestle, 1988).  The primary message was not only "don't believe what you read," but also "take action" by way of boycott or open editorial debate.  In 1921, Sinclair began applying the same critical analysis to the nation's educational institutions.  Two further controversial books were produced (see fig 29).

Figure 29 Goose-stepping colleges and gosling-producing public schools. The left panel was originally the embossed cover for the second edition of Upton Sinclair's The Goose-Step: A study of American education (1922). Sinclair pointed out that the American college system was now covertly run by and for a "bandit crew" of capitalist cronies. The right panel is from Sinclair's sequel The Goslings (1924), dealing with the public grade-school system. He points out that until educators openly took up their political place beside other workers, they would remain part of the problem.

According to Sinclair (1922), the network of "interlocking directorships" had vested interests in restricting the flow of both capital and information down to the nation's taxpaying public.  His clearly socialist account created an administrative uproar by indicating, in detail, who at each of the major universities were routinely censoring novel or critical thought in their students and faculty members.  That is, even though the American Association of University Professors had codified the idea of "academic freedom" by setting down procedural safeguards to protect individual faculty (Metzger, 1955), these college and university rights were subject to infringement by external lay Boards of Trustees because they were not legislated by civil law.

For Sinclair, professors who remained complacently within the bounds of this "empire of dullness" routinely allowed themselves to be "ruled" by reactionary "department store" men (i.e., administrative reformers) and typically produced students who remained "pitiful victims of blind instinct" [hence the reference to "geese"] (1923, p. 62).  In his final "Call to Action," Sinclair (1924) urged both teachers and professors to "stand up and assert themselves."  That is, contrary to all of their professional training (and contrary to the explicit goals of the NEA), they must "let [workplace] harmony come when educational institutions are controlled by educators, and not by the owners of stocks and bonds and other symbols of parasitism" (p. 441).

Sinclair also warned Americans that if the Capitalist exploits of the nation's interlocking directorships were allowed to continue unchecked by public controls, they would lead inevitably both to global economic collapse and to a Second World War:

"We have seen the collapse come to Russia; as I write this book it is coming to Germany -and before I write many more books we shall see it come to the Central European countries, then to France and Italy, then to England and Japan, and -last of all, perhaps, but none the less inevitably- to America" (Sinclair, 1924, p. 442).

But by 1925, the American economy was booming again and this time a growing middle class (including predominantly male high school teachers and university professors) was participating fully in that boom.  Female elementary teachers, however, like the wider labor movement itself, did not benefit significantly from the economic upturn because business interests (both large and small) stepped up their anti-union crusade with the clear support of the state courts and federal regulatory agencies.  The American Federation of Teachers, in turn, lost significant numbers during the 1920s.

College Examination Board and the Psychological Corporation

The creatively named Scott Company for "applied vocational testing," founded in Philadelphia, 1919, became the first post-war psychological consulting organization but it was forced to fold due to the economic dip of 1921 and 1922 (Ferguson, 1976; Sokal, 1977).  At this time, however, the more academically centered Psychological Corporation (made up of 18 professors including Scott, Bingham, Hall, H.L. Hollingworth, McDougall, Watson, Terman, and Yerkes) was organized by James McKeen Cattell, who served as its president from 1921 to 1926 (Sokal, 1981b).  The $10,000 starting capital for this non-profit holding company was gained entirely from the founding subscribers as one means of supporting the effort to convince more industrialists of the utility of psychological testing.  This new organization of psychometric psychologists eventually offered up for use a multitude of applied vocational, school, and so-called intelligence, aptitude, or achievement measures (Achilles, 1937).

The efforts of the Psychological Corporation, however, were initially carried out in parallel to another influential group of testers called the College Examination Board (founded 1900).  Some elaboration on the College Board is necessary because the production and sale of standardized group tests for the public school system by the Psychological Corporation was both a downward extension of the Board's ongoing drive toward establishing uniform college entrance examinations and a direct (personal and institutional) challenge to the authority of those in charge of this previously established group (including Nicholas Butler).

The movement for uniform college entrance tests began in 1900 with the formation of the College Entrance Examination Board, which held its first annual entrance examinations for "member colleges" in 1901 (see Krug, 1966; Fuess, 1967).  To be a member, a given college simply had to have 50 or more applicants writing one or more of the Board's entrance exams (Valentine, 1987).  Prior to the war, institutional membership in the College Board was primarily limited to five of the big six Eastern colleges (Columbia, Cornell, Pennsylvania, Princeton, and Yale).  Harvard, in spite of Eliot's favor for the board exams, only slowly warmed to the idea of uniform entrance examinations, becoming a member in 1917.  But even this cautious drift toward uniform testing was viewed as a bold move because most other colleges still admitted students on the basis of the older "certificate" system (whereby previously accredited high schools would simply recommend those whom they deemed appropriate for college entrance).  By 1925, however, the board would be administering its own examinations to 19,775 candidates in 316 centers throughout the country (Krug, 1966).

Psychology would have a big part in the eventual success of the College Board but not all psychologists were enamored with the Board's initial activities or goals.  For James McK. Cattell, this was in part due to the fact that the College Examination Board was the brainchild of Columbia president Nicholas M. Butler who was (in Upton Sinclair's terms) a long-standing member of a vast network of interlocking capitalist directorates that autocratically controlled the American college and university system.  During W.W.I, Butler had a direct hand in dismissing Cattell from his Columbia teaching post (without pension).  The ostensible reason was that Cattell had written a "treasonous" letter to his congressman supporting pending legislation to exempt conscientious objectors from combatant service.  In fact, Cattell had been one of the few Columbia professors who had "stood up" publicly against Butler's constant interference in the academic affairs of professors during the pre-war years.  Also, Cattell's pacifist leanings were now a direct affront to the ongoing plans of Butler and his business partners to profit handsomely from their new monopoly on war munitions contracts.  Consistent with its long-standing practice of allowing such harassment, Columbia waited years before dropping the groundless charges of treason and reinstating Cattell's pension (Sinclair, 1922).

It should be recognized, therefore, that the formation of Cattell's Psychological Corporation was a small but conspicuous post-war challenge to the planned educational testing monopoly of Butler and his NEA cronies.  Far from being an instrument of control over either the schools or over industry, the Psychological Corporation was initially dedicated to: (1) establishing standards in testing technology; (2) providing opportunities for (and payment of) psychological testers; and (3) using the profits of testing for further research.  In the 1920s and 1930s, this organization would, in effect, restrict the purview of Butler's Examination Board to the upper end of the educational ladder and would eventually displace the Board altogether.[57]

Section Three:

School Tracking: Applying the additive definition of intelligence

It was under the auspices of the non-profit (but decidedly middle-class) Psychological Corporation, that the pre-war trend toward standardized educational testing moved successively down the educational ladder.  In the early 1920s, systematic testing trials of Terman's National Intelligence Test, for instance, were carried out in senior and junior high schools and these were aimed specifically at efficiently assigning students into high, middle, or low educational tracks according to their supposedly innate mental abilities.

This section briefly mentions the immediate educational and institutional incentives for using psychometric testing in schools (including the ongoing expansion of so-called comprehensive high schools and the active patronage of the World Book Company). The demonstrative testing programs carried out between 1919-1925 in selected California schools (which provided the prototypes for Terman's national testing program) and the successive debates over the nature of human intelligence (up to 1930) are then detailed.

Comprehensive high schools and the "Industrial School" debate

One of the incentives for the eventual administrative use of standardized ability testing in schools was the outcome of a debate during the 1910s around the issue of vocational training in the schools.  When proponents of so-called industrial training schools (e.g., Edwin G. Cooley) put forward plans for separate school boards run by employers, they were voted down by a vocal coalition of educators and labor leaders who suggested that separate boards would not be in the best interest of the students, the laboring classes, or society (Wrigley, 1982).  The eventual result of the conflict was widespread support of the so-called comprehensive (combined) high school system.

Comprehensive high schools (which would accommodate academic, commercial, and vocational tracks) allowed educators to maintain ostensible allegiance to the long-standing principle of common schooling while simultaneously permitting the separate curricular tracks that were now thought necessary for learning modern commercial and vocational skills. The comprehensive high school was first championed in the NEA funded report The Cardinal Principles of Secondary Education (1918) carried out under the leadership of Clarence Kingsley, a teacher in a New York City manual training high school.

Instead of the former 19th century aim of creating an elite class of leaders, the principal objectives of the modern high school were now to be the promotion of: (1) personal health; (2) a command of fundamental processes (reading, writing, and arithmetic); (3) worthy home or vocational membership; (4) informed citizenship; (5) worthy use of leisure; and (6) ethical character.  The drafters of the report were, however, sensitive to the class distinctions which accompanied the multi-curricular tracks of comprehensive school systems.  They sought, therefore, to compensate for this through promoting: (1) cross-curricular activities (e.g., school assemblies, student government, and the newly created "homeroom"); and (2) extracurricular activities such as athletics, newspapers, and school clubs (Krug, 1964).  The curricular differentiation aspect of modern schools, however, was also appealing to ultra-conservative administrators who later made use of both psychometric and economic rationalizations to further extend curricular tracking downward into junior and even elementary school settings.

These ongoing tensions between the democratic and conservative aspects of curricular differentiation are evident in the educational writings of the day.  For example, in popularizing the recent inclusion of industrial courses in high schools, Michael O'Shea (1924) also inadvertently portrays them as one way to counterbalance the lack of industrial skills or poor attitudes of immigrants or the urban poor.  He suggests, for instance, that American society is more democratic, more "plastic and mobile" than the prewar European societies in which "it would disturb the social order if the members of the rising generation should be trained differently from the generations that preceded them" (p. 375).  "[Europeans] look backward rather than forward, so they train their children on ancient rather than on modern knowledge." (p. 377).  The comments that O'Shea includes along with his illustrative photographs are also indicative of the power and social status differences inherent even in the democratic choice between business and labor tracks within the same comprehensive high school (see fig 30).

Figure 30 Vocational training in a "factory" school. The upper panel shows a high school automobile repair class where lower class and immigrant "pupils are taught to use their hands as well as their heads." O'Shea also shows us a typical consolidated school printing shop (lower panel) where other pupils actually produce the "publications and circulars issued by the Schools and the Board of Education." The later adoption of standardized testing to guide such tracking decisions simply allowed comprehensive schools to lay further claim to their modern democratic role as efficient but essentially inequitable arbiters of opportunity (Photos from O'Shea, 1924).

Discipline, school efficiency, and testing

Although comprehensive schools were being advocated as a way of safeguarding the U.S. economy during the period immediately following World War I, it also began to be recognized that two-thirds of those entering secondary schools were still failing to graduate.  In other words, stepped up enforcement of the attendance laws, and widening curriculum choice was simply not enough to ensure success in climbing the newly democratized educational ladder.  But within the context of the 1920s classroom, both breaches of discipline and failure to achieve were personalized.  The former was viewed by administrators (and by parent groups) as a failure of the teacher and the latter was viewed as a failure of the student's innate capacity to live up to typical educational standards (see Rousmaniere, 1994).

In attempting to answer the question of what would make the schooling system more efficient, contemporary academics viewed the high failure rates (especially in mathematics, science, and language arts) as the result of the personal failings of students, separate and apart from the institutional structure or teaching emphasis of progressive public schooling.  As Cubberley had put it, "The two requisites for [climbing the educational ladder] were [traditionally] money enough to obtain freedom from work in order to attend, and brains and perseverance enough to retain a place in the classes" (Cubberley, 1947, p. 273; see also Cubberley, 1919).  Since the first (monetary) requisite had now been structurally ameliorated by the comprehensive high school system, attention understandably shifted toward the latter (mental capacity) requisite.  The post-war comprehensive and consolidated school system, therefore, provided a willing market for Terman's new group ability testing technology.

Terman's Bid for Psychometric School Tracking

When approached after the war, educators and school administrators were more than willing to share the responsibility for grouping students (into curricular tracks) with the so-called objective (i.e., impersonal) results of mental tests. There had long been a panoply of administrative plans for sorting students (including the Gary, Platoon, and Cambridge plans) and the new psychometric techniques were simply another means by which such sorting might be brought about.  As Raftery (1988) has pointed out, local school principals (even within L.A.) often had the final say regarding the interpretation and usage of tests and regarding which programs of tracking would be put in place (see fig 31).

Figure 31 Immigrants and special classes. By 1921 there was a panoply of student tracking plans in place but it was the programs that involved the least amount of structural reform to the main body of students that were used most widely. The above panel shows children "unable to keep abreast" of their classmates "engaged mainly in concrete, manual activities." This segregation approach followed Goddard's earlier Vineland Training School model but selection of children for such classes was usually based solely upon idiosyncratic teacher assessments. While hereditarians such as Goddard and Terman had been arguing for a decade that a systematic application of psychometric measurements should be used for such tracking decisions, the very need for segregation was more often viewed by teachers as a temporary, historically contingent issue which would change once ongoing restrictive immigration laws and public schooling helped "Americanize" the general population (photos from O'Shea, 1921).

According to O'Shea, the Platoon plan was explicitly designed to provide both Americanizing activities in the school auditorium and a theory based curriculum to meet the needs of the various school populations.  Each platoon spends a half day in the home room and the other half day in the auditorium, gymnasium, library, shop, domestic science rooms, music, art, or special classrooms, depending upon the platoon to which it has been assigned.  The platoons change places in these latter occupations in the middle of the forenoon and afternoon.  This plan was introduced in Detroit and was soon adopted in Pittsburgh, Akron, and Newark.

Institutions taking an active "experimental" role in the assessment and refinement of school tracking (as named by O'Shea, 1924) are as follows:

"The University of Chicago Elementary and High Schools, and the Lincoln and Horance Mann Schools, of Columbia University, and the University of Iowa Observation Schools and Pre-School Laboratories have become great working laboratories for educators and psychologists...There are also schools at the universities of California, Illinois, Indiana, Minnesota, Missouri, Nebraska, North Dakota, Utah, Wyoming, and Wisconsin, and at Bryn Mawr College.... These...are centers of research where the gaps between the theoretical and experimental knowledge of child development and child training are constantly being bridged..." (O'shea, 1924, pp. 21-22).

The omission of Stanford from the list is notable but justified because Terman's research group was carrying out administratively motivated psychometric tests on existing schools rather than following the older Dewey "experimental school" model.  The World War I testing experience, however, had raised Terman's professional status from one of disciplinary marginality to national visibility by acquainting him with many of the country's leading psychologists.  His two major works of the period, The Intelligence of School Children (1919) and Intelligence Tests and School Reorganization (1922), were successive steps toward his ultimate argument that widespread use of intelligence tests and vocational assessment might help school administrators identify ability, prescribe curricula, and predict a student's vocational suitability.

The great hope of psychometrics was that empirical methods might decrease the failure rate for a net "conservation of talent" (Terman, 1922).  Terman (1919) argued that Ayres (1909) had misidentified the cause of retardation by citing physical defects, late starting times, and irregular attendance.  The real cause was to be found in individual differences in mental ability.[58]  His three suggestions for bringing about the "proper education of exceptional children" were as follows: (1) have every pupil be given a Binet test in the first half-year of his school life; (2) give all pupils in the fourth grade and beyond a group test (then being developed) each year; and (3) give a Binet test to pupils that score very high or very low in the yearly group examination (Terman, 1919, p. 15).

The National Intelligence Tests, designed for elementary schools, and the Terman Group Test of Mental Ability, for high school students, were both published by the World Book Company in 1920 (see Terman, 1920a; 1920b).  In accordance with this new potential market for standardized testing, Stanford University was already developing a full program to prepare teachers and administrators in testing; offering courses on retardation, the education of "defectives," and the psychology of exceptional children (see fig 32).

Figure 32 World Book ads for intelligence tests (1919-1922). Teachers were being actively "informed" about the potential of group intelligence testing from 1919 onward. Note that the initial description of the Otis test clearly claims that "native mental ability" is being measured. Other tests (including the National Intelligence Test), test norms, correlation tables, and supporting literature were also made conveniently available. The lower panel shows the Chicago office of the World Book Company (built in 1916). This was a branch plant for the editorial offices in Yonkers-on-the-Hudson (N.Y.) from which most of this literature emanated. The company's slogans were: "The house of applied knowledge" and "Watch us Grow!" (photo from J. of Education, 1916; ads from same journal 1919-1922).

By 1922, Terman already had one of the most popular administrative reformers and one of the most aggressive publishers in the country on his side.  For instance, both Terman and Cubberley (then the Chair of the Department of Education) were carrying forward expansion plans to provide lucrative professional consulting services to the state school systems.  It is not surprising that Terman soon (correctly) predicted that a new group of mental testers would become a valued professional elite.  As exemplified by the laudatory biographical write-up on Cubberley in the J. of Education (1921), a standard disciplinary journal for school teachers and administrators, and as exemplified by the ongoing interest in intelligence testing commodities by the World Book Company (and especially by its owner Caspar W. Hodgson, who had school ties to Stanford), Terman had already established the support to make such a prediction a relatively safe one.

In his 1922 work, Terman called for a formal multiple-track plan made up of five psychometrically defined groups: "gifted, bright, average, slow, and special."  While the "road for transfer" between tracks must be left open, the abilities measured by the tests were considered largely constant and determined by heredity.  Test scores could also tell us whether a child's native ability corresponds approximately to the median for: (1) the professional class; (2) semi-professional pursuits; (3) skilled workers; (4) semi-skilled workers; or (5) unskilled labor.  When his Stanford Achievement Test was published in 1923, the evaluative fate of school children for the next few decades was sealed.

The original Stanford Achievement battery (1923) consisted of four forms standardized on 345,735 children from 363 school systems in thirty-eight states.  The test, later revised in 1929 and 1940, became the most widely used group achievement measure in the United States.  Terman's APA Presidential Address, in 1923, also played up the importance of testing as an "instrument for further research" into important psychological questions such as mental growth, potential for re-education, individual differences, and changes to the typical pattern of mental organization (Terman, 1924).

Demonstrative Testing in Urban Californian Schools

Between 1919 and 1925, Terman's initial prescriptions for the use of tests in both school reform and research were tentatively applied in three communities near Stanford University (Oakland, San Jose, and Palo Alto).  This series of empirical studies achieved the immediate disciplinary goal of highlighting the descriptive (sorting) use of tests in school settings but, for various reasons, entirely missed the mark as a way to investigate the development of typically human intellect.  Consequently, the eventual pragmatically motivated (but essentially top-down) institutional adoption of standardized tests actually generated far more confusion and frustration for teachers than historians have previously realized (Raftery, 1988).  However, knowing something about the narrow test-centered purview of these initial empirical studies can help us better appreciate other contemporary empirical endeavors (see sections four and five) which, in my view, came "closer to the mark" in revealing the societal genesis of human intellect.

As pointed out by Chapman's Schools as Sorters (1988), the communities in the Stanford studies represented a convenient distribution of urban communities in terms of size, ethnic diversity, and socioeconomic makeup.  In Oakland, Terman's graduate student Virgil Dickson had begun trials of testing in 1917 using the Stanford-Binet and Otis Group tests and continued until 1921, when a three-track plan of classes was formally adopted.  As reported by Dickson (1923), the vast majority of students (82 percent) were placed in "normal" classroom sections.  Other students were placed in "accelerated" classes (10 percent), "limited" classes (6 percent), "opportunity" classes for those entering the system but needing extra help to catch up (1 percent), and "atypical" classes for students more than three years behind grade level for their age (1 percent).

In San Jose, a Stanford education student named Kimball Young began a study of "mental differences in immigrants" under the direction of Terman and Cubberley.  Mental tests in this case would provide a new way for administrators to deal with the ongoing problem of immigrant Mexican school children.[59]  Beginning in 1919, Young tested all twelve-year-olds in four schools selected for their predominance of students of either "Latin" (now called Latino) or Northern European ancestry.  He found that overall, 42 percent of the students were behind their grade level, but that such "retardation" was nearly twice as high in the Latino schools as it was in the Northern European schools.  Finding that the Latino children were "decidedly inferior" when judged according to teacher assessments, Army intelligence tests (i.e., Alpha and Beta), and economic status, he concluded that the difference must be one of lack of mental endowment.  Young then recommended that standardized tests be put in place throughout the San Jose elementary school system as an efficient means of grouping both the students and whole schools into superior, average, or backward groupings.

In Palo Alto, immigrants were few, but the increasing number of students continuing on to high school meant that the system still had to adjust in order to educate those not headed for college.  In order to avoid a continued rise in the failure rates of these students, it was decided to put into place a program of "vocational guidance" at the beginning of the seventh grade and, hence, the assistance of psychologist William B. Proctor (at Stanford) was requested by the school superintendent (himself a former Stanford student).  Proctor first examined 107 students entering Palo Alto High with Stanford-Binet tests and found a strong relation between subsequent withdrawal from school and "mental capacity" as indicated by test performance.

These results resembled Proctor's earlier research comparing the vocational aspirations of 900 students in eight Bay Area high schools with the so-called "occupational intelligence" levels indicated by their performance on the Army Alpha test.  His Education and Vocational Guidance (1925) later argued that the increasingly complex nature of society and its occupational structure necessitates that schools guide children along appropriate paths of self-realization and public service.  Such vocational guidance would direct children toward the professional, business, mechanical, or agricultural occupations according to their "mental, moral, and physical endowment" (p. 17).[60]

Definition and origins of intelligence debated

That the Stanford-based mental testing movement was de facto, though not always intentionally, supporting the prevailing racial and economic elitism of modern capitalist society must now be obvious.  The post-war era, in particular, was one of political conservatism, immigration restriction (under the federal Immigration Act of 1924), and the stepping up of forced sterilization of lower class American citizens.  Ironically, the very nation built on the anti-aristocratic ideal that "all men are created equal" became the first in modern times to enact and enforce laws providing for eugenic sterilization in the name of purifying the race.  For instance, between 1907 and 1928, twenty-one states enacted eugenic sterilization laws (see Chorover, 1979, p. 42).  Visible minorities and potential immigrants from Eastern Europe (the very groups fingered years earlier by Goddard and by the W.W.I tests as "inferior" in intellect) were restricted from entry.  Although the mental testing movement did not have as great a role in the political decisions of the day as formerly thought, it certainly lent empirical justification to the widespread prejudices of the day (see Snyderman & Herrnstein, 1983; Chapman, 1988).[61]

In a symposium sponsored by the Journal of Educational Psychology (1921), Terman defined human intelligence as the ability to carry on abstract thinking.  He strongly believed that tests such as the Stanford-Binet and the newly developed National Intelligence Tests were valid and reliable measures of mental ability and that intelligence was a relatively constant endowment of given individuals.  Convinced of the essential correctness of Galton's theories on hereditary genius and based on his own ongoing research with gifted children, he concluded that intelligence was greatly controlled by genes and was but little influenced by environmental factors.

By 1924, Terman had confidently predicted that a series of ongoing scientific investigations would fairly well settle the question of how much heredity influences intelligence.[62]  Over the next four years, he chaired the National Society for the Study of Education committee on "nature and nurture," a group whose members represented a broad cross section of opinion on the matter: W.C. Bagley, B.T. Baldwin, C.C. Brigham, F.N. Freeman, and R. Pintner.  In the introduction to the report, Terman (1928) admits that even he had been forced to modify the boldness of his claims because the collected evidence was in no way favorable to his personal bias toward hereditarian interpretation.  It is conceivable that the elusive nature of the problem is such as to preclude for a long time to come, if not forever, a complete and final solution.

While in historical hindsight we can claim that the nature-nurture debate must be surmounted in order to be resolved, this point was certainly not appreciated by contemporary thinkers.  Thus, despite the willingness of experimental schools to study various issues of the physical and mental growth of children (as noted above by O'Shea, 1924), none of these researchers was equipped with the intellectual tools to surmount the nature-nurture problem.  We now turn to an account of the Iowa school of mental testers because they provide an important contemporary environmental counter-argument to the hereditarian interpretations of the Stanford group.

Section Four:

Iowa Studies of Rural Schools

This section highlights the field-studies done by Iowa researchers between 1924 and 1927, a time when rural one-room schools were rapidly being replaced with consolidated (K-12) school districts.  By assessing the similarities and differences in the intellectual performance of these two groups against both each other and the previously established city-based test norms of Pintner & Paterson (1917) and Terman (1916, 1919), the Iowa researchers graphically depicted the limits of hereditarian interpretations, which were still somewhat popular (at least among academic psychologists).

I will also argue that by using a statewide unit of analysis they came "closer to the mark" of working out the typical pattern of mental development of human intellect than any other North American experimental group had done to date.  As was the case with Pfungst's animal research, however, the argument against latent intelligence (and its additive assumptions) implied in the Iowa group's experimental methodology remained insufficiently recognized at the time and has remained unaddressed by both modern introductory textbooks and previous historical reviews of the nature-nurture debate.

The Iowa Station

Although officially created by a legislative act of the Iowa General Assembly to chart the physical growth of children, the role of the Iowa Child Welfare Station in attempting to "settle" the so-called nature vs. nurture debate over testing results should not be understated.  The station itself first became operational in 1919 after being financially endowed by a bereaved Des Moines mother (Cora B. Hillis) who had lost three of her five children to physical illness.  The general aim was initially to provide the area of child study with the same sorts of resources already allotted to the Iowa Agricultural Institute and to physical health movements in other states (Sutherland, 1972; Cohen, 1983).  A preschool laboratory was also opened in 1921.

Through the auspices of these two experimental settings, the first director, Bird T. Baldwin (1875-1928), recognized that along with physical measurements of growth, the mental growth of the rural school child (as measured by intelligence and achievement tests) could also be profitably studied.  The preliminary results of the in-station research were published as The Psychology of the Preschool Child (see Baldwin & Stecher, 1925), and their careful inclusion of pictorial material alone provides a marked contrast to the work of Terman's Stanford group.

Iowa Field-studies on Farm children

The Iowa studies we are most concerned with, however, were carried on outside the station.  These "field-studies" were carried out between 1923 and 1927, and the results were then presented in Baldwin et al.'s Farm Children: An investigation of rural child life in selected areas of Iowa (1930).[63]  The initial aims of the research were to: (1) establish an account of the typical mental development of rural school children with the use of both individual and group mental testing technologies; and (2) contrast those results against the previously collected data from urban school children.  A third course of study, however, presented itself to the researchers during their initial qualitative background investigations.  This was to test the difference between rural children from local one-room schools and those from the newer consolidated school district system, which was rapidly expanding at the time.  In 1924, the State of Iowa had 10,000 one-room schools and 388 consolidated schools, and there seemed to be observable differences in both the communities and in the typical pattern of mental processes of the children who attended school in those communities.

In my opinion, it is this third, seldom covered aspect of the Iowa field-studies that contains the greatest historical significance.  Indeed, in the very act of recognizing its significance lies the first inkling of a means to surmount the nature vs. nurture debate regarding the development of human intelligence (see fig 33).


Figure 33 Testing One-room school children. The upper panel shows an exterior view of a one-room school in Homeland, Iowa (c. 1925). This was one of the schools in which Baldwin and colleagues carried out systematic testing during the late 1920s. The lower two panels show individual performance task testing (with the Seguin formboard) being carried out in the "relative quiet" of a farm woodshed. A "noted characteristic" of one-room school children was their marked aloofness (clearly shown here). Typically, the children coming from even the poorer quality one-room schools performed equally well on these simple individual tests. However, they showed inferior performance on "the more complex tasks of mental ability" used by the Iowa group. As the environmentally motivated Iowa researchers also noted, however, the high performing exceptions to this rule (as indicated by certain items on both the Stanford-Binet and Stanford Achievement Tests, and as based on the collected biographical data) had little de facto opportunity for the further development of their early promise (photos from Baldwin, et al., 1930).

What was not fully recognized at the time was that both the Iowa group's environmentalist aim and their terms of descriptive reference (e.g., endowments, aptitudes, traits, etc.) were in no way incommensurable with the fundamentally hereditarian views of the Terman camp.  Thus, in their discussion of the "traits of one-room school children" versus the "traits of the consolidated school" crowd, some of their more subtle qualitative and quantitative findings are easily overlooked.  I will try to draw these findings out into the open.  The Iowa group's fundamental recognition of the historicity of their field-studies, however, is evident in one of their opening statements:

"Children of all ages are everywhere in these rural communities...trudging to the one-room school or riding in the bus to the consolidated school. But of the actual conditions among these children, their endowments, ambitions and lastly their opportunities, very little has been known" (p. 1).

What is clearly distinctive about the Iowa studies is that the unit of analysis used is so much broader than the one being used by the contemporary Stanford studies.  As indicated in their appendix on "proper" methods for field research, this breadth necessitated the careful collection of: (1) the ethnocultural history of the communities being studied; (2) the economic, educational, and social backgrounds of the parents; (3) the immediate educational conditions (of the respective one-room vs. consolidated school systems); and (4) the physical growth and dental status of the children.

By using such a wide, concretely contextualized unit of analysis, it became immediately clear to the researchers "that life in the country is not uniform by any means" and that "the environment of children varies accordingly" (p. 2).  This claim was a direct rejection of Terman's earlier proclamation that the environment of school children living in the same country (except under extreme circumstances) could be assumed to be essentially homogeneous.  The Iowa study, therefore, provides both qualitative (survey-based) and quantitative (test-based) evidence that the historical shift going on was more than a shift in school architecture.  It was a shift in the broader societal makeup of the communities in which children live and learn, and it had allowed a shift in the normal pattern of development of mental abilities of rural consolidated school children.

The Qualitative survey

The Iowa group's qualitative survey took two forms (general and particular).  The general inquiry into the conditions of one-room schools vs. consolidated schools focused on the kind of teaching and structural facilities provided in these school districts.  The one-room schooling systems were openly recognized as inferior in terms of available educational resources.  For example, a list of the books in the libraries of six rural one-room schools was studied to ascertain whether these books were recognized by authorities as suitable for children.[64]

Of the composite list of 470 titles in the six libraries, 24 percent were not found on the authoritative lists.  There were very few books suited to the youngest readers and almost no picture books.  Although the size of libraries varied from less than fifty books to more than one hundred, the number of suitable, readable books was not necessarily greater in the larger libraries.  In one small library of forty-seven books, 73 percent were recognized in children's lists, while in the largest library only 50 percent were suitable for children.

The more particular aspect of the qualitative survey dealt with a specific comparison between the educational environments of two central-eastern Iowa communities: Homeland (which contained one-room school houses serving students in grades one through eight) and Cedar Creek, which had recently been made into a consolidated school district (and which had earlier maintained a high school).  These two communities also differed in the facilities provided, attitudes toward attendance, and prospects for advancement of children to high school.

It was recognized that the prevalently poor attitude toward school attendance and school improvement in Homeland was embedded in various cultural aspects of that community.  First, this community was predominantly ethnic-German in background and (having come out of the war era in which they had been discriminated against) the Homeland folk tended to be somewhat suspicious of outsiders.  Wartime prejudice against anyone with a Germanic last name, debates over the seditious content of German language textbooks, and the elitist German system of education as a whole (see Cubberley, 1919) may have combined to help form their attitudes toward education.  In any case, the average number of years of school attendance for the adult population (just over eight years) was noted by the Iowa group as helping to account for their ongoing acceptance of the poor school facilities being provided.

The parents in this community seemed to simply feel that the facilities they already had were good enough.  For instance, the superintendent's reports for the region between 1918 and 1926 were cited as evidence of an active resistance on the part of the local one-room school directors to avail themselves of the advantages of the state school modernization assistance program.[65]  In contrast, Cedar Creek already had a modern consolidated school system and had even formerly maintained a small high school.  This, it was maintained, was consistent with the "Dutch tradition" of respect for schooling.

Similarly, with regard to the availability of high school, in most cases attendance at high school by Homeland children necessitated a significant amount of travel time.  The costs associated with this made such attendance less appealing to parents.  In Cedar Creek, however, where high school courses had long been available, the consolidated school bus system now promoted easy attendance (pp. 64-65).  Among the families investigated in Cedar Creek, 87 percent of the children between the ages of eighteen and twenty-two had completed the eighth grade.  Of those, 56 percent attended high school and 41 percent of them had graduated.  Among those who completed high school, 16 percent had attended or were attending college.

One similarity between the two communities was that long-time residents had built a network of patronage which depended upon traditional religious, sectarian, and class conflicts rather than on individual convictions.  Families long established in the community were scarcely aware of their supremacy over the newcomers, but had established a prestige founded on family wealth and connections.  Indifference rather than intention caused long-time residents to ignore the new arrivals (usually tenant farmers with little financial resources and without influential relatives in the community).  The overall impression of the Iowa group's qualitative survey was that the "opportunities" for mental development presented by school attendance in the two communities under study were to a certain extent dependent upon the economic and cultural assets of those communities (p. 49).  Might this dependency be manifested empirically in scores on standardized intelligence and achievement tests?

Empirical measurements

Having collected the above background information, the Iowa group then framed two empirical questions (one general and the other specific).  The most general question was "whether the mental ability of farm children has special characteristics which tend to mold their lives along lines different from children elsewhere" (p. 231).  The more specific empirical question, of course, was whether the two groups of rural school children (one-room vs. consolidated) performed differently on standardized tests (see fig 34).

Figure 34 Testing Consolidated school children. The upper panel shows one of the consolidated schools in Cedar Creek, Iowa where Baldwin and colleagues carried out standardized testing during the late 1920s. The lower panel shows testing with the Wallin Peg Board in the shady front yard of a middle class rural home. When tests were given in these homes, examiners were often asked to wait until the child's play clothes could be changed for more elaborate clothing consisting of white garments trimmed with hand embroidered or crocheted lace. An important finding was that rural consolidated school children tended to perform equally to city schooled children on most standardized test measures. Hence, the Iowa group's careful contextualization of their empirical data touched on issues of the historicity of higher mental processes in a way that decontextualized testing never does (photos from Baldwin, et al., 1930).

Using the Detroit Kindergarten Test (Baker & Kaufmann, 1922), composed of thirty parts, in conjunction with the Stanford-Binet, it was possible to establish that substantial differences between rural and city children were found only after age 5 or 6.  In particular, when 103 pairs of children matched in age (differing by no more than fifteen days) were compared, it was found that rural children performed worse than city children on the following three complex pictorial tasks: (1) selection of the one picture out of a row of four that is not shown in another row of three; (2) selection of two pictures from a group of four that indicate a given season of the year; and (3) tasks of stating "what is wrong" with a picture.  It was immediately pointed out by Baldwin et al. that rural children, who have fewer books in their homes, are (on average) less accustomed to looking at pictures (p. 235).

At the same time, however, there was no significant difference between the groups on less complex pictorial tasks such as: selecting one picture out of three that answers a given description; selecting the one thing among three that is most like two others; and (importantly) selecting the one thing that is not used in the same functional way as two others.  This latter lack of difference is "important" because in 1931-32, A.R. Luria, when testing illiterate (and largely non-pictorial) Uzbek peasants, found substantial differences on both simple and complex pictorial task questions.[66]  So, in a way, both Terman and the Iowa group were correct in their assumptions about the relationship between cultural similarity and the pattern of demonstrable mentality.  That is, at least some aspects of test performance can be guaranteed by living in a pictorial and somewhat literate society.  But their mutual correctness can only be seen if we give up the nature-nurture terminology and talk instead about assessing the general and specific means by which the higher mental processes are produced from lower forms of mentality (biological, individual, social, and societal).  Vygotsky and Luria (1930) certainly understood that this needed to be done, and they intentionally carried out research along that line of inquiry.  The North American research groups did not understand this point, or at least could not find an adequate terminology to describe the insight.  Even so, the Iowa group came much closer than the others to establishing an empirical research design capable of drawing such differences in patterns of mentality out into the open.

Section Five:

Mental aptitudes, Vocational guidance, and Social relations

Between 1924 and 1932, an ostensibly administrative and statistically guided approach to ability testing began replacing the former clinical, individually guided forms of testing.  In this section, administrative or market-oriented (rather than ontological) concerns are shown to pervade the very structure and interpretation of: (1) psychometrically designed college entrance exams (e.g., the SAT); (2) vocational guidance courses or tests (e.g., the Strong Vocational Interest Blank); and even (3) contemporary groundbreaking "socially" contextualized vocational experiments (e.g., the Hawthorne studies).  The latter attempt at ecologically valid testing using a wider "social relations" unit of analysis, however, is recognized as an historically important blow to both the individualized rationale of mental testing and to the worker interchangeability assumptions of scientific management.

Mental Aptitude and Achievement

A notable outcome of the disciplinary fallout surrounding the nature vs. nurture debate in the first quarter of the 20th century was the need to find a way both to recognize education and life-experience and to continue mental and vocational testing (originally premised on the explicit assumption of latent intelligence).  By 1930, there was a sharp increase in the use of the terms aptitude and achievement to describe an individual's degree of disposition (learned or otherwise) to acquire proficiency at a new task (see Bingham, 1937).

The term "achievement" had already been used in educational and agricultural circles from the 1910s onward to describe the progress made by scientific methods of animal husbandry and crop selection on farms.  The intimate relation between such technological advances and the commercial world is reflected in the fact that Journal of Education articles covering the "achievement" of rural farm clubs (or associations) were typically accompanied by ads for the "rotational method" of agricultural production and other techniques for increasing crop yield per acre of land.  The word's new usage in the area of psychometric technology began around 1920 and allowed mental testers to temporarily leave aside the frustrating search for a measure of innate general intelligence.  In doing so, they were left free to pursue the more modest applied categorization of examinees (see Anastasi, 1984).  One manifestation of this newly pragmatic/administrative endeavor was the encroachment of psychometric measurements into the area of college entrance exams.

The Scholastic Aptitude Test: A psychologization of college entrance

The use of the term "aptitude" conveniently blurred the line on the issue of just what was being measured by a given test.  In doing so, it played a large role in the popularization of so-called "objective" methods of college entrance examinations as contrasted to the older essay-type methods.  These objective methods in turn helped usher in the gradual psychologization of the College Entrance Examination Board itself.

The most significant "aptitude" test produced during the 1920s was the Scholastic Aptitude Test (SAT).  It was produced for the College Entrance Examination Board under the chairmanship of Carl Brigham in 1926.  This was an admissions test administered to high school students wishing to further their education by attending college or university.  The SAT was very similar to the adult-level Stanford-Binet consisting of nine subtests labeled Definitions, Arithmetical Problems, Classification, Artificial Language, Antonyms, Number Series, Analogies, Logical Inference, and Paragraph Reading.

Two other psychologists joined Brigham on the SAT committee: Henry T. Moore from Dartmouth and Robert M. Yerkes from Yale.  Being a multiple choice rather than short answer or essay type examination, the SAT was a clear departure from, or rather a psychologization of,  the past uniform college exams.  Keeping in mind Brigham's eugenic interpretation of  W.W.I mental testing data, Valentine's (1987) comments on the choice of Brigham to chair the committee are highly ironic:

"In Brigham the College Board found a man whose credentials, both personal and professional, were ideal for the task not only of designing a test...but of bringing skeptical headmasters and college presidents around to the idea of accepting the test.  His New England lineage, Princeton affiliation, and manner of speech and dress all helped to make him an accepted member of 'the club' "(p. 34).

The SAT committee put out a carefully worded manual which, among other things, "explained" their reasons for calling it a test of "scholastic aptitude."  While acknowledging the test's kinship with so-called intelligence tests, they noted that whether or not such tests measure native intelligence is "elusive."  The connection between test scores and subsequent academic grades, however, can be "determined" (read demonstrated) empirically.  The term scholastic aptitude therefore, was to refer to nothing more than the "tendency for individual differences in scores to be associated positively with individual differences in subsequent academic attainment" (see Fiske, 1926, pp. 44-64).

In his history of the College Entrance Board, Valentine (1987) attributes part of this rise in popularity of uniform exams to the post-war psychologizing of Board test production methods.  In particular, the first SAT exam was administered on a trial basis on June 23, 1926 to 8,040 of the total 22,000 candidates writing entrance examinations that year.   Of these, 1,257 were applying for Yale, 1,176 for Wellesley, 602 for Vassar, and 536 for Harvard, the others being scattered over a wide area.  Practice booklets containing samples of all the various tests had been sent out to candidates one week prior.  The scoring for this initial trial was carried out by clerks recruited from the undergraduate bodies of Princeton and Columbia, working under the supervision of a staff composed of instructors and graduate students in psychology (Fuess, 1967, pp. 108-109).[67]

Similar efforts by Columbia psychologist Ben D. Wood (formerly one of the younger W.W.I Army testers under Yerkes, and by then the director of Columbia's Bureau of Collegiate Educational Research) also contributed to the psychologization of college entrance exams by familiarizing the College Board with the empirical aspects of statistical reliability and so-called test construct validity (see Learned, 1933).  Wood's Columbia-based bureau had been running ads in the J. of Education since 1926 and, by 1930, had sold over 3,000,000 tests (see fig 35).

Figure 35 Selling and Scoring group tests. The upper panel shows the initial ad put out by the Columbia Research Bureau for achievement tests in six areas of college study. Ben Wood eventually brought about a clean sweep of the older essay-style college entrance exams with these newer "objective" multiple-choice area tests combined with the use of the GRE and SAT exams. The lower panel shows one of the immediate administrative "implications" (read advantages) of this change in testing technology: a change in the procedure used for scoring such psychologized group test results. The former hordes of highly educated essay "Readers" could now be replaced by semi-skilled (but more "statistically reliable") groups of female "Clerks" armed with transparent stenciled score keys to grade tests (photo from Engle, 1946).

Between the two world wars, the technology of the mental testing industry was increasingly applied to classroom placement and institutional admission at all levels of the educational ladder.  Although a national intelligence testing movement was first set into motion by both Yerkes and Terman under the auspices of the World Book Company, Yerkes ostensibly pulled out of the so-called intelligence testing area after 1921.  His involvement in the all-important SAT (College Entrance Examination Board) committee, however, indicates his continued interest in the administrative aspects of ability testing and, perhaps, it also betrays a continued (more covert) intellectual investment in the category of latent intelligence.


As indicated above, the academic disputes of the 1920s centered around the issue of whether or not intelligence (as indicated by mental measurements) was predominantly hereditary or environmental (Eckberg, 1979).  The difference between hereditarianism and environmentalism, however, was typically a difference in degree of theoretical investment in genes and not a difference in kind between the theories.

With this line of reasoning in hand, the most cautious test supporters attempted to sidestep the nature-nurture debate altogether by relying on the ontologically agnostic dictum immortalized by Boring (1923) which states that intelligence (or aptitude) is simply "what the tests test."  Thorndike et al. (1926), in particular, advocated operational assessment of intellect along four quantitative "dimensions of intellect": Altitude, whose limits were felt to be genetically determined, referred to the complexity or difficulty of tasks that an individual was capable of performing.  Width referred to the number of tasks performed at a given level of difficulty and was considered to be largely dependent on experience.  Area was an arithmetic function of altitude and width, though width of intellect varied with altitude (being greater at higher altitudes).  Finally, speed referred to the amount of performance of a given kind and complexity in a given unit of time.  While thorough description of a person's intellect required assessment of all four dimensions, speed was considered relatively less important when compared with the absolute level (altitude) of attained intellect.

Under such an unthreatening (and ontologically agnostic) intellectual climate, those who had, earlier on, tipped their hand too far in favor of hereditary determinism were, by 1930, beginning to produce qualified retractions.  Brigham (1930), for example, provided a retraction of his 1923 analysis of the army tests.  His article, called "Intelligence tests of immigrant groups," was written in a technical manner and appeared in Psychological Review, a journal unread by the general public, which had already been exposed via the popular press to Brigham's original claims (Mensh & Mensh, 1991, p. 35).  Notably, while Brigham suggests that (for statistical reasons) the former superstructure of "national and racial" differences collapses completely, he refers directly only to the so-called white immigrant groups "whose parents speak another tongue" (p. 165) and avoids any direct concrete mention of his past analytical mistreatment of Black American recruits.  Nor does he recant his active support for the ongoing eugenic-sterilization laws (Eckberg, 1979).[68]

In these early days of rough empirical psychometrics, virtually no supporters of the mental testing movement openly addressed what was to become (at least for the later anti-test lobby) the rather obvious implication of the testing tradition:  In treating all test takers as equal in rights and obligations, unequal though they may be in the de facto conditions of a stratified industrial society, the so-called meritocracy of entrance exams, vocational assessment, and intelligence testing batteries was routinely providing a scientific sanction to the very cultural inequalities which examinees brought with them to the examination room (Lemann, 1999).  Some of these broader societal issues of cultural disparity and diversity of past experience would soon assert themselves in the successive attempts to set up a system of vocational guidance for school children and for adult workers.

Vocational guidance in schools (1920s-early 1930s)

Before W.W.I, vocational guidance was a new but important topic of concern for educational administrators.  The inaugural issue of the journal Educational Administration & Supervision (1915), for instance, featured vocational guidance as a central issue.  At this time it was suggested that not only guidance but vocational placement be made part of upcoming public school vocational guidance programs.  The need for such guidance was premised on the assumptions that: (1) some youths were better fitted for certain occupations than others; and that (2) left to their own devices, they would tend to choose occupations for which they were ill-equipped.  The guidance movement originated, therefore, from a paternalistic concern for those who were already labeled losers or dropouts (i.e., those who could not function successfully in the traditional classroom setting and who tended to find taking an unskilled job at low pay preferable to staying in public high school).

Original prescriptions for guidance programs included components such as: class work, individual testing, visiting speakers (from different industries), group and individual counseling, job placement, and follow-up (of both high school dropouts and graduates).  Information on occupations was typically to be offered in the eighth or ninth grade as part of the Social Studies or so-called "Life-Career" classes, and specialized placement was to be provided in the twelfth grade.  As Kantor (1986) has pointed out, however, this high-minded (assessment-placement) program placed unattainable expectations on the teachers, principals, and superintendents who were expected to oversee it.  While administrative experts and guidance advocates expected counselors to have a good deal of knowledge about the labor market, most counselors in California and elsewhere were merely former teachers or people with statistical training in psychology.  This situation led Virgil Dickson to suggest, in his Mental Tests and the Classroom Teacher (1923), that schools simply utilize available intelligence testing measures to track students into vocational classes and dispense with the messy task of placement (Dickson, 1923).[69]

Although the job description of counselors originally stressed vocational guidance duties (i.e., knowing what types of jobs were available, the kinds of skills needed in different occupations, wages and working conditions in local industries, and entry requirements for different jobs), these duties had been considerably revised by the end of the 1920s.  In Los Angeles, for instance, the school system's Bureau of Research and Guidance stated in 1927 that the counselor's "core" activities were: giving intelligence tests and using the scores for ability grouping, giving standardized achievement tests, conducting personal interviews with students and with teachers, adjusting students' programs, maintaining student files, and making class adjustments for students with special problems.  Vocationally related tasks such as visiting industries and job placement were listed by the Bureau as supplementary activities (Kantor, 1986, p. 370).

Kantor points out four reasons for this shift toward educational guidance: (1) unrealistically high expectations about the background of counselors; (2) inadequacy of methods for matching youth with jobs; (3) proliferation of courses accompanying the expansion of high school attendance after W.W.I; and (4) the expansion of white collar work (after 1924) in stores and offices (increasing the demand for school based skills ranging from literacy and computation to typing and bookkeeping).  Because of this growing importance of high school and college as an entryway to white collar and professional employment, counselors' advice about school course selection had now become significantly more economically important than their earlier attempts to assess and place students according to their measurable or supposed vocational abilities.

Vocational tests and their use by industry

Changes in demographics, the economy, and the labor pool itself combined to allow vocational testing to be taken seriously.  First of all, before the war, a substantial number of "second generation" immigrants (the children of immigrants who entered between 1890-1905) were coming of age.  Due both to labor sponsored public education campaigns and to the overall expansion of industrial activity around the Great Lakes region, these workers expected better treatment in the workplace than their parents had received (see Hogan, 1978; Reese, 1981; Teitelbaum & Reese, 1983; Altenbaugh, 1983).

Secondly, between 1915 and 1918 about 5,000,000 men were taken from the wider labor pool and either drafted into the armed forces or put to work in wartime essential civil services.  Their absence (as a whole) caused a further tightening of the industrial labor pool and their return caused an overall glut of labor.  Thus, between 1914 and 1920, it was eventually recognized that a whole new set of industrial management tools and personnel selection techniques were now required to promote "company loyalty" and increase "worker productivity" in the modern industrial age.[70]

Incentive Plans

The typical approach to the problem of worker efficiency and turnover was to apply some variant of a "scientific management" regime (popularized by F.W. Taylor and others from 1907 onward).  Most notably, between 1913-1915, faced with a tightening labor pool due to the expansion of all Detroit industries, the Ford Motor Company attempted to stem employee turnover by implementing a so-called "profit sharing" wage incentive.  The daily income of eligible workers doubled from $2.50 to $5.00.  Fearing this increase might lure workers into iniquity of lifestyle, Ford attached certain stipulations to the deportment of workers both inside and outside the plant.

To remain eligible for the bonus, a worker submitted to home visits from company "agents" (accompanied by interpreters).  Workers were forbidden to take in boarders, and were assessed as to: (1) the cleanliness of their company supplied housing, (2) productive use of leisure time, and (3) any problems with drink or imprudent sexual activity (Levin, 1927).  Employees who did not live up to the company standard of "proper living" were subject to a probationary period (in which they lost their claim to the bonus).  In creating this company assessment body, called the "Sociological Department," no outside advice was sought.  Absenteeism dropped from 10 percent to 0.5 percent and Ford proclaimed the program one of the finest cost-cutting moves he had made to date (see also Nevins & Hill, 1954; Baritz, 1960).

Both during the war and in the expanding post-war economy, worker "productivity management" was predominantly carried out by way of wage incentives and/or by so-called time and motion techniques.   When vocational ability testing techniques were utilized to aid managers with their labor problems, industrial psychologists were not typically consulted directly.  Only after the war did employers begin turning seriously to the experimental and psychometric services of personnel experts such as psychologists.

Textbook portrayals of vocational testing

Textbook references to the "potential of vocational testing" were quick to follow the war's end.  Frankel & Fleisher's The Human Factor in Industry (1920), for instance, notes the ongoing use of psychological tests for clerks and stenographers by Bell Telephone and the Curtis Publishing Company.  Though the standing of tests in the labor market is noted as "provisional," the ideal of using careful vocational tests is portrayed in a positive light.  The ongoing development of rating scales for insurance salesmen at the Bureau of Salesmanship (at the Carnegie Institute of Technology) is mentioned in this regard, as is the fact that the Metropolitan Life Insurance Company "pays fifty cents" to each applicant for each day devoted to vocational examinations (roughly equivalent to the expense of car fare and lunch).

Vocational tests (whether functioning in a "placement" or "elimination" manner) are portrayed by Frankel & Fleisher as at least potentially useful if used in conjunction with the more traditional physical examinations and reference letters, and/or with more novel worker orientation devices such as the "buddy system" (which pairs new workers with old) and the provision of personalized company or job training "handbooks."  Careful testing, "modestly" interpreted, would not only provide more information on given applicants but might also make "obsolete" some of the derogatory functions of past hiring practices: "When the emphasis is placed on the proper placement of the individual [through vocational testing], references from previous employers will become increasingly valuable, and their purpose of the past -to weed out labor agitators and floaters -will become of minor importance" (Frankel & Fleisher, 1920, p. 63).

Vocational Testing Details (sampling, analogy, and statistical)

Most vocational aptitude tests of the day tended to use what Hollingworth (1915) had described as vocational "sampling" or "analogy" in order to develop predictive measurements.  The Minnesota Assembly Test (developed by Paterson, et al., 1930 from an earlier test by J.L. Stenquist, 1923) is the prime example of vocational sampling.  Other vocational guidance tests, however, used a primarily statistical method of correlating the personality (likes and dislikes) of a given examinee with the averages gained from workers in various professions (see fig 36).

Figure 36 "Objective" Vocational Guidance Tests. The upper panel shows two ads from School and Society, 1922, which indicate that the W.W. I tension between estimating latent intelligence vs. pragmatic categorization of examinees was brought directly into the 1920s vocational assessment marketplace. Contemporary ads in the J. of Education (not shown) mention Thurstone's vocational tests which use a primarily statistical (rather than situational) method for assessing vocational fitness. The lower panel shows the Minnesota Assembly Test (1930), developed as a situational test which requires the examinee to assemble (as rapidly as possible) a number of commonplace articles such as a mousetrap, an electrical plug, and so on. There were two different test forms with appropriate test materials, statistical norms, and predictive validity coefficients supplied by the publisher. Although it was found that "previous experience" with one or more of the test items "falsely" inflated the scores of some examinees, this statistical shortcoming was later eliminated by way of the Purdue Mechanical Assembly Test, an example of vocational analogy which used specially developed "novel" test objects to test examinees' knowledge of seven different mechanical principles (photo from Paterson, et al., 1930; see also Munn, 1946).

Ads for vocational tests were careful to point out that scores on intelligence tests were not the only proper consideration when assessing either the present fitness of an employee for work or when predicting the potential of a given individual to "profit" from vocational training.  The similarities between intellectual and vocational measurement, however, by far outweighed the differences, and both sorts of tests were typically given in conjunction with each other.

For example, in the early 1920s, tests for "measuring clerical aptitude" typically combined a varied set of tasks including: (1) visual number and name-checking performance (e.g., 263849102983 vs. 263349102983; L.T. Piver vs L.T. Piser, etc.) presumably "limited by the efficiency of the physiological structure one possesses;" (2) scores on certain segments of standardized intelligence tests (e.g., Pressey Senior Classification Test) as a measure of how well one can "understand the meaning of words and symbols" or of how well certain "schoolroom skills have been mastered;" and (3) ability to literally "handle the tools of the trade" via manual manipulation tests (e.g., Minnesota Manual Dexterity Test; the O'Connor Finger Dexterity and Tweezer Dexterity Tests).  Sometimes, employment test batteries would last for 3 hours on the assumption that those not willing to put up with such batteries of tests would probably not be willing to put in a good day's work either (Baritz, 1960).
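The number and name-checking component in (1) above amounts to a simple same/different judgment task scored by the count of correct marks.  The following sketch is a modern reconstruction for illustration only; apart from the two pairs quoted in the text, the items and the scoring convention are invented.

```python
# Minimal reconstruction of a clerical "checking" item type: the examinee
# marks each pair Same ('S') or Different ('D'); the score is the number
# of correct judgments.  (Speed pressure, a key part of the real tests,
# is omitted here.)

def score_checking_task(items, marks):
    """items: list of (left, right) string pairs; marks: list of 'S'/'D'."""
    correct = 0
    for (left, right), mark in zip(items, marks):
        truth = 'S' if left == right else 'D'
        if mark == truth:
            correct += 1
    return correct

items = [
    ("263849102983", "263349102983"),  # pair quoted in the text (differs)
    ("L.T. Piver", "L.T. Piser"),      # pair quoted in the text (differs)
    ("A.B. Morton", "A.B. Morton"),    # invented identical pair
]
marks = ['D', 'S', 'S']  # this examinee misjudged the name pair

print(score_checking_task(items, marks))  # prints 2
```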

The typical textbook rationale for the use of both intelligence and vocational aptitude test batteries was also similar.  The proper (and equitable) selection of students or workers would: (1) decrease training costs; and (2) increase the efficiency of education and economy by early diagnosis of educational difficulties or special abilities (Poffenberger, 1927).  The widely shared assumption that mental tests might one day provide a level playing field for job and educational advancement was eventually described as a call for a "meritocracy" (Young, 1958).  In their attention to the individual worker, however, vocational testers had given little recognition to the fact that there were always fellow workers in the factory.  The approach to individual testing for the workplace (i.e., fitting the right individual to the right vocation) also flew in the face of the more accepted claims of "worker interchangeability" made by those implementing scientific management regimes.  These techniques assumed that control over the physical environment, rest periods, and modernization of tools used were sufficient to increase overall workplace productivity (Hale, 1982).

Limits of psychometric tests

Among the earliest companies to avail themselves of vocational testing services were the American Telephone & Telegraph (AT&T), Boston Elevated Railway, Eastman Kodak, and Colgate.  But between 1922 and 1925, these firms started to discover that psychological tests were not increasing profits or production (Baritz, 1960).  If the basic rationale for vocational testing was incorrect, applied psychologists would either have to find another way to become useful to management (that is, to somehow increase profits or productivity) or withdraw from industrial settings altogether.  Certainly by the mid-1920s, there were indications that the neophyte subdiscipline of vocational testing would soon perish:

"In 1925 only 4.5 per cent of even the 'best-known' companies used tests; in the next five years the number of companies rose to only 7 per cent. Many firms that had started psychological testing programs discontinued them because they did not work or because the economic and labor relations picture became so favorable that their use seemed unnecessary. In the later years of the decade, only a very few firms continued their efforts to refine and implement psychological testing" (Baritz, 1960, p. 74).

Fortunately for applied psychology, an important set of industrial productivity experiments would soon cast doubt on both the individualized rationale of mental testing and the worker interchangeability assumption of scientific management.  In their place, the popular account of these experiments would stress a new form of socialized industrial psychology -one that emphasized the economic importance of healthy workplace relationships.  In other words, the social psychology of both the scientific management and vocational testing movements would be exposed as severely limited.

The Hawthorne studies: Vocational experimentation and "social relations"

From 1924-1932, an innovative series of related research studies was carried out at the Hawthorne plant of the Western Electric Company, then the manufacturing division of the American Telephone and Telegraph (AT&T) company (Mayo, 1933; Roethlisberger & Dickson, 1939; Dickson & Roethlisberger, 1966).  The plant was one of the oldest in operation and was located on the west side of Chicago, where immigrant homes and factories mingled to produce a gloomy milieu within which to work and live.  Approximately 25,000 of Western Electric's 45,000 workers were employed there when the research project began.

The initial illumination studies were started in the scientific management tradition by C.E. Snow (the head of the Electrical Engineering Department of MIT).  They were conducted in a specially arranged work space where a small group of employees could perform their normal work as the lighting was carefully altered.  A similar group whose lighting remained unchanged was also used as a comparison (i.e., control) group.  The expectation was that both optimal and minimal light levels for work on electronic equipment would be established through this experimental set-up.  However, what has come to be called the "Hawthorne Effect" was purportedly found in these early studies.  That is, under conditions of observation, worker productivity would steadily increase even under the imposition of less than optimal conditions of work (such as decreased illumination).  For instance, the efficiency of workers in winding small electronic induction coils onto wooden spools was said not to be affected even by illumination so reduced as to be equivalent to moonlight.  Since productivity increased in ostensible defiance of severe environmental adjustments, it was postulated that when human work relations (i.e., supervision and worker camaraderie) were "appropriate," environmental conditions had little negative effect upon worker productivity.  If the company could learn more about these "human relation" aspects of the workplace, it might soon be able to utilize them to increase overall plant production.  Subsequent empirical studies at Hawthorne, therefore, became more social-psychological in design.  The prospect of profits to be gained from control of these human aspects of productivity kept up the Western Electric Company's interest to the tune of over a million dollars and thousands of man-hours (Gillespie, 1991).

In what Harvard psychologist Fritz Roethlisberger described as a latching on process, various industrial and academic authorities were periodically brought into the project according to their technical, psychological, anthropological, and sociological expertise.  For instance, W. Lloyd Warner, an anthropologist, brought an interest in non-obtrusive observation of workers in the industrial setting which was most valuable in discovering the dynamics of typical informal workgroup controls on the factory floor.  As Baritz (1960) has pointed out though, both the empirical techniques and the interpretive tools used during any given phase of this long research project were always a trade-off between the company's primary interest (to increase production or consolidate control over workers) and the more varied interests of the researchers and popularizers of the data being collected.  In particular, W.J. Dickson (an officer of Western Electric) was constantly present as one of the technical advisors and eventually helped prepare a detailed though very conservative official research report (see Roethlisberger & Dickson, 1939).  By successively balancing these management-researcher mandates, one of the classic research projects in the history of industrial psychology was brought about in a piece by piece manner.

The research and its subdisciplinary importance

The order and pattern of the Hawthorne empirical research itself proceeded through five discernible phases: (1) the initial Illumination studies (1924-1927), aimed at evaluating the effect of lighting conditions on productivity; (2) the Relay-assembly Room studies (August 1928-March 1929), which assessed the effects of pay incentives, rest periods, and active job input on the productivity of five selected women workers; (3) the Mica-Splitting Test group (October 1928-September 1930), in which a group of pieceworkers was used to corroborate the relative importance of workgroup dynamics vs. pay incentives; (4) the Bank Wiring Observation group (November 1931-May 1932), a covert observational design in which the dynamics of control in a workgroup of 14 male employees on the regular factory floor were observed; and (5) the plantwide Interviewing program (September 1928-early 1931), essentially an attempt by the company to categorize concerns, mitigate grievances, and manipulate employee morale according to the principles of social control learned in the other studies.  The latter two phases were interrupted by the detrimental effects of the Great Depression on company production orders, but the interviewing phase was later reinstated as a "Personnel Counseling" program and even expanded throughout the Western Electric company system between 1936-1955 (Gillespie, 1991).

Elton Mayo (1880-1949) became the most prominent early popularizer of these studies through his book The Human Problems of an Industrial Civilization (1933), which stressed a "social relations" interpretation of the data.  But Mayo's account is altogether more programmatic than the guarded comments of Roethlisberger & Dickson (1939), where more of the messy empirical details (good, bad, and ugly) are presented.  Both sources, however, intentionally portray the project as if it were equally benevolent toward worker and company interests.  In the light of historical hindsight, the limitations of the social relations approach and the motives behind its pro-management portrayal of the project must be mentioned.

Mayo's previous industrial psychology work included reducing turnover in a Philadelphia area textile mill by establishing a system of rest periods for workers (Mayo, 1924).  The Hawthorne study would convert him from this scientific management approach into a so-called "social" analyst.[71]  In 1933, he argued that the work of industrial psychologists prior to the Hawthorne studies was dangerously narrow.  By focusing on technical skills of individual workers, psychological aptitude and employment tests had diverted the attention of both researchers and managers away from a more central requirement of the industrial workplace: the ability to build and maintain cooperative relations with others.  The immediate disciplinary effect of Mayo's interpretation of the Hawthorne experiments was, therefore, to extend the purview of industrial testing beyond the previous abstract, individualized (and ailing) "selection and placement of workers" toward the new wider purview of addressing the dynamic and socially contextualized realm of human control through manipulation of workplace relationships (see fig 37).

Figure 37 Workers as part of "social" groupings. The upper panel shows the individual Minnesota Rate of Manipulation and O'Connor Finger Dexterity tests, both of which were used as part of the routine clerical worker testing regime throughout the 1920s (photo from Valentine, 1937). In contrast, the lower panel shows a group of women assembling electrical telephone relays in the "Relay Room" situation of the Hawthorne study (photo from Schultz & Schultz, 1996). This set of situational experiments was said to indicate that neither psychometric testing nor straightforward industrial management (of the physical environment, monetary incentives, or motion studies) was sufficient to explain the intricacies of workplace productivity. Group standards and controls (e.g., encouragement, ridicule, and sarcasm) seemed more important in determining an individual's productivity. The related "Bank Wiring Observation" study (not shown) indicated further how informal work-groups routinely manipulated production reports so as to: (1) reflect their own idea of a fair day's work; and (2) gloss over existing variations in productivity between workers. Hence, not only did workers routinely "regulate" their productivity, the recorded productivity reports of each worker often bore no direct relation to their actual individual capacity to perform.

Criticisms of the "social" unit and the Hawthorne myth

The great irony in the interpretations of Mayo and other Hawthorne researchers was that they were at once both progressive and conservative.  The progressive aspect was that working groups and not individual workers would now be used as a unit of analysis.  The conservative aspect was that the logic of working group dynamics (such as productivity regulation or reporting irregularities) was portrayed as a manifestation of the sentiment of "irrationality" which (because industrialization had alienated workers from the former "agrarian community" of caring relationships) pervaded their daily life.  Mayo suggested, therefore, that capitalist firms might operate in a conflict-free (mutually advantageous) manner with their workers if they would simply apply workgroup based vocational guidance principles (e.g., unobtrusive monitoring, job-input channels, and personnel counseling).  That is, if benevolent companies would build up a cooperative community of workers who found meaning and satisfaction in their work, these workers would be more likely to follow the lead of management toward overall increased productivity.  To substantiate his argument, Mayo emphasized evidence from the Relay Assembly Room (shown above) where "voluntary" subjects showed their appreciation of the novel cooperative conditions of supervision and work by "steadily" increasing their productivity for the company.[72]

Although Mayo's portrayal of the "wholehearted cooperation" of workers and of the revolutionary management style used by supervisors in the Relay experiment came under sharp attack from industrial sociologists from the 1940s onward (Gilson, 1940; Lynd, 1937; Mills, 1948; Landsberger, 1958; Smith, 1975), it has been Mayo's convenient fictional account of the testing circumstances that has been taken up in successive textbooks.

The question which most concerns us, however, is: Why did Mayo's interpretations stop short of an historical-societal analysis?  Baritz (1960) has suggested that the survival of the applied psychology subdiscipline (which depended directly upon pleasing its industrial benefactors) was hanging in the balance.  Thus, management-friendly designs and interpretations of research were forthcoming.  From a marketability standpoint, the social relations philosophy certainly appeared favorable for the subdiscipline of applied psychology.  That is, if ongoing funding could be found for an intermediary group of vocational psychologists to promote a flexible and "cooperative" system of industrial relations (between benevolent managers and an appreciative labor force), then the ongoing expansion of economically disruptive organized labor unions might be halted in those industries using this new technology.  Traditional techniques of union busting (i.e., hiring informants or Pinkerton agencies, and calling out the National Guard for strikebreaking duty) could still be used but were certainly less conducive to marketing the company as a benevolent benefactor of the working class.

Bramel & Friend (1981) concur as to the conservative middle class bias of the Hawthorne researchers but go still further by exposing the consistently pro-management misrepresentation of the conditions of research as voluntary and cooperative at all times.  Both these points are important because much of the later vocational testing or training tradition took up this social relations unit of analysis -including its implied double standard of mentality- with workers portrayed as irrational, unorganized, and easily fooled, and management portrayed as logical, organized, and benevolent.

Bramel & Friend (1981) have compiled the available published evidence of ongoing worker resistance within the Hawthorne studies to show that Mayo and others successively "suppressed" or "explained away" the resistance of workers during the crucial Assembly room and Mica-splitting test room experiments.

The detailed Roethlisberger & Dickson (1939) account also reveals that the supervision of workers in the Relay room was not always as friendly and permissive as Mayo claimed it to be.  The log accounts show that when workers were diligent, the supervisor was free and easy, but when productivity fell off, the supervisory style shifted back to the standard pattern, including reprimands; sending workers up to the front office; and verbal threats -to end the experiment, to extend a disliked experimental condition, or even to dismiss test subjects from the company (Bramel & Friend, p. 870).

After eight months (in Period 8), operators 1A and 2A were indeed dismissed from the experiment for "gross insubordination" and low output (Roethlisberger & Dickson, 1939; Whitehead, 1938, Vol. 1, p. 118).  Mayo characteristically downplayed this fact by claiming that operators 1A and 2A had "dropped out" (1933, p. 56); or had been "permitted to withdraw" (p. 110); or had "retired" (Mayo, 1945, p. 62).  He also made the incredible claim that:  "At no time in the five-year period did the girls feel that they were working under pressure" (1933, p. 69).  But three months after the dismissals, in a letter to his benefactor (executive Beardsley Ruml) at the Laura Spelman Rockefeller Memorial, Mayo put things in a different light.  Here, he says plainly that:  "One girl, formerly in the test group, was reported to have 'gone Bolshevik' and had been dropped" (cited in Mulherin, 1980, p. 54).  In public, however, Mayo (1933) further explained this away by claiming that operator 2A had been "anemic at the time" and had later recanted her statements against the company (p. 110).

Some of the context for Operator 2A's "paranoia," however, is obtained from an interview with her four months after she was replaced by a new and more enthusiastic worker.  She claimed that comments from girls in the regular assembly department had made her believe that the company did not in fact care about her or the other girls but was merely after increased productivity (Roethlisberger & Dickson, p. 170).  Thus, what Mayo called "paranoia" might have been simply one manifestation of widespread suspicion of the company's motives in the relay department.

Anti-unionism, interviews, and the "Counseling" program

Convinced of the benevolent motives of the company, the Hawthorne researchers saw no self-contradiction in ignoring the fact that "regulation" of work was often the one and only means by which modern industrial workers could defend themselves against arbitrary speed-ups or ensuing layoffs.  In the Bank Wiring Room phase, for instance, the reported departmental output curves were "devoid of individuality," and this, it was suggested, was irrational because the Western Electric company had no history of layoffs due to individual differences in productivity (Baritz, pp. 92-94).  Rather, since the typical patterns of workgroup control opposed those of management, the fears of workers were either downplayed or dismissed as pathological.  Yet in historical hindsight, we must ask: How can the workers' motives of mutual (rather than self) protection be judged as irrational or pathological if the very phase of the Hawthorne studies within which they occurred was itself cut short due to a lack of company production orders (resulting from the beginning of the great depression)?  Surely, a less denigrating interpretation is possible: The carefully doctored production reports might be viewed as the perspicacious and historically contextualized action of a new group of culturally savvy second-generation workers who were already wise to the tactics of scientific management and to the tricks of company psychologists.

But such a generous interpretation was not open to Mayo because it ran counter to his views of organized labor.  In particular, Mayo dealt with the specter of organized labor by essentially leaving it out of his account (Gillespie, 1991).  Unions were viewed as necessarily disruptive to the goals of management and were therefore counterproductive to his own prescribed ideal: a harmonious relationship between benevolent management and a cooperative, docile work force.  Marxism, by contrast, posits a relatively irreconcilable conflict of interest between workers and owner-managers.

Similarly, for hardline Communists, capitalist relations of production are inherently exploitive and necessarily produce resistance and self-organization among workers.  This resistance may or may not be expressed as class consciousness and organized political action, but it is always present in one form or another (Bramel & Friend, 1981, p. 868).  Even this point, however, obscures the fact that very few American unionists at that time (or any other time since) were hardline Marxists or Communists.  For instance, those who voted for the socialist ticket during the first two decades of the century were simply ordinary American workers or clergymen who had become progressively more fed up with seeing working class families getting an uneven share of company profits (see Klehr, 1984; Reese, 1981).[73]

Early 20th century immigrant workers were not simply hapless victims of capitalism; nor were they docile beneficiaries of benevolent corporate democracy.  Instead, they represented an historically situated heterogeneous group of subcultures actively selecting from (and producing their own conditions for meaningful appropriation of) the larger surrounding societal milieu of industrialization and urban growth (see Ogburn, 1922; Leontyev, 1981).  They were not passively assimilated into a mythical standardized American melting pot.  Rather, they used the limited opportunities presented by the industrial workplace to create the conditions for the further development of their own subcultures.  These conditions included such features as parochial schools, organized labor unions, and even labor colleges (Altenbaugh, 1983; Bernstein, 1960; Susman, 1983).

The Western Electric workers in the Hawthorne study did not mindlessly react to industrial management techniques nor (as is often claimed) did they simply adjust or adapt to the less than subtle contingencies of social relations experiments.  They recognized these contingencies for what they were: new company-produced attempts to increase overall production and control union sentiment.  Transplanted peasant cultures, which emphasized family collectivism, religious devotion, and distinctive attitudes toward the value of work and formal education, likely affected workplace relations as much as any company-based program (Hogan, 1978).  Contemplation of changes in the wider labor market most certainly affected the form of, participation in, and the results of ongoing Western Electric training programs (Nelson-Rowe, 1991; Gillespie, 1991).

Mayo, in contrast, was a Harvard Business School professor and a paid consultant in an industry that was demonstrably anti-union.  Hence, as argued by Baritz, both the so-called voluntary program of worker interviews (1929-32), which followed the experimental phases of the Hawthorne research project, and the later more active system of "personnel counseling" (1936-1955) can be viewed as an implicit program to cut off the development of class consciousness in the workers of the Western Electric company.  In particular, management hoped that through a process of vocational interviews, their typical labor problems (including industrial grievances, tensions, absenteeism, turnover, low production, and militant unionism) might be diminished.

The job of such "servants of power" was clearly to apply the listening techniques of the "Catholic confessional" and the "psychiatric couch" (Baritz, p. 116) but decidedly not to interfere with matters of company policy.  In other words, the typical management perspective of "adjusting people to situations" (rather than addressing or alleviating those situations in the workplace) was taken as a matter of course.  Under the company's Industrial Research Division (founded 1929), voluntary interviewing took place in which 40,000 complaints were collected.  Apparently, however, the workers were not as "irrational" about the actual context of the voluntary interviews as Mayo and others tended to believe, because in the initial 20,000 voluntary interviews not a single employee is reported to have mentioned unions.[74]

In the case of the Hawthorne studies, we have the same pattern of deliberate (or at least convenient) overstatement, misinterpretation, and misrepresentation that characterized the increasingly unpopular mental testing and scientific management traditions.  Whereas scientific management decorticated workers, and mental testing individualized workers, social relations research portrayed workers as ruled by a sentiment of interpersonal irrationality.  This was, of course, an improvement; but not much of an improvement.  Facing the existing contradictions of economic power relations between workers and employers in the Hawthorne plant could not be accomplished by merely individual or even by social analysis (i.e., by adjustment of workers to conditions).  These contradictions could only be subsumed into a wider understanding of the shifting societal-historical context of management-union relations.  But that is the very definition of the class consciousness which the whole system of interviews and personnel counseling was aimed at avoiding.


Between 1918 and 1932, both standardized group ability testing and counseling programs were introduced and assessed in military, academic, public school, and vocational settings.  Mayo's approach (like that of Yerkes and Terman) was at least partially an exercise in public relations.  It was also a manifestation of the power of organized academic-based psychological knowledge to be applied and marketed in the same way as any other intellectual or material commodity.  By the start of the 1930s, then, industrial psychologists were busy trading in their former concerns (over the germ plasm or individual aptitude) for the newer sociology of management (including tests of achievement and promotion of workplace social relations).  This was a trend that would become still more pronounced during the great depression, W.W.II, and the Cold War eras, when the federal and state governments became successively more involved in the areas of job creation, vocational guidance, and employment testing.



[43] The incumbent president Woodrow Wilson, a Democrat, was re-elected to a second term.  For Wilson, the fundamental American social structure for the new century should be that of a middle class of free (and informed) citizens without an aristocracy, peasantry, or proletariat in a European sense.  By 1914, the reform (anti-monopoly) legislation passed under his first administration was already in place and was designed to allow both regulated competition within the nation and stepped-up expansion into the vital international market-place.  Defining both sides of the military conflict as typical of the European tradition of moral corruption, Wilson advised Americans to hold themselves aloof until the wicked belligerents had so destroyed themselves that the U.S. could go to Europe and create "an association of nations, all bound together for the protection of the integrity of each" (Quoted in Noble, 1971).  Raised in the South, the son of a Presbyterian minister, Wilson had all the confidence of a Calvinist that he had been elected by God to fulfill a mission of national and world salvation.

[44]Scott (1869-1955) was born on a farm near the town of Normal, Illinois.  To earn his college tuition, he picked and canned blackberries, salvaged scrap metal, and took odd jobs.  At age 19 he enrolled at Illinois State Normal University and then won a scholarship to Northwestern University in Evanston, Illinois (where he took tutoring jobs to make ends meet).  After three additional years at the Chicago theological seminary, he studied with Wundt at Leipzig, receiving a PhD in psychology.  He joined the faculty at Northwestern as an instructor of psychology and pedagogy, and in 1902 was approached by an advertising executive who wondered if psychological principles might help move merchandise.  After Scott's The Theory and Practice of Advertising (1903) and a number of magazine articles on the topic, his contacts in the business community widened.  He then turned his attention to the topic of personnel selection for the American Tobacco Company and by 1916 had been appointed professor of Applied Psychology and director of the Bureau of Salesmanship Research at the Carnegie Institute of Technology in Pittsburgh (Baritz, 1960; Schultz & Schultz, 1996).

[45] The core committee consisted of Yerkes, W. Bingham, Terman, G.M. Whipple, T.M. Haines, F.L. Wells, and E.A. Doll, but it was the multiple choice testing methods developed by Terman's current doctoral student Arthur S. Otis that served as the basis for the Army group test.  After carrying out his Army tester duties, Otis received his PhD in 1920 and later worked as a consultant on tests for the World Book Company.

[46]See chapters 2 and 3 for the Eugenics content of their pre-war university training.  Boring, (1961) provides a quaint photograph of these trainees.

[47]For instance, by overlooking the fact that Yerkes' conflation of mental and cultural evolution was predicated upon an underlying continuity view of mentality, Reed (1987b) is left wondering why Yerkes would support both the human mental testing and the empirical investigation of the qualitative aspects of animal mentality: "The man who tolerated and rationalized questionable methods and interpretations in the Army testing program -and who participated in harsh judgments about the native capacity of human beings- was in other situations a more self-conscious methodologist and an eager defender of the 'intelligence' of other creatures" (1987b, p. 88).

[48] Thorndike's (aristocratic though anti-hereditarian) contributions to educational psychology have already been mentioned (see chapter 3).  Similarly, Bingham's practical emphasis on testing provided a counter-weight to both Carl Brigham and Yerkes.  Walter Bingham had received his PhD in psychology at Chicago in 1908.  He then founded the Division of Applied Psychology at the Carnegie Institute in 1915, employing: (1) Scott and L.L. Thurstone (then a University of Chicago graduate student) prior to W.W. I, and (2) E.K. Strong (then training Life Insurance salesmen) after the war.  Bingham, a founding member of the Psychological Corporation (1921), continued to publish in the area of vocational "aptitude" testing (e.g., Bingham, 1926; 1937; 1948) and also designed the Army trade tests during W.W.II.  Scott's later pursuits are outlined below.  Ironically, Carl Brigham would later explicitly renounce his race hypothesis and would significantly influence the form of post-W.W.II College Entrance Examinations (Downey, 1961; also see below).

[49] It is worth noting too that Yerkes, at least later in life, was sensitive to this issue.  The final paper in his retrospective book, Chimpanzees: A Laboratory Colony (1943b), explicitly rejects the claim that his comparative animal psychology was spawned from eugenic beliefs (see also Yerkes, 1943a).  My point, however, is simply that the methodological overlap between his experimental treatment of animal and human mentality must not be overlooked.

[50]Healthy doughboys were welcomed back with a separation payment of only $60 and a railroad ticket home to help their transition to civilian life. The only ones among their contingent granted education benefits were the disabled who, under the Rehabilitation Law of 1919, received tuition, fees, books, and a subsistence allowance of $90-145 a month for rehabilitative and professional training.  The consequences for healthy vets were nearly catastrophic.  Many of the returning men found themselves in the bread line, in part because they lacked sufficient training and skills to compete for hard-to-come-by employment.

[51] The two competing branches of the New York Communist movement, for instance, were now under attack by U.S. Attorney General A.M. Palmer and State Senator Clayton Lusk.  Poorly orchestrated nation-wide raids by the Department of Justice (including young J. Edgar Hoover) and the unlawful detainment of thousands of innocent Russian immigrants, in 1919, were then followed by indictments and jail sentences for a very few certified communist revolutionaries (Cook, 1964).  The movement itself was forced underground until September 1922, when it re-appeared as the legally constituted entity, the "Workers Party" (see Draper, 1957; Iversen, 1959).  Meanwhile, 3,600 labor strikes involving over 4 million workers had now spread across the country from Seattle to Boston (Murray, 1955).

[52] In the State of Oregon, the Klan helped pass a law that mandated attendance in public schools only. This was a direct affront to the long tradition of parochial Catholic schools in the state. The Supreme Court eventually invalidated that law in the famous Pierce v. Society of Sisters of the Holy Name decision, 1925 (see Tucker, 1991).

[53] The years of Republican presidencies were as follows: Warren Harding (1921-1923), Calvin Coolidge (1923-1929), and Herbert Hoover (1929-1932). 

[54] John Scopes, a local high school biology teacher, was charged with (and convicted of) violating the Tennessee law prohibiting the teaching of evolution or denying the doctrine of Divine Creation (see Ginger, 1958; P. Johnson, 1991).

[55] More detail on Butler's war-time pursuits and his importance to the history of ability testing will be provided later in this chapter.

[56] Back in 1912, the Socialist Party had pulled in 897,000 votes for its presidential candidate, Eugene Debs, and in 1914 it had thirty members in the legislatures of twelve states and some 1,000 members in various municipal offices.  From the outset of the European war, the party had opposed it as a crime against humanity.  Under the loosely drafted Espionage Act of 1918, however, any whisper of disagreement was made a criminal offense.  The Socialist Party in America was broken and never retrieved its stature as a serious political force (Cook, 1964, p.79).

[57]More specifically, psychologists would play an increasing part in the formation of Board tests and by 1948, they would eventually displace the old-guard administrative educational reformers and allow the whole board to be subsumed into the Princeton based Educational Testing Service (see Bowles, 1967).

[58] "The fact is that, apart from minor fluctuations due to temporary factors...the feebleminded remain feebleminded, the dull remain dull, the average remain average, and the superior remain superior. There is nothing in one's equipment, with the exception of character, which rivals the IQ in importance" (Terman, 1919, p. 10).

[59]Discouraged by the chaos of their most recent revolution and encouraged by the expanding Californian agricultural interests, migrant Mexican workers were now flocking into the U.S. to seek jobs in seasonal agricultural industries.  They were also now settling into more urban areas with their families during the off season (see Raftery, 1988 for the L.A. case; Spring, 1994).

[60]Chapman (1988) provides further details on all three of these studies.

[61]The hereditarian interpretation of Army test data, however, did provoke a sharp criticism in the New Republic, a journal of progressive liberal opinion, by journalist Walter Lippmann, but by and large most agreed with the tests' applicability for curricular sorting in school settings (see Wallace, 1991).

[62]William C. Bagley (a Columbia Teachers College professor) in Determinism in Education (1925), attacked the available group school tests as unduly restrictive of educational opportunity.  For him,  the tests were appropriate for diagnosing the readiness level of individual children but were inappropriate when used to restrict the educational services offered to any child (see also Bagley, 1934).

[63]Baldwin had received his PhD from Harvard in 1905 and wrote an article on the contributions of William James to education for the J. of Educational Psychology in 1910.  Minton (1984) points out that it was Carl Seashore (euthenics; guiding spirit of psychology at Iowa; and founding editor of the J. of Education) who selected Baldwin for the position.

[64]The following criteria were used: Catalogue of Library Books for School Districts of Iowa 1911 and 1917; List of Books for School Libraries in the State of Wisconsin, 1926-28; Catalogue of Books in the Children's Department of the Carnegie Library of Pittsburgh, 1920; and Children's Catalogue of the H.W. Wilson Company.  Library books that were beyond the range of grade school children were checked with books for Home Reading for High School and Junior High School prepared for the National Council of Teachers of English.

[65]These laws typically provided a state subsidy (on a per child and on a per teacher basis) to schools which had met certain standardized equipment and structural criteria.

[66]The terminological and empirical details of Luria's research will be covered in the last chapter, but it should suffice to say that both simple tasks ("naming pictorially represented objects") and more complex tasks (such as "which is used like the other") were used by Luria.

[67]SAT scores were disclosed only to the colleges to which students applied.  Because of the possibilities of "misinterpretation," it was not considered responsible to release them either to candidates or to secondary school officials or teachers.

[68]Brigham (1930) is essentially jumping ship from his ailing "general" intelligence past and into the relative comfort of the rising "multi-factor" approach to testing.  His argument was threefold: (1) since Spearman (1927) found that the Alpha data was inconsistent with a single general factor, Brigham sides with Kelley (1927) that a verbal, quantitative, and spatial typology of intelligence might be useful; (2) if the "g" factor is simply an example of the "naming fallacy," then the former use of a "combined scale" (i.e., Alpha & Beta scores) to compare national and racial groups was "absurd;" and (3) even if homogeneous (national and racial) norms were obtained, differences in "opportunities" make rough comparisons between groupings ill-advised.  Hence, "comparative studies of various national and racial groups may not be made with existing tests, and ...the most pretentious of these comparative racial studies -the writers' own- was without foundation" (p. 165).

[69]Some of the early attempts at vocational guidance, however, tried to live up to the assessment-placement model.  In Oakland, for instance, the Bureau of Research and Guidance developed a centralized placement system under the direction of one of the city's high school principals, who was appointed half-time as assistant director of guidance in charge of placement.  Working closely with the Education Committee of the Oakland Chamber of Commerce and with the school counselors, the director placed approximately 100 students in jobs between January and June 1920.  This position was eliminated two years later, but placement work continued under the Director of Vocational Guidance in cooperation with the school counselors, the various chairmen of the Boys Work Committees of the city's service clubs -such as Rotary, Kiwanis, and Lions- and several of the city's department stores (which indicated when girls were needed for sales positions).  Finally, in 1924, the school board established a full-fledged youth placement bureau in cooperation with the Junior Division of the United States Employment Office (Kantor, 1986, p. 365).

[70]Management would have to act quickly because under Wilson a series of labor-friendly acts were being passed, and these contributed to a rise in the membership of the American Federation of Labor from about 500,000 in 1900 to 4,169,000 in 1919 (Murray, 1955).

[71]Mayo, an Australian, took his M.A. at the University of Adelaide in 1899 but arrived in the U.S. only in 1922 as a visiting associate professor of industrial research at the Harvard Business School.  He remained there and became a central figure in the Rockefeller-financed Department of Industrial Research at Harvard's Graduate School of Business Administration.

[72]The decontextualized details of the Relay experiments are as follows:  The Relay Assembly Test Room was located in a corner of the regular Relay Assembly Department.  In it were placed five experienced women whose job was to assemble a little gadget called a relay (which was made up of between thirty and forty pieces) and one girl to distribute parts for the others.  Many variations of conditions were made in 13 periods between April 1927 and June 1929, including studies of rest pauses, fatigue, monotony, and the effect of shorter days, shorter weeks, and wage incentives.  First, the girls were secretly observed for two weeks working in their regular departments so a base rate of production could be established.  Second (for five weeks), the girls were moved into the test room and given an opportunity to become accustomed to their new surroundings.  Their production went up.  Third (for most of the summer), the girls were given a new wage system.  Instead of a department-wide group-incentive plan (based on 100 relay assemblers), the girls were paid on the basis of the average production of their own group.  Production went up further.  During periods 4 through 11 (lasting over a year), rest pauses, changes in the length of the working period, free lunches, and Saturdays off were all tried.  Production increased even more.  In the "crucial" period 12 (three months), the conditions of period three, when only the method of payment had been changed, were reinstated.  After more than a year of different work and hour schedules, the girls had returned to the conditions from which they had started.  Only in this single period did the productivity of the girls "level off" [disputed as actually decreasing].  Despite this, however, their output never declined to the original period-three level [not disputed] but merely plateaued at a relatively higher level.  In the last period of the study, morning and afternoon rest pauses were reinstated and the company provided a beverage for the girls.  Production again started to climb (after Baritz, 1960).

[73]Between 1928 and 1932 the Socialist Party did grow from about 7,000 to 15,000 members.  But in the latter year, the Communist Party had only about 12,000 card-carrying members.  Even after the great stock market crash of 1929, shame and apathy seem to have been far more prevalent than militancy (see Terkel, 1978, p. 321; Shannon, 1960, pp. 11-135).

[74]The later personnel counseling program started with a single counselor appointed in 1936 but grew rapidly.  At the end of the first year, five counselors were active, and by 1938 the number had doubled; by 1941, twenty-nine counselors serviced about 10,000 workers.  By 1954, about sixty-four men and women were devoting full time to counseling duties in five of the twelve Western Electric plants.