The Binet-Simon test. Diagnostic examination using the mental development scale

The Binet-Simon test is the most common method for quantifying the level of ability development.

History

Developed in 1905 by A. Binet and T. Simon at the request of the French Ministry of Public Education to identify children who were not sufficiently developed to study in an ordinary school. Initially, the test contained 30 tasks, selected by difficulty so that each could be solved by 75% of children of a given age whose mental development could be considered normal. The number of correctly solved tasks characterized the so-called mental age.

Test modifications

The best-known modification was developed by L. Terman at Stanford University (USA); the resulting Stanford-Binet test is the most widely recognized method for diagnosing intelligence, and the IQ is calculated from it. However, in practice this test, like most such methods, quantifies individual differences in mental abilities without revealing their nature or prospects for development. This makes it difficult to use the test results for psychological diagnosis or for predicting the development of intelligence.

  • The Stanford-Binet Intelligence Scale is a scale for assessing intelligence, developed in 1916. The test uses a single indicator of the level of intelligence - IQ. The intelligence quotient is equal to the quotient of the mental age of the subject and his real age, multiplied by 100. Both ages are measured in months.

    Currently, the Stanford-Binet scale is used mainly in Western countries to assess readiness for school, the distribution of students to schools of different levels and when entering universities.
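The ratio IQ described above can be illustrated with a minimal sketch. The function name and the example ages are assumptions chosen for illustration; only the formula itself (mental age divided by chronological age, multiplied by 100, both in months) comes from the text.

```python
def ratio_iq(mental_age_months: int, chronological_age_months: int) -> float:
    """Ratio IQ: mental age divided by chronological age, times 100.

    Both ages are given in months, as in the 1916 Stanford-Binet scale.
    """
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * mental_age_months / chronological_age_months


# Hypothetical example: an 8-year-old (96 months) who solves the tasks
# typical of a 10-year-old (120 months).
print(ratio_iq(120, 96))  # 125.0
```

A child whose mental age equals his chronological age always receives exactly 100 under this definition.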

Related concepts

Kraepelin's counting test is a method of pathopsychological research used for qualitative and quantitative assessment of work capacity, practice effects and fatigue. The method is used in clinical, school and professional psychodiagnostics, starting from adolescence. It was proposed by the German psychiatrist Emil Kraepelin in 1895. Initially, the technique was a table containing a long column of single-digit numbers that had to be added up mentally. In modification...

Stanford-Binet mental development scale [from Lat. scala, stairs]: an intelligence test designed to measure the level of mental development. The first version of the Stanford-Binet scale was developed by L. M. Terman in 1916 as a modification of the Binet-Simon scale of mental development. During development, a large number of changes were made to the basic methodology. Compared to the Binet scale, more than a third of the tasks were new, and a number of old ones were reworked, discarded, or reassigned to other age groups. In effect, the first edition of the Stanford-Binet scale was a new test. The test was later substantially revised several times.

The Stanford-Binet scale includes tasks aimed at exploring a wide range of abilities, from simple manipulation to abstract reasoning. At early age levels, the tests mainly require hand-eye coordination, perceptual discrimination, the ability to understand instructions (in tasks such as building with blocks, stringing beads, matching geometric shapes), and the ability to recognize objects presented as toy models or as pictures on cards. At the highest age levels, tests with verbal content predominate. Among them are a vocabulary test (explaining the meaning of words), analogies, sentence completion, definition of abstract concepts, and interpretation of proverbs. Some tests are aimed at characterizing fluency of speech (quick naming of unrelated words, finding rhymes, constructing sentences with given words). Tests of general information and of knowledge of social norms and rules of conduct (answering questions, interpreting situations, detecting absurdities in pictures or stories) are widely represented. The scale also includes a number of tests of memory and spatial orientation (visual reproduction of figures, mazes, folding and cutting paper objects, etc.).

At higher age levels, the mastery of certain skills acquired at school (reading, arithmetic) is assessed. A number of the tests also make it possible to obtain broad qualitative information about the subject's methods of work and how he solves problems. There are also ample opportunities for observing personal qualities: level of activity and motivation, confidence, perseverance, concentration, etc. The complex procedure for administering the examination and interpreting the results, and the need for strict adherence to standards, require high qualification and preliminary training of the examiner.

Vast experience has been accumulated in applying the Stanford-Binet scale, including factual data and their interpretation. In terms of breadth of use, this technique occupies one of the leading places among intelligence tests in foreign psychodiagnostics. The length of its use and its wide distribution have made the Stanford-Binet frame of reference a standard for other psychometric tests. The distribution of IQ scores on the Stanford-Binet scales is the basis for the classification of degrees of mental retardation that is widely used in foreign psychodiagnostics. L. F. Burlachuk, S. M. Morozov

Binet-Simon test - a tool for diagnosing the development of intelligence, proposed in 1905 by A. Binet and T. Simon.

At first, the test consisted of 30 verbal, perceptual and manipulative tasks, arranged in order of increasing difficulty within the appropriate age cohorts: each task for a given age cohort had to be solved by 75% of children of that age with normal intellectual development. The child's mental age was determined from the number of tasks solved correctly. The concept of "mental age" was used by A. Binet and T. Simon in 1908 as a quantitative indicator of the development of intelligence. It characterizes the intellectual development of an individual by comparison with the level of intelligence of other people of the same age. Quantitatively, it is expressed as the age at which, according to average statistics, the test tasks that the individual can solve are normally solved. According to Binet, this level does not depend on training, but is determined only by genetic factors.

The second version of the scale (1908) covered ages from 3 years to adulthood, and the third (1911) was somewhat revised and supplemented.

Wechsler test. The most popular test for diagnosing intelligence in our country has been D. Wechsler's test (1939). Wechsler abandoned the concept of intelligence as "mental age", which was introduced by A. Binet, the creator of the first test of mental abilities. Wechsler defined intelligence as a complex, global capacity of the individual to act purposefully, think rationally and interact successfully with the environment.

Wechsler distinguished two components of intelligence, corresponding to two spheres of its manifestation: verbal intelligence and performance (action) intelligence. He proposed that, in addition to general intelligence, verbal and non-verbal intelligence should also be measured.

Wechsler introduced the concept of the "age norm". The subject's test score was based on a comparison of his results with the average results of the age group to which he belonged. The intelligence quotient was expressed in standard deviation units.

The test was intended for a comprehensive examination of patients in a psychiatric clinic. Its main purpose is to diagnose mental disorders in various illnesses (psychoses, neuroses, etc.), as well as to determine the level of intellectual impairment in people with congenital intellectual underdevelopment and senile dementia.

Immediately after its appearance, the Wechsler test began to be widely used outside the clinic: in professional selection, to assess the level of intelligence of "normal", that is, mentally healthy adults and children, and even to assess the level of intellectual giftedness.

Versions of the Wechsler test for adults consist of 11 subtests; the version for children consists of 12. All versions have two scales: a performance scale and a verbal scale. Wechsler believed that the total score across all test items characterizes general intelligence, and that the score on each scale characterizes non-verbal and verbal intelligence, respectively.

Subtests: 1) Information, 2) Comprehension, 3) Arithmetic, 4) Similarities, 5) Vocabulary, 6) Digit Span, 7) Picture Completion, 8) Picture Arrangement, 9) Block Design (Kohs blocks), 10) Object Assembly, 11) Coding, 12) Mazes.

Thus, the test was meant to measure three abilities. However, factor analysis of results from the "adult" version showed that the test actually measures four abilities: 1) general intelligence, 2) verbal comprehension, 3) perceptual organization, 4) an ability that affects performance on the Digit Span, Arithmetic and Coding subtests.

The Picture Arrangement subtest, in which the subject must put the frames of a "comic strip" in order, is considered factorially complex: success on it depends on both perceptual organization and verbal comprehension.

The performance of each subtest requires a set of abilities. Therefore, the process of performing individual subtests should be analyzed in detail. The analysis of the profile is of paramount importance, in particular, the ratio of the success of the subtests, the assessment of the degree of dispersion of the results relative to the individual average level (the degree of "sawtooth" of the profile), etc. Each of the additional indexes has an important diagnostic value.

Raven's Progressive Matrices - a battery of tests developed by the English psychologist J. Raven in 1938 to assess the level of intelligence; it relies on visual reasoning by analogy. It has two versions: 1) for adults and adolescents from 12 years of age and 2) for children 5 to 11 years of age.

In each task, the subject chooses which of the 6 or 8 fragments shown under the main picture (the "matrix", a geometric pattern) fits the gap in its lower right corner; the time limit applies to the test as a whole. The test has 5 series of 12 matrices each, and the difficulty of the tasks increases with the serial number.

Raven's method is one of the most powerful methods for the study of human non-verbal intelligence. It is intended to determine the level of development of the logical thinking of a person, the development of abilities to identify patterns and build new objects in accordance with them.

Amthauer Intelligence Structure Test. The intelligence structure test was developed by R. Amthauer in 1953 to differentiate candidates for various types of training and occupations in professional selection.

The growing interest of domestic psychologists in this test is explained by a number of advantages that distinguish it from the well-known methods of Wechsler, Raven and others for studying intelligence.

First of all, the Amthauer intelligence structure test is suitable not only for individual but also for group testing, which is especially important when large contingents are examined by a limited number of psychodiagnosticians.

At the same time, this test has a scale for recalculating assessments into the usual IQ units of the Wechsler test, which makes it possible to compare the results obtained on similar samples using the Wechsler test.

The intelligence structure test was compiled by R. Amthauer in three options, two of which are equivalent and applicable to samples of individuals with different professional and life experiences. The test is composed of 9 groups of tasks (subtests) focused on the study of such components of verbal and non-verbal intelligence, which are: vocabulary, ability to abstract, ability to generalize, mathematical abilities, combinatorial thinking, spatial imagination, ability to short-term memorization of visual-figurative information.

When normalizing standard indicators, he adhered to the age criterion.

Subtests:

1 - includes tasks focused on the study of the subject's vocabulary ("sense of language" according to Amthauer),

2 - ability to abstract,

3 - the ability to make judgments and conclusions,

4 - ability to generalize,

5 - mathematical ability,

6 - mathematical abilities ("series of numbers"),

7 - combinatorial thinking ("geometric figures"),

8 - spatial imagination ("Koos cubes"),

9 - the ability to memorize and reproduce visual information.

Wechsler: The scale for measuring the level of intellectual development (children's version WISC, adult version WAIS) consists of 11 subtests, which make up the verbal (1-6) and non-verbal (7-11) scales:

1) information (general awareness) - the level of simple knowledge;

2) comprehension (understanding the meaning of expressions) - ability to judge;

3) arithmetic - ease of handling numerical material;

4) similarities - conceptual thinking;

5) digit span (remembering numbers) - memory;

6) vocabulary - verbal experience, the ability to define concepts;

7) coding / digit symbol - hand-eye speed;

8) picture completion (missing parts) - visual observation, the ability to identify essential features;

9) block design - motor coordination, visual synthesis;

10) picture arrangement (consecutive pictures) - the ability to organize a whole from parts, understanding of the situation, extrapolation;

11) object assembly (folding figures) - the ability to synthesize the whole.

Three scores are determined: verbal IQ, non-verbal IQ and general (full-scale) IQ.

Norms: 130 and above - very high intelligence, 120-129 - high intelligence, 110-119 - good norm, 90-109 - average level, 80-89 - reduced norm, 70-79 - borderline level, 69 and below - mental defect.
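The norm bands listed above amount to a simple lookup. The following is a minimal sketch; the function name is an assumption, and the labels are copied from the list as given rather than from any official manual.

```python
def classify_iq(iq: int) -> str:
    """Map a Wechsler IQ to the norm band listed in the text."""
    # Lower bounds of each band, checked from highest to lowest.
    bands = [
        (130, "very high intelligence"),
        (120, "high intelligence"),
        (110, "good norm"),
        (90, "average level"),
        (80, "reduced norm"),
        (70, "borderline level"),
    ]
    for lower_bound, label in bands:
        if iq >= lower_bound:
            return label
    return "mental defect"  # 69 and below


print(classify_iq(104))  # average level
```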

Raven: progressive matrices, which measure intelligence through the identification of relationships between abstract figures. There are two versions: colored (simpler, for children 5-11 years old and adults over 65 years old), with 3 series of 12 matrices, and black and white, with 60 matrices in 5 series. The standard IQ expresses a person's performance relative to the distribution of performance for that person's age.

Amthauer: Measures intelligence in people aged 13-61. Developed as a test for diagnosing general abilities in the field of professional suitability; consists of 9 subtests:

1) logical selection - inductive thinking, sense of language;

2) definition of common features - the ability to abstract and operate with verbal concepts;

3) analogies - combinatorial abilities;

4) classification - ability to make judgments;

5) calculation - level of practical mathematical thinking;

6) number series - inductive thinking, the ability to operate with mathematical patterns;

7) choice of figures - spatial imagination, combinatorial abilities;

8) cubes - spatial thinking;

9) memorization of words - ability to focus, memory.

Scores are calculated for each of the subtests, the scores are converted into scale scores, and a profile is drawn that determines the ability for practical or theoretical activities.

Binet-Simon: scale of mental development. The original version (1905) contained 30 tests arranged in ascending order of difficulty, so that the probability of success increased with chronological age. The level of difficulty was determined empirically on the basis of data from a sample of 50 normal children aged 3-11 years and a small number of feeble-minded children. The next edition (1908) made it possible to distinguish different levels of intellectual development in normal children (level = "mental age"). The third edition (1911) extended the scale to the adult level, but still did not provide for an IQ. It was later reworked into the Stanford-Binet scale, which introduced the IQ:

IQ = (mental age / chronological age) × 100

The current edition of this well-established scale is the result of the most extensive revision (Delaney & Hopkins, 1987; Thorndike, Hagen, & Sattler, 1986a, 1986b). While retaining the main advantages of earlier editions as an individually administered clinical tool, this version reflects developments both in theoretical ideas about intellectual functions and in test construction methodology. Continuity with earlier editions was partly ensured by retaining many item types from earlier forms. More importantly, the adaptive testing procedure was retained, so that each test taker receives only those items whose difficulty corresponds to his demonstrated level of performance.

At the same time, the content has been greatly broadened beyond the predominantly verbal focus of earlier forms in order to provide more representative coverage of tasks involving numbers, spatial relationships and short-term memory. In addition, each item type is used, as far as possible, over a wide age range, which makes scores at different age levels almost fully comparable. The fourth edition of the Stanford-Binet scale is intended for use from the age of two years to adulthood.

Testing and scoring. A typical set of materials required for the Stanford-Binet test is shown in Fig. 8-1. It includes four books of printed cards with the test items, which are changed by flipping pages; test materials, including blocks, a form board with geometric shapes, a set of beads of various colors and shapes, and a large picture of a doll that is neutral with respect to gender and ethnicity; a record booklet for recording answers; and a guide on how to administer the test and score the results.

Like most individual intelligence tests, the Stanford-Binet scale should be administered only by highly qualified specialists. Special training and experience with this scale are absolutely necessary for correct administration, scoring and interpretation of test results. Uncertainty and ineptitude can damage rapport, especially with young children. Minor changes in verbal formulations, made through inattention, can change the difficulty of tasks. Additional difficulties arise because tasks must be scored immediately after completion, since the subsequent course of testing depends on how the subject coped with the tasks of the previous levels.

Part 3 Ability testing

Fig. 8-1. Materials used in testing using the Stanford-Binet Intelligence Scale (fourth edition)

(Copyright © 1986 by the Riverside Publishing Company. Reproduced with permission from the publisher)

For decades, clinicians have treated the Stanford-Binet scale and similar individual scales not only as a set of standardized tests, but also as a clinical interview. The same features that make it difficult to use such scales create favorable opportunities for interaction between the diagnostician and the subject and allow an experienced clinician to identify the information he needs for diagnosis. The Stanford-Binet scale and other tests described in this chapter allow you to observe the respondent's work methods, his approaches to solving problems, and other qualitative aspects of task performance. The tester also has the opportunity to evaluate some of the test-taker's emotional and motivational characteristics, such as the ability to focus, activity level, self-confidence and perseverance. Of course, any qualitative observations made at the time of individual tests should be recorded as observations and not interpreted in the same way as objective test scores. The value of such qualitative observations is highly dependent on the skill, experience, and psychological insight of the tester, as well as knowledge of the pitfalls and limitations inherent in this type of observation.

Chapter 8 Individual abilities

Fig. 8-2. Age ranges of the 15 tests of the Stanford-Binet scale, fourth edition. Note on the areas shaded in gray: of the nine tests with restricted age ranges, some were still given to members of the standardization sample who were outside those age ranges, because of unusually high or low scores on the routing test. Their performance was taken into account in compiling the normative tables for the relevant age sample, but these estimates are included with a special caution regarding their use. For details, see the guide (Thorndike et al., 1986a, p. 7) and the technical manual (Thorndike et al., 1986b, p. 30).

(Adapted with simplifications from The Stanford-Binet Intelligence Scale: Fourth Edition, Guide for Administering and Scoring, p. 7. Copyright © 1986 by the Riverside Publishing Company. Reproduced with permission from the publisher)

In contrast to the age grouping of items used in earlier editions of the scale, in the SB-IV tasks of each type are placed in separate tests in order of increasing difficulty. The scale consists of 15 tests, chosen to represent four main cognitive areas: verbal reasoning, abstract/visual reasoning, quantitative reasoning, and short-term memory (see Fig. 8-2). These 15 tests, although grouped into four categories for scoring purposes, are administered in a mixed order to hold test takers' interest and attention. The difficulty range of six of these tests covers the entire age range of the SB-IV. As can be seen in Fig. 8-2, the remaining nine tests, due to the nature of their tasks, either begin at a later age or end earlier than the respective age limits of the scale.

Administering the SB-IV is a two-stage process. At the first stage, the tester gives the Vocabulary test, which serves as a routing test: it determines the entry level for all other tests. The item at which the Vocabulary test starts depends solely on the chronological age of the person being tested. For the other tests, the entry level is determined from a nomogram (or table) based on the Vocabulary test score and chronological age. At the second stage, the tester must establish the basal and ceiling levels for each test based on the individual's actual performance. The basal level is reached when the subject passes four items at two adjacent levels. The ceiling level is reached when three of the four items (or all four) at two adjacent levels are failed. Once the ceiling level is reached on a particular test, that test is discontinued.
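The basal and ceiling rules just described can be sketched in a few lines. This is only an illustration under stated assumptions: each level is assumed to hold exactly two pass/fail items (so that two adjacent levels give the four items the text mentions), and the function names and the sample record are hypothetical, not taken from the test manual.

```python
from typing import List, Optional, Tuple

# Assumption: one level = two items, each recorded as True (pass) or False (fail).
Level = Tuple[bool, bool]


def basal_reached(prev: Level, cur: Level) -> bool:
    """Basal level: all four items at two adjacent levels are passed."""
    return all(prev) and all(cur)


def ceiling_reached(prev: Level, cur: Level) -> bool:
    """Ceiling level: three or all four items at two adjacent levels are failed."""
    return (prev + cur).count(False) >= 3


def find_ceiling(levels: List[Level]) -> Optional[int]:
    """Return the index of the level at which the ceiling is reached, if any."""
    for i in range(1, len(levels)):
        if ceiling_reached(levels[i - 1], levels[i]):
            return i
    return None


# Hypothetical record of one test, from the entry level upward.
record = [(True, True), (True, True), (True, False), (False, False)]
print(basal_reached(record[0], record[1]))  # True
print(find_ceiling(record))                 # 3
```

In an actual session the tester applies these checks item by item, which is why each answer has to be scored immediately.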

When an item has been presented and the subject's response obtained, the tester enters the score in the record booklet. The raw score for each test is found by taking the number of the highest item administered to the subject and subtracting the total number of items that he failed. In addition, 11 tests include sample items that serve only to familiarize the subject with the test and are never counted in the score. In most tests, each item has only one correct answer; such answers are indicated on the back of the item cards and in the record booklet. All items are scored on a pass/fail basis, in accordance with established reference responses. Five tests involve free responses and therefore require more detailed scoring standards and rules, which are given in the SB-IV guide for administering and scoring (Thorndike et al., 1986a) [1], which also provides some examples of ambiguous responses that require further clarification by the tester.
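The raw-score rule above is a single subtraction. A minimal sketch follows; the function name and the example numbers are assumptions for illustration only.

```python
def raw_score(highest_item_administered: int, items_failed: int) -> int:
    """Raw score: number of the highest item administered minus items failed."""
    if not 0 <= items_failed <= highest_item_administered:
        raise ValueError("items_failed must be between 0 and the highest item number")
    return highest_item_administered - items_failed


# Hypothetical example: testing stopped at item 24, and 5 items were failed.
print(raw_score(24, 5))  # 19
```

The rule gives credit for the items below the entry level that were never administered, since they are counted in the item number but not among the failures.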

Although the full SB-IV scale has 15 tests, no one takes all of them, since some are applicable only in limited age ranges. Typically, a complete battery consists of 8 to 13 tests, depending on the age of the test taker and the score on the test that determines the examination route. Administration of a full battery is expected to take from 30 to 90 minutes, but less experienced users may need more time. As a rule, an SB-IV examination is carried out in one session, possibly with breaks of several minutes between tests. For some purposes, the SB-IV guide for administering and scoring (Thorndike et al., 1986a) suggests several abbreviated batteries that require less testing time but focus on the tests most appropriate for a particular testing purpose. These batteries include a 6-test general-purpose abbreviated battery and a 4-test quick screening battery. Both have at least one test in each of the four cognitive areas. In addition, three batteries are proposed for screening students for inclusion in gifted programs, one for each of three age levels, and three batteries for students with learning difficulties, also corresponding to the three age levels. All of these abbreviated batteries use the standard procedures for entry levels, testing and scoring. The SB-IV Examiner's Handbook (Delaney & Hopkins, 1987) clarifies many of the procedural issues involved in administering and scoring this test with different types of subjects.

[1] These tests are Vocabulary, Comprehension, Absurdities, Copying and Verbal Relations.

Standardization and norms. The SB-IV standardization sample slightly exceeded 5,000 subjects aged 2 to 23, tested in 47 states (including Alaska and Hawaii) and the District of Columbia. The sample was stratified by geographic area, community size, ethnic group and gender in order to achieve a close (proportional) match to the 1980 US Census data. In addition, the socioeconomic status of the subjects, in the form of the occupational and educational level of the parents, was checked. This check revealed overrepresentation of subjects at the upper levels and underrepresentation at the lower levels. These discrepancies were corrected by assigning different weights to frequencies when calculating the values in the normative tables. Thus, each subject from a family with high socioeconomic status was counted as a fraction of a case, while a subject from a family with low socioeconomic status was counted as somewhat more than one case.

Normative tables are used to convert the raw scores for each of the 15 tests into Standard Age Scores (SAS for short).* These are normalized standard scores with a mean of 50 and SD = 8 in each age group. The normative tables are drawn up at 4-month intervals for ages 2 to 5 years, at 6-month intervals for ages 6 to 10 years and at 1-year intervals for ages 11 to 17; for the age level from 18 to 23 there is only one table. The record booklet contains a special chart for plotting an individual SAS profile from the results of the tests administered to a particular subject.

* These tables are also given by Thorndike et al., 1986a, p. 183-188. Some SAS values, based on fewer than 100 observed cases, were statistically estimated from the full age cohort and are highlighted in the normative tables with a dark background. Such scores arose when subjects showed an unusually high or, conversely, low result for their age on the test that determines the examination route (Thorndike et al., 1986b, p. 29-30).

Standard Age Scores (SAS) can also be obtained for each of the four cognitive areas and for the composite score on the full SB-IV scale. The composite and the four area SAS are found from the SAS values of the tests administered to a particular subject, by referring to the relevant normative tables. These five SAS are also normalized standard scores, but with a mean of 100 and an SD of 16. Thus, they are expressed in the same units as the standard IQ of earlier editions of the Stanford-Binet scale. However, use of the term "IQ" has now been completely abandoned. For special purposes, SAS can be calculated for any combination of two or more area SAS (the so-called "partial composites"). For example, the combined SAS for verbal and quantitative reasoning closely corresponds to scholastic aptitude and may be of particular interest in assessing academic achievement or learning readiness.
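To make the two score scales concrete, the sketch below shows how a position expressed on the test-level scale (mean 50, SD 8) corresponds to the same position on the area and composite scale (mean 100, SD 16). It illustrates the units only: in the SB-IV itself, area and composite scores are read from normative tables, not computed by this linear formula. All names are assumptions.

```python
from typing import Tuple

Scale = Tuple[float, float]  # (mean, standard deviation)

TEST_SCALE: Scale = (50.0, 8.0)         # SAS for a single test
COMPOSITE_SCALE: Scale = (100.0, 16.0)  # SAS for areas and the composite


def to_z(score: float, scale: Scale) -> float:
    """Express a standard score as a distance from the mean in SD units."""
    mean, sd = scale
    return (score - mean) / sd


def from_z(z: float, scale: Scale) -> float:
    """Place a z-score on a scale with the given mean and SD."""
    mean, sd = scale
    return mean + z * sd


def rescale(score: float, source: Scale, target: Scale) -> float:
    """Same relative position, expressed in the units of another scale."""
    return from_z(to_z(score, source), target)


# One SD above the mean on a single test (58) sits where 116 sits
# on the area/composite scale.
print(rescale(58.0, TEST_SCALE, COMPOSITE_SCALE))  # 116.0
```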

Reliability. Because the SB-IV has no alternate form, the reliability of the scale could be assessed only through internal consistency or retesting. In most cases, the Kuder-Richardson method was used, applied to data from the entire standardization sample. As expected, the composite score for a full battery gave the highest reliability coefficients at all age levels, ranging from 0.95 to 0.99. The reliability of the area scores in each of the four cognitive areas was also high. Although it varied with the number of tests included in each area, the corresponding reliability coefficients ranged from 0.80 to 0.97. As for individual tests, most have reliability coefficients between 0.80 and 0.90, with the exception of the short (14-item) Memory for Objects test, whose reliability varies from 0.66 to 0.78. In general, all reliability coefficients tend to increase somewhat from younger to older age levels.
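For readers unfamiliar with the Kuder-Richardson approach, here is a minimal sketch of one common variant, formula 20 (KR-20), for pass/fail items. The text does not say which Kuder-Richardson formula the test authors used, so KR-20 is an assumption here, and the response matrix is invented for illustration.

```python
from typing import List


def kr20(responses: List[List[int]]) -> float:
    """KR-20 internal consistency for dichotomous (0/1) items.

    responses[i][j] is 1 if person i passed item j, otherwise 0.
    Uses the population variance of total scores.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    if n_people < 2 or n_items < 2:
        raise ValueError("need at least two people and two items")

    totals = [sum(person) for person in responses]
    mean_total = sum(totals) / n_people
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people
    if var_total == 0:
        raise ValueError("total scores have zero variance")

    # Sum of p * (1 - p) over items, where p is the proportion passing.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(person[j] for person in responses) / n_people
        sum_pq += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)


# Invented data: 4 people, 3 items.
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(data))  # 0.75
```

Coefficients of this kind rise with the number of items, which fits the lower values reported above for the short Memory for Objects test.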

Additional data on retest reliability were obtained from 57 preschoolers (5 years old) and 55 schoolchildren (8 years old), who were retested after an interval of 2 to 8 months. In general, the composite score was highly reliable, with coefficients of 0.91 and 0.90 for the two groups. Although the area score for verbal reasoning gave reliability coefficients above 0.80, the retest reliability of the other area scores and of individual tests fluctuated considerably. These results are difficult to interpret because of the possible impact of the limited age ranges of some tests and of practice effects, which could vary significantly from child to child.

In addition to the reliability coefficients, the SB-IV guide for administering and scoring (Guide) and the technical manual (Technical Manual) give standard errors of measurement (SEM) within each age level for each test, for the area scores and for the composite score on the full scale. These SEM are needed to evaluate individual scores and to interpret differences between scores in profile analysis. The overall composite SAS (M = 100, SD = 16) has an SEM of 2 to 3 scale units. For example, if we take 2.5 as an approximate mean SEM, the chances are 2 to 1 that the "true" composite score of a particular subject will not differ from the obtained score by more than 2.5 units; in addition, the chances are 95 in 100 that the difference will be no more than 5 units (2.5 x 1.96 = 4.90).
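The arithmetic in this example can be reproduced with a short sketch. The band calculation follows the text (obtained score plus or minus a multiple of the SEM). The relation SEM = SD x sqrt(1 - reliability) is the standard psychometric formula and is added here as background, not quoted from the source; the function names are assumptions.

```python
import math
from typing import Tuple


def sem_from_reliability(sd: float, reliability: float) -> float:
    """Standard error of measurement from scale SD and reliability coefficient."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(score: float, sem: float, z: float = 1.96) -> Tuple[float, float]:
    """Band around an obtained score; z = 1.96 gives the 95-in-100 band."""
    half_width = z * sem
    return (score - half_width, score + half_width)


# With SD = 16, a reliability of about 0.975 gives an SEM close to 2.5.
print(round(sem_from_reliability(16.0, 0.975), 2))  # 2.53

# The text's example: SEM = 2.5, so the 95% half-width is 2.5 * 1.96 = 4.90.
low, high = score_band(100.0, 2.5)
print(round(low, 2), round(high, 2))  # 95.1 104.9
```

Setting `z=1.0` gives the narrower band of plus or minus one SEM, which corresponds to the "2 to 1" chances mentioned above.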

The SB-IV Examiner's Handbook (Delaney & Hopkins, 1987) presents an interpretive framework that encourages the formulation and cross-checking of hypotheses based on the quantitative and qualitative data collected with this battery. The quantitative analysis follows the model first proposed by F. B. Davis (1959) and applied by Kaufman (1979, 1994) and others to the Wechsler scales. In essence, it consists of standard schemes for comparing the composite and the four area scores (see Fig. 8-2) in order to find statistically significant differences based on the magnitude of the SEM. The frequency of the differences obtained is also compared with the corresponding normative data from the standardization sample. In addition, the strengths and weaknesses of the individual's specific abilities, as identified by each test, can be evaluated systematically by comparing the subject's average result on the composite and area scores with the scores on individual tests. The handbook contains all the information needed to perform these types of profile analysis and gives four complete examples of their application; it will certainly be appreciated by both beginners and experienced users of the Stanford-Binet scale.

Validity. In accordance with modern concepts of test validation, the developers of the fourth edition of the Stanford-Binet scale used a variety of approaches to identify and define the constructs underlying it. The initial choice of constructs was guided by an analysis of the available scientific literature on the nature and measurement of intelligence (R. L. Thorndike et al., 1986b, chap. 1). Experience with previous editions of the scale, and the strengths and weaknesses that this experience revealed, served as additional guidelines in planning the construction of the new scale and in making decisions. For example, grouping item types into reliable subtests was a needed replacement for the traditional clinical practice of loosely analyzing response patterns on the basis of subjective item groupings.

After the initial selection and preliminary definition of the constructs assessed by the SB-IV, old items were identified and new items were developed to fit these definitions. The entire pool of items was subjected to a comprehensive and statistically sophisticated analysis, including both subjective and statistical evaluation of item bias (R. L. Thorndike et al., 1986b, chap. 2). The final version of the scale, obtained after several preliminary checks and field trials, was administered to the standardization sample and then examined with respect to three main types of validation data: 1) intercorrelations and factor analysis of scores; 2) correlations with other intelligence tests; and 3) comparisons of results in previously identified special groups (R. L. Thorndike et al., 1986b, chap. 6).

First, using data from the full standardization sample, intercorrelations were calculated among the scores on all tests, the four area scores, and the composite score of the battery, separately for each age level. Median correlations (found by ranking coefficients of the same type across all ages) were used as input for confirmatory factor analysis. The main goal of this analysis was to test the hypothesis that a general factor accounts for the correlations between tests from different cognitive areas, and that group factors account for the residual correlations within each area. A similar factor analysis was also performed on the median correlations in each of three age groups (2 to 6, 7 to 11, and 12 to 18-23).

Part 3 Ability testing

In each case, the results of the factor analysis showed significant loadings of the general factor on all tests, thus justifying the use of an overall composite score. For three of the four cognitive areas, group factors accounted for a significant share of the remaining common variance within the respective area. The exception was the area of "abstract/visual reasoning," where all four tests showed a high degree of specificity. It may be speculated that the failure to find clear support for a group factor in this cognitive area reflects the cumulative effects of a school curriculum that is not organized as carefully around spatial-perceptual content as it is around verbal and numerical material. The everyday personal experience that contributes to the development of spatial-perceptual abilities is not systematically organized into "courses" or content areas in the way that school learning is. It is therefore less likely that personal experience will foster common correlational structures across different people (Anastasi, 1970, 1986b).

A review of the factor-analytic results given in the test manual, together with factor analyses conducted independently by other researchers on the SB-IV standardization data, confirmed the legitimacy of using the composite score as a measure of general intellectual ability (R. M. Thorndike, 1990). However, researchers differ on the number and nature of the narrower factors (see also McCallum, 1990). The situation is complicated by the fact that, since the SB-IV consists of different sets of tests at different ages, the "raw" data for factor analysis (i.e., the correlations between test scores) differ accordingly. Hence the differences in the types and number of factors, ranging from two to four, that appear at different age levels. These discrepancies are compounded by the variety of factor-analytic methods used in different studies. In general, however, the factor solution fits the four-factor model postulated in the development of the SB-IV better as the age of the subjects increases, especially when confirmatory rather than exploratory factor analysis is used.

The second source of validation data is a series of studies of groups that were given the SB-IV along with another intelligence test, including Form L-M of the Stanford-Binet scale.1 These groups consisted of schoolchildren who attended regular classes and were described by their teachers as "normal" (nonexceptional). In addition, the researchers had at their disposal three "special" (exceptional) groups of students enrolled in programs for gifted children, children with learning difficulties, and children with mental retardation. In the regular sample, the correlation of the standard IQ from the earlier version of the Stanford-Binet scale (Form L-M) with the SB-IV composite score was 0.81; the second highest (0.76) was the correlation of the Form L-M IQ with the SB-IV area score for "verbal reasoning," and the lowest (0.56) was the correlation of the Form L-M IQ with the SB-IV area score for "abstract/visual reasoning," as would be expected from the similarities and differences in the content of these two forms of the Stanford-Binet scale. In all groups, the correlations of the SB-IV composite and area scores with total or partial scores on other intelligence tests were, for the most part, consistent with the hypotheses about the constructs being tested. At the same time, a close examination of all the correlations found between specific SB-IV scores and other intelligence tests contributes to a firmer understanding of the constructs measured by the current Stanford-Binet scale.

1 The other tests were the WISC-R, WAIS-R, WPPSI, and K-ABC, which are covered a little later in this chapter.

The third series of studies, carried out on special samples, showed that the SB-IV correctly identifies the level of performance of school-age children who are gifted, who have learning difficulties, or who are mentally retarded. The means of the composite score and of the four area scores in the gifted sample were significantly higher than the corresponding means in the standardization sample. The means in the samples of children with learning difficulties and with mental retardation were significantly lower than those of the standardization sample, and the means of the mentally retarded children were significantly lower than those of the sample with learning difficulties. It should be noted that in all the studies of special groups the participants were classified on the basis of tests or other indicators of performance, and that the SB-IV itself was not used for this purpose.

In a later review of SB-IV validity studies, Laurent, Swerdlik, and Ryburn (1992) conclude that this scale is at least as good a measure of general intelligence as other available measures, that it correlates highly with measures of achievement, and that it can distinguish among mentally retarded, gifted, and neurologically impaired individuals. The reviewers suggest that the SB-IV can be used as a selection tool in evaluating gifted children because of the high "ceiling" provided by the age range of the test; on the other hand, they criticize the SB-IV for its lack of extremely easy items, simple enough to diagnose mental retardation in the youngest children.

Research needed to strengthen the interpretive meaning of the various SB-IV test scores and their combinations continues to accumulate rapidly. In addition, several works have appeared that provide guidance on the use of this scale (Sattler, 1988; Glutting & Kaplan, 1990; Kamphaus, 1993). The current edition of the Stanford-Binet represents real progress in scale construction. The SB-IV provides the necessary flexibility, allowing users to assess individual abilities in line with specific testing goals. Finally, this version of the scale agrees much better with current theoretical understanding of the nature of intelligence and with recent research in this area (see Chapter 11).

Wechsler scales

The intelligence scales developed by David Wechsler comprise several successive editions of three scales: for adults, for school-age children, and for preschoolers. In addition to their use as measures of general intelligence, the Wechsler scales have been tried as an aid to psychiatric diagnosis. Starting from the observation that brain damage, psychotic deterioration, and emotional disorders can affect some intellectual functions more than others, Wechsler and other clinical psychologists argued that a comparative analysis of the patient's performance on the various subtests could shed light on the specific nature of a mental disorder. The problems and results associated with such profile analysis of Wechsler scores are discussed in Chapter 17 as an example of the use of tests in clinical settings.

The interest in the Wechsler scales and the breadth of their application are attested by the several thousand publications devoted to them to date. In addition to the usual test reviews in the Mental Measurements Yearbooks, research on the Wechsler scales has been surveyed periodically in journals (Guertin, Frank, & Rabin, 1956; Guertin, Ladd, Frank, Rabin, & Hiester, 1966; Guertin, Ladd, Frank, Rabin, & Hiester, 1971; Guertin, Rabin, Frank, & Ladd, 1962; T. D. Hill, Reddon, & Jackson, 1985; Littell, 1960; Rabin & Guertin, 1951; I. L. Zimmerman & Woo-Sam, 1972) and summarized in several books (e.g., Forster & Matarazzo, 1990; Gyurke, 1991; Kamphaus, 1993; Kaufman, 1979, 1990, 1994; Sattler, 1988, 1992).

Past and present of the Wechsler intelligence scales. The first form of the Wechsler scales, known as the Wechsler-Bellevue Intelligence Scale, was published in 1939. One of the main goals in preparing this scale was to provide an intelligence test suitable for adults. When he first introduced the scale, Wechsler (1939) noted that the intelligence tests available until then had been designed mainly for schoolchildren and adapted for adults by adding more difficult items of the same type. The content of such tests was often of no interest to adults. Unless test items have at least a minimum of face validity, it is almost impossible to establish proper rapport with adult subjects. Many intelligence test items, written with the daily activities of a school-age child in mind, clearly lack face validity for most adults.

The emphasis that most tests place on speed can also put older people at a disadvantage. In addition, Wechsler believed that traditional intelligence tests attached unduly great importance to relatively routine manipulation of words. He drew his colleagues' attention to the inapplicability of mental age norms to adults and pointed out that earlier standardization samples for individual intelligence tests had included only a small number of adults.

The desire to overcome all these shortcomings led to the development of the first Wechsler-Bellevue scale. In form and content, this scale served as the basic model for all subsequent Wechsler intelligence scales, each of which in turn introduced some improvements over the previous version. In 1949, the Wechsler Intelligence Scale for Children (WISC) was prepared as a downward extension of the Wechsler-Bellevue scale (Seashore, Wesman, & Doppelt, 1950). Many of the items were taken directly from the adult test, and easier items of the same type were added to each subtest. In 1955, the Wechsler-Bellevue scale was superseded by the Wechsler Adult Intelligence Scale (WAIS), which was free of some of the technical shortcomings of the earlier scale with regard to the size and representativeness of the normative sample and the reliability of the subtests. In 1967, the Wechsler family of tests gained one more member, its "youngest child": the Wechsler Preschool and Primary Scale of Intelligence (WPPSI), originally designed for children aged 4 to 6.5 as a downward extension of the age range of the WISC, which was intended for children from 5 to 15 years.

The development of the WISC was marked by a certain contradiction from the outset, since Wechsler had set about creating his tests in part because of the urgent need for a scale of adult intelligence that would not be a mere upward extension of the then existing children's scales. The first edition of the WISC was in fact criticized for being insufficiently oriented toward children. In the revised version of this scale (WISC-R), published in 1974 and intended for children aged 6 to 16, the adult-oriented items were replaced or modified to bring their content closer to children's ordinary experience. In the arithmetic subtest, for example, "cigars" were replaced by "sweets" in the wording of a problem. Other changes included the elimination of items that might be familiar to different groups of children to differing degrees, and the inclusion of more female and Black figures in the pictorial material of the subtests. A number of subtests had to be lengthened in order to increase their reliability. In addition, some improvements were made in the administration and scoring procedures.

Description of scales. To date, each of the three Wechsler scales has undergone at least one revision. There are three current versions of the scales, published under David Wechsler's name after his death in 1981: the Wechsler Adult Intelligence Scale-Revised (WAIS-R; Wechsler, 1981), covering the age range from 16 to 74 years; the Wechsler Intelligence Scale for Children-Third Edition (WISC-III; Wechsler, 1991), intended for children from 6 years to 16 years 11 months; and the Wechsler Preschool and Primary Scale of Intelligence-Revised (WPPSI-R; Wechsler, 1989), now covering the age range from 3 years to 7 years 3 months. The third edition of the adult intelligence scale (WAIS-III), which has been in preparation since 1992, is expected to be ready by 1997.

The WAIS-R, WISC-III, and WPPSI-R have many features in common, including the basic organization into a Verbal and a Nonverbal scale, each of which consists of a minimum of five (and a maximum of seven) subtests and yields a separate score in standard IQ units. The scores on all 10 regularly administered subtests (11 for the WAIS-R) are combined into a Full Scale IQ, which has the same mean and standard deviation (M = 100, SD = 15) as the two subscales, Verbal and Nonverbal. Of the 17 different types of subtests used in the WAIS-R, WISC-III, and WPPSI-R, eight (5 verbal and 3 nonverbal) are common to all three scales. When these scales are administered, verbal and nonverbal subtests alternate and are presented in a predetermined sequence that is different for each scale.
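Because the Wechsler IQs use a standard deviation of 15 while the SB-IV composite SAS described earlier uses 16, the same relative standing is expressed by slightly different numbers on the two metrics. The sketch below shows the linear rescaling through a z score; the function name and the sample scores are hypothetical, and the conversion only changes units. It does not imply that the two tests measure the same thing or yield interchangeable results.

```python
def rescale(score, from_mean=100.0, from_sd=16.0, to_mean=100.0, to_sd=15.0):
    """Re-express a standard score on another metric via its z score.

    Defaults convert from an SD-16 metric (as for the SB-IV SAS)
    to an SD-15 metric (as for the Wechsler IQs).
    """
    z = (score - from_mean) / from_sd
    return to_mean + to_sd * z


# Hypothetical scores, two SDs above and one SD below the mean
for sas in (132, 84):
    print(sas, "->", rescale(sas))
```

A score two standard deviations above the mean is 132 on the SD-16 metric and 130 on the SD-15 metric, so scores from the two scales should be compared in terms of standard deviations from the mean and not as raw numbers.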

The Information subtest is the first verbal subtest presented on all three scales and serves as a good means of establishing rapport with the person being tested. Much effort has gone into avoiding questions that call for specialized knowledge. Its first items are easy enough for the vast majority of test takers to pass, unless they are mentally retarded or genuinely disoriented. In such cases, the examiner can quickly decide to discontinue testing. The questions of the Information subtest in the WAIS-R and WISC-III concern facts that most people living in the United States have probably had a chance to learn, such as "What month comes before December?" or "Who was Mark Twain?" The WPPSI-R offers similar questions, although at a lower level of difficulty. In fact, this version begins with items presented in pictorial form, which require the child only to point to the correct answer. For example, shown a picture of several household items, a child may be asked which one is used for cleaning.

The Arithmetic subtest is another verbal measure that spans a wide range of difficulty across the Wechsler scales. The easiest arithmetic items of the WPPSI-R require the child only to point to the one item in a row that illustrates a quantitative concept (such as "smallest" or "more"). More difficult items may involve calculation or the solution of arithmetic problems, the hardest of which require a good command of fractions.