
DESIGN AND MEASUREMENT

In order to derive a scientific body of knowledge about behavior, data must be collected. This information should not be casually expressed but rather recorded with care and in detail. Research, conducted through a scientific method of analysis, provides a reasonably objective means of answering questions about animal and human behavior. Problems are studied and procedures designed for their solution. Inferences to large numbers (a specified population) are often made from the results of data collected on sample subjects.

Most of the research reported in this book will be experimental in nature. Investigations in the motor-learning area tend to be experimental, for researchers are attempting to determine antecedent and consequential relationships. What kinds of effects will a particular condition produce on behavior? If we were to help learners set reasonably high but attainable goals for themselves before and during the practice of a task, will performances be any more proficient than when learners practice without such assistance? Experimental projects would help to solve this issue. Motor-learning specialists often compare different treatments (situational modifications), as administered through practice, in their quest to ascertain best practice or training techniques.

They also look at relationships among variables. One example of a study of interest might be the relationship between the age of a person and his ability to learn a moderately difficult motor task. How about the association of personality traits, intelligence scores, and motor performance? Further study might be done on the learning process itself; e.g., the generality of rates of learning across a variety of learning tasks, the relationship of initial level of success with later task proficiency, or abilities underlying achievement in particular activities at different stages of practice.

Motor behaviors can be observed under artificial and manipulated conditions as well as in natural situations. The research laboratory or the gymnasium may provide the appropriate data for the problem of concern. Regardless of the task used, variables of interest, and the setting, the scores of the subjects are typically recorded and analyzed in some way so as to reach conclusions with regard to the stated problem. Thus scores—how they are obtained and how they are used—become the essence of the report. In learning experiments where the effects of some variable on learning are to be determined, a crucial decision must be made with regard to a score that constitutes learning. In other words, how is this score to be derived?

DECISIONS IN THE MEASUREMENT OF LEARNING
Although scientifically accepted methods for the study of motor learning can sometimes be decided upon easily, certain procedural aspects require careful deliberation, as the consequences of these decisions might be more dramatic than perceived at a superficial level. With this thought in mind, let us identify problems and decisions in the experimental study of motor behaviors. This discussion should help you to recognize and to be sensitive to these problems, aiding in the interpretation of research findings or the undertaking of research. Major decisions must be made in the following areas:
Selection of appropriate learning tasks.
Determination of number of practice trials.
Selection of dependent variable (what and how to measure).
Determination of a learning score or scores.

Selection of Appropriate Learning Tasks
If the investigator wants to study the learning process or the effect of situational changes on the acquisition of skill, the typical concern is for subjects who are naive to the task that will be practiced and learned. Saying it another way, the task should be novel, or new, to the subjects. Because there is difficulty in finding real-life activities that are novel to a group of subjects, the usual recourse is to select an artificial laboratory task or a contrived athletic task for a study.

Once a nonfamiliar task has been selected (assuming this is desirable), the investigator must consider its degree of complexity. Complexity will be a function of input demands (the number of cues that must be attended to and under what conditions), processing demands (speed to initiate a response), and response demands (number of movements, refinement of movement, and so on). The nature of the learning phenomenon to be studied will suggest the type of task that might be used. As the reader will see, many of the tasks employed by motor-learning researchers are of the laboratory variety, and relatively simple to perform at that. That is, they are easy relative to the dimensions of athletic activities, like learning to fence, to play tennis, or to perform gymnastic routines. The laboratory tasks may be fairly well learned after ten or twenty trials, whereas it takes years to become a good tennis player. However, simpler tasks often permit the study of a learning variable(s) if taught in purer settings under good control, and in a brief time span.

If the task is learned too quickly, however, it is probable that learning variables of interest might be camouflaged. The effects of an imposed condition might go unnoticed. For instance, if the selected task is simple to learn well and we want to study the differential effects of reward and punishment on task proficiency, the two groups might not show any difference in performance, primarily because of the simplicity of the task. A related problem arises when a criterion of task mastery is predetermined and turns out to be unrealistically low, or easy to attain. All things being equal, the more desirable learning task is one that can be measured with a minimal amount of interference and bias. Response measures are never pure. The challenge is to obtain measures that reveal "true" scores. Disturbing environmental cues, subjective experimenter observations, direct experimenter interactions with the subject, and uncalibrated or unchecked equipment will produce artifacts contributing to invalid data. Finally, a task should be appropriate for the maturational level of the subjects. Inability to perform because of immaturity of the nervous system and the musculature would certainly be a factor contributing to confounded data.
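The masking effect of an overly easy task can be illustrated with a small simulation (all numbers here are hypothetical, not drawn from any study): when the task ceiling is low, a control group and a treated group both saturate and the treatment difference disappears, while a higher ceiling lets the same advantage show.

```python
import random

random.seed(1)

def simulate_trial(skill, ceiling):
    """Return a score: underlying skill plus noise, capped at the task ceiling."""
    return min(ceiling, skill + random.gauss(0, 2))

def group_mean(boost, ceiling, n=50, trials=20):
    """Mean final-trial score for n subjects whose skill grows each trial.
    `boost` is the assumed per-trial advantage conferred by the treatment."""
    finals = []
    for _ in range(n):
        skill = 0.0
        for _ in range(trials):
            skill += 1.0 + boost
        finals.append(simulate_trial(skill, ceiling))
    return sum(finals) / n

# With a hard ceiling of 15, both groups saturate and the treatment is hidden;
# with a ceiling of 100, the treated group's advantage remains visible.
for ceiling in (15, 100):
    control = group_mean(boost=0.0, ceiling=ceiling)
    treated = group_mean(boost=0.25, ceiling=ceiling)
    print(ceiling, round(treated - control, 2))
```

The observed group difference is near zero under the low ceiling and close to the true five-point advantage under the high one, which is the sense in which an easy task camouflages a real effect.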

Learning Tasks
The most commonly used laboratory tasks in motor-learning research, reported in one form or another, are pursuit rotors, star-tracing tasks, positioning tasks, and stabilometers. They are invariably novel to the subjects and yet can be acquired to a reasonable degree in a relatively short time. A wide range of learning phenomena can be studied with these tasks, and implications are made about motor behavior activities. A pursuit rotor, illustrated in Figure 4-1, is composed of a moving turntable on which a small disc must be pursued by the subject with a stylus. Time on target is usually recorded to 0.01 of a second. The star-tracing task (Figure 4-2) requires a subject to trace the pattern with a stylus as quickly as possible while making a minimum of errors (an error being determined by the stylus touching the side of the track). Viewing the mirror instead of the actual pattern makes it a reversal task and increases difficulty.





Figure 4-1. A pursuit rotor with a control unit with which test periods and rest periods can be preset.

Positioning tasks make demands on the subject to replicate predetermined movement speeds and distances. Judgment and timing are important. A stabilometer is a movable platform that the subject attempts to stabilize. During a prescribed testing period, he must keep the platform as horizontal as he can. This balancing task is shown in Figure 4-3, and balance within the designated range is usually timed to 0.01 second.

Creative investigators have devised other kinds of laboratory tasks as well as field tasks. Furthermore, far more complex tasks have also been reported in the literature. For instance, Figure 4-4 shows the complex coordinator, an apparatus that demands the appropriate timing of leg and hand responses to specific cues. Task difficulty increases when investigators develop tasks that contain a variety of cues that require complex decision making and a variety of response possibilities from which the subject must select the correct ones.

Another type of task frequently found in the literature is one involving reaction time and/or speed of limb movement. Many situational variables can be manipulated to determine how an individual processes information and responds. Although the tasks are relatively simple, much of what we know about human behavior has been determined from data collected in reaction-time experiments. A choice reaction-time task and timer are illustrated in Figure 4-5.

An experiment examining the effects of manipulated practice conditions must be of long enough duration, in terms of trials, sessions, or days, for behavior to be altered by the specially designed practice. If too much practice is provided, it is equally possible that special treatment effects will not be revealed when a treatment group of subjects is compared to a control group of subjects. More practice does not necessarily result in better data. Careful consideration must be given to the purpose of the study, and along with a review of previous research completed in related areas, decisions can be arrived at in a meaningful and logical way.


Figure 4-2. Star-tracing task. The star is traced as accurately as possible with the subject viewing it in the mirror. (From R. Fulton, "Speed and Accuracy in Learning Movements," Archives of Psychology, No. 300, 1945.)

Sometimes the question of warm-up trials is an issue. Should any warm-up, or task-familiarization, experience be given to the subjects? Once again, the answer is not simple. If the task is very unusual, there may be good reason for allowing a few task-familiarization trials. Sometimes warm-up trials can confound data, especially if the task is rather easy. Once again, the purpose of the study must be reviewed critically, the advantages and disadvantages of warm-up trials evaluated carefully, and decisions made accordingly.

Motor and Sensory Measurements
The instruments and tasks described in the preceding section, as well as many others, have been used for the experimental analysis of behavioral change, or learning. Yet there may be other purposes for conducting research in the motor-learning area. Specific motor behaviors may be of interest, either for the establishment of norms or to determine the effects of such factors as fatigue or drugs on them. Sensory measurements, on the other hand, are primarily used to detect phenomena associated with perceiving physical stimuli (psychophysics). The level of functioning of sense organs may be related to proficiency in various motor-skill endeavors.


Figure 4-3.
A constructed stabilometer.


Figure 4-4. Complex coordination test. The subject is required to make complex motor adjustments of stick and pedal controls in response to stimulus light patterns. (From E. A. Fleishman, "A Comparative Study of Aptitude Patterns in Unskilled and Skilled Psychomotor Performances," Journal of Applied Psychology, 41:263-272, 1957.)

Instruments to evaluate motor responses or performance of the sense organs are found in many motor-learning laboratories. A dynamometer measures exerted force, or static strength. There is equipment to measure hand steadiness or body sway. Pieces of equipment used in the testing of learning changes can also be used for measuring motor responses, e.g., pursuit rotors for hand–eye coordination and stabilometers for balance.

In regard to sensory measurements, there are apparatuses to analyze depth perception and field of vision. Visual perception is evaluated with a tachistoscope, in which presentation times in sequences of visual displays are varied. An audiometer measures hearing acuity, weighted cylinders help to detect the sensitivity of tactile sense receptors, and positioning tests require proprioceptive involvement. An excellent source book on biomedical instrumentation—an introduction to various apparatus, their design and usage, and applications to the measurement of body functions—is Biomedical Instrumentation and Measurements, by Leslie Cromwell and his colleagues (1973).

Figure 4-5. Choice reaction timer (Marietta Apparatus Co., Marietta, Ohio) and Hunter Klockounter, which measures to 0.001 second (Hunter Manufacturing Co., Iowa City, Iowa).

Duration of Study
Most published research studies on learning reveal data collection that extends over a relatively brief time span. One reason is that the tasks used are not too difficult and the ceiling effect becomes noticeable before long. That is, potential maximum performance is attained easily and quickly. Performance scores with such tasks are insensitive to improvements as they approach the upper limits set for the task. Upper limits in performance are set by the experimenter or by the very nature of the task and the length of practice or test periods. Quick gains in performance are observed in about the first five trials, and an asymptote is approximated after about ten or fifteen trials on such tasks as the pursuit rotor, stabilometer, and star-tracing apparatus. Thus, a study may be completed in from one to four days. Such constraints in laboratory settings bias the data, although otherwise relatively "clean" data can be obtained, as environmental and personal variables can be controlled in a reasonable manner. In contrast, a ceiling effect is nearly impossible to reach in athletic skills. The criterion of excellence is difficult if not impossible to ascertain. There is always room for improvement.

Of course, if long-term retention is under study, an experiment may last a few months or a year. By the same token, long-term retention is not studied only with long retention intervals. If retention of information from long-term memory is of interest, a retention test may be administered only five minutes after the original learning in some studies. An advantage of a short-duration study is the general lack of subject mortality, a fancy term for the loss of subjects, which is usually found to be proportional to the length of the project. The longer a study goes on, the better the chance subjects will miss testing occasions for one reason or another. Consequently, long-term experiments should include more subjects than short-term ones to compensate for expected subject mortality.
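The compensation for expected subject mortality amounts to simple arithmetic: enroll enough subjects that the required number are expected to remain at the end. A small sketch (the dropout rate here is an assumed figure, not one from the text):

```python
from math import ceil

def inflate_sample(n_required, expected_dropout):
    """Enroll enough subjects that n_required are expected to finish,
    given an expected dropout proportion between 0 and 1."""
    return ceil(n_required / (1.0 - expected_dropout))

# A long-term study needing 30 finishers, with 20% expected attrition:
print(inflate_sample(30, 0.20))  # → 38
```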

The Dependent Measure(s)
Experiments contain independent and dependent measures. The independent variable is the variable of interest; the dependent variable represents the recorded data. For instance, we might be concerned with the effects of motivation on performance. The independent variable, motivation, would be introduced in some form and compared to a control situation. An example might be one group of subjects experiencing the learning of a task where verbal encouragement is constantly offered. Another group of subjects, the control, would learn the task without any comments offered. The dependent measure would be the method of recording the behavior observed. If the task were a pursuit rotor, the dependent measure would probably be the time the subject maintained the stylus in contact with the moving disc during a particular timed trial. If the task involved archery shooting, the dependent measure would be the subject's score.

Although the dependent measure may be an obvious decision for a number of tasks, it is often controversial for others. When a variety of possibilities for recording data exists, the experimenter's dilemma is increased. The severity of the consequences of selecting an inappropriate or less desirable measure from the alternatives should never be understated. Conclusions based on the selected dependent measure may be more or less valid, depending on the experimenter's wisdom.

The classic article by Bahrick, Fitts, and Briggs (1957) demonstrates the effects of changing the response-sensitivity measures used for subjects performing a tracking task. The size of the target zone (various-sized scoring zones) resulted in varying shapes in the learning curves. The arbitrary selection of a cut-off point (what is considered on-target or off-target) can significantly affect the data and the conclusions derived from them. The magnitude of the errors in performance, for example, seems to be a more justifiable measure than a gross recording of time on or off target. The latter score has limited value in research. Any task that is scored with an all-or-none performance measure instead of considering a continuous and normal distribution of scores is subject to experimental artifacts.

With many tasks it might be wise to record and analyze more than one dependent measure. The acquisition of skill, especially on more complex tasks, usually encompasses the mastery of a number of task components. The analysis of more than one dependent variable may provide additional insight into learning progress. As an example, the oft-employed star-tracing task calls for the performer to trace a pattern as quickly as possible with a minimum of errors. What then should constitute the subject's score? A number of techniques have been suggested, and John Drowatzky (1969) compared five scoring techniques, arriving at interesting conclusions. The methods were
Time required for the completion of each trial.
Number of errors committed during each trial.
The product of task completion time and number of errors per trial.
The number 1,000 divided by the product of completion time and errors per trial.
The sum of completion time and number of errors per trial.

Drowatzky's conclusion was that "no one . .. performance measure appeared to fully meet all requirements of an optimal measure of skill, and . . . that evaluation of star-tracing performance should include the measures of completion time, errors/trial, and product of time and errors" (p. 229).
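The five scoring methods Drowatzky compared can be computed side by side for a single trial; the time and error values below are hypothetical illustrations.

```python
def star_trace_scores(time_s, errors):
    """The five star-tracing scoring methods compared by Drowatzky (1969),
    applied to one trial's completion time (seconds) and error count."""
    return {
        "time": time_s,                                   # method 1
        "errors": errors,                                 # method 2
        "time_x_errors": time_s * errors,                 # method 3
        "1000_over_product": (1000 / (time_s * errors)
                              if errors else float("inf")),  # method 4
        "time_plus_errors": time_s + errors,              # method 5
    }

# A hypothetical trial: 42 seconds, 8 errors.
print(star_trace_scores(42.0, 8))
```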

Sometimes the reporting of too many performance variables may be redundant and unnecessary. Obviously, if two measures are really measuring the same thing, there would be little gain in analyzing both of them. For instance, it is fairly common to see three dependent variables: (1) algebraic error, or constant error (CE); (2) absolute error (AE); and (3) within-subject variance, or variable error (VE), presented in experiments dealing with positioning tasks. These tasks require the subjects to match movements against standards, with error or deviation scores the variable of interest. The less the deviation, the better the performance. Robert Schutz and Eric Roy (1973) have questioned the assumption of the independence of CE, AE, and VE scores and in turn the wisdom in using all of them in the same experiment. Using their definitions:
Error: the algebraic difference e = X − Y, where X is the subject's response and Y the target value.

Constant Error (CE): the mean algebraic error

CE = Σ(X − Y)/k = Σe/k

Absolute Error (AE): the average error, or mean deviation

AE = Σ|X − Y|/k = Σ|e|/k

Variable Error (VE): intravariance, intraindividual variance, or within-subject variance

VE = Σ(e − CE)²/k

Or, because Y is a constant and therefore does not affect the variance:

VE = Σ(X − X̄)²/k

where X̄ is the subject's mean response.
Reworking data published in other studies and applying mathematical theory, Schutz and Roy arrive at the conclusion that AE and VE are quite interdependent and that it is therefore unwarranted to report both variables. AE is determined to be completely dependent on CE and VE. The authors call for the elimination of AE as a dependent measure in experiments, for CE and VE are statistically independent and can adequately describe most data.
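The three error scores follow directly from the definitions above; a short sketch with hypothetical positioning-task data:

```python
def error_scores(responses, target):
    """Constant, absolute, and variable error for one subject's k responses."""
    k = len(responses)
    errors = [x - target for x in responses]
    ce = sum(errors) / k                           # mean algebraic error
    ae = sum(abs(e) for e in errors) / k           # mean absolute deviation
    ve = sum((e - ce) ** 2 for e in errors) / k    # within-subject variance
    return ce, ae, ve

# Six hypothetical attempts to reproduce a 30-cm movement:
ce, ae, ve = error_scores([28, 33, 31, 27, 32, 29], target=30)
print(round(ce, 2), round(ae, 2), round(ve, 2))
```

Note how this subject's CE is zero (overshoots cancel undershoots) while AE and VE are not, which is exactly why the three scores can tell different stories about the same data.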

The Learning Score
Another difficult decision is related to the measure or score that best represents learning, or the effects of practice. Returning to the hypothetical study on motivation, with two groups, one a control and one experimental, what should be used as the achievement measures with which the groups can be compared to determine if verbal encouragement indeed resulted in increased learning over no verbal encouragement in either the pursuit rotor task or in archery shooting? Perhaps twenty trials of thirty seconds' duration each were interspersed with thirty-second rest periods in pursuit rotor performance. Shall we compare both groups on their scores attained on trial twenty? Would it be better to work with gain scores; that is, the difference scores between the last and first trials? How about the best score reached by each subject on any one trial? What about the average score for each subject across all twenty trials? Is there a particular formula that might be more sensitive and valid than any of the preceding measures?

As we become more familiar with the research literature dealing with motor learning, it will become apparent that a wide variety of approaches has been described in an attempt to make comparisons between learning situations. As is the case with any step in an experiment, more appropriate decisions will lead to more valid conclusions. Unfortunately there appears to be no easy answer to the question of how learning should be determined. Probably the most widely accepted techniques include some form of arriving at a (1) gain or difference score, (2) average score, or (3) final score(s). The use of the average, or mean, score across all trials administered to the subjects can be defended if no trend is present. Walter Kroll (1967) supports this notion with the use of reliability theory. If a trend appears, alternatives for a criterion measure might include (1) a search for a measurement schedule free of systematic measurement error variance, or (2) the use of the high or low score for each subject, with necessary cautions.
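The candidate learning scores raised above can be computed side by side; the trial sequence below is hypothetical time-on-target data, not from any cited study.

```python
def learning_scores(trial_scores):
    """Common candidate 'learning scores' from one subject's trial sequence."""
    return {
        "final": trial_scores[-1],                          # last-trial score
        "gain": trial_scores[-1] - trial_scores[0],         # last minus first
        "best": max(trial_scores),                          # best single trial
        "mean": sum(trial_scores) / len(trial_scores),      # average of all trials
    }

# Hypothetical time-on-target (seconds) over ten 30-second pursuit-rotor trials:
print(learning_scores([4, 6, 9, 11, 12, 14, 13, 15, 16, 15]))
```

The four numbers can rank the same subjects differently, which is why the choice among them is a substantive decision rather than a formality.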

The argument of whether to use the best score or average score of subjects on trials appears frequently in the literature. Although Kroll (1967) and Henry (1967) generally favor the use of the average score, Heatherington (1973) has taken issue with the statistical rationale used in support of their positions. He favors the best score obtained on any trial whenever the measurement error is believed to be small relative to within-subject variation. He raises the question of whether the between-trial variation is normally distributed about the "true" score, especially in cases where an estimation of maximum performance is derived. The relative advantages of the best or average score as "the score" are still being debated, and certainly a number of factors must be considered before a choice can be made for one or the other. Perhaps neither may be acceptable.

In certain kinds of experiments it might be advisable for the experimenter to establish a criterion performance measure that the subjects are expected to achieve. In many laboratory motor-learning studies data are usually analyzed in terms of three different measures:
The total time taken by the subject to reach a given criterion of performance.
The total number of errors made in reaching the criterion.
The total number of trials taken to reach the criterion.

These three measures of learning performance are confounded by several factors, notably:
The learning ability of the subject.
The subject's initial skill level.
The criterion of achievement established by the experimenter.

A satisfactory resolution of these factors that influence data deserves considerable attention.
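Under a criterion-based design, the trials-to-criterion and time-to-criterion measures can be tallied directly. A minimal sketch, assuming thirty-second trials (an arbitrary choice for illustration):

```python
def to_criterion(trial_scores, criterion, seconds_per_trial=30):
    """Trials and total practice time needed to reach a criterion score.
    Returns None if the criterion is never reached."""
    for i, score in enumerate(trial_scores, start=1):
        if score >= criterion:
            return {"trials": i, "time_s": i * seconds_per_trial}
    return None

print(to_criterion([3, 5, 8, 12, 15, 18], criterion=12))  # → {'trials': 4, 'time_s': 120}
```

Note that the same function applied to two subjects with different initial skill will credit the more experienced one with "faster learning," which is precisely the confounding the list above identifies.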

There are various methods of analyzing changes within practice. Measurement of these changes, usually associated with differences between initial and terminal performances, indicates the degree of learning that has occurred. McCraw (1951, 1955) has compared many of the possible ways of measuring and scoring tests of motor learning. In each of his two studies, McCraw obtained data from subjects performing two novel motor skills. Learning that has occurred because of practice may be measured by considering such factors as initial status versus final score, difference between first and last scores, and percentage of improvement. Also, there is the problem of ascertaining the number of trials that should represent the first score as well as the final score. How many trials are necessary for warm-up and task familiarity is open to question, but most authorities agree that at least a few should be provided before an actual initial score is recorded.

In attempting to reconcile these problems, McCraw (1955) formulated eight methods, based on those found in other experiments, to score the practice effects on learning two tasks. Some of these procedures were
Total Learning Score Method. Consisted of cumulatively adding all the trial scores during practice.
Difference in Raw Score Method No. 1. Required finding the difference between the final and initial trials.
Per Cent Gain of Possible Gain Method. Represented by the formula:

(Sum of last N trials − Sum of first N trials) / (Highest possible score for N trials − Sum of first N trials)



Per Cent Gain of Initial Score Method. Depicted by this formula:

(Sum of last N trials − Sum of first N trials) / (Sum of first N trials)
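The two formulas can be applied to the same hypothetical trial sequence to show how differently they treat the same raw gain:

```python
def pct_gain_of_possible(scores, n, max_per_trial):
    """Per cent gain of possible gain: gain relative to the room left to improve."""
    first, last = sum(scores[:n]), sum(scores[-n:])
    return 100 * (last - first) / (n * max_per_trial - first)

def pct_gain_of_initial(scores, n):
    """Per cent gain of initial score: gain relative to the starting level."""
    first, last = sum(scores[:n]), sum(scores[-n:])
    return 100 * (last - first) / first

# Hypothetical trial scores on a task with 30 points possible per trial,
# using the first and last two trials (N = 2):
scores = [10, 12, 15, 18, 22, 24]
print(round(pct_gain_of_possible(scores, n=2, max_per_trial=30), 1))
print(round(pct_gain_of_initial(scores, n=2), 1))
```

The same 24-point raw gain reads as roughly 63 per cent of the gain that was possible but over 100 per cent of the initial score, which is why the two methods rank subjects with different starting levels so differently.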

McCraw reported considerable variability in the scores yielded by the diverse means of measuring improvement. As to a comparison of methods, he states that the most acceptable appear to be those that relate gain to possible gain, while the least desirable are those that interpret gain in relation to the initial score. The Total Learning Method and the Per Cent Gain of Possible Gain Method were the most valid measures in comparing individuals with dissimilar initial scores. The author generally found little relationship between the various scoring methods, i.e., each yielded different results.

Thus, it can be seen that varying the techniques for measuring improvement results in dissimilar outcomes and interpretations of the data. The nature of each study must be scrutinized before the procedure of data analysis is selected, although some methods are apparently more acceptable than others. One of the difficulties in determining laws of learning pertaining to any factor is the variation in design from investigation to investigation, including the selected method of data analysis.

In the typical case of measuring change that is the result of practice, the "raw" change score is computed, which consists of the difference between a pretest and a posttest on the variable of concern. The definition of this measure from variable A to variable B (e.g., pretest A and posttest B) is B − A. The usual contaminant in this methodology is that B − A is negatively correlated with A. That is to say, the higher the pretest score, the smaller the gain score. If a person had had no experience with the task and was beginning at a zero level of proficiency, any achievement later would be a gain. This assumption is reasonable with very unusual tasks, such as learning nonsense syllables and mazes. In real situations, however, everyone comes to the "new" learning situation with some previously related experience. Consequently subjects start at different skill levels. If we measure progression in a task where thirty points is maximum, we cannot at all assume that the gain from zero to twelve is the same as from twelve to twenty-four points. As the potential ceiling (score limit) is approached, gains are much more difficult to demonstrate. The same gain score for any two people, in this case twelve, does not truly reveal enough information when starting points are dissimilar.
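One simple way to see the asymmetry in the zero-to-twelve versus twelve-to-twenty-four example is to express each gain as a fraction of the gain still available from the starting point. This is only an illustrative ceiling-adjusted index, not any of the formal models discussed below:

```python
def gain_of_possible_pct(initial, final, maximum):
    """Gain as a percentage of the gain still available from the initial score."""
    return 100 * (final - initial) / (maximum - initial)

# Two subjects with the same raw gain of 12 points on a 30-point task:
print(round(gain_of_possible_pct(0, 12, 30), 1))   # beginner
print(round(gain_of_possible_pct(12, 24, 30), 1))  # more advanced starter
```

The beginner's twelve points represent 40 per cent of the available gain; the advanced starter's identical raw gain represents about 67 per cent, reflecting the greater difficulty of improving near the ceiling.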

In particular situations it is of advantage to have a change score that is independent of the pretest. An alternative to the raw change score is the residual change score, also known as a base-free measure, where final scores are uncorrelated with initial scores (see, for example, Tucker et al., 1966). The portion of the posttest that is linearly predictable from the pretest is eliminated. The residual gain procedure yields estimates of deviations from the expected scores of individuals (or groups). A zero residual gain means that the actual gain for the individual or group was identical with the gain that was predicted from a knowledge of the pretest score by linear regression techniques.

Franklin Henry (1956) analyzed three motor-learning experiments, the tasks involving (1) jumping, (2) speed of arm movement, or (3) balancing. As expected, he found raw learning scores to be unrelated to final skill accomplishments but negatively correlated with initial performance. He shows how the use of the residual method can alleviate this circumstance. It estimates the individual learning that would have occurred had all subjects begun at the same initial skill level.
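The residual-gain computation itself is ordinary least-squares regression of posttest on pretest; a minimal sketch with hypothetical data (not Henry's):

```python
def residual_gains(pre, post):
    """Residual (base-free) change scores: the part of each posttest that is
    not linearly predictable from the pretest.  Plain least squares."""
    n = len(pre)
    mx, my = sum(pre) / n, sum(post) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(pre, post))
         / sum((x - mx) ** 2 for x in pre))     # regression slope
    a = my - b * mx                             # intercept
    return [y - (a + b * x) for x, y in zip(pre, post)]

pre = [2, 4, 5, 7, 9, 11]     # hypothetical pretest scores
post = [8, 9, 12, 13, 14, 16] # hypothetical posttest scores
for r in residual_gains(pre, post):
    print(round(r, 2))
```

By construction the residuals sum to zero and are uncorrelated with the pretest, which is exactly the base-free property claimed for the measure.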

Another model with a number of variations that has been supported in the literature is the true change approach. If the pretest and posttest were measured without error (experimental contamination), then a true change score is observed. Because this is rarely the case, some estimate must be made of the observed gain in performance attributable to measurement errors so that an estimate of true gain is possible. Regression equations have been proposed to handle this problem, and these models as well as other approaches are summarized by Chester Harris (1963).

An excellent summary of the many residual gain and true change models is offered by Lee Cronbach and Lita Furby (1970). Relative strengths and weaknesses are discussed. Interestingly enough, after laboriously representing the various measures of change, these writers argue against their usage. Assuming equality between groups of subjects at the start of the experiment as well as errors of measurement on the posttest that are randomly distributed, a simple posttest comparison would do. Other suggestions are made for specific situations.

Nevertheless, let us examine sample equations to estimate true change scores (Davis, 1964, Chapter 11). Data were collected in our laboratory with six subjects forming one group. They were pretested and posttested after ten trials on the pursuit rotor. In a typical study of change an initial measure (A) is obtained on each individual in a group, a "treatment" is given, and a final measure (B) is then obtained. If A and B are measured without error, then g (g = B − A) is the true change. But if A and B are fallible the conclusion is different. In the following, the true change, G, is estimated for each individual as distinct from the observed change, g.

If we let A stand for one's initial score, B for one's final score, and G for an estimate of one's true change in the period between the two measurements, equation (1) should be used to obtain a numerical value for G.



Equation (4) involves WA and WB as well as the average score on the initial trial (denoted A) and on the final trial (denoted B).



The standard error of measurement of any estimate of true change found by equation (1) is defined by equation (5).



The standard error of measurement of G is multiplied by t(α, df = n − 1) to obtain the smallest change significant at the α level.
Data fitted to the formulas:


The smallest change significant at the 0.05 level is 2.3387.

It is concluded that R. M., S. I., L. K., M. L., and L. G. made real gains in the pursuit rotor task during the practice period (ten trials), but K. I. did not.
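The logic of the smallest-significant-change test can be sketched from classical test theory. This is an illustration under assumed figures, not a reproduction of Davis's equations (1) through (5), which are not shown above: each test's standard error of measurement is the score standard deviation times the square root of one minus the reliability, and the two errors combine for a difference score.

```python
from math import sqrt

def smallest_significant_change(sd_pre, sd_post, reliability, t_crit):
    """Smallest individual change significant at a given level, sketched
    from classical test theory (an illustrative model, not Davis's exact
    equations): SEM = SD * sqrt(1 - reliability), and the SEM of a
    difference combines the pretest and posttest SEMs in quadrature."""
    sem_pre = sd_pre * sqrt(1 - reliability)
    sem_post = sd_post * sqrt(1 - reliability)
    sem_diff = sqrt(sem_pre ** 2 + sem_post ** 2)
    return t_crit * sem_diff

# Assumed pursuit-rotor figures: score SDs of 3.0 s, reliability 0.90,
# and a two-tailed t of about 2.571 (alpha = .05, df = 5):
print(round(smallest_significant_change(3.0, 3.0, 0.90, 2.571), 2))
```

Any subject whose observed gain exceeds this threshold would be judged to have made a real gain, which is the form of decision reached for the six subjects above.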

Whereas the preceding model allowed us to determine individual significant gains in performance, Patricia Hale and Robert Hale (1972) have developed a technique to assess relative improvement scores more accurately when subjects begin an activity with varying levels of skill. Students could be ranked as to their relative true gains. Using an example in archery shooting and with special implications for grading purposes, the technique is especially sensitive to the large increases shown by beginners and the smaller increases revealed by the more advanced. As has been pointed out before, the higher the quality of performance, the more difficult it is to improve.

Figure 4-6.
Individual differences in initial and final trial scores.

Using the Hale and Hale technique, let us see what happens when two groups are compared. The data are derived from six subjects performing a stabilometer task for ten trials with vision and ten trials without vision. To counterbalance treatment effects, the subjects were divided in half: one subgroup performed first with vision and then without, the other first without vision and then with. Notice that under both conditions, the raw rank difference scores between the pretest and posttest are not the same. In other words, subjects ranked in different order under each condition.

Figure 4-6 illustrates the differences among and within subjects in initial and final time-on-balance stabilometer performances. Under each condition, with vision (V) and without vision (WV), all but one of the six subjects demonstrated an increase in time on balance. In Figure 4-6 these gains (and losses) are ranked by the numerical difference between trial 1 (initial) and trial 10 (final) scores. Ranking by the absolute gain scores ignored the evident differences in the initial abilities of the subjects.


Application of the Hale and Hale equation to the data converted each score exponentially. Improvement gains from the differing initial scores were converted into comparable units. The converted scores took into consideration the initial position, the gain, and the difficulty of the gain from the initial position. The A per cent scores in Table 4-1 are the weighted improvement percentage scores representing, for each subject, the amount of progress in relation to the initial score.

The following calculations typify the application of the exponential equation to the initial and final scores of each subject. The coefficient α (a constant), appearing in both the numerator and denominator of the original equation, is determined by the formula


where Smax is used to fix α, the coefficient that measures the difficulty of attainment for a given skill. In those activities with a top-limit score occasionally attainable by a subject,

Table 4-1.
Comparison of Numerical and Adjusted Performance Gain Scores


Hale and Hale suggest that Smax be set equal to 100. Because it was possible for a subject to obtain the top score of 30 seconds on balance, Smax was set equal to 100 for all calculations. The term X in the denominator represents the maximum obtainable score for a given skill and was set equal to 30, the time limit for each of the ten stabilometer trials. Substituting the appropriate numerical values,


The denominator of the original equation can then be calculated:



As can be seen, the denominator, e^(αX) − 1, is equal to the selected value of Smax. The numerator of the original equation requires x, the subject's initial performance score, and Δx, the change in x from initial to final performance. Using the V performance scores of subject 4 (Table 4-1), the term becomes

(x + Δx) = 27.61 + (29.26 − 27.61) = 27.61 + 1.65 = 29.26

or the subject's final performance score.

Substituting the values calculated in other equations, the original equation for subject 4 is


As can be seen in Table 4-1, ranking performance gains by the numerical differences between initial and final scores identified subject 4 as fourth with a gain in on-balance performance of 1.65 seconds. However, taking into account initial position, the amount of gain, and the difficulty of the gain from that initial position, the subject's adjusted gain score (A per cent) becomes 19.91 per cent. S-4 actually ranked second among the six subjects in performance gain under vision conditions.
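The adjusted-gain computation can be reconstructed from the description above. The published equation is not reproduced verbatim in the text, so this sketch assumes α is fixed so that e^(αX) − 1 equals Smax; with subject 4's scores it yields roughly 20 per cent, near (though not exactly matching) the 19.91 reported, suggesting the published constants differ slightly:

```python
import math

# Reconstruction (not the published equation verbatim) of the Hale and Hale
# adjusted-gain computation.  Assumption: alpha is chosen so that
# e**(alpha * X) - 1 equals Smax, and A% is the exponentially weighted
# improvement expressed as a percentage.

def hale_adjusted_gain(x, delta_x, x_max=30.0, s_max=100.0):
    """Weighted improvement percentage from initial score x to x + delta_x."""
    alpha = math.log(s_max + 1.0) / x_max        # difficulty coefficient
    numerator = math.exp(alpha * (x + delta_x)) - math.exp(alpha * x)
    denominator = math.exp(alpha * x_max) - 1.0  # equals s_max by construction
    return 100.0 * numerator / denominator

# Subject 4's with-vision stabilometer scores from Table 4-1
a_pct = hale_adjusted_gain(x=27.61, delta_x=1.65)
# a_pct is roughly 20 per cent for this subject
```

Note the property the text emphasizes: the same raw gain of 1.65 seconds made from a low initial score (say, 5 seconds) would produce a far smaller adjusted percentage, because gains near the top limit are harder to achieve.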

Most learning-change measurement techniques are useful for describing individuals, as the preceding model indicates. When groups are to be compared, however, other procedures might be used, such as those suggested by Davis in his book (1964) and through personal correspondence. Note the following equations.

The standard deviations of Ḡ₁ and Ḡ₂ (where these denote the average raw-score gains in samples 1 and 2) are:



The two subscripts 1 and 2 indicate the two non-overlapping samples drawn at random from the same population. The symbol s refers to an estimate of the population standard deviation.

To test the null hypothesis that G̃₁ = G̃₂, where the tildes denote that these are true means, the t ratio is given by the equation:



where an estimate of the denominator is the conventional



because the samples are non-overlapping. This procedure is especially convenient in that it requires no reliability coefficient for the pre- or posttests.
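This t ratio for two non-overlapping samples of gain scores can be sketched directly; the gain scores below are hypothetical:

```python
import math

def gain_t_ratio(gains_1, gains_2):
    """t ratio for the difference between two groups' mean raw gains.

    gains_1 and gains_2 are lists of raw gain scores (posttest - pretest)
    from two non-overlapping samples.  The denominator is the conventional
    standard error of a difference between independent means.
    """
    n1, n2 = len(gains_1), len(gains_2)
    mean1 = sum(gains_1) / n1
    mean2 = sum(gains_2) / n2
    # Unbiased sample variances (n - 1 in the denominator)
    var1 = sum((g - mean1) ** 2 for g in gains_1) / (n1 - 1)
    var2 = sum((g - mean2) ** 2 for g in gains_2) / (n2 - 1)
    se_diff = math.sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / se_diff

# Hypothetical gain scores for two treatment groups
t = gain_t_ratio([3.1, 2.4, 4.0, 2.8, 3.5], [1.2, 2.0, 1.5, 0.9, 1.8])
```

The resulting t would then be compared against the critical value for the appropriate degrees of freedom to decide whether the two groups' true mean gains differ.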

When using raw gain scores, the assumption is made that the groups of subjects to be compared are equal at the start of the experiment on the parameters that might be influential on the performance outcomes. This may occur, at least theoretically, when large enough samples of subjects are randomly placed into groups and the groups in turn randomly assigned to specific treatments. But there are instances when it is known that the groups are in fact dissimilar prior to any administered treatments. For instance, it is conceivable that one group of subjects might show a lower pretest score than another group in the same experiment. Or one group might, for some reason, possess a greater prominence of an influential factor, say desirable body builds, for the learning of a particular activity. In many cases, an analysis of covariance will be the statistical tool that can adjust posttest scores according to pretest differences or differences on a variable(s) of influence. Yet the limitations of covariance should be recognized. If differences exist between the groups prior to the experiment other than in the covariate (e.g., pretest), confounded data will still occur.

LEARNING CURVES
One leading method of depicting skill acquisition is through the use of the learning curve. The curve is a graphic illustration of practice trials versus performance and an indicator of what one or more individuals accomplish from trial to trial. The abscissa line (horizontal) usually corresponds to trials or days of practice, and the ordinate line (vertical) represents the unit of measurement. Measurement might be in terms of points made, errors, or some other score. Typically, the vertical units are laid out to represent two thirds of the graphic dimension of the horizontal units. The appearance of the curves and their general interpretation can be influenced by the manner in which they are presented. Many factors, such as method of practice, administration of practice sessions, method of measurement, nature and level of skill, and age of the subjects, will result in different curves for the same practiced skill. Some curves reflect factors facilitating or hindering performance, and as such are not truly representative learning curves.

Although it is extremely difficult to obtain a true learning curve, four distinct types probably exist. Practice conditions, the nature of the task, and the learner's abilities and organismic state will be reflected in the type of curve obtained from the data. It should be emphasized here that typical learning curves drawn from real-life data have been smoothed out in order to make it easier to follow any apparent trends in skill acquisition. Actually, great irregularities usually exist from trial to trial and performer to performer.

Figure 4-7 presents four typical smoothed curves for a limited practice session. Curve A has been termed a negatively accelerated curve. Greatest gains are made in the early practice trials, with decreasing improvements in later trials. A leveling-off point appears to be reached, but positive gains, ever so small, still are occurring. This type of curve usually denotes the learning of a skill that is relatively easy and where insight into the skill occurs quickly, as exemplified by the satisfactory performance on the early trials. As upper levels of skill are reached quickly, improvement diminishes, for little is left to be mastered.

Curve D is an example of a positively accelerated curve in which performance is poor in the early trials but increases from trial to trial. Although it appears to have no upper limit, this curve would ultimately level off with practice. The example offered in Figure 4-7 represents relatively few practice trials, so the curve appears to be accelerating indefinitely. Curve B, the linear curve, is essentially a straight line. This curve has been obtained in a few cases where proportional increments are noted from trial to trial. It too would become asymptotic with increasing trials. Curve C, which is an S-shaped curve, indicates positive acceleration, approaching linearity, and finally negative acceleration. It contains many of the qualities of the other three curves.


Figure 4-7.
Typical learning curves
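The four smoothed shapes in Figure 4-7 can be imitated with simple synthetic functions. This is an illustrative sketch only; the constants are arbitrary choices made merely to produce the characteristic shapes:

```python
import math

# Synthetic versions of the four smoothed curves in Figure 4-7.
# The constants are arbitrary; only the shapes matter.

def negatively_accelerated(trial, limit=100.0, rate=0.5):
    """Curve A: big early gains, diminishing later (approaches `limit`)."""
    return limit * (1.0 - math.exp(-rate * trial))

def linear(trial, slope=8.0):
    """Curve B: roughly proportional gains from trial to trial."""
    return slope * trial

def s_shaped(trial, limit=100.0, rate=0.8, midpoint=5.0):
    """Curve C: slow start, near-linear middle, negatively accelerated end."""
    return limit / (1.0 + math.exp(-rate * (trial - midpoint)))

def positively_accelerated(trial, scale=1.0, rate=0.45):
    """Curve D: poor early scores, accelerating gains (over few trials)."""
    return scale * (math.exp(rate * trial) - 1.0)

trials = range(1, 11)
curves = {name: [round(f(t), 1) for t in trials]
          for name, f in [("A", negatively_accelerated), ("B", linear),
                          ("C", s_shaped), ("D", positively_accelerated)]}
```

Plotting the four series over the ten trials reproduces the qualitative picture described in the text: A's increments shrink, D's grow, B's stay constant, and C's rise and then fall.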

It is important to note that no single curve of learning exists, that the nature of the skill or the learner will be reflected in the manner in which he acquires skill. Some psychologists, Thurstone (1930) and Estes (1950), for example, have attempted to form mathematical equations in order to predict and fit learning curves. Varying degrees of success have been achieved in this endeavor. Certain data might very well be described by one of the equations whereas this same operation would not be appropriate for other data. However, Culler (1928) and Culler and Girden (1951) present evidence which seems to indicate that complete learning curves are ogive (S-shaped). They employed a mathematical equation that is proportional to the product of the amount already learned and the amount remaining to be learned before the limit of learning is reached. As of the present time, there is no equation that fits all types of data. Therefore one concludes that the nature of the learning task (e.g., motor skill, nonsense syllables, prose, or puzzle), and its degree of difficulty as well as the nature of the learner, determine the manner in which the task is learned.

The method of measurement will often determine the smoothness or irregularity of a curve. Although a typical learning curve is obtained from practice trials completed in one or a few meetings, Figure 4-8 presents data collected on one discus thrower in competition during his four years at college. No doubt growth and development factors influence the curve, but it is presented for two reasons: (1) the similarity between this curve and a typical learning curve and, more important, (2) a comparison of three methods of measuring performance and each one's effect on the curve.


Figure 4-8.
Three methods of analyzing performance. Representative of one discus thrower's record for four years.

Coach Carl Heldt of the Illinois State University track and field team has collected typical data such as these on all his performers during his many years as a coach.

It can be observed that when more scores are averaged together, the lines become smoother. The most irregular line represents the best throw each meet, a smoother line is obtained by averaging five throws per meet, and the smoothest curve is derived from the average throw each season. The athlete's record indicates fairly consistent improvement from meet to meet and year to year, although irregularities in performance are clearly apparent.
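The three measurement methods compared in Figure 4-8 can be sketched as follows; the throw distances are hypothetical values, not Coach Heldt's records:

```python
# Three ways of summarizing a thrower's season, as in Figure 4-8.
# Each inner list holds one meet's throws (hypothetical distances, feet).

def best_per_meet(meets):
    """Most irregular series: only the single best throw of each meet."""
    return [max(throws) for throws in meets]

def mean_per_meet(meets):
    """Smoother series: the average of all throws in each meet."""
    return [sum(throws) / len(throws) for throws in meets]

def season_mean(meets):
    """Smoothest summary: one average over every throw of the season."""
    all_throws = [t for throws in meets for t in throws]
    return sum(all_throws) / len(all_throws)

season = [[138, 142, 135, 140, 144],   # meet 1
          [141, 139, 146, 143, 140],   # meet 2
          [145, 148, 142, 147, 150]]   # meet 3
```

As the text observes, the more scores that enter each average, the smoother the resulting line: the best-throw series fluctuates most, the five-throw means less, and the season mean collapses everything to a single point.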

Because it would be erroneous to believe that every athlete's record would be comparable with the preceding case and improve so consistently, three other athletes, all javelin throwers, have their performances recorded by the five-throw-per-meet-average method for comparison in Figure 4-9. Note the differences in the curves, many of which may be explained by such factors as a student getting married, loss or gain in motivation, proper or improper understanding of mechanics, poorer or better health, and, in general, a host of psychological and physiological variables.


Figure 4-9.
Individual performances of three javelin throwers
during four years of competition.

If these curves were smoothed by such a method as averaging the throws per season, they would more nearly approach those curves found in Figure 4-7. Poorer records, caused by backache, marriage, or lack of interest, are really factors in performance and not truly indicative of learning as such. True curves of learning should reflect positive increments, even if they are so slight as to be outwardly unobservable. The plotting of individual trials results in more noticeable trial increments and decrements and can serve a definite purpose. Without any major detrimental factors operating, averaged practice trials should yield positive learning performances.

Figure 4-10 illustrates learning curves for two different groups of subjects receiving ten practice trials each in the same time period on the stabilometer (an apparatus used to measure balance ability). The time in each trial in which the board was not ideally balanced is recorded, and the curve decreases because performance was plotted against time off balance. If it were plotted against time on balance, the curve would go upward instead of downward, and it should be observable in either case that there is improvement in performance, hence learning, within the trials allocated for the experiment. If the curves are smoothed further, they would approach the typical negatively accelerated curve.


Figure 4-10. Stabilometer performance as a result of practice. (From R. N. Singer, "Effects of Spectators on Athletes and Nonathletes Performing a Gross Motor Task," Research Quarterly, 36:473–482, 1965.)
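The conversion between the two plotting conventions is a simple complement against the 30-second trial length; the scores below are hypothetical:

```python
TRIAL_LENGTH = 30.0  # seconds per stabilometer trial, as in the study above

def to_time_on_balance(time_off_scores):
    """Convert per-trial time-off-balance scores to time-on-balance."""
    return [TRIAL_LENGTH - t for t in time_off_scores]

# Hypothetical time-off-balance scores that decrease with practice
time_off = [21.0, 17.5, 14.0, 12.0, 10.5]
time_on = to_time_on_balance(time_off)
# time_on = [9.0, 12.5, 16.0, 18.0, 19.5] -- the same improvement,
# but the plotted curve now rises instead of falls.
```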

The shape of the learning curve is greatly dependent upon the response measures utilized in the plotting of the data. The curve, as pointed out in the article by Bahrick, Fitts, and Briggs (1957), referred to earlier in this chapter, may be misinterpreted at various stages of practice, and effects deemed important are in reality confounded by the artifacts produced. These researchers used a tracking task to illustrate the effects of errors in the measurement of learning curves. However, they believe that their data have implications for a variety of tasks and conditions.

W. K. Estes (1956) supports the notion of group curves, data averaged across subjects for each trial. They are useful for summarizing information and for theoretical interpretation. However, he cautions against transferring inferences made from group learning curves to individual learning curves. Although the form of the averaged curve does not determine the forms of the individual curves, it does provide a means for testing hypotheses about them. The risks in making generalizations from group to individual learning curves are great, and Estes discusses them especially as experimental and statistical violations might be involved. We should always remember that much information about individual learning rates is lost when data are averaged.
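Estes's caution can be illustrated with synthetic data: averaging two quite different individual records produces a group curve that resembles neither learner.

```python
# Two hypothetical learners with different acquisition patterns:
# one improves early and plateaus, the other starts flat and improves late.
early_learner = [10, 40, 60, 75, 85, 90]
late_learner  = [10, 14, 20, 35, 60, 90]

# The "group curve" is the trial-by-trial mean of the two records.
group_curve = [(a + b) / 2 for a, b in zip(early_learner, late_learner)]
# group_curve rises in a fairly steady fashion, a shape matching neither
# the early learner's jump nor the late learner's lag.
```

This is exactly the information loss the text warns about: the averaged curve is a legitimate summary, but inferring individual learning rates from it would be mistaken.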

Another aspect of learning and performance curves concerns practice to an asymptote as a criterion used in certain learning studies. A task is usually thought of as being "learned" when stabilization of performance becomes apparent. In real-life skills, as in sports, years of practice contribute to proficiency and consistency. Yet, in a few practice sessions, it is not unusual for researchers to expect similar occurrences with relatively simple laboratory tasks. James Bradley's (1969) data, on one subject administered thousands of practice trials, led him to conclude that asymptotes are not truly reached. A subject could still be learning or showing upswings or downswings in behavior, depending on situational and personal variables. Bradley favors doing away with practice to an asymptote as a means of eliminating unwanted learning effects in an experiment. He feels that group means might mask individual scores. Although precautions might be taken against unwarranted overgeneralizations, there are many types of learning experiments in which the analysis of specified learning phenomena depends on the learners' attainment of a relative degree of stability in performance. A plateau in performance, to be examined next, is an excellent example of a potential temporary asymptote (perhaps even a nontrue asymptote).

PLATEAUS
Almost everyone in a lifetime experiences a frustrating point of no apparent improvement in performance though the task to be learned is practiced over and over—maybe the golf game, which consistently stays in the low 90's, or perhaps the bowling average, which always remains about 145. Specific skill acquisition or general sport performance appears to level off and may remain there seemingly forever or for a short period of time before an acceleration occurs in performance.

This phenomenon has been termed a plateau in the learning curve. A plateau represents stationary performance preceded and sometimes followed by accelerated learning increments. It is a condition that has not been found to occur in many experimental learning tasks, but the classic example of a plateau in the learning curve is the one obtained by Bryan and Harter (1897). The graphic illustration taken from their data and presented in Figure 4-11 refers to one's ability to learn to receive telegraphic signals in the American Morse Code.

A hypothesis set forth to explain the plateau in learning is that there is a hierarchy of habits to be mastered by the individual when he attempts to learn a complex task. After succeeding in the first order, he may be fixated at that level for some time before becoming able to integrate the patterns needed in the second-order habits. Information is consolidated and reorganized. An example of this situation is found in tennis. First-order habits might be the acquisition of basic strokes and skills underlying the sport, such as learning to stroke the ball when in a stationary position. Second-order habits could include hitting the ball while on the move, and a third-order category might include the integration of effective movement patterns in the game situation. Theoretically, depending on the manner in which the sport is taught and the performer involved, a plateau could occur at any one of these transitional periods.


Figure 4-11. Learning curve with plateau representing telegraphic coding. (From W. Bryan and N. Harter, "Studies in the Physiology and Psychology of the Telegraphic Language," Psychological Review, 4:27-53, 1897.)

After a period of time in which one attempts to transcend from one hierarchy of habits to another, insight manifests itself in the form of an integration of past learned responses and new ones to be utilized. The curve accelerates sharply until the next plateau. Because actual plateaus are rare in experimental evidence, possibly because of the difficulty in setting up investigations that might demonstrate this phenomenon, we must draw more from theoretical implications and everyday experience. The latter evidence appears to indicate the reality of such an effect in the acquisition of skill, at least in some of the more complex sports. But damage to the concept of plateaus is presented by F. S. Keller (1958). From the title of his article, "The Phantom Plateau," one can surmise its content.
Two questions naturally arise concerning the plateau in the learning curve and its theoretical explanation. (1) Will the manner in which the skill is taught affect the learning curve and any possible plateaus? In other words, a complex activity may be treated in parts and learning directed toward gaining insight into the parts in a progressive manner. Will this method of approach lessen the possibility of a plateau occurring during skill acquisition more than if the skill is taught as a whole? (2) Is there really such a thing as the plateau in the learning curve? One reason for the leveling of performance may be a loss of motivation. If the learner is continually motivated, a plateau in performance might not be observable.

Concerning the factor of motivation, disappointment and discouragement when not improving in performance make it even more difficult to advance learning. Perhaps all these factors—task complexity, hierarchy of habits, interest, and frustration—contribute to the plateau, if indeed it does exist. Many other questions remain unanswered. No definite conclusions may be reached until further experimental evidence is examined.

If, in fact, plateaus can occur in the learning of real-life activities (although not demonstrated with artificial laboratory tasks), several recommendations could be made to remediate the situation.

Attempt to maintain the ideal level of motivation throughout the learning experiences.
Attempt to maintain the learner's attention to the appropriate cues so that practice is not wasteful but instead meaningful.
Watch for fatiguing conditions that might be detrimental to learning, where inappropriate responses are released.
Analyze the final performance goal, breaking down the activity into smaller units so that transitions are smooth and logical from one performance level of the activity to a higher-level expectation. Pushing the learner too fast in a complex activity places hardships on his ability to apply lower-order learned skills to higher-order ones that must eventually be mastered.
Analyze the learner's physical development. He might possess physical capacities to perform a task at a certain level of proficiency, but will need further development if higher-order skills are to be demonstrated.
Understand the learner's level of aspiration. Low goal levels result in lower performances, whereas higher but realistic goals will inevitably increase performance output.


MOTOR-LEARNING LABORATORIES AND EQUIPMENT
In order to investigate learning processes, the effects of training manipulations, learning and performance correlates, and other related considerations, decisions must be made about locations, subjects, and learning tasks. A laboratory in the formal sense of the word connotes an isolated area in which extremely controlled testing can occur. Motor-learning laboratories exist in many physical education departments, experimental and engineering psychology areas, and military and aerospace programs. Often they are equipped with a minimum amount of expensive equipment (much handmade apparatus) or, at the other extreme, with computer-controlled operations. Depending on the sophistication of the experimenter, the equipment, the testing conditions, and the type of learning phenomenon investigated, the data will usually be handled in a technical manner, contributing to theory and knowledge, with implications for practical conditions.

If we expand our interpretation of the laboratory or consider field testing environments, subjects are tested under more real-life conditions and with more familiar motor tasks, often not involving equipment. These data are usually more directly applicable to programmatic or instructional concerns. As we will see, studies in the area of motor learning range considerably as to testing conditions, learning phenomena, and performance variables investigated.

With the assumption that you can readily comprehend how research might be conducted in gymnasiums, classrooms, and other familiar situations, let us discuss the nature and use of more formal established laboratories. Because one of the major thrusts in the motor-learning area is to determine how we learn skills and how we can learn them more effectively and efficiently, the identification of the "right" task is of paramount concern. As was mentioned earlier in this chapter, the need is for novel and unusual tasks, those with which the learner is unfamiliar so that processes and the effects of situational manipulations can be examined in a technical manner. These tasks can be purchased, but in many motor-learning laboratories they are constructed to fit the needs of the experimenter and the particular area he is investigating.

Such commonly used tasks as pursuit rotors or star tracers can be purchased or made. Stabilometers and positioning tasks can be easily made. [See illustrations and discussion earlier in this chapter, but especially the laboratory manual developed by Singer, Milne, Magill, Powell, and Vachon (1975), for a much more intensive description of a variety of purchased and constructed tasks.] Simple tasks can effectively provide answers to problems in the motor-learning area. Because the usual data are recorded in the form of speed of performance and/or accuracy, timers and counters are often hooked up to constructed or purchased equipment. It is quite usual to see motor-learning laboratories heavily armed with workshop tools and materials as well as electrical and electronic accessories. Learning experiments usually make innovative demands upon the researcher in a variety of forms, one of which is task development and utilization.


Figure 4-12. The Automatic Performance Analyzer (Dekan Timing Devices, P. O. Box 712, Glen Ellyn, Illinois).

Besides those already illustrated, another versatile piece of equipment is the Automatic Performance Analyzer (Figure 4-12), which can be used to time a variety of events. Various stimuli and response modes can be adapted with the Analyzer. One possible application of the Analyzer is illustrated in Figure 4-13. As described elsewhere (Singer, Llewellyn, and Darden, 1973), the Analyzer was used with an interval timer, a 0.01-second performance timer, and a photoelectric relay system. The subject began the test (to determine reaction and movement-time scores under specified conditions) seated in a ready position with the index finger of his dominant hand depressing a key. Upon illumination of a light on the interval timer (randomly timed following the preparation signal) the subject, as quickly as possible, removed his finger from the starting key and moved his hand through the ray of the photoelectric relay system in a prespecified direction. The elapsed time from the illumination of the light stimulus to the release of the starting key was recorded as the subject's reaction time. Upon release from the key another timer was initiated. When the subject's hand passed through the ray the second timer was deactivated, providing a measure of movement time. Note in Figure 4-13 that the experimenter and subject were separated with the use of a divider so as to minimize any experimenter and subject interaction effects.

Figure 4-13. Arrangement for testing reaction time and movement time. (From Robert N. Singer, Jack Llewellyn, and Ellington Darden, "Placebo and Competitive Placebo Effects on Motor Skill," Research Quarterly, 44:51-58, 1973.)
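The scoring logic just described reduces to two subtractions; the timestamps below are hypothetical stand-ins for the readings the timers would supply:

```python
# Sketch of the reaction-time / movement-time scoring described above.
# In the apparatus, the three events come from the interval timer (stimulus
# light), the starting key, and the photoelectric relay; times here are
# hypothetical, in seconds.

def score_trial(stimulus_on, key_release, beam_crossed):
    """Return (reaction_time, movement_time) for one trial.

    reaction_time: stimulus light onset to release of the starting key.
    movement_time: key release to the hand breaking the photoelectric beam.
    """
    reaction_time = key_release - stimulus_on
    movement_time = beam_crossed - key_release
    return reaction_time, movement_time

rt, mt = score_trial(stimulus_on=0.000, key_release=0.215, beam_crossed=0.384)
# rt = 0.215 s and mt is about 0.169 s for this hypothetical trial
```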

Space is needed in any laboratory to store testing equipment and shop tools and materials. Space is also needed to construct equipment and to arrange various testing configurations. Another space consideration is testing privacy through the use of separate rooms or isolation booths. Distractions can contaminate data. An illustration of how pursuit rotor testing can occur without the noticeable presence of the experimenter appears in Figure 4-14. The subject operates the rotary pursuit apparatus in an isolation booth. The interval timer connected to the rotor provided the subject with a 5-second warning in the form of an illuminated light prior to the initiation of each trial. A series of trials, with tests and rests of 20-second duration, was administered. Performance was measured by time on target for each trial, using a 0.01-second timer located outside the isolation booth. The experimenter recorded the time on target for each trial and then reset the timer.

The apparatus and testing situations described thus far are relatively simple. Expense is small and testing arrangements are not difficult to formulate. With increasing sensitivity of the experimenter for control of extraneous sociopsychological variables as well as advances in technology and lowered prices of computers and solid-state systems, behavioral data are being collected more validly under improved testing conditions. Small-scale computers and solid-state programmers are becoming more widely used in behavioral laboratories, resulting in controlled cue presentations and response recordings.

Figure 4-14. Arrangement for testing pursuit rotor performance. (From Robert N. Singer and Jack H. Llewellyn, "Effects of Experimenter's Gender on Subject Performance," Research Quarterly, 44:185-191, 1973.)

For instance, the model presented in Figure 4-15 is very versatile and inexpensive. The experiment can be programmed on a plugboard, which is inserted in the console control panel with the appropriate stimuli input and response output accessories. The subject's performance data can be recorded with timers or counters. The user must be familiar with digital logic. Preprogrammed experiments free the experimenter's time and usually contain more precision and control than nonprogrammed experiments.


Figure 4-15. The Digilab, a portable solid-state system for programming experiments. (From BRS Foringer, 5451 Holland Drive, Beltsville, Maryland.)

Figure 4-16. Experimenter's display, with the BRS programmer and appropriate accessory equipment, for reaction-time–movement-time experiment. [From Richard A. Magill and Frank M. Powell, "A Consideration of Equipment and Method as Experimental Variables in RT-MT Experiments" (unpublished study, 1974).]

In an experiment done at Florida State University to determine relationships of factors associated with reaction time and movement time, Magill and Powell (1974) used the Foringer model. Two millisecond timers were used to record RT and MT latencies to a visual stimulus. The stimulus warning light, stimulus light, random presentation of the stimulus light (1 to 4 seconds following the onset of the warning light), intertrial interval, total experimental time, number of stimulus presentations, and number of subject responses were controlled and recorded by a BRS Foringer DigiLab (DLC-002). The DLC is a portable solid-state digital logic system allowing great flexibility in programming. Because the experiment was controlled from a room adjacent to the testing area, the experimenter communicated with the subject by means of an intercom. A white-noise generator provided a masking noise to the subjects of 72 decibels intensity at ear level. During verbal interaction the ambient noise level was automatically attenuated to 66 decibels. Task instructions were prerecorded on magnetic tape and presented to each subject via the intercom. Displays are presented in Figures 4-16 and 4-17.

Regardless of the available financial and apparatus resources, the crux of any model experiment rests with the researcher's talents and abilities. Being aware of experimental control factors and understanding the nature of the learning phenomenon under investigation lead to a quality research project. We can all be envious of working in the most elaborate settings and with the finest and most expensive apparatus. But there are many examples of experiments that have made outstanding contributions to learning theory and practice with a minimal amount of costly equipment.



Figure 4-17. Subject's display, in separate room from experimenter's display, for reaction-time–movement-time experiment. [From Richard A. Magill and Frank M. Powell, "A Consideration of Equipment and Method as Experimental Variables in RT-MT Experiments" (unpublished study, 1974).]

RESEARCH METHODOLOGY AND EXPERIMENTAL DESIGN
The scientific study of learning and behavior is based upon research evidence. Research refers to a product completed in a systematic, controlled way, using a formalized type of process, and following a scientific method. By systematic, we mean that there is a definite arrangement of the processes used. They are not undertaken in a haphazard way. Certain generalized rules must be followed in order that confidence can be held in the results of the study. And the use of the word formal implies a similar interpretation. There are formal rules governing the conducting of research. The planning and execution of the study do not transpire solely out of logic, intuition, and reasoning. There is a set of concrete interrelated steps one follows.

The use of the word control means that the investigator has control over the situation. The study does not run him; he runs it. He is the one who selects the subjects and the content matter. If it is an experimental study, he chooses the task and the way the subjects will be manipulated and tested. Control is thus exhibited in numerous ways. He does not, of course, have total control over how the subjects will perform in the experiment. Nevertheless, valid, or true, measurements are, one hopes, obtained. The researcher tries to exhibit control by successfully manipulating variables so as to produce true results, not chance, haphazard outcomes.

All of these procedures have to do with scientific methods of analysis. At one time, researchers spoke of the scientific method of analysis, and books on research methodology consistently implied that there was only one scientific method. That is simply not true. There are actually a number of acceptable ways in which one can conduct research, and a given scientific method may be applicable in a particular situation. Distinctions may be made in the nature of research projects (e.g., historical, descriptive, and experimental), although a certain amount of formalized routine is followed from study to study. Thus the novice should be aware that there are distinctions within research and acceptable types of procedures and techniques, as well as overlap among them.

Gaining Information
Research is not the only way to know something, to gain information about the universe, behavior, or any other aspect of life. Other techniques besides a scientific approach may be reasonably acceptable. Knowledge can be obtained through at least three approaches.

Intuition and reason are used by all of us every day. Perhaps some of us do not use them enough; it is argued that some researchers should call upon reason more when designing experiments or interpreting results. Nevertheless, by reacting only according to reason, we are in danger of being too subjective in the way we view things. Objectivity is tossed aside. One must consider this when appraising or evaluating anything. When you say you know something, how do you know it? Through belief, reason, feeling, thought? How accurate is this process? Is there a better way? At any rate, reason and intuition are approaches to understanding and gaining information.

A second technique, employed fairly often, is the acquisition of beliefs through authority, with one person directing another's thoughts or actions. We listen to individuals who are in positions of power and have a tendency to go along with everything that is said. Are their statements all true? Authorities do provide a way to learn, but is it the best arrangement? You have to decide for yourself whether being told what is right and wrong is desirable. Naturally, some knowledge can be assimilated through the authoritarian approach. However, the danger exists of being misinformed or misled. Furthermore, the discouragement of probing, questioning, and problem solving can be detrimental to the learning process.


Finally, the third avenue to knowledge is a scientific approach, in which some sort of formalized system and more objective ways are used for obtaining answers to questions that disturb us. Who discovered the game of baseball? I can tell you who I think invented it. The answer can also come from a book. But perhaps you are not ready to accept these sources and the information offered. So you take a year and examine all possible resources in the archives and determine that Abner Doubleday probably did not invent the game of baseball after all, as is popularly thought. The circumstantial evidence would lead us to believe that he could not be responsible for its origin, although, for some reason, scattered evidence has been passed on through the years suggesting that he did invent the sport.

Nonacceptance of "factual" material or the viewpoints of authorities can be countered by a personal undertaking of scientific processes to establish the "truth." In the case of baseball, you did not accept an established "fact"; much time and effort were spent in order to come closer to the truth. Of course, not every one of us would care to use such a lengthy and precise process to answer our questions. But we can scrutinize the means by which others have arrived at their conclusions and judge their credibility.

There is danger at times in accepting statements based on belief and intuition, or materials unquestioningly passed on through the years, as was the case with Doubleday and baseball. This point is quite apparent today, for the current generation of young scholars no longer accepts anything unquestioningly. Perhaps the pendulum is swinging too far in this direction, but there is nothing wrong with probing, questioning, and searching. In other words, the product of a scientific method gives us greater objectivity and stronger grounds for the views we hold on a given topic. To advocate scientific processes does not exclude other means that might yield solutions; solutions may be found through logic and reasoning, or in the words written or spoken by authorities. But our greatest strength will come from answers that we find for ourselves, using scientifically acceptable techniques in the search.

The Experiment
Experimentation is crucial in scientific research. Most of the research data reported in this book were collected in experimentally designed studies. In the classical sense, an experiment is a way of formulating and testing hypotheses. We begin with a hunch; hunches provide direction for our thoughts on problems, and an experiment allows us, in a formal setting, to see if they are confirmed. With the experiment, a planned attack is made on a particular problem. Typically, through well-constructed operations, changes are induced in natural events, and results are observed, recorded, and analyzed. This is the outstanding feature of experimental research as contrasted with descriptive research: there must be some manipulation or change in at least one variable under study.

Historically speaking, concepts and implications of experimentation have been diverse and misunderstood. At one time people expected miraculous answers from experiments but were, instead, disappointed and disillusioned. It is now understood that experiments do not provide once-and-for-all answers. Facts are not necessarily permanent, and numerous experiments, constantly being refined in technique, provide tentative solutions to problems.

This is especially true of experiments dealing with human behavior. The degree of control and sophistication in an experiment in physics or chemistry is potentially far greater than in the human behavior experiments of psychology, sociology, physiology, or physical education.

The experimenter needs a spirit of curiosity, or inquiry. A search for the truth, a dissatisfaction with the present state of knowledge, a desire to resolve an issue in a scientifically acceptable manner are all involved. Background information is necessary before an experiment is attempted. Experience, reading, and communication contribute to the investigator's level of understanding of the problem, enabling him to raise legitimate questions, conceive of reasonable hypotheses, and formulate adequate experimental designs.

Because statistics and experiments often go hand in hand, an understanding of mathematics is most helpful. Statistics aid in various steps from the initial to the terminal stages of the experiment. Finally, good old-fashioned common sense is needed at all points throughout the experiment. The investigator must make numerous decisions as he proceeds, not all of which can be guided by directive statements. Evaluating suggested procedures, analyzing the particular experimental conditions, and handling unpredictable and unexpected occurrences all require common sense. Through these and other personal qualities, the experimenter examines theories and statements, formulates workable hypotheses for a particular problem, and executes the investigation.



IMPORTANCE OF RESEARCH IN MOTOR LEARNING
As can be seen, research, and more specifically experimental research, is the primary source of information about the acquisition and performance of motor skills. Consequently, great care must be shown in the planning of studies, the collection of data, and the conclusions drawn from those data. It is a tremendous challenge to design an experiment meticulously enough to yield valid data that further advance our state of knowledge.

REFERENCES
Bahrick, Harry P., Paul M. Fitts, and George E. Briggs. "Learning Curves—Facts or Artifacts?" Psychological Bulletin, 54:256-268, 1957.
Bradley, James V. "Practice to an Asymptote?" Journal of Motor Behavior, 1:285-296, 1969.
Bryan, W., and N. Harter. "Studies in the Physiology and Psychology of Telegraphic Language," Psychological Review, 4:27-53, 1897.
Cromwell, Leslie, Fred J. Weibell, Erich A. Pfeiffer, and Leo B. Usselman. Biomedical Instrumentation and Measurements. Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1973.
Cronbach, Lee J., and Lita Furby. "How We Should Measure 'Change'—Or Should We?" Psychological Bulletin, 74:68-80, 1970.
Culler, E. "Nature of the Learning Curve," Psychological Bulletin, 34:742-743, 1928.
Culler, E., and E. Girden. "The Learning Curve in Relation to Other Psychometric Functions," American Journal of Psychology, 64:327-349, 1951.
Davis, Frederick B. Educational Measurements and Their Interpretation. Belmont, Calif.: Wadsworth, 1964.
Drowatzky, John. "Evaluation of Mirror-Tracing Performance Measures as Indicators of Learning," Research Quarterly, 40:228-230, 1969.
Estes, W. K. "The Problem of Inference from Curves Based on Group Data," Psychological Bulletin, 53:134-140, 1956.
Estes, W. K. "Toward a Statistical Theory of Learning," Psychological Review, 57:94-107, 1950.
Hale, Patricia W., and Robert M. Hale. "Comparison of Student Improvement by Exponential Modification of Test-Retest Scores," Research Quarterly, 43:113-120, 1972.
Harris, Chester W. (ed.). Problems in Measuring Change. Madison: University of Wisconsin Press, 1963.
Heatherington, Ross. "Within-Subject Variation, Measurement Error, and Selection of a Criterion Score," Research Quarterly, 44:113-117, 1973.
Henry, Franklin M. "'Best' Versus 'Average' Individual Scores," Research Quarterly, 38:317-320, 1967.
Keller, F. S. "The Phantom Plateau," Journal of the Experimental Analysis of Behavior, 1:1-13, 1958.
Kroll, Walter. "Reliability Theory and Research Decision in Selection of a Criterion Score," Research Quarterly, 38:412-419, 1967.
