Program evaluation is an essential part of the design and implementation of any intervention. Evaluation addresses fundamental questions of program development, including what should be attempted, what was done, to whom, how, and what effect the intervention had, if any.1 It draws on research methodologies used in the social and behavioral sciences to answer these questions, but it is not equivalent to scientific research: there is usually far more collaboration and cooperation with program administrators and designers. A well-conceived evaluation is an iterative process that provides program administrators and designers with information critical to program development and implementation. To do this, evaluators and program staff must work “up front” to design an evaluation with clear goals and a well-developed plan that fits seamlessly into the overall program design and implementation.
This “up front” work, in turn, enables the evaluation to provide an assessment of whether the program achieved its goals.
Evaluation is commonly associated with educational and behavioral intervention programs, such as AIDS prevention, smoking prevention/cessation, and sexual abstinence programs, but is applicable to any type of program development.1 The purpose of the MCHB Bright Futures Resource Center for Curricula was to develop a curriculum to enhance residency training in the areas of Behavioral Pediatrics and Adolescent Medicine. Evaluation of the program was an integral part of its design. This chapter provides an overview of three major types of evaluation (formative, process, and outcome) and describes how our Center has evaluated the program to date.
Formative evaluation is akin to needs assessment and is performed early in the evaluation process. The purpose of formative evaluation
is to understand the need for the intervention and begin making decisions about how to implement or improve the intervention. Thus, information gleaned through the formative evaluations is critical to designing a targeted, effective intervention. Our Center undertook a major formative evaluation effort. In order to understand residency training needs in Behavioral Pediatrics and Adolescent Medicine, all pediatric residency training programs in the U.S. were surveyed.2 The survey instrument consisted of three parts, one for the residency training director, one for the Adolescent Medicine director, and one for the Behavioral Pediatrics director. The surveys assessed current training practices and areas of the curriculum that directors felt needed enhancement. The surveys revealed that, though reading lists were numerous, there was a lack of case-based materials to teach residents about behavioral pediatrics and adolescent medicine.
The need for case-based materials identified through the above survey formed the basis for the curriculum that we developed. Critical areas in behavioral pediatrics and adolescent medicine were identified by program staff and targeted for case development. Educational and teaching goals were identified for each of these areas and then cases were written and adjunctive materials developed to create a complete curriculum package to meet these goals.
Case development itself constituted the second formative evaluation effort of this program. Cases were written, reviewed extensively by experts in the field, and revised. They were then pilot tested with a small number of pediatric residents at a teaching conference. Each pilot test was accompanied by a survey of both the facilitator and the learners. These surveys assessed how well both the facilitator and the learners felt the case and ensuing discussion met the identified educational and teaching goals. There was also opportunity for open-ended
feedback. In addition, program staff attended many of these sessions to directly observe problems with the cases and how they were used and received. All of this information was then brought back to program administrators and case writers to inform the process of case revision. Cases and the evaluation forms were revised multiple times. This provides a clear example of how important it is for evaluation and program staff to collaborate closely and how critical evaluation is in the iterative process of intervention development. It also illustrates how formative evaluation differs from both outcome evaluation and scientific research. Because the cases and the evaluation instruments were undergoing multiple revisions, the learners and facilitators of the teaching sessions based on these early cases were not equivalent; they were receiving different “interventions” (cases). Thus, though the goals and the evaluation forms used in these sessions were essentially identical, the sessions were not comparable
and cannot be used in the final assessment of outcomes.
The second type of evaluation is process evaluation. Process evaluation is used, ideally periodically, after a program is implemented. For the Bright Futures Resource Center, this began after cases were finalized. Process evaluation assists program staff with determining whether program goals are being adequately met by answering the questions of what was done, how, and to whom. Like formative evaluation efforts, the results of process evaluation can be used to guide changes in the program that would improve the ability to meet stated programmatic goals. A number of the methods are similar to those used in formative evaluation, including surveys, direct observation, and open-ended interviews. The first two are performed during the intervention; the latter is performed outside it. In addition, administrative records provide another important source
of information for process evaluation. Surveys are used to obtain information on program participants and determine characteristics of those who received the intervention. The learner and facilitator evaluation forms developed during the formative evaluation stage were also used to obtain process evaluation information on those receiving the newly developed curriculum. Direct observation was also used. During process evaluation, it is important that those performing the direct observation be unobtrusive and not disrupt the intervention. Observers need to be well-trained in how to systematically and uniformly make observations and record the encounter. To minimize inter-observer variability, our Center used one program staff person trained in adolescent medicine to observe all adolescent medicine case-based teaching sessions and another person trained in behavioral pediatrics to observe all behavioral cases.
Lastly, monitoring standardized administrative records provides
an important source of data for process evaluation. These records provide information on who received the intervention and what, specifically, was provided. The development of database templates for these standardized records is another critical piece of program design. The administrative records consisted of computerized databases. These databases maintain information on all programs requesting and receiving any or all parts of the curriculum. This provides, on a programmatic level, information about which programs are being reached and what cases are requested.
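As a hypothetical sketch (the program names, case titles, and log structure below are illustrative, not the Center's actual database), tallying an administrative request log can answer the two process questions named above, which programs are being reached and which cases are requested:

```python
from collections import Counter

# Illustrative request log: (requesting program, curriculum case requested).
# These entries are invented examples, not actual Center records.
requests = [
    ("Program A", "Denver II case"),
    ("Program A", "Adolescent depression case"),
    ("Program B", "Denver II case"),
    ("Program C", "Denver II case"),
]

# Which programs are being reached, and how often each case is requested.
programs_reached = {program for program, _ in requests}
case_counts = Counter(case for _, case in requests)

print(len(programs_reached), case_counts.most_common(1))
```

Even this minimal tally illustrates why standardized record templates matter: counts are only comparable across sites if every request is logged in the same form.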
The last type of evaluation is the one that often receives the most attention. Outcome evaluation attempts to determine the meaning or effect of an intervention. Like process evaluation, it should be used periodically to assess if and how well program goals are being achieved. The ideal design for program evaluation would be one in which the same individuals could be compared to themselves
both with and without the intervention. Obviously, this is not possible, so evaluators have turned to experimental design to infer the effect of the intervention in the population of interest. There are three basic types of design used in outcome evaluation: non-experimental, quasi-experimental, and randomized.1
Non-Experimental Designs: Non-experimental designs are the most commonly utilized outcome evaluation technique. These designs do not employ a comparison group. Instead, individuals receiving the intervention are compared with themselves before and after the intervention on variables the intervention is designed to influence. Changes in any of these variables are then ascribed to the intervention. Using a case focused on the Denver II developmental screen, the Center assessed residents’ level of comfort with and skill in completing this screen. The evaluation revealed that use of the case significantly increased residents’ knowledge in interpreting the Denver II.4,5 Thus, this technique provided useful information as to whether the programmatic goals of these cases were being met. One problem with non-experimental designs is that the inference that change was due to the intervention is subject to confounding. Therefore, it is not possible to state emphatically that changes are, in fact, due to the intervention itself.
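The pre/post comparison at the heart of a non-experimental design can be sketched as a paired analysis. The scores below are invented for illustration, not the Center's data; the analysis simply computes each resident's change and a paired t statistic:

```python
import statistics

# Hypothetical pre/post knowledge scores (0-10 scale) for the SAME eight
# residents, before and after a case-based teaching session.
pre = [4, 5, 3, 6, 4, 5, 4, 3]
post = [7, 6, 5, 8, 6, 7, 6, 5]

# Paired design: analyze within-person change, not the two group means.
diffs = [b - a for a, b in zip(pre, post)]
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)
n = len(diffs)
t_stat = mean_diff / (sd_diff / n ** 0.5)  # paired t statistic

print(f"mean change = {mean_diff:.2f}, paired t = {t_stat:.2f}")
```

The design's limitation is visible here: nothing in the calculation can separate the intervention's effect from anything else that happened between the two measurements.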
Quasi-Experimental Designs: Quasi-experimental design is more rigorous than non-experimental design because it uses a separate, non-randomized comparison group. However, because the comparison group is not randomized, this design is also subject to bias and confounding: factors associated with the outcome may not be equally distributed between the case and control groups. The use of matched controls decreases bias and confounding. However, matching requires extensive knowledge of the literature to identify the confounding variables on which the cohorts should be matched. In addition, once matching variables are identified, it is often extremely difficult to actually match cases and controls. Beyond the difficulties of sample size and recruitment, relevant factors are often difficult to measure or, for that matter, unknown.
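One common matching strategy is greedy 1:1 matching, sketched below under invented assumptions: residents are matched exactly on postgraduate year and to the nearest available control on a continuous covariate (here, a hypothetical baseline score). None of these records are real data:

```python
# Illustrative rosters: intervention residents ("cases") and a pool of
# non-randomized comparison residents ("controls").
cases = [{"id": "c1", "pgy": 1, "baseline": 4.0},
         {"id": "c2", "pgy": 2, "baseline": 6.5}]
controls = [{"id": "k1", "pgy": 1, "baseline": 5.0},
            {"id": "k2", "pgy": 2, "baseline": 6.0},
            {"id": "k3", "pgy": 1, "baseline": 4.2}]

matched = []
available = list(controls)
for case in cases:
    # Exact match on postgraduate year, nearest match on baseline score.
    candidates = [c for c in available if c["pgy"] == case["pgy"]]
    if not candidates:
        continue  # unmatched cases are dropped, shrinking the sample
    best = min(candidates, key=lambda c: abs(c["baseline"] - case["baseline"]))
    available.remove(best)  # each control may be used only once
    matched.append((case["id"], best["id"]))

print(matched)
```

The sketch also shows the practical cost noted above: every matching variable added shrinks the candidate pool, and cases without a suitable control are lost from the analysis.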
Randomized Experimental Designs: Randomized experiments are considered the sine qua non of outcome evaluation precisely because randomization reduces bias between intervention and control groups. However, randomized designs are the most difficult and most expensive to perform, and can be ethically challenging. Randomized designs require the development of only one cohort, which is then randomly divided into those who receive the intervention and those who do not. Thus, they require that those targeted to receive the intervention, and those requesting it, be willing to forgo the intervention until after the outcome evaluation is completed. For the Bright Futures Center, this would mean withholding the curriculum from half of the programs that request it until after the outcome evaluation was completed at their site. Because this was not felt to be acceptable for the Center, one of whose major goals was to make curricular resources readily available to those in need, randomized designs were not employed as part of the evaluation. However, randomized trials of these cases as an educational intervention may be possible in a future project, and we would encourage all faculty who use this curriculum to think creatively about ways to conduct such testing.
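The "single cohort, randomly divided" step is mechanically simple, which is part of randomization's appeal. A minimal sketch, using invented program names and a fixed random seed so that the allocation can be reproduced and audited:

```python
import random

# Hypothetical cohort: ten residency programs requesting the curriculum.
programs = [f"program_{i}" for i in range(10)]

rng = random.Random(42)  # fixed seed makes the allocation reproducible
shuffled = programs[:]
rng.shuffle(shuffled)

# Split the single shuffled cohort into intervention and wait-list arms.
half = len(shuffled) // 2
intervention = sorted(shuffled[:half])
control = sorted(shuffled[half:])

print(len(intervention), len(control))
```

The ethical difficulty the chapter describes lives entirely in the `control` list: every program assigned there must wait for the curriculum until the outcome evaluation is complete.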
Elizabeth Goodman, M.D.
1. Coyle S, Boruch R, Turner C. Evaluating AIDS Prevention Programs. Washington, DC: National Academy Press; 1991.
2. Emans SJ, Bravender T, Knight J, Frazer C, Luoni M, Berkowitz C, Armstrong E, Goodman E. Adolescent medicine training in residency programs: Are we doing a good job? Pediatrics. 1998;102:588-595.
3. Frazer C, Emans SJ, Goodman E, Luoni M, Bravender T, Knight J. Teaching residents about development and behavior: Meeting the challenge. Archives of Pediatrics & Adolescent Medicine. 1999;153:1190-1194.
4. Knight J, Frazer C, Goodman E, Blaschke G, Bravender T, Luoni M, Hall M, Emans SJ. Case-based teaching by pediatric residents (abstract). Ambulatory Pediatric Association; San Francisco; 1999.
5. Knight JR, Frazer CH, Goodman E, Blaschke GS, Bravender TD, Emans SJ. Development of a Bright Futures curriculum for pediatric residents. Ambulatory Pediatrics (in press).