Early Childhood Education

Assessment in Early Childhood


Assessment in early childhood typically refers to the measurement of a child’s developmental status, whether at a given point in time or at multiple points to track change over time. Although an assessment can be narrowly targeted at achievement in specific areas (such as mathematics or reading/literacy), early childhood professionals are urging a wider perspective. For example, assessment has recently been defined as “the process of observing, recording, and otherwise documenting what children do and how they do it as a basis for a variety of educational decisions that affect the child ... [and] involves the multiple steps of collecting data on a child’s development and learning, determining its significance in light of the program goals and objectives, incorporating the information into planning for individuals and programs, and communicating the findings to families and other involved people” (NAEYC, 2003). In general, evaluation is a term referring to a broader enterprise of which assessments are a component. Evaluations often include multiple assessments over time, in the context of other sources of information, such as the quality of program implementation, staff qualifications, and participant demographics.


Purposes of Assessment

Assessments of young children generally have two primary purposes: first, to serve as yardsticks to measure the ways individual children are developing and second, to determine whether early childhood programs are effectively supporting children’s development and learning in the aggregate. The National Association for the Education of Young Children (NAEYC) considers assessment integral to planning curriculum and instruction for individuals and groups, communicating with families, identifying children who need services or intervention, and improving program practice (NAEYC, 2003). Parents and caregivers want to know whether a child in their care demonstrates particular strengths, is performing within normative ranges, or shows lags that signal a need for intervention. Developmental assessment refers most often to screening processes intended to identify the need for specialized services or intervention. Meisels and Atkins-Burnett (2000) define screening as “a brief assessment procedure designed to identify children who, because they might have a learning problem or disability, should receive more extensive assessment.” Screening tools often focus on visual-motor abilities, language-communicative competence, and gross motor abilities. Other types of screening include hearing and vision, health, and physical development, which could also affect a child’s educational needs and experiences (Gullo, 2005).

Early childhood programs develop theories of change and set goals with the expectation that particular practices will lead to expected learning outcomes for participants; assessment provides information that can demonstrate progress toward those goals. It is often beneficial to conduct assessments at multiple time points geared to age or program entry or exit to provide information for purposes ranging from planning individualized instructional approaches to rating the quality of instructional practice. Any intervention that attempts to change behavior or improve learning ideally seeks to assess change at many levels: management and supervision, classroom environment, professional development, as well as child knowledge and skills.


Concerns about Inappropriate Assessment in Early Childhood

Accountability and assessment are closely linked concepts that undergird public policy decisions. With the No Child Left Behind Act of 2001, the federal government mandated annual testing of reading and math beginning in grade three and sanctions for schools that do not improve student performance. Many states have adopted early learning standards that extend benchmarks for elementary grades downward into preschool. In addition, federal programs such as Head Start have implemented testing requirements in the year prior to kindergarten entry on a limited set of early language, literacy, and numeracy indicators. Much debate continues about the constructs, domains, and indicators for young children’s learning and development that should be assessed, and the best means of measuring performance on these indicators. Standards, and the tests to measure progress in achieving them, are often externally imposed for accountability purposes, rather than derived from appropriate developmental expectations. The formats of early childhood assessment are crucial because, if implemented out of context, using a single source or method, testing instruments can yield unrepresentative or inaccurate results. Early childhood organizations are concerned that excessive emphasis on assessment could result in inappropriate changes to early childhood environments if, for example, teachers were to focus children’s learning activities on specific items of a test rather than provide a range of classroom experiences related to the broad developmental constructs being assessed. An additional concern would be a program-wide reallocation of resources devoted to educational materials and facilities, professional development, and support for teachers based on a narrow conception of what is being assessed (NAEYC, 2003).


What Constitutes High-Quality Assessment?

The value of any assessment hinges on the quality of the information collected. Assessments of young children are inherently difficult to do well, and the importance of obtaining good information is directly related to the way that information will be used. The higher the stakes of the assessment (i.e., the more far-reaching the decisions made based on the results of the assessment), the more stringent the quality standards for the process should be. In response to increased emphasis on assessment in early childhood programs, NAEYC and the National Association of Early Childhood Specialists in State Departments of Education (NAECS/SDE) released a statement supporting assessments that are “developmentally appropriate, culturally and linguistically responsive, tied to children’s daily activities, supported by professional development, inclusive of families, and connected to specific, beneficial purposes.” In general, high-quality assessments (1) include information from multiple sources on multiple dimensions, (2) are administered by highly qualified assessors, (3) are reliable, and (4) are valid. These standards apply equally to observational, contextual assessments and to more direct, test-like assessments of young children.

Assessments that include multiple domains, modes, and perspectives are particularly important for young children. Child-specific indicators based on a comprehensive view of what the child knows and can do were developed by the National Education Goals Panel (Kagan, Moore, and Bredekamp, 1995; Love, 2003; Love, Aber, and Brooks-Gunn, 1994) and have been adopted by Head Start and other early childhood programs. Domains include physical and motor development, social and emotional development, language usage, cognition and general knowledge, and approaches toward learning. Assessments should capture the breadth of children’s development, including all five domains. Further, assessment through multiple modes is desirable to more accurately represent a child’s development and abilities. Modes of assessment include direct assessment (typically what is referred to as “testing”), parent or teacher ratings, observations, and self-report. Multiple perspectives (e.g., teacher and parent ratings) provide information about the child in different settings. The information from these multiple modes and multiple reporters can be combined to form a more complete picture of an individual child’s strengths and needs; in the aggregate, they can provide information on program performance in these areas.

Comprehensive alternative assessments are based on collection of information through a wider range of sources, such as portfolios and anecdotal records. One example of a standards-based approach is the Work Sampling System (Gullo, 2005; Meisels et al., 2001), which features developmental guidelines and checklists in seven learning domains (personal and social development, language and literacy, mathematical thinking, scientific thinking, social studies, the arts, and physical development and health). Teachers rate children on the checklists, select the contents of a portfolio to document each child’s learning in the context of the curriculum, and complete a summary report on children’s progress three times a year using specific criteria.

As a measure of growth or progress, assessment can be criterion- or normatively referenced. Criterion-referenced assessments are those in which performance is judged in terms of mastery of items within given content areas (e.g., reading level). Normatively referenced assessments are those in which performance is judged relative to that of others within the same age group on the same instrument. For all assessments, but in particular for assessing young children, the training and competency of the assessor are critical. Assessors must be able to work effectively with young children in order to measure their performance accurately. Assessors should demonstrate extensive experience with young children as well as thorough training in the proper administration of the assessment tool. It is usually advisable to monitor assessors’ performance to ensure their fidelity to the administration procedures of each assessment instrument.
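The difference between criterion-referenced and normatively referenced scoring can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the 80 percent mastery cutoff, and the scores in the norm group are hypothetical and are not drawn from any published instrument.

```python
def is_mastered(items_correct, items_total, cutoff=0.8):
    """Criterion-referenced judgment: compare the child's own
    proportion correct with a fixed mastery cutoff (hypothetical)."""
    return items_correct / items_total >= cutoff


def percentile_rank(score, norm_group_scores):
    """Normatively referenced judgment: locate the child's score
    relative to same-age children who took the same instrument.
    Ties count as half, a common percentile-rank convention."""
    below = sum(1 for s in norm_group_scores if s < score)
    equal = sum(1 for s in norm_group_scores if s == score)
    return 100.0 * (below + 0.5 * equal) / len(norm_group_scores)


# Hypothetical raw scores for ten children in the same age group.
norm_group = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

mastered = is_mastered(items_correct=8, items_total=10)
rank = percentile_rank(5, norm_group)
```

The same raw score can therefore support two different statements: whether the child has met a fixed standard, and where the child stands relative to peers. Which statement is useful depends on the purpose of the assessment.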

When selecting the individual components of an assessment, it is important that each is reliable and valid for the areas to be measured, and for the children to be assessed. Reliability refers to the ability of an assessment instrument or procedure to produce the same results if administered to the same child within a reasonable timeframe (test-retest reliability), and in the case of ratings, if completed by a different person for the same child (interrater reliability). Reliability of assessments of young children may be lower as a result of numerous factors including uneven development, behavioral fluctuation, situational variables, and prior experience with testing/assessment, all of which may affect the results in ways that have little to do with the child’s competence in the domain being measured.
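The two forms of reliability described above are commonly summarized with simple statistics. The sketch below is illustrative only and is not a procedure prescribed by the sources cited here: it assumes test-retest reliability is summarized with a Pearson correlation between two administrations, and interrater reliability with Cohen's kappa for two raters' categorical ratings. The function names and all data are hypothetical.

```python
from collections import Counter
from math import sqrt


def pearson_r(first, second):
    """Correlation between scores from two administrations of the
    same instrument to the same children (test-retest)."""
    n = len(first)
    mean_1 = sum(first) / n
    mean_2 = sum(second) / n
    cov = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    var_1 = sum((a - mean_1) ** 2 for a in first)
    var_2 = sum((b - mean_2) ** 2 for b in second)
    return cov / sqrt(var_1 * var_2)


def cohens_kappa(rater_1, rater_2):
    """Agreement between two raters on the same children,
    corrected for the agreement expected by chance."""
    n = len(rater_1)
    observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n
    counts_1 = Counter(rater_1)
    counts_2 = Counter(rater_2)
    expected = sum(counts_1[k] * counts_2[k] for k in counts_1) / (n * n)
    if expected == 1:
        # Both raters used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical scores for five children tested two weeks apart.
time_1 = [12, 15, 9, 20, 17]
time_2 = [13, 14, 10, 19, 18]
retest_r = pearson_r(time_1, time_2)

# Hypothetical teacher and parent ratings of the same five children.
teacher = ["emerging", "proficient", "emerging", "proficient", "proficient"]
parent = ["emerging", "proficient", "proficient", "proficient", "proficient"]
kappa = cohens_kappa(teacher, parent)
```

Values near 1 on either statistic indicate consistent results. With young children, the factors listed above (uneven development, behavioral fluctuation, situational variables) would be expected to pull both values down even when the instrument itself is sound.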

Validity refers to the extent to which the assessment captures what it purports to measure. A perfectly reliable but invalid assessment is useless. Similarly, when working with different populations, it is important to know whether the assessment has been used with specific groups and found to be valid for all of them. There are many kinds of validity, including face validity (the extent to which the assessment items appear to measure what they purport to measure), concurrent validity (children perform similarly on different measures of the same domain), and predictive validity (children’s performance on the assessment predicts later performance, usually in school). The utility of assessment in early childhood often hinges on the expectation that performance on the assessment predicts the children’s later performance. Assessment should be followed by specific planning to address areas of need and to maintain areas of strength. It is hoped that assessment is the first step in a process to remedy problems and improve later performance.



All early childhood assessments should take into account the family, care settings, and cultural contexts in which the child is developing. Optimally, assessment should be carried out within such contexts; in any case, key adults must be informed about and involved in any decision making that results from assessments. Given the lack of consensus about goals and methods of assessment in early childhood, experts have concluded that the field should be considered emergent. The National Research Council has called for a broad program of research and development to advance the state of the art in assessment in the areas of (1) classroom-based assessment to support learning, (2) assessment for diagnostic purposes, and (3) assessment of program quality for accountability and public policy (Bowman, Donovan, and Burns, 2000). See also Families; Standards.

Cheri A. Vogel, Louisa Banks Tarullo, and John M. Love