How to Design an Evaluation

A well-designed evaluation answers the right questions with the right evidence -- without spending more than the program is worth. This guide covers evaluation types, design decisions, terms of reference, and the tools you need to get started.

What is an evaluation?

An evaluation is a systematic assessment of a program's design, implementation, or results. Unlike routine monitoring, which tracks whether activities are happening as planned, an evaluation asks deeper questions: Are outcomes being achieved? Why or why not? What would have happened without the program?

Evaluations are one of the highest-cost M&E activities, and also one of the most frequently misused. A common failure pattern: an evaluation is commissioned to satisfy a reporting requirement, findings are produced too late to influence decisions, and the report sits unread. Good evaluation design starts with use: who will act on the findings, and how.

Types of evaluation

The four most common types in development programming serve different purposes and require different resources:

  • Formative evaluations happen during implementation and focus on improvement. They ask: what is working, what needs to change, and how can we do it better?
  • Summative evaluations happen at or after completion and focus on judgment. They ask: did the program achieve its objectives, and was it worth the investment?
  • Process evaluations focus on how activities were implemented. Useful when you need to understand delivery quality before drawing conclusions about outcomes.
  • Impact evaluations attempt to attribute observed change to the program by comparing outcomes with a counterfactual (what would have happened without the intervention). They require the most rigorous design and the highest budget.

When should you evaluate?

Most programs with donor funding are required to conduct at least one mid-term review and one end-of-project evaluation. Beyond compliance, the right time to evaluate is when a decision needs evidence: a scale-up decision, a course correction, a funding renewal application, or a learning agenda question that monitoring data cannot answer.

Evaluations that are not tied to a decision tend to produce reports, not change. Before commissioning an evaluation, the most important question is: who will use the findings, and what will they decide differently because of them?

Key Design Decisions

Four decisions shape everything else in an evaluation. Work through them in order; each one constrains the next.

1. Evaluation purpose

Why are you evaluating?

  • Formative: improve implementation during the program.
  • Summative: judge effectiveness at or after completion.
  • Process: assess how activities were implemented.
  • Impact: attribute change to the program (needs a counterfactual).

Tip: Most evaluations try to answer too many questions. Decide the primary purpose first; then let it constrain everything else.

2. Design type

How rigorous does the design need to be?

  • Experimental (RCT): random assignment; highest rigor, highest cost.
  • Quasi-experimental: comparison group without randomization (see the sketch after this list).
  • Pre-post with theory: before/after measurement combined with contribution analysis.
  • Qualitative: process tracing, case study, most significant change.

Tip: Match your design to your evaluation questions, not to what sounds most impressive. Most NGO evaluations do not need an RCT.
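To make the quasi-experimental option concrete, here is a minimal sketch of a difference-in-differences estimate: the change in the treatment group minus the change in a comparison group. The outcome values are hypothetical and purely illustrative.

```python
# Minimal difference-in-differences sketch for a quasi-experimental design.
# Illustrative only: the values below are hypothetical, not real program data.

def did_estimate(treat_pre, treat_post, comp_pre, comp_post):
    """Change in the treatment group minus change in the comparison group."""
    return (treat_post - treat_pre) - (comp_post - comp_pre)

# Hypothetical mean outcome scores (e.g. a household food security index)
effect = did_estimate(treat_pre=42.0, treat_post=55.0,
                      comp_pre=41.0, comp_post=47.0)
print(f"Estimated program effect: {effect:.1f} points")  # 7.0 points
```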

3. Data methods

How will you collect evidence?

  • Surveys: structured data at scale, quantitative or mixed.
  • Key informant interviews: depth on process, barriers, and context.
  • Focus group discussions: group perspectives and shared experiences.
  • Document review: program records, monitoring data, secondary sources.

Tip: Use at least two methods. Triangulation, comparing findings across methods, is how you build confidence in your conclusions.

4. Sampling approach

Who will you collect data from?

  • Probability sampling: random or systematic selection; needed for statistical inference.
  • Purposive sampling: deliberate selection for qualitative depth.
  • Stratified sampling: separate strata (e.g. sex, region) to ensure representation.
  • Mixed: probability sampling for the quantitative component, purposive sampling for the qualitative.

Tip: A sample size calculator gives you the minimum. Always add 10-15% for non-response and data quality losses; the sketch below shows the arithmetic.
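Here is a minimal sketch of the arithmetic behind that tip, assuming Cochran's formula for a proportion with a finite population correction. The default margin of error, confidence level, and buffer are illustrative, not recommendations.

```python
import math

def required_sample_size(population, margin_of_error=0.05, confidence_z=1.96,
                         expected_proportion=0.5, non_response_buffer=0.10):
    """Cochran's formula for a proportion, with finite population correction,
    then inflated to cover expected non-response and data quality losses."""
    n0 = (confidence_z ** 2) * expected_proportion * (1 - expected_proportion) / margin_of_error ** 2
    n = n0 / (1 + (n0 - 1) / population)             # finite population correction
    return math.ceil(n * (1 + non_response_buffer))  # buffer on top of the minimum

# Illustrative: a survey frame of 3,000 households with a 10% buffer
print(required_sample_size(population=3000, non_response_buffer=0.10))  # ≈ 375
```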

What Goes in an Evaluation TOR

A terms of reference (TOR) is the document that defines an evaluation's scope, questions, and requirements before the evaluator is hired. A weak TOR produces a weak evaluation.

  • Evaluation questions: three to five prioritized questions that define what the evaluation will answer. The single most important element; everything else flows from here.
  • Scope and boundaries: time period covered, geography, target population, and what is explicitly out of scope. Prevents scope creep and focuses the budget.
  • Methodology overview: the design type, data collection methods, and analytical approach. Should be matched to the evaluation questions, not imported from a previous TOR.
  • Budget and timeline: realistic estimates for all evaluation activities (fieldwork, data entry, analysis, reporting, and review cycles). The budget should reflect the actual scope.
  • Independence and ethics: evaluator independence requirements, conflict of interest policy, data protection provisions, and ethical review process.
  • Deliverables and use: the expected outputs (inception report, draft, final report), who reviews them, and, critically, how findings will be used after the evaluation.

TOR quality checklist

  • Evaluation questions clearly stated and prioritized
  • Methodology matched to questions (not the other way around)
  • Evaluation use plan developed before data collection
  • Terms of reference reviewed by key stakeholders
  • Budget and timeline are realistic for the scope
  • Evaluator independence requirements specified
  • Data protection and ethical review process defined

Full checklist available in the Evaluation TOR Checklist download below.

Using AI for evaluation design

AI tools can accelerate evaluation design, from drafting evaluation questions and scoping a TOR to reviewing sampling approaches and analyzing qualitative data. The AI for M&E guide collection includes dedicated guides on AI-assisted literature reviews, advanced AI methods, mixed methods analysis, and comparative effectiveness.
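As one illustration, here is a minimal sketch of prompting a language model to draft candidate evaluation questions. It assumes the OpenAI Python SDK with an API key set in the environment; the model name, prompt, and program summary are placeholders, and any drafted questions still need human review against the TOR scope and stakeholder priorities.

```python
# Minimal sketch: using an LLM to draft candidate evaluation questions for review.
# Assumes the OpenAI Python SDK (`pip install openai`) and OPENAI_API_KEY in the
# environment; the model name and prompts are placeholders -- adapt to your own tools.
from openai import OpenAI

client = OpenAI()

program_summary = (
    "A three-year livelihoods program training 2,000 smallholder farmers "
    "in climate-resilient agriculture across two districts."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system",
         "content": "You help M&E staff draft evaluation questions. "
                    "Return 3-5 prioritized questions matched to a summative purpose."},
        {"role": "user", "content": program_summary},
    ],
)

# Drafts are a starting point, not findings: review and prioritize with stakeholders.
print(response.choices[0].message.content)
```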

Browse the AI for M&E guides

Free Evaluation Tools

Practical templates and tools for evaluation design. Download and adapt for your program.

Evaluation TOR Checklist

Verify your terms of reference cover all essential elements before hiring an evaluator. Includes independence requirements and ethical provisions.

Coming soon

Evaluation Design Matrix

Map evaluation questions to methods, data sources, and analysis approaches. The core planning tool for any evaluation design.

Coming soon

More M&E methodology guides

Practical, plain-language guides for every phase of the M&E cycle.

Browse all guides