How to Design an Evaluation
A well-designed evaluation answers the right questions with the right evidence -- without spending more than the program is worth. This guide covers evaluation types, design decisions, terms of reference, and the tools you need to get started.
What is an evaluation?
An evaluation is a systematic assessment of a program's design, implementation, or results. Unlike routine monitoring, which tracks whether activities are happening as planned, an evaluation asks deeper questions: Are outcomes being achieved? Why or why not? What would have happened without the program?
Evaluations are one of the highest-cost M&E activities, and also one of the most frequently misused. A common failure pattern: an evaluation is commissioned to satisfy a reporting requirement, findings are produced too late to influence decisions, and the report sits unread. Good evaluation design starts with use: who will act on the findings, and how.
Types of evaluation
The four most common types in development programming serve different purposes and require different resources:
- Formative evaluations happen during implementation and focus on improvement. They ask: what is working, what needs to change, and how can we do it better?
- Summative evaluations happen at or after completion and focus on judgment. They ask: did the program achieve its objectives, and was it worth the investment?
- Process evaluations focus on how activities were implemented. Useful when you need to understand delivery quality before drawing conclusions about outcomes.
- Impact evaluations attempt to attribute observed change to the program by comparing outcomes with a counterfactual (what would have happened without the intervention). They require the most rigorous design and the highest budget.
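The counterfactual logic behind an impact evaluation can be made concrete with a difference-in-differences calculation, one common quasi-experimental approach. A minimal sketch (the attendance figures below are invented for illustration):

```python
def did_estimate(treat_pre: float, treat_post: float,
                 comp_pre: float, comp_post: float) -> float:
    """Difference-in-differences: the change in the treatment group
    minus the change in the comparison group, which stands in for
    the counterfactual trend."""
    return (treat_post - treat_pre) - (comp_post - comp_pre)

# Invented example: school attendance rates (%) before and after a program.
effect = did_estimate(treat_pre=40.0, treat_post=55.0,
                      comp_pre=42.0, comp_post=48.0)
print(effect)  # 9.0: a 15-point gain in treatment minus a 6-point gain in comparison
```

The key assumption, which a real evaluation must defend, is that both groups would have followed the same trend without the intervention.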
When should you evaluate?
Most programs with donor funding are required to conduct at least one mid-term review and one end-of-project evaluation. Beyond compliance, the right time to evaluate is when a decision needs evidence: a scale-up decision, a course correction, a funding renewal application, or a learning agenda question that monitoring data cannot answer.
Evaluations that are not tied to a decision tend to produce reports, not change. Before commissioning an evaluation, the most important question is: who will use the findings, and what will they decide differently because of them?
Key Design Decisions
Four decisions shape everything else in an evaluation. Work through them in order; each one constrains the next.
1. Evaluation purpose
Why are you evaluating?
- Formative: improve implementation during a program
- Summative: judge effectiveness at or after completion
- Process: assess how activities were implemented
- Impact: attribute change to the program (requires a counterfactual)
Most evaluations try to answer too many questions. Decide the primary purpose first; then let it constrain everything else.
2. Design type
How rigorous does the design need to be?
- Experimental (RCT): random assignment; highest rigor, highest cost
- Quasi-experimental: comparison group without randomization
- Pre-post with theory: before/after measurement with contribution analysis
- Qualitative: process tracing, case study, most significant change
Match your design to your evaluation questions, not to what sounds most impressive. Most NGO evaluations do not need an RCT.
3. Data methods
How will you collect evidence?
- Surveys: structured data at scale, quantitative or mixed
- Key informant interviews: depth on process, barriers, and context
- Focus group discussions: group perspectives and shared experiences
- Document review: program records, monitoring data, secondary sources
Use at least two methods. Triangulation (comparing findings across methods) is how you build confidence in your conclusions.
4. Sampling approach
Who will you collect data from?
- Probability sampling: random or systematic selection; needed for statistical inference
- Purposive sampling: deliberate selection for qualitative depth
- Stratified sampling: sampling within separate strata (e.g., sex, region) to ensure representation
- Mixed: probability sampling for quantitative data plus purposive sampling for qualitative depth
A sample size calculation gives you the minimum. Always add 10-15% to cover non-response and data quality losses.
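As a sketch of that adjustment, using Cochran's formula for estimating a proportion (the z-score, p = 0.5, and 5% margin of error below are conventional defaults, not prescriptions):

```python
import math

def cochran_sample_size(z: float = 1.96, p: float = 0.5,
                        margin: float = 0.05) -> int:
    """Minimum sample size for estimating a proportion p at the
    confidence level implied by z, with the given margin of error."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

def with_nonresponse_buffer(n: int, buffer: float = 0.10) -> int:
    """Inflate the statistical minimum by 10-15% to cover
    non-response and records lost to data quality checks."""
    return math.ceil(n * (1 + buffer))

minimum = cochran_sample_size()            # 385 at 95% confidence, ±5%
target = with_nonresponse_buffer(minimum)  # 424 after a 10% buffer
print(minimum, target)
```

For small populations a finite population correction would shrink these numbers further; the sketch above assumes a large population.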
What Goes in an Evaluation TOR
A terms of reference (TOR) is the document that defines an evaluation's scope, questions, and requirements before the evaluator is hired. A weak TOR produces a weak evaluation.
- Evaluation questions: three to five prioritized questions that define what the evaluation will answer. The single most important element; everything else flows from here.
- Scope and boundaries: the time period covered, geography, target population, and what is explicitly out of scope. Prevents scope creep and focuses the budget.
- Methodology overview: the design type, data collection methods, and analytical approach. Should be matched to the evaluation questions, not imported from a previous TOR.
- Budget and timeline: realistic estimates for all evaluation activities (fieldwork, data entry, analysis, reporting, and review cycles). The budget should reflect the actual scope.
- Independence and ethics: evaluator independence requirements, conflict of interest policy, data protection provisions, and the ethical review process.
- Deliverables and use: the expected outputs (inception report, draft, final report), who reviews them, and, critically, how findings will be used after the evaluation.
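The questions and methodology elements above are often tied together in an evaluation design matrix, with one row per question. A minimal sketch of such a matrix as structured data (the questions, methods, and field names are invented for illustration):

```python
# Each row maps one evaluation question to the design choices that will answer it.
design_matrix = [
    {
        "question": "Did the program improve household food security?",
        "design": "quasi-experimental (matched comparison)",
        "methods": ["survey", "key informant interviews"],
        "data_sources": ["household survey", "monitoring records"],
        "analysis": "difference-in-differences",
    },
    {
        "question": "Was the training delivered as planned?",
        "design": "process evaluation",
        "methods": ["document review", "focus group discussions"],
        "data_sources": ["attendance records", "participant FGDs"],
        "analysis": "descriptive and thematic",
    },
]

def untriangulated(matrix: list) -> list:
    """Return the questions that rely on fewer than two methods,
    i.e. those whose findings cannot be triangulated."""
    return [row["question"] for row in matrix if len(row["methods"]) < 2]

print(untriangulated(design_matrix))  # [] -- every question uses at least two methods
```

Even as a spreadsheet rather than code, the same structure forces each question to declare its methods, sources, and analysis before fieldwork begins.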
TOR quality checklist
- Evaluation questions clearly stated and prioritized
- Methodology matched to questions (not the other way around)
- Evaluation use plan developed before data collection
- Terms of reference reviewed by key stakeholders
- Budget and timeline are realistic for the scope
- Evaluator independence requirements specified
- Data protection and ethical review process defined
Full checklist available in the Evaluation TOR Checklist download below.
Using AI for evaluation design
AI tools can accelerate evaluation design, from drafting evaluation questions and scoping a TOR to reviewing sampling approaches and analyzing qualitative data. The AI for M&E guide collection includes dedicated guides on AI-assisted literature reviews, advanced AI methods, mixed methods analysis, and comparative effectiveness.
Free Evaluation Tools
Practical templates and tools for evaluation design. Download and adapt for your program.
Evaluation TOR Checklist
Verify your terms of reference cover all essential elements before hiring an evaluator. Includes independence requirements and ethical provisions.
Evaluation Design Matrix
Map evaluation questions to methods, data sources, and analysis approaches. The core planning tool for any evaluation design.