M&E Studio
AI for M&E, Built for Practitioners

© 2026 Logic Lab LLC. All rights reserved.

Survey Design

The process of designing structured questionnaires and survey protocols to collect reliable, valid, and actionable data from a defined population.

When to Use

Use survey design when you need to collect structured, comparable data across a large number of respondents to answer specific quantitative questions about a population. Surveys are the primary instrument for baselines, midlines, and endlines. They are also used for needs assessments, coverage monitoring, and performance surveys. Use them when you need data that is systematic, replicable, and statistically representative.

Surveys are not the right tool when you need to understand why something is happening (use focus group discussions or key informant interviews), when the population is too small for statistical analysis (use qualitative methods), or when the question requires narrative or interpretive answers.

How It Works

Step 1: Define the evaluation questions the survey must answer

Every survey item should trace directly to an evaluation or monitoring question. Items without a clear question-to-indicator link should be removed. Unfocused surveys produce data that no one uses.

Step 2: Draft the instrument

Write items using clear, simple language. Each item should measure one thing. Avoid double-barrelled questions ("Do you feel safe and supported?"), leading questions, and jargon. Use established, validated instruments wherever they exist (e.g., HDDS for dietary diversity, WDDS for women's dietary diversity, MDD-W for minimum dietary diversity).

Step 3: Design the question flow and skip logic

Organise items into logical sections. Use skip logic to route respondents past irrelevant sections. Begin with non-sensitive, rapport-building questions. Place sensitive items (income, violence) toward the end.
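The routing described above can be sketched in code. The Python below is an illustrative sketch with hypothetical item names, not any tool's API: each item carries an optional relevance condition over the answers collected so far, the same idea that XLSForm-based platforms express in a "relevant" column.

```python
# Minimal sketch of skip logic: each item may carry a "relevant"
# predicate over earlier answers. Item names are hypothetical.
ITEMS = [
    {"name": "owns_livestock", "label": "Does the household own livestock?",
     "relevant": None},
    {"name": "livestock_count", "label": "How many animals does it own?",
     "relevant": lambda a: a.get("owns_livestock") == "yes"},
    {"name": "water_source", "label": "What is the main drinking water source?",
     "relevant": None},
]

def items_to_ask(answers):
    """Return the items a respondent should actually be shown."""
    return [i["name"] for i in ITEMS
            if i["relevant"] is None or i["relevant"](answers)]

# A non-owner is routed past the livestock count item.
print(items_to_ask({"owns_livestock": "no"}))
# -> ['owns_livestock', 'water_source']
```

Encoding routing as data rather than interviewer judgment is what lets digital platforms enforce it consistently.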

Step 4: Pilot the instrument

Test the draft with a small sample (15-30 respondents) from the same population type as the study. Identify misunderstood items, translation issues, and skip logic errors. Revise based on findings. Do not skip piloting - it is the single highest-return investment in data quality.

Step 5: Train enumerators

Enumerators must be trained on the instrument, interview protocols, consent procedures, and data entry. Run calibration exercises where pairs of enumerators interview the same respondent independently and compare results.
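The calibration comparison above can be quantified with a simple agreement rate. This is a sketch under the assumption that paired records share item names; field names are hypothetical.

```python
# Calibration check sketch: two enumerators interview the same respondent
# independently; compute the share of shared items on which they agree.
def agreement_rate(record_a, record_b):
    """Fraction of items recorded identically by both enumerators."""
    shared = set(record_a) & set(record_b)
    if not shared:
        return 0.0
    matches = sum(record_a[k] == record_b[k] for k in shared)
    return matches / len(shared)

a = {"hh_size": 5, "roof_material": "iron", "water_source": "borehole"}
b = {"hh_size": 5, "roof_material": "thatch", "water_source": "borehole"}
print(round(agreement_rate(a, b), 2))  # 2 of 3 items agree -> 0.67
```

Low agreement on a specific item (here, roof material) flags a definition that needs to be re-trained before fieldwork begins.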

Step 6: Implement with quality controls

Use digital data collection (SurveyCTO, KoBoToolbox, ODK) to enforce skip logic, range checks, and required fields. Conduct field supervision with back-check surveys (re-interviewing a random 10% sample to verify enumerator data). Review daily data reports during data collection.
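Two of the quality controls above, range checks and random back-check sampling, can be sketched as follows. Field names, ranges, and the fixed seed are illustrative assumptions, not any platform's built-in behaviour.

```python
import random

# Sketch of two survey quality controls: a plausibility range check on a
# numeric field, and selection of a reproducible random 10% back-check
# sample for re-interview. Field names and ranges are hypothetical.
def range_check(record, field, lo, hi):
    """Return True if the value is present and within a plausible range."""
    value = record.get(field)
    return value is not None and lo <= value <= hi

def back_check_sample(household_ids, share=0.10, seed=42):
    """Pick a reproducible random subset of households for re-interview."""
    rng = random.Random(seed)          # fixed seed keeps the sample auditable
    k = max(1, round(len(household_ids) * share))
    return sorted(rng.sample(household_ids, k))

records = [{"hh_id": i, "hh_size": s} for i, s in enumerate([4, 6, 52, 3], 1)]
flagged = [r["hh_id"] for r in records if not range_check(r, "hh_size", 1, 30)]
print(flagged)                                 # the hh_size of 52 is flagged
print(len(back_check_sample(list(range(1, 101)))))  # 10 of 100 households
```

In practice these rules live inside the form definition (e.g. constraint expressions), so implausible values are rejected at entry rather than cleaned afterwards.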

Key Components

  • Coverage - which topics and indicators are included, and which are deliberately excluded
  • Question types - Likert scales, multiple choice, open-ended, ranking, observation-based
  • Response categories - exhaustive, mutually exclusive, and appropriate for the population's understanding
  • Skip logic - routing that prevents irrelevant questions and reduces respondent burden
  • Translation and back-translation - if conducted in a language other than English, translate forward, then independently back-translate to verify meaning
  • Piloting protocol - plan for who, where, and how the instrument will be tested before deployment
  • Data entry and validation rules - built-in range checks and required fields for digital data collection

Best Practices

Use validated instruments. Drafting new items for domains where established instruments already exist (dietary diversity, food security, WASH) introduces comparability problems and quality risks. Prefer tools with documented validity and reliability wherever they are available.

Collect outcome data, not just output data. Many surveys track what was delivered (outputs) rather than what changed (outcomes). Outcome indicators require outcome questions.

Collect baseline data before the programme starts. Without baseline data, change cannot be measured and impact cannot be assessed.

Match survey timing to measurement logic. Some outcomes need time to materialise. Collecting endline data three months after a two-year programme ends may be too early to detect genuine change.

Keep instruments short. Respondent fatigue produces lower quality data in the second half of long surveys. Aim for under 45 minutes for household surveys. Every item cut improves data quality on the items that remain.

Common Mistakes

Over-designing the instrument. Adding items "just in case" produces surveys that are too long, tire respondents, and generate data that is never analysed. Every item costs respondent time, enumerator time, and analysis effort.

Skipping the pilot. Pilots reveal translation problems, confusing items, and skip logic errors that are invisible on paper. Piloting with 20 respondents typically surfaces 80% of instrument problems.

Collecting data that cannot change the analysis. If you cannot afford to act on a negative finding, do not collect the data. Collecting data without intention to use it wastes respondent time and erodes community trust.

Failing to standardise across enumerators. If different enumerators interpret and administer items differently, the resulting data is not comparable. Calibration training and back-check protocols address this.

Examples

WASH baseline, East Africa. A UNICEF-funded WASH programme in Ethiopia used the WASH Conditions Assessment Tool as the basis for its baseline survey, adding 12 programme-specific items on hygiene behaviour. The 40-minute household survey was piloted in two villages outside the programme area before deployment. Calibration exercises between enumerator pairs identified a misunderstood definition of "improved latrine" that was corrected before field data collection. The final survey was administered to 1,800 households across three districts.

Food security survey, West Africa. A WFP-funded programme in Mali used the Household Food Insecurity Access Scale (HFIAS) and the Household Dietary Diversity Score (HDDS) as the core of its monitoring survey. These validated instruments enabled comparison with WFP's global database and with the programme's own baseline. Local language translation used forward-translation by bilingual programme staff followed by independent back-translation by a university linguist.

Compared To

Method | Data Type | Sample Size | Depth
Survey | Structured quantitative | Large (100-5,000+) | Shallow to medium
Focus Group Discussions | Qualitative | Small (6-12 per group) | Deep
Key Informant Interviews | Qualitative | Small (10-30) | Very deep
Observation Methods | Direct observation | Variable | Medium

Related Topics

  • Sampling Methods - how to select who to survey
  • Baseline Design - designing the first data collection point that surveys enable comparison against
  • Data Quality Assurance - the processes for verifying survey data quality
  • Validity - whether the survey measures what it is intended to measure
  • Reliability - whether the survey produces consistent results

Further Reading

  • USAID (2012). Performance Monitoring and Evaluation TIPS: Conducting Key Informant Interviews. USAID PNAC. Also covers surveys.
  • Grosh, M. & Glewwe, P. (eds.) (2000). Designing Household Survey Questionnaires for Developing Countries. World Bank. Comprehensive design reference.
  • KoBoToolbox (2024). Free digital data collection platform with survey design support. kobo.humanitarianresponse.info

At a Glance

Designs structured questionnaires that collect valid, reliable data from a representative population to answer specific evaluation or monitoring questions.

Best For

  • Baseline, midline, and endline data collection
  • Measuring outcomes across large populations
  • Generating comparable data across time points or sites

Linked Indicators

34 indicators across 4 donor frameworks

USAID, DFID, WHO, UNICEF

Examples

  • Percentage of survey items with confirmed face validity post-piloting
  • Interviewer consistency rate across enumerator pairs
  • Response rate for primary survey instrument

Related Topics

  • Sampling Methods - Systematic approaches for selecting a subset of a population to represent the whole, balancing statistical validity with practical constraints.
  • Baseline Design - A structured approach to collecting initial condition data that directly informs project decisions, minimizes burden, and enables valid comparison with endline measurements.
  • Data Quality Assurance - A systematic process for verifying that collected data meets five quality dimensions (Validity, Integrity, Precision, Reliability, and Timeliness), ensuring data is fit for decision-making.
  • Key Informant Interviews - In-depth, semi-structured interviews with individuals selected for their specific knowledge, experience, or perspectives relevant to the evaluation questions.
  • Focus Group Discussions - A qualitative data collection method that brings together 6-10 participants to discuss a specific topic, generating rich insights through group interaction and shared experiences.
  • Validity (Internal & External) - The degree to which an evaluation accurately demonstrates causal relationships (internal validity) and generalizes findings beyond the study context (external validity).
  • Reliability - The consistency and repeatability of a measurement: whether the same tool produces stable results across repeated applications, different raters, or different time periods.
  • Bias - Systematic error in data collection, analysis, or interpretation that distorts results and threatens the validity of M&E findings.

Related Guides

How to Write AI Prompts That Actually Work for M&E
Stop getting generic outputs. The 4Cs Framework helps you write prompts that produce donor-ready indicators, analysis, and reports on the first try.
How to Build Better Surveys with AI
Most AI survey tools stop at generating questions. This guide covers the full lifecycle: choosing question types, catching bias, adding skip logic, and piloting before you deploy.