

Below is the flow diagram for identification of
articles for review.

A total of 692 citations were generated through searches of various databases and the
reference lists of published studies and grey literature. Two hundred and one
non-duplicate citations were screened. When the inclusion and exclusion criteria were
applied, one hundred and fifteen articles were excluded, mainly on screening of
titles and abstracts. A further 56 articles were excluded after full-text
screening, and eleven more articles were excluded during data
extraction. Nineteen articles were included in the final review.
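The selection counts reported above can be checked with simple arithmetic. The short Python sketch below is illustrative only: the stage labels are names chosen for this example, and the figure of 491 duplicates is derived from the reported totals (692 minus 201) rather than stated in the text.

```python
# Sketch: check the study-selection counts reported in the text.
# Stage labels are illustrative; the numbers are taken from the text.

identified = 692   # citations from databases, reference lists and grey literature
screened = 201     # non-duplicate citations screened

# (stage, number of articles excluded at that stage)
exclusions = [
    ("title and abstract screening", 115),
    ("full-text screening", 56),
    ("data extraction", 11),
]


def remaining_after(start, stages):
    """Return a list of (stage, articles remaining) after each exclusion step."""
    remaining = start
    trail = []
    for stage, n_excluded in stages:
        remaining -= n_excluded
        if remaining < 0:
            raise ValueError(f"more articles excluded than available at {stage}")
        trail.append((stage, remaining))
    return trail


duplicates_removed = identified - screened  # derived, not stated in the text
trail = remaining_after(screened, exclusions)

print(f"duplicates removed: {duplicates_removed}")
for stage, remaining in trail:
    print(f"after {stage}: {remaining}")
```

Run as written, the trail ends at 19 articles, matching the number included in the final review.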



Imputation methods

Limitations of missing data methods

Missing data methods

Missing data mechanism and imputation methods

Comparing missing data handling methods

Missing data pattern and imputation methods

Missing data handling strategies

Likelihood methods in missing data

Performance of different missing data strategy

Bayesian approach



Glossary of search terms used in combination
with "cohort studies", "longitudinal studies" and "observational studies".
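As an illustration of how the glossary terms might be combined with the study-design terms, the Python sketch below builds generic Boolean query strings. It assumes that each missing-data term is joined with AND to an OR-group of the three design terms; this generic syntax is an assumption and is not the exact query syntax of Ovid Medline, CINAHL or EMBASE.

```python
# Sketch: combine glossary terms with study-design terms into Boolean strings.
# Assumption: generic AND/OR phrase syntax; real database syntax is not reproduced here.

topic_terms = [
    "Imputation methods",
    "Limitations of missing data methods",
    "Missing data methods",
    "Missing data mechanism and imputation methods",
    "Comparing missing data handling methods",
    "Missing data pattern and imputation methods",
    "Missing data handling strategies",
    "Likelihood methods in missing data",
    "Performance of different missing data strategy",
    "Bayesian approach",
]
design_terms = ["cohort studies", "longitudinal studies", "observational studies"]


def quote(term):
    """Wrap a term in double quotes so it is searched as a phrase."""
    return f'"{term}"'


def build_query(topic, designs):
    """AND one topic term with an OR-group of study-design terms."""
    design_group = " OR ".join(quote(d) for d in designs)
    return f"{quote(topic)} AND ({design_group})"


queries = [build_query(t, design_terms) for t in topic_terms]
for q in queries:
    print(q)
```

Grouping the design terms with OR inside parentheses keeps each query focused on one missing-data topic while accepting any of the three study designs.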

The search for information was restricted to
primary studies, articles, reviews and grey literature that compare two or
more methods of handling missing values in cohort studies published from 1997 onwards.
The cut-off date was chosen to keep pace with advancements and new
developments in missing data handling methods. There was no restriction on the age
of participants because in some cohort studies follow-up started from

Inclusion and exclusion criteria

Article types
  Inclusion: clinical trials; observational studies; intervention studies; randomised controlled trials
  Exclusion: books and documents; case reports; conference papers; duplicate articles/publications; newspaper articles

Text availability
  Inclusion: full-text article with abstract
  Exclusion: abstract-only article; full-text article without abstract

Publication dates
  Inclusion: January 1997 to December 2017
  Exclusion: articles published before 1997

Other exclusions
  Animal studies; non-English articles

Remaining criteria
  No restriction




Using the filter facilities of the databases (Ovid
Medline, CINAHL and EMBASE), the inclusion and exclusion criteria
shown in the inclusion and exclusion criteria table were applied when searching for articles to be
used in the scoping review.

To identify relevant published articles, a
combination of strategies was employed; the aim was to identify published and
grey literature from both primary sources and reviews. An electronic database search of CINAHL, Ovid
Medline and EMBASE was carried out. The bibliographies of selected published
papers were manually scanned for relevant subject headings for screening. For this project, the search for articles
followed the three-step search strategy proposed in the Joanna Briggs Institute
Reviewers' Manual (The Joanna Briggs Institute 2015). The first step involved a preliminary limited
search of the online databases Ovid Medline, CINAHL and EMBASE, followed by an analysis
of the text words in the titles and abstracts of retrieved articles and of the index terms used to describe articles on
missing data imputation methods in longitudinal studies. The
aim was to identify keywords and terms to be used in the next step of the
database search. The second step involved using the identified keywords and index
terms on missing data handling methods to search the databases (Ovid Medline,
CINAHL and EMBASE). In the final step, the bibliographies of retrieved articles and
reports were searched for further studies.

A preliminary search for existing scoping reviews
was conducted on the Cumulative Index to Nursing and Allied Health Literature
(CINAHL), EMBASE, the Cochrane Database of Systematic Reviews and Medline, and no
scoping review with the objective of comparing missing data handling methods in
cohort studies was found.

These research questions were used in
developing the inclusion and exclusion criteria for the scoping review. They guided
the search for literature, made it more effective and were useful in
developing the structure of the scoping report.

1.    What is the best available strategy or method for
dealing with missing data in cohort studies?

2.    What is the impact of missing data handling
strategies or methods in a cohort study
with various degrees of missingness?

Levac et al. (2010) recommended that the research question(s) of a scoping
review should be broad while clearly defining the concept, target
population, intervention and outcome of interest, as this helps in establishing
an effective search strategy. Two research questions were
identified in line with the research objective.

The main objective of this scoping review is to
identify the knowledge and research gaps in the methods of handling missing
values in cohort studies through analysis of research articles that compare
missing data methods.

In summary, the five-stage methodology of Arksey
and O'Malley (2005) comprises:

1.    Identifying the research questions

2.    Identifying relevant studies

3.    Selecting studies

4.    Charting the data

5.    Collating, summarizing and reporting the results

This scoping review is guided by the five-stage
methodological framework proposed by Arksey &
O'Malley (2005), with further enhancements from the work of Levac
et al. (2010). The framework was used to explore the research
objectives and selected publications in order to establish research gaps, limitations
and unresolved issues relating to missing data handling techniques in cohort
studies.

Review methodology



1)    Removal-based methods

a)    Listwise deletion

b)    Pairwise deletion

c)    Removal of weight (reweighting technique)

2)    Imputation methods

a)    Single imputation

b)    Multiple imputation

c)    Composite imputation

3)    Model-based methods

a)    Maximum likelihood

b)    Expectation maximisation algorithm

4)    Direct manipulation of missing data
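To make the difference between removal-based and single-imputation approaches concrete, the Python sketch below applies listwise deletion, pairwise deletion and mean imputation to a small hypothetical dataset in which None marks a missing value. The variable names and values are invented for illustration, and mean imputation is used only as one simple example of single imputation; the sketch does not cover reweighting, multiple or composite imputation, or model-based methods.

```python
from statistics import mean

# Hypothetical toy data: one row per participant; None marks a missing value.
# sbp = systolic blood pressure (illustrative variable names and values).
rows = [
    {"age": 34, "bmi": 22.1, "sbp": 120},
    {"age": 51, "bmi": None, "sbp": 135},
    {"age": 47, "bmi": 27.4, "sbp": None},
    {"age": None, "bmi": 24.9, "sbp": 128},
    {"age": 62, "bmi": 29.0, "sbp": 142},
]


def listwise_deletion(data):
    """Complete-case analysis: keep only rows with no missing value in any variable."""
    return [r for r in data if all(v is not None for v in r.values())]


def pairwise_pairs(data, x, y):
    """Pairwise deletion: for one pair of variables, keep every row observed on
    both x and y, regardless of missingness in other variables."""
    return [(r[x], r[y]) for r in data if r[x] is not None and r[y] is not None]


def mean_impute(data):
    """Single imputation: replace each missing value by the mean of the observed
    values of that variable (raises StatisticsError if a variable is never observed)."""
    means = {k: mean(r[k] for r in data if r[k] is not None) for k in data[0]}
    return [{k: means[k] if v is None else v for k, v in r.items()} for r in data]


complete = listwise_deletion(rows)
age_sbp = pairwise_pairs(rows, "age", "sbp")
imputed = mean_impute(rows)

print(len(complete), "complete cases")        # 2
print(len(age_sbp), "age/sbp pairs")          # 3
print(imputed[3]["age"], imputed[2]["sbp"])   # 48.5 131.25
```

With this toy data, listwise deletion keeps only 2 of the 5 participants, whereas pairwise deletion retains 3 observations for the age and blood pressure pair. If any variable were missing for every participant, listwise deletion would return an empty dataset. Mean imputation keeps every row but shrinks the spread of the imputed variable, which is one reason single imputation is often compared with multiple imputation.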

Over the years there has been a growing body
of literature whose main objective is to compare the available methods and
strategies for handling missing data. The aim of these studies is to
find the method that gives unbiased estimates when data are missing. A common approach to handling missing data
is to delete all observations that contain missing values, that is, to analyse complete
cases only (Baraldi &
Enders 2010). The downside of this method is that if every observation has at
least one missing value, for example because an entire variable is missing, all observations
are deleted and there are no data left to analyse.
Another common approach to handling missing data is to create a complete
dataset by assigning values to the missing observations (Baraldi &
Enders 2010). Some methods allocate several values to each missing
observation or avoid allocating precise values altogether. To understand the
principles behind each missing data handling method, it is helpful to
group the available methods according to their mode of operation. Although there is
no consensus or unified approach to grouping missing data
methods, this project adopts the
grouping proposed by Silva &
Zárate (2014), in which methods or techniques of
handling missing data are placed in the four groups listed above.
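Methods that allocate several values to each missing observation (multiple imputation) analyse each completed dataset separately and then combine the results. One standard way of combining them is Rubin's rules, sketched below in Python. The function name pool_rubin and the example numbers are hypothetical, and the sketch assumes that each imputed dataset yields a point estimate and its squared standard error.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool results from M imputed datasets using Rubin's rules.

    estimates: point estimate from each completed-data analysis
    variances: squared standard error of each estimate
    Returns (pooled estimate, within, between, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(variances)              # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b          # total variance
    return q_bar, w, b, t


# Hypothetical results from five imputed datasets
est = [2.0, 2.2, 1.8, 2.1, 1.9]
var = [0.04] * 5
q_bar, w, b, t = pool_rubin(est, var)
se = sqrt(t)
print(f"estimate {q_bar:.3f}, SE {se:.3f}")
```

In this example the between-imputation variance is 0.025 and the total variance is about 0.07, so the pooled standard error is larger than the within-imputation value of 0.2 alone. This extra term is how multiple imputation reflects uncertainty about the missing values, which single imputation ignores.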

The reason for conducting a scoping review of missing data
handling methods in cohort studies is to identify research gaps based on the
research articles selected for review. The analysis of the literature will help in
discovering gaps and unresolved issues in the methods of handling missing data
in cohort studies with or without repeated measures.

A scoping review can be described as a means
of synthesising knowledge, based on research questions, with the objective of
mapping the key concepts, available evidence and research gaps by conducting a
systematic search, selection and synthesis of existing knowledge (Colquhoun et al.


Mapping methods of handling missing data in cohort studies: A scoping review




The above reviews cover aspects of what is already known. Other planned sections of
the literature review include: geographical location; lifestyle;
ethnicity; marital status; housing; and level of education. The second part of
the literature review will focus on the characteristic patterns of missing data
from birth to midlife.

Other planned sections of the literature review


Findings from past studies
indicate that people's socioeconomic circumstances could determine how they
respond to or participate in a health survey; those from lower socioeconomic
strata have been found to have lower participation rates. A lower percentage of
respondents in epidemiological studies have been shown to have no paid job or
low socioeconomic status (Conway et al. 2008). Available
socioeconomic data from the literature on response to epidemiological studies
consistently show that educational attainment is an important predictor of
participation in a study. Low educational attainment and lower socioeconomic
status are often associated with lower participation rates (Conway et al. 2008).
Those with secondary or tertiary education were found to have higher
participation and response rates than individuals without formal secondary
education, as were skilled workers in highly skilled non-manual jobs compared
with unskilled workers. People with a full-time job were less likely to
participate in a health survey than individuals without a job (Volken 2013).




One of the robust findings of
epidemiological studies is the association between lower socioeconomic status
and poor health (Lynch et al. 1997). In line with this,
persons of higher socioeconomic status are more likely to participate in
or respond to epidemiological surveys than persons of lower socioeconomic status,
and evidence suggests that non-participants have higher disease and mortality
rates, poorer health and lower levels of functioning than participants (Hille 2005;
Galea & Tracy 2007a). In contrast, a
person with a particular symptom is more likely to participate in a study that
relates to that symptom because it is relevant to their life (Goldberg et al. 2006).
Issues surrounding the effects of health and socioeconomic status on continued
participation in a study were highlighted by a Norwegian population-based
health study (Langhammer et al. 2012), which
established that chronic disease prevalence was higher among non-participants
than among participants.
Cardiovascular disease symptoms and diabetes mellitus were more common among
non-participants. In contrast, musculoskeletal pain, urinary incontinence and
headache were more often reported by participants younger than 80 years
than by non-participants. Several studies have examined specific health issues or
behaviours that may affect continued participation in epidemiological studies.
Health issues that may influence participation in epidemiological studies were
highlighted in the findings of the French Gazel study (Goldberg et al. 2001), which aimed to
identify socioeconomic and health factors associated with participation in long-term
epidemiological surveys. The study revealed that non-participation is
significantly associated with an individual's health status. The association was
especially strong for alcohol-related diseases and was also observed for
psychiatric and respiratory diseases. Furthermore, the study results show an
inverse relationship between musculoskeletal disorders and non-participation,
consistent with the findings of Langhammer et al. (2012). The
implication is that musculoskeletal disorders are prevalent among younger
participants, who are more willing to talk about their health symptoms than
individuals with chronic illness. Non-response to study surveys could be
explained by the nature of the health outcomes being investigated and by
individuals' state of health, including the significant number of participants
who died. In terms of health status, poor prognosis and poor survival are often
found in deprived areas. This is consistent with reports that socioeconomic
circumstances are strongly linked with individuals' health status. It is worth
noting that comparing findings on health and
participation rates in epidemiological studies may be difficult due to
differences in aims, objectives, design and settings. Importantly, the studies under
consideration may differ in the disease symptoms and the age groups they
investigate.


1.1.2      Health status


Response rates are low among the
youngest and oldest respondents in an epidemiological survey (Lyn 2012). A
multivariate analysis of respondents' demographic characteristics in a
longitudinal autism study identified increasing child age and decreasing
maternal age as strong predictors of non-response to a health survey
questionnaire (Kalb et al. 2012). Adolescents whose characteristics are
associated with poor health are less likely to participate in health surveys.
These characteristics include low maternal income or education and a less favourable
lifestyle, including substance misuse and alcohol use. In addition, young
adults with young mothers, those living in urban areas, those with low cognitive performance
and those with psychiatric illness are less likely to participate in health
surveys (Kalb et al. 2012). In general, individuals of low socioeconomic status
have lower response and participation rates in health surveys (Goldberg et al. 2001; Galea & Tracy 2007c; Fekete
et al. 2015). There are also
variations in the pattern of participation based on an individual's age and gender.
For both men and women, participation and response rates increase with
age, reach a maximum and thereafter decrease with further increasing age (Boshuizen et al. 2006). A study by
Volken (2013) aimed at identifying determinants of participation and bias in the outcomes of a Swiss
health survey showed the interaction between gender and age to be significantly
associated with participation: the participation rate started declining at 45 years
for women and at approximately 60 years for men. The study further revealed that
younger women were more willing to respond to the survey than young men, that at 56
years participation was equal between men and women, and that
beyond 56 years women were more willing
to participate in and respond to the health survey than men. A similar result was
reported by Boshuizen et al. (2005) in a study that aimed to identify
determinants of non-response in a survey of cardiovascular disease risk factors
in the Dutch population: participation increased proportionally
with age until about 60 years and
thereafter gradually decreased with increasing age. For
older generations, mobility may influence response
to health studies, as evidence from past studies shows that older people are
less mobile and easier to reach (Thomas et al. 2001; Lepkowski and Cooper 2002).
However, some studies found no substantial differences in response between sexes
and age groups; differences in response were related only
to socioeconomic and employment status (Van Loon et al. 2003). Overall, women
have a higher participation rate than men.


1.1.1    Age

The willingness to participate in
or respond to surveys in cohort studies, and loss to follow-up, have been
shown to be influenced by many factors, including age, gender, marital status,
education, health status, smoking habits, lifestyle, ethnicity, calendar
period, study objectives, means of contact and number of contacts, which are
not generally predictive in every epidemiological study (Stang 2003; Lynn
2009). The
effect of these factors on missing data cannot be overlooked, because evidence
from past studies has shown that demographic differences among study participants
affect the validity of study results and findings. This literature review
aims to examine factors that could influence non-response and continued
participation of samples in cohort studies. It will highlight the impact of the
sociodemographic characteristics of subjects on missing data and study quality.
Moreover, given declining participation rates, the review will attempt to
address issues surrounding the participation of different groups of subjects and
how these could be addressed, accommodated or adjusted for in study design,
planning and data analysis. In this project, the literature review is
divided into two parts. The first part will review, analyse and synthesise what
is already known about sociodemographic factors that could lead to missing
data. The following known factors will be reviewed: age; health status;
socioeconomic status; geographical location; lifestyle; ethnicity; marital
status; housing; and level of education. The second part will review factors
that have not been explored in relation to missing data; it will focus on
patterns of missing data based on individual characteristics, experiences and
activities from birth to midlife.