Research Report: Sample Size, Methods, and Data Collection


Added on 2020/03/02

AI Summary

This report analyzes the critical role of sample size in research, emphasizing its impact on the accuracy and reliability of collected data, particularly in a study involving 69,000 Belgian bank workers. It explores the advantages and disadvantages of using large sample sizes, such as 15,000 participants, and the factors influencing sample size selection. The report then delves into sampling methods, specifically stratified sampling, and its application in the research, highlighting its strengths and limitations. Furthermore, it examines research designs, contrasting cross-sectional and longitudinal approaches and their respective benefits and drawbacks. The report also discusses data collection procedures, focusing on the use of questionnaires, along with the associated challenges, such as open-ended questions, dishonesty, emotional responses, and unanswered questions, while suggesting solutions to mitigate these issues. Finally, it addresses the utilization of secondary data, emphasizing the importance of data relevancy, accuracy, and the application of probabilistic sampling designs for large datasets. This comprehensive analysis aims to provide insights into the methodological considerations for conducting effective research.

Q1 Sample Size
Sample size plays a vital role in determining the features of a population. The portion of the
population selected for study is the sample, and the number of units in it is the sample size.
According to Kühberger et al. (2014), choosing the right sample size is essential to collecting
accurate information. For the collected data to remain reliable, therefore, the adequacy of the
sample size has to be considered. The sample size can be determined with the following
formula:
\[
\text{Sample size} = \frac{\text{population distribution (percentage pick)}}{\left( \dfrac{\text{margin of error \%}}{\text{confidence level score}} \right)^{2}}
\]
From the population of 69,000 Belgian bank workers, the sample size given by the stated formula
is 383 at the 95% confidence level with a margin of error of 0.05. The percentage pick used in
the calculation was 50%. This percentage was chosen because it is conservative: it yields the
largest required sample size for a given margin of error and confidence level. Taking all the
factors into account, i.e. percentage pick, margin of error and confidence level, 383 is the
recommended sample size that the two research institutions ought to have used for the
population of 69,000 bankers. The sample of 15,000 bank workers is far beyond the recommended
size, so we can conclude that it is a large sample. Working with a large sample size has
advantages and disadvantages, as discussed below.
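The calculation above can be checked with a short Python sketch. It assumes a confidence level score of 1.96 for the 95% level and interprets the "distribution" of the 50% percentage pick as p(1 - p) = 0.25. The report does not say how 383 was reached; the finite population correction used here is an assumption, chosen because it reproduces that figure. The function name and parameters are illustrative only.

```python
import math

def required_sample_size(population, margin_of_error=0.05, z_score=1.96, proportion=0.5):
    """Return (uncorrected, corrected) sample sizes, each rounded up."""
    # Uncorrected size from the stated formula: p(1 - p) / (e / z)^2
    n0 = proportion * (1 - proportion) / (margin_of_error / z_score) ** 2
    # Finite population correction (an assumption, not stated in the report)
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0), math.ceil(n)

uncorrected, corrected = required_sample_size(69_000)
print(uncorrected, corrected)  # 385 383
```

Without the correction the formula gives about 384.2, which rounds up to 385. With a population of 69,000 the correction lowers the result to about 382.04, which rounds up to the 383 quoted in the text.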
Advantages of a large sample size
A large sample size is important in minimizing the margin of error, which in turn boosts the
accuracy of the results obtained from the sample. This is one of several
advantages of a large sample size. It increases the precision with which the population
parameter falls within the range of the calculated point estimate (Cleary et al., 2014). The wide
coverage of the large sample (15,000 participants) helps in acquiring more accurate
information from the participants about the subject under study (stress, in this case) than a
smaller sample would have provided. Furthermore, a larger sample is more representative of the
individuals in the population and better accommodates the outliers that may be present in it
(Belli et al., 2014).
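To illustrate how sample size affects the margin of error, the sketch below compares the recommended sample of 383 with the 15,000 participants actually used. It assumes the usual normal-approximation formula for a proportion, a confidence level score of 1.96 and a 50% percentage pick. It ignores any finite population correction, and the function name is illustrative.

```python
import math

def margin_of_error(sample_size, z_score=1.96, proportion=0.5):
    """Margin of error for an estimated proportion (normal approximation)."""
    return z_score * math.sqrt(proportion * (1 - proportion) / sample_size)

for n in (383, 15_000):
    # Print the margin of error as a percentage
    print(n, round(100 * margin_of_error(n), 2))
```

Under these assumptions the margin of error falls from roughly 5% at 383 participants to roughly 0.8% at 15,000.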
Disadvantages of a large sample size
Collecting data from a larger fraction of the population is costly, since it requires higher
expenses (Goodman et al., 2013). In this case, reaching the target of 15,000 bank workers will
require the Union of Belgian Banks to spend heavily. Likewise, because the fraction is relatively
large and the participants are not in the same geographical location, the banks being spread
across the country, achieving the targeted sample size will be time-consuming.
Factors to consider when choosing a sample size
Prior information about the topic under study is important in determining the sample size,
because prior point estimates such as means and variances serve as a reference for dealing with
the variation that could arise within the groups (Button et al., 2013). Cost is another factor
that determines the size of the sample used in a survey. The acceptable level of risk in the
values to be collected also matters: if a high risk of error can be tolerated, a small sample can
be used, whereas if the risk must be low, a large sample is called for.
Q2 Sampling methods
A sampling method is the process by which a sample is selected so that group characteristics
can be obtained from the population under study. The research institutions used stratified
sampling to select the 15,000 bank workers who were to participate. According to Ye et al.
(2013), stratified sampling reduces sampling error. The population is first divided into small
groups called strata, defined so that every element of the population is represented; units are
then selected from each stratum by simple random sampling in order to reduce selection bias.
This method ensures that the population is well represented in the sample. One problem with the
method is the difficulty of identifying the criteria for subdividing the population into
subgroups, which makes it unpopular and rarely used by researchers. Furthermore, a lot of time
goes into determining the strata, after which the sample must still be selected from them by
simple random sampling (Acharya et al., 2013). In our case, the research institutions first had
to identify the banks within the country and categorize the workers according to their banks,
which form the strata from which workers are then selected. Simple random sampling of
individuals within each bank gives each worker in a bank an equal chance of being included in
the sample that represents the population. I recommend increasing the number of strata to
improve the effectiveness of stratified sampling: this increases the representativeness of the
sample and minimizes the margin of error.
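The procedure described in this section can be sketched in Python as follows. This is a minimal illustration, not the institutions' actual procedure. The bank names and staff counts are hypothetical, and proportional allocation (each bank's share of the sample mirrors its share of the population) is an assumption, since the report does not say how many workers were drawn from each bank.

```python
import random

def stratified_sample(strata, total_sample_size, seed=0):
    """Draw a proportional stratified sample; strata maps name -> list of member ids."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation: the stratum's quota mirrors its population share
        quota = round(total_sample_size * len(members) / population_size)
        # Simple random sampling without replacement inside the stratum
        sample[name] = rng.sample(members, min(quota, len(members)))
    return sample

# Hypothetical banks (strata) and worker ids
strata = {
    "Bank A": [f"A{i}" for i in range(600)],
    "Bank B": [f"B{i}" for i in range(300)],
    "Bank C": [f"C{i}" for i in range(100)],
}
sample = stratified_sample(strata, total_sample_size=100)
print({bank: len(workers) for bank, workers in sample.items()})
# {'Bank A': 60, 'Bank B': 30, 'Bank C': 10}
```

Because each quota is rounded separately, the quotas may not add up exactly to the requested total when the stratum shares are not round numbers.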
Q3 Research Design
Each research design has its own advantages and disadvantages. A cross-sectional design is the
tool researchers use to acquire information about a definite point in time from gathered data.
According to Shen and Björk (2015), a cross-sectional study is important in determining the
value of the assumptions used in a study. A cross-sectional design takes less time than other
research designs because it draws point-in-time information from already existing information.
This also makes it easy to identify the information of interest. The expense of obtaining
point-in-time information with a cross-sectional design is low, which makes the design
inexpensive. A longitudinal design, by contrast, is used to obtain the pattern that variables
display over a period of time.
A cross-sectional design cannot reliably predict the relationship between findings because it
has no time element: it can only measure point-in-time information. This is one of its major
disadvantages. In addition, its results can be dominated by conditions that are less important
or less serious and that exist for a relatively long time. A longitudinal research design, on
the other hand, covers a long period of time and as a result can be more expensive. Because it
builds up a pattern over a period of time, it also consumes a lot of time. Furthermore, the
longitudinal design cannot be used, or becomes less efficient, where few outcomes are expected
(Shen and Björk, 2015).
Q4 Procedure of Data Collection
The research institutions used questionnaires to collect data from the bank workers. The
questionnaires were structured with questions that required the respondents to provide their
responses in the spaces provided. This procedure of collecting data has some problems, as
discussed below.
Having too many open-ended questions in a questionnaire has a negative impact at the time of
data analysis. Because people have different opinions, their responses to open-ended questions
vary widely, producing more data than can be handled and analyzed, since the responses cannot
readily be coded. To combat this problem, questions should be closed-ended when the
questionnaire is constructed, offering multiple choices that can be coded; this makes the
analysis easier, since it is simpler to work with coded data.
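As a small illustration of why coded data are simpler to work with, the sketch below maps the answers to one closed-ended question onto numeric codes and tallies them. The answer options, the code book and the sample responses are hypothetical and do not come from the actual questionnaire.

```python
from collections import Counter

# Hypothetical code book for one closed-ended question
CODE_BOOK = {"Never": 1, "Sometimes": 2, "Often": 3, "Always": 4}

def code_responses(responses, code_book=CODE_BOOK):
    """Map each answer to its numeric code; blank or unknown answers become None."""
    return [code_book.get(answer) for answer in responses]

responses = ["Often", "Never", "Often", "", "Always"]
codes = code_responses(responses)
print(codes)           # [3, 1, 3, None, 4]
print(Counter(codes))  # frequency of each code, including unanswered (None)
```

Once the answers are coded, frequencies and other summaries can be computed directly, which is not possible with free-text answers until they have been categorized by hand.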
Dishonesty on the part of respondents is another major problem associated with using
questionnaires to collect data. In some cases, respondents decide to conceal the truth when
participating in the process (Chernick et al., 2011). Such problems are most common when
participants feel that the public may get to know their identities. As a result, the
reliability and accuracy of the collected data are compromised, which in turn affects the
results of the research. Researchers should address this problem by assuring participants that
their identities are valued and will be kept private, out of the reach of unauthorized people.
Giving respondents such assurance will reduce the level of dishonesty.
Face-to-face communication is an effective means of communication because the emotions of the
people involved can be read through their facial expressions. Questionnaires make it difficult
to capture the emotional responses that develop while answering the questions, especially when
the questionnaires are not administered by the researcher. This problem can be addressed by
constructing Likert scale questions that gauge the feelings, attitudes and emotions of the
participants.
Questionnaires also face a serious problem when some questions remain unanswered. A number of
respondents leave some of the questions in the questionnaire unanswered for their own reasons.
Some skip questions that are not clear or understandable, intending to return to them after
finishing the others, and end up handing in the questionnaire with those questions unanswered.
This problem needs to be dealt with to reduce its effects in future research. Online surveys,
for instance, tend to solve it by making all questions required, so that the respondent cannot
proceed to the next step of the survey without answering them. With questionnaires, the problem
can be reduced by constructing questions that are uncomplicated and easy to understand, and by
making the survey as short as possible.
Questionnaires are supposed to be accessible to all people, whether or not they are disabled.
This has been one of the major problems with questionnaires, since people with various forms of
physical disability are not catered for, for example people with visual impairments who cannot
read what is on the questionnaire. This problem can be eliminated by using questionnaires that
have accessibility options built in.
Q5 Secondary Data
Secondary data are data obtained from database archives of previously conducted research. All
data retrieved from databases for a study must be relevant to the subject of study. According
to Piwowar and Vision (2013), the competency and accuracy of the data with regard to the subject
of study are among the first things to consider when checking the representativeness of the
sample using secondary data. Secondary data help save time by giving a clear picture of what
the researcher can expect. Unlike primary data, secondary data are easier and cheaper to obtain,
since they only have to be retrieved from the databases where they are stored (Irwin, 2013).
Government-funded projects often involve collecting large data samples, which in most cases
cover a wider subset of a population. The researcher should do everything possible to understand
the stored data before using them to check for representativeness. A probabilistic sampling
design should be applied when the sample size is large; stratified sampling is appropriate here
because of its wide coverage of the population. The statistical design should reflect the
sampling design that was used at the time of data collection.
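One way to make the analysis reflect a stratified sampling design is to weight each stratum's mean by that stratum's share of the population. The sketch below is a minimal illustration with hypothetical stratum sizes and hypothetical stress scores; the function name is illustrative, and this is not presented as the method used in the study.

```python
def stratified_mean(strata):
    """Estimate a population mean from (stratum_population_size, sample_values) pairs."""
    total_population = sum(size for size, _ in strata)
    estimate = 0.0
    for size, values in strata:
        stratum_mean = sum(values) / len(values)
        # Weight each stratum mean by the stratum's share of the population
        estimate += (size / total_population) * stratum_mean
    return estimate

# Hypothetical strata: (number of workers in the stratum, sampled stress scores)
strata_scores = [
    (6000, [4, 5, 6]),  # stratum mean 5
    (4000, [1, 2, 3]),  # stratum mean 2
]
print(stratified_mean(strata_scores))
```

With these made-up numbers the weighted estimate is about 3.8, whereas the unweighted mean of the six scores is 3.5. This shows how ignoring the sampling design can shift the result.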

References
Acharya, A.S., Prakash, A., Saxena, P. and Nigam, A., 2013. Sampling: Why and how of
it. Indian Journal of Medical Specialties, 4(2), pp.330-333.
Belli, S., Newman, A.B. and Ellis, R.S., 2014. Velocity dispersions and dynamical masses for a
large sample of quiescent galaxies at z > 1: Improved measures of the growth in mass and
size. The Astrophysical Journal, 783(2), p.117.
Button, K.S., Ioannidis, J.P., Mokrysz, C., Nosek, B.A., Flint, J., Robinson, E.S. and Munafò,
M.R., 2013. Power failure: why small sample size undermines the reliability of
neuroscience. Nature Reviews Neuroscience, 14(5), pp.365-376.
Chernick, M.R., González-Manteiga, W., Crujeiras, R.M. and Barrios, E.B., 2011. Bootstrap
methods. In International Encyclopedia of Statistical Science (pp. 169-174). Springer Berlin
Heidelberg.
Cleary, M., Horsfall, J. and Hayter, M., 2014. Data collection and sampling in qualitative
research: does size matter?. Journal of advanced nursing, 70(3), pp.473-475.
Goodman, J.K., Cryder, C.E. and Cheema, A., 2013. Data collection in a flat world: The
strengths and weaknesses of Mechanical Turk samples. Journal of Behavioral Decision
Making, 26(3), pp.213-224.
Irwin, S., 2013. Qualitative secondary data analysis: Ethics, epistemology and context. Progress
in Development Studies, 13(4), pp.295-306.
Kühberger, A., Fritz, A. and Scherndl, T., 2014. Publication bias in psychology: a diagnosis
based on the correlation between effect size and sample size. PloS one, 9(9), p.e105825.

Piwowar, H.A. and Vision, T.J., 2013. Data reuse and the open data citation advantage. PeerJ, 1,
p.e175.
Shen, C. and Björk, B.C., 2015. ‘Predatory’ open access: a longitudinal study of article volumes
and market characteristics. BMC medicine, 13(1), p.230.
Ye, Y., Wu, Q., Huang, J.Z., Ng, M.K. and Li, X., 2013. Stratified sampling for feature subspace
selection in random forests for high dimensional data. Pattern Recognition, 46(3), pp.769-787.