Reproducible Research: A Key Component of Data Science

Added on  2023-03-30

10 pages, 1882 words
Running Head: COURSE DATA VISUALIZATION
Course Data Visualization
Name of the Student
Name of the Organization
Author Note

Reproducible Research
Reproducible research refers to the end product of research on a specific topic that includes not only the academic write-up but also the laboratory notes, data and code used in the process. Research is termed reproducible when the data collected and the code written for it allow someone else, working in a similar computational environment, to recreate the analysis and obtain the same outcome (Begley and Ioannidis, 2015). Typically, a piece of reproducible research consists of a collection of data, text files and a series of code scripts, arranged with the help of tools such as R and R Markdown documents with their source available, or a Jupyter notebook.
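To make the idea of "data, text and code arranged together" concrete, the following is a minimal sketch of a self-contained reproducible analysis in Python. It is illustrative only: the function names are hypothetical, and the "dataset" is generated from a fixed seed (an assumption made so the example runs anywhere) where a real project would load a versioned data file.

```python
"""A minimal, self-contained reproducible analysis (illustrative sketch)."""
import json
import platform
import random
import statistics
import sys


def load_data(seed: int = 2023, n: int = 100) -> list[float]:
    # Hypothetical stand-in for reading a raw data file: a seeded
    # generator returns the same numbers on every run.
    rng = random.Random(seed)
    return [rng.gauss(50.0, 10.0) for _ in range(n)]


def analyze(values: list[float]) -> dict:
    # The analysis step: plain summary statistics, rounded for reporting.
    return {
        "n": len(values),
        "mean": round(statistics.mean(values), 4),
        "stdev": round(statistics.stdev(values), 4),
    }


def environment() -> dict:
    # Record the software environment alongside the results so a
    # later reader can recreate "a similar environment".
    return {"python": sys.version.split()[0], "platform": platform.platform()}


if __name__ == "__main__":
    report = {"results": analyze(load_data()), "environment": environment()}
    print(json.dumps(report, indent=2))
```

Because the data source, the analysis and the environment record live in one script, anyone who reruns it with the same seed and a compatible Python version should get the same summary.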
Data science as a field has two key features for the integration of data: reproducibility and replicability. Reproducible research can be described as the ability to repeat an analysis of data and arrive at the same results in a new study. Replication is tied to the generation of data, since it involves collecting new data to test the same question, whereas reproduction is essentially a repetition of the analytical method on the original data.
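The difference between reproduction and replication can be sketched in a few lines of Python. This is a toy illustration under a stated assumption: the "data-generating process" is a hypothetical normal distribution with mean 10 and standard deviation 2, and the "analytical method" is simply the sample mean.

```python
import random
import statistics


def collect(seed: int, n: int = 200) -> list[float]:
    # Hypothetical data-generating process (assumption): measurements
    # drawn from a normal distribution with mean 10 and sd 2.
    rng = random.Random(seed)
    return [rng.gauss(10.0, 2.0) for _ in range(n)]


def method(values: list[float]) -> float:
    # The analytical method under study: the sample mean.
    return statistics.mean(values)


original = collect(seed=1)

# Reproduction: same data, same method -> exactly the same number.
reproduced = method(original)
assert reproduced == method(original)

# Replication: newly collected data, same method -> a similar but
# not identical number.
replicated = method(collect(seed=2))
print(f"original={method(original):.3f} replicated={replicated:.3f}")
```

Reproduction checks the analysis itself; replication checks whether the finding holds up when the data are gathered again.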
The importance of reproducible research lies in three areas: confirming the correctness of gathered evidence, bringing out new observations from previously conducted research, and coping with the growing complexity of data analysis.
Reproducing the observed output of previously conducted research confirms its correctness (Boettiger, 2015). When research is repeated by someone else and yields similar results, this is taken as evidence that the data were gathered and analyzed accurately. The same applies to a negative outcome: when a reproduction fails to match the original, the discrepancy helps identify which changes need to be made to the initial results.
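One way to carry out such a check is to compare the published results with those of an independent re-run, key by key. The sketch below assumes results are stored as a flat dictionary of named values; `compare_results` is a hypothetical helper, and the relative tolerance is an assumption that allows for harmless floating-point differences between machines.

```python
import math


def compare_results(published: dict, rerun: dict, rel_tol: float = 1e-9) -> list[str]:
    """Return the keys whose re-run value disagrees with the published one."""
    mismatches = []
    # Check every key present in either result set, in a stable order.
    for key in sorted(published.keys() | rerun.keys()):
        a, b = published.get(key), rerun.get(key)
        if isinstance(a, (int, float)) and isinstance(b, (int, float)):
            # Numbers: equal up to a small relative tolerance.
            if not math.isclose(a, b, rel_tol=rel_tol):
                mismatches.append(key)
        elif a != b:
            # Anything else (including a missing key): must match exactly.
            mismatches.append(key)
    return mismatches


published = {"n": 100, "mean": 49.8}
rerun = {"n": 100, "mean": 51.2}
print(compare_results(published, rerun))  # ['mean']
```

An empty list supports the original result; a non-empty list is the "negative output" case, and the named keys point to the part of the analysis that needs to be revisited.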
Observations from a study are sometimes re-examined using a different approach with unrelated analysis methods (Xie, 2014). In such cases, two analyses of the same research question carried out with different approaches may reach different findings and hence different conclusions, producing new observable results for the research.
The complexity of the data being analyzed has grown rapidly. Larger data sets require more sophisticated computational procedures (Leek and Peng, 2015), and each additional step is another place where an error can enter. This has increased the demand for reproducible research as a way to reduce the potential for error. Human error is a major and unavoidable source of mistakes; however, repeating a study to find errors helps achieve accurate results, and this is where reproducible research plays an important role.
Jupyter Notebook is used in reproducible research to give the entire process a structured design (Coombs, 2015). The procedure involves three phases: organizing and documenting the work, running the code, and preparing the work for sharing. Ten rules divide the procedure: telling a story for an audience, documenting the process, dividing the work into easy-to-follow steps, modularizing the code, recording dependencies, using version control, building a pipeline, explaining the data used, designing the notebook to be read, run and explored, and, finally, contributing to reproducible research.
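Three of these rules (modularizing the code, recording dependencies and building a pipeline) can be sketched in plain Python; in a notebook, each function would typically sit in its own cell. This is a minimal illustration, not a prescribed structure: the step names, the toy data and the package list passed to `record_dependencies` are all hypothetical.

```python
import json
import statistics
import sys
from importlib.metadata import PackageNotFoundError, version


def record_dependencies(packages: list[str]) -> dict:
    # Rule: record dependencies. Versions are looked up at run time;
    # packages that are not installed are recorded as None, not guessed.
    deps = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            deps[name] = version(name)
        except PackageNotFoundError:
            deps[name] = None
    return deps


# Rule: modularize the code. Each step is a small, testable function.
def clean(rows: list[dict]) -> list[dict]:
    # Drop rows with a missing measurement.
    return [r for r in rows if r.get("value") is not None]


def summarize(rows: list[dict]) -> dict:
    values = [r["value"] for r in rows]
    return {"n": len(values), "mean": statistics.mean(values)}


# Rule: build a pipeline. The steps always run in the same order.
PIPELINE = [clean, summarize]


def run(data, steps=PIPELINE):
    for step in steps:
        data = step(data)
    return data


if __name__ == "__main__":
    raw = [{"value": 4.0}, {"value": None}, {"value": 6.0}]  # hypothetical toy data
    output = {"result": run(raw), "dependencies": record_dependencies(["numpy", "pandas"])}
    print(json.dumps(output, indent=2))
```

Keeping the steps in an explicit list means a reader can rerun the whole analysis with one call, and the saved dependency record travels with the result it produced.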
