CSE5BIO Bioinformatics: Analyzing Biological Big Data Management

Added on  2023/06/12

Homework Assignment
AI Summary
This assignment focuses on the management and analysis of biological big data, emphasizing its sources, types, and challenges. It covers genomics, transcriptomics, proteomics, metabolomics, and cytometry, highlighting the need for noise removal and data organization. The assignment discusses bioinformatics tools, statistical analysis methods, and data storage systems like HDFS and NoSQL databases. It also addresses the integration of biological data from various sources and concludes with the impact of big data on bioinformatics, emphasizing the importance of public databases and data management projects. The document acknowledges various sources and research papers related to biological big data and bioinformatics.
BIOLOGICAL BIG DATA
MANAGEMENT
BIOLOGICAL DATA
Biological data is data that comes from living organisms.
It carries important information about a particular organism.
It can come from humans, animals, plants, and other organisms.
It is obtained with the help of biotechnology, bioinformatics, and other branches of science.
It can take the form of a sequence, image, pattern, model, hypothesis, or other evidence.
TYPES OF BIOLOGICAL DATA
Genomics
Nucleotide sequences
Gene studies, gene finding, mutation prediction, sequence alignment
Transcriptomics
RNA expression, transcription factors
Proteomics
Protein structure, interaction, identification
Protein prediction, structure and function prediction, structure comparison, molecular dynamics, simulation, docking
Metabolomics
Study of small molecules (metabolites)
Network analysis, pathway analysis
Cytometry
Cell-level study, cell populations
Population clustering, cell biomarker finding
Systems Biology
Integrated study of all of the above
Simulation, modeling, network studies, drug target prediction
BIOLOGICAL DATA
ANALYSIS
Nowadays biological data is generated on a large scale, which tends to reduce its quality
This data comes from different sources, such as metagenomic data analysis or big data analysis
The data is noisy and high-dimensional
The noise needs to be removed
The data needs to be put into an understandable format
Pipelines need to be prepared for the analysis of biological data
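The steps listed above (remove noise, put the data in an understandable format, prepare a pipeline) can be sketched in Python. This is only a minimal illustration under stated assumptions: the function names are hypothetical, the input is assumed to be FASTA-style text, and treating a sequence with more than 10% ambiguous "N" bases as noise is an arbitrary rule chosen for the example, not something the assignment specifies.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-style lines."""
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line.upper())  # normalize case
    if name is not None:
        yield name, "".join(chunks)


def is_noisy(seq: str, max_n_fraction: float = 0.1) -> bool:
    """Assumed noise rule: too many ambiguous 'N' bases, or an empty sequence."""
    if not seq:
        return True
    return seq.count("N") / len(seq) > max_n_fraction


def clean_pipeline(lines: Iterable[str], max_n_fraction: float = 0.1) -> Dict[str, str]:
    """Parse the input, drop noisy sequences, and return an id -> sequence mapping."""
    return {
        name: seq
        for name, seq in parse_fasta(lines)
        if not is_noisy(seq, max_n_fraction)
    }
```

Calling `clean_pipeline` on the lines of a small file returns only the sequences that pass the assumed noise rule, in a dictionary that later analysis steps can consume.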
Biological data analysis starts with the development of a sequence algorithm
It generates multiple copies side by side
The output needs statistical analysis
A specific algorithm is selected after setting its specific parameters
Development and testing need a large data storage and retrieval system
Handling big data such as genomics and proteomics data needs good data management methods
BIG DATA
BIG DATA MANAGEMENT
Bioinformatics tools still depend on files stored locally
Statistical analysis methods are also complex
Nowadays, handling big data is a big issue
Example: the META-pipe pipeline, from metagenomics, works when the dataset is small
Datasets are now very large, and the data is stored in a global file system
There are also distributed data storage systems such as HDFS, GPFS and the Google File System
Other systems are also available, such as RDBMS and NoSQL clusters
Relational databases (MySQL) are connected to distributed databases
Example: Neo4j is a graph database used in bioinformatics for protein interactions.
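To illustrate why a graph model suits protein interaction data, here is a minimal in-memory sketch in plain Python. It does not use Neo4j or its query language; the class name, method names, and the protein labels P1 to P4 are hypothetical placeholders chosen for the example.

```python
from collections import defaultdict
from typing import Dict, List, Set


class InteractionGraph:
    """Undirected graph: nodes are proteins, edges are interactions."""

    def __init__(self) -> None:
        self._neighbors: Dict[str, Set[str]] = defaultdict(set)

    def add_interaction(self, a: str, b: str) -> None:
        # Store the edge in both directions because interactions are symmetric.
        self._neighbors[a].add(b)
        self._neighbors[b].add(a)

    def partners(self, protein: str) -> List[str]:
        """Return the sorted interaction partners of one protein."""
        return sorted(self._neighbors.get(protein, set()))

    def shared_partners(self, a: str, b: str) -> List[str]:
        """Return the sorted proteins that interact with both a and b."""
        return sorted(
            self._neighbors.get(a, set()) & self._neighbors.get(b, set())
        )


graph = InteractionGraph()
for left, right in [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]:
    graph.add_interaction(left, right)

print(graph.partners("P3"))
print(graph.shared_partners("P1", "P2"))
```

Questions such as "which proteins interact with both P1 and P2?" become simple neighbor-set operations in this model, which is the kind of query a graph database is designed to answer at scale.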
INTEGRATION OF
BIOLOGICAL DATA
Everyone can store, access and analyze biological data that is available on the private side or on the web.
Integrating all biological data into a single source makes the data easy to access
Bioinformatics and IT provide solutions:
- NCBI Entrez
- Multi-databases (TAMBIS)
- Data warehousing
- Distributed data systems
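The idea of integrating data from several sources into a single source, as in data warehousing, can be sketched in Python. This is an assumption-laden illustration only: it does not use NCBI Entrez or TAMBIS, and the source names, record identifiers, and field names are all invented for the example.

```python
from typing import Any, Dict

Record = Dict[str, Any]


def integrate(sources: Dict[str, Dict[str, Record]]) -> Dict[str, Record]:
    """Merge records from several sources into one table keyed by a shared ID.

    Each field is prefixed with its source name so its origin stays visible.
    """
    merged: Dict[str, Record] = {}
    for source_name, records in sources.items():
        for record_id, fields in records.items():
            entry = merged.setdefault(record_id, {})
            for key, value in fields.items():
                entry[f"{source_name}.{key}"] = value
    return merged


# Hypothetical sources that describe the same genes under a shared identifier.
sources = {
    "sequence_db": {"G1": {"length": 1200}},
    "expression_db": {"G1": {"level": 3.5}, "G2": {"level": 0.2}},
}

warehouse = integrate(sources)
print(warehouse["G1"])
```

Prefixing each field with its source keeps provenance visible after merging, which matters when two sources disagree about the same record.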