Bioinformatics and Its Applications: Tools, Techniques, and Examples

Added on 2019/10/31

AI Summary

This report provides a comprehensive overview of bioinformatics and its various applications. It begins with a summary of bioinformatics, defining it as an interdisciplinary field integrating genetics, molecular biology, computer science, and statistics, and outlines its three fundamental sub-disciplines: algorithm creation, data evaluation, and tool application. The report then details the different types of biological data used in bioinformatics, including nucleic acids (DNA and RNA), structures of biological molecules, gene expression profiles, biochemical pathways, and phylogenetic data, along with the databases where this data is stored. The core of the report focuses on specific examples of bioinformatics applications, including Matcol, a cancer diagnostic tool; Genomic Target Scan (GT-Scan), a web-based tool for identifying potential genome targets; the development of a novel drug for viral diseases; and MEME Suite, a toolkit for motif-based sequence analysis. Each example is explained in detail, highlighting its functionality, applications, and significance in advancing biological research and healthcare. The report concludes by emphasizing the importance of bioinformatics in solving biological problems through the use of advanced information and computational technologies.

Running head: BIOINFORMATICS AND ITS APPLICATIONS 1
Bioinformatics and its Applications
Saleh Albahouth
S3494489

Contents
Summary
Specific examples of bioinformatics application
Matcol – A cancer diagnostic tool
Genomic Target Scan (GT-Scan)
Viral Diseases Drug
MEME Suite
Conclusion
Reference

Bioinformatics and its Applications
Summary
Bioinformatics is an interdisciplinary field comprising genetics, molecular biology,
mathematics, statistics and computer science. It can be envisioned simply as a merger of
biology, information technology (IT) and computer science (Can, 2014). Primarily,
bioinformatics aims to facilitate the discovery and characterisation of new biological insights and
to provide a global viewpoint from which unifying biological principles can be discerned.
Bioinformatics has three fundamental sub-disciplines, which are pursued by bioinformaticians
worldwide. The first is the creation of new statistics and algorithms that can be used to identify
relations among individual members of large data sets (Chiang, 2009). The second is the
evaluation and interpretation of different kinds of data, such as nucleotide and amino acid
sequences, protein structures and protein domains. The last sub-discipline is the creation and
application of tools that allow for systematic assessment and organisation of various types of
biological information. The definition and understanding of bioinformatics are not universal.
Researchers agree, however, that bioinformatics is the creation of advanced information and
computational technologies for solving biological problems (Wightman & Hark, 2012). As a
result, the field entails the storage, retrieval, evaluation and interpretation of biological data.
There are four steps in deriving a bioinformatics solution. The first step entails the
collection of statistics from biological data. In the second step, a computational model is built.
In the third step, the computational modelling problem is solved. The fourth step entails testing
and evaluating the resulting computational algorithm (Can, 2014).
Types of biological data

Various types of biological data are used in bioinformatics, including nucleic acids,
structures of biological molecules (mainly 3D structures), biochemical pathways, gene
expression profiles and phylogenetic data (Rigden, Fernández-Suárez, & Galperin, 2016). The
two kinds of nucleic acids are deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). DNA is
the genetic material of living organisms: it is inherited and passed from one generation to
the next.
Structures of biological molecules are important to biologists, since macromolecules
carry out most functions of the cell. Bioinformatics has a specific focus on the three-dimensional
structures of macromolecules. Gene expression profiling entails measuring the
expression of thousands of genes at once to build a global picture of cellular function.
DNA directs the synthesis of RNA and, through RNA, controls protein synthesis, a
process known as gene expression. Gene expression profiling can show which genes are active in
a cell, or how particular cells react to a given treatment. In gene expression profiling, the
expression of an entire genome can be measured simultaneously, meaning that every gene in a
given cell can be characterised. Several transcriptomics technologies are used to generate the
data needed for evaluation. DNA microarrays are designed to measure the relative activity of
previously identified target genes. Sequence-based techniques, such as RNA-Seq, provide
information on the sequences of genes as well as their expression levels. Mantione and colleagues
predict that RNA-Seq will form the future of data collection in bioinformatics (Mantione, et al., 2014).
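As a minimal sketch of how two expression profiles might be compared, the example below
computes a log2 fold change per gene. The gene names, the read counts and the pseudocount
are hypothetical values chosen for illustration; they do not come from any of the cited studies.

```python
import math

def log2_fold_change(control, treated, pseudocount=1.0):
    """Return log2(treated / control) for each gene.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no reads in one condition.
    """
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in control
    }

# Hypothetical read counts, not real data.
control = {"geneA": 99, "geneB": 49, "geneC": 0}
treated = {"geneA": 399, "geneB": 49, "geneC": 7}

changes = log2_fold_change(control, treated)
for gene, value in changes.items():
    print(gene, round(value, 2))
```

A positive value indicates higher expression after treatment, a negative value lower
expression, and a value near zero no change.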
Another kind of data used in bioinformatics is biochemical pathways. A biochemical
pathway is a series of interlinked chemical reactions occurring in a cell. Metabolites are the
primary reactants in metabolic pathways, and specific enzymes catalyse the reactions. In
metabolic pathways, the products of one enzyme act as the substrates for the next. In a
eukaryotic cell, specific metabolic pathways are localised to particular compartments according
to their role in those compartments. For example, the citric acid cycle takes place in the
mitochondrial matrix, while the electron transport chain and oxidative phosphorylation take
place in the inner mitochondrial membrane (Campbell, Farrell, & McDougal, 2016). On the other
hand, fatty acid biosynthesis, glycolysis and the pentose phosphate pathway take place in the
cytosol of a cell (Voet, Voet, & W, 2013).
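The idea that the product of one enzyme is the substrate of the next can be sketched as a
simple chain of reactions. The example below uses the first three steps of glycolysis; the
data structure and the function name trace_pathway are illustrative assumptions, not part of
any pathway database.

```python
# Each step is (enzyme, substrate, product): the first three steps of glycolysis.
STEPS = [
    ("hexokinase", "glucose", "glucose-6-phosphate"),
    ("phosphoglucose isomerase", "glucose-6-phosphate", "fructose-6-phosphate"),
    ("phosphofructokinase", "fructose-6-phosphate", "fructose-1,6-bisphosphate"),
]

def trace_pathway(start, steps):
    """Follow the chain from a starting metabolite until no enzyme accepts it."""
    by_substrate = {substrate: product for _, substrate, product in steps}
    route = [start]
    current = start
    # Stop when no step consumes the current metabolite, or when a cycle is detected.
    while current in by_substrate and by_substrate[current] not in route:
        current = by_substrate[current]
        route.append(current)
    return route

route = trace_pathway("glucose", STEPS)
print(" -> ".join(route))
```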
Phylogenetic data is also of great importance to biologists. Phylogeny is crucial in
bioinformatics because it explains how genes, genomes and species
evolve. For instance, phylogenetic data has been used successfully to study the evolution of
genomes and genes in Drosophila (Clark, et al., 2007). Molecular sequences are the main focus in
phylogeny. Molecular biologists assert that phylogenetic data can be used to trace the evolution
of a given sequence up to the present, and even to predict how the sequence will
change in the future.
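One simple way to quantify how far molecular sequences have diverged is the proportion of
differing positions between aligned sequences (the p-distance). The sketch below assumes three
hypothetical, already aligned sequences; real phylogenetic analyses use more elaborate
substitution models and tree-building methods.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

# Hypothetical aligned sequences for three species (not real data).
seqs = {
    "sp1": "ACGTACGTAC",
    "sp2": "ACGTACGTTC",
    "sp3": "TCGAACGTTG",
}

# Pairwise distance for every pair of species.
distances = {
    (m, n): p_distance(seqs[m], seqs[n]) for m, n in combinations(seqs, 2)
}
closest = min(distances, key=distances.get)
print(distances)
print("closest pair:", closest)
```

The pair with the smallest distance would be grouped first by a distance-based tree-building
method.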
These data sets are stored in various biological databases, which are
categorised by the specific data they hold. Currently, there are databases for nucleotides,
proteins, protein structures and genome maps. Two commonly used biological databases
are the Universal Protein Resource (UniProt) and the European Nucleotide Archive (ENA). UniProt
is a resource for protein sequences and functional annotation (UniProt Consortium, 2008). These
biological databases contain complete sets of nucleotide and protein sequences from all
organisms that have been published (deposited) by the international research community. There
are, however, specialised biological databases, such as organism-specific and functional databases.
Organism-specific databases contain sequence data from particular organisms, such as human
and mouse. Functional databases, on the other hand, include vector databases and TRANSFAC,
a database of transcription factors.
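Sequence databases commonly distribute their records in the plain-text FASTA format, in which
a header line beginning with ">" is followed by one or more lines of sequence. The sketch below
parses such text into a dictionary; the function name parse_fasta and the two example records
are hypothetical and are not taken from UniProt or ENA.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into {identifier: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The identifier is the first word after the ">" symbol.
            header = line[1:].split()[0]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    # Join the sequence lines of each record into a single string.
    return {name: "".join(parts) for name, parts in records.items()}

# Hypothetical records for illustration only.
example = """>seq1 hypothetical protein
MKTAYIAK
QRQISFVK
>seq2 hypothetical DNA
ACGTACGT
"""

records = parse_fasta(example)
print(records)
```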
Specific examples of bioinformatics application
Matcol – A cancer diagnostic tool
Matcol is a bioinformatics tool that has been developed to enhance the early detection of
cancer. It helps to identify protein and DNA colocalisation visualised through
fluorescence microscopy. The development of this tool was motivated by the fact that pixel
intensity-based coefficients cannot be used to study object-based colocalisation in biological
systems (Khushi, et al., 2017). Matloob Khushi, who developed the tool, noted that
analysing images one at a time is slow and can take many hours. The tool can therefore replace
manual colocalisation counting (Australian Cancer Research Foundation, 2017), and it can
be used in many areas of biology. The tool automates the traditional quantification task and can
quantify multiple, possibly hundreds of, images automatically in a short time. Matcol identifies
regions of fluorescent signal in two channels, determines the overlapping sections of these regions
and calculates the statistical significance of the colocalisation (CMRI, 2017). Its features
allow users to view an area of interest and customise several parameters to analyse the
region completely. Unlike traditional tools that focus on pixel intensity-based
correlation, Matcol is designed to visualise object-based colocalisation. It has a threshold
multiplier that filters out the background. Cannistraci and colleagues note that the removal of
background minimises the visualisation of false-positive signals (Cannistraci, Montevecchi, & Alessio,
2009). This bioinformatics tool is a breakthrough in cancer detection and might assist
researchers in designing novel therapies to treat cancer in its early stages.
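The general idea of object-based colocalisation can be sketched as follows: threshold each
channel, group the remaining pixels into objects, and count the objects in one channel that
overlap an object in the other. This is an illustrative sketch only and is not Matcol's actual
implementation. The threshold rule (a multiplier applied to the mean intensity), the use of
4-connected regions and the tiny example images are all assumptions, and the statistical
significance test is omitted.

```python
def background_threshold(image, multiplier):
    """Assumed rule: threshold = multiplier x mean pixel intensity."""
    pixels = [value for row in image for value in row]
    return multiplier * sum(pixels) / len(pixels)

def label_objects(image, threshold):
    """Group above-threshold pixels into 4-connected objects (sets of coordinates)."""
    rows, cols = len(image), len(image[0])
    seen, objects = set(), []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] <= threshold or (r, c) in seen:
                continue
            stack, region = [(r, c)], set()
            seen.add((r, c))
            while stack:  # flood fill from the seed pixel
                y, x = stack.pop()
                region.add((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and (ny, nx) not in seen and image[ny][nx] > threshold:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            objects.append(region)
    return objects

def count_colocalised(channel_a, channel_b, multiplier=1.5):
    """Return (objects in A overlapping B, objects in A, objects in B)."""
    objs_a = label_objects(channel_a, background_threshold(channel_a, multiplier))
    objs_b = label_objects(channel_b, background_threshold(channel_b, multiplier))
    overlapping = sum(1 for a in objs_a if any(a & b for b in objs_b))
    return overlapping, len(objs_a), len(objs_b)

# Hypothetical 4x4 intensity images for two fluorescence channels.
red = [[9, 9, 0, 0], [9, 9, 0, 0], [0, 0, 0, 0], [0, 0, 0, 8]]
green = [[0, 0, 0, 0], [0, 7, 7, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

result = count_colocalised(red, green)
print(result)
```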

Genomic Target Scan (GT-Scan)
GT-Scan is a web-based bioinformatics tool that ranks the possible targets
in a user-chosen region of a genome by their number of potential off-targets (O’Brien &
Bailey, 2014). The tool offers users the flexibility to specify the required
attributes of targets as well as off-targets through a straightforward “target rule”. In addition,
GT-Scan delivers an interactive output that allows comprehensive scrutiny of all the potential
candidate targets. GT-Scan is mainly used to identify the most favourable targets for
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/Cas systems (O’Brien &
Bailey, 2014). However, because it is highly flexible, the tool can also be used for other
genome-targeting techniques.
GT-Scan builds on the basic idea of genome targeting. The initial stage in the successful
application of a genome-targeting technique is to determine the potential target or targets within
the region of interest that have the fewest off-targets. The region of interest might be, for
example, a gene, a promoter or an exon. The ideal targets are subsequences within
the desired region that have no similar copies elsewhere in the genome.
An attribute that makes GT-Scan reliable is its interactive output. The interactivity of the
output enables users to evaluate a potential target and the traits of its possible off-targets.
These include the positions of mismatches, the number of mismatches and the genomic location.
Currently, the web site supports target selection in over 25 Ensembl genomes (O’Brien &
Bailey, 2014). When using GT-Scan, researchers choose a suitable genome from a list and
submit the DNA sequence of the genomic region in which they want to find ideal targets.
Several options are available depending on how users want to perform the identification.
A user can select a predefined rule pair or design a custom one. A candidate target is a
position in the specified genomic region that matches the target rule. For each candidate target,
the tool records the possible off-targets in the genome that have fewer than three mismatches
with the candidate target and that match the off-target filter. The researcher can also
independently control the number of mismatches allowed in off-targets. In summary, GT-Scan helps
users to answer two specific questions. The first is which candidate targets in
the genomic region of interest are ideal. The second is how many potential off-targets each
candidate has in the target genome being used by the researcher (GT-Scan, n.d.).
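The search described above can be sketched in a few lines. The example is an illustration of
the general idea and not GT-Scan's implementation: it scans the forward strand only, treats the
target rule as a pattern in which "N" matches any base, and uses a toy 7-base rule and a short
hypothetical genome. The function names are assumptions, and the region is assumed to be part
of the genome so that each candidate's own site can be discounted.

```python
def matches_rule(site, rule):
    """True if the site has the rule's length and agrees at every non-N position."""
    return len(site) == len(rule) and all(
        r == "N" or r == s for s, r in zip(site, rule)
    )

def find_sites(sequence, rule):
    """Return every window of the sequence that matches the rule."""
    k = len(rule)
    windows = (sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return [w for w in windows if matches_rule(w, rule)]

def mismatches(a, b):
    """Number of positions at which two equal-length sites differ."""
    return sum(x != y for x, y in zip(a, b))

def rank_targets(region, genome, rule, max_mismatches=2):
    """Rank candidate targets in the region by their number of off-targets.

    max_mismatches=2 mirrors the "fewer than three mismatches" criterion in the text.
    """
    genome_sites = find_sites(genome, rule)
    ranked = []
    for candidate in find_sites(region, rule):
        hits = [s for s in genome_sites if mismatches(candidate, s) <= max_mismatches]
        # Discount the candidate's own site (assumes the region lies in the genome).
        ranked.append((max(len(hits) - 1, 0), candidate))
    return sorted(ranked)

# Hypothetical genome and region of interest; toy rule: five free bases then "GG".
genome = "ACGTAGGTTTTTACGTTGGCCCCCTTAGCGG"
region = genome[12:]
rule = "NNNNNGG"

ranked = rank_targets(region, genome, rule)
for off_targets, target in ranked:
    print(target, "off-targets:", off_targets)
```

Candidates at the top of the list have the fewest similar sites elsewhere in the genome and
would therefore be preferred.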
Viral Diseases Drug
Australian researchers, in collaboration with international partners, have developed a novel
drug for the treatment of common viral diseases. The drug was developed following analyses of
nucleotide sequences and gene expression profiles. In their quest to characterise the occurrence
of viral diseases, the researchers found that NOX2 oxidase is activated by single-stranded DNA and
RNA viruses in endocytic compartments. Once triggered, NOX2 generates endosomal hydrogen
peroxide, which suppresses the body’s humoral and antiviral signalling networks (To, et al., 2017).
As a result, the body’s ability to fight viral diseases is weakened, and the viral infection
becomes more virulent. Many people experience this pathogenesis because NOX2 is activated in most
viral infections, such as the common cold, influenza, HIV and dengue fever. The primary research
on the action of NOX2 was based on mice, and human subjects are yet to be included in the study.
NOX enzymes are absent from prokaryotes but evolved approximately 1.5 billion years
ago in single-celled eukaryotes. The enzyme is present in eukaryotic groups such as algae, fungi,
amoebae and nematodes. After characterising the enzyme, the team of scientists designed a novel
drug that proved effective in mice. Specifically, the prototype drug inhibited the effect of NOX2
oxidase in mice and suppressed the disease caused by influenza infection.
However, the drug is still undergoing development and is not expected to be available for human
use for about five years. This novel antiviral drug, developed using bioinformatics technology,
aims to improve the efficiency of treatment. Current treatment techniques are limited because
they target circulating viruses and have an uncertain or minimal impact against new viruses that
affect humans.
Empirical evidence suggests that the flu virus results in the hospitalisation of 13,500
Australians and in 3000 deaths among the population aged over 50 years. The global
burden of the flu virus is also increasing. It has been found that approximately five million
cases of infection are reported annually, and about 10% of these cases lead to death (Kenrick, 2017).
Based on these findings, the discovery of a drug for viral diseases is a major milestone towards
addressing the disease burden.
Meme Suite
MEME Suite is a web-based software toolkit for conducting motif-based sequence
analysis and is accessible through meme-suite.org (MEME Suite, n.d.). Motif-based sequence
analysis is important in many scientific contexts. As such, the MEME Suite is a
fundamental toolkit for studying biological processes involving RNA, DNA and proteins.
The toolkit has been used in the analyses reported in approximately 9800 published papers
(Bailey, et al., 2015). With the rise of proteomics and genomics, many more researchers will need
to conduct motif analysis, and thus MEME Suite will become more important. Even before the advent
of these fields of study, MEME Suite was widely used for biological discoveries.
On the web, MEME Suite contains several tools and integrated databases used to perform
motif analyses. The basis of the suite is the MEME motif discovery algorithm. MEME
searches for motifs in unaligned collections of DNA, RNA or protein sequences. Since its
launch, MEME has continued to gain popularity in the scientific community. For instance, it
attracted many unique users in 2014 alone (Bailey, et al., 2015).
The MEME Suite was developed based on the existing understanding of motifs. An RNA,
DNA or protein sequence motif is a short pattern that is conserved by evolution. In each of
these sequence types, a motif might correspond to different kinds of sites. For instance, DNA
motifs might correspond to specific protein-binding
sites. In proteins, on the other hand, motifs might correspond to the active sites of enzymes, or
to structural units essential for the correct folding of the
specific protein. Hence, a sequence motif is among the elementary functional units of molecular
evolution. For these reasons, determining and characterising motifs is important for designing
models of cellular processes. The identification of motifs is also important for understanding
the mechanisms and pathophysiology of human diseases.
The MEME Suite toolkit consists of 13 tools for conducting motif discovery, motif
enrichment analysis, motif scanning and motif-motif comparison. The newest six tools in the
MEME Suite toolkit are MCAST, DREME, MEME-ChIP, AME, CentriMo and SpaMo (Bailey, et
al., 2009). When performing motif discovery and motif enrichment analysis, users supply a set of
unaligned DNA, RNA or protein sequences. Typically, the sequences may be
promoters of coexpressed genes or proteins with a common role.
Motif discovery locates de novo motifs in the submitted sequences. A researcher can then
pass a motif directly to the scanning and comparison tools within the MEME Suite to
determine whether any other protein or genomic sequence contains the identified motif. This
process might also aim to discern whether the motif is similar to a previously studied motif.
The Suite offers a wide array of genomic and proteomic sequence databases for motif scanning and
numerous motif databases for motif comparison.
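Scanning a sequence for a known motif can be sketched with a position weight matrix built
from a handful of aligned sites. This is a simplified illustration and not the MEME Suite's
own code; in particular it does not reproduce the MEME discovery algorithm. The example sites,
the pseudocount, the uniform background and the function names are assumptions.

```python
import math

def build_pwm(sites, alphabet="ACGT", pseudocount=1.0):
    """Build a log-odds position weight matrix from equal-length aligned sites.

    Assumes a uniform background (each base equally likely).
    """
    background = 1 / len(alphabet)
    total = len(sites) + pseudocount * len(alphabet)
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in alphabet
        })
    return pwm

def best_match(sequence, pwm):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    scores = [
        (sum(pwm[j][sequence[i + j]] for j in range(width)), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(scores)

# Hypothetical aligned binding sites and a sequence to scan.
sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
pwm = build_pwm(sites)

score, position = best_match("GGCCTATAATGGCC", pwm)
print("best match starts at", position, "with score", round(score, 2))
```

Windows that resemble the aligned sites receive high scores, while unrelated windows score
close to or below zero.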
The MEME Suite toolkit has a flexible and straightforward user interface to facilitate fast
motif analysis. Each input field explains the specific information that is needed and how to
enter it, and in most cases an example is provided. A question mark (?) icon directs the user
to the relevant help (Bailey, et al., 2015). The whole interface of the MEME toolkit is
flexible and consistent. For instance, a user can enter the required information in a given field
by typing, by cutting and pasting, or by choosing a file to upload.
Conclusion
Australia has a huge opportunity in the growing bioinformatics industry. Specifically,
there is an opportunity to leverage the benefits of bioinformatics in medical and health
research. Stakeholders across the board now assert that the importance of bioinformatics
stretches beyond biotechnology into medical and health research. The application of computers
and information technology to match and analyse gene sequences allows a better understanding
of the causes of diseases and of differences in their impact across populations. The application
and utilisation of these techniques will give the country a competitive advantage in the
pharmaceutical industry. This paper has highlighted the Matcol tool, MEME Suite, the development
of a drug for viral diseases and GT-Scan as some of the breakthroughs in the application of
bioinformatics. Although some of these applications are still in the trial stage, they are
fundamental in laying the groundwork for major achievements through bioinformatics.

Reference
Australian Cancer Research Foundation. (2017, 9 12). New bioinformatics tool to improve the
early detection of cancer. Retrieved 9 12, 2017, from
https://acrf.com.au/news/new-bioinformatics-tool-to-improve-the-early-detection-of-cancer/
Bailey, T., Boden, M., Buske, F., Frith, M., Grant, C., Clementi, L., & Noble, W. (2009). MEME
SUITE: tools for motif discovery and searching. Nucleic acids research, 37(suppl_2),
W202-W208.
Bailey, T., Johnson, J., Grant, C., & Noble, W. (2015). The MEME suite. Nucleic acids
research, 43 (W1), W39-W49.
Campbell, M. K., Farrell, S. O., & McDougal, O. M. (2016). Biochemistry. Cengage Learning.
Can, T. (2014). Introduction to bioinformatics. Methods in Molecular Biology, 1107, 51-71.
Cannistraci, C., Montevecchi, F., & Alessio, M. (2009). Median-modified Wiener filter provides
efficient denoising, preserving spot edge and morphology in 2-DE image processing.
Proteomics, 9(1), 4908-4919.
Chiang, J. H. (2009). Tech-ware: Bioinformatics and computational biology resources [Best of
the Web]. IEEE Signal Processing Magazine, 26(5), 153-158.
Clark, A., Eisen, M., Smith, D., Bergman, C., Oliver, B., Markow, T., . . . Pollard, D. (2007).
Evolution of genes and genomes on the Drosophila phylogeny. Nature, 450(7167), 1-56.
CMRI. (2017, 9 7). Researcher Devises Tool To Speed Up Cancer Discovery. Retrieved 9 12,
2017, from http://www.cmri.org.au/News/Latest-News/CMRI-researcher-devises-tool-to-