Bioinformatics: Variation in Genes and Protein Structure

Added on  2023/04/20

AI Summary
This document discusses the variation in genes and protein structure through bioinformatics analysis. It explores the scope and aim of bioinformatics and its relevance in understanding molecular structure. The document also includes the BLAST format of the gene and the alignment of potential isoforms with given proteins.

BIOINFORMATICS
Name of Institution:
Name of Student:
Contents
Introduction
Literature
Fundamental bioinformatics principles, scope and aim
Scope
Goal
The BLAST format of the gene
The alignment of these potential isoforms with the given proteins as follows
The variation of the protein P19801-2 for the gene CASP9
Graphical representation of variation in genes and the regions of genome
Explanation on the above graphs
The helical stretching curves that represent the above graphs
Mathematical analysis of the gene sequences
The use of covariance in structure classification
Relevance of theoretical, mathematical and statistical analysis in genes and regions of genome
Result
Discussion
References
Introduction
Nucleotide sequences can be used to study the variation between two species. DNA differs
from one organism to another, and this variation helps in understanding the unique nature of each
protein and in analyzing the evolution of genes (Samish, et al., 2015). Comparing a DNA
sequence across different organisms helps in understanding molecular structure, because the
sequence of the DNA underlies the differences in protein structure.
One method that can be used to determine the variation in DNA between organisms is to compare
protein structure. How closely two species are related by evolution ought to be reflected in the
differences between their protein sequences (Peng, et al., 2012), for example, the sequences
of chimpanzees and humans. Another way of determining the variation is DNA
hybridization, “which is the process of hybridizing the genetic information from two different
organisms to ascertain the similarities between them (Moore, et al., 2010). A scientist separates
the strands of DNA from the two species using heat, which breaks the bonds between the
base pairs that link the two sides of the double helix” (Peng, et al., 2012). The DNA is then chopped
into small parts, which are mixed together to reveal genetic similarity.
Variation can also be determined through gene sequencing, which yields a large
amount of genetic information that is used to compare one gene to another (Marz, et al., 2014).
This approach looks for strands of DNA, RNA or protein that exhibit
homologous sequences, or sequences with similar genes in them. The closer the evolutionary
relationship, the less the similar gene structure has changed over
that period (Moore, et al., 2010).
Variation can also be ascertained by the use of mitochondrial DNA. Mitochondria are used to assist
in reconstructing evolutionary relationships among humans (Karnovsky, et al., 2012).
Mitochondria contain their own DNA, and because it is inherited exclusively from the mother
rather than the father, scientists can examine human mitochondrial DNA to reconstruct
the genetic history of where we came from (Jelizarow, et al., 2010). However, this method is not
perfect according to Action Science, which has criticized the fact that mitochondrial DNA has a
high mutation rate and may therefore not be accurate (Fernald, et al., 2011).
Literature
Genetic information on variation has been gathered from all over the world, and
bioinformatics has been practiced for more than fifty years (Troshin, et al., 2011).
Variation has accumulated over that period, which has driven the need for research.
The foundations were laid in the 1960s with the publication of computerized
sequence analysis methods (notably de novo sequence assembly, biological
sequence databases and substitution models). Later on, DNA analysis advanced
thanks to other disciplines, for example (i) molecular biology methods, which made DNA
easier to manipulate and to sequence (Tikhvinskiy & Porozov, 2013), and (ii) computer
science, which delivered miniaturized and more powerful computers as well as
novel software that significantly improved bioinformatics information management. The
information generated by running the data through BLAST is processed with this
technology to bring out clear variation in the protein P19801-2.
The protein in focus continues to be P19801-2 (gene AOC1), and related information regarding the
protein is derived from UniProt (Suplatov, et al., 2011). The BLAST format is used for the presentation
of the sequence, and BLAST is used for the alignment information, as per
(Fernald, et al., 2011).
Fundamental bioinformatics principles, scope and aim of bioinformatics.
Bioinformatics is an interdisciplinary field between two disciplines: computer science and biological
science (Marz, et al., 2014). Many definitions exist on the World Wide Web, of
which some are more inclusive than others. For example, bioinformatics, being a union of
informatics and biology, can be described as the technology that uses computers for the
storage, retrieval, manipulation and functional analysis of genes (Karnovsky, et al., 2012).
Scope
Bioinformatics comprises two subfields: the development of computational tools and databases, and
the application of these tools and databases to generate biological knowledge to better understand
organisms and their genetic makeup (Karnovsky, et al., 2012). The two subfields work hand in
hand. Tool development includes writing software for sequence, structure and function analysis, as
well as developing biological analytical software (Karnovsky, et al., 2012). New
software can help solve the problems of handling biological data.
The interaction of these three aspects of bioinformatics analysis with the databases produces
the integrated results that are desired in biological research (Marz, et al., 2014).
Refer to
http://www.cambridge.org/9780521840989
Goal
Despite the distinction, bioinformatics helps to understand a living cell, its function and
molecular structure. Bioinformatics can generate new insight and provide a global perspective of
the cell through analyzing raw molecular and structural data. How
the cell functions can be understood by analyzing its genetic makeup, for instance
how its DNA is transcribed to RNA.
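As a small illustration of the DNA-to-RNA step mentioned above, the following sketch converts a coding-strand DNA string into the corresponding RNA string. The function name and the example sequence are hypothetical and chosen only for illustration; they do not come from the analysis in this document.

```python
# Minimal sketch of the DNA -> RNA step for a coding-strand sequence.
# transcribe() and example_dna are illustrative assumptions, not document data.

def transcribe(coding_dna: str) -> str:
    """Return the RNA string for a coding-strand DNA sequence (T becomes U)."""
    dna = coding_dna.upper()
    unexpected = set(dna) - set("ACGT")
    if unexpected:
        raise ValueError(f"non-DNA characters found: {sorted(unexpected)}")
    return dna.replace("T", "U")


example_dna = "ATGCCAGCTCTG"
example_rna = transcribe(example_dna)
print(example_rna)
```

Working from the coding strand keeps the sketch to a single substitution; starting from the template strand would additionally require taking the reverse complement.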
The BLAST format of the gene:
>sp|P19801-2|AOC1_HUMAN Isoform 2 of Amiloride-sensitive amine oxidase [copper-containing] OS=Homo sapiens OX=9606 GN=AOC1
mpalgwavaailmlqtamaepspgtlprkagvfsdlsnqelkavhsflwskkelrlqpss
tttmakntvfliemllpkkyhvlrfldkgerhpvrearaviffgdqehpnvtefavgplp
gpcymralsprpgyqsswasrpistaeyallyhtlqeatkplhqfflnttgfsfqdchdr
claftdvaprgvasgqrrswliiqryvegyflhptglellvdhgstdaghwaveqvwyng
kfygspeelarkyadgevdvvvledplpggkghdsteepplfsshkprgdfpspihvsgp
rlvqphgprfrlegnavlyggwsfafrlrsssglqvlnvhfggeriayevsvqeavalyg
ghtpagmqtkyldvgwglgsvthelapgidcpetatfldtfhyydaddpvhypralclfe
mptgvplrrhfnsnfkggfnfyaglkgqvlvlrttstvynydyiwdfifypngvmeakmh
atgyvhatfytpeglrhgtrlhthlignihthlvhyrvdldvagtknsfqtlqmklenit
npwsprhrvvqptleqtqyswerqaafrfkrklpkyllftspqenpwghkrtyrlqihsm
adqvlppgwqeeqaitwarteggqpralsqaaspvpgryplavtkyreselcsssiyhqn
dpwhppvvfeqflhnnenienedlvawvtvgflhiphsedipntatpgnsvgfllrpfnf
fpedpslasrdtvivwprdngpnyvqrwipedrdcsmpppfsyngtyrpv
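A record like the one above can be split into its header fields and sequence with a few lines of code. The sketch below is only an illustration: `parse_fasta_record` is a hypothetical helper, the header description is shortened, and only the first sequence line of the entry is included.

```python
# Minimal sketch: split a UniProt-style FASTA record into header fields and sequence.
# parse_fasta_record is a hypothetical helper, not part of any library.

def parse_fasta_record(record: str) -> dict:
    lines = [ln.strip() for ln in record.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("a FASTA record must start with '>'")
    header = lines[0][1:]
    # The identifier block has the form db|accession|entry_name.
    ids, _, description = header.partition(" ")
    parts = ids.split("|")
    if len(parts) != 3:
        raise ValueError("expected a header of the form db|accession|entry_name")
    database, accession, entry_name = parts
    return {
        "database": database,
        "accession": accession.upper(),
        "entry_name": entry_name.upper(),
        "description": description,
        # Sequence lines are joined and upper-cased for consistency.
        "sequence": "".join(lines[1:]).upper(),
    }


# Shortened excerpt of the record shown above (first sequence line only).
record = """>sp|p19801-2|aoc1_human isoform 2 os=Homo sapiens ox=9606 Gn=aoc1
mpalgwavaailmlqtamaepspgtlprkagvfsdlsnqelkavhsflwskkelrlqpss
"""

parsed = parse_fasta_record(record)
print(parsed["accession"], parsed["entry_name"], len(parsed["sequence"]))
```

Normalizing the case of the accession and sequence makes later comparisons against database entries less error-prone, since the record in this document is written in lower case.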
The variations associated with the protein P19801-2 are as follows.
The alignment of these potential isoforms with the given proteins as follows:
The proteins associated with these organisms are, namely, Eukaryote (Biosample)
(JH921433.1 MULTIGERMTUBI), Eukaryote (Biosample) (JH921437.1), A
(A0A2J8RK70_PONAB) and Ascomycota (JH921451.1) (Karnovsky, et al., 2012).
The alignment of the variant sequences is as follows:
P|P19801-2|AOC1_EUKARYOTA
1 MELPITAKAG QQQRQLDRFC RQYIQVTLEL DYPAEEYLRL DAIQQSIFRR CFSEDVEYTP
61 PPRYKLRVLK ELVKRIETSI QDWDEEAISD DLMNCLTPLL SMSMPNEATA AQQKSHVTYT
121 LSLLPRQQDI SPSITLHEAR NMLAAAGTTG LRTWEAGLHL GNYLCTNPHL VRGKSILELG
181 SGTGFLSILC AKYLKPSHVL ATDGDDDVVA SFSTNFYLNG LQDSSDLNGR ALKWGHPVTG
241 GEDPHWDPER PVDLVLGADL TYDPRNIPPL VSTFRDLFAL YPDAKILIAA TVRSQETFAK
301 FPEACRKNDF GFEDIEFGML KSEDQEGPFY SDLAHVQLCV ITRT
************************************************************
P|P19801-2|AOC1_ASCOMYCOTA
1 MQLSEAWRKY LTSRSTTYLS TAATHPKTEK KMSPKQARMK TASPSSATMS LDDLQLQRSA
61 QNPMSQNISP RLLQDGGPGS SSVSELEKAP SLRRPQKVVT VIVGPEDTKE TYIIHKGIIC
121 YYSPFFNAAF NGNFAEGETQ TMRLDDVNSE TFGLLVDYLY TQQIDVDPKD YDGNIIPLAQ
181 LWVIAGRFFM PALQNKIMNE LRTMVEWAEE EGLRKFMHFA YEASVERTPL KSLATDMMAW
241 MTPAAGLQIW ITKGYLPDGM MADIIMSLKK DHIFGAKPTR KFGVLGRAKE YYVRVGEEAA
301 APKQEKQGVE KMPTPYFSES CVNIALPPLA LSHIKDNLST SPDMASIIVP EFVKREQQQL
361 LQAEPEELVT IIVSDGDEEE EYMVHRELIC SCSVYFSYIF NSVIDGSKDN SVTTLRLEDT
421 DPEIFGLVVR WIYTSDIESA EALSLAKLWM LCAEVHMPVL QNRAMDKIRS LLRAGVWPGE
481 NLDDIKALVD YAFDANTDRL DRFPLLQKAL VDHFAYLPTG ALDTWMEHLP ALLLVHLTKS
541 LNSHFNRLPM DLQSWELRKD EQYHVEVLDD RE
************************************************************
P|P19801-2|AOC1_Dermateaceae
1 MELPITAKAG QQQRQLDRFC RQYIQVTLEL DYPAEEYLRL DAIQQSIFRR CFSEDVEYTP
61 PPRYKLRVLK ELVKRIETSI QDWDEEAISD DLMNCLTPLL SMSMPNEATA AQQKSHVTYT
121 LSLLPRQQDI SPSITLHEAR NMLAAAGTTG LRTWEAGLHL GNYLCTNPHL VRGKSILELG
181 SGTGFLSILC AKYLKPSHVL ATDGDDDVVA SFSTNFYLNG LQDSSDLNGR ALKWGHPVTG
241 GEDPHWDPER PVDLVLGADL TYDPRNIPPL VSTFRDLFAL YPDAKILIAA TVRSQETFAK
301 FPEACRKNDF GFEDIEFGML KSEDQEGPFY SDLAHVQLCV ITRT
************************************************************
P|P19801-2|AOC1_Helotiales
1 MDPPSEEPAT PKKHTRNASS GRGALPRRPT RGPLEVADSP ARPSIKSTPP PNLRQQQSSQ
61 LSTTASQHAT PRVPSPGPGS NLTASFVTAR TSLSPSRPGS RSKDTSNMST TSPAIQKDFS
121 FLVRPEIYHP LTLLDVPPPF RAASSEIDPS TSLASLLSSG HFRNAAIKAA QLLTAPGLDV
181 KDHAAIFSLV YTRLSCLTLC NQTPLAAQEV KALEDLNSGY YRDDLTGAHM VPWELRVLAV
241 RLQGMGFNDA RRGVMGYYDL AREARSMLNK LKRKRRKEEI GDDAARAELE GIKVWEARLE
301 ELGVRVASAL VEMEDLEGAA RFLGTLKPET GTRLEIQKAL LWLCIGDVEA ARKCVLGKGD
361 GHEQKVILAL AHMADSDFAA AVETWRALIA SDAAEDDGGE KAMWMQNLGV CYLYLGRMDE
421 ARTLLESLIS GAQDLHAFHF HALTFNLCTI YELCTERSRG LKIALAERVA GMQQQQGDGG
481 GSNGGWEKVN GDFKL
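One simple way to quantify how much the listed sequences differ is a position-by-position percent identity. The sketch below is an illustration under stated assumptions: `percent_identity` is a hypothetical helper, it compares equal-length strings without gaps (which is much cruder than the gapped local alignment BLAST performs), and it uses only the first 60-residue row of three of the blocks above.

```python
# Minimal sketch: ungapped, position-by-position percent identity.
# percent_identity is a hypothetical helper; BLAST itself uses gapped local
# alignment, so this is only a rough stand-in for illustration.

def percent_identity(seq_a: str, seq_b: str) -> float:
    a = seq_a.replace(" ", "").upper()
    b = seq_b.replace(" ", "").upper()
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# First row (residues 1-60) of three of the blocks listed above.
eukaryota_row1 = "MELPITAKAG QQQRQLDRFC RQYIQVTLEL DYPAEEYLRL DAIQQSIFRR CFSEDVEYTP"
dermateaceae_row1 = "MELPITAKAG QQQRQLDRFC RQYIQVTLEL DYPAEEYLRL DAIQQSIFRR CFSEDVEYTP"
ascomycota_row1 = "MQLSEAWRKY LTSRSTTYLS TAATHPKTEK KMSPKQARMK TASPSSATMS LDDLQLQRSA"

print(percent_identity(eukaryota_row1, dermateaceae_row1))
print(percent_identity(eukaryota_row1, ascomycota_row1))
```

The first comparison reflects that the Eukaryota and Dermateaceae rows are written identically in the listing; the second shows how quickly an ungapped comparison drops for unrelated-looking rows, which is why a proper alignment tool is needed before drawing conclusions.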
The variation of the protein P19801-2 for the gene CASP9 (Karnovsky, et al., 2012).
https://www.biorxiv.org/content/biorxiv/early/2017/12/15/234856.full.pdf
The variation related to CASP9 was carried by the patient. The graphical representation is of the
caspase-9 gene: boxes represent the exons (Moore, et al., 2010); arrows show the
mutations; (b) numbers indicate amino acid positions. CARD: caspase recruitment domain; LD:
the linker between the two subunits; LSCD: large subunit catalytic domain; small catalytic domain
(Karnovsky, et al., 2012).
Graphical representation of variation in genes and the regions of genome (Jelizarow, et al.,
2010) (Marz, et al., 2014) (Peng, et al., 2012).
(Moore, et al., 2010)
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5411762/
(Jelizarow, et al., 2010)
library.cshl.edu/resources/databases/bioinformatics-genomic
(Peng, et al., 2012)
(Moore, et al., 2010)
(Troshin, et al., 2011)
library.cshl.edu/resources/databases/bioinformatics-genomic
https://omictools.com/genome-annotation2-category
(Karnovsky, et al., 2012)
Explanation on the above graphs
The expression of ICOS, CD25 and HLA-DR was assessed in CD3+, CD4+ and CD8+
T cells by two-color immunofluorescence (Samish, et al., 2015). The results showed that CD19+
PBMC H237P expressed lower levels of BAFFR than CD19+ PBMCs GFP, whereas the
expression of TACI, HLA-DR, B7h, and CD19 was similar in the two cell preparations (data not
shown). Moreover, CD3+ PHA-PBMC H237P expressed lower levels of ICOS than CD3+
PHA-PBMCs GFP, which was detected in both the CD4+ and CD8+ T-cell subsets (data not
shown), whereas expression of CD3, CD4, CD8, CD25, and HLA-DR was similar in the two cell
preparations (data not shown) (Samish, et al., 2015).
The helical stretching curves that represent the above graphs.
Consider the path from 1MP6 to 2K98. It takes the left-most structure, 1MP6, and transforms its
shape into 2K98, the right-most structure, by bending the helical structure in the middle. The
elasticity approach gives several meaningful details of the geodesic path, since bending and
stretching are not compromised (Samish, et al., 2015). The curves can also be constrained by
manipulating the geodesic paths, which shows how conformational changes or dynamics of protein
structures can be compared (Samish, et al., 2015).
https://omictools.com/genome-annotation2-category
Mathematical analysis of the gene sequences.
www.internationalgenome.org/
The use of covariance in structure classification:
The key feature is the ability to calculate a covariance for a population of protein structures. The
covariance captures the variation within a given group of protein structures, and it can be used to
classify protein structures (Suplatov, et al., 2011). The directions of major variation can also be
identified in a large group of proteins. Let the shape space be S. The mapping that can be
used is (Samish, et al., 2015):
The vectors vi are obtained by taking the displacements from the mean μ to the qi. The
covariance matrix can then be decomposed by singular value decomposition as UΣUᵀ,
“where Σ is a diagonal matrix of singular values (σ1, σ2, σ3, …) and U contains the
corresponding singular vectors (Suplatov, et al., 2011). If the singular values are arranged in
decreasing order, the first few, say k, columns of U represent the directions of major variation, or
the principal components, in the underlying population. If we let z1, z2, …, zk be independent
standard normal random variables, we can define a multivariate normal density on the
direction v according to” an expression obtained from (Jelizarow, et al., 2010).
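The decomposition described above can be written out explicitly. The following is a sketch under assumptions rather than the expressions of the cited work: the symbols K (covariance matrix), n (number of structures), v_i (the vector from the mean μ to the i-th structure q_i) and U_j (the j-th column of U) are introduced here for illustration, and the last line is the standard way of drawing a random direction from a normal density along the first k principal directions when the σ_j are the variances on the diagonal of Σ.

```latex
% Sample covariance of the tangent vectors v_1, ..., v_n (assumed notation)
K = \frac{1}{n-1} \sum_{i=1}^{n} v_i v_i^{T}

% Singular value decomposition of the symmetric matrix K
K = U \Sigma U^{T}, \qquad
\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \sigma_3, \ldots), \qquad
\sigma_1 \ge \sigma_2 \ge \sigma_3 \ge \cdots

% Random direction built from the first k principal components
v = \sum_{j=1}^{k} z_j \sqrt{\sigma_j}\, U_j, \qquad
z_j \sim \mathcal{N}(0, 1) \ \text{independent}
```

With this choice the random direction v has covariance equal to the rank-k truncation of K, which is why only the leading columns of U are needed.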
The random direction can also be converted into an SRVF, giving a random shape through the mapping,
which can be further integrated as:
The variance can be obtained as:
$s^2 = \sum (X_i - \bar{X})^2 / (N - 1)$
The deviation of the five genetic components of the P19801-2 protein, with values 4, 36, 45, 50 and 75,
can be obtained as follows:
N = 5 and the mean of X is 42
Xi      Xi-X    (Xi-X)2
4       -38     1444
36      -6      36
45      3       9
50      8       64
75      33      1089
∑Xi = 210       ∑(Xi-X)2 = 2642
From the outcome, the variance is 2642/4 = 660.5, and its standard deviation is √660.5 ≈ 25.7.
The coefficient of variation is calculated as the standard deviation divided by the mean:
$CV = s / \bar{X}$
Here this gives 25.7/42 ≈ 0.61, which indicates the size of the variation among the different
components of the protein that results in its differentiation among organisms (Karnovsky, et al., 2012).
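The figures in this calculation can be rechecked with a few lines of code. The sketch below assumes the sample formula (division by N - 1), which is what the stated variance of 660.5 = 2642/4 implies; with that formula the five tabulated values give a standard deviation of about 25.7 and a coefficient of variation of about 0.61. The variable names are illustrative.

```python
# Minimal sketch: recompute mean, variance, standard deviation and
# coefficient of variation for the five tabulated values.
# Assumption: the sample (N - 1) formula is intended.
import math

values = [4, 36, 45, 50, 75]
n = len(values)

mean = sum(values) / n
squared_deviations = [(x - mean) ** 2 for x in values]
sum_of_squares = sum(squared_deviations)

sample_variance = sum_of_squares / (n - 1)
sample_sd = math.sqrt(sample_variance)
coefficient_of_variation = sample_sd / mean

print(f"mean = {mean}")
print(f"sum of squared deviations = {sum_of_squares}")
print(f"sample variance = {sample_variance}")
print(f"sample standard deviation = {sample_sd:.2f}")
print(f"coefficient of variation = {coefficient_of_variation:.2f}")
```

Dividing by N instead of N - 1 would give the population variance (528.4) and a smaller standard deviation, so it is worth stating which convention is used whenever the coefficient of variation is reported.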
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5880901/
Relevance of theoretical, mathematical and statistical analysis in genes and regions of
genome.
The information obtained from gene analysis is most useful when it is
drawn from various databases. The statistical biological data in these databases on the variation
of genes in the genome provide new input to the users of the information (Fernald, et al.,
2011). However, in the biological field there is always a need for secondary and specialized databases
to which sequence annotation can be uploaded. The heart of bioinformatics analysis is the
functional and structural analysis of genes in their newly determined sequences. These
sequences are being generated at an exponential rate, which makes functional and
comparative inference important (Jelizarow, et al., 2010).
Result
The current performance of our method for protein structure
classification can therefore be shown using the SCOP database (Boulesteix, 2013). “Compare the result to the high
degree of SCOP levels (all alpha, all beta, alpha/beta, alpha + beta)”, which determines how the proteins
vary in structure (Samish, et al., 2015). Variation can therefore be found in a particular group of the
population in a dataset and correlated with consequence severity data. In general, variation is
discovered as polymorphisms, which are observed in a population at a frequency greater than one percent. Sources
include sequencing data, for instance the Exome Variant Server, which provides data for over 200000 individuals, and
dbSNP. Among the 373 analyzed, the rating was on a scale of 1 to 5: ratings 1 to 2 were common, 3 were rare,
and for 4 to 5 no frequency was allocated. In one case the data were discordant between 1000
Genomes and ESP.
Discussion
The developed mathematical framework therefore provides a clear differentiation of the
structure of the protein. It helps to analyze and clearly distinguish the
structures of two given organisms (Suplatov, et al., 2011). In this analysis, the protein
structures are compared by three elastic curves and treated as random variables for statistical
analysis. From that, the mean and the covariance can be ascertained (White, et al., 2009). The
probability that is obtained can be used to model a population of protein structures, and
the hypothesis obtained can be used to test a protein structure against a known protein or family.
Protein structure has been studied for a long period. Mutational computations have been
made for protein comparison based on what is already known (Suplatov, et al., 2011). The
sequence-level study of genes and genomes enables us to properly understand the
differences in characteristics that different organisms display, especially if they are of the same species
(Moore, et al., 2010).
References
Boulesteix, A. L., 2013. On representative and illustrative comparisons with real data in
bioinformatics. Journal of Bioinformatics, 29(20).
Fernald, et al., 2011. Bioinformatics challenges for personalized medicine. 27(13), p. 8.
Jelizarow, M. et al., 2010. Over-optimism in bioinformatics: an illustration. Journal of
Bioinformatics, 26(16).
Karnovsky, A. et al., 2012. Metscape 2 bioinformatics tool for the analysis and visualization of
metabolomics and gene expression data. Journal of Bioinformatics, 28(03).
Marz, M. et al., 2014. Challenges in RNA virus bioinformatics. Journal of Bioinformatics,
30(13).
Moore, J. H., Asselbergs, F. W. & Williams, S. M., 2010. Bioinformatics challenges for genome-
wide association studies. Journal of Bioinformatics, 26(04).
Peng, H., Bateman, A., Valencia, A. & Wren, J. D., 2012. Bioimage informatics: a new category
in Bioinformatics. Journal of Bioinformatics, 28(8).
Samish, I., Bourne, P. E. & Najmanovich, R. J., 2015. Achievements and challenges in structural
bioinformatics and computational biophysics. Journal of Bioinformatics, 31(01).
Suplatov, D. A., Arzhanik, V. K. & Svedas, V. K., 2011. Comparative Bioinformatic Analysis of
Active site Structures in Evolutionarily Remote Homologues of Hydrolase Superfamily
Enzymes. 3(1).
Tikhvinskiy, D. A. & Porozov, Y. B., 2013. Bioinformatics and Tools for Computer Analysis
and Visualization of Macromolecules. Russian Open Journal, 2(1).
Troshin, P. V., Procter, J. B. & Barton, G. J., 2011. Java bioinformatics analysis web services for
multiple sequence alignment--JABAWS:MSA. Journal of Bioinformatics, 27(14).
White, A. M. et al., 2009. ELISA-BASE: an integrated bioinformatics tool for analyzing and
tracking ELISA microarray data. Journal of Bioinformatics, 25(12).