
Bioinformatics: Storing and Searching Genetic Information


Added on  2023/04/20

AI Summary
This article discusses the storage and search algorithms used in bioinformatics to store and search genetic information. It covers the use of databases like Swiss-Prot and EMBL for storing protein and nucleotide sequences, as well as the use of BLAST and FASTA for sequence searching. It also explores the use of MSA tools like MUSCLE, T-Coffee, and Clustal Omega for multiple sequence alignment. The article focuses on the gene P19801-2 and provides its FASTA format and alignment information.

Running head: BIOINFORMATICS
BIOINFORMATICS
Name of the Student:
Name of the University:
Author Note:

Genetic information on proteins and DNA, gathered from genome sequencing all over the
world, must be stored in an intelligent and organised manner so that it can be accessed
easily. This information is therefore stored in electronic databases. Separate databases are
available for nucleotide and protein sequences1. An example of a protein sequence database
is Swiss-Prot, and an example of a nucleotide sequence database is EMBL. With the storage
of this information comes the need to search these huge databases. BLAST (Basic Local
Alignment Search Tool) and FASTA are two algorithms for searching sequences in those
databases. Sequences extracted from these databases are commonly represented in FASTA
format, and the results of a BLAST or FASTA search are presented in the output format of
the respective tool.
These nucleotide and protein sequences can be used to study variation between two
organisms. Although the DNA sequence is unique to each species, a certain percentage of
similarity remains between two related organisms. Genetic variation, or mutation, is the
driving force of the evolutionary process in nature. Scientists and taxonomists use these
similarities to place newly found species in the animal or plant kingdom. Differences or
mutations in DNA sequence arise through different molecular mechanisms. They generally
occur through a small change in the local DNA sequence or a rearrangement of DNA
segments. Different directions in the evolutionary pathways of these species are responsible
for these changes, but the greatest similarity remains between closely related organisms.
However, genetic variations in nature commonly occur randomly rather than as a targeted
response. Hence, it can be argued that evolution happens because of these sequence changes
rather than vice versa2. DNA is responsible for the creation of proteins in a biological
organism. As DNA is similar between two closely related organisms, such organisms have
similar protein structures. DNA-DNA hybridization is a process by which the similarities
between two related organisms can be determined3.
1 Pevsner, Jonathan. Bioinformatics and functional genomics. John Wiley & Sons, 2015.
2 Arber, Werner. "Molecular mechanisms driving Darwinian evolution." Mathematical and Computer Modelling 47, no. 7-8 (2008): 666-674.
To derive the homology and evolutionary relationships between two organisms, an
alignment of two or more biological sequences of similar length is needed. Multiple
Sequence Alignment (MSA) is the process that performs this task. Many MSA tools are
available, and MUSCLE, T-Coffee, MView, Clustal Omega and Kalign are a few examples.
MUSCLE is an accurate MSA tool and is particularly good for protein alignments of medium
to large size. T-Coffee is a consistency-based MSA tool suitable for small alignments.
Clustal Omega is a relatively new MSA tool that uses HMM profile-profile techniques to
generate alignments. It is also suitable for medium to large alignments.
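The idea of alignment can be illustrated with a minimal sketch of pairwise global alignment scoring (the Needleman-Wunsch dynamic programme). This is not how MUSCLE, T-Coffee or Clustal Omega are implemented, and the scoring values (match = 1, mismatch = -1, gap = -1) are assumptions chosen only for illustration; real tools use substitution matrices and more elaborate gap penalties.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    Illustrative scoring only: the default values are assumptions,
    not the parameters of any particular alignment tool.
    """
    # Row 0: aligning an empty prefix of `a` against prefixes of `b` costs gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of `a` against an empty prefix of `b`
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


identical = nw_score("GATTACA", "GATTACA")
one_gap = nw_score("ACGT", "AGT")
```

Identical sequences score one point per residue, while a single missing residue costs one gap penalty. MSA tools extend this pairwise idea to many sequences at once.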
The protein in focus for this article is P19801-2 (isoform 2 of the protein encoded by the
AOC1 gene), and the related information is derived from UniProt. FASTA format is used to
present the protein sequence, and BLAST is used for the alignment information and
phylogenetic tree. Clustal Omega is used for Multiple Sequence Alignment.
The FASTA format of this protein is as follows:
>sp|P19801-2|AOC1_HUMAN Isoform 2 of Amiloride-sensitive amine oxidase [copper-containing] OS=Homo sapiens OX=9606 GN=AOC1
MPALGWAVAAILMLQTAMAEPSPGTLPRKAGVFSDLSNQELKAVHSFLWSKKELRLQPSS
TTTMAKNTVFLIEMLLPKKYHVLRFLDKGERHPVREARAVIFFGDQEHPNVTEFAVGPLP
GPCYMRALSPRPGYQSSWASRPISTAEYALLYHTLQEATKPLHQFFLNTTGFSFQDCHDR
CLAFTDVAPRGVASGQRRSWLIIQRYVEGYFLHPTGLELLVDHGSTDAGHWAVEQVWYNG
KFYGSPEELARKYADGEVDVVVLEDPLPGGKGHDSTEEPPLFSSHKPRGDFPSPIHVSGP
RLVQPHGPRFRLEGNAVLYGGWSFAFRLRSSSGLQVLNVHFGGERIAYEVSVQEAVALYG
GHTPAGMQTKYLDVGWGLGSVTHELAPGIDCPETATFLDTFHYYDADDPVHYPRALCLFE
MPTGVPLRRHFNSNFKGGFNFYAGLKGQVLVLRTTSTVYNYDYIWDFIFYPNGVMEAKMH
ATGYVHATFYTPEGLRHGTRLHTHLIGNIHTHLVHYRVDLDVAGTKNSFQTLQMKLENIT
NPWSPRHRVVQPTLEQTQYSWERQAAFRFKRKLPKYLLFTSPQENPWGHKRTYRLQIHSM
ADQVLPPGWQEEQAITWARTEGGQPRALSQAASPVPGRYPLAVTKYRESELCSSSIYHQN
DPWHPPVVFEQFLHNNENIENEDLVAWVTVGFLHIPHSEDIPNTATPGNSVGFLLRPFNF
FPEDPSLASRDTVIVWPRDNGPNYVQRWIPEDRDCSMPPPFSYNGTYRPV
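A record like the one above can be read programmatically. The following is a minimal sketch, assuming a UniProt-style header of the form db|accession|entry_name followed by a description and OS=, OX= and GN= tags. The function names are hypothetical, and the example holds only the first two sequence lines of the record as a truncated excerpt.

```python
import re


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def parse_uniprot_header(header):
    """Split a UniProt-style header into identifiers and OS/OX/GN tags."""
    ident, _, rest = header.partition(" ")
    db, accession, entry_name = ident.split("|")
    # Each tag value runs until the next two-letter KEY= tag or the end of the line.
    tags = dict(re.findall(r"\b(OS|OX|GN)=(.+?)(?=\s+[A-Z]{2}=|$)", rest))
    return {"db": db, "accession": accession, "entry_name": entry_name, **tags}


# Truncated excerpt: header plus the first two sequence lines only.
EXAMPLE = """>sp|P19801-2|AOC1_HUMAN Isoform 2 of Amiloride-sensitive amine oxidase [copper-containing] OS=Homo sapiens OX=9606 GN=AOC1
MPALGWAVAAILMLQTAMAEPSPGTLPRKAGVFSDLSNQELKAVHSFLWSKKELRLQPSS
TTTMAKNTVFLIEMLLPKKYHVLRFLDKGERHPVREARAVIFFGDQEHPNVTEFAVGPLP
"""

header, sequence = parse_fasta(EXAMPLE)[0]
info = parse_uniprot_header(header)
```

Here `info` holds the accession, organism, taxonomy identifier and gene name taken from the header, and `len(sequence)` gives the number of residues in the excerpt.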
3 Felgueiras, Juliana, Joana Vieira Silva, and Margarida Fardilha. "Adding biological meaning to human protein-
protein interactions identified by yeast two-hybrid screenings: A guide through bioinformatics tools." Journal of
proteomics 171 (2018): 127-140.
The proteins associated with this protein are Amine oxidase (Fragment)
(C9J0G8_HUMAN), Amine oxidase (Fragment) (C9J2J4_HUMAN), Amine oxidase
(A0A2J8RK70_PONAB), Amine oxidase (A0A0D9R3E0_CHLSB), Amine oxidase
(F6TU25_MACMU) and others. The potential isoforms of this protein are Amine oxidase
(C9J2J4_HUMAN) and Amine oxidase (C9J0G8_HUMAN)4.
The alignment of these potential isoforms with the given protein is as follows5:
CLUSTAL O(1.2.4) multiple sequence alignment

SP|P19801|AOC1_HUMAN      MPALGWAVAAILMLQTAMAEPSPGTLPRKAGVFSDLSNQELKAVHSFLWSKKELRLQPSS 60
TR|C9J0G8|C9J0G8_HUMAN    MPALGWAVAAILMLQTAMAEPSPGTLPRKAGVFSDLSNQELKAVHSFLWSKKELRLQPSS 60
TR|C9J2J4|C9J2J4_HUMAN    MPALGWAVAAILMLQTAMAEPSPGTLPRKAGVFSDLSNQELKAVHSFLWSKKELRLQPSS 60
                          ************************************************************

SP|P19801|AOC1_HUMAN      TTTMAKNTVFLIEMLLPKKYHVLRFLDKGERHPVREARAVIFFGDQEHPNVTEFAVGPLP 120
TR|C9J0G8|C9J0G8_HUMAN    TTTMAKNTVFLIEMLLPKKYHVLRFLDKGERHPVREARAVIFFGDQEHPNVTEFAVGPLP 120
TR|C9J2J4|C9J2J4_HUMAN    TTTMAKNTVFLIEMLLPKKYHVLRFLDKGERHPVREARAVIFFGDQEHPNVTEFAVGPLP 120
                          ************************************************************

SP|P19801|AOC1_HUMAN      GPCYMRALSPRPGYQSSWASRPISTAEYALLYHTLQEATKPLHQFFLNTTGFSFQDCHDR 180
TR|C9J0G8|C9J0G8_HUMAN    GPCYMRALSPRPGYQSSWASRPISTAEYALLYHTLQEATKPLHQFFLNTTGFSFQDCHDR 180
TR|C9J2J4|C9J2J4_HUMAN    GPCYMRALSPRPGYQSSWASRPISTAEYALLYHTLQEATKPLHQFFLNTTGFSFQDCHDR 180
                          ************************************************************

SP|P19801|AOC1_HUMAN      CLAFTDVAPRGVASGQRRSWLIIQRYVEGYFLHPTGLELLVDHGSTDAGHWAVEQVWYNG 240
TR|C9J0G8|C9J0G8_HUMAN    CLAFTDVAPRGVASGQRRSWLIIQRYVEGYFLHPTGLELLVDHGSTDAGHWAVEQVWYNG 240
TR|C9J2J4|C9J2J4_HUMAN    CLAFTDVAPRGVASGQRRSWLIIQRYVEGYFLHPTGLELLVDHGSTDAGHWAVEQVWYNG 240
                          ************************************************************

SP|P19801|AOC1_HUMAN      KFYGSPEELARKYADGEVDVVVLEDPLPGGKGHDSTEEPPLFSSHKPRGDFPSPIHVSGP 300
TR|C9J0G8|C9J0G8_HUMAN    KFYGSPEEL--------------------------------------------------- 249
TR|C9J2J4|C9J2J4_HUMAN    KFYGSPEELARKYADGEVDVVVLEDPLPGGKGHDSTEEPPLFSSHKPRGDFPSPIHVSGP 300
                          *********

SP|P19801|AOC1_HUMAN      RLVQPHGPRFRLEGNAVLYGGWSFAFRLRSSSGLQVLNVHFGGERIAYEVSVQEAVALYG 360
TR|C9J0G8|C9J0G8_HUMAN    ------------------------------------------------------------
TR|C9J2J4|C9J2J4_HUMAN    RLVQPHGPRFRLEGNAVLYGGWSFAFRLRSSSGLQVL----------------------- 337

SP|P19801|AOC1_HUMAN      GHTPAGMQTKYLDVGWGLGSVTHELAPGIDCPETATFLDTFHYYDADDPVHYPRALCLFE 420
TR|C9J0G8|C9J0G8_HUMAN    ------------------------------------------------------------
TR|C9J2J4|C9J2J4_HUMAN    ------------------------------------------------------------

SP|P19801|AOC1_HUMAN      MPTGVPLRRHFNSNFKGGFNFYAGLKGQVLVLRTTSTVYNYDYIWDFIFYPNGVMEAKMH 480
TR|C9J0G8|C9J0G8_HUMAN    ------------------------------------------------------------
TR|C9J2J4|C9J2J4_HUMAN    ------------------------------------------------------------

SP|P19801|AOC1_HUMAN      ATGYVHATFYTPEGLRHGTRLHTHLIGNIHTHLVHYRVDLDVAGTKNSFQTLQMKLENIT 540
TR|C9J0G8|C9J0G8_HUMAN    ------------------------------------------------------------
TR|C9J2J4|C9J2J4_HUMAN    ------------------------------------------------------------

SP|P19801|AOC1_HUMAN      NPWSPRHRVVQPTLEQTQYSWERQAAFRFKRKLPKYLLFTSPQENPWGHKRTYRLQIHSM 600
TR|C9J0G8|C9J0G8_HUMAN    ------------------------------------------------------------
TR|C9J2J4|C9J2J4_HUMAN    ------------------------------------------------------------

SP|P19801|AOC1_HUMAN      ADQVLPPGWQEEQAITWARYPLAVTKYRESELCSSSIYHQNDPWHPPVVFEQFLHNNENI 660
TR|C9J0G8|C9J0G8_HUMAN    ------------------------------------------------------------
TR|C9J2J4|C9J2J4_HUMAN    ------------------------------------------------------------

SP|P19801|AOC1_HUMAN      ENEDLVAWVTVGFLHIPHSEDIPNTATPGNSVGFLLRPFNFFPEDPSLASRDTVIVWPRD 720
TR|C9J0G8|C9J0G8_HUMAN    ------------------------------------------------------------
TR|C9J2J4|C9J2J4_HUMAN    ------------------------------------------------------------

SP|P19801|AOC1_HUMAN      NGPNYVQRWIPEDRDCSMPPPFSYNGTYRPV 751
TR|C9J0G8|C9J0G8_HUMAN    -------------------------------
TR|C9J2J4|C9J2J4_HUMAN    -------------------------------

4 AOC1 (Human) | Gene Target - Pubchem (2018) Pubchem.ncbi.nlm.nih.gov
<https://pubchem.ncbi.nlm.nih.gov/target/gene/AOC1/human#section=PDB-Structures>.
5 CALIPHO Bioinformatics (2018) Nextprot.org <https://www.nextprot.org/entry/NX_P19801/sequence>.
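The asterisks beneath each alignment block mark columns in which all three sequences carry the same residue. A minimal sketch of how such a line can be derived from aligned rows is given below. Clustal output also uses ':' and '.' for conservative substitutions, which this sketch ignores. The percent-identity definition used (identical positions over columns where both rows have a residue) is one of several in common use and is an assumption here. The example rows are the first 15 columns of the fifth block of the alignment.

```python
def conservation_line(rows):
    """Return '*' for fully identical, gap-free columns and ' ' otherwise."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("aligned rows must have equal length")
    return "".join(
        "*" if col[0] != "-" and all(c == col[0] for c in col) else " "
        for col in zip(*rows)
    )


def percent_identity(a, b):
    """Identical positions as a percentage of columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


rows = [
    "KFYGSPEELARKYAD",  # SP|P19801|AOC1_HUMAN
    "KFYGSPEEL------",  # TR|C9J0G8|C9J0G8_HUMAN
    "KFYGSPEELARKYAD",  # TR|C9J2J4|C9J2J4_HUMAN
]
marks = conservation_line(rows)
identity = percent_identity(rows[0], rows[1])
```

For these rows the conserved stretch ends where the C9J0G8 fragment stops, which matches the nine asterisks under that block.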
The phylogenetic tree for the protein P19801 is as follows6:
6 "Tree Explorer | Phylomedb V4". 2019. Phylomedb.Org. http://phylomedb.org/?q=search_tree&seqid=P19801.
In this tree, a ‘Blue Square’ signifies the orthologs and a ‘Red Square’ signifies the paralogs
of the targeted protein (P19801). Genes present in different species that evolved from the
same ancestral gene through speciation are known as orthologs. In this phylogenetic tree, a
‘Blue Square’ before the lines denotes a speciation event. Hence, all proteins having a ‘Blue
Square’ are orthologs of the protein ABP1/AOC1. For example, the H2R2M5 protein from
Pan troglodytes (common chimpanzee) and the F6TU01 protein from Macaca mulatta (rhesus
macaque) are orthologs of ABP1/AOC1. Genes that arise within a genome through
duplication are known as paralogs. A ‘Red Square’ in this phylogenetic tree represents a
duplication event, so the proteins having a ‘Red Square’ before them are paralogs of
ABP1/AOC1. For example, the F7E3V7 and F7BY09 proteins from Ornithorhynchus
anatinus (platypus) are paralogs of ABP1/AOC1.
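The rule described above can be sketched in code: two genes are orthologs if the node at which their lineages meet is a speciation event, and paralogs if it is a duplication event. The tree below is a toy topology invented for illustration. It reuses the accession numbers mentioned in the text but is not the PhylomeDB tree, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A gene-tree node: leaves carry a name, internal nodes carry an event."""
    name: str = ""
    event: str = ""  # "speciation" or "duplication" (internal nodes only)
    children: list = field(default_factory=list)


def leaf_names(node):
    """Collect the names of all leaves below a node."""
    if not node.children:
        return {node.name}
    names = set()
    for child in node.children:
        names |= leaf_names(child)
    return names


def lca_event(node, a, b):
    """Return the event at the lowest node whose subtree contains both leaves."""
    for child in node.children:
        names = leaf_names(child)
        if a in names and b in names:
            return lca_event(child, a, b)
    return node.event


def relationship(tree, a, b):
    names = leaf_names(tree)
    if a == b or a not in names or b not in names:
        raise ValueError("need two different leaves of the tree")
    return "orthologs" if lca_event(tree, a, b) == "speciation" else "paralogs"


# Toy topology (an assumption, not the published tree).
primates = Node(event="speciation", children=[
    Node(event="speciation", children=[Node("P19801"), Node("H2R2M5")]),
    Node("F6TU01"),
])
platypus = Node(event="duplication", children=[Node("F7E3V7"), Node("F7BY09")])
toy_tree = Node(event="duplication", children=[primates, platypus])

chimp = relationship(toy_tree, "P19801", "H2R2M5")
platy = relationship(toy_tree, "P19801", "F7E3V7")
```

In this toy tree the human and chimpanzee proteins meet at a speciation node, while the human and platypus proteins meet at a duplication node.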
Using BLAST the following alignment has been found7:
7 Uniref - Cluster: Isoform 2 Of Amiloride-Sensitive Amine Oxidase [Copper-Containing] (90%) (2018)
Uniprot.org <https://www.uniprot.org/uniref/UniRef90_P19801-2>.

Function of the new protein
This protein is an amiloride-sensitive amine oxidase containing copper. It catalyses the
degradation of compounds such as putrescine, histamine, spermine and spermidine. These
substances are involved in allergic and immune responses as well as cell proliferation,
tumor formation, tissue differentiation and other processes such as apoptosis. The short
name of this protein is DAO, or diamine oxidase. Some alternative names of this protein are
Amiloride-binding protein 1, Amine oxidase copper domain-containing protein 1,
Histaminase and Kidney amine oxidase. The source of this protein is Homo sapiens
(human). The taxonomic lineage of this protein is as follows: Eukaryota › Metazoa ›
Chordata › Craniata › Vertebrata › Euteleostomi › Mammalia › Eutheria ›
Euarchontoglires › Primates › Haplorrhini › Catarrhini › Hominidae › Homo8.
The above figure shows the 3D structure of the new protein9.
8 Transcript: AOC1-204 (ENST00000467291.5) - Summary - Homo Sapiens - Ensembl Genome Browser 94
(2018) Asia.ensembl.org <https://asia.ensembl.org/Homo_sapiens/Transcript/Summary?db=core;g=ENSG00000002726;r=7:150826393-150861504;t=ENST00000467291>.
9 3HII: Crystal Structure Of Human Diamine Oxidase In Complex With The Inhibitor Pentamidine (2018)
Ncbi.nlm.nih.gov <https://www.ncbi.nlm.nih.gov/Structure/pdb/3HII>.
The figure above shows the BLAST tree view of the new protein.
The above picture shows the dimeric molecular graphic of the protein AOC1.

The above table shows the molecular component of the protein AOC1.
Humans have three functioning genes that encode copper-containing amine oxidases. The
product of the AOC1 gene is known as diamine oxidase (hDAO), named for its substrate
preference for diamines, particularly histamine. hDAO has been cloned and expressed in
insect cells, and the structure of the native enzyme has been determined by X-ray
crystallography to a resolution of 1.8 Å. The homodimeric structure has the archetypal
amine oxidase fold. There are two active sites, one in each subunit. Each is characterised by
the presence of a copper ion and a topaquinone residue formed by the post-translational
modification of a tyrosine. hDAO shares 37.9% sequence identity with another human
copper amine oxidase, semicarbazide-sensitive amine oxidase (also called vascular adhesion
protein-1). The substrate-binding pocket and the entry channel of the two enzymes are
distinctly different, in accordance with their different substrate specificities. The structures
of two inhibitor complexes of hDAO, with berenil and pentamidine, have been refined to
resolutions of 2.1 and 2.2 Å, respectively. These inhibitors bind to the active-site channel in
a non-covalent manner. The inhibitor binding suggests that an aspartic acid residue,
conserved in all diamine oxidases but absent from other amine oxidases, is responsible for
the diamine specificity by interacting with the second amino group of preferred diamine
substrates10.
The orthologous proteins of this protein have the following functions: catalysing the
degradation of compounds such as putrescine, histamine, spermine and spermidine,
substances that are involved in allergic and immune responses, cell proliferation, tissue
differentiation, tumor formation and possibly apoptosis. Placental DAO is thought to play a
role in the regulation of female reproductive function11.
The different structures of the isoforms of this protein are as follows:
10 3HII: Crystal Structure Of Human Diamine Oxidase In Complex With The Inhibitor Pentamidine (2018)
Ncbi.nlm.nih.gov <https://www.ncbi.nlm.nih.gov/Structure/pdb/3HII>.
11 AOC1 (Pig) | Gene Target - Pubchem (2018) Pubchem.ncbi.nlm.nih.gov
<https://pubchem.ncbi.nlm.nih.gov/target/gene/100517436>.
A change in the 3D structure of the protein changes its function. The above figure shows
how the function changes with changes in the side chains, as determined by X-ray
diffraction12.
12 RCSB Bank, RCSB PDB - Protein Feature View - Amiloride-Sensitive Amine Oxidase [Copper-Containing] -
P19801 (AOC1_HUMAN) (2018) Rcsb.org <http://www.rcsb.org/pdb/protein/P19801>.

References
1. Arber, Werner. "Molecular mechanisms driving Darwinian evolution." Mathematical
and Computer Modelling 47, no. 7-8 (2008): 666-674.
2. Felgueiras, Juliana, Joana Vieira Silva, and Margarida Fardilha. "Adding biological
meaning to human protein-protein interactions identified by yeast two-hybrid
screenings: A guide through bioinformatics tools." Journal of proteomics 171 (2018):
127-140.
3. Pevsner, Jonathan. Bioinformatics and functional genomics. John Wiley & Sons,
2015.
4. AOC1 (Human) | Gene Target - Pubchem (2018) Pubchem.ncbi.nlm.nih.gov
https://pubchem.ncbi.nlm.nih.gov/target/gene/AOC1/human#section=PDB-Structures
5. Bioinformatics, CALIPHO (2018) Nextprot.org
https://www.nextprot.org/entry/NX_P19801/sequence
6. "Tree Explorer | Phylomedb V4". 2019. Phylomedb.Org.
http://phylomedb.org/?q=search_tree&seqid=P19801.
7. Uniref - Cluster: Isoform 2 Of Amiloride-Sensitive Amine Oxidase [Copper-
Containing] (90%) (2018) Uniprot.org
https://www.uniprot.org/uniref/UniRef90_P19801-2
8. Transcript: AOC1-204 (ENST00000467291.5) - Summary - Homo Sapiens - Ensembl
Genome Browser 94 (2018) Asia.ensembl.org
https://asia.ensembl.org/Homo_sapiens/Transcript/Summary?db=core;g=ENSG00000002726;r=7:150826393-150861504;t=ENST00000467291
9. Amiloride-Sensitive Amine Oxidase [Copper-Containing] - Structure - NCBI (2018)
Ncbi.nlm.nih.gov
https://www.ncbi.nlm.nih.gov/structure/?term=Amiloride-sensitive+amine+oxidase+%5Bcopper-containing%5D
10. 3HII: Crystal Structure Of Human Diamine Oxidase In Complex With The Inhibitor
Pentamidine (2018) Ncbi.nlm.nih.gov
https://www.ncbi.nlm.nih.gov/Structure/pdb/3HII
11. AOC1 (Pig) | Gene Target - Pubchem (2018) Pubchem.ncbi.nlm.nih.gov
https://pubchem.ncbi.nlm.nih.gov/target/gene/100517436
12. Bank, RCSB, RCSB PDB - Protein Feature View - Amiloride-Sensitive Amine
Oxidase [Copper-Containing] - P19801 (AOC1_HUMAN) (2018) Rcsb.org
http://www.rcsb.org/pdb/protein/P19801