Bioinformatics Assignment: NCBI Accession Number and BLAST Analysis
Added on 2023/01/16
Homework Assignment
AI Summary
This bioinformatics assignment comprises three exercises focusing on EST and gene identification using various bioinformatics tools. Exercise 1 involves identifying an unknown EST (AI033864) through BLASTN and BLASTX, revealing its origin as a Homo sapiens mRNA. Exercise 2 focuses on another unknown EST (EV854885) from Agkistrodon piscivorus leucostoma, requiring annotation interpretation and BLAST analysis to determine its probable identity within the serine peptidase family. Exercise 3 utilizes TBLASTN to identify exons of a human gene (hemoglobin subunit beta) and its paralogs, involving NCBI and CCDS screenshots, TBLASTN searches, and genomic coordinate determination for each exon. The assignment showcases practical application of bioinformatics methods for sequence analysis and gene characterization, including the use of BLAST, TBLASTN, and database resources like NCBI and CCDS.

Exercise #1 no more than 1 page
Identify this unknown EST, NCBI accession number AI033864.
1. Using BLASTN, can you identify it?
2. Using BLASTX, can you identify it?
Answer:
It is a 414 bp linear mRNA EST whose GenBank record was last modified on
08-JAN-2011. It is defined as
“ow10e03.x1 Soares_parathyroid_tumor_NbHPA Homo sapiens cDNA clone”
LOCUS       AI033864                 414 bp    mRNA    linear   EST 08-JAN-2011
DEFINITION  ow10e03.x1 Soares_parathyroid_tumor_NbHPA Homo sapiens cDNA clone
            IMAGE:1646428 3', mRNA sequence.
ACCESSION   AI033864
VERSION     AI033864.1
DBLINK      BioSample: SAMN00155028
KEYWORDS    EST.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
            Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 414)
  CONSRTM   NCI-CGAP http://www.ncbi.nlm.nih.gov/ncicgap
  TITLE     National Cancer Institute, Cancer Genome Anatomy Project (CGAP),
            Tumor Gene Index
  JOURNAL   Unpublished
COMMENT     Contact: Robert Strausberg, Ph.D.
            Email: cgapbs-r@mail.nih.gov
            cDNA Library Preparation: M. Bento Soares, Ph.D., M. Fatima
            Bonaldo, Ph.D.
            cDNA Library Arrayed by: Greg Lennon, Ph.D.
            DNA Sequencing by: Washington University Genome Sequencing Center
            Clone distribution: NCI-CGAP clone distribution information can be
            found through the I.M.A.G.E. Consortium/LLNL at:
            www-bio.llnl.gov/bbrp/image/image.html
            Insert Length: 939 Std Error: 0.00
            Seq primer: -40m13 fwd. ET from Amersham
            High quality sequence stop: 384.
FEATURES             Location/Qualifiers
     source          1..414
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /clone="IMAGE:1646428"
                     /tissue_type="parathyroid tumor"
                     /clone_lib="SAMN00155028 Soares_parathyroid_tumor_NbHPA"
                     /dev_stage="adult"
                     /lab_host="DH10B (ampicillin resistant)"
                     /note="Organ: parathyroid gland; Vector: pT7T3D
                     (Pharmacia) with a modified polylinker; Site_1: Not I;
                     Site_2: Eco RI; 1st strand cDNA was primed with a Not I -
                     oligo(dT) primer
                     [5'-
                     TGTTACCAATCTGAAGTGGGAGCGGCCGCACCAATTTTTTTTTTTTTTTTTTTTTTTT
                     T-3'], double-stranded cDNA was size selected, ligated to
                     Eco RI adapters (Pharmacia), digested with Not I and
                     cloned into the Not I and Eco RI sites of a modified pT7T3
                     vector (Pharmacia). Library went through one round of
                     normalization to a Cot = 5. Library constructed by Bento
                     Soares and M.Fatima Bonaldo. RNA from sporadic parathyroid
                     adenomas was kindly provided by Dr. Stephen Marx, National
                     Institute of Diabetes and Digestive and Kidney Diseases,
                     NIH."
Yes. Using BLASTN, the EST shows 99 percent identity to the sequence of Homo
sapiens TNF receptor associated factor 5 (TRAF5), transcript variant 3, mRNA.
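
The percentage BLASTN reports is a percent identity: the share of alignment columns in which query and subject carry the same base. A minimal sketch of that calculation follows, assuming two already-aligned strings of equal length with `-` marking gaps. The function name and the toy sequences are hypothetical and are not taken from the TRAF5 alignment.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent of alignment columns where both sequences carry the same base."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for q, s in zip(aligned_query.upper(), aligned_subject.upper())
        if q == s and q != "-"  # a gap never counts as a match
    )
    return 100.0 * matches / len(aligned_query)


# Hypothetical 20-column alignment with a single mismatch (not real data).
query = "ACGTACGTACGTACGTACGT"
sbjct = "ACGTACGTACCTACGTACGT"
identity = percent_identity(query, sbjct)
print(f"{identity:.1f}% identity")
```

Gapped columns stay in the denominator here, so an alignment with many gaps scores lower even when every aligned base agrees.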

However, BLASTX found no significant similarity. The reason for this
anomaly could not be determined.
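
BLASTX translates the nucleotide query in all six reading frames and compares each translation with a protein database. The sketch below shows that translation step with the standard genetic code, so the frames of an EST can be inspected for stop codons before interpreting a negative BLASTX result. A read that is mostly untranslated region would be expected to show short, stop-interrupted frames, which is one thing worth checking here. The helper names and the toy sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq: str) -> dict:
    """Return translations keyed '+1'..'+3' and '-1'..'-3'."""
    frames = {}
    for strand, strand_seq in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


def longest_open_stretch(protein: str) -> int:
    """Length of the longest run of residues not interrupted by a stop ('*')."""
    return max(len(part) for part in protein.split("*"))


# Hypothetical toy sequence, not the AI033864 read.
frames = six_frame_translation("ATGGCCAAATAA")
for name, protein in frames.items():
    print(name, protein, longest_open_stretch(protein))
```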
Exercise #2 no more than 3 pages
Identify this unknown EST, NCBI accession number EV854885.
1) Based on the annotation for this sequence, what can you say about it? What
species of origin? Derived from what tissue? What cDNA library kit was used?
2) Using BLAST, determine the EST's probable identity. Provide the key
alignments, and a thorough interpretation of the alignments, including rough
sketches.
3) Explain the large area where the alignments are poor and whether there were many
gaps.
Answer:
The sequence is:
Jia19E03 Agkistrodon piscivorus leucostoma venom gland cDNA library Agkistrodon
piscivorus leucostoma cDNA 5', mRNA sequence
GenBank: EV854885.1
Source organism: Agkistrodon piscivorus leucostoma (Western cottonmouth)
FEATURES             Location/Qualifiers
     source          1..1047
                     /organism="Agkistrodon piscivorus leucostoma"
                     /mol_type="mRNA"
                     /sub_species="leucostoma"
                     /db_xref="taxon:459671"
                     /sex="male"
                     /tissue_type="venom gland"
                     /clone_lib="SAMN00153339 Agkistrodon piscivorus leucostoma venom
                     gland cDNA library"
                     /dev_stage="adult"
                     /note="Clontech (cat. #634903) Creator SMART cDNA Library
                     Construction Kit"
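
The three annotation questions (species, tissue, library kit) are answered by the `/organism`, `/tissue_type`, and `/note` qualifiers of the source feature. As a rough sketch, the qualifiers above can be pulled into a dictionary with a regular expression; the qualifier text is copied from the record, while `parse_qualifiers` is a hypothetical helper and not part of any NCBI tool.

```python
import re

# Qualifier lines copied from the EV854885.1 source feature shown above.
RECORD_TEXT = '''\
/organism="Agkistrodon piscivorus leucostoma"
/mol_type="mRNA"
/sub_species="leucostoma"
/db_xref="taxon:459671"
/sex="male"
/tissue_type="venom gland"
/clone_lib="SAMN00153339 Agkistrodon piscivorus leucostoma venom
gland cDNA library"
/dev_stage="adult"
/note="Clontech (cat. #634903) Creator SMART cDNA Library
Construction Kit"
'''

# A qualifier is /key="value"; the value may wrap across lines.
QUALIFIER = re.compile(r'/(\w+)="([^"]*)"')


def parse_qualifiers(text: str) -> dict:
    """Collect /key="value" pairs, joining wrapped values with single spaces."""
    qualifiers = {}
    for key, value in QUALIFIER.findall(text):
        qualifiers[key] = " ".join(value.split())
    return qualifiers


qualifiers = parse_qualifiers(RECORD_TEXT)
print("Species:", qualifiers["organism"])
print("Tissue: ", qualifiers["tissue_type"])
print("Kit:    ", qualifiers["note"])
```

This simple pattern assumes that no qualifier value contains an embedded double quote, which holds for the lines shown.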
Based on the BLAST hits, the EST probably belongs to the
serine peptidase family.
The regions with small gaps are the conserved sequence, while the region with
large gaps appears to be unique to this sequence, with dissimilar domains.
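
To support a statement about small versus large gaps, the gap runs in a pairwise alignment can be counted directly. The following is a minimal sketch, assuming the aligned query and subject lines have been copied out of the BLAST report as equal-length strings with `-` for gaps. The two sequences are made up for illustration and are not the EV854885 alignment.

```python
import re


def gap_runs(aligned_seq: str) -> list:
    """Return (start_column, length) for each run of '-' (columns are 1-based)."""
    return [(m.start() + 1, len(m.group())) for m in re.finditer(r"-+", aligned_seq)]


def summarize_gaps(aligned_query: str, aligned_subject: str) -> dict:
    """Count gap openings, total gapped columns, and the longest single gap."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    runs = gap_runs(aligned_query) + gap_runs(aligned_subject)
    return {
        "gap_openings": len(runs),
        "gap_columns": sum(length for _, length in runs),
        "longest_gap": max((length for _, length in runs), default=0),
    }


# Hypothetical aligned pair: one 6-column gap in the query, one 1-column gap
# in the subject.
query = "MVLIRVLANLLILQ------SSELVIGGDECNINEHRF"
sbjct = "MVLIRVLANLLILQLSYAQKSSELVIGGDEC-INEHRF"
summary = summarize_gaps(query, sbjct)
print(gap_runs(query), gap_runs(sbjct), summary)
```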
Exercise #3 no more than 1.5 pages
In this exercise, you will use TBLASTN to identify the exons of a human gene with
a human protein. TBLASTN will also identify the exons of paralogs to the protein,
so you will have to decide which hits correspond to your gene of interest.
1. Retrieve the RefSeq record for . Provide a screen shot of the NCBI Gene record
showing how many exons you expect. Explain the details of this screenshot.
2. Provide a CCDS screenshot of how the amino acids of this protein are
distributed to exons. Explain the details of this screenshot.
3. Perform a TBLASTN search of the human RefSeq Genome Database.
4. Provide the key alignments of the expected exons, providing them in order
from N-terminus to C-terminus. Be sure to include the hit accession number and
description line, length. Explain why you chose this hit from the list. Describe
what you see for each exon hit and provide support for your explanation using
other data gathered for this problem.
5. What are the genomic nucleotide coordinates for each exon? Explain how you
determined this.
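
Steps 4 and 5 amount to sorting the TBLASTN hits by their position in the protein query and reading the genomic coordinates from the subject start and end of each hit. A sketch of that bookkeeping follows, assuming each hit has been reduced to four numbers taken from the report. The `Hsp` class and all coordinates are hypothetical placeholders, not results for this gene.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hsp:
    query_start: int  # first aligned amino acid in the protein query (1-based)
    query_end: int    # last aligned amino acid
    sbjct_start: int  # genomic nucleotide where the alignment starts
    sbjct_end: int    # genomic nucleotide where the alignment ends

    @property
    def strand(self) -> str:
        # TBLASTN lists a minus-strand hit with start greater than end.
        return "+" if self.sbjct_start <= self.sbjct_end else "-"

    @property
    def genomic_interval(self) -> tuple:
        return (min(self.sbjct_start, self.sbjct_end),
                max(self.sbjct_start, self.sbjct_end))


def order_exons(hsps: list) -> list:
    """Order hits from N-terminus to C-terminus of the query protein."""
    return sorted(hsps, key=lambda h: h.query_start)


# Hypothetical hits on the minus strand of one genomic accession.
hsps = [
    Hsp(105, 146, 3200, 3075),
    Hsp(1, 30, 5000, 4911),
    Hsp(31, 104, 4700, 4479),
]
ordered = order_exons(hsps)
for number, hsp in enumerate(ordered, start=1):
    low, high = hsp.genomic_interval
    print(f"exon {number}: aa {hsp.query_start}-{hsp.query_end}, "
          f"strand {hsp.strand}, genomic {low}-{high}")
```

Hits whose query ranges overlap an exon already accounted for, or that fall on a different locus, are the candidates for paralogs and would be set aside before this ordering.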
Answer:
1. hemoglobin subunit beta [Homo sapiens]
NCBI Reference Sequence: NP_000509.1
Exon count: 3
The NCBI Gene record in the screenshot organizes information about the gene into the
following 14 sections:
1. Summary
2. Genomic context
3. Genomic regions, transcripts, and products
4. Expression
5. Bibliography
6. Phenotypes
7. Variation
8. HIV-1 interactions
9. Pathways from BioSystems
10. Interactions
11. General gene information
12. General protein information
13. NCBI Reference Sequences (RefSeq)
14. Related sequences
Each of these sections can be expanded for more specific information. For example, to
see the interacting partners of the protein, we can expand the tab titled “Interactions”.
2. CCDS
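
The way a CCDS view distributes amino acids across exons follows from the coding length of each exon: codons are counted off in threes, and a codon that straddles an exon boundary is shared by two exons. The sketch below works that out for three coding-exon lengths. The lengths are illustrative values assumed for this example and should be checked against the CCDS record; `residues_per_exon` is a hypothetical helper.

```python
def residues_per_exon(coding_lengths: list) -> list:
    """Map each coding exon to the codons it touches.

    Returns one (first_codon, last_codon, split) tuple per exon, where split
    is True when the exon's last codon is completed in the next exon.
    Codon numbering is 1-based and includes the stop codon.
    """
    spans = []
    consumed = 0  # coding nucleotides used by the preceding exons
    for length in coding_lengths:
        first_codon = consumed // 3 + 1
        consumed += length
        last_codon = (consumed - 1) // 3 + 1
        spans.append((first_codon, last_codon, consumed % 3 != 0))
    return spans


# Illustrative coding-exon lengths in nucleotides (an assumption for this
# sketch, not values taken from the assignment).
coding_lengths = [92, 223, 129]
spans = residues_per_exon(coding_lengths)
for number, (first, last, split) in enumerate(spans, start=1):
    note = " (last codon continues in next exon)" if split else ""
    print(f"exon {number}: codons {first}-{last}{note}")
```

With these lengths the total is 444 nucleotides, or 148 codons including the stop, and codon 31 is split between the first and second exons.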