Bioinformatics Lab: Analyzing DNA and Protein Sequences - Part 2
Added on 2019/09/20
Practical Assignment
AI Summary
This practical assignment for a Genomics and Bioinformatics Lab focuses on utilizing bioinformatics tools to analyze DNA and protein sequences. Students are tasked with using online resources like NCBI and EBI to perform BLAST searches, identify restriction enzyme sites, translate nucleotide sequences into amino acid sequences, and align sequences. The assignment involves analyzing two unknown sequences, identifying their origin through BLAST, determining suitable restriction enzymes for cloning, translating the sequences into proteins, and comparing the sequences to identify point mutations and amino acid differences. The students are then asked to interpret the significance of these differences in terms of viral evolution, vaccine effectiveness, and the rapid evolution of the flu virus. The goal is to apply bioinformatics techniques to understand the characteristics and evolution of genetic sequences, providing insights into biological processes.

GENOMICS AND BIOINFORMATICS LAB Name ______________________
5 points Lab __________________________
Part 2: Bioinformatics
In this activity you will learn how to use various programs on the internet that allow you to search for and manipulate
DNA and protein sequence data. Many of the programs are available through government-run websites such as the
National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI).
BLAST
A BLAST search is a way to identify protein or DNA sequences related to the one you are interested in. BLAST
stands for Basic Local Alignment Search Tool. The tool finds regions of sequence similarity,
which can yield clues about the function and evolutionary origin of your novel sequence. A BLAST
website can be found at: http://blast.ncbi.nlm.nih.gov/Blast.cgi
At this website you will want to use either “nucleotide blast” or “protein blast”.
How to “BLAST”: Click “nucleotide blast” or “protein blast” depending on the type of sequence. Copy
the sequence (either DNA or amino acid) and paste it into the box on the webpage. Under the database pull-down
menu, select “nucleotide collection” for nucleotide sequences or “non-redundant protein sequences” for proteins. Next,
click on “Blast” at the bottom of the page. Results will appear in several seconds. Scroll down to see the matches.
Click on a match to get more details.
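To give a feel for what "local similarity" means, the sketch below finds short stretches that two nucleotide sequences share exactly. This is only an illustration of the seeding idea, not the BLAST program itself, which goes on to extend and score such matches with a far more elaborate heuristic. The function name shared_words and the word length k are assumptions made for this example.

```python
from collections import defaultdict


def shared_words(query, subject, k=11):
    """Return (query_pos, subject_pos) pairs, 0-based, where a k-letter word matches exactly.

    k is an illustrative word length chosen for this sketch.
    """
    # Index every k-letter word of the subject by its start position.
    index = defaultdict(list)
    for j in range(len(subject) - k + 1):
        index[subject[j:j + k]].append(j)

    # Look up every k-letter word of the query in that index.
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


if __name__ == "__main__":
    # Toy sequences, not taken from the lab data.
    print(shared_words("ACGTAC", "TTACGTTT", k=4))
```

Each reported pair marks a place where the two sequences agree over k letters; a longer run of neighboring pairs hints at a region worth aligning in detail.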
Restriction Digest Analysis
There are several programs that search a DNA sequence for known restriction sites, which is very useful when trying to
clone a gene. A nice, complete site is: http://tools.neb.com/NEBcutter2/index.php
How to “digest” a DNA sequence: Copy and paste a nucleotide sequence into the box on the webpage.
Click “submit” on the right side. You will then see a map of your sequence with many different restriction
sites. You can click on the lower right tab “0 cutters” or “1 cutters” to see which enzymes do not digest the
sequence and which ones digest it only once.
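The same kind of search can be sketched in a few lines of Python. The example below looks for the recognition sequences of the four enzymes used later in this lab (EcoRI, BamHI, PstI, XhoI). It is a minimal sketch, not a replacement for NEBcutter: it reports where each recognition site starts rather than the exact cut position, and the function names find_sites and non_cutters are made up for this example.

```python
import re

# Recognition sequences (5'->3') for the four enzymes named in this lab.
# All four are palindromic, so scanning one strand is sufficient.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "PstI": "CTGCAG",
    "XhoI": "CTCGAG",
}


def clean_sequence(text):
    """Drop position numbers, spaces, and line breaks; return uppercase letters."""
    return re.sub(r"[^A-Za-z]", "", text).upper()


def find_sites(text, enzymes=RECOGNITION_SITES):
    """Map each enzyme to the 1-based start positions of its recognition site."""
    seq = clean_sequence(text)
    hits = {}
    for name, site in enzymes.items():
        # A lookahead pattern also reports overlapping occurrences.
        hits[name] = [m.start() + 1 for m in re.finditer(f"(?={site})", seq)]
    return hits


def non_cutters(text):
    """Names of the enzymes whose recognition site does not occur in the sequence."""
    return sorted(name for name, pos in find_sites(text).items() if not pos)


if __name__ == "__main__":
    # Toy sequence in the same numbered layout as the lab sequences.
    toy = "1 aaagaattcc cggatccaaa"
    print(find_sites(toy))
    print(non_cutters(toy))
```

Because clean_sequence strips digits and spaces, a sequence can be pasted in exactly as it is printed in this handout.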
Nucleotide Translation
Translating a DNA sequence into an amino acid sequence by hand can be time-consuming. Fortunately, there are programs
available that can do it in the blink of an eye. This website provides a good program for translating a DNA sequence:
http://www.ebi.ac.uk/Tools/emboss/transeq/
How to “translate”: Copy and paste the DNA sequence into the box on the website. Click “Run” at the lower right. In a
few seconds the amino acid sequence will appear (using the single-letter code for amino acids).
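What the translation program does can be sketched as follows: read the sequence three bases at a time and look each codon up in the standard genetic code. This is a minimal sketch under stated assumptions, not the transeq program itself. It assumes the standard code, reads only the forward strand, writes stop codons as "*" and unrecognized codons as "X". The function name translate and its frame parameter are choices made for this example.

```python
import re
from itertools import product

# Standard genetic code, written compactly: codons are enumerated with the
# first base changing slowest, in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(text, frame=1):
    """Translate a DNA (or RNA) sequence into single-letter amino acid codes.

    frame is 1, 2, or 3 and selects the base where reading starts.
    Position numbers and spaces in the input are ignored.
    """
    seq = re.sub(r"[^A-Za-z]", "", text).upper().replace("U", "T")
    protein = []
    # Step through complete codons only; leftover bases at the end are skipped.
    for i in range(frame - 1, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    # The first 24 bases of Unknown A, pasted with their position number.
    print(translate("1 atgaagacta tcattgcttt gagc"))
```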
Sequence Alignment
These programs allow you to align two or more nucleotide or protein sequences. One easy-to-use site is:
http://www.ebi.ac.uk/Tools/emboss/align/
Once here, click on the nucleotide link next to Needle (EMBOSS) if you are aligning nucleotide sequences,
or the protein link if you are aligning protein sequences.
How to “align” two sequences: Copy and paste each sequence into the boxes and click
“Submit”.
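Needle performs a global alignment using the Needleman-Wunsch algorithm. The sketch below shows the core of that algorithm with a deliberately simple scoring scheme (match +1, mismatch -1, gap -2; these values are assumptions chosen for illustration). The web tool uses a substitution matrix and separate gap-open and gap-extend penalties, so its alignments and scores can differ from this sketch. The helper count_mismatches is likewise hypothetical and simply counts aligned columns where the two letters differ.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def pair(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align two letters
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def count_mismatches(aligned_a, aligned_b):
    """Count columns where both sequences have a letter and the letters differ."""
    return sum(
        x != y and x != "-" and y != "-" for x, y in zip(aligned_a, aligned_b)
    )


if __name__ == "__main__":
    # Toy sequences, not taken from the lab data.
    print(needleman_wunsch("ACGT", "AGT"))
```

This version keeps the full scoring table in memory, which is fine for sequences of a few thousand bases but is not how production aligners are written.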

Unknown A
1 atgaagacta tcattgcttt gagccacatt ctatgtctgg ttttcgctca aaaacttcct
61 ggtaatgaca acaacatggc aacgctgtgc cttggacacc atgcagtgcc aaacggaacg
121 atagtgaaaa caatcacgaa tgaccaaatt gaagttacta atgctactga gctggttcag
181 agtttctcta caggtgaaat atgcaacagt cctcatcaga tccttgatgg agaaaactgc
241 acactaatag atgctctatt gggagatcct cagtgtgatg gcttccaaaa caataaatgg
301 gacctttttg ttgaacgaag caaagcccac agcaactgtt acccttatga tgtgtcggat
361 tatgcctccc ttaggtcact agttgcctca tccggtacac tggagtttaa caatgaaagc
421 ttcaattgga ctggagtaac tcaagacgga gcaagctctg cttgcaaaag gagatccagc
481 aaaagtttct ttagtagatt gaattggttg actcacttaa acttcaaata cccagcattg
541 gaagtgacta tgccaaacaa tgaacaattt gacaaattgt acatttgggg ggttcaccac
601 ccggctacgg acaaggacca aatctccctg tatgctcaag cagcaggaag aatcatagta
661 tctaccaaaa gaagccaaca agctgtaatt ccgaatatcg ggtctagacc cagagtaagg
721 gatatcccta gcagaataag catctattgg acaatagtaa gaccgggaga catacttttg
781 attaacagca cagggaatct aattgctcct aggggttact tcaaaatacg aagtgggaaa
841 agctcaataa tgagatcaga tgcacccatt ggcaaatgca attctgcatg catcactcca
901 aatggaagca ttcccaatga caaaccattc caaaatgtaa acaggatcac atacggggcc
961 tgtcccagat atgttaagca aaacactctg aaattggcaa caggaatgag aaatatacca
1021 gagaaacaaa ctagaggcat atttggcgca atagctggtt tcatagaaaa tggttgggag
1081 ggaatggtgg atggttggta cggtttcagg catcaaaatt ctgagggaag gggacaagca
1141 gcagatctca aaagcactca agcagcaatc gatcaaatca atgggaagct gaatagattg
1201 atcggaaaaa ccaacgagaa attccatcag atcgaaaaag aattttcaga agtcgaaggg
1261 agaattcagg accttgagaa atatgttgag gacactaaaa tagatctctg gtcatacaac
1321 gcggagcttc ttgttgccct ggagaaccag cacacaattg atctaactga ctcagaaatg
1381 aacaaattgt ttgaaaaaac aaagaagcaa ctgagggaaa atgctgagga tatgggcaat
1441 ggctgtttca aaatatacca caaatgtgac aatgcctgca taggatcaat cagaaacgga
1501 acttatgacc acgatgtgta cagagatgaa gcattaaaca accgattcca gatcaaggga
1561 gttgagctga agtcagagta caaagattgg attctatgga tttcctttgc catatcatgc
1621 tttttgcttt gtgttgcttt gttggggttc atcatgtggg cctgccaaaa aggcaacatt
1681 aagtgcaaca tttgcatttg a
Unknown B
1 atgaagacta tcattgcttt gagctacatt ctatgtctgg ttttcgctca aaaacttcct
61 ggtaatgaca acaacatggc aacgctgtgc cttggacacc atgcagtgcc aaacggaacg
121 atagtgaaaa caatcacgaa tgaccaaatt gaagttacta atgctactga gctggttcag
181 agttcctcta caggtgaaat atgcaacagt cctcatcaga tccttgatgg agaaaactgc
241 acactaatag atgctctatt gggagatcct cagtgtgatg gcttccaaaa caagaagtgg
301 gacctttttg ttgaacgaag caaagcccat agcaactgtt acccttatga tgtgccggat
361 tatgcctccc ttaggtcact agttgcctca tccggtacac tggagtttaa caatgaaagc
421 ttcaattgga ctggagtaac tcaaaacgga gcaagctctg cttgcaaaag gagatccaac
481 aaaagtttct ttagtagatt gaattggttg actcacttaa acttcaaata cccagcattg
541 gaagtgacta tgccaaacaa tgaacaattt gacaaattgt acatttgggg ggttcaccac
601 ccggctacgg acaaggacca aatctccctg tatgctcaag cagcaggaag aatcatagta
661 tctaccaaaa gaagccaaca agctgtaatt ccgaatatcg ggtctagacc cagagtaagg
721 gatatcccta gcagagtaag catctattgg acaatagtaa gaccgggaga catacttttg
781 attaacagca cagggaatct aattgctcct aggggttact tcaaaatacg aagtgggaaa
841 agctcaataa tgagatcaga tgcacccatt ggcaaatgca attctgcatg catcactcca
901 aatggaagca ttcccagtga caaaccattc caaaatgtaa acaggatcac atacggggcc
961 tgtcccagat atgttaagca aaacactctg aaattggcaa caggaatgag aaatatacca
1021 gagaaacaaa ctagaggcat atttggcgca atagctggtt tcatagaaaa tggttgggag
1081 ggaatggtgg atggttggta cggtttcagg catcaaaatt ctgagggaag gggacaagca
1141 gcagatctca aaagcactca agcagcaatc gatcaaatca atgggaagct gaatagattg
1201 atcgggaaaa ccaacgagaa attccatcag attgaaaaag aattctcaga agtcgaaggg
1261 agaattcagg accttgagaa atatgttgag gacactaaaa tagatctctg gtcatacaac
1321 gcggagcttc ttgttgccct ggagaaccaa cacacaattg atctaactga ctcagaaatg
1381 aacaaattgt ttgaaaaaac aaagaagcaa ctgagggaaa atgctgagga tatgggcaat
1441 ggttgcttca aaatatacca caaatgtgac aatgcctgca taggatcaat cagaaacgga
1501 acttatgacc acgatgtgta cagagatgaa gcattaaaca accgattcca gatcaaggga
1561 gttgagctga agtcagagta caaagattgg attctatgga tttcctttgc catatcatgc
1621 tttttgcttt gtgttgcttt gttggggttc atcatgtggg cctgccaaaa aggcaacatt
1681 aagtgcaaca tttgcatttg a
Answer the following questions about your sequences by using the data analysis tools provided above.
1. Run a nucleotide BLAST search on sequence A. Usually the first hit is the actual sequence. What year and
region is this sequence from?
2. Run a nucleotide BLAST search on sequence B. What year and region is this sequence from?
3. If you were to clone these genes you would need to find restriction enzymes that do not cut within the genes.
Which of these restriction enzymes do not cut your sequences?
EcoRI BamHI PstI XhoI
Sequence A:
Sequence B:
4. Translate both nucleotide sequences into amino acid (protein) sequences. Copy and paste each protein sequence
into a Word document, naming each sequence based on the region and year.
5. Align the two nucleotide sequences. How many point mutations are there between the two nucleotide
sequences?
6. Based on the number of point mutations, how many amino acids would you predict are different between the two
protein sequences?
7. Align the two protein sequences. How many different amino acids are there between the two sequences?
8. Is the number you reported in #7 greater or smaller than the number you predicted in #6? Why do you think that
is?
9. When you align the protein sequences, the readout gives you both identity and similarity, which usually have
different values (for a nucleotide alignment they are always the same). Why do they have different values in a
protein alignment?
10. What do the similarities and differences between the two sequences tell you about the evolution of the flu
virus?
11. Why do you think vaccines sometimes lose their effectiveness as the flu season progresses?
12. Why do you think the flu virus evolves so rapidly?
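When comparing nucleotide and protein differences, it can help to look at the changes codon by codon. The sketch below is one hedged way to do that for two coding sequences that are the same length, start in the same reading frame, and contain only A, C, G, and T (all assumptions of this example). It counts the differing bases and then reports how many changed codons keep the same amino acid and how many do not. The function name compare_coding and the toy sequences are made up for illustration and are not taken from the lab data.

```python
from itertools import product

# Standard genetic code, enumerated with the first base changing slowest
# in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def compare_coding(seq_a, seq_b):
    """Return (base_differences, silent_codon_changes, amino_acid_changes).

    Both inputs must be gap-free, in frame, equal in length, and a multiple
    of three bases long.
    """
    a, b = seq_a.upper(), seq_b.upper()
    if len(a) != len(b) or len(a) % 3 != 0:
        raise ValueError("sequences must have equal length divisible by 3")

    base_differences = sum(x != y for x, y in zip(a, b))
    silent = changed = 0
    for i in range(0, len(a), 3):
        codon_a, codon_b = a[i:i + 3], b[i:i + 3]
        if codon_a == codon_b:
            continue
        # Different codons may still encode the same amino acid.
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            silent += 1
        else:
            changed += 1
    return base_differences, silent, changed


if __name__ == "__main__":
    # Toy example: two base changes, one of which alters the protein.
    print(compare_coding("ATGAAAGCT", "ATGAAGGTT"))
```

Note that the second and third values count codons, not individual bases, so a codon with two changed bases is counted once.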