Bioinformatics Assignment Sample

56 Pages10564 Words54 Views

Added on 2021-06-18

Bioinformatics Assignment Sample

Added on 2021-06-18

Related Documents

BIOINFORMATICS ASSIGNMENT 1
I
declare that all material in this assessment is my own work except where there is clear
acknowledgement and reference to the work of others. I have read the Academic Honesty and Assessment
Obligations for Coursework Students Policy and Academic Dishonesty Procedures
(http://www.adelaide.edu.au/policies/230/).
I give permission for my assessment work to be reproduced and submitted to other academic staff
for the purposes of assessment and to be copied, submitted and retained in a form suitable for
electronic checking of plagiarism.
Signed....................................................... Date ...................................................
Table of Contents
A. Introductory Bioinformatics........................................................................................................................2
1. Terms commonly used in bioinformatics....................................................................................................2
2. DNA Sequence Translation.........................................................................................................................4
3. Sequence homology.....................................................................................................................................6
4. Learn To Retrieve Gene/mRNA/Protein Information.................................................................................9
B. Genetic Analysis of a Human Cancer Disease Using Databases.................................................................35
School of Biological Sciences
Assessment Cover Sheet
Student Name
Student ID
Assessment Title
Course/Program
Lecturer/Tutor
Date Submitted
OFFICE USE ONLY
Date Received

BIOINFORMATICS ASSIGNMENT 2
5. Analysis of a Human Genetic Disease......................................................................................................35
6. Acquiring Sequence Information.............................................................................................................37
7. Uniprot/Swiss-Prot Database..................................................................................................................42
8. Basic Analysis of Evolutionarily Conserved Sequences........................................................................43
9. Multiple Sequence Alignments across Different Species......................................................................45
10. THREE-DIMENSIONAL VIEWING OF AN IDENTIFIED PROTEIN STRUCTURE:................50
References.........................................................................................................................................................54

BIOINFORMATICS ASSIGNMENT 3
A. Introductory Bioinformatics
1. Terms commonly used in bioinformatics
BLAST
Basic Local Alignment Search Tool (BLAST) finds regions of local similarity among
sequences. It compares nucleotide or protein sequences to sequence database and calculate the
statistical significance of matched results.
GenBank
It is a genetic sequence database. GenBank stores annotated collection of DNA sequences
which is publically available. It is a part of the International nucleotide sequence database
collaboration. It is designed to assess and encourage access within the scientific community to the
updated and comprehensive DNA sequence information.
Ensembl
Ensembl is a genome browser for the vertebrate genomes that used to support research in
comparative genomes, evaluation, sequence variation and transcriptional regulation. Ensembl annotate
gene, it computes multiple alignment, predicts regulatory function and stores data of various diseases.
Ensembl includes tools like BLAST, BLAT, BioMart and a variant effect predictor for all supported
species.
GEO
Gene Expression Omnibus (GEO) is an international public repository which archives and
freely distributes microarray, next generation sequencing and other forms of high throughout functional

BIOINFORMATICS ASSIGNMENT 4
genomic data which is submitted by the research community tools. These tools are provided to help
users query and download experiments and curated gene expression profile.
Pfam
Pfam is the database includes large collections of protein families, represented by multiple
sequence alignments and HMMs (Hidden Markov Model). It is available on World Wide Web and
provides higher level groupings of the related entries, called clans which are the collection of Pfam
entries.
KEGG
Kyoto Encyclopedia of Genes and Genomes (KEGG) is the collection of databases deals with
genomes, biological pathways, disease drugs and the chemical substances. KEGG is a database
resource to understanding the high level functions. It utilities the biological system, such as cell, the
organism and ecosystem from molecular- level information, specifically large scale molecular database
generated by genome sequencing and other high throughout experimental technologies.
OMIM
Online Mendelian Inheritance in Man (OMIM) is the database initiated by Dr. Victor A.
Mckusick in 1960. It is a comprehensive, authoritative compendium of human’s genes and genome
which is freely available and updated routinely.
PDB
Protein Data Bank or PDB is the open access digital data resources in all biology and medicine.
It is in demand for experimental data central to the scientific discovery. PDB provides access to the 3D
structure data for large molecules like protein, DNA and RNA to understand its role in human and
animal health.

BIOINFORMATICS ASSIGNMENT 5
GO
Gene Ontology (GO) is the major bioinformatics initiative to create a computational
representation of peoples evolving knowledge to understand how genes encode function at the
molecular, cellular and tissue system levels. GO resources plays an important role to support
biomedical research, which includes interpretation of the large scale molecular experiments and
computational analysis of biological knowledge.
R
R is the comprehensive statistical environment and a programming language for professional
data analysis and graphical display. Bio-conductor project which associated with R provides various
additional R packages for statistical data analysis in different fields of life science, such as tools for
microarray, next generation sequence and genome analysis. R software is freely available and works in
all common operating systems.
2. DNA Sequence Translation
A.
What is the correct open reading frame?
MADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQAGGYGFVISDWNM
PNMDGLELLKTIRADGAMSALPVLMVTAEAKKENIIAAAQAGASGYVVKPFTAATLEEKL
NKIFEKLGM
Why are there 6 reading frames instead of 3?
The coding strand refers to a DNA strand with the same base order like the RNA transcript for a
particular gene. Because one gene is always entirely present on a single DNA strand there are indeed 3
reading frames possibly present in this strand, and only 1 RF actually containing the correct codon
sequence for this gene, however when the entire genome is viewed, it is possible for one gene to be

BIOINFORMATICS ASSIGNMENT 6
available on single strand with another gene present on other which means that the coding strand for
single gene is the non-coding strand for the other. That is why there are 6 reading frames are found in
result for the genome as a whole as both strands contain genes.
Save and paste the protein sequence you obtain from the “Compact M- no space” format here?
Answer:-
5'3' Frame 1
MADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQAGGYGFVISDWNM
PNMDGLELLKTIRADGAMSALPVLMVTAEAKKENIIAAAQAGASGYVVKPFTAATLEEKL
NKIFEKLGM-
5'3' Frame 2
WRIKNLNFWLWMTFPPCDA-CVTC-KSWDSIMLRKRKMASTLSISCRQAVMDLLSPTGTC
PIWMAWNC-KQFVRMARCRHCQC-W-LQKRRKRTSLLRRKRGPVAMW-SHLPPRRWRKNS
TKSLRNWAC
5'3' Frame 3
GG-RT-IFGCG-LFHHATHSA-PAERAGIQ-C-GSGRWRRRSQ-VAGRRLWICYLRLEHA
QYGWPGIAENNSCGWRDVGIASVNGDCRSEEREHHCCGASGGQWLCGEAIYRRDAGGKTQ
QNL-ETGHV
3'5' Frame 1
SHAQFLKDFVEFFLQRRGGKWLHHIATGPRLRRSNDVLFLRFCSHH-HWQCRHRAIRTNC
FQQFQAIHIGHVPVGDNKSITACLQLIESVDAIFRFLNIIESQLFQQVTHYASHGGKVIH
NQKFKFFIRH
3'5' Frame 2
HMPSFSKILLSFSSSVAAVNGFTT-PLAPACAAAMMFSFFASAVTINTGNADIAPSARIV
FSNSRPSILGMFQSEITNP-PPACNLLRASTPSSASSTLLNPSSFSRLRTMRRMVEKSST
TKNLSSLSA
3'5' Frame 3
TCPVSQRFC-VFPPASRR-MASPHSHWPPLAPQQ-CSLSSLLQSPLTLAMPTSRHPHELF
SAIPGHPYWACSSRR-QIHNRLPATY-ERRRHLPLPQHY-IPALSAGYALCVAWWKSHPQ
PKI-VLYPP
B
What is the protein encoded by the ORF you identified and what species is it from?
Answer- Chemotaxis protein

BIOINFORMATICS ASSIGNMENT 7
Species- Escherichia Coli
To what family of protein does it belongs?
Answer- Family- CheY
What is the function of this protein?
Answer- It Involved in transmit the sensory signals from chemoreceptors of the flagellar motors. This
protein is also participated in changing the direction of flagellar rotation.
3. Sequence homology
Define the terms
Homologue: - A chromosome which is similar in physical aspects and genetic information to another
chromosome, where both make pairs during meiosis.
Orthologue: - Sequence that have common ancestor and have divided due to speciation event
Paralogue: - Sequences are the descendants of an ancestral gene that underwent a duplication event.
List the details of the each of these homologous sequences here, including its maximum identity
Answer- Homologous sequence
1 adkelkflvv ddestmrriv rnllkelgfn nveeaedgvd alnklqaggy gfvisdwmmp
61 nmdglellkt iradgamsal pvlmvtalak keniiaaaqa gasgyvvkpf taatleekln
121 kifeklgm
 Accession no. 3F7N_A
 SOURCE ESCHERIA COLI K-12
 AMINO ACID- 128
 MAX SCORE – 246
 MAX IDENT 98%
1 adkelkflvv ddfstmrriv rnllkelgfn nveeaedgvd alnklqaggf gfiicdwnmp
61 nmdglellkt iradsamsal pvlmvtaeak keniiaaaqa gasgyvvkpf taatleekln
121 kifeklgm
 Accession no.- 2CHY_A
 Source- Salmonella Enterica Subsp. Eterica Serovar Typhimurium

BIOINFORMATICS ASSIGNMENT 8
 Amino Acid- 128
 Max score- 251
 MAX Ident - 97 %

Define the terms score and P value and give example to explain?
E value
The Expectation value or E-value is the parameter which describes the number of hits which can
be expected to see by chance whn finding a database of particular size. When the score of the matches
increase e value decreases exponenetially the lower the E value, the more significant the score and
the alignment.
Example:-
Query: CPn0189 Score E-
(Bits) Value
Aligned with CT131 hypothetical protein 1240 0.0
Query: 1 MKRRSWLKILGICLGSSIVLGFLIFLPQLLSTESRKYLVFSLIHKESGLSCSAEELKISW 60
MKR W KI G L + L L LP+ S+ES KYL S+++KE+GL E+L +SW
Sbjct: 1 MKRSPWYKIFGYYLLVGVPLALLALLPKFFSSESGKYLFLSVLNKETGLQFEIEQLHLSW 60
Query: 61 FGRQTARKIKLTG-EAKDEVFSAEKFELDGSLLRLLIYKKPKGITLSGWSLKINEPASID 119
FG QTA+KI++ G ++ E+F+AEK + GSL RLL+Y+ PK +TL+GWSL+I+E S++
Sbjct: 61 FGSQTAKKIRIRGIDSDSEIFAAEKIIVKGSLPRLLLYRFPKALTLTGWSLQIDESLSMN 120
Figure 1 shows the E value in sequence search by using BLAST
Score
A score is a number which is used to assess the biological relevance of a finding
Figure 2 shows the score (258 bits) in a sequence search

End of preview

Want to access all the pages? Upload your documents or become a member.

Bioinformatics: Storing and Searching Genetic Information

|15

|2928

|245

Bioinformatics And Genetics

|1509

|19

Bioinformatics Analysis Report

|16

|4557

|214

Gycine max between different species

|469

|36

Bioinformatics: Variation in Genes and Protein Structure

|26

|3466

|366

Genomics and Bioinformatics Lab

|1271

|407

Bioinformatics Assignment Sample

Bioinformatics Assignment Sample

End of preview

Bioinformatics: Storing and Searching Genetic Informationlg...

Bioinformatics And Geneticslg...

Bioinformatics Analysis Reportlg...

Gycine max between different specieslg...

Bioinformatics: Variation in Genes and Protein Structurelg...

Genomics and Bioinformatics Lablg...

Bioinformatics: Storing and Searching Genetic Information

Bioinformatics And Genetics

Bioinformatics Analysis Report

Gycine max between different species

Bioinformatics: Variation in Genes and Protein Structure

Genomics and Bioinformatics Lab