LSC 30010 Bioinformatics Report: Analysis of RACK1 mRNA & MALAT1

Added on 2023/06/15

AI Summary

This bioinformatics report details the analysis of two transcribed sequences: RACK1 mRNA (coding) and MALAT1 (lncRNA), using tools like NCBI BLAST, ENSEMBL, GENECARDS, and LNCRNADB. Both sequences showed 100% sequence identity in NCBI BLAST. RACK1, a regulatory component of the 40S ribosome subunit, is extensively studied and involved in various signaling pathways and protein interactions. MALAT1, a highly abundant lncRNA, is less understood but plays a significant role in cellular processes. The report explores sequence features, protein summaries, genomic origins, transcript features, and interactions with other biomolecules. RACK1's involvement in ribosome quality control, protein kinase C stabilization, and interactions with various proteins in tissue-specific expressions are discussed, contrasting with MALAT1's non-coding RNA characteristics and structural domains.

Running Header; Bioinformatics Analysis Report
UNIVERSITY:
NAME :
STUDENT ID:
COURSE CODE
COURSE NAME
ASSIGNMENT
BIOCHEMISTRY REPORT

Genome report P a g e | 2
Table of Contents
Abstract
Introduction
Methods
Results
Discussion
References

Abstract
Bioinformatics tools serve as a compendium of data from different sources and literature.
The software works on distinct algorithms to carry out prediction-based searches. This report
uses four online tools to characterise two transcribed sequences. Sequence identification
was carried out with NCBI BLAST. Sequence features and the protein summary were obtained
with the assistance of ENSEMBL and GENECARDS. Long non-coding RNA information was
fetched from LNCRNADB. The two transcribed sequences were identified as RACK1 mRNA
(coding) and MALAT1 (lncRNA). Both showed 100% sequence identity to their NCBI BLAST
hits. Other relevant information was collected from the remaining tools. RACK1, a regulatory
component of the 40S ribosomal subunit, is extensively studied. By contrast, MALAT1,
although a very abundant lncRNA in most tissues, has less information available than
RACK1. Long non-coding RNAs are less studied than their protein-coding counterparts
because they were discovered more recently.

Introduction
The Human Genome Project and the ENCODE consortium opened avenues to explore the field
of genomics for a better understanding of gene function. The large data sets generated by
next-generation sequencing are simplified and analysed using bioinformatics. A variety of
software tools, running on different algorithms, are used to interpret the biological data.
Bioinformatics is based on collecting data from the available literature, followed by
computational modelling and solving the biological problem with computational algorithms
(Can, 2014). A plethora of novel transcripts, both coding and non-coding, identified by
next-generation sequencing are studied with bioinformatics tools, and the resulting
predictions are then functionally validated in the lab.
The present study examines two given transcribed sequences using bioinformatics tools.
The software used in this report comprises NCBI BLAST, ENSEMBL, GENECARDS and
LNCRNADB. The two sequences are identified and characterised as coding and non-coding.
The features explored include genomic origin; transcript features such as the CDS, UTRs and
regulatory regions; the translated sequence; protein domains; interactions with other
biomolecules; and their roles in the cell.
Methods
We took the two transcribed sequences given in the assignment. The sequences were identified
and their features were determined using the following methods.
1. Sequence Identification using NCBI BLAST: The Basic Local Alignment Search Tool
(BLAST) is used to find similarity between sequences. It is an essential algorithm for
comparing biological sequences, whether amino acid or nucleotide. A BLAST search
compares a query sequence against a database and identifies the database sequences
that resemble the query above a set threshold. It is used for both DNA and proteins.
The given sequence is pasted into the BLAST search tool; BLASTN is used for nucleotide
queries and BLASTP for protein queries. Upon submission, the tool finds database
sequences similar to the query sequence and calculates the statistics of sequence
similarity. It provides links to the matched sequences, which fetch
further detailed information about the sequence features, genomic origin and the
related references. The tool can be accessed at https://blast.ncbi.nlm.nih.gov/Blast.cgi.
BLAST is an essential bioinformatics program for searching sequences. Its heuristic
algorithm is faster than computing an optimal alignment. In this task the sequences
were run through BLAST, and the locations of similar sequences were identified. The
process begins with a seeding step, in which short subsequences (words) of the query
are matched against the database; BLAST then extends each seed into an alignment.
This generates high-scoring segment pairs (HSPs) between the query sequence and
database sequences, which are assessed for statistical significance.
2. Sequence Features using ENSEMBL: ENSEMBL is used to annotate genes using
multiple sequence alignment, comparative genomics, evolution, homology search,
protein domain information, secondary structure predictions and regulatory function.
ENSEMBL offers several tools, such as BLAST, BLAT, BIOMART and the
Variant Effect Predictor. The tool can be accessed at https://www.ensembl.org.
ENSEMBL protein-coding gene models show the splice variants that result from
aligning cDNA and protein sequences to the genome. Each transcript corresponds to a
single splice variant, which may be coding or non-coding. A coding transcript
comprises untranslated regions (UTRs) at the 5' and 3' ends and the coding sequence
between them. The splice variant of interest was selected and inspected in the
Transcript tab.
3. Gene function using GENECARDS: GENECARDS is one of the largest databases
for gene annotation. It integrates information from ~125 sources and provides genomic,
transcriptomic, proteomic, genetic, clinical and functional information.
The database can be accessed at http://www.genecards.org/.
4. LncRNA annotation using LNCRNADB: LNCRNADB is a database of
annotations for eukaryotic lncRNAs. The entries in the database are manually curated
and provide sequence, structural, genomic, subcellular localisation, expression,
conservation and functional information, together with the related literature. It
uses the UCSC Genome Browser for visualisation and takes expression data from the
Illumina Body Atlas. The database can be accessed at http://www.lncrnadb.org/. The
database reduces duplicated entries and entries of unknown identity, and includes
aliases.
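
The seeding and extension steps described for BLAST in method 1 can be illustrated with a small sketch. This is not the NCBI implementation: the function names (index_words, extend_seed, best_hsp), the word size of 11, the +1/-3 scoring and the drop-off of 5 are all illustrative assumptions, and real BLAST adds gapped extension and E-value statistics on top of this idea.

```python
from collections import defaultdict


def index_words(subject: str, k: int) -> dict:
    """Map every k-letter word in the subject to its start positions."""
    index = defaultdict(list)
    for i in range(len(subject) - k + 1):
        index[subject[i:i + k]].append(i)
    return index


def extend_seed(query, subject, q, s, k, match=1, mismatch=-3, drop=5):
    """Extend an exact k-mer seed in both directions without gaps.

    Extension in each direction stops once the running score falls
    `drop` below the best score seen so far (a simplified X-drop rule).
    Returns (score, query_start, subject_start, length).
    """
    def walk(qi, si, step):
        best = score = 0
        best_len = length = 0
        while 0 <= qi < len(query) and 0 <= si < len(subject):
            score += match if query[qi] == subject[si] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score >= drop:
                break
            qi += step
            si += step
        return best, best_len

    right_score, right_len = walk(q + k, s + k, 1)
    left_score, left_len = walk(q - 1, s - 1, -1)
    score = k * match + left_score + right_score
    return score, q - left_len, s - left_len, k + left_len + right_len


def best_hsp(query, subject, k=11):
    """Return the highest-scoring ungapped segment pair, or None if no seed matches."""
    index = index_words(subject, k)
    best = None
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            hsp = extend_seed(query, subject, q, s, k)
            if best is None or hsp[0] > best[0]:
                best = hsp
    return best
```

Calling best_hsp with a query and a longer subject that contains it returns the score, the start positions in both sequences and the length of the aligned segment. An ungapped segment in which every position matches corresponds to 100% identity over that segment.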

Results
The two transcribed sequences are mentioned below with their related features.
Sequence 5a
ctctctttcactgcaaggcggcggcaggagaggttgtggtgctagtttctctaagccatccagtgccatcctcgtcgctgcagcgacacacgctctc
gccgccgccatgactgagcagatgacccttcgtggcaccctcaagggccacaacggctgggtaacccagatcgctactaccccgcagttcccgg
acatgatcctctccgcctctcgagataagaccatcatcatgtggaaactgaccagggatgagaccaactatggaattccacagcgtgctctgcgg
ggtcactcccactttgttagtgatgtggttatctcctcagatggccagtttgccctctcaggctcctgggatggaaccctgcgcctctgggatctcac
aacgggcaccaccacgaggcgatttgtgggccataccaaggatgtgctgagtgtggccttctcctctgacaaccggcagattgtctctggatctcg
agataaaaccatcaagctatggaataccctgggtgtgtgcaaatacactgtccaggatgagagccactcagagtgggtgtcttgtgtccgcttctc
gcccaacagcagcaaccctatcatcgtctcctgtggctgggacaagctggtcaaggtatggaacctggctaactgcaagctgaagaccaaccac
attggccacacaggctatctgaacacggtgactgtctctccagatggatccctctgtgcttctggaggcaaggatggccaggccatgttatgggat
ctcaacgaaggcaaacacctttacacgctagatggtggggacatcatcaacgccctgtgcttcagccctaaccgctactggctgtgtgctgccac
aggccccagcatcaagatctgggatttagagggaaagatcattgtagatgaactgaagcaagaagttatcagtaccagcagcaaggcagaacc
accccagtgcacctccctggcctggtctgctgatggccagactctgtttgctggctacacggacaacctggtgcgagtgtggcaggtgaccattgg
cacacgctagaagtttatggcagagctttacaaataaaaaaaaaactggcttttctgacaaaaaaaaaaaaaaaa
Nucleotide BLAST and sequence features
The transcribed sequence is identified as the human Receptor for Activated C Kinase 1
(RACK1) mRNA (NM_006098.4, coding) with 100% identity (Fig 1). It is a
1125 bp long mRNA encoded at chr5:181,236,230-181,244,604. RACK1 is also known as
GNB2L1, Gnb2-rs1 and H12.3. The transcript consists of 8 exons, 42 domains and 200
variations. Homologs of RACK1 are found in several other species, such as Pan, M.
mulatta, C. lupus, B. taurus, M. musculus, Rattus, G. domesticus, D. rerio, Drosophila,
Culicidae, C. elegans, S. cerevisiae, K. lactis, E. gossypii, S. pombe, M. oryzae, N. crassa,
A. thaliana, O. sativa and Anura. The gene is mainly involved in TNFR1 signalling.

Fig 1: Screenshot of nucleotide BLAST of sequence 5a. The hit circled in red is the
RACK1 mRNA, with 100% identity to the query sequence.
Regulatory: Poly-A signal sequence
CDS: codon_start=1
Protein Summary and features
RACK1 mRNA encodes the protein NP_006089.1 (protein_id), which has the following amino
acid sequence:
MTEQMTLRGTLKGHNGWVTQIATTPQFPDMILSASRDKTIIMWKLTRDETNYGIPQR
ALRGHSHFVSDVVISSDGQFALSGSWDGTLRLWDLTTGTTTRRFVGHTKDVLSVAFS
SDNRQIVSGSRDKTIKLWNTLGVCKYTVQDESHSEWVSCVRFSPNSSNPIIVSCGWD
KLVKVWNLANCKLKTNHIGHTGYLNTVTVSPDGSLCASGGKDGQAMLWDLNEGK
HLYTLDGGDIINALCFSPNRYWLCAATGPSIKIWDLEGKIIVDELKQEVISTSSKAEPP
QCTSLAWSADGQTLFAGYTDNLVRVWQVTIGTR
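
The step from the mRNA to the amino acid sequence above can be sketched as a translation of the coding sequence with the standard genetic code. The sketch below is a simplification that assumes the CDS begins at the first ATG of the transcript and ends at the first in-frame stop codon; the function name translate_first_orf is hypothetical, and annotated start sites (as in the CDS record) should be preferred when they are available.

```python
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code: codon -> one-letter amino acid ('*' marks a stop codon).
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_first_orf(mrna: str) -> str:
    """Translate from the first ATG to the first in-frame stop codon.

    Assumption: the first ATG is the true start codon. Codons containing
    unexpected characters are reported as 'X'.
    """
    seq = mrna.upper().replace("U", "T")
    start = seq.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

Applied to the stretch of sequence 5a that starts at "gccgccgccatgactgag...", the first nine codons give MTEQMTLRG, which matches the beginning of the protein sequence listed above.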
The protein has an average molecular weight of 35,076.73 g/mol, with 317 residues, a
charge of 5.0 and an isoelectric point of 7.6304. The protein consists of different domains
and SNVs, which are shown in Fig 2.
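
An average molecular weight such as the one quoted above can be approximated from the sequence alone, by summing average residue masses and adding one water molecule for the free termini. The sketch below is an illustration under that assumption; the function name molecular_weight is hypothetical, the residue masses are standard average values, and database tools may use slightly different mass tables or account for modifications.

```python
# Average masses (Da) of amino acid residues, i.e. amino acid minus one water.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the N- and C-terminal groups


def molecular_weight(protein: str) -> float:
    """Approximate average molecular weight (Da) of an unmodified protein."""
    seq = "".join(protein.split()).upper()  # tolerate line breaks in pasted sequences
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER
```

Passing the RACK1 amino acid sequence to molecular_weight, and taking the length of the same string, gives values that can be compared with the figures reported by the database.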

Fig 2: Pictorial representation of the protein domains and single nucleotide variations
(adapted from ENSEMBL)
Homology Search
RACK1 has homologs in other species. Fig 3 shows these homologs.
Fig 3: Screenshot showing the homologs of human RACK1 and their identity at the
protein and DNA level.

In humans, RACK1 is encoded by the gene formerly named GNB2L1 (guanine nucleotide-binding
protein (G protein), beta polypeptide 2-like 1), which has 8 exons and 7 introns.
The literature suggests that RACK1 has diverged into different subgroups during evolution,
giving rise to paralogs of RACK1 in some organisms; however, studies have shown that it
is highly conserved across organisms.
miRNA
The miRNAs listed in miRTarBase as targeting RACK1 are hsa-mir-124-3p, hsa-mir-1180-3p,
hsa-mir-1226-3p, hsa-mir-877-5p, hsa-mir-744-5p, hsa-mir-296-3p, hsa-mir-30c-2-3p,
hsa-mir-196b-5p, hsa-mir-324-3p, hsa-mir-324-5p, hsa-mir-186-5p, hsa-mir-132-3p,
hsa-mir-23b-3p, hsa-mir-183-5p, hsa-mir-29b-3p and hsa-mir-18a-5p.
Pathways and protein interactome
The protein is involved in TNF signalling, the MAPK-Erk pathway, regulation of CFTR activity,
Influenza A, and Ca, cAMP and lipid signalling.
Fig 4: The interaction map of RACK1/GNB2L1 with other binding partners
Sequence 5b
gtaaaggactggggccccgcaactggcctctcctgccctcttaagcgcagcgccattttagcaacgcagaagcccggcgccgggaagcctcagc
tcgcctgaaggcaggtcccctctgacgcctccgggagcccaggtttcccagagtccttgggacgcagcgacgagttgtgctgctatcttagctgtc
cttataggctggccattccaggtggtggtatttagataaaaccactcaaactctgcagtttggtcttggggtttggaggaaagcttttatttttcttcct
gctccggttcagaaggtctgaagctcatacctaaccaggcataacacagaatctgcaaaacaaaaacccctaaaaaagcagacccagagcagt
gtaaacacttctgggtgtgtccctgactggctgcccaaggtctctgtgtcttcggagacaaagccattcgcttagttggtctactttaaaaggccac
ttgaactcgctttccatggcgatttgccttgtgagcactttcaggagagcctggaagctgaaaaacggtagaaaaatttccgtgcgggccgtgggg
ggctggcggcaactggggggccgcagatcagagtgggccactggcagccaacggcccccggggctcaggcggggagcagctctgtggtgtgg
gattgaggcgttttccaagagtgggttttcacgtttctaagatttcccaagcagacagcccgtgctgctccgatttctcgaacaaaaaagcaaaac
gtgtggctgtcttgggagcaagtcgcaggactgcaagcagttgggggagaaagtccgccattttgccacttctcaaccgtccctgcaaggctggg
gctcagttgcgtaatggaaagtaaagccctgaactatcacactttaatcttccttcaaaaggtggtaaactatacctactgtccctcaagagaaca
caagaagtgctttaagaggtattttaaaagttccgggggttttgtgaggtgtttgatgacccgtttaaaatatgatttccatgtttcttttgtctaaag
tttgcagctcaaatctttccacacgctagtaatttaagtatttctgcatgtgtagtttgcattcaagttccataagctgttaagaaaaatctagaaaa
gtaaaactagaacctatttttaaccgaagaactactttttgcctccctcacaaaggcggcggaaggtgatcgaattccggtgatgcgagttgttctc
cgtctataaatacgcctcgcccgagctgtgcggtaggcattgaggcagccagcgcaggggcttctgctgagggggcaggcggagcttgaggaaa
ccgcagataagtttttttctctttgaaagatagagattaatacaactacttaaaaaatatagtcaataggttactaagatattgcttagcgttaagt

Nucleotide BLAST and sequence features
The nucleotide sequence maps to multiple human sequences with 100% identity. In the
present study we took the first hit, MALAT1 long non-coding RNA (Fig 5). Since sequence
5a was a coding sequence, for which we explored the features of a coding RNA, here we
describe the features of a non-coding RNA. The sequence is part of the gene Metastasis
Associated Lung Adenocarcinoma Transcript 1 (MALAT1), transcript variant 1, long
non-coding RNA. It is an 8779 bp long transcript at the genomic locus
chr11:65,265,233-65,273,940 and is polyadenylated, with a poly-A signal sequence. The
gene has 3445 single nucleotide variations and 5 structural variations.
Fig 5: Screenshot of nucleotide BLAST of sequence 5b
Discussion
The current report includes a detailed analysis of two transcribed sequences. The first
sequence, RACK1, is a coding transcript that translates to Receptor for Activated C
Kinase 1 (Ikebuchi et al. 2009). It has homologous genes in other species. By contrast,
the second sequence, MALAT1, is a long non-coding RNA for which no homologs in other
species are reported here. Long non-coding RNAs are transcripts of more than 200
nucleotides with no protein-coding potential. They generally show little sequence homology
across species; however, they have conserved structural domains that preserve their function.
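
The working definition given above (longer than 200 nucleotides, no protein-coding potential) can be turned into a rough screening heuristic. The sketch below is only a first-pass filter under stated assumptions: the 100-codon ORF cutoff is a commonly used convention rather than a value from this report, only the three forward reading frames are scanned, open reading frames without a stop codon are ignored, and the function names are hypothetical. Dedicated coding-potential tools use far richer evidence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Longest ATG-to-stop ORF (in codons, stop excluded) in the forward frames."""
    seq = seq.upper().replace("U", "T")
    longest = 0
    for frame in range(3):
        current = None  # codons counted since the last in-frame ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if current is None:
                if codon == "ATG":
                    current = 1
            elif codon in STOP_CODONS:
                longest = max(longest, current)
                current = None
            else:
                current += 1
    return longest


def looks_noncoding(seq: str, min_length: int = 200, max_orf_codons: int = 100) -> bool:
    """Heuristic: long transcript whose longest forward ORF is below the cutoff."""
    return len(seq) > min_length and longest_orf_codons(seq) < max_orf_codons
```

A transcript that passes this filter is only a candidate lncRNA; the classification of MALAT1 in this report rests on the database annotation, not on such a heuristic.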
The RACK1 protein is part of the 40S ribosomal subunit and carries a charge of
5.0. This enables the interaction of RACK1 with multiple other proteins involved in different
pathways. Its mRNA is also targeted by several miRNAs. Its interactions with several
proteins are regulated, giving rise to tissue-specific expression, and any dysregulation
causes disease. RACK1 is a member of the tryptophan-aspartate (WD) repeat protein
family. It adopts a seven-bladed structure that promotes protein binding. It plays a
significant role in shuttling proteins around the cell, and thus acts as a key mediator in
various pathways and contributes to many aspects of cellular function. This study
therefore indicates that RACK1 is important in physiological processes such as cell
development and central nervous system function, and plays a key role in disease prognosis.
Role
The RACK1 protein is a constituent of the 40S ribosomal subunit and has a regulatory
function in ribosome quality control (RQC), where it promotes the ubiquitination of part of
the 40S subunit when the ribosome stalls during translation (Sundaramoorthy et al. 2017).
It binds and stabilizes protein kinase C and mediates phosphorylation of EIF6. Its binding
properties are activated through protein kinase. RACK1 belongs to the WD-repeat protein
family and contains 7 WD40 repeats. These assemble into a typical seven-bladed beta
structure, which gives it a platform for protein binding. RACK1 is essential in
mammalian cells.
RACK1 binds and activates RHOA, thereby promoting the migration of breast
carcinoma cells (Cao et al. 2011). Upon infection of host cells, RACK1 binds
Y. pseudotuberculosis YopK, which leads to inhibition of phagocytosis and to survival of
the bacteria. RACK1 stimulates PKCs to facilitate the phosphorylation of HIV-1 Nef. It
interacts with CPNE3 (Heinrich et al., 2010) and with ABCB4 (Ikebuchi et al. 2009). It is
also known to interact with LARP4 (Yang et al. 2011). In carcinoma prognosis, it acts in
angiogenesis, tumour growth and cell migration. It further regulates biological
functions through control of protein and complex assembly.
Tissue specific expression
RACK1 displays intra-organ variation, with higher expression in activated hepatic
stellate cells than in hepatocytes or Kupffer cells. Its levels are elevated in hepatocellular
carcinomas and in the nearby healthy liver tissue. It is also expressed in kidney, liver,
lung, skin, nervous system, blood, eye, intestine, muscle, pancreas, bone, stomach, heart,
lymph node, spleen, bone marrow, gall bladder, thyroid gland and adrenal gland.
Ribosome-bound RACK1 can affect the phosphorylation of eIF6, which occurs on the
60S subunit and involves an interaction between eIF6 and RACK1. RACK1 can thus
influence gene expression (Gerbasi et al. 2004). RACK1 also interacts with the
microRNA-induced silencing complex. It is further relevant to nascent polypeptide-dependent
translation arrest (Kuroha et al., 2010). RACK1 on the ribosome can affect gene
expression through the recruitment of proteins that regulate mRNA translation in various
ways. For example, the recruitment of mRNA-binding proteins can affect the translation
of an mRNA. RACK1 can also influence the role and location of translation, as in
neurons (Angestein et al. 2002).
Diseases
Since its discovery, RACK1 has been linked to various diseases. Alterations in RACK1
disturb homeostasis and are associated with diseases such as lung cancer (Zhou et al.
2017), gastric cancer (Cheng et al. 2017), esophageal cancer (Liu et al. 2017), brain
disorders (Li et al. 2017), breast cancer (Yang et al. 2016), gastric tumors (Cheng et al.
2016) and osteoarthritis (Huang et al. 2017).
Altered RACK1 has different consequences. The association of RACK1,
acetylcholinesterase and PKCβII affects stress-induced behaviours (Birikh et al. 2003).
Altered RACK1 expression has been observed in the frontal cortex of patients with bipolar
disorder (Wang & Friedman, 2001). Studies have also linked RACK1 with immune function.
The regulation of T-cell apoptosis by RACK1 is reported to depend critically on an
interaction involving GTP (Nakshima et al. 2008).
MALAT1 acts as a regulatory molecule by folding into a conformation with structural
domains. The domains interact with different biological partners in the cellular milieu,
which leads to tissue-specific expression and to several diseases. MALAT1 expression in
early-stage lung cancers is associated with a propensity for metastasis. Its expression in
tumours that later metastasise indicates a prognostic value of MALAT1 expression for
metastasis (Cheng et al. 2018). MALAT1 shows broad expression in normal human tissues
and is expressed in various human cancers. Its expression affects the disease development
process. The regulatory functions of the MALAT1 transcript have been shown to be
important in various processes and to differ from those of other systems, which points to
a possible relevance in pathogenesis. Because the transcript is conserved across human
cells and tissues, the expression of MALAT1 has been observed with its