Analysis of DeepGOPlus for Protein Function Prediction: Report

Added on 2023/01/05

AI Summary

This report analyzes the DeepGOPlus method, a bioinformatics approach aimed at improving protein function prediction from sequence data. The research addresses the limitations of existing methods like DeepGO by developing a new architecture. The study utilizes CAFA3 and a 'test benchmark' dataset to evaluate DeepGOPlus's performance. The analysis highlights the use of sequence similarity and convolutional neural networks (CNNs) as key bioinformatics techniques. The report discusses the controls employed, primarily procedural controls, and suggests the potential use of version control software for enhanced efficiency. Furthermore, it explores alternative bioinformatics techniques, such as model validation prediction methods, to assess the speed and accuracy of DeepGOPlus, especially in applications like metagenomics. The report underscores the importance of protein function prediction in understanding disease mechanisms and facilitating drug discovery.

Article Analysis and
News & Views

Table of Contents
Article - DeepGOPlus: improved protein function prediction from sequence
PART 1
What is the fundamental research question or bioinformatics problem that this paper is trying
to address
What data has been used to answer the research question? In the case of a new tool/technique,
what data is used to validate the tool/technique being presented
Describe one main bioinformatics technique used in this paper. How does this technique work,
and how does it help the researchers answer their question
Describe the controls used in this study. Are these controls appropriate? Are there additional
controls (or control experiments) you would use?
What alternative bioinformatics technique could be used to tackle the question/problem
addressed in this paper? Describe how the suggested technique helps address the research
question/problem

Article - DeepGOPlus: improved protein function prediction from sequence
PART 1
What is the fundamental research question or bioinformatics problem that this paper is trying to
address
Protein function prediction is one of the essential tasks of bioinformatics, because it informs
a wide range of biological problems and helps in understanding disease mechanisms. Knowing
what a protein does, and how it does it, clarifies its role in disease pathobiology and supports
the search for targeted drugs. A number of approaches are used for this task, including those
based on protein structure, protein sequence, protein-protein interaction networks and more.
However, these methods have limitations and can take a long time to predict protein function.
One example is DeepGO, a deep learning model that predicts protein functions from the protein
amino acid sequence and protein-protein interaction networks. This method is limited in terms
of computational efficiency, sequence length and the number of predicted classes. The article
therefore sets out to overcome the limitations of DeepGO by developing a new model
architecture, DeepGOPlus. The main purpose of the study is to evaluate how far this method
improves the prediction of protein function from sequence-based features, by comparing its
performance on benchmark datasets with that of other state-of-the-art methods.
What data has been used to answer the research question? In the case of a new tool/technique,
what data is used to validate the tool/technique being presented
To address the research question and validate the results, two datasets are used in this
article: the CAFA3 challenge training sequences and experimental annotations, published in
September 2016, and the 'test benchmark', published in November 2017. The CAFA3 (Critical
Assessment of protein Function Annotation algorithms) evaluation measures are used to
determine the performance of the DeepGOPlus method, with annotations propagated through the
hierarchical structure of GO (Gene Ontology). The researchers used the version of GO released
in June 2016, which has 10,693 MFO (molecular function) classes, 29,264 BPO (biological
process) classes and 4,034 CCO (cellular component) classes. The same GO version was used for
evaluating the CAFA3 predictions.
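The propagation step mentioned above can be illustrated with a minimal sketch. It assumes a toy
hierarchy with invented term names (not real GO identifiers) and only a single parent relation; the
real ontology is far larger and has several relation types. The idea is that a protein annotated
with a specific term is also treated as annotated with every ancestor of that term.

```python
# Toy hierarchy: child term -> set of parent terms.
# Term names are invented placeholders, not real GO identifiers.
TOY_PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalytic": {"root"},
    "ion_binding": {"binding"},
    "metal_ion_binding": {"ion_binding"},
    "kinase": {"catalytic"},
}


def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(annotations, parents):
    """Close a set of annotated terms under the ancestor relation."""
    closed = set(annotations)
    for term in annotations:
        closed |= ancestors(term, parents)
    return closed


propagated = propagate({"metal_ion_binding", "kinase"}, TOY_PARENTS)
print(sorted(propagated))
```

In this toy case the two specific annotations expand to six terms, which is why evaluation on
propagated annotations rewards predictions that are correct at a more general level of the hierarchy.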

Describe one main bioinformatics technique used in this paper. How does this technique work,
and how does it help the researchers answer their question
Bioinformatics techniques are applications of computer technology used to manage biological
information by gathering, storing, analysing and integrating it. They are applied, among other
things, to gene-based drug discovery and development. To address the main research question,
i.e. developing a novel method to predict protein function, the article uses sequence similarity
as its main bioinformatics technique. This technique searches sequence databases by aligning a
query sequence against the database entries. By statistically assessing how well the query
sequence and a database sequence match, homology can be inferred and information can then be
transferred to the query sequence. Techniques of this kind, together with the completed
sequencing of human genes, have enabled scientists to make drugs and medicines that target more
than 500 genes. Proteins with similar sequences tend to perform similar functions, so a basic
way of predicting the function of a new sequence is to find the most similar protein sequences
with known functional annotations. In the present article, this technique helps the researchers,
while developing DeepGOPlus and evaluating it on CAFA3, to gain insight into the types of
features the model uses to predict protein function. By analysing the convolutional filters of
the model, they found that the sequence regions which activate the filters are similar to the
seed sequences of protein families and domains in the Pfam database. Using sequence similarity,
the researchers were able to associate protein sequences in the test set with half of the
InterPro annotations through these sequence regions. Furthermore, the DeepGOPlus model combines
sequence motifs and sequence similarity within a single predictive model. To learn sequence
motifs that are predictive of protein function, a one-dimensional CNN (convolutional neural
network) over the protein amino acid sequence is used.
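The two ingredients described above, similarity-based annotation transfer and motif detection by a
one-dimensional convolution, can be sketched in a few lines. This is only an illustration under
stated assumptions: the k-mer overlap score is a crude stand-in for a real alignment search, the
convolution filter is set by hand instead of being learned, the sequences and function labels are
invented, and the weighted sum with a hypothetical parameter `alpha` is one simple way to combine
the two scores, not necessarily the exact rule used in the article.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmers(seq, k=3):
    """Set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard overlap of k-mer sets (stand-in for an alignment score)."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def transfer_scores(query, annotated):
    """Score each function by the best similarity among proteins carrying it."""
    scores = {}
    for seq, functions in annotated:
        s = similarity(query, seq)
        for function in functions:
            scores[function] = max(scores.get(function, 0.0), s)
    return scores


def one_hot(seq):
    """Encode a sequence as one row of 20 zeros/ones per residue."""
    return [[1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS] for aa in seq]


def motif_filter(motif):
    """Hand-set filter: a perfect match to `motif` gives activation 1.0."""
    return [[w / len(motif) for w in row] for row in one_hot(motif)]


def conv1d_max(seq, filt):
    """Slide the filter along the sequence and keep the largest activation."""
    x = one_hot(seq)
    width = len(filt)
    best = 0.0
    for start in range(len(x) - width + 1):
        activation = 0.0
        for i in range(width):
            for j in range(len(AMINO_ACIDS)):
                activation += x[start + i][j] * filt[i][j]
        best = max(best, activation)
    return best


def combined_score(sim_score, cnn_score, alpha=0.5):
    """Weighted sum of the two scores; `alpha` is a hypothetical weight."""
    return alpha * sim_score + (1 - alpha) * cnn_score


# Invented toy data: two annotated proteins and one query.
annotated = [("MKTAYIAKQR", {"kinase"}), ("GHWCCHEAGG", {"metal_binding"})]
query = "MKTAYIAKQL"

scores = transfer_scores(query, annotated)
motif_hit = conv1d_max(query, motif_filter("TAYI"))
print(scores, motif_hit, combined_score(scores["kinase"], motif_hit))
```

In a trained network the filters are learned from data, many filters of different widths are used,
and their pooled activations feed a classification layer; the sketch only shows why a filter
responds strongly to a sequence region that resembles a known motif.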
Describe the controls used in this study. Are these controls appropriate? Are there additional
controls (or control experiments) you would use?
Various types of controls are used in research to make sure that the details used throughout
the study are supported by appropriate sources. They include procedural control, variable
control, randomisation and more. With the help of such controls, the researcher can minimise
the influence on the experiments of variables other than the
independent variables. In the present article, procedural controls are used, which are designed
to reduce error in conducting the whole experiment. To produce valid, reliable and reproducible
results, procedural controls involve measuring each test twice, so that the possibility of
measurement error is reduced. In addition, if version control software were used, it could help
in managing the computational experiments more easily and in maintaining the efficiency of the
research outcomes of the project.
What alternative bioinformatics technique could be used to tackle the question/problem
addressed in this paper? Describe how the suggested technique helps address the research
question/problem
Various bioinformatics techniques could be used by researchers to address the main research
problem. In this article, the authors used sequence similarity to improve on the performance of
the DeepGO model by developing a new one, i.e. DeepGOPlus. Techniques other than sequence
similarity searches, such as multiple sequence alignment, secondary structure prediction, and
the identification and characterisation of domains, can also be used for predicting protein
structure. However, as the purpose of the article is to reduce the limitations of the DeepGO
model, it would be better for the researchers to use a model validation prediction method. This
technique would be more effective in analysing the speed and accuracy of the new model,
DeepGOPlus, i.e. how efficiently the tool predicts protein function. The article argues that
DeepGOPlus is one of the fastest tools, able to annotate thousands of proteins in a minute on a
single computer system. So, by using a validation prediction method, its performance in
metagenomics and other projects based on identifying unknown protein functions could be
justified.
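One concrete way to validate accuracy is a protein-centric maximum F-measure of the kind used in
CAFA-style evaluations. The sketch below is a simplified illustration, not the official evaluation
code: the protein names, terms and scores are invented, and it assumes every protein has at least
one true term. Precision is averaged over proteins with at least one prediction at the threshold,
recall is averaged over all proteins, and the best F-measure over all thresholds is reported.

```python
def fmax(true_terms, predicted_scores, thresholds=None):
    """Largest F-measure over score thresholds (simplified, protein-centric)."""
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions = []
        recall_sum = 0.0
        for protein, truth in true_terms.items():
            scores = predicted_scores.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & truth)
            # Precision only counts proteins with at least one prediction.
            if predicted:
                precisions.append(hits / len(predicted))
            # Recall counts every protein (assumes `truth` is non-empty).
            recall_sum += hits / len(truth)
        if not precisions:
            continue
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(true_terms)
        if precision + recall > 0:
            f_measure = 2 * precision * recall / (precision + recall)
            best = max(best, f_measure)
    return best


# Invented toy data: two proteins with true terms and scored predictions.
true_terms = {"P1": {"a", "b"}, "P2": {"c"}}
predicted_scores = {
    "P1": {"a": 0.9, "b": 0.4, "x": 0.6},
    "P2": {"c": 0.8, "y": 0.3},
}

result = fmax(true_terms, predicted_scores)
print(round(result, 3))
```

Speed could be checked in the same spirit by timing the prediction step over a fixed batch of
sequences and reporting proteins annotated per minute on the stated hardware.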