Proteins: Structure, Amino Acids, and Machine Learning
Added on 2022/11/16
AI Summary
This article discusses the structure of proteins, amino acids, and the four levels of protein structure. It also explores the role of machine learning in protein structure prediction and function determination.
Proteins

Proteins are polymers of amino acids [1]. Twenty standard amino acids are joined in different linear combinations through condensation reactions that form peptide bonds, giving rise to short sequences known as peptides. Further polymerization of these peptides through covalent peptide bonds ultimately yields proteins. In a cell, proteins form structural components, participate in signalling cascades, perform cell recognition and adhesion, and function as biocatalysts in biochemical reactions [2].

Amino Acids

Amino acids are the monomers of proteins [3]. Twenty standard amino acids take part in the formation of proteins and in various cellular reactions. All amino acids contain a central carbon atom (C), which has a valency of four [4]. It is linked to an amino group (NH2) on one side and to a carboxyl group (COOH) on the other. The central carbon, also known as the alpha (α) carbon, is also linked to a hydrogen atom (H) and to a side chain (R). Among the different amino acids, the R group differs while the other three groups remain conserved [5]. The general structure of an amino acid is shown in Figure 1.

Figure 1: General structure of an amino acid

Since the amino group and the carboxyl group are both linked to the central α-carbon atom, these amino acids are referred to as α-amino acids. The twenty α-amino acids possess different physical and chemical properties, which are imparted to them by the R group. On the basis of differences in the R group, they are classified into groups such as polar, non-polar, acidic and basic amino acids [6]. Some prominent examples from different sub-classes of amino acids are:
Basic amino acids: histidine, lysine, arginine

Hydrophobic R groups make an amino acid non-polar. Such amino acid residues tend to aggregate together in an aqueous environment. Polar R groups, in contrast, are present in hydrophilic amino acids and participate in hydrogen bonding and other hydrophilic interactions. The acidic amino acids possess a carboxylate group that bears a negative charge at physiological pH. Such amino acids often form active residues of enzymes and participate in biochemical reactions. Amino acids with basic side chains are often involved in enzyme-catalysed reactions and hydrogen bonding.

Protein Structure: Overview

In biological systems, proteins perform a variety of important functions [7]. Various reports have established a relation between the function of a protein and its structure [8]. Protein structure, in turn, is determined by the sequence of amino acids in the protein chain [9]. Owing to the interactive forces that exist among the constituent amino acids and their side chains, a protein tends to adopt an energetically favourable structure. Protein structure is best understood at four levels: primary, secondary, tertiary and quaternary [10].

Primary structure

The primary structure is the sequence of amino acids in the protein chain, also known as the polypeptide [11]. The constituent amino acids are linked together by peptide bonds, which are formed by condensation of the amino group on the α-carbon of one amino
acid with the carboxyl group on the α-carbon of the other. This is the simplest level of protein structure, in which the protein is represented as a linear sequence of its amino acids.

Secondary structure

The secondary structure of proteins arises from hydrogen bonding between the amino (N-H) and carbonyl (C=O) groups of the polypeptide backbone [12]. The two most common types of secondary structure are the α-helix and the β-sheet.

α-helix

It is a helical structure stabilized by hydrogen bonds between the oxygen of the C=O group of one residue and the N-H group of the residue four positions further along the chain, in the next turn of the helix. A typical α-helix is about eleven amino acids long. A helix can in principle be right-handed or left-handed, although the α-helices found in proteins are almost always right-handed. The stability of an α-helix depends on steric interactions between the amino acids. The presence of amino acids with bulky R groups (for example tryptophan and tyrosine) or very small R groups (such as glycine) tends to make α-helices less stable. Proline, whose R group bonds back to the nitrogen of the amide group, has an irregular geometry that causes steric hindrance and also destabilizes α-helices. Moreover, proline lacks a hydrogen on its nitrogen atom. This prevents proline from donating a hydrogen bond, which further disrupts the helical structure.

β-sheet

Also known as the β-pleated sheet, it consists of laterally aligned polypeptide strands that are held together by H-bonds. Here the hydrogen bonds form between the amine and carbonyl groups of adjacent strands, which may belong to the same chain or to different chains, rather than between residues a few positions apart as in the α-helix.

Tertiary structure

This level of protein structure refers to the three-dimensional shape of the protein. It is formed by folding and twisting of the secondary structures in 3-D space. The tertiary structure is stabilized by hydrogen bonds, hydrophobic and hydrophilic interactions, ionic bonds and disulphide bonds. On the basis of tertiary structure, proteins are categorised into two main types: globular and fibrous.
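The hydrogen-bonding pattern described for the α-helix, in which each residue pairs with the residue four positions ahead, can be sketched in a few lines of Python. This is only an idealized illustration: the function name `helix_hbond_pairs` and the 1-based residue numbering are assumptions made for the example, and real helices are frayed at their ends.

```python
def helix_hbond_pairs(n_residues, offset=4):
    """List (acceptor, donor) residue pairs for an idealized alpha-helix.

    Residues are numbered from 1. The C=O of residue i is paired with
    the N-H of residue i + offset (offset = 4 for an alpha-helix).
    """
    return [(i, i + offset) for i in range(1, n_residues - offset + 1)]


# An eleven-residue helix, the typical length mentioned in the text.
pairs = helix_hbond_pairs(11)
print(len(pairs))   # number of backbone hydrogen bonds in this idealized helix
print(pairs[:3])    # first few (C=O residue, N-H residue) pairs
```

The last four residues have no partner further along the chain, which is one reason short helices contain fewer hydrogen bonds per residue than long ones.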
Quaternary Structure

It refers to the protein structure arising from the interactions between two or more polypeptide chains (subunits). The subunits are held together mainly by non-covalent interactions and, in some cases, by covalent bonds such as disulphide bridges; some assemblies also carry a non-protein organic component known as the prosthetic group. Quaternary structure is often categorised on the basis of the subunits involved: homo-oligomers (if all the subunits are identical, for example a homodimer) or hetero-oligomers (if the subunits differ, for example a heterodimer). Common examples include haemoglobin and collagen.

Secondary structure - torsion angles

The peptide bond in proteins involves the sharing of electrons between the C, N and O atoms. The lone pair of electrons on the nitrogen is delocalized over the neighbouring carbonyl group, which imparts a partial double bond character to the peptide bond. As a result, rotation about this bond is hindered. However, the other two backbone bonds, namely Cα-C and N-Cα, are free to rotate about their bond axes.

A torsion angle, also known as a dihedral angle, is defined by three consecutive bonds in a polypeptide: it is the angle between the two outer bonds when viewed along the central bond. Rotation about the Cα-C axis gives rise to the psi (ψ) torsion angle, while rotation about the N-Cα axis gives rise to the phi (φ) torsion angle. The limited rotation of the peptide bond (C-N) is represented by the omega (ω) torsion angle. Of the twenty side chains found in the genetically encoded amino acids, some are acidic, some basic, and others neutral, hydrophobic or hydrophilic. These side chains contribute to steric hindrance and limit the torsion angles that can exist for a given protein structure. The allowed combinations of torsion angles are best represented by the Ramachandran plot.

Ramachandran plot

The Ramachandran plot [13] is a 2-D representation of the backbone torsion angles of a polypeptide and serves as a simple representation of the conformation of a protein. The φ and ψ values cluster into distinct regions of the plot on the basis of steric clashes.
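To make the definition of a torsion angle concrete, the following sketch computes the dihedral angle defined by four points in space using a standard atan2-based formula. It is a minimal illustration, not the method of any particular package; the function name `torsion_angle` and the sample coordinates are assumptions made for the example. For a protein backbone, φ would be obtained from the atoms C(i-1), N(i), Cα(i), C(i), and ψ from N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_angle(p0, p1, p2, p3):
    """Dihedral angle in degrees (-180 to 180) about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal to the plane of the first three atoms
    n2 = _cross(b2, b3)          # normal to the plane of the last three atoms
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: four atoms with the outer bonds at right angles
# when viewed along the central bond.
print(torsion_angle((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)))
```

Using atan2 rather than an arccosine keeps the sign of the angle, which matters because φ and ψ are plotted over the full range from -180° to 180°.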
Atoms are treated as hard spheres with dimensions corresponding to their van der Waals radii, and φ and ψ values that cause these spheres to collide are regarded as sterically disallowed conformations.
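The hard-sphere criterion can be sketched as a simple distance test: two atoms clash when their centres are closer than the sum of their radii. The radii below are commonly quoted approximate van der Waals values in ångströms and are included only for illustration; the function name `atoms_clash` is hypothetical, and practical validation tools use tuned contact limits and ignore covalently bonded neighbours.

```python
import math

# Approximate van der Waals radii in angstroms (illustrative values only).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}


def atoms_clash(elem_a, pos_a, elem_b, pos_b, radii=VDW_RADII):
    """Return True if two hard-sphere atoms overlap.

    The atoms clash when the distance between their centres is smaller
    than the sum of their van der Waals radii.
    """
    min_allowed = radii[elem_a] + radii[elem_b]
    return math.dist(pos_a, pos_b) < min_allowed


# An oxygen and a hydrogen 2.0 angstroms apart overlap (1.52 + 1.20 = 2.72),
# while two carbons 4.0 angstroms apart do not (1.70 + 1.70 = 3.40).
print(atoms_clash("O", (0.0, 0.0, 0.0), "H", (2.0, 0.0, 0.0)))
print(atoms_clash("C", (0.0, 0.0, 0.0), "C", (4.0, 0.0, 0.0)))
```

Repeating such a test over all non-bonded atom pairs for each (φ, ψ) combination is, in outline, how the allowed and disallowed regions of the plot can be mapped.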
In the accompanying figure of a Ramachandran plot, the white areas represent the disallowed regions. These correspond to values of φ and ψ for which atoms come closer than the sum of their van der Waals radii, resulting in a steric clash. The yellow regions correspond to φ and ψ angles that result in little steric hindrance, while the red region represents the most favourable φ and ψ angles, with no steric hindrance. The Ramachandran plot therefore serves as a geometric validation tool for protein structure refinement.

Machine Learning and Protein Structure Prediction

Introduction

Machine learning refers to a set of problem-solving tools used in scenarios ranging from stock markets to computational chemistry [14]. Machine learning algorithms discover patterns in historical data in order to improve future decisions or actions in difficult situations through a set of complex calculations. Machine learning holds great potential in biological and medical applications because of the large datasets that must be analysed before arriving at a decision. It can in some cases act as a substitute for "wet-lab" experiments, guide research, and help to elucidate underlying interactions within the data. Machine learning finds application in predicting the structure and potential function of a protein. It involves developing complex algorithms that draw on artificial intelligence and statistics.
Common examples of machine learning algorithms include neural networks and decision trees, which have been in use for many years and remain in use today. At present there are two basic types of machine learning algorithms: frequentist and evidential. In biology, machine learning has gained importance and popularity because of the demand for analysis of high-throughput data arising from microarray analysis and sequencing. In recent years, the potential of machine learning for protein structure prediction and function determination has also been explored.

Protein structure prediction

Prediction of the secondary structure of proteins from the primary structure is a long-studied problem in biology [15]. Many of the successful current methods employ evolutionary information and rely on iterative search tools such as PSI-BLAST. Secondary structure prediction has been approached with methods such as neural networks and Bayesian networks, among others. In addition, structured prediction is required, because the structure of a protein depends on the interactions of each residue with its neighbours. Some of the successful methods use a two-stage approach in which the first stage creates a multiclass prediction and the second stage evaluates the best assignment using the information from the first stage.

Prediction of protein structure occupies a key position in understanding protein function [16]. Information about a protein's structure unlocks a set of important features that can be used to understand its function. Since experimental methods such as X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy are expensive and time consuming, machine learning based structure prediction can play an important part in addressing the issue. Further, although a large number of protein structures are available in the PDB, the number of sequences for which structures are yet to be solved is far greater.
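Because the state of a residue depends on its neighbours, a common way to prepare input for a per-residue classifier is to describe each position by a window of surrounding residues. The sketch below shows one such encoding under stated assumptions: the one-hot scheme, the window size and the names `one_hot` and `window_features` are illustrative choices, not a description of any specific published predictor, and methods that use evolutionary information would replace the one-hot vectors with profile columns.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard residues


def one_hot(residue):
    """Encode a residue as a 20-element 0/1 vector (all zeros for padding)."""
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]


def window_features(sequence, half_width=2):
    """Build one feature vector per residue from a sliding window.

    Each residue is described by itself plus `half_width` neighbours on
    either side; positions beyond the chain ends are padded with '-'.
    """
    padded = "-" * half_width + sequence + "-" * half_width
    features = []
    for i in range(len(sequence)):
        window = padded[i:i + 2 * half_width + 1]
        vector = []
        for residue in window:
            vector.extend(one_hot(residue))
        features.append(vector)
    return features


feats = window_features("ACDEF", half_width=2)
print(len(feats), len(feats[0]))  # one vector per residue, 5 window slots x 20
```

Each vector could then be passed to a first-stage classifier that predicts helix, strand or coil for the central residue, with a second stage refining the assignment as described above.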
CATH and SCOP are two classification systems used to label training data for a number of supervised learning problems in protein structure prediction and organization. Several techniques have been successfully developed to determine sequence similarity. Since sequence is known to dictate structure in proteins, sequences that are similar tend to have similar structures. On the other hand, when the sequence similarity falls below a threshold, sequence homology based modelling fails to yield reliable results. Under such circumstances, a machine learning approach can play a key role in the search for a template sequence with known structure. One of the approaches utilizes efficient representation of kernel classifiers while the other, known as semi-supervised learning, utilizes the large
amount of unlabelled sequence data to build models similar to PSI-BLAST. In addition to these, some other methods utilise sequence-structure correlations or motifs in addition to kernel methods.

Machine learning has become indispensable in protein structure prediction. In recent times, it has shown promising applications in protein structure and function prediction; however, it is still at a nascent stage, and much remains to be developed before it becomes one of the main tools of computational biophysics. Most previous work relied on direct application of available machine learning software in conjunction with simple features. Further progress in this field requires breakthroughs beyond the straightforward application of classification software packages. Machine learning is a powerful tool with great potential in biomolecular modelling. In combination with biochemical and biophysical information, it is expected to yield greater advances in the understanding of protein structures, functions, interactions and localizations.

References

[1] J. Berg, J. Tymoczko and L. Stryer, "Biochemistry", Ncbi.nlm.nih.gov, 2019. [Online]. Available: https://www.ncbi.nlm.nih.gov/books/NBK21154/.
[2] G. Cooper, "The Cell", Ncbi.nlm.nih.gov, 2019. [Online]. Available: https://www.ncbi.nlm.nih.gov/books/NBK9839/. [Accessed: 26-May-2019].
[3] "Chemistry for Biologists: Proteins", Rsc.org, 2019. [Online]. Available: https://www.rsc.org/Education/Teachers/Resources/cfb/proteins.htm. [Accessed: 26-May-2019].
[4] B. Kuhlman and D. Baker, "Native protein sequences are close to optimal for their structures", Proceedings of the National Academy of Sciences, vol. 97, no. 19, pp. 10383-10388, 2000. Available: 10.1073/pnas.97.19.10383 [Accessed: 26-May-2019].
[5] B. Alberts, A. Johnson, J. Lewis, M. Raff, K. Roberts and P. Walter, "Molecular Biology of the Cell", Ncbi.nlm.nih.gov, 2019. [Online]. Available: https://www.ncbi.nlm.nih.gov/books/NBK21054/. [Accessed: 26-May-2019].
[6] O. Sparkman, Z. Penton and F. Kitson, Gas chromatography and mass spectrometry. Burlington, MA: Academic Press, 2011.
[7] J. Berg, J. Tymoczko and L. Stryer, "Biochemistry", Ncbi.nlm.nih.gov, 2019. [Online]. Available: https://www.ncbi.nlm.nih.gov/books/NBK21154/. [Accessed: 26-May-2019].
[8] G. Petsko and D. Ringe, Protein structure and function. Oxford [England]: Oxford University Press, 2009.
[9] D. Whitford, Protein structure & function. New York: Wiley, 2003.
[10] M. Ryadnov and F. Hudecz, Amino Acids, Peptides and Proteins 42. [S.l.]: Royal Society of Chemistry, 2018.
[11] C. Branden and J. Tooze, Introduction to Protein Structure. 2012.
[12] D. Chasman, Protein structure. New York: Dekker, 2003.
[13] J. Russell and R. Cohn, Ramachandran plot. [Place of publication not identified]: Book On Demand, 2012.
[14] H. Bock, W. Lenski and M. Richter, Information Systems and Data Analysis. Berlin, Heidelberg: Springer Berlin Heidelberg, 1994.
[15] Y. Zhou, A. Kloczkowski, E. Faraggi and Y. Yang, Prediction of protein secondary structure.
[16] R. Huzefa and G. Karypis, Introduction to protein structure prediction. 2010.