Synthetic Peptides and Proteins to Elucidate Biological Function


Christian F. W. Becker and Roger S. Goody, Max-Planck-Institut für Molekulare Physiologie, Dortmund, Germany

doi: 10.1002/9780470048672.wecb583


The enormous progress made in the use of gene technological techniques over the past several decades has been the main driving force in the accumulation of our knowledge of biology at the molecular level. This progress has at times tended to push more classic approaches, such as those stemming from synthetic chemistry, into the background, and there has even been a tendency to regard contributions from this area as being superfluous. This attitude has begun to change recently, with the emergence of the field now referred to as chemical biology, and it is now appreciated that synthetic chemistry can make a unique contribution to the outstanding problems in fundamental biological and medically oriented research. The full potential of these methods is beginning to be realized in the area of peptide and protein synthesis, and this will be the topic of this article.

Synthesis of Biological Macromolecules

One of the early dreams of synthetic chemists was to achieve the total synthesis of important complex biological molecules (1). At the level of polymeric molecules, this includes proteins, nucleic acids, and polysaccharides. In all cases, early work initially involved synthesis of small fragments of the polymeric molecules (peptides, oligonucleotides, oligosaccharides) and addressed, and partially solved, the initially formidable synthetic obstacles, in particular those concerning protection and deprotection to prevent reactions occurring at unwanted positions of the molecules involved. The seminal breakthrough that led to extension of these methods to longer polymers in reasonably short periods of time was made by Merrifield (2), who was the first to show that synthesis of polymeric biological molecules could be achieved on a solid support, thus removing or at least dramatically simplifying the need for time-consuming purification and isolation of intermediates after the addition of each monomer. Merrifield introduced this principle for peptide synthesis, but in fact polynucleotide synthesis, in particular DNA synthesis, proved to work at least as well, and in terms of reaching the long-term aim of total synthesis of biological macromolecules was the first to be accomplished successfully and in relatively routine fashion (3). This is largely due to the fact that oligonucleotide synthesis of fragments of a length of ca. 50 nucleotides is relatively facile on a solid support, and that enzymes can be used to ligate such fragments in a directed fashion to achieve the goal of total gene synthesis. Although this is not the most routinely used method for generation of complete coding regions for specific proteins, there are often situations where this is the method of choice, because it allows complete control of codon usage to optimize protein expression in the organism to be used. 
Gene synthesis is now offered on a commercial basis and plays a significant role in modern biological research.

Progress toward total synthesis of proteins has been slower, mainly due to the lack of easy availability of an enzymatic procedure equivalent to DNA ligation that would allow coupling of peptides of a length that can be conveniently prepared by solid-phase synthesis (depending on sequence, the largest fragments that can be produced are between 50 and 100 residues long). This situation has changed significantly over the past 10-15 years with the introduction and widespread use of methods for the ligation of protein fragments together with the combination of the methods of synthetic chemistry with techniques originating in biology.

In the following, we initially discuss the advances that have been made at the technical level, and then introduce some of the many applications that exploit the new methods for the study of biologically important processes.

Chemical Methods for the Generation of Large Polypeptides

The principles of solid-phase peptide synthesis have been reviewed extensively and will not be repeated here, except to remind the reader that this usually involves attachment of a suitably protected amino acid, which will become the C-terminal residue in the finished sequence, via its carboxyl group to a polymeric support. After the N-terminus of this residue has been deprotected, it is allowed to react with the next protected and activated amino acid to form a peptide bond between the last and the penultimate amino acids in the target sequence. Repetition of this cycle allows the stepwise construction of the desired polypeptide from the C-terminus toward the N-terminus. Removal of protecting groups and cleavage from the solid support leads to the free polypeptide.

The procedure outlined here is limited to oligopeptides and polypeptides of up to ca. 50 amino acids, and thus, it limits the availability of fully synthetic proteins, because most proteins or functional domains are at least ca. 100 residues long. A solution to this problem would be to generate polypeptides with a length of several tens of amino acids and then to couple (or ligate) them to produce significantly larger proteins. In earlier work, this principle was used in a block condensation approach using fully protected polypeptides, but this did not prove to be a viable procedure in most cases. Another approach is to connect fragments of the protein via non-peptide linkers, using chemistry that obviates the need for side-chain protection. An example of this approach is given below, and it can be put to good use in certain cases. A major breakthrough was the introduction of the method known as native chemical ligation in 1994 (4). In this procedure, a peptide or polypeptide bearing a C-terminal thioester is mixed in aqueous solution under mild conditions with another peptide or polypeptide harboring an N-terminal cysteine residue (Fig. 1). The ligation reaction involves a thioester exchange reaction followed by an S→N acyl transfer to generate a native peptide bond, a reaction that had been reported much earlier (5) but that had not been considered as a ligation method.
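One commonly cited reason for the length limit of stepwise solid-phase synthesis is that losses compound with every cycle. The following minimal sketch illustrates this under a simplifying assumption that is not taken from the text: every deprotection and coupling cycle proceeds with the same combined yield, and the per-cycle yields used are purely illustrative. The function name overall_yield is hypothetical.

```python
def overall_yield(n_residues: int, step_yield: float) -> float:
    """Fraction of chains that carry the complete, correct sequence.

    The first residue is anchored to the support, so a chain of
    n_residues requires n_residues - 1 coupling cycles. Each cycle is
    assumed (for illustration only) to succeed with the same yield.
    """
    if n_residues < 1:
        raise ValueError("n_residues must be at least 1")
    if not 0.0 <= step_yield <= 1.0:
        raise ValueError("step_yield must lie between 0 and 1")
    return step_yield ** (n_residues - 1)


if __name__ == "__main__":
    # Assumed per-cycle yields; real values depend on sequence and chemistry.
    for step_yield in (0.98, 0.99, 0.995):
        for length in (25, 50, 100, 200):
            fraction = overall_yield(length, step_yield)
            print(f"yield per cycle {step_yield:.3f}, "
                  f"{length:3d} residues: full-length fraction {fraction:.3f}")
```

With an assumed 99% yield per cycle, roughly 61% of the chains are full length at 50 residues but only about 37% at 100 residues, which is one way to see why ligating shorter, purified segments becomes attractive for larger targets.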

Chemical ligation has been used for the total synthesis of a large number of proteins in recent years, as described in several reviews (6-9), and recent examples extend the size range to the order of 200 amino acids, in this case using multiple ligation steps (10, 11). Despite this progress, an attractive approach that is being used increasingly is that of a combination of synthetic and molecular biological methods in the technique referred to as expressed protein ligation, as discussed in a later section.



Figure 1. Native chemical ligation (NCL) between two unprotected peptide segments. The initial transthioesterification reaction leads to an intermediate that undergoes an S to N-acyl shift via a five-membered cyclic transition state and generates a native amide bond at the ligation site.


HIV-1 Protease as a Paradigm for Elucidating Biological Function by Chemical Protein Synthesis

Chemical protein synthesis and semisynthesis have been used to study the molecular basis of protein function in numerous cases. One of the very early and most impressive applications of chemical synthesis to the production of functional as well as site-specifically modified enzymes concerns the protease from human immunodeficiency virus 1 (HIV-1 PR). This enzyme cleaves the gag-pol polypeptide into functional proteins during virion budding from host cells and is essential for replication of the virus (12). Inhibitors of HIV-1 PR are an important class of anti-HIV drugs, and their development is at least partially based on the availability of structural and molecular information obtained with chemically synthesized HIV-1 PR.

First Access to HIV-1 PR

HIV-1 PR is a homodimer made up of 99 amino acids (per monomer) that was made accessible for the first time in 1988 by Schneider and Kent, who synthesized this protein using solid-phase peptide synthesis (SPPS) (13). An automated, rapid, and highly efficient procedure in combination with purification by size exclusion chromatography was used to generate a partially purified HIV-1 PR (14), which then also became available later on in 1988 by recombinant expression in Escherichia coli (15). Proteins generated by these two procedures had the same enzymatic properties. After the initial synthesis of HIV-1 PR, one advantage of this methodology, namely the possibility to incorporate unnatural amino acids during chemical synthesis, was demonstrated by replacing all cysteine residues in HIV-1 PR by α-amino-n-butyric acid. The resulting enzyme was fully active and was crystallized to obtain one of the first three-dimensional structures of HIV-1 PR that formed the basis for structure-assisted design of HIV-1 PR inhibitors (16, 17). At the same time this structure confirmed that chemically synthesized proteins can fold and crystallize identically with proteins from natural sources. Three different crystal structures of chemically synthesized HIV-1 PR with bound peptide inhibitors were subsequently published and contributed to the further development of HIV-1 protease inhibitors (18-20).

Backbone Engineering of HIV-1 PR

The flexibility of chemical protein synthesis was used to introduce changes into the protein backbone that could not be incorporated by other means. This paved the way for a general protein engineering approach and at the same time introduced the possibility of joining two fully unprotected peptide segments by a chemoselective reaction that generated an unnatural thioester bond between Gly51 and Gly52 of each HIV-1 PR subunit (Fig. 2a). The thioester linkage was generated by reacting an N-terminal HIV-1 PR peptide segment (aa 1-51) carrying a C-terminal thioacid with a C-terminal segment (aa 52-99) having the N-terminal glycine replaced by bromoacetic acid and all additional cysteine residues replaced by α-amino-n-butyric acid (Fig. 2a) (21). This constitutes an early example of a chemoselective ligation reaction that provided access to a medium-sized protein by linking two smaller unprotected polypeptides (easily accessible by SPPS) without the need for elaborate protection schemes as used in fragment condensation reactions.

The resulting enzyme exhibited full activity, even though the thioester bond was placed inside a flexible β-hairpin loop (flap region) of HIV-1 PR, a region that undergoes drastic conformational changes during substrate and inhibitor binding. This is due to the positioning of the two glycine residues on the outside of the flaps, away from the substrate. However, the synthesis of another HIV-1 PR analog by Kent and Baca placed the thioester bond between Gly49 and Ile50, leading to a reduction in catalytic activity by a factor of 3000 (Fig. 2b) (22). This constituted the first experimental evidence that hydrogen bonds between the backbone of the flap region and the substrate are important for catalytic activity. However, substrate specificity and affinity were not affected. These particular hydrogen bonds are “transmitted” from the protease backbone to the substrate via an internal water molecule and are believed to contribute to the distortion of the scissile bond of the substrate (23).



Figure 2. Chemical synthesis of backbone engineered HIV-1 protease. The peptide segments were synthesized by SPPS and harbored a C-terminal thioacid (HIV-1 PR, aa 1-49/51, shown in blue) or an N-terminal bromoacetic acid modification (HIV-1 PR51/53-99, shown in red). These unique functional groups lead to an unnatural thioester bond either between Gly51 and Gly52 (Strategy A; the ligation site is shown in yellow in the cartoon representation of the HIV-1 PR dimer) or between Gly49 and Ile50 (Strategy B; the ligation site is located at the end of the N-terminal peptide segment depicted in blue). The chemoselective ligation reaction is followed by purification steps and folding of the protein into its functional conformation. Strategy A led to a fully functional HIV-1 PR, whereas strategy B led to a severe reduction in catalytic activity. The functional dimer of the HIV-1 PR is drawn as a cartoon with only one subunit showing the modifications introduced during synthesis. The second subunit (in green) is shown unmodified for clarity. Aspartic acid 25, site-specifically labeled with 13C for NMR spectroscopy studies, is shown in magenta in one HIV-1 PR subunit.


The applicability of the thioester-forming chemoselective ligation approach was broadened by the fact that this chemistry can be carried out under acidic conditions in the presence of sulfhydryl groups. By taking advantage of this selectivity of the alkylation reaction, two different HIV-1 PR monomers were prepared. These monomers carried a free sulfhydryl group at their N- or C-terminus, respectively, and were, subsequent to the thioester-forming ligation step, joined together by a disulfide linkage to generate tethered dimers of two distinct HIV-1 PR subunits (24). This tethering of the two subunits produced one of the largest functional proteins prepared by chemical synthesis at that time and allowed the preparation of HIV-1 PR molecules with asymmetrically placed subunits. One example of such asymmetrical HIV-1 PR analogs was constructed with one subunit having a thioester bond between Gly51 and Gly52, which did not interfere with the biological activity of the protease, and a subunit that had a thioester bond between Gly51 and Gly52 and an additional ester bond instead of an amide bond between Gly49 and Ile50 (23). When the amide NH group is replaced by an oxygen atom at this unique position, no backbone hydrogen bond to a substrate carbonyl (via a water molecule) can be formed. Therefore, such a construct should exhibit a highly reduced catalytic activity if both flap regions are required to form hydrogen bonds. However, the ester analog of HIV-1 PR showed a reduction of kcat by only a factor of 2 upon this atom replacement. This demonstrated that only one flap region is used by the enzyme for catalysis and that the slightly reduced enzymatic activity of the ester analog is caused by the fact that, in such an asymmetric dimer, only one substrate orientation leads to productive binding. This is a vivid example of chemical protein synthesis as a unique tool in the quest to elucidate the molecular basis of enzyme catalysis.

Site-Specific Side Chain Labeling of HIV-1 PR

The incorporation of an aspartic acid residue with a 13C atom at the side-chain carboxyl function at position 25 into aspartyl protease has made this catalytically essential group visible for nuclear magnetic resonance (NMR) spectroscopy (Fig. 2) (25). The chemical shifts of this 13C atom were observed as a function of the pH and the presence and absence of substrate or inhibitor molecules. These titration experiments provided additional evidence for the suggested working model of aspartyl proteases and confirmed that HIV-1 PR is a member of this class of enzymes (26). The two aspartyl side-chain carboxyl groups (one from each subunit) act as general base and acid, respectively, thereby leading to the breakdown of the enzyme-substrate intermediate.
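A pH titration of this kind is often interpreted with a simple two-state model. The sketch below is illustrative only: it assumes a single ionizable group in fast exchange on the NMR timescale, so that the observed chemical shift is the population-weighted average of the protonated and deprotonated forms. The pKa and limiting shifts in the example are hypothetical values, not data from the work cited, and observed_shift is a hypothetical helper name.

```python
def observed_shift(ph: float, pka: float,
                   delta_protonated: float,
                   delta_deprotonated: float) -> float:
    """Population-weighted chemical shift (ppm) of one titrating group.

    Uses the Henderson-Hasselbalch relation [A-]/[HA] = 10**(pH - pKa)
    and assumes fast exchange between the two protonation states.
    """
    ratio = 10.0 ** (ph - pka)  # [A-]/[HA]
    return (delta_protonated + delta_deprotonated * ratio) / (1.0 + ratio)


if __name__ == "__main__":
    # Hypothetical parameters chosen only to show the shape of the curve.
    pka, d_ha, d_a = 4.0, 176.0, 180.0
    for ph in (2.0, 3.0, 4.0, 5.0, 6.0):
        shift = observed_shift(ph, pka, d_ha, d_a)
        print(f"pH {ph:.1f}: {shift:.2f} ppm")
```

In this model the midpoint of the curve sits at pH = pKa, so a labeled carboxyl group whose shift does not change over the accessible pH range would indicate a pKa outside that range; a system with two interacting carboxyl groups, as in the protease dimer, would need a more elaborate model than this one.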

The work on HIV protease demonstrates how chemical protein synthesis allowed isotope labeling of a 22-kDa protein with atomic precision and provided further insights into the chemical basis of the proteolytic cleavage reaction. Isotope labeling with atomic precision has since then been used to reveal structural features of other either chemically synthesized or semisynthetic proteins (27-29).

A Mirror Image HIV-1 PR

A characteristic feature of many biomolecules is their chirality and the stereochemical specificity that is conferred to proteins and especially enzymes by being constructed from monomers with uniform stereochemical configuration. This fact has inspired chemists and biochemists to generate mirror images of proteins (as well as other biomolecules) to test the properties of proteins made up of D- rather than the naturally occurring L-amino acids with regard to their biophysical behavior, enzymatic activity, and specificity. Currently, it is still not possible to modify ribosomal protein synthesis so that all-D-polypeptides can be produced, and this would in fact be a daunting undertaking. However, chemical protein synthesis and its ability to link peptides produced by solid-phase peptide synthesis via chemoselective reactions to form medium-sized proteins allows the synthesis of proteins from D-amino acids. Milton et al. demonstrated this capability of chemical protein synthesis by producing a mirror image HIV-1 PR using their already described thioester-forming ligation approach (30). When compared with the L-form of this enzyme (also produced by chemical protein synthesis), both proteins exhibited full catalytic activity but inverse chiral specificity, meaning that the D-form only cleaves D-substrates and the L-form only L-substrates. A crystal structure of the D-HIV-1 PR revealed that it was the mirror image of the L-form, and in the presence of a substrate-based D-inhibitor (D-MVT101), all major interactions between enzyme and substrate were clearly visible (31). In addition, all secondary structure elements clearly exhibited mirrored relationships such as the inverse handedness of α-helices and twists of anti-parallel β-sheets (6). The synthesis of D-HIV-1 PR impressively demonstrates the basic determinants of protein structure and emphasizes the freedom and power of chemical protein synthesis.
So far only a few D-proteins have been prepared, but potential applications are mirror image-based screenings where one screens a large library of L-peptides (generated by phage display) against a D-protein for high affinity binders (32). Any hits out of such a screen could be translated into D-peptides that would bind to naturally occurring L-proteins and possess highly interesting properties such as high stability against proteolytic cleavage and possibly low immunogenicity.

Semisynthetic Proteins of the Ras-Superfamily

Although the total synthesis of a protein allows complete control over the structure, including posttranslational modifications and introduction of labels at desired sites in the sequence, it is still a major undertaking for which most laboratories whose main interest is in the biology of their target proteins are not equipped. In certain cases, for example when the site of introduction of a specific chemical modification is near the C-terminus, a combination of molecular biological and chemical methods has proved to be very powerful.

With the Ras-family of guanine nucleotide binding proteins, where the C-terminus plays a critical role in location to specific membranes, two approaches have been used to solve the problem of generating a C-terminus that is either naturally or unnaturally modified. In one of these, C-terminal peptides have been linked by a chemical method leading to an unnatural link. The chemistry used was based on the reaction of a truncated protein carrying a C-terminal cysteine with peptides carrying an N-terminal ε-maleimidocaproyl group (33-37). In this manner, Ras derivatives containing C-terminal lipids (farnesyl in the case of K-Ras and farnesyl and palmitoyl in the case of H- and N-Ras) could be prepared as well as those containing fluorescent or reactive groups. The most important result to emerge from these studies concerns the reversible modification of a cysteine residue by palmitate (38). Ras proteins seem to display weak and nonspecific general interactions with membranes via their farnesyl group (or a polybasic domain in K-Ras), but they are palmitoylated on Golgi membranes leading to their capture here. From there, they can be shuttled or reshuttled to their location on the plasma membrane by vesicular transport. This specific localization at the Golgi and plasma membranes did not occur when the palmitoyl group was replaced by a stable hexadecyl thioether, thus demonstrating the importance of a cycle of acylation and deacylation in the mode of action of these proteins.

In the example described, which uses chemistry to create an unnatural linkage between the C-terminal region of Ras and the rest of the protein, there was no apparent detrimental effect of this departure from the natural peptide backbone, as shown by various tests of biological activity. This is presumably because the most important function of the region in the experiments discussed is to provide a flexible linkage to the lipidated terminal residues. In other cases, there is a reason to believe that such a modification might be less well tolerated. In the case of the Rab proteins, which are members of the Ras-family involved in the regulation of vesicular transport, it is clear that the exact structure (sequence) of the hypervariable C-terminus is of critical importance for directing the individual members of the family of over 60 Rab proteins to distinct membrane targets. For this reason, and because one question to be investigated involved structural studies on complexes between Rab proteins and their partners, a method for producing posttranslationally modified Rab proteins with a natural polypeptide backbone throughout the whole protein was needed. This was achieved using the technique of expressed protein ligation (EPL), a procedure introduced by Muir et al. (39-41). The procedure has been used in the Rab field for the construction of a number of C-terminally modified proteins which have been used in biochemical, biophysical and cell biological studies (42-46). In a specific case, as shown in Fig. 3a, a yeast Rab protein, Ypt1, was expressed in C-terminally truncated form in E. coli as a fusion protein with an intein domain and a chitin-binding domain (46). This construct could be purified by affinity chromatography on chitin-agarose. 
The C-terminal thioester of the truncated Ypt1 was cleaved from this support using a thiol reagent, a procedure that emulates the attack of a serine or cysteine residue in the C-extein, which is normally present in natural intein proteins (47). This thioester could be used for an in vitro ligation reaction with monogeranylgeranylated di-cysteine to generate the C-terminus in monolipidated form. As both the prenylated peptide and the reaction product (prenylated Ypt1) are insoluble in an aqueous environment, the ligation reaction was performed in detergent solution. Using the expressed protein ligation approach, both singly and doubly prenylated Ypt1 molecules could be produced. The complexes of these proteins with their solubilizing protein, GDI (GDP-dissociation inhibitor), could be crystallized, and their three-dimensional structures were determined (46-48). This revealed for the first time the nature of the lipid interaction with a binding site in an unexpected part of the GDI molecule (Fig. 3b). In the previously determined structure of GDI without a bound Rab molecule, this binding site was not detected, because a movement of one of the α-helices of the lower domain of GDI has to occur to create space for lipid binding, and this seems only to occur when the lipid residues, or possibly the whole prenylated Rab molecule, bind. The position of binding was essentially the same for single or double geranylgeranyl groups, and Fig. 3b shows only the physiologically more relevant doubly prenylated structure (most Rab molecules are doubly prenylated).

The structural determination of the complex between GDI and prenylated Rab molecules has provided considerable information on the mechanism of action of GDI in the recycling of Rab proteins between target and donor membranes (48, 49). It also sheds light on the molecular basis of a form of X-linked non-syndromic mental retardation, in which there is an L92P mutation in GDIα, which is highly expressed in brain, and which results in a reduced ability to extract Rabs from membranes. It was previously thought that this residue would be in the lipid binding site, but the structure depicted in Fig. 3b shows that the corresponding residue in yeast GDI (I100) is not in the lipid binding site but makes an important hydrophobic interaction with a conserved hydrophobic motif in the Rab C-terminal hypervariable domain.

The same technology was used to create Rab proteins bearing a variety of fluorescent groups at the C-terminus. This approach allowed introduction of such reporter groups near the reactive SH groups, which are the site of prenylation, while leaving these groups free for the prenylation reaction, a process that results in large fluorescence signal changes in certain cases. Experiments on the prenylation of such selectively modified Rab proteins allowed insights into the molecular basis of another hereditary disease, namely X-linked degeneration of chorioretinal cells in choroideremia, a disease caused by underprenylation of certain Rab proteins (50).



Figure 3. (a) Preparation of prenylated Ypt1 (a yeast Rab protein) by expressed protein ligation. A C-terminal thioester of the truncated Rab protein was allowed to react with a doubly geranylgeranylated tricysteine peptide, leading to transthioesterification and an S→N acyl shift to generate a native peptide bond. (b) Interaction of the C-terminus of semisynthetic doubly geranylgeranylated Ypt1 with the lower domain of yeast GDI. GDI is shown in green as a ribbon structure, the C-terminus of Ypt1 in magenta, and the geranylgeranyl groups in red and blue CPK representation. Several residues of the C-terminus of Ypt1 were not visible in the electron density map, so that the connection to the prenyl groups is not observed directly. One prenyl group (in red) is buried deeply in the hydrophobic core of GDI, whereas the other (in blue) is more superficially bound and interacts with the first prenyl group. The lipid binding site is generated by an opening movement of two α-helices.


Split-Inteins for Protein Semisynthesis in vitro and in vivo

The technique of expressed protein ligation has been exploited extensively during the last couple of years to produce semisynthetic proteins with tailor-made properties (4, 39). Examples are described above, and the method has been reviewed in detail recently (41, 51-53). The discovery that naturally occurring inteins, protein splicing domains that can excise themselves from a given polypeptide and join the flanking domains via a peptide bond, can be split into two pieces that possess the ability to spontaneously associate and form a functional intein has further extended the utility of intein technology (54-56). In particular, two split inteins (the DnaE and DnaB inteins) that do not require a denaturation and renaturation step to become fully functional are highly useful for the semisynthesis of specifically modified proteins in vitro and in vivo (57, 58).

The DnaE Intein

The DnaE intein from Synechocystis sp. is a naturally occurring split intein and consists of a longer N-terminal segment (123 amino acids) that can be C-terminally fused to almost any given protein sequence and expressed. The C-terminal segment consists of only 36 aa and is easily accessible by chemical synthesis. It therefore allows the addition of specifically modified peptides to its C-terminus that are, upon trans-splicing, transferred onto the N-terminal protein that was expressed as a fusion protein with the N-terminal intein segment. This split intein system has enabled the first semisynthesis of a GFP-FLAG fusion protein in vivo (59). To achieve this goal, the N-terminal DnaE segment was fused to GFP and expressed in CHO cells. These cells were complemented with a chemically synthesized C-terminal part of the intein together with a FLAG tag and a protein transduction domain (PTD) for efficient uptake into the cells. The GFP-FLAG fusion protein that was generated upon successful trans-splicing was unambiguously identified by GFP- and FLAG-specific antibodies. Such a system allows the in vivo incorporation of biophysical probes, as long as the chemically synthesized part can be brought into the cells of interest. Detailed insights into the mechanism of the trans-splicing reaction of the DnaE intein were provided by crystal structures of this protein after excision and of a splicing-deficient precursor protein (60).

Further applications of the DnaE split intein include the development of a tandem trans-splicing system that is based on a combination of the DnaE split intein and the engineered, inducible VMA split intein (61). Such a system allows the segmental labeling of proteins with specific isotopes [as demonstrated by Otomo et al. with the artificial PI-PfuI and PI-PfuII split inteins (62)] and fluorophores. The DnaE split intein was also used by Camarero et al. to achieve the site-specific, oriented immobilization of proteins such as maltose binding protein (MBP) and enhanced green fluorescent protein (EGFP) onto glass surfaces (63). A covalent bond to the glass surface was established by thioether formation between a maleimide group on the surface and a thiol-bearing PEG linker that also carried four amino acids, including a cysteine residue, which could act as a nucleophile in trans-splicing reactions, and the C-terminal segment of the DnaE intein (36 aa). Upon addition of an MBP- or EGFP-N-intein fusion construct that was produced by either recombinant or cell-free expression, the intein halves associated and trans-splicing occurred, leading to the immobilization of MBP or EGFP on the surface. The associated DnaE intein halves were washed away, and the proteins remained, covalently bound via a PEG spacer, on the surface. The advantage of this approach is that no purification of the expressed proteins is necessary because only intein fusion constructs undergo the highly specific immobilization reaction. Furthermore, only low concentrations are needed to achieve an efficient trans-splicing reaction [the dissociation constant of the DnaE split intein halves is 43 nM, and trans-splicing occurs at a rate of ca. 7 × 10⁻⁵ s⁻¹ (61, 64)], which constitutes an advantage over immobilization techniques that rely on chemoselective reactions and strongly depend on reactant concentrations (65-68).
Thus, this approach points to a new route to produce protein chips without the need for large amounts of purified protein.
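To put the quoted dissociation constant (43 nM) and splicing rate (ca. 7 × 10⁻⁵ s⁻¹) into perspective, the following sketch estimates how much of the two intein halves is associated at equilibrium and how long splicing takes. It is a rough illustration under assumptions that are not from the text: simple 1:1 binding in solution (surface effects are ignored) and first-order splicing of the assembled complex. The function names are hypothetical.

```python
import math

KD_NM = 43.0     # dissociation constant of the DnaE intein halves, in nM (from the text)
K_SPLICE = 7e-5  # trans-splicing rate constant, in 1/s (from the text)


def complex_conc(a_total: float, b_total: float, kd: float = KD_NM) -> float:
    """Equilibrium concentration of the 1:1 complex, same units as the inputs.

    Solves Kd = (A_total - AB)(B_total - AB) / AB for AB and returns the
    physically meaningful (smaller) root of the quadratic.
    """
    s = a_total + b_total + kd
    return (s - math.sqrt(s * s - 4.0 * a_total * b_total)) / 2.0


def half_life(k: float = K_SPLICE) -> float:
    """Half-life in seconds of a first-order process with rate constant k."""
    return math.log(2.0) / k


def spliced_fraction(t_seconds: float, k: float = K_SPLICE) -> float:
    """Fraction of assembled complex that has spliced after t_seconds."""
    return 1.0 - math.exp(-k * t_seconds)


if __name__ == "__main__":
    for conc in (10.0, 43.0, 500.0):  # nM of each half, assumed equimolar
        bound = complex_conc(conc, conc) / conc
        print(f"{conc:6.1f} nM of each half: {bound:.2f} of each half is in complex")
    print(f"splicing half-life: {half_life() / 3600.0:.2f} h")
```

Under these assumptions the splicing half-life is a little under three hours, and a substantial fraction of the halves is already associated at concentrations near the dissociation constant.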


Figure 4. Mechanism of trans-protein splicing. (a) Initial association of the intein halves to form a functional intein. (b) Activation of the N-terminal splice-junction via an N-S acyl shift. (c) Formation of a branched intermediate upon transthioesterification. (d) Branch resolution and intein release by succinimide formation. Spontaneous S-N acyl rearrangement yields the processed product with a native peptide backbone.


The DnaB Intein

The DnaB intein from Synechocystis sp. consists of 429 amino acids, including a homing endonuclease domain, in its native form. The removal of 275 amino acids leads to a functional mini intein (154 aa) that can also be split into two halves that undergo trans-splicing when co-expressed in E. coli (57). To test whether trans-splicing also occurs in vitro, Mootz et al. expressed a fusion protein consisting of MBP and the N-terminal half of the DnaB intein (104 aa) and a fusion construct of the C-terminal half (47 aa) and a hexa-histidine tag (69). Upon mixing in stoichiometric amounts, successful trans-splicing produced the MBP-His-tag fusion protein. This constituted the first case of an artificial split intein that spontaneously assembled to form the active intein and underwent trans-splicing without the need for a denaturation-renaturation step. The only other artificial split intein reported previously that does not require such a denaturation-renaturation step was the VMA intein from Saccharomyces cerevisiae. However, the N- and C-terminal segments of this intein do not assemble spontaneously to form a functional intein. They require a dimerization domain that brings both halves in close proximity to each other, which induces trans-splicing (70-72). This renders the DnaB split intein highly interesting for protein engineering approaches, and in combination with the DnaE split intein or with an inducible split intein such as the VMA intein, it provides a valuable tool to combine three protein segments with each other by two concomitant or subsequent trans-splicing reactions. An additional advantage of the DnaB split intein is the occurrence of a serine residue as the C-terminal nucleophile for the splicing reaction instead of a cysteine residue. Cysteine residues might not be desirable in some cases because they can interfere with folding or labeling of the newly generated protein.
Nevertheless a cysteine can replace the serine as a nucleophile at this position as demonstrated by the fact that the DnaB intein has been used to generate protein segments with N-terminal cysteine residues. This was achieved by expressing the desired DnaB intein as a fusion construct with the target protein in inclusion bodies and by taking advantage of the pH sensitivity of the DnaB intein to prevent premature cleavage during work up (73).
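
The in vitro experiment described above can be summarized as simple bookkeeping on the two fusion constructs. The following Python sketch is purely illustrative and is not taken from the cited work: the Construct class, the trans_splice function, and the segment labels IntN, IntC, and His6 are hypothetical names chosen here, and the model only tracks which segments end up joined, not the chemistry or kinetics of splicing.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Construct:
    """A fusion construct written N- to C-terminus as labelled segments."""
    segments: Tuple[str, ...]


def trans_splice(n_part: Construct, c_part: Construct,
                 int_n: str = "IntN", int_c: str = "IntC"):
    """Return (spliced product, excised intein) for two split-intein fusions.

    Assumption: the N-terminal construct ends with the N-terminal intein half
    and the C-terminal construct starts with the C-terminal intein half.
    """
    if len(n_part.segments) < 2 or n_part.segments[-1] != int_n:
        raise ValueError("first construct must end with the N-terminal intein half")
    if len(c_part.segments) < 2 or c_part.segments[0] != int_c:
        raise ValueError("second construct must start with the C-terminal intein half")
    # The exteins are joined; the reassembled intein is released.
    product = Construct(n_part.segments[:-1] + c_part.segments[1:])
    intein = Construct((int_n, int_c))
    return product, intein


# The two constructs described in the text (labels are illustrative).
mbp_int_n = Construct(("MBP", "IntN"))
int_c_his = Construct(("IntC", "His6"))

product, intein = trans_splice(mbp_int_n, int_c_his)
print("-".join(product.segments))  # MBP-His6
```

The same function could be applied twice with two different intein label pairs (for example one pair for DnaB and one for DnaE) to represent the joining of three segments mentioned above, again only at the level of segment order.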

To extend the utility of the DnaB split intein, Liu and coworkers tested 13 different sites at which this intein can be split into two segments of different length (58). Until this series of experiments, all known artificial split inteins had been split at the endonuclease domain. Of the 13 sites tested, 3 gave functional split inteins capable of trans-splicing, including 1 whose N-terminal half consisted of only 11 amino acids. Such a short N-terminal split intein half is accessible by chemical synthesis, and the introduction of chemically modified peptides at the N-terminus via trans-splicing was recently demonstrated (74). Such a system nicely complements the already established C-terminal modification approach via the DnaE split intein.
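
The residue counts quoted in this section can be checked with a few lines of arithmetic. This is a minimal sketch under stated assumptions: the function name split_lengths is hypothetical, and the C-terminal fragment length is simply derived by subtraction, assuming the split introduces no additional residues.

```python
FULL_LENGTH = 429   # native DnaB intein (from the text)
REMOVED = 275       # residues deleted to give the mini intein (from the text)
MINI_INTEIN = FULL_LENGTH - REMOVED


def split_lengths(n_half: int, total: int = MINI_INTEIN) -> tuple:
    """Lengths of the N- and C-terminal halves for a split after n_half residues."""
    if not 0 < n_half < total:
        raise ValueError("split site must fall inside the intein")
    return n_half, total - n_half


n_len, c_len = split_lengths(11)  # the shortest functional N-terminal half
print(MINI_INTEIN, n_len, c_len)  # 154 11 143

# Fraction of tested split sites that gave a functional split intein.
functional_fraction = 3 / 13
print(round(functional_fraction * 100))  # 23
```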


The work reviewed here illustrates that, in the century since Fischer formulated his vision that the synthesis of proteins should be achievable using the methods of organic chemistry (1), this prediction has been largely fulfilled. What he could not have predicted was the role that molecular biological techniques would play in combination with chemical methods, although he was realistic enough to imply that chemistry would not be the method of choice if biotechnological methods were available. Future developments in the area of synthetic and semisynthetic proteins are likely to include the extension of ligation methods to amino acids other than cysteine and the increased use of strategies for generating proteins with precisely engineered properties in cells. One such approach is conditional splicing, a technique in which a specific protein activity is generated intracellularly by exposure to a small membrane-permeable molecule (70-72).


References

1. Fischer E. Proteins and Polypeptides. Angew. Chem. 1907; 20: 913-917.

2. Merrifield RB. Solid phase peptide synthesis. I. The synthesis of a tetrapeptide. J. Am. Chem. Soc. 1963; 85: 2149-2154.

3. Nambiar KP, Stackhouse J, Stauffer DM, Kennedy WP, Eldredge JK, Benner SA. Total synthesis and cloning of a gene coding for the ribonuclease S protein. Science 1984; 223(4642): 1299-1301.

4. Dawson PE, Muir TW, Clark-Lewis I, Kent SBH. Synthesis of proteins by native chemical ligation. Science 1994; 266: 776-779.

5. Wieland T, Bokelmann E, Bauer L, Lang HU, Lau H. Über Peptidsynthesen. 8. Bildung von S-haltigen Peptiden durch intramolekulare Wanderung von Aminoacylresten. Justus Liebigs Ann. Chem. 1953; 583(2): 129-149.

6. Kent S. Total chemical synthesis of enzymes. J. Pept. Sci. 2003; 9(9): 574-593.

7. Dawson PE, Kent SBH. Synthesis of native proteins by chemical ligation. Ann. Rev. Biochem. 2000; 69: 923-960.

8. Muir TW, Dawson PE, Kent SB. Protein synthesis by chemical ligation of unprotected peptides in aqueous solution. Meth. Enzymol. 1997; 289: 266-298.

9. Goody RS, Alexandrov K, Engelhard M. Combining chemical and biological techniques to produce modified proteins. ChemBioChem 2002; 3(5): 399-403.

10. Kochendoerfer GG, Chen SY, Mao F, Cressman S, Traviglia S, Shao H, et al. Design and chemical synthesis of a homogeneous polymer-modified erythropoiesis protein. Science 2003; 299(5608): 884-887.

11. Becker CFW, Hunter CL, Seidel R, Kent SBH, Goody RS, Engelhard M. Total chemical synthesis of a functional interacting protein pair: the protooncogene H-Ras and the Ras-binding domain of its effector c-Raf1. Proc. Natl. Acad. Sci. U.S.A. 2003; 100(9): 5075-5080.

12. Kohl NE, Emini EA, Schleif WA, Davis LJ, Heimbach JC, Dixon RA, et al. Active human immunodeficiency virus protease is required for viral infectivity. Proc. Natl. Acad. Sci. U.S.A. 1988; 85(13): 4686-4690.

13. Schneider J, Kent SB. Enzymatic activity of a synthetic 99 residue protein corresponding to the putative HIV-1 protease. Cell 1988; 54(3): 363-368.

14. Kent SB. Chemical synthesis of peptides and proteins. Annu. Rev. Biochem. 1988; 57: 957-989.

15. Graves MC, Lim JJ, Heimer EP, Kramer RA. An 11-kDa form of human immunodeficiency virus protease expressed in Escherichia coli is sufficient for enzymatic activity. Proc. Natl. Acad. Sci. U.S.A. 1988; 85(8): 2449-2453.

16. Wlodawer A, Miller M, Jaskolski M, Sathyanarayana BK, Baldwin E, Weber IT, et al. Conserved folding in retroviral proteases: crystal structure of a synthetic HIV-1 protease. Science 1989; 245: 616-621.

17. Wlodawer A, Vondrasek J. Inhibitors of HIV-1 protease: a major success of structure-assisted drug design. Annu. Rev. Biophys. Biomol. Struct. 1998; 27: 249-284.

18. Miller M, Schneider J, Sathyanarayana BK, Toth MV, Marshall GR, Clawson L, et al. Structure of complex of synthetic HIV-1 protease with a substrate-based inhibitor at 2.3 A resolution. Science 1989; 246(4934): 1149-1152.

19. Swain AL, Miller MM, Green J, Rich DH, Schneider J, Kent SB, et al. X-ray crystallographic structure of a complex between a synthetic protease of human immunodeficiency virus 1 and a substrate-based hydroxyethylamine inhibitor. Proc. Natl. Acad. Sci. U.S.A. 1990; 87(22): 8805-8809.

20. Jaskolski M, Tomasselli AG, Sawyer TK, Staples DG, Heinrikson RL, Schneider J, et al. Structure at 2.5-A resolution of chemically synthesized human immunodeficiency virus type 1 protease complexed with a hydroxyethylene-based inhibitor. Biochemistry 1991; 30(6): 1600-1609.

21. Schnolzer M, Kent SBH. Constructing proteins by dovetailing unprotected synthetic peptides: backbone-engineered HIV protease. Science 1992; 256: 221-225.

22. Baca M, Kent SBH. Catalytic contribution of flap-substrate hydrogen bonds in “HIV-1 protease” explored by chemical synthesis. Proc. Natl. Acad. Sci. U.S.A. 1993; 90: 11638-11642.

23. Baca M, Kent SBH. Protein backbone engineering through total chemical synthesis: new insight into the mechanism of HIV-1 protease catalysis. Tetrahedron 2000; 56(48): 9503-9513.

24. Baca M, Muir TW, Schnolzer M, Kent SBH. Chemical ligation of cysteine-containing peptides: synthesis of a 22 kDa tethered dimer of HIV-1 protease. J. Am. Chem. Soc. 1995; 117: 1881-1887.

25. Smith R, Brereton IM, Chai RY, Kent SB. Ionization states of the catalytic residues in HIV-1 protease. Nat. Struct. Biol. 1996; 3(11): 946-950.

26. Suguna K, Padlan EA, Smith CW, Carlson WD, Davies DR. Binding of a reduced peptide inhibitor to the aspartic proteinase from Rhizopus chinensis: implications for a mechanism of action. Proc. Natl. Acad. Sci. U.S.A. 1987; 84(20): 7009-7013.

27. Kochendoerfer GG, Jones DH, Lee S, Oblatt-Montal M, Opella SJ, Montal M. Functional characterization and NMR spectroscopy on full-length Vpu from HIV-1 prepared by total chemical synthesis. J. Am. Chem. Soc. 2004; 126(8): 2439-2446.

28. Romanelli A, Shekhtman A, Cowburn D, Muir TW. Semisynthesis of a segmental isotopically labeled protein splicing precursor: NMR evidence for an unusual peptide bond at the N-extein-intein junction. Proc. Natl. Acad. Sci. U.S.A. 2004; 101(17): 6397-6402.

29. Cowburn D, Muir TW. Segmental isotopic labeling using expressed protein ligation. Meth. Enzymol. 2001; 339: 41-54.

30. Milton RCL, Milton SCF, Kent SBH. Total chemical synthesis of a D-enzyme: the enantiomers of HIV-1 protease show reciprocal chiral substrate specificity. Science 1992; 256: 1445-1448.

31. Miller M, Baca M, Rao JKM, Kent SBH. Probing the structural basis of the catalytic activity of HIV-1 PR through total chemical protein synthesis. J. Mol. Struct.-Theochem 1998; 423(1-2): 137-152.

32. Schumacher TN, Mayr LM, Minor DL Jr, Milhollen MA, Burgess MW, Kim PS. Identification of D-peptide ligands through mirror-image phage display. Science 1996; 271(5257): 1854-1857.

33. Kuhn K, Owen DJ, Bader B, Wittinghofer A, Kuhlmann J, Waldmann H. Synthesis of functional Ras lipoproteins and fluorescent derivatives. J. Am. Chem. Soc. 2001; 123(6): 1023-1035.

34. Bader B, Kuhn K, Owen DJ, Waldmann H, Wittinghofer A, Kuhlmann J. Bioorganic synthesis of lipid-modified proteins for the study of signal transduction. Nature 2000; 403(6766): 223-226.

35. Reents R, Wagner M, Schlummer S, Kuhlmann J, Waldmann H. Synthesis and application of fluorescent ras proteins for live-cell imaging. ChemBioChem 2005; 6(1): 86-94.

36. Kuhlmann J, Tebbe A, Volkert M, Wagner M, Uwai K, Waldmann H. Photoactivatable synthetic Ras proteins: “baits” for the identification of plasma-membrane-bound binding partners of Ras. Angew. Chem. Int. Ed. Engl. 2002; 41(14): 2546-2550.

37. Volkert M, Uwai K, Tebbe A, Popkirova B, Wagner M, Kuhlmann J, et al. Synthesis and biological activity of photoactivatable N-ras peptides and proteins. J. Am. Chem. Soc. 2003; 125(42): 12749-12758.

38. Rocks O, Peyker A, Kahms M, Verveer PJ, Koerner C, Lumbierres M, et al. An acylation cycle regulates localization and activity of palmitoylated Ras isoforms. Science 2005; 307(5716): 1746-1752.

39. Muir TW, Sondhi D, Cole PA. Expressed protein ligation: A general method for protein engineering. Proc. Natl. Acad. Sci. U.S.A. 1998; 95(12): 6705-6710.

40. Severinov K, Muir TW. Expressed protein ligation, a novel method for studying protein-protein interactions in transcription. J. Biol. Chem. 1998; 273(26): 16205-16209.

41. Muir TW. Semisynthesis of proteins by expressed protein ligation [Review]. Annu. Rev. Biochem. 2003; 72: 249-289.

42. Iakovenko A, Rostkova E, Merzlyak E, Hillebrand AM, Thoma NH, Goody RS, et al. Semi-synthetic Rab proteins as tools for studying intermolecular interactions. FEBS Lett. 2000; 468(2-3): 155-158.

43. Alexandrov K, Heinemann I, Durek T, Sidorovitch V, Goody RS, Waldmann H. Intein-mediated synthesis of geranylgeranylated Rab7 protein in vitro. J. Am. Chem. Soc. 2002; 124(20): 5648-5649.

44. Durek T, Alexandrov K, Goody RS, Hildebrand A, Heinemann I, Waldmann H. Synthesis of fluorescently labeled mono- and diprenylated Rab7 GTPase. J. Am. Chem. Soc. 2004; 126(50): 16368-16378.

45. Brunsveld L, Watzke A, Durek T, Alexandrov K, Goody RS, Waldmann H. Synthesis of functionalized rab GTPases by a combination of solution- or solid-phase lipopeptide synthesis with expressed protein ligation. Chemistry 2005; 11(9): 2756-2772.

46. Rak A, Pylypenko O, Durek T, Watzke A, Kushnir S, Brunsveld L, et al. Structure of Rab GDP-dissociation inhibitor in complex with prenylated YPT1 GTPase. Science 2003; 302(5645): 646-650.

47. Paulus H. Protein splicing and related forms of protein autoprocessing. Annu. Rev. Biochem. 2000; 69(1): 447-496.

48. Pylypenko O, Rak A, Durek T, Kushnir S, Dursina BE, Thomae NH, et al. Structure of doubly prenylated Ypt1:GDI complex and the mechanism of GDI-mediated Rab recycling. EMBO J. 2006; 25(1): 13-23.

49. Goody RS, Rak A, Alexandrov K. The structural and mechanistic basis for recycling of Rab proteins between membrane compartments. Cell. Mol. Life Sci. 2005; 62(15): 1657-1670.

50. Rak A, Pylypenko O, Niculae A, Pyatkov K, Goody RS, Alexandrov K. Structure of the Rab7: REP-1 complex: insights into the mechanism of Rab prenylation and choroideremia disease. Cell 2004; 117(6): 749-760.

51. David R, Richter MP, Beck-Sickinger AG. Expressed protein ligation. Method and applications. Eur. J. Biochem. 2004; 271(4): 663-677.

52. Durek T, Becker CF. Protein semi-synthesis: new proteins for functional and structural studies. Biomol. Eng. 2005.

53. Muralidharan V, Muir TW. Protein ligation: an enabling technology for the biophysical analysis of proteins. Nat. Methods 2006; 3(6): 429-438.

54. Southworth MW, Adam E, Panne D, Byer R, Kautz R, Perler FB. Control of protein splicing by intein fragment reassembly. EMBO J. 1998; 17(4): 918-926.

55. Mills KV, Lew BM, Jiang S, Paulus H. Protein splicing in trans by purified N- and C-terminal fragments of the Mycobacterium tuberculosis RecA intein. Proc. Natl. Acad. Sci. U.S.A. 1998; 95(7): 3543-3548.

56. Shingledecker K, Jiang SQ, Paulus H. Molecular dissection of the Mycobacterium tuberculosis RecA intein: design of a minimal intein and of a trans-splicing system involving two intein fragments. Gene 1998; 207(2): 187-195.

57. Wu H, Hu ZM, Liu XQ. Protein trans-splicing by a split intein encoded in a split dnaE gene of Synechocystis sp. PCC6803. Proc. Natl. Acad. Sci. U.S.A. 1998; 95(16): 9226-9231.

58. Sun W, Yang J, Liu XQ. Synthetic two-piece and three-piece split inteins for protein trans-splicing. J. Biol. Chem. 2004; 279(34): 35281-35286.

59. Giriat I, Muir TW. Protein semi-synthesis in living cells. J. Am. Chem. Soc. 2003; 125(24): 7180-7181.

60. Sun P, Ye S, Ferrandon S, Evans TC, Xu MQ, Rao Z. Crystal structures of an intein from the split dnaE gene of Synechocystis sp. PCC6803 reveal the catalytic model without the penultimate histidine and the mechanism of zinc ion inhibition of protein splicing. J. Mol. Biol. 2005; 353(5): 1093-1105.

61. Shi J, Muir TW. Development of a tandem protein trans-splicing system based on native and engineered split inteins. J. Am. Chem. Soc. 2005; 127(17): 6198-6206.

62. Otomo T, Ito N, Kyogoku Y, Yamazaki T. NMR observation of selected segments in a larger protein: central-segment isotope labeling through intein-mediated ligation. Biochemistry 1999; 38(49): 16040-16044.

63. Kwon Y, Coleman MA, Camarero JA. Selective immobilization of proteins onto solid supports through split-intein-mediated protein trans-splicing. Angew. Chem. Int. Ed. Engl. 2006; 45(11): 1726-1729.

64. Martin DD, Xu MQ, Evans TC Jr. Characterization of a naturally occurring trans-splicing intein from Synechocystis sp. PCC6803. Biochemistry 2001; 40(5): 1393-1402.

65. Soellner MB, Dickson KA, Nilsson BL, Raines RT. Site-specific protein immobilization by Staudinger ligation. J. Am. Chem. Soc. 2003; 125(39): 11790-11791.

66. Watzke A, Kohn M, Gutierrez-Rodriguez M, Wacker R, Schroder H, Breinbauer R, et al. Site-selective protein immobilization by Staudinger ligation. Angew. Chem. Int. Ed. Engl. 2006; 45(9): 1408-1412.

67. de Araujo AD, Palomo JM, Cramer J, Kohn M, Schroder H, Wacker R, et al. Diels-Alder ligation and surface immobilization of proteins. Angew. Chem. Int. Ed. Engl. 2005; 45(2): 296-301.

68. Camarero JA, Kwon Y, Coleman MA. Chemoselective attachment of biologically active proteins to surfaces by expressed protein ligation and its application for “protein chip” fabrication. J. Am. Chem. Soc. 2004; 126(45): 14730-14731.

69. Brenzel S, Kurpiers T, Mootz HD. Engineering artificially split inteins for applications in protein chemistry: biochemical characterization of the split Ssp DnaB intein and comparison to the split Sce VMA intein. Biochemistry 2006; 45(6): 1571-1578.

70. Mootz HD, Muir TW. Protein splicing triggered by a small molecule. J. Am. Chem. Soc. 2002; 124(31): 9044-9045.

71. Mootz HD, Blum ES, Tyszkiewicz AB, Muir TW. Conditional protein splicing: a new tool to control protein structure and function in vitro and in vivo. J. Am. Chem. Soc. 2003; 125(35): 10561-10569.

72. Mootz HD, Blum ES, Muir TW. Activation of an autoregulated protein kinase by conditional protein splicing. Angew. Chem. Int. Ed. Engl. 2004; 43(39): 5189-5192.

73. Hackenberger CP, Chen MM, Imperiali B. Expression of N-terminal Cys-protein fragments using an intein refolding strategy. Bioorg. Med. Chem. 2006; 14(14): 5043-5048.

74. Ludwig C, Pfeiff M, Linne U, Mootz HD. Ligation eines synthetischen Peptids an den N-Terminus eines rekombinanten Proteins durch semisynthetisches trans-Proteinspleißen (p NA). Angew. Chem. (Engl.) In press.

See Also

Chemistry and Chemical Reactivity of Proteins

Structure, Function and Stability of Proteins

Lipidated Peptide Synthesis

Synthesis of Natural and Unnatural Amino Acids