A large proportion of biotherapeutic products are glycoproteins. These include erythropoietin and other cytokines, antibodies, glycosyltransferases, and glycosidases, which together generate billions of dollars in sales worldwide. Such drugs are inherently complex. As new treatments emerge and biosimilars are evaluated, the need to better understand their molecular structures is more acute than ever.
Therapeutic glycoproteins are typically produced as recombinant products in cell culture systems. Glycosylation is of major importance during development of these drugs because their glycan chains markedly affect product stability, activity, antigenicity, and pharmacodynamics. A detailed description of the structural features for such carbohydrate-containing molecules is increasingly expected as part of their new drug applications (NDAs) or comparability protocols.
Product Focus: Glycoproteins
Process Focus: Manufacturing
Who Should Read: QA/QC, product development, and analytical
Keywords: Mass spectrometry, biosimilars, sample preparation, N-glycans, O-glycans, capillary electrophoresis
Evaluation of protein glycosylation often involves first defining the different glycoforms after their release from the protein, then determining individual glycans and their relative ratios. Site-specific characterization of glycosylation may follow, which can be a powerful tool to help evaluate the specific sites that will be most susceptible to change. That’s important to know when monitoring process control, establishing the impact of scale-up procedures, or incorporating new steps into expression and purification processes.
In many cases, such detailed analysis requires a high level of expertise, powerful instrumentation (e.g., mass spectrometers with high resolution and accuracy), and a considerable amount of time. We have evaluated emerging methods that support the field of glycopeptide analysis, focusing on applying some of them to the characterization of a specific biopharmaceutical glycoprotein: etanercept.
N-Linked Glycosylation Mapping
At present, site-specific glycosylation is routinely determined for monoclonal antibodies (MAbs) using capillary electrophoresis with sodium-dodecyl-sulfate (CE-SDS) or isoelectric focusing (CE-IEF) and peptide mapping by mass spectrometry (1, 2). The latter method is one of the most common and successful for initial site-specific characterization because CE-SDS and CE-IEF both require well-characterized standards for identification.
Peptide mapping involves digestion of a glycoprotein with specific proteases after reduction and alkylation, followed by reversed-phase high-performance liquid chromatography (RP-HPLC) analysis of the resulting peptide and glycopeptide fragments. Heterogeneous glycopeptides typically elute as broader chromatographic peaks, which sometimes can be detected by on-line mass spectrometry. The method is semi-quantitative because different glycopeptides will ionize differently in a mass spectrometer. However, with electrospray mass spectrometry the relative intensities of glycopeptide signals in mass spectra provide a good approximation of the relative proportions of the glycoforms on each peptide (3).
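As a minimal sketch of that semi-quantitative estimate, the Python snippet below normalizes the signal intensities of the glycoforms observed on one peptide into relative proportions. The function name, glycoform labels, and intensity values are hypothetical and chosen only for illustration; the calculation assumes the glycoforms on a given peptide ionize with similar efficiency.

```python
def relative_glycoform_proportions(intensities):
    """Convert glycopeptide signal intensities for one peptide into fractions."""
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return {glycoform: value / total for glycoform, value in intensities.items()}


# Hypothetical intensities (arbitrary units) for three glycoforms on one peptide
example = {"G0F": 6.0e6, "G1F": 3.0e6, "G2F": 1.0e6}
proportions = relative_glycoform_proportions(example)
for name, fraction in proportions.items():
    print(f"{name}: {fraction:.1%}")
```

Because ionization efficiencies differ between glycoforms, such fractions are best read as approximations rather than absolute quantities.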
With some large, complex glycoprotein biotherapeutics containing numerous potential sites for N-glycosylation, site-specific characterization of glycosylation sometimes can be extremely difficult. Such glycoconjugates can have potential glycosylation sites that may be only partially modified. Glycans can be heterogeneous as well, consisting of many different glycoforms attached to one site. When we analyze a very complex mixture of nonglycosylated peptides and glycopeptides, some glycopeptides may go undetected because of suppression effects of coeluting nonglycosylated peptides. Glycopeptides ionize relatively inefficiently compared with unmodified peptides within the mixture (4). In addition, dilution effects attributable to the many different attached glycoforms can significantly reduce the intensity of signals for the glycopeptides.
Consequently, more than one protease often is required for identifying all sites, and negative-ion analysis may be required for detecting some glycopeptides that are highly sialylated. Therefore, site-specific characterization of many large, complex, biotherapeutic glycoproteins can be a challenging task. Streamlined methods are required to speed up identification of site-occupancy for such complex molecules with good sensitivity, recovery, reproducibility, and accuracy.
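Potential N-glycosylation sites can be listed from the protein sequence before any mapping experiment. The Python sketch below scans for the widely used N-X-S/T sequon (X being any residue except proline). The function name and the toy sequence are hypothetical; the sequence is not that of etanercept, and a sequon marks only a potential site, not an occupied one.

```python
import re

# N-X-S/T sequon, X != Pro; the lookahead lets adjacent sequons overlap
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of Asn residues that sit in N-X-S/T sequons."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Hypothetical toy sequence used only to exercise the function
toy_sequence = "MKNGTAANPSLLNVSQNNTT"
print(find_sequons(toy_sequence))
```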
Characterization of N-linked glycosylated sites by glycopeptide enrichment has become increasingly popular in the academic community (5, 6, 7, 8, 9, 10, 11, 12). These methods remove nonglycosylated species from a digestion mixture so that glycopeptide signals can be clearly observed. Among the glycopeptide enrichment techniques (summarized in Table 1), solid-phase extraction using hydrophilic interaction chromatography (HILIC-SPE) has expanded in use (8, 10, 11, 12). That is probably because HILIC-SPE is more selective for glycosylated species and is not biased toward particular glycan types. Furthermore, the technique requires no chemical alteration of glycans. Their hydrophilic nature is sufficient to impart that hydrophilic characteristic to glycopeptides, which allows them to be retained by hydrophilic interactions on polar stationary phases and thus be separated from less-hydrophilic nonglycosylated peptides. Different types of HILIC material are commercially available.
This enrichment approach recently has been improved with addition of an ion-pairing reagent such as trifluoroacetic acid (TFA) in the mobile phase (11). Ion-pairing affects the overall hydrophilicity of unmodified peptides more than glycopeptides, increasing efficiency in separating the two species. This method has been shown to work very reproducibly, with excellent specificity for analysis of glycopeptides produced from protease digestions of various glycoproteins. Enriched glycopeptides can be clearly observed through reversed-phase high-performance liquid chromatography with mass spectrometry (RP-HPLC-MS), facilitating rapid site-specific characterization of N-glycosylation with good recovery.
Figure 1 shows HILIC-SPE enrichment of etanercept N-linked glycopeptides. The elution positions of three N-linked glycopeptides spanning the consensus sites for N-linked glycosylation are obvious. Resulting masses observed in the N-linked glycopeptide spectra give information about the types of N-glycans observed at each site (Figure 2). These data confirm that the N-glycans in the TNF-receptor portion of etanercept (Asn-149 and Asn-171) consist of more complex N-glycans than those found within the immunoglobulin region (Asn-317), which carries expected N-glycans associated with an immunoglobulin G1 (IgG1) antibody.
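To illustrate how observed masses relate to glycan types, the Python sketch below computes the expected m/z of a glycopeptide from a peptide mass and a monosaccharide composition, using standard monoisotopic residue masses. The function names, the peptide mass, and the example composition are hypothetical; this is not the assignment workflow used for the etanercept data.

```python
# Monoisotopic residue masses (Da) of common monosaccharides
RESIDUE_MASS = {
    "Hex": 162.05282,
    "HexNAc": 203.07937,
    "Fuc": 146.05791,
    "NeuAc": 291.09542,
}
PROTON = 1.007276  # mass of a proton (Da)


def glycan_residue_mass(composition):
    """Mass added to a peptide by an attached glycan of the given composition."""
    return sum(RESIDUE_MASS[name] * count for name, count in composition.items())


def glycopeptide_mz(peptide_mass, composition, charge):
    """Expected m/z of the protonated glycopeptide at the given positive charge."""
    neutral_mass = peptide_mass + glycan_residue_mass(composition)
    return (neutral_mass + charge * PROTON) / charge


# Hypothetical peptide mass with a core-fucosylated biantennary composition
composition = {"Hex": 5, "HexNAc": 4, "Fuc": 1}
print(round(glycopeptide_mz(1000.0, composition, charge=2), 4))
```

Matching in the other direction (from an observed mass to candidate compositions) usually yields several possibilities within a mass tolerance, so fragmentation or glycosidase data are still needed to confirm an assignment.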
Thus HILIC-SPE enrichment with TFA shows great promise for time-efficient, site-specific characterization of very complex biotherapeutics that contain multiple N-glycosylation sites. The method can be automated, and prepacked HILIC-SPE tips are available commercially to make the technique more reproducible. During method development for a particular biotherapeutic, however, it is important to analyze both nonenriched and enriched digestions along with a flow-through fraction to ensure complete recovery. Otherwise, key information could be missed.
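The recovery check described above can be sketched as a simple comparison of the glycopeptides identified in each fraction. In the Python snippet below, the function name and the glycopeptide labels are hypothetical, and each fraction is assumed to have been reduced to a list of identified glycopeptide labels beforehand.

```python
def enrichment_losses(nonenriched, enriched, flow_through):
    """Flag glycopeptides that the enrichment step lost fully or in part."""
    nonenriched = set(nonenriched)
    enriched = set(enriched)
    flow_through = set(flow_through)
    return {
        # Seen before enrichment or in the flow-through, absent after enrichment
        "missing_from_enriched": sorted((nonenriched | flow_through) - enriched),
        # Retained, but some material also passed through the sorbent
        "partially_lost": sorted(enriched & flow_through),
    }


# Hypothetical identifications from the three analyses
report = enrichment_losses(
    nonenriched=["GP1", "GP2", "GP3"],
    enriched=["GP1", "GP2"],
    flow_through=["GP2", "GP3"],
)
print(report)
```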
Once an analyst is satisfied with the enrichment procedure, he or she can further characterize purified glycopeptides with exo- and endoglycosidase digestions to confirm linkages and anomericity. Sequence and branching information can be obtained by fragmentation of glycopeptides with electrospray-ionization, collision-induced dissociation, tandem mass spectrometry (ESI-CID-MS/MS). Under those conditions, fragmentation is observed predominantly at the glycosidic linkages of the glycan while the peptide backbone remains intact.
O-Linked Glycosylation Site Mapping
Determining occupied O-linked glycosylation sites is generally much more difficult than site-specific identification of N-linked glycosylation, for several reasons. First, there is no known amino-acid consensus sequence for O-linked glycans. Second, there is normally significant heterogeneity within O-glycosylated protein regions, both in the number of glycans and the extent of their occupancy. For example, the O-glycosylated domain found in the hinge region between the TNF-receptor and IgG1 portions of the fusion protein etanercept is known to be highly O-glycosylated, containing 18 Ser and Thr residues that are potential glycosylation sites. In all, 13 O-glycosylation sites have been reported for the entire fusion protein, although specific site occupancy was not described (13, 14).
The most favored technique for analyzing O-glycans begins by releasing those glycans from peptides, most commonly through reductive β-elimination (15). This gives information on the type and structure of the O-glycans, but not on their site occupancies. Determining the latter involves analysis of the glycopeptides, most commonly using RP-HPLC-MS.
Many O-linked domains are rich in Ser, Thr, and Pro residues but contain few residues such as Lys, Arg, Asp, Trp, Tyr, and Phe, at which proteases such as trypsin, Asp-N, and chymotrypsin cleave. So in peptide-mapping experiments, O-linked glycopeptides are normally large and very heterogeneous. As with N-linked glycopeptides, the glycopeptide signals are minor or not observed when other more abundant nonglycosylated peptides are present.
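The effect of sparse cleavage sites can be seen with a simple in silico digest. The Python sketch below applies the common trypsin rule (cleave C-terminal to Lys or Arg unless the next residue is Pro) to a hypothetical Ser/Thr/Pro-rich toy sequence, which is not the etanercept hinge sequence, and counts the potential O-glycosylation sites left on each peptide.

```python
import re


def tryptic_digest(sequence):
    """Cleave after Lys/Arg except before Pro; return the resulting peptides."""
    # Zero-width split points (requires Python 3.7 or later)
    return [peptide for peptide in re.split(r"(?<=[KR])(?!P)", sequence) if peptide]


def count_ser_thr(peptide):
    """Count Ser and Thr residues, i.e. potential O-glycosylation sites."""
    return sum(residue in "ST" for residue in peptide)


# Hypothetical toy sequence with a Ser/Thr/Pro-rich stretch
toy_sequence = "MAGKSPTTPSTPPTSPSTRPLEDK"
for peptide in tryptic_digest(toy_sequence):
    print(peptide, len(peptide), count_ser_thr(peptide))
```

In this toy case the Ser/Thr-rich stretch stays on a single long peptide carrying many candidate sites, which mirrors why such glycopeptides are large and heterogeneous.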
The success of HILIC-SPE enrichment of these species depends heavily on the hydrophilicity of the glycopeptides. Because O-linked glycans on biotherapeutic proteins are normally small di- or trisaccharides (which are less hydrophilic), some glycopeptide species may be lost if they are less well substituted. Thus, reproducible methods that facilitate successful enrichment of such species from nonglycosylated peptides remain to be established.
Characterization of O-glycosylated regions using exhaustive nonspecific protease digestion has increased in popularity over recent years. Carlito Lebrilla’s group at the University of California in Davis, CA, works extensively with this technique (16, 17). Nonglycosylated regions are digested down to mono- and dipeptides; glycosylated regions sterically hinder digestion around their glycosylated sites. That leaves a sizable peptide footprint with fewer substituted Ser and Thr residues and gives more detailed site microheterogeneity.
Glycopeptides also can be analyzed successfully with RP-HPLC or porous graphitized carbon (PGC) columns using ESI-MS/MS. We have found this method to be extremely useful for identifying glycosylated regions in biotherapeutic proteins. For example, we have characterized the O-glycopeptides in the hinge region of etanercept using exhaustive pronase digestion and accurate-mass LC-MS/MS (an accompanying figure shows example spectra of an O-linked glycopeptide from etanercept). Glycopeptide assignments were corroborated after further digestion with neuraminidase to remove terminal sialic acids. Fragmentation by MS/MS is also effective in further confirming those glycopeptide assignments. Table 3 summarizes the glycopeptides we identified, and Figure 4 depicts the occupied sites we found within the hinge region of etanercept using this pronase digestion method. This technique shows great promise as a routine method for characterizing O-linked sites because it requires no sophisticated mass-spectrometer systems fitted with specialized dissociation methods such as electron-transfer dissociation (ETD). Those are otherwise essential for site-specific analysis of large glycopeptides containing multiple glycosylation sites, such as those within the hinge region of etanercept (18).
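One way to read the neuraminidase corroboration step is as a mass-shift check: each terminal sialic acid (N-acetylneuraminic acid) removed lowers the glycopeptide mass by one NeuAc residue mass. The Python sketch below estimates how many sialic acids were removed from a pair of masses. The function name, tolerance, and example masses are hypothetical and do not come from the etanercept data.

```python
NEUAC_RESIDUE_MASS = 291.09542  # monoisotopic residue mass of NeuAc (Da)


def sialic_acids_removed(mass_before, mass_after, tolerance=0.02):
    """Return the number of NeuAc residues lost, or None if the shift does not fit."""
    delta = mass_before - mass_after
    count = round(delta / NEUAC_RESIDUE_MASS)
    if count < 0 or abs(delta - count * NEUAC_RESIDUE_MASS) > tolerance:
        return None
    return count


# Hypothetical neutral masses before and after neuraminidase treatment
print(sialic_acids_removed(2582.19084, 2000.0))
```

A shift that is not a whole multiple of the NeuAc residue mass within tolerance suggests the two signals are not a sialylated and desialylated pair.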
Many emerging protein drugs are inherently complex glycoproteins that often contain multiple N- and O-linked glycosylation sites. Site-specific elucidation of N-linked glycosylation on such biotherapeutics has become more routine in recent years. However, O-linked glycosite characterization is still something of a challenge that mostly requires specialized expertise in academic or research and development laboratories. Development of more streamlined methods such as glycopeptide enrichment strategies and nonspecific protease digestions has paralleled the introduction of increasingly sophisticated software and powerful computers for glycopeptide identification. In the near future, the complexity of these types of biotherapeutics may be deciphered more rapidly and routinely in contract research/manufacturing and biotechnology laboratories.
Elaine Stephens is associate director of protein sciences and head of mass spectrometry at Blue Stream Laboratories, 763 Concord Avenue, Building E, Cambridge, MA 02138; 1-617-234-0001; www.bluestreamlabs.com.