Regulations require biomanufacturers to assess the intactness of protein and glycoprotein products and to confirm their terminal sequences so that any variants can be detected. Section 6.1.1 c of the ICH Q6B guideline states:
Terminal amino acid analysis is performed to identify the nature and homogeneity of the amino- and carboxy-terminal amino acids. If the desired product is found to be heterogeneous with respect to the terminal amino acids, the relative amounts of the variant forms should be determined using an appropriate analytical procedure. The sequence of these terminal amino acids should be compared with the terminal amino acid sequence deduced from the gene sequence of the desired product.
BioPharmaSpec’s protein characterization services include:
- intact molecular-weight analysis using online ultraperformance liquid chromatography (UPLC) with electrospray ionization mass spectrometry (UPLC/ES-MS) to assess intactness
- N- and C-terminal sequencing to assess the intactness of a product and to determine the amino acids at the respective termini of a protein/glycoprotein product.
Protein Terminal Structure
Proteins comprise linear chains of amino acids linked to one another through amide bonds. Each amino acid carries an amine functional group and a carboxylic acid functional group. Linking the carboxylic acid group of one amino acid with the amine group of the next forms this amide bond and releases one molecule of water. The process is carried out by the cell’s translational machinery, which translates mRNA into protein.
Because amine and carboxylic acid groups are linked in sequence along the whole chain, a completed protein chain has a free amine group at one end and a free carboxylic acid group at the other. The free amine end is called the N-terminus or amino terminus; the free carboxylic acid end is called the C-terminus or carboxyl terminus. Because the two termini carry different functional groups, they have different chemical properties.
That difference allows terminus-specific chemical procedures to be used for sequencing proteins from their termini.
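The terminal groups also matter for mass calculations used later in intact-mass and peptide-mapping work. As a minimal sketch (the function names are illustrative, not from the text), the monoisotopic mass of an unmodified chain is the sum of its residue masses plus one water, which accounts for the hydrogen on the free N-terminal amine and the hydroxyl on the free C-terminal acid. Residue masses are standard monoisotopic values; disulfide bonds, glycans, and other modifications are deliberately ignored.

```python
# Monoisotopic residue masses (Da): the mass each amino acid contributes
# once it is amide-linked in a chain (free amino acid minus one H2O).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # H on the free N-terminal amine + OH on the free C-terminal acid
PROTON = 1.007276   # proton mass, used for electrospray [M + zH]z+ ions


def neutral_mass(sequence: str) -> float:
    """Monoisotopic neutral mass of an unmodified linear chain.

    Disulfide bonds, glycans and other modifications are ignored."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def mz(sequence: str, charge: int) -> float:
    """m/z of the [M + zH]z+ ion of the chain."""
    return (neutral_mass(sequence) + charge * PROTON) / charge


print(round(neutral_mass("GG"), 4))
print(round(mz("GG", 1), 4))
```

For the dipeptide glycylglycine, `neutral_mass("GG")` gives roughly 132.053 Da: two glycine residues plus the single water contributed by the two termini.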
Terminal Amino Acid Sequencing
Not all proteins consist of a single chain. Monoclonal antibodies (MAbs), for example, comprise four protein chains: usually two identical heavy chains and two identical light chains. An antibody therefore has two distinct N-terminal sequences and two distinct C-terminal sequences. For successful sequencing, the chains of a multichain protein should be separated so that the data are unambiguous. If posttranslational processing has produced ragged termini, more than one terminal amino acid sequence will be detected, and those data can serve as an indicator of a molecule’s intactness.
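Why separation matters can be shown with an idealized model of N-terminal sequencing. The sketch below (a hypothetical helper, assuming every cycle goes to completion with no lag or yield loss) lists the residues released at each cycle when several chains are sequenced together. The sequences are made-up N-terminal stretches for illustration only.

```python
from typing import List, Set


def edman_calls(chains: List[str], cycles: int) -> List[Set[str]]:
    """Residues released at each Edman cycle when several chains are
    sequenced together. Idealized: every cycle goes to completion, so
    there is no lag and no loss of yield."""
    calls = []
    for cycle in range(cycles):
        # Every chain that still has a residue at this position contributes it.
        calls.append({chain[cycle] for chain in chains if cycle < len(chain)})
    return calls


# Hypothetical N-terminal stretches, for illustration only.
light = "DIQMTQSPSS"
heavy = "EVQLVESGGG"
ragged_heavy = heavy[1:]  # a variant lacking the first residue

print("unseparated chains:", edman_calls([light, heavy], 5))
print("ragged heavy chain:", edman_calls([heavy, ragged_heavy], 5))
```

With unseparated chains, a cycle can return two residues and nothing in the calls says which chain each came from. With a ragged terminus, the same sequence appears twice, offset by one cycle.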
BioPharmaSpec provides an N-terminal sequencing service (also known as gas-phase sequencing, Edman sequencing, and Edman degradation) using Shimadzu instrumentation for automated N-terminal sequencing. To define the protein sequence of the N-terminus, BioPharmaSpec scientists use the following N-terminal sequencing protocol:
- Multichain proteins or protein mixtures are separated by liquid chromatography or by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE), blotted onto polyvinylidene difluoride (PVDF) membranes, and stained with Ponceau red.
- Pure or single-chain proteins are sequenced directly from the PVDF membrane.
- Blotted samples are derivatized with phenyl isothiocyanate.
- The derivatized N-terminal amino acid is cleaved from the protein backbone.
- The released derivative is analyzed by liquid chromatography and identified by its elution position relative to a standard mixture of derivatized amino acids. The cycle is then repeated on the shortened chain to identify the next residue.
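The identification step amounts to matching each cycle's main chromatographic peak against the elution positions of the standard mixture of derivatized (phenylthiohydantoin, PTH) amino acids. In the sketch below, the retention times, the tolerance, and the function name are all hypothetical; real values depend on column, gradient, and instrument. One point it does reflect is that the leucine and isoleucine derivatives elute at different positions, so this step can tell them apart even though they have the same mass.

```python
from typing import Dict, Optional

# Hypothetical retention times (min) for a standard mixture of derivatized
# amino acids. The Leu and Ile derivatives are resolved chromatographically.
STANDARD_RT: Dict[str, float] = {
    "D": 4.1, "N": 4.6, "S": 5.0, "Q": 5.4, "T": 5.8, "G": 6.2,
    "E": 6.6, "A": 7.5, "P": 9.0, "V": 9.6, "M": 10.0, "I": 11.1,
    "L": 11.5,
}


def identify(peak_rt: float, standards: Dict[str, float],
             tolerance: float = 0.15) -> Optional[str]:
    """Assign a cycle's peak to the standard with the closest elution
    position, or return None if no standard lies within the tolerance."""
    residue, rt = min(standards.items(), key=lambda kv: abs(kv[1] - peak_rt))
    return residue if abs(rt - peak_rt) <= tolerance else None


# Hypothetical main-peak retention times from five consecutive cycles.
observed = [4.12, 9.55, 11.08, 11.52, 12.8]
print([identify(rt, STANDARD_RT) for rt in observed])
```

A `None` result flags a peak that matches no standard, which in practice could point to a modified residue or an artifact and would need separate investigation.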
Edman chemistry sequentially removes amino acids from the N-terminus of a protein, provided that the N-terminal amine is free. If that amine has been modified, for example by pyroglutamate formation or acetylation, it cannot be derivatized with phenyl isothiocyanate and sequencing is blocked. Pyroglutamate, formed by cyclization of an N-terminal glutamine or glutamic acid, is a common N-terminal modification of MAb light and/or heavy chains. Before sequencing, the pyroglutamate residue can be removed from the affected chains with pyroglutamate aminopeptidase (pyroglutaminase), exposing the free amine of the next residue.
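Because a blocked N-terminus stops Edman chemistry at the first cycle, the blocking group itself is often recognized from mass data. The hypothetical helper below, not described in the text, checks whether the difference between an observed and an expected chain or peptide mass matches one of the modifications named above. The mass shifts are standard monoisotopic values; the tolerance is an assumption.

```python
# Monoisotopic mass changes (Da) for common N-terminal blocking events.
NH3 = 17.02655     # lost when an N-terminal Gln cyclizes to pyroglutamate
H2O = 18.01056     # lost when an N-terminal Glu cyclizes to pyroglutamate
ACETYL = 42.01057  # C2H2O added to the N-terminal amine

BLOCKING_SHIFTS = {
    "pyroGlu from N-terminal Gln": -NH3,
    "pyroGlu from N-terminal Glu": -H2O,
    "N-terminal acetylation": +ACETYL,
}


def explain_shift(observed: float, expected: float, tol: float = 0.02) -> list:
    """Names of blocking modifications consistent with observed - expected."""
    delta = observed - expected
    return [name for name, shift in BLOCKING_SHIFTS.items()
            if abs(delta - shift) <= tol]


# Hypothetical peptide masses: expected 1500.000 Da, observed 1482.974 Da.
print(explain_shift(1482.974, 1500.000))
```

A match only makes a modification plausible; it does not locate it, which is why peptide mapping or enzymatic removal followed by sequencing is still needed for confirmation.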
BioPharmaSpec has also developed methods for removing polyethylene glycol (PEG) from proteins that are PEGylated at the N-terminus, which allows these otherwise blocked N-termini to be sequenced.
An important application of N-terminal sequencing is unambiguously distinguishing the isobaric amino acids isoleucine and leucine in a protein sequence. Conventional MS-based sequencing cannot differentiate them because they have identical masses. When the full sequence of a protein must be confirmed, purified peptides containing leucine or isoleucine can be analyzed by N-terminal sequencing so that the sequence is determined as completely as possible. Such confirmation is required to demonstrate that the primary amino acid sequence of a biosimilar is identical to that of the reference/innovator product (per EMA and US FDA biosimilar guidelines).
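A small sketch of the consequence: leucine and isoleucine residues share the elemental formula C6H11NO (113.08406 Da monoisotopic), so mass data can only define each such position as "leucine or isoleucine", written J in IUPAC one-letter code. The helper names and example sequences below are hypothetical; the positions the second function returns are the ones that would need an orthogonal method such as N-terminal sequencing.

```python
def ms_view(sequence: str) -> str:
    """The sequence as mass data alone can define it: L and I both become
    J, the IUPAC code for 'leucine or isoleucine'."""
    return sequence.replace("L", "J").replace("I", "J")


def positions_needing_edman(sequence: str) -> list:
    """1-based positions where an orthogonal method must assign L versus I."""
    return [pos for pos, aa in enumerate(sequence, start=1) if aa in "LI"]


expected = "ALVIGK"  # hypothetical peptide
variant = "AIVIGK"   # same peptide with a hypothetical Leu -> Ile change

# True: the two sequences are indistinguishable from mass data alone.
print(ms_view(expected) == ms_view(variant))
print(positions_needing_edman(expected))
```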
There is no fully analogous, reliable method akin to Edman chemistry for determining the C-terminal sequence of a biopharmaceutical. To assess the C-termini, BioPharmaSpec scientists use carboxypeptidase digestion or mass spectrometric mapping strategies. In the latter approach, the intactness of a protein’s C-terminus or the presence of ragged ends is assessed from intact molecular-weight and peptide-mapping data, including fragment ions that confirm the identity of the C-terminal peptide or peptides.
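For the carboxypeptidase approach, one common way to read the result is as a mass ladder: each successive loss of a C-terminal residue lowers the mass by that residue's mass. The sketch below assumes the ladder peaks have already been detected and sorted from largest to smallest. The helper names, tolerance, and example chain are hypothetical, and real digests also depend on residue-specific cleavage rates that this ignores.

```python
from typing import List, Optional

# Monoisotopic residue masses (Da) for a subset of amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259,
}
WATER = 18.01056


def chain_mass(sequence: str) -> float:
    """Monoisotopic neutral mass of an unmodified chain."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def read_ladder(masses: List[float], tol: float = 0.02) -> List[Optional[str]]:
    """Infer C-terminal residues, read inward, from a carboxypeptidase ladder.

    `masses` holds the intact chain and its shortened forms, sorted from
    largest to smallest; each gap should equal one removed residue."""
    calls: List[Optional[str]] = []
    for larger, smaller in zip(masses, masses[1:]):
        gap = larger - smaller
        hits = [aa for aa, m in RESIDUE_MASS.items() if abs(gap - m) <= tol]
        # Isobaric residues (L and I) both match and are reported together.
        calls.append("/".join(hits) if hits else None)
    return calls


# Hypothetical ladder: an intact chain and three carboxypeptidase products.
ladder = [chain_mass(s) for s in ("AGDLK", "AGDL", "AGD", "AG")]
print(read_ladder(ladder))
```

As with MS sequencing generally, a ladder step can identify a residue of mass 113 Da but not decide between leucine and isoleucine.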
Figure 1 shows the C-terminal peptide (without lysine) of the heavy chain of a MAb as identified in a peptide map (signal at 6.91 minutes), together with the high-energy fragmentation of that peptide. The fragment masses are consistent with the expected fragmentation pattern of that peptide. Because this peptide contains leucine residues, which MS cannot distinguish from isoleucine, a definitive assignment would require N-terminal sequencing of the collected peptide as described above.
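Confirming a peptide from its fragmentation means comparing the observed fragment masses with those predicted from the expected sequence. The text does not give the sequence of the Figure 1 peptide, so the sketch below assumes, for illustration only, SLSLSPG, a sequence commonly found at the lysine-free C-terminus of human IgG1 heavy chains. It computes singly charged b ions (N-terminal fragments) and y ions (C-terminal fragments, which carry the extra water). Every value is unchanged if an L is replaced by I, which is the ambiguity noted above.

```python
RESIDUE_MASS = {"S": 87.03203, "L": 113.08406, "P": 97.05276, "G": 57.02146}
WATER = 18.01056
PROTON = 1.007276


def fragment_ions(peptide: str):
    """Singly charged b and y ion m/z values for an unmodified peptide.

    b_i: first i residues + proton.
    y_i: last i residues + water + proton."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


b_ions, y_ions = fragment_ions("SLSLSPG")  # assumed sequence, see lead-in
for i, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
    print(f"b{i} {b:9.4f}   y{i} {y:9.4f}")
```

A useful self-check on such a table is that complementary ions (b_i and y_(n-i)) always sum to the neutral peptide mass plus two protons.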
Intact Molecular-Weight Analysis
Intact molecular-weight analysis of a drug product can serve as an orthogonal procedure to support conclusions about the structure of the N- and C-termini and about intactness. MAbs are a common example of proteins that show C-terminal sequence variation: the C-terminal lysine residues of the heavy chains, which are usually removed in native antibodies, can be partially retained in recombinant MAbs, making the heavy-chain C-terminus heterogeneous. Such C-terminal heterogeneity can be confirmed by combining intact molecular-weight analysis of the released, de-N-glycosylated heavy chain, which distinguishes the with-lysine and without-lysine forms (a 128-Da mass difference), with molecular-weight analysis and sequencing of the C-terminal peptides by peptide mapping. This combined analysis also allows the relative amounts of each form to be assessed for batch-to-batch and biosimilar-to-reference/innovator comparisons.
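The relative-amount assessment can be sketched from deconvoluted intact-mass data. The hypothetical helper below sums peak intensities near the expected mass of the heavy chain without lysine and near that mass plus one lysine residue (about 128.17 Da as an average mass, which suits deconvoluted intact masses), then reports percentages. It assumes both forms ionize with similar efficiency; the peak list, expected mass, and tolerance are made up.

```python
from typing import Dict, List, Tuple

LYS_RESIDUE_AVG = 128.17  # average lysine residue mass (Da), for intact masses


def lysine_variants(peaks: List[Tuple[float, float]], expected_no_lys: float,
                    tol: float = 1.0) -> Dict[str, float]:
    """Percent of heavy chain with and without C-terminal lysine.

    peaks: (deconvoluted mass, intensity) pairs for the released,
    de-N-glycosylated heavy chain. expected_no_lys: predicted mass of the
    chain lacking its C-terminal lysine. Peaks matching neither form are
    ignored. Assumes both forms ionize with similar efficiency."""
    targets = {"without lysine": expected_no_lys,
               "with lysine": expected_no_lys + LYS_RESIDUE_AVG}
    totals = {name: 0.0 for name in targets}
    for mass, intensity in peaks:
        for name, target in targets.items():
            if abs(mass - target) <= tol:
                totals[name] += intensity
    grand = sum(totals.values())
    if grand == 0:
        return totals  # nothing matched; report zeros rather than divide by 0
    return {name: 100.0 * value / grand for name, value in totals.items()}


# Hypothetical deconvoluted heavy-chain peaks (mass in Da, intensity).
peaks = [(50600.0, 720.0), (50728.2, 280.0), (50762.0, 50.0)]
print(lysine_variants(peaks, expected_no_lys=50600.0))
```

In this made-up example the unmatched peak at 50762.0 Da is excluded from the percentages; in real data such peaks would need their own assignment before any batch-to-batch comparison.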
A Complete Assessment
Several techniques are required to provide a full assessment of the intactness of a protein/glycoprotein biopharmaceutical product. Automated N-terminal sequencing, intact molecular-weight analysis (often following deglycosylation to allow analysis of the protein backbone without data-complicating heterogeneity), peptide mapping, and mass spectrometric sequence analysis of the terminal peptides (both the N- and C-termini) all must be performed to produce a complete assessment. Data from these methods provide an overall understanding of protein structure and enable a comparative quantitation of modifications at either terminus.
Dr. Andrew J. Reason is CEO and MD at BioPharmaSpec (email@example.com). Dr. Richard L. Easton is technical director at BioPharmaSpec (firstname.lastname@example.org).