Exogenous Protein Toxicity & Antinutrient Analysis

1. Background

This protocol provides a national standard-compliant bioinformatics assessment service for the potential toxicity and anti-nutritional properties of heterologous gene expression products in genetically modified microorganisms (including bacteria and fungi) intended for food processing. The protocol strictly adheres to the national standard "Bioinformatics Methods for Amino Acid Sequence Similarity Analysis between Exogenous Proteins and Toxic Proteins and Anti-nutritional Factors in the Food Safety Testing of Transgenic Organisms and Their Products" (Ministry of Agriculture Announcement No. 2630-16-2017) [1], providing critical scientific evidence for the food safety assessment of genetically modified microorganisms.

By performing sequence similarity alignment between the amino acid sequence of an exogenous protein and known toxic proteins and anti-nutritional factors in internationally authoritative protein databases, this protocol rapidly and accurately identifies potential biosafety risks. It constitutes an indispensable first step in the safety evaluation of transgenic microorganisms. Where significant similarity is detected, subsequent toxicological or nutritional experimental verification is warranted.

2. Technical Principles and Analysis Pipeline

The analysis follows a straightforward workflow comprising database selection, sequence alignment, and result interpretation.

Figure 1. Analysis Workflow Diagram

The analysis begins with the query protein sequence provided by the client, which is aligned against keyword-filtered public databases using the BLASTp program. Results are then evaluated against an E-value threshold to determine whether significant similarity exists.
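As an illustration of the alignment step, the sketch below assembles a BLAST+ `blastp` command line. The flags `-query`, `-db`, `-evalue`, `-outfmt`, and `-out` are standard BLAST+ options; the file names, the database name `toxin_subset`, and the chosen output columns are hypothetical placeholders, and the national standard may prescribe settings not shown here.

```python
import shlex

# Tabular output (-outfmt 6) with an explicit column list.
# The column selection is an assumption made for this sketch.
OUTFMT = "6 qseqid sseqid pident length evalue bitscore stitle"


def build_blastp_command(query_fasta, database, out_path, evalue=0.01):
    """Return the argument list for one blastp run against a filtered database."""
    return [
        "blastp",
        "-query", query_fasta,    # client-supplied protein sequence(s), FASTA
        "-db", database,          # keyword-filtered BLAST database (hypothetical name)
        "-evalue", str(evalue),   # report only hits at or below this E-value
        "-outfmt", OUTFMT,
        "-out", out_path,
    ]


cmd = build_blastp_command("query.fasta", "toxin_subset", "hits.tsv")
print(shlex.join(cmd))
# To execute (requires a local BLAST+ installation and a formatted database):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a shell string avoids quoting problems with the multi-word `-outfmt` argument.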

2.1 Databases and Software

(1) Database: The analysis uses the NCBI non-redundant protein (nr) database and the UniProt database (including Swiss-Prot and TrEMBL), as required by the national standard [1].

(2) Keyword Filtering: The following keywords are used to filter the databases for targeted searches: toxin, toxic, antinutrient, anti-nutritional, protease inhibitor, trypsin inhibitor, agglutinin.

(3) Software: The core alignment tool is the BLASTp program from the NCBI BLAST+ suite (version 2.13.0+), designed for protein-to-protein sequence alignment [2].
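The keyword filtering in item (2) can be sketched as a case-insensitive substring match on FASTA header lines. This is a minimal sketch, assuming the filter is applied to sequence descriptions in a FASTA export; the function name and the sample records are hypothetical, and the actual filtering procedure may differ.

```python
import re

KEYWORDS = [
    "toxin", "toxic", "antinutrient", "anti-nutritional",
    "protease inhibitor", "trypsin inhibitor", "agglutinin",
]
# One case-insensitive pattern that matches any keyword as a substring.
PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def filter_fasta_by_keyword(lines):
    """Yield (header, sequence) for records whose header contains a keyword."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: emit the previous one if it matched.
            if header is not None and PATTERN.search(header):
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif line and header is not None:
            seq.append(line)
    if header is not None and PATTERN.search(header):
        yield header, "".join(seq)


records = [
    ">sp|P1|Alpha-toxin OS=X", "MKT", "AAA",
    ">sp|P2|Actin", "MDD",
    ">tr|Q3|Kunitz trypsin inhibitor", "MKS",
]
kept = list(filter_fasta_by_keyword(records))
```

Substring matching is deliberately broad: it also retains headers such as "antitoxin" or "nontoxic", which is one reason positive hits are reviewed manually.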

3. Technical Advantages

| Advantage | Core Value | Description |
|---|---|---|
| National Standard Compliance | Regulatory recognition, authoritative reliability | Strictly follows the national standard (Ministry of Agriculture Announcement No. 2630-16-2017) [1]; the assessment workflow and decision criteria meet regulatory requirements and are recognized by NMPA, the Ministry of Agriculture and Rural Affairs, and other domestic and international regulatory authorities. |
| Authoritative Databases | Comprehensive coverage, continuously updated | Uses two internationally leading databases, NCBI nr (250 million+ sequences) and UniProt (230 million+ sequences), covering 500,000-1,000,000 toxic protein and anti-nutritional factor entries; updated quarterly. |
| High-sensitivity Algorithm | Precise identification, statistically rigorous | Employs the BLASTp algorithm (cited 100,000+ times) and quantifies the significance of each similarity with an E-value threshold (<= 0.01), balancing high sensitivity against a low false-positive rate. |
| Expert Review | Eliminates false positives, ensures accuracy | A multidisciplinary expert team (bioinformatics, toxicology, and protein chemistry) manually reviews all positive results, combining functional annotation and literature research to eliminate false positives. |
| Risk Stratification | Scientific decision-making, precise guidance | Risk is classified into three tiers (high, medium, low), with specific experimental validation recommendations (study type, endpoints, dose setting, etc.) for each tier, avoiding unnecessary trials. |
| Data Traceability | GLP-compliant, long-term archiving | Raw data, database versions, and expert review records are retained in full, in compliance with 21 CFR Part 11, and archived for at least 5 years to support regulatory inspection and quality traceability. |

4. Application Scenarios

| Application Scenario | Typical Applications | Regulatory Requirements / Research Value |
|---|---|---|
| Genetically Modified Microorganism Food Safety | Exogenous proteins expressed by engineered bacteria, yeast, or fungi for food fermentation, enzyme preparation production, and nutritional fortification. | Complies with the Food Safety Law and national standards, providing scientific evidence for safety assessment. |
| Novel Food Ingredient Registration | Novel proteins including microbial fermentation proteins, single-cell proteins, and recombinant protein nutritional supplements. | Satisfies the National Health Commission requirements for novel food ingredient registration. |
| Transgenic Crop Safety Assessment | Exogenous proteins expressed in insect-resistant and herbicide-tolerant crops (Bt proteins, EPSPS enzymes, etc.). | Complies with the Ministry of Agriculture and Rural Affairs "Administrative Measures for the Safety Assessment of Agricultural GMOs". |
| Enzyme Preparations and Food Additives | Industrial enzyme preparations (amylases, proteases, lipases, etc.) and other food additives. | Satisfies the requirements of GB 2760 "National Food Safety Standard - Standards for Uses of Food Additives". |
| Functional Food Development | Probiotic fermented products, protein-fortified foods, and nutritional supplements (lactoferrin, lysozyme, etc.). | Safeguards consumer health and satisfies product labeling requirements. |
| Biopharmaceutical IND Filing | Exogenous proteins in recombinant protein drugs, gene therapy vectors, and cell therapy products. | Complies with the NMPA "Good Laboratory Practice for Non-clinical Laboratory Studies" (GLP). |
| Protein Engineering R&D | Early-stage toxicity risk screening when designing novel enzymes, antibodies, and fusion proteins. | Avoids the introduction of toxic sequences, guides sequence optimization, and reduces the risk of R&D failure. |

5. Sample Report

The primary criterion for risk assessment is the Expect value (E-value). The E-value reflects the statistical significance of an alignment result; a lower E-value indicates a more significant, non-random match.
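For reference, the E-value reported by BLAST follows the Karlin-Altschul statistics, shown here in their standard textbook form. The symbols are introduced for this note only: m is the query length, n the total length of the searched database, S the raw alignment score, S' the bit score, and K and λ are parameters determined by the scoring system. BLAST itself uses effective (edge-corrected) lengths for m and n.

```latex
E = K\,m\,n\,e^{-\lambda S}
\qquad\text{equivalently}\qquad
E = m\,n\,2^{-S'}, \quad S' = \frac{\lambda S - \ln K}{\ln 2}
```

Because E scales with the database size n, the same alignment receives a smaller E-value against a keyword-filtered subset than against the full database, so E-values are comparable only between searches run against the same database.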

(1) No Significant Similarity (E-value > 0.01): If no alignment with an E-value less than or equal to 0.01 is identified, the query protein is considered to have no significant similarity to known toxic or anti-nutritional proteins, indicating a low biosafety risk.

(2) Significant Similarity Detected (E-value <= 0.01): If an alignment with an E-value less than or equal to 0.01 is identified, the query protein is considered to exhibit significant similarity. Such results require manual review by bioinformatics experts to exclude false positives (e.g., similarity to toxin receptors or resistance proteins), and may require further verification through toxicological experiments.
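The decision rule above can be expressed as a short script over BLAST tabular output. This is a sketch, assuming tab-separated output with the hypothetical column order listed in `COLUMNS`; the sample rows are invented for illustration and are not real search results.

```python
import csv
import io

E_VALUE_THRESHOLD = 0.01
# Assumed column order of the tab-separated BLAST output (hypothetical choice).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "evalue", "bitscore", "stitle"]


def significant_hits(tsv_text, threshold=E_VALUE_THRESHOLD):
    """Return hits with E-value <= threshold, most significant first."""
    hits = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue
        hit = dict(zip(COLUMNS, row))
        hit["evalue"] = float(hit["evalue"])
        if hit["evalue"] <= threshold:
            hits.append(hit)
    return sorted(hits, key=lambda h: h["evalue"])


def classify(tsv_text):
    """Apply the two-way rule: any hit at or below the threshold triggers review."""
    hits = significant_hits(tsv_text)
    if hits:
        return "significant similarity: manual review required", hits
    return "no significant similarity", hits


# Invented example rows, for illustration only.
sample = (
    "query1\tsp|P01|TOXA\t45.2\t210\t3e-25\t105\tAlpha-toxin\n"
    "query1\tsp|P02|XYZ\t28.0\t60\t0.5\t30.1\tProtease inhibitor\n"
)
label, hits = classify(sample)
print(label, [h["sseqid"] for h in hits])
```

A hit with an E-value of exactly 0.01 counts as significant here, matching the "less than or equal to" wording of the criteria.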

Figure 2. Example Results

This figure demonstrates how results are classified according to the E-value threshold. Positive matches (E-value <= 0.01) require further investigation.

6. Service Contents and Sample Requirements

6.1 Service Contents

We provide a one-stop service from project consultation to final report delivery, ensuring smooth project execution.

| Service Item | Service Content |
|---|---|
| Project Consultation | Senior technical experts assist in understanding regulatory requirements and defining the analysis plan and information needs. |
| Data Analysis | Execution of the standardized bioinformatics analysis pipeline described above. |
| Report Delivery | Delivery of a comprehensive PDF report and complete analysis result files within the committed turnaround time (typically 10 business days). |
| After-sales Support | Professional report interpretation and ongoing technical consultation. |

6.2 Sample and Information Requirements

Accurate analysis depends on high-quality sequence data and complete information. Please prepare strictly according to the following requirements:

| Requirement Category | Item | Specific Requirements |
|---|---|---|
| Required Information | Protein Sequence | Provide amino acid sequences in FASTA format. Each sequence must have a unique and identifiable ID. |
| Required Information | Sequence Origin | Clearly state the origin of each protein (e.g., host strain, inserted gene, etc.). |
| Required Information | Project Background | Briefly describe the intended application of the protein to be analyzed (e.g., food processing, feed additives, etc.). |
| Turnaround Time | | 10 business days |
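Before submission, the FASTA requirement can be checked with a small script such as the one below. This is a minimal sketch, assuming the ID is the first whitespace-delimited token of each header line and that the accepted residue alphabet is the 20 standard amino acids plus the IUPAC codes B, Z, X, U, and O; both are assumptions of this example, not stated submission rules.

```python
# Assumed alphabet: 20 standard amino acids plus IUPAC codes B, Z, X, U, O.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYBZXUO")


def validate_fasta(text):
    """Return a list of problems found; an empty list means the input passed."""
    problems, seen = [], set()
    current_id, has_seq = None, False
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Close the previous record before starting a new one.
            if current_id is not None and not has_seq:
                problems.append(f"{current_id}: no sequence")
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
            has_seq = False
            if not current_id:
                problems.append(f"line {number}: empty sequence ID")
            elif current_id in seen:
                problems.append(f"line {number}: duplicate ID {current_id}")
            seen.add(current_id)
        elif current_id is None:
            problems.append(f"line {number}: sequence data before first header")
        else:
            has_seq = True
            bad = set(line.upper()) - VALID_RESIDUES
            if bad:
                problems.append(f"line {number}: invalid characters {sorted(bad)}")
    if current_id is not None and not has_seq:
        problems.append(f"{current_id}: no sequence")
    return problems


ok = validate_fasta(">p1 example protein\nMKTAY\n>p2\nGGA\n")
issues = validate_fasta(">p1\nMKT\n>p1\nMK1\n")
print(ok, issues)
```

The second call reports both a duplicate ID and a non-residue character, the two most common reasons a submission would need to be corrected.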

7. References

[1] Ministry of Agriculture and Rural Affairs of the People's Republic of China. (2017). Ministry of Agriculture Announcement No. 2630-16-2017: Bioinformatics Methods for Amino Acid Sequence Similarity Analysis between Exogenous Proteins and Toxic Proteins and Anti-nutritional Factors in the Food Safety Testing of Transgenic Organisms and Their Products.

[2] Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool. Journal of Molecular Biology, 215(3), 403-410.