Exogenous Protein Toxicity & Antinutrient Analysis
1. Background
This protocol provides a national standard-compliant bioinformatics assessment service for the potential toxicity and anti-nutritional properties of heterologous gene expression products in genetically modified microorganisms (including bacteria and fungi) intended for food processing. The protocol strictly adheres to the national standard "Bioinformatics Methods for Amino Acid Sequence Similarity Analysis between Exogenous Proteins and Toxic Proteins and Anti-nutritional Factors in the Food Safety Testing of Transgenic Organisms and Their Products" (Ministry of Agriculture Announcement No. 2630-16-2017) [1], providing critical scientific evidence for the food safety assessment of genetically modified microorganisms.
By performing sequence similarity alignment between the amino acid sequence of an exogenous protein and known toxic proteins and anti-nutritional factors in internationally authoritative protein databases, this protocol rapidly screens for potential biosafety risks. It constitutes an indispensable first step in the safety evaluation of transgenic microorganisms. Where significant similarity is detected, the hit is reviewed by experts and, if it cannot be excluded as a false positive, followed up with toxicological or nutritional experimental verification.
2. Technical Principles and Analysis Pipeline
The analysis follows a straightforward workflow comprising database selection, sequence alignment, and result interpretation.

Figure 1. Analysis Workflow Diagram
The analysis begins with the query protein sequence provided by the client, which is aligned against keyword-filtered public databases using the BLASTp program. Results are then evaluated against an E-value threshold to determine whether significant similarity exists.
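As an illustration of the alignment step, the sketch below assembles a BLAST+ `blastp` command line with the E-value cutoff used in this protocol. The file names and the database name `toxin_subset_db` are hypothetical placeholders, and the tabular output format (`-outfmt 6`) is an assumption made for convenience, not a requirement taken from the standard.

```python
import shlex

E_VALUE_THRESHOLD = 0.01  # significance cutoff quoted in this protocol


def build_blastp_command(query_fasta, database, output_tsv,
                         evalue=E_VALUE_THRESHOLD):
    """Assemble a BLAST+ blastp command as an argument list (not executed here)."""
    return [
        "blastp",
        "-query", query_fasta,    # client-supplied protein sequence(s), FASTA
        "-db", database,          # hypothetical keyword-filtered BLAST database
        "-evalue", str(evalue),   # report only hits at or below this E-value
        "-outfmt", "6",           # 12-column tab-separated output (assumption)
        "-out", output_tsv,
    ]


cmd = build_blastp_command("query.fasta", "toxin_subset_db", "hits.tsv")
print(shlex.join(cmd))
```

The command is returned as a list so that it can be passed to `subprocess.run` without shell quoting issues once BLAST+ is installed and the database has been built.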
2.1 Databases and Software
(1) Database: The analysis uses the NCBI non-redundant protein (nr) database and the UniProt database (including Swiss-Prot and TrEMBL), as required by the national standard [1].
(2) Keyword Filtering: The following keywords are used to filter the databases for targeted searches: toxin, toxic, antinutrient, anti-nutritional, protease inhibitor, trypsin inhibitor, agglutinin.
(3) Software: The core alignment tool is the BLASTp algorithm (version 2.13.0+), optimized for protein-to-protein sequence alignment [2].
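The keyword filtering in item (2) could be sketched as a case-insensitive substring match on FASTA description lines. This is a minimal sketch, not the production pipeline: the function names are hypothetical, and the three example records are invented placeholders rather than real database entries.

```python
KEYWORDS = (
    "toxin", "toxic", "antinutrient", "anti-nutritional",
    "protease inhibitor", "trypsin inhibitor", "agglutinin",
)


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def matches_keyword(header, keywords=KEYWORDS):
    """Return True if any keyword occurs in the header, ignoring case."""
    text = header.lower()
    return any(keyword in text for keyword in keywords)


def filter_by_keyword(lines, keywords=KEYWORDS):
    """Keep only the records whose header mentions at least one keyword."""
    return [(h, s) for h, s in read_fasta(lines) if matches_keyword(h, keywords)]


# Invented placeholder records, for illustration only.
example = """>P1 Trypsin inhibitor A [Glycine max]
MKSTIFFLFL
>P2 Alpha-amylase [Aspergillus oryzae]
MMVAWWSLFL
>P3 Antitoxin HigA [Escherichia coli]
MKQ
""".splitlines()

subset = filter_by_keyword(example)
print([header.split()[0] for header, _ in subset])  # ['P1', 'P3']
```

Note that plain substring matching also retains the "Antitoxin" record, because the word contains "toxin". Over-inclusive hits of this kind are one reason positive results are reviewed manually.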
3. Technical Advantages
Advantage | Core Value | Description |
National Standard Compliance | Regulatory recognition, authoritative reliability | Strictly follows the Ministry of Agriculture Announcement No. 2630-16-2017 national standard; assessment workflow and decision criteria meet regulatory requirements and are recognized by NMPA, the Ministry of Agriculture and Rural Affairs, and other domestic and international regulatory authorities. |
Authoritative Databases | Comprehensive coverage, continuously updated | Utilizes two internationally leading databases - NCBI nr (250 million+ sequences) and UniProt (230 million+ sequences) - covering 500,000-1,000,000 toxic protein and anti-nutritional factor entries, updated quarterly. |
High-sensitivity Algorithm | Precise identification, statistically rigorous | Employs the BLASTp algorithm (cited 100,000+ times), quantifying similarity significance through an E-value (<=0.01) statistical model that balances high sensitivity with a low false-positive rate. |
Expert Review | Eliminates false positives, ensures accuracy | A multidisciplinary expert team (bioinformatics + toxicology + protein chemistry) manually reviews all positive results, combining functional annotation and literature research to eliminate false positives. |
Risk Stratification | Scientific decision-making, precise guidance | Risk is classified into three tiers (high/medium/low), with specific experimental validation recommendations (study type, endpoints, dose setting, etc.) provided for each tier, avoiding unnecessary trials. |
Data Traceability | GLP-compliant, long-term archiving | Complete retention of raw data, database versions, and expert review records, compliant with 21 CFR Part 11 requirements, archived for >=5 years to support regulatory inspection and quality traceability. |
4. Application Scenarios
Application Scenario | Typical Applications | Regulatory Requirements / Research Value |
Genetically Modified Microorganism Food Safety | Exogenous proteins expressed by engineered bacteria, yeast, or fungi for food fermentation, enzyme preparation production, and nutritional fortification. | Complies with the Food Safety Law and national standards, providing scientific evidence for safety assessment. |
Novel Food Ingredient Registration | Novel proteins including microbial fermentation proteins, single-cell proteins, and recombinant protein nutritional supplements. | Satisfies the National Health Commission requirements for novel food ingredient registration. |
Transgenic Crop Safety Assessment | Exogenous proteins expressed in insect-resistant and herbicide-tolerant crops (Bt proteins, EPSPS enzymes, etc.). | Complies with the Ministry of Agriculture and Rural Affairs "Administrative Measures for the Safety Assessment of Agricultural GMOs". |
Enzyme Preparations and Food Additives | Industrial enzyme preparations (amylases, proteases, lipases, etc.) and other food additives. | Satisfies the requirements of GB 2760 "National Food Safety Standard - Standards for Uses of Food Additives". |
Functional Food Development | Probiotic fermented products, protein-fortified foods, and nutritional supplements (lactoferrin, lysozyme, etc.). | Safeguards consumer health and satisfies product labeling requirements. |
Biopharmaceutical IND Filing | Exogenous proteins in recombinant protein drugs, gene therapy vectors, and cell therapy products. | Complies with the NMPA "Good Laboratory Practice for Non-clinical Laboratory Studies" (GLP). |
Protein Engineering R&D | Early-stage toxicity risk screening when designing novel enzymes, antibodies, and fusion proteins. | Avoids the introduction of toxic sequences, guides sequence optimization, and reduces the risk of R&D failure. |
5. Sample Report
The primary criterion for risk assessment is the Expect value (E-value). The E-value reflects the statistical significance of an alignment result; a lower E-value indicates a more significant, non-random match.
(1) No Significant Similarity (E-value > 0.01): If no alignment with an E-value less than or equal to 0.01 is identified, the query protein is considered to have no significant similarity to known toxic or anti-nutritional proteins, indicating a low biosafety risk.
(2) Significant Similarity Detected (E-value <= 0.01): If an alignment with an E-value less than or equal to 0.01 is identified, the query protein is considered to exhibit significant similarity. Such results require manual review by bioinformatics experts to exclude false positives (e.g., similarity to toxin receptors or resistance proteins), and may require further verification through toxicological experiments.
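The two-way decision rule above can be sketched as follows, assuming the alignment results are available in the default 12-column BLAST+ tabular format (`-outfmt 6`). The function name and the two example rows are hypothetical; the rows are invented to show one hit on each side of the threshold.

```python
import csv
import io

E_VALUE_THRESHOLD = 0.01

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def classify_hits(tsv_handle, threshold=E_VALUE_THRESHOLD):
    """Return (verdict, hits), where hits are (subject ID, E-value) pairs
    with E-value <= threshold, sorted from most to least significant."""
    reader = csv.DictReader(tsv_handle, fieldnames=OUTFMT6_FIELDS,
                            delimiter="\t")
    significant = []
    for row in reader:
        evalue = float(row["evalue"])
        if evalue <= threshold:
            significant.append((row["sseqid"], evalue))
    significant.sort(key=lambda hit: hit[1])
    if significant:
        verdict = "significant similarity: expert review required"
    else:
        verdict = "no significant similarity"
    return verdict, significant


# Invented example rows, for illustration only.
example_tsv = (
    "query1\ttoxA\t45.2\t210\t100\t5\t1\t205\t3\t212\t2e-35\t140\n"
    "query1\tprotB\t28.0\t60\t40\t3\t10\t69\t5\t64\t0.5\t30.1\n"
)

verdict, hits = classify_hits(io.StringIO(example_tsv))
print(verdict)
print(hits)
```

A "significant" verdict here only flags the query for manual review; it does not by itself establish toxicity.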

Figure 2. Example Results
This figure demonstrates how results are classified according to the E-value threshold. Positive matches (E-value <= 0.01) require further investigation.
6. Service Contents and Sample Requirements
6.1 Service Contents
We provide a one-stop service from project consultation to final report delivery, ensuring smooth project execution.
Service Item | Service Content |
Project Consultation | Senior technical experts assist in understanding regulatory requirements and defining the analysis plan and information needs. |
Data Analysis | Execution of the standardized bioinformatics analysis pipeline described above. |
Report Delivery | Delivery of a comprehensive PDF report and complete analysis result files within the committed turnaround time (typically 10 business days). |
After-sales Support | Professional report interpretation and ongoing technical consultation. |
6.2 Sample and Information Requirements
Accurate analysis depends on high-quality sequence data and complete information. Please prepare strictly according to the following requirements:
Requirement Category | Item | Specific Requirements |
Required Information | Protein Sequence | Provide amino acid sequences in FASTA format. Each sequence must have a unique and identifiable ID. |
Required Information | Sequence Origin | Clearly state the origin of each protein (e.g., host strain, inserted gene, etc.). |
Required Information | Project Background | Briefly describe the intended application of the protein to be analyzed (e.g., food processing, feed additives, etc.). |
Turnaround Time | 10 business days |
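Before submission, the FASTA requirements above can be checked with a short script. The sketch below is illustrative only: `check_fasta` is a hypothetical helper, and the set of accepted residue letters (the 20 standard amino acids plus X, B, Z, U, O and `*`) is an assumption rather than a rule taken from the standard.

```python
from collections import Counter

# Assumed alphabet: 20 standard residues plus common ambiguity/special codes.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWYXBZUO*")


def check_fasta(text):
    """Return a list of problems found in FASTA-formatted text (empty if none)."""
    problems = []
    ids = []
    current = None  # ID of the record being read; None before the first header
    for n, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                problems.append(f"line {n}: header has no ID")
                current = ""
            else:
                current = fields[0]
                ids.append(current)
        elif current is None:
            problems.append(f"line {n}: sequence data before any header")
        else:
            bad = sorted(set(line.upper()) - ALLOWED)
            if bad:
                problems.append(
                    f"line {n}: unexpected characters {''.join(bad)}")
    for seq_id, count in Counter(ids).items():
        if count > 1:
            problems.append(f"duplicate ID {seq_id!r} appears {count} times")
    if not ids:
        problems.append("no FASTA records found")
    return problems


good = ">seq1 enzyme from host strain\nMKTAYIAK\n>seq2\nACDEFGHIK\n"
bad = ">seq1\nMKT\n>seq1\nMK1\n"
print(check_fasta(good))
print(check_fasta(bad))
```

An empty list means the file passed these basic checks; sequence origin and project background still need to be supplied separately.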
7. References
[1] Ministry of Agriculture and Rural Affairs of the People's Republic of China. (2017). Ministry of Agriculture Announcement No. 2630-16-2017: Bioinformatics Methods for Amino Acid Sequence Similarity Analysis between Exogenous Proteins and Toxic Proteins and Anti-nutritional Factors in the Food Safety Testing of Transgenic Organisms and Their Products.
[2] Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool. Journal of molecular biology, 215(3), 403-410.