GMM WGS Safety Assessment
1. Background
With the rapid advancement of biotechnology, genetically modified microorganisms (GMMs) have demonstrated enormous application potential in the food industry, pharmaceutical manufacturing, and environmental protection. However, safety remains the fundamental prerequisite for all applications. To safeguard public health and ecological security, regulatory authorities worldwide have established rigorous requirements for GMM safety assessment. Among these, providing complete and accurate whole genome sequence information and conducting comprehensive genetic stability and biosafety risk analysis based on it constitute the core components of the assessment process.
The Food Safety Law and regulatory documents such as the "Requirements for Submission Materials for the Safety Assessment of Genetically Modified Microorganisms Used in Food Processing" issued by the National Health Commission of China explicitly stipulate that applicants must provide detailed genomic data for the production strain, including the integration sites, copy number, and genetic stability of exogenous genes, as well as a systematic assessment of the strain's pathogenicity, antimicrobial resistance, and toxin-producing capability [1]. Whole genome sequencing (WGS) is the preferred method for obtaining this information, providing the most fundamental and comprehensive view of all genome-level changes potentially introduced by genetic modification.
This protocol was developed against this regulatory and technical background. It offers a tailored one-stop solution for genetically modified bacteria and fungi used in food processing, covering high-depth whole genome sequencing, comprehensive bioinformatics analysis, and regulation-oriented safety assessment. The protocol explicitly accounts for the significant differences between bacteria (prokaryotes) and fungi (eukaryotes) in genome structure, gene regulation, and potential risks, employing differentiated analytical strategies and specialized databases to ensure scientifically rigorous assessment results and to provide strong data support for your product safety application.
2. Technical Principles and Analysis Pipeline
This protocol adopts an integrated four-step "Sequence - Assemble - Annotate - Screen" analytical strategy, ensuring seamless integration of data production and downstream analysis. We employ the industry-recognized "NGS + TGS" hybrid sequencing strategy to obtain a high-quality genome, and have designed dedicated analysis pipelines tailored to the biological characteristics of bacteria and fungi.
2.1 "NGS + TGS" Hybrid Sequencing Strategy
To achieve both high accuracy and high completeness in genome sequencing, we employ a hybrid strategy combining next-generation sequencing (NGS) and third-generation sequencing (TGS):
(1) Next-generation sequencing (NGS): Exemplified by the Illumina NovaSeq platform, NGS delivers high-depth (≥100x) coverage from massive numbers of short reads (~150 bp) with very high per-base accuracy (>99.9%). This is critical for precise identification of single-nucleotide variants (SNVs) and small insertions/deletions (InDels), and serves as the cornerstone for ensuring the accuracy of the final genome sequence.
(2) Third-generation sequencing (TGS): Exemplified by the PacBio HiFi or Oxford Nanopore Technologies (ONT) platforms, the core advantage of TGS lies in its long reads (averaging 15-20 kb or longer). Long reads can span complex genomic structures, including highly repetitive regions and high-GC/AT regions, and are key to resolving assembly fragmentation and obtaining complete chromosome and plasmid sequences.
Using hybrid assembly software (e.g., Unicycler), TGS long reads serve as the genomic "scaffold", which is then polished and error-corrected using high-accuracy NGS short reads, ultimately producing a reference genome sequence that is both complete (near-finished quality) and accurate (error rate below 1 in 100,000).

Figure 1. NGS and TGS Sequencing Strategies (Complementary Advantages)
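The accuracy figures quoted in this section can be placed on the Phred scale commonly used for sequencing quality, where Q = -10 * log10(p) and p is the per-base error probability. The Python sketch below is illustrative only; the function names are hypothetical and are not part of the delivered pipeline.

```python
import math


def phred_from_error_rate(error_rate: float) -> float:
    """Convert a per-base error probability into a Phred quality score."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in the interval (0, 1]")
    return -10 * math.log10(error_rate)


def error_rate_from_phred(quality: float) -> float:
    """Convert a Phred quality score back into an error probability."""
    return 10 ** (-quality / 10)


# Per-base accuracy of 99.9% means an error probability of 0.001.
ngs_quality = phred_from_error_rate(0.001)

# An assembly error rate of 1 in 100,000 means an error probability of 1e-5.
assembly_quality = phred_from_error_rate(1e-5)

print(f"NGS reads at 99.9% accuracy: about Q{ngs_quality:.0f}")
print(f"Assembly at 1 error in 100,000 bases: about Q{assembly_quality:.0f}")
```

On this scale, the >99.9% per-base accuracy quoted for NGS reads corresponds to better than Q30, and the target assembly error rate of below 1 in 100,000 corresponds to better than Q50.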
2.2 Detection Principle and Mechanism for Genetically Modified Microorganisms
Whole genome sequencing comprehensively resolves the genomic characteristics of genetically modified microorganisms at single-base resolution. By integrating NGS and TGS data for hybrid assembly, we obtain high-quality complete genome sequences, enabling three core analyses: exogenous gene detection precisely locates all introduced foreign DNA fragments, vector sequences, and their integration sites; genetic stability analysis monitors unexpected mutations, rearrangements, or sequence loss by comparing genomes of strains from different passages or batches; and safety assessment systematically screens for potential virulence factors, antimicrobial resistance genes, pathogenicity-related genes, and toxin biosynthesis genes using multiple internationally authoritative databases. Together, these analyses provide comprehensive and reliable scientific evidence for the biosafety of the strain.

Figure 2. Detection Mechanism for Genetically Modified Microorganisms
2.3 Bacterial Whole Genome Sequencing and Safety Analysis Pipeline
Tailored to the characteristics of bacterial genomes (relatively small, structurally compact, and frequently harboring plasmids), we have designed the following efficient and precise analysis pipeline, aimed at rapidly generating a complete genomic map and conducting a comprehensive safety assessment.

Figure 3. Bacterial Whole Genome Sequencing, Assembly, and Safety Analysis Pipeline
2.4 Fungal Whole Genome Sequencing and Safety Analysis Pipeline
Given that fungi, as eukaryotes, typically have genomes much larger than those of bacteria, contain introns, have a high proportion of repetitive sequences, and possess linear chromosomes, we employ a more comprehensive analytical strategy, built on the complementary advantages of hybrid NGS and TGS sequencing, to ensure the generation of high-quality genomes and precise annotation data.

Figure 4. Fungal Whole Genome Sequencing, Assembly, and Safety Analysis Pipeline
3. Technical Advantages
Our protocol integrates state-of-the-art sequencing technology and in-depth bioinformatics analysis, aiming to deliver a reliable and comprehensive GMM safety assessment service. The core advantages are as follows:
Advantage | Description |
One-stop Service | We provide a fully integrated service covering experimental design, sample processing, high-throughput sequencing, and in-depth bioinformatics analysis, through to regulatory-compliant report delivery. This eliminates the need to coordinate multiple vendors, saves time and effort, and ensures a smooth project workflow and consistent data quality. |
Optimal Strategy, Reliable Data | Employs the industry-recognized "NGS + TGS" hybrid assembly strategy, combining the high accuracy of NGS with the long-read advantage of TGS to maximize the probability of achieving "finished-quality" genome assemblies. This provides a solid and reliable data foundation for all downstream analyses, especially for precise localization of exogenous genes and genetic stability assessment. |
Species-specific Analysis, Greater Professionalism | We have a deep understanding of the profound differences between bacteria (prokaryotes) and fungi (eukaryotes) in genome structure, gene regulation, and biological function. Accordingly, at every critical analytical step, including gene prediction, functional annotation, and safety screening, we employ distinct, more targeted analytical tools and specialized databases to ensure the accuracy and depth of the analytical results. |
Regulation-aligned, Hassle-free Filing | Our protocol design, analytical content, and report format strictly follow regulatory documents including the "Requirements for Submission Materials for the Safety Assessment of GMMs Used in Food Processing" issued by the National Health Commission. The delivered comprehensive report and complete raw data can be used directly as core submission materials, supporting your product safety approval and market authorization process. |
Comprehensive and In-depth Safety Screening | Beyond general functional databases, we integrate multiple internationally authoritative, species-specific safety databases (e.g., VFDB, CARD, PHI-base, Mycotoxin DB) to comprehensively screen for virulence, antimicrobial resistance, pathogenicity, and toxin-related risk genes, providing rigorous decision criteria and detailed alignment evidence to avoid blind spots in the safety assessment. |
4. Application Scenarios
The comprehensive, in-depth whole genome sequencing and safety analysis services provided by this protocol can be broadly applied across multiple critical stages of GMM research and development:
4.1 Food and Pharmaceutical Product IND/NDA Filing
We provide the essential whole-genome and safety assessment data package, meeting national regulatory requirements, to support marketing authorization applications for GMMs used in food processing, microecological preparations, live biotherapeutic products, and other GMM-derived products. A scientifically sound, rigorous, and comprehensive genome analysis report is key to obtaining regulatory approval.
4.2 Strain Selection and Optimization
During the R&D phase of strain engineering, parallel comparative analysis of candidate strains with different engineering strategies, integration sites, and expression elements enables rapid and accurate selection of the optimal engineered strain, combining high target product expression efficiency with high biosafety and greatly accelerating the R&D process.
4.3 Manufacturing Process Quality Control (QC)
Whole genome sequencing is a critical step in production strain documentation and in the characterization of the Master Cell Bank (MCB) and Working Cell Bank (WCB). Through periodic whole genome sequencing of the production strain, genetic stability during passaging and fermentation is monitored to confirm that no unexpected gene mutations, sequence loss, or contamination has occurred, supporting consistent and controllable quality and safety for every production batch.
4.4 Basic Scientific Research
The service supports in-depth investigation of the specific effects of genetic modification on microbial physiology and metabolism, environmental adaptability, and host interactions. Examples include precisely characterizing exogenous gene integration sites and their effects on neighboring gene expression, or discovering unexpected genome-wide "off-target" effects of genetic engineering, providing valuable data for deepening the understanding of gene function and optimizing gene editing tools.
4.5 Intellectual Property Protection
A complete and accurate whole genome sequence of the strain provides compelling evidence of its uniqueness and novelty, and can serve as key supporting evidence when applying for strain-related patents and protecting core intellectual property.
5. Sample Report
To illustrate the analytical depth and report quality, the following presents simulated core results from a "Whole Genome Sequencing and Safety Analysis Report" for both a bacterium and a fungus. All data are simulated and are provided solely to illustrate the analytical content and figure style.
5.1 Bacterial Whole Genome Sequencing and Safety Analysis Example
5.1.1 Data Quality Control and Genome Assembly
Sample Name | Sequencing Platform | Raw Data (Gb) | Clean Data (Gb) | Genome Size (Mb) | Contig N50 (Kb) | GC Content (%) | Completeness (BUSCO) |
E.coli-modified | Illumina + PacBio | 1.5 + 2.0 | 1.45 + 1.98 | 5.12 | 4,800 | 50.8 | 99.5% |

Figure 5. Circos Plot of the Modified E. coli Genome
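As a quick consistency check on the QC table, average sequencing depth can be estimated as clean data divided by genome size. The sketch below assumes that 1 Gb means 10^9 bases and that the two clean-data values are listed in the same order as the platforms (Illumina, then PacBio); the function name and the 100x threshold constant are illustrative and are taken from the NGS depth target stated in Section 2.1.

```python
def sequencing_depth(clean_data_gb: float, genome_size_mb: float) -> float:
    """Estimate average depth as sequenced bases divided by genome size."""
    if genome_size_mb <= 0:
        raise ValueError("genome_size_mb must be positive")
    return (clean_data_gb * 1e9) / (genome_size_mb * 1e6)


# Simulated values from the bacterial QC table (E.coli-modified).
GENOME_SIZE_MB = 5.12
ngs_depth = sequencing_depth(1.45, GENOME_SIZE_MB)  # Illumina clean data
tgs_depth = sequencing_depth(1.98, GENOME_SIZE_MB)  # PacBio clean data

# NGS depth target stated in Section 2.1 of this protocol.
MIN_NGS_DEPTH = 100

print(f"NGS depth: {ngs_depth:.0f}x, TGS depth: {tgs_depth:.0f}x")
print("NGS depth target met:", ngs_depth >= MIN_NGS_DEPTH)
```

The same calculation applies to the fungal example by substituting its clean-data and genome-size values.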
5.1.2 Safety Screening Results
Following rigorous alignment against the VFDB, CARD, and PATRIC databases, several genes associated with virulence or antimicrobial resistance were identified in the modified strain. The detailed list is provided below. These genes are native to the host strain and were not exogenously introduced; their risk should be evaluated in conjunction with the specific intended use of the strain and applicable exemption lists.
Gene Name | Functional Description | Database | Identity (%) | Coverage (%) | Risk Assessment |
ompA | Outer membrane protein A, involved in adhesion | VFDB | 99.5 | 100 | Low risk |
acrB | Multidrug efflux pump | CARD | 100 | 100 | Attention required |
gad | Glutamate decarboxylase, acid resistance | VFDB | 100 | 100 | Low risk |

Figure 6. Classification Statistics of Bacterial Safety Risk Genes
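A screening table like the one above is typically produced by filtering alignment hits on identity and coverage and then tallying the retained hits per database. The sketch below shows that idea in Python. The cut-off values are assumptions chosen for illustration (this protocol does not state its thresholds), and the last entry is an invented weak hit included only to show the filter rejecting something.

```python
from collections import Counter
from typing import NamedTuple


class Hit(NamedTuple):
    gene: str
    database: str
    identity: float  # percent identity of the alignment
    coverage: float  # percent of the reference gene covered


# Assumed cut-offs for illustration only; not thresholds stated by the protocol.
MIN_IDENTITY = 80.0
MIN_COVERAGE = 60.0


def passes(hit: Hit, min_identity: float = MIN_IDENTITY,
           min_coverage: float = MIN_COVERAGE) -> bool:
    """Keep a hit only if it meets both the identity and coverage cut-offs."""
    return hit.identity >= min_identity and hit.coverage >= min_coverage


hits = [
    Hit("ompA", "VFDB", 99.5, 100.0),   # simulated row from the table
    Hit("acrB", "CARD", 100.0, 100.0),  # simulated row from the table
    Hit("gad", "VFDB", 100.0, 100.0),   # simulated row from the table
    Hit("hypothetical_weak_hit", "CARD", 45.0, 30.0),  # invented example
]

retained = [hit for hit in hits if passes(hit)]
per_database = Counter(hit.database for hit in retained)

for database, count in sorted(per_database.items()):
    print(f"{database}: {count} retained hit(s)")
```

Passing the filter only means a hit is reported; the risk level assigned to each gene still depends on expert review of its function and of the intended use of the strain.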
5.1.3 Genetic Stability Assessment
To evaluate the genetic stability of the genetically modified strain during serial passaging, we performed whole genome comparative analysis of the original strain (Generation 1, G1) and the strain after multiple passages (Generation 5, G5). Results showed that after 5 generations of serial passaging, genome sequence identity reached 99.9998%, with only 2 single-nucleotide polymorphisms (SNPs) and 1 small insertion/deletion (InDel) detected, and no structural variants or exogenous gene loss observed. This demonstrates that the genetically modified strain possesses high genetic stability and is suitable for industrial-scale production.

Figure 7. Bacterial Genetic Stability Assessment - Whole Genome Comparison of Generation 1 vs. Generation 5
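The stability comparison can be summarized by counting the variants found between G1 and G5 by type and checking whether any of them fall inside an exogenous insert. The Python sketch below illustrates this. The variant counts follow the simulated result above, but the contig name, all coordinates, and the insert region are hypothetical values invented for the example.

```python
from collections import Counter
from typing import NamedTuple


class Variant(NamedTuple):
    contig: str
    position: int  # 1-based coordinate on the G1 reference
    kind: str      # "SNP", "InDel", or "SV"


class Region(NamedTuple):
    contig: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def overlaps(variant: Variant, region: Region) -> bool:
    """Return True if the variant position lies inside the region."""
    return (variant.contig == region.contig
            and region.start <= variant.position <= region.end)


def summarize(variants, insert_regions):
    """Count variants by type and list those inside any exogenous insert."""
    counts = Counter(v.kind for v in variants)
    inside = [v for v in variants
              if any(overlaps(v, r) for r in insert_regions)]
    return counts, inside


# Hypothetical coordinates; only the counts (2 SNPs, 1 InDel) mirror the text.
g1_vs_g5 = [
    Variant("chromosome", 120_450, "SNP"),
    Variant("chromosome", 2_310_007, "SNP"),
    Variant("chromosome", 3_998_112, "InDel"),
]
exogenous_inserts = [Region("chromosome", 1_500_000, 1_504_000)]

counts, inside_insert = summarize(g1_vs_g5, exogenous_inserts)
print(dict(counts))
print("Variants inside exogenous inserts:", len(inside_insert))
```

An empty list of variants inside the insert regions corresponds to the "no exogenous gene loss" style of conclusion; any non-empty result would need manual inspection.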
5.2 Fungal Whole Genome Sequencing and Safety Analysis Example
5.2.1 Data Quality Control and Genome Assembly
Sample Name | Sequencing Platform | Raw Data (Gb) | Clean Data (Gb) | Genome Size (Mb) | Contig N50 (Mb) | GC Content (%) | Completeness (BUSCO) |
A.niger-modified | Illumina + PacBio | 3.5 + 4.0 | 3.41 + 3.95 | 33.9 | 3.8 | 48.5 | 98.9% |

Figure 8. Circos Plot of the Modified A. niger Genome, Showing Gene Distribution and GC Content Across 10 Chromosomes
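The "Contig N50" column in the assembly tables is the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. A minimal Python sketch of the calculation follows; the contig lengths are toy values, not data from this report.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths."""
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy example: total length is 15, so the half-way point is 7.5.
# The longest contig (5) covers 5; adding the next (4) covers 9, so N50 is 4.
toy_contigs = [5, 4, 3, 2, 1]
print("N50 =", n50(toy_contigs))
```

A higher N50 relative to genome size indicates a more contiguous assembly; for a genome resolved into a few chromosome-scale contigs, N50 approaches the length of a typical chromosome.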
5.2.2 Safety Screening Results
Following comprehensive safety screening of the modified A. niger strain genome, we focused on pathogenicity-related genes and the biosynthetic potential for secondary metabolites.
Gene Name | Functional Description | Database | Identity (%) | Coverage (%) | Risk Assessment |
AN01G01234 | PKS-NRPS hybrid gene cluster | antiSMASH | 85 | 70 | Attention required (potential unknown secondary metabolites) |
AN02G05678 | ABC transporter | CARD | 92 | 88 | Attention required (possibly associated with drug resistance) |
pacC | pH signaling pathway transcription factor | PHI-base | 99 | 100 | Low risk |

Figure 9. Classification Statistics of Fungal Safety Risk Genes
5.2.3 Genetic Stability Assessment
For the genetically modified fungal strain, we similarly performed whole genome comparative analysis of Generation 1 (G1) vs. Generation 5 (G5). Given the larger and more structurally complex fungal genome, we focused particularly on the integrity of all 10 chromosomes and the stability of exogenous gene integration sites. Results showed that after 5 generations of serial passaging, genome sequence identity was 99.99%, with 3 SNPs, 2 InDels, and 1 small structural variant (~200 bp segment inversion) detected. All variants were located in non-coding regions or low-expression gene regions, and the exogenous genes and their integration sites remained completely stable. The overall assessment concludes that this strain possesses satisfactory genetic stability.

Figure 10. Fungal Genetic Stability Assessment - Comparative Analysis of Generation 1 vs. Generation 5 Genomes Across 10 Chromosomes
6. Service Contents and Sample Requirements
We provide a one-stop service from experimental design consultation to final report delivery, ensuring smooth project execution, with differentiated sample requirements tailored to the characteristics of bacteria and fungi.
6.1 Service Contents
Service Item | Service Content |
Project Consultation | Senior technical experts assist in designing a rigorous experimental plan, defining sample and information requirements, and customizing analytical content according to your specific needs. |
Sample Testing | Standardized sample receipt and quality inspection process, using optimized DNA extraction protocols to ensure high-quality genomic DNA for downstream sequencing. |
Sequencing and Assembly | Execution of the "NGS + TGS" hybrid sequencing strategy, followed by hybrid assembly, to deliver chromosome-level genome sequences. |
Data Analysis | Execution of the differentiated bioinformatics analysis pipeline described above, tailored for bacteria or fungi, including genome annotation, exogenous gene analysis, genetic stability assessment, and comprehensive safety screening. |
Report Delivery | Delivery of a comprehensive PDF report and complete analysis result files within the committed turnaround time (typically 30-40 business days), with report content and format compliant with regulatory submission requirements. |
After-sales Support | Professional report interpretation, data query support, and ongoing technical consultation services to help clients better understand and apply the analysis results. |
6.2 Sample and Information Requirements
Accurate analysis depends on high-quality samples and complete information. Please prepare strictly according to the following requirements:
Requirement Category | Item | Bacteria | Fungi |
Genomic DNA (gDNA) | Total amount | ≥ 2 µg (Qubit quantification) | ≥ 5 µg (Qubit quantification) |
Genomic DNA (gDNA) | Concentration | ≥ 50 ng/µL | ≥ 100 ng/µL |
Genomic DNA (gDNA) | Purity | OD260/280 = 1.8-2.0; no RNA contamination | OD260/280 = 1.8-2.0; no RNA contamination |
Cell/Biomass Sample | Wet weight | ≥ 200 mg | ≥ 500 mg |
Cell/Biomass Sample | Condition | Freshly cultured cell pellet | Freshly cultured mycelium or spores |
Cell/Biomass Sample | Storage | -80 °C or liquid nitrogen | -80 °C or liquid nitrogen |
Required Information | Strain Information | Detailed strain name, origin, culture conditions, Gram staining result, etc. | Detailed strain name, origin, culture conditions, etc. |
Required Information | Genetic Modification Information | Detailed parental strain information, vector map, sequences of all exogenously introduced genes, and functional descriptions. | Detailed parental strain information, vector map, sequences of all exogenously introduced genes, and functional descriptions. |
Required Information | Sample Information | Clear sample identifiers and explicit pairing information between genetically modified groups and control groups (parental strains). | Clear sample identifiers and explicit pairing information between genetically modified groups and control groups (parental strains). |
Turnaround Time | Turnaround Time | 30-40 business days | 30-40 business days |
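Before shipping, the gDNA thresholds in the table above can be checked programmatically against measured values. The following Python sketch encodes only the gDNA amount, concentration, and OD260/280 limits from the table; the class, function, and parameter names are hypothetical, and RNA contamination is not covered because it is not a single numeric threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GdnaRequirement:
    min_total_ug: float
    min_conc_ng_per_ul: float
    od_min: float = 1.8
    od_max: float = 2.0


# Thresholds copied from the sample requirements table.
REQUIREMENTS = {
    "bacteria": GdnaRequirement(min_total_ug=2.0, min_conc_ng_per_ul=50.0),
    "fungi": GdnaRequirement(min_total_ug=5.0, min_conc_ng_per_ul=100.0),
}


def check_gdna(organism: str, total_ug: float, conc_ng_per_ul: float,
               od_260_280: float) -> list:
    """Return a list of problems; an empty list means the sample passes."""
    req = REQUIREMENTS[organism]
    problems = []
    if total_ug < req.min_total_ug:
        problems.append(f"total amount below {req.min_total_ug} ug")
    if conc_ng_per_ul < req.min_conc_ng_per_ul:
        problems.append(f"concentration below {req.min_conc_ng_per_ul} ng/uL")
    if not req.od_min <= od_260_280 <= req.od_max:
        problems.append(f"OD260/280 outside {req.od_min}-{req.od_max}")
    return problems


# Example measurements (invented values for illustration).
print(check_gdna("bacteria", total_ug=3.0, conc_ng_per_ul=60.0, od_260_280=1.9))
print(check_gdna("fungi", total_ug=3.0, conc_ng_per_ul=60.0, od_260_280=2.1))
```

In this example the bacterial sample meets all three limits, while the same amount and concentration fall short of the stricter fungal requirements.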
7. References
[1] Standing Committee of the National People's Congress of the People's Republic of China. (2021). Food Safety Law of the People's Republic of China (2021 Amendment). Beijing: Law Press China.
[2] China National Center for Food Safety Risk Assessment. (2024). Requirements for Submission Materials for the Safety Assessment of Genetically Modified Microorganisms Used in Food Processing (Trial). Retrieved from https://www.cfsa.net.cn/
[5] Codex Alimentarius Commission. (2003). Guideline for the conduct of food safety assessment of foods derived from recombinant-DNA microorganisms (CAC/GL 46-2003). Food and Agriculture Organization of the United Nations/World Health Organization.
[6] Wick, R. R., Judd, L. M., Gorrie, C. L., & Holt, K. E. (2017). Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads. PLoS Computational Biology, 13(6), e1005595. https://doi.org/10.1371/journal.pcbi.1005595
[7] Antipov, D., Korobeynikov, A., McLean, J. S., & Pevzner, P. A. (2016). hybridSPAdes: an algorithm for hybrid assembly of short and long reads. Bioinformatics, 32(7), 1009-1015. https://doi.org/10.1093/bioinformatics/btv688