Analysis of metagenomic data from geoduck (Lutraria phylippinarum) infected with proboscis disease in the Cat Ba farming area, Vietnam

Geoduck (Lutraria phylippinarum) is a cultured bivalve species of high nutritional and economic value. Geoduck farming in Vietnam has expanded strongly since 2010, especially in the provinces of Quang Ninh, Hai Phong and Khanh Hoa. Since the end of 2011, however, mass mortality of geoduck has occurred on a large scale, with mortality rates of up to 100% in most farming areas, including Cat Ba, Van Don and Cam Ranh, causing heavy losses. The Ministry of Agriculture and Rural Development directed Aquaculture Research Institute I to carry out projects to study the epidemic and identify the pathogen. Histopathology and electron microscopy revealed macroscopic and microscopic changes in several tissues, together with pathogen-like structures: 1) cell death in the gill, mantle, liver and proboscis muscle tissues, with swelling and necrosis of the proboscis; 2) infection in the gill and proboscis muscle tissues; 3) virus-like inclusions in the gill tissue; 4) virus-like structures in the cytoplasm and cell wall of proboscis muscle cells, about 70-110 nm x 600-1000 nm in size and referred to as virus-like particles (VLPs) (Phan Thi Van et al., 2013). However, the VLPs could not be isolated or cultured, which makes them difficult to study by conventional methods. As a result, there is still no specific evidence confirming the causative agent of proboscis swelling in geoduck.

To address this practical problem, the Genome Research Institute cooperated with Aquaculture Research Institute I and Japanese bioinformatics experts from Ochanomizu University, Tohoku University and the Nagahama Institute of Bio-Science and Technology to propose the JSPS-VAST international cooperation project “Metagenomic analysis to identify the virus that causes swelling of the proboscis disease on geoduck (Lutraria phylippinarum) cultured in Vietnam”. The project, code VAST.HTQT.NHATBAN.02/17-19, was approved by the Vietnam Academy of Science and Technology (VAST), led by the Genome Research Institute, chaired by Dr. Kim Thi Phuong Oanh, and implemented from 2017 to 2020. On 26/01/2021, the VAST-level Acceptance Council for International Cooperation Tasks accepted the project and rated it Excellent.

Metagenomics has been used effectively to study pathogenic microorganisms and microbial diversity, and many new viruses that cause disease in marine organisms have been discovered with it. Alongside Next Generation Sequencing (NGS), many analytical methods and bioinformatics tools have been developed for analyzing metagenomes and viral genomes, making such studies more effective. The objective of the international cooperation project was to find suitable bioinformatics methods for analyzing metagenomic data from infected geoduck, and thereby to identify the causative agent of proboscis swelling in geoduck cultured in Vietnam. In addition, cooperation with foreign partners helped improve the bioinformatics skills of the staff participating in the project.

To achieve this goal, the study carried out three main tasks: 1) Preparing a metagenome dataset from geoduck infected with proboscis disease. Illumina sequence data were collected from a collaborative study between the Genome Research Institute and Aquaculture Research Institute I; in addition, the samples were sequenced on a MinION machine (Oxford Nanopore) in collaboration with the Japanese partner to provide additional data for analysis; 2) Identifying suitable bioinformatics methods for analyzing the metagenomic data; 3) Analyzing the metagenomic data from geoduck infected with proboscis disease to determine the microbial composition of the specimens and to work towards identifying the causative agent.

Sequencing and analyzing metagenomes to identify pathogens is a modern research approach in line with current international research trends. The disease causing this mass mortality has only recently appeared and been described in Vietnam, and the causative agent is most likely a new species. Identifying and classifying it is therefore very important.

From the three tasks described above, the study obtained the following main results:

  • Metagenomes of geoduck infected with proboscis disease were sequenced using the Illumina NGS system and the Oxford Nanopore MinION. The sequencing results showed that DNA libraries prepared with viral enrichment methods eliminated a large proportion of host DNA compared with conventional methods. The metagenome sequencing data have been deposited in the DDBJ Sequence Read Archive under accession DRA008913 and in the NCBI Sequence Read Archive under accession SRR10717928.
  • The genome of the geoduck host (Lutraria phylippinarum) was sequenced using the Illumina NGS and MinION systems, then assembled and annotated by bioinformatics methods. The genome size is estimated at 542 Mbp, assembled into 2,405 scaffolds with an N50 value of 0.9 Mbp. A total of 30,533 genes were predicted in the geoduck genome and annotated based on transcriptome data. The quality of the assembly is comparable to that of a genome published around the time the project was completed.
  • Metagenome data from the proboscis of geoduck infected with proboscis disease were analyzed using suitable bioinformatics methods. The microbial composition of the filtrate from infected proboscis tissue is shown in Table 1.

Table 1. Organism species composition in metagenome data
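
For readers unfamiliar with the N50 statistic quoted for the geoduck genome assembly, the following is a minimal Python sketch of how it is commonly computed: scaffolds are sorted from longest to shortest, and N50 is the length of the scaffold at which the running total first reaches half of the assembly size. The function name and the toy lengths are invented for illustration; this is not the project's own code or data.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp)."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest scaffold first
    half_total = sum(ordered) / 2            # half of the assembly size
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running total to at least
        # half of the assembly size defines the N50.
        if running >= half_total:
            return length


# Toy scaffold lengths, invented for illustration only.
toy_scaffolds = [900, 700, 400, 300, 200, 100]
print(n50(toy_scaffolds))  # prints 700
```

An N50 of 0.9 Mbp therefore means that scaffolds of at least 0.9 Mbp together cover at least half of the assembled sequence.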

  • A novel viral genome was discovered in the metagenomic data of geoduck infected with proboscis disease. It is a circular, single-stranded DNA (ssDNA) virus belonging to the phylum Cressdnaviricota (CRESS-DNA viruses). This is the first new virus discovered in geoduck, and phylogenetic analysis shows that it is only distantly related to the CRESS-DNA viruses found in other animals. The genome structure of the new CRESS-DNA virus is shown in Figure 1.

Figure 1: Genome structure of the CRESS-DNA virus newly discovered in geoduck
A. Structure and size of the circular single-stranded DNA genome;
B. Stem-loop located between the 5' ends of the two ORFs, containing a conserved 9-bp motif (CAGTATTAC) that is characteristic of Cressdnaviricota viruses.
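
As an illustration of how a conserved motif such as CAGTATTAC can be located in a circular genome, here is a minimal Python sketch. It assumes the genome is given as a linear string of A/C/G/T that represents a circle, so it also checks matches that span the arbitrary start/end junction, and it searches both strands. The function names and the toy sequences are hypothetical; this is not the pipeline used in the project.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif_circular(genome, motif="CAGTATTAC"):
    """Find a motif on both strands of a circular genome.

    Returns a sorted list of (position, strand) tuples. Positions are
    0-based start coordinates on the given (forward) strand; for "-" hits
    the position is where the motif's reverse complement starts.
    """
    genome = genome.upper()
    motif = motif.upper()
    n, m = len(genome), len(motif)
    if n == 0 or m == 0 or m > n:
        return []
    # Append the first m-1 bases so that matches spanning the
    # start/end junction of the circle are also found.
    extended = genome + genome[:m - 1]
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = extended.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = extended.find(pattern, start + 1)
    return sorted(hits)


# Toy circular sequence (invented): the motif spans the junction,
# starting at position 10 and wrapping around to the beginning.
print(find_motif_circular("TTACGGGGGGCAGTA"))
```

A hit in the intergenic region between the two ORFs would be consistent with the stem-loop shown in panel B, although the stem-loop structure itself would need to be checked separately.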

The project selected suitable methods for analyzing metagenome data from geoduck infected with proboscis disease. Although the causative agent has not yet been identified, these molecular data provide a basis for further research to determine the cause of proboscis swelling in geoduck.

In addition, the project trained one PhD student in bioinformatics for the Genome Research Institute; organized a training course introducing and practicing bioinformatics tools and methods for genome, transcriptome and metagenome analysis (Figure 2); and organized international seminars on “Omics” science in both Vietnam and Japan (Figures 3, 4). The project also produced one article in an SCI-E listed journal, one article in an ESCI listed journal, and one report at a national conference.

