Project information

Project title: Approximation algorithms in two bioinformatics problems: DNA motif finding and ARG inference
Project code: ĐLTE00.01/19-20
Host institution: Institute of Information Technology
Project leader: Nguyen Thi Phuong Thao
Project duration: 01/01/2019 - 31/12/2020
Project budget: 500 million VND
Classification: Fair

Goal and objectives of the project

To research and develop approximation algorithms for two bioinformatics problems: (1) DNA motif finding and (2) ancestral recombination graph (ARG) inference.

Main results

- We proposed GAMARG, an algorithm that combines the four-gamete test constraint with the longest-shared-ends strategy of ARG4WG to minimize the number of recombination events when building ARGs. Experiments on different datasets showed that GAMARG outperforms other heuristic algorithms in building ARGs for large datasets. On small datasets, it also performs much better than other heuristic algorithms and is comparable to exhaustive search methods.
- We proposed ACOSite, a new ant colony optimization algorithm for locating DNA motif sites. ACOSite effectively maximizes the information content (IC) function and achieves promising F1 scores compared with state-of-the-art algorithms.
- In addition, we performed statistical analyses of six CYP genes in the Kinh Vietnamese (KHV) population and observed diverging trends in the genetic variation of CYP2B6, CYP2D6, and CYP3A5 compared with six other populations from the 1000 Genomes Project. In terms of phenotypic drug responses in KHV, CYP2C19 exhibited all metabolic phenotypes at non-trivial frequencies, while CYP3A5 metabolized drugs at a lower rate than the other five CYP genes.
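The exact way GAMARG applies the four-gamete constraint is not detailed here. As background, the classic four-gamete test flags two biallelic sites as incompatible when all four gametes 00, 01, 10 and 11 occur, which (under the infinite-sites assumption) implies at least one recombination between them. The sketch below assumes haplotypes encoded as equal-length strings of '0'/'1' with no missing data; it lists incompatible site pairs and computes the Hudson-Kaplan lower bound Rm by greedily counting non-overlapping incompatible intervals. The function names are hypothetical, and this is not GAMARG's code.

```python
from itertools import combinations
from typing import List, Tuple


def is_incompatible(haplotypes: List[str], i: int, j: int) -> bool:
    """Four-gamete test: True if sites i and j show all of 00, 01, 10, 11."""
    gametes = {(h[i], h[j]) for h in haplotypes}
    return len(gametes) == 4


def incompatible_pairs(haplotypes: List[str]) -> List[Tuple[int, int]]:
    """All site pairs (i < j) that fail the four-gamete test."""
    n_sites = len(haplotypes[0])
    return [(i, j) for i, j in combinations(range(n_sites), 2)
            if is_incompatible(haplotypes, i, j)]


def hudson_kaplan_rm(haplotypes: List[str]) -> int:
    """Lower bound on recombination events: the maximum number of
    non-overlapping incompatible intervals, chosen greedily by right end."""
    intervals = sorted(incompatible_pairs(haplotypes), key=lambda p: p[1])
    count, last_end = 0, -1
    for i, j in intervals:
        # Count (i, j) only if it does not overlap the last chosen interval;
        # intervals that merely share an endpoint need distinct breakpoints.
        if i >= last_end:
            count += 1
            last_end = j
    return count
```

For example, the four haplotypes "00", "01", "10", "11" form one incompatible pair, so `hudson_kaplan_rm` returns 1 for them.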

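ACOSite is described as maximizing an IC function with ant colony optimization. The sketch below is a generic ant colony optimization for motif finding, not ACOSite itself: it assumes exactly one site of fixed width per sequence, DNA over A/C/G/T only, a uniform background distribution, no pseudocounts, and an iteration-best pheromone update. All names and parameter values (`n_ants`, `n_iters`, `rho`, `alpha`) are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

# Assumed uniform background; a real tool would estimate this from the data.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def information_content(sites: Sequence[str]) -> float:
    """IC = sum over columns k and bases b of f_bk * log2(f_bk / p_b)."""
    n, width = len(sites), len(sites[0])
    ic = 0.0
    for k in range(width):
        counts = Counter(site[k] for site in sites)
        for base, c in counts.items():
            f = c / n
            ic += f * math.log2(f / BACKGROUND[base])
    return ic


def aco_motif_search(seqs: List[str], width: int, n_ants: int = 20,
                     n_iters: int = 50, rho: float = 0.1, alpha: float = 1.0,
                     seed: int = 0) -> Tuple[List[int], float]:
    """Return the best start positions found (one per sequence) and their IC."""
    rng = random.Random(seed)
    # One pheromone value per candidate start position in each sequence.
    tau = [[1.0] * (len(s) - width + 1) for s in seqs]
    best_starts: List[int] = []
    best_ic = -1.0
    for _ in range(n_iters):
        iter_starts: List[int] = []
        iter_ic = -1.0
        for _ in range(n_ants):
            # Each ant picks a start per sequence with probability ~ tau^alpha.
            starts = [rng.choices(range(len(t)), weights=[x ** alpha for x in t])[0]
                      for t in tau]
            ic = information_content([s[p:p + width] for s, p in zip(seqs, starts)])
            if ic > iter_ic:
                iter_starts, iter_ic = starts, ic
        # Evaporate all trails, then reinforce the iteration-best solution.
        for t in tau:
            for p in range(len(t)):
                t[p] *= 1.0 - rho
        for t, p in zip(tau, iter_starts):
            t[p] += iter_ic
        if iter_ic > best_ic:
            best_starts, best_ic = iter_starts, iter_ic
    return best_starts, best_ic
```

With a fixed seed the search is reproducible. A fuller implementation would likely add pseudocounts and a heuristic term to the selection probabilities; evaluating predicted sites with an F1 score would additionally require known binding-site annotations.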
Novelty, timeliness, and scientific significance of the results

We proposed an ARG inference algorithm that can handle thousands of sequences with tens of thousands of markers and can also reach minimum-recombination ARGs. In addition, this is the first large-scale study to investigate multiple CYP genes in the KHV population for precision medicine from a public health perspective. The differences found in the distributions of metabolizer phenotypes in the KHV population suggest that drugs metabolized by CYP2C19 and CYP3A5 should be prescribed with care.
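Metabolizer distributions depend on how diplotypes are mapped to phenotypes. One standard way to derive expected phenotype frequencies from allele frequencies is to assume Hardy-Weinberg equilibrium; the sketch below does this with a deliberately simplified, hypothetical rule (number of no-function alleles: 0 gives a normal, 1 an intermediate, 2 a poor metabolizer). The allele set, the rule and the function names are illustrative assumptions; the project's actual phenotype assignment may differ, and no KHV estimates are used here.

```python
from itertools import combinations_with_replacement
from typing import Callable, Dict

# Hypothetical, simplified allele classification used only for illustration.
NO_FUNCTION_ALLELES = {"*2", "*3"}


def simple_phenotype(a: str, b: str) -> str:
    """Hypothetical rule: phenotype from the number of no-function alleles."""
    n_no_function = (a in NO_FUNCTION_ALLELES) + (b in NO_FUNCTION_ALLELES)
    return ["NM", "IM", "PM"][n_no_function]


def expected_phenotypes(allele_freqs: Dict[str, float],
                        phenotype_of: Callable[[str, str], str]) -> Dict[str, float]:
    """Expected phenotype frequencies assuming Hardy-Weinberg equilibrium."""
    result: Dict[str, float] = {}
    # Each unordered diplotype {a, b} has frequency p^2 (a == b) or 2pq.
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        p, q = allele_freqs[a], allele_freqs[b]
        freq = p * p if a == b else 2 * p * q
        phenotype = phenotype_of(a, b)
        result[phenotype] = result.get(phenotype, 0.0) + freq
    return result
```

For example, with two alleles "*1" and "*2" at frequency 0.5 each, `expected_phenotypes` returns 0.25 NM, 0.5 IM and 0.25 PM.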

Products of the project

- Scientific papers in refereed journals (list):
1. Nguyen Thi Phuong Thao, Le Sy Vinh, “A Hybrid Approach to Optimize the Number of Recombination in Ancestral Recombination Graphs”, In Proceedings of the 2019 9th International Conference on Bioscience, Biochemistry and Bioinformatics, pp. 36-42, ACM, 2019.
2. Diep TH, Hiep TV, Thao TPN, Nhung HTM, Kien TT, Vinh LS, “Exploring the Kinh Vietnamese genomic database for the polymorphisms of the P450 genes toward precision public health”, Annals of Human Biology, 2022.
- Technological products (describe in detail: technical characteristics, place):
The source code of GAMARG is available for download from:
