Nguyen is a second-year Combined Master’s & Ph.D. student in Integrative Biotechnology at Sungkyunkwan University. He graduated as Valedictorian in Artificial Intelligence from FPT University, Vietnam. His interests center on deep learning approaches for biological data, with a primary focus on computational methods, and his strengths include research design, coding, and proposing novel ideas. His long-term goal is to develop innovative deep learning frameworks that contribute meaningfully to computational biology.
Research interests: Computational biology, Bioinformatics, Deep Learning.
“It’s fine to celebrate success, but it is more important to heed the lessons of failure.” — Bill Gates
“Don’t compare yourself with anyone in this world. If you do so, you are insulting yourself.” — Bill Gates
Drinking water is an essential resource for human health and well-being, yet it faces increasing threats from contamination by chemical pollutants. Among these contaminants, persistent, mobile, and toxic (PMT) substances and very persistent and very mobile (vPvM) substances have emerged as chemicals of significant concern due to their harmful effects on human health. Regulatory bodies have recognized them as emerging contaminants requiring stricter monitoring and management practices. Traditional experimental methods for detecting and characterizing these substances are often slow and resource-intensive. Therefore, there is a pressing need for efficient computational approaches that detect PMT/vPvM substances rapidly and economically. To address this gap, we propose Mulaqua, the first deep learning (DL) approach specifically designed for identifying PMT/vPvM substances. Mulaqua uses a novel multimodal approach that combines molecular string representations with molecular images for the final prediction. To address the data imbalance in the training dataset, we employ a data augmentation strategy based on Simplified Molecular Input Line Entry System (SMILES) enumeration, which helped achieve a balanced performance, with a training accuracy (ACC), F1-score (F1), and Matthews correlation coefficient (MCC) of 0.920, 0.590, and 0.548, respectively. Our study also includes interpretability analyses to elucidate how specific molecular architectures influence the characterization of PMT/vPvM substances, thereby providing meaningful insights. Mulaqua demonstrates excellent transferability: in rigorous evaluation on external datasets, it significantly improves performance compared to the baseline.
Unlike previous methods, Mulaqua is publicly available at https://github.com/cbbl-skku-org/Mulaqua/ and holds significant potential as a proactive tool for early hazard identification and regulatory prioritization of PMT/vPvM substances in environmental risk management.
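The abstract reports ACC, F1, and MCC together because accuracy alone can look strong on imbalanced data. The sketch below shows how the three metrics are computed from a binary confusion matrix. The function name `binary_metrics` and all counts are hypothetical and chosen only for illustration; they are not taken from the Mulaqua study or its code.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, F1, MCC) for a binary confusion matrix.

    Hypothetical helper for illustration; not part of the Mulaqua code base.
    """
    total = tp + fp + tn + fn
    acc = (tp + tn) / total

    # Precision and recall fall back to 0.0 when their denominators are zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )

    # MCC uses all four cells, so it stays informative under class imbalance.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, f1, mcc


# Invented, imbalanced example: 100 positives and 900 negatives.
acc, f1, mcc = binary_metrics(tp=60, fp=40, tn=860, fn=40)
print(f"model:        ACC={acc:.3f} F1={f1:.3f} MCC={mcc:.3f}")

# A classifier that always predicts the majority (negative) class.
acc0, f10, mcc0 = binary_metrics(tp=0, fp=0, tn=900, fn=100)
print(f"all-negative: ACC={acc0:.3f} F1={f10:.3f} MCC={mcc0:.3f}")
```

In this invented example, the always-negative classifier still reaches high accuracy while its F1 and MCC collapse to zero, which is why F1 and MCC are the more telling numbers when positives are rare.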
@article{NGUYEN2025140573,bibtex_show=true,dimensions={true},title={Mulaqua: An interpretable multimodal deep learning framework for identifying PMT/vPvM substances in drinking water},journal={Journal of Hazardous Materials},volume={500},pages={140573},year={2025},issn={0304-3894},doi={10.1016/j.jhazmat.2025.140573},url={https://www.sciencedirect.com/science/article/pii/S0304389425034934},author={Nguyen, Nguyen Doan Hieu and Pham, Nhat Truong and Seo, Hojin and Wei, Leyi and Manavalan, Balachandran},keywords={PMT/vPvM substances, Water quality, Data augmentation, Multimodal deep learning, Model interpretation}}
xBitterT5
xBitterT5: an explainable transformer-based framework with multimodal inputs for identifying bitter-taste peptides
Nguyen Doan Hieu Nguyen, Nhat Truong Pham, Duong Thanh Tran, Leyi Wei, Adeel Malik, and Balachandran Manavalan
Bitter peptides (BPs), derived from the hydrolysis of proteins in food, play a crucial role in both food science and biomedicine by influencing taste perception and participating in various physiological processes. Accurate identification of BPs is essential for understanding food quality and potential health impacts. Traditional machine learning approaches for BP identification have relied on conventional feature descriptors, achieving moderate success but struggling with the complexities of biological sequence data. Recent advances using protein language model embeddings and meta-learning approaches have improved accuracy, but they frequently neglect the molecular representations of peptides and lack interpretability. In this study, we propose xBitterT5, a novel multimodal and interpretable framework for BP identification that integrates pretrained transformer-based embeddings from BioT5+ with a combination of the peptide sequence and its SELFIES molecular representation. Specifically, by incorporating both peptide sequences and their molecular strings, xBitterT5 demonstrates superior performance compared to previous methods on the same benchmark datasets. Importantly, the model provides residue-level interpretability, highlighting chemically meaningful substructures that contribute significantly to a peptide's bitterness, thus offering mechanistic insights beyond black-box predictions. A user-friendly web server (https://balalab-skku.org/xBitterT5/) and a standalone version (https://github.com/cbbl-skku-org/xBitterT5/) are freely available to support both computational biologists and experimental researchers in peptide-based food and biomedicine.
@article{Nguyen2025,bibtex_show=true,dimensions={true},author={Nguyen, Nguyen Doan Hieu and Pham, Nhat Truong and Tran, Duong Thanh and Wei, Leyi and Malik, Adeel and Manavalan, Balachandran},title={xBitterT5: an explainable transformer-based framework with multimodal inputs for identifying bitter-taste peptides},journal={Journal of Cheminformatics},year={2025},month=aug,day={20},volume={17},number={1},pages={127},issn={1758-2946},doi={10.1186/s13321-025-01078-1},url={https://doi.org/10.1186/s13321-025-01078-1}}
Feel free to contact me through any of the communication channels listed above.