Publications | Nguyen Doan Hieu Nguyen

2024

ESWA
SwinTExCo: Exemplar-based video colorization using Swin Transformer

Duong Thanh Tran, Nguyen Doan Hieu Nguyen, Trung Thanh Pham, and 5 more authors

Expert Systems with Applications, 2024

Video colorization represents a compelling domain within the field of Computer Vision. The traditional approach in this field relies on Convolutional Neural Networks (CNNs) to extract features from each video frame and employs a recurrent network to learn information between video frames. While demonstrating considerable success in colorization, most traditional CNNs suffer from a limited receptive field size, capturing local information within a fixed-sized window. Consequently, they struggle to directly grasp long-range dependencies or pixel relationships that span large image or video frame areas. To address this limitation, recent advancements in the field have leveraged the Vision Transformer (ViT) and its variants to enhance performance. This article introduces Swin Transformer Exemplar-based Video Colorization (SwinTExCo), an end-to-end model for the video colorization process that incorporates the Swin Transformer architecture as the backbone. The experimental results demonstrate that our proposed method outperforms many other state-of-the-art methods in both quantitative and qualitative metrics. The achievements of this research have significant implications for the domain of documentary and historical video restoration, contributing to the broader goal of preserving cultural heritage and facilitating a deeper understanding of historical events through enhanced audiovisual materials.
@article{TRAN2024125437,
  bibtex_show = true,
  title = {SwinTExCo: Exemplar-based video colorization using Swin Transformer},
  journal = {Expert Systems with Applications},
  pages = {125437},
  year = {2024},
  publisher = {Elsevier},
  issn = {0957-4174},
  dimensions = {true},
  doi = {https://doi.org/10.1016/j.eswa.2024.125437},
  author = {Tran, Duong Thanh and Nguyen, Nguyen Doan Hieu and Pham, Trung Thanh and Tran, Phuong-Nam and Vu, Thuy-Duong Thi and Nguyen, Cuong Tuan and Dang-Ngoc, Hanh and Dang, Duc Ngoc Minh},
  keywords = {Computer vision, Image colorization, Video colorization, Exemplar-based, Vision transformer, Swin transformer},
}

Mol2Lang-VLM
Mol2Lang-VLM: Vision- and Text-Guided Generative Pre-trained Language Models for Advancing Molecule Captioning through Multimodal Fusion

Duong Thanh Tran^(†), Nhat Truong Pham^(†), Nguyen Doan Hieu Nguyen, and 1 more author

In Proceedings of the 1st Workshop on Language + Molecules (L+M 2024), Aug 2024

This paper introduces Mol2Lang-VLM, an enhanced method for refining generative pre-trained language models for molecule captioning using multimodal features to achieve more accurate caption generation. Our approach leverages the encoder and decoder blocks of the Transformer-based architecture by introducing third sub-layers into both. Specifically, we insert sub-layers in the encoder to fuse features from SELFIES strings and molecular images, while the decoder fuses features from SMILES strings and their corresponding descriptions. Moreover, cross multi-head attention is employed instead of common multi-head attention to enable the decoder to attend to the encoder’s output, thereby integrating the encoded contextual information for better and more accurate caption generation. Performance evaluation on the ChEBI-20 and L+M-24 benchmark datasets demonstrates Mol2Lang-VLM’s superiority, achieving higher accuracy and quality in caption generation compared to existing methods. Our code and pre-processed data are available at https://github.com/nhattruongpham/mol-lang-bridge/tree/mol2lang/.
@inproceedings{tran-etal-2024-mol2lang,
  bibtex_show = true,
  title = {{M}ol2{L}ang-{VLM}: Vision- and Text-Guided Generative Pre-trained Language Models for Advancing Molecule Captioning through Multimodal Fusion},
  author = {Tran(†), Duong Thanh and Pham(†), Nhat Truong and Nguyen, Nguyen Doan Hieu and Manavalan, Balachandran},
  editor = {Edwards, Carl and Wang, Qingyun and Li, Manling and Zhao, Lawrence and Hope, Tom and Ji, Heng},
  booktitle = {Proceedings of the 1st Workshop on Language + Molecules (L+M 2024)},
  month = aug,
  year = {2024},
  address = {Bangkok, Thailand},
  publisher = {Association for Computational Linguistics},
  pages = {97--102},
}

Lang2Mol-Diff
Lang2Mol-Diff: A Diffusion-Based Generative Model for Language-to-Molecule Translation Leveraging SELFIES Representation

Nguyen Doan Hieu Nguyen^(†), Nhat Truong Pham^(†), Duong Thanh Tran, and 1 more author

In Proceedings of the 1st Workshop on Language + Molecules (L+M 2024), Aug 2024

Generating de novo molecules from textual descriptions is challenging due to potential issues with molecule validity in SMILES representation and limitations of autoregressive models. This work introduces Lang2Mol-Diff, a diffusion-based language-to-molecule generative model using the SELFIES representation. Specifically, Lang2Mol-Diff leverages the strengths of two state-of-the-art molecular generative models: BioT5 and TGM-DLM. By employing BioT5 to tokenize the SELFIES representation, Lang2Mol-Diff addresses the validity issues associated with SMILES strings. Additionally, it incorporates a text diffusion mechanism from TGM-DLM to overcome the limitations of autoregressive models in this domain. To the best of our knowledge, this is the first study to leverage the diffusion mechanism for text-based de novo molecule generation using the SELFIES molecular string representation. Performance evaluation on the L+M-24 benchmark dataset shows that Lang2Mol-Diff outperforms all existing methods for molecule generation in terms of validity. Our code and pre-processed data are available at https://github.com/nhattruongpham/mol-lang-bridge/tree/lang2mol/.
@inproceedings{nguyen-etal-2024-lang2mol,
  bibtex_show = true,
  title = {{L}ang2{M}ol-Diff: A Diffusion-Based Generative Model for Language-to-Molecule Translation Leveraging {SELFIES} Representation},
  author = {Nguyen(†), Nguyen Doan Hieu and Pham(†), Nhat Truong and Tran, Duong Thanh and Manavalan, Balachandran},
  editor = {Edwards, Carl and Wang, Qingyun and Li, Manling and Zhao, Lawrence and Hope, Tom and Ji, Heng},
  booktitle = {Proceedings of the 1st Workshop on Language + Molecules (L+M 2024)},
  month = aug,
  year = {2024},
  address = {Bangkok, Thailand},
  publisher = {Association for Computational Linguistics},
  pages = {128--134},
}

2023

ICTC
Vitexco: Exemplar-based Video Colorization using Vision Transformer

Duong Thanh Tran, Nguyen Doan Hieu Nguyen, Trung Thanh Pham, and 3 more authors

In 2023 14th International Conference on Information and Communication Technology Convergence (ICTC), Aug 2023

In the field of image and video colorization, existing research employs a CNN to extract information from each video frame. However, due to the local nature of a kernel, it is challenging for a CNN to capture the relationships between each pixel and others in an image, leading to inaccurate colorization. To solve this issue, we introduce an end-to-end network called Vitexco for colorizing videos. Vitexco utilizes the power of the Vision Transformer (ViT) to capture the relationships among all pixels in a frame, providing a more effective method for colorizing video frames. We evaluate our approach on DAVIS datasets and demonstrate that it outperforms the state-of-the-art methods regarding color accuracy and visual quality. Our findings suggest that using a ViT can significantly enhance the performance of video colorization.
@inproceedings{10393505,
  bibtex_show = true,
  author = {Tran, Duong Thanh and Nguyen, Nguyen Doan Hieu and Pham, Trung Thanh and Tran, Phuong-Nam and Vu, Thuy-Duong Thi and Dang, Duc Ngoc Minh},
  title = {Vitexco: Exemplar-based Video Colorization using Vision Transformer},
  booktitle = {2023 14th International Conference on Information and Communication Technology Convergence (ICTC)},
  publisher = {{IEEE}},
  year = {2023},
  pages = {59-64},
  keywords = {Measurement;Visualization;Image color analysis;Transformers;Information and communication technology;Data mining;Kernel;image colorization;video colorization;exemplar-based;vision transformer},
  doi = {10.1109/ICTC58733.2023.10393505},
  dimensions = {true},
}