Deep Representation Learning Enabling Cross-Modality Person Re-identification: Explorations and Perspectives

Authors

Li Fan, School of Artificial Intelligence, Neijiang Normal University, Neijiang 641100, Sichuan, China

DOI:

https://doi.org/10.53469/wjimt.2025.08(04).22

Keywords:

Deep Representation Learning, Cross-Modality Person Re-identification, Challenge

Abstract

This paper examines cross-modality person re-identification enabled by deep representation learning. Deep representation learning automatically extracts high-level features, while cross-modality person re-identification addresses the problem of matching pedestrian features across data from different modalities. Integrating the two is of great significance. The paper reviews the foundations of deep representation learning, the task pipeline of cross-modality person re-identification, the challenges it faces, and its application fields. It also introduces how deep representation learning is applied in this setting and analyzes existing problems such as high model complexity and weak generalization ability. Finally, it looks ahead to future development trends, including data augmentation with Generative Adversarial Networks and domain adaptation through transfer learning, which are expected to promote the industrial deployment of this technology and the construction of its ecosystem.

Published

2025-04-28

Issue

Vol. 8 No. 4 (2025)

Section

Articles