Research on the Application of Spark in Medical Big Data Analysis under the Background of Blockchain
DOI:
https://doi.org/10.53469/wjimt.2025.08(03).09
Keywords:
Blockchain, Spark, Medical big data, Privacy protection, Parallel computing
Abstract
Medical big data holds significant value for precision medicine, disease prediction, and public health management. However, the sensitivity of the data, its dispersal across institutions, and privacy and security concerns limit its in-depth application. This study proposes a collaborative computing framework based on blockchain and Apache Spark to address three challenges: privacy protection, cross-institutional sharing, and efficient analysis of medical data. The framework combines an access control mechanism based on smart contracts and an anonymization scheme based on zero-knowledge proofs with Spark's distributed in-memory computing to build a secure and trustworthy platform for medical data analysis. Experiments on the MIMIC-III dataset show that the framework processes data 3.5 times as efficiently as a traditional Hadoop architecture while meeting HIPAA privacy standards. The study provides theoretical support and a practical pathway for applying "blockchain + big data" technology in the medical field.
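The flow summarized in the abstract (check access, anonymize, then analyze) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The policy table `POLICY` is a hypothetical stand-in for smart-contract state. A keyed hash (HMAC-SHA256) is swapped in for the zero-knowledge-proof anonymization scheme, which the abstract does not specify. A plain in-memory aggregation stands in for the Spark job. All field names and helper functions are assumptions made for illustration.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical access policy standing in for on-chain smart-contract state:
# maps (institution, purpose) to the set of fields that caller may read.
POLICY = {
    ("hospital_a", "research"): {"patient_id", "diagnosis", "los_days"},
}

# Assumption: a pseudonymization key held by the data custodian (demo value only).
SECRET_KEY = b"demo-key-not-for-production"


def allowed_fields(institution, purpose):
    """Return the fields the caller may read; an empty set means no grant."""
    return POLICY.get((institution, purpose), set())


def pseudonymize(patient_id):
    """Keyed hash used here as a simple stand-in for ZKP-based anonymization."""
    digest = hmac.new(SECRET_KEY, patient_id.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]


def authorized_view(records, institution, purpose):
    """Filter records to permitted fields and replace patient identifiers."""
    fields = allowed_fields(institution, purpose)
    if not fields:
        raise PermissionError(f"no grant for {institution}/{purpose}")
    view = []
    for rec in records:
        row = {k: v for k, v in rec.items() if k in fields}
        if "patient_id" in row:
            row["patient_id"] = pseudonymize(row["patient_id"])
        view.append(row)
    return view


def mean_stay_by_diagnosis(view):
    """Average length of stay per diagnosis (in-memory stand-in for a Spark job)."""
    totals = defaultdict(lambda: [0.0, 0])  # diagnosis -> [sum, count]
    for row in view:
        acc = totals[row["diagnosis"]]
        acc[0] += row["los_days"]
        acc[1] += 1
    return {diag: total / count for diag, (total, count) in totals.items()}


if __name__ == "__main__":
    records = [
        {"patient_id": "p1", "name": "A", "diagnosis": "sepsis", "los_days": 4},
        {"patient_id": "p2", "name": "B", "diagnosis": "sepsis", "los_days": 6},
        {"patient_id": "p3", "name": "C", "diagnosis": "pneumonia", "los_days": 3},
    ]
    view = authorized_view(records, "hospital_a", "research")
    print(mean_stay_by_diagnosis(view))
```

In a Spark deployment, the final aggregation would typically be a DataFrame `groupBy("diagnosis")` followed by an average over the length-of-stay column, with the access check performed before any data is loaded into the job.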