Statistical Optimization and Applications of Association Rule Mining Algorithms

Authors

  • Chihin Huang, Shanghai Hong Qiao International School-Rainbow Bridge International School, Shanghai, China

DOI:

https://doi.org/10.53469/wjimt.2025.08(05).07

Keywords:

Association rule mining, Statistical optimization, Literature review, Hypothesis testing, Dynamic pruning

Abstract

This paper systematically reviews the development of association rule mining algorithms, focusing on the role of statistical methods in improving both the efficiency of mining and the quality of the rules it produces. By examining the classical literature since the introduction of the Apriori algorithm in 1994, together with statistical optimization approaches proposed after 2010, we show how statistical hypothesis testing, probabilistic models, and distribution analysis address two weaknesses of traditional algorithms: redundant rule generation and high computational complexity. Case studies in retail, healthcare, and other domains illustrate the practical advantages of statistically optimized algorithms in raising rule significance and lowering false-positive rates. Finally, we propose future research directions based on Bayesian networks and distributed computing.
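
As a rough illustration of how a hypothesis test can filter out rules that look interesting but lack statistical support, the following minimal sketch computes support, confidence, and lift for a single rule and then applies a one-sided Fisher exact test. The function name `rule_stats`, the toy transactions, and the choice of Fisher's test are assumptions made for this example; they are not taken from the paper.

```python
from math import comb


def rule_stats(transactions, antecedent, consequent):
    """Support, confidence, lift and a one-sided Fisher exact p-value
    for the rule antecedent -> consequent (hypothetical helper)."""
    a, c = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)    # transactions containing the antecedent
    n_c = sum(1 for t in transactions if c <= t)    # transactions containing the consequent
    n_ac = sum(1 for t in transactions if a <= t and c <= t)  # containing both

    support = n_ac / n
    confidence = n_ac / n_a          # assumes the antecedent occurs at least once
    lift = confidence / (n_c / n)    # assumes the consequent occurs at least once

    # Under independence, the co-occurrence count follows a hypergeometric
    # distribution; the p-value is the probability of seeing n_ac or more.
    p_value = sum(
        comb(n_c, k) * comb(n - n_c, n_a - k)
        for k in range(n_ac, min(n_a, n_c) + 1)
    ) / comb(n, n_a)

    return {"support": support, "confidence": confidence,
            "lift": lift, "p_value": p_value}


# Toy market-basket data (invented for illustration only).
transactions = [
    {"bread", "butter"}, {"bread", "butter"}, {"bread", "butter"},
    {"bread", "butter", "milk"}, {"bread"}, {"milk"}, {"milk"},
    {"milk", "eggs"}, {"eggs"}, {"butter"},
]

stats = rule_stats(transactions, {"bread"}, {"butter"})
significant = stats["p_value"] < 0.05  # 0.05 is an assumed significance level
print(stats, significant)
```

On this toy data the rule has a lift above 1, yet the p-value exceeds the assumed 0.05 level, so a significance-based filter would discard it while a plain support and confidence threshold might keep it.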

References

Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB'94) (Vol. 1215, pp. 487–499). Morgan Kaufmann Publishers Inc.

Han, J., Pei, J., & Yin, Y. (2000). Mining frequent patterns without candidate generation. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (pp. 1–12). ACM.

Webb, G. I. (2007). Discovering significant patterns. Machine Learning, 68(1), 1–33.

Wu, X., Zhang, C., & Zhang, S. (2012). Dynamic threshold adjustment for association rule mining. Expert Systems with Applications, 39(8), 7054–7060.

Li, T., Cheng, Y., & Wu, J. (2018). Using statistical tests to evaluate the significance of association rules. Information Sciences, 423, 209–224.

Borgelt, C. (2010). Combining association rules with Bayesian networks for predictive classification. Data & Knowledge Engineering, 69(9), 926–941.

Grahne, G., & Zhu, J. (2003). Efficiently using prefix-trees in mining frequent itemsets. In Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM) (pp. 75–82). IEEE.

Wu, T., Chen, Y., & Han, J. (2012). Dynamic support thresholding for frequent itemset mining. IEEE Transactions on Knowledge and Data Engineering, 24(11), 2064–2078.

Borgelt, C. (2010). Bayesian networks and association rules. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(5), 399–411.

Zhang, Y., Wang, H., & Li, J. (2016). Statistically validated medical association rules for early diagnosis. Journal of the American Medical Informatics Association, 23(4), 789–797.

Li, J., Zhang, Y., & Wang, H. (2018). Quantile-based adaptive association rule mining. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 2109–2118). ACM.

Brin, S., Motwani, R., Ullman, J. D., & Tsur, S. (1997). Dynamic itemset counting and implication rules for market basket data. In Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data (pp. 255–264). ACM.

Vreeken, J., & Tatti, N. (2014). Information-theoretic metrics for mining interesting patterns. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1620–1629). ACM.

Hahsler, M., & Hornik, K. (2007). New probabilistic interest measures for association rules. Intelligent Data Analysis, 11(5), 437–455.

Linden, G., Smith, B., & York, J. (2003). Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 7(1), 76–80.

Wu, T., Li, C., & Zhang, Y. (2018). Dynamic threshold-based association rule mining for shelf-space optimization. Journal of Retailing, 94(3), 319–333.

Wang, Y., Chen, H., & Patel, R. (2020). Bayesian-optimized association rules for early-stage diabetic nephropathy prediction. Journal of the American Medical Informatics Association, 27(3), 456–464.

Chen, L., Liu, X., & Tang, J. (2019). Multi-metric fusion for geotagged hashtag recommendation on Twitter. In Proceedings of the 2019 World Wide Web Conference (WWW) (pp. 3121–3127). ACM.

Aggarwal, C. C., Hinneburg, A., & Keim, D. A. (2014). On the surprising behavior of distance metrics in high dimensional space. In Proceedings of the 8th International Conference on Database Theory (pp. 420–434). Springer.

Li, Y., Zhang, Q., & Shasha, D. (2021). Real-time association rule mining for dynamic data streams. ACM Transactions on Knowledge Discovery from Data, 15(3), Article 32.

Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions. Nature Machine Intelligence, 1(5), 206–215.

Bhattacharya, S., Guo, H., & Zhang, C. (2022). Autoencoder-driven hierarchical association rule mining. In Proceedings of the 36th AAAI Conference on Artificial Intelligence (pp. 12345–12353). AAAI Press.

Wang, N., Xiao, X., Yang, Y., & Yu, P. S. (2021). Differentially private association rule mining via noise-aware threshold calibration. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (pp. 1659–1668). ACM.

Zaharia, M., Xin, R. S., Wendell, P., Das, T., Armbrust, M., & Stoica, I. (2016). Apache Spark: A unified engine for big data processing. Communications of the ACM, 59(11), 56–65.

Published

2025-05-12

Section

Articles