Sales Prediction Based on Textual Features of Online Product Reviews
DOI:
https://doi.org/10.53469/ijomsr.2025.08(09).07
Keywords:
Online product reviews, Sales prediction, Text mining, Latent Dirichlet Allocation (LDA), Ridge regression
Abstract
Accurate sales forecasting is essential for effective enterprise decision-making. Reliable predictions not only optimize inventory management and reduce resource waste but also enhance customer experience. While prior research has primarily focused on improving traditional time series models, relatively little attention has been given to forecasting sales from textual features extracted from online product reviews. This study addresses this gap by incorporating consumer-generated reviews into a sales prediction framework. Using Amazon.com as a case study, we analyze the open-source Amazon food review dataset. Through feature engineering, three categories of review-based variables are constructed: review topics (extracted with Latent Dirichlet Allocation, LDA, using the Gensim library), star ratings, and review usefulness. These features, together with lagged sales, are used as inputs to a ridge regression model. Experimental results show that the proposed model achieves an R² of 0.88 on the training set and 0.78 on the test set, indicating both feasibility and strong predictive accuracy. Compared with traditional time series methods, the review-based approach leverages text mining to capture consumers' genuine perceptions and market responses. This approach enhances forecasting accuracy and offers theoretical as well as practical implications for enterprises, including improved sales planning, more adaptive market strategies, and more efficient warehouse and inventory management.
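As an illustration of the modelling step described in the abstract, the following minimal sketch fits a ridge regression on lagged sales combined with review-derived features and reports R² on a chronological train/test split. It is not the authors' implementation. The data are synthetic, the function names (`make_lagged_dataset`, `fit_ridge`, `r_squared`) are hypothetical, and the choices of two sales lags, previous-period review features, a penalty of `alpha=1.0`, and an 80/20 split are assumptions made only for the example. The topic shares are drawn at random here; in the study they would come from an LDA model.

```python
import numpy as np


def make_lagged_dataset(sales, review_feats, n_lags=2):
    """Pair each period's sales with the previous n_lags sales values and the
    previous period's review features (an assumed alignment)."""
    sales = np.asarray(sales, dtype=float)
    review_feats = np.asarray(review_feats, dtype=float)
    rows, targets = [], []
    for t in range(n_lags, len(sales)):
        lagged = sales[t - n_lags:t]
        rows.append(np.concatenate([lagged, review_feats[t - 1]]))
        targets.append(sales[t])
    return np.array(rows), np.array(targets)


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def r_squared(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-ins for the three review feature groups (all assumed).
rng = np.random.default_rng(0)
n_periods, n_topics = 80, 3
topic_shares = rng.dirichlet(np.ones(n_topics), size=n_periods)  # LDA-like topic mix
avg_stars = rng.uniform(1, 5, size=(n_periods, 1))               # mean star rating
helpful_ratio = rng.uniform(0, 1, size=(n_periods, 1))           # review usefulness
review_feats = np.hstack([topic_shares, avg_stars, helpful_ratio])

# Synthetic sales that depend on last period's sales and star rating.
sales = np.empty(n_periods)
sales[0] = 60.0
for t in range(1, n_periods):
    sales[t] = 20 + 0.5 * sales[t - 1] + 8 * avg_stars[t - 1, 0] + rng.normal(0, 2)

X, y = make_lagged_dataset(sales, review_feats, n_lags=2)

# Chronological split: earlier periods for training, later ones for testing.
split = int(0.8 * len(y))
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

coef, intercept = fit_ridge(X_train, y_train, alpha=1.0)
r2_train = r_squared(y_train, X_train @ coef + intercept)
r2_test = r_squared(y_test, X_test @ coef + intercept)
print(f"train R^2 = {r2_train:.2f}, test R^2 = {r2_test:.2f}")
```

The split is chronological rather than random so that the model is never evaluated on periods that precede its training data, which is the usual precaution when lagged sales are among the inputs.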