Multi-Modal Deep Learning for Purchase Decision Modeling: A Comprehensive Framework for E-Commerce Recommendation
DOI:
https://doi.org/10.15849/ijasca.v18i2.81
Keywords:
multi-modal learning, recommender systems, purchase prediction, deep learning, e-commerce, cross-modal attention
Abstract
Understanding customers' purchasing decisions is a fundamental challenge in e-commerce. This work presents a multi-modal deep learning architecture that uses product images, product descriptions, and user behavior history to predict the probability that a user will purchase a product. We encode the three modalities with ResNet-50, BERT, and a bidirectional LSTM, respectively, and propose a cross-modal attention mechanism to fuse the resulting features. Experiments on the Amazon Electronics dataset yield an ROC-AUC of 0.892, outperforming the best unimodal model by at least 8%. Ablation experiments show that the modalities are complementary, with user behavior history contributing the most.
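The abstract does not spell out the cross-modal attention mechanism, so the following is only an illustrative sketch of one common way such a fusion could work, not the authors' implementation. It assumes each modality has already been encoded into a vector of the same size (random stand-ins replace the ResNet-50, BERT, and bidirectional LSTM outputs), uses plain scaled dot-product attention without learned projections, and ends in a hypothetical logistic output layer. The names `cross_modal_attention` and `purchase_probability` are made up for this example.

```python
import math
import random


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(embeddings):
    """Let each modality attend over all modalities, then mean-pool.

    embeddings: dict mapping modality name -> vector (all of length d).
    Returns (fused_vector, attention_weights). This is an assumed
    simplification: queries, keys, and values are the raw embeddings.
    """
    names = list(embeddings)
    d = len(embeddings[names[0]])
    scale = math.sqrt(d)
    weights = {}
    attended = []
    for query in names:
        scores = [dot(embeddings[query], embeddings[key]) / scale for key in names]
        w = softmax(scores)
        weights[query] = dict(zip(names, w))
        # Context vector: attention-weighted sum of all modality embeddings.
        context = [
            sum(w[j] * embeddings[names[j]][i] for j in range(len(names)))
            for i in range(d)
        ]
        attended.append(context)
    # Mean-pool the per-modality context vectors into one fused vector.
    fused = [sum(vec[i] for vec in attended) / len(attended) for i in range(d)]
    return fused, weights


def purchase_probability(fused, w, b):
    """Hypothetical logistic output layer on top of the fused vector."""
    z = dot(fused, w) + b
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)
d = 8  # assumed shared embedding size after projection
embeddings = {
    "image": [rng.gauss(0, 1) for _ in range(d)],     # stand-in for ResNet-50 features
    "text": [rng.gauss(0, 1) for _ in range(d)],      # stand-in for BERT features
    "behavior": [rng.gauss(0, 1) for _ in range(d)],  # stand-in for BiLSTM features
}
w_out = [rng.gauss(0, 0.1) for _ in range(d)]  # untrained, illustrative weights

fused, weights = cross_modal_attention(embeddings)
p = purchase_probability(fused, w_out, 0.0)
print(f"purchase probability: {p:.3f}")
```

In a trained model the projections and the output layer would be learned end to end, and the attention weights could be inspected to see how much each modality draws on the others.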