Multi-Modal Deep Learning for Purchase Decision Modeling: A Comprehensive Framework for E-Commerce Recommendation

Authors

  • Raouya El Youbi, National School of Applied Sciences, Sidi Mohamed Ben Abdellah University, Fez, Morocco
  • Fayçal Messaoudi, National School of Business and Management, Sidi Mohamed Ben Abdellah University, Fez, Morocco
  • Riad Loukili, National School of Applied Sciences, Sidi Mohamed Ben Abdellah University, Fez, Morocco
  • Manal Loukili, National School of Business and Management, Sidi Mohamed Ben Abdellah University, Fez, Morocco
  • Essa Lafi Al Smadi, Ajloun National University, Ajloun 26810, Jordan

DOI:

https://doi.org/10.15849/ijasca.v18i2.81

Keywords:

multi-modal learning, recommender systems, purchase prediction, deep learning, e-commerce, cross-modal attention

Abstract

Understanding customers' purchasing decisions is a fundamental challenge in e-commerce. This work presents a multi-modal deep learning architecture that uses product images, product descriptions, and user behavior history to predict the probability that a user will purchase a product. We encode the three modalities with ResNet-50, BERT, and a bidirectional LSTM, respectively, and propose a cross-modal attention mechanism to integrate the resulting features. Experiments on the Amazon Electronics dataset yield an ROC-AUC of 0.892, which outperforms the best unimodal model by at least 8%. Ablation experiments show that the modalities complement one another, with user behavior history being the most important.
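
The abstract does not state how the cross-modal attention is formulated. As a rough illustration only, the sketch below assumes each modality has already been encoded and projected to a shared dimension, lets every modality attend over all three with scaled dot-product attention, averages the attended vectors, and maps the result to a purchase probability with a logistic head. The function names, the toy embeddings, the averaging step, and the head weights are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_modal_attention(embeddings):
    """Let each modality attend over all modalities (assumed formulation).

    embeddings: dict mapping modality name -> vector; all vectors share
    one dimension (assumed to be the output of a projection layer).
    Returns (attended vectors, attention weights), both keyed by modality.
    """
    names = list(embeddings)
    dim = len(embeddings[names[0]])
    scale = math.sqrt(dim)
    attended, weights = {}, {}
    for q_name in names:
        query = embeddings[q_name]
        # Scaled dot-product score of this query against every modality.
        scores = [dot(query, embeddings[k]) / scale for k in names]
        w = softmax(scores)
        # Weighted sum of the modality vectors (used as values).
        attended[q_name] = [
            sum(w[i] * embeddings[names[i]][j] for i in range(len(names)))
            for j in range(dim)
        ]
        weights[q_name] = dict(zip(names, w))
    return attended, weights


def purchase_probability(attended, head_weights, bias=0.0):
    """Average the attended vectors, then apply a logistic head (hypothetical)."""
    names = list(attended)
    dim = len(head_weights)
    fused = [sum(attended[n][j] for n in names) / len(names) for j in range(dim)]
    logit = dot(fused, head_weights) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy 4-dimensional embeddings standing in for the ResNet-50, BERT,
# and bidirectional LSTM outputs; the values are made up.
embeddings = {
    "image": [0.2, -0.1, 0.4, 0.0],
    "text": [0.1, 0.3, -0.2, 0.5],
    "behavior": [0.6, 0.2, 0.1, -0.3],
}
attended, weights = cross_modal_attention(embeddings)
prob = purchase_probability(attended, head_weights=[0.5, -0.25, 0.75, 0.1])
print(weights["behavior"])
print(prob)
```

In a trained model the projections, attention parameters, and prediction head would be learned jointly; here they are fixed only to keep the example self-contained.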

Published

2026-06-13