Identifiers
Title variants
Publication languages
Abstracts
This study shows how deep learning can automate information extraction from fashion data to build a recommendation system. The proposed approach recommends multiple products, compatible with a query item, based on visual and textual features. The object detection model detects and localizes multiple products across different garment categories. Models were trained with deep learning methods on public e-commerce datasets. The compatibility model has shown promising results in automatically recommending compatible products that match user interests. The study experimented with several pre-trained feature extraction models and successfully trained the object detection model for fashion article detection and localization. The overall goal is to deploy the method so that it provides a satisfying shopping experience for e-commerce users.
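The compatibility recommendation described in the abstract reduces, at its core, to ranking catalog items by similarity between embeddings of a query item and candidate items. The sketch below is an illustrative assumption, not the paper's implementation: the embeddings would in practice come from one of the pre-trained visual or textual encoders the paper experiments with (e.g. CLIP or ResNet-50, see the bibliography), but here they are plain NumPy vectors, and the function name `recommend_compatible` is hypothetical.

```python
import numpy as np

def recommend_compatible(query_emb, catalog_embs, top_k=3):
    """Rank catalog items by cosine similarity to a query item's embedding.

    query_emb    : 1-D array, embedding of the query product
    catalog_embs : 2-D array, one row per catalog product
    Returns the indices of the top_k most similar (most "compatible") items.
    """
    # Normalize so that the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    c = catalog_embs / np.linalg.norm(catalog_embs, axis=1, keepdims=True)
    scores = c @ q                       # cosine similarity per catalog item
    return np.argsort(-scores)[:top_k]   # indices sorted by descending score

# Toy example: four catalog items with 3-dimensional embeddings.
catalog = np.array([[1.0, 0.0, 0.0],
                    [0.9, 0.1, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
query = np.array([1.0, 0.05, 0.0])
print(recommend_compatible(query, catalog, top_k=2))  # → [0 1]
```

In a full system, separate visual and textual embeddings could be concatenated or combined before scoring, and the object detector would first crop each garment region from the image so that each detected article gets its own embedding.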
Publisher
Year
Volume
Pages
1--9
Physical description
Bibliography: 18 items, figures, tables
Contributors
author
- Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, ul. Narutowicza 11/12, Gdańsk, Poland
author
- Shopai sp. z o.o., ul. Roździeńskiego 2A, Piekary Śląskie, Poland
Bibliography
- [1] Y.-H. Chang and Y.-Y. Zhang, “Deep learning for clothing style recognition using YOLOv5”, Micromachines, vol. 13, no. 10, p. 1678, 2022.
- [2] Y. Ge, R. Zhang, X. Wang, X. Tang, and P. Luo, “DeepFashion2: A versatile benchmark for detection, pose estimation, segmentation and re-identification of clothing images”, in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 5337–5345, 2019.
- [3] “YOLOv5”, https://github.com/ultralytics/yolov5. Accessed: 2023-07-01.
- [4] M. I. Vasileva, B. A. Plummer, K. Dusad, S. Rajpal, R. Kumar, and D. Forsyth, “Learning type-aware embeddings for fashion compatibility”, in Proceedings of the European conference on computer vision (ECCV), pp. 390–405, 2018.
- [5] X. Han, Z. Wu, Y.-G. Jiang, and L. S. Davis, “Learning fashion compatibility with bidirectional LSTMs”, in Proceedings of the 25th ACM international conference on Multimedia, pp. 1078–1086, 2017.
- [6] A. Veit, S. Belongie, and T. Karaletsos, “Conditional similarity networks”, in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 830–838, 2017.
- [7] Z. Cui, Z. Li, S. Wu, X.-Y. Zhang, and L. Wang, “Dressing as a whole: Outfit compatibility learning based on node-wise graph neural networks”, in The world wide web conference, pp. 307–317, 2019.
- [8] Y. Li, L. Cao, J. Zhu, and J. Luo, “Mining fashion outfit composition using an end-to-end deep learning approach on set data”, IEEE Transactions on Multimedia, vol. 19, no. 8, pp. 1946–1955, 2017.
- [9] A. Ravi, S. Repakula, U. K. Dutta, and M. Parmar, “Buy me that look: An approach for recommending similar fashion products”, in 2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR), pp. 97–103, IEEE, 2021.
- [10] E. Li, E. Kim, A. Zhai, J. Beal, and K. Gu, “Bootstrapping complete the look at pinterest”, in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 3299–3307, 2020.
- [11] S. Zheng, F. Yang, M. H. Kiapour, and R. Piramuthu, “ModaNet: A large-scale street fashion dataset with polygon annotations”, in Proceedings of the 26th ACM international conference on Multimedia, pp. 1670–1678, 2018.
- [12] X. Wang, B. Wu, and Y. Zhong, “Outfit compatibility prediction and diagnosis with multi-layered comparison network”, in Proceedings of the 27th ACM international conference on multimedia, pp. 329–337, 2019.
- [13] “BERT base model”, https://huggingface.co/bert-base-uncased. Accessed: 2023-07-01.
- [14] “CLIP (Contrastive Language-Image Pre-training)”, https://github.com/openai/CLIP. Accessed: 2023-07-01.
- [15] “Resnet-50”, https://huggingface.co/microsoft/resnet-50. Accessed: 2023-07-01.
- [16] “Vision transformer”, https://huggingface.co/google/vit-base-patch32-224-in21k. Accessed: 2023-07-01.
- [17] M. Tan and Q. Le, “EfficientNetV2: Smaller models and faster training”, in International conference on machine learning, pp. 10096–10106, PMLR, 2021.
- [18] I. O. Tolstikhin, N. Houlsby, A. Kolesnikov, L. Beyer, X. Zhai, T. Unterthiner, J. Yung, A. Steiner, D. Keysers, J. Uszkoreit, et al., “MLP-Mixer: An all-MLP architecture for vision”, Advances in neural information processing systems, vol. 34, pp. 24261–24272, 2021.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-2bbb7a6e-4f66-4744-bec0-7a7ba0935d38