Towards a Deep Multimodal Similarity Learning for Text and Image Embeddings Fusion

Turquoise Technology Solutions Inc. (“Turquoise”) is a Google Partner firm based in Montreal that builds recommender systems for books and recipes and is currently interested in improving their accuracy. To that end, the goal is to model customers’ preferences from both visual and textual interactions, by processing product images and the text customers leave as comments, respectively. This proposal concentrates on image feature extraction using a deep convolutional autoencoder architecture, designed with the intention of interfacing with textual data. Using the image embedding vectors output by the network, a search method based on the K-nearest neighbor algorithm is proposed to calculate the similarity between an input image and other product images. Depending on how well the model measures image similarity, the search method can then be applied to a fusion space obtained by concatenating the textual and image embeddings.
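As a rough illustration of the search step described above, the following minimal sketch builds a fusion space by concatenating image and text embeddings and runs a brute-force K-nearest neighbor query over it. It is not the project's implementation: the embeddings are random placeholders standing in for autoencoder and text-model outputs, the function names (`l2_normalize`, `fuse`, `knn_search`) are hypothetical, and the L2 normalization before concatenation and the Euclidean distance are assumptions made for the example.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length (assumed preprocessing, not from the proposal)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def fuse(image_emb, text_emb):
    """Build the fusion space by concatenating per-item image and text embeddings."""
    return np.concatenate([l2_normalize(image_emb), l2_normalize(text_emb)], axis=1)


def knn_search(query, catalog, k=3):
    """Return indices and Euclidean distances of the k catalog rows closest to query."""
    dists = np.linalg.norm(catalog - query, axis=1)
    idx = np.argsort(dists)[:k]
    return idx, dists[idx]


# Placeholder embeddings: 5 products, 8-dim image vectors, 4-dim text vectors.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(5, 8))
text_emb = rng.normal(size=(5, 4))

catalog = fuse(image_emb, text_emb)

# Query with product 2 itself; it should come back as its own nearest neighbor.
query = catalog[2]
neighbors, distances = knn_search(query, catalog, k=3)
print(neighbors, distances)
```

Normalizing each modality before concatenation keeps either one from dominating the distance purely because of scale; whether that weighting suits a given catalogue is a design choice that would need to be validated on real data.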

Faculty Supervisor:

Javad Dargahi

Student:

Partner:

Turquoise Technology Solutions Inc.

Discipline:

Engineering

Sector:

Artificial Intelligence; Technology; Commercial Services

University:

Concordia University

Program: