You can access the demo via this link.

In this project, we propose a new way to enhance movie recommendation systems by leveraging recent advances in natural language processing (NLP) and information retrieval. The goal is to provide users with highly tailored movie suggestions based on Persian queries, thus improving their overall movie-watching experience.
We collected movie data from four prominent Persian websites: (DigiMovie, n.d.), (FilmKio, n.d.), (TinyMovie, n.d.), and (Uptv, n.d.). These websites offer a vast repository of movies with Persian-language details, covering various genres, release years, IMDb ratings, and more. The data extraction process involved scraping information such as movie titles, descriptions, genres, release years, ratings, actors' names, and other relevant details.
The proposed system combines information retrieval and text generation through the Retrieval-Augmented Generation (RAG) framework. To capture semantic similarity between user queries and movie records, the system embeds both with language models. Using these embedded representations, it retrieves the movies most relevant to the user query from the dataset. To improve recommendation accuracy, we compare embeddings generated by FastText, ParsBERT, GPT, and the Cohere multilingual model.
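The retrieval step can be illustrated with a minimal sketch. It assumes that the query and each movie have already been embedded into fixed-length vectors and that relevance is scored with cosine similarity; the similarity measure, the function names, and the toy vectors are illustrative assumptions, not details taken from the project.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed scoring function)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vec, movie_vecs, k=3):
    """Return the k (title, score) pairs most similar to the query, best first."""
    scored = [
        (title, cosine_similarity(query_vec, vec))
        for title, vec in movie_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional embeddings; real models produce far larger vectors.
movie_vecs = {
    "movie_a": [0.9, 0.1, 0.0],
    "movie_b": [0.1, 0.9, 0.0],
    "movie_c": [0.7, 0.3, 0.1],
}
query_vec = [1.0, 0.0, 0.0]

top = retrieve_top_k(query_vec, movie_vecs, k=2)
print([title for title, _ in top])
```

In a setup like this, swapping the embedding model (FastText, ParsBERT, GPT, or Cohere) would change only how the vectors are produced, while the ranking step stays the same.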
To evaluate the performance of the proposed system, we benchmarked the models on the following metrics:
The Intersection over Union (IoU) metric measures the overlap between the predicted and ground truth movie recommendations, relative to the size of their union. To build the ground truth recommendations, we used the "similar movies" section on the IMDb website.
We randomly selected 100 movies from the dataset as the IMDb evaluation set.
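As a minimal sketch of the IoU computation, assuming each recommendation list is treated as a set of movie identifiers (the identifiers below are hypothetical placeholders):

```python
def iou(predicted, ground_truth):
    """Intersection over Union of two recommendation lists, treated as sets."""
    pred_set, truth_set = set(predicted), set(ground_truth)
    union = pred_set | truth_set
    if not union:
        return 0.0  # both lists empty: define IoU as 0 to avoid division by zero
    return len(pred_set & truth_set) / len(union)


# Hypothetical identifiers for one query movie.
predicted = ["m1", "m2", "m3", "m4"]
ground_truth = ["m2", "m3", "m5"]

# Two shared movies (m2, m3) out of five distinct movies overall.
print(iou(predicted, ground_truth))
```

Averaging this value over the evaluation set gives a single score per embedding model.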
The Overlap metric measures the number of overlapping movies between the predicted and ground truth recommendations.
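A minimal sketch of the Overlap metric, again assuming recommendations are compared as sets of hypothetical movie identifiers. Unlike IoU, the count is not normalized by the size of the lists. The averaging helper is an illustrative assumption about how per-movie scores might be aggregated.

```python
def overlap(predicted, ground_truth):
    """Number of movies that appear in both recommendation lists."""
    return len(set(predicted) & set(ground_truth))


def mean_overlap(pairs):
    """Average overlap across (predicted, ground_truth) pairs; 0.0 if there are none."""
    if not pairs:
        return 0.0
    return sum(overlap(p, g) for p, g in pairs) / len(pairs)


# Hypothetical results for two query movies.
pairs = [
    (["m1", "m2", "m3"], ["m2", "m3", "m5"]),  # 2 shared movies
    (["m6", "m7"], ["m8", "m9"]),              # 0 shared movies
]
print(mean_overlap(pairs))
```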
The Accuracy by Genre metric evaluates the performance of the models in recommending movies from different genres. We randomly selected 10 genres from the dataset and then calculated the accuracy of the models in recommending movies from these genres. In other words, we calculated the percentage of movies from the selected genre that were recommended by the models.
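One possible reading of this metric can be sketched as follows. It assumes that accuracy means the share of recommended movies that carry the selected genre; the text could also be read as a recall-style measure, so this interpretation, along with the function name and toy data, is an assumption.

```python
def genre_accuracy(recommended, movie_genres, target_genre):
    """Percentage of recommended movies tagged with target_genre (assumed definition)."""
    if not recommended:
        return 0.0
    hits = sum(
        1 for movie in recommended
        if target_genre in movie_genres.get(movie, set())
    )
    return 100.0 * hits / len(recommended)


# Hypothetical genre labels and recommendations for a "drama" query.
movie_genres = {
    "m1": {"drama"},
    "m2": {"comedy", "drama"},
    "m3": {"action"},
    "m4": {"drama"},
}
recommended = ["m1", "m2", "m3", "m4"]

# Three of the four recommended movies are tagged "drama".
print(genre_accuracy(recommended, movie_genres, "drama"))
```

Repeating this for each of the 10 sampled genres and averaging would yield one score per model under this interpretation.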