Abstract
Email is an important means of communication and is widely used in corporations and businesses because of its efficiency, low cost, and asynchronous nature. To automate email management tasks and make email easier to use, researchers in machine learning and data mining have proposed and applied many intelligent techniques. This work presents a survey of the techniques used for recommendation systems and information retrieval in email. It also presents a case study of text classification on email content using pre-trained language models (BERT, ELMo). Three baseline models were used for comparison: Random Forest, Support Vector Machine, and Naive Bayes. The standard classification metrics (Precision, Recall, and F score) were used to assess the performance of the models. The experimental results show that ELMo performed poorly on the binary classification task, with an accuracy of 38%, whereas the lack of sufficient resources (GPU) and the high computational cost of the BERT language model prevented its accuracy from being measured. The baseline models achieved good accuracy, with Random Forest reaching the highest value, 81%. This work does not claim an advance in the state of the art; it applies methods learned in the Master of Data Science to a problem of interest.