ProfessionAI wants to build a library for analyzing incoming emails. The CEO has asked for SPAM emails to be identified so that their content can be analyzed. To this end, the CTO provides you with a dataset and asks you to:
- Train a classifier to identify SPAM;
- Identify the main topics among the SPAM emails present in the dataset;
- Calculate the semantic distance between the topics obtained, to deduce their heterogeneity;
- Extract the organizations mentioned in the NON-SPAM emails.
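
As a rough illustration of the third task, the sketch below computes pairwise cosine distances between topics and averages them into a single heterogeneity score. It assumes each topic is available as a dictionary of word weights (for example, the top terms of a topic model); the brief does not say how topics are represented or which distance to use, so the `cosine_distance` and `topic_heterogeneity` helpers and the sample topics are hypothetical and stand in for whatever representation the real pipeline produces.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two sparse word-weight dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: an empty topic is treated as maximally distant.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def topic_heterogeneity(topics):
    """Mean pairwise cosine distance over all topics.

    With non-negative weights the score lies between 0 (identical topics)
    and 1 (topics that share no terms).
    """
    pairs = list(combinations(topics, 2))
    if not pairs:
        return 0.0
    total = sum(cosine_distance(topics[i], topics[j]) for i, j in pairs)
    return total / len(pairs)


# Hypothetical topics, invented for illustration only (not from the dataset).
topics = {
    "finance": {"loan": 0.5, "credit": 0.3, "offer": 0.2},
    "pharma": {"pills": 0.6, "discount": 0.2, "offer": 0.2},
    "prizes": {"winner": 0.7, "claim": 0.3},
}

for first, second in combinations(topics, 2):
    distance = cosine_distance(topics[first], topics[second])
    print(f"{first} vs {second}: {distance:.3f}")

print(f"heterogeneity: {topic_heterogeneity(topics):.3f}")
```

A score close to 1 would suggest that the SPAM topics share little vocabulary, while a score close to 0 would suggest heavy overlap. Word-overlap distance is only a proxy for semantic distance; comparing topic embeddings instead would also capture synonyms, at the cost of an extra model dependency.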