Predicting the results of the 2019 Oscars with artificial intelligence!

Another year has gone by, and with it comes the long-awaited Academy Awards, the Oscars. Continuing last year's project of guessing the results of the 2018 Oscars with artificial intelligence, I'm back with the same mission, and I ask:

Given the history of the Oscar categories and their winners, is it possible to predict the next winners? Is there any pattern in the winners? Would an artificial intelligence (AI) be able to discover future winners based on certain relevant characteristics?

Last year's project left a few tasks open for this year's challenge:

  • Code Refactoring
  • Sentiment analysis using Twitter
  • Inclusion of new Oscar categories to be predicted: Cinematography, Animation and Best Music
  • Inclusion of new critical attributes in datasets

On these tasks I am disappointed with myself: I completed only a small refactoring to make sure the code runs and performs better. However, this year I decided to apply some important techniques to improve the accuracy of the models and the final result, among which I can highlight:

  • Standardization of data
  • Data scaling
  • Principal Component Analysis (PCA)
  • Selection of the best features/attributes

Of the strategies listed above, after several empirical tests applying the techniques individually and in combination, only data scaling with scikit-learn's RobustScaler brought good results. Feel free to play with the other techniques in the algorithm: they are commented out and ready to be used. Another technique to test in the future is selecting training and test data with k-Fold, which I did not prioritize this year because the data in our dataset is already well distributed.
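
To illustrate what this scaling step does, here is a minimal sketch on made-up numbers (not the project's dataset, and not necessarily how the project's code is organized). With its default settings, RobustScaler centers each column on its median and divides by the interquartile range, so a single blockbuster box office does not squash every other film toward zero.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Hypothetical toy data: columns are running time (minutes) and box office (dollars).
# The last row is a deliberate box-office outlier.
X = np.array([
    [100.0, 10_000_000.0],
    [110.0, 20_000_000.0],
    [120.0, 30_000_000.0],
    [130.0, 40_000_000.0],
    [140.0, 900_000_000.0],
])

# Defaults: subtract the per-column median, divide by the 25th-75th percentile range.
scaler = RobustScaler()
X_scaled = scaler.fit_transform(X)

print(scaler.center_)  # per-column medians
print(scaler.scale_)   # per-column interquartile ranges
print(X_scaled)
```

The outlier stays large after scaling, but the remaining films keep a sensible spread, which is the property that makes this scaler a reasonable fit for box-office figures.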

The algorithm with the refactorings can be found here:

This year we will continue to try to predict the following categories:

  • Supporting actress
  • Supporting actor
  • Original Screenplay
  • Adapted Screenplay
  • Actress
  • Actor
  • Movie
  • Director

For this we have a dataset of nominees and winners from 1980 to 2018, with the following attributes to help us with the task:

  • Guild Winners
  • Golden Globe Winners
  • Running time in minutes
  • Box office in dollars
  • IMDB Score
  • User score (Metacritic)
  • Critic score (Metacritic)
  • Whether the film was produced in the United States
  • Rating (Metacritic)
  • Movie Release Quarter

Before we get to the interesting part, I would like to highlight a new difficulty this year: the box office of films produced by Netflix, such as Roma and The Ballad of Buster Scruggs. These figures do not exist, since the platform has no box office, and Netflix does not disclose the returns its originals generate. In these cases, the approach chosen was to assign to the Netflix films the average box office of the films in the category in question.
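
The imputation described above can be sketched as follows. This is only an illustration: the records, the field names and the `fill_missing_box_office` helper are hypothetical, and I am assuming the average is taken over the films in the same category that do have a known box office.

```python
from statistics import mean

# Hypothetical nominee records; None marks a film with no reported box office.
nominees = [
    {"film": "Film A", "category": "Director", "box_office": 100_000_000},
    {"film": "Film B", "category": "Director", "box_office": 50_000_000},
    {"film": "Streaming Film", "category": "Director", "box_office": None},
    {"film": "Film C", "category": "Actor", "box_office": 20_000_000},
]


def fill_missing_box_office(rows):
    """Return copies of rows with a missing box office set to the category mean."""
    # Collect the known box-office values per category.
    known = {}
    for row in rows:
        if row["box_office"] is not None:
            known.setdefault(row["category"], []).append(row["box_office"])
    category_mean = {cat: mean(values) for cat, values in known.items()}

    filled = []
    for row in rows:
        new_row = dict(row)
        if new_row["box_office"] is None:
            # Stays None if the category has no known values at all.
            new_row["box_office"] = category_mean.get(new_row["category"])
        filled.append(new_row)
    return filled


filled = fill_missing_box_office(nominees)
print(filled[2])
```

Mean imputation keeps the Netflix films in the dataset without letting box office push them toward or away from a win, at the cost of hiding whatever signal a real figure would have carried.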

And here we go!

Best Supporting Actor: Mahershala Ali in Green Book

Best Supporting Actress: Rachel Weisz in The Favourite

Best Adapted Screenplay: A Star Is Born

In this category I ran into a tie between possible winners. To get around it, I ran some tests with different techniques and algorithms until I arrived at a result without a tie. In the end I removed RobustScaler and applied PCA with the number of components set to 8.
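
For reference, here is a minimal sketch of that PCA step using scikit-learn. The data is synthetic (random numbers standing in for the 10 attributes listed earlier), and the classifier that consumes the reduced features is left out, since which model is used is outside the scope of this sketch.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the real dataset: 50 nominees, 10 attributes.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))

# Project the 10 attributes onto the 8 directions of greatest variance.
pca = PCA(n_components=8)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)
# Fraction of the original variance kept by the 8 components.
print(pca.explained_variance_ratio_.sum())
```

Note that PCA is sensitive to the scale of each attribute, so running it without any scaling lets large-valued columns such as box office dominate the components. That may be part of why the result changes enough to break the tie.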

Best Original Screenplay: The Favourite

Best Director: Alfonso Cuarón for Roma

Best Actor: Rami Malek in Bohemian Rhapsody

Best Actress: Glenn Close in The Wife

Best Picture: Bohemian Rhapsody

Here I had the same problem reported in the Best Adapted Screenplay category, and I used the same strategy: removing RobustScaler and applying PCA with the number of components set to 8.

And that's it!

I confess that I really liked the final results, so much that I will actually bet on them, hahaha. Now we just wait for the real results on February 24, and I hope not to lose money on these bets, haha!

A passionate developer working as a QA Engineer at QuintoAndar, an app developer on weekends and a data scientist in my free time. Founder of the Teste Eneagrama app.