Newspaper Headline Analysis & Prediction Tool

Headlines attract attention and are a deciding factor in whether a reader continues reading an article. We used AI to predict which headlines will perform well.

“Молодий Буковинець” (translation: Young Bukovynian) is a leading outlet among the regional publications. The news site covers local and national news, with an emphasis on local news and events relevant to the citizens of the region. Its readership is about 180K citizens of Bukovyna.

The Project Steps

Analysis of the Theory

Our first step was to analyze the headlines of the news portal and identify trends and factors that contribute to the effectiveness of the headline.

Considering the amount of information available for analysis, we have implemented an AI/ML feature to continuously process the incoming article analytics.

We also used the data about the target audience to narrow the results.

Creating Headline Predicting Software

Once we had our initial data on how headlines can increase reach, we started developing a software solution that would help authors create more effective headlines and predict their success.

Because the project works with the Ukrainian language, it brought an additional challenge: a lot of data is available for English, but none for Ukrainian.

Technologies we used

GCP
Docker
Python
Flask
TensorFlow
Keras
Jupyter
React

Types of Analysis Implemented

We analyzed the dataset of publications using the following methods:

Correlation analysis of views at two points in time: 2 hours and 24 hours after posting the piece.

TF-IDF analysis to determine how important specific words are to a document in the corpus.

LDA topic modeling to gather the relevant tag keywords.

The analysis was performed with Polyglot, a Python-based natural language pipeline.
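To make the TF-IDF step more concrete, here is a minimal sketch in plain Python. It is not the project's actual code: the tokenizer is a naive whitespace splitter, the function names `tokenize` and `tf_idf` are hypothetical, and the weighting is the textbook form (term frequency multiplied by the logarithm of the inverse document frequency). A production pipeline would likely use a proper Ukrainian-aware tokenizer and a library implementation.

```python
import math
from collections import Counter

PUNCTUATION = ".,!?:;\"'«»()"


def tokenize(text):
    # Naive tokenizer (an assumption for illustration): split on whitespace,
    # strip surrounding punctuation, lowercase.
    tokens = (tok.strip(PUNCTUATION).lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def tf_idf(documents):
    """Return one {term: score} dict per document.

    score = (term count / document length) * log(N / document frequency)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores
```

With this weighting, a word that appears in every headline gets a score of zero, while a word that is frequent in one headline and rare elsewhere scores highest, which is what makes TF-IDF useful for spotting distinctive headline words.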

Results

As a result, we constructed a correlation matrix that illustrates how the success rate of a headline depends on these factors. The matrix covers the following factors:

  • Publication time (2 and 24 hours after posting)
  • Entity count (including person, organization, and location count)
  • Title length, polarity, digits, and sentiment score
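A correlation matrix like the one described above can be sketched as follows. This is an illustrative example under stated assumptions, not the project's implementation: it uses the Pearson coefficient (the source does not say which coefficient was used), and the helper names `title_features`, `pearson`, and `correlation_matrix` are hypothetical. Only two of the simplest title features, length and digit count, are extracted here; entity counts and sentiment scores would come from an NLP pipeline.

```python
import math


def title_features(title):
    # Two simple title features from the list above.
    return {
        "length": len(title),
        "digits": sum(ch.isdigit() for ch in title),
    }


def pearson(xs, ys):
    # Pearson correlation coefficient of two equally long numeric sequences.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant column.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    # columns maps a factor name to its list of values, one per article.
    names = list(columns)
    return {
        a: {b: pearson(columns[a], columns[b]) for b in names}
        for a in names
    }
```

In use, each column would hold one value per article, for example title length, digit count, views after 2 hours, and views after 24 hours, and the resulting nested dictionary can be rendered as a heatmap.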

The resulting solution became the foundation for the prototype of the Headline Success Prediction Interface. The prototype runs on a predictive model built on an AutoML prediction model and a custom ML model. Our client was happy with how the product turned out, and for us it was a great opportunity to demonstrate our experience in AI/ML technology.

