📰 Real-Time News Keyword Trend Analyzer

Tracking trending keywords in news articles using real-time data
pipelines

“See the world’s breaking news trends — live.”



📌 One-Line Summary

A real-time news keyword trend analyzer that collects live news
headlines, processes them to extract popular keywords, and visualizes
their trends in real time using Kafka, Apache Flink, Elasticsearch,
and Kibana.


1️⃣ How It Works

1. Data Collection — Kafka Producer (news_producer.py)

  • Fetches top news headlines from NewsAPI.org
  • Sends news titles to the Kafka topic news every 30 seconds
  • Uses environment variables for API key and EC2 host configuration
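
The producer steps above could be sketched roughly as follows. This is a minimal illustration, not the contents of news_producer.py: the environment variable names NEWS_API_KEY and EC2_HOST, the country=us query parameter, the broker port 9092, and all helper names are assumptions. The Kafka client is passed in as a plain send(topic, value) callable so the sketch runs without a broker.

```python
import json
import os
import time
import urllib.parse
import urllib.request

NEWS_API_URL = "https://newsapi.org/v2/top-headlines"
TOPIC = "news"
POLL_SECONDS = 30


def load_config(env=os.environ):
    """Read the API key and broker address (variable names are hypothetical)."""
    host = env.get("EC2_HOST", "localhost")
    return env["NEWS_API_KEY"], f"{host}:9092"


def extract_titles(payload):
    """Return the non-empty article titles from a NewsAPI-style response dict."""
    articles = payload.get("articles") or []
    return [a["title"].strip() for a in articles if a.get("title")]


def fetch_headlines(api_key, country="us"):
    """Request the current top headlines and return the decoded JSON body."""
    query = urllib.parse.urlencode({"country": country, "apiKey": api_key})
    with urllib.request.urlopen(f"{NEWS_API_URL}?{query}", timeout=10) as resp:
        return json.load(resp)


def publish_titles(titles, send, topic=TOPIC):
    """Send each title as a small JSON message; return how many were sent."""
    for title in titles:
        send(topic, json.dumps({"title": title}).encode("utf-8"))
    return len(titles)


def run(send, api_key, poll_seconds=POLL_SECONDS):
    """Poll for headlines and publish them, pausing between rounds."""
    while True:
        publish_titles(extract_titles(fetch_headlines(api_key)), send)
        time.sleep(poll_seconds)
```

With the kafka-python package, for example, the send method of a KafkaProducer created with bootstrap_servers set to the broker address takes a compatible (topic, value) pair and could be passed to run as send.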

2. Data Processing — Flink Consumer

  • Consumes news data from Kafka
  • Cleans titles (lowercasing, removing special characters)
  • Extracts keywords (words with ≥4 letters)
  • Counts keyword frequencies in real-time
  • Sends processed results to Elasticsearch
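
The cleaning, extraction, and counting steps can be illustrated in plain Python. This is a sketch of the logic only, not the Flink job itself: the function names are hypothetical, and treating every character other than a-z and whitespace as a "special character" is an assumption (it also drops digits and accented letters).

```python
import re
from collections import Counter

MIN_KEYWORD_LENGTH = 4
# Assumption: anything that is not a lowercase ASCII letter or whitespace
# counts as a special character and is replaced by a space.
_NON_LETTERS = re.compile(r"[^a-z\s]")


def clean_title(title):
    """Lowercase the title and blank out special characters."""
    return _NON_LETTERS.sub(" ", title.lower())


def extract_keywords(title, min_length=MIN_KEYWORD_LENGTH):
    """Return the words of the cleaned title that have at least min_length letters."""
    return [w for w in clean_title(title).split() if len(w) >= min_length]


def update_counts(counts, title):
    """Add one title's keywords to a running Counter and return it."""
    counts.update(extract_keywords(title))
    return counts


counts = Counter()
for headline in ["Fed raises rates: Markets REACT!", "Markets slide as rates climb"]:
    update_counts(counts, headline)
print(counts.most_common(2))
```

A running Counter matches the "counts keyword frequencies" step in its simplest form; a streaming job would typically keep such counts per time window rather than forever.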

3. Storage & Visualization — Elasticsearch & Kibana

  • Stores keyword counts in Elasticsearch index news_keywords
  • Visualizes trends in Kibana dashboards
    • Keyword frequency charts
    • Trend over time graphs
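
One way to shape keyword counts for the news_keywords index is sketched below. The field names keyword, count, and @timestamp are assumptions, not the project's actual mapping; a timestamp field of some kind is what lets Kibana plot trends over time.

```python
from datetime import datetime, timezone

INDEX_NAME = "news_keywords"


def to_bulk_actions(counts, observed_at=None):
    """Turn a {keyword: count} mapping into one indexing action per keyword.

    The document fields used here are illustrative assumptions.
    """
    observed_at = observed_at or datetime.now(timezone.utc)
    stamp = observed_at.isoformat()
    return [
        {
            "_index": INDEX_NAME,
            "_source": {"keyword": keyword, "count": count, "@timestamp": stamp},
        }
        for keyword, count in sorted(counts.items())
    ]
```

Dictionaries in this form are the kind of actions that the bulk helper in the official Elasticsearch Python client accepts, so the list could be handed to it once a client is connected.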

2️⃣ System Architecture

[NewsAPI] → [Kafka Producer] → [Kafka Topic: news] → [Flink Consumer] → [Elasticsearch] → [Kibana Dashboard]



🚀 Quick Start

  1. Clone & Install — Download the repository and install dependencies
  2. Start Services — Launch Kafka, Flink, Elasticsearch, and Kibana
  3. Run Scripts — Start the Kafka producer and Flink consumer
  4. Visualize — Open Kibana to see real-time keyword trends
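
Before running the scripts in step 3, it can help to confirm that the services from step 2 are accepting connections. The sketch below assumes each service listens on its usual default port on localhost; adjust the host and ports to match the actual deployment. The function names are hypothetical.

```python
import socket

# Assumed default ports; a real deployment may use different ones.
DEFAULT_PORTS = {
    "Kafka": 9092,
    "Flink web UI": 8081,
    "Elasticsearch": 9200,
    "Kibana": 5601,
}


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host="localhost", ports=DEFAULT_PORTS):
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in ports.items()}
```

An open port only shows that something is listening there, not that the service is healthy, so this is a quick sanity check rather than a full health probe.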

📎 Full setup guide: View on GitHub


4️⃣ Usage

  • The Kafka producer streams news titles to the Kafka topic news, which the Flink consumer reads.
  • Flink processes titles → extracts keywords → counts occurrences.
  • Elasticsearch indexes keyword trends.
  • Kibana displays live keyword frequency and trend graphs.

🛠 Technologies Used

Step            Technology
--------------  --------------
Data Source     NewsAPI.org
Streaming       Apache Kafka
Processing      Apache Flink
Storage         Elasticsearch
Visualization   Kibana
Language        Python


💡 Key Learnings

  • Apache Flink’s stream processing is powerful for real-time
    analytics.
  • Kafka ensures scalable and fault-tolerant data streaming.
  • Elasticsearch + Kibana make it easy to explore and visualize trends
    instantly.

🔗 GitHub Repository

📂 View Project on GitHub