News Keyword Trend Analyzer
📰 Real-Time News Keyword Trend Analyzer
Tracking trending keywords in news articles using real-time data
pipelines
“See the world’s breaking news trends — live.”
📌 One-Line Summary
A real-time news keyword trend analyzer that collects live news
headlines, processes them to extract popular keywords, and visualizes
their trends in real time using Kafka, Apache Flink, Elasticsearch,
and Kibana.
1️⃣ How It Works
1. Data Collection: Kafka Producer (`news_producer.py`)
- Fetches top news headlines from NewsAPI.org
- Sends news titles to the Kafka topic `news` every 30 seconds
- Uses environment variables for API key and EC2 host configuration
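The producer loop described above can be sketched roughly as follows. This is a minimal illustration, not the contents of `news_producer.py`: the environment variable names `NEWS_API_KEY` and `EC2_HOST`, the `country` parameter, the `{"title": ...}` message shape, and port 9092 are all assumptions, and the Kafka client is assumed to be `kafka-python`.

```python
import json
import os
import time
import urllib.parse
import urllib.request

NEWS_API_URL = "https://newsapi.org/v2/top-headlines"
TOPIC = "news"
POLL_SECONDS = 30


def build_url(api_key, country="us"):
    """Build the top-headlines request URL (country is an assumed default)."""
    query = urllib.parse.urlencode({"country": country, "apiKey": api_key})
    return f"{NEWS_API_URL}?{query}"


def extract_titles(payload):
    """Pull non-empty titles out of a NewsAPI-style JSON response."""
    articles = payload.get("articles") or []
    titles = [(article.get("title") or "").strip() for article in articles]
    return [title for title in titles if title]


def publish_titles(titles, send):
    """Send each title to the topic via a send(topic, value) callable."""
    for title in titles:
        send(TOPIC, {"title": title})  # message shape is an assumption
    return len(titles)


def run_forever():
    # Imported here so the helpers above work without kafka-python installed.
    from kafka import KafkaProducer

    api_key = os.environ["NEWS_API_KEY"]            # assumed variable name
    host = os.environ.get("EC2_HOST", "localhost")  # assumed variable name
    producer = KafkaProducer(
        bootstrap_servers=f"{host}:9092",
        value_serializer=lambda value: json.dumps(value).encode("utf-8"),
    )
    while True:
        with urllib.request.urlopen(build_url(api_key), timeout=10) as response:
            payload = json.load(response)
        publish_titles(extract_titles(payload), producer.send)
        producer.flush()
        time.sleep(POLL_SECONDS)
```

Calling `run_forever()` starts the polling loop. Passing `send` as a callable keeps the fetch and publish logic testable without a running broker.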
2. Real-Time Processing: Flink Consumer (`keyword_trend_analyzer.py`)
- Consumes news data from Kafka
- Cleans titles (lowercasing, removing special characters)
- Extracts keywords (words with ≥4 letters)
- Counts keyword frequencies in real-time
- Sends processed results to Elasticsearch
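The cleaning and counting steps above can be illustrated in plain Python. This sketch deliberately leaves out the Flink job wiring and only shows the per-title logic. It assumes that "special characters" means anything other than the ASCII letters a to z and whitespace, so digits and accented letters are dropped as well. The function names are hypothetical.

```python
import re
from collections import Counter

MIN_LENGTH = 4  # the post keeps words with at least four letters


def clean_title(title):
    """Lowercase a title and replace every non-letter, non-space character with a space."""
    return re.sub(r"[^a-z\s]", " ", title.lower())


def extract_keywords(title):
    """Return the words of the cleaned title that have at least MIN_LENGTH letters."""
    return [word for word in clean_title(title).split() if len(word) >= MIN_LENGTH]


def update_counts(counts, title):
    """Add the keywords of one title to a running Counter and return it."""
    counts.update(extract_keywords(title))
    return counts


counts = Counter()
for headline in ["Markets Rally as Tech Stocks Surge!", "Tech stocks: rally fades"]:
    update_counts(counts, headline)
# counts["tech"] == 2, counts["rally"] == 2, counts["markets"] == 1
```

In a streaming job the same functions would typically be applied per record, with the running totals kept in the framework's state rather than in a local `Counter`.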
3. Storage & Visualization: Elasticsearch & Kibana
- Stores keyword counts in Elasticsearch index `news_keywords`
- Visualizes trends in Kibana dashboards
  - Keyword frequency charts
  - Trend over time graphs
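To make the storage step concrete, here is one way the keyword counts could be turned into a request body for the Elasticsearch bulk API. The document fields `keyword`, `count`, and `timestamp` are assumptions, since the post does not show the index mapping. Only the index name `news_keywords` comes from the text.

```python
import json
from datetime import datetime, timezone

INDEX = "news_keywords"


def to_documents(counts, now=None):
    """Turn a {keyword: count} mapping into one document per keyword."""
    now = now or datetime.now(timezone.utc)
    stamp = now.isoformat()
    return [
        {"keyword": keyword, "count": count, "timestamp": stamp}  # assumed fields
        for keyword, count in sorted(counts.items())
    ]


def to_bulk_body(documents, index=INDEX):
    """Build a newline-delimited bulk body: an action line, then a source line, per document."""
    if not documents:
        return ""
    lines = []
    for document in documents:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(document))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

The resulting string would be sent as a POST to the `/_bulk` endpoint with the `Content-Type: application/x-ndjson` header. A timestamp on each document is what lets Kibana plot a keyword over time.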
2️⃣ System Architecture
[NewsAPI] → [Kafka Producer] → [Kafka Topic: news] → [Flink Consumer] → [Elasticsearch] → [Kibana Dashboard]
🚀 Quick Start
- Clone & Install: Download the repository and install dependencies
- Start Services: Launch Kafka, Flink, Elasticsearch, and Kibana
- Run Scripts: Start the Kafka producer and Flink consumer
- Visualize: Open Kibana to see real-time keyword trends
📎 Full setup guide: View on GitHub
4️⃣ Usage
- The Kafka producer publishes news titles to the `news` topic, which Flink consumes.
- Flink processes titles → extracts keywords → counts occurrences.
- Elasticsearch indexes keyword trends.
- Kibana displays live keyword frequency and trend graphs.
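The two Kibana views mentioned above correspond roughly to two Elasticsearch aggregations, which can also be run directly against the index. The query bodies below are a sketch under assumptions: the fields `keyword` (mapped as a keyword type), `count`, and `timestamp` are hypothetical, and each document is assumed to hold the count from one processing batch, so counts are summed.

```python
def top_keywords_query(size=10):
    """Search body for the most frequent keywords, ranked by summed count."""
    return {
        "size": 0,  # return aggregation results only, no individual hits
        "aggs": {
            "top_keywords": {
                "terms": {
                    "field": "keyword",
                    "size": size,
                    "order": {"total": "desc"},
                },
                "aggs": {"total": {"sum": {"field": "count"}}},
            }
        },
    }


def keyword_trend_query(keyword, interval="1m"):
    """Search body for one keyword's summed count per time bucket."""
    return {
        "size": 0,
        "query": {"term": {"keyword": keyword}},
        "aggs": {
            "over_time": {
                "date_histogram": {"field": "timestamp", "fixed_interval": interval},
                "aggs": {"total": {"sum": {"field": "count"}}},
            }
        },
    }
```

Either dictionary can be sent as the JSON body of a search request against `news_keywords`. If the stored counts were running totals instead of per-batch counts, a `max` aggregation would be the better fit than `sum`.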
🛠 Technologies Used
| Step | Technology |
| --- | --- |
| Data Source | NewsAPI.org |
| Streaming | Apache Kafka |
| Processing | Apache Flink |
| Storage | Elasticsearch |
| Visualization | Kibana |
| Language | Python |
💡 Key Learnings
- Apache Flink’s stream processing is powerful for real-time analytics.
- Kafka ensures scalable and fault-tolerant data streaming.
- Elasticsearch + Kibana make it easy to explore and visualize trends instantly.
🔗 GitHub Repository
All articles on this blog are licensed under CC BY-NC-SA 4.0 unless otherwise stated.