AI-driven PR analytics monitoring 50,000+ news sources daily
A global PR agency couldn't manually monitor 50,000+ news sources or hit a sub-hour crisis-response window. We built a Python + transformer pipeline that crawls, classifies, and ranks news in real time, reaching 94% classification accuracy with sub-60-second alerts.
The challenge
A global PR agency monitored brand sentiment across 50,000+ news sources daily for enterprise clients. The intake was a small army of analysts skim-reading feeds, tagging items, and writing executive briefings by hand. Average crisis response time was 4+ hours — long past the window where a brand could meaningfully shape the story.
Worse: the agency had no way to benchmark client coverage against competitors, and the executive briefings written each Friday were already a week stale by the time they hit a CEO’s inbox.
What we built
A real-time intelligence pipeline that ingests, classifies, and ranks news at scale:
- Multi-source crawlers for 50,000+ web, RSS, and licensed-feed sources, deduplicated and language-detected on ingest
- Custom fine-tuned transformer for sentiment, topic, and entity classification — trained on the agency’s own historical PR corpus
- Crisis alerting that fires within 60 seconds of a high-impact mention, ranked by reach × sentiment × velocity
- Competitive benchmarking across client and competitor brands on the same metrics
- Automated executive briefings generated nightly per client account
The classifier reaches 94% accuracy on the agency’s own labelled test set — and gracefully degrades when faced with novel domains.
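To make the reach × sentiment × velocity ranking concrete, here is a minimal sketch. The `Mention` fields, their units, and the sample values are illustrative assumptions, not the production schema or real data; the only thing taken from the description above is that the three factors are multiplied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A classified news item. All field names are illustrative assumptions."""
    headline: str
    reach: float       # assumed: estimated audience of the outlet
    negativity: float  # assumed: classifier output in [0, 1], 1.0 = strongly negative
    velocity: float    # assumed: related mentions per minute over a recent window


def priority(m: Mention) -> float:
    """Multiplicative score: any factor near zero suppresses the item."""
    return m.reach * m.negativity * m.velocity


def rank(mentions: list[Mention]) -> list[Mention]:
    """Return mentions sorted with the highest-priority item first."""
    return sorted(mentions, key=priority, reverse=True)


# Made-up sample items, purely for illustration.
mentions = [
    Mention("Local blog criticises product", reach=2_000, negativity=0.9, velocity=0.1),
    Mention("National daily reports recall", reach=5_000_000, negativity=0.8, velocity=12.0),
    Mention("Trade press praises launch", reach=300_000, negativity=0.05, velocity=3.0),
]
ranked = rank(mentions)
for m in ranked:
    print(f"{priority(m):>14,.0f}  {m.headline}")
```

Because the factors are multiplied rather than summed, a very negative item from a tiny outlet that nobody is picking up stays low in the list, while a moderately negative item that is both widely read and spreading quickly rises to the top.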
Architecture
A Python-first pipeline on AWS designed for both throughput and inference latency:
- Ingest: FastAPI + async crawlers behind AWS Lambda, with SQS for backpressure
- Storage: Elasticsearch for full-text + facet search, S3 for raw archive, PostgreSQL for client config
- ML: HuggingFace transformers fine-tuned on the agency’s corpus, served via a managed endpoint with autoscaling
- Alerting: Real-time stream processor watching ranked outputs, dispatching to Slack, email, and SMS
- Briefings: Nightly batch job that summarises the day’s signal into client-tailored Markdown / PDF
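The ingest stage combines two ideas that are easy to show in miniature: deduplication on ingest and a queue between crawlers and downstream workers. The sketch below is an assumption-heavy stand-in, not the production code: an in-process `asyncio.Queue` with a size limit plays the role of the queue (SQS itself buffers messages and lets consumers pull at their own pace, rather than blocking producers), and an exact hash of normalised text stands in for whatever deduplication logic the real pipeline uses. The function names and sample feed are hypothetical.

```python
import asyncio
import hashlib
import re


def fingerprint(text: str) -> str:
    """Hash of case-folded, whitespace-collapsed text, so trivially
    reformatted copies of the same article produce the same key."""
    normalised = re.sub(r"\s+", " ", text).strip().casefold()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


async def crawler(articles: list[str], queue: asyncio.Queue) -> None:
    """Producer: put() waits while the queue is full, which throttles crawling."""
    for text in articles:
        await queue.put(text)
    await queue.put(None)  # sentinel: no more items


async def consumer(queue: asyncio.Queue, seen: set[str], out: list[str]) -> None:
    """Consumer: keep only the first article for each fingerprint."""
    while True:
        text = await queue.get()
        if text is None:
            break
        key = fingerprint(text)
        if key not in seen:
            seen.add(key)
            out.append(text)


async def ingest(articles: list[str], max_pending: int = 2) -> list[str]:
    """Run one producer and one consumer over a bounded queue."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=max_pending)
    unique: list[str] = []
    await asyncio.gather(crawler(articles, queue), consumer(queue, set(), unique))
    return unique


# Made-up sample feed, purely for illustration.
feed = [
    "Acme recalls 10,000 units",
    "ACME  recalls 10,000   units ",  # same story, different whitespace and case
    "Acme shares fall after recall",
]
unique_articles = asyncio.run(ingest(feed))
print(unique_articles)
```

Exact hashing only catches near-verbatim copies. Syndicated stories that have been lightly rewritten would need a fuzzier technique, which this sketch does not attempt.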
Outcomes
- 94% classification accuracy on the in-house test set
- 70% reduction in manual review effort across the analyst team
- Crisis alerts delivered within 60 seconds of a high-impact mention, against a previous average response time of 4+ hours
- 50,000+ news sources monitored continuously, in 12 languages
- 8-second average end-to-end latency from publish to ranked feed
- Automated nightly executive briefings replacing the manual Friday process
Why it worked
Three calls shaped the outcome:
- Fine-tune over prompt. Off-the-shelf LLM classification wasn’t accurate enough on PR-specific framing. Fine-tuning on the agency’s labelled corpus closed the gap and brought inference cost down by an order of magnitude.
- Rank, don’t just classify. The hard problem isn’t “is this negative?” — it’s “is this the negative one we should care about right now?” Reach × sentiment × velocity surfaces the right item in seconds.
- Briefings are a product, not a script. The executive briefings are versioned, A/B tested for retention, and regenerated on demand. They became a stand-alone client-facing surface.
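The "right now" part of the ranking depends on velocity, which the write-up does not define precisely. One plausible reading, sketched here as an assumption, is mentions per minute over a sliding window. The `VelocityTracker` class, the five-minute window, and the timestamps are all hypothetical.

```python
from collections import deque


class VelocityTracker:
    """Counts mentions of one brand inside a sliding time window (assumed design)."""

    def __init__(self, window_seconds: float = 300.0) -> None:
        self.window_seconds = window_seconds
        self._timestamps: deque[float] = deque()

    def record(self, timestamp: float) -> None:
        """Register a mention; timestamps are assumed to arrive in order."""
        self._timestamps.append(timestamp)

    def per_minute(self, now: float) -> float:
        """Mentions per minute over the window ending at `now`."""
        cutoff = now - self.window_seconds
        # Evict mentions that have fallen out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) * 60.0 / self.window_seconds


# Made-up timestamps in seconds, purely for illustration.
tracker = VelocityTracker(window_seconds=300.0)
for t in (0.0, 10.0, 250.0, 280.0, 290.0):
    tracker.record(t)
rate = tracker.per_minute(now=300.0)
print(f"{rate:.2f} mentions per minute")
```

A short window reacts quickly to a breaking story but is noisy; a longer one is steadier but slower to flag a spike. Which trade-off the production system makes is not stated in the source.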
The pipeline is now central to the agency’s enterprise tier and is the basis for its competitive intelligence subscription product.