Research Agent: Case Study

We built our own market intelligence agent before we sold it to anyone else.

6 RSS feeds. 15 subreddits. Scheduled discovery runs all day. Structured synthesis runs deliver only what is worth acting on. Here is the architecture and current cadence.

Note: The sources and tools shown below are what we use for our own intelligence system. Your build monitors whatever matters to your business (competitors, regulators, industry publications, customer channels) using the feeds and APIs that fit your stack.

The Architecture

Two-stage pipeline. Discovery then synthesis.

Discovery jobs collect and stage raw signal. Synthesis jobs score, summarize, and deliver only high-value output.

Stage 1: Discovery
RSS + Reddit + Web + GitHub, staged as raw data

Stage 2: Synthesis
Scoring + summaries + HIGH-only delivery

Discovery output is treated as untrusted input. Synthesis validates, scores, and only delivers actionable items.
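The two-stage contract can be sketched in a few lines. This is a minimal illustration, not our production code; the staging directory, fetcher shape, and threshold are all assumptions.

```python
import json
import time
from pathlib import Path

STAGING_DIR = Path("staging")  # hypothetical staging location

def discover(fetchers):
    """Stage 1: collect raw signal and persist it. No scoring, no delivery."""
    STAGING_DIR.mkdir(exist_ok=True)
    for name, fetch in fetchers.items():
        records = [{"source": name, "fetched_at": time.time(), "raw": item}
                   for item in fetch()]
        (STAGING_DIR / f"{name}.json").write_text(json.dumps(records))

def synthesize(score, deliver, threshold=0.8):
    """Stage 2: read staged records as untrusted input, score, deliver HIGH only."""
    for path in STAGING_DIR.glob("*.json"):
        for record in json.loads(path.read_text()):
            s = score(record["raw"])   # relevance scoring on untrusted text
            if s >= threshold:         # HIGH-only delivery gate
                deliver(record, s)
```

Because discovery writes records to disk before synthesis ever reads them, either stage can be re-run independently.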

The Numbers

What it monitors and how often.

6 RSS feeds monitored daily: TechCrunch AI, The Verge AI, MIT Tech Review, Wired AI, LangChain Blog, Ars Technica

15 subreddits scanned 3x per day for buying signals, competitor mentions, and emerging pain points

7:50 AM morning AI briefing, Mon to Thu, Sat, and Sun

48h freshness cutoff: anything older than 48 hours is filtered out before synthesis
How It Works

Current production schedule.

Daily operations

Two-stage execution model

Discovery stage · Synthesis stage · Deduplication memory

Every run follows the same contract. Stage 1 discovery gathers source material and writes staged records. Stage 2 synthesis reads staged records as untrusted input, scores relevance, and creates operator-ready output.

This structure keeps collection and interpretation separate. It also makes retries and audits straightforward because staged artifacts are persisted before synthesis runs.
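Deduplication memory can be as simple as a persisted set of content fingerprints consulted at staging time, so the same item never reaches synthesis twice across runs. A sketch; the file name and hashing choice are assumptions:

```python
import hashlib
import json
from pathlib import Path

SEEN_FILE = Path("seen_hashes.json")  # hypothetical dedup memory store

def _fingerprint(item: str) -> str:
    """Stable content hash used as the dedup key."""
    return hashlib.sha256(item.encode("utf-8")).hexdigest()

def dedupe(items):
    """Return only items not seen in any previous run, then update the memory."""
    seen = set(json.loads(SEEN_FILE.read_text())) if SEEN_FILE.exists() else set()
    new = [i for i in items if _fingerprint(i) not in seen]
    seen.update(_fingerprint(i) for i in new)
    SEEN_FILE.write_text(json.dumps(sorted(seen)))
    return new
```

Because the memory is persisted alongside the staged artifacts, retries of a synthesis run see the same dedup state the original run did.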

Reddit cadence, every day

Reddit discovery and synthesis

Discovery 7:30 AM + 3:30 PM · Synthesis 8:00 AM + 4:00 PM · HIGH-only delivery

Reddit Discovery runs at 7:30 AM and 3:30 PM. It stages candidate posts and updates an alerts log silently. No direct user delivery happens in this step.

Reddit Synthesis runs at 8:00 AM and 4:00 PM. It processes staged candidates, applies the scoring model, and delivers only HIGH-priority results.
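The HIGH-only gate in synthesis looks roughly like this. The keyword scorer below is a toy stand-in for the real scoring model, and the function names are illustrative:

```python
def priority(post: dict) -> str:
    """Toy stand-in for the scoring model: map a staged post to a priority label."""
    signals = sum(kw in post["text"].lower()
                  for kw in ("looking for", "recommend", "alternative to"))
    return "HIGH" if signals >= 2 else "MEDIUM" if signals == 1 else "LOW"

def synthesis_run(staged, deliver):
    """Score every staged candidate; only HIGH results reach the operator."""
    for post in staged:
        if priority(post) == "HIGH":
            deliver(post)
```

MEDIUM and LOW items are still scored and logged; they simply never generate a delivery.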

Briefings and weekly intelligence jobs

Scheduled outputs

Morning AI Briefing · Partner and competitor jobs · Friday deep dive

Morning AI Briefing runs at 7:50 AM on Monday through Thursday, plus Saturday and Sunday.

Monday Partner Discovery runs at 8:00 AM. Monday Competitor Intel runs at 10:00 AM. Friday Deep Dive runs at 10:00 AM.

The daily briefing source set still includes 6 RSS feeds. These are combined with web and repository signal before synthesis scoring.

Injection defense and data hygiene

Untrusted input handling

UNTRUSTED EXTERNAL DATA headers · Stage boundary controls · Skip-and-continue policy

Discovery records include explicit UNTRUSTED EXTERNAL DATA headers. This marks source text as third-party content and blocks instruction carryover.

During synthesis, staged discovery content is always treated as untrusted. If a record appears suspicious, malformed, or prompt-injection-like, the run skips that record and continues without blocking the rest of the batch.
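Both controls can be sketched together: wrap third-party text in an explicit untrusted-data header, and skip suspicious records rather than aborting the batch. The marker strings and the suspicion heuristic here are assumptions, not our production filters:

```python
SUSPICIOUS_MARKERS = ("ignore previous instructions", "system prompt", "you are now")

def wrap_untrusted(text: str) -> str:
    """Mark third-party content so downstream prompts treat it as data, not instructions."""
    return f"UNTRUSTED EXTERNAL DATA START\n{text}\nUNTRUSTED EXTERNAL DATA END"

def process_batch(records, synthesize):
    """Skip-and-continue: a bad record is recorded and dropped, never blocks the batch."""
    results, skipped = [], []
    for rec in records:
        if any(m in rec["raw"].lower() for m in SUSPICIOUS_MARKERS):
            skipped.append(rec["id"])
            continue
        results.append(synthesize(wrap_untrusted(rec["raw"])))
    return results, skipped
```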

What It Delivers

A briefing, not a firehose.

The goal is not to surface everything. It is to surface the handful of things that actually matter today, with a clear note on whether each one requires action now, watching, or filing away.

We read this every morning. It takes about two minutes. When something is red-flagged, we act on it the same day. When something is green, we file it and move on. There is no inbox to manage, no RSS reader to open, no dashboard to check.

The same design principle applies regardless of the domain: competitor monitoring, regulatory tracking, talent market intelligence. The output should be actionable in under five minutes.

Morning Briefing: 7:52 AM today
Anthropic releases Claude 4 with extended context

Direct upgrade path for our agent stack. Review model pricing before Monday client calls.

LangChain adds native HubSpot integration

Worth watching. May simplify our CRM layer for future builds. Not urgent.

GitHub trending: browser-use (14k stars this week)

Agent browser automation. Useful for form-filling workflows. File for later.

McKinsey: 67% of enterprises plan AI ops spend in 2026

Useful reference for outreach. Add to pitch context.

Four items. Two minutes. No inbox required.

The Point

The sources change. The architecture stays the same.

We monitor the AI landscape because that is the domain that matters to us. The same pipeline (ingest, filter, synthesize, deliver, deduplicate) works for any intelligence problem.

Competitor Monitoring

Track pricing page changes, job postings, press releases, and review site feedback for specific competitors. Deliver a weekly brief on what changed.

Regulatory Tracking

Monitor agency publications, Federal Register filings, and trade association updates for rule changes relevant to your industry. Flag anything that requires a response.

Talent Market Intelligence

Track hiring trends in your space: what roles competitors are adding, what skills they are looking for, where talent is moving. Useful for workforce planning and comp benchmarking.

Talk to us about your workflow

Fixed price. Two to four weeks. You own the code.