I want you to help me build a personal daily morning briefing system that runs autonomously on a schedule and produces a single HTML page I can read on my phone over coffee.

I'm a physician (or clinician-researcher). The flagship section will almost certainly be a **daily PubMed scan in my specialty** — filtered through a quality rubric, deduped against papers I've already seen, with the wheat separated from the chaff before it touches my eyes. Other sections (institutional news, guideline tracker, deliverables, market data, etc.) plug in around it.

A working reference system has these properties:
- Runs at a fixed time each morning (e.g., 5 AM weekdays)
- Outputs ONE HTML file at a fixed path, overwritten daily
- Has 3–6 sections, each tailored to my work
- Each section pulls from a different source (PubMed, news, society RSS, a task tracker, etc.)
- Keeps costs low by doing **retrieval in plain Python scripts** and only spending LLM tokens on triage, summarization, and rendering
- Is **idempotent** — running it twice in one day doesn't damage outputs
- Persists state between runs (deduplicates against a "seen" list of PMIDs and DOIs I've already looked at)

## Your job has five phases. Do them in order. Don't skip ahead.

### Phase 1 — Interview me

Ask focused questions until you understand the answers below. Don't ask all at once — start broad, adapt to my answers. If I'm vague, propose 2–3 concrete options. Cap at ~6 turns.

**About my clinical/research domain:**
1. What's my specialty (or sub-specialty / research focus)? What PubMed queries should run daily? (Aim for 6–12 queries, each narrow enough to be useful — e.g., "carotid stenosis," "limb preservation / CLTI / diabetic foot," "LLMs in healthcare." Not "vascular surgery" alone — too noisy.)
2. Which journals do I consider Tier 1 / must-not-miss? (NEJM, Lancet, JAMA, JAMA Surgery, Annals of Surgery, my specialty's flagship — fill in.)
3. What study designs do I weight up? (RCTs, large registries, validated prediction models, etc.)
4. What's the "would discuss this at conference / would change my practice" bar for me? Examples help.
5. Should LLM/genAI-in-medicine papers be in scope? If yes, what's the quality bar? (Most are noise — exclude "we asked ChatGPT" papers unless in a top journal.)

**About the other sections:**
6. What else do I check every morning? Candidates for clinicians: institutional news (my health system, my department), society announcements (SVS, ACS, my board), conference deadlines (abstract submissions, registration), my own project tracker, FDA approvals / drug shortages, financial data, weather.
7. For each section, what's the data source? (API, website with no API, RSS, MCP, my own database.)
8. What's the *judgment* you should make for each? E.g., "filter 100 papers to 5" vs. "summarize the top 3 headlines" vs. "just show a number."
9. Any thresholds or alerts? E.g., "alert if mortgage drops below 6%," "highlight deliverables due today in red."

**About the system:**
10. What time should it run, and on what days?
11. Where should the HTML live? (Default: a fixed path on disk, opened in browser or on phone.)
12. Do I already have tools, scripts, databases, or MCP servers it should read from? (No need to seed the dedup list from anything — it fills in automatically each day as the briefing runs.)

### Phase 2 — Confirm a plan

Before writing any code, summarize back to me:
- The sections you'll build, in display order (PubMed lit review usually anchors the bottom or top — your call based on my reading habit)
- For the PubMed section: the query list, the dedup source, the rubric outline
- The data source and judgment task for each other section
- What runs in a Python script vs. what costs LLM tokens
- Where files will live (script paths, state file paths, HTML output path)
- The scheduled-task entry point and trigger time
- A rough cost estimate per run (a working one is ~$0.05–0.15/day)

Wait for my "go" before continuing.

### Phase 3 — Build it

Write:
- A skill/task definition file (e.g., `~/.claude/scheduled-tasks/<name>/SKILL.md`) that orchestrates the run
- Python retrieval scripts in a dedicated `scripts/` directory
- Reference files (rubric, query list, configs) in `references/`
- Initial state files (empty `seen.json`, etc.) so the first run doesn't crash on missing input
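
A minimal sketch of how the initial state files could be created so the first run finds valid input. The function name `init_state` and the two-list layout of `seen.json` are assumptions for illustration, not a required format.

```python
import json
from pathlib import Path


def init_state(state_dir: Path) -> Path:
    """Create the state directory and an empty seen list if none exists yet."""
    state_dir.mkdir(parents=True, exist_ok=True)
    seen_path = state_dir / "seen.json"
    if not seen_path.exists():
        # Assumed layout: one flat list of PMIDs and one of DOIs.
        seen_path.write_text(json.dumps({"pmids": [], "dois": []}, indent=2))
    return seen_path
```

Calling it again later is safe: an existing `seen.json` is left untouched, so re-running setup never wipes the dedup history.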

#### The PubMed pattern (mandatory for any clinical lit-scan section)

This is the highest-leverage piece. Get it right and the rest is easy.

1. **A Python retrieval script (`scripts/pubmed_search.py`)** that uses NCBI E-utilities to:
   - Read a `queries.json` file listing my topic queries with names
   - For each query, fetch the last N days of PubMed hits (default N=1, but N=3 on Monday to cover the weekend)
   - Pull `pmid`, `doi`, `title`, `authors`, `journal`, `pubdate`, and `abstract` for each
   - **Dedup** against a persistent "seen" list of PMIDs and DOIs — papers I've already looked at and don't want to see again. This is just a flat state file, not a wiki. It is the single biggest signal-to-noise win.
   - Write a small JSON to `/tmp/pubmed_YYYY-MM-DD.json` for the LLM to read
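   - Identify myself to NCBI by sending my email and a tool name in the `email` and `tool` request parameters, which is how NCBI asks E-utilities clients to identify themselves
   - Sleep ~350 ms between calls to stay under NCBI's limit of 3 requests per second without an API key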

2. **A rubric file (`references/rubric.md`)** — this is the filter the LLM applies after the script returns candidates. The rubric is opinionated and personal. Draft a starting version from my interview answers, then let me edit it. A working rubric has roughly this shape:

   ```
   You are filtering N raw PubMed candidates down to a 5–15 paper digest.
   Signal over volume — 3 excellent papers beats 15 mediocre. When in doubt, leave it out.

   INCLUDE if ANY:
   - Tier 1 journal: <list my journals>
   - Methodologic strength: RCT, large prospective cohort (n>500),
     well-designed meta-analysis (PRISMA), validated prediction model
     with external validation
   - Practice-changing potential in my specialty
   - Direct relevance to my active research projects (list them)
   - High-quality LLM/genAI paper (rigorous eval, real clinical task)

   EXCLUDE:
   - Case reports and small case series (n<20) unless in a top-5 journal
   - Narrative reviews adding nothing new
   - Letters/editorials except in NEJM/Lancet/JAMA
   - Single-center retrospective with no novel method
   - "We asked ChatGPT and compared to textbook" papers
   - Predatory journals

   OUTPUT per paper: citation, why-it-matters (one line),
   2–3 sentence summary, suggested tier.

   For excluded papers, produce a "Quick Hits" appendix with citation only —
   no summary — so I can skim what was filtered without paying summary tokens.

   If 0 papers pass, say "no new papers today." Do not pad.
   If >20 pass, tighten further.
   ```

3. **The orchestration (in the SKILL.md)**: run the script, then read the rubric, then read the candidates JSON, then write a markdown digest *and* a section in the morning HTML. Save the digest at a dated path like `outputs/YYYY-MM-DD_lit-digest.md` — the idempotency guard in the principles list relies on detecting today's existing digest before re-running. **At the very end of a successful run, append every PMID and DOI from today's candidate set to the seen list.** This is what makes the briefing self-deduplicating: anything I have been shown today will not reappear tomorrow. Do this as the *last* step, so a mid-run failure never leaves orphan "seen" entries. The script does retrieval; the LLM does only the rubric-based triage and the summaries.
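
The retrieval and dedup steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not a finished `pubmed_search.py`: the function names, the `seen.json` layout (`pmids` and `dois` lists), the tool name, the `retmax` cap, and the choice of `edat` as the date type are all placeholders. Fetching titles and abstracts (a separate EFetch call) is left out.

```python
import json
import time
import urllib.parse
import urllib.request
from datetime import date
from pathlib import Path

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def lookback_days(today: date) -> int:
    """Return 3 on Mondays (to cover the weekend), otherwise 1."""
    return 3 if today.weekday() == 0 else 1


def esearch_params(term: str, days: int, email: str) -> dict:
    """Build ESearch parameters for records added in the last `days` days."""
    return {
        "db": "pubmed",
        "term": term,
        "reldate": days,
        "datetype": "edat",          # assumed: filter on the Entrez date
        "retmax": 200,               # assumed cap per query
        "retmode": "json",
        "tool": "morning-briefing",  # hypothetical tool name
        "email": email,
    }


def search_pmids(term: str, days: int, email: str) -> list[str]:
    """Run one ESearch call and return the matching PMIDs."""
    query = urllib.parse.urlencode(esearch_params(term, days, email))
    with urllib.request.urlopen(f"{ESEARCH}?{query}", timeout=30) as resp:
        data = json.load(resp)
    time.sleep(0.35)  # pause between calls to stay within NCBI rate limits
    return data["esearchresult"]["idlist"]


def load_seen(path: Path) -> dict:
    """Load the seen list as two sets for fast membership checks."""
    raw = json.loads(path.read_text())
    return {"pmids": set(raw["pmids"]), "dois": set(raw["dois"])}


def filter_unseen(candidates: list[dict], seen: dict) -> list[dict]:
    """Drop papers already seen, plus repeats within today's batch."""
    fresh, batch = [], set()
    for paper in candidates:
        pmid = paper["pmid"]
        doi = (paper.get("doi") or "").lower()
        if pmid in seen["pmids"] or pmid in batch or (doi and doi in seen["dois"]):
            continue
        batch.add(pmid)
        fresh.append(paper)
    return fresh


def mark_seen(candidates: list[dict], path: Path) -> None:
    """Last step of a successful run: record today's PMIDs and DOIs."""
    seen = load_seen(path)
    for paper in candidates:
        seen["pmids"].add(paper["pmid"])
        if paper.get("doi"):
            seen["dois"].add(paper["doi"].lower())
    path.write_text(json.dumps({k: sorted(v) for k, v in seen.items()}, indent=2))
```

Keeping `mark_seen` as a separate function makes it easy for the orchestration to call it only after the digest and HTML have been written.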


#### Architectural principles — apply to every section

These come from real failures in a working system:

- **Retrieval in scripts, judgment in prompts.** Don't waste LLM tokens fetching JSON from APIs — that's what Python is for. The LLM should only see the filtered, structured candidate set, then do triage, summarization, and HTML rendering.
- **Idempotency is the single most important rule.** Before writing today's output, check whether a non-empty version already exists. If it does, *preserve it*. A re-run can find zero new candidates simply because the earlier run already added them to the seen list; don't let it clobber a rich earlier digest with an empty "nothing new today" version. This bit a real system on day one.
- **Dedup is the signal-to-noise win.** For PubMed: dedup against my "seen" list (PMID/DOI set). For news: maintain a `seen.json` capped at ~200 URLs, normalize URLs before comparing (strip the trailing slash and the query string, including `utm_*` tracking params), and also skip items whose headline is substantively the same story.
- **The rubric is editable, not hard-coded.** Put it in `references/rubric.md` so I can tweak the thresholds without touching the orchestration. Re-read it every run.
- **Narrow data queries.** If pulling from a database or large JSON blob, write a query that returns just today's relevant slice. Don't dump 20 KB into the context to use 400 bytes of it. For nested JSON stored in a SQL database, a recursive CTE can walk the tree and return only the nodes you need.
- **One HTML artifact, fixed path, overwritten daily.** Design for phone reading: max-width ~720 px, semantic colors for urgency (red = due today, yellow = soon, green = later), tier badges for papers (Tier 1 / Tier 2 / Tier 3), and clickable PMID links to `pubmed.ncbi.nlm.nih.gov/{pmid}/` that open with `target="_blank"`.
- **Fail loudly on missing dependencies.** If a critical path doesn't exist (the project directory, the API key, the database), exit early with a clear message. Don't silently write to a fallback location.
- **Compose, don't bundle.** Each section should be independently runnable so I can debug or extend one without touching the others.
- **Quality control on the rubric itself.** If the rubric leaves me with 0 papers, say so explicitly; don't pad with marginal items. If it leaves >20, tighten further. The digest is for a 5-minute coffee review, not a Tuesday afternoon.
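
Two of these principles are small enough to sketch directly: URL normalization for news dedup, and the idempotency guard. Both function names are hypothetical. The guard implements one reading of the rule (keep an existing non-empty digest when the re-run found nothing new), and dropping the whole query string assumes the news sources don't carry article IDs there.

```python
from pathlib import Path
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Canonical form for dedup: lowercase scheme and host, no query string,
    fragment, or trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def should_overwrite(digest_path: Path, new_item_count: int) -> bool:
    """Return False when an earlier non-empty digest exists and the re-run
    found nothing new, so the richer earlier version is preserved."""
    already_written = digest_path.exists() and digest_path.stat().st_size > 0
    if already_written and new_item_count == 0:
        return False
    return True
```

In use, the orchestration would call `should_overwrite` before writing the dated digest, and compare `normalize_url(item_url)` against the stored URLs rather than the raw strings.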

### Phase 4 — Test it

Run the task end-to-end once with today's real data. Show me the HTML output. Walk me through the lit review section paper-by-paper so I can pressure-test the rubric — if a paper survived that shouldn't have, the rubric needs sharpening. Then confirm the auto-append worked: show me the seen list before and after the run, so I can see today's PMIDs are now in it. Iterate until the digest matches my taste. Don't claim done until I've seen real output, approved the rubric calibration, and verified the dedup list updated correctly.
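
One way to make the before/after check concrete is a small comparison helper. This is a sketch; `check_seen_update` is a hypothetical name, and it assumes the seen list can be loaded as a collection of PMID strings.

```python
def check_seen_update(before, after, todays_pmids):
    """Compare the seen list before and after a run.

    Returns the PMIDs that were added, and any of today's PMIDs
    that are still missing from the list after the run.
    """
    added = set(after) - set(before)
    missing = set(todays_pmids) - set(after)
    return {"added": sorted(added), "missing": sorted(missing)}
```

An empty `missing` list means every paper shown today was recorded; anything in it points to a failed append.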

### Phase 5 — Schedule it

Set up the scheduled trigger at the time I picked. First check whether the `schedule` skill is available on this machine; if it is, use it — that's the simplest path. If it's not available, set up an OS-level scheduled task (cron on Mac/Linux, Task Scheduler on Windows) and walk me through verifying it in plain English. Don't assume I know what crontab is. Show me the entry you wrote and read it back to me in normal sentences ("this will run at 5 AM every weekday, calling X") so I can confirm it's right.
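
For the cron path, a sketch of building a weekday entry together with its plain-English reading, so the two cannot drift apart. The function name and the command path in the usage line are hypothetical; the entry uses the standard five-field crontab layout (minute, hour, day of month, month, day of week).

```python
def weekday_cron_entry(hour: int, minute: int, command: str) -> tuple[str, str]:
    """Return a Monday-to-Friday crontab line and a sentence describing it."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    # Fields: minute, hour, any day of month, any month, days 1-5 (Mon-Fri).
    entry = f"{minute} {hour} * * 1-5 {command}"
    clock = f"{hour % 12 or 12}:{minute:02d} {'AM' if hour < 12 else 'PM'}"
    sentence = (
        f"This will run at {clock} every weekday (Monday to Friday), "
        f"calling {command}."
    )
    return entry, sentence
```

For example, `weekday_cron_entry(5, 0, "/path/to/run_briefing.sh")` yields the line `0 5 * * 1-5 /path/to/run_briefing.sh` and a sentence stating that it runs at 5:00 AM every weekday.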

---

## Things to NOT do

- Don't start with code. Interview first.
- Don't ask me to fill out a giant form. Talk to me.
- Don't over-engineer v1. Three solid sections (PubMed + one news + one personal) beat six half-broken ones.
- Don't update the dedup list before the digest is finalized. The append step is the *last* thing the run does, so a failed or partial run never leaves orphan "seen" entries.
- Don't invent PMIDs, DOIs, or abstract text. If the script didn't return it, you don't have it.
- Don't silently swallow errors. Surface them.
- Don't write to `~` or `/tmp` for state files — use a project-scoped directory I control. (Per-run intermediate JSON in `/tmp` is fine.)
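
The last two points can be enforced with a startup check along these lines. It is a sketch with a hypothetical function name, and it treats whatever `tempfile.gettempdir()` reports as the temp location, which is an assumption that may not be literally `/tmp` on every machine.

```python
import sys
import tempfile
from pathlib import Path


def require_state_dir(state_dir: Path) -> Path:
    """Return the resolved state directory, or exit with a clear message."""
    resolved = state_dir.expanduser().resolve()
    home = Path.home().resolve()
    tmp_root = Path(tempfile.gettempdir()).resolve()
    if resolved == home:
        sys.exit(f"Refusing to keep state directly in {home}; use a project directory.")
    if resolved == tmp_root or tmp_root in resolved.parents:
        sys.exit(f"Refusing to keep state under {tmp_root}; use a project directory.")
    if not resolved.is_dir():
        # Fail loudly instead of silently creating a fallback location.
        sys.exit(f"State directory {resolved} does not exist; create it first.")
    return resolved
```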

Start with Phase 1, question 1.