feat: Calendar Agent + IoT Agent design + DB migration

- Deployed Calendar Agent (n8n ID: 4ZIEGck9n4l5qaDt)
  - 12 Google Calendars via HA proxy, cron 06:30
  - GPT-4.1 batch classification -> memory_facts
  - Telegram daily briefing
- DB: added source_ref column + dedup index on memory_facts
- DB: created ha_sensor_config table (IoT Agent sensor allowlist)
  - 9 seed entries (Pixel 10, Pixel Watch, EY HP, Spotify, GPS)
- README: full IoT Agent design documentation
  - Sensor allowlist (regex), LLM-based activity inference
  - Three-layer data flow, confidence-gated clarification
- README: Calendar Agent design + workflow diagram
- README: updated infra table, ADR broker, credentials
- CHANGELOG: Calendar Agent milestone

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
README.md +208
@@ -41,11 +41,9 @@ Production-grade self-hosted stack. Key components relevant to ALPHA_PROJECT:
| Component | Role |
|---|---|
| **n8n** | Primary orchestrator and workflow engine for all agents |
| **Node-RED** | Event-driven automation, Home Assistant bridge |
| **Patroni / PostgreSQL** | Persistent structured memory store — `postgres.persistence.svc.cluster.local:5432/pompeo` |
| **Qdrant** | Vector store for semantic/episodic memory — `qdrant.persistence.svc.cluster.local:6333` |
| **NATS / Redis Streams** | Message broker between agents *(to be chosen and deployed)* |
| **Authentik** | SSO / IAM (OIDC) |
| **Home Assistant** | IoT hub — device tracking, automations, sensors, Google Calendar proxy |
| **MikroTik** | Network — VLANs, firewall rules, device presence detection |
| **Paperless-ngx** | Document archive (`docs.mt-home.uk`) |
| **Actual Budget** | Personal finance |
@@ -95,22 +93,9 @@ ALPHA_PROJECT uses specialized agents, each responsible for a specific data doma
### Message Broker (Blackboard Pattern)

Agents do not call each other directly. They write observations to the **`agent_messages` table** in PostgreSQL (blackboard pattern). The **Proactive Arbiter** polls this table, batches low-priority messages, and immediately processes high-priority ones. High-urgency events trigger a direct n8n webhook call that bypasses the queue.

Message schema (all agents must conform):

```json
{
  "agent": "mail",
  "priority": "low|high",
  "event_type": "new_fact|reminder|alert|behavioral_observation",
  "subject": "brief description",
  "detail": {},
  "source_ref": "optional reference to postgres record or external ID",
  "timestamp": "ISO8601",
  "expires_at": "ISO8601 or null"
}
```

**ADR: No dedicated message broker** — Postgres is sufficient for the expected message volume and avoids operational overhead. Revisit if throughput exceeds 1k messages/day.
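Since every agent must conform to this schema, a minimal validator is a natural guard before the `agent_messages` insert. The sketch below (field names taken from the schema above; `validate_message` itself is a hypothetical helper, e.g. for an n8n Code node) checks the required fields and enum values:

```python
from datetime import datetime, timezone

ALLOWED_PRIORITIES = {"low", "high"}
ALLOWED_EVENT_TYPES = {"new_fact", "reminder", "alert", "behavioral_observation"}

def validate_message(msg: dict) -> list[str]:
    """Return a list of schema violations (empty list = valid)."""
    errors = []
    for field in ("agent", "priority", "event_type", "subject", "detail", "timestamp"):
        if field not in msg:
            errors.append(f"missing field: {field}")
    if msg.get("priority") not in ALLOWED_PRIORITIES:
        errors.append("priority must be 'low' or 'high'")
    if msg.get("event_type") not in ALLOWED_EVENT_TYPES:
        errors.append("unknown event_type")
    if not isinstance(msg.get("detail"), dict):
        errors.append("detail must be an object")
    return errors

msg = {
    "agent": "mail",
    "priority": "low",
    "event_type": "new_fact",
    "subject": "brief description",
    "detail": {},
    "source_ref": None,
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "expires_at": None,
}
assert validate_message(msg) == []
```

Rejected messages can be logged rather than inserted, which keeps malformed rows out of the Arbiter's polling loop.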
### Memory Architecture
@@ -176,6 +161,148 @@ Each Qdrant point includes a metadata payload for pre-filtering (`user_id`, `sou
User preferences, fixed facts, communication style. Updated manually or via explicit agent action.

---
## IoT Agent — Design Notes

### Data Source: Home Assistant

Home Assistant (`http://10.30.20.100:8123`, HA OS 2026.3.2, Alzano Lombardo BG) is the primary hub for physical-world context. It aggregates the Google Pixel 10, Pixel Watch 4, smart home devices, and 25 Google Calendars.

**Person allowlist** (permanent by design — `person.ajada_tahiraj` is explicitly excluded):

| Person | Entity | Notes |
|---|---|---|
| Martin Tahiraj | `person.martin_tahiraj` | ✅ Tracked |
| Ajada Tahiraj | `person.ajada_tahiraj` | ❌ Excluded (sister — privacy) |
**Key sensors for Martin:**

| Sensor | Entity ID | Signal |
|---|---|---|
| Activity (Google) | `sensor.pixel_10_detected_activity` | still / walking / running / in_vehicle |
| Geocoded location | `sensor.pixel_10_geocoded_location` | Human-readable street address |
| EY laptop | `device_tracker.ey_hp` | Router tracker — online = laptop on home WiFi |
| Spotify | `media_player.spotify_martin` | Current track, playing/paused |
| Sleep duration | `sensor.pixel_10_sleep_duration` | Pixel Watch 4 |
| Next alarm | `sensor.pixel_10_next_alarm` | Scheduled wake-up |
| Work Profile | `binary_sensor.pixel_10_work_profile` | Android Work Profile active |
| Screen on | `binary_sensor.pixel_10_interactive` | Phone screen on/off |
| Do Not Disturb | `binary_sensor.pixel_10_do_not_disturb` | DND mode |
| Daily steps | `sensor.pixel_10_daily_steps` | Pixel Watch 4 |
| Heart rate | `sensor.pixel_10_heart_rate` | Pixel Watch 4 |
| GPS zone | `person.martin_tahiraj` | home / not_home / zone name |

Room presence sensors (PIR-based) are considered **unreliable** and are excluded for now.
### Sensor Allowlist — `ha_sensor_config`

Instead of hardcoded rules, the IoT Agent uses a dynamic allowlist stored in Postgres. Sensors are matched by **regex pattern**, so a whole family of entities can be covered by a single row:

```sql
CREATE TABLE ha_sensor_config (
    id          SERIAL PRIMARY KEY,
    pattern     TEXT NOT NULL,    -- regex pattern, e.g. 'sensor\.pixel_10_.*'
    user_id     TEXT NOT NULL,
    group_name  TEXT NOT NULL,    -- 'mobile_device' | 'work_presence' | 'entertainment' | ...
    description TEXT,
    active      BOOLEAN NOT NULL DEFAULT true
);

-- Seed entries
INSERT INTO ha_sensor_config (pattern, user_id, group_name, description) VALUES
('sensor\.pixel_10_.*',          'martin', 'mobile_device', 'All Pixel 10 sensors'),
('device_tracker\.ey_hp',        'martin', 'work_presence', 'EY Laptop router tracker'),
('media_player\.spotify_martin', 'martin', 'entertainment', 'Spotify'),
('binary_sensor\.pixel_10_.*',   'martin', 'mobile_device', 'Pixel 10 binary sensors'),
('person\.martin_tahiraj',       'martin', 'presence',      'Martin GPS zone state');
```

This allows adding new sensors (e.g. `sensor.pixel_watch_.*`) without workflow changes.
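The matching logic itself is simple. A sketch of how an HA state snapshot could be filtered against these patterns (the `ALLOWLIST` tuples mirror the seed rows; `match_entity` is an illustrative helper, and it uses anchored `fullmatch` on the assumption that patterns are written to cover the whole entity ID):

```python
import re

# Seed rows from ha_sensor_config above: (pattern, user_id, group_name)
ALLOWLIST = [
    (r"sensor\.pixel_10_.*",          "martin", "mobile_device"),
    (r"device_tracker\.ey_hp",        "martin", "work_presence"),
    (r"media_player\.spotify_martin", "martin", "entertainment"),
    (r"binary_sensor\.pixel_10_.*",   "martin", "mobile_device"),
    (r"person\.martin_tahiraj",       "martin", "presence"),
]

def match_entity(entity_id: str):
    """Return (user_id, group_name) for the first matching pattern, else None."""
    for pattern, user_id, group in ALLOWLIST:
        if re.fullmatch(pattern, entity_id):
            return (user_id, group)
    return None

assert match_entity("sensor.pixel_10_heart_rate") == ("martin", "mobile_device")
assert match_entity("sensor.living_room_pir") is None  # PIR sensors stay excluded
```

In the workflow the same filtering could instead be pushed into Postgres with its `~` regex operator; the Python form is shown here only because the snapshot already lives in the n8n run.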
### Activity State Machine (LLM-based — no fixed rules)

The IoT Agent sends a snapshot of all allowlisted sensor values to GPT-4.1 and asks it to infer the current activity label and a confidence score. **No if/else rules are coded** — the LLM performs the inference.

Example LLM output:
```json
{
  "activity": "home_working",
  "confidence": 0.92,
  "do_not_disturb": true,
  "location": "home",
  "notes": "EY laptop online, work profile active, working hours 09-18"
}
```

Activity labels: `sleeping`, `home_relaxing`, `home_working`, `commuting`, `at_office`, `out_errands`, `out_with_dog`, `exercising`, `traveling`, `unknown`.
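Because the LLM is free-form, the reply should be coerced into the closed label set before it reaches `behavioral_context`. A minimal sketch (the `parse_activity` helper is hypothetical; the label set is the one listed above):

```python
ACTIVITY_LABELS = {
    "sleeping", "home_relaxing", "home_working", "commuting", "at_office",
    "out_errands", "out_with_dog", "exercising", "traveling", "unknown",
}

def parse_activity(reply: dict) -> dict:
    """Coerce an LLM reply into a safe record; fall back to 'unknown' on bad output."""
    activity = reply.get("activity")
    confidence = reply.get("confidence", 0.0)
    if activity not in ACTIVITY_LABELS or not (0.0 <= confidence <= 1.0):
        return {"activity": "unknown", "confidence": 0.0}
    return {"activity": activity, "confidence": confidence}

assert parse_activity({"activity": "cooking", "confidence": 0.9})["activity"] == "unknown"
```

An invalid label collapsing to `unknown` with confidence 0.0 also feeds neatly into the confidence-gated clarification below 0.6.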
### Three-Layer Data Flow

| Layer | Trigger | Frequency | Output |
|---|---|---|---|
| Webhook | HA automation (zone change, motion) | Event-driven | Immediate `agent_messages` entry |
| Polling | n8n cron | Every 20 min | Sensor snapshot → LLM → `behavioral_context` |
| Daily cron | n8n cron at midnight | Once/day | Day summary → Qdrant `episodes` embedding |
### Historical Bootstrap

One-time job: the last 12 months of HA sensor history → daily LLM summaries → Qdrant `episodes`.

- Source: HA History API (`/api/history/period/{start}?filter_entity_id=...`)
- Output: one Qdrant point per day per user, with full behavioral context
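Since the output is one point per day, the bootstrap naturally iterates in one-day windows, one History API call each. A sketch of that chunking (the `daily_periods` generator is illustrative, not part of the workflow):

```python
from datetime import date, timedelta

def daily_periods(start: date, end: date):
    """Yield (day_start, day_end) pairs, one per HA History API call."""
    d = start
    while d < end:
        yield d, d + timedelta(days=1)
        d += timedelta(days=1)

# e.g. three days -> three history calls -> three daily summaries
periods = list(daily_periods(date(2026, 3, 1), date(2026, 3, 4)))
assert len(periods) == 3
```

Each `(day_start, day_end)` pair becomes the `{start}` path segment and `end_time` bound of one history request, and the response for that day is what gets summarized and embedded.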
### Confidence-Gated Clarification

When activity-inference confidence is < 0.6, or when Pompeo detects a potential life change (a new employer appearing in emails, a changed travel pattern, etc.), it asks Martin directly via Telegram:

> "Hi Martin, I'm noticing Avanade emails — are you still working for EY, or have you moved there? 🤔"

Pompeo updates `user_profile` or `memory_facts` with the confirmed fact and adjusts its confidence threshold.
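The gate itself reduces to one predicate. A sketch under the stated 0.6 threshold (`needs_clarification` is a hypothetical helper name):

```python
CONFIDENCE_THRESHOLD = 0.6

def needs_clarification(confidence: float, life_change_detected: bool = False) -> bool:
    """Ping Martin on Telegram when inference is uncertain or a life change is suspected."""
    return confidence < CONFIDENCE_THRESHOLD or life_change_detected

assert needs_clarification(0.45) is True
assert needs_clarification(0.92) is False
```

Since the threshold is adjusted after confirmations, it belongs in `user_profile` rather than in code.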
---

## Calendar Agent — Design Notes

### Design Decisions
- **Data source**: Google Calendar events fetched via the **Home Assistant REST API** (`/api/calendars/{entity_id}?start=&end=`) — HA proxies all 25 calendars and removes the need for a direct Google OAuth credential in n8n.
- **Dedup**: `memory_facts.source_ref` stores the HA event UID; `ON CONFLICT (user_id, source, source_ref) WHERE source_ref IS NOT NULL DO NOTHING` prevents duplicates.
- **LLM enrichment**: GPT-4.1 classifies each event in batch (category, action_required, do_not_disturb, priority, behavioral_context, pompeo_note).
- **No Qdrant embedding yet** (Phase 2): individual events go to Postgres only; a weekly aggregated embedding will be added later.
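The dedup semantics are worth spelling out: the `WHERE source_ref IS NOT NULL` clause targets a partial unique index, so rows without a `source_ref` are never deduplicated, while repeated (user_id, source, source_ref) triples are silently skipped. A pure-Python sketch of that behavior (helper names are illustrative):

```python
def dedup_key(ev: dict):
    """Key matching the partial unique index; NULL source_ref rows are never deduped."""
    if ev.get("source_ref") is None:
        return None
    return (ev["user_id"], ev["source"], ev["source_ref"])

def filter_new(existing: set, events: list) -> list:
    """Return only the events that ON CONFLICT ... DO NOTHING would actually insert."""
    out = []
    for ev in events:
        key = dedup_key(ev)
        if key is not None:
            if key in existing:
                continue  # duplicate HA event UID -> skipped, like DO NOTHING
            existing.add(key)
        out.append(ev)
    return out

seen = set()
batch = [
    {"user_id": "martin", "source": "calendar", "source_ref": "uid-1"},
    {"user_id": "martin", "source": "calendar", "source_ref": "uid-1"},  # duplicate
    {"user_id": "martin", "source": "calendar", "source_ref": None},     # never deduped
]
assert len(filter_new(seen, batch)) == 2
```

This is why recurring events can be re-fetched every morning without bloating `memory_facts`.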
### Calendars Tracked

| Calendar | Entity ID | Category | User |
|---|---|---|---|
| Lavoro | `calendar.calendar` | work | martin |
| Famiglia | `calendar.famiglia` | personal | martin |
| Spazzatura | `calendar.spazzatura` | chores | martin |
| Pulizie | `calendar.pulizie` | chores | martin |
| Formula 1 | `calendar.formula_1` | leisure | martin |
| WEC | `calendar.lm_wec_fia_world_endurance_championship` | leisure | martin |
| Inter | `calendar.inter_calendar` | leisure | martin |
| Compleanni | `calendar.birthdays` | social | martin |
| Varie | `calendar.varie` | misc | martin |
| Festività Italia | `calendar.festivita_in_italia` | holiday | shared |
| Films (Radarr) | `calendar.films` | leisure | martin |
| Serie TV (Sonarr) | `calendar.serie_tv` | leisure | martin |
### n8n Workflow

**`📅 Pompeo — Calendar Agent [Schedule]`** — ID `4ZIEGck9n4l5qaDt`

```
⏰ Schedule (06:30) → 📅 Imposta Range → 🔑 Token Copilot
  → 📋 Prepara Calendari (12 items)
  → 📡 HA Fetch (×12, one per calendar)
  → 🏷️ Estrai ed Etichetta (tagged events, flat)
  → 📝 Prepara Prompt (dedup + LLM prompt)
  → 🤖 GPT-4.1 (batch classify all events)
  → 📋 Parse Risposta
  → 💾 Postgres Upsert (memory_facts, per event, ON CONFLICT DO NOTHING)
  → 📦 Aggrega → ✍️ Prepara Messaggio → 📱 Telegram Briefing
```
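The `📡 HA Fetch` step is twelve calls to the calendar proxy endpoint from the Design Decisions above. A sketch of the URL each call uses (`calendar_url` is illustrative; the exact timestamp format HA accepts is assumed to be ISO-8601, and the auth header is omitted):

```python
from datetime import date, timedelta

HA_BASE = "http://10.30.20.100:8123"  # HA endpoint from this README

def calendar_url(entity_id: str, start: date, days: int = 7) -> str:
    """Build the HA REST URL one '📡 HA Fetch' node calls for one calendar."""
    end = start + timedelta(days=days)
    return (f"{HA_BASE}/api/calendars/{entity_id}"
            f"?start={start.isoformat()}T00:00:00Z"
            f"&end={end.isoformat()}T00:00:00Z")

assert "calendar.spazzatura" in calendar_url("calendar.spazzatura", date(2026, 3, 21))
```

With the 7-day default this matches the "fetch next 7 days" behavior described for the agent.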
### Embedding Strategy

- Embeddings are generated via Ollama (`nomic-embed-text` or equivalent) once the LLM server is online
@@ -257,6 +384,18 @@ Imports bank CSV statements (Banca Sella format) into Actual Budget via Telegram
**Common pattern across Paperless + Actual workflows**: GitHub Copilot token is obtained fresh at each run (`GET https://api.github.com/copilot_internal/v2/token`), then used for `POST https://api.githubcopilot.com/chat/completions` with model `gpt-4.1`.
### 📅 Pompeo — Calendar Agent [Schedule] (`4ZIEGck9n4l5qaDt`) ✅ Active

Runs every morning at 06:30 (and on demand via manual trigger).

- Fetches events for the next 7 days from 12 Google Calendars via the **Home Assistant REST API** (calendar proxy — no Google OAuth needed in n8n)
- Tags each event with calendar name, category, and user_id
- **GPT-4.1 batch classification**: category, action_required, do_not_disturb, priority, behavioral_context, pompeo_note
- **Postgres upsert** → `memory_facts` (source=calendar, source_ref=HA event UID, dedup via ON CONFLICT DO NOTHING)
- **Telegram briefing**: daily grouped summary sent to the notification channel

Calendars: Lavoro, Famiglia, Spazzatura, Pulizie, Formula 1, WEC, Inter, Compleanni, Varie, Festività Italia, Films (Radarr), Serie TV (Sonarr).
### n8n Credentials (IDs)

@@ -266,6 +405,8 @@ Imports bank CSV statements (Banca Sella format) into Actual Budget via Telegram

| ID | Name | Type |
|---|---|---|
| `vBwUxlzKrX3oDHyN` | GitHub Copilot OAuth Token | HTTP Header Auth |
| `uvGjLbrN5yQTQIzv` | Paperless-NGX API | HTTP Header Auth |
| `ZIVFNgI3esCKuYXc` | Google Calendar account | Google Calendar OAuth2 (also used for Tasks API) |
| `u0JCseXGnDG5hS9F` | Home Assistant API | HTTP Header Auth (long-lived HA token) |
| `mRqzxhSboGscolqI` | Pompeo — PostgreSQL | Postgres (database: `pompeo`, user: `martin`) |
---

@@ -293,11 +434,12 @@ Imports bank CSV statements (Banca Sella format) into Actual Budget via Telegram

- Endpoint: `qdrant.persistence.svc.cluster.local:6333`
- [x] ~~Run **PostgreSQL migrations** on Patroni~~ ✅ 2026-03-21
  - Database `pompeo` created (Zalando Operator)
  - Tables: `user_profile`, `memory_facts` (+ `source_ref` + dedup index), `finance_documents`, `behavioral_context`, `agent_messages`
  - Multi-tenancy: `user_id` column on all tables, seeds `martin` + `shared`
  - DDL script: `alpha/db/postgres.sql`
- [ ] Verify embedding endpoint via Copilot (`text-embedding-3-small`) as bootstrap fallback
- [ ] Plan migration to local Ollama embedding model once LLM server is online
- [ ] Create `ha_sensor_config` table in Postgres and seed initial sensor patterns
---

@@ -317,11 +459,11 @@ Imports bank CSV statements (Banca Sella format) into Actual Budget via Telegram

### Phase 2 — New Agents
- [x] ~~**Calendar Agent**~~ ✅ 2026-03-20 — `4ZIEGck9n4l5qaDt`
  - 12 Google calendars via HA proxy, fetch next 7 days
  - GPT-4.1 batch classification → `memory_facts` (dedup by HA event UID)
  - Telegram daily briefing at 06:30
  - **Phase 2**: add weekly Qdrant embedding for semantic retrieval

- [ ] **Finance Agent** (extend beyond Paperless)
  - Read Actual Budget export or API

@@ -333,10 +475,12 @@ Imports bank CSV statements (Banca Sella format) into Actual Budget via Telegram

  - Cron-based cluster health check (disk, pod status, backup freshness)
  - Publishes to message broker with `priority: high` for critical alerts
- [ ] **IoT Agent** — *design complete, implementation pending*
  - Sensor allowlist via `ha_sensor_config` Postgres table (regex-based)
  - No fixed rules: GPT-4.1 infers activity label + confidence from sensor snapshot
  - Three layers: webhook (events) + 20-min polling (behavioral_context) + daily cron (Qdrant episodes)
  - Historical bootstrap: 12 months of HA history → daily LLM summaries → Qdrant `episodes`
  - Confidence-gated clarification: ask Martin via Telegram if confidence < 0.6

- [ ] **Newsletter Agent**
  - Separate Gmail label for newsletters (excluded from Daily Digest main flow)

@@ -346,12 +490,10 @@ Imports bank CSV statements (Banca Sella format) into Actual Budget via Telegram

### Phase 3 — Message Broker + Proactive Arbiter
- [ ] Define final message schema (draft above, to be validated)
- [ ] Implement **Proactive Arbiter** n8n workflow:
  - Adaptive schedule (morning briefing, midday, evening recap)
  - Consume `agent_messages` batch → LLM correlation prompt → structured `notify/defer/discard` output
  - High-priority bypass path (direct webhook)
  - All decisions logged to Telegram audit channel
- [ ] Implement **correlation logic**: detect when 2+ agents report related events (e.g. IoT presence + calendar event + open reminder)