diff --git a/CHANGELOG.md b/CHANGELOG.md
index cf9aeea..2b698bc 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,6 +4,44 @@ All significant changes to the ALPHA_PROJECT project are documented here.
 
 ---
 
+## [2026-03-20] Calendar Agent β€” first Pompeo workflow in production
+
+### What was done
+
+First Pompeo agent deployed and active on n8n: `πŸ“… Pompeo β€” Calendar Agent [Schedule]` (ID `4ZIEGck9n4l5qaDt`).
+
+### Design
+
+- **Data source**: the Home Assistant REST API is used as a Google Calendar proxy β€” it avoids a direct Google OAuth credential in n8n and covers all 25 calendars registered in HA.
+- **Calendars tracked** (12): Lavoro, Famiglia, Spazzatura, Pulizie, Formula 1, WEC, Inter, Compleanni, Varie, FestivitΓ  Italia, Films (Radarr), Serie TV (Sonarr).
+- **LLM enrichment**: GPT-4.1 (via Copilot) classifies every event: category, action_required, do_not_disturb, priority, behavioral_context, pompeo_note.
+- **Dedup**: `memory_facts.source_ref` = HA event UID; `ON CONFLICT DO NOTHING` against a partial unique index.
+- **Telegram briefing**: every morning at 06:30, a summary of the next 7 days' events, grouped by calendar.
+
+### DB migrations applied
+
+- `ALTER TABLE memory_facts ADD COLUMN source_ref TEXT` β€” column holding the external dedup ID
+- `CREATE UNIQUE INDEX memory_facts_dedup_idx ON memory_facts (user_id, source, source_ref) WHERE source_ref IS NOT NULL`
+- `CREATE INDEX idx_memory_facts_source_ref ON memory_facts (source_ref) WHERE source_ref IS NOT NULL`
+
+### n8n credentials created
+
+| ID | Name | Type |
+|---|---|---|
+| `u0JCseXGnDG5hS9F` | Home Assistant API | HTTP Header Auth |
+| `mRqzxhSboGscolqI` | Pompeo β€” PostgreSQL | Postgres (pompeo/martin) |
+
+### Workflow flow
+
+```
+⏰ Schedule (06:30) β†’ πŸ“… Range β†’ πŸ”‘ Token Copilot
+  β†’ πŸ“‹ Calendari (12 items) β†’ πŸ“‘ HA Fetch (Γ—12) β†’ 🏷️ Estrai + Tag
+  β†’ πŸ“ Prompt (dedup) β†’ πŸ€– GPT-4.1 β†’ πŸ“‹ Parse
+  β†’ πŸ’Ύ Postgres Upsert (memory_facts) β†’ πŸ“¦ Aggrega β†’ πŸ“± Telegram
+```
+
+---
+
 ## [2026-03-21] ADR β€” Message Broker: no dedicated broker
 
 ### Decision
diff --git a/README.md b/README.md
index ecfc2ed..d9c9146 100644
--- a/README.md
+++ b/README.md
@@ -41,11 +41,9 @@ Production-grade self-hosted stack.
Key components relevant to ALPHA_PROJECT: |---|---| | **n8n** | Primary orchestrator and workflow engine for all agents | | **Node-RED** | Event-driven automation, Home Assistant bridge | -| **Patroni / PostgreSQL** | Persistent structured memory store | +| **Patroni / PostgreSQL** | Persistent structured memory store β€” `postgres.persistence.svc.cluster.local:5432/pompeo` | | **Qdrant** | Vector store for semantic/episodic memory β€” `qdrant.persistence.svc.cluster.local:6333` | -| **NATS / Redis Streams** | Message broker between agents *(to be chosen and deployed)* | -| **Authentik** | SSO / IAM (OIDC) | -| **Home Assistant** | IoT hub β€” device tracking, automations, sensors | +| **Home Assistant** | IoT hub β€” device tracking, automations, sensors, Google Calendar proxy | | **MikroTik** | Network β€” VLANs, firewall rules, device presence detection | | **Paperless-ngx** | Document archive (`docs.mt-home.uk`) | | **Actual Budget** | Personal finance | @@ -95,22 +93,9 @@ ALPHA_PROJECT uses specialized agents, each responsible for a specific data doma ### Message Broker (Blackboard Pattern) -Agents do not call each other directly. They publish observations to a **central message queue** (NATS JetStream or Redis Streams β€” TBD). The **Proactive Arbiter** consumes the queue, batches low-priority messages, and immediately processes high-priority ones. +Agents do not call each other directly. They write observations to the **`agent_messages` table** in PostgreSQL (blackboard pattern). The **Proactive Arbiter** polls this table, batches low-priority messages, and immediately processes high-priority ones. High-urgency events trigger a direct n8n webhook call bypassing the queue. 
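The polling behaviour described above can be sketched in a few lines. This is a minimal, self-contained illustration using `sqlite3` in place of Postgres, with a hypothetical subset of the `agent_messages` columns; the `poll` helper and the `notify`/`defer` markers are illustrative, not part of the actual Arbiter.

```python
import sqlite3

# Self-contained blackboard sketch: sqlite3 stands in for Postgres, and only
# an assumed subset of the agent_messages columns is modelled here.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE agent_messages (
        id INTEGER PRIMARY KEY,
        agent TEXT, priority TEXT, subject TEXT,
        arbiter_decision TEXT  -- NULL until the Arbiter has handled the row
    )""")
conn.executemany(
    "INSERT INTO agent_messages (agent, priority, subject) VALUES (?, ?, ?)",
    [("mail", "low", "newsletter"), ("iot", "high", "unknown device on VLAN"),
     ("calendar", "low", "dentist tomorrow")])

def poll(conn):
    """One Arbiter pass: handle high-priority rows now, batch the rest."""
    rows = conn.execute(
        "SELECT id, agent, priority, subject FROM agent_messages "
        "WHERE arbiter_decision IS NULL "
        "ORDER BY priority = 'high' DESC, id").fetchall()
    urgent = [r for r in rows if r[2] == "high"]
    batch = [r for r in rows if r[2] != "high"]
    # Mark everything as seen so the next poll skips it
    conn.executemany(
        "UPDATE agent_messages SET arbiter_decision = ? WHERE id = ?",
        [("notify", r[0]) for r in urgent] + [("defer", r[0]) for r in batch])
    return urgent, batch

urgent, batch = poll(conn)
print(len(urgent), len(batch))  # 1 urgent (iot), 2 batched
```

A second `poll(conn)` returns nothing, since every row now carries a decision; that is the property that keeps a table-based queue cheap at this volume.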
-Message schema (all agents must conform): - -```json -{ - "agent": "mail", - "priority": "low|high", - "event_type": "new_fact|reminder|alert|behavioral_observation", - "subject": "brief description", - "detail": {}, - "source_ref": "optional reference to postgres record or external ID", - "timestamp": "ISO8601", - "expires_at": "ISO8601 or null" -} -``` +**ADR: No dedicated message broker** β€” Postgres is sufficient for the expected message volume and avoids operational overhead. Revisit if throughput exceeds 1k messages/day. ### Memory Architecture @@ -176,6 +161,148 @@ Each Qdrant point includes a metadata payload for pre-filtering (`user_id`, `sou User preferences, fixed facts, communication style. Updated manually or via explicit agent action. +--- + +## IoT Agent β€” Design Notes + +### Data Source: Home Assistant + +Home Assistant (`http://10.30.20.100:8123`, HA OS 2026.3.2, Alzano Lombardo BG) is the primary hub for physical-world context. It aggregates Google Pixel 10, Pixel Watch 4, smart home devices, and 25 Google Calendars. 
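Reading one of these sensors over the HA REST API looks roughly like this. A Python sketch under stated assumptions: `/api/states/<entity_id>` with a long-lived access token is the standard Home Assistant endpoint, but the helper names (`fetch_state`, `summarize`) are illustrative, not part of the workflow.

```python
import json
import urllib.request

HA_URL = "http://10.30.20.100:8123"  # from the design notes above

def fetch_state(token: str, entity_id: str) -> dict:
    """GET /api/states/<entity_id> (standard Home Assistant REST API)."""
    req = urllib.request.Request(
        f"{HA_URL}/api/states/{entity_id}",
        headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def summarize(state_obj: dict) -> tuple[str, str]:
    """Reduce a raw HA state object to (entity_id, state) for the LLM snapshot."""
    return state_obj["entity_id"], state_obj["state"]

# Abridged example of the payload shape HA returns:
sample = {"entity_id": "sensor.pixel_10_detected_activity",
          "state": "walking",
          "attributes": {"friendly_name": "Detected Activity"}}
print(summarize(sample))  # ('sensor.pixel_10_detected_activity', 'walking')
```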
+ +**Person allowlist** (permanent by design β€” `person.ajada_tahiraj` is explicitly excluded): + +| Person | Entity | Notes | +|---|---|---| +| Martin Tahiraj | `person.martin_tahiraj` | βœ… Tracked | +| Ajada Tahiraj | `person.ajada_tahiraj` | ❌ Excluded (sister β€” privacy) | + +**Key sensors for Martin:** + +| Sensor | Entity ID | Signal | +|---|---|---| +| Activity (Google) | `sensor.pixel_10_detected_activity` | still / walking / running / in_vehicle | +| Geocoded location | `sensor.pixel_10_geocoded_location` | Human-readable street address | +| EY laptop | `device_tracker.ey_hp` | Router tracker β€” online = laptop on home WiFi | +| Spotify | `media_player.spotify_martin` | Current track, playing/paused | +| Sleep duration | `sensor.pixel_10_sleep_duration` | Pixel Watch 4 | +| Next alarm | `sensor.pixel_10_next_alarm` | Scheduled wake-up | +| Work Profile | `binary_sensor.pixel_10_work_profile` | Android Work Profile active | +| Screen on | `binary_sensor.pixel_10_interactive` | Phone screen on/off | +| Do Not Disturb | `binary_sensor.pixel_10_do_not_disturb` | DND mode | +| Daily steps | `sensor.pixel_10_daily_steps` | Pixel Watch 4 | +| Heart rate | `sensor.pixel_10_heart_rate` | Pixel Watch 4 | +| GPS Zone | `person.martin_tahiraj` | home / not_home / zone name | + +Room presence sensors (PIR-based) are considered **unreliable** β€” excluded for now. + +### Sensor Allowlist β€” `ha_sensor_config` + +Instead of hardcoded rules, the IoT Agent uses a dynamic allowlist stored in Postgres. Sensors are matched by **regex pattern**, allowing glob-style additions: + +```sql +CREATE TABLE ha_sensor_config ( + id SERIAL PRIMARY KEY, + pattern TEXT NOT NULL, -- regex pattern, e.g. 'sensor\.pixel_10_.*' + user_id TEXT NOT NULL, + group_name TEXT NOT NULL, -- 'mobile_device' | 'work_presence' | 'entertainment' | ... 
+    description TEXT,
+    active BOOLEAN NOT NULL DEFAULT true
+);
+
+-- Seed entries
+INSERT INTO ha_sensor_config (pattern, user_id, group_name, description) VALUES
+    ('sensor\.pixel_10_.*', 'martin', 'mobile_device', 'All Pixel 10 sensors'),
+    ('device_tracker\.ey_hp', 'martin', 'work_presence', 'EY Laptop router tracker'),
+    ('media_player\.spotify_martin', 'martin', 'entertainment', 'Spotify'),
+    ('binary_sensor\.pixel_10_.*', 'martin', 'mobile_device', 'Pixel 10 binary sensors'),
+    ('person\.martin_tahiraj', 'martin', 'presence', 'Martin GPS zone state');
+```
+
+This allows adding new sensors (e.g. `sensor.pixel_watch_.*`) without workflow changes.
+
+### Activity State Machine (LLM-based β€” no fixed rules)
+
+The IoT Agent sends a snapshot of all allowlisted sensor values to GPT-4.1 and asks it to infer the current activity label and a confidence score. **No if/else rules are coded** β€” the LLM performs the inference.
+
+Example LLM output:
+```json
+{
+  "activity": "home_working",
+  "confidence": 0.92,
+  "do_not_disturb": true,
+  "location": "home",
+  "notes": "EY laptop online, work profile active, working hours 09-18"
+}
+```
+
+Activity labels: `sleeping`, `home_relaxing`, `home_working`, `commuting`, `at_office`, `out_errands`, `out_with_dog`, `exercising`, `traveling`, `unknown`.
+
+### Three-Layer Data Flow
+
+| Layer | Trigger | Frequency | Output |
+|---|---|---|---|
+| Webhook | HA automation (zone change, motion) | Event-driven | Immediate `agent_messages` entry |
+| Polling | n8n cron | Every 20 min | Sensor snapshot β†’ LLM β†’ `behavioral_context` |
+| Daily cron | n8n cron midnight | Once/day | Day summary β†’ Qdrant `episodes` embedding |
+
+### Historical Bootstrap
+
+One-time job: last 12 months of HA sensor history β†’ daily LLM summaries β†’ Qdrant `episodes`.
+- Source: HA History API (`/api/history/period/{start}?filter_entity_id=...`)
+- Output: one Qdrant point per day per user, with full behavioral context
+
+### Confidence-Gated Clarification
+
+When activity-inference confidence is < 0.6, or when Pompeo detects a potential life change (a new employer inferred from emails, a changed travel pattern, etc.), it asks Martin directly via Telegram:
+
+> "Hi Martin, I'm noticing emails from Avanade β€” are you still working for EY, or have you moved there? πŸ€”"
+
+Pompeo then updates `user_profile` or `memory_facts` with the confirmed fact and adjusts its confidence threshold.
+
+---
+
+## Calendar Agent β€” Design Notes
+
+### Design Decisions
+
+- **Data source**: Google Calendar events fetched via the **Home Assistant REST API** (`/api/calendars/{entity_id}?start=&end=`) β€” HA proxies all 25 calendars and removes the need for a direct Google OAuth credential in n8n.
+- **Dedup**: `memory_facts.source_ref` stores the HA event UID; `ON CONFLICT (user_id, source, source_ref) WHERE source_ref IS NOT NULL DO NOTHING` prevents duplicates.
+- **LLM enrichment**: GPT-4.1 classifies each event in batch (category, action_required, do_not_disturb, priority, behavioral_context, pompeo_note).
+- **No Qdrant embedding yet** (Phase 2): individual events go to Postgres only; a weekly aggregated embedding will be added later.
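The dedup decision above can be exercised end to end. A minimal sketch, with `sqlite3` standing in for Postgres (SQLite supports the same partial-unique-index plus `ON CONFLICT DO NOTHING` combination); table and column names mirror `memory_facts`, everything else is illustrative.

```python
import sqlite3

# sqlite3 stand-in for the Postgres memory_facts table (subset of columns).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE memory_facts (
        user_id    TEXT NOT NULL DEFAULT 'martin',
        source     TEXT NOT NULL,
        source_ref TEXT,          -- HA event UID; NULL for facts without one
        subject    TEXT
    )""")
conn.execute("""
    CREATE UNIQUE INDEX memory_facts_dedup_idx
        ON memory_facts (user_id, source, source_ref)
        WHERE source_ref IS NOT NULL""")

def upsert_event(uid, subject):
    """Insert a calendar event; silently skip if the HA UID is already stored."""
    conn.execute(
        """INSERT INTO memory_facts (user_id, source, source_ref, subject)
           VALUES ('martin', 'calendar', ?, ?)
           ON CONFLICT (user_id, source, source_ref)
               WHERE source_ref IS NOT NULL DO NOTHING""",
        (uid, subject))

upsert_event("ha-uid-123", "Dentist 15:00")
upsert_event("ha-uid-123", "Dentist 15:00")  # same UID: deduplicated
upsert_event(None, "untracked note")         # NULL ref: outside the partial index
upsert_event(None, "untracked note")         # ...so this second copy is kept
print(conn.execute("SELECT count(*) FROM memory_facts").fetchone()[0])  # 3
```

The `WHERE source_ref IS NOT NULL` clause is what lets NULL-ref facts coexist while re-runs over the same HA events stay idempotent.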
+ +### Calendars Tracked + +| Calendar | Entity ID | Category | User | +|---|---|---|---| +| Lavoro | `calendar.calendar` | work | martin | +| Famiglia | `calendar.famiglia` | personal | martin | +| Spazzatura | `calendar.spazzatura` | chores | martin | +| Pulizie | `calendar.pulizie` | chores | martin | +| Formula 1 | `calendar.formula_1` | leisure | martin | +| WEC | `calendar.lm_wec_fia_world_endurance_championship` | leisure | martin | +| Inter | `calendar.inter_calendar` | leisure | martin | +| Compleanni | `calendar.birthdays` | social | martin | +| Varie | `calendar.varie` | misc | martin | +| FestivitΓ  Italia | `calendar.festivita_in_italia` | holiday | shared | +| Films (Radarr) | `calendar.films` | leisure | martin | +| Serie TV (Sonarr) | `calendar.serie_tv` | leisure | martin | + +### n8n Workflow + +**`πŸ“… Pompeo β€” Calendar Agent [Schedule]`** β€” ID `4ZIEGck9n4l5qaDt` + +``` +⏰ Schedule (06:30) β†’ πŸ“… Imposta Range β†’ πŸ”‘ Token Copilot + β†’ πŸ“‹ Prepara Calendari (12 items) + β†’ πŸ“‘ HA Fetch (Γ—12, one per calendar) + β†’ 🏷️ Estrai ed Etichetta (tagged events, flat) + β†’ πŸ“ Prepara Prompt (dedup + LLM prompt) + β†’ πŸ€– GPT-4.1 (batch classify all events) + β†’ πŸ“‹ Parse Risposta + β†’ πŸ’Ύ Postgres Upsert (memory_facts, per event, ON CONFLICT DO NOTHING) + β†’ πŸ“¦ Aggrega β†’ ✍️ Prepara Messaggio β†’ πŸ“± Telegram Briefing +``` + ### Embedding Strategy - Embeddings are generated via Ollama (`nomic-embed-text` or equivalent) once the LLM server is online @@ -257,6 +384,18 @@ Imports bank CSV statements (Banca Sella format) into Actual Budget via Telegram **Common pattern across Paperless + Actual workflows**: GitHub Copilot token is obtained fresh at each run (`GET https://api.github.com/copilot_internal/v2/token`), then used for `POST https://api.githubcopilot.com/chat/completions` with model `gpt-4.1`. 
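The token dance can be sketched in Python as follows. This is illustrative only: the workflows do it via n8n HTTP Request nodes, and the exact auth header scheme of the token endpoint is an assumption to verify against the stored credential.

```python
import json
import urllib.request

TOKEN_URL = "https://api.github.com/copilot_internal/v2/token"
CHAT_URL = "https://api.githubcopilot.com/chat/completions"

def fetch_copilot_token(github_oauth_token: str) -> str:
    """Exchange the long-lived GitHub OAuth token for a short-lived Copilot
    token at each run. (Header scheme is an assumption, not confirmed here.)"""
    req = urllib.request.Request(
        TOKEN_URL, headers={"Authorization": f"token {github_oauth_token}"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["token"]

def build_chat_request(copilot_token: str, prompt: str) -> urllib.request.Request:
    """Build the gpt-4.1 chat-completion POST used by the workflows."""
    body = json.dumps({"model": "gpt-4.1",
                       "messages": [{"role": "user", "content": prompt}]})
    return urllib.request.Request(
        CHAT_URL, data=body.encode(),
        headers={"Authorization": f"Bearer {copilot_token}",
                 "Content-Type": "application/json"},
        method="POST")

# No network here: just show the request that would be sent.
req = build_chat_request("<token-fetched-at-runtime>", "Classify these events")
print(req.get_method(), req.full_url)
```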
+### πŸ“… Pompeo β€” Calendar Agent [Schedule] (`4ZIEGck9n4l5qaDt`) βœ… Active
+
+Runs every morning at 06:30 (and on demand via manual trigger).
+
+- Fetches events for the next 7 days from 12 Google Calendars via the **Home Assistant REST API** (calendar proxy β€” no Google OAuth needed in n8n)
+- Tags each event with calendar name, category, user_id
+- **GPT-4.1 batch classification**: category, action_required, do_not_disturb, priority, behavioral_context, pompeo_note
+- **Postgres upsert** β†’ `memory_facts` (source=calendar, source_ref=HA event UID, dedup via ON CONFLICT DO NOTHING)
+- **Telegram briefing**: daily grouped summary sent to the notification channel
+
+Calendars: Lavoro, Famiglia, Spazzatura, Pulizie, Formula 1, WEC, Inter, Compleanni, Varie, FestivitΓ  Italia, Films (Radarr), Serie TV (Sonarr).
+
 ### n8n Credentials (IDs)
 
 | ID | Name | Type |
 |---|---|---|
 | `vBwUxlzKrX3oDHyN` | GitHub Copilot OAuth Token | HTTP Header Auth |
 | `uvGjLbrN5yQTQIzv` | Paperless-NGX API | HTTP Header Auth |
 | `ZIVFNgI3esCKuYXc` | Google Calendar account | Google Calendar OAuth2 (also used for Tasks API) |
+| `u0JCseXGnDG5hS9F` | Home Assistant API | HTTP Header Auth (long-lived HA token) |
+| `mRqzxhSboGscolqI` | Pompeo β€” PostgreSQL | Postgres (database: `pompeo`, user: `martin`) |
 
 ---
 
@@ -293,11 +434,12 @@ Imports bank CSV statements (Banca Sella format) into Actual Budget via Telegram
   - Endpoint: `qdrant.persistence.svc.cluster.local:6333`
 - [x] ~~Run **PostgreSQL migrations** on Patroni~~ βœ… 2026-03-21
   - Database `pompeo` created (Zalando Operator)
-  - Tables: `user_profile`, `memory_facts`, `finance_documents`, `behavioral_context`, `agent_messages`
+  - Tables: `user_profile`, `memory_facts` (+ `source_ref` + dedup index), `finance_documents`, `behavioral_context`, `agent_messages`
   - Multi-tenancy: `user_id` field on every table, seeded with `martin` + `shared`
   - DDL script:
`alpha/db/postgres.sql`
 - [ ] Verify embedding endpoint via Copilot (`text-embedding-3-small`) as bootstrap fallback
 - [ ] Plan migration to local Ollama embedding model once LLM server is online
+- [ ] Create `ha_sensor_config` table in Postgres and seed initial sensor patterns
 
 ---
 
@@ -317,11 +459,11 @@ Imports bank CSV statements (Banca Sella format) into Actual Budget via Telegram
 
 ### Phase 2 β€” New Agents
 
-- [ ] **Calendar Agent**
-  - Poll Google Calendar (all relevant calendars)
-  - Persist upcoming events to Postgres (`memory_facts` + `behavioral_context` for leisure events)
-  - Weekly cluster embedding (chunk per week, not per event)
-  - Dedup recurring events: embed only first occurrence, store rest in Postgres only
+- [x] ~~**Calendar Agent**~~ βœ… 2026-03-20 β€” `4ZIEGck9n4l5qaDt`
+  - 12 Google calendars via HA proxy, fetch next 7 days
+  - GPT-4.1 batch classification β†’ `memory_facts` (dedup by HA event UID)
+  - Telegram daily briefing at 06:30
+  - **Phase 2**: add weekly Qdrant embedding for semantic retrieval
 
 - [ ] **Finance Agent** (extend beyond Paperless)
   - Read Actual Budget export or API
@@ -333,10 +475,12 @@ Imports bank CSV statements (Banca Sella format) into Actual Budget via Telegram
   - Cron-based cluster health check (disk, pod status, backup freshness)
   - Publishes to message broker with `priority: high` for critical alerts
 
-- [ ] **IoT Agent**
-  - Home Assistant webhook β†’ Node-RED β†’ n8n
-  - Device presence tracking β†’ `behavioral_context`
-  - Pattern recognition via Qdrant similarity on historical episodes (e.g.
"Tuesday evening, outside, laptop on")
+- [ ] **IoT Agent** β€” *design complete, implementation pending*
+  - Sensor allowlist via `ha_sensor_config` Postgres table (regex-based)
+  - No fixed rules: GPT-4.1 infers activity label + confidence from a sensor snapshot
+  - Three layers: webhook (events) + 20-min polling (behavioral_context) + daily cron (Qdrant episodes)
+  - Historical bootstrap: 12 months of HA history β†’ daily LLM summaries β†’ Qdrant `episodes`
+  - Confidence-gated clarification: ask Martin via Telegram if confidence < 0.6
 
 - [ ] **Newsletter Agent**
   - Separate Gmail label for newsletters (excluded from Daily Digest main flow)
@@ -346,12 +490,10 @@ Imports bank CSV statements (Banca Sella format) into Actual Budget via Telegram
 
 ### Phase 3 β€” Message Broker + Proactive Arbiter
 
-- [ ] Choose and deploy broker: **NATS JetStream** (preferred β€” lightweight, native Kubernetes) or Redis Streams
-- [ ] Define final message schema (draft above, to be validated)
 - [ ] Implement **Proactive Arbiter** n8n workflow:
   - Adaptive schedule (morning briefing, midday, evening recap)
-  - Consume queue batch β†’ LLM correlation prompt β†’ structured `notify/defer/discard` output
-  - High-priority bypass path
+  - Consume `agent_messages` batch β†’ LLM correlation prompt β†’ structured `notify/defer/discard` output
+  - High-priority bypass path (direct webhook)
   - All decisions logged to Telegram audit channel
 - [ ] Implement **correlation logic**: detect when 2+ agents report related events (e.g. IoT presence + calendar event + open reminder)
diff --git a/db/postgres.sql b/db/postgres.sql
index 9cf0140..9b78a8d 100644
--- a/db/postgres.sql
+++ b/db/postgres.sql
@@ -56,6 +56,7 @@ CREATE TABLE IF NOT EXISTS memory_facts (
     id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
     user_id TEXT NOT NULL DEFAULT 'martin',
     source TEXT NOT NULL,      -- 'email' | 'calendar' | 'iot' | 'paperless' | 'n8n' | ...
+    source_ref TEXT,           -- external ID for dedup (e.g. Google event UID, email thread_id)
     category TEXT,             -- 'finance' | 'personal' | 'work' | 'health' | ...
     subject TEXT,
     detail JSONB,              -- flexible per-source payload
@@ -77,6 +78,15 @@ CREATE INDEX IF NOT EXISTS idx_memory_facts_action
     ON memory_facts(user_id, action_required)
     WHERE action_required = true;
 
+CREATE INDEX IF NOT EXISTS idx_memory_facts_source_ref
+    ON memory_facts(source_ref)
+    WHERE source_ref IS NOT NULL;
+
+-- Dedup: prevents duplicate inserts for the same event (used by the Calendar Agent and others)
+CREATE UNIQUE INDEX IF NOT EXISTS memory_facts_dedup_idx
+    ON memory_facts(user_id, source, source_ref)
+    WHERE source_ref IS NOT NULL;
+
 -- =============================================================================
 -- 3. FINANCE_DOCUMENTS
@@ -163,8 +173,42 @@ CREATE INDEX IF NOT EXISTS idx_agent_msgs_expires
     WHERE expires_at IS NOT NULL AND arbiter_decision IS NULL;
 
+-- =============================================================================
+-- 6. HA_SENSOR_CONFIG
+-- Dynamic allowlist of the Home Assistant sensors monitored by the IoT Agent.
+-- Pattern = regex, matched against Home Assistant entity_ids.
+-- Avoids hardcoded rules in the workflow β€” adding a sensor = one INSERT.
+-- =============================================================================
+CREATE TABLE IF NOT EXISTS ha_sensor_config (
+    id SERIAL PRIMARY KEY,
+    pattern TEXT NOT NULL,     -- regex pattern (e.g. 'sensor\.pixel_10_.*')
+    user_id TEXT NOT NULL DEFAULT 'martin',
+    group_name TEXT NOT NULL,  -- 'mobile_device' | 'work_presence' | 'entertainment' | ...
+    description TEXT,
+    active BOOLEAN NOT NULL DEFAULT true,
+    created_at TIMESTAMP NOT NULL DEFAULT now()
+);
+
+CREATE INDEX IF NOT EXISTS idx_ha_sensor_config_user
+    ON ha_sensor_config(user_id, active);
+
+-- Unique per (user_id, pattern): without this, the seed's ON CONFLICT DO NOTHING
+-- never fires (the SERIAL PK alone cannot conflict) and reruns duplicate rows.
+CREATE UNIQUE INDEX IF NOT EXISTS ha_sensor_config_pattern_uidx
+    ON ha_sensor_config(user_id, pattern);
+
+-- Seed: significant sensors for Martin
+-- NB: the specific health/routine patterns below also match 'sensor\.pixel_10_.*';
+-- the matching logic should prefer the most specific pattern.
+INSERT INTO ha_sensor_config (pattern, user_id, group_name, description) VALUES
+    ('sensor\.pixel_10_.*', 'martin', 'mobile_device', 'All Pixel 10 sensors'),
+    ('binary_sensor\.pixel_10_.*', 'martin', 'mobile_device', 'Pixel 10 binary sensors'),
+    ('device_tracker\.ey_hp', 'martin', 'work_presence', 'EY laptop (router tracker)'),
+    ('media_player\.spotify_martin', 'martin', 'entertainment', 'Spotify Martin'),
+    ('person\.martin_tahiraj', 'martin', 'presence', 'Martin GPS zone'),
+    ('sensor\.pixel_watch_.*', 'martin', 'wearable', 'Pixel Watch 4 (future)'),
+    ('sensor\.pixel_10_heart_rate', 'martin', 'health', 'Heart rate'),
+    ('sensor\.pixel_10_daily_steps', 'martin', 'health', 'Daily steps'),
+    ('sensor\.pixel_10_sleep_duration', 'martin', 'health', 'Sleep duration'),
+    ('sensor\.pixel_10_next_alarm', 'martin', 'routine', 'Next alarm')
+ON CONFLICT DO NOTHING;
+
+
 -- =============================================================================
 -- End of script
 -- =============================================================================
 \echo 'βœ… Schema pompeo applied successfully.'
-\echo '   Tables: user_profile, memory_facts, finance_documents, behavioral_context, agent_messages'
+\echo '   Tables: user_profile, memory_facts, finance_documents, behavioral_context, agent_messages, ha_sensor_config'
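The allowlist seeded above would be consumed by the IoT Agent along these lines. A hypothetical Python matcher, not part of the repo: first-match-wins ordering is an assumption, and `re.fullmatch` anchors each pattern so `sensor.pixel_10_x` cannot be matched by a shorter prefix pattern.

```python
import re

# Patterns mirroring the ha_sensor_config seed (pattern, group_name).
ALLOWLIST = [
    (r"sensor\.pixel_10_.*", "mobile_device"),
    (r"device_tracker\.ey_hp", "work_presence"),
    (r"media_player\.spotify_martin", "entertainment"),
    (r"person\.martin_tahiraj", "presence"),
]

def match_entities(entity_ids):
    """Return {entity_id: group_name} for entities matching an active pattern."""
    out = {}
    for eid in entity_ids:
        for pattern, group in ALLOWLIST:
            if re.fullmatch(pattern, eid):
                out[eid] = group
                break  # first matching pattern wins (assumed ordering)
    return out

snapshot = ["sensor.pixel_10_daily_steps", "person.martin_tahiraj",
            "person.ajada_tahiraj", "light.kitchen"]
# person.ajada_tahiraj and light.kitchen match nothing and are dropped,
# which is exactly how the person allowlist exclusion is enforced.
print(match_entities(snapshot))
```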