feat: Calendar Agent + IoT Agent design + DB migration

- Deployed Calendar Agent (n8n ID: 4ZIEGck9n4l5qaDt)
  - 12 Google Calendars via HA proxy, cron 06:30
  - GPT-4.1 batch classification -> memory_facts
  - Telegram daily briefing
- DB: added source_ref column + dedup index on memory_facts
- DB: created ha_sensor_config table (IoT Agent sensor allowlist)
  - 10 seed entries (Pixel 10, Pixel Watch, EY HP, Spotify, GPS)
- README: full IoT Agent design documentation
  - Sensor allowlist (regex), LLM-based activity inference
  - Three-layer data flow, confidence-gated clarification
- README: Calendar Agent design + workflow diagram
- README: updated infra table, ADR broker, credentials
- CHANGELOG: Calendar Agent milestone

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-21 13:45:12 +00:00
parent 841d3a93f6
commit 90d9faacca
3 changed files with 258 additions and 34 deletions


@@ -4,6 +4,44 @@ All significant changes to the ALPHA_PROJECT project are documented here.
---
## [2026-03-20] Calendar Agent — first Pompeo workflow in production
### What was done
First Pompeo agent deployed and live on n8n: `📅 Pompeo — Calendar Agent [Schedule]` (ID `4ZIEGck9n4l5qaDt`).
### Design
- **Data source**: Home Assistant REST API used as a Google Calendar proxy — avoids direct Google OAuth in n8n and works for all 25 calendars registered in HA.
- **Calendars tracked** (12): Lavoro, Famiglia, Spazzatura, Pulizie, Formula 1, WEC, Inter, Compleanni, Varie, Festività Italia, Films (Radarr), Serie TV (Sonarr).
- **LLM enrichment**: GPT-4.1 (via Copilot) classifies each event: category, action_required, do_not_disturb, priority, behavioral_context, pompeo_note.
- **Dedup**: `memory_facts.source_ref` = HA event UID; `ON CONFLICT DO NOTHING` against a partial unique index.
- **Telegram briefing**: every morning at 06:30, a summary of the next 7 days of events grouped by calendar.
### DB migrations applied
- `ALTER TABLE memory_facts ADD COLUMN source_ref TEXT` — column for the external dedup ID
- `CREATE UNIQUE INDEX memory_facts_dedup_idx ON memory_facts (user_id, source, source_ref) WHERE source_ref IS NOT NULL`
- `CREATE INDEX idx_memory_facts_source_ref ON memory_facts (source_ref) WHERE source_ref IS NOT NULL`
### n8n credentials created
| ID | Name | Type |
|---|---|---|
| `u0JCseXGnDG5hS9F` | Home Assistant API | HTTP Header Auth |
| `mRqzxhSboGscolqI` | Pompeo — PostgreSQL | Postgres (pompeo/martin) |
### Workflow
```
⏰ Schedule (06:30) → 📅 Range → 🔑 Token Copilot
→ 📋 Calendari (12 items) → 📡 HA Fetch (×12) → 🏷️ Estrai + Tag
→ 📝 Prompt (dedup) → 🤖 GPT-4.1 → 📋 Parse
→ 💾 Postgres Upsert (memory_facts) → 📦 Aggrega → 📱 Telegram
```
---
## [2026-03-21] ADR — Message Broker: no dedicated broker
### Decision

README.md

@@ -41,11 +41,9 @@ Production-grade self-hosted stack. Key components relevant to ALPHA_PROJECT:
|---|---|
| **n8n** | Primary orchestrator and workflow engine for all agents |
| **Node-RED** | Event-driven automation, Home Assistant bridge |
| **Patroni / PostgreSQL** | Persistent structured memory store — `postgres.persistence.svc.cluster.local:5432/pompeo` |
| **Qdrant** | Vector store for semantic/episodic memory — `qdrant.persistence.svc.cluster.local:6333` |
| **Authentik** | SSO / IAM (OIDC) |
| **Home Assistant** | IoT hub — device tracking, automations, sensors, Google Calendar proxy |
| **MikroTik** | Network — VLANs, firewall rules, device presence detection |
| **Paperless-ngx** | Document archive (`docs.mt-home.uk`) |
| **Actual Budget** | Personal finance |
@@ -95,22 +93,9 @@ ALPHA_PROJECT uses specialized agents, each responsible for a specific data doma
### Message Broker (Blackboard Pattern)
Agents do not call each other directly. They write observations to the **`agent_messages` table** in PostgreSQL (blackboard pattern). The **Proactive Arbiter** polls this table, batches low-priority messages, and immediately processes high-priority ones. High-urgency events trigger a direct n8n webhook call bypassing the queue.
Message schema (all agents must conform):
```json
{
"agent": "mail",
"priority": "low|high",
"event_type": "new_fact|reminder|alert|behavioral_observation",
"subject": "brief description",
"detail": {},
"source_ref": "optional reference to postgres record or external ID",
"timestamp": "ISO8601",
"expires_at": "ISO8601 or null"
}
```
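A minimal validator for this schema, as a sketch of what each agent could run before writing to `agent_messages` (field names come from the draft above; the helper itself is hypothetical):

```python
# Sketch: validate an agent message against the draft schema above.
ALLOWED_PRIORITIES = {"low", "high"}
ALLOWED_EVENT_TYPES = {"new_fact", "reminder", "alert", "behavioral_observation"}
REQUIRED_KEYS = {"agent", "priority", "event_type", "subject", "detail", "timestamp"}

def validate_message(msg: dict) -> list:
    """Return a list of schema violations (empty list = message is valid)."""
    errors = []
    for key in REQUIRED_KEYS - msg.keys():
        errors.append(f"missing key: {key}")
    if msg.get("priority") not in ALLOWED_PRIORITIES:
        errors.append("priority must be 'low' or 'high'")
    if msg.get("event_type") not in ALLOWED_EVENT_TYPES:
        errors.append("unknown event_type")
    if not isinstance(msg.get("detail"), dict):
        errors.append("detail must be an object")
    return errors
```

A conforming message yields an empty error list; anything else is rejected before it reaches the blackboard.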
**ADR: No dedicated message broker** — Postgres is sufficient for the expected message volume and avoids operational overhead. Revisit if throughput exceeds 1k messages/day.
### Memory Architecture
@@ -176,6 +161,148 @@ Each Qdrant point includes a metadata payload for pre-filtering (`user_id`, `sou
User preferences, fixed facts, communication style. Updated manually or via explicit agent action.
---
## IoT Agent — Design Notes
### Data Source: Home Assistant
Home Assistant (`http://10.30.20.100:8123`, HA OS 2026.3.2, Alzano Lombardo BG) is the primary hub for physical-world context. It aggregates Google Pixel 10, Pixel Watch 4, smart home devices, and 25 Google Calendars.
**Person allowlist** (permanent by design — `person.ajada_tahiraj` is explicitly excluded):
| Person | Entity | Notes |
|---|---|---|
| Martin Tahiraj | `person.martin_tahiraj` | ✅ Tracked |
| Ajada Tahiraj | `person.ajada_tahiraj` | ❌ Excluded (sister — privacy) |
**Key sensors for Martin:**
| Sensor | Entity ID | Signal |
|---|---|---|
| Activity (Google) | `sensor.pixel_10_detected_activity` | still / walking / running / in_vehicle |
| Geocoded location | `sensor.pixel_10_geocoded_location` | Human-readable street address |
| EY laptop | `device_tracker.ey_hp` | Router tracker — online = laptop on home WiFi |
| Spotify | `media_player.spotify_martin` | Current track, playing/paused |
| Sleep duration | `sensor.pixel_10_sleep_duration` | Pixel Watch 4 |
| Next alarm | `sensor.pixel_10_next_alarm` | Scheduled wake-up |
| Work Profile | `binary_sensor.pixel_10_work_profile` | Android Work Profile active |
| Screen on | `binary_sensor.pixel_10_interactive` | Phone screen on/off |
| Do Not Disturb | `binary_sensor.pixel_10_do_not_disturb` | DND mode |
| Daily steps | `sensor.pixel_10_daily_steps` | Pixel Watch 4 |
| Heart rate | `sensor.pixel_10_heart_rate` | Pixel Watch 4 |
| GPS Zone | `person.martin_tahiraj` | home / not_home / zone name |
Room presence sensors (PIR-based) are considered **unreliable** — excluded for now.
### Sensor Allowlist — `ha_sensor_config`
Instead of hardcoded rules, the IoT Agent uses a dynamic allowlist stored in Postgres. Sensors are matched by **regex pattern**, allowing glob-style additions:
```sql
CREATE TABLE ha_sensor_config (
id SERIAL PRIMARY KEY,
pattern TEXT NOT NULL, -- regex pattern, e.g. 'sensor\.pixel_10_.*'
user_id TEXT NOT NULL,
group_name TEXT NOT NULL, -- 'mobile_device' | 'work_presence' | 'entertainment' | ...
description TEXT,
active BOOLEAN NOT NULL DEFAULT true
);
-- Seed entries
INSERT INTO ha_sensor_config (pattern, user_id, group_name, description) VALUES
('sensor\.pixel_10_.*', 'martin', 'mobile_device', 'All Pixel 10 sensors'),
('device_tracker\.ey_hp', 'martin', 'work_presence', 'EY Laptop router tracker'),
('media_player\.spotify_martin', 'martin', 'entertainment', 'Spotify'),
('binary_sensor\.pixel_10_.*', 'martin', 'mobile_device', 'Pixel 10 binary sensors'),
('person\.martin_tahiraj', 'martin', 'presence', 'Martin GPS zone state');
```
This allows adding new sensors (e.g. `sensor.pixel_watch_.*`) without workflow changes.
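The matching step might look like the following sketch, assuming the workflow loads active rows from `ha_sensor_config` and compares them against HA entity_ids with `re.fullmatch` (the helper and the inlined rows are illustrative, mirroring the seed above):

```python
import re

# Rows as they would come from: SELECT pattern, group_name FROM ha_sensor_config WHERE active;
ALLOWLIST = [
    (r"sensor\.pixel_10_.*", "mobile_device"),
    (r"device_tracker\.ey_hp", "work_presence"),
    (r"media_player\.spotify_martin", "entertainment"),
    (r"person\.martin_tahiraj", "presence"),
]

def filter_entities(entity_ids, allowlist=ALLOWLIST):
    """Map each allowlisted entity_id to its sensor group; drop everything else."""
    compiled = [(re.compile(p), g) for p, g in allowlist]
    matched = {}
    for eid in entity_ids:
        for rx, group in compiled:
            if rx.fullmatch(eid):  # fullmatch: the pattern must cover the whole entity_id
                matched[eid] = group
                break
    return matched
```

`fullmatch` (rather than `search`) keeps `sensor\.pixel_10_.*` from matching entity_ids that merely contain the substring.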
### Activity State Machine (LLM-based — no fixed rules)
The IoT Agent sends a snapshot of all allowlisted sensor values to GPT-4.1 and asks it to infer the current activity label and confidence. **No if/else rules are coded** — the LLM performs inference.
Example LLM output:
```json
{
"activity": "home_working",
"confidence": 0.92,
"do_not_disturb": true,
"location": "home",
"notes": "EY laptop online, work profile active, working hours 09-18"
}
```
Activity labels: `sleeping`, `home_relaxing`, `home_working`, `commuting`, `at_office`, `out_errands`, `out_with_dog`, `exercising`, `traveling`, `unknown`.
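Since the LLM reply is free-form text, the workflow needs a defensive parse step. A hypothetical sketch that falls back to `unknown` on malformed JSON or labels outside the list above:

```python
import json

ACTIVITY_LABELS = {
    "sleeping", "home_relaxing", "home_working", "commuting", "at_office",
    "out_errands", "out_with_dog", "exercising", "traveling", "unknown",
}

def parse_activity(raw):
    """Parse the LLM reply; fall back to 'unknown' on bad JSON or an unknown label."""
    fallback = {"activity": "unknown", "confidence": 0.0, "do_not_disturb": False}
    try:
        out = json.loads(raw)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(out, dict) or out.get("activity") not in ACTIVITY_LABELS:
        return fallback
    # Clamp confidence to [0, 1] so downstream gating never sees garbage values.
    out["confidence"] = max(0.0, min(1.0, float(out.get("confidence", 0.0))))
    return out
```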
### Three-Layer Data Flow
| Layer | Trigger | Frequency | Output |
|---|---|---|---|
| Webhook | HA automation (zone change, motion) | Event-driven | Immediate `agent_messages` entry |
| Polling | n8n cron | Every 20 min | Sensor snapshot → LLM → `behavioral_context` |
| Daily cron | n8n cron midnight | Once/day | Day summary → Qdrant `episodes` embedding |
### Historical Bootstrap
One-time job: last 12 months of HA sensor history → daily LLM summaries → Qdrant `episodes`.
- Source: HA History API (`/api/history/period/{start}?filter_entity_id=...`)
- Output: one Qdrant point per day per user, with full behavioral context
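The bootstrap job reduces to generating one History API call per day over the window. A sketch under the assumption that day-sized slices are requested via the `end_time` query parameter (the URL shape follows the endpoint above; the helper name is illustrative):

```python
from datetime import date, timedelta

def history_urls(base_url, entity_ids, start, end):
    """Yield (day, url) pairs: one HA History API request per day in [start, end)."""
    filt = ",".join(entity_ids)
    day = start
    while day < end:
        nxt = day + timedelta(days=1)
        yield (day, f"{base_url}/api/history/period/{day.isoformat()}T00:00:00"
                    f"?end_time={nxt.isoformat()}T00:00:00&filter_entity_id={filt}")
        day = nxt
```

Each day's response would then be summarised by the LLM and upserted as one Qdrant point per user.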
### Confidence-Gated Clarification
When activity inference confidence < 0.6, or when Pompeo detects a potential life change (new employer from emails, travel pattern, etc.), it asks Martin directly via Telegram:
> "Hi Martin, I've been noticing emails from Avanade — are you still with EY, or did you move there? 🤔"
Pompeo updates `user_profile` or `memory_facts` with the confirmed fact and adjusts its confidence threshold.
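The gate itself is a one-line threshold check; a hypothetical sketch (the message template is illustrative, since the real wording would come from the LLM):

```python
CONFIDENCE_THRESHOLD = 0.6  # below this, ask instead of silently recording

def clarification_needed(inference):
    """Low-confidence inferences trigger a Telegram question instead of a DB write."""
    return inference.get("confidence", 0.0) < CONFIDENCE_THRESHOLD

def clarification_prompt(inference):
    # Hypothetical fallback template used when the LLM gives no question of its own.
    if not clarification_needed(inference):
        return None
    return (f"Hey Martin, I think you're '{inference.get('activity', 'unknown')}' "
            f"but I'm only {inference.get('confidence', 0.0):.0%} sure. Can you confirm?")
```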
---
## Calendar Agent — Design Notes
### Design Decisions
- **Data source**: Google Calendar events fetched via **Home Assistant REST API** (`/api/calendars/{entity_id}?start=&end=`) — HA proxies all 25 calendars and removes the need for a direct Google OAuth credential in n8n.
- **Dedup**: `memory_facts.source_ref` stores the HA event UID; `ON CONFLICT (user_id, source, source_ref) WHERE source_ref IS NOT NULL DO NOTHING` prevents duplicates.
- **LLM enrichment**: GPT-4.1 classifies each event in batch (category, action_required, do_not_disturb, priority, behavioral_context, pompeo_note).
- **No Qdrant embedding yet** (Phase 2): individual events go to Postgres only; a weekly aggregated embedding will be added later.
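The dedup semantics can be demonstrated outside Postgres. A minimal sqlite3 sketch (SQLite also supports partial unique indexes and upsert conflict targets, so the behavior mirrors the production index):

```python
import sqlite3

# Postgres in production; sqlite3 here only to demonstrate the dedup semantics.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE memory_facts (
    user_id    TEXT NOT NULL,
    source     TEXT NOT NULL,
    source_ref TEXT,
    subject    TEXT
);
CREATE UNIQUE INDEX memory_facts_dedup_idx
    ON memory_facts (user_id, source, source_ref)
    WHERE source_ref IS NOT NULL;
""")

def upsert_event(user_id, uid, subject):
    # Same HA event UID twice -> the second insert is a silent no-op.
    db.execute(
        """INSERT INTO memory_facts (user_id, source, source_ref, subject)
           VALUES (?, 'calendar', ?, ?)
           ON CONFLICT (user_id, source, source_ref)
           WHERE source_ref IS NOT NULL DO NOTHING""",
        (user_id, uid, subject),
    )

upsert_event("martin", "uid-123", "F1 GP")
upsert_event("martin", "uid-123", "F1 GP")   # duplicate: skipped
```

Rows with `source_ref IS NULL` stay outside the index, so facts without an external ID can still be inserted freely.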
### Calendars Tracked
| Calendar | Entity ID | Category | User |
|---|---|---|---|
| Lavoro | `calendar.calendar` | work | martin |
| Famiglia | `calendar.famiglia` | personal | martin |
| Spazzatura | `calendar.spazzatura` | chores | martin |
| Pulizie | `calendar.pulizie` | chores | martin |
| Formula 1 | `calendar.formula_1` | leisure | martin |
| WEC | `calendar.lm_wec_fia_world_endurance_championship` | leisure | martin |
| Inter | `calendar.inter_calendar` | leisure | martin |
| Compleanni | `calendar.birthdays` | social | martin |
| Varie | `calendar.varie` | misc | martin |
| Festività Italia | `calendar.festivita_in_italia` | holiday | shared |
| Films (Radarr) | `calendar.films` | leisure | martin |
| Serie TV (Sonarr) | `calendar.serie_tv` | leisure | martin |
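Fetching reduces to one GET per tracked calendar against the HA endpoint above. An illustrative sketch (only a subset of the 12 entity_ids is inlined):

```python
CALENDARS = [  # subset of the 12 tracked entity_ids from the table above
    "calendar.calendar", "calendar.famiglia", "calendar.spazzatura", "calendar.formula_1",
]

def calendar_requests(base_url, start_iso, end_iso, calendars=CALENDARS):
    """Build one GET URL per calendar: /api/calendars/{entity_id}?start=&end= (HA REST API)."""
    return [
        f"{base_url}/api/calendars/{entity_id}?start={start_iso}&end={end_iso}"
        for entity_id in calendars
    ]
```

In the workflow this fan-out is the `📡 HA Fetch (×12)` node, one HTTP request per item.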
### n8n Workflow
**`📅 Pompeo — Calendar Agent [Schedule]`** — ID `4ZIEGck9n4l5qaDt`
```
⏰ Schedule (06:30) → 📅 Imposta Range → 🔑 Token Copilot
→ 📋 Prepara Calendari (12 items)
→ 📡 HA Fetch (×12, one per calendar)
→ 🏷️ Estrai ed Etichetta (tagged events, flat)
→ 📝 Prepara Prompt (dedup + LLM prompt)
→ 🤖 GPT-4.1 (batch classify all events)
→ 📋 Parse Risposta
→ 💾 Postgres Upsert (memory_facts, per event, ON CONFLICT DO NOTHING)
→ 📦 Aggrega → ✍️ Prepara Messaggio → 📱 Telegram Briefing
```
### Embedding Strategy
- Embeddings are generated via Ollama (`nomic-embed-text` or equivalent) once the LLM server is online
@@ -257,6 +384,18 @@ Imports bank CSV statements (Banca Sella format) into Actual Budget via Telegram
**Common pattern across Paperless + Actual workflows**: GitHub Copilot token is obtained fresh at each run (`GET https://api.github.com/copilot_internal/v2/token`), then used for `POST https://api.githubcopilot.com/chat/completions` with model `gpt-4.1`.
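That token dance can be sketched as plain request construction (no network I/O here; only the URLs and model name come from the notes above, while the `Authorization` header schemes are assumptions):

```python
# Sketch of the two-step Copilot call: fresh short-lived token, then chat completion.
def token_request(github_oauth_token):
    """Request 1: exchange the GitHub OAuth token for a short-lived Copilot token."""
    url = "https://api.github.com/copilot_internal/v2/token"
    headers = {"Authorization": f"token {github_oauth_token}"}  # assumed scheme
    return url, headers

def chat_request(copilot_token, messages):
    """Request 2: chat completion with the freshly obtained Copilot token."""
    url = "https://api.githubcopilot.com/chat/completions"
    headers = {"Authorization": f"Bearer {copilot_token}",  # assumed scheme
               "Content-Type": "application/json"}
    body = {"model": "gpt-4.1", "messages": messages}
    return url, headers, body
```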
### 📅 Pompeo — Calendar Agent [Schedule] (`4ZIEGck9n4l5qaDt`) ✅ Active
Runs every morning at 06:30 (and on-demand via manual trigger).
- Fetches events for the next 7 days from 12 Google Calendars via **Home Assistant REST API** (calendar proxy — no Google OAuth needed in n8n)
- Tags each event with calendar name, category, user_id
- **GPT-4.1 batch classification**: category, action_required, do_not_disturb, priority, behavioral_context, pompeo_note
- **Postgres upsert** → `memory_facts` (source=calendar, source_ref=HA event UID, dedup ON CONFLICT DO NOTHING)
- **Telegram briefing**: daily grouped summary sent to the notification channel
Calendars: Lavoro, Famiglia, Spazzatura, Pulizie, Formula 1, WEC, Inter, Compleanni, Varie, Festività Italia, Films (Radarr), Serie TV (Sonarr).
### n8n Credentials (IDs)
| ID | Name | Type |
@@ -266,6 +405,8 @@ Imports bank CSV statements (Banca Sella format) into Actual Budget via Telegram
| `vBwUxlzKrX3oDHyN` | GitHub Copilot OAuth Token | HTTP Header Auth |
| `uvGjLbrN5yQTQIzv` | Paperless-NGX API | HTTP Header Auth |
| `ZIVFNgI3esCKuYXc` | Google Calendar account | Google Calendar OAuth2 (also used for Tasks API) |
| `u0JCseXGnDG5hS9F` | Home Assistant API | HTTP Header Auth (long-lived HA token) |
| `mRqzxhSboGscolqI` | Pompeo — PostgreSQL | Postgres (database: `pompeo`, user: `martin`) |
---
@@ -293,11 +434,12 @@ Imports bank CSV statements (Banca Sella format) into Actual Budget via Telegram
- Endpoint: `qdrant.persistence.svc.cluster.local:6333`
- [x] ~~Run **PostgreSQL migrations** on Patroni~~ ✅ 2026-03-21
- Database `pompeo` created (Zalando Operator)
- Tables: `user_profile`, `memory_facts` (+ `source_ref` + dedup index), `finance_documents`, `behavioral_context`, `agent_messages`
- Multi-tenancy: `user_id` column on all tables, seeded with `martin` + `shared`
- DDL script: `alpha/db/postgres.sql`
- [ ] Verify embedding endpoint via Copilot (`text-embedding-3-small`) as bootstrap fallback
- [ ] Plan migration to local Ollama embedding model once LLM server is online
- [x] ~~Create `ha_sensor_config` table in Postgres and seed initial sensor patterns~~ ✅ 2026-03-21
---
@@ -317,11 +459,11 @@ Imports bank CSV statements (Banca Sella format) into Actual Budget via Telegram
### Phase 2 — New Agents
- [x] ~~**Calendar Agent**~~ ✅ 2026-03-20 — `4ZIEGck9n4l5qaDt`
- 12 Google calendars via HA proxy, fetch next 7 days
- GPT-4.1 batch classification → `memory_facts` (dedup by HA event UID)
- Telegram daily briefing at 06:30
- **Phase 2**: add weekly Qdrant embedding for semantic retrieval
- [ ] **Finance Agent** (extend beyond Paperless)
- Read Actual Budget export or API
@@ -333,10 +475,12 @@ Imports bank CSV statements (Banca Sella format) into Actual Budget via Telegram
- Cron-based cluster health check (disk, pod status, backup freshness)
- Publishes to message broker with `priority: high` for critical alerts
- [ ] **IoT Agent** *(design complete, implementation pending)*
- Sensor allowlist via `ha_sensor_config` Postgres table (regex-based)
- No fixed rules: GPT-4.1 infers activity label + confidence from sensor snapshot
- Three layers: webhook (events) + polling 20min (behavioral_context) + daily cron (Qdrant episodes)
- Historical bootstrap: 12 months HA history → daily LLM summaries → Qdrant `episodes`
- Confidence-gated clarification: ask Martin via Telegram if confidence < 0.6
- [ ] **Newsletter Agent**
- Separate Gmail label for newsletters (excluded from Daily Digest main flow)
@@ -346,12 +490,10 @@ Imports bank CSV statements (Banca Sella format) into Actual Budget via Telegram
### Phase 3 — Message Broker + Proactive Arbiter
- [ ] Define final message schema (draft above, to be validated)
- [ ] Implement **Proactive Arbiter** n8n workflow:
- Adaptive schedule (morning briefing, midday, evening recap)
- Consume `agent_messages` batch → LLM correlation prompt → structured `notify/defer/discard` output
- High-priority bypass path (direct webhook)
- All decisions logged to Telegram audit channel
- [ ] Implement **correlation logic**: detect when 2+ agents report related events (e.g. IoT presence + calendar event + open reminder)


@@ -56,6 +56,7 @@ CREATE TABLE IF NOT EXISTS memory_facts (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL DEFAULT 'martin',
source TEXT NOT NULL, -- 'email' | 'calendar' | 'iot' | 'paperless' | 'n8n' | ...
source_ref TEXT, -- external ID for dedup (e.g. Google event UID, email thread_id)
category TEXT, -- 'finance' | 'personal' | 'work' | 'health' | ...
subject TEXT,
detail JSONB, -- flexible per-source payload
@@ -77,6 +78,15 @@ CREATE INDEX IF NOT EXISTS idx_memory_facts_action
ON memory_facts(user_id, action_required)
WHERE action_required = true;
CREATE INDEX IF NOT EXISTS idx_memory_facts_source_ref
ON memory_facts(source_ref)
WHERE source_ref IS NOT NULL;
-- Dedup: prevents duplicate inserts for same event (used by Calendar Agent and others)
CREATE UNIQUE INDEX IF NOT EXISTS memory_facts_dedup_idx
ON memory_facts(user_id, source, source_ref)
WHERE source_ref IS NOT NULL;
-- =============================================================================
-- 3. FINANCE_DOCUMENTS
@@ -163,8 +173,42 @@ CREATE INDEX IF NOT EXISTS idx_agent_msgs_expires
WHERE expires_at IS NOT NULL AND arbiter_decision IS NULL;
-- =============================================================================
-- 6. HA_SENSOR_CONFIG
-- Dynamic allowlist of the Home Assistant sensors monitored by the IoT Agent.
-- Pattern = regex, matched against Home Assistant entity_ids.
-- Avoids hardcoded rules in the workflow — adding a sensor = one INSERT.
-- =============================================================================
CREATE TABLE IF NOT EXISTS ha_sensor_config (
id SERIAL PRIMARY KEY,
pattern TEXT NOT NULL, -- regex pattern (e.g. 'sensor\.pixel_10_.*')
user_id TEXT NOT NULL DEFAULT 'martin',
group_name TEXT NOT NULL, -- 'mobile_device' | 'work_presence' | 'entertainment' | ...
description TEXT,
active BOOLEAN NOT NULL DEFAULT true,
created_at TIMESTAMP NOT NULL DEFAULT now()
);
CREATE INDEX IF NOT EXISTS idx_ha_sensor_config_user
ON ha_sensor_config(user_id, active);
-- Seed: key sensors for Martin
INSERT INTO ha_sensor_config (pattern, user_id, group_name, description) VALUES
('sensor\.pixel_10_.*', 'martin', 'mobile_device', 'All Pixel 10 sensors'),
('binary_sensor\.pixel_10_.*', 'martin', 'mobile_device', 'Pixel 10 binary sensors'),
('device_tracker\.ey_hp', 'martin', 'work_presence', 'EY laptop (router tracker)'),
('media_player\.spotify_martin', 'martin', 'entertainment', 'Martin Spotify'),
('person\.martin_tahiraj', 'martin', 'presence', 'Martin GPS zone'),
('sensor\.pixel_watch_.*', 'martin', 'wearable', 'Pixel Watch 4 (future)'),
('sensor\.pixel_10_heart_rate', 'martin', 'health', 'Heart rate'),
('sensor\.pixel_10_daily_steps', 'martin', 'health', 'Daily steps'),
('sensor\.pixel_10_sleep_duration', 'martin', 'health', 'Sleep duration'),
('sensor\.pixel_10_next_alarm', 'martin', 'routine', 'Next alarm')
ON CONFLICT DO NOTHING;
-- =============================================================================
-- End of script
-- =============================================================================
\echo '✅ pompeo schema applied successfully.'
\echo '   Tables: user_profile, memory_facts, finance_documents, behavioral_context, agent_messages, ha_sensor_config'