feat: Calendar Agent + IoT Agent design + DB migration

- Deployed Calendar Agent (n8n ID: 4ZIEGck9n4l5qaDt)
  - 12 Google Calendars via HA proxy, cron 06:30
  - GPT-4.1 batch classification -> memory_facts
  - Telegram daily briefing
- DB: added source_ref column + dedup index on memory_facts
- DB: created ha_sensor_config table (IoT Agent sensor allowlist)
  - 9 seed entries (Pixel 10, Pixel Watch, EY HP, Spotify, GPS)
- README: full IoT Agent design documentation
  - Sensor allowlist (regex), LLM-based activity inference
  - Three-layer data flow, confidence-gated clarification
- README: Calendar Agent design + workflow diagram
- README: updated infra table, ADR broker, credentials
- CHANGELOG: Calendar Agent milestone

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
README.md +208
@@ -41,11 +41,9 @@ Production-grade self-hosted stack. Key components relevant to ALPHA_PROJECT:
| Component | Role |
|---|---|
| **n8n** | Primary orchestrator and workflow engine for all agents |
| **Node-RED** | Event-driven automation, Home Assistant bridge |
| **Patroni / PostgreSQL** | Persistent structured memory store — `postgres.persistence.svc.cluster.local:5432/pompeo` |
| **Qdrant** | Vector store for semantic/episodic memory — `qdrant.persistence.svc.cluster.local:6333` |
| **NATS / Redis Streams** | Message broker between agents *(to be chosen and deployed)* |
| **Authentik** | SSO / IAM (OIDC) |
| **Home Assistant** | IoT hub — device tracking, automations, sensors, Google Calendar proxy |
| **MikroTik** | Network — VLANs, firewall rules, device presence detection |
| **Paperless-ngx** | Document archive (`docs.mt-home.uk`) |
| **Actual Budget** | Personal finance |
@@ -95,22 +93,9 @@ ALPHA_PROJECT uses specialized agents, each responsible for a specific data doma
### Message Broker (Blackboard Pattern)

Agents do not call each other directly. They write observations to the **`agent_messages` table** in PostgreSQL (blackboard pattern). The **Proactive Arbiter** polls this table, batches low-priority messages, and immediately processes high-priority ones. High-urgency events trigger a direct n8n webhook call that bypasses the queue.

Message schema (all agents must conform):

```json
{
  "agent": "mail",
  "priority": "low|high",
  "event_type": "new_fact|reminder|alert|behavioral_observation",
  "subject": "brief description",
  "detail": {},
  "source_ref": "optional reference to postgres record or external ID",
  "timestamp": "ISO8601",
  "expires_at": "ISO8601 or null"
}
```

**ADR: No dedicated message broker** — Postgres is sufficient for the expected message volume and avoids operational overhead. Revisit if throughput exceeds 1k messages/day.
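Since every agent must conform to this schema, a minimal validator is a natural guard before the `agent_messages` insert. The sketch below (field names taken from the schema above; `validate_message` itself is a hypothetical helper, e.g. for an n8n Code node) checks the required fields and enum values:

```python
from datetime import datetime, timezone

ALLOWED_PRIORITIES = {"low", "high"}
ALLOWED_EVENT_TYPES = {"new_fact", "reminder", "alert", "behavioral_observation"}

def validate_message(msg: dict) -> list[str]:
    """Return a list of schema violations (empty list = valid)."""
    errors = []
    for field in ("agent", "priority", "event_type", "subject", "detail", "timestamp"):
        if field not in msg:
            errors.append(f"missing field: {field}")
    if msg.get("priority") not in ALLOWED_PRIORITIES:
        errors.append("priority must be 'low' or 'high'")
    if msg.get("event_type") not in ALLOWED_EVENT_TYPES:
        errors.append("unknown event_type")
    if not isinstance(msg.get("detail"), dict):
        errors.append("detail must be an object")
    return errors

msg = {
    "agent": "mail",
    "priority": "low",
    "event_type": "new_fact",
    "subject": "brief description",
    "detail": {},
    "source_ref": None,
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "expires_at": None,
}
assert validate_message(msg) == []
```

Rejected messages can be logged rather than inserted, which keeps malformed rows out of the Arbiter's polling loop.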
### Memory Architecture
@@ -176,6 +161,148 @@ Each Qdrant point includes a metadata payload for pre-filtering (`user_id`, `sou
User preferences, fixed facts, communication style. Updated manually or via explicit agent action.

---
## IoT Agent — Design Notes

### Data Source: Home Assistant

Home Assistant (`http://10.30.20.100:8123`, HA OS 2026.3.2, Alzano Lombardo BG) is the primary hub for physical-world context. It aggregates the Google Pixel 10, Pixel Watch 4, smart home devices, and 25 Google Calendars.

**Person allowlist** (permanent by design — `person.ajada_tahiraj` is explicitly excluded):

| Person | Entity | Notes |
|---|---|---|
| Martin Tahiraj | `person.martin_tahiraj` | ✅ Tracked |
| Ajada Tahiraj | `person.ajada_tahiraj` | ❌ Excluded (sister — privacy) |
**Key sensors for Martin:**

| Sensor | Entity ID | Signal |
|---|---|---|
| Activity (Google) | `sensor.pixel_10_detected_activity` | still / walking / running / in_vehicle |
| Geocoded location | `sensor.pixel_10_geocoded_location` | Human-readable street address |
| EY laptop | `device_tracker.ey_hp` | Router tracker — online = laptop on home WiFi |
| Spotify | `media_player.spotify_martin` | Current track, playing/paused |
| Sleep duration | `sensor.pixel_10_sleep_duration` | Pixel Watch 4 |
| Next alarm | `sensor.pixel_10_next_alarm` | Scheduled wake-up |
| Work Profile | `binary_sensor.pixel_10_work_profile` | Android Work Profile active |
| Screen on | `binary_sensor.pixel_10_interactive` | Phone screen on/off |
| Do Not Disturb | `binary_sensor.pixel_10_do_not_disturb` | DND mode |
| Daily steps | `sensor.pixel_10_daily_steps` | Pixel Watch 4 |
| Heart rate | `sensor.pixel_10_heart_rate` | Pixel Watch 4 |
| GPS zone | `person.martin_tahiraj` | home / not_home / zone name |

Room presence sensors (PIR-based) are considered **unreliable** and are excluded for now.
### Sensor Allowlist — `ha_sensor_config`

Instead of hardcoded rules, the IoT Agent uses a dynamic allowlist stored in Postgres. Sensors are matched by **regex pattern**, so a whole family of entities can be covered by a single row:

```sql
CREATE TABLE ha_sensor_config (
    id          SERIAL PRIMARY KEY,
    pattern     TEXT NOT NULL,    -- regex pattern, e.g. 'sensor\.pixel_10_.*'
    user_id     TEXT NOT NULL,
    group_name  TEXT NOT NULL,    -- 'mobile_device' | 'work_presence' | 'entertainment' | ...
    description TEXT,
    active      BOOLEAN NOT NULL DEFAULT true
);

-- Seed entries
INSERT INTO ha_sensor_config (pattern, user_id, group_name, description) VALUES
('sensor\.pixel_10_.*',          'martin', 'mobile_device', 'All Pixel 10 sensors'),
('device_tracker\.ey_hp',        'martin', 'work_presence', 'EY Laptop router tracker'),
('media_player\.spotify_martin', 'martin', 'entertainment', 'Spotify'),
('binary_sensor\.pixel_10_.*',   'martin', 'mobile_device', 'Pixel 10 binary sensors'),
('person\.martin_tahiraj',       'martin', 'presence',      'Martin GPS zone state');
```

This allows adding new sensors (e.g. `sensor.pixel_watch_.*`) without workflow changes.
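The matching logic itself is simple. A sketch of how an HA state snapshot could be filtered against these patterns (the `ALLOWLIST` tuples mirror the seed rows; `match_entity` is an illustrative helper, and it uses anchored `fullmatch` on the assumption that patterns are written to cover the whole entity ID):

```python
import re

# Seed rows from ha_sensor_config above: (pattern, user_id, group_name)
ALLOWLIST = [
    (r"sensor\.pixel_10_.*",          "martin", "mobile_device"),
    (r"device_tracker\.ey_hp",        "martin", "work_presence"),
    (r"media_player\.spotify_martin", "martin", "entertainment"),
    (r"binary_sensor\.pixel_10_.*",   "martin", "mobile_device"),
    (r"person\.martin_tahiraj",       "martin", "presence"),
]

def match_entity(entity_id: str):
    """Return (user_id, group_name) for the first matching pattern, else None."""
    for pattern, user_id, group in ALLOWLIST:
        if re.fullmatch(pattern, entity_id):
            return (user_id, group)
    return None

assert match_entity("sensor.pixel_10_heart_rate") == ("martin", "mobile_device")
assert match_entity("sensor.living_room_pir") is None  # PIR sensors stay excluded
```

In the workflow the same filtering could instead be pushed into Postgres with its `~` regex operator; the Python form is shown here only because the snapshot already lives in the n8n run.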
### Activity State Machine (LLM-based — no fixed rules)

The IoT Agent sends a snapshot of all allowlisted sensor values to GPT-4.1 and asks it to infer the current activity label and a confidence score. **No if/else rules are coded** — the LLM performs the inference.

Example LLM output:
```json
{
  "activity": "home_working",
  "confidence": 0.92,
  "do_not_disturb": true,
  "location": "home",
  "notes": "EY laptop online, work profile active, working hours 09-18"
}
```

Activity labels: `sleeping`, `home_relaxing`, `home_working`, `commuting`, `at_office`, `out_errands`, `out_with_dog`, `exercising`, `traveling`, `unknown`.
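Because the LLM is free-form, the reply should be coerced into the closed label set before it reaches `behavioral_context`. A minimal sketch (the `parse_activity` helper is hypothetical; the label set is the one listed above):

```python
ACTIVITY_LABELS = {
    "sleeping", "home_relaxing", "home_working", "commuting", "at_office",
    "out_errands", "out_with_dog", "exercising", "traveling", "unknown",
}

def parse_activity(reply: dict) -> dict:
    """Coerce an LLM reply into a safe record; fall back to 'unknown' on bad output."""
    activity = reply.get("activity")
    confidence = reply.get("confidence", 0.0)
    if activity not in ACTIVITY_LABELS or not (0.0 <= confidence <= 1.0):
        return {"activity": "unknown", "confidence": 0.0}
    return {"activity": activity, "confidence": confidence}

assert parse_activity({"activity": "cooking", "confidence": 0.9})["activity"] == "unknown"
```

An invalid label collapsing to `unknown` with confidence 0.0 also feeds neatly into the confidence-gated clarification below 0.6.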
### Three-Layer Data Flow

| Layer | Trigger | Frequency | Output |
|---|---|---|---|
| Webhook | HA automation (zone change, motion) | Event-driven | Immediate `agent_messages` entry |
| Polling | n8n cron | Every 20 min | Sensor snapshot → LLM → `behavioral_context` |
| Daily cron | n8n cron at midnight | Once/day | Day summary → Qdrant `episodes` embedding |
### Historical Bootstrap

One-time job: the last 12 months of HA sensor history → daily LLM summaries → Qdrant `episodes`.

- Source: HA History API (`/api/history/period/{start}?filter_entity_id=...`)
- Output: one Qdrant point per day per user, with full behavioral context
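Since the output is one point per day, the bootstrap naturally iterates in one-day windows, one History API call each. A sketch of that chunking (the `daily_periods` generator is illustrative, not part of the workflow):

```python
from datetime import date, timedelta

def daily_periods(start: date, end: date):
    """Yield (day_start, day_end) pairs, one per HA History API call."""
    d = start
    while d < end:
        yield d, d + timedelta(days=1)
        d += timedelta(days=1)

# e.g. three days -> three history calls -> three daily summaries
periods = list(daily_periods(date(2026, 3, 1), date(2026, 3, 4)))
assert len(periods) == 3
```

Each `(day_start, day_end)` pair becomes the `{start}` path segment and `end_time` bound of one history request, and the response for that day is what gets summarized and embedded.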
### Confidence-Gated Clarification

When activity-inference confidence is < 0.6, or when Pompeo detects a potential life change (a new employer appearing in emails, a changed travel pattern, etc.), it asks Martin directly via Telegram:

> "Hi Martin, I'm noticing Avanade emails — are you still working for EY, or have you moved there? 🤔"

Pompeo updates `user_profile` or `memory_facts` with the confirmed fact and adjusts its confidence threshold.
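The gate itself reduces to one predicate. A sketch under the stated 0.6 threshold (`needs_clarification` is a hypothetical helper name):

```python
CONFIDENCE_THRESHOLD = 0.6

def needs_clarification(confidence: float, life_change_detected: bool = False) -> bool:
    """Ping Martin on Telegram when inference is uncertain or a life change is suspected."""
    return confidence < CONFIDENCE_THRESHOLD or life_change_detected

assert needs_clarification(0.45) is True
assert needs_clarification(0.92) is False
```

Since the threshold is adjusted after confirmations, it belongs in `user_profile` rather than in code.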
---

## Calendar Agent — Design Notes

### Design Decisions
- **Data source**: Google Calendar events fetched via the **Home Assistant REST API** (`/api/calendars/{entity_id}?start=&end=`) — HA proxies all 25 calendars and removes the need for a direct Google OAuth credential in n8n.
- **Dedup**: `memory_facts.source_ref` stores the HA event UID; `ON CONFLICT (user_id, source, source_ref) WHERE source_ref IS NOT NULL DO NOTHING` prevents duplicates.
- **LLM enrichment**: GPT-4.1 classifies each event in batch (category, action_required, do_not_disturb, priority, behavioral_context, pompeo_note).
- **No Qdrant embedding yet** (Phase 2): individual events go to Postgres only; a weekly aggregated embedding will be added later.
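The dedup semantics are worth spelling out: the `WHERE source_ref IS NOT NULL` clause targets a partial unique index, so rows without a `source_ref` are never deduplicated, while repeated (user_id, source, source_ref) triples are silently skipped. A pure-Python sketch of that behavior (helper names are illustrative):

```python
def dedup_key(ev: dict):
    """Key matching the partial unique index; NULL source_ref rows are never deduped."""
    if ev.get("source_ref") is None:
        return None
    return (ev["user_id"], ev["source"], ev["source_ref"])

def filter_new(existing: set, events: list) -> list:
    """Return only the events that ON CONFLICT ... DO NOTHING would actually insert."""
    out = []
    for ev in events:
        key = dedup_key(ev)
        if key is not None:
            if key in existing:
                continue  # duplicate HA event UID -> skipped, like DO NOTHING
            existing.add(key)
        out.append(ev)
    return out

seen = set()
batch = [
    {"user_id": "martin", "source": "calendar", "source_ref": "uid-1"},
    {"user_id": "martin", "source": "calendar", "source_ref": "uid-1"},  # duplicate
    {"user_id": "martin", "source": "calendar", "source_ref": None},     # never deduped
]
assert len(filter_new(seen, batch)) == 2
```

This is why recurring events can be re-fetched every morning without bloating `memory_facts`.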
### Calendars Tracked

| Calendar | Entity ID | Category | User |
|---|---|---|---|
| Lavoro | `calendar.calendar` | work | martin |
| Famiglia | `calendar.famiglia` | personal | martin |
| Spazzatura | `calendar.spazzatura` | chores | martin |
| Pulizie | `calendar.pulizie` | chores | martin |
| Formula 1 | `calendar.formula_1` | leisure | martin |
| WEC | `calendar.lm_wec_fia_world_endurance_championship` | leisure | martin |
| Inter | `calendar.inter_calendar` | leisure | martin |
| Compleanni | `calendar.birthdays` | social | martin |
| Varie | `calendar.varie` | misc | martin |
| Festività Italia | `calendar.festivita_in_italia` | holiday | shared |
| Films (Radarr) | `calendar.films` | leisure | martin |
| Serie TV (Sonarr) | `calendar.serie_tv` | leisure | martin |
### n8n Workflow

**`📅 Pompeo — Calendar Agent [Schedule]`** — ID `4ZIEGck9n4l5qaDt`

```
⏰ Schedule (06:30) → 📅 Imposta Range → 🔑 Token Copilot
  → 📋 Prepara Calendari (12 items)
  → 📡 HA Fetch (×12, one per calendar)
  → 🏷️ Estrai ed Etichetta (tagged events, flat)
  → 📝 Prepara Prompt (dedup + LLM prompt)
  → 🤖 GPT-4.1 (batch classify all events)
  → 📋 Parse Risposta
  → 💾 Postgres Upsert (memory_facts, per event, ON CONFLICT DO NOTHING)
  → 📦 Aggrega → ✍️ Prepara Messaggio → 📱 Telegram Briefing
```
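The `📡 HA Fetch` step is twelve calls to the calendar proxy endpoint from the Design Decisions above. A sketch of the URL each call uses (`calendar_url` is illustrative; the exact timestamp format HA accepts is assumed to be ISO-8601, and the auth header is omitted):

```python
from datetime import date, timedelta

HA_BASE = "http://10.30.20.100:8123"  # HA endpoint from this README

def calendar_url(entity_id: str, start: date, days: int = 7) -> str:
    """Build the HA REST URL one '📡 HA Fetch' node calls for one calendar."""
    end = start + timedelta(days=days)
    return (f"{HA_BASE}/api/calendars/{entity_id}"
            f"?start={start.isoformat()}T00:00:00Z"
            f"&end={end.isoformat()}T00:00:00Z")

assert "calendar.spazzatura" in calendar_url("calendar.spazzatura", date(2026, 3, 21))
```

With the 7-day default this matches the "fetch next 7 days" behavior described for the agent.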
### Embedding Strategy

- Embeddings are generated via Ollama (`nomic-embed-text` or equivalent) once the LLM server is online
@@ -257,6 +384,18 @@ Imports bank CSV statements (Banca Sella format) into Actual Budget via Telegram
**Common pattern across Paperless + Actual workflows**: GitHub Copilot token is obtained fresh at each run (`GET https://api.github.com/copilot_internal/v2/token`), then used for `POST https://api.githubcopilot.com/chat/completions` with model `gpt-4.1`.
### 📅 Pompeo — Calendar Agent [Schedule] (`4ZIEGck9n4l5qaDt`) ✅ Active

Runs every morning at 06:30 (and on demand via manual trigger).

- Fetches events for the next 7 days from 12 Google Calendars via the **Home Assistant REST API** (calendar proxy — no Google OAuth needed in n8n)
- Tags each event with calendar name, category, and user_id
- **GPT-4.1 batch classification**: category, action_required, do_not_disturb, priority, behavioral_context, pompeo_note
- **Postgres upsert** → `memory_facts` (source=calendar, source_ref=HA event UID, dedup via ON CONFLICT DO NOTHING)
- **Telegram briefing**: daily grouped summary sent to the notification channel

Calendars: Lavoro, Famiglia, Spazzatura, Pulizie, Formula 1, WEC, Inter, Compleanni, Varie, Festività Italia, Films (Radarr), Serie TV (Sonarr).
### n8n Credentials (IDs)

@@ -266,6 +405,8 @@ Imports bank CSV statements (Banca Sella format) into Actual Budget via Telegram

| ID | Name | Type |
|---|---|---|
| `vBwUxlzKrX3oDHyN` | GitHub Copilot OAuth Token | HTTP Header Auth |
| `uvGjLbrN5yQTQIzv` | Paperless-NGX API | HTTP Header Auth |
| `ZIVFNgI3esCKuYXc` | Google Calendar account | Google Calendar OAuth2 (also used for Tasks API) |
| `u0JCseXGnDG5hS9F` | Home Assistant API | HTTP Header Auth (long-lived HA token) |
| `mRqzxhSboGscolqI` | Pompeo — PostgreSQL | Postgres (database: `pompeo`, user: `martin`) |
---

@@ -293,11 +434,12 @@ Imports bank CSV statements (Banca Sella format) into Actual Budget via Telegram

- Endpoint: `qdrant.persistence.svc.cluster.local:6333`
- [x] ~~Run **PostgreSQL migrations** on Patroni~~ ✅ 2026-03-21
  - Database `pompeo` created (Zalando Operator)
  - Tables: `user_profile`, `memory_facts` (+ `source_ref` + dedup index), `finance_documents`, `behavioral_context`, `agent_messages`
  - Multi-tenancy: `user_id` column on all tables, seeds `martin` + `shared`
  - DDL script: `alpha/db/postgres.sql`
- [ ] Verify embedding endpoint via Copilot (`text-embedding-3-small`) as bootstrap fallback
- [ ] Plan migration to local Ollama embedding model once LLM server is online
- [ ] Create `ha_sensor_config` table in Postgres and seed initial sensor patterns
---

@@ -317,11 +459,11 @@ Imports bank CSV statements (Banca Sella format) into Actual Budget via Telegram

### Phase 2 — New Agents
- [x] ~~**Calendar Agent**~~ ✅ 2026-03-20 — `4ZIEGck9n4l5qaDt`
  - 12 Google calendars via HA proxy, fetch next 7 days
  - GPT-4.1 batch classification → `memory_facts` (dedup by HA event UID)
  - Telegram daily briefing at 06:30
  - **Phase 2**: add weekly Qdrant embedding for semantic retrieval

- [ ] **Finance Agent** (extend beyond Paperless)
  - Read Actual Budget export or API

@@ -333,10 +475,12 @@ Imports bank CSV statements (Banca Sella format) into Actual Budget via Telegram

  - Cron-based cluster health check (disk, pod status, backup freshness)
  - Publishes to message broker with `priority: high` for critical alerts
- [ ] **IoT Agent** — *design complete, implementation pending*
  - Sensor allowlist via `ha_sensor_config` Postgres table (regex-based)
  - No fixed rules: GPT-4.1 infers activity label + confidence from sensor snapshot
  - Three layers: webhook (events) + 20-min polling (behavioral_context) + daily cron (Qdrant episodes)
  - Historical bootstrap: 12 months of HA history → daily LLM summaries → Qdrant `episodes`
  - Confidence-gated clarification: ask Martin via Telegram if confidence < 0.6

- [ ] **Newsletter Agent**
  - Separate Gmail label for newsletters (excluded from Daily Digest main flow)

@@ -346,12 +490,10 @@ Imports bank CSV statements (Banca Sella format) into Actual Budget via Telegram

### Phase 3 — Message Broker + Proactive Arbiter
- [ ] Define final message schema (draft above, to be validated)
- [ ] Implement **Proactive Arbiter** n8n workflow:
  - Adaptive schedule (morning briefing, midday, evening recap)
  - Consume `agent_messages` batch → LLM correlation prompt → structured `notify/defer/discard` output
  - High-priority bypass path (direct webhook)
  - All decisions logged to Telegram audit channel
- [ ] Implement **correlation logic**: detect when 2+ agents report related events (e.g. IoT presence + calendar event + open reminder)