# ALPHA_PROJECT — System Context for GitHub Copilot
## Who I am
I am a Cloud Architect with 6 years of Big 4 consulting experience (currently a manager at EY), working daily on Azure, Dynamics 365 / Dataverse, .NET, and enterprise integration patterns. I run a production-grade Kubernetes homelab and am building ALPHA_PROJECT as a personal initiative in my spare time (evenings and weekends).
---
## What ALPHA_PROJECT is
ALPHA_PROJECT is a **proactive, multi-agent personal AI assistant** built entirely on self-hosted infrastructure. It is not a chatbot. It is an autonomous system that:
- Monitors my digital life (email, calendar, home automation, finances, infrastructure)
- Maintains a persistent, structured memory of facts, habits, and preferences
- Takes initiative to notify me of relevant events, correlations, and pending actions
- Interacts with me via voice (Amazon Echo / Alexa custom skill named **"Pompeo"**) and Telegram
- Runs local LLMs on dedicated hardware — no cloud AI inference (except GitHub Copilot completions, available via EY license at zero cost)
The assistant is named **Pompeo** (the Alexa skill wake word).
---
## Infrastructure
### LLM Server (new, dedicated node — outside the Kubernetes cluster)
- **CPU**: AMD Ryzen 5 4500
- **RAM**: 16 GB DDR4
- **GPU**: NVIDIA GeForce RTX 3060 (12 GB VRAM)
- **Runtime**: Ollama (API-compatible with OpenAI)
- **Primary model**: Qwen2.5-14B-Instruct Q4_K_M (fits entirely in VRAM, no offload)
- **Secondary model**: Qwen2.5-Coder-14B-Instruct Q4_K_M (for code-related tasks)
- **Embedding model**: TBD — to be served via Ollama (e.g. `nomic-embed-text`)
- **Constraint**: zero RAM offload — all models must fit entirely in 12 GB VRAM
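A minimal sketch of a chat call against this node once it is online, assuming Ollama's default port (11434) and a hypothetical hostname `llm.mt-home.uk`; the exact model tag is also an assumption:

```python
import requests

# Hypothetical hostname for the LLM node; Ollama exposes an
# OpenAI-compatible API under /v1 on its default port 11434.
OLLAMA_URL = "http://llm.mt-home.uk:11434/v1/chat/completions"

resp = requests.post(
    OLLAMA_URL,
    json={
        "model": "qwen2.5:14b-instruct-q4_K_M",  # assumed Ollama tag
        "messages": [
            {"role": "system", "content": "You are Pompeo, a proactive personal assistant."},
            {"role": "user", "content": "Summarize today's pending actions."},
        ],
        "temperature": 0.2,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```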
### Kubernetes Homelab Cluster
Production-grade self-hosted stack. Key components relevant to ALPHA_PROJECT:
| Component | Role |
|---|---|
| **n8n** | Primary orchestrator and workflow engine for all agents |
| **Node-RED** | Event-driven automation, Home Assistant bridge |
| **Patroni / PostgreSQL** | Persistent structured memory store |
| **Qdrant** | Vector store for semantic/episodic memory *(to be deployed)* |
| **NATS / Redis Streams** | Message broker between agents *(to be chosen and deployed)* |
| **Authentik** | SSO / IAM (OIDC) |
| **Home Assistant** | IoT hub — device tracking, automations, sensors |
| **MikroTik** | Network — VLANs, firewall rules, device presence detection |
| **Paperless-ngx** | Document archive (`docs.mt-home.uk`) |
| **Actual Budget** | Personal finance |
| **Mealie** | Meal planning / recipes |
| **Immich** | Photo library |
| **Outline** | Internal wiki / knowledge base |
| **Radarr / Sonarr** | Media management |
| **Jenkins** | CI/CD |
| **AdGuard** | DNS filtering |
| **WireGuard** | VPN |
| **Minio** | S3-compatible object storage |
| **Longhorn** | Distributed block storage |
| **Velero** | Disaster recovery / backup |
### External Services (in use)
- **Gmail** — primary email
- **Google Calendar** — calendar (multiple calendars: Work, Family, Formula 1, WEC, Inter, Birthdays, Tasks, Pulizie, Spazzatura, Festività Italia, Varie)
- **Amazon Echo** — voice interface for Pompeo
- **AWS Lambda** — bridge between Alexa skill and n8n webhook
- **Telegram** — notifications, logging, manual document upload
- **GitHub Copilot** (GPT-4.1 via `api.githubcopilot.com`) — LLM completions at zero cost (EY license)
### Internal Services / Custom
- `orchestrator.mt-home.uk` — n8n instance
- `docs.mt-home.uk` — Paperless-ngx
- `filewizard.home.svc.cluster.local:8000` — custom OCR microservice (async, job-based API)
---
## Architecture Overview
### Multi-Agent Design
ALPHA_PROJECT uses specialized agents, each responsible for a specific data domain. All agents are implemented as **n8n workflows**.
| Agent | Trigger | Responsibility |
|---|---|---|
| **Mail Agent** | Cron every 15-30 min | Read Gmail, classify emails, extract facts, detect invoices/bills |
| **Finance Agent** | Triggered by Mail Agent or Telegram | Process PDF invoices/bills, archive to Paperless, persist to memory |
| **Calendar Agent** | Cron + on-demand | Read Google Calendar, detect upcoming events, cross-reference with other agents |
| **Infrastructure Agent** | Cron + alert webhooks | Monitor Kubernetes cluster health, disk usage, failed jobs |
| **IoT Agent** | Event-driven (Home Assistant webhooks) | Monitor device presence, home state, learn behavioral patterns |
| **Newsletter Agent** | Morning cron | Digest newsletters, extract relevant articles |
| **Proactive Arbiter** | Cron (adaptive frequency) + high-priority queue messages | Consume agent outputs, correlate, decide what to notify |
### Message Broker (Blackboard Pattern)
Agents do not call each other directly. They publish observations to a **central message queue** (NATS JetStream or Redis Streams — TBD). The **Proactive Arbiter** consumes the queue, batches low-priority messages, and immediately processes high-priority ones.
Message schema (all agents must conform):
```json
{
  "agent": "mail",
  "priority": "low|high",
  "event_type": "new_fact|reminder|alert|behavioral_observation",
  "subject": "brief description",
  "detail": {},
  "source_ref": "optional reference to postgres record or external ID",
  "timestamp": "ISO8601",
  "expires_at": "ISO8601 or null"
}
```
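The broker is still TBD, so here is a minimal publish-side sketch assuming Redis Streams is chosen; the stream name `agent_events` and the Redis service address are assumptions:

```python
import json
from datetime import datetime, timezone

import redis

# Assumed in-cluster Redis address and stream name (broker choice is TBD).
r = redis.Redis(host="redis.home.svc.cluster.local", port=6379)

message = {
    "agent": "mail",
    "priority": "high",
    "event_type": "alert",
    "subject": "Invoice due in 2 days",
    "detail": {"amount": 89.90, "correspondent": "Enel"},
    "source_ref": None,
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "expires_at": None,
}

# Stream entries are flat field/value maps, so the schema-conforming
# message travels as a single JSON-encoded field.
r.xadd("agent_events", {"payload": json.dumps(message)})
```

The same payload shape would map one-to-one onto a NATS JetStream subject if that option wins.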
### Memory Architecture
Three layers of persistence:
**1. Structured memory — PostgreSQL (Patroni)**
Episodic facts, finance records, reminders, behavioral observations. Fast, queryable, expirable.
```sql
-- Generic episodic facts
CREATE TABLE memory_facts (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    source          TEXT NOT NULL,        -- 'email', 'calendar', 'iot', 'paperless', ...
    category        TEXT,                 -- 'finance', 'personal', 'work', 'health', ...
    subject         TEXT,
    detail          JSONB,                -- flexible per-source payload
    action_required BOOLEAN DEFAULT false,
    action_text     TEXT,
    created_at      TIMESTAMP DEFAULT now(),
    expires_at      TIMESTAMP,            -- facts have a TTL
    qdrant_id       UUID                  -- FK to vector store
);

-- Finance documents (frequent structured queries)
CREATE TABLE finance_documents (
    id               UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    paperless_doc_id INT,
    correspondent    TEXT,
    amount           NUMERIC(10,2),
    currency         TEXT DEFAULT 'EUR',
    doc_date         DATE,
    doc_type         TEXT,
    tags             TEXT[],
    created_at       TIMESTAMP DEFAULT now()
);

-- Behavioral context (used by IoT agent and Arbiter)
CREATE TABLE behavioral_context (
    id                     UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    event_type             TEXT,          -- 'sport_event', 'dog_walk', 'work_session', ...
    start_at               TIMESTAMP,
    end_at                 TIMESTAMP,
    do_not_disturb         BOOLEAN DEFAULT false,
    home_presence_expected BOOLEAN,
    notes                  TEXT
);
```
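A sketch of how an agent might write an expirable fact into `memory_facts`, assuming `psycopg2` and a hypothetical in-cluster connection:

```python
from datetime import datetime, timedelta

import psycopg2
from psycopg2.extras import Json

# Connection parameters are assumptions (hypothetical Patroni service name).
conn = psycopg2.connect(
    host="patroni.home.svc.cluster.local",
    dbname="alpha",
    user="alpha",
    password="...",
)

with conn, conn.cursor() as cur:  # commits on clean exit
    cur.execute(
        """
        INSERT INTO memory_facts
            (source, category, subject, detail, action_required, expires_at)
        VALUES (%s, %s, %s, %s, %s, %s)
        """,
        (
            "email",
            "finance",
            "Enel bill received",
            Json({"amount": 89.90, "due_date": "2025-07-01"}),  # JSONB payload
            True,
            datetime.utcnow() + timedelta(days=30),  # facts have a TTL
        ),
    )
```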
**2. Semantic memory — Qdrant**
Vector embeddings for similarity search. Three collections:
| Collection | Content |
|---|---|
| `martin_episodes` | Conversations, episodic facts with timestamp |
| `martin_knowledge` | Documents, Outline notes, newsletters, knowledge base |
| `martin_preferences` | Preferences, habits, behavioral patterns |
Each Qdrant point includes a metadata payload for pre-filtering (source, date, category, action_required) to avoid full-scan similarity searches.
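A sketch of that upsert-plus-prefiltered-search pattern with `qdrant-client`, assuming the in-cluster service name and 768-dimensional vectors (the output size of `nomic-embed-text`):

```python
import uuid

from qdrant_client import QdrantClient
from qdrant_client.models import (
    FieldCondition, Filter, MatchValue, PointStruct,
)

# Assumed in-cluster service name (Qdrant is not deployed yet).
client = QdrantClient(url="http://qdrant.home.svc.cluster.local:6333")

# Upsert one episode carrying the full metadata payload used for pre-filtering.
client.upsert(
    collection_name="martin_episodes",
    points=[
        PointStruct(
            id=str(uuid.uuid4()),
            vector=[0.0] * 768,  # placeholder; the real vector comes from the embedder
            payload={
                "source": "email",
                "category": "finance",
                "date": "2025-06-15",
                "action_required": True,
            },
        )
    ],
)

# Filter on payload first, then score similarity only inside the subset.
hits = client.search(
    collection_name="martin_episodes",
    query_vector=[0.0] * 768,
    query_filter=Filter(
        must=[FieldCondition(key="source", match=MatchValue(value="email"))]
    ),
    limit=5,
)
```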
**3. Profile memory — PostgreSQL (static table)**
User preferences, fixed facts, communication style. Updated manually or via explicit agent action.
### Embedding Strategy
- Embeddings are generated via Ollama (`nomic-embed-text` or equivalent) once the LLM server is online
- During the bootstrap phase: embeddings are generated via GitHub Copilot (`text-embedding-3-small` at `api.githubcopilot.com/embeddings`) — same token acquisition pattern already in use
- Never embed raw content — always embed **LLM-generated summaries + extracted entities**
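A minimal sketch of the target embedding call against Ollama's native `/api/embeddings` endpoint (the hostname is an assumption); note the input is the summary plus entities, never the raw content:

```python
import requests

# What gets embedded: an LLM-generated summary + extracted entities,
# never the raw email/document body.
summary = "Enel bill of 89.90 EUR due 2025-07-01; entities: Enel, electricity"

resp = requests.post(
    "http://llm.mt-home.uk:11434/api/embeddings",  # hypothetical hostname
    json={"model": "nomic-embed-text", "prompt": summary},
    timeout=60,
)
resp.raise_for_status()
vector = resp.json()["embedding"]  # list[float], 768 dimensions
```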
### Proactive Notification Logic
The Arbiter runs on an **adaptive schedule**:
| Time slot | Frequency | Behavior |
|---|---|---|
| 23:00–07:00 | Never | Silence |
| 07:00–09:00 | Once | Morning briefing (calendar, reminders, pending actions) |
| 09:00–19:00 | Every 2-3h | Only high-priority or correlated events |
| 19:00–22:00 | Once | Evening recap + next day preview |
High-priority queue messages bypass the schedule and trigger immediate notification.
Notification is sent via **Amazon Echo / Pompeo** (TTS) for voice, and **Telegram** for logging. Every Arbiter decision (notify / discard / defer) is logged to a dedicated Telegram audit channel.
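A sketch of the schedule as a pure function (the 22:00–23:00 gap in the table is assumed quiet, and the once-per-slot dedup is left out):

```python
from datetime import datetime, time

def arbiter_action(now: datetime, priority: str) -> str:
    """Map current time and message priority to an Arbiter behavior."""
    if priority == "high":
        return "notify_now"        # high priority bypasses the schedule
    t = now.time()
    if t >= time(23, 0) or t < time(7, 0):
        return "silence"
    if t < time(9, 0):
        return "morning_briefing"
    if t < time(19, 0):
        return "batch_every_2_3h"  # only high-priority/correlated events
    if t < time(22, 0):
        return "evening_recap"
    return "silence"               # 22:00-23:00 not in the table; assumed quiet
```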
### Voice Interface (Pompeo)
- Amazon Echo → **Alexa Custom Skill** → **AWS Lambda** (bridge) → **n8n webhook** → Ollama (Qwen2.5-14B) → TTS response back to Echo
- Wake phrase: "Pompeo"
- Lambda is intentionally thin — it only translates the Alexa request format to the n8n webhook payload and returns the TTS response
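A sketch of that thin bridge, assuming a `query` slot in the skill's interaction model and a hypothetical webhook path; Alexa expects a reply within roughly 8 seconds, so the timeout stays tight:

```python
import json
import urllib.request

# Hypothetical n8n webhook path for the Pompeo skill.
N8N_WEBHOOK = "https://orchestrator.mt-home.uk/webhook/pompeo"

def lambda_handler(event, context):
    # Pull the transcribed utterance from the Alexa IntentRequest;
    # the slot name "query" is an assumption about the interaction model.
    utterance = (
        event.get("request", {})
        .get("intent", {})
        .get("slots", {})
        .get("query", {})
        .get("value", "")
    )

    req = urllib.request.Request(
        N8N_WEBHOOK,
        data=json.dumps({"text": utterance}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=7) as resp:
        answer = json.load(resp).get("reply", "")  # "reply" field is an assumption

    # Standard Alexa response envelope carrying the plain-text TTS string.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": answer},
            "shouldEndSession": True,
        },
    }
```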
---
## Existing n8n Workflows (already in production)
### 📬 Gmail — Daily Digest [Schedule] (`1lIKvVJQIcva30YM`)
- Runs every 3 hours (+ test webhook)
- Fetches unread emails from the last 3 hours
- Calls GPT-4.1 (via Copilot) to classify each email: category, sentiment, labels, action_required, whether it has a Paperless-relevant PDF attachment
- Applies Gmail labels, marks as read, trashes spam
- If a bill/invoice PDF is detected → triggers the **Upload Bolletta** webhook
- Sends a digest report to Telegram
### 📄 Paperless — Upload Bolletta [Email] (`vbzQ3fgUalOPdcOq`)
- Triggered by webhook from Daily Digest (payload includes `email_id`)
- Downloads the PDF attachment from Gmail API
- Fetches Paperless metadata (correspondents, document types, tags, storage paths, similar existing documents)
- Calls GPT-4.1 to infer Paperless metadata (correspondent, doc type, tags, storage path, filename, date)
- Uploads PDF to Paperless, polls task status, patches metadata on the created document
- Sends Telegram confirmation
### 📄 Paperless — Upload Documento [Telegram] (`ZX5rLSETg6Xcymps`)
- Triggered by Telegram bot (user sends a PDF with caption starting with "Documento")
- Downloads file from Telegram
- Sends to FileWizard OCR microservice (async job), polls for result
- Same GPT-4.1 metadata inference pipeline as above
- Uploads to Paperless (filename = original filename without extension), patches metadata
- Sends Telegram confirmation with link to document
- Cleans up FileWizard: deletes processed files, then clears job history
**Common pattern across all three**: GitHub Copilot token is obtained fresh at each run (`GET https://api.github.com/copilot_internal/v2/token`), then used for `POST https://api.githubcopilot.com/chat/completions` with model `gpt-4.1`.
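A sketch of that two-step pattern outside n8n, assuming the underlying GitHub OAuth token is available as an environment variable:

```python
import os

import requests

# Step 1: exchange the long-lived GitHub OAuth token for a short-lived
# Copilot token; fetched fresh on every run, never cached.
tok = requests.get(
    "https://api.github.com/copilot_internal/v2/token",
    headers={"Authorization": f"token {os.environ['GH_OAUTH_TOKEN']}"},
    timeout=30,
)
tok.raise_for_status()
copilot_token = tok.json()["token"]

# Step 2: use it for one chat completion against the Copilot API.
resp = requests.post(
    "https://api.githubcopilot.com/chat/completions",
    headers={"Authorization": f"Bearer {copilot_token}"},
    json={
        "model": "gpt-4.1",
        "messages": [{"role": "user", "content": "Classify this email: ..."}],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```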
### n8n Credentials (IDs)
| ID | Name | Type |
|---|---|---|
| `qvOikS6IF0H5khr8` | Gmail OAuth2 | OAuth2 |
| `uTXHLqcCJxbOvqN3` | Telegram account | Telegram API |
| `vBwUxlzKrX3oDHyN` | GitHub Copilot OAuth Token | HTTP Header Auth |
| `uvGjLbrN5yQTQIzv` | Paperless-NGX API | HTTP Header Auth |
---
## Coding Conventions
- **n8n workflows**: nodes named in Italian, descriptive emoji prefixes on trigger nodes
- **Workflow naming**: `{icon} {App} — {Azione} {Tipo} [{Sorgente}]` (e.g. `📄 Paperless — Upload Documento [Telegram]`)
- **HTTP nodes**: always use `predefinedCredentialType` for authenticated services already configured in n8n credentials
- **GPT body**: use `contentType: "raw"` + `rawContentType: "application/json"` + `JSON.stringify({...})` inline expression — never `specifyBody: string`
- **LLM output parsing**: always defensive — handle missing `choices`, malformed JSON, empty responses gracefully (see the sketch after this list)
- **Copilot token**: always fetched fresh per workflow run, never cached across executions
- **Binary fields**: Telegram node `file.get` with `download: true` stores binary in field named `data` (not `attachment`)
- **Postgres**: use UUID primary keys with `gen_random_uuid()`, JSONB for flexible payloads, always include `created_at`
- **Qdrant upsert**: always include full metadata payload for filtering; use `message_id` / `thread_id` / `doc_id` as logical dedup keys
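A sketch of what "defensive" means for the chat-completions response shape (standalone Python for readability; the same logic applies inside an n8n Code node):

```python
import json
from typing import Optional

def parse_llm_json(response: dict) -> Optional[dict]:
    """Extract a JSON object from a chat-completions response without raising.

    Returns None on missing choices, empty content, or malformed JSON,
    so the workflow can branch gracefully instead of failing.
    """
    choices = response.get("choices") or []
    if not choices:
        return None
    content = ((choices[0].get("message") or {}).get("content") or "").strip()
    # Models sometimes wrap JSON in a markdown fence; strip it if present.
    if content.startswith("```"):
        content = content.strip("`").removeprefix("json").strip()
    try:
        parsed = json.loads(content)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None
```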
---
## TO-DO
### Phase 0 — Infrastructure Bootstrap *(prerequisite for everything)*
- [ ] Deploy **Qdrant** on the Kubernetes cluster
- Create collections: `martin_episodes`, `martin_knowledge`, `martin_preferences`
- Configure payload indexes on: `source`, `category`, `date`, `action_required` *(see the sketch below)*
- [ ] Run **PostgreSQL migrations** on Patroni
- Create tables: `memory_facts`, `finance_documents`, `behavioral_context`
- Add index on `memory_facts(source, category, expires_at)`
- [ ] Verify embedding endpoint via Copilot (`text-embedding-3-small`) as bootstrap fallback
- [ ] Plan migration to local Ollama embedding model once LLM server is online
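A sketch of the Qdrant bootstrap with `qdrant-client` (the service name is an assumption; 768 dims matches `nomic-embed-text`, while `text-embedding-3-small` is natively 1536-dim, so the size must match whichever embedder is actually used):

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PayloadSchemaType, VectorParams

# Assumed in-cluster service name.
client = QdrantClient(url="http://qdrant.home.svc.cluster.local:6333")

for name in ("martin_episodes", "martin_knowledge", "martin_preferences"):
    client.create_collection(
        collection_name=name,
        vectors_config=VectorParams(size=768, distance=Distance.COSINE),
    )
    # Payload indexes enable the pre-filtering described in the memory section.
    for field, schema in [
        ("source", PayloadSchemaType.KEYWORD),
        ("category", PayloadSchemaType.KEYWORD),
        ("date", PayloadSchemaType.KEYWORD),
        ("action_required", PayloadSchemaType.BOOL),
    ]:
        client.create_payload_index(
            collection_name=name, field_name=field, field_schema=schema
        )
```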
---
### Phase 1 — Memory Integration into Existing Workflows
- [ ] **Daily Digest**: after `Parse risposta GPT-4.1`, add:
- Postgres INSERT into `memory_facts` (source=email, category, subject, detail JSONB, action_required, expires_at)
- Embedding generation (Copilot endpoint) → Qdrant upsert into `martin_episodes`
- Thread dedup: use `thread_id` as logical key, update existing Qdrant point if thread already exists *(see the sketch below)*
- [ ] **Upload Bolletta** + **Upload Documento (Telegram)**: after `Paperless - Patch Metadati`, add:
- Postgres INSERT into `finance_documents` (correspondent, amount, doc_date, doc_type, tags, paperless_doc_id)
- Postgres INSERT into `memory_facts` (source=paperless, category=finance, cross-reference)
- Embedding of OCR text chunks → Qdrant upsert into `martin_knowledge`
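A sketch of the thread dedup key: deriving the Qdrant point ID deterministically from `thread_id` makes the upsert idempotent, so re-embedding an existing thread overwrites its point instead of duplicating it:

```python
import uuid

def thread_point_id(thread_id: str) -> str:
    # uuid5 is deterministic: the same thread_id always yields the same
    # point ID, so a Qdrant upsert updates rather than duplicates.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"gmail-thread:{thread_id}"))

# Every Daily Digest run produces the same ID for the same thread:
assert thread_point_id("18c2a9f0b1") == thread_point_id("18c2a9f0b1")
```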
---
### Phase 2 — New Agents
- [ ] **Calendar Agent**
- Poll Google Calendar (all relevant calendars)
- Persist upcoming events to Postgres (`memory_facts` + `behavioral_context` for leisure events)
- Weekly cluster embedding (chunk per week, not per event)
- Dedup recurring events: embed only first occurrence, store rest in Postgres only
- [ ] **Finance Agent** (extend beyond Paperless)
- Read Actual Budget export or API
- Persist transactions, monthly summaries to `finance_documents`
- Trend analysis prompt for periodic financial summary
- [ ] **Infrastructure Agent**
- Webhook receiver for Kubernetes/Longhorn/Minio alerts
- Cron-based cluster health check (disk, pod status, backup freshness)
- Publishes to message broker with `priority: high` for critical alerts
- [ ] **IoT Agent**
- Home Assistant webhook → Node-RED → n8n
- Device presence tracking → `behavioral_context`
- Pattern recognition via Qdrant similarity on historical episodes (e.g. "Tuesday evening, outside, laptop on")
- [ ] **Newsletter Agent**
- Separate Gmail label for newsletters (excluded from Daily Digest main flow)
- Morning cron: summarize + extract relevant articles → `martin_knowledge`
---
### Phase 3 — Message Broker + Proactive Arbiter
- [ ] Choose and deploy broker: **NATS JetStream** (preferred — lightweight, Kubernetes-native) or Redis Streams
- [ ] Define final message schema (draft above, to be validated)
- [ ] Implement **Proactive Arbiter** n8n workflow:
- Adaptive schedule (morning briefing, midday, evening recap)
- Consume queue batch → LLM correlation prompt → structured `notify/defer/discard` output
- High-priority bypass path
- All decisions logged to Telegram audit channel
- [ ] Implement **correlation logic**: detect when 2+ agents report related events (e.g. IoT presence + calendar event + open reminder)
---
### Phase 4 — Voice Interface (Pompeo)
- [ ] Create Alexa Custom Skill ("Pompeo")
- [ ] AWS Lambda bridge (thin translator: Alexa request → n8n webhook → TTS response)
- [ ] n8n webhook handler: receive transcribed text → prepend memory context → Ollama inference → return TTS string
- [ ] TTS response pipeline back to Echo
- [ ] Proactive push: Arbiter → Lambda → Echo notification (Alexa proactive events API)
---
### Phase 5 — Generalization and Backlog
- [ ] **OCR on email attachments in Daily Digest**: generalize the ingest pipeline to extract text from any PDF attachment (not just bills), using FileWizard OCR — produce richer embeddings and enable full-text retrieval on any emailed document
- [ ] **Flusso Cedolino** (payslip pipeline):
- Trigger: Gmail label `Lavoro/Cedolino` or Telegram upload
- PDF → FileWizard OCR → GPT-4.1 metadata extraction (month, gross, net, deductions)
- Paperless upload with tag `Cedolino`
- Persist structured data to `finance_documents` (custom fields for payslip)
- Trend embedding in `martin_knowledge` for finance agent queries
- [ ] Behavioral habit modeling: aggregate `behavioral_context` records over time, generate periodic "habit summary" embeddings in `martin_preferences`
- [ ] Outline → Qdrant pipeline: sync selected Outline documents into `martin_knowledge` on edit/publish event
- [ ] Chrome browsing history ingestion (privacy-filtered): evaluate browser extension or local export → embedding pipeline for interest/preference modeling
- [ ] "Posti e persone" graph: structured contact/location model in Postgres, populated from email senders, calendar attendees, Home Assistant presence data
- [ ] Local embedding model: migrate from Copilot `text-embedding-3-small` to Ollama-served model (e.g. `nomic-embed-text`) once LLM server is stable