feat: Phase 0 bootstrap — Qdrant deploy and PostgreSQL schema
- README.md: ALPHA_PROJECT context, multi-agent architecture, infrastructure stack
- CHANGELOG.md: documents the Qdrant v1.17.0 deploy and the creation of the pompeo database
- db/postgres.sql: DDL schema for the pompeo database (user_profile, memory_facts, finance_documents, behavioral_context, agent_messages) with user_id multi-tenancy
- db/qdrant.sh: script to create/restore the Qdrant collections (episodes, knowledge, preferences) with payload indexes

Design decisions:
- Multi-tenancy via user_id on both Qdrant and PostgreSQL (extensible to new users without infrastructure changes)
- agent_messages as a persistent blackboard for the Proactive Arbiter

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

CHANGELOG.md (new file, 154 lines)

# ALPHA_PROJECT — Changelog

All notable changes to ALPHA_PROJECT are documented here.

---

## [2026-03-21] PostgreSQL — "pompeo" database and ALPHA_PROJECT schema

### Overview

Created the `pompeo` database on the Patroni cluster (namespace `persistence`) and applied the initial schema for Pompeo's structured memory. This is the second milestone of Phase 0 — Infrastructure Bootstrap.

---

### Patroni manifest change

Added `pompeo: martin` to the `databases` section of `infra/cluster/persistence/patroni/postgres.yaml`. The database was created automatically by the Zalando Operator, with no downtime for the other databases.

Idempotent DDL script available at: `alpha/db/postgres.sql`
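
For traceability, the touched manifest fragment looks roughly like this (a sketch: in the Zalando operator's `databases` map each entry is `database: owner`, and the surrounding keys are abbreviated):

```yaml
# infra/cluster/persistence/patroni/postgres.yaml — abbreviated sketch
spec:
  databases:
    pompeo: martin   # new entry: database "pompeo" owned by role "martin"
```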

---

### Design decision — Multi-tenancy in PostgreSQL too

Consistent with the choice made for Qdrant, every table includes the column `user_id TEXT NOT NULL DEFAULT 'martin'`. The values `'martin'` and `'shared'` are seeded into `user_profile` as the system's initial users.

Adding a new user later requires no schema changes — just insert a row into `user_profile` and use the new `user_id` in the INSERTs.
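
As a sketch, onboarding a hypothetical user `anna` would be a single seed row (names and values below are illustrative, not part of the commit):

```sql
-- Hypothetical new user: one profile row, no DDL changes.
INSERT INTO user_profile (user_id, display_name)
VALUES ('anna', 'Anna')
ON CONFLICT (user_id) DO NOTHING;

-- Agents then simply tag her data with the new user_id:
INSERT INTO memory_facts (user_id, source, category, subject)
VALUES ('anna', 'calendar', 'personal', 'Dentist appointment moved');
```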

---

### Design decision — agent_messages as a persistent blackboard

The `agent_messages` table implements the message broker's **blackboard pattern**: each n8n agent inserts its observations with `arbiter_decision = NULL` (pending). The Proactive Arbiter reads the queued messages, decides (`notify` / `defer` / `discard`), and updates `arbiter_decision`, `arbiter_reason`, and `processed_at`.

Compared to using only NATS/Redis as a broker, this approach guarantees a **permanent audit log** of every observation and decision, queryable via SQL for debugging, tuning, and historical analysis.
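
The round-trip can be sketched in SQL (illustrative values; `<id-from-step-2>` is a placeholder for the message picked up in step 2):

```sql
-- 1. An agent publishes an observation; arbiter_decision stays NULL (pending).
INSERT INTO agent_messages (agent, priority, event_type, user_id, subject, detail)
VALUES ('finance', 'high', 'alert', 'martin', 'Invoice overdue',
        '{"amount": 89.90, "due": "2026-03-25"}');

-- 2. The Arbiter drains the pending queue (served by idx_agent_msgs_pending).
SELECT id, agent, subject
  FROM agent_messages
 WHERE user_id = 'martin'
   AND arbiter_decision IS NULL
 ORDER BY priority, created_at;

-- 3. It records its decision, leaving a permanent audit trail.
UPDATE agent_messages
   SET arbiter_decision = 'notify',
       arbiter_reason   = 'high priority, outside quiet hours',
       processed_at     = now()
 WHERE id = '<id-from-step-2>';
```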

---

### Schema created

**5 tables** in the `pompeo` database:

| Table | Role |
|---|---|
| `user_profile` | Static per-user preferences (language, timezone, notification style, quiet hours). Seed: `martin`, `shared` |
| `memory_facts` | Episodic facts produced by all agents, with TTL (`expires_at`) and a reference to the Qdrant point (`qdrant_id`) |
| `finance_documents` | Structured financial documents: utility bills, invoices, payslips. Includes `raw_text` for embedding |
| `behavioral_context` | IoT/behavioral context for the Arbiter: DND, home presence, event type |
| `agent_messages` | Message-broker blackboard — agent observations + Arbiter decisions |

**15 indexes** in total (5 primary-key indexes plus the 10 secondary indexes below):

| Index | Table | Type |
|---|---|---|
| `idx_memory_facts_user_source_cat` | `memory_facts` | `(user_id, source, category)` |
| `idx_memory_facts_expires` | `memory_facts` | `(expires_at)` WHERE NOT NULL |
| `idx_memory_facts_action` | `memory_facts` | `(user_id, action_required)` WHERE true |
| `idx_finance_docs_user_date` | `finance_documents` | `(user_id, doc_date DESC)` |
| `idx_finance_docs_correspondent` | `finance_documents` | `(user_id, correspondent)` |
| `idx_behavioral_ctx_user_time` | `behavioral_context` | `(user_id, start_at, end_at)` |
| `idx_behavioral_ctx_dnd` | `behavioral_context` | `(user_id, do_not_disturb)` WHERE true |
| `idx_agent_msgs_pending` | `agent_messages` | `(user_id, priority, created_at)` WHERE pending |
| `idx_agent_msgs_agent_type` | `agent_messages` | `(agent, event_type, created_at)` |
| `idx_agent_msgs_expires` | `agent_messages` | `(expires_at)` WHERE pending AND NOT NULL |

---

### Phase 0 — Updated status

- [x] ~~Deploy **Qdrant** on the cluster~~ ✅ 2026-03-21
- [x] ~~Qdrant collections with `user_id` multi-tenancy~~ ✅ 2026-03-21
- [x] ~~Qdrant payload indexes~~ ✅ 2026-03-21
- [x] ~~`pompeo` database + PostgreSQL schema~~ ✅ 2026-03-21
- [ ] Verify embedding endpoint via Copilot (`text-embedding-3-small`)
- [ ] Migration to Ollama `nomic-embed-text` (once the LLM server is online)

---

## [2026-03-21] Qdrant — Deploy and collections setup (Phase 0)

### Overview

Completed the deploy of **Qdrant v1.17.0** on the Kubernetes cluster (namespace `persistence`) and the creation of the collections for Pompeo's semantic memory. This is the first milestone of Phase 0 — Infrastructure Bootstrap.

---

### Infrastructure deploy

Qdrant deployed via the official Helm chart (`qdrant/qdrant`) in the `persistence` namespace, consistent with the existing infrastructure pattern (Longhorn storage, Sealed Secrets, Prometheus ServiceMonitor).

**Resources created:**

| Resource | Detail |
|---|---|
| StatefulSet `qdrant` | 1/1 pod Running, image `qdrant/qdrant:v1.17.0` |
| PVC `qdrant-storage-qdrant-0` | 20Gi Longhorn RWO |
| Service `qdrant` | ClusterIP — ports 6333 (REST), 6334 (gRPC), 6335 (p2p) |
| SealedSecret `qdrant-api-secret` | Encrypted API key, namespace `persistence` |
| ServiceMonitor `qdrant` | Prometheus scraping on `:6333/metrics`, label `release: monitoring` |

**Internal endpoint:** `qdrant.persistence.svc.cluster.local:6333`

Manifests in: `infra/cluster/persistence/qdrant/`

---

### Design decision — Multi-tenant collections (Option B)

**Problem**: naming the collections `martin_episodes`, `martin_knowledge`, `martin_preferences` would have locked Pompeo into being exclusively a single-user personal assistant, making it impossible — without a migration — to extend the system to other family members later.

**Choice adopted**: a multi-tenant architecture with 3 shared collections and isolation via a `user_id` field in the payload of every vector point.

```
episodes    ← user_id: "martin" | "shared" | <future users>
knowledge   ← user_id: "martin" | "shared" | <future users>
preferences ← user_id: "martin" | "shared" | <future users>
```

The value `"shared"` is reserved for house/family data visible to all users (e.g. shared calendar, household documents, joint finances). n8n queries use a `should: [user_id=martin, user_id=shared]` filter to retrieve both the personal and the shared context.

**Benefit**: adding a new user tomorrow requires no infrastructure change — just include the new `user_id` in upserts and queries.
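
The personal-plus-shared lookup can be sketched as a plain REST search body (the exact n8n node configuration is an assumption; the filter shape follows Qdrant's `should` clause):

```python
def personal_plus_shared_filter(user_id: str) -> dict:
    """Qdrant filter matching points owned by `user_id` OR by the "shared" tenant."""
    return {
        "should": [
            {"key": "user_id", "match": {"value": user_id}},
            {"key": "user_id", "match": {"value": "shared"}},
        ]
    }

# Illustrative POST /collections/episodes/points/search body.
search_body = {
    "vector": [0.0] * 1536,  # placeholder embedding; the real one comes from the model
    "limit": 5,
    "filter": personal_plus_shared_filter("martin"),
}
```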

---

### Collections created

All 3 collections are operational (status `green`):

| Collection | Content |
|---|---|
| `episodes` | Timestamped episodic facts (email, IoT, calendar, conversations) |
| `knowledge` | Documents, Outline notes, newsletters, knowledge base |
| `preferences` | Per-user preferences, habits and behavioral patterns |

**Common payload schema** (5 indexes on each collection):

| Field | Type | Purpose |
|---|---|---|
| `user_id` | keyword | Multi-tenant filter (`"martin"`, `"shared"`) |
| `source` | keyword | Data origin (`"email"`, `"calendar"`, `"iot"`, `"paperless"`, …) |
| `category` | keyword | Semantic domain (`"finance"`, `"work"`, `"personal"`, …) |
| `date` | datetime | Timestamp of the fact — range-filterable |
| `action_required` | bool | Flag for the Proactive Arbiter |
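
Put together, a single `episodes` point carries this payload next to its vector. A sketch of the REST upsert body (`PUT /collections/episodes/points`), with made-up values:

```python
import uuid

# Illustrative payload for one "episodes" point (values are made up).
payload = {
    "user_id": "martin",             # multi-tenant filter
    "source": "email",               # data origin
    "category": "finance",           # semantic domain
    "date": "2026-03-21T09:15:00Z",  # range-filterable datetime
    "action_required": True,         # flag for the Proactive Arbiter
}

point_id = str(uuid.uuid4())         # also stored as qdrant_id in memory_facts

upsert_body = {
    "points": [
        {
            "id": point_id,
            "vector": [0.0] * 1536,  # placeholder embedding
            "payload": payload,
        }
    ]
}
```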

**Vector size**: 1536 (compatible with `text-embedding-3-small` via GitHub Copilot — bootstrap phase). To be revisited on migration to `nomic-embed-text` on Ollama.

---

### Phase 0 — Status as of the Qdrant deploy

- [x] ~~Deploy **Qdrant** on the cluster~~
- [x] ~~Create collections with `user_id` multi-tenancy~~
- [x] ~~Payload indexes: `user_id`, `source`, `category`, `date`, `action_required`~~
- [x] ~~Run **PostgreSQL migrations** on Patroni~~ ✅ completed in the same session
README.md (41 lines changed)

@@ -42,7 +42,7 @@ Production-grade self-hosted stack. Key components relevant to ALPHA_PROJECT:
 | **n8n** | Primary orchestrator and workflow engine for all agents |
 | **Node-RED** | Event-driven automation, Home Assistant bridge |
 | **Patroni / PostgreSQL** | Persistent structured memory store |
-| **Qdrant** | Vector store for semantic/episodic memory *(to be deployed)* |
+| **Qdrant** | Vector store for semantic/episodic memory — `qdrant.persistence.svc.cluster.local:6333` |
 | **NATS / Redis Streams** | Message broker between agents *(to be chosen and deployed)* |
 | **Authentik** | SSO / IAM (OIDC) |
 | **Home Assistant** | IoT hub — device tracking, automations, sensors |
@@ -160,17 +160,17 @@ CREATE TABLE behavioral_context (
 );
 ```
 
-**2. Semantic memory — Qdrant**
+**2. Semantic memory — Qdrant** — `qdrant.persistence.svc.cluster.local:6333`
 
-Vector embeddings for similarity search. Three collections:
+Vector embeddings for similarity search. Three collections with **multi-tenant design**: isolation via `user_id` payload field (`"martin"`, `"shared"`, future users).
 
 | Collection | Content |
 |---|---|
-| `martin_episodes` | Conversations, episodic facts with timestamp |
-| `martin_knowledge` | Documents, Outline notes, newsletters, knowledge base |
-| `martin_preferences` | Preferences, habits, behavioral patterns |
+| `episodes` | Conversations, episodic facts with timestamp |
+| `knowledge` | Documents, Outline notes, newsletters, knowledge base |
+| `preferences` | Preferences, habits, behavioral patterns |
 
-Each Qdrant point includes a metadata payload for pre-filtering (source, date, category, action_required) to avoid full-scan similarity searches.
+Each Qdrant point includes a metadata payload for pre-filtering (`user_id`, `source`, `date`, `category`, `action_required`) to avoid full-scan similarity searches.
 
 **3. Profile memory — PostgreSQL (static table)**
 
@@ -266,12 +266,15 @@ Notification is sent via **Amazon Echo / Pompeo** (TTS) for voice, and **Telegra
 
 ### Phase 0 — Infrastructure Bootstrap *(prerequisite for everything)*
 
-- [ ] Deploy **Qdrant** on the Kubernetes cluster
-  - Create collections: `martin_episodes`, `martin_knowledge`, `martin_preferences`
-  - Configure payload indexes on: `source`, `category`, `date`, `action_required`
-- [ ] Run **PostgreSQL migrations** on Patroni
-  - Create tables: `memory_facts`, `finance_documents`, `behavioral_context`
-  - Add index on `memory_facts(source, category, expires_at)`
+- [x] ~~Deploy **Qdrant** on the Kubernetes cluster~~ ✅ 2026-03-21
+  - Collections: `episodes`, `knowledge`, `preferences` (multi-tenant via `user_id` payload field)
+  - Payload indexes: `user_id`, `source`, `category`, `date`, `action_required`
+  - Endpoint: `qdrant.persistence.svc.cluster.local:6333`
+- [x] ~~Run **PostgreSQL migrations** on Patroni~~ ✅ 2026-03-21
+  - Database `pompeo` created (Zalando Operator)
+  - Tables: `user_profile`, `memory_facts`, `finance_documents`, `behavioral_context`, `agent_messages`
+  - Multi-tenancy: `user_id` column on every table, seeded with `martin` + `shared`
+  - DDL script: `alpha/db/postgres.sql`
 - [ ] Verify embedding endpoint via Copilot (`text-embedding-3-small`) as bootstrap fallback
 - [ ] Plan migration to local Ollama embedding model once LLM server is online
 
@@ -281,13 +284,13 @@ Notification is sent via **Amazon Echo / Pompeo** (TTS) for voice, and **Telegra
 
 - [ ] **Daily Digest**: after `Parse risposta GPT-4.1`, add:
   - Postgres INSERT into `memory_facts` (source=email, category, subject, detail JSONB, action_required, expires_at)
-  - Embedding generation (Copilot endpoint) → Qdrant upsert into `martin_episodes`
+  - Embedding generation (Copilot endpoint) → Qdrant upsert into `episodes` (user_id=martin)
   - Thread dedup: use `thread_id` as logical key, update existing Qdrant point if thread already exists
 
 - [ ] **Upload Bolletta** + **Upload Documento (Telegram)**: after `Paperless - Patch Metadati`, add:
   - Postgres INSERT into `finance_documents` (correspondent, amount, doc_date, doc_type, tags, paperless_doc_id)
   - Postgres INSERT into `memory_facts` (source=paperless, category=finance, cross-reference)
-  - Embedding of OCR text chunks → Qdrant upsert into `martin_knowledge`
+  - Embedding of OCR text chunks → Qdrant upsert into `knowledge` (user_id=martin)
 
 ---
 
@@ -316,7 +319,7 @@ Notification is sent via **Amazon Echo / Pompeo** (TTS) for voice, and **Telegra
 
 - [ ] **Newsletter Agent**
   - Separate Gmail label for newsletters (excluded from Daily Digest main flow)
-  - Morning cron: summarize + extract relevant articles → `martin_knowledge`
+  - Morning cron: summarize + extract relevant articles → `knowledge`
 
 ---
 
@@ -351,9 +354,9 @@ Notification is sent via **Amazon Echo / Pompeo** (TTS) for voice, and **Telegra
   - PDF → FileWizard OCR → GPT-4.1 metadata extraction (month, gross, net, deductions)
   - Paperless upload with tag `Cedolino`
   - Persist structured data to `finance_documents` (custom fields for payslip)
-  - Trend embedding in `martin_knowledge` for finance agent queries
-- [ ] Behavioral habit modeling: aggregate `behavioral_context` records over time, generate periodic "habit summary" embeddings in `martin_preferences`
-- [ ] Outline → Qdrant pipeline: sync selected Outline documents into `martin_knowledge` on edit/publish event
+  - Trend embedding in `knowledge` for finance agent queries
+- [ ] Behavioral habit modeling: aggregate `behavioral_context` records over time, generate periodic "habit summary" embeddings in `preferences`
+- [ ] Outline → Qdrant pipeline: sync selected Outline documents into `knowledge` on edit/publish event
 - [ ] Chrome browsing history ingestion (privacy-filtered): evaluate browser extension or local export → embedding pipeline for interest/preference modeling
 - [ ] "Posti e persone" graph: structured contact/location model in Postgres, populated from email senders, calendar attendees, Home Assistant presence data
 - [ ] Local embedding model: migrate from Copilot `text-embedding-3-small` to Ollama-served model (e.g. `nomic-embed-text`) once LLM server is stable
db/postgres.sql (new file, 170 lines)

-- =============================================================================
-- ALPHA_PROJECT — Database "pompeo" — Initial schema
-- =============================================================================
-- Apply to: postgresql://martin@postgres.persistence.svc.cluster.local:5432/pompeo
--
-- Run from the cluster:
--   sudo microk8s kubectl run psql-pompeo --rm -it \
--     --image=postgres:17-alpine --namespace=persistence \
--     --env="PGPASSWORD=<password>" --restart=Never \
--     -- psql "postgresql://martin@postgres:5432/pompeo" -f /dev/stdin < postgres.sql
--
-- Run via port-forward:
--   sudo microk8s kubectl port-forward svc/postgres -n persistence 5432:5432
--   psql "postgresql://martin@localhost:5432/pompeo" -f postgres.sql
-- =============================================================================

\c pompeo

-- ---------------------------------------------------------------------------
-- Extensions
-- ---------------------------------------------------------------------------
CREATE EXTENSION IF NOT EXISTS "uuid-ossp"; -- uuid_generate_* helpers (gen_random_uuid() is built-in since PG13)
CREATE EXTENSION IF NOT EXISTS "pg_trgm";   -- full-text similarity search on subject/detail


-- =============================================================================
-- 1. USER_PROFILE
-- Static per-user preferences. Updated manually or via agent action.
-- user_id 'shared' = household preferences (visible to everyone).
-- =============================================================================
CREATE TABLE IF NOT EXISTS user_profile (
    user_id            TEXT PRIMARY KEY,
    display_name       TEXT,
    language           TEXT NOT NULL DEFAULT 'it',
    timezone           TEXT NOT NULL DEFAULT 'Europe/Rome',
    notification_style TEXT NOT NULL DEFAULT 'concise', -- 'concise' | 'verbose'
    quiet_start        TIME NOT NULL DEFAULT '23:00',
    quiet_end          TIME NOT NULL DEFAULT '07:00',
    preferences        JSONB,                           -- freeform: thresholds, per-agent extras
    updated_at         TIMESTAMP NOT NULL DEFAULT now()
);

-- Initial users
INSERT INTO user_profile (user_id, display_name) VALUES
    ('martin', 'Martin'),
    ('shared', 'Shared')
ON CONFLICT (user_id) DO NOTHING;


-- =============================================================================
-- 2. MEMORY_FACTS
-- Episodic facts produced by all agents. TTL via expires_at.
-- qdrant_id: reference to the matching vector point in the "episodes" collection.
-- =============================================================================
CREATE TABLE IF NOT EXISTS memory_facts (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id         TEXT NOT NULL DEFAULT 'martin',
    source          TEXT NOT NULL,  -- 'email' | 'calendar' | 'iot' | 'paperless' | 'n8n' | ...
    category        TEXT,           -- 'finance' | 'personal' | 'work' | 'health' | ...
    subject         TEXT,
    detail          JSONB,          -- flexible per-source payload
    action_required BOOLEAN NOT NULL DEFAULT false,
    action_text     TEXT,
    created_at      TIMESTAMP NOT NULL DEFAULT now(),
    expires_at      TIMESTAMP,      -- NULL = permanent
    qdrant_id       UUID            -- logical FK → "episodes" collection
);

CREATE INDEX IF NOT EXISTS idx_memory_facts_user_source_cat
    ON memory_facts(user_id, source, category);

CREATE INDEX IF NOT EXISTS idx_memory_facts_expires
    ON memory_facts(expires_at)
    WHERE expires_at IS NOT NULL;

CREATE INDEX IF NOT EXISTS idx_memory_facts_action
    ON memory_facts(user_id, action_required)
    WHERE action_required = true;


-- =============================================================================
-- 3. FINANCE_DOCUMENTS
-- Structured financial documents (utility bills, invoices, payslips).
-- paperless_doc_id: reference to the document in Paperless-ngx.
-- =============================================================================
CREATE TABLE IF NOT EXISTS finance_documents (
    id               UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id          TEXT NOT NULL DEFAULT 'martin',
    paperless_doc_id INT,            -- document ID in Paperless-ngx
    correspondent    TEXT,
    amount           NUMERIC(10,2),
    currency         TEXT NOT NULL DEFAULT 'EUR',
    doc_date         DATE,
    doc_type         TEXT,           -- 'bolletta' | 'fattura' | 'cedolino' | ...
    tags             TEXT[],
    raw_text         TEXT,           -- raw OCR text (for embedding)
    created_at       TIMESTAMP NOT NULL DEFAULT now()
);

CREATE INDEX IF NOT EXISTS idx_finance_docs_user_date
    ON finance_documents(user_id, doc_date DESC);

CREATE INDEX IF NOT EXISTS idx_finance_docs_correspondent
    ON finance_documents(user_id, correspondent);


-- =============================================================================
-- 4. BEHAVIORAL_CONTEXT
-- Behavioral context produced by the IoT Agent and the Calendar Agent.
-- Used by the Proactive Arbiter to honor DND and estimate presence.
-- =============================================================================
CREATE TABLE IF NOT EXISTS behavioral_context (
    id                     UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id                TEXT NOT NULL DEFAULT 'martin',
    event_type             TEXT,       -- 'sport_event' | 'dog_walk' | 'work_session' | 'commute' | ...
    start_at               TIMESTAMP,
    end_at                 TIMESTAMP,
    do_not_disturb         BOOLEAN NOT NULL DEFAULT false,
    home_presence_expected BOOLEAN,
    notes                  TEXT,
    created_at             TIMESTAMP NOT NULL DEFAULT now()
);

CREATE INDEX IF NOT EXISTS idx_behavioral_ctx_user_time
    ON behavioral_context(user_id, start_at, end_at);

CREATE INDEX IF NOT EXISTS idx_behavioral_ctx_dnd
    ON behavioral_context(user_id, do_not_disturb)
    WHERE do_not_disturb = true;


-- =============================================================================
-- 5. AGENT_MESSAGES
-- Blackboard: every agent publishes its observations here.
-- The Proactive Arbiter reads, decides (notify/defer/discard) and updates.
-- Matches the message schema defined in alpha/README.md.
-- =============================================================================
CREATE TABLE IF NOT EXISTS agent_messages (
    id               UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    agent            TEXT NOT NULL,   -- 'mail' | 'calendar' | 'iot' | 'finance' | 'infra' | ...
    priority         TEXT NOT NULL,   -- 'low' | 'high'
    event_type       TEXT NOT NULL,   -- 'new_fact' | 'reminder' | 'alert' | 'behavioral_observation'
    user_id          TEXT NOT NULL DEFAULT 'martin',
    subject          TEXT,
    detail           JSONB,
    source_ref       TEXT,            -- Postgres record ID or external ref
    expires_at       TIMESTAMP,
    arbiter_decision TEXT,            -- NULL (pending) | 'notify' | 'defer' | 'discard'
    arbiter_reason   TEXT,
    created_at       TIMESTAMP NOT NULL DEFAULT now(),
    processed_at     TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_agent_msgs_pending
    ON agent_messages(user_id, priority, created_at)
    WHERE arbiter_decision IS NULL;

CREATE INDEX IF NOT EXISTS idx_agent_msgs_agent_type
    ON agent_messages(agent, event_type, created_at);

CREATE INDEX IF NOT EXISTS idx_agent_msgs_expires
    ON agent_messages(expires_at)
    WHERE expires_at IS NOT NULL AND arbiter_decision IS NULL;


-- =============================================================================
-- End of script
-- =============================================================================
\echo '✅ pompeo schema applied successfully.'
\echo '   Tables: user_profile, memory_facts, finance_documents, behavioral_context, agent_messages'
db/qdrant.sh (new file, 91 lines)

#!/usr/bin/env bash
# =============================================================================
# ALPHA_PROJECT — Qdrant — Collections and payload-index setup
# =============================================================================
# Collections already created on 2026-03-21. Script kept for traceability
# and disaster recovery (to be run against an empty Qdrant instance).
#
# Prerequisites:
#   sudo microk8s kubectl port-forward svc/qdrant -n persistence 6333:6333
#
# Usage:
#   QDRANT_API_KEY=<key> bash alpha/db/qdrant.sh
# =============================================================================

set -euo pipefail

QDRANT_URL="${QDRANT_URL:-http://localhost:6333}"
# Never hardcode the key here — it lives in the qdrant-api-secret SealedSecret.
QDRANT_API_KEY="${QDRANT_API_KEY:?QDRANT_API_KEY must be set}"

# Vector size: 1536 = text-embedding-3-small (Copilot, bootstrap phase).
# Update to 768 when migrating to nomic-embed-text on Ollama.
VECTOR_SIZE=1536

header_key="api-key: ${QDRANT_API_KEY}"

echo "==> Connecting to ${QDRANT_URL}"
curl -sf "${QDRANT_URL}/" -H "${header_key}" | grep -o '"version":"[^"]*"'
echo ""

# -----------------------------------------------------------------------------
# Collections
# Multi-tenant architecture: isolation via the user_id payload field.
# user_id values: "martin" | "shared" | <future users>
# -----------------------------------------------------------------------------
for COL in episodes knowledge preferences; do
  echo "==> Creating collection: ${COL}"
  curl -sf -X PUT "${QDRANT_URL}/collections/${COL}" \
    -H "${header_key}" \
    -H "Content-Type: application/json" \
    -d "{
      \"vectors\": { \"size\": ${VECTOR_SIZE}, \"distance\": \"Cosine\" },
      \"optimizers_config\": { \"default_segment_number\": 2 },
      \"replication_factor\": 1
    }" | grep -o '"status":"[^"]*"'
done

echo ""

# -----------------------------------------------------------------------------
# Payload indexes (for efficient pre-filtering before the vector search)
# -----------------------------------------------------------------------------
for COL in episodes knowledge preferences; do
  echo "==> Indexes for collection: ${COL}"

  for FIELD in user_id source category; do
    printf "    %-20s (keyword)  → " "${FIELD}"
    curl -sf -X PUT "${QDRANT_URL}/collections/${COL}/index" \
      -H "${header_key}" \
      -H "Content-Type: application/json" \
      -d "{\"field_name\": \"${FIELD}\", \"field_schema\": \"keyword\"}" \
      | grep -o '"status":"[^"]*"'
  done

  printf "    %-20s (datetime) → " "date"
  curl -sf -X PUT "${QDRANT_URL}/collections/${COL}/index" \
    -H "${header_key}" \
    -H "Content-Type: application/json" \
    -d '{"field_name": "date", "field_schema": "datetime"}' \
    | grep -o '"status":"[^"]*"'

  printf "    %-20s (bool)     → " "action_required"
  curl -sf -X PUT "${QDRANT_URL}/collections/${COL}/index" \
    -H "${header_key}" \
    -H "Content-Type: application/json" \
    -d '{"field_name": "action_required", "field_schema": "bool"}' \
    | grep -o '"status":"[^"]*"'
done

echo ""

# -----------------------------------------------------------------------------
# Final check
# -----------------------------------------------------------------------------
echo "==> Active collections:"
curl -sf "${QDRANT_URL}/collections" -H "${header_key}" \
  | python3 -c "import sys,json; [print('  -', c['name']) for c in json.load(sys.stdin)['result']['collections']]"

echo ""
echo "✅ Qdrant setup complete."
echo "   Collections: episodes, knowledge, preferences"
echo "   Payload indexes: user_id, source, category, date, action_required"