feat: Phase 0 bootstrap — Qdrant deploy and PostgreSQL schema
- README.md: ALPHA_PROJECT context, multi-agent architecture, infrastructure stack
- CHANGELOG.md: documents the Qdrant v1.17.0 deploy and the creation of the pompeo database
- db/postgres.sql: DDL schema for the pompeo database (user_profile, memory_facts, finance_documents, behavioral_context, agent_messages) with user_id multi-tenancy
- db/qdrant.sh: script to create/restore the Qdrant collections (episodes, knowledge, preferences) with payload indexes

Design decisions:
- Multi-tenancy via user_id on both Qdrant and PostgreSQL (extensible to new users without infrastructure changes)
- agent_messages as a persistent blackboard for the Proactive Arbiter

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
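The collection bootstrap performed by `db/qdrant.sh` can be sketched as follows — a minimal Python stand-in for the shell script (the vector size 1536 for `text-embedding-3-small`, Cosine distance, and the uniform `keyword` index schema are assumptions, not taken from the actual script):

```python
QDRANT_URL = "http://qdrant.persistence.svc.cluster.local:6333"
COLLECTIONS = ["episodes", "knowledge", "preferences"]
PAYLOAD_INDEXES = ["user_id", "source", "category", "date", "action_required"]

def create_collection_body(vector_size: int = 1536) -> dict:
    # 1536 dims matches text-embedding-3-small; Cosine is an assumed default.
    return {"vectors": {"size": vector_size, "distance": "Cosine"}}

def payload_index_body(field: str) -> dict:
    # Keyword index enables exact-match pre-filtering on this payload field.
    # (A date/bool schema might suit `date`/`action_required` better; keyword
    # is used uniformly here for simplicity.)
    return {"field_name": field, "field_schema": "keyword"}

def bootstrap_requests() -> list[tuple[str, dict]]:
    """(url, json-body) pairs the script would PUT against the Qdrant API."""
    reqs = []
    for coll in COLLECTIONS:
        reqs.append((f"{QDRANT_URL}/collections/{coll}", create_collection_body()))
        for field in PAYLOAD_INDEXES:
            reqs.append((f"{QDRANT_URL}/collections/{coll}/index",
                         payload_index_body(field)))
    return reqs
```

Three collections times one create plus five index requests each — 18 PUTs total, idempotent enough to double as the "restore" path.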
README.md (41 lines changed)
@@ -42,7 +42,7 @@ Production-grade self-hosted stack. Key components relevant to ALPHA_PROJECT:
 | **n8n** | Primary orchestrator and workflow engine for all agents |
 | **Node-RED** | Event-driven automation, Home Assistant bridge |
 | **Patroni / PostgreSQL** | Persistent structured memory store |
-| **Qdrant** | Vector store for semantic/episodic memory *(to be deployed)* |
+| **Qdrant** | Vector store for semantic/episodic memory — `qdrant.persistence.svc.cluster.local:6333` |
 | **NATS / Redis Streams** | Message broker between agents *(to be chosen and deployed)* |
 | **Authentik** | SSO / IAM (OIDC) |
 | **Home Assistant** | IoT hub — device tracking, automations, sensors |
@@ -160,17 +160,17 @@ CREATE TABLE behavioral_context (
 );
 ```
 
-**2. Semantic memory — Qdrant**
+**2. Semantic memory — Qdrant** — `qdrant.persistence.svc.cluster.local:6333`
 
-Vector embeddings for similarity search. Three collections:
+Vector embeddings for similarity search. Three collections with **multi-tenant design**: isolation via `user_id` payload field (`"martin"`, `"shared"`, future users).
 
 | Collection | Content |
 |---|---|
-| `martin_episodes` | Conversations, episodic facts with timestamp |
-| `martin_knowledge` | Documents, Outline notes, newsletters, knowledge base |
-| `martin_preferences` | Preferences, habits, behavioral patterns |
+| `episodes` | Conversations, episodic facts with timestamp |
+| `knowledge` | Documents, Outline notes, newsletters, knowledge base |
+| `preferences` | Preferences, habits, behavioral patterns |
 
-Each Qdrant point includes a metadata payload for pre-filtering (source, date, category, action_required) to avoid full-scan similarity searches.
+Each Qdrant point includes a metadata payload for pre-filtering (`user_id`, `source`, `date`, `category`, `action_required`) to avoid full-scan similarity searches.
 
 **3. Profile memory — PostgreSQL (static table)**
 
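The multi-tenant pre-filtering described in the hunk above could look like this: a Qdrant search request body that restricts similarity search to one tenant's points via the indexed `user_id` payload field (a sketch against the Qdrant REST search API; the vector values and limit are placeholders):

```python
def tenant_search_body(query_vector: list[float], user_id: str,
                       limit: int = 5) -> dict:
    # Qdrant applies the filter before scoring, so the indexed user_id
    # field avoids a full-scan similarity search across tenants.
    return {
        "vector": query_vector,
        "filter": {"must": [{"key": "user_id", "match": {"value": user_id}}]},
        "limit": limit,
        "with_payload": True,
    }

# Would be POSTed to <qdrant endpoint>/collections/episodes/points/search
body = tenant_search_body([0.1, 0.2, 0.3], "martin")
```

A query for a future second user only changes the `user_id` value; no collection or schema changes are needed, which is the point of the payload-based design.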
@@ -266,12 +266,15 @@ Notification is sent via **Amazon Echo / Pompeo** (TTS) for voice, and **Telegra
 
 ### Phase 0 — Infrastructure Bootstrap *(prerequisite for everything)*
 
-- [ ] Deploy **Qdrant** on the Kubernetes cluster
-  - Create collections: `martin_episodes`, `martin_knowledge`, `martin_preferences`
-  - Configure payload indexes on: `source`, `category`, `date`, `action_required`
-- [ ] Run **PostgreSQL migrations** on Patroni
-  - Create tables: `memory_facts`, `finance_documents`, `behavioral_context`
-  - Add index on `memory_facts(source, category, expires_at)`
+- [x] ~~Deploy **Qdrant** on the Kubernetes cluster~~ ✅ 2026-03-21
+  - Collections: `episodes`, `knowledge`, `preferences` (multi-tenant via `user_id` payload field)
+  - Payload indexes: `user_id`, `source`, `category`, `date`, `action_required`
+  - Endpoint: `qdrant.persistence.svc.cluster.local:6333`
+- [x] ~~Run **PostgreSQL migrations** on Patroni~~ ✅ 2026-03-21
+  - Database `pompeo` created (Zalando Operator)
+  - Tables: `user_profile`, `memory_facts`, `finance_documents`, `behavioral_context`, `agent_messages`
+  - Multi-tenancy: `user_id` column on every table, seeded with `martin` + `shared`
+  - DDL script: `alpha/db/postgres.sql`
 - [ ] Verify embedding endpoint via Copilot (`text-embedding-3-small`) as bootstrap fallback
 - [ ] Plan migration to local Ollama embedding model once LLM server is online
 
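The multi-tenant table layout from the checklist can be illustrated with a small stand-in — SQLite here instead of Patroni/PostgreSQL, and a reduced `memory_facts` column set inferred from the index mentioned above (the full schema lives in `alpha/db/postgres.sql`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE memory_facts (
        id INTEGER PRIMARY KEY,
        user_id TEXT NOT NULL,   -- tenant isolation: 'martin', 'shared', future users
        source TEXT,
        category TEXT,
        detail TEXT,             -- JSONB in PostgreSQL; plain TEXT in this sketch
        expires_at TEXT
    )
""")
# Composite index matching the one called out in the migration checklist.
conn.execute("CREATE INDEX idx_mf ON memory_facts(source, category, expires_at)")

conn.executemany(
    "INSERT INTO memory_facts (user_id, source, category, detail) VALUES (?, ?, ?, ?)",
    [("martin", "email", "finance", '{"subject": "invoice"}'),
     ("shared", "paperless", "finance", '{"doc": "contract"}')],
)

# A per-tenant query also pulls in rows seeded under the 'shared' tenant.
rows = conn.execute(
    "SELECT user_id, source FROM memory_facts"
    " WHERE user_id IN (?, 'shared') ORDER BY id",
    ("martin",),
).fetchall()
```

Adding a tenant is just new rows with a new `user_id` value — no DDL change, which matches the "extensible to new users without infrastructure changes" design decision.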
@@ -281,13 +284,13 @@ Notification is sent via **Amazon Echo / Pompeo** (TTS) for voice, and **Telegra
 
 - [ ] **Daily Digest**: after `Parse risposta GPT-4.1`, add:
   - Postgres INSERT into `memory_facts` (source=email, category, subject, detail JSONB, action_required, expires_at)
-  - Embedding generation (Copilot endpoint) → Qdrant upsert into `martin_episodes`
+  - Embedding generation (Copilot endpoint) → Qdrant upsert into `episodes` (user_id=martin)
   - Thread dedup: use `thread_id` as logical key, update existing Qdrant point if thread already exists
 
 - [ ] **Upload Bolletta** + **Upload Documento (Telegram)**: after `Paperless - Patch Metadati`, add:
   - Postgres INSERT into `finance_documents` (correspondent, amount, doc_date, doc_type, tags, paperless_doc_id)
   - Postgres INSERT into `memory_facts` (source=paperless, category=finance, cross-reference)
-  - Embedding of OCR text chunks → Qdrant upsert into `martin_knowledge`
+  - Embedding of OCR text chunks → Qdrant upsert into `knowledge` (user_id=martin)
 
 ---
 
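One way to implement the thread-dedup step above (an assumption about the mechanism, not something the commit specifies): derive a deterministic Qdrant point ID from `thread_id`, so upserting the same thread twice overwrites the existing point instead of creating a duplicate:

```python
import uuid

# Fixed namespace so the thread_id -> point ID mapping is stable across runs.
# The namespace UUID itself is an arbitrary, hypothetical value.
NAMESPACE = uuid.UUID("1b671a64-40d5-491e-99b0-da01ff1f3341")

def point_id_for_thread(thread_id: str) -> str:
    # uuid5 is deterministic: the same thread_id always yields the same
    # point ID, so a Qdrant upsert replaces the prior point for that thread.
    return str(uuid.uuid5(NAMESPACE, thread_id))

def upsert_body(thread_id: str, vector: list[float], payload: dict) -> dict:
    """Request body for PUT /collections/episodes/points."""
    return {"points": [{
        "id": point_id_for_thread(thread_id),
        "vector": vector,
        "payload": {**payload, "thread_id": thread_id, "user_id": "martin"},
    }]}
```

Qdrant accepts UUIDs as point IDs, so upsert semantics give the "update existing point if thread already exists" behavior for free, without a read-before-write.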
@@ -316,7 +319,7 @@ Notification is sent via **Amazon Echo / Pompeo** (TTS) for voice, and **Telegra
 
 - [ ] **Newsletter Agent**
   - Separate Gmail label for newsletters (excluded from Daily Digest main flow)
-  - Morning cron: summarize + extract relevant articles → `martin_knowledge`
+  - Morning cron: summarize + extract relevant articles → `knowledge`
 
 ---
 
@@ -351,9 +354,9 @@ Notification is sent via **Amazon Echo / Pompeo** (TTS) for voice, and **Telegra
   - PDF → FileWizard OCR → GPT-4.1 metadata extraction (month, gross, net, deductions)
   - Paperless upload with tag `Cedolino`
   - Persist structured data to `finance_documents` (custom fields for payslip)
-  - Trend embedding in `martin_knowledge` for finance agent queries
-- [ ] Behavioral habit modeling: aggregate `behavioral_context` records over time, generate periodic "habit summary" embeddings in `martin_preferences`
-- [ ] Outline → Qdrant pipeline: sync selected Outline documents into `martin_knowledge` on edit/publish event
+  - Trend embedding in `knowledge` for finance agent queries
+- [ ] Behavioral habit modeling: aggregate `behavioral_context` records over time, generate periodic "habit summary" embeddings in `preferences`
+- [ ] Outline → Qdrant pipeline: sync selected Outline documents into `knowledge` on edit/publish event
 - [ ] Chrome browsing history ingestion (privacy-filtered): evaluate browser extension or local export → embedding pipeline for interest/preference modeling
 - [ ] "Posti e persone" graph: structured contact/location model in Postgres, populated from email senders, calendar attendees, Home Assistant presence data
 - [ ] Local embedding model: migrate from Copilot `text-embedding-3-small` to Ollama-served model (e.g. `nomic-embed-text`) once LLM server is stable
 