The "Drop & Forget" Passive Ingestion Workflow

This document outlines the architecture and workflow for passively ingesting source material into the pi-vault-mind and Obsidian ecosystem.

Core Concept

The user should never have to manually trigger an agent to read a new source document. When a file is saved to a designated "Inbox" or "Resources" directory, the system automatically detects it, normalizes it, analyzes it, and integrates the resulting knowledge into both the internal semantic database and the human-readable vault.

Workflow Steps

1. The Trigger (Human Action)

The user drops a document (PDF, DOCX, HTML, or plain text) into a designated watched folder.

  • Example path: Vault/30 - Resources/Inbox/ or Vault/Agent/Inbox/
  • Action: Zero active prompting required.

2. The Watcher (System Action)

A background file-watcher process (the Dispatcher) detects the newly created file.

  • Action: It recognizes that the file is in an ingestion folder.
  • Dispatch: It queues an @agent:ingest task, launching The Miner subagent in an isolated background fork so that it neither blocks nor pollutes the user's active session.
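The Dispatcher's folder check can be sketched as a small pure function. This is a minimal illustration, not the actual Dispatcher: the names INGESTION_FOLDERS, isIngestionPath, toIngestTask, and the IngestTask shape are hypothetical, and the folder list simply reuses the example paths from step 1.

```typescript
// Hypothetical watched folders, copied from the example paths in step 1.
const INGESTION_FOLDERS = ["Vault/30 - Resources/Inbox/", "Vault/Agent/Inbox/"];

// Normalize Windows separators so the prefix check works on any platform.
function normalizePath(p: string): string {
  return p.replace(/\\/g, "/");
}

// True when the file lives beneath one of the watched folders.
function isIngestionPath(
  filePath: string,
  folders: string[] = INGESTION_FOLDERS
): boolean {
  const normalized = normalizePath(filePath);
  return folders.some((folder) => normalized.startsWith(folder));
}

// Assumed task shape; the real payload of @agent:ingest is not specified.
interface IngestTask {
  kind: "@agent:ingest";
  filePath: string;
}

// Returns a task for files in an ingestion folder, or null for anything else.
function toIngestTask(filePath: string): IngestTask | null {
  if (!isIngestionPath(filePath)) {
    return null;
  }
  return { kind: "@agent:ingest", filePath: normalizePath(filePath) };
}
```

Keeping the check free of file-system calls means it can be unit-tested without a running watcher; the watcher itself would only need to pass each new path through toIngestTask and queue any non-null result.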

3. Format Normalization

The Miner subagent uses the wiki_ingest tool for automatic document conversion:

  • URLs: Calls wiki_ingest(source="https://...") which invokes npx any2md internally to fetch and convert to clean markdown.
  • Local files: For PDF/DOCX, the Miner uses bash with npx any2md <file>. For plain markdown, it reads directly.
  • Result: Clean, LLM-optimized Markdown ready for entity extraction.
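The routing described above could look roughly like the following sketch. The NormalizationPlan type and planNormalization function are hypothetical names introduced for illustration. The text does not say how HTML or plain-text files are handled, so this sketch falls through to a direct read for anything that is not a URL, PDF, or DOCX; that fall-through is an assumption.

```typescript
// Hypothetical description of how a source should be normalized.
type NormalizationPlan =
  | { via: "wiki_ingest"; source: string } // URLs: wiki_ingest(source="https://...")
  | { via: "any2md"; command: string[] }   // PDF/DOCX: npx any2md <file>
  | { via: "read"; path: string };         // Markdown: read directly

function planNormalization(source: string): NormalizationPlan {
  // URLs are handed to wiki_ingest, which fetches and converts them.
  if (/^https?:\/\//i.test(source)) {
    return { via: "wiki_ingest", source };
  }
  // Take the text after the last dot as the extension (case-insensitive).
  const ext = source.slice(source.lastIndexOf(".") + 1).toLowerCase();
  if (ext === "pdf" || ext === "docx") {
    return { via: "any2md", command: ["npx", "any2md", source] };
  }
  // Assumption: everything else is treated like plain markdown.
  return { via: "read", path: source };
}
```

Returning a plan instead of running the conversion keeps the decision separate from the side effect, so the Miner (or a test) can inspect which path a given source would take.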

4. Extraction & Chunking (The Miner)

The Miner reads the normalized markdown and performs semantic analysis:

  • Summarization: Writes a high-level executive summary of the document.
  • Entity Extraction: Identifies key entities, concepts, technical claims, and relationships.
  • Deduplication Check: Calls wiki_search behind the scenes to verify whether these concepts already exist in the database, avoiding redundant entity creation.
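The deduplication check can be illustrated with a short sketch. The real signature of wiki_search is not given here, so the SearchFn type below is a synchronous stand-in that returns matching entity names; Entity and filterNewEntities are likewise hypothetical. Matching by case-insensitive name is a deliberate simplification, since a semantic database would more likely compare embeddings.

```typescript
// Hypothetical shape of an extracted entity.
interface Entity {
  name: string;
  summary: string;
}

// Stand-in for wiki_search: returns names of existing entities matching a query.
type SearchFn = (query: string) => string[];

// Keep only entities that are neither repeated in this batch nor already stored.
function filterNewEntities(candidates: Entity[], search: SearchFn): Entity[] {
  const fresh: Entity[] = [];
  const seen = new Set<string>();
  for (const entity of candidates) {
    const key = entity.name.trim().toLowerCase();
    if (seen.has(key)) {
      continue; // duplicate within the same document
    }
    seen.add(key);
    const hits = search(entity.name);
    const exists = hits.some((hit) => hit.trim().toLowerCase() === key);
    if (!exists) {
      fresh.push(entity);
    }
  }
  return fresh;
}
```

Passing the search function in as a parameter keeps the filter independent of the database, which makes it straightforward to exercise with an in-memory list.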

5. Writing to the Brain (Internal Storage)

For every extracted claim and entity, the Miner executes the append_wiki tool:

  • Storage: Saves the raw structured facts into the underlying JSONL collection.
  • Vectorization: Automatically generates embedding vectors (via Transformers.js or Ollama) and inserts them into LanceDB.
  • Graph Edges: Creates relational edges linking new concepts to existing ones in the graph tables.
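To make the storage step concrete, here is a minimal sketch of writing and reading a JSONL collection, where each line is one JSON object. The WikiRecord fields are invented for illustration and are not the actual append_wiki schema; embedding generation and LanceDB insertion are left out entirely.

```typescript
// Hypothetical record shape; the real append_wiki schema is not specified.
interface WikiRecord {
  id: string;
  type: "entity" | "claim";
  text: string;
  source: string;
  links: string[]; // ids of related records, i.e. graph edges
}

// Serialize records as JSON Lines: one JSON object per line, newline-terminated.
function toJsonl(records: WikiRecord[]): string {
  if (records.length === 0) {
    return "";
  }
  return records.map((record) => JSON.stringify(record)).join("\n") + "\n";
}

// Parse JSON Lines back into records, skipping blank lines.
function parseJsonl(text: string): WikiRecord[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as WikiRecord);
}
```

Because every record ends with a newline, new facts can be appended to the file without rewriting earlier lines, which suits an append-only ingestion log.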

6. Syncing to the Vault (Human Workspace)

Finally, the Miner surfaces the synthesized knowledge back into the Obsidian workspace via wiki_sync.

  • Markdown Pages: Creates or updates structured notes in Vault/Agent/Wiki/Concepts/ and Vault/Agent/Wiki/Sources/.
  • Graph Visualizations: Syncs graph edges into .canvas files for visual exploration.
  • Audit Trail: Appends a log entry to Vault/Agent/Journal.md detailing the actions taken (e.g., "Processed quantum-paper.pdf. Extracted 14 claims. Created 2 new concept pages.").
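A journal line like the one quoted above could be produced by a small formatter. This is only a sketch: IngestReport and formatJournalEntry are hypothetical names, and the leading list marker and ISO 8601 timestamp are assumptions about how Journal.md is laid out.

```typescript
// Hypothetical summary of one ingestion run.
interface IngestReport {
  fileName: string;
  claims: number;
  newConceptPages: number;
}

// "1 claim" vs "14 claims".
function plural(count: number, singular: string): string {
  return `${count} ${singular}${count === 1 ? "" : "s"}`;
}

// Build one Markdown list item for Journal.md (assumed layout).
function formatJournalEntry(report: IngestReport, at: Date): string {
  const stamp = at.toISOString();
  return (
    `- ${stamp} Processed ${report.fileName}. ` +
    `Extracted ${plural(report.claims, "claim")}. ` +
    `Created ${plural(report.newConceptPages, "new concept page")}.`
  );
}
```

Taking the timestamp as a parameter, instead of calling new Date() inside the function, keeps the output deterministic and easy to verify.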

Required Updates & Implementation Plan

To fully realize this workflow, the following updates are planned:

  1. Watcher Integration: Finalize the file-watcher script that debounces events and triggers the subagent pipeline specifically for the Inbox/ path.
  2. any2md Pipeline: Integrate the any2md CLI wrapper as a standard tool available to the Miner subagent.
  3. Journal Logging: Ensure the VaultWriter class can append to a chronological Journal.md in addition to creating distinct concept notes.
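The debouncing mentioned in the watcher item can be sketched without timers by tracking the last event time per file and releasing a file only after it has been quiet for a set interval. EventDebouncer and its methods are hypothetical, as is the choice of a polling-style flush; the planned script may well use a timer-based approach instead.

```typescript
// Coalesces bursts of file events so each file is dispatched once it settles.
class EventDebouncer {
  private readonly quietMs: number;
  private readonly lastSeen = new Map<string, number>();

  constructor(quietMs: number) {
    this.quietMs = quietMs;
  }

  // Record a file event observed at time `now` (milliseconds).
  record(path: string, now: number): void {
    this.lastSeen.set(path, now);
  }

  // Return, and forget, every path that has been quiet for at least quietMs.
  flush(now: number): string[] {
    const ready: string[] = [];
    this.lastSeen.forEach((seenAt, path) => {
      if (now - seenAt >= this.quietMs) {
        ready.push(path);
      }
    });
    for (const path of ready) {
      this.lastSeen.delete(path);
    }
    return ready;
  }
}
```

A watcher would call record on every file event and flush on a short interval, passing Date.now() for both. Large files that are written in several chunks then trigger a single ingestion once the writes stop.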