Lessons From Building Two AI-Powered Local-First Tools

Published: Jun 03, 2026

Local Data, Cloud Brains
Choose the Right Retrieval
AI Needs Guardrails
MCP Is the Glue
Ship Skills With Your Project
I Need a Bigger Computer
Test Coverage Enables AI Iteration
The Pattern

Over the past two weeks I built two AI-augmented applications back to back. Both are local-first tools that lean on cloud LLMs for intelligence. Both ship as Python packages with SQLite storage, MCP servers for agent integration, and companion web UIs. Both were developed with heavy AI assistance — spec-first, AI-implemented, test-verified.

The projects:

HSA Receipt Ledger — A privacy-first tool for managing Health Savings Account receipts. It exposes MCP tools for an LLM to scan inbox files, extract text via OCR, present structured fields for user confirmation, and insert records into a local SQLite ledger. Comes with a Streamlit dashboard for visual browsing.
Owner’s Manual Assistant — A tool that indexes PDF owner’s manuals and answers natural language questions about them. Uses SQLite FTS5 for keyword retrieval and passes retrieved chunks to a cloud LLM for answer synthesis. No vector embeddings, no local models, no complexity.

Local Data, Cloud Brains

I started both projects with a strong bias toward local-first architecture. Your HSA receipts should never touch a cloud server. Your car manual should be queryable offline. But I also wanted the intelligence of modern LLMs — the ability to extract “Quest Diagnostics, $187.50, lab work” from a blurry phone photo, or to synthesize “check the oil level using the dipstick near the front of the engine” from scattered manual sections.

The solution was a clean separation: local data, cloud brains.

HSA Receipt Ledger stores everything on disk — the SQLite ledger, the receipt images, the OCR text. The cloud LLM never sees a file path or a database connection. It calls MCP tools over stdio, receives structured data, and returns a decision. The LLM gets only what it needs to do its job: extracted text that it analyzes, or search results that it summarizes. The raw receipts never leave the machine.

Owner’s Manual Assistant works the same way. Documents are chunked and indexed locally via FTS5. When you ask a question, the MCP server runs a keyword search and returns the top matching chunks. Those chunks — just text — are passed to the cloud LLM for answer synthesis. Your PDFs never leave your laptop.

This is the pragmatic middle ground between full-cloud and full-local. Your data stays under your control. You decide exactly what context the LLM sees. And you don’t need to run a model on your laptop.

Choose the Right Retrieval

Both projects needed to find records or documents. Neither uses vector embeddings.

HSA Receipt Ledger uses SHA-256 content hashing for deduplication. Every receipt file is hashed by its contents, not its filename. If you accidentally drag the same receipt photo into the inbox twice, the MCP server spots the collision and refuses to insert without a force=True flag. This is simpler and more reliable than training a similarity model — a hash either matches or it does not.

Owner’s Manual Assistant uses SQLite FTS5 for full-text search. This was a deliberate choice against vector embeddings. Owner’s manuals have precise, domain-specific terminology: “5W-30”, “EGR valve”, “torque spec”. Keyword search with BM25-style ranking finds these terms exactly. Semantic search might miss “EGR valve” because it embeds it near “exhaust gas recirculation” — which is the same thing, but the user probably typed “EGR”.

FTS5 gives me zero-config full-text search with stemming, prefix queries, and ranking — all built into SQLite, no external service, no vector DB, no embedding provider. For structured documents with specific terminology, it is often the right tool.

The lesson: don’t default to the fanciest retrieval technique. Think about how users actually query your data. Hashes for exact matching. Keyword search for terminology-heavy documents. Embeddings have their place, but they are not the default answer.

AI Needs Guardrails

LLMs are confident and wrong, and they will happily do destructive things if you let them. Every tool I built has deliberate guardrails.

HSA Receipt Ledger has a conversational review loop baked into its core workflow. When the LLM scans a new receipt and extracts structured fields, it does not auto-insert. It presents a preview:

Found receipt: receipt_001.jpg
Provider:      Quest Diagnostics
Date:          2024-03-15
Total:         $187.50
Category:      Lab Work

This file's SHA-256 hash does not match any existing record.
Insert this transaction?

The user says yes or no. The LLM cannot bypass this step. Duplicate detection adds another layer: if the hash matches an existing record, the LLM surfaces the conflict and the user decides whether to re-insert. A force=True parameter exists for deliberate re-entries (same receipt, different reimbursement event), but it must be explicitly requested.

Owner’s Manual Assistant sanitizes every query before hitting the database. FTS5 has special characters (" * + ( ) ^ ~ :) that trigger syntax errors or unbounded prefix searches. The sanitizer strips these while preserving hyphens for terms like “5W-30”. Every query is wrapped in a try/except so a malformed search returns empty results instead of a crash.

Both projects validate file paths against path traversal attacks. User-supplied filenames are checked for /, \, and leading dots. MCP tools reject invalid names with clear error messages rather than silently operating on unexpected files.

The pattern: validate inputs, require confirmation for writes, handle errors explicitly. An LLM is a capable assistant that needs firm boundaries. Clear guardrails make it more reliable, not less capable.

MCP Is the Glue

Both projects expose MCP (Model Context Protocol) servers over stdio. This is the integration pattern that ties the LLM to local tools.

HSA Receipt Ledger exposes five MCP tools:

list_untracked_files — scan inbox for new receipts
extract_file_text — OCR a receipt and return raw text
insert_hsa_transaction — insert a confirmed record with SHA-256 dedup
search_ledger — multi-field search across the ledger
update_reimbursement_status — mark a receipt as reimbursed

Owner’s Manual Assistant exposes one: search_manuals, which takes a query and optional category filter, runs FTS5 search, and returns structured context chunks.

Both servers register in an opencode.json file so OpenCode discovers them automatically:

{
  "mcpServers": {
    "hsa-receipt-ledger": {
      "command": "hsa-ledger",
      "args": ["mcp"]
    }
  }
}

The beauty of this approach: no API keys, no network ports, no authentication middleware. The server runs as a subprocess of the LLM client, communicating over stdio with JSON-RPC. Claude or OpenCode becomes a natural-language interface to your local tools. “Process my new medical receipts” becomes a real command that scans files, runs OCR, presents fields, and waits for confirmation.

The same server also powers a companion UI. HSA Receipt Ledger has a Streamlit dashboard for browsing. Owner’s Manual has a FastAPI web UI. The MCP interface and the web interface query the same database — just different access patterns for different contexts.

Ship Skills With Your Project

Both projects ship OpenCode skills — packaged instructions that tell an AI agent exactly how to use the tool.

HSA Receipt Ledger has three:

receipt-intake — the full workflow for processing inbox files
ledger-search-and-audit — searching and auditing the ledger
export-and-tax-prep — CSV exports and tax preparation summaries

Owner’s Manual Assistant has two:

add-manual — ingest a single PDF manual
bulk-ingest — batch process multiple PDFs

Each skill is a SKILL.md file with step-by-step workflow instructions plus a companion bash script. When I (or another AI) start a new session with one of these projects, the skill tells the agent exactly what to do. No re-explaining the workflow. No guessing the tool signatures.

This was one of the more interesting discoveries of building these tools. A traditional README tells a human how to use your project. Skills tell an AI. They encode not just the API surface but the workflow — the order of operations, the confirmation steps, the error recovery. They make the tool self-documenting for future agent sessions.

I Need a Bigger Computer

Here is the aha moment that reshaped both projects.

Owner’s Manual Assistant uses Marker for PDF text extraction. Marker is a Python library that uses ML models (torch + detectron2) for layout analysis — it understands multi-column layouts, recognizes headings versus body text, and outputs clean Markdown. It is the state of the art for PDF extraction.

It is also slow on CPU. A 50-page manual takes several minutes to process. The first time I ran it, I watched the fan on my laptop spin up and the progress bar crawl. Marker itself is borderline usable on my current hardware. Adding a local LLM on top would be impractical.

This was the moment the hardware ceiling became concrete. Running a 7B-parameter model locally requires significant GPU memory and compute. My laptop is not equipped for it. I could buy a machine with a dedicated GPU, or route generation through a cloud API.

I chose cloud. The architecture that emerged — local data, cloud LLM — was shaped as much by hardware reality as by design philosophy. Marker runs on CPU (slow but acceptable). The LLM runs on someone else’s GPU. Everything else stays local.

If I had better hardware, I would experiment with local models for cases where I want full air-gapped operation. But for now, the separation works. The data I care about never leaves my machine. Only the context I explicitly pass to the LLM goes to the cloud, and I control exactly what that is.

Test Coverage Enables AI Iteration

Both projects were developed with AI assistance. Both needed the AI to refactor, restructure, and move code around. Neither would have worked without thorough tests.

HSA Receipt Ledger has seven test files covering database CRUD, vault operations, text extraction, MCP server logic, CLI commands, and UI helpers. All external dependencies (pypdf, pytesseract, pyheif) are mocked so tests are fast and deterministic.

Owner’s Manual Assistant has 56 tests covering FTS5 error handling, query sanitization edge cases, heading chunking, code-block awareness, image stripping, MCP server lifecycle, and web UI validation.

The tests serve two purposes. First, they verify that the code works — standard test value. Second, they give me confidence to hand the codebase back to the AI for changes. When the AI refactors a module, I can run the full suite and know within seconds whether anything broke.

This is the compounding effect I wrote about in my last post. A tested codebase is one you can trust an AI to modify. Untested code becomes a liability when the AI starts moving things around.

The Pattern

Across both projects, the same workflow emerged:

Spec first — A detailed specification document (both projects have one in docs/)
Plan — Break the spec into ordered tasks
Implement — AI builds the code incrementally
Test — Cover every error path and edge case
Skill-ify — Package the workflows as agent skills

Each project became more than a standalone tool. It became a system designed for both human and AI collaboration — with a database on disk, an MCP server for agent access, a web UI for human access, and skills that document the workflows for future sessions.

The principles that emerged:

Local-first doesn’t mean local-only. Your data stays, the LLM calls out.
Choose retrieval by use case, not hype. Hashes, FTS5, or vectors — pick the right one.
Guardrails make AI tools more capable, not less.
MCP is the glue that turns LLMs into local tool interfaces.
Ship skills alongside READMEs. Document for both humans and agents.
Know your hardware ceiling. Design around it.
Test everything. It is what makes the rest possible.