Where We Left Off

The previous post covered the overnight phone session failure, the testability refactoring pattern, and the storage architecture question: should RepCheck use Firestore, AlloyDB, or Cloud SQL for the scoring path?

This session started by answering that question. Then it answered it again differently when the bill arrived.

This post covers three things: the database engine decision (and reversal), the schema design process, and what the corrections I made along the way reveal about working effectively with AI agents.


The AlloyDB Migration and Reversal

The previous post ended with a hypothesis: AlloyDB might be the right unified database for RepCheck. This session began by committing to that hypothesis fully.

The first task was a major architecture change: replace both Cloud SQL PostgreSQL and Firestore with AlloyDB as our single database. This was not a tentative experiment. It was a full migration across three repositories.

In the main repo (votr): every documentation file referencing Firestore or Cloud SQL was updated. The system design, behavioral specs, code patterns, skeleton templates, annotated guides. All rewritten to reference AlloyDB and Doobie as the unified persistence layer.

In the template repo (repcheck-g8): the Giter8 template that generates new RepCheck repositories was overhauled. The has_firestore and has_cloudsql parameters were replaced with a single has_alloydb flag. Firebase Admin SDK dependencies were removed. Doobie dependencies were added. The alloydb-repository.scala skeleton replaced firestore-repository.scala, complete with pgvector embedding queries and cosine similarity search. The local dev Docker Compose was switched from plain PostgreSQL to AlloyDB Omni, Google’s official local AlloyDB image with pgvector built in.
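For flavor, here is roughly what those skeleton patterns look like in Doobie. This is a minimal sketch, not the template's actual file: the table and column names are illustrative placeholders.

```scala
import doobie._
import doobie.implicits._

// Sketch of the pgvector patterns the skeleton covers; "items", "id", and "embedding"
// are placeholder names, not the template's exact identifiers.
object EmbeddingQueries {

  // pgvector accepts a vector literal as text ("[0.1,0.2,...]"), cast to ::vector in SQL.
  private def vectorLiteral(embedding: Vector[Float]): String =
    embedding.mkString("[", ",", "]")

  def upsertEmbedding(id: String, embedding: Vector[Float]): Update0 =
    sql"""
      INSERT INTO items (id, embedding)
      VALUES ($id, ${vectorLiteral(embedding)}::vector)
      ON CONFLICT (id) DO UPDATE SET embedding = EXCLUDED.embedding
    """.update

  // <=> is pgvector's cosine distance operator, so 1 - distance is a similarity score.
  def mostSimilar(query: Vector[Float], limit: Int): Query0[(String, Double)] =
    sql"""
      SELECT id, 1 - (embedding <=> ${vectorLiteral(query)}::vector) AS similarity
      FROM items
      ORDER BY embedding <=> ${vectorLiteral(query)}::vector
      LIMIT $limit
    """.query[(String, Double)]
}
```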

In the infrastructure repo (tf-repcheck-infra): Terraform modules were created to provision an AlloyDB cluster with a primary instance, private IP via VPC Private Service Access peering, and the appropriate IAM roles.

Then I asked the agent to run terraform plan and explain what it would create. 126 resources. I wanted specifics:

“Yeah I don’t want you to guess. I would rather you run the terraform plan command yourself, and compare that to the repo and check for correctness. Then explain to me item by item what you are creating.”

The agent walked through each resource. Everything checked out architecturally. Then I asked the question that mattered:

“How much would all this cost me to have around for 1 month?”

The answer: approximately $390/month for the dev environment. AlloyDB has a minimum cluster cost that does not scale to zero. You are paying for the cluster whether you are using it or not.

For a side project in active development with no production traffic, that is a hard number to absorb. Especially when I then asked:

“Cloud SQL PostgreSQL supports storing vector data and searching it?”

It does. Cloud SQL PostgreSQL supports the pgvector extension natively. Same vector storage, same cosine similarity queries, same HNSW indexes. At the db-f1-micro tier: ~$10/month.

“Let’s replace Cloud SQL in the dev environment.”

The Terraform infrastructure repo got its third database migration in a single session. But this time, the change was smarter. Instead of a hard switch, the agent introduced a db_engine variable that toggles between "alloydb" and "cloudsql" per environment. Dev uses Cloud SQL. Staging and production will use AlloyDB when the time comes. The same Terraform modules, the same VPC peering, the same private IP networking. Just a different engine underneath.

The application code does not care. Doobie talks to PostgreSQL. Both AlloyDB and Cloud SQL are PostgreSQL wire-compatible. The repository layer, the migration runner, the connection string. All identical. The only difference is cost and scale.
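A minimal sketch of that wiring, assuming a recent Doobie with Hikari and a hypothetical DbConfig case class. The JDBC URL is the only value that changes between engines; everything downstream of the transactor is identical.

```scala
import cats.effect.{IO, Resource}
import doobie.hikari.HikariTransactor
import doobie.util.ExecutionContexts

// DbConfig is a hypothetical config class; only jdbcUrl differs between the Cloud SQL
// dev instance and an AlloyDB cluster.
final case class DbConfig(jdbcUrl: String, user: String, password: String)

def transactor(cfg: DbConfig): Resource[IO, HikariTransactor[IO]] =
  for {
    ce <- ExecutionContexts.fixedThreadPool[IO](8)
    xa <- HikariTransactor.newHikariTransactor[IO](
            "org.postgresql.Driver", // same JDBC driver for both engines
            cfg.jdbcUrl,             // only the host in this URL differs per environment
            cfg.user,
            cfg.password,
            ce
          )
  } yield xa
```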

What this taught me: Always ask “how much?” before committing to infrastructure. The agent will optimize for architectural elegance. You need to optimize for the reality of your budget. A $390/month dev database for a project that might not have users for months is not elegant. It is waste. The agent did not flag this. I had to ask.


The Rule That Changed Everything

Early in the session, the agent started making schema decisions without consulting me. Column types, table structures, whether to use JSONB. It was moving fast and making choices that looked reasonable but were not necessarily mine.

I stopped it:

“If you find yourself making assumptions or direct choices about the schema that are not in the documentation, ask me what to do rather than choosing on your own.”

One sentence. It changed the entire dynamic of the session.

From that point forward, the agent presented options instead of making choices. Every ambiguity became a question. Every design fork became a conversation. The schema that emerged was genuinely collaborative. Not the agent’s best guess with my rubber stamp, but a series of deliberate decisions I made after being educated on the trade-offs.

This is the single most important instruction you can give an AI coding agent. Not “be careful” or “follow best practices.” Just: ask me before you decide.


Grounding the Schema in Reality

The first draft of the schema was inferred from our documentation. System design, behavioral specs, code patterns. It was structurally plausible but disconnected from the actual data source.

My correction:

“We should look at the .gov API schemas and base our tables on those, as most of our tables are direct storage of those.”

This shifted the approach. Instead of designing tables from prose descriptions, we opened the Congress.gov OpenAPI spec and derived table columns directly from the API response schemas. The members table got columns from the Member and Members schemas. The bills table got columns from Bill and BillDetail. Vote positions came from houseVoteResults.

The result was richer than what the agent would have produced from the docs alone. When the agent asked whether we wanted a minimal set of columns or the full API representation, I chose full:

“I want the richer version. I want you to update to include all of these. In the future, we may get more creative with our analyses and inferences and this information could be useful.”

This is a product decision masquerading as a schema decision. The agent cannot know whether you will need constitutional_authority_text or image_attribution six months from now. You can.


The JSONB Problem

The agent’s first instinct for anything complex was JSONB. Vote history positions? JSONB. Score breakdowns by topic? JSONB. Bill analysis pass outputs? JSONB.

Each time, I pushed back.

“The scores tables have JSONB for score. Is it possible to denormalize them?”

“Vote history as a structured table is better.”

“I prefer the more normalized approach.”

JSONB is flexible. It is easy. The agent defaults to it because it avoids the work of designing proper table structures. But JSONB in a structured schema is almost always a sign that you have not finished the design work. You are deferring decisions into the data layer, where they become invisible. No foreign keys. No type checking. No ability to query individual fields efficiently.

The rule we codified: keep tables fully normalized. Avoid JSONB in structured tables. The only acceptable use of JSONB is when you genuinely cannot predict the object shape and expect a wide variety. In our schema, exactly one column uses JSONB: pipeline_runs.error_summary, where the error structure varies unpredictably across different failure modes.

The score tables alone went from 2 tables with JSONB to 7 fully normalized tables: scores, score_topics, score_congress, score_congress_topics, score_history, score_history_congress, score_history_congress_topics, plus score_history_highlights. More tables, more work. But every column is queryable, every relationship has a foreign key, and the schema documents itself.
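As a rough sketch of what the top of that hierarchy looks like once the JSONB is gone. The real DDL lives in Liquibase changesets; the column names and the assumption that members is keyed by Congress.gov's bioguide_id are mine.

```scala
import doobie._
import doobie.implicits._

// Illustrative shape only: every score value is a real column with a real foreign key.
val createScores: Update0 =
  sql"""
    CREATE TABLE scores (
      member_id   TEXT PRIMARY KEY REFERENCES members (bioguide_id),
      overall     NUMERIC(5,2) NOT NULL,
      computed_at TIMESTAMPTZ  NOT NULL
    )
  """.update

val createScoreTopics: Update0 =
  sql"""
    CREATE TABLE score_topics (
      member_id TEXT         NOT NULL REFERENCES members (bioguide_id),
      topic     TEXT         NOT NULL,
      score     NUMERIC(5,2) NOT NULL,
      PRIMARY KEY (member_id, topic)
    )
  """.update
```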


Educate Me, Then I’ll Decide

Several times during the session, the agent presented a decision I was not equipped to make without more context. Instead of guessing, I asked it to teach me.

HNSW vs IVFFlat

We needed vector indexes for semantic search. User preferences matched against bill embeddings, subject similarity search, finding-level similarity queries. The agent asked whether we wanted HNSW or IVFFlat indexes.

I did not know the difference:

“Educate me on HNSW vs IVFFlat so that I can make a decision.”

The agent laid out a comparison: HNSW works on empty tables (IVFFlat needs training data), HNSW handles incremental inserts better, HNSW has better recall at the cost of more memory. IVFFlat is faster to build on large datasets but needs periodic retraining.

Our tables start empty and grow incrementally. The choice was obvious once I understood the trade-off:

“I agree on HNSW.”
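The index itself is a single line of DDL. A sketch, shown as a Doobie fragment for readability even though the real index lives in a Liquibase SQL changeset; the column name assumes the bills embedding column described later in this post.

```scala
import doobie._
import doobie.implicits._

// HNSW index on a pgvector column using cosine distance (vector_cosine_ops).
val createBillEmbeddingIndex: Update0 =
  sql"""
    CREATE INDEX idx_bills_text_embedding
    ON bills USING hnsw (text_embedding vector_cosine_ops)
  """.update
```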

Flyway vs Liquibase

The agent set up Flyway for migrations without asking. I caught it:

“Why are we using Flyway?”

After a brief comparison:

“I prefer Liquibase.”

Small correction, significant impact. The migration tooling choice affects every future schema change. The agent’s default was not wrong. It just was not my preference.


The Decisions Only I Could Make

Some schema decisions required product thinking that no amount of technical context could provide.

Bill text storage and member stances

The agent had not planned for storing bill text locally or tracking how members voted on specific bills. I laid out the product requirement:

“1. We should store bills in the DB. 2. We should search for all the bills each member voted on, and we should form a stance position from them based on their votes. Storing it in the DB prevents us from having to refetch bills per member. 3. We should store the member and bill relationship. 4. We should be able to use the users’ preferences and answers to search against bills and discover bills that align with their political beliefs. 5. We can then report which bills that align with their beliefs their representative supported and which ones they didn’t.”

Five requirements in one message. This created three new structures: text_content and text_embedding columns on bills, the member_bill_stances denormalized table, and per-topic embeddings on user_preferences.
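Sketched as DDL, with assumed column names and a 1536-dimension embedding size chosen purely for illustration; the real definitions live in the changelog.

```scala
import doobie._
import doobie.implicits._

// 1. Bill text and its embedding live on the bills row itself.
val addBillText: Update0 =
  sql"""
    ALTER TABLE bills
      ADD COLUMN text_content   TEXT,
      ADD COLUMN text_embedding vector(1536)
  """.update

// 2. One row per (member, bill), holding the stance derived from their vote positions.
val createMemberBillStances: Update0 =
  sql"""
    CREATE TABLE member_bill_stances (
      member_id TEXT NOT NULL REFERENCES members (bioguide_id),
      bill_id   TEXT NOT NULL REFERENCES bills (bill_id),
      stance    TEXT NOT NULL,
      PRIMARY KEY (member_id, bill_id)
    )
  """.update

// 3. One preference row (and one vector) per user per topic.
val addPreferenceEmbeddings: Update0 =
  sql"""
    ALTER TABLE user_preferences
      ADD COLUMN embedding vector(1536)
  """.update
```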

Per-topic preference embeddings

The agent proposed a single combined embedding per user. I wanted granularity:

“I prefer separate vectors. Allowing me to tell users their preferences per topic.”

This means a user’s healthcare stance has its own embedding, separate from their defense or environment stance. It enables per-topic bill discovery: “find bills matching this user’s healthcare position” rather than a blurred aggregate.
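A sketch of what that per-topic discovery query might look like against the columns above; user_id, topic, and the embedding column names are assumptions.

```scala
import doobie._
import doobie.implicits._

// Match one topic's preference vector against bill embeddings, best matches first.
def billsMatchingTopic(userId: String, topic: String, limit: Int): Query0[(String, Double)] =
  sql"""
    SELECT b.bill_id, 1 - (b.text_embedding <=> up.embedding) AS similarity
    FROM bills b, user_preferences up
    WHERE up.user_id = $userId
      AND up.topic = $topic
      AND b.text_embedding IS NOT NULL
    ORDER BY b.text_embedding <=> up.embedding
    LIMIT $limit
  """.query[(String, Double)]
```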

The finding types system

When the agent proposed storing LLM analysis outputs, I shaped the architecture:

“I prefer option B but we can make finding_type an enumeration that is stored in a lookup table that can be expanded as we go forward.”

This created finding_types, an extensible lookup table seeded with 8 categories (4 analysis pass types and 4 discovery categories like pork barrel spending, policy riders, lobbying influence, constitutional concerns). New finding types can be added with an INSERT, no schema migration needed.
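Extending it is a seed-style insert rather than a migration. A sketch with hypothetical column names and a made-up category:

```scala
import doobie._
import doobie.implicits._

// Adding a new finding category is just a row; no DDL change required.
val addFindingType: Update0 =
  sql"""
    INSERT INTO finding_types (code, label)
    VALUES ('unfunded_mandate', 'Provisions imposing costs on states without funding')
    ON CONFLICT (code) DO NOTHING
  """.update
```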

Cost-optimizing the multi-pass analysis

The agent had designed bill_analyses with JSONB columns for pass1_output, pass2_output, and pass3_output. I asked a question that restructured the entire table:

“Should we be trying to get the results that go into pass1 into the findings table?”

Then the critical insight:

“Yes, but we still want to store shortened summaries. Ideally, we would be able to do pass 2 and pass 3 on the shortened summaries reducing their cost.”

This changed the architecture fundamentally. Pass 1 (Haiku, cheap) processes the full bill text and produces a shortened summary. That summary, not the original text, becomes the input for Pass 2 (Sonnet) and Pass 3 (Opus). The cost savings compound: expensive models process a fraction of the tokens. And the pass outputs go into bill_findings rows linked by analysis_id, not JSONB blobs on the analysis record.

The agent could not have proposed this. It did not know our cost model. It did not know that Pass 1 summaries needed to survive as intermediate artifacts. This was product-level thinking applied to schema design.
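Here is the pass sequencing that fell out of that exchange, sketched with hypothetical types and function names. Only the model tiers and the summary-as-input idea come from the session; everything else is a stand-in for the real pipeline code.

```scala
import cats.effect.IO

final case class PassResult(shortSummary: String, findings: List[String])

trait BillAnalyzer {
  def pass1FullText(billText: String): IO[PassResult] // cheap model, reads the full bill
  def pass2(summary: String): IO[PassResult]          // mid-tier model, reads the summary only
  def pass3(summary: String): IO[PassResult]          // expensive model, reads the summary only
  def saveFindings(analysisId: Long, pass: Int, findings: List[String]): IO[Unit]
}

// Pass 1 produces the shortened summary; Passes 2 and 3 never see the full text.
// Findings land in bill_findings rows keyed by analysisId, not JSONB on the analysis record.
def analyzeBill(analyzer: BillAnalyzer, analysisId: Long, billText: String): IO[Unit] =
  for {
    p1 <- analyzer.pass1FullText(billText)
    _  <- analyzer.saveFindings(analysisId, 1, p1.findings)
    p2 <- analyzer.pass2(p1.shortSummary)
    _  <- analyzer.saveFindings(analysisId, 2, p2.findings)
    p3 <- analyzer.pass3(p1.shortSummary)
    _  <- analyzer.saveFindings(analysisId, 3, p3.findings)
  } yield ()
```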


The State Code Problem

A subtle data modeling issue: Congress.gov returns full state names (“Vermont”), but users enter two-letter codes (“VT”). You need to join them.

My solution:

“Let’s make a table with mappings inside that can be joined.”

This created us_states. 56 rows (50 states + DC + 5 territories) with state_code and state_name columns. The migration seeds it as part of the initial schema. Simple, but it solves a real data reconciliation problem that the agent had not flagged.
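The join it enables, sketched with assumed column names on members:

```scala
import doobie._
import doobie.implicits._

// Users give "VT", Congress.gov gives "Vermont"; us_states bridges the two.
def membersForUserState(stateCode: String): Query0[(String, String)] =
  sql"""
    SELECT m.bioguide_id, m.full_name
    FROM members m
    JOIN us_states s ON s.state_name = m.state
    WHERE s.state_code = $stateCode
  """.query[(String, String)]
```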


The @nowarn Confrontation

When the agent hit Liquibase deprecation warnings during compilation, it added @nowarn("msg=deprecated") to suppress them. I caught it immediately:

“Why did you add no-warn?”

The agent explained that Liquibase 4.25.1 marks several methods as deprecated and -Xfatal-warnings promotes those to compile errors.

“Use the new API instead of nowarn.”

This sent the agent down a research path. Finding the non-deprecated CommandScope API, the Scope.getCurrentScope().getSingleton() pattern for factory access, and eventually a direct SQL approach for counting pending changesets that sidesteps the deprecated history service entirely.
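For reference, the non-deprecated entry point looks roughly like this. The changelog path is illustrative, the string argument names are my assumption about Liquibase's command arguments, and the real MigrationRunner wraps all of this in its own configuration and connection handling.

```scala
import liquibase.command.CommandScope

// Sketch: run a Liquibase update through the CommandScope API instead of the
// deprecated Liquibase facade methods.
def runMigrations(jdbcUrl: String, user: String, password: String): Unit = {
  val update = new CommandScope("update")
  update.addArgumentValue("changelogFile", "db/changelog/db.changelog-master.yaml")
  update.addArgumentValue("url", jdbcUrl)
  update.addArgumentValue("username", user)
  update.addArgumentValue("password", password)
  update.execute()
  ()
}
```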

The rule we codified: you are not allowed to fix a compile issue with @nowarn. You must apply the real fix. @nowarn is duct tape. It hides problems instead of solving them. An agent will reach for it because it is the fastest path to green. Your job is to refuse.


Validating Without Docker

After the schema was built, I asked a question the agent should have anticipated:

“How did you validate that the migrations are correct, if we do not have Docker?”

The honest answer: we had not. The Testcontainers tests need Docker to spin up a real PostgreSQL instance. Without Docker, the SQL was unverified. Table creation order, foreign key references, pgvector extension, seed data, all untested.

We added H2 as a test dependency and created a ChangelogValidationSpec that validates the Liquibase changelog structure against an in-memory H2 database. It does not execute the PostgreSQL-specific SQL (pgvector, TEXT arrays), but it verifies that the YAML is parseable, all referenced SQL files exist, and changeset IDs are valid. Two tests, no Docker required.

The full validation still needs Testcontainers. But the dry-run check catches the most common mistakes (missing files, duplicate IDs, malformed changelog) during regular development.


Running CI Before Pushing

The agent had been pushing code that failed scalafix checks in CI. Import ordering violations, mainly. Java imports before Scala imports, classes not sorted alphabetically.

The rule: before pushing, run locally every check that CI will run. sbt compile, sbt test, sbt scalafmtCheckAll, sbt scalafixAll --check. Every time.

This is one of those rules that seems obvious in retrospect. The agent knows about CI. It has access to sbt. It just was not running the checks before pushing because nobody told it to. Now it is in CLAUDE.md as a universal rule, enforced in every session.
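One way to make that habit cheap is an sbt alias in build.sbt that bundles the four checks into a single command; the alias name here is arbitrary.

```scala
// In build.sbt: run the same checks CI runs with a single `sbt ciCheck`.
addCommandAlias(
  "ciCheck",
  ";compile ;test ;scalafmtCheckAll ;scalafixAll --check"
)
```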


Three Repos, One Schema

The schema did not just land in one place. The changes rippled across three repositories:

votr (main repo): The db-migrations SBT project contains the Liquibase changelog, the MigrationRunner (usable both standalone and from other projects’ tests via Testcontainers), and the H2 validation test. All documentation was updated to reference the unified PostgreSQL architecture.

repcheck-g8 (template repo): The Giter8 template now generates repositories pre-configured for Doobie + pgvector. The alloydb-repository.scala skeleton includes embedding insert/query patterns, a Tables object with all table name constants, and a generic upsert helper. New repos get the right persistence patterns out of the box.

tf-repcheck-infra (infrastructure repo): The db_engine toggle means the schema runs against Cloud SQL in dev and AlloyDB in staging/prod. Same migrations, same Doobie code, different engine. A 97% cost reduction in dev.


The Final Schema

29 tables. Zero JSONB in structured data. HNSW vector indexes on 6 embedding columns. Fully normalized score hierarchy. Extensible finding types. Per-topic user embeddings. A reference table for state code reconciliation.

The tables:

Congress.gov entities: us_states, members, member_terms, member_party_history, bills, bill_cosponsors, bill_subjects, votes, vote_positions, vote_history, vote_history_positions, amendments

LLM analysis: finding_types, amendment_findings, bill_analyses, bill_findings

User data: users, user_preferences, member_bill_stances

Scoring: scores, score_topics, score_congress, score_congress_topics, score_history, score_history_congress, score_history_congress_topics, score_history_highlights

Pipeline tracking: pipeline_runs, processing_results

All managed by Liquibase migrations, with a Mermaid ER diagram that renders in IntelliJ via the Nereid plugin.


What This Session Taught Me About Working With AI

The agent produced the first draft of this schema in minutes. Getting it right took hours. The difference was entirely in the corrections.

Every significant design decision came from a question I asked or a correction I made:

  • “Base the tables on the .gov API schemas” (grounding in reality)
  • “I prefer the more normalized approach” (no JSONB)
  • “Educate me on HNSW vs IVFFlat” (learning before deciding)
  • “I prefer separate vectors” (product-level granularity)
  • “Use the new API instead of nowarn” (no shortcuts)
  • “How did you validate this?” (accountability)

The agent is fast. The agent is thorough. The agent reaches for defaults that are technically defensible but not necessarily right for your system. Your job is to know the difference. And to say so.

The instruction I gave at the start of this session, “ask me what to do rather than choosing on your own”, is the most valuable thing I have written in a CLAUDE.md file. It transforms the agent from an autonomous code generator into a collaborative design partner. The output is better. The decisions are yours. And the schema reflects your product, not the agent’s best guess.


Project Repositories

All code for RepCheck is on GitHub:

  • votr: Main monorepo. Pipelines, migrations, infrastructure code, acceptance criteria, and this blog
  • repcheck-shared-models: Shared models library. DTOs, domain objects, Circe codecs, Doobie codecs
  • repcheck-pipeline-models: Pipeline models library. Events, workflow schemas, error handling, configuration
  • repcheck-ingestion-common: Ingestion common library. API client, XML parsing, change detection, event publishing, repository base, placeholders, execution helpers, structured logging
  • repcheck-g8: Giter8 template for scaffolding new RepCheck Scala repositories
  • tf-repcheck-infra: Terraform infrastructure-as-code for GCP (dev/staging/prod)

This is part of an ongoing series documenting RepCheck’s development. Previous posts: Introducing RepCheck | Building Agent-Ready Context | Token Costs and Template Architecture | Closing the Gaps | Almost Ready | Code in the Dark