Tech Blog - Rohlik Group

At Rohlik, our data warehouse in Snowflake spans dozens of teams, thousands of tables, and tens of thousands of columns. Analysts spend a significant chunk of their day answering recurring questions — margins by category, regional comparisons, KPI pulls for weekly reviews. These questions look simple from the outside, but answering them correctly requires deep institutional knowledge: which schema holds the right table, which metric definition finance actually signed off on, and which dashboard is the canonical source.

This post tells the story of how we tackled this problem in two layers: first, by building a semantic layer in git that documents and codifies our business logic as version-controlled SQL; and second, by building reporting-mcp, an AI-facing gateway that makes that knowledge — and the entire data stack — accessible to AI assistants at runtime.

Part 1: The Problem

Undocumented Warehouses

A data warehouse without a semantic layer is like a codebase without types — everything technically works, but you spend more time reading source code than shipping features. Analysts re-derive the same calculations. Junior engineers learn what sub_order = 0 means by making the wrong query first. AI assistants hallucinate column names.

AI Writing SQL Is Not Enough

Large language models are surprisingly good at writing SQL. Give one a table schema and a question, and it will usually produce something syntactically correct. The problem is correctness at the business logic level.

Consider gross_margin. Every company defines it slightly differently. Ours has an approved formula that accounts for specific cost buckets, currency adjustments, and handling fees. An LLM writing SQL from scratch might get close — but "close" in a finance report is wrong.

Even if the model writes perfect SQL, it still needs to know where to look. With thousands of tables across dozens of schemas, without a way to navigate that structure, the model is guessing.

We needed to solve this at the foundation: document the warehouse, codify the business logic, and then give AI agents structured access to all of it.

Part 2: The Semantic Layer — Foundation in Git

We wanted three things:

Table and column descriptions — what every object in the warehouse actually means
Verified queries — canonical, tested answers to common business questions
Automation — all of it deployed and tracked with zero manual upkeep

All stored in git, version-controlled, peer-reviewed, and deployed by Airflow.

Table and Column Descriptions

The simplest form of semantic metadata is a comment. Snowflake lets you attach a COMMENT to any table or column, surfaced natively in every SQL client and queryable via INFORMATION_SCHEMA.

We track completeness through a dedicated table: DWH_PROD.SEMANTIC_LAYER.DATA_QUALITY_LABELS. Every table and column in production gets a row, and a labels array records what's missing — for example, ['missing_description'] if the comment hasn't been filled in yet.

A weekly Airflow job (semantic_layer_weekly) aggregates this into two SCD2 snapshot tables:

-- STATS_COLUMNS_SNAPSHOT tracks, per table, over time:
--   columns_count
--   columns_without_metadata        — how many columns still lack a description
--   columns_without_metadata_percentage
--   datasensai_access_count         — columns that ARE described (can be used by AI tools)
--   description_filled              — whether the parent table itself has a description

The SCD2 pattern is intentional — it lets us plot documentation coverage over time, not just see today's state. A merge checks whether anything changed; if it did, the old row gets a VALID_TO timestamp and a new one is inserted. The engineering metrics dashboard shows the trend as a KPI.

The rules enforced in our codebase conventions are clear: all production tables must include structured table-level comments and column-level comments. This is non-negotiable. Temporary tables (tmp_*) and work schemas (WORK_*) are exempt. Everything else is not.

Verified Queries

Column descriptions tell you what a column is. Verified queries tell you how to use the data correctly.

A verified query is a Snowflake view in DWH_PROD.SEMANTIC_LAYER, prefixed vq_, that encodes a tested, production-ready answer to a business question. The naming convention makes intent explicit:

vq_{domain}_{metric}_by_{dimensions}

Domain prefixes: com (Commercial), fin (Finance), sup (Supply Chain), cus (Customer), wh (Warehouse), log (Logistics), gen (General), hr (HR).

Here's a real example — Average Order Value by country:

CREATE OR REPLACE VIEW "vq_com_average_order_value_by_country"
(
    "deliver_date"        COMMENT 'Delivery date',
    "country"             COMMENT 'Country code (cz, de, at, hu, ro)',
    "total_gr"            COMMENT 'Total gross revenue in local currency',
    "order_count"         COMMENT 'Count of distinct master orders (including parent and sub-orders)',
    "average_order_value" COMMENT 'Average order value (total GR / order count) in local currency'
)
COMMENT = '{
  "domains": ["Commercial"],
  "description": "Daily Average Order Value (AOV) by country...",
  "question": "What was the Average Order Value for all orders delivered in a given country and period?",
  "load_type": "VIEW"
}'
AS
SELECT
    TO_DATE("deliver_at") AS "deliver_date",
    "country",
    SUM("gr") AS "total_gr",
    COUNT(DISTINCT "master_order_unique_id") AS "order_count",
    DIV0("total_gr", "order_count") AS "average_order_value"
FROM DWH_PROD_TRANSFORM.TR_LEVEL_4."aggregated_order"
WHERE "finished_order" = 1
  AND "deliver_at" >= DATEADD('month', -13, CURRENT_DATE)
GROUP BY ALL;

Notice what's packed in here:

Column-level COMMENT on every output column — no guessing what total_gr means
Table-level COMMENT as JSON — machine-readable metadata: business domain, natural language question, plain English description
The correct business logic — finished_order = 1, master_order_unique_id (not unique_id, that would include suborders and make the result wrong), 13-month rolling window
Double-quoted identifiers throughout — consistent with SQL standards

At the time of writing there are tens of views covering Commercial, Finance, Supply Chain, Customer, and Warehousing metrics.

Deployment and Storage

Adding a new verified query is a three-step process: write it, validate it against production, commit.

Verified queries are available as views, but we also maintain a flat DWH_PROD.SEMANTIC_LAYER.VERIFIED_QUERIES table, rebuilt daily by a stored procedure. It iterates over every vq_* view, extracts the SELECT body and column comments, and writes structured rows:

ID                  | UUID
QUESTION            | "What was the AOV...?"
DESCRIPTION         | "Daily AOV by country..."
SQL_QUERY           | "SELECT TRY_TO_DATE(...)..."
COLUMN_DESCRIPTIONS | {"deliver_date": "Delivery date", ...}
LABELS              | ["Commercial"]
QUESTION_VECTOR     | NULL  ← reserved for future embeddings

Everything is version-controlled and peer-reviewed. Nothing requires a separate metadata platform. Deployment is automated.

Part 3: reporting-mcp — Making the Semantic Layer AI-Accessible

With a documented warehouse and a library of verified queries in place, the next step was making all of this accessible to AI assistants at runtime. We built reporting-mcp, a Model Context Protocol server — an open standard for giving AI agents structured access to external systems.

It exposes ten tools that Claude (or any MCP-compatible agent) can call.

Data Warehouse Access

Tool	What it does
`search_database_metadata`	Navigate Snowflake hierarchically: databases → schemas → tables → columns, or search by keyword
`read_query`	Execute a validated read-only `SELECT` against Snowflake

The metadata tool lets the agent explore in real time: "What schemas are in DWH_PROD? What tables are in TR_LEVEL_4? What columns does orders_daily have?" — all against the actual current schema.

Semantic Layer Integration — the Key Piece

Tool	What it does
`search_verified_queries`	Find pre-approved SQL queries matching a keyword
`search_tableau_metrics`	Find approved metric definitions with standardized formulas

This is the part that closes the loop. Before writing any SQL, the agent checks if a verified query already exists — drawing from the same vq_* views and VERIFIED_QUERIES table described above. If a match exists, the agent uses it verbatim. No improvisation.

Metric definitions work the same way. search_tableau_metrics(search_term="margin") returns the exact formula the finance team approved — including edge cases, exclusions, and currency handling. However our current goal is to move all metrics from Tableau into GIT and Snowflake, which will further enhance our semantic layer and will empower AI agents.

BI Tool Integration

Tool	What it does
`query_tableau_workbooks`	Search Tableau workbooks by keyword
`query_tableau_views`	Search individual Tableau views/sheets
`get_tableau_workbook_details`	List all views in a workbook
`get_tableau_view_data`	Export view data as CSV with optional filters
`query_metabase_reports`	Search Metabase reports by keyword
`get_metabase_report_data`	Retrieve Metabase report results

An agent can find and pull the data behind an existing dashboard without a human having to locate the right workbook ID. "Find the KPI workbook and get me the weekly revenue view filtered to CZ" is a complete, executable request.

Part 4: Architecture and Design Decisions

How reporting-mcp Is Built

Key design choices:

Read-only by default. Write operations exist in the codebase but are disabled unless ALLOW_WRITE=true is explicitly set. The default posture is safe for an agent that might make mistakes.
Query validation before execution. Every query passed to read_query is validated: it must start with SELECT, SHOW, DESCRIBE, or similar read-only keywords. Anything that looks like a mutation is rejected.
Single metadata tool, not many. One search_database_metadata tool routes based on parameters provided. Fewer tools means less overhead for the model deciding what to call.
Semantic layer first. Tool docstrings explicitly tell the agent to check verified queries and metric definitions before writing custom SQL. This is prompt-level guidance baked into the tool contract.

Semantic Layer Design Principles

Start with conventions, not tooling. The most valuable investment was writing down how to describe a table — the JSON format, which fields are required, what a good description looks like. The tooling followed naturally once everyone agreed on the contract.
Make the right thing automatic. Automated jobs deploy verified queries and report coverage. No one has to remember.
SQL views are documentation. A CREATE OR REPLACE VIEW with column comments and a JSON metadata block is simultaneously executable code, documentation, and a machine-readable API.
Reserve space for what's coming. The QUESTION_VECTOR column costs nothing today and makes the semantic search upgrade a column fill rather than a schema change.

Part 5: How It Works in Practice

Here's a simplified trace of what happens when someone asks: "What was our order fulfillment rate in Germany last week?"

1. search_verified_queries(search_term="fulfillment rate")
   → Found: "Weekly fulfillment rate by country" (verified 2025-11-03)
   → Query: SELECT country, date_trunc('week', order_date) as week, ...

2. read_query(query="SELECT ... WHERE country = 'DE' AND week = '2026-04-13'")
   → Returns: [{country: DE, week: 2026-04-13, fulfillment_rate: 0.971}]

3. Response: "Germany's fulfillment rate last week was 97.1%."

No schema exploration needed because a verified query existed. The whole round trip takes a few seconds. Without this system, it would have required a human to locate the right query, adjust the filters, run it, and paste the result. If human needs further drilling, adding dimensions or reshaping the result - AI agent simply adjusts current verified query. No need to build it from scratch. This makes the result more precise.

When no verified query exists, the flow is longer — the agent explores schemas, drafts SQL, and notes that the result should be cross-checked — but it's still largely automated. And as more verified queries are added, the agent's coverage improves — not because anything was retrained, but because there are more correct answers it can retrieve and reuse.

Part 6: How People Actually Use It — Devin as the Data Interface

The technical stack described above would be useful even if only data engineers interacted with it. But the real payoff comes from who's using it day to day: product managers, category managers, finance controllers, operations teams — people who need answers from the DWH but have no reason to write SQL or navigate Snowflake.

At Rohlik, these users interact with Devin, an AI assistant that has reporting-mcp connected as one of its tools. There are two main entry points:

Direct in the Devin App

Anyone in the company can open the Devin app and ask a data question in natural language: "What was our AOV in Hungary last week?", "How does CZ margin compare month over month?", "Show me the top 10 underperforming categories by fulfillment rate." Devin handles the rest — searching for verified queries, pulling the right data, and returning a clear answer. No Snowflake login, no SQL, no waiting for an analyst to get to it.

The #ask-data Slack Channel

We used to have multliple slack channels dedicated for data related questions. Now we have everything unified under one channel #ask-data. Users post their question, Devin picks it up, and the conversation happens right in the thread. Follow-up questions, filters, breakdowns — it's all interactive. A product manager can ask for a number, get it, then immediately say "now break that down by warehouse" without context-switching to another tool. It allows more people to collaborate in one slack thread, or even pointing out wrong answers so that data team can easily see whole conversation and fix the issue (like adding new verified query)

Why This Matters

The numbers back this up. In the last month alone, there were 1,500 Devin sessions touching data and analytics topics, from 371 unique users across the company. That's not a handful of power users — it's a broad cross-section of the business getting answers from the warehouse without ever opening Snowflake.

This changes the operating model for data teams. Instead of fielding a queue of ad-hoc requests — each requiring an analyst to understand the question, find the right table, write the query, and format the result — the semantic layer and Devin handle the routine questions automatically. Analysts are freed to focus on deeper analysis, modeling, and building new verified queries that expand what Devin can answer next time.

The trust model is critical here. Because Devin draws from verified queries and approved metric definitions, the answers it gives are the same numbers that appear in the official dashboards. Users aren't getting AI-improvised calculations — they're getting the finance-approved formula, executed on live data, delivered in seconds.

Part 7: What's Next

A few areas we'd approach differently or plan to improve:

Structured output from read_query. Currently the tool returns formatted text. Returning structured JSON or a typed schema would make it easier for downstream tools to process results programmatically.
Query cost estimation before execution. Large Snowflake queries can be expensive. Adding a dry-run or cost-estimation step would prevent accidental full-table scans.
Smarter search for verified queries. The current search is keyword ILIKE against a text column. Vector similarity search (using the reserved QUESTION_VECTOR column) against query descriptions would surface relevant queries more reliably when the exact term doesn't match.
Expanding documentation coverage. Coverage gaps still exist, and the embedding story is still ahead of us. But the system is in version control, it deploys itself, and it gets a little more complete with every PR.

Takeaway

The semantic layer and reporting-mcp together represent a two-part approach to the same problem: making institutional data knowledge accessible, trustworthy, and machine-readable.

The semantic layer — verified queries, column descriptions, metadata tracking — forms the foundation. It ensures that business logic is codified once, peer-reviewed, and deployed automatically. Its value is immediate for human users and compounds over time as coverage grows.

reporting-mcp is the access layer. It doesn't replace human understanding of the data — it makes that understanding available to AI agents that would otherwise have to guess. The ten tools it exposes are a thin but well-considered interface over a complex data stack. That simplicity is intentional — fewer, better-defined tools are easier for agents to use correctly.

The key insight is that the semantic layer doubles as a prompt-engineering tool. Every verified query added to the library is one more question an AI agent can answer correctly on the first try — without retraining, without hardcoding, and with the same numbers the official dashboards show.

If you're running a modern data stack and want to give AI assistants real, trustworthy access to it, the pattern here is worth a look: document the logic in code, deploy it automatically, and expose it through a structured protocol.