Using Gemini and Claude To Query ClickHouse - A Bake Off

Benjamin Wootton

Benjamin Wootton

Published on July 8, 2025

Using Gemini and Claude To Query ClickHouse - A Bake Off

When LLMs first hit the scene I experimented with trying to generate SQL from plain English prompts. For instance, I would ask “summarise my top 10 fastest growing products in the Asia region” and hope to get syntactically correct SQL capturing my intention.

Sadly, the results at the time weren’t great. Early LLMs found it hard to understand my intention, the SQL they generated was incorrect, and the models didn’t have any reasoning ability or agentic behaviour such as the ability to backtrack in the case of error.

Today, things are very different and “generative BI” is becoming viable. The models are much smarter, thinking through what you are asking and taking into account more context and nuance. They can automatically interrogate your database and iterate on queries until the syntax is correct and the query returns accurate results. We also have the new MCP standard which improves and standardises the APIs between the LLM and the database.

Trialling Claude and Gemini

I recently trialled two models, Gemini 2.5 Pro and Claude Opus, against a ClickHouse database to get a feel for their current performance and accuracy. I asked questions about a house price dataset and found that the responses were encouraging, even where the query needed SQL features such as joins and window functions.

My gut feel was that Claude seemed to outperform Gemini in terms of accuracy, creativity, depth of analysis and speed. However, I wanted to do something much more objective to bake off Claude and Gemini for SQL analytics. My idea was to find a SQL challenge structured as a quiz and give it to both Claude and Gemini to aobjectively ssess their performance against a ClickHouse database.

For the SQL challenge, I found Danny’s Diner by Danny Ma which is a series of SQL questions about a fictional restaurant database. It seemed like a good fit for what I wanted to achieve so I went with it. It isn’t mind bendingly difficult but it does have some nuance, particularly in the later questions.

Database Schema

The schema contains three tables spanning sales, menu and memberships. Though it is clearly simple this does mean that we will need joins to answer most questions which makes things a little harder for the LLM.

When working with ClickHouse we have a tendency to denormalise, meaning that this could be less of a factor in a real world situation.

The schema is as follows:

CREATE DATABASE IF NOT EXISTS dannys_diner;

USE dannys_diner;

CREATE TABLE sales (
    customer_id FixedString(1),
    order_date Date,
    product_id UInt32
) ENGINE = MergeTree()
ORDER BY (customer_id, order_date, product_id);

INSERT INTO sales (customer_id, order_date, product_id) VALUES
('A', '2021-01-01', 1),
('A', '2021-01-01', 2),
('A', '2021-01-07', 2),
('A', '2021-01-10', 3),
('A', '2021-01-11', 3),
('A', '2021-01-11', 3),
('B', '2021-01-01', 2),
('B', '2021-01-02', 2),
('B', '2021-01-04', 1),
('B', '2021-01-11', 1),
('B', '2021-01-16', 3),
('B', '2021-02-01', 3),
('C', '2021-01-01', 3),
('C', '2021-01-01', 3),
('C', '2021-01-07', 3);

CREATE TABLE menu (
    product_id UInt32,
    product_name FixedString(5),
    price UInt32
) ENGINE = MergeTree()
ORDER BY product_id;

INSERT INTO menu (product_id, product_name, price) VALUES
(1, 'sushi', 10),
(2, 'curry', 15),
(3, 'ramen', 12);

CREATE TABLE members (
    customer_id FixedString(1),
    join_date Date
) ENGINE = MergeTree()
ORDER BY customer_id;

INSERT INTO members (customer_id, join_date) VALUES
('A', '2021-01-07'),
('B', '2021-01-09');

Connecting The LLMs To ClickHouse

My aim was to connect both Claude and Gemini to my ClickHouse database. When I first did this a few years ago I had to use Langchain and setup tools and OpenAPI specifications. Today, this is quite easy as we simply have to configure both to use an MCP Server which will provide the integration with the database.

Setting Up Claude

For Claude Desktop, we have to add a configuration section to the claude_desktop_config.json file describing how to download and run the ClickHouse MCP server:

{
  "mcpServers": {
    "mcp-clickhouse": {
      "command": "uv",
      "args": [
        "run",
        "--with",
        "mcp-clickhouse",
        "--python",
        "3.10",
        "mcp-clickhouse"
      ],
      "env": {
        "CLICKHOUSE_HOST": "<clickhouse-host>",
        "CLICKHOUSE_PORT": "<clickhouse-port>",
        "CLICKHOUSE_USER": "<clickhouse-user>",
        "CLICKHOUSE_PASSWORD": "<clickhouse-password>",
        "CLICKHOUSE_SECURE": "true",
        "CLICKHOUSE_VERIFY": "true",
        "CLICKHOUSE_CONNECT_TIMEOUT": "30",
        "CLICKHOUSE_SEND_RECEIVE_TIMEOUT": "30"
      }
    }
  }
}

Instead of Claude Desktop, I prefer to work with Claude Code at the terminal. Configuring an MCP server for Claude Code involves adding something similar to the above to the .claude.json file in your home directory.

Setting Up Gemini

For Gemini I also went with a CLI option and setup the Gemini CLI tool to point to the ClickHouse MCP server. The configuration is similar and needs to be added to ~/.gemini/settings.json.

{
  "theme": "Dracula",
  "selectedAuthType": "gemini-api-key",
  "mcpServers": {
    "clickhouse": {
      "command": "uv",
       "args": [
        "run",
        "--with",
        "mcp-clickhouse",
        "--python",
        "3.10",
        "mcp-clickhouse"
      ],
      "env": {
        "CLICKHOUSE_HOST": "localhost",
        "CLICKHOUSE_PORT": "8123",
        "CLICKHOUSE_USER": "default",
        "CLICKHOUSE_PASSWORD": "",
        "CLICKHOUSE_SECURE": "false",
        "CLICKHOUSE_VERIFY": "false",
        "CLICKHOUSE_CONNECT_TIMEOUT": "30",
        "CLICKHOUSE_SEND_RECEIVE_TIMEOUT": "30"
      },
      "timeout": 5000
    }
  }
}

I confirmed that both tools could successfully connect to the ClickHouse mcp server by issuing an /mcp command to both.

SQL Quiz

After deploying the schema and configuring the MCP servers, I prompted the models with the following:

I am going to give you 10 questions.  For each question, make use of the dannys_diner schema in the connected ClickHouse database to form your answer.  As output, build a report showing the results organised by question.  For each question, output the results as a table with the SQL statement you used underneath.  Connect to ClickHouse over it’s HTTP port 8123.

The 10 questions are as follows:

What is the total amount each customer spent at the restaurant?
How many days has each customer visited the restaurant?
What was the first item from the menu purchased by each customer?
What is the most purchased item on the menu and how many times was it purchased by all customers?
Which item was the most popular for each customer?
Which item was purchased first by the customer after they became a member?
Which item was purchased just before the customer became a member?
What is the total items and amount spent for each member before they became a member?
If each $1 spent equates to 10 points and sushi has a 2x points multiplier - how many points would each customer have?
In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customer A and B have at the end of January?

Speed

Claude took 260 seconds (~ 4 minutes) to complete the task, Gemini took 480 seconds (~ 8 minutes) to complete the task. After solving the first question, Claude quickly decided to parallelise the task whilst Gemini did it serially. Gemini also seemed to make more errors and need to backtrack before reissuing the query.

Results

Despite my first impressions, Gemini slightly pipped Claude to the post

QuestionClaude CorrectGemini Correct
1YesYes
2YesYes
3YesYes
4YesYes
5YesYes
6YesYes
7YesYes
8YesYes
9YesYes
10YesYes

Analysis

What is the total amount each customer spent at the restaurant? - Both Correct

Claude Response

| Customer ID | Total Amount |
|-------------|--------------|
| A           | $76          |
| B           | $74          |
| C           | $36          |

SELECT s.customer_id, SUM(m.price) AS total_amount
FROM dannys_diner.sales s
JOIN dannys_diner.menu m ON s.product_id = m.product_id
GROUP BY s.customer_id
ORDER BY s.customer_id

Gemini Response

| customer_id | total_spent |
|-------------|-------------|
| A           | 76          |
| B           | 74          |
| C           | 36          |

SELECT
    s.customer_id,
    sum(m.price) AS total_spent
FROM dannys_diner.sales AS s
INNER JOIN dannys_diner.menu AS m ON s.product_id = m.product_id
GROUP BY s.customer_id
ORDER BY s.customer_id

Both correct with a simple join and group by. Claude added in the currency sign which may break downstream views or BI tools.

How many days has each customer visited the restaurant? - Both Correct

Claude Response

| Customer ID | Days Visited |
|-------------|--------------|
| A           | 4            |
| B           | 6            |
| C           | 2            |

SELECT customer_id, COUNT(DISTINCT order_date) AS days_visited
FROM dannys_diner.sales
GROUP BY customer_id
ORDER BY customer_id

Gemini Response

| customer_id | visit_count |
|-------------|-------------|
| A           | 4           |
| B           | 6           |
| C           | 2           |

SELECT
    customer_id,
    count(DISTINCT order_date) AS visit_count
FROM dannys_diner.sales
GROUP BY customer_id
ORDER BY customer_id

Both correct with a simple group by. Ordering by visit count descending might have been more in line with intent.

What was the first item from the menu purchased by each customer? - Both Correct

Claude Response

| Customer ID | Product Name | Order Date  |
|-------------|--------------|-------------|
| A           | sushi        | 2021-01-01  |
| A           | curry        | 2021-01-01  |
| B           | curry        | 2021-01-01  |
| C           | ramen        | 2021-01-01  |

WITH first_purchase AS (
    SELECT customer_id, MIN(order_date) AS first_date
    FROM dannys_diner.sales
    GROUP BY customer_id
)
SELECT DISTINCT s.customer_id, m.product_name, s.order_date
FROM dannys_diner.sales s
JOIN dannys_diner.menu m ON s.product_id = m.product_id
JOIN first_purchase f ON s.customer_id = f.customer_id AND s.order_date = f.first_date
ORDER BY s.customer_id, m.product_name

Gemini Response

| customer_id | product_name |
|-------------|--------------|
| A           | sushi        |
| A           | curry        |
| B           | curry        |
| C           | ramen        |

WITH ordered_sales AS (
    SELECT
        s.customer_id,
        s.order_date,
        m.product_name,
        DENSE_RANK() OVER (PARTITION BY s.customer_id ORDER BY s.order_date) as rank
    FROM dannys_diner.sales AS s
    JOIN dannys_diner.menu AS m ON s.product_id = m.product_id
)
SELECT
    customer_id,
    product_name
FROM ordered_sales
WHERE rank = 1
GROUP BY customer_id, product_name

Both correct. Claude uses a CTE which is likely more efficient whilst Gemini ranks with a window function which is more sophisticated.

What is the most purchased item on the menu and how many times was it purchased by all customers? - Both Correct

Claude Response

| Product Name | Purchase Count |
|--------------|----------------|
| ramen        | 8              |

SELECT m.product_name, COUNT(s.product_id) AS purchase_count
FROM dannys_diner.sales s
JOIN dannys_diner.menu m ON s.product_id = m.product_id
GROUP BY m.product_name
ORDER BY purchase_count DESC
LIMIT 1

Gemini Response

| product_name | purchase_count |
|--------------|----------------|
| ramen        | 8              |

SELECT
    m.product_name,
    count(*) AS purchase_count
FROM dannys_diner.sales AS s
INNER JOIN dannys_diner.menu AS m ON s.product_id = m.product_id
GROUP BY m.product_name
ORDER BY purchase_count DESC
LIMIT 1

Both correct. Gemini more explicit with INNER JOIN.

Claude Response

| Customer ID | Product Name | Purchase Count |
|-------------|--------------|----------------|
| A           | ramen        | 3              |
| B           | sushi        | 2              |
| C           | ramen        | 3              |

WITH ranked AS (
    SELECT s.customer_id, m.product_name, COUNT(*) AS purchase_count,
           RANK() OVER (PARTITION BY s.customer_id ORDER BY COUNT(*) DESC) AS rnk
    FROM dannys_diner.sales s
    JOIN dannys_diner.menu m ON s.product_id = m.product_id
    GROUP BY s.customer_id, m.product_name
)
SELECT customer_id, product_name, purchase_count
FROM ranked
WHERE rnk = 1
ORDER BY customer_id

Gemini Response

| customer_id | product_name | purchase_count |
|-------------|--------------|----------------|
| A           | ramen        | 3              |
| B           | sushi        | 2              |
| B           | ramen        | 2              |
| B           | curry        | 2              |
| C           | ramen        | 3              |

WITH customer_popularity AS (
    SELECT
        s.customer_id,
        m.product_name,
        count(*) AS purchase_count,
        DENSE_RANK() OVER (PARTITION BY s.customer_id ORDER BY count(*) DESC) as rank
    FROM dannys_diner.sales AS s
    JOIN dannys_diner.menu AS m ON s.product_id = m.product_id
    GROUP BY s.customer_id, m.product_name
)
SELECT
    customer_id,
    product_name,
    purchase_count
FROM customer_popularity
WHERE rank = 1
ORDER BY customer_id

Bla bla bla

Which item was purchased first by the customer after they became a member? - Both Correct

Claude Response

| Customer ID | Product Name | Order Date  |
|-------------|--------------|-------------|
| A           | curry        | 2021-01-07  |
| B           | sushi        | 2021-01-11  |

WITH member_purchases AS (
    SELECT s.customer_id, s.order_date, s.product_id,
           RANK() OVER (PARTITION BY s.customer_id ORDER BY s.order_date) AS rnk
    FROM dannys_diner.sales s
    JOIN dannys_diner.members mb ON s.customer_id = mb.customer_id
    WHERE s.order_date >= mb.join_date
)
SELECT mp.customer_id, m.product_name, mp.order_date
FROM member_purchases mp
JOIN dannys_diner.menu m ON mp.product_id = m.product_id
WHERE mp.rnk = 1
ORDER BY mp.customer_id

Gemini Response

| customer_id | product_name |
|-------------|--------------|
| A           | curry        |
| B           | sushi        |

WITH first_orders AS (
    SELECT
        s.customer_id,
        min(s.order_date) AS first_order_date
    FROM dannys_diner.sales AS s
    JOIN dannys_diner.members AS mem ON s.customer_id = mem.customer_id
    WHERE s.order_date >= mem.join_date
    GROUP BY s.customer_id
)
SELECT
    s.customer_id,
    m.product_name
FROM dannys_diner.sales AS s
JOIN first_orders AS fo ON s.customer_id = fo.customer_id AND s.order_date = fo.first_order_date
JOIN dannys_diner.menu AS m ON s.product_id = m.product_id
ORDER BY s.customer_id

Interesting that in contrast with a query above, here Claude used the window function and Gemini made use of a simple CTE.

Which item was purchased just before the customer became a member? - Both Correct

Claude Response

| Customer ID | Product Name | Order Date  |
|-------------|--------------|-------------|
| A           | sushi        | 2021-01-01  |
| A           | curry        | 2021-01-01  |
| B           | sushi        | 2021-01-04  |

WITH pre_member_purchases AS (
    SELECT s.customer_id, s.order_date, s.product_id,
           RANK() OVER (PARTITION BY s.customer_id ORDER BY s.order_date DESC) AS rnk
    FROM dannys_diner.sales s
    JOIN dannys_diner.members mb ON s.customer_id = mb.customer_id
    WHERE s.order_date < mb.join_date
)
SELECT pmp.customer_id, m.product_name, pmp.order_date
FROM pre_member_purchases pmp
JOIN dannys_diner.menu m ON pmp.product_id = m.product_id
WHERE pmp.rnk = 1
ORDER BY pmp.customer_id, m.product_name

Gemini Response

| customer_id | product_name |
|-------------|--------------|
| A           | sushi        |
| A           | curry        |
| B           | sushi        |

WITH last_orders AS (
    SELECT
        s.customer_id,
        max(s.order_date) AS last_order_date
    FROM dannys_diner.sales AS s
    JOIN dannys_diner.members AS mem ON s.customer_id = mem.customer_id
    WHERE s.order_date < mem.join_date
    GROUP BY s.customer_id
)
SELECT
    s.customer_id,
    m.product_name
FROM dannys_diner.sales AS s
JOIN last_orders AS lo ON s.customer_id = lo.customer_id AND s.order_date = lo.last_order_date
JOIN dannys_diner.menu AS m ON s.product_id = m.product_id
ORDER BY s.customer_id

What is the total items and amount spent for each member before they became a member? - Both Correct

Claude Response

| Customer ID | Total Items | Total Amount |
|-------------|-------------|--------------|
| A           | 2           | $25          |
| B           | 2           | $40          |

SELECT s.customer_id,
       COUNT(DISTINCT s.product_id) AS total_items,
       SUM(m.price) AS total_amount
FROM dannys_diner.sales s
JOIN dannys_diner.menu m ON s.product_id = m.product_id
JOIN dannys_diner.members mb ON s.customer_id = mb.customer_id
WHERE s.order_date < mb.join_date
GROUP BY s.customer_id
ORDER BY s.customer_id

Gemini Response

| customer_id | total_items | total_spent |
|-------------|-------------|-------------|
| A           | 2           | 25          |
| B           | 3           | 40          |

SELECT
    s.customer_id,
    count(s.product_id) AS total_items,
    sum(m.price) AS total_spent
FROM dannys_diner.sales AS s
JOIN dannys_diner.members AS mem ON s.customer_id = mem.customer_id
JOIN dannys_diner.menu AS m ON s.product_id = m.product_id
WHERE s.order_date < mem.join_date
GROUP BY s.customer_id
ORDER BY s.customer_id

Both correct.

If each $1 spent equates to 10 points and sushi has a 2x points multiplier - how many points would each customer have? - Both Correct

Claude Response

| Customer ID | Total Points |
|-------------|--------------|
| A           | 860          |
| B           | 940          |
| C           | 360          |

SELECT s.customer_id,
       SUM(CASE WHEN m.product_name = 'sushi' THEN m.price * 20 ELSE m.price * 10 END) AS total_points
FROM dannys_diner.sales s
JOIN dannys_diner.menu m ON s.product_id = m.product_id
GROUP BY s.customer_id
ORDER BY s.customer_id

Gemini Response

| customer_id | total_points |
|-------------|--------------|
| A           | 860          |
| B           | 940          |
| C           | 360          |

```sql
SELECT
    s.customer_id,
    sum(CASE
        WHEN m.product_name = 'sushi' THEN m.price * 20
        ELSE m.price * 10
    END) AS total_points
FROM dannys_diner.sales AS s
JOIN dannys_diner.menu AS m ON s.product_id = m.product_id
GROUP BY s.customer_id
ORDER BY s.customer_id

Both correct.

What is the total items and amount spent for each member before they became a member? - Both Correct

Claude Response

| Customer ID | Total Items | Total Amount |
|-------------|-------------|--------------|
| A           | 2           | $25          |
| B           | 2           | $40          |

SELECT s.customer_id,
       COUNT(DISTINCT s.product_id) AS total_items,
       SUM(m.price) AS total_amount
FROM dannys_diner.sales s
JOIN dannys_diner.menu m ON s.product_id = m.product_id
JOIN dannys_diner.members mb ON s.customer_id = mb.customer_id
WHERE s.order_date < mb.join_date
GROUP BY s.customer_id
ORDER BY s.customer_id

Gemini

| customer_id | total_items | total_spent |
|-------------|-------------|-------------|
| A           | 2           | 25          |
| B           | 3           | 40          |

SELECT
    s.customer_id,
    count(s.product_id) AS total_items,
    sum(m.price) AS total_spent
FROM dannys_diner.sales AS s
JOIN dannys_diner.members AS mem ON s.customer_id = mem.customer_id
JOIN dannys_diner.menu AS m ON s.product_id = m.product_id
WHERE s.order_date < mem.join_date
GROUP BY s.customer_id
ORDER BY s.customer_id

In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customer A and B have at the end of January?

Claude Response

| Customer ID | Total Points |
|-------------|--------------|
| A           | 1,370        |
| B           | 940          |

SELECT s.customer_id,
       SUM(
         CASE
           WHEN m.product_name = 'sushi' THEN m.price * 20
           WHEN s.order_date >= mb.join_date AND s.order_date < mb.join_date + INTERVAL 7 DAY THEN m.price * 20
           ELSE m.price * 10
         END
       ) AS total_points
FROM dannys_diner.sales s
LEFT JOIN dannys_diner.members mb ON s.customer_id = mb.customer_id
JOIN dannys_diner.menu m ON s.product_id = m.product_id
WHERE s.customer_id IN ('A', 'B') AND s.order_date <= '2021-01-31'
GROUP BY s.customer_id

Gemini Response

| customer_id | total_points |
|-------------|--------------|
| A           | 1370         |
| B           | 820          |
 
SELECT
    s.customer_id,
    sum(CASE
        WHEN s.order_date BETWEEN mem.join_date AND mem.join_date + INTERVAL 6 DAY THEN m.price * 20
        WHEN m.product_name = 'sushi' THEN m.price * 20
        ELSE m.price * 10
    END) AS total_points
FROM dannys_diner.sales AS s
JOIN dannys_diner.members AS mem ON s.customer_id = mem.customer_id
JOIN dannys_diner.menu AS m ON s.product_id = m.product_id
WHERE s.order_date <= '2021-01-31'
    AND s.customer_id IN ('A', 'B')
GROUP BY s.customer_id
ORDER BY s.customer_id

Conclusion

This was a really interesting exercise and shows that agentic and generative BI, where we use plain English to ask questions about a dataset, is very close to becoming viable with todays LLMs.

We have asked a series of questions in plain English and had them succesfully translated to syntactically correct SQL. Though the queries and the schema were relatively simple, they did need joins and window functions

We tested with Claude and Gemini. Despite my initial impression about Claude being much more sophisticated than Gemini, Gemini actually won the bake off.