Analysing FIX Data With ClickHouse

Benjamin Wootton

Published on June 22, 2025

The Financial Information eXchange (FIX) Protocol is a standard that is used extensively within the capital markets industry to exchange information and requests between trading counterparties.

The protocol is based around FIX messages which can be used for purposes such as placing new trades, amending existing trades, exchanging market data, and for post trade processing.

If a company wishes to electronically place a trade for a stock, their systems could send a FIX message to their broker which includes details such as the symbol of the stock they wish to trade, the quantity, the price, and the type of order. The broker would then execute the trade, perhaps involving another FIX message to a stock exchange, before sending a confirmation message back to the originator to confirm the order and the price it was filled at.

A FIX message to place a new 100 lot order for Google shares might look like this:

8=FIX.4.2|9=178|35=D|49=CLIENT1|56=BROKER1|34=1|52=20250526-15:30:00.000|11=ORD123456|21=1|55=GOOGL|54=1|38=100|40=2|44=2795.00|59=0|10=235|

Once the order is executed, the broker could respond with an execution report showing that the full 100 lot order has been filled at an average price of $2795.00.

8=FIX.4.2|9=196|35=8|49=BROKER1|56=CLIENT1|34=2|52=20250526-15:30:01.000|37=EXEC7890|11=ORD123456|17=EX12345|150=0|39=2|55=GOOGL|54=1|38=100|14=100|6=2795.00|31=2795.00|32=100|151=0|10=072|

As you can see, each FIX message is made up of a series of numeric tags, each of which has a specific meaning. For instance, the tag 8 is the protocol version, 9 is the message length, 35 is the message type, 49 is the sender, 56 is the receiver, 34 is the sequence number, and 52 is the timestamp. By agreeing on the meaning of these tags, the FIX protocol allows for the exchange of information between different counterparties in a standardised way.
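To make the tag structure concrete, here is a minimal Python sketch of a parser that splits a pipe-delimited FIX message into a tag-to-value dictionary. Note that real FIX uses the SOH control character (\x01) as the field delimiter; the pipe is the human-readable substitute used in the examples above.

```python
def parse_fix(message: str, delimiter: str = "|") -> dict:
    """Split a FIX message into a tag -> value dictionary."""
    fields = {}
    for pair in message.strip(delimiter).split(delimiter):
        # partition() splits only on the first '=', so values
        # containing '=' or ':' (e.g. timestamps) stay intact
        tag, _, value = pair.partition("=")
        fields[tag] = value
    return fields

order = ("8=FIX.4.2|9=178|35=D|49=CLIENT1|56=BROKER1|34=1|"
         "52=20250526-15:30:00.000|11=ORD123456|21=1|55=GOOGL|"
         "54=1|38=100|40=2|44=2795.00|59=0|10=235|")

parsed = parse_fix(order)
print(parsed["35"], parsed["55"], parsed["38"])  # D GOOGL 100
```

This is the same decomposition the ClickHouse view performs later in this article with splitByChar and arrayMap, just expressed imperatively.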

How FIX Data Is Used

At a minimum, businesses have to safely retain and archive their FIX messages for audit, regulatory and compliance purposes. If there is an issue with a trade, it is important to be able to see the exact FIX messages that were sent and received in order to understand what happened.

Beyond this, the data can also be used for valuable analytics. For instance, it may be useful to understand the volume of trades, amends and cancels. There may also be compliance use cases such as identifying market abuse or wash trades, which may be apparent in the FIX logs. Finally, the data may be useful for risk management, or in some circumstances to identify trading opportunities.

For both of these purposes, it is important that the data is easy to query, search, analyse and join. If FIX data is simply written to a text file or into an unsuitable database, it is very hard to extract value from, and support teams end up spending a lot of time manually searching through FIX logs.

Where ClickHouse Fits In

As ClickHouse grows in prominence within capital markets, businesses such as banks, hedge funds, exchanges, brokers and trading technology vendors are increasingly interested in using it to store and analyse their FIX related data. Fortunately, ClickHouse is a natural fit for this use case.

FIX data inherently comes in high volumes. A single bank could be exchanging millions of messages per day as it processes huge numbers of orders with its various counterparties. Even vanilla trading use cases can generate significant data volumes, whilst in the realm of algorithmic and high frequency trading, the number of messages could be in the billions per day. Over a year, FIX messages alone could reach petabyte scale.

FIX data also tends to be time series and event based, meaning it can be appended to a single table without significant denormalisation. This is exactly the type of data ClickHouse has been optimised for: it can be stored in one big partitioned table and queried by time.

Electronic trading is also an inherently real time use case. Users such as traders or risk managers, and external systems such as post trade processing or compliance systems, might want to see FIX logs in near real time. ClickHouse's ability to rapidly ingest and query fresh data makes it an excellent fit here.

About This Example

We wanted to show a simple example of how to store and analyse FIX data using ClickHouse to demonstrate some techniques for working with it efficiently.

In this example, we have a trading technology vendor who are capturing trade data between various counterparty banks. These banks are trading a mixture of 10 different stocks. Each FIX message contains a sender, a receiver and a ticker signifying the counterparties and the stocks they are trading.

In some instances, a counterparty can reject a trade. This could end up costing the requester money if the market price moves. Our task in this example is to understand the financial impact of these rejections by cross referencing them against market data also stored in ClickHouse.

Dataset

We used the SimpleFix Python library to generate a fictitious set of FIX logs representing buy/sell transactions between 10 banks, which we saved in a ClickHouse table called fix.log.

The transactions refer to 10 different stocks. We generated the stock prices from a random walk and saved them in a ClickHouse table called fix.prices. For simplicity, we considered only two types of messages:

  • New Order - Single (MsgType = ‘D’): A trade request is sent from one bank to another to buy or sell a stock at market price;

  • Confirmation (MsgType = ‘AK’): The bank that received the request either confirms or rejects the trade.
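For illustration, here is a pared-down sketch of how such a message pair could be generated in plain Python. The actual dataset was produced with SimpleFix; the helper functions and the reduced tag set below are our own simplification, limited to the tags (11, 54, 38, 665, 80, 6) that the queries later in this article extract.

```python
import random

# A subset of the counterparties and tickers used in the dataset
BANKS = ["Goldman Sachs", "BNP Paribas", "Nomura", "J.P. Morgan"]
TICKERS = ["GTO", "KHU", "GZS", "WKK"]

def new_order(trade_id: int, sender: str, receiver: str,
              ticker: str, side: str, qty: int) -> str:
    # MsgType D = New Order - Single; tag 54: 1 = buy, 2 = sell
    return (f"8=FIX.4.4|35=D|49={sender}|56={receiver}|"
            f"11={trade_id}|55={ticker}|54={'1' if side == 'Buy' else '2'}|"
            f"38={qty}|")

def confirmation(trade_id: int, sender: str, receiver: str,
                 confirmed: bool, qty: int, price: float) -> str:
    # MsgType AK = Confirmation; tag 665: 4 = confirmed, 5 = rejected
    # tag 80 carries the allocated quantity, tag 6 the average price
    return (f"8=FIX.4.4|35=AK|49={sender}|56={receiver}|"
            f"11={trade_id}|665={'4' if confirmed else '5'}|"
            f"80={qty if confirmed else 0}|6={price:.2f}|")

random.seed(0)
buyer, seller = random.sample(BANKS, 2)
ticker = random.choice(TICKERS)
order = new_order(1, buyer, seller, ticker, "Buy", 100)
reply = confirmation(1, seller, buyer, confirmed=True, qty=100, price=28.89)
```

A real generator would also emit sequence numbers, timestamps, body length and checksum tags, which are omitted here for brevity.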

We generated both the stock prices and the FIX messages at one-second intervals. We assumed that the gaps between the time when the trade request is sent, the time when the trade is executed (if confirmed), and the time when the trade confirmation is received are only a few milliseconds, and that this lag has no impact on the market price at which the trade is executed. A preview of the data is shown below:

SELECT
    UID,
    Timestamp,
    LEFT(Message, 50)
FROM fix.log
LIMIT 10

Query id: e13cec5b-1483-47c0-a3ba-af7063e1ad69

    ┌─UID─┬───────────Timestamp─┬─left(Message, 50)──────────────────────────────────┐
 1. │   1 │ 2024-03-08 09:38:45 │ 8=FIX.4.4|35=D|49=BNP Paribas|56=Goldman Sachs|34= │
 2. │   2 │ 2024-03-08 09:38:45 │ 8=FIX.4.4|35=AK|49=Goldman Sachs|56=BNP Paribas|34 │
 3. │   3 │ 2024-03-08 09:38:46 │ 8=FIX.4.4|35=D|49=BNP Paribas|56=Bank of America|3 │
 4. │   4 │ 2024-03-08 09:38:46 │ 8=FIX.4.4|35=AK|49=Bank of America|56=BNP Paribas| │
 5. │   5 │ 2024-03-08 09:38:47 │ 8=FIX.4.4|35=D|49=Morgan Stanley|56=Bank of Americ │
 6. │   6 │ 2024-03-08 09:38:47 │ 8=FIX.4.4|35=AK|49=Bank of America|56=Morgan Stanl │
 7. │   7 │ 2024-03-08 09:38:48 │ 8=FIX.4.4|35=D|49=Morgan Stanley|56=Nomura|34=2|52 │
 8. │   8 │ 2024-03-08 09:38:48 │ 8=FIX.4.4|35=AK|49=Nomura|56=Morgan Stanley|34=1|5 │
 9. │   9 │ 2024-03-08 09:38:49 │ 8=FIX.4.4|35=D|49=J.P. Morgan|56=BNP Paribas|34=1| │
10. │  10 │ 2024-03-08 09:38:49 │ 8=FIX.4.4|35=AK|49=BNP Paribas|56=J.P. Morgan|34=3 │
    └─────┴─────────────────────┴────────────────────────────────────────────────────┘

10 rows in set. Elapsed: 0.003 sec. Processed 2.12 thousand rows, 484.37 KB (706.32 thousand rows/s., 161.15 MB/s.)
Peak memory usage: 907.05 KiB.

The price data has the following format:

SELECT *
FROM fix.prices
LIMIT 10

Query id: d897451f-a51c-44a5-881d-f785bf66e8b1

    ┌───────────Timestamp─┬─Ticker─┬─Price─┐
 1. │ 2024-03-08 09:38:45 │ GTO    │ 28.89 │
 2. │ 2024-03-08 09:38:45 │ KHU    │  8.82 │
 3. │ 2024-03-08 09:38:45 │ GZS    │ 98.57 │
 4. │ 2024-03-08 09:38:45 │ WKK    │ 77.46 │
 5. │ 2024-03-08 09:38:45 │ XUD    │ 19.92 │
 6. │ 2024-03-08 09:38:45 │ HXC    │ 56.27 │
 7. │ 2024-03-08 09:38:45 │ VUB    │ 38.77 │
 8. │ 2024-03-08 09:38:45 │ XLF    │  36.6 │
 9. │ 2024-03-08 09:38:45 │ BUA    │ 33.55 │
10. │ 2024-03-08 09:38:45 │ LZV    │  9.18 │
    └─────────────────────┴────────┴───────┘

10 rows in set. Elapsed: 0.003 sec. Processed 8.20 thousand rows, 196.80 KB (2.92 million rows/s., 70.01 MB/s.)
Peak memory usage: 453.33 KiB.

Scenario

We have observed that a large fraction of trades were rejected, which will likely have an impact on the banks’ PnL. To quantify the impact of the rejected trades, we calculate two PnLs for each bank:

  • The expected PnL, which is calculated taking into account all trades;

  • The realized PnL, which is calculated taking into account only the confirmed trades.

For simplicity, we calculate each PnL using as buy price the price at which the last buy trade was executed before each sell trade.
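The two PnL definitions can be illustrated with a toy Python example. The trades below are invented; note how a rejected leg still contributes to the expected PnL but is excluded from the realized PnL.

```python
# Toy trades for one bank and one ticker: (side, status, qty, price)
trades = [
    ("Buy",  "Confirmed", 100, 10.0),
    ("Sell", "Confirmed", 100, 11.0),   # PnL vs last buy: 100*11 - 100*10 = 100
    ("Buy",  "Rejected",  100, 10.5),
    ("Sell", "Confirmed", 100, 12.0),   # expected PnL uses the rejected buy at 10.5
]

expected, realized = 0.0, 0.0
last_buy = None  # (status, qty, price) of the most recent buy
for side, status, qty, price in trades:
    if side == "Buy":
        last_buy = (status, qty, price)
    elif last_buy is not None:
        buy_status, buy_qty, buy_price = last_buy
        pnl = qty * price - buy_qty * buy_price
        expected += pnl                      # all trades count
        if status == "Confirmed" and buy_status == "Confirmed":
            realized += pnl                  # only fully confirmed pairs count

print(expected, realized)  # 250.0 100.0
```

The second sell realises nothing because its buy leg was rejected, even though the sell itself was confirmed; this is the gap the rest of the article quantifies.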

Data Preparation

We start by creating a ClickHouse view called fix.messages, where we extract the main fields from the raw FIX messages in the fix.log table using ClickHouse string functions and its powerful array processing functions. We also apply simple transformations, such as converting numeric codes into descriptive strings, to make downstream processing easier.

create or replace view 
    fix.messages 
as select 
    UID,
    Timestamp,
    Message,
    Protocol,
    MessageCode,
    MessageType, 
    TradeID,
    TradeStatus,
    Sender,
    Receiver,
    Ticker,
    BuySell,
    Price,
    Quantity
from (
    select 
        UID,
        Timestamp,
        Message,
        arrayMap(x -> splitByChar('=', x), splitByChar('|', Message)) as MessageArray,
        arrayFlatten(arrayMap(x -> arraySlice(x, 1, 1), MessageArray)) as Tag,
        arrayFlatten(arrayMap(x -> arraySlice(x, 2, 1), MessageArray)) as Value,
        arraySlice(Value, indexOf(Tag, '35'), 1)[1] as MessageCode,
        if(MessageCode == 'AK', arraySlice(Value, indexOf(Tag, '665'), 1)[1], NULL) as ConfirmStatus,
        arraySlice(Value, indexOf(Tag, '8'), 1)[1] as Protocol,
        if(MessageCode == 'D', 'New Order - Single', if(MessageCode == 'AK', 'Confirmation', 'Other')) as MessageType, 
        arraySlice(Value, indexOf(Tag, '11'), 1)[1] as TradeID,
        if(ConfirmStatus == '4', 'Confirmed', if(ConfirmStatus == '5', 'Rejected', NULL)) as TradeStatus,
        arraySlice(Value, indexOf(Tag, '49'), 1)[1] as Sender,
        arraySlice(Value, indexOf(Tag, '56'), 1)[1] as Receiver,
        arraySlice(Value, indexOf(Tag, '55'), 1)[1] as Ticker,
        if(arraySlice(Value, indexOf(Tag, '54'), 1)[1] == '1', 'Buy', 'Sell') as BuySell,
        toFloat64(if(MessageCode == 'AK', arraySlice(Value, indexOf(Tag, '6'), 1)[1], NULL)) as Price,
        toFloat64(if(MessageCode == 'D', arraySlice(Value, indexOf(Tag, '38'), 1)[1], arraySlice(Value, indexOf(Tag, '80'), 1)[1])) as Quantity
    from 
        fix.log 
)
order by
    UID, Timestamp

Each trade consists of a New Order - Single message with MessageCode = D and a confirmation or rejection with MessageCode = AK. We create a view that joins the two messages for convenience.

create or replace view
    fix.report
as select 
    Timestamp,
    TradeID,
    TradeStatus,
    Sender,
    Receiver,
    Ticker,
    BuySell,
    Price,
    RequestedQuantity,
    AllocatedQuantity
from (
    select 
        TradeID,
        Sender,
        Receiver,
        Ticker,
        BuySell,
        Quantity as RequestedQuantity
    from 
        fix.messages 
    where 
        MessageCode == 'D'
) a
inner join (
    select
        Timestamp,
        TradeID,
        TradeStatus,
        Price,
        Quantity as AllocatedQuantity
    from 
        fix.messages 
    where 
        MessageCode == 'AK'
) b  
on 
    a.TradeID == b.TradeID
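The join logic is simple enough to mirror in a few lines of Python, with invented trade IDs: each New Order contributes the request fields, each Confirmation contributes the outcome fields, and only TradeIDs present on both sides survive the inner join.

```python
# Pair each New Order (D) with its Confirmation (AK) by TradeID,
# mirroring the inner join in the fix.report view.
orders = {
    "ORD1": {"Ticker": "GZS", "BuySell": "Buy", "RequestedQuantity": 100},
    "ORD2": {"Ticker": "LZV", "BuySell": "Buy", "RequestedQuantity": 140},
}
confirms = {
    "ORD1": {"TradeStatus": "Confirmed", "Price": 98.81, "AllocatedQuantity": 100},
    "ORD2": {"TradeStatus": "Rejected",  "Price": 7.48,  "AllocatedQuantity": 0},
}

report = [
    {"TradeID": tid, **orders[tid], **confirms[tid]}
    for tid in orders
    if tid in confirms   # inner join: keep only matched trades
]
```

A rejected trade appears in the report with an AllocatedQuantity of zero, which is exactly what the preview below shows for trade 1.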

The joined view looks like this:

SELECT *
FROM fix.report
LIMIT 10

Query id: e954df22-a21b-487a-b598-ae5d2ba41388

    ┌───────────Timestamp─┬─TradeID─┬─TradeStatus─┬─Sender──────────┬─Receiver────────┬─Ticker─┬─BuySell─┬─Price─┬─RequestedQuantity─┬─AllocatedQuantity─┐
 1. │ 2024-03-08 09:38:45 │ 1       │ Rejected    │ BNP Paribas     │ Goldman Sachs   │ VUB    │ Buy     │ 38.77 │               108 │                 0 │
 2. │ 2024-03-08 09:38:46 │ 2       │ Confirmed   │ BNP Paribas     │ Bank of America │ GZS    │ Buy     │ 98.81 │               100 │               100 │
 3. │ 2024-03-08 09:38:47 │ 3       │ Confirmed   │ Morgan Stanley  │ Bank of America │ LZV    │ Buy     │  7.48 │               140 │               140 │
 4. │ 2024-03-08 09:38:48 │ 4       │ Confirmed   │ Morgan Stanley  │ Nomura          │ XLF    │ Buy     │ 35.33 │                88 │                88 │
 5. │ 2024-03-08 09:38:49 │ 5       │ Confirmed   │ J.P. Morgan     │ BNP Paribas     │ WKK    │ Buy     │ 72.76 │               105 │               105 │
 6. │ 2024-03-08 09:38:50 │ 6       │ Confirmed   │ Nomura          │ J.P. Morgan     │ XLF    │ Buy     │ 36.08 │                65 │                65 │
 7. │ 2024-03-08 09:38:51 │ 7       │ Confirmed   │ Bank of America │ Nomura          │ GZS    │ Buy     │ 96.96 │               118 │               118 │
 8. │ 2024-03-08 09:38:52 │ 8       │ Confirmed   │ Citi            │ J.P. Morgan     │ BUA    │ Buy     │ 31.84 │                89 │                89 │
 9. │ 2024-03-08 09:38:53 │ 9       │ Confirmed   │ Nomura          │ BNP Paribas     │ XLF    │ Sell    │ 37.11 │                65 │                65 │
10. │ 2024-03-08 09:38:54 │ 10      │ Confirmed   │ BNP Paribas     │ J.P. Morgan     │ XLF    │ Buy     │ 37.38 │               141 │               141 │
    └─────────────────────┴─────────┴─────────────┴─────────────────┴─────────────────┴────────┴─────────┴───────┴───────────────────┴───────────────────┘

10 rows in set. Elapsed: 0.055 sec. Processed 4.25 thousand rows, 969.35 KB (77.17 thousand rows/s., 17.61 MB/s.)
Peak memory usage: 4.77 MiB.

Data Analysis

As described in the scenario above, we quantify the impact of the rejected trades by calculating two PnLs for each bank: the expected PnL over all trades, and the realized PnL over confirmed trades only, in both cases using as buy price the price of the last buy trade executed before each sell trade.

We start by calculating the trade-level PnL, which we save in a ClickHouse view called fix.profit_and_loss.

create or replace view 
    fix.profit_and_loss
as select
    Timestamp,
    Sender, 
    Receiver,
    Ticker,
    BuyTradeStatus,
    SellTradeStatus,
    BuyQuantity,
    SellQuantity,
    BuyPrice,
    SellPrice,
    (SellQuantity * SellPrice - BuyQuantity * BuyPrice) as PnL 
from (
    select 
        Timestamp,
        Sender, 
        Receiver,
        Ticker,
        last_value(BuyQuantity) over (partition by Sender, Ticker order by Timestamp rows between unbounded preceding and current row) as BuyQuantity,
        SellQuantity,
        last_value(BuyTradeStatus) over (partition by Sender, Ticker order by Timestamp rows between unbounded preceding and current row) as BuyTradeStatus,
        SellTradeStatus,
        last_value(BuyPrice) over (partition by Sender, Ticker order by Timestamp rows between unbounded preceding and current row) as BuyPrice,
        SellPrice
    from (
        select
            Timestamp,
            Sender,
            Receiver,
            Ticker,
            if(BuySell == 'Buy', TradeStatus, NULL) as BuyTradeStatus,
            if(BuySell == 'Sell', TradeStatus, NULL) as SellTradeStatus,
            if(BuySell == 'Buy', RequestedQuantity, NULL) as BuyQuantity,
            if(BuySell == 'Sell', RequestedQuantity, NULL) as SellQuantity,
            if(BuySell == 'Buy', Price, NULL) as BuyPrice,
            if(BuySell == 'Sell', Price, NULL) as SellPrice
        from 
            fix.report 
        order by
            Sender,
            Ticker,
            Timestamp
    )
)
where isNotNull(SellPrice)
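The last_value window functions above carry the most recent buy forward to each later sell within a (Sender, Ticker) partition. A simplified Python sketch of the same idea, with invented rows:

```python
rows = [  # (timestamp, sender, ticker, side, qty, price)
    (1, "BankA", "XLF", "Buy",  50, 36.0),
    (2, "BankA", "GZS", "Buy",  10, 98.0),
    (3, "BankA", "XLF", "Sell", 50, 37.0),
    (4, "BankA", "GZS", "Sell", 10, 99.0),
]

last_buy = {}   # (sender, ticker) -> (qty, price), like last_value(...) per partition
pnl_rows = []
for ts, sender, ticker, side, qty, price in sorted(rows):
    key = (sender, ticker)
    if side == "Buy":
        last_buy[key] = (qty, price)
    elif key in last_buy:  # keep only rows with a sell price, as in the view
        bq, bp = last_buy[key]
        pnl_rows.append((ts, sender, ticker, qty * price - bq * bp))

print(pnl_rows)
```

Each sell is matched against the most recent buy for the same sender and ticker, regardless of how many trades in other partitions occurred in between.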

The view has the following format:

SELECT
    Timestamp,
    Sender,
    Receiver,
    Ticker,
    PnL
FROM fix.profit_and_loss
LIMIT 10

Query id: 2e473b5f-7368-4261-8950-42f258426926

    ┌───────────Timestamp─┬─Sender──────┬─Receiver────────┬─Ticker─┬────────────────PnL─┐
 1. │ 2024-03-08 09:39:37 │ BNP Paribas │ Morgan Stanley  │ BUA    │             960.75 │
 2. │ 2024-03-08 09:43:31 │ BNP Paribas │ Bank of America │ BUA    │ 1087.2999999999993 │
 3. │ 2024-03-08 09:47:16 │ BNP Paribas │ Deutsche Bank   │ BUA    │  983.6800000000003 │
 4. │ 2024-03-08 10:15:30 │ BNP Paribas │ Citi            │ BUA    │ 1854.5500000000002 │
 5. │ 2024-03-08 09:42:33 │ BNP Paribas │ Goldman Sachs   │ KHU    │  197.7800000000001 │
 6. │ 2024-03-08 09:54:09 │ BNP Paribas │ Citi            │ KHU    │             978.02 │
 7. │ 2024-03-08 09:56:45 │ BNP Paribas │ Barclays        │ KHU    │  49.92000000000007 │
 8. │ 2024-03-08 09:58:27 │ BNP Paribas │ J.P. Morgan     │ KHU    │             423.51 │
 9. │ 2024-03-08 10:00:33 │ BNP Paribas │ Citi            │ KHU    │ 1004.3999999999992 │
10. │ 2024-03-08 10:06:19 │ BNP Paribas │ Morgan Stanley  │ KHU    │ 1023.4000000000005 │
    └─────────────────────┴─────────────┴─────────────────┴────────┴────────────────────┘

10 rows in set. Elapsed: 0.073 sec. Processed 4.25 thousand rows, 969.35 KB (58.39 thousand rows/s., 13.32 MB/s.)
Peak memory usage: 6.13 MiB.

After that, we calculate the cumulative expected PnL and realized PnL of each bank, which we save in a ClickHouse view called fix.cumulative_profit_and_loss.

create or replace view 
    fix.cumulative_profit_and_loss
as select  
    Timestamp,
    Sender as Bank,
    ExpectedPnL,
    if(isNotNull(RealizedPnL), RealizedPnL, 0) as RealizedPnL
from (
    select
        Timestamp,
        Sender,
        ExpectedPnL,
        if(isNotNull(RealizedPnL), RealizedPnL, last_value(RealizedPnL) over (partition by Sender order by Timestamp rows between unbounded preceding and current row)) as RealizedPnL
    from
    (
        select
            Timestamp,
            Sender,
            sum(PnL) over (partition by Sender order by Timestamp rows between unbounded preceding and current row) as ExpectedPnL
        from
            fix.profit_and_loss
    ) a
    left join (
        select
            Timestamp,
            Sender,
            sum(PnL) over (partition by Sender order by Timestamp rows between unbounded preceding and current row) as RealizedPnL
        from
            fix.profit_and_loss
        where
            BuyTradeStatus == 'Confirmed'
        and
            SellTradeStatus == 'Confirmed'
    ) b  
    on 
        a.Timestamp == b.Timestamp
    and 
        a.Sender == b.Sender
)
order by Sender, Timestamp

A preview of the view is shown below:

SELECT *
FROM fix.cumulative_profit_and_loss
LIMIT 10

Query id: d57444ae-96ce-4675-92d8-cab6145566e5

    ┌───────────Timestamp─┬─Bank────────┬────────ExpectedPnL─┬────────RealizedPnL─┐
 1. │ 2024-03-08 09:39:04 │ BNP Paribas │  98.69999999999982 │                  0 │
 2. │ 2024-03-08 09:39:37 │ BNP Paribas │ 1059.4499999999998 │             960.75 │
 3. │ 2024-03-08 09:40:03 │ BNP Paribas │ 2323.8499999999995 │ 2225.1499999999996 │
 4. │ 2024-03-08 09:40:29 │ BNP Paribas │ 2583.0099999999993 │ 2225.1499999999996 │
 5. │ 2024-03-08 09:40:38 │ BNP Paribas │ 3350.6099999999997 │ 2225.1499999999996 │
 6. │ 2024-03-08 09:40:41 │ BNP Paribas │ 3874.5599999999986 │ 2225.1499999999996 │
 7. │ 2024-03-08 09:41:10 │ BNP Paribas │  5967.059999999999 │ 2225.1499999999996 │
 8. │ 2024-03-08 09:41:33 │ BNP Paribas │  6267.059999999999 │ 2525.1499999999996 │
 9. │ 2024-03-08 09:41:38 │ BNP Paribas │  7656.959999999999 │ 2525.1499999999996 │
10. │ 2024-03-08 09:42:08 │ BNP Paribas │            7862.49 │ 2525.1499999999996 │
    └─────────────────────┴─────────────┴────────────────────┴────────────────────┘

10 rows in set. Elapsed: 0.145 sec. Processed 8.50 thousand rows, 1.94 MB (58.58 thousand rows/s., 13.37 MB/s.)
Peak memory usage: 14.85 MiB.
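The essence of the view, a running sum for the expected PnL and a forward-filled running sum for the realized PnL, can be sketched in Python. The toy inputs below are chosen to reproduce the first four preview rows for BNP Paribas.

```python
# Per-trade PnLs with a flag for whether both legs were confirmed
trade_pnls = [  # (pnl, both_legs_confirmed)
    (98.70, False),
    (960.75, True),
    (1264.40, True),
    (259.16, False),
]

expected = realized = 0.0
curve = []
for pnl, confirmed in trade_pnls:
    expected += pnl          # expected PnL accumulates every trade
    if confirmed:
        realized += pnl      # realized PnL only advances on confirmed trades,
                             # otherwise the previous value is carried forward
    curve.append((round(expected, 2), round(realized, 2)))

print(curve)
```

This mirrors the left join plus last_value forward fill in the view: where a trade has no confirmed counterpart, the realized PnL simply repeats its last value, which is why it plateaus in rows 3 to 5 of the preview.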

We can now visualize the expected PnL against the realized PnL over time.