
The Financial Information eXchange (FIX) Protocol is a standard that is used extensively within the capital markets industry to exchange information and requests between trading counterparties.
The protocol is based around FIX messages, which can be used for purposes such as placing new trades, amending existing trades, exchanging market data, and post-trade processing.
If a company wishes to electronically place a trade for a stock, their systems could send a FIX message to their broker which includes details such as the symbol of the stock they wish to trade, the quantity, the price, and the type of order. The broker would then execute the trade, perhaps involving another FIX message to a stock exchange, before sending a confirmation message back to the originator to confirm the order and the price it was filled at.
A FIX message to place a new 100-lot order for Google shares might look like this:
8=FIX.4.2|9=178|35=D|49=CLIENT1|56=BROKER1|34=1|52=20250526-15:30:00.000|11=ORD123456|21=1|55=GOOGL|54=1|38=100|40=2|44=2795.00|59=0|10=235|
Once the order is executed, the broker could respond with an execution report showing that the full 100 lot order has been filled at an average price of $2795.00.
8=FIX.4.2|9=196|35=8|49=BROKER1|56=CLIENT1|34=2|52=20250526-15:30:01.000|37=EXEC7890|11=ORD123456|17=EX12345|150=0|39=2|55=GOOGL|54=1|38=100|14=100|6=2795.00|31=2795.00|32=100|151=0|10=072|
As you can see, each FIX message is made up of a series of numeric tags, each of which has a specific meaning. For instance, the tag 8 is the protocol version, 9 is the message length, 35 is the message type, 49 is the sender, 56 is the receiver, 34 is the sequence number, and 52 is the timestamp. By agreeing on the meaning of these tags, the FIX protocol allows for the exchange of information between different counterparties in a standardised way.
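Later sections of this post parse these messages with ClickHouse; as a small preview of the idea, the tag/value pairs can be pulled apart directly with ClickHouse string and array functions. The following is an illustrative sketch against an abbreviated, made-up message literal:
WITH
    '8=FIX.4.2|35=D|49=CLIENT1|56=BROKER1|55=GOOGL|54=1|38=100' AS Message,
    arrayMap(x -> splitByChar('=', x), splitByChar('|', Message)) AS MessageArray, -- [['8','FIX.4.2'], ['35','D'], ...]
    arrayMap(x -> x[1], MessageArray) AS Tag,   -- the tag numbers
    arrayMap(x -> x[2], MessageArray) AS Value  -- the corresponding values
SELECT
    Value[indexOf(Tag, '35')] AS MsgType,  -- 'D' = New Order - Single
    Value[indexOf(Tag, '55')] AS Symbol,   -- 'GOOGL'
    Value[indexOf(Tag, '38')] AS Quantity  -- '100'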
How FIX Data Is Used
At a minimum, businesses have to safely retain and archive their FIX messages for audit, regulatory and compliance purposes. If there is an issue with a trade, it is important to be able to see the exact FIX messages that were sent and received in order to understand what happened.
Beyond this though, the data can also be used for valuable analytics. For instance, it may be useful to understand the volume of trades, amends and cancels. There may also be compliance use cases, such as identifying market abuse or wash trades, which may be apparent in the FIX logs. Finally, the data may be useful for risk management or, in some circumstances, to identify trading opportunities.
For both of these, it is important that the data is easy to query, search, analyse and join. If FIX data is simply written to a text file or into an inappropriate database, it becomes very hard to extract value from, and support teams end up spending a lot of time manually searching through FIX logs.
Where ClickHouse Fits In
As ClickHouse grows in prominence within capital markets, businesses such as banks, hedge funds, exchanges, brokers and trading technology vendors are increasingly interested in using it to store and analyse their FIX-related data. Fortunately, ClickHouse is a natural fit for this use case.
FIX data inherently comes in high volumes. A single bank could be exchanging millions of messages per day as it processes huge numbers of orders with its various counterparties. Even vanilla trading use cases can generate significant data volumes, whilst in the realm of algorithmic and high frequency trading the number of messages could run into the billions per day. Over a year, this could reach petabyte scale from FIX messages alone.
FIX data also tends to be time-series and event-based, meaning it can be appended to a single table without the need for significant denormalisation. This is exactly the type of data ClickHouse has been optimised for: it can be stored in one big partitioned table and queried by time.
Electronic trading is also an inherently real-time use case. Users such as traders or risk managers, and external systems such as post-trade processing or compliance systems, might want to see FIX logs in near real time. ClickHouse's ability to rapidly ingest and query fresh data makes it an excellent fit for this.
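The DDL for the tables used later in this post is not shown, but as an illustration of the "one big table" approach, raw FIX messages might be captured in a MergeTree table partitioned and ordered by time. This is a hypothetical sketch; the table and column names are illustrative only:
-- Hypothetical layout for a raw FIX capture table: one row per message,
-- partitioned by month and ordered by time so time-bounded queries are cheap.
CREATE TABLE fix_messages
(
    UID       UInt64,     -- message identifier
    Timestamp DateTime,   -- when the message was captured
    Message   String      -- the raw FIX string
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(Timestamp)
ORDER BY (Timestamp, UID)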
About This Example
We wanted to show a simple example of how to store and analyse FIX data using ClickHouse to demonstrate some techniques for working with it efficiently.
In this example, we have a trading technology vendor who are capturing trade data between various counterparty banks. These banks are trading a mixture of 10 different stocks. Each FIX message contains a sender, a receiver and a ticker signifying the counterparties and the stocks they are trading.
In some instances, a counterparty can reject a trade. This could end up costing the requester money if the market price moves. Our task in this example is to quantify the financial impact of these rejections by cross-referencing the FIX logs against market data also stored in ClickHouse.
Dataset
We used the SimpleFix Python library to generate a fictitious set of FIX logs representing buy/sell transactions between 10 banks, which we saved in a ClickHouse table called fix.log. The transactions refer to 10 different stocks. We generated the stock prices from a random walk and saved them in a ClickHouse table called fix.prices. For simplicity, we considered only two types of messages:
- New Order - Single (MsgType = ‘D’): A trade request is sent from one bank to another to buy or sell a stock at market price;
- Confirmation (MsgType = ‘AK’): The bank that received the request either confirms or rejects the trade.
We generated both the stock prices and the FIX messages at one-second intervals. We assumed that the gaps between the time when the trade request is sent, the time when the trade is executed (if confirmed), and the time when the trade confirmation is received are only a few milliseconds, and that this lag has no impact on the market price at which the trade is executed. A preview of the data is shown below:
SELECT
UID,
Timestamp,
LEFT(Message, 50)
FROM fix.log
LIMIT 10
Query id: e13cec5b-1483-47c0-a3ba-af7063e1ad69
┌─UID─┬───────────Timestamp─┬─left(Message, 50)──────────────────────────────────┐
1. │ 1 │ 2024-03-08 09:38:45 │ 8=FIX.4.4|35=D|49=BNP Paribas|56=Goldman Sachs|34= │
2. │ 2 │ 2024-03-08 09:38:45 │ 8=FIX.4.4|35=AK|49=Goldman Sachs|56=BNP Paribas|34 │
3. │ 3 │ 2024-03-08 09:38:46 │ 8=FIX.4.4|35=D|49=BNP Paribas|56=Bank of America|3 │
4. │ 4 │ 2024-03-08 09:38:46 │ 8=FIX.4.4|35=AK|49=Bank of America|56=BNP Paribas| │
5. │ 5 │ 2024-03-08 09:38:47 │ 8=FIX.4.4|35=D|49=Morgan Stanley|56=Bank of Americ │
6. │ 6 │ 2024-03-08 09:38:47 │ 8=FIX.4.4|35=AK|49=Bank of America|56=Morgan Stanl │
7. │ 7 │ 2024-03-08 09:38:48 │ 8=FIX.4.4|35=D|49=Morgan Stanley|56=Nomura|34=2|52 │
8. │ 8 │ 2024-03-08 09:38:48 │ 8=FIX.4.4|35=AK|49=Nomura|56=Morgan Stanley|34=1|5 │
9. │ 9 │ 2024-03-08 09:38:49 │ 8=FIX.4.4|35=D|49=J.P. Morgan|56=BNP Paribas|34=1| │
10. │ 10 │ 2024-03-08 09:38:49 │ 8=FIX.4.4|35=AK|49=BNP Paribas|56=J.P. Morgan|34=3 │
└─────┴─────────────────────┴────────────────────────────────────────────────────┘
10 rows in set. Elapsed: 0.003 sec. Processed 2.12 thousand rows, 484.37 KB (706.32 thousand rows/s., 161.15 MB/s.)
Peak memory usage: 907.05 KiB.
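Even before any parsing, the raw messages can be queried directly. For example, a rough sanity check of the two message types could use a simple substring match (an illustrative query; the counts are not shown here):
-- Count new orders vs confirmations straight from the raw FIX strings.
SELECT
    countIf(Message LIKE '%|35=D|%')  AS NewOrders,
    countIf(Message LIKE '%|35=AK|%') AS Confirmations
FROM fix.log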
The price data has the following format:
SELECT *
FROM fix.prices
LIMIT 10
Query id: d897451f-a51c-44a5-881d-f785bf66e8b1
┌───────────Timestamp─┬─Ticker─┬─Price─┐
1. │ 2024-03-08 09:38:45 │ GTO │ 28.89 │
2. │ 2024-03-08 09:38:45 │ KHU │ 8.82 │
3. │ 2024-03-08 09:38:45 │ GZS │ 98.57 │
4. │ 2024-03-08 09:38:45 │ WKK │ 77.46 │
5. │ 2024-03-08 09:38:45 │ XUD │ 19.92 │
6. │ 2024-03-08 09:38:45 │ HXC │ 56.27 │
7. │ 2024-03-08 09:38:45 │ VUB │ 38.77 │
8. │ 2024-03-08 09:38:45 │ XLF │ 36.6 │
9. │ 2024-03-08 09:38:45 │ BUA │ 33.55 │
10. │ 2024-03-08 09:38:45 │ LZV │ 9.18 │
└─────────────────────┴────────┴───────┘
10 rows in set. Elapsed: 0.003 sec. Processed 8.20 thousand rows, 196.80 KB (2.92 million rows/s., 70.01 MB/s.)
Peak memory usage: 453.33 KiB.
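As an aside, a handy pattern when working with a price series like this is argMax, which can return, for example, the latest observed price per ticker (an illustrative query; output not shown):
-- Latest observed price per ticker.
SELECT
    Ticker,
    argMax(Price, Timestamp) AS LatestPrice
FROM fix.prices
GROUP BY Ticker
ORDER BY Ticker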
Scenario
We have observed that a large fraction of trades were rejected, which will likely have an impact on the banks’ PnL. To quantify the impact of the rejected trades, we calculate two PnLs for each bank:
- The expected PnL, which is calculated taking into account all trades;
- The realized PnL, which is calculated taking into account only the confirmed trades.
For simplicity, we calculate each PnL using, as the buy price for each sell trade, the price at which the last preceding buy trade was executed.
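To make this concrete: if the last buy of a stock before a given sell was 100 shares at $10.00, and the sell is 100 shares at $12.00, the trade-level PnL for that sell is 100 × 12.00 − 100 × 10.00 = $200.00 (an illustrative calculation; the figures are hypothetical, but the formula matches the one used in the views below).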
Data Preparation
We start by creating a ClickHouse view called fix.messages, in which we extract the main fields from the raw FIX messages in the fix.log table using ClickHouse string functions and its powerful array processing functions. We also apply simple transformations, such as converting numeric codes to descriptive strings, to make downstream processing easier.
create or replace view
fix.messages
as select
UID,
Timestamp,
Message,
Protocol,
MessageCode,
MessageType,
TradeID,
TradeStatus,
Sender,
Receiver,
Ticker,
BuySell,
Price,
Quantity
from (
select
UID,
Timestamp,
Message,
arrayMap(x -> splitByChar('=', x), splitByChar('|', Message)) as MessageArray, -- split the raw string into [tag, value] pairs
arrayFlatten(arrayMap(x -> arraySlice(x, 1, 1), MessageArray)) as Tag, -- the FIX tag numbers
arrayFlatten(arrayMap(x -> arraySlice(x, 2, 1), MessageArray)) as Value, -- the corresponding values
arraySlice(Value, indexOf(Tag, '35'), 1)[1] as MessageCode, -- tag 35: MsgType
if(MessageCode == 'AK', arraySlice(Value, indexOf(Tag, '665'), 1)[1], NULL) as ConfirmStatus, -- tag 665: ConfirmStatus (confirmations only)
arraySlice(Value, indexOf(Tag, '8'), 1)[1] as Protocol, -- tag 8: BeginString (protocol version)
if(MessageCode == 'D', 'New Order - Single', if(MessageCode == 'AK', 'Confirmation', 'Other')) as MessageType,
arraySlice(Value, indexOf(Tag, '11'), 1)[1] as TradeID, -- tag 11: ClOrdID
if(ConfirmStatus == '4', 'Confirmed', if(ConfirmStatus == '5', 'Rejected', NULL)) as TradeStatus,
arraySlice(Value, indexOf(Tag, '49'), 1)[1] as Sender, -- tag 49: SenderCompID
arraySlice(Value, indexOf(Tag, '56'), 1)[1] as Receiver, -- tag 56: TargetCompID
arraySlice(Value, indexOf(Tag, '55'), 1)[1] as Ticker, -- tag 55: Symbol
if(arraySlice(Value, indexOf(Tag, '54'), 1)[1] == '1', 'Buy', 'Sell') as BuySell, -- tag 54: Side
toFloat64(if(MessageCode == 'AK', arraySlice(Value, indexOf(Tag, '6'), 1)[1], NULL)) as Price, -- tag 6: AvgPx (confirmations only)
toFloat64(if(MessageCode == 'D', arraySlice(Value, indexOf(Tag, '38'), 1)[1], arraySlice(Value, indexOf(Tag, '80'), 1)[1])) as Quantity -- tag 38: OrderQty / tag 80: AllocQty
from
fix.log
)
order by
UID, Timestamp
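Once the view is in place, the extracted fields can be spot-checked with a simple aggregation (an illustrative query; the exact counts depend on the generated data and are not shown here):
-- Break down the parsed messages by type and confirmation status.
SELECT
    MessageType,
    TradeStatus,
    count() AS Messages
FROM fix.messages
GROUP BY MessageType, TradeStatus
ORDER BY MessageType, TradeStatus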
Each trade consists of a New Order - Single message with MessageCode = D and a confirmation or rejection with MessageCode = AK. We create a view that joins the two messages for convenience.
create or replace view
fix.report
as select
Timestamp,
TradeID,
TradeStatus,
Sender,
Receiver,
Ticker,
BuySell,
Price,
RequestedQuantity,
AllocatedQuantity
from(
select
TradeID,
Sender,
Receiver,
Ticker,
BuySell,
Quantity as RequestedQuantity
from
fix.messages
where
MessageCode == 'D'
) a
inner join (
select
Timestamp,
TradeID,
TradeStatus,
Price,
Quantity as AllocatedQuantity
from
fix.messages
where
MessageCode == 'AK'
) b
on
a.TradeID == b.TradeID
SELECT *
FROM fix.report
LIMIT 10
Query id: e954df22-a21b-487a-b598-ae5d2ba41388
┌───────────Timestamp─┬─TradeID─┬─TradeStatus─┬─Sender──────────┬─Receiver────────┬─Ticker─┬─BuySell─┬─Price─┬─RequestedQuantity─┬─AllocatedQuantity─┐
1. │ 2024-03-08 09:38:45 │ 1 │ Rejected │ BNP Paribas │ Goldman Sachs │ VUB │ Buy │ 38.77 │ 108 │ 0 │
2. │ 2024-03-08 09:38:46 │ 2 │ Confirmed │ BNP Paribas │ Bank of America │ GZS │ Buy │ 98.81 │ 100 │ 100 │
3. │ 2024-03-08 09:38:47 │ 3 │ Confirmed │ Morgan Stanley │ Bank of America │ LZV │ Buy │ 7.48 │ 140 │ 140 │
4. │ 2024-03-08 09:38:48 │ 4 │ Confirmed │ Morgan Stanley │ Nomura │ XLF │ Buy │ 35.33 │ 88 │ 88 │
5. │ 2024-03-08 09:38:49 │ 5 │ Confirmed │ J.P. Morgan │ BNP Paribas │ WKK │ Buy │ 72.76 │ 105 │ 105 │
6. │ 2024-03-08 09:38:50 │ 6 │ Confirmed │ Nomura │ J.P. Morgan │ XLF │ Buy │ 36.08 │ 65 │ 65 │
7. │ 2024-03-08 09:38:51 │ 7 │ Confirmed │ Bank of America │ Nomura │ GZS │ Buy │ 96.96 │ 118 │ 118 │
8. │ 2024-03-08 09:38:52 │ 8 │ Confirmed │ Citi │ J.P. Morgan │ BUA │ Buy │ 31.84 │ 89 │ 89 │
9. │ 2024-03-08 09:38:53 │ 9 │ Confirmed │ Nomura │ BNP Paribas │ XLF │ Sell │ 37.11 │ 65 │ 65 │
10. │ 2024-03-08 09:38:54 │ 10 │ Confirmed │ BNP Paribas │ J.P. Morgan │ XLF │ Buy │ 37.38 │ 141 │ 141 │
└─────────────────────┴─────────┴─────────────┴─────────────────┴─────────────────┴────────┴─────────┴───────┴───────────────────┴───────────────────┘
10 rows in set. Elapsed: 0.055 sec. Processed 4.25 thousand rows, 969.35 KB (77.17 thousand rows/s., 17.61 MB/s.)
Peak memory usage: 4.77 MiB.
Data Analysis
As noted in the Scenario section, a large fraction of trades were rejected, which will likely have an impact on the banks’ PnL. To quantify this, we calculate two PnLs for each bank: the expected PnL, which takes into account all trades, and the realized PnL, which takes into account only the confirmed trades. As before, we use as the buy price for each sell trade the price at which the last preceding buy trade was executed.
We start by calculating the trade-level PnL, which we save in a ClickHouse view called fix.profit_and_loss.
create or replace view
fix.profit_and_loss
as select
Timestamp,
Sender,
Receiver,
Ticker,
BuyTradeStatus,
SellTradeStatus,
BuyQuantity,
SellQuantity,
BuyPrice,
SellPrice,
(SellQuantity * SellPrice - BuyQuantity * BuyPrice) as PnL
from (
select
Timestamp,
Sender,
Receiver,
Ticker,
last_value(BuyQuantity) over (partition by Sender, Ticker order by Timestamp rows between unbounded preceding and current row) as BuyQuantity, -- most recent buy quantity for this sender/ticker
SellQuantity,
last_value(BuyTradeStatus) over (partition by Sender, Ticker order by Timestamp rows between unbounded preceding and current row) as BuyTradeStatus, -- status of that buy
SellTradeStatus,
last_value(BuyPrice) over (partition by Sender, Ticker order by Timestamp rows between unbounded preceding and current row) as BuyPrice, -- price of that buy
SellPrice
from (
select
Timestamp,
Sender,
Receiver,
Ticker,
if(BuySell == 'Buy', TradeStatus, NULL) as BuyTradeStatus,
if(BuySell == 'Sell', TradeStatus, NULL) as SellTradeStatus,
if(BuySell == 'Buy', RequestedQuantity, NULL) as BuyQuantity,
if(BuySell == 'Sell', RequestedQuantity, NULL) as SellQuantity,
if(BuySell == 'Buy', Price, NULL) as BuyPrice,
if(BuySell == 'Sell', Price, NULL) as SellPrice
from
fix.report
order by
Sender,
Ticker,
Timestamp
)
)
where isNotNull(SellPrice)
The view has the following format:
SELECT
Timestamp,
Sender,
Receiver,
Ticker,
PnL
FROM fix.profit_and_loss
LIMIT 10
Query id: 2e473b5f-7368-4261-8950-42f258426926
┌───────────Timestamp─┬─Sender──────┬─Receiver────────┬─Ticker─┬────────────────PnL─┐
1. │ 2024-03-08 09:39:37 │ BNP Paribas │ Morgan Stanley │ BUA │ 960.75 │
2. │ 2024-03-08 09:43:31 │ BNP Paribas │ Bank of America │ BUA │ 1087.2999999999993 │
3. │ 2024-03-08 09:47:16 │ BNP Paribas │ Deutsche Bank │ BUA │ 983.6800000000003 │
4. │ 2024-03-08 10:15:30 │ BNP Paribas │ Citi │ BUA │ 1854.5500000000002 │
5. │ 2024-03-08 09:42:33 │ BNP Paribas │ Goldman Sachs │ KHU │ 197.7800000000001 │
6. │ 2024-03-08 09:54:09 │ BNP Paribas │ Citi │ KHU │ 978.02 │
7. │ 2024-03-08 09:56:45 │ BNP Paribas │ Barclays │ KHU │ 49.92000000000007 │
8. │ 2024-03-08 09:58:27 │ BNP Paribas │ J.P. Morgan │ KHU │ 423.51 │
9. │ 2024-03-08 10:00:33 │ BNP Paribas │ Citi │ KHU │ 1004.3999999999992 │
10. │ 2024-03-08 10:06:19 │ BNP Paribas │ Morgan Stanley │ KHU │ 1023.4000000000005 │
└─────────────────────┴─────────────┴─────────────────┴────────┴────────────────────┘
10 rows in set. Elapsed: 0.073 sec. Processed 4.25 thousand rows, 969.35 KB (58.39 thousand rows/s., 13.32 MB/s.)
Peak memory usage: 6.13 MiB.
After that, we calculate the cumulative expected PnL and realized PnL of each bank, which we save in a ClickHouse view called fix.cumulative_profit_and_loss.
create or replace view
fix.cumulative_profit_and_loss
as select
Timestamp,
Sender as Bank,
ExpectedPnL,
if(isNotNull(RealizedPnL), RealizedPnL, 0) as RealizedPnL
from (
select
Timestamp,
Sender,
ExpectedPnL,
if(isNotNull(RealizedPnL), RealizedPnL, last_value(RealizedPnL) over (partition by Sender order by Timestamp rows between unbounded preceding and current row)) as RealizedPnL -- carry forward the most recent realized PnL when no confirmed trade exists at this timestamp
from
(
select
Timestamp,
Sender,
sum(PnL) over (partition by Sender order by Timestamp rows between unbounded preceding and current row) as ExpectedPnL
from
fix.profit_and_loss
) a
left join (
select
Timestamp,
Sender,
sum(PnL) over (partition by Sender order by Timestamp rows between unbounded preceding and current row) as RealizedPnL
from
fix.profit_and_loss
where
BuyTradeStatus == 'Confirmed'
and
SellTradeStatus == 'Confirmed'
) b
on
a.Timestamp == b.Timestamp
and
a.Sender == b.Sender
)
order by Sender, Timestamp
SELECT *
FROM fix.cumulative_profit_and_loss
LIMIT 10
Query id: d57444ae-96ce-4675-92d8-cab6145566e5
┌───────────Timestamp─┬─Bank────────┬────────ExpectedPnL─┬────────RealizedPnL─┐
1. │ 2024-03-08 09:39:04 │ BNP Paribas │ 98.69999999999982 │ 0 │
2. │ 2024-03-08 09:39:37 │ BNP Paribas │ 1059.4499999999998 │ 960.75 │
3. │ 2024-03-08 09:40:03 │ BNP Paribas │ 2323.8499999999995 │ 2225.1499999999996 │
4. │ 2024-03-08 09:40:29 │ BNP Paribas │ 2583.0099999999993 │ 2225.1499999999996 │
5. │ 2024-03-08 09:40:38 │ BNP Paribas │ 3350.6099999999997 │ 2225.1499999999996 │
6. │ 2024-03-08 09:40:41 │ BNP Paribas │ 3874.5599999999986 │ 2225.1499999999996 │
7. │ 2024-03-08 09:41:10 │ BNP Paribas │ 5967.059999999999 │ 2225.1499999999996 │
8. │ 2024-03-08 09:41:33 │ BNP Paribas │ 6267.059999999999 │ 2525.1499999999996 │
9. │ 2024-03-08 09:41:38 │ BNP Paribas │ 7656.959999999999 │ 2525.1499999999996 │
10. │ 2024-03-08 09:42:08 │ BNP Paribas │ 7862.49 │ 2525.1499999999996 │
└─────────────────────┴─────────────┴────────────────────┴────────────────────┘
10 rows in set. Elapsed: 0.145 sec. Processed 8.50 thousand rows, 1.94 MB (58.58 thousand rows/s., 13.37 MB/s.)
Peak memory usage: 14.85 MiB.
We can now visualize the expected PnL against the realized PnL over time.
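For example, the series behind such a chart for a single bank could be pulled with a query like the following (BNP Paribas is simply one of the counterparties in the generated dataset):
-- Time series of expected vs realized PnL for one bank, ready for charting.
SELECT
    Timestamp,
    ExpectedPnL,
    RealizedPnL
FROM fix.cumulative_profit_and_loss
WHERE Bank = 'BNP Paribas'
ORDER BY Timestamp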