Performance Measurement: Returns
Part II: Measuring Returns
This blog series is about how to calculate investment performance using SQL and a database. In the first part, we set up a test database with seven tables and populated them with some test data. In this second part, I will build some views in the database to calculate portfolio returns from daily time series of market values and cash flows. These returns will be presented in both absolute (currency) and relative (percentage) terms, and on all portfolio structure levels for all structures defined. But first, I'll set up an indexed view on portfolio structures.
All the code for the three views can be found in my GitHub repository. As the SQL statements are at times quite long, they are not necessarily copied here in full, so I recommend GitHub for those interested.
Indexed View Sometimes Helps
The way I have set up the Structure table is the following:
select * from PerfTest.Structure
The table is simple; each structure has a ShortName, a LongName and a LevelCnt (a computed column, if you recall from Part I), besides the actual four Levels. While it is intuitive that this is how the structure should work - structure 1, for example, describes an investment process where the decision making tree starts from AssetClasses and ends with Instruments - it is not natural for an SQL database to connect the text in a column (e.g. "Sector") to a column Sector in table Instrument. Therefore, picking the right property of an instrument to divide the portfolio holdings according to Level1 must be done using a case expression.
create view PerfTest.vStructureInstrument with schemabinding as
select
    s.StructureIK,
    i.InstrumentIK,
    case isnull(s.Level1, '#xyz#')
        when '#xyz#' then null
        when 'AssetClass' then i.AssetClass
        when 'Region' then i.Region
        when 'Country' then i.Country
        when 'Currency' then i.Currency
        when 'Sector' then i.Sector
        when 'Instrument' then i.ShortName
        else '[data missing]'
    end as Level1
    -- Level2, Level3, and Level4 in a similar way
from PerfTest.Instrument i
cross join PerfTest.Structure s;
Note how I am picking the correct Instrument table column value depending on the text in Structure table (levels 2 to 4 are included in GitHub code only, within PerfTest tables and data.sql).
The select statement above simply combines all instruments with all structures, and picks the property defined in the structure for each level and instrument. The first line creates a view, which is basically a stored select statement. Note that the view is created with schemabinding. This means the view is bound to the schema of the underlying objects (here, the two tables used) and, hence, those objects cannot be modified in any way that might affect this view (their structure cannot be changed - but their data still can). If you wanted to change the definition of those two tables, you would first have to drop this view, perform the changes, and then re-create the view.
Views do not need to be created with schemabinding, but doing so enables us to create an index on the view. Views without an index are treated like queries and their results are not stored anywhere, so querying this view (with no index) would always access the underlying tables. In addition, due to the rather complex case expression, looking for any specific item in any of the levels would, in practice, build the view first by scanning all instruments and structures, and only then pick the correct ones.
In contrast, if we add a unique clustered index on the view, it becomes materialized - i.e. it becomes like a table, with data stored on disk. The difference to a table is that the data in an indexed view are automatically updated, based on the view's definition, when data in the underlying tables are updated. Let's define a unique clustered index and, for even quicker access, indexes on the InstrumentIK and Level columns:
create unique clustered index vStructureInstrument_UQC
    on PerfTest.vStructureInstrument (StructureIK asc, InstrumentIK asc);
go
create index vStructureInstrument_IX#InstrumentIK on PerfTest.vStructureInstrument (InstrumentIK asc);
create index vStructureInstrument_IX#Level1 on PerfTest.vStructureInstrument (Level1 asc);
create index vStructureInstrument_IX#Level2 on PerfTest.vStructureInstrument (Level2 asc);
create index vStructureInstrument_IX#Level3 on PerfTest.vStructureInstrument (Level3 asc);
create index vStructureInstrument_IX#Level4 on PerfTest.vStructureInstrument (Level4 asc);
Now, running the following two queries
select StructureIK, InstrumentIK
from PerfTest.vStructureInstrument
where Level3 = 'Consumer';

select StructureIK, InstrumentIK
from PerfTest.vStructureInstrument with (noexpand)
where Level3 = 'Consumer';
results in two identical data sets with 42 rows each, giving you structures 1 and 2 (where Level3 = "Sector") and the 21 instruments where Sector = "Consumer". The only difference is the with (noexpand) hint after the view name in the second query. This hint forces SQL Server to use the indexed view instead of the tables (the hint should not be necessary if running SQL Server Enterprise Edition - the big and expensive one - but, apparently, still may have some benefits).
The real difference then is how the data are fetched. If I look at how SQL Server Express Edition gets the 42 rows required, in the first query it scans 171 rows (3 structures x 57 instruments - i.e. all of them). In the second, it can perform an index seek on the index vStructureInstrument_IX#Level3 we created, and get the 42 rows directly from there. Much faster. And much, much faster when the number of structures and instruments grows!
Always a Downside
Indexed views are not free, though. As the data becomes table-like, it will consume disk space just like a table. Furthermore, all data changes to the underlying tables must be processed to this indexed view as well. For this reason, indexed views are not recommended if the data in the underlying tables changes often or a lot. Fortunately, both of our underlying tables are rather stable. We'll test changing the underlying data a little later.
Sometimes, it is better to keep tighter control on updating data that might be stored as an indexed view. One good alternative is to use an ordinary table for the data, and a stored procedure to keep it updated. This gives more control on how and when data are updated, but of course the downside is that those updates need to be carefully built and scheduled.
Holdings to Start With
Of course, for performance calculations, we need to know what we have in the portfolio and what it is worth. In addition, we need to know the cash flows, i.e. the payments associated with those holdings, be they related to trading, dividends, currency exchange, etc.
For that purpose, I have built the view vMarketValueAndCashFlow. This view is built on three nested select statements, and always calculates the holdings by adding up all transactions until the date of the holding. Hence, it is not really suitable for large amounts of data, in which case these data should be stored in a table and a stored procedure used to update only the data affected by changes in underlying transactions, prices, or fx rates. However, with our very limited testing data set this is still fast enough.
The first select statement simply combines the instrument transactions and their effect on cash instruments. For example, if we look at one transaction in the Transact table - say TransactIK = 94, its Comment tells us what it is: Buy 500 Beider @ 154.00 EUR. The CashInstrumentIK = 51, which is instrument "EUR Cash", and the cost of the buy is CurrencyValue = 77,000, Currency = EUR. The first part of the select picks what happens to Beider stock, and the second part how that affects our cash account (the two result sets are combined using a union all):
with trans as (
    -- portfolio instrument transactions
    select PortfolioIK, InstrumentIK, TradeDate, Currency, Units, CurrencyValue
    from PerfTest.Transact
    -- add transaction-based changes on cash accounts: use the negative CurrencyValue as the
    -- change in units & value (note: assumes transaction currency = cash instrument currency)
    union all
    select PortfolioIK, CashInstrumentIK as InstrumentIK, TradeDate, Currency,
        -CurrencyValue as Units, -CurrencyValue as CurrencyValue
    from PerfTest.Transact
    where CashInstrumentIK is not null
),
The Beider buy transaction would result in two trans rows depicting the changes in
- Beider stock: quantity: +500 units, value 77,000 EUR
- EUR Cash account: quantity: -77,000 units, value -77,000 EUR
The next select builds on these data by using the Calendar table to get all dates, and summing up trans (the name of the query above) up to each date to get that date's holding and cash flow. Note that the trade Currency is part of the group by statement - if you had another Beider trade settled in USD, your holdings would have two separate Beiders. It is basically assumed here that transactions use the currency of the instrument - if not, it should not cause any errors, but analysis of the results will not be as easy.
units as (
    -- sum up all transactions above until each ToDate
    select t.PortfolioIK, t.InstrumentIK, t.Currency, c.ToDate, c.PrevToDate,
        sum(t.Units) as Units,
        -- summing up transactions where TradeDate > calendar PrevToDate (but <= calendar ToDate, for each calendar ToDate)
        -- will also include transactions where TradeDate is not a calendar date
        sum(case when t.TradeDate > c.PrevToDate or c.PrevToDate is null then t.CurrencyValue else 0 end) as CashFlowTC
    from PerfTest.Calendar c
    inner join trans t on t.TradeDate <= c.ToDate
    group by t.PortfolioIK, t.InstrumentIK, t.Currency, c.ToDate, c.PrevToDate
    having
        -- include data only if there are units or a cash flow
        sum(t.Units) <> 0
        or sum(case when t.TradeDate > c.PrevToDate or c.PrevToDate is null then t.CurrencyValue else 0 end) <> 0
)
Note that the inner join is based on t.TradeDate <= c.ToDate, so we only add up the Units from trans up to each calendar date. CashFlowTC (TC for Transaction Currency), in contrast, only sums up the CurrencyValues after the preceding calendar date, so that the cash flow represents only that date. Finally, I filter out data where there are neither units nor cash flow.
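The units/cash flow split can be sketched in Python (a simplified illustration only, with made-up transactions for a single instrument; the actual logic lives in the SQL above):

```python
from datetime import date

# (TradeDate, Units, CurrencyValue) - hypothetical transactions in one instrument
transactions = [
    (date(2023, 10, 13), 500, 77000.00),   # buy 500 @ 154.00
    (date(2023, 10, 16), 100, 15500.00),   # buy 100 @ 155.00
]

def holdings_and_cash_flow(to_date, prev_to_date):
    # Units: accumulate every transaction up to and including to_date
    units = sum(u for d, u, v in transactions if d <= to_date)
    # CashFlowTC: only transactions after the previous calendar date (this also
    # catches trade dates that are not calendar dates themselves)
    cash_flow = sum(v for d, u, v in transactions
                    if d <= to_date and (prev_to_date is None or d > prev_to_date))
    return units, cash_flow
```

For 16 October (previous calendar date 13 October), the holding is the full 600 units, while the cash flow only includes the 15,500.00 traded on that date.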
Value the Value
Now, having the number of units and the cash flow in instrument currency, we need to add valuation, i.e. prices and fx rates. Here, I am assuming these can be found for all instruments and all calendar dates; in practice, some method of dealing with data gaps would probably be needed. I am also assuming a "base" currency of EUR, always using it as the fx rate base - four (possibly, but not necessarily, different) fx rates are required here. I also add data on Portfolio and Instrument to the units defined above:
select u.PortfolioIK, por.ShortName as PortfolioShortName, por.Currency as PortfolioCurrency,
    u.InstrumentIK, i.ShortName, i.LongName, i.AssetClass, i.Region, i.Country, i.Currency, i.Sector,
    u.ToDate, u.Units, u.Currency as TransCurrency, p.PriceCurrency, p.Price,
    v.FXRatePC, v.MarketValueIC, v.CashFlowTC,
    case when lag(u.ToDate, 1) over (partition by u.PortfolioIK, u.InstrumentIK, u.Currency order by u.ToDate) = u.PrevToDate
        then lag(v.MarketValueIC, 1, 0.00) over (partition by u.PortfolioIK, u.InstrumentIK, u.Currency order by u.ToDate)
        else 0.00 end as lag_MarketValueIC,
    v.MarketValuePC, v.CashFlowPC,
    case when lag(u.ToDate, 1) over (partition by u.PortfolioIK, u.InstrumentIK, u.Currency order by u.ToDate) = u.PrevToDate
        then lag(v.MarketValuePC, 1, 0.00) over (partition by u.PortfolioIK, u.InstrumentIK, u.Currency order by u.ToDate)
        else 0.00 end as lag_MarketValuePC
from PerfTest.Portfolio por
inner join units u on u.PortfolioIK = por.PortfolioIK
inner join PerfTest.Instrument i on i.InstrumentIK = u.InstrumentIK
left outer join PerfTest.Price p on p.InstrumentIK = u.InstrumentIK and p.PriceDate = u.ToDate
-- price FX rate to EUR
left outer join PerfTest.FXRate fxp on fxp.Currency = 'EUR' and fxp.PriceCurrency = p.PriceCurrency
    and fxp.PriceDate = u.ToDate and p.PriceCurrency <> 'EUR'
-- transaction (cash flow) FX rate to EUR
left outer join PerfTest.FXRate fxt on fxt.Currency = 'EUR' and fxt.PriceCurrency = u.Currency
    and fxt.PriceDate = u.ToDate and fxt.PriceCurrency <> 'EUR'
-- instrument FX rate to EUR
left outer join PerfTest.FXRate fxi on fxi.Currency = 'EUR' and fxi.PriceCurrency = i.Currency
    and fxi.PriceDate = u.ToDate and i.Currency <> 'EUR'
-- portfolio FX rate to EUR
left outer join PerfTest.FXRate fxpor on fxpor.Currency = 'EUR' and fxpor.PriceCurrency = por.Currency
    and fxpor.PriceDate = u.ToDate and por.Currency <> 'EUR'
cross apply (
    -- using cross apply to simplify fx rate handling, as for base currency EUR there are no quotes in data
    select
        case when p.PriceCurrency = 'EUR' then 1.00000000 else fxp.FXRate end as FXRatePrice,
        case when u.Currency = 'EUR' then 1.00000000 else fxt.FXRate end as FXRateTrans,
        case when i.Currency = 'EUR' then 1.00000000 else fxi.FXRate end as FXRateInstrument,
        case when por.Currency = 'EUR' then 1.00000000 else fxpor.FXRate end as FXRatePortfolio
) fx
cross apply (
    -- using another cross apply to calculate market values and cash flows in IC and PC
    -- converting precision to "currency" i.e. two decimals - a convention
    select
        convert(numeric(19,8), fx.FXRatePrice / fx.FXRatePortfolio) as FXRatePC,
        convert(numeric(19,2), u.Units * p.Price / (fx.FXRatePrice / fx.FXRateInstrument)) as MarketValueIC,
        convert(numeric(19,2), u.CashFlowTC / (fx.FXRateTrans / fx.FXRateInstrument)) as CashFlowIC,
        convert(numeric(19,2), u.Units * p.Price / (fx.FXRatePrice / fx.FXRatePortfolio)) as MarketValuePC,
        convert(numeric(19,2), u.CashFlowTC / (fx.FXRateTrans / fx.FXRatePortfolio)) as CashFlowPC
) v;
I am using cross apply operators as a handy way of calculating something from the data in the joined tables. The first cross apply simply processes the fx rates, in case currency is EUR and no rate is therefore found in the data. For the same reason, the joins to FXRate are left outer joins to allow for data not found there. The second cross apply uses these results when converting the market value (Units x Price) and cash flow from transaction and price currencies to instrument and portfolio currencies. This way, I can enter the formulas for these calculations just once instead of copying them for each use case.
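The conversion formula in the second cross apply can be sanity-checked with a few lines of Python (a sketch with hypothetical rates only; here, as in the view, fx rates are quoted as units of currency per 1 EUR, with 1.0 for EUR itself):

```python
def convert_value(amount, fx_from, fx_to):
    # amount is in the "from" currency; both rates are units of currency per 1 EUR
    return amount / (fx_from / fx_to)

# the Beider buy: 500 units priced at 154.00 EUR
units, price = 500, 154.00
# hypothetical USD-based portfolio, with a made-up rate of 1.08 USD per EUR
fx_price, fx_portfolio = 1.0, 1.08
market_value_pc = convert_value(units * price, fx_price, fx_portfolio)
# 77,000 EUR converts to 83,160 USD in portfolio currency
```

Dividing by the ratio of the two rates first takes the amount to EUR and then to the target currency, which is exactly what the view's `(fx.FXRatePrice / fx.FXRatePortfolio)` denominator does.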
Besides the market value and cash flow, for return calculations, we need the previous day's market value. To get that I have used the window function lag. Window functions "scan" the defined subset of data returned by the query (subset defined by partition by here) and can access the other rows. Lag simply picks one of the preceding rows, the closest one in this case, with order defined by the order by part. To make absolutely sure that I am getting the previous day's market value, and not some distant historical market value in case of a buy when there were no holdings yesterday, I compare the lagged ToDate to Calendar PrevToDate before accepting the lagged value. While this may sound a bit complicated, it is usually more efficient than adding another join to the data to get the previous day values.
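The lag-plus-date-check logic can be illustrated in Python (a simplified sketch with made-up rows for a single instrument, not the actual view code):

```python
from datetime import date

# (ToDate, PrevToDate, MarketValuePC) rows for one instrument, ordered by date
rows = [
    (date(2023, 10, 13), date(2023, 10, 12), 1000.0),
    # no holding on 2023-10-16: the position was closed and re-opened on the 17th
    (date(2023, 10, 17), date(2023, 10, 16), 500.0),
    (date(2023, 10, 18), date(2023, 10, 17), 520.0),
]

def with_lag(rows):
    out, prev = [], None
    for to_date, prev_to_date, mv in rows:
        # accept the lagged market value only if the previous row really is
        # the previous calendar date - otherwise use 0.00, as in the view
        lag_mv = prev[2] if prev is not None and prev[0] == prev_to_date else 0.0
        out.append((to_date, mv, lag_mv))
        prev = (to_date, prev_to_date, mv)
    return out

result = with_lag(rows)
```

The re-opened position on the 17th gets a lagged value of 0.0 (the preceding row is from the 13th, not the 16th), while the 18th correctly picks up 500.0 from the 17th.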
The results look like this, when picking just one date and four instruments:
select * from PerfTest.vMarketValueAndCashFlow where PortfolioIK = 1 and InstrumentIK in (8,23,34,45) and ToDate = '2023-10-16'
Structural Development
Now, with daily instrument level holdings, market values and cash flows, we need to combine that data with the view vStructureInstrument defined earlier to get total portfolio, sector, etc. data as well. Let's add a third view called vPerformanceTimeSeries. This rather lengthy select statement defines struct based on the two existing views, adding some data from table Structure:
with struct as (
    -- calculate sums for each required level of structure
    select mv.PortfolioIK,
        max(mv.PortfolioShortName) as PortfolioShortName, -- max() here simply makes grouping more convenient; always only one value per PortfolioIK
        max(mv.PortfolioCurrency) as PortfolioCurrency,
        s.StructureIK, s.LevelCnt,
        case when grouping(si.Level1) = 1 then 0
            when grouping(si.Level2) = 1 then 1
            when grouping(si.Level3) = 1 then 2
            when grouping(si.Level4) = 1 then 3
            else 4 end as LevelNr,
        max(s.Level1) as Level1Name, -- same here, only one value per StructureIK
        si.Level1,
        max(s.Level2) as Level2Name, si.Level2,
        max(s.Level3) as Level3Name, si.Level3,
        max(s.Level4) as Level4Name, si.Level4,
        mv.ToDate,
        count(mv.InstrumentIK) as InstrumentCount, -- not used, just for info
        sum(mv.MarketValuePC) as MarketValuePC,
        sum(mv.CashFlowPC) as CashFlowPC,
        sum(mv.lag_MarketValuePC) as lag_MarketValuePC
    from PerfTest.vMarketValueAndCashFlow mv
    inner join PerfTest.vStructureInstrument si with (noexpand) on si.InstrumentIK = mv.InstrumentIK
    inner join PerfTest.Structure s on s.StructureIK = si.StructureIK
    group by grouping sets (
        -- use separate grouping sets for total portfolio and each of the structure levels
        -- to get all market values and cash flows summed up accordingly
        ( mv.PortfolioIK, s.StructureIK, mv.ToDate, s.LevelCnt ),
        ( mv.PortfolioIK, s.StructureIK, mv.ToDate, s.LevelCnt, si.Level1 ),
        ( mv.PortfolioIK, s.StructureIK, mv.ToDate, s.LevelCnt, si.Level1, si.Level2 ),
        ( mv.PortfolioIK, s.StructureIK, mv.ToDate, s.LevelCnt, si.Level1, si.Level2, si.Level3 ),
        ( mv.PortfolioIK, s.StructureIK, mv.ToDate, s.LevelCnt, si.Level1, si.Level2, si.Level3, si.Level4 )
    )
    having
        -- filter out grouping sets not needed (e.g. where structure stops at level 3, do not include level 4):
        -- if the level is grouped (i.e. grouping() = 1) or the level is active in the structure
        -- (LevelCnt is at least this level), then show data; level 1 is always included
        (grouping(si.Level2) = 1 or s.LevelCnt >= 2)
        and (grouping(si.Level3) = 1 or s.LevelCnt >= 3)
        and (grouping(si.Level4) = 1 or s.LevelCnt >= 4)
)
Here, I am summing up the portfolio currency (PC) market value, cash flow, and lagged market value. Summing up instrument currency (IC) values is, obviously, not wise, as the sum might mix different currencies. The group by grouping sets is used to get sums for all structure levels. The first grouping set sums all instruments to portfolio level, and the following four to levels 1, 2, 3, and 4, respectively. The grouping() function can tell us whether a particular column was used in grouping the data or not - enabling the definition of LevelNr and also filtering out unnecessary data (in the having clause) in case the structure has fewer than four levels. Note the use of the with (noexpand) hint when joining to the indexed view.
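To see what grouping sets actually produce, here is a small Python sketch (illustrative only, with made-up rows and just two levels) that emulates the total/level-1/level-2 grouping sets by accumulating each row into one total per set:

```python
from collections import defaultdict

# (level1, level2, market_value) - hypothetical instrument-level rows
rows = [
    ("Equity", "Europe", 100.0),
    ("Equity", "US", 200.0),
    ("Bond", "Europe", 50.0),
]

# emulate GROUP BY GROUPING SETS ( (), (level1), (level1, level2) )
grouping_sets = [(), ("level1",), ("level1", "level2")]

totals = defaultdict(float)
for l1, l2, mv in rows:
    values = {"level1": l1, "level2": l2}
    for gs in grouping_sets:
        # len(gs) plays the role of LevelNr: 0 = total portfolio, 1 = level 1, ...
        key = (len(gs),) + tuple(values[c] for c in gs)
        totals[key] += mv
```

Every input row contributes to one total per grouping set, which is why a single pass over the data yields the portfolio total and all level subtotals at once.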
What this gives us is the following, limiting to one portfolio and structure, and just the top and first level of that structure, and two dates:
select LevelNr, Level1, ToDate, InstrumentCount, MarketValuePC, CashFlowPC, lag_MarketValuePC
from PerfTest.vPerformanceTimeSeries
where PortfolioIK = 1 and StructureIK = 1 and LevelNr <= 1
    and ToDate in ('2023-10-16', '2023-10-17')
order by LevelNr, Level1, ToDate
Now, you can see how the instrument counts and values are calculated for all structure levels. Level 0 (i.e. the total portfolio) has 49 instruments and about €13.6 M of market value on 16 October. It is then divided into (or, actually, consists of) two bond instruments (€5.8 M), four cash instruments (€0.8 M), and 43 equity instruments (€7.0 M). Transactions on 17 October add one equity instrument, with a cost of €80,000, which shows in CashFlowPC for both Equity and Cash.
Finally, Some Returns
Based on the above, I just add some return calculations using struct and a cross apply:
select s.*, a.*
from struct s
cross apply (
    -- calculate returns
    select
        -- simple return in currency
        s.MarketValuePC - s.CashFlowPC - s.lag_MarketValuePC as ReturnPC,
        -- return percentages: normally assume the cash flow happens at end of day, but if the cash flow
        -- is very large relative to market value --> assume start of day (e.g. opening a position)
        -- if market value is negative either end of day or start of day, do not try to present a % return (output: null)
        -- convert to float for maximum accuracy; calculate both a "normal" percentage return and a natural logarithm percentage return
        case when s.lag_MarketValuePC < 0 or s.MarketValuePC < 0 then cast(null as float)
            when s.CashFlowPC / nullif(s.MarketValuePC, 0) > 0.8
                then convert(float, s.MarketValuePC) / nullif(s.lag_MarketValuePC + s.CashFlowPC, 0) - 1
            else convert(float, s.MarketValuePC - s.CashFlowPC) / nullif(s.lag_MarketValuePC, 0) - 1
        end as ReturnPCPerc,
        case when s.lag_MarketValuePC < 0 or s.MarketValuePC < 0 then cast(null as float)
            when s.CashFlowPC / nullif(s.MarketValuePC, 0) > 0.8
                then log(convert(float, s.MarketValuePC) / nullif(s.lag_MarketValuePC + s.CashFlowPC, 0))
            else log(convert(float, nullif(s.MarketValuePC - s.CashFlowPC, 0)) / nullif(s.lag_MarketValuePC, 0))
        end as ReturnPCLog
) a
The absolute return in portfolio currency, in euros in this case, is simply the market value adjusted for cash flows, less the previous day's market value. I have calculated two relative (percentage) returns: the "normal" arithmetic one, which is, in most cases, the cash-flow-adjusted market value divided by the previous day's market value, minus one, and the continuous/logarithmic one, which is the natural logarithm of the same ratio, without subtracting one. This daily percentage return calculation implicitly assumes the cash flows happen at the end of the day. Of course, this is not usually accurate: cash flows happen during the day, so the cash flow may differ significantly from the valuation of the asset at the end of the day. Also, when an asset with no current holdings is bought, this formula cannot calculate any percentage return, as the previous day's market value is zero.
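As a sketch of the arithmetic only (Python, not part of the database code), the two percentage returns under the end-of-day cash flow assumption are:

```python
import math

def daily_returns(mv, cash_flow, lag_mv):
    """Arithmetic and logarithmic daily return, assuming the cash flow happens at end of day."""
    if lag_mv <= 0:
        # no meaningful percentage return without a positive starting value
        return None, None
    ratio = (mv - cash_flow) / lag_mv
    return ratio - 1, math.log(ratio)

# e.g. value 105.00 at end of day, of which 2.00 was bought during the day; 100.00 yesterday
r_arith, r_log = daily_returns(105.0, 2.0, 100.0)
# r_arith is 3 %, r_log is ln(1.03)
```

For small returns the two are close (ln(1.03) is about 0.0296); the logarithmic form is handy later because daily log returns can simply be summed over time.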
Sometimes you can get very strange results when the cash flow is very large compared to the previous day's market value. Consider the following: You own 200 shares of a penny stock priced at €0.05. The position is worth €10.00, you think it's too cheap, and you manage to buy 2,000 shares more at €0.05, costing you €100.00. The stock indeed rises to €0.06 by the end of the day, a rise of €0.01 / €0.05 = 20 %. Your return calculation, however, is the following:
2,200 x €0.06 = €132.00 (market value)
2,000 x €0.05 = €100.00 (cash flow)
200 x €0.05 = €10.00 (previous day market value)
€132.00 - €100.00 - €10.00 = €22.00 (return in €)
€22.00 / €10.00 = 220 % (return in percentage)
If you had, however, valued the position at the time of cash flow (which we, in theory, aim for), your return would be 0 % at the time of the trade, and 20 % at the end of the day. This is why I have defined the relative returns so that the cash flow is assumed to happen at the beginning of the day, if it is greater than 80 % of market value (80 % being really quite arbitrarily chosen). Then, instead of subtracting it from today's market value, I am adding it to previous day's market value - here the calculation would be 132 / (10 + 100) - 1 = 20 %. While this does not eliminate all possible strange looking results, it certainly helps.
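The two treatments can be checked with a couple of lines of Python (just the arithmetic from the example above, outside the database):

```python
# penny stock example: end-of-day value 132.00, cash flow 100.00, previous day value 10.00
mv, cash_flow, lag_mv = 132.00, 100.00, 10.00

# end-of-day cash flow assumption: subtract the cash flow from today's value
end_of_day = (mv - cash_flow) / lag_mv - 1      # 220 %

# start-of-day cash flow assumption: add the cash flow to yesterday's value
start_of_day = mv / (lag_mv + cash_flow) - 1    # 20 %
```

The start-of-day treatment recovers the 20 % price move, which is why the view switches to it when the cash flow is large relative to the market value.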
Also, to avoid totally weird relative returns, I do not calculate them at all if either market value is negative. This is often the case for derivatives, for example. And notice the convert(float, ...) use? That's because, unlike MS Excel or most calculators, SQL Server calculates using the precision of the data types used, and without an explicit conversion to a high-precision data type, rounding errors can be big. Try the difference between select 1 / 2 and select 1.0 / 2.
Pivoting Back to the Indexed View
Finally, if you wanted to see how many items there are for each level in each structure on a given day and for a given portfolio, you could use the following query:
select p.*
from (
    select s.StructureIK,
        concat(s.Level1, ' - ', s.Level2, ' - ', s.Level3, ' - ', s.Level4) as Levels,
        p.LevelNr
    from PerfTest.vPerformanceTimeSeries p
    inner join PerfTest.Structure s on s.StructureIK = p.StructureIK
    where p.PortfolioIK = 1 and p.ToDate = '2023-10-16'
) a
pivot (
    count(LevelNr) for LevelNr in ([0], [1], [2], [3], [4])
) p;
Here, in the inner select, I am just picking the LevelNr data for PortfolioIK = 1 on October 16th from the view, together with the level names for each structure from table Structure. Then, I am using the pivot operator to group and count the data for all the levels (from 0 to 4). This gives me this nice summary:
With that in mind, what's going on with structure 2? There are four currencies (level 1) divided into four regions (level 2) - meaning there is only one region per currency. Indeed, structure 2 is not correctly set up. It defines the first decision-making level as Currency, the second as Region, and the third as Sector. But, at least for most currencies, the geographic region is not really a sensible decision-making level - think of CHF, EUR, or GBP: surely the region is usually Europe. For our test data it certainly is:
select Currency, ShortName, Region from PerfTest.Instrument
where Currency in ('CHF','EUR','GBP')
Let's see what happens if we change the structure:
update PerfTest.Structure set Level2 = 'AssetClass' where StructureIK = 2;
Now remember the indexed view? It should have automatically updated the data, which should then be reflected in the results we get from the query select p.* ... above. And, indeed, if your update statement was correctly processed, structure 2 now has different results:
Instead of Region, Level2 is now indeed AssetClass, with a count of 9, which makes more sense.
Next Steps
In the next part, I will build functions to combine portfolio performance data with that of a benchmark, and to calculate daily performance contribution and Brinson-Fachler attribution from the vPerformanceTimeSeries view defined in this part.



