Fillies and mares receive a weight allowance when taking on male horses around the world. This is supposed to equalise opportunity and make the best races open to either sex more competitive. Does it do the supposed job? Should we even consider removing it for the best races, so as to reward the best athlete, rather than merely rewarding the best runner considering its sex? Let's see what the statistics say.
Table 1 contains the record of female horses in Group and Graded races across the world since January 1, 2011, when facing at least one male opponent, including geldings and ridglings:
Table 1: female horses in 5 years of Group/Graded races
Country | W (wins) | R (runs) | SR | IV |
Australia | 284 | 2894 | 10 | 1.13 |
New Zealand | 105 | 1400 | 8 | 0.95 |
Japan | 71 | 1125 | 6 | 0.97 |
France | 82 | 861 | 10 | 0.84 |
Britain | 53 | 554 | 10 | 0.95 |
Ireland | 43 | 336 | 13 | 0.89 |
Argentina | 32 | 270 | 12 | 1.31 |
Germany | 23 | 262 | 9 | 0.76 |
Chile | 12 | 182 | 7 | 0.79 |
Brazil | 12 | 142 | 8 | 0.78 |
Italy | 15 | 126 | 12 | 1.07 |
South Africa | 15 | 124 | 12 | 1.59 |
USA | 17 | 121 | 14 | 1.27 |
Peru | 12 | 105 | 11 | 1.06 |
UAE | 12 | 95 | 13 | 1.51 |
Canada | 5 | 34 | 15 | 1.29 |
Hong Kong | 0 | 32 | 0 | 0 |
Sweden | 0 | 12 | 0 | 0 |
Turkey | 1 | 8 | 12 | 1.01 |
Singapore | 0 | 7 | 0 | 0 |
Norway | 1 | 7 | 14 | 1.47 |
Qatar | 1 | 3 | 33 | 3.67 |
Denmark | 1 | 3 | 33 | 4.00 |
Saudi Arabia | 0 | 2 | 0 | 0 |
Let’s take a walk through the columns. ‘W’ is total winning horses; ‘R’ is total runners; ‘SR’ is Strike Rate (winners per 100 runners), and ‘IV’ is Impact Value (ratio of actual Strike Rate to expected Strike Rate considering size of the field).
Two statistics are worthy of note. R – the number of runners – varies tremendously across the world. Relative to the number of Group and Graded races staged, it is common to see the sexes in competition in Australia and, particularly, in New Zealand. But in the USA, the belief that females are hugely disfavoured on dirt, plus the number of races restricted to females, makes it rare to see the sexes in competition.
IV – Impact Value – is a simple but useful metric in many situations. It measures the rate of winners for a category considering the proportion of runners representing that category. An IV of 1.00 is a win rate no more or less than the average in the size of fields encountered; for instance, a 10 percent strike rate in 10-runner fields. IV less than 1.00 is underperformance and IV greater than 1.00 overperformance; IV 2.00, for instance, represents a strike rate twice as high as random chance.
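As a minimal sketch of how SR and IV could be computed, assume each run is recorded as a pair (won, field_size). That layout and the function names are illustrative assumptions, not the format of the data behind Table 1. Expected wins are taken as the sum of 1/field_size over all runs, which is one reading of "expected Strike Rate considering size of the field".

```python
def strike_rate(runs):
    """Winners per 100 runners. `runs` is a list of (won, field_size) pairs."""
    wins = sum(1 for won, _ in runs if won)
    return 100.0 * wins / len(runs)


def impact_value(runs):
    """Actual wins divided by the wins expected if every runner in a field
    had an equal chance, i.e. 1 / field_size per run."""
    wins = sum(1 for won, _ in runs if won)
    expected = sum(1.0 / field_size for _, field_size in runs)
    return wins / expected


# Hypothetical example data: ten runs, all in 10-runner fields.
one_win = [(True, 10)] + [(False, 10)] * 9
two_wins = [(True, 10)] * 2 + [(False, 10)] * 8

print(round(strike_rate(one_win)), round(impact_value(one_win), 2))
print(round(strike_rate(two_wins)), round(impact_value(two_wins), 2))
```

The first case is the 10 percent strike rate in 10-runner fields mentioned above and comes out at IV 1.00; doubling the winners in the same fields gives IV 2.00.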
In this setting, an IV greater than 1.00 for females implies an IV less than 1.00 for males, because every runner in these races is one or the other. So, you can quickly see how females have done in each country; on these numbers, the general belief that females find it hard to beat males in the US looks unjustified.
That last statement must be tempered, however, by an important caveat when dealing with statistics that are the result of observation and not experimentation. Observational samples are very likely to be biased. Here, females are not selected at random to take on males, but often because their connections believe they have a good chance of winning; otherwise, why not just stick to their own sex?
Proceeding with caution, then, let us lump all the statistics in Table 1 together, to examine how females fare around the world in the aggregate. Consider Table 2a and 2b.
Table 2a: Category of sex and record, mixed Group/Graded races 2011-
Category | W (wins) | R (runs) | SR | IV |
Colt | 1074 | 9054 | 12 | 1.17 |
Horse | 3447 | 31034 | 11 | 1.11 |
Filly | 165 | 1648 | 10 | 1.08 |
Mare | 632 | 7057 | 9 | 0.99 |
Ridgling | 44 | 412 | 11 | 0.90 |
Gelding | 2135 | 25550 | 8 | 0.85 |
Table 2b: Binary-valued sex and record, mixed Group/Graded races 2011-
Sex | W (wins) | R (runs) | SR | IV |
Male | 6700 | 66050 | 10 | 1.00 |
Female | 797 | 8705 | 9 | 1.00 |
Tables 2a and 2b provide great support for the various schemes of sex allowances (3lb to 5lb) around the world. Entire male horses have the highest IV, geldings and ridglings have the lowest IV and female horses are in the middle, with fillies (4yo and younger) doing better than mares (5yo+). All of this is no revelation.
When we combine the different categories of sex distinguished in Table 2a, a nice result is forthcoming, expressed in Table 2b. Remembering we are dealing only with races contested by both male and female horses, the IV of both is 1.00. In other words, sex allowances around the world seem to do a really good job.
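The roll-up from Table 2a to Table 2b can be checked with a few lines of arithmetic. The sketch below copies the W and R figures from Table 2a and treats colts, horses, ridglings and geldings as male, as stated at the outset; the variable names are illustrative.

```python
# (sex, wins, runs) per category, copied from Table 2a.
table_2a = {
    "Colt": ("Male", 1074, 9054),
    "Horse": ("Male", 3447, 31034),
    "Filly": ("Female", 165, 1648),
    "Mare": ("Female", 632, 7057),
    "Ridgling": ("Male", 44, 412),
    "Gelding": ("Male", 2135, 25550),
}

# Sum wins and runs within each sex.
totals = {}
for sex, wins, runs in table_2a.values():
    w, r = totals.get(sex, (0, 0))
    totals[sex] = (w + wins, r + runs)

# Print wins, runs and strike rate (winners per 100 runners) for each sex.
for sex, (wins, runs) in totals.items():
    print(sex, wins, runs, round(100 * wins / runs))
```

This reproduces the W, R and SR columns of Table 2b. The IV column cannot be rebuilt the same way, because it depends on the size of the field in each individual race.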
But, as thorough data scientists, we should not stop there. Instead, let’s take a step back and think about why female racehorses receive an allowance at all. Using our Group and Graded race data since January 1, 2011, let’s examine the distribution of the two sexes using Racing Post Ratings (RPR), a widely used benchmark of racing merit.
Figure 3: the distribution of merit
Figure 3 above shows the distribution of performances in Group races for male and female racehorses across the world, using the RPR scale. The horizontal axis is the scale of RPR and the vertical axis their relative frequency (expressed using the technical measure ‘probability density’). The green area represents the ‘mass’ of male racing talent and is transparent, allowing for the comparison with the pink area of females. That the green graph is shifted to the right is strong evidence that male horses are better than females, as presupposed. At least, on average.
And it is this last word which turns out to be important. The mean of a population (which is the proper term for the metric we loosely refer to as ‘the average’) is a measure of its central tendency, and if we say that male horses are better than female horses ‘on average’ then this is most likely to be true when we pick an ‘average’ male and compare him with an ‘average’ female.
In the tails of the distribution – and specifically the right tail where horses with high RPR are to be found – it is much less likely that a given male will be better than a given female, because good horses of both sexes are rare. So, if we made every horse with RPR greater than 115 a ping-pong ball and had separate bags of females and males, there is no guarantee the male would be better, if we drew a ball from each bag at random.
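The two-bags thought experiment can be sketched as a small simulation. Everything numeric here is an assumption made for illustration: the ratings are drawn from normal distributions with invented means (105 for males, 102 for females) and a common spread of 8, not from the real RPR data behind Figure 3.

```python
import random


def sample_ratings(rng, mean, sd, n):
    """Draw n hypothetical ratings from a normal distribution."""
    return [rng.gauss(mean, sd) for _ in range(n)]


def prob_male_higher(rng, males, females, draws=50_000):
    """Draw one ball from each bag, with replacement, many times over and
    return the share of draws in which the male rating is the higher one."""
    hits = sum(1 for _ in range(draws)
               if rng.choice(males) > rng.choice(females))
    return hits / draws


rng = random.Random(2011)  # fixed seed so the run is repeatable

# Assumed populations: males shifted slightly to the right of females.
males = sample_ratings(rng, 105.0, 8.0, 200_000)
females = sample_ratings(rng, 102.0, 8.0, 200_000)

# The two "bags": only horses rated above 115.
elite_males = [r for r in males if r > 115]
elite_females = [r for r in females if r > 115]

p_all = prob_male_higher(rng, males, females)
p_elite = prob_male_higher(rng, elite_males, elite_females)
print(f"all horses: {p_all:.3f}")
print(f"RPR > 115:  {p_elite:.3f}")
```

Under these assumed parameters the male edge should narrow toward a coin flip once both bags are restricted to elite horses, without disappearing altogether. How far it narrows in reality depends on the true shape of the two distributions.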
Yet, in a race featuring elite performers of both sexes in competition, females always get a weight allowance, just as if an average female is being pitted against an average male. This is a dangerous assumption and, as we have established, could easily be violated in real-world encounters.
Guess what? The statistical theory holds. Using the evidence of Table 2a and Table 2b, we previously found that males and females do equally well when matched in Group and Graded races. But a different picture emerges when we condition the results by the classification of the Group race, as Table 4 shows:
Table 4: Record of female horses by grade, mixed Group/Graded races 2011-
Grade | W (wins) | R (runs) | SR | IV |
1 | 263 | 2775 | 9.5 | 1.17 |
2 | 168 | 1864 | 9.0 | 0.99 |
3 | 366 | 4066 | 9.0 | 0.93 |
Here is an important result! As we move to races for better horses, and eventually reach the best on the planet, IV shows that female horses have a better record than males. Again, this is undoubtedly a function of ‘selection bias’ (owners and trainers only ask the best females to take on males), but there is clearly an opportunity there to exploit. And the sample size of 2775 female runners hardly suggests that much selectivity is involved.
And, finally, in case you needed it confirmed, we are talking turf here, not dirt, as Table 5 shows:
Table 5: how the surfaces compare
Surface | W (wins) | R (runs) | SR | IV |
Turf | 760 | 8281 | 9 | 1.02 |
Dirt | 33 | 382 | 9 | 0.88 |
Tapeta | 4 | 25 | 16 | 2.00 |
Polytrack | 0 | 14 | 0 | 0 |
Leaving the surface aside, however, one wonders at the sagacity of granting females an allowance. It isn’t difficult to think of dominant female victories: in the Prix de l’Arc de Triomphe, for instance, five of the last six winners have been female. This is somewhat misleading, as the brilliant Treve accounts for two of them, but did she need an allowance? Did last year’s winner Found?
As we have established, it isn’t possible to make a definitive case that the female weight allowance should be removed for Group 1 and Grade 1 races, but if you held a prior belief that this should be done for other, objective reasons, the data at least supports your view.