How To Build and Interpret a Nomogram for Setting Better Running Goals
Whether you're starting your fitness journey or planning your next running season, you will need to understand where you are, measure your progress and set running goals. I had to do it myself when I started running this summer, and I was lost.
I turned to my Data Science and Engineering background and built a tool called a Nomogram to assist me.
This guide provides a nonstatistical audience with a methodological approach for building, interpreting, and using nomograms to estimate running fitness and set difficult and specific goals. If you do not know what a Nomogram is, don't worry, I will explain it step by step in the rest of the article.
Brief Review of Goal Setting Theory and Discussion on Performance
Although this article deals with setting better goals, this is not a Goal Setting blog post. Setting goals is part of any self-improvement approach, and fitness or running is no exception.
When setting out to set your own goals, it's easy to get lost in the profusion of acronyms and fields in which goal setting is used, for example: Psychology (e.g. WOOP: Wish, Outcome, Obstacle, Plan), Self-help (e.g. SMART: Specific, Measurable, Achievable, Relevant, and Time-Bound) or Business (e.g. OKRs and KPIs: Objectives, Key Results, Key Performance Indicators).
Sometimes, goals are even set for us, by our employer, our doctor, our coach, our family, our friends, our insurance company. For example, my health insurance gives me a few basic pieces of fitness advice:
- "do 60k steps per week, use an app to track them"
- "climb stairs"
- "reduce stress (eustress vs distress)"
- "take breaks at work, move 2h out of 8h of your workday"
Although I'm no stranger to setting goals, I was lost when I started running this summer, until I rediscovered Locke's theory of goal setting. In 1968, Edwin Locke published a paper called "Toward a Theory of Task Motivation and Incentives" in which he proposed that:
After controlling for ability, goals that are difficult to achieve and specific tend to increase performance far more than easy goals, no goals or telling people to do their best. It therefore follows that the simplest motivational explanation of why some individuals outperform others is that they have different goals.
The first part of this quote is key: "After controlling for ability". The verb control is used in its statistical sense, meaning that the effect of ability is removed from the equation. In other words, we all have different running fitness levels, and we need to control for that when setting goals.
The second part of the quote calls for a disclaimer: by following this approach, we bias ourselves towards performance. There are plenty of other motivations for running, and they are perfectly valid.
But if we focus on performance for the rest of this post, we also need to realise that performance is the result of many factors. For example, the blog post "Why are you so slow?" uses a statistical model to show that running speed in a 200m dash is influenced by 5 factors, of which the weakest link is the limiting factor. In other words, if you want to improve your 200m dash time, you need to improve your height, weight, fast- and slow-twitch muscle mass, cardiovascular conditioning, flexibility and elasticity. The research paper "Factors associated with high-level endurance performance" goes even further and lists 26 factors that influence endurance running performance. I will spare you the details and leave only a figure from the paper that summarises the factors:
I don't know about you, but that is too many factors to be practical. So I started looking for a single numerical estimate of my running fitness that I could use to set goals. It should be easy to measure, easy to understand, and easy to compare to others. Most importantly, it should be tailored to my individual profile.
French Engineering and Nomograms
Nomograms are graphical calculating devices: 2D diagrams that allow approximate computations. They are used in particular because they can reduce a statistical predictive model to a single numerical estimate, which is perfect for our use case!
The field of nomography was invented in 1884 by the French engineer Philbert Maurice d'Ocagne (1862-1938) and used extensively for many years to provide engineers with fast graphical calculations of complicated formulas to a practical precision.
Historically, they were used and developed in civil engineering. Place yourself in 1843: you are a civil engineer, and you need to calculate the volume of earth to be moved to allow for the construction of a road or a railway. You have a formula, but it's complicated and you don't have a computer to do the calculation for you. At that time, the French administration would have sent you a graphical table to help you with the calculation. Tables turned into nomograms, and just a few years later, in 1846, Léon Lalanne, a French engineer from Ecole Polytechnique and Ecole des Ponts, published a nomogram called "Abaque ou compteur universel" in which he explains how to use a nomogram to do all sorts of calculations.
Later, in 1867, Eduard Lill, an Austrian engineer and Captain of Military Engineering, published a nomogram to solve quadratic equations ($x^2 + px + q = 0$), showing that nomograms were not just a French affair.
Recently, nomograms have been used beyond civil engineering, especially in the fields of electrical engineering (e.g. for resistor or inductance sizing), mechanical engineering (e.g. for gear dimensioning), and chemical engineering (e.g. for phase transitions of materials). Today, they are mostly used for educational purposes, their practical usage having been replaced by computers, except in a few domains such as cancer prognosis.
Being a French engineer, I have had the pleasure of studying "abaques" (the French word for nomograms, which would translate literally to "abacuses") in my time. I have been particularly influenced by the nomogram used in optical engineering for the Lensmaker's equation and by level sets ("abaques de Pouchet" or "lignes de niveaux" in French).
Searching for a Single Numerical Estimate
At this point in my reasoning, equipped with Locke's theory and nomograms, I could summarise my requirements for the running nomogram as follows:
- professionals and amateurs share an axis: the graph lets you compare yourself to others and in particular, to professionals
- long and short running races share an axis: from sprinting to long endurance, athletes can use their fitness over a wide range of distances, from the 100m dash to 100km ultra trails
- leans on a single numerical estimate to summarise running fitness: a rating perhaps, out of 10 or 100, useful to compare myself to my former self or to others
At that point in my running journey, I had been exposed through my Garmin smartwatch to the indicator called VO2max. VO2max is a measure of the maximum volume of oxygen that an athlete can use. It is a good indicator of cardiorespiratory fitness, and it is used by Garmin to estimate your running fitness. There are common protocols to estimate VO2max, such as the Cooper test or the Vameval test (particularly popular in France for football). The idea is to run as fast as you can for a given amount of time (e.g. 6 minutes), and to measure the distance covered. These protocols measure your maximal aerobic speed (MAS), which is related to VO2max. They have their own practicality and precision issues (e.g. the fact that you need to know your pace upfront, which is a chicken-and-egg problem).
For my personal use case, VO2max started losing importance because my typical efforts (e.g. a 60min football game, a 2h bike tour, a 10km running race) are much longer than 6 minutes. I started to realise that other indicators were summarising my running fitness better. For example, I noticed that my average pace on a 60min Z2 jog was improving (cf A guide to heart rate training - Runner's World).
I later learned about Critical Speed (CS). Without going into the details, CS is a measure of the maximum speed that an athlete can sustain for a long period of time. It can replace MAS as a surrogate estimate of fitness. You can calculate it with an online calculator, such as the Critical Speed Calculator for Runners from Upside Strength or the Calculateur de vitesse critique from Remi Rivet. One of its added benefits is that it is very close to the second ventilatory threshold (SV2), which is otherwise costly and impractical to measure (you need lactate and ventilatory tests and a costly physiological assessment).
In particular, the 2020 paper Calculation of Critical Speed from Raw Training Data in Recreational Marathon Runners shows that CS can be calculated from a few personal time trials (e.g. 400m, 800m, 5km) and that it is a good predictor of marathon performance. Moreover, you can visualise the CS in a 2D space where the x-axis is the duration of effort in seconds and the y-axis is the average speed during the effort in km/h. This 2D space will form the basis for our nomogram in the next sections.
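To give a feel for how CS can come out of a handful of time trials, here is a minimal sketch that fits the linear distance-time model d = CS * t + D' by ordinary least squares. This two-parameter model is one common formulation and is an assumption here; it is not necessarily the exact method of the paper or of the calculators mentioned above. The function name `fit_critical_speed` and the three time trials are made up for illustration.

```python
def fit_critical_speed(trials):
    """Least-squares fit of d = CS * t + D' over (distance_m, duration_s) pairs.

    Returns (cs_m_per_s, d_prime_m). Assumed model, not the paper's exact method.
    """
    n = len(trials)
    if n < 2:
        raise ValueError("need at least two time trials")
    mean_t = sum(t for _, t in trials) / n
    mean_d = sum(d for d, _ in trials) / n
    # Slope of the regression line of distance on duration is the critical speed.
    sxx = sum((t - mean_t) ** 2 for _, t in trials)
    sxy = sum((t - mean_t) * (d - mean_d) for d, t in trials)
    cs = sxy / sxx
    # Intercept is D', the distance "reserve" available above critical speed.
    d_prime = mean_d - cs * mean_t
    return cs, d_prime


# Hypothetical time trials: (distance in metres, duration in seconds).
example_trials = [(400, 80), (800, 180), (5000, 1500)]
cs, d_prime = fit_critical_speed(example_trials)
print(f"CS = {cs * 3.6:.1f} km/h, D' = {d_prime:.0f} m")
```

Each trial is also a point in the 2D space described above (duration on the x-axis, average speed on the y-axis), and CS is the speed that the trial points approach as the duration grows.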
Athletics World Records and the Valencia Marathon
Here is the first version of our nomogram:
This blog post is not a dataviz tutorial, but let me just say that I built this visualisation with the Python programming language and the Altair visualisation library. The code is available on GitHub.
I have added a few World Records for men and women in a few typical disciplines: 1 mile, 5km, 10km, Half Marathon and Marathon. Those dots answer the question: "what would a world record athlete do?". They also form the upper bound of our y-axis: above that line, no human has run faster. Note that this line may move up over time (meaning the World Records will improve) due to training optimisation, new technology (shoes), better doping drugs, and so on.
Beyond World Records, it's interesting to look at major athletics events like the Valencia Marathon. Although marathons welcome amateurs, they need to define a lower limit, for logistical and economic reasons (maintaining the brand value of a Valencia Marathon Finisher). This is called the "sweeper car" or "broom wagon" ("voiture balai" in French).
the maximum official time for finishing the race being 5h:30:00, with this time limit not being exceeded under any circumstances.
The next interesting data points from the Valencia Marathon are the so-called "starting waves". At the start of a running event, the organiser staggers the athletes in waves of people that hopefully run at a similar pace. The main goal of waves is to limit the meandering needed to overtake a slower athlete, which costs the faster athlete energy and time. An indirect benefit of waves is that they give us the organiser's perspective on what the distribution of runners will be (if we assume they tried to design waves with a comparable number of athletes).
The Valencia Marathon Trinidad Alfonso is planning to start the race in nine waves in order to improve the comfort and safety of all the runners, based on the order of the accredited times.
Finally, I've looked at the "sub-elite bib status" which is a special status given to athletes that have run a fast enough time in the past 3 years. They gain access to a special starting wave and a few other privileges.
Sub-elite bib status will apply to athletes who apply with times under 30:00 in a 10k race, 1h06:00 in a half marathon, or 2h20:00 in a marathon run in the last three years
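The thresholds quoted above can be turned into points on the nomogram (duration in seconds on the x-axis, average speed in km/h on the y-axis). The sketch below shows that conversion; the helper names `hms_to_seconds` and `nomogram_point` are hypothetical and not taken from the repository mentioned earlier, and the standard road distances of 42.195 km and 21.0975 km are assumed.

```python
MARATHON_KM = 42.195
HALF_MARATHON_KM = 21.0975


def hms_to_seconds(hours, minutes, seconds=0):
    """Convert a finishing time to a duration in seconds."""
    return hours * 3600 + minutes * 60 + seconds


def nomogram_point(distance_km, duration_s):
    """Return (x, y) = (duration in seconds, average speed in km/h)."""
    return duration_s, distance_km / (duration_s / 3600)


# Thresholds taken from the Valencia regulations quoted in the text.
valencia_points = {
    "cut-off, marathon (5h30:00)": nomogram_point(MARATHON_KM, hms_to_seconds(5, 30)),
    "sub-elite, 10k (30:00)": nomogram_point(10.0, hms_to_seconds(0, 30)),
    "sub-elite, half marathon (1h06:00)": nomogram_point(HALF_MARATHON_KM, hms_to_seconds(1, 6)),
    "sub-elite, marathon (2h20:00)": nomogram_point(MARATHON_KM, hms_to_seconds(2, 20)),
}

for label, (x, y) in valencia_points.items():
    print(f"{label}: x = {x} s, y = {y:.2f} km/h")
```

The same conversion applies to any race result, which is what lets professionals and amateurs share the same two axes.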
Adding an X-axis grid
Here is the second version of our nomogram:
The idea is to divide the x-axis into disciplines that are relevant to running.
On the professional side, World Athletics divides disciplines like this:
- Sprints, Hurdles and Relays: 100m, 110mH, 200m, 300m, 400m, 400mH, 4x100m, 4x200m, 4x400m
- Middle Distances (Courses de demi-fond): 600m, 800m, 1000m, 1500m, Mile, 2000m
- Long Distances and Steeplechase (Courses de fond): 2000m SC, 3000m, 3000m SC, 2 Miles, 5000m, 10000m
- Road Running: 10 km, 15 km, 10 Miles, 20 km, HM, 25 km, 30 km, Marathon, 100 km
On the amateur side, most races organised in my area are 5km, 10km, HM and Marathon, with few or none for other disciplines. Therefore, I have decided to use the following grid: no sprint, Mile (for Middle distances), 5km (for Long Distance), 10km, HM, Marathon for Road running. If we wanted to include sprinting, we could use a 400m line.
You might note a "distortion" of sorts for longer distances. To counteract this, we could use a logarithmic scale. In particular a Symmetric log scale (symlog), which is particularly useful for plotting data that varies over multiple orders of magnitude but includes zero-valued data, like in this variant:
For the rest of the blog post, I keep the linear scale, as it makes the x-axis easier to read and puts more emphasis on endurance disciplines than on sprint disciplines.
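For reference, here is a small sketch of what a symlog transform does to the x-axis. It assumes the definition sign(x) * log1p(|x| / c), which is the form used by d3-scale as far as I know. The constant c = 60 seconds is an arbitrary choice for illustration, and the durations are the Valencia thresholds from earlier in the post. In Altair, the equivalent would presumably be a scale of type "symlog" on the x encoding.

```python
import math


def symlog(x, constant=1.0):
    """Symmetric log transform: linear near zero, logarithmic far from it."""
    return math.copysign(math.log1p(abs(x) / constant), x)


# Durations in seconds: 0, then 30:00, 1h06:00, 2h20:00 and 5h30:00.
durations_s = [0, 1800, 3960, 8400, 19800]
x_max = durations_s[-1]

for t in durations_s:
    linear_pos = t / x_max  # position on a linear axis, from 0 to 1
    symlog_pos = symlog(t, constant=60) / symlog(x_max, constant=60)
    print(f"{t:>6} s  linear: {linear_pos:.2f}  symlog: {symlog_pos:.2f}")
```

The short durations get much more room on the symlog axis, which is exactly why the linear scale favours endurance disciplines.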
Adding a Y-axis grid with VDOT
Here is the third version of our nomogram:
The idea is to divide the y-axis into levels that are relevant to running.
"The World's Best Coach" Jack Daniels has proposed a system called VDOT that makes it possible to compare athletes of different levels.
In the 1970s, Daniels and his colleague, Jimmy Gilbert, examined the performances and known VO2max values of elite middle and long distance runners. Although the laboratory determined VO2max values of these runners may have been different, equally performing runners were assigned equal aerobic profiles. Daniels labeled these "pseudoVO2max" or "effective VO2max" values as VDOT values.
With the result of a recent competition, a runner can find his or her VDOT value using a VDOT calculator. This will allow them to determine an "equivalent performance" at a different race distance, as well as recommended training paces.
By looking at the code of the VDOT calculator, I was able to find the equation of the curves of constant VDOT. I then plotted these "iso-VDOT" curves on the nomogram at intervals of 5 VDOT points. (Side note: VDOT defines levels internally from level 2 = VDOT 40 to level 9 = VDOT 85, equally spaced at 5 points apart, which I've simply extended).
We can now interpret those curves. For example, men's world record performances sit above VDOT 80, while women's world record performances sit around VDOT 75. It also seems that the women's 5km world record underperforms the other women's world records, which maybe calls for a new world record at Paris 2024. The Valencia broom wagon has a VDOT of about 25, while sub-elite athletes seem to have a VDOT a little above 70.
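For readers who want to reproduce such curves, here is a minimal sketch. It assumes the Daniels-Gilbert regression equations as they are commonly published (oxygen cost as a quadratic in speed, and the sustainable fraction of VO2max as a function of duration); I have not checked that they match the calculator's code or the repository mentioned earlier. The function names are hypothetical.

```python
import math


def _fraction_of_max(duration_min):
    """Fraction of VO2max sustainable for a given duration (assumed formula)."""
    return (0.8
            + 0.1894393 * math.exp(-0.012778 * duration_min)
            + 0.2989558 * math.exp(-0.1932605 * duration_min))


def vdot(distance_m, duration_s):
    """Estimate VDOT from a race result (assumed Daniels-Gilbert equations)."""
    t = duration_s / 60          # minutes
    v = distance_m / t           # metres per minute
    vo2 = -4.60 + 0.182258 * v + 0.000104 * v ** 2
    return vo2 / _fraction_of_max(t)


def iso_vdot_speed_kmh(target_vdot, duration_s):
    """Speed (km/h) on the iso-VDOT curve at a given duration."""
    t = duration_s / 60
    vo2 = target_vdot * _fraction_of_max(t)
    # Solve 0.000104 v^2 + 0.182258 v - (4.60 + vo2) = 0 for v > 0.
    a, b, c = 0.000104, 0.182258, -(4.60 + vo2)
    v = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)  # metres per minute
    return v * 60 / 1000


# A few points of the VDOT 45 curve: (duration in s, speed in km/h).
curve_45 = [(d, iso_vdot_speed_kmh(45, d)) for d in (600, 1800, 3600, 6000, 14400)]
for duration_s, speed in curve_45:
    print(f"{duration_s:>6} s -> {speed:.2f} km/h")
```

Sampling `iso_vdot_speed_kmh` over a fine grid of durations for each VDOT level (25, 30, 35, and so on) gives one curve per level in the same duration and speed space as the rest of the nomogram.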
IAAF points as an alternative to VDOT
World Athletics maintains a ranking of athletes based on their performance. They use a points system called IAAF points, similar to ATP points in Tennis. Here is an example of the World Rankings | Women's Marathon:
Do you know if you have scored your first IAAF point yet? Go to this IAAF Scoring Calculator and enter your time for a given distance.
Given that IAAF points are an official ranking, we could have plotted iso-IAAF curves on the nomogram. But after trying that, I felt that this was not as clear as the VDOT curves. We can even show that the VDOT curves are a good approximation of the IAAF curves by plotting both on the same graph:
Setting Realistic Goals by looking at French Athletics Open Data
Now that we have a nomogram, we can use it to set a difficult and specific goal. Before that, we need to know what a realistic performance is. Turning to statistics, we can look at the distribution of performances of athletes in a given discipline.
Thankfully, the French Athletics Federation has an Open Data portal. We can crawl the data available at "Les Bilans" for a given discipline, and plot the distribution of performances. Here is the example of Men Half Marathon in 2023:
This looks like a log-normal distribution, and we could certainly model this further and look at percentiles, etc. However, for this blog post, I will simply use a visual interpretation of a realistic performance. "Most people" seem to have a VDOT between 37 and 44. Therefore, aiming for a VDOT of 45 seems like a difficult enough goal for a beginner: it gets ahead of the masses without setting the bar unrealistically high.
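If you would rather go beyond the visual reading, a percentile view is easy to sketch. The values below are made-up placeholders, NOT the French Athletics Federation data; with real data you would first convert each finish time to a VDOT. The helper `percentile_rank` is a hypothetical name.

```python
from statistics import quantiles


def percentile_rank(values, target):
    """Percentage of values strictly below the target."""
    return 100 * sum(v < target for v in values) / len(values)


# Placeholder VDOT values for illustration only (not real federation data).
sample_vdots = [31, 34, 36, 37, 38, 39, 40, 41, 41, 42, 43, 44, 46, 49, 55]

q1, median, q3 = quantiles(sample_vdots, n=4)
print(f"Middle half of the sample: VDOT {q1:.0f} to {q3:.0f} (median {median:.0f})")
print(f"Share of the sample below VDOT 45: {percentile_rank(sample_vdots, 45):.0f}%")
```

A goal that sits just above the third quartile is one way to put a number on "difficult but realistic".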
Plotting your past performances and your future goals
Here is the last stop on our nomogram journey:
Over the last few months, I have trained and raced a few times. I have added my past performances to the nomogram in orange. I have also added my future goals in blue. I will run my first Half Marathon in Berlin in April 2024, hoping to finish under 1h40, giving me a VDOT of 45.
When you think about it, finishing a first Half Marathon in under 1h40 is ambitious, but looking at the data here, I think it's a good goal: difficult and specific. I seem to already have a VDOT above 45 (although over 1 mile, the shortest distance), and I have a few months to train and improve my fitness further, specifically working on longer runs and making sure my VDOT score carries over to longer distances.
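To make the 1h40 goal concrete for training, it helps to translate it into an average pace. A quick sketch, assuming the standard half marathon distance of 21.0975 km (the helper name `pace_per_km` is hypothetical):

```python
HALF_MARATHON_KM = 21.0975


def pace_per_km(distance_km, duration_s):
    """Return the average pace as (minutes, seconds) per kilometre."""
    seconds_per_km = round(duration_s / distance_km)
    return divmod(seconds_per_km, 60)


goal_duration_s = 1 * 3600 + 40 * 60  # 1h40:00
minutes, seconds = pace_per_km(HALF_MARATHON_KM, goal_duration_s)
print(f"Goal pace: {minutes}:{seconds:02d} per km")  # prints "Goal pace: 4:44 per km"
```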
This concludes my guide on how to build and interpret a nomogram to set better running goals. No matter your level in running, exercise physiology or statistics, I hope you have found something of value in this article. Feel free to use the nomogram I have built for your own goal setting, and let me know if you have any feedback.
- Goal setting - Wikipedia
- control - Wiktionary, the free dictionary
- Why are you so slow? - Probably Overthinking It
- Factors associated with high-level endurance performance: An expert consensus derived via the Delphi technique
- Nomogram - Wikipedia
- Ecole des Ponts
- Military engineering - Wikipedia
- How To Build and Interpret a Nomogram for Cancer Prognosis | Journal of Clinical Oncology
- Lens - Wikipedia
- Level set - Wikipedia
- A guide to heart rate training - Runner's World
- Critical Speed Calculator for Runners - Upside Strength
- Calculateur de vitesse critique - Remi Rivet
- Calculation of Critical Speed from Raw Training Data in Recreational Marathon Runners - PMC
- Vega-Altair: Declarative Visualization in Python — Vega-Altair 5.2.0 documentation
- louisguitton/critical-speed-calculator: A better VDOT calculator, to estimate a runner's critical speed, get their training zones, and compare them to prototypical runners.
- Regulations 42K 2024 · Valencia Ciudad del Running
- Jack Daniels (coach) - Wikipedia
- V.O2 Running Calculator
- World Rankings | Women's Marathon
- IAAF Scoring Calculator
- GlaivePro/IaafPoints: PHP library to calculate IAAF scoring points of athletics and IAAF scoring points for combined events.
- statistics - How to calculate IAAF points? - Sports Stack Exchange
- Base de Données - Fédération Française d'Athlétisme