Some Fitness tests [CrossFit Sports Science]

09 May

I recently had the opportunity to review four fitness tests as part of my university studies, and thought you might be interested in some of the assumptions behind these tests.

I’m sorry for the dull writing style.

The lack of  scientific credibility behind certain tests is worrying, but you will be able to see why we don’t get you to sit on a bike and cycle slowly, or prod you with fat calipers, and why we quite like the RAST and MSST tests.

The Multistage Fitness test

The multistage fitness test, also known as the bleep test and MSST, was designed by Leger and Lambert (1982) to assess the VO2 max of participants based on performance in a continuous, pace regulated shuttle run between two points 20 m apart. The subjects are health checked and give informed consent (ACSM, 2005). The participants line up on one of two lines 20 m apart and are asked to reach the opposite line before a bleep sounds. The interval between bleeps shortens at preset stages, which controls the pace of the work (Brewer, 2002). The test is maximal in that it requires the participants to work to the point of exhaustion. Participants are withdrawn from the test when they fail, on two consecutive attempts, to complete the shuttle within the time allowed.
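The pacing and withdrawal rule can be sketched in a few lines of Python. The 20 m distance and the "two consecutive failures" rule come from the description above. The speed schedule (8.5 km/h at level 1, rising by 0.5 km/h per level) is only an assumption for illustration, as published versions of the test differ, and the function names are made up for this sketch.

```python
SHUTTLE_DISTANCE_M = 20.0  # from the test description


def shuttle_time_allowed(speed_kmh):
    """Seconds allowed to cover one 20 m shuttle at the given pace."""
    speed_ms = speed_kmh * 1000.0 / 3600.0
    return SHUTTLE_DISTANCE_M / speed_ms


def level_speed_kmh(level, start_kmh=8.5, step_kmh=0.5):
    """Assumed pace schedule: illustrative values, not taken from the text."""
    return start_kmh + step_kmh * (level - 1)


def shuttles_before_withdrawal(made_in_time):
    """Count the shuttles attempted up to and including withdrawal.

    made_in_time holds one boolean per shuttle attempted.
    Withdrawal happens on the second consecutive failure.
    """
    consecutive_failures = 0
    for count, ok in enumerate(made_in_time, start=1):
        consecutive_failures = 0 if ok else consecutive_failures + 1
        if consecutive_failures == 2:
            return count
    return len(made_in_time)
```

A single missed bleep followed by a successful shuttle resets the count, so only back to back misses end the test.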

The test is generally accepted as valid and reliable, to the extent that it is being considered by government as a standard test for schools to assess the fitness of school children (CMO, 2009).

In pilot trials, 92 participants were tested against VO2 max assessed via retro-extrapolation (Leger and Lambert, 1982), although only 25 performed the run twice. However, Ramsbottom et al. (1988) re-validated the test by direct measurement of the VO2 max of 74 volunteers, with a correlation of 0.92. This can be seen as reliable, as a study needs more than 50 participants, although three trials are preferred (Hopkins, 2000). Moreover, Ramsbottom et al. (1988) achieved this correlation in a homogeneous population. This further validates the test, as it is easier to obtain high correlations by using a heterogeneous population (Atkinson and Nevill, 2001; Bonen et al., 1979).

Within the fitness and coaching community there are those who distrust VO2 max measurement for sporting use. Noakes et al. (1990) found VO2 max to be a poor predictor of race times; others can be confused by the abstract nature of the VO2 max figure. In the MSST, the levels achieved in the test can form a fitness currency; the British marines and police force, for example, set bleep test standards (Brewer, 2002).

The levels achieved can also be related to a running velocity. The test is therefore further supported by McLaughlin et al. (2010), who established that the velocity at VO2 max correlated well with distance running performance and with a “classic” endurance model taking into account VO2 max, %VO2 max, lactate threshold and running economy. The MSST correlates well with 5k race times (Ramsbottom et al., 1988) and 10k times (Paliczka et al., 1987).

The test is ideal for team training where the sport is running based, as tests need to be sports specific (Lemmink, 2004). Many people can be tested at the same time with minimal equipment, and the subjects require motivation and encouragement, which team members can supply. The test has been validated for both sexes, as individuals or in groups (Leger and Lambert, 1980).

For non-athletes or special populations this test does carry very public connotations of success and failure. It is a maximal test, so it is unsuitable for many special populations (ACSM, 2005), and results can be affected by how well participants can hear the bleeps.

Safety in training is a consistent concern. Gardner (2002) identifies the major cause of exercise related deaths in the US military as atherosclerotic coronary artery disease (ACAD), together with the failure of screening procedures to exclude those suffering from it. The increasing age of the participants was also flagged. Gardner (2002) suggests that vigorous exercise tests need to be conducted where immediate advanced life support measures are available. However, Babraj et al. (2009) show that high intensity exercise can be used even with medical populations.

As this is a maximal test, it can have a developmental training effect, which is a bonus when training time is scarce. The test also supplies an easily recordable benchmark.

Apart from direct measurement, the Cooper run test, which measures the distance covered in a 12 minute run, seems the most viable alternative. However, the Cooper test is maximal from the start and has been criticised by Williamson and Hamley (1984) because it relies on motivation and self-pacing skills, and because the results could be partly attributed to anaerobic systems. It also calls for a bigger running area, which could mean it needs to be staged outdoors and could be subject to the weather. Nevertheless, it has an athletic component, resulting in a real world effort. The suggestion is that both these tests are more suitable for athletic populations.

Astrand Ryhming Cycle Test

The Astrand Ryhming cycle test is a sub-maximal VO2 max test that relates mechanical work to a steady state heart rate.

The test requires a cycle ergometer with the ability to apply loads of 150, 100 and 75 W at 50 turns of the flywheel per minute. The aim of the test is to reach a steady state heart rate.

In the 5th and 6th minutes of the test, heart rates are taken (either by palpation or monitor) and compared. If they do not vary by more than 5 bpm and fall within the range of 130 to 170 bpm, the test is successfully terminated and the rate recorded. If the heart rate is less than 130 bpm, the load is increased and the test continued until the heart rate reaches this level. If the heart rates differ by more than 5 bpm, the test is continued until this criterion is met (Clink and Thomas, 1981); the ACSM (2005), however, suggest averaging the 5th and 6th minute scores. Numerous other protocols exist. The results are compared against a standard nomogram to ascertain a theoretical VO2 max figure.
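The steady state check described above is really just a small decision rule, which can be sketched as follows. The thresholds (130 to 170 bpm, 5 bpm) and the averaging of the two readings are taken from the text. The function name, the handling of readings above 170 bpm and the use of the lower of the two readings for the "below 130" check are assumptions for this sketch, as the text does not cover them.

```python
def astrand_next_step(hr_min5, hr_min6, low=130, high=170, max_diff=5):
    """Return (action, recorded_rate) from the 5th and 6th minute heart rates (bpm)."""
    if min(hr_min5, hr_min6) < low:
        # Heart rate has not reached the working range yet.
        return ("increase load and continue", None)
    if max(hr_min5, hr_min6) > high:
        # Not covered by the text: treated here as outside the usable range.
        return ("outside range", None)
    if abs(hr_min5 - hr_min6) > max_diff:
        # No steady state yet, so keep going at the same load.
        return ("continue at same load", None)
    # Steady state reached: record the average of the two readings.
    return ("stop and record", (hr_min5 + hr_min6) / 2)
```

For example, readings of 142 and 145 bpm would end the test with a recorded rate of 143.5 bpm, which is the figure then taken to the nomogram.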

The use of this test by the RAF indicates its bias towards testing sedentary populations. The test removes pacing and motivation issues and anaerobic contributions. As it can give privacy and removes the connotation of athleticism, it can be seen as a valid health appraisal that could result in a desire for health improvements in the subject (Williamson and Hamley, 1984). It is ideal for special, sedentary and older populations (Astrand and Ryhming, 1954). The suitability for older adults is crucial, in that older adults demonstrate the highest prevalence of cardiovascular and other chronic diseases, and it is not practical or safe to maximally test certain populations (Tanaka et al., 2001).

It is worth considering that VO2 max was originally adopted as a reference standard for cardio-respiratory fitness by those researching cardiovascular illnesses (Shepherd et al., 1968) rather than as an athletic marker. However, the original experiment was intended to aid the selection of military recruits and athlete development, hence the test was originally validated on 18 to 30 year olds (Astrand and Ryhming, 1954).

According to Heywood (2006), sub-maximal exercise protocols assume a steady state heart rate at every exercise intensity and a linear relationship between heart rate, oxygen use and work load. This may be so at lower levels, but the relationship becomes curvilinear at higher levels of work. This test in particular assumes equal mechanical cycling efficiency, so it overestimates VO2 max for the highly trained and underestimates it for the untrained. It also assumes that maximum heart rates are equal, when they can easily vary by 11 bpm. Tanaka et al. (2001) found that most maximal heart rate assumptions are incorrect, as did Clink and Thomas (1981), who suggested that validation by maximal methods was inappropriate because heart rate responses vary in maximal work. Underestimations of VO2 max (compared with treadmill testing) have ranged from 5 to 25% and have been variously attributed to habitual activity, physical conditioning and leg strength (ACSM, 2005).

Whilst it is widely conceded that sub-maximal tests may carry error, sometimes substantial, the method is seen as cheap, easy, quick, safe and ideal for special and sedentary populations (Astrand and Ryhming, 1954). However, once it is established that the test is to be targeted at sedentary or at risk populations, more variables present themselves that affect heart rate, such as smoking, caffeine, time since last meal, heat and hydration (ACSM, 2005). Diabetes and various medications can also alter heart rate responses (ACSM, 2005).

A frequently suggested alternative is the step test, as the cycle test was originally compared with such a test (Astrand and Ryhming, 1954). Such a substitution would have a mathematical similarity with the cycle test, in that it relates mechanical work to heart rate. However, various walk tests also exist which can be tailored to the individual at hand. The Rockport walk test is popular and has a real world application. The test can be stepped down to a 6 minute walk test that is capable of being used with patients who have limited short term survival (ACSM, 2005).

RAST test

The running based anaerobic sprint test (RAST) (Draper and Whyte, 1996) is used as a test of anaerobic running power. A 35 metre running area is marked off with an adequate overrun at each end. The subject is health checked, gives informed consent (ACSM, 2005) and is briefed on the test, then asked to sprint the distance as fast as possible. The time for each sprint is recorded, and there is a 10 second turnaround time between sprints. The tester then applies various simple calculations to the collected information to produce a power figure in watts for each sprint (Mckenzie, 2005). From these, average power and a fatigue index can be derived. According to Mckenzie (2005), a low fatigue index indicates the athlete’s ability to sustain anaerobic performance, but a high decline in sprint times indicates the athlete needs to focus on lactate tolerance training.
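The text above does not spell out the calculations. The sketch below assumes the version that is commonly published for the RAST: power for each sprint is body mass × distance² ÷ time³, and the fatigue index is (highest power minus lowest power) ÷ total sprint time. Both formulae, the function names and the example figures are assumptions for illustration rather than something taken from the sources cited here.

```python
def rast_powers(body_mass_kg, sprint_times_s, distance_m=35.0):
    """Power in watts for each sprint (assumed formula: mass * distance^2 / time^3)."""
    return [body_mass_kg * distance_m ** 2 / t ** 3 for t in sprint_times_s]


def rast_summary(body_mass_kg, sprint_times_s, distance_m=35.0):
    """Peak, minimum and mean power plus the fatigue index in watts per second."""
    powers = rast_powers(body_mass_kg, sprint_times_s, distance_m)
    peak = max(powers)
    lowest = min(powers)
    return {
        "peak_power_w": peak,
        "min_power_w": lowest,
        "mean_power_w": sum(powers) / len(powers),
        # Assumed definition: drop in power divided by total sprinting time.
        "fatigue_index_w_per_s": (peak - lowest) / sum(sprint_times_s),
    }


# Made-up example: a 70 kg athlete and six hand timed sprints.
summary = rast_summary(70.0, [5.0, 5.1, 5.3, 5.4, 5.6, 5.8])
```

Because time is cubed, small timing errors have a large effect on the power figure, which is one practical reason light gates are attractive.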

As a genuine wattage output the figure is obviously inaccurate, as the work is not directly against gravity, but the standard calculation produces a recordable figure. For that matter, the actual sprint time is itself a valid measure.

The test needs to closely resemble the activity that requires the anaerobic output (McArdle et al., 2007). Meckel et al. (2009) likewise state that anaerobic testing procedures should mimic sports specific activity patterns.

With a degree of organisation, it is possible to test several candidates one after the other, as long as several testers are deployed. No computer equipment or specialist kit is required, but light gates can be used.

As a physiological test purporting to assess anaerobic capacity, the test is poor. According to Aramatzis et al. (1999), running performance may relate not to metabolic processes but to the efficiency of movement. Measurement error increases with increasing velocity, and the power leakage due to absorption mechanisms within the tendon units cannot be ascertained. Vandewalle et al. (1987) cast doubt on any test purporting to assess anaerobic capacity, as maximal performance also depends on glycolytic and aerobic power as well as anaerobic capacity. The fatigue indexes (power decrease) of all-out tests are not reliable and probably depend on aerobic power as well as on the percentage of fast-twitch muscle fibre. According to McArdle et al. (2007), tests of anaerobic power are also problematic because age, gender, skill, motivation and body size greatly influence the production of norming tables. At this distance the test could be merely testing starting technique.

According to Zacharogiannis et al. (2004), the test does seem to be a valid and reliable power output test when validated against the Wingate test. This study confirmed a significant correlation between the RAST and the Wingate test in peak and mean power. Zagatto et al. (2008) conclude that peak power, mean power and fatigue index correlate with the Wingate anaerobic test (WAnT), although r values of 0.46 to 0.63 are hardly conclusive, and that the RAST is good at predicting short distance running performance over 50, 100, 200 and 400 m. Loures et al. (2008) examined the correlation between the running anaerobic sprint test and anaerobic work capacity in soccer players.
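To see why correlations of this size are hardly conclusive, it helps to square them: r squared is the proportion of variance the two tests share. A minimal sketch (the function name is just for illustration):

```python
def shared_variance(r):
    """Proportion of variance two measures share, given their correlation r."""
    return r * r


for r in (0.46, 0.63):
    print(f"r = {r:.2f} -> r squared = {shared_variance(r):.2f}")
```

On these figures the two tests share somewhere between roughly a fifth and two fifths of their variance, which leaves most of it unexplained.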

The test is currently being used with a National League basketball team, footballers, sprinters, judo players and rugby players. Research into the RAST is continuing at the University of Wolverhampton.

It is interesting that the RAST has already been used to validate various forms of supplementation (Jourkesh et al., 2007).

The Wingate 30 second cycle test presents itself as the obvious alternative, as it is a good predictor of anaerobic capacity, reproducible and a good performance predictor (Zagatto et al., 2009), but it has a higher equipment requirement. However, the reservation that anaerobic protocols should resemble the sport being tested seems valid, and cycling is not running.

Skinfold test

Skinfold testing is an anthropometric method for estimating body fat from skinfold measurements. Skinfolds are lifted from the skeletal and muscular frame at specified sites and measured using one of the many makes of calipers available. These measurements create a model of body density. Formulae (Siri or Brozek), based on various assumptions about the make-up of the human body, are then applied to this model. The approach is partly based on the observation that a large proportion of total body fat is in the subcutaneous tissue (Keys and Brozek, 1953).

This is in effect a two stage process, with two sets of validation. The issues are: can a skinfold test create a valid model of density and, once presented with a density figure, is it possible to determine what percentage of the body is fat?

The test is important due to the growing obesity crisis. It is useful to have an international reference guide and, according to Durnin and Womersley (1974), fat levels influence death rate, affect drug effectiveness and indicate whether the body can withstand cold and starvation.

The current use of skinfold testing is based on the Durnin and Womersley (1974) method, which estimates body density from skinfold measurements as described above. In the original paper, Durnin and Womersley (1974) list several reservations about the technique, including the lack of a linear relationship and variation in skinfold compressibility.

In addition, theirs was not a random sample: volunteers were selected to represent a spread of obesity levels (thereby making correlation easier) and a spread of ages. Mickelsen (1958) also felt that the densitometric technique (underwater weighing) was fraught with difficulty, including air in the gastrointestinal tract and fear of underwater weighing. Nevertheless, the ACSM guidelines (2005) suggest skinfold measurements are highly correlated with body composition as determined by hydrodensitometry. Durnin and Satwanti (1982) concluded that variations observed in the estimation of body fat by densitometry are well within the basic errors of the method. Kispert et al. (1987) observed that, in a clinical situation, small changes in body shape were difficult to spot if skinfold measurements were taken by different testers. However, ACSM (2005) suggests that training and multiple practice sessions can overcome user error.

To this density model a second formula is applied, either Siri or Brozek, which, based on various assumptions as to the nature of body composition, delivers a % body fat figure that can be looked up against norming tables (ACSM, 2005).
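The two stages can be sketched as follows. The Siri and Brozek conversions are given in their commonly published forms. The density step uses the general shape of a log-of-skinfold-sum regression (density = c − m × log10 of the summed skinfolds), but the coefficients c and m depend on age and sex and are deliberately left for the caller to supply from published tables. The function names and the example coefficients are placeholders for illustration only.

```python
import math


def density_from_skinfolds(sum_of_skinfolds_mm, c, m):
    """Stage one: body density from summed skinfolds.

    c and m are age and sex specific regression coefficients,
    to be taken from published tables (none are hard coded here).
    """
    return c - m * math.log10(sum_of_skinfolds_mm)


def percent_fat_siri(density):
    """Stage two, Siri form: % fat = (4.95 / density - 4.50) * 100."""
    return (4.95 / density - 4.50) * 100.0


def percent_fat_brozek(density):
    """Stage two, Brozek form: % fat = (4.57 / density - 4.142) * 100."""
    return (4.57 / density - 4.142) * 100.0


# Placeholder coefficients, purely to show the two stages chained together.
example_density = density_from_skinfolds(40.0, c=1.16, m=0.06)
example_fat = percent_fat_siri(example_density)
```

Note how sensitive the second stage is: a change in density in the second decimal place moves the body fat figure by several percentage points, so any error in the density model carries straight through.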

In the original paper, Durnin and Womersley (1974) also list several reservations about the body composition elements of the technique, including the density of the skeleton, changes in body composition with ageing and obesity, and the proportion of fat situated subcutaneously. However, where body composition was theorised to vary on the basis of race, gender or age, numerous corrective calculations now exist (ACSM, 2005). Currently there are two commonly used models of the way in which human density is distributed. One of these, the two component model, divides the body into the fat mass (FM) and the fat free mass (FFM) (Siri, 1956). Further research is producing three and four component models.

However, Durnin et al. (1997) argued that the possible technical errors of skinfold measurement assume little importance against the background of the basic assumptions underlying densitometry, total body water, total body potassium (K) and other methods.

It was only in 1984 that the number of actual cadaver studies was increased from 9 to 34 (Clarys et al., 1984), with a range of techniques applied and cross checked, including skinfold and underwater weighing (using the concept of adipose tissue free weight). The composition varied substantially (bone from 16.3% to 25.75% and muscle from 41.9% to 59.4%) which, it can be argued, totally undermines densitometric assumptions about body composition. Clasey et al. (1999), reviewing various body composition formulae, concluded that the use of many body composition techniques should be viewed with concern. Davies et al. (1986), using A-mode ultrasound, observed that the proportion of fat situated subcutaneously (PFSS) varied considerably between individuals (range 0.50 to 0.97 in the women, 0.40 to 0.97 in the men). Moreover, there was no relationship between subcutaneous and internal fat masses.

Mickelsen (1958) found considerable individual variation in the distribution of subcutaneous fat throughout the body.

In practice the technique is intrusive.

The choice of an alternative depends on the purpose for which the obesity measure is to be used. For standard weight loss, a combination of dress size, BMI and self measurement of problem areas is often applied by people concerned with weight management. If the target is health risk appraisal, the method validated by Björntorp (1992) and Pouliot et al. (1994), specifically waist circumference, seems to be well validated and correlated with the health risks of concern to government, and is easily applied both by the individual and in a clinical setting. The technique is recommended by the national forum for obesity.


