Individual differences in normal body temperature: longitudinal big data analysis of patient records

Ziad Obermeyer, assistant professor1,2; Jasmeet K Samra, research analyst1; Sendhil Mullainathan, professor3

1Department of Emergency Medicine, Brigham and Women’s Hospital, Boston, MA, USA
2Department of Emergency Medicine and Health Care Policy, Harvard Medical School, Boston, MA, USA
3Department of Economics, Harvard University, Boston, MA, USA

Correspondence to: Z Obermeyer zobermeyer{at}bwh.harvard.edu
  • Accepted 23 November 2017

Abstract

Objective To estimate individual level body temperature and to correlate it with other measures of physiology and health.

Design Observational cohort study.

Setting Outpatient clinics of a large academic hospital, 2009-14.

Participants 35 488 patients who neither received a diagnosis of infection nor were prescribed antibiotics, in whom temperature was expected to be within normal limits.

Main outcome measures Baseline temperatures at individual level, estimated using random effects regression and controlling for ambient conditions at the time of measurement, body site, and time factors. Baseline temperatures were correlated with demographics, medical comorbidities, vital signs, and subsequent one year mortality.

Results In a diverse cohort of 35 488 patients (mean age 52.9 years, 64% women, 41% non-white race) with 243 506 temperature measurements, mean temperature was 36.6°C (95% range 35.7-37.3°C, 99% range 35.3-37.7°C). Several demographic factors were linked to individual level temperature, with older people the coolest (–0.021°C for every decade, P<0.001) and African-American women the hottest (versus white men: 0.052°C, P<0.001). Several comorbidities were linked to lower temperature (eg, hypothyroidism: –0.013°C, P=0.01) or higher temperature (eg, cancer: 0.020°C, P<0.001), as were physiological measurements (eg, body mass index: 0.002°C per kg/m², P<0.001). Overall, measured factors collectively explained only 8.2% of individual temperature variation. Despite this, unexplained temperature variation was a significant predictor of subsequent mortality: controlling for all measured factors, an increase of 0.149°C (1 SD of individual temperature in the data) was linked to 8.4% higher one year mortality (P=0.014).

Conclusions Individuals’ baseline temperatures showed meaningful variation that was not due solely to measurement error or environmental factors. Baseline temperatures correlated with demographics, comorbid conditions, and physiology, but these factors explained only a small part of individual temperature variation. Unexplained variation in baseline temperature, however, strongly predicted mortality.

Introduction

Have you ever felt cold, or warm, in a room where everyone else felt comfortable? This common experience of room temperature holds some useful lessons for body temperature and how we measure it. To know how warm or cold someone feels, we would not look at room temperature alone. Married couples sitting next to each other in the same room routinely disagree about whether to turn the heat up or down. Individuals have different baseline propensities to feel hot or cold—at any given absolute room temperature.

Yet doctors have forgotten this lesson when measuring core body temperature. We would not use absolute room temperature to infer perceived warmth, but we do use absolute body temperature to infer fever.

In medical school, students are taught that humans have a core body temperature as a species, not as individuals. When clinicians take patients’ temperatures in the clinic or hospital, they compare the measurements with the population average. Deviations from this single number help in the diagnosis of acute pathological states, from infections to thyroid disorders.

Why should someone’s physiological state be compared with an absolute standard temperature? Body temperature deviations, after all, can have their roots in individual physiology, such as age12 and circadian,3 metabolic,4 and ovulatory cycles.5 These factors vary dramatically across individuals, raising the possibility that individuals have baseline temperatures that differ systematically from the population average. The same temperature that is normal for one person might be dangerously high for another.

Historically, the use of a population average was partly a data problem: estimating each individual’s baseline body temperature would have been challenging. Any credible effort would need to tease apart individual baseline temperatures from other sources of variation in measured temperatures: ambient conditions when temperature is taken,6 differences in technique,78 and random error due to intrinsic variability in the measurement process itself.89 Distinguishing signal from noise under these conditions would require large amounts of data: both large numbers of patients and multiple temperature measurements for each patient.

The modern electronic health record stores rich physiological data, including temperature, on large numbers of patients, and many have noted its potential to generate new clinical knowledge.1011 We used these data to deal with random errors in individual temperature measurements, by applying statistical techniques that find signal and minimize noise across multiple temperature measurements, taken in a variety of outpatient settings and for a large number of patients. To deal with the many factors known to affect temperature measurements, we used rich contextual data to control for conditions at the time of measurement: ambient temperature, humidity, time of day, date, and body site of measurement. This allowed us to estimate stable baseline temperatures for every patient in our large and diverse sample, as opposed to population averages. We then explored links between individual temperature and a range of other variables: demographics; physiological measures, including vital signs; and mortality.

Individual differences in body temperature might be meaningful in two ways. Firstly, their very existence could open new insights into human physiology and links between body temperature, metabolism, and longevity.121314 Secondly, individualized “precision temperatures” could allow doctors to tailor testing and treatment decisions to patients’ physiology. At a minimum, they might change one familiar part of the doctor-patient conversation: the eye-rolling that sometimes ensues when patients report that a given temperature, although normal, is “high for me.”

Methods

Until recently, large scale databases of temperature measurements were scarce: with some notable exceptions,1516 most studies of body temperature in humans date from 1950 or before17 and have important limitations. Temperatures were measured at varying times of day and in different seasons, using unspecified instruments.17 Details regarding enrollment procedures were often absent, but—particularly in older studies—there was little apparent concern about achieving a diverse sample of patients for race, sex, and comorbid conditions.17 Lack of longitudinal data meant that temperature measurements could not be correlated to subsequent outcomes.151617 Perhaps most importantly, albeit with some exceptions,1518 sample sizes were small—often in the 10s to 100s of patients.1617

Sample

We used a dataset of electronic health records from a large US based academic hospital. The dataset was assembled in two steps: firstly, we identified patients to be included in the cohort—those with one or more visits to the hospital’s emergency and outpatient departments during 2010-12; and, secondly, we obtained data on each of these patients’ outpatient visits from 2009-14, consisting of visits to clinics during which a temperature was measured (Welch Allyn SureTemp Plus digital thermometers were present in most examination rooms).

Since we were interested in individual estimates of normal body temperature in adults, we focused on routine visits during which temperature was expected to be within normal limits. We did not include emergency department visits, during which acute physiological disturbances may affect measured temperatures. We excluded patients aged less than 18 years; visits on weekends or outside business hours (7 am-6 pm), to avoid selecting patients seen for non-routine problems; visits with implausible recorded temperatures (<32°C or >45°C; 0.04% of the total sample); and visits for infections, to remove the effect of infection related disturbances in body temperature. We thus excluded visits with ICD-9 (international classification of diseases, ninth revision) codes for infectious diseases, and visits with antibiotics prescribed in the week after the visit (see supplement eTable 1 for cohort construction; a schematic version of these filters appears below). Our final sample accounted for roughly 3-5% of the hospital’s annual load of outpatient visits.
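The exclusion logic amounts to a sequence of row filters. The sketch below, in R, is purely illustrative: the data frame `visits` and all of its column names are hypothetical stand-ins, not the study’s actual variables.

```r
# Illustrative cohort filter; column names are assumed, not the authors' own
library(dplyr)

cohort <- visits %>%
  filter(
    age >= 18,                       # adults only
    !weekend, hour >= 7, hour <= 18, # weekday business hours (7 am-6 pm)
    temp_c >= 32, temp_c <= 45,      # drop implausible temperatures
    !icd9_infection,                 # no infection diagnosis codes
    !abx_within_7d                   # no antibiotics in the following week
  )
```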

Statistical analysis

Estimating individual baseline temperatures

In the included sample of routine outpatient visits, we used ordinary least squares regression to model measured visit level temperature as a function of external conditions (ie, ambient temperature and dew point, drawn from National Oceanic and Atmospheric Administration data19), body site (eg, oral, axillary), and time of measurement (hour, day, month, and year).

To estimate baseline temperatures, we modeled individual patients’ deviations from the population mean, controlling for the selected factors. We first used fixed effects to verify that individual temperature effects were approximately normally distributed (see supplement eFigure 1), then re-estimated the model using random effects. We chose random effects because even in the presence of measurement error they are consistent and efficient estimators of variance.2021 These steps allowed us to estimate individual temperature effects as well as features of the population level distribution, such as the variance of individual effects. Using individual level effects necessarily restricted the sample to patients with at least two measured temperatures over the study period. We omitted time invariant attributes of patients (eg, sex, race) since these cannot be estimated alongside individual level effects. Standard errors were clustered at patient level.
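In R, a random intercept mixed model captures the same idea: each patient’s intercept is their individual temperature effect, estimated after adjusting for measurement conditions. This is a minimal sketch assuming a hypothetical `visits` data frame with made-up column names; it is an analogous model, not the authors’ code.

```r
# Minimal random effects sketch; `visits` and its columns are assumed
library(lme4)

m <- lmer(
  temp_c ~ ambient_c + dew_point_c + site +        # external conditions, body site
    factor(hour) + factor(month) + factor(year) +  # time of measurement
    (1 | patient_id),                              # random intercept = individual baseline
  data = visits
)

# Individual deviations from the population mean temperature
baseline <- ranef(m)$patient_id[["(Intercept)"]]
sd(baseline)  # compare with the reported SD of 0.15°C
```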

We correlated the resulting individual temperature random effects with other variables of interest in the electronic health record, in three groups: demographics, including age, sex, and race; comorbidities, defined using ICD-9 codes over the year before visits, following usual practice22; and physiological measurements (pulse, systolic and diastolic blood pressure, body mass index), using average values over the period spanning from first to last included visit date for each patient—this mirrored the period over which individual level effects were estimated in our dataset. Since we were exploring a range of correlations, we adjusted P values for multiple hypothesis testing using the Holm sequential procedure.23 Alternative methods of adjustment2425 are provided in supplement eTable 3 (results were substantively unchanged).
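Holm’s sequential procedure is available directly in base R via p.adjust. A toy example with illustrative P values (not values from the study):

```r
# Holm sequential adjustment for multiple hypothesis testing
p_raw <- c(0.0004, 0.003, 0.012, 0.049, 0.170)  # illustrative only
p.adjust(p_raw, method = "holm")
# Compare with the more conservative Bonferroni correction
p.adjust(p_raw, method = "bonferroni")
```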

Relation between individual temperature effects and mortality

Finally, we explored the relation between individual temperature and mortality, using linkage to state social security data. To accurately calculate one year mortality, we addressed a source of bias in these longitudinal cohort data: patients were sampled and included in the cohort because of encounters over 2010-12, up until which they were necessarily alive. Likewise, patients were followed until their last temperature measurement, as late as 2014, at which time they were also alive by construction. Since our primary interest was the correlation between mortality and routine temperature measurements, we calculated mortality in the year after each patient’s last temperature measurement, excluding those whose 2010-12 sampling event occurred after the last temperature measurement (16% of the sample). We then estimated the relation between one year mortality and individual temperature effects by logistic regression, controlling for demographics, comorbidities, and physiological measures. As the hospital is a referral center serving both the local community and patients from further away who require specialized care for complex, serious illnesses with high mortality, we also controlled for log distance between patients’ home zip code and hospital zip code.
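The mortality model is an ordinary logistic regression with the estimated individual temperature effect as the exposure of interest. A minimal sketch, assuming a hypothetical `patients` data frame with one row per patient (comorbidity controls are collapsed to a count here for brevity; the study adjusted for them individually):

```r
# Logistic regression of one year mortality on individual temperature effect;
# the data frame and column names are assumed, not the authors' code
m_mort <- glm(
  died_1y ~ temp_effect + age + sex + race +
    pulse + sbp + dbp + bmi + n_comorbidities +
    log(dist_miles + 1),             # distance from home zip to hospital zip
  family = binomial(),
  data = patients
)
summary(m_mort)
```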

Statistical packages

All analyses were performed in Stata (version 14.0; StataCorp) and R (version 3.2.3; R Foundation for Statistical Computing).

Patient involvement

No patients were involved in setting the research question or the outcome measures, nor were they involved in developing plans for design or implementation of the study. No patients were asked to advise on interpretation or writing up of results. There are no plans to disseminate the results of the research to study participants or the relevant patient community.

Results

Of 374 306 outpatient visits with temperature measurements, we excluded 130 800 (largely because of infection diagnoses or antibiotic prescriptions; see supplement eTable 1), leaving 243 506 visits (18% of all outpatient visits meeting the other inclusion criteria). Table 1 shows demographics, physiological measurements, comorbidities, and one year mortality. The mean age at time of visit was 52.9 years, 64% of patients were women, and 41% were of non-white race (including 16% black or African-American patients and 17% Hispanic patients). The most common primary diagnoses at included outpatient visits were osteoarthritis (5.9%), back pain (4.9%), and routine evaluation and examination (4.5%). The clinics most commonly visited were orthopedics (10%) and internal medicine (hospital based clinic: 7.8%; community clinic: 5.7%). One year mortality in this sample of patients receiving care at a tertiary referral hospital was 6.2%, considerably higher than in the general population (<1% in a similar age range).

The mean measured temperature was 36.6°C (95% confidence interval 36.6°C to 36.6°C). Each patient had a median 5 (interquartile range 3-9) temperature measurements over a median of 2.1 (0.8-3.8) years, and 19% had more than 10 measurements.

Figure 1 shows the relation of temporal and environmental factors to measured body temperature, estimated using random effects regression. Temperature measurements were largely oral (88.2% oral, 3.5% temporal, 3.0% tympanic, 0.1% axillary, 5.2% not recorded), and temporal, tympanic, and axillary temperatures were significantly lower than oral temperatures (by –0.03°C, –0.06°C, and –0.27°C, respectively; all P<0.001). We observed diurnal variation in temperature by hour, with a peak at 4 pm (0.03°C v 12 pm, P<0.001). Higher ambient temperature and dew point were both linked to higher body temperature. Month effects partially offset these associations—that is, at the same ambient temperature and dew point, summer months were linked to lower body temperature and winter months to higher body temperature. On a median temperature day in our dataset (12.2°C), body temperature was on average 0.08°C lower in July than in February, presumably reflecting seasonal engagement of compensatory physiological mechanisms (eg, evaporative cooling, vasoconstriction). Supplement eTable 2 presents the full coefficients from the model.

The standard deviation of the random effects from this model (which we denote individual baseline temperatures) was 0.15°C, with a 95% range (ie, 2.5th-97.5th centile range) of 0.60°C (–0.33°C to 0.27°C). In comparison, the standard deviation of raw measured body temperature was 0.42°C, with a 95% range of 1.67°C (after subtracting the mean, for comparability: –0.93°C to 0.74°C).

Table 2 shows demographics, vital signs, and comorbidities by fifth of individual baseline temperature, along with coefficients from regression of baseline temperature on each variable. Baseline temperatures declined with age (–0.02°C every decade, P<0.001). African-American women had the highest temperature (0.052°C higher than white men, P<0.001). Baseline temperature also varied significantly as a function of comorbid conditions. Cancer was linked to higher temperature (0.02°C, P<0.001), whereas hypothyroidism was linked to lower temperature (–0.01°C, P=0.01); the relation was linear for mean thyroid stimulating hormone level over the span of the data (see supplement eFigure 2A). The total number of comorbidities was not statistically significantly linked to baseline temperature over and above the individual comorbidities included.

Table 2 also shows the relation between individual baseline temperatures and average physiological measurements over the study period. Controlling for demographic factors and comorbidities, higher temperatures were linked to higher body mass index (0.002°C per kg/m², P<0.001). Higher temperature was also weakly linked to higher pulse (4.0×10−5°C per beat per minute, P=0.17) and to higher diastolic blood pressure (1.2×10−4°C per mm Hg, P=0.01). Figure 2 shows mean vital signs over the study period and their relation to baseline temperatures by sex.

Despite the many statistically significant relations identified between individual baseline temperatures and demographics, comorbidities, and physiological measures, these factors collectively accounted for only 8.2% of variation (adjusted R2) in temperature. Residual variation was greater for women than for men (residual sum of squares: 283 v 170), raising the possibility that at least part of the residual difference was driven by hormonal cycles (not measured). However, most variation in baseline temperatures remained unexplained by commonly measured health variables in both sexes.

Table 3 shows the relation between individual baseline temperature and mortality. Controlling for age, sex, race, vital signs, and comorbidities, a 1°C increase in baseline temperature translated into 3.5 percentage points higher one year mortality (P=0.014). For example, a 1 SD increase in temperature (0.15°C) would translate into a 0.52% absolute increase in one year mortality. Compared with a mean mortality of 6.2% in our sample, this represents an 8.4% relative increase in mortality risk.

Table 1

Demographics, physiological measurements, comorbidities, and one year mortality for full sample. Values are numbers (percentages) unless stated otherwise

Summary statistics on temperature

  • Mean 36.6°C (95% range 35.7-37.3°C; 99% range 35.3-37.7°C)

  • Measurements at different sites (versus oral): temporal: –0.03°C; tympanic: –0.06°C; axillary: –0.26°C

Fig 1

Relation of temporal and environmental factors to measured body temperature. Coefficients estimated by random effects regression are shown for ambient temperature, dew point, hour, and month, compared with reference categories: the median ambient temperature decile (12.2°C), the median dew point decile (4.7°C), 12 pm, and April, respectively

Table 2

Coefficients from regression of individual temperature effect on demographics, comorbidities, and physiological measurements in 20 718 participants

Fig 2

Physiological measurements and relation to temperature effects, by sex. Dots represent centiles of individual temperature effect; vertical bars represent 95% confidence intervals

Table 3

Regression of one year mortality on individual temperature, controlling for demographics, comorbidities, and physiological measures in 15 821 participants

Discussion

To our knowledge, this is the first attempt to demonstrate meaningful variation in individual body temperature, separate from random measurement error and the influence of external factors, and to correlate it with a range of patient factors and outcomes in a diverse patient population.

These results illustrate a way in which “big data” can serve to generate new medical knowledge. Here we used these data not to answer a causal question (eg, to discover side effects of drugs) or to predict an outcome (eg, to create an early warning system for a given condition), but to discover previously unsuspected and potentially important patterns in human physiology. It is unlikely that any one practitioner could have noticed these patterns, despite their fairly large association with mortality. Rather, large datasets and statistical methods are needed to bring out these patterns from the empirical record.

Our most noteworthy result was the connection between temperature and mortality. This fits with a larger body of research showing that reduction in body temperature increases longevity and delays aging in a range of (ectothermic) experimental models, including Drosophila and Caenorhabditis elegans,26 as well as transgenic (homeothermic) mice engineered to have lower temperatures.12 This observation raises a set of questions that may be worth answering through further research. What is the biological basis for an individual’s baseline temperature? And how might these factors teach us more about subtle but important physiological patterns that might lead to good or poor outcomes? Two sets of results from our analysis were suggestive along these lines.

Firstly, individual temperatures were highly correlated with measured patient characteristics, particularly those related to metabolism and obesity. These differences may have their roots in obvious thermodynamic factors: bodies with larger mass dissipate heat less rapidly, leading to higher temperatures. Fat (which is correlated with mass) could also act as an insulator, independent of mass, leading to higher heat retention in people with more fat. Other explanations, however, are also plausible. It is well known that caloric restriction through fasting leads to down-regulation of temperature, presumably to conserve energy27; indeed, reduced temperature is now considered a biomarker for caloric restriction.1314 Although there are few trials of caloric supplementation, the existing literature suggests that individuals vary widely in their ability to dissipate excess energy from overfeeding.28 Given the strong links between resting metabolic rate and body temperature,4 it is possible that higher resting body temperature could be a response serving to dissipate excess energy from caloric intake. We found that a raised temperature correlated with both body mass index and activation of the sympathetic nervous system (ie, increased pulse rate and diastolic blood pressure). Thus temperature could be another variable in the cluster of traits linking obesity and activation of the sympathetic nervous system.293031 Interestingly, while our temperature effects were estimated over longer periods, some small studies have found that hypothermia is linked to increased mortality for acute events (eg, in hip fractures),32 potentially implying a different physiology in short term versus long term temperature regulation.

Secondly, we found a large correlation between individual baseline temperature and mortality that was not explained by measured patient characteristics. What factors might this temperature variation be picking up on that confer an 8.4% mortality disadvantage? It is tempting to speculate. We did identify a correlation between diagnosed cancers and temperature. This has been noted in the literature previously, either because of the direct metabolic demands of the cancer itself,33 or because of the body’s immune response.34 Subclinical infections or rheumatological diseases could exert a similar effect. If higher temperature reflected undiagnosed cancers or other illnesses, this would generate the correlation we observed between the unexplained component of temperature variation and subsequent mortality from these same illnesses. Another potential explanation is that higher temperature reflected a pro-inflammatory milieu; however, we found no clear connection between individual temperature and the inflammatory marker C reactive protein (see supplement eFigure 2b). Ultimately, further study is required. We could imagine studies that estimated individual temperature effects using similar methods, then subjected those with higher temperatures to additional diagnostic studies to identify undiagnosed illnesses.

The finding that measured temperature was, other things being equal, lower in hot months and higher in cold months may reflect engagement of well known compensatory adaptations (eg, plasma volume, evaporative cooling, vasoconstriction, shivering) to temperatures experienced over longer periods, as opposed to the short term direct effects of higher or lower temperature.35 Related recent work has shown, for example, that drinking warm beverages on warm days results in heat loss from increased sweat output.36

Finally, our estimate of population mean temperature differed from other studies—for example, it was lower than in a sample of primarily young, healthy participants,16 and higher than in a population of older adults seen in healthcare settings.15 One potential advantage of our approach is that it adjusted for environmental and temporal factors at the time of measurement, which was not possible in other studies. Although this might enhance the generalizability of our estimate, it would be difficult for any one study of core temperature to ensure that estimates are valid for an entire species. However, it may help with clinical decision making for populations seeking healthcare in similar settings.

Limitations of this study

Our study had several limitations. We considered patients at one academic center and measured temperature using similar equipment, which can have correlated errors in measurement. Our methods were designed to estimate robust deviations in individual temperature separately from measurement error but might not generalize to other equipment, although we would guess that stringent Food and Drug Administration standards for clinical electronic thermometers offer some guarantees that the relative magnitude of effects should be similar. Our data came from one climate zone. Although we controlled for the substantial variation in environmental conditions within this zone, temperature and compensatory mechanisms may vary across climatic zones. We excluded patients with infection and those prescribed antibiotics, but it is possible that some patients had infections that were undiagnosed and untreated by their physicians. However, since this would have to be a consistent finding over multiple visits for the same patients, and since most infections are by contrast transient, this is unlikely to have affected our estimated correlation between individual baseline temperature and mortality. We sampled patients based on visits to a hospital emergency department and clinics. This resulted in an ethnically and medically diverse sample over multiple years of data, but it also selected a sicker set of patients, with a higher comorbidity burden and mortality than the general population. An advantage is that our sample was representative of the population of patients using healthcare today.

Conclusions

We found that individuals have body temperature baselines that correlate with a range of demographic factors, comorbid conditions, and physiological measurements. Since the unexplained variation in temperature is large and correlates with mortality, it may be an interesting and important area for further study.

What is already known on this topic

  • A long tradition of research on human core body temperature, starting in the 19th century, has focused on establishing average temperature in a population

  • Temperature is known to be influenced by many factors that differ widely across patients (eg, age and circadian, metabolic, and ovulatory cycles), raising the possibility that individual baseline body temperatures might vary systematically

What this study adds

  • Individual baseline temperatures are correlated with specific demographic factors, with older people the coldest and African-American women the hottest

  • Particular medical conditions were also statistically significantly linked to lower or higher temperature, as were physiological measurements, but these factors explained only 8.2% of variation in individual baseline temperatures

  • The remaining unexplained variation was a large and significant predictor of subsequent mortality, with 8.4% higher mortality for a 1 SD increase in temperature

Acknowledgments

We thank Philip Mackowiak for comments on an early draft of this manuscript.

Footnotes

  • Contributors: ZO and SM designed the study and wrote the manuscript. ZO obtained funding. JKS and ZO analyzed the data. All authors had full access to all of the data (including statistical reports and tables) in the study and can take responsibility for the integrity of the data and the accuracy of the data analysis. ZO acts as guarantor.

  • Funding: This study was supported by a grant from the office of the director of the National Institutes of Health (DP5 OD012161) to ZO. This research was independent from funders. The funder had no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the article for publication.

  • Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: no financial relationships with any organizations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.

  • Ethical approval: This study was approved by the institutional review boards of Partners HealthCare, the parent organization of the hospital where the research was conducted.

  • Data sharing: No additional data available.

  • Transparency: The lead author (ZO) affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned have been explained.

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Global temperature evolution: recent trends and some pitfalls

Global-mean surface temperature (GMST) is the most important indicator of global climate change, because (i) it is directly related to the planetary energy balance (Fourier 1827) and increases quasi-linearly with cumulative greenhouse gas emissions (IPCC—Intergovernmental Panel on Climate Change 2013), and (ii) GMST is directly related to most climate impacts and risks (Arnell et al 2014). Hence there is a large interest in the time evolution of GMST, both in the scientific community and the general public (see e.g. Boykoff 2014, Lewandowsky et al 2015, Mooney 2013). Two facets of this high interest are frequent discussions about (i) whether the rise in GMST has accelerated or slowed down, and (ii) how well it agrees with various model projections. These are two separate issues; this paper deals with the former only, i.e. with analysis of possible trend changes in the observational data. Our goal is to provide a current analysis of GMST trends in the light of the recent series of three record-breaking years in a row in most data sets (never seen before in the instrumental record), and to point out two important pitfalls in analysing GMST trends.

While many scientific publications of the past years have discussed an alleged 'hiatus' or 'slowdown' and its possible causes, few have provided any statistical assessment of whether a significant trend change actually occurred. While it is clear and undisputed that the global temperature data show short periods of greater and smaller warming trends or even short periods of cooling, the key question is: is this just due to the ever-present noise, i.e. short-term variability in temperature? Or does it signify a change in behavior, e.g. in the underlying warming trend? In other words, are periods of particularly high or low warming trend significant, in that any of them is unexpected and requires further explanation than just the usual noise in the data? While it is a semantic question what the meaning of a 'hiatus' is, the question of significance is a well-defined scientific question.

Foster and Abraham (2015) applied 'a barrage of statistical tests' to the NASA GISTEMP data for 1970–2013 'to search for evidence of any significant departure from a linear increase at constant rate since 1970.' In every case, the analysis not only failed to establish a trend change with statistical significance, it failed by a wide margin.

Rajaratnam et al (2015) used four GMST data sets up to 2014 to perform statistical tests of four different hypotheses, namely 'whether the recent period has demonstrated (i) a hiatus in the trend in global temperatures, (ii) a temperature trend that is statistically distinct from trends prior to the hiatus period, (iii) a 'stalling' of the global mean temperature, and (iv) a change in the distribution of the year-to-year temperature increases.' They 'conclude that the rate of warming over the past ≈ 15 yr is not appreciably different from the rate of warming prior to the recent period.' They further find 'overwhelming evidence' against a 'pause' in warming (i.e. no trend) over the past ≈ 15 yr.

Cahill et al (2015) likewise analysed four GMST data sets up to 2014 using change point analysis, an established statistical technique to identify significant changes in trends in a data set. They found 'no evidence of any detectable change in the global warming trend since ~1970.'

Finally, Lewandowsky et al (2016) have investigated whether the most recent fluctuation in 15 yr trend value (as defined by a z-score) is unusual, by considering all possible 15 yr trends in GMST between 1970 and 2014. They find at least three instances of similar or greater fluctuations in 15 yr trends, the largest of all being the exceptionally rapid warming trend during 1992–2006. Incidentally, this large trend was noted by Rahmstorf et al (2007), who proposed 'intrinsic variability within the climate system' as the first candidate reason. Lewandowsky et al conclude that the pause period (comprising the 15 yr trends 1998–2012, 1999–2013 and 2000–2014) 'is not unusual or extraordinary relative to other fluctuations and it does not stand out in any meaningful statistical sense.'

In contrast to these studies, Fyfe et al (2016) claimed that 'the surface warming from 2001 to 2014 is significantly smaller than the baseline warming rate,' where that baseline is 1972–2000. This claim was not backed up by statistical analysis, nor was any of the previous analyses discussed above cited.

In the following we will revisit the issue of trend changes in GMST with data up to 2016. We highlight two problems with some previous trend analyses: the multiple testing problem and the problem of using broken trends.

One sense of the word 'trend' is the underlying value of a time series, the signal value as opposed to the noise. In the additive noise model the time series values are the sum of signal plus noise, i.e.

$$x_j = f(t_j) + \varepsilon_j,$$

where $x_j$ are the data values, $f(t_j)$ the signal values, and $\varepsilon_j$ the noise values. We adopt the common convention that the signal value is the expected value of the data at a particular time,

$$f(t_j) = \mathrm{E}[x_j],$$

which imposes the condition that the noise is zero-mean, i.e.

$$\mathrm{E}[\varepsilon_j] = 0.$$

Another sense of the word trend, the one which we will adopt, is the rate of change of the underlying signal, i.e.

$$\mathrm{trend}(t) = \frac{\mathrm{d}f}{\mathrm{d}t}.$$

Hence in the context of GMST, trend refers to the rate at which global temperature is changing.

Only if the rate of change is constant will the signal follow a straight line. Although this is rarely the case with complete precision, it very often happens that the noise level is high enough to make it impossible to establish trend change except over very long time spans. In such cases the trend is usually estimated by fitting the very same straight-line model. Particularly for global temperature time series, 'trend' usually refers to the estimated rate of change of the underlying temperature signal from linear regression. Given the signal-to-noise ratio of global temperature on decadal time scales, this is the most practical determination of the trend, and is the one most often cited in the literature.

Establishing acceleration or deceleration of global temperature means detecting and confirming a change in the trend. Because the data combine trend and noise, noise will always produce apparent trend changes. Distinguishing those which are genuine from those induced by noise is the purpose of statistical significance testing.

Statistical significance is a concept that is widely used but also critically discussed, mainly regarding the ambiguity of the threshold value (e.g. 90% or 95%) and the choice of the null hypothesis (see e.g. Nicholls 2001). The key to its usefulness is to clearly define what is meant. For our purpose, a significant slowdown or acceleration in global warming is a behavior of global-mean temperature which is highly unlikely to occur under the null hypothesis of a constant warming trend plus short-term random variations as observed in the past (where 'past' refers to a suitably defined baseline period). In other words, a significant change in warming trend refers to a temperature evolution which is unlikely to be a result of a simple continuation of the warming trend and random noise found in the baseline period. Our null hypothesis is thus that long-term trend and short-term noise continue unchanged. Any claim of a significant slowdown or acceleration would require data that are highly unlikely (e.g. 5% or 10% likelihood depending on the desired confidence level) to be consistent with this null hypothesis.

We consider five prominent global temperature data sets: NASA GISTEMP (Hansen et al 2010, GISTEMP Team 2016), NOAA (Smith and Reynolds 2005, Smith et al 2008), HadCRUT4 (Morice et al 2012), the revision of HadCRUT4 by Cowtan and Way (2014), and the Berkeley Earth Surface Temperature (Rohde et al 2013). All are combined land+ocean series. Some of these (GISTEMP, Cowtan and Way, Berkeley Earth) aspire to provide a full global mean by interpolating into some data-sparse regions of the globe, most notably the Arctic. The others simply ignore data gaps and average only over the data-covered part of the world, which is systematically biased relative to the true global mean if the data-sparse regions deviate from global mean warming (which is well documented for the Arctic, which recently has warmed two to three times faster than the globe).

Before we proceed to our change point analysis of global temperature trends up to the present, we discuss two important but underappreciated pitfalls waiting to trap those testing for a change in the trend of global temperature time series. These are the multiple testing problem, and failure to account for the additional degree of freedom when one uses a model with a jump discontinuity (a broken trend). We discuss each in turn.

It is straightforward to test by Monte Carlo simulations how likely it is that a linear warming trend as low as during some specified time interval (e.g. the interval 2001–2014 defined as 'slowdown period' by Fyfe et al (2016)) would occur under the null hypothesis, i.e. assuming a continuation of the same linear trend and variance found in a previous baseline period. We use a baseline period starting in 1972, as ~1972 marks the beginning of the most recent approximately linear phase of global warming as identified objectively in the change point analysis of Cahill et al (2015). We first determine the linear trend and standard deviation of global temperature during the baseline period (see table 1). Subsequently we perform Monte Carlo simulations by generating 10 000 realisations of time series consisting of this same linear trend plus white (Gaussian) noise with the standard deviation found in the baseline period. Note that the choice of uncorrelated white noise is (a) justified since the observed variability does appear to be close to white, and (b) conservative in the sense that any autocorrelation in the noise would make it more likely to obtain trends by chance that deviate strongly from the baseline trend, so that using auto-correlated noise would make it harder to reject the null hypothesis.
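This test is easy to reproduce. The R sketch below implements the simulation as described, using the HadCRUT4 values from table 1; it is an illustrative reimplementation under those stated assumptions, not the authors' code.

```r
# Monte Carlo test: how often does a fixed trend + white noise produce at
# least one 14 yr window with a trend as low as the observed 2001-2014 trend?
set.seed(1)
n_years    <- 43       # 1972-2014
base_trend <- 0.0172   # °C per year, baseline 1972-2000 (HadCRUT4, table 1)
noise_sd   <- 0.103    # °C, standard deviation in the baseline period
obs_slow   <- 0.0030   # °C per year, observed 2001-2014 trend
win        <- 14       # window length in years

hit <- replicate(10000, {
  y <- base_trend * seq_len(n_years) + rnorm(n_years, sd = noise_sd)
  slopes <- sapply(seq_len(n_years - win + 1), function(s) {
    idx <- s:(s + win - 1)
    coef(lm(y[idx] ~ idx))[2]        # OLS trend of this window
  })
  any(slopes <= obs_slow)            # at least one window this slow?
})
mean(hit)   # ≈ 0.31, matching the 31% likelihood reported in table 1
```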

Table 1. Monte Carlo Results. Shown are the standard deviations and trends during the baseline period, the trends during the 'slowdown' periods, and the likelihood that a trend at least as small as the latter would be observed by chance if the baseline trend and standard deviation had continued unchanged.

Baseline period 1972–2000; 'slowdown' period 2001–2014:

Data set    Std dev     Baseline trend     Slowdown trend     Likelihood
HadCRUT4    0.103 °C    0.0172 °C yr−1     0.0030 °C yr−1     31%
GISTEMP     0.103 °C    0.0173 °C yr−1     0.0074 °C yr−1     73%

Baseline period 1972–1999; 'slowdown' period 2000–2012:

Data set    Std dev     Baseline trend     Slowdown trend     Likelihood
HadCRUT4    0.104 °C    0.0177 °C yr−1     0.0056 °C yr−1     65%
GISTEMP     0.104 °C    0.0179 °C yr−1     0.0103 °C yr−1     96%

Table 1 lists the percentage of Monte Carlo simulations that show at least one interval of the same length with a trend at least as low as that found during two alleged slowdown periods in the observational data, i.e. 2000–2012 and 2001–2014. We show these results for two data sets: GISTEMP as it is typical for data sets including (partly interpolated) coverage of the whole globe, and HadCRUT4 as the extreme case of a data set with a large gap of missing data in the Arctic (the region of most rapid recent warming), which leads to particularly low recent warming trends (Cowtan and Way 2014).

For 'slowdown period' 2001–2014 we tested how many of 10 000 Monte Carlo realizations of 43 yr of data (1972–2014) show at least one 14 yr interval with a trend as low or lower than 2001–2014. For 'slowdown period' 2000–2012 we tested how many of 10 000 Monte Carlo realizations of 41 yr of data (1972–2012) show at least one 13 yr interval with a trend as low or lower than 2000–2012.

The results show that even for the HadCRUT4 data, the chances of getting such a low trend as observed during 2001–2014 are 31%, so this 'slowdown' is far from significant by any standard. For the GISTEMP data, the chances of finding a period with a trend as low as observed during 2001–2014 are 73%, so there is nothing remotely remarkable about this. If one uses the slightly different time interval 2000–2012 (suggested by some for a possibly significant slowdown) and GISTEMP, then not finding a trend as low as observed would itself have been a statistically significant event at the 95% level, one that would have falsified the null hypothesis of an ongoing linear warming trend plus noise.

Note that conservative assumptions have been made, i.e. white noise was used and the number '14' (looking at 14 yr trends) was taken as a given, although 14 was also chosen after the fact, because a particularly low 14 yr trend appeared. So if investigated more elaborately and rigorously, the likelihood of just by chance getting a slow trend period as observed would be greater still.

It is important to understand that a simple comparison of the trend values of the 'baseline' and the 'slowdown' periods, finding that their uncertainty ranges do not overlap (Hawkins 2016, Santer et al 2000) does not provide evidence for a significant slowdown. That would be the case only if the 'slowdown period' were one randomly drawn sample—for one random period it would indeed be unlikely to encounter such a low trend just by chance. For the HadCRUT data the chance would be < 2%, for GISTEMP < 10%. However, the period 2001–2014 was not randomly drawn: it was specially selected because of its low trend from many possible time intervals. This is a well-known and not uncommon statistical mistake: the failure to account in a significance analysis for the fact that a particular number is not one randomly drawn sample but has been specifically selected because of its value. Consider making a test to check whether two dice are loaded towards producing low numbers, rolling those dice once, and they both roll a one. This indeed is a randomly drawn sample and it would provide some support for the suspicion that the dice are loaded, given that it is an event that has less than 5% probability of occurring just by chance with unloaded dice. But if you roll those two dice many times until you finally find one occurrence of two ones, then this event has no significance whatsoever. There is nothing rare about finding one such event in many trials with unloaded dice. This pitfall is known as the multiple comparison or multiple testing problem (Wikipedia 2016). A common approach to correct for multiple testing is the Bonferroni correction (Dunn 1961).

In summary, there is nothing significant or unusual about the interval of lesser warming trend that started around the turn of the century.

The discussion has so far used broken (i.e. discontinuous) trend lines. This is a further problem of many past analyses, also tending to enhance the (in this case false) impression of a significant slowdown.

Figure 1(a) shows a model with broken trends applied to HadCRUT4 data. A naive statistical analysis suggests that the change is real because the two linear segments have significantly different slopes.

However, the underlying model includes more degrees of freedom than just a change of slope: it includes a change of intercept as well, which is not accounted for in the naive statistical comparison. The proper approach is to account for both added degrees of freedom, as is done by the Chow test (Chow 1960). For HadCRUT4 data it returns a p-value of 0.0635, which fails statistical significance at 95% confidence, while for NOAA data the p-value is 0.295, nowhere near significance. Note that these p-values have not yet included allowance for the multiple testing problem, so that either of the two pitfalls alone invalidates claims of a significant slowdown.
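In R, the Chow test is available through the strucchange package. A minimal sketch, assuming annual GMST means in a data frame `d` with columns year and temp (assumed names, not from the source):

```r
# Chow test for a structural break after 2000; the test accounts for the
# extra degrees of freedom (both slope and intercept may change)
library(strucchange)
sctest(temp ~ year, data = d, type = "Chow",
       point = which(d$year == 2000))   # index of last pre-break observation
```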

There are also grounds to suspect that the 'broken trend' model is unphysical. If instead one allows for a slope change at 2001 but requires the model to be continuous, it yields the models shown in figures 1(b) through (f). None of the continuous trends even gives the visual impression of meaningful trend change, and more important, when tested for significance none achieves even 80% confidence for a trend change in 2001.

Finally, we present a change point analysis of global temperature including the latest values for 2015 and 2016. This analysis models a time series as piecewise linear sections and objectively estimates where and when changes in data trends occur. The model forces each line segment to connect, avoiding unphysical discontinuities. If the data do not support a trend change then none will be detected. The analysis of Cahill et al (2015) has been extended to include data from 2015 and 2016 in order to investigate whether or not an acceleration has recently taken place, a question increasingly asked by journalists given the third record-hot year in a row; the estimated trend values are shown in figure 2.
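For readers who want to experiment, a continuous piecewise-linear fit can be obtained with the segmented package in R. This is a simple frequentist stand-in for the Bayesian change point model of Cahill et al (2015), not the analysis used here; the data frame `d` and its columns (year, temp) are assumed names.

```r
# Continuous piecewise-linear regression with three change points
library(segmented)
fit <- segmented(lm(temp ~ year, data = d), seg.Z = ~ year, npsi = 3)
fit$psi      # estimated change-point years (with standard errors)
slope(fit)   # trend (°C per year) within each segment
```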

No recent (post-1980) change-point was found in any of the five data sets, with three change points suitably capturing the climate signal, suggesting that the recent hot years are a continuation of the existing trend, augmented by noise. The 2016 value seems visually extreme, but does not yet provide statistical evidence for a trend change. Of course, future temperature development might provide evidence that an acceleration indeed happened around 2014, but the data up until now do not. Moreover, the extreme heat of 2016 has almost certainly been enhanced by an El Niño event in the tropical Pacific which is now over, so that we expect a lower temperature again in 2017. Table 2 lists the mean trend for each time span, together with the likely values at a variety of percentiles, as well as the changepoint times with their percentiles, for all five data sets.

Table 2. Trends and change-point times from change point analysis of five global temperature data sets. Rates are in °C yr−1; change points (CP) are calendar years. Columns give the mean and the 2.5th, 25th, 50th, 75th, and 97.5th centiles.

GISTEMP
         Mean      2.5%      25%       50%       75%       97.5%
Rate 1   −0.005    −0.008    −0.006    −0.005    −0.003    −0.000
Rate 2    0.014     0.009     0.011     0.012     0.015     0.027
Rate 3   −0.003    −0.009    −0.004    −0.002    −0.001     0.002
Rate 4    0.018     0.016     0.017     0.018     0.019     0.021
CP 1      1915      1910      1911      1914      1918      1930
CP 2      1943      1938      1942      1943      1944      1947
CP 3      1969      1964      1967      1969      1971      1975

NOAA
         Mean      2.5%      25%       50%       75%       97.5%
Rate 1   −0.007    −0.011    −0.009    −0.007    −0.006    −0.003
Rate 2    0.012     0.009     0.011     0.012     0.013     0.015
Rate 3   −0.003    −0.008    −0.004    −0.003    −0.001     0.002
Rate 4    0.017     0.015     0.017     0.017     0.018     0.020
CP 1      1911      1907      1909      1910      1912      1917
CP 2      1943      1939      1942      1943      1944      1947
CP 3      1970      1964      1968      1969      1971      1975

HadCRUT4
         Mean      2.5%      25%       50%       75%       97.5%
Rate 1   −0.002    −0.003    −0.002    −0.002    −0.001    −0.000
Rate 2    0.013     0.009     0.011     0.013     0.014     0.018
Rate 3   −0.002    −0.007    −0.004    −0.002    −0.001     0.001
Rate 4    0.018     0.016     0.017     0.018     0.019     0.021
CP 1      1913      1908      1911      1913      1915      1919
CP 2      1943      1936      1941      1943      1944      1949
CP 3      1975      1971      1974      1975      1976      1979

Cowtan and Way
         Mean      2.5%      25%       50%       75%       97.5%
Rate 1   −0.001    −0.002    −0.001    −0.001    −0.000     0.001
Rate 2    0.014     0.010     0.012     0.014     0.016     0.021
Rate 3   −0.003    −0.006    −0.004    −0.003    −0.002     0.000
Rate 4    0.019     0.017     0.018     0.019     0.020     0.022
CP 1      1914      1908      1912      1914      1917      1921
CP 2      1942      1936      1940      1942      1943      1947
CP 3      1975      1971      1974      1976      1977      1980

Berkeley Earth
         Mean      2.5%      25%       50%       75%       97.5%
Rate 1    0.000    −0.002    −0.001     0.000     0.000     0.001
Rate 2    0.016     0.011     0.013     0.015     0.017     0.023
Rate 3   −0.002    −0.006    −0.003    −0.002    −0.000     0.002
Rate 4    0.019     0.016     0.018     0.019     0.020     0.022
CP 1      1916      1909      1913      1916      1918      1923
CP 2      1942      1937      1941      1942      1944      1949
CP 3      1975      1970      1973      1975      1977      1982

Short-term fluctuations are unavoidable in global temperature. Episodes that visually seem—sometimes strongly—to represent a change in the underlying trend are therefore not merely possible but inevitable. Because fluctuation is ubiquitous, differentiating between genuine trend change and appearances which are merely the manifestation of 'noise' is important.

Our purpose has been to determine what can and cannot be said about trends and their changes, based on the temperature data records only. We find that the public discussion of time intervals within the range 1998–2014 as somehow unusual or unexpected, as indicated by terms like 'hiatus', 'pause' and 'slowdown', has no support in rigorous study of the temperature data. Nor does recent talk of sudden acceleration based on three record-hot years in a row and the exceptional value in 2016. Both the alleged slowdown and the suspected acceleration are in fact well within the expected range of behavior for a constant trend plus the usual 'noise'.

The fact that global temperature data do not reveal any significant trend changes since the acceleration in the 1970s does not rule out that subtle trend changes may nevertheless have occurred; it merely shows that these were not large enough to emerge from the 'noise' of short-term fluctuations.

By physical arguments, by model simulations, or by correlation analyses with additional data (e.g. El Niño/Southern Oscillation indices or solar forcing data) it is possible to identify specific physical causes of temperature fluctuations, and this is a fruitful topic of ongoing climate research (Foster and Rahmstorf 2011, Kosaka and Xie 2013, England et al2014, Suckling et al2016) which helps us to understand natural climate variability. However, this is distinct from the question of whether a significant trend change has occurred in the temperature data as such. That is not the case. It is unfortunate that a major public and media discussion has revolved around an alleged significant and unexpected slowdown in the rate of global warming, for which there never was a statistical basis in the measured global surface temperature data.

References

  • [1] Arnell N et al 2014 Global-scale climate impact functions: the relationship between climate forcing and impact Clim. Change 134 475–87

  • [2] Boykoff M T 2014 Media discourse on the climate slowdown Nat. Clim. Change 4 156–8

  • [3] Cahill N et al 2015 Change points of global temperature Environ. Res. Lett. 10 084002

  • [4] Chow G C 1960 Tests of equality between sets of coefficients in two linear regressions Econometrica 28 591–605

  • [5] Cowtan K and Way R G 2014 Coverage bias in the HadCRUT4 temperature series and its impact on recent temperature trends Q. J. R. Meteorol. Soc. 140 1935–44 (dataset accessed: 10 February 2017)

  • [6] England M H et al 2014 Recent intensification of wind-driven circulation in the Pacific and the ongoing warming hiatus Nat. Clim. Change 4 222–7

  • [7] Foster G and Abraham J 2015 Lack of evidence for a slowdown in global temperature US CLIVAR 13 6–9

  • [8] Foster G and Rahmstorf S 2011 Global temperature evolution 1979–2010 Environ. Res. Lett. 6 044022

  • [9] Fourier J J 1827 Mémoire sur les températures du globe terrestre et des espaces planétaires Mémoires de l’Académie Royale des Sciences de l’Institut de France VII 570–604

  • [10] Fyfe J C et al 2016 Making sense of the early-2000s warming slowdown Nat. Clim. Change 6 224–8

  • [11] GISTEMP Team 2016 GISS Surface Temperature Analysis (GISTEMP) (dataset accessed: 10 February 2017)

  • [12] Hansen J et al 2010 Global surface temperature change Rev. Geophys. 48 RG4004

  • [13] Hawkins E 2016 Slowdown discussion

  • [14] IPCC 2013 Climate change: the physical science basis Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change ed T F Stocker et al (Cambridge: Cambridge University Press)

  • [15] Kosaka Y and Xie S-P 2013 Recent global-warming hiatus tied to equatorial Pacific surface cooling Nature 501 403–7

  • [16] Lewandowsky S, Oreskes N, Risbey J S, Newell B R and Smithson M 2015 Seepage: climate change denial and its effect on the scientific community Glob. Environ. Change 33 1–13

  • [17] Lewandowsky S et al 2016 The ‘pause’ in global warming: turning a routine fluctuation into a problem for science Bull. Am. Meteorol. Soc. 97 723–33

  • [18] Mooney C 2013 Who created the global warming ‘pause’? Grist

  • [19] Morice C P et al 2012 Quantifying uncertainties in global and regional temperature change using an ensemble of observational estimates: the HadCRUT4 data set J. Geophys. Res. 117 D08101