Overcoming the Challenges of Ocean Data Uncertainty
In oceanography, as in any scientific field, the goal is not to eliminate uncertainty in data, but instead to better quantify and clearly communicate its size and nature.
Data characterizing the ocean are inherently estimates and are therefore uncertain. This is true of all in situ and remotely sensed observations—of, say, sea surface temperature or sea level—as well as of outputs and forecasts from numerical models and of analysis products resulting from the synthesis of observations and models.
The typical meaning of uncertainty with respect to data is a familiar concept for scientists: A numerical value quantifying the state of a variable can be associated with one or more ancillary numerical values characterizing the possible error. However, it is essential to distinguish between quantifying error and quantifying uncertainty. The error of an estimate, defined as the difference between the estimate and the true value of a variable, cannot be known—if the true value were known, the estimate could be corrected. In contrast, the uncertainty of an estimate can be assessed using various statistical, theoretical, and numerical methodologies.
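To illustrate the distinction, here is a minimal Python sketch with invented numbers (a hypothetical “true” sea surface temperature and noise level, not values from any instrument). The true value appears only so that synthetic measurements can be generated; the point is that the error of the estimate cannot be evaluated without it, whereas a standard uncertainty can be estimated from the repeated measurements alone.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical "true" sea surface temperature (deg C). It is unknowable in
# practice and appears here only so that synthetic measurements can be drawn.
true_sst = 18.30

# Simulate 25 repeated measurements with random noise (0.1 deg C standard deviation).
measurements = true_sst + rng.normal(0.0, 0.1, size=25)

# The estimate is the sample mean of the repeated measurements.
estimate = measurements.mean()

# The error (estimate minus true value) exists but cannot be known in practice.
unknowable_error = estimate - true_sst

# The uncertainty, by contrast, can be assessed from the data themselves:
# here, the standard error of the mean.
standard_uncertainty = measurements.std(ddof=1) / np.sqrt(measurements.size)

print(f"estimate = {estimate:.3f} +/- {standard_uncertainty:.3f} deg C")
```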
In oceanography and climate science, the nature of uncertainty associated with different types of data—for instance, direct and indirect observations versus analysis products—has been semantically and philosophically debated [Parker, 2016]. This sort of debate is helpful because inadequate understanding and treatment of data uncertainty persist in the research community, decreasing the potential usefulness of—and confidence in—many data sets.
As a requirement for proposing, planning, and implementing ocean observing, modeling, and analysis systems, we advocate that resulting data should be accompanied by clearly described and easily accessible uncertainty information. To put it bluntly, an ocean data set may otherwise be of the highest scientific quality, but if quantified uncertainties do not accompany it, it will not be useful to scientists or other stakeholders [Moroni et al., 2019].
Uncertainty Completes the Data
Reconstructing changes in global mean sea level since the first known sea level measurement was taken in the mid-19th century (i.e., the observational era), and attributing these changes to driving factors, is a line of research in which uncertainty quantification is at the core of the scientific investigation process. Understanding these changes requires measurements of ocean temperature, cryospheric and terrestrial water mass distributions, and sea surface height at numerous locations and times. From these observations, the contributions of global ocean thermal expansion and global ocean mass change to global mean sea level change can be determined.
The sea level budget is considered “closed” when the sum of these independent components agrees with direct measurements of total sea level, meaning that the body of existing observations is sufficient to interpret the causes of sea level change. Only recently, thanks to the combined efforts of many individual studies across the interdisciplinary fields that contribute to sea level science, has the sea level budget been closed within quantified uncertainties—an achievement that testifies to an adequate understanding, at the global scale, of the processes influencing sea level and of the uncertainties in their estimates [Frederikse et al., 2020].
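To make “closed within quantified uncertainties” concrete, the sketch below sums two budget components with placeholder numbers (not values from Frederikse et al. [2020] or any other study), combines their independent uncertainties in quadrature, and checks whether the residual against the directly measured total is consistent with zero.

```python
import math

# Illustrative global mean sea level trends (mm/yr) with 1-sigma uncertainties.
# These numbers are placeholders for the arithmetic, not published estimates.
components = {
    "thermal expansion": (1.3, 0.3),
    "ocean mass change": (2.0, 0.3),
}
observed_total = (3.4, 0.4)  # directly measured total (e.g., by satellite altimetry)

# Sum the component trends and combine their independent uncertainties in quadrature.
summed_trend = sum(trend for trend, _ in components.values())
summed_sigma = math.sqrt(sum(sigma**2 for _, sigma in components.values()))

# Budget residual (observed total minus sum of components) and its uncertainty.
residual = observed_total[0] - summed_trend
residual_sigma = math.sqrt(observed_total[1]**2 + summed_sigma**2)

# The budget is "closed" if the residual is indistinguishable from zero
# within, say, twice its combined uncertainty.
closed = abs(residual) <= 2.0 * residual_sigma
print(f"residual = {residual:+.2f} +/- {residual_sigma:.2f} mm/yr; closed: {closed}")
```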
This example illustrates that determining the magnitude of uncertainties associated with ocean data is necessary not only so that these data can be used meaningfully in scientific investigations but also because, more fundamentally, uncertainty quantification makes the data complete. In other words, uncertainty quantification is necessary to evaluate the confidence, or, equivalently, the doubt, one can have in ocean data.
Challenges with Ocean Data
Uncertainty is a major focus of metrology, the science of measurement, and standards for uncertainty quantification are well cataloged in documents in that field. These documents should serve as starting points for oceanographers to lay out a strategy for quantifying the uncertainties in their data [e.g., Joint Committee for Guides in Metrology, 2008].
Yet some concepts that are applicable to bench measurements in metrology, such as being able to repeat observations under the exact same conditions, are difficult to translate to the oceanographer’s laboratory—the ocean—because the ocean and the climate system in which it is embedded are constantly changing. For example, repeat sampling of hydrographic properties (e.g., temperature, salinity, oxygen) in some remote parts of the ocean has occurred only after decades, if at all. And some high-resolution, global, numerical ocean models can be run only once because of prohibitive computational costs, so the statistical distributions of their output under different initial conditions are unknown.
There are also challenges distinct to oceanography and related fields. Satellite measurements offer indirect estimates of ocean surface properties that are calibrated and validated with in situ observations, yet these “cal/val” exercises are burdened by multiple sources of uncertainty. One such source is representation error, which arises because pointwise in situ measurements (e.g., of sea surface temperature) and satellite measurements, which represent averages of physical quantities over the satellite’s ground footprint, do not characterize the same quantity. The two values can therefore disagree even when both instruments perform flawlessly, because natural variability at spatial scales smaller than the footprint masquerades as a possible error in either measurement. Uncertainty related to representation error can be understood only by combining geophysical theories, extensive observations, and methodological knowledge.
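The following minimal sketch, with an invented temperature field and an assumed 5 km footprint, shows how representation error appears even when both instruments are perfect: a point sample and a footprint average of the same field differ simply because of variability at scales smaller than the footprint.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic sea surface temperature along a 10 km transect (deg C): a mean of
# 18 deg C plus small-scale variability generated as a simple random walk.
x = np.linspace(0.0, 10.0, 1001)                       # distance in km
sst = 18.0 + np.cumsum(rng.normal(0.0, 0.01, x.size))  # invented field, not data

# "In situ" value: the field sampled at a single point (a perfect instrument).
point_value = sst[500]

# "Satellite" value: the field averaged over a 5 km footprint centered on that
# point (also a perfect instrument).
footprint = (x >= 2.5) & (x <= 7.5)
footprint_mean = sst[footprint].mean()

# The two perfect measurements still differ; that difference is representation
# error, caused by variability at scales smaller than the footprint.
print(f"point = {point_value:.3f} deg C, footprint mean = {footprint_mean:.3f} deg C, "
      f"difference = {point_value - footprint_mean:+.3f} deg C")
```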
The example of representation error illustrates the necessity, in oceanography as in other fields, of identifying sources of error correctly and of striving to characterize them with appropriate and traceable uncertainties. This is a challenge because classifications of uncertainties (or errors) based on established statistical principles do not necessarily or readily map onto the idiosyncratic classifications used in ocean science. For example, biases (systematic errors) and random errors are often conflated in ocean observations for lack of appropriate knowledge.
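A small synthetic example, with invented salinity numbers, shows why the distinction matters: averaging many measurements reduces the random part of the error roughly as 1/sqrt(n), whereas a systematic bias persists no matter how many samples are averaged.

```python
import numpy as np

rng = np.random.default_rng(1)

true_value = 35.00      # illustrative "true" salinity (psu); invented
bias = 0.05             # systematic calibration offset (a bias)
random_sigma = 0.10     # standard deviation of the random measurement noise

for n in (10, 10_000):
    samples = true_value + bias + rng.normal(0.0, random_sigma, size=n)
    mean_error = samples.mean() - true_value
    # Averaging shrinks the random component roughly like 1/sqrt(n),
    # but the systematic bias remains untouched.
    print(f"n={n:6d}: mean error = {mean_error:+.4f} "
          f"(random part ~ {random_sigma / np.sqrt(n):.4f}, bias = {bias:+.2f})")
```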
Another example is in climate and ocean modeling, for which there is a need to consider separately structural or model uncertainties and uncertainties due to the chaotic behavior of the Earth system [National Research Council, 2012]. Further, when models and observations are combined to generate state estimates or forecasts, confidence in their outputs can be accurately assessed only if observational uncertainties are available and are carefully propagated through the machinery of data assimilation, in which models and their output are repeatedly updated to incorporate new observations [Leutbecher and Palmer, 2008].
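As a sketch of how observational uncertainty feeds into that machinery, the scalar update below follows the textbook Kalman filter analysis step, with illustrative numbers only: the weight given to an observation, and hence the analysis and its error variance, is set entirely by the specified background (model) and observation error variances.

```python
# Scalar analysis update, the simplest form of the Kalman filter update step.
# All numbers are illustrative and not taken from any operational system.

background = 17.8    # model forecast of, say, sea surface temperature (deg C)
sigma_b = 0.5        # background (model) error standard deviation
observation = 18.3   # observed value
sigma_o = 0.2        # observational error standard deviation

# Kalman gain: the weight given to the observation relative to the model,
# determined entirely by the two error variances.
gain = sigma_b**2 / (sigma_b**2 + sigma_o**2)

# Analysis: the background corrected toward the observation.
analysis = background + gain * (observation - background)

# Analysis error variance: smaller than either input variance, but only
# trustworthy if sigma_b and sigma_o themselves are realistic.
analysis_variance = (1.0 - gain) * sigma_b**2

print(f"analysis = {analysis:.2f} deg C, analysis error variance = {analysis_variance:.3f}")
```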
Effective communication of uncertainties among observationalists, modelers, and theoreticians is thus essential. This communication requires coordinated efforts and standardized protocols among these groups—a tall order considering that different oceanographic disciplines have traditionally been insular and have used distinct vocabularies to describe uncertainty. Such disconnects might be remedied if ocean scientists put greater emphasis on training in statistical sciences and collaboration with experts in that field.