An effect leading directly from an item's true score to a downstream variable may either supplement or counteract this indirect effect. An item's total effect is the sum of its direct and indirect effects, so a positive direct effect supplements a positive indirect effect and indicates the item has a stronger impact on the downstream variable than can be accounted for by the scale alone.

A negative direct effect counteracts a positive indirect effect and indicates the scale provides an unwarrantedly strong connection between the item and the downstream variable. For Leadership, three of the four direct effects of items on downstream variables are negative, indicating that requiring these items to work through the Leadership scale produces artificially and inappropriately strong estimates of these items' effects on the applicable downstream variables (Table 4). The lone positive direct effect indicates one item (Mentors) should be granted a stronger impact on a downstream variable (Like Working Here) than the Leadership scale permits.

The guaranteed-weak indirect effects of items acting through scales are susceptible to being overshadowed by effects leading directly from the items to downstream variables. All three negative direct item effects in the amended Leadership model, for example, are stronger than the items' small-positive effects carried through the Leadership scale. Two of these direct item effects essentially nullify the corresponding indirect effects, but the third produces a noticeable net-negative, reversed impact (Table 4). The Leadership scale's validity is clearly questioned whenever an item's direct effect nullifies or reverses an effect purportedly attributable to the scale containing that item.
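The supplement/nullify/reverse patterns above follow from simple path-tracing arithmetic. In the sketch below, only the 1/6 weighting follows from averaging six items; all other coefficient values are invented for illustration.

```python
# Hedged sketch: path-tracing arithmetic for one item's effect on one
# downstream variable. The 1/6 weight reflects averaging six items; the
# remaining coefficients are invented, not estimates from the article.

def total_effect(item_to_scale, scale_to_downstream, direct=0.0):
    """Total effect = direct effect + indirect effect through the scale."""
    indirect = item_to_scale * scale_to_downstream
    return direct + indirect

w = 1 / 6    # fixed item-to-scale averaging weight
b = 0.60     # assumed scale-to-downstream effect

print(round(total_effect(w, b), 3))               # indirect only: 0.1
print(round(total_effect(w, b, direct=0.15), 3))  # supplemented: 0.25
print(round(total_effect(w, b, direct=-0.10), 3)) # nullified: ~0.0
print(round(total_effect(w, b, direct=-0.30), 3)) # reversed: -0.2
```

Because the indirect path multiplies the small averaging weight into the scale's effect, even a modest negative direct effect can nullify or reverse the item's total effect.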

Direct effects substantially enhancing an item's indirect effect through the scale similarly question the scale. Only four of 42 possible direct effects of the six items on the seven downstream variables are required in the enhanced Leadership model, but these effects clearly recommend theoretical reconsideration of the Leadership scale.

The involvement of several different scale items and several different outcome variables makes these theory challenges somewhat awkward. Interpreting an additional cause of the scale variable as explaining the original Leadership scale is inconsistent with the amended model's estimates, because additional causes leading to the scale variable do not explain the original Leadership scale.

The new effects redefine the scale such that it only partially corresponds to the original Leadership scale. The original scale was defined as the fixed equal-weight average of its six items:

Leadership = (1/6)(item1 + item2 + item3 + item4 + item5 + item6).

Retaining the same fixed item effects that defined the Leadership scale while adding a new variable's effect changes the equation to

New-Leadership = (1/6)(item1 + item2 + item3 + item4 + item5 + item6) + b(NewVariable).

A predictor variable in an equation does not explain another predictor in that equation, so any additional cause does not explain the original scale; it redefines the scale. Explaining the original Leadership scale would require explaining the items averaged to create it.

The downstream variables will usually be included in the model because they are directly caused by the scale, so enhancing a model by adding an effect leading from a downstream variable back to the scale is likely to introduce a causal loop. Though somewhat unusual, causal loops are understandable and not particularly statistically problematic (Hayduk, Chapter 8; Hayduk, Chapter 3). A more fundamental concern is that even this single causal loop ensnares Leadership in a causal web that renders it impossible to define or measure Leadership without modeling the appropriate looped causal structure.

A variable that was formerly an effect of Leadership becomes both a cause and an effect of New-Leadership—and that new causal embeddedness renders standard measurement procedures inappropriate. This stymies traditional scale-score calculations, even though the looped model employs the same observed variables and permits valid investigation of the causal connections between the scale items, the scale, and the downstream variables.

We now briefly consider the fusion validity of the Leadership scale using data from health care aides in the Canadian province of Manitoba. The Manitoba baseline Leadership model, like the Alberta baseline model, was highly significantly inconsistent with the data (Table 2). The remaining alterations differ between the Alberta and Manitoba models, including the challenging loop-creating effect, and these clearly warrant additional investigation. But rather than pursuing the substantive details of these Leadership models, we turn to more general technicalities involved in assessing fusion validity.

We developed fusion validity to investigate scales developed by researchers participating in TREC (Translating Research in Elder Care) studies of residents and care aides in long-term care facilities (Estabrooks et al.). We thank one of our reviewers for encouraging us to report and reference connections between fusion validity and various threads within the statistical and methodological literature. Fusion validity's grounding in causal networks places it closer to the causal-formative than to the composite-formative indicators discussed by Bollen and Bauldry, and fusion validity's dependence on context-dependent theory distances it from some components of traditional classical test theory.

The inclusion of both a scale and its items within the same model provides an opportunity to reassess the points of friction evident in exchanges between Hardin and Bollen and Diamantopoulos. The points are too diverse and complex for us to resolve, though we hope our comments below provide helpful direction. Fusion validity's dependence on embedding the scale in an appropriate causal context raises potential technical as well as theoretical concerns.

The baseline model may fit, or fail to fit, and either result may prove problematic. A fitting baseline model containing unreasonable estimates questions whether the control and downstream variables are sufficiently well understood to be entrusted with scale adjudication. Nothing forbids a few mild modifications to initially-failing baseline models, but it may be technically tricky to avoid inserting coefficients more appropriately regarded as scale-confronting. Reasonable modifications might rectify downstream variables' causal interconnections, or exogenous control variables' connections to the downstream variables, but ferreting out whether or not a modification questions the scale may prove difficult.

For example, if a control variable correlates substantially with an item's true scores, the modification indices may equivocate about whether the control variable or the item affects a downstream variable, and thereby equivocate about whether the researcher is confronting scale-compatible or scale-incompatible evidence. Baseline models having complicated interconnections among the downstream variables, or unresolved issues with multiple indicators of control or downstream variables, are likely to prove particularly challenging.

Neophytes may have difficulty recognizing, let alone resisting, coefficients that could lead to inappropriately obtained model fit, especially knowing that persistent baseline model failure questions their scale. Validity requires consistency with our understandings, but when our modeled understandings, whether in a baseline or amended model, are problematic, concern for validity transmutes into concern for the fundamental commitments underlying scientific research. Standardized residual covariances typically provide diagnostic direction, but they provide minimal assistance in fusion validity assessments because the scale latent variable and the item true-score latents have no direct indicators and consequently contribute only indirectly to the covariance residuals.

Furthermore, the residual covariance ill-fit among the scale items should be essentially zero because the model's structure nearly guarantees that the estimated covariances among the item true scores will reproduce the observed item covariances irrespective of the number or nature of the items' sources. The issue addressed by fusion validity is not the source of the items but whether the items causally combine into a scale that is unidimensional in its production of downstream variables.

Fusion validity is not about the dimensionality of the scale variable. The issue is the causal fidelity of fusing the potentially-diverse items into a unidimensional variable capable of transmitting the potentially-diverse items' effects to the downstream variables.

Here the most useful diagnostics are the modification indices and expected parameter change statistics. A large, not merely marginally-significant, modification index for an item's effect on a downstream variable, combined with an implicationally-understandable expected parameter change statistic, would suggest including a coefficient speaking against the scale. The magnitude and sign of the expected parameter change statistic for an item's direct effect should be understandable in the context of the indirect effect that the item transmits through the scale as discussed in regard to Figure 3.

A scale-bypassing effect speaks against the thoroughness of the encapsulation provided by the scale, but a reasonably-signed bypassing effect may merely reflect a world containing multiple indirect-effect mechanisms (Albert et al.). Unreasonably-signed scale-bypassing effects speak more clearly against the scale.

If one specific item requires stronger or weaker effects on multiple downstream variables, and if the required effect adjustments are nearly proportional to the scale's effects, that might be accommodated by strengthening or weakening the item's fixed effect on the scale. For example, a substantial modification index corresponding to one item's fixed 0.167 (namely 1/6) effect on the scale might recommend such a reweighting. Similarly, if the baseline model contained fixed unequal item weightings, large modification indices for some weights might recommend reweighting the items. It should be clear that an amended model requiring a direct effect of an item's true-score on a downstream variable is not equivalent to, and should not be described as, having altered the item's contributions via the scale.

Effects transmitted via the scale must spread proportionately to all the variables downstream from the scale. An effect leading from one item to a specific downstream variable disrupts the scale's proportional distribution requirement for that specific pairing of an item and downstream variable. The proportionality constraints on the other items' effects via the scale on the downstream variables are also slightly loosened by the scale-bypassing effect but the greater the number of items and scale-affected downstream variables the feebler the loosening of these constraints.
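This proportionality requirement can be made concrete: when every item works only through the scale, the implied matrix of item-to-downstream effects is the outer product of the fixed item weights and the scale's effects, and therefore has rank one. A minimal numpy sketch with invented effect values:

```python
import numpy as np

# Sketch (invented numbers): with all item effects routed through the scale,
# the implied item -> downstream effect matrix is the outer product of the
# fixed item weights and the scale's effects, hence strictly proportional.

weights = np.full(6, 1 / 6)                                      # six averaged items
scale_effects = np.array([0.5, 0.3, -0.2, 0.4, 0.1, 0.6, 0.25])  # seven outcomes

implied = np.outer(weights, scale_effects)    # 6 x 7 matrix of indirect effects
print(np.linalg.matrix_rank(implied))         # 1: every row is proportional

# A single scale-bypassing direct effect breaks the proportionality for that
# specific item/downstream pairing (and only that cell):
amended = implied.copy()
amended[2, 0] += -0.30                        # hypothetical direct effect
print(np.linalg.matrix_rank(amended))         # 2: no longer strictly proportional
```

Each additional bypassing effect perturbs another cell of this otherwise rank-one structure, which is why accumulating bypassing effects progressively loosens the scale's proportionality constraints.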

Each additional scale-bypassing effect progressively, even if minimally, loosens the proportionality constraints on all the items' effects on the downstream variables via the scale. This suggests an accumulation of minor constraint relaxations resulting from multiple scale-bypassing effects in an amended model might constitute holistic scale-misrepresentation. A substantial modification index might also be connected to the fixed zero variance assigned to the residual variable that causes the scale—namely the zero resulting from the absence of an error variable in the item-averaging equation constructing the scale.

A substantial modification index here suggests some currently unidentified variable may be fusing with the modeled scale items, or that there are some other unmodeled common causes of the downstream variables. A scale known to be incomplete due to unavailability of some specific cause might warrant assigning the scale's residual variance a fixed nonzero value, or possibly a constrained value. The scale's residual variance might even be freed if sufficient downstream variables were available to permit estimation.

A nonzero residual variance should prompt careful consideration of the missed variable's identity. The potential freeing of the scale's residual variance clearly differentiates fusion validity from confirmatory composite analysis, which by definition forbids each composite from receiving effects from anything other than a specified set of indicators (Schuberth et al.).

Indeed, the potential freeing of the scale's residual variance pinpoints a causal conundrum in confirmatory composite analysis—namely, how to account for the covariance parameters connecting composites without introducing any additional effects leading to any composite (Schuberth et al.). This is rendered a non-issue by fusion validity's causal epistemological foundation.

The relevant modeling alternatives will be context-specific but likely of substantial theoretical and academic interest. The fixed measurement error variances on the observed items might also require modification but the implications of erroneous values of this kind are likely to be difficult to detect, and could probably be more effectively investigated by checking the model's sensitivity to alternative fixed measurement error variance specifications. Attending to modification indices, or moving to a Bayesian mode of assessment, would implicitly sidle toward exploration, which nibbles at the edges of validity, so especially-cautious and muted interpretations would likely be advisable.


Other technicalities might arise because the scale variable and the item true-score variables have no direct indicators, which forces the related model estimates to depend on indirect causal connections to the observed indicators. The scale's effects on the downstream variables, for example, are driven by the observed covariances between the items' indicators and the indicators of the downstream variables because the scale's effects provide the primary, even if indirect, causal connections between these sets of observed indicators. The absence of direct latent-to-indicator connections may produce program-specific difficulties, as when the indicatorless item true-score latents stymied LISREL's attempts to provide start values for these covariances (Joreskog and Sorbom). This particular technicality is easily circumvented by providing initial estimates approximating the corresponding items' observed variances and covariances.

This statistical annoyance arises because the measurement error variance in each item unavoidably contributes to the scale. In the extreme, a fusion validity model might specify all the item measurement error variance as dead-ending in the indicators so the scale is created from fixed effects arriving from the items' true scores. This would correspond to moving the fixed effects currently leading to the scale from the observed items to the true-score items in Figure 1, and would permit investigating how a scale would function if it were purified of indicator measurement errors.
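A small simulation, with invented variances, illustrates what such purification would buy: a scale averaged from the items' true scores tracks a downstream variable more closely than the ordinary scale averaged from error-containing observed items.

```python
import numpy as np

# Simulation sketch (all values invented): compare a scale averaged from
# error-containing observed items with a "purified" scale averaged from the
# items' true scores, as when the fixed effects are moved to the true-score
# latents. The purified scale tracks the downstream variable more closely.

rng = np.random.default_rng(0)
n, k = 100_000, 6
true = rng.normal(size=(n, k))                      # item true scores
items = true + rng.normal(scale=0.7, size=(n, k))   # add measurement error
downstream = true.mean(axis=1) + rng.normal(scale=0.5, size=n)

observed_scale = items.mean(axis=1)   # ordinary scale: errors enter the scale
purified_scale = true.mean(axis=1)    # error-free scale (unattainable in practice)

r_obs = np.corrcoef(observed_scale, downstream)[0, 1]
r_pure = np.corrcoef(purified_scale, downstream)[0, 1]
print(round(r_obs, 2), round(r_pure, 2))  # the purified scale correlates more strongly
```

The gap between the two correlations is the attenuation contributed by the item measurement errors that an ordinary averaged scale unavoidably absorbs.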

This version of the fusion validity model would attain the epitome of scale construction—a scale freed from measurement errors—which is unattainable in contexts employing actual error-containing items. It would be possible to simultaneously assess the fusion validity of two or more different scales constructed from a single set of items if the model contains downstream variables differentially responding to those scales. Importantly, factor score indeterminacy does not hinder fusion validity assessments.

Indeed, if the items were modeled as being caused by a common factor rather than as having separate latent causes as illustrated, fusion-validity modeling of the scale would provide a potentially informative estimate of the correlation between the factor and the scale (now, in effect, factor scores). We should also note that fusion validity surpasses composite invariance testing (Henseler et al.). In general, replacing items with parcels disrupts the item-level diagnostics potentially refining fusion validity models, and hence is not advised.

Fusion validity's theory-emphasis does not end with formulation of appropriate baseline and amended models—it may extend into the future via consideration of what should be done next. For example, one author (CE) was concerned that the demand for parsimony during data collection resulted in omission of causes of leadership, and she was uneasy about employing downstream latents having single indicators instead of similarly named scales having multiple indicators. These seemingly methodological concerns transform into theory-options as one considers exactly how a supposedly-missed cause should be incorporated in an alternative baseline model—namely, is the missed variable a control variable, a downstream variable, or possibly an instantiation of the scale's residual variable?

These have very different theoretical and methodological implications. Similar detailed theoretical concerns arise from considering how an additional-scale, or multiple indicators used by others as a scale, should be modeled by a researcher investigating a focal scale such as Leadership. Fusion validity models are unlikely to provide definitive-finales for their focal scales but rather are likely to stand as comparative structural benchmarks highlighting precise and constructible theoretical alternatives.

An advance in theory-precision is likely, irrespective of the focal scale's fate. A scale's fusion validity is assessed by simultaneously modeling the scale and its constituent items in the context of appropriate theory-based variables. Fusion validity presumes the items were previously assessed for sufficient variance, appropriate wordings, etcetera, and that a specific scale-producing procedure exists or has been proposed whether summing, averaging, factor score weightings, or conjecture.

This makes the scale's proximal causal foundations known because the researcher knows how they produce, or anticipate producing, scale values from the items, but whether the resultant scale corresponds to a unidimensional world variable appropriately fusing and subsequently dispensing the items' effects to downstream variables awaits fusion validity assessment. Fusion validity circumvents the data collinearity between a scale and its constituent items by employing only the items as data while incorporating the scale as a latent variable known through its causal foundations and consequences.

The scale is modeled as encapsulating and fusing the items, and as subsequently indirectly transmitting the items' impacts to the downstream variables. An item effect bypassing the scale by running directly to a downstream variable signals the scale's inability to appropriately encapsulate that item's causal powers. The fixed effects leading from the items to the scale are dictated by the item averaging, summing, or weighting employed in calculating the scale's values.

The effects leading from the scale to the downstream variables are unashamedly, even proudly, theory-based because validity depends upon consistency with current theoretical understandings (Cronbach and Meehl; Hubley and Zumbo; American Educational Research Association). The unavoidable collinearity between item and scale data ostensibly hindered checking the synchronization between items, scales, and theory-recommended variables—a hindrance overcome by the fusion validity model specification presented here.

It is clear how items caused by a single underlying factor might fuse into a unidimensional scale. The consistent true-score components of the items accumulate and concentrate the underlying causal factor's value while random measurement errors in the items tend to cancel one another out. The simplicity and persuasiveness of this argument switched the historical focus of scale validity assessments toward the factor structuring of the causal source of the items and away from the assessment of whether some items fuse to form a scale entity.
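This accumulate-and-cancel argument is easy to demonstrate numerically. In the sketch below (all values invented), each item is a common factor plus independent error, and the average's correlation with the factor climbs as items are added.

```python
import numpy as np

# Sketch (invented values): items share one common factor plus independent
# measurement error. Averaging more items concentrates the factor's signal
# while the random errors tend to cancel, so the averaged scale tracks the
# factor increasingly well as items are added.

rng = np.random.default_rng(1)
n = 200_000
factor = rng.normal(size=n)

def scale_factor_r(k, error_sd=1.0):
    """Correlation between a k-item average and the underlying factor."""
    items = factor[:, None] + rng.normal(scale=error_sd, size=(n, k))
    return np.corrcoef(items.mean(axis=1), factor)[0, 1]

for k in (1, 3, 6, 12):
    print(k, round(scale_factor_r(k), 2))  # rises toward 1.0 as k grows
```

With unit error variance the theoretical correlation is sqrt(k / (k + 1)), so the simulated values should climb from roughly 0.71 at one item toward 0.96 at twelve.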

Fusion validity examines whether the items fuse to form a unitary variable irrespective of whether or not the items originate from a common causal factor. That is, fusion validity acknowledges that the world's causal forces may funnel and combine the effects of items even if those items do not share a common cause. It is possible for non-redundant items failing to satisfy a factor model to nonetheless combine into a unidimensional scale displaying fusion validity.

For example, the magnitudes of gravitational, mechanical, and frictional forces do not have a common factor cause, yet these forces combine in producing the movement of objects. The causal world might similarly combine diverse psychological or social attributes into unidimensional entities such as Leadership ability, or the like. And the reverse is also possible. Items having a common cause and satisfying the factor model may, or may not, fuse into valid scales.
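A simulation with invented values makes the point: three mutually uncorrelated "items" can still have their effects funneled through a single composite that coherently drives a downstream variable.

```python
import numpy as np

# Sketch (invented values): three mutually uncorrelated "items"—no common
# factor cause—whose effects nonetheless funnel through a single composite,
# the way gravitational, mechanical, and frictional forces combine to move
# an object. The downstream variable responds only to the fused composite.

rng = np.random.default_rng(2)
n = 100_000
items = rng.normal(size=(n, 3))           # pairwise correlations near zero
composite = items.mean(axis=1)            # the scale: a fused, unidimensional entity
downstream = 0.8 * composite + rng.normal(scale=0.5, size=n)

item_corrs = np.corrcoef(items, rowvar=False)
print(np.round(item_corrs, 2))            # approximately an identity matrix

# Yet the composite carries a coherent effect to the downstream variable:
print(round(np.corrcoef(composite, downstream)[0, 1], 2))
```

A factor model fit to these items would find nothing to extract, yet the composite displays exactly the downstream coherence that fusion validity is designed to assess.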


That is, items sharing a common cause do not necessarily have common effects. In brief, fusion validity focuses on whether the items' effects combine, meld, or fuse into an effective unidimensional scale entity irrespective of the nature of the items' causal foundations. If a researcher believes their items share a common factor cause and also fuse into a scale dimension, it is easy to replace the item true-score segment of the fusion validity model with a causal factor structure.

Such a factor-plus-fusion model introduces additional model constraints and is more restrictive than the illustrated fusion validity model specification. Fusion validity can therefore be applied to both reflective and formative indicators. Evidence confronting a scale arises when a failing baseline model must be amended: by introducing item effects bypassing the scale on the way to downstream variables, by introducing additional effects leading to the scale, by altering the fixed effects constituting the scale's calculation, or by altering the error variance specifications.

An effect leading directly from an item to a downstream variable alters the understanding of the scale irrespective of whether that effect supplements or counteracts the item's indirect effect through the scale. Either way, the scale is demonstrated as being incapable of appropriately encapsulating the item's causal consequences, and hence retaining both the item and scale may be required for a proper causal understanding.

An item effect bypassing the scale does not necessarily devastate the scale because it is possible for several items to fuse into an appropriate scale entity having real effects and yet require supplementation by individual item effects. Items having direct effects on downstream variables that cancel out or radically alter the item's indirect effect via the scale are more scale-confronting.

Scale-bypassing effects and other model modifications encourage additional theory precision—precision which is likely to constitute both the most challenging and the most potentially-beneficial aspect of fusion validity assessment. Amending the baseline model by introducing an additional effect leading to the scale variable—namely an effect beyond the originally scale-defining item effects—produces a new and somewhat different, but potentially correct, scale variable.

The new effect does not explain the original scale. Both the original scale and new-scale are fully explained because both scales typically have zero residual error variance. They are just different fully explained variables which possess and transmit somewhat different effects.

The new scale variable may retain the ability to absorb and transmit the original items' effects to the downstream variables but the new scale is also capable of absorbing and transmitting the actions of the additional causal variable. The researcher's theory should reflect a scale's changing identity.
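The zero-residual point can be verified directly: both the original scale and a redefined scale receiving a hypothetical additional cause are exact functions of their causes, yet they are different variables. A sketch with invented values:

```python
import numpy as np

# Sketch (invented values): both the original scale and a redefined scale
# that receives one additional cause are exact functions of their causes
# (zero residual variance)—they are simply different fully-determined
# variables carrying somewhat different effects.

rng = np.random.default_rng(3)
n = 10_000
items = rng.normal(size=(n, 6))
new_cause = rng.normal(size=n)             # hypothetical additional cause

scale = items.mean(axis=1)                 # original: fixed 1/6 weights, no error term
new_scale = scale + 0.4 * new_cause        # redefined scale, still no error term

# Each scale is perfectly reproduced from its own causes (zero residual) ...
X = np.column_stack([items, new_cause])
for y in (scale, new_scale):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(np.allclose(X @ coef, y))        # True for both scales

# ... but the two scales carry different values, hence different effects:
print(round(np.corrcoef(scale, new_scale)[0, 1], 2))  # clearly below 1
```

Neither regression "explains" the other scale; each equation simply defines a different fully-determined variable, which is why an added cause redefines rather than explains the original scale.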

Both theory and methods are likely to be challenged by attempting to expunge the old scale scores from the literature—especially since the new scale's scores would not be calculable in existing data sets lacking the new scale-defining variable. Both theory and methods are likely to be more strongly challenged if model alteration requires effects leading to the scale from downstream variables because such effects are likely to introduce causal loops.

Loops provide substantial, though surmountable, theory challenges (Hayduk; Hayduk et al.). A model can contain as many equations as are required to properly model looped causal actions, but the single equation required for calculating a scale's scores becomes unavoidably misspecified if the equation contains one of the scale's effects as a contributory component. If a substantial modification index calls for a loop-producing effect, that effect would likely be identified. In contrast, theory-proposed looped effects may prove more difficult to identify (Nagase and Kano; Wang et al.).

The requirement that valid scales function causally appropriately when embedded in relevant theoretical contexts implicitly challenges factor models for having insufficient latent-level structure to endorse scale validity. Adding latents implicitly challenges the multiple indicators touted by factor analysis because adding latents while retaining the same indicators sidles toward single indicators (Hayduk and Littvay). Researchers from factor analytic backgrounds are likely to find it comparatively easy to sharpen their model testing skills but will probably encounter greater difficulty pursuing theoretical alternatives involving effects among additional similar latent variables, or appreciating how items having diverse causal backgrounds might nonetheless combine into an effective unidimensional causal entity—such as leadership, trust, stress, or happiness.

The tight coordination between theory and scale validity assessment provides another illustration of why measurement should accompany, not precede, theoretical considerations (Cronbach and Meehl; Hayduk and Glaser, a, b). Scales were traditionally justified as more reliable than single indicators, and as easier to manage than a slew of indicators. Both these justifications crumble, however, if the scale's structure is importantly causally misspecified, because invalidity undermines reliability, and because a causal muddle of indicators cannot be managed rationally.

In medical contexts, for example, it is unacceptable to report a medical trial's outcome based on a problematic criterion scale, but equally unacceptable to throw away the data and pretend the scale-based trial never happened. This dilemma underpins the call for CONSORT (the Consolidated Standards of Reporting Trials) to instruct researchers on how to proceed if a scale registered as a medical trial's criterion measure is found to misbehave (Downey et al.).

Ultimately, avoiding iatrogenic consequences requires a proper causal, not merely correlational, understanding of the connections linking the items, the scale, the downstream variables, and even the control variables. Pearl and Mackenzie, and Pearl, present clear and systematic introductions to thinking about causal structures and why control variables deserve consideration.

One of our reviewers pointed us toward a special issue of the journal Measurement focused on causal indicators and issues potentially relating to fusion validity. We disagree with enough points in both the target article by Aguirre-Urreta et al. and the accompanying commentaries that we cannot address them all here; we encourage readers to follow the consequences of the Aguirre-Urreta et al. arguments for themselves.


It should also prove instructive to notice the emergent focus on measurement's connection to substantive theory—and not just measurement traditions. The assessment of fusion validity illustrated above slightly favors the scale by initially modeling the scale's presumed effects, and by permitting baseline model modifications which potentially, even if inadvertently, assist the scale. A scale-unfriendly approach might begin with a baseline model permitting some scale-bypassing item effects, while excluding all the scale's effects on the downstream variables until specific scale effects are demanded by the data.

However done, models assessing whether a set of items fuse to form a scale will depend on theory, will focus attention on theory, and will provide opportunities to correct problematic theoretical commitments. Fusion validity shares traditional concerns for item face validity and methodology but requires variables beyond the items included in the scale—specifically, variables causally downstream from the postulated scale, and possibly control variables upstream of the items.

Fusion validity permits but does not require that the scaled items have a common factor cause, or even that the items correlate with one another. Traditional formulations make reliability a prerequisite for validity, but some forms of reliability are not prerequisites for fusion validity because fusion validity does not share a factor-model basis.

It does require that the items fuse or meld in forming the scale according to the researcher's specifications. This means the researcher must be as attentive to the possibility of faulty theory as to faulty scaling—which seems to be an unavoidable concomitant of the strong appeal to theory required by seeking validity. Fusion validity's inclusion in the model of theory-based variables along with both the items and scale permits many assessments unavailable to traditional analyses, and potentially recommends correspondingly diverse theory, scale, and item improvements.

Complexity abounds, so only those strong in both their theory and structural equation modeling need apply. Embedding a scale in deficient theory will highlight the deficiencies, while embedding a scale in trustworthy theory will provide unparalleled validity assessments.

Fusion validity assessment does not guarantee progress but provides a way to investigate whether our scales coordinate with our causal understandings, and a way to check whether traditional scale assessments have served us well. TREC is an ongoing, pan-Canadian, applied, longitudinal health services research program in residential long-term care. The TREC umbrella covers multiple ethics-reviewed studies designed to investigate and improve long-term care. The appended LISREL syntax contains the covariance data matrix sufficient for replicating the Alberta estimates or estimating alternative models.

Ethics approval was obtained by the Translating Research in Elder Care team from both universities and all the institutions and participants participating in the reported studies. LH conceived the analytical procedure, conducted the analyses, wrote the draft article, and revised the article incorporating coauthor suggestions. CE and MH critically assessed the article and suggested revisions. All authors contributed to manuscript revision, read and approved the submitted version.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The authors thank Greta Cummings, Elizabeth Anderson, and Genevieve Thompson for suggesting the downstream variables that should be used; Mike Gillespie for thought-provoking discussions; and Joseph Akinlawon and Ferenc Toth for data and archive assistance.

Aguirre-Urreta, M. Omission of causal indicators: consequences and implications for measurement. Measurement 14, 75–.

Albert, J. Generalized causal mediation and path analysis: extensions and practical considerations. Stat. Methods Med. Res.

American Educational Research Association. Standards for Educational and Psychological Testing.
One of the main catalysts for the development of mathematical theories of measurement was an ongoing debate surrounding measurability in psychology. Gustav Fechner proposed to measure the intensity of sensation indirectly, by counting the just-noticeable differences between stimuli; these differences were assumed to be equal increments of intensity of sensation.

Combined with Weber's finding that the just-noticeable difference is proportional to stimulus magnitude, this assumption yields a logarithmic law relating sensation intensity to stimulus intensity. This law in turn provides a method for indirectly measuring the intensity of sensation by measuring the intensity of the stimulus, and hence, Fechner argued, provides justification for measuring intensities of sensation on the real numbers. Those objecting to the measurability of sensation, such as Campbell, stressed the necessity of an empirical concatenation operation for fundamental measurement.
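A standard textbook reconstruction of how the logarithmic law follows from Weber's findings runs as follows (a sketch; Fechner's own presentation differs in detail, and the constants here are merely labels):

```latex
% Weber's law: the just-noticeable difference \Delta I is a constant
% fraction k of the stimulus intensity I:
\[ \frac{\Delta I}{I} = k \]
% Fechner's postulate: each just-noticeable difference marks one equal
% increment c of sensation. Treating increments as differentials gives
\[ \mathrm{d}S = \frac{c}{k}\,\frac{\mathrm{d}I}{I} \]
% and integrating from the absolute threshold I_0 (where S = 0) yields
% the logarithmic law:
\[ S = \frac{c}{k}\,\ln\frac{I}{I_0} \]
```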

Since intensities of sensation cannot be concatenated to each other in the manner afforded by lengths and weights, there could be no fundamental measurement of sensation intensity. Moreover, Campbell claimed that none of the psychophysical regularities discovered thus far were sufficiently universal to count as laws in the sense required for derived measurement (Campbell in Ferguson et al.). All that psychophysicists had shown was that intensities of sensation can be consistently ordered, but order by itself does not yet warrant the use of numerical relations such as sums and ratios to express empirical results.

The central opponent of Campbell in this debate was Stevens, whose distinction between types of measurement scale was discussed above. In useful cases of scientific inquiry, Stevens claimed, measurement can be construed somewhat more narrowly as a numerical assignment that is based on the results of matching operations, such as the coupling of temperature to mercury volume or the matching of sensations to each other.

Stevens argued against the view that relations among numbers need to mirror qualitative empirical structures, claiming instead that measurement scales should be regarded as arbitrary formal schemas and adopted in accordance with their usefulness for describing empirical data. Such assignment of numbers to sensations counts as measurement because it is consistent and non-random, because it is based on the matching operations performed by experimental subjects, and because it captures regularities in the experimental results.

In the mid-twentieth century the two main lines of inquiry in measurement theory, the one dedicated to the empirical conditions of quantification and the one concerning the classification of scales, converged in the work of Patrick Suppes (Scott and Suppes; for historical surveys see Savage and Ehrlich; Diez a,b). The resulting Representational Theory of Measurement (RTM) defines measurement as the construction of mappings from empirical relational structures into numerical relational structures (Krantz et al.).

An empirical relational structure consists of a set of empirical objects (e.g., rigid rods) together with certain qualitative relations among them (e.g., ordering, concatenation). Simply put, a measurement scale is a many-to-one mapping—a homomorphism—from an empirical to a numerical relational structure, and measurement is the construction of scales. Each type of scale is associated with a set of assumptions about the qualitative relations obtaining among objects represented on that type of scale. From these assumptions, or axioms, the authors of RTM derive the representational adequacy of each scale type, as well as the family of permissible transformations making that type of scale unique.
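The homomorphism requirement can be illustrated computationally. The sketch below uses invented rods and values (RTM is a mathematical theory, not a program); it checks that a numerical assignment mirrors an observed qualitative ordering, and that a monotone transformation, the permissible kind for an ordinal scale, preserves this adequacy:

```python
from itertools import product

# Empirical relational structure: objects and an observed qualitative
# ordering ("at least as long as"), recorded as pairs (a, b) with a >= b.
objects = ["rod_a", "rod_b", "rod_c"]
at_least_as_long = {("rod_a", "rod_b"), ("rod_b", "rod_c"), ("rod_a", "rod_c"),
                    ("rod_a", "rod_a"), ("rod_b", "rod_b"), ("rod_c", "rod_c")}

# Candidate scale: a mapping from objects into the numbers.
scale = {"rod_a": 3.0, "rod_b": 2.0, "rod_c": 1.0}

def is_homomorphism(scale, relation, objects):
    """For an ordinal structure, a scale is adequate iff the numerical
    order mirrors the qualitative order: a >= b holds empirically
    exactly when scale[a] >= scale[b]."""
    return all(((a, b) in relation) == (scale[a] >= scale[b])
               for a, b in product(objects, repeat=2))

print(is_homomorphism(scale, at_least_as_long, objects))   # True

# Permissible transformations for an ordinal scale are the monotone
# increasing maps: applying one preserves the homomorphism.
transformed = {k: v ** 3 + 1 for k, v in scale.items()}
print(is_homomorphism(transformed, at_least_as_long, objects))  # True
```

A non-monotone reassignment (e.g., swapping the largest and smallest values) would fail the check, which is the computational face of the claim that only order, not magnitude, is meaningful on an ordinal scale.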

In this way RTM provides a conceptual link between the empirical basis of measurement and the typology of scales. On the issue of measurability, the Representational Theory takes a middle path between the liberal approach adopted by Stevens and the strict emphasis on concatenation operations espoused by Campbell. Like Campbell, RTM accepts that rules of quantification must be grounded in known empirical structures and should not be chosen arbitrarily to fit the data.

However, RTM rejects the idea that additive scales are adequate only when concatenation operations are available (Luce and Suppes). Instead, RTM argues for the existence of fundamental measurement operations that do not involve concatenation, a key example being additive conjoint measurement. Here, measurements of two or more different types of attribute, such as the temperature and pressure of a gas, are obtained by observing their joint effect, such as the volume of the gas.

Luce and Tukey showed that by establishing certain qualitative relations among volumes under variations of temperature and pressure, one can construct additive representations of temperature and pressure without invoking any antecedent method of measuring volume. This sort of procedure is generalizable to any suitably related triplet of attributes, such as the loudness, intensity and frequency of pure tones, or the preference for a reward, its size and the delay in receiving it (Luce and Suppes). Under this new conception of fundamentality, all the traditional physical attributes can be measured fundamentally, as well as many psychological attributes (Krantz et al.).
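Double cancellation, one of the qualitative (ordinal) conditions underwriting additive conjoint measurement, can be checked directly on a table of joint effects. The sketch below, with invented numbers, illustrates the axiom only; it is not the full Luce-Tukey construction of the additive scales:

```python
from itertools import product

def double_cancellation_holds(M):
    """Check the double-cancellation condition on a matrix of joint effects:
    whenever M[a][y] >= M[b][x] and M[b][z] >= M[c][y],
    it must follow that M[a][z] >= M[c][x].
    Only the ordering of the cells matters, not the numbers themselves."""
    rows, cols = range(len(M)), range(len(M[0]))
    for a, b, c in product(rows, repeat=3):
        for x, y, z in product(cols, repeat=3):
            if M[a][y] >= M[b][x] and M[b][z] >= M[c][y] and M[a][z] < M[c][x]:
                return False
    return True

# A joint-effect table generated additively (effect = row value + column
# value), with made-up numbers standing in for, e.g., volumes under
# settings of temperature (rows) and pressure (columns):
temp, press = [0.0, 1.3, 2.9], [0.0, 0.7, 2.1]
additive = [[t + p for p in press] for t in temp]
print(double_cancellation_holds(additive))  # True
```

The check must pass for any additively generated table (adding the two antecedent inequalities and cancelling the shared terms yields the consequent), which is why its failure on observed orderings rules out an additive representation.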

Above we saw that mathematical theories of measurement are primarily concerned with the mathematical properties of measurement scales and the conditions of their application. A related but distinct strand of scholarship concerns the meaning and use of quantity terms.

A realist about one of these terms would argue that it refers to a set of properties or relations that exist independently of being measured. An operationalist or conventionalist would argue that the way such quantity-terms apply to concrete particulars depends on nontrivial choices made by humans, and specifically on choices that have to do with the way the relevant quantity is measured. Note that under this broad construal, realism is compatible with operationalism and conventionalism. That is, it is conceivable that choices of measurement method regulate the use of a quantity-term and that, given the correct choice, this term succeeds in referring to a mind-independent property or relation.

Nonetheless, many operationalists and conventionalists adopted stronger views, according to which there are no facts of the matter as to which of several and nontrivially different operations is correct for applying a given quantity-term. These stronger variants are inconsistent with realism about measurement. This section will be dedicated to operationalism and conventionalism, and the next to realism about measurement.

The strongest expression of operationalism appears in the early work of Percy Bridgman, who argued that the meaning of a quantity concept is nothing more than the set of operations used to measure it. Length, for example, would be defined as the result of the operation of concatenating rigid rods. According to this extreme version of operationalism, different operations measure different quantities. Nevertheless, Bridgman conceded that as long as the results of different operations agree within experimental error it is pragmatically justified to label the corresponding quantities with the same name. Operationalism became influential in psychology, where it was well received by behaviorists like Edwin Boring and B. F. Skinner.

As long as the assignment of numbers to objects is performed in accordance with concrete and consistent rules, Stevens maintained that such assignment has empirical meaning and does not need to satisfy any additional constraints. Nonetheless, Stevens probably did not embrace an anti-realist view about psychological attributes.

Instead, there are good reasons to think that he understood operationalism as a methodological attitude that was valuable to the extent that it allowed psychologists to justify the conclusions they drew from experiments (Feest). For example, Stevens did not treat operational definitions as a priori but as amenable to improvement in light of empirical discoveries, implying that he took psychological attributes to exist independently of such definitions (Stevens). Operationalism was met with initial enthusiasm by logical positivists, who viewed it as akin to verificationism.

Nonetheless, it was soon revealed that any attempt to base a theory of meaning on operationalist principles was riddled with problems. Among such problems were the automatic reliability operationalism conferred on measurement operations, the ambiguities surrounding the notion of operation, the overly restrictive operational criterion of meaningfulness, and the fact that many useful theoretical concepts lack clear operational definitions (Chang). Accordingly, most writers on the semantics of quantity-terms have avoided espousing an operational analysis.

A more widely advocated approach admitted a conventional element to the use of quantity-terms, while resisting attempts to reduce the meaning of quantity terms to measurement operations. Mach noted that different types of thermometric fluid expand at different and nonlinearly related rates when heated, raising the question: which fluid expands most uniformly with temperature?

According to Mach, there is no fact of the matter as to which fluid expands more uniformly, since the very notion of equality among temperature intervals has no determinate application prior to a conventional choice of standard thermometric fluid. Conventionalism with respect to measurement reached its most sophisticated expression in logical positivism, where the connection between quantity-terms and measurement procedures was taken to be established by coordinative definitions. These a priori, definition-like statements were intended to regulate the use of theoretical terms by connecting them with empirical procedures (Reichenbach, 14–19; Carnap, Ch.).

In accordance with verificationism, statements that are unverifiable are neither true nor false. On this view, a statement coordinating a quantity-term with a measurement procedure is not an empirical claim at all; Reichenbach took such a statement to express an arbitrary rule for regulating the use of the concept of equality of length, namely, for determining whether particular instances of length are equal (Reichenbach). At the same time, coordinative definitions were not seen as replacements for, but rather as necessary additions to, the familiar sort of theoretical definitions of concepts in terms of other concepts. Under the conventionalist viewpoint, then, the specification of measurement operations did not exhaust the meaning of concepts such as length or length-equality, thereby avoiding many of the problems associated with operationalism.

Realists about measurement maintain that measurement is best understood as the empirical estimation of an objective property or relation. A few clarificatory remarks are in order with respect to this characterization of measurement. Measurable properties or relations are taken to be objective inasmuch as they are independent of the beliefs and conventions of the humans performing the measurement and of the methods used for measuring.

For example, a realist would argue that the ratio of the length of a given solid rod to the standard meter has an objective value regardless of whether and how it is measured. Further, according to realists, measurement is aimed at obtaining knowledge about properties and relations, rather than at assigning values directly to individual objects. This is significant because, unlike observable objects, properties and relations are not directly observable, and knowledge claims about such properties and relations must presuppose some background theory. By shifting the emphasis from objects to properties and relations, realists highlight the theory-laden character of measurements.

Realism about measurement should not be confused with realism about entities, nor does realism about measurement necessarily entail realism about properties. Nonetheless, most philosophers who have defended realism about measurement have done so by arguing for some form of realism about properties (Byerly and Lazara; Swoyer; Mundy; Trout). These realists argue that at least some measurable properties exist independently of the beliefs and conventions of the humans who measure them, and that the existence and structure of these properties provides the best explanation for key features of measurement, including the usefulness of numbers in expressing measurement results and the reliability of measuring instruments.

The existence of an extensive property structure means that lengths share much of their structure with the positive real numbers, and this explains the usefulness of the positive reals in representing lengths. Moreover, if measurable properties are analyzed in dispositional terms, it becomes easy to explain why some measuring instruments are reliable. A different argument for realism about measurement is due to Joel Michell, who proposes a realist theory of number based on the Euclidean concept of ratio.

According to Michell, numbers are ratios between quantities, and therefore exist in space and time. Specifically, real numbers are ratios between pairs of infinite standard sequences of quantities. Measurement is the discovery and estimation of such ratios. An interesting consequence of this empirical realism about numbers is that measurement is not a representational activity, but rather the activity of approximating mind-independent numbers (Michell). Realist accounts of measurement are largely formulated in opposition to strong versions of operationalism and conventionalism, which dominated philosophical discussions of measurement through the mid-twentieth century.

In addition to the drawbacks of operationalism already discussed in the previous section, realists point out that anti-realism about measurable quantities fails to make sense of scientific practice. By contrast, realists can easily make sense of the notions of accuracy and error in terms of the distance between real and measured values (Byerly and Lazara, 17–18; Swoyer; Trout). A closely related point is the fact that newer measurement procedures tend to improve on the accuracy of older ones. If choices of measurement procedure were merely conventional it would be difficult to make sense of such progress.

In addition, realism provides an intuitive explanation for why different measurement procedures often yield similar results, namely, because they are sensitive to the same facts (Swoyer; Trout). Finally, realists note that the construction of measurement apparatus and the analysis of measurement results are guided by theoretical assumptions concerning causal relationships among quantities. The ability of such causal assumptions to guide measurement suggests that quantities are ontologically prior to the procedures that measure them. While their stance towards operationalism and conventionalism is largely critical, realists are more charitable in their assessment of mathematical theories of measurement.

Brent Mundy and Chris Swoyer both accept the axiomatic treatment of measurement scales, but object to the empiricist interpretation given to the axioms by prominent measurement theorists like Campbell and Ernest Nagel (Cohen and Nagel, Ch.). Rather than interpreting the axioms as pertaining to concrete objects or to observable relations among such objects, Mundy and Swoyer reinterpret the axioms as pertaining to universal magnitudes, such as lengths themselves rather than the particular objects that bear them.

Moreover, under their interpretation measurement theory becomes a genuine scientific theory, with explanatory hypotheses and testable predictions. Despite these virtues, the realist interpretation has been largely ignored in the wider literature on measurement theory.

Information-theoretic accounts of measurement are based on an analogy between measuring systems and communication systems: in a communication system, a message is encoded into a signal, transmitted along a channel, and decoded at the receiving end. The accuracy of the transmission depends on features of the communication system as well as on features of the environment, i.e., the level of noise in the channel. The accuracy of a measurement similarly depends on the instrument as well as on the level of noise in its environment.

Conceived as a special sort of information transmission, measurement becomes analyzable in terms of the conceptual apparatus of information theory (Hartley; Shannon; Shannon and Weaver). Ludwik Finkelstein and Luca Mari suggested the possibility of a synthesis between Shannon-Weaver information theory and measurement theory. As they argue, both theories centrally appeal to the idea of mapping: information theory concerns the mapping between symbols in the input and output messages, while measurement theory concerns the mapping between objects and numbers.
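On this analogy, a measuring instrument can be treated as a noisy channel from true states to readings, and the information a reading conveys can be quantified as Shannon mutual information. The sketch below uses invented state and noise probabilities purely for illustration; it is not drawn from any metrological model in the literature:

```python
import math

# An instrument modeled as a noisy channel: the "message" is the true
# state of the measured object, the "received symbol" is the reading.
p_state = {"low": 0.5, "high": 0.5}          # assumed prior over true states
p_read_given = {                              # assumed noise characteristics
    "low":  {"low": 0.9, "high": 0.1},
    "high": {"low": 0.2, "high": 0.8},
}

# Joint and marginal distributions over (state, reading) pairs.
p_joint = {(s, r): p_state[s] * p_read_given[s][r]
           for s in p_state for r in ("low", "high")}
p_read = {r: sum(p_joint[(s, r)] for s in p_state) for r in ("low", "high")}

# Mutual information I(state; reading) in bits: how much the reading
# reduces uncertainty about the true state. A noiseless instrument
# would attain the full 1 bit of the prior's entropy.
mi = sum(p * math.log2(p / (p_state[s] * p_read[r]))
         for (s, r), p in p_joint.items() if p > 0)
print(round(mi, 3))
```

Raising the noise (flattening the conditional distributions) drives the mutual information toward zero, which is the information-theoretic rendering of a measurement drowned out by its environment.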

If measurement is taken to be analogous to symbol-manipulation, then Shannon-Weaver theory could provide a formalization of the syntax of measurement while measurement theory could provide a formalization of its semantics. Nonetheless, Mari also warns that the analogy between communication and measurement systems is limited. Information-theoretic accounts of measurement were originally developed by metrologists with little involvement from philosophers.

Metrologists typically work at standardization bureaus or at specialized laboratories that are responsible for the calibration of measurement equipment, the comparison of standards and the evaluation of measurement uncertainties, among other tasks. It is only recently that philosophers have begun to engage with the rich conceptual issues underlying metrological practice, and particularly with the inferences involved in evaluating and improving the accuracy of measurement standards (Chang; Boumans, Chap.).

Further philosophical work is required to explore the assumptions and consequences of information-theoretic accounts of measurement, their implications for metrological practice, and their connections with other accounts of measurement. Independently of developments in metrology, Bas van Fraassen has recently proposed a conception of measurement in which information plays a key role. He views measurement as composed of two levels: on the physical level, the measuring apparatus interacts with an object and produces a reading, e.g., a pointer position; on the abstract level, the reading locates the object in a space of possible states.

Measurement locates an object in a sub-region of this abstract parameter space, thereby reducing the range of states the object could possibly be in. This reduction of possibilities amounts to the collection of information about the measured object.

Since the early 2000s a new wave of philosophical scholarship has emerged that emphasizes the relationships between measurement and theoretical and statistical modeling. On these model-based accounts, a measurement process is always accompanied by a theoretical and/or statistical model of that process, and the central goal of measurement is to assign values to one or more parameters of interest in the model in a manner that satisfies certain epistemic desiderata, in particular coherence and consistency.

A central motivation for the development of model-based accounts is the attempt to clarify the epistemological principles underlying aspects of measurement practice. For example, metrologists employ a variety of methods for the calibration of measuring instruments, the standardization and tracing of units and the evaluation of uncertainties for a discussion of metrology, see the previous section. Traditional philosophical accounts such as mathematical theories of measurement do not elaborate on the assumptions, inference patterns, evidential grounds or success criteria associated with such methods.

By contrast, model-based accounts take scale construction to be merely one of several tasks involved in measurement (Frigerio et al.), alongside the definition of measured parameters, instrument design and calibration, object sampling and preparation, error detection and uncertainty evaluation, among others.

Other, secondary interactions may also be relevant for the determination of a measurement outcome, such as the interaction between the measuring instrument and the reference standards used for its calibration, and the chain of comparisons that trace the reference standard back to primary measurement standards (Mari). Although measurands need not be quantities, a quantitative measurement scenario will be supposed in what follows. Two sorts of measurement outputs are distinguished by model-based accounts (JCGM): instrument indications and measurement outcomes.

As proponents of model-based accounts stress, inferences from instrument indications to measurement outcomes are nontrivial and depend on a host of theoretical and statistical assumptions about the object being measured, the instrument, the environment and the calibration process. Measurement outcomes are often obtained through statistical analysis of multiple indications, thereby involving assumptions about the shape of the distribution of indications and the randomness of environmental effects (Bogen and Woodward). Measurement outcomes also incorporate corrections for systematic effects, and such corrections are based on theoretical assumptions concerning the workings of the instrument and its interactions with the object and environment.
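A toy illustration of this indication-to-outcome inference, in the spirit of a GUM-style Type A evaluation, is sketched below. The readings and the systematic correction are hypothetical; a full evaluation would also propagate the correction's own uncertainty into the budget:

```python
import statistics

# Repeated indications from a (hypothetical) length measurement, in mm.
indications = [20.113, 20.108, 20.117, 20.110, 20.112]

mean = statistics.fmean(indications)     # best estimate from the indications
s = statistics.stdev(indications)        # sample standard deviation
u = s / len(indications) ** 0.5          # standard uncertainty of the mean

# The outcome is not the bare mean: it incorporates a correction for a
# systematic effect (here an assumed calibration offset of -0.004 mm).
correction = -0.004
outcome = mean + correction
print(f"{outcome:.3f} mm, standard uncertainty {u:.4f} mm")
```

The point of the example is the model-based moral: the same five indications would yield a different outcome under a different model of the instrument, e.g., a different assumed correction or a different assumed distribution of the indications.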

Systematic corrections involve uncertainties of their own, for example in the determination of the values of constants, and these uncertainties are assessed through secondary experiments involving further theoretical and statistical assumptions. Moreover, the uncertainty associated with a measurement outcome depends on the methods employed for the calibration of the instrument. Calibration involves additional assumptions about the instrument, the calibrating apparatus, the quantity being measured and the properties of measurement standards (Rothbart and Slayden; Franklin; Baird, Ch.).

Finally, measurement involves background assumptions about the scale type and unit system being used, and these assumptions are often tied to broader theoretical and technological considerations relating to the definition and realization of scales and units. These various theoretical and statistical assumptions form the basis for the construction of one or more models of the measurement process.

Measurement is viewed as a set of procedures whose aim is to coherently assign values to model parameters based on instrument indications. Models are therefore seen as necessary preconditions for the possibility of inferring measurement outcomes from instrument indications, and as crucial for determining the content of measurement outcomes.

As proponents of model-based accounts emphasize, the same indications produced by the same measurement process may be used to establish different measurement outcomes depending on how the measurement process is modeled. Similarly, models are said to provide the necessary context for evaluating various aspects of the goodness of measurement outcomes, including accuracy, precision, error and uncertainty (Boumans; Mari). Model-based accounts diverge from empiricist interpretations of measurement theory in that they do not require relations among measurement outcomes to be isomorphic or homomorphic to observable relations among the items being measured (Mari). Indeed, according to model-based accounts, relations among measured objects need not be observable at all prior to their measurement (Frigerio et al.).

Instead, the key normative requirement of model-based accounts is that values be assigned to model parameters in a coherent manner. The coherence criterion may be viewed as a conjunction of two sub-criteria: (i) coherence of model assumptions with relevant background theories or other substantive presuppositions about the quantity being measured; and (ii) objectivity, i.e., the mutual consistency of measurement outcomes across different measuring instruments, environments and models. The first sub-criterion is meant to ensure that the intended quantity is being measured, while the second sub-criterion is meant to ensure that measurement outcomes can be reasonably attributed to the measured object rather than to some artifact of the measuring instrument, environment or model.

Taken together, these two requirements ensure that measurement outcomes remain valid independently of the specific assumptions involved in their production, and hence that the context-dependence of measurement outcomes does not threaten their general applicability. Besides their applicability to physical measurement, model-based analyses also shed light on measurement in economics.

Like physical quantities, values of economic variables often cannot be observed directly and must be inferred from observations based on abstract and idealized models. The nineteenth-century economist William Jevons, for example, measured changes in the value of gold by postulating certain causal relationships between the value of gold, the supply of gold and the general level of prices (Hoover and Dowell; Morgan). Taken together, these models allowed Jevons to infer the change in the value of gold from data concerning the historical prices of various goods. The ways in which models function in economic measurement have led some philosophers to view certain economic models as measuring instruments in their own right, analogously to rulers and balances (Boumans; Morgan). Marcel Boumans explains how macroeconomists are able to isolate a variable of interest from external influences by tuning parameters in a model of the macroeconomic system.

This technique frees economists from the impossible task of controlling the actual system. As Boumans argues, macroeconomic models function as measuring instruments insofar as they produce invariant relations between inputs (indications) and outputs (outcomes), and insofar as this invariance can be tested by calibration against known and stable facts. Another area where models play a central role in measurement is psychology.

The measurement of most psychological attributes, such as intelligence, anxiety and depression, does not rely on homomorphic mappings of the sort espoused by the Representational Theory of Measurement (Wilson). Instead, psychological measurement typically relies on statistical models that link the latent attribute to subjects' performance on various tasks. These models are constructed from substantive and statistical assumptions about the psychological attribute being measured and its relation to each measurement task.

For example, Item Response Theory, a popular approach to psychological measurement, employs a variety of models to evaluate the validity of questionnaires. One of the simplest models used to validate such questionnaires is the Rasch model (Rasch), according to which the probability of a correct response to an item is a logistic function of the difference between the subject's ability and the item's difficulty. New questionnaires are calibrated by testing the fit between their indications and the predictions of the Rasch model and assigning difficulty levels to each item accordingly. The model is then used in conjunction with the questionnaire to infer levels of English language comprehension (outcomes) from raw questionnaire scores (indications) (Wilson; Mari and Wilson). Psychologists are typically interested in the results of a measure not for its own sake, but for the sake of assessing some underlying latent psychological attribute.
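The core equation of the Rasch model can be sketched as follows. The item difficulties and the ability value below are invented for illustration; in practice both are estimated from response data during calibration:

```python
import math

def rasch_probability(ability, difficulty):
    """Rasch model: the probability of a correct response is a logistic
    function of the difference between ability and item difficulty
    (both on the same latent 'logit' scale)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Hypothetical calibrated difficulties for three questionnaire items.
items = {"item_easy": -1.5, "item_mid": 0.0, "item_hard": 2.0}

theta = 0.5  # assumed ability of one respondent, in logits
for name, b in items.items():
    print(f"{name}: P(correct) = {rasch_probability(theta, b):.2f}")

# The expected raw score (sum of the probabilities) is what links
# observed scores (indications) to the latent ability (outcome).
expected_score = sum(rasch_probability(theta, b) for b in items.values())
```

A respondent whose ability equals an item's difficulty has a 0.5 chance of answering it correctly, which is what makes the difficulty and ability parameters commensurable on a single scale.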

It is therefore desirable to be able to test whether different measures, such as different questionnaires or multiple controlled experiments, all measure the same latent attribute. A construct is an abstract representation of the latent attribute intended to be measured.

Constructs are denoted by variables in a model that predicts which correlations would be observed among the indications of different measures if they are indeed measures of the same attribute. Several scholars have pointed out similarities between the ways models are used to standardize measurable quantities in the natural and social sciences. Others have raised doubts about the feasibility and desirability of adopting the example of the natural sciences when standardizing constructs in the social sciences.

As Anna Alexandrova points out, ethical considerations bear on questions about construct validity no less than considerations of reproducibility. Such ethical considerations are context sensitive and can only be applied piecemeal. A related worry concerns what Otto Neurath called Ballung concepts: concepts characterized by a loose cluster of features rather than a single defining criterion. Examples of Ballung concepts are race, poverty, social exclusion, and the quality of PhD programs. Such concepts are too multifaceted to be measured on a single metric without loss of meaning, and must be represented either by a matrix of indices or by several different measures depending on which goals and values are at play (see also Cartwright and Bradburn). In a similar vein, Leah McClimans argues that uniformity is not always an appropriate goal for designing questionnaires, as the open-endedness of questions is often both unavoidable and desirable for obtaining relevant information from subjects.

Rather than emphasizing the mathematical foundations, metaphysics or semantics of measurement, philosophical work in recent years tends to focus on the presuppositions and inferential patterns involved in concrete practices of measurement, and on the historical, social and material dimensions of measuring.

In the broadest sense, the epistemology of measurement is the study of the relationships between measurement and knowledge. Central topics that fall under the purview of the epistemology of measurement include the conditions under which measurement produces knowledge; the content, scope, justification and limits of such knowledge; the reasons why particular methodologies of measurement and standardization succeed or fail in supporting particular knowledge claims, and the relationships between measurement and other knowledge-producing activities such as observation, theorizing, experimentation, modelling and calculation.

In pursuing these objectives, philosophers are drawing on the work of historians and sociologists of science, who have been investigating measurement practices for a longer period (Wise and Smith; Latour, Ch.). The following subsections survey some of the topics discussed in this burgeoning body of literature.

A topic that has attracted considerable philosophical attention in recent years is the selection and improvement of measurement standards. Generally speaking, to standardize a quantity concept is to prescribe a determinate way in which that concept is to be applied to concrete particulars; the term "standard" accordingly denotes both an abstract rule of application and the concrete artifacts and procedures that embody it. This duality in meaning reflects the dual nature of standardization, which involves both abstract and concrete aspects. In Section 4 it was noted that standardization involves choices among nontrivial alternatives, such as the choice among different thermometric fluids or among different ways of marking equal duration.

Appealing to theory to decide which standard is more accurate would be circular, since the theory cannot be determinately applied to particulars prior to a choice of measurement standard. The conventionalist solution is to settle the choice by stipulation; a drawback of this solution is that it supposes that choices of measurement standard are arbitrary and static, whereas in actual practice measurement standards tend to be chosen based on empirical considerations and are eventually improved or replaced with standards that are deemed more accurate. A new strand of writing on the problem of coordination has emerged in recent years, consisting most notably of the works of Hasok Chang and Bas van Fraassen.

These works take a historical and coherentist approach to the problem. Rather than attempting to avoid the problem of circularity completely, as their predecessors did, they set out to show that the circularity is not vicious. Chang argues that constructing a quantity-concept and standardizing its measurement are co-dependent and iterative tasks.

The pre-scientific concept of temperature, for example, was associated with crude and ambiguous methods of ordering objects from hot to cold. Thermoscopes, and eventually thermometers, helped modify the original concept and made it more precise. With each such iteration the quantity concept was re-coordinated to a more stable set of standards, which in turn allowed theoretical predictions to be tested more precisely, facilitating the subsequent development of theory and the construction of more stable standards, and so on.

From either vantage point, coordination succeeds because it increases coherence among elements of theory and instrumentation. It is only when one adopts a foundationalist view and attempts to find a starting point for coordination free of presupposition that this historical process erroneously appears to lack epistemic justification. The new literature on coordination shifts the emphasis of the discussion from the definitions of quantity-terms to the realizations of those definitions.

Examples of metrological realizations are the official prototypes of the kilogram and the cesium fountain clocks used to standardize the second (JCGM). Recent studies suggest that the methods used to design, maintain and compare realizations have a direct bearing on the practical application of concepts of quantity, unit and scale, no less than the definitions of those concepts (Tal forthcoming-a; Riordan). As already discussed above (Sections 7 and 8), measurement is deeply entangled with theory. On the historical side, the development of theory and measurement proceeds through iterative and mutual refinements.

On the conceptual side, the specification of measurement procedures shapes the empirical content of theoretical concepts, while theory provides a systematic interpretation for the indications of measuring instruments. This interdependence of measurement and theory may seem like a threat to the evidential role that measurement is supposed to play in the scientific enterprise. After all, measurement outcomes are thought to be able to test theoretical hypotheses, and this seems to require some degree of independence of measurement from theory.

This threat is especially clear when the theoretical hypothesis being tested is already presupposed as part of the model of the measuring instrument. To cite an example from Franklin et al.: there would seem to be, at first glance, a vicious circularity if one were to use a mercury thermometer to measure the temperature of objects as part of an experiment to test whether objects expand as their temperature increases.

Nonetheless, Franklin et al. argue, the circularity is not vicious: the mercury thermometer could be calibrated against another thermometer whose principle of operation does not presuppose the law of thermal expansion, such as a constant-volume gas thermometer, thereby establishing the reliability of the mercury thermometer on independent grounds. To put the point more generally, in the context of local hypothesis-testing the threat of circularity can usually be avoided by appealing to other kinds of instruments and other parts of theory. A different sort of worry about the evidential function of measurement arises on the global scale, when the testing of entire theories is concerned.

As Thomas Kuhn argues, scientific theories are usually accepted long before quantitative methods for testing them become available. The reliability of newly introduced measurement methods is typically tested against the predictions of the theory rather than the other way around. Hence, Kuhn argues, the function of measurement in the physical sciences is not to test the theory but to apply it with increasing scope and precision, and eventually to allow persistent anomalies to surface that would precipitate the next crisis and scientific revolution.

Note that Kuhn is not claiming that measurement has no evidential role to play in science. In earlier discussions, the theory-ladenness of measurement was correctly perceived as a threat to the possibility of a clear demarcation between theoretical and observational languages. Contemporary discussions, by contrast, no longer present theory-ladenness as an epistemological threat but take for granted that some level of theory-ladenness is a prerequisite for measurements to have any evidential power. Without some minimal substantive assumptions about the quantity being measured, such as its amenability to manipulation and its relations to other quantities, it would be impossible to interpret the indications of measuring instruments and hence impossible to ascertain the evidential relevance of those indications.

This point was already made by Pierre Duhem (see also Carrier). Moreover, contemporary authors emphasize that theoretical assumptions play crucial roles in correcting for measurement errors and evaluating measurement uncertainties. Indeed, physical measurement procedures become more accurate when the model underlying them is de-idealized, a process which involves increasing the theoretical richness of the model (Tal). The interdependence of measurement and theory is especially clear when one attempts to account for the increasing use of computational methods for performing tasks that were traditionally accomplished by measuring instruments.

As Margaret Morrison and Wendy Parker argue, there are cases where reliable quantitative information is gathered about a target system with the aid of a computer simulation, but in a manner that satisfies some of the central desiderata for measurement, such as being empirically grounded and backward-looking. Such information does not rely on signals transmitted from the particular object of interest to the instrument, but on the use of theoretical and statistical models to process empirical data about related objects.

For example, data assimilation methods are customarily used to estimate past atmospheric temperatures in regions where thermometer readings are not available. These estimations are then used in various ways, including as data for evaluating forward-looking climate models.

Two key aspects of the reliability of measurement outcomes are accuracy and precision. Consider a series of repeated weight measurements performed on a particular object with an equal-arms balance.

On the error-based conception, the accuracy of these measurements is the closeness of their results to the true weight of the object, while their precision is the closeness of the results to one another. Though intuitive, the error-based way of carving the distinction raises an epistemological difficulty. It is commonly thought that the exact true values of most quantities of interest to science are unknowable, at least when those quantities are measured on continuous scales. If this assumption is granted, the accuracy with which such quantities are measured cannot be known with exactitude, but only estimated by comparing inaccurate measurements to each other.

And yet it is unclear why convergence among inaccurate measurements should be taken as an indication of truth. After all, the measurements could be plagued by a common bias that prevents their individual inaccuracies from cancelling each other out when averaged. In the absence of cognitive access to true values, how is the evaluation of measurement accuracy possible?
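The worry about a common bias can be made vivid with a toy simulation (illustrative only; the numbers, variable names, and the use of Python are my assumptions, not the article's): fifty weighings that agree closely with one another, yet all share the same systematic offset.

```python
# Toy simulation: repeated weighings that are precise (tightly clustered)
# yet inaccurate, because a common bias shifts every trial the same way.
import random
import statistics

random.seed(0)
TRUE_WEIGHT = 100.0   # grams; in practice this value is unknowable
COMMON_BIAS = 2.5     # e.g. a miscalibrated reference weight
NOISE_SD = 0.05       # small random scatter per trial

readings = [TRUE_WEIGHT + COMMON_BIAS + random.gauss(0, NOISE_SD)
            for _ in range(50)]

mean = statistics.mean(readings)
spread = statistics.stdev(readings)

print(f"mean reading: {mean:.2f} g")    # converges near 102.5, not 100
print(f"spread:       {spread:.3f} g")  # tiny: the trials agree closely
```

The readings converge tightly, yet averaging them cannot cancel the shared bias, which is exactly why convergence alone is a doubtful indicator of truth.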

At least five different senses of measurement accuracy have been identified: metaphysical, epistemic, operational, comparative and pragmatic (Tal). Rather than defining accuracy in terms of closeness to an unknowable true value, the uncertainty-based conception takes the accuracy of a measurement outcome to be the closeness of agreement among values reasonably attributed to a quantity given available empirical data and background knowledge.

Thus construed, measurement accuracy can be evaluated by establishing robustness among the consequences of models representing different measurement processes. Under the uncertainty-based conception, imprecision is a special type of inaccuracy: the imprecision of the balance measurements described above is the component of inaccuracy arising from uncontrolled variations in the indications of the balance over repeated trials.
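A minimal sketch of this robustness idea, under assumptions of my own (two simulated instruments, Gaussian noise, and a GUM-style Type A standard uncertainty of the mean; none of these specifics come from the article): two measurement processes agree when the value ranges each reasonably attributes to the quantity overlap.

```python
# Illustrative sketch: evaluating accuracy without appeal to a true value,
# by checking agreement between the value ranges that two *different*
# measurement processes attribute to the same quantity.
import math
import random
import statistics

random.seed(1)

def measure(n, noise_sd):
    """Simulate n indications from one measurement process and return
    the mean value with a Type A standard uncertainty of that mean."""
    indications = [10.0 + random.gauss(0, noise_sd) for _ in range(n)]
    mean = statistics.mean(indications)
    u = statistics.stdev(indications) / math.sqrt(n)
    return mean, u

m1, u1 = measure(30, 0.08)   # e.g. an equal-arms balance
m2, u2 = measure(30, 0.15)   # e.g. a different, noisier instrument

# The outcomes are robust if the intervals m ± 2u overlap.
agree = abs(m1 - m2) <= 2 * (u1 + u2)
print(f"{m1:.3f} ± {u1:.3f}  vs  {m2:.3f} ± {u2:.3f}  agree: {agree}")
```

Note that if both processes shared a common bias they could still agree in this sense while both being inaccurate, which is why robustness is sought across processes modeled as differently as possible.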

Other sources of inaccuracy besides imprecision include imperfect corrections to systematic errors, inaccurately known physical constants, and vague measurand definitions, among others (see Section 7). Paul Teller raises a different objection to the error-based conception of measurement accuracy. That conception presupposes that quantities have definite true values, and Teller argues that this presupposition is false insofar as it concerns the quantities habitually measured in physics, because any specification of definite values or value ranges for such quantities involves idealization and hence cannot refer to anything in reality.

Removing these idealizations completely would require adding an infinite amount of detail to each specification. As Teller argues, measurement accuracy should itself be understood as a useful idealization, namely as a concept that allows scientists to assess coherence and consistency among measurement outcomes as if the linguistic expression of these outcomes latched onto anything in the world.

The author is also indebted to Joel Michell and Oliver Schliemann for useful bibliographical advice, and to John Wiley and Sons Publishers for permission to reproduce an excerpt from Tal.

Measurement in Science. First published Mon Jun 15.


Overview

Modern philosophical discussions about measurement—spanning from the late nineteenth century to the present day—may be divided into several strands of scholarship. The following is a very rough overview of these perspectives: Mathematical theories of measurement view measurement as the mapping of qualitative empirical relations to relations among numbers or other mathematical entities.

Information-theoretic accounts view measurement as the gathering and interpretation of information about a system.

Quantity and Magnitude: A Brief History

Although the philosophy of measurement formed as a distinct area of inquiry only during the second half of the nineteenth century, fundamental concepts of measurement such as magnitude and quantity have been discussed since antiquity. Bertrand Russell similarly stated that measurement is "any method by which a unique and reciprocal correspondence is established between all or some of the magnitudes of a kind and all or some of the numbers, integral, rational or real".

Operationalism and Conventionalism

Above we saw that mathematical theories of measurement are primarily concerned with the mathematical properties of measurement scales and the conditions of their application. The strongest expression of operationalism appears in the early work of Percy Bridgman, who argued that we mean by any concept "nothing more than a set of operations; the concept is synonymous with the corresponding set of operations".

Realist Accounts of Measurement

Realists about measurement maintain that measurement is best understood as the empirical estimation of an objective property or relation.

Information-Theoretic Accounts of Measurement

Information-theoretic accounts of measurement are based on an analogy between measuring systems and communication systems.

Model-Based Accounts of Measurement

Since the early 2000s a new wave of philosophical scholarship has emerged that emphasizes the relationships between measurement and theoretical and statistical modeling. Indications may be represented by numbers, but such numbers describe states of the instrument and should not be confused with measurement outcomes, which concern states of the object being measured.

As Luca Mari puts it, any measurement result reports information that is meaningful only in the context of a metrological model, such a model being required to include a specification for all the entities that explicitly or implicitly appear in the expression of the measurement result.
