In a recent posting on his website The Morning Claret, Simon Woolf thoughtfully discusses the current status of wine competitions. The key elements of “discussion, retasting, and encouragement” among judges, which frequently lead to re-evaluations and collective learning experiences, are fascinating and promising. It is undeniable that humans are open to discussing or debating taste and to being persuaded rationally, whether through requests to consider a different perspective, to note previously missed features, or to understand how a wine's characteristics cohere, potentially making the entire experience more aesthetically pleasing than the mere sum of its parts. However, the reliance on point scores in such assessments arguably rests on unfounded presumptions.
Woolf highlights the widespread use of “medal thresholds” in wine competitions, which rely on point scores. However, any sound and believable account of group scoring must assume that the judges’ scores are comparable, an assumption that is fantastical, its implausibility merely masked by the common usage of numbers. For instance, imagine that the two of us taste the same St-Joseph under similar conditions and, obliged to give a score, ask for each other’s. If I rate it 90 and you rate it 88, does that definitively mean I enjoyed the wine more, value the craftsmanship more highly, or consider it a better representation of its kind (like northern Rhône, St-Joseph, Syrah…) than you do? Absolutely not. The difference in numbers could merely reflect our differing notions of how much craftsmanship, aesthetic value, or gratification each score conveys. And how could one determine or deduce the reason behind these scoring differences?
The dilemma seen in wine competitions is reminiscent of how doctors ask patients to rate their pain on a scale from one to ten and, from the answer, try to infer their “tolerance” of pain. How can they dismiss the obvious possibility that different patients simply have different interpretations of which number accurately represents a particular sensation? Or, to delve deeper into epistemology, how can one dismiss the possibility that a certain amount of pressure from a vise on one’s finger (or, similarly, the same wine tasting experience) is innately perceived differently by different individuals? Even if neurophysiologists believe they can tackle this challenge, any solution would be contingent on questionable theoretical assumptions.
The pain example also points to a further problem with the 100-point wine scale. If “10” or “100” represents the pinnacle, then awarding that score implies that one cannot conceive of a more intense pain or a more perfect wine. What happens if a wine or a pain exceeds all previous experience? Earlier scores would then have to be recalibrated downward.
The problem of measuring with comparable standards is illustrated by the circumstances that led to Robert Parker’s adoption of the 100-point scale. This scale was previously used in American educational systems, a practice traceable to as early as 1837 at Harvard. A notable historical reference is an 1853 session described in The Western Horticultural Review and Botanical Magazine, which discussed Cincinnati’s esteemed wine growers as well as the use of a similar scale for academic evaluations over a century prior to the launch of The Wine Advocate.
In American education, few grading distinctions are as critical as the one between the grades D and F, where F indicates failure. However, the exact threshold for failing varies significantly by institution, sometimes as low as 50 and sometimes as high as 70. Institutions, and even individual educators, determine their own standards. They tailor assessments so that the resulting grades align with their own judgment of students’ understanding. When test scores diverge widely from a teacher’s expectations of students’ capabilities, these subjective assessments often override the numerical scores and lead to grading on a “curve.” Wine judging may involve a similar reliance on intuition when determining whether a wine merits a gold medal or counts as a “90-point wine.”
Is it possible to achieve mutual calibration? Theoretically, yes. For instance, in the United States, educators who assess Advanced Placement Exams are trained using specific paradigms. However, these are designed for highly consistent test scenarios and are intended to yield only six integer scores. Adapting this method to the varied world of wine rating, where evaluators would have to abandon their customary scoring habits for the sake of compatibility on a 100-point scale, poses a significant challenge. Furthermore, proposing a universal standard for averaging the numerical evaluations of different wine critics seems nearly as dubious as attempting to average the distances between two points measured in kilometers, miles, leagues, and furlongs.