Freebies Dept: The conveyor belt keeps on conveying...

This is the main discussion section. Grab yer cups! All hands on deck!

Freebies Dept: The conveyor belt keeps on conveying...

Post by Capn Jimbo »

How can this be? Wherein lies this largesse?

Perhaps the Rum Project is the most intelligent rum website on the net, not least due to our small but extremely competent cadre of posters, moi excepted. Or perhaps not. And it's not like we hate rum here - quite the opposite. The Project remains entirely independent and free of commercial influence. Last, and as for the reviews here, they are honest in the sense that most rums' scores fall nicely into a credible normal distribution, the well-known bell curve: just a few great rums, likewise just a few really awful ones, and most somewhere in the middle. Like everything else in life...

Everything tends to center roughly about an average, with ever fewer rums the farther you move from the median. That's life. Still, of the perhaps 200 spirits we've reviewed, I can remember only two that were offered to us. So how is it the Arctic Wolf gets literally hundreds of freebies, one after another in conveyor belt fashion? It's like the scene in "I Love Lucy" where Lucy and Ethel get jobs at a candy factory and prove totally inept - especially at wrapping chocolates - because the speeding conveyor belt has them stuffing chocolates into their mouths, blouses and hats. And so it is for the ravenous predator.

Now it's not the goal to pick on this reviewer, but seeing as how he publishes reviews like Wham-O sold hula hoops, the matter demands our attention. Recently he listed all his rum review scores on a single page, thus saving untold hours and making an analysis both easy and possible. And since this site had not been reviewed for a very long time, it seemed obligatory to determine whether possible bias was still evident. Or not.


Take a look...

Based on his reviews of 119 "aged" rums, if his scoring was normal, it'd look like this...

[Chart: an idealized normal distribution of the 119 scores]

Yup, the old bell curve we all know and love, especially if we were on the right side of it, lol.

Nice and normal, with scores centered around the median, and ever fewer rums as scores become higher or lower. In the real world this correlates with A-B-C-D-F, 1 to 5 stars, 50-100, or Poor-Fair-Average-Good-Great. "Average" is understood to be "C", "3 stars" or "75" in by far the most common and best understood scales (all based on the Standard American Grading System). So how did this reviewer's scores arrange themselves on the same common scale?

[Chart: the actual distribution of this reviewer's 119 scores]

I'll save you the spreadsheet, but if no horrible mistakes were made, the average score is an amazing 87.6, about 13 points higher than most would expect. Worse yet, based on the usual scoring systems there wasn't a single rum found "poor", only one found "fair", and just a paltry five rums were identified as what most of us understand as "average". At the same time 74 rums fell into the typical "good" range, and an absolutely astounding 39 products scored in the stratospheric "great" range of 90-99!

Is it any wonder that he gets freebies? The distributors and distillers know that the Wolf's actual "average" score is close to ninety points, and further that 94% of his reviews will fall into what most reviewers consider the top two categories - "good" or "great", the standard 4-star and 5-star ranges.

But there's a problem here. These scores are not based on a standard breakdown, but on a very unusual system of the reviewer's own design. Thus an "88" here will not compare to the "88" of the BTI, F. Paul Pacult, Ralfy, Robert Parker or that of almost any other reviewer, all of whom rely on a version of the American standard system.


Is this by design?

You see, what this reviewer has done is to invent his own rather curious scoring system. His scores run from 0 to 100 in eleven increasingly smaller ranges (?). Even though this entire range of 0 to 100 is proposed as available for scoring, the lowest reported score I could find for any aged rum was 68.5 (so apparently the scores and ranges from 0 to 68 don't count). Rums arbitrarily deemed "mixers" seem to be assigned scores from 70 to 84 - the scoring guide says so. The range of 85 to 89 appears to be a crossover range for excellent mixers and possible sippers, while all rums designated "primarily sippers" get at least a 90.

It's quite a mish mosh, and there's really nothing else quite like it on the net. The big curiosity is just why anyone would reject general practice and the well-established de facto standards that everyone understands.

There's no way to really compare this MacGyver'd, duct-tape scoring system to the usual and customary reviews we all know and expect from most other reviewers. That said, my dear friends, you may now understand why the actual average score reported by this reviewer is 87.6, and also why the distillers can't seem to send him product fast enough. They know that, given the public's general understanding and acceptance of the standard ranges (F to A, 1 to 5 stars, 50-100), this site's average of 87.6 sounds a whole lot better than the "C", "3 stars" or "75" the same rum might get elsewhere.


But fair is fair...


After all, it may just be happenstance that this reviewer's scoring runs so darned high. He may have been completely well-intentioned. So if I may, I've devised a conversion chart to make it easier for monkeys to properly compare his scores to the scales used by practically every other reviewer of almost any product:

1-Star: This reviewer's 69-74
2-Star: 75-80
3-Star: 81-87
4-Star: 88-93
5-Star: 94-100

Kudos to Robert Parker. The only completely unbiased way to reinterpret this reviewer's scores is to take his actual reported range - from his lowest of 69 up to 100 - and subdivide it into five equally spaced sub-ranges. We can then fairly compare his own actual range and scores and see how, or if, they are weighted.

By doing so it's then possible to make a fair comparison with the de facto American standard systems we all expect: this reviewer's scores may now be fairly compared via his own actual range, divided into the usual five equal sub-ranges.
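For anyone who'd like to check the arithmetic, a minimal sketch in Python follows. The helper names and example scores are mine, purely for illustration; the bin edges simply come from slicing the 69-to-100 range into five equal pieces of 6.2 points each, which, rounded, reproduces the Conversion Chart above.

```python
# Dividing the reviewer's actual range (low of 69, top of 100) into
# five equal sub-ranges, per the conversion described above.
# Helper names and example scores are hypothetical.

LOW, HIGH, BINS = 69.0, 100.0, 5
WIDTH = (HIGH - LOW) / BINS  # (100 - 69) / 5 = 6.2 points per star

edges = [round(LOW + i * WIDTH, 1) for i in range(BINS + 1)]
print(edges)  # [69.0, 75.2, 81.4, 87.6, 93.8, 100.0] -> 69-74, 75-80, 81-87, 88-93, 94-100

def stars(score: float) -> int:
    """Map one of this reviewer's scores onto the usual 1-to-5 stars."""
    if score <= LOW:
        return 1  # at or below his own observed floor
    return min(BINS, int((score - LOW) // WIDTH) + 1)

print(stars(90.0))  # -> 4: his "sipper" threshold lands at merely "good"
print(stars(77.0))  # -> 2: a "mixer" score, fair at best
```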

That has now been done. Splice the main brace!

Post by Capn Jimbo »

With all good intentions, there's a furball in my soup...


As noted above, I first analyzed this reviewer's scores based on the usual scales (50-100, 1-5 Stars, A-B-C-D-F, Great/good/average/fair/poor), and what did we get?

[Chart: this reviewer's scores plotted against the standard five ranges]

Not pretty, as there were almost no rums scoring 1 to 3 stars, but a great preponderance of 4 and 5-star scores (94% of them). Not good at all, but are we comparing apples to oranges? After all, this reviewer had created his own customized scoring chart, far removed from almost all other standards, and it needed to be examined on its own. Was there a way to do this? Of course, and it was simply to...

Use his very own actual scores - whatever they were - from his actual low to his highest possible score, and divide them into five equal ranges. His reported scores turned out to run from a low of 69 to a high of 98; for the conversion, this actual range was fairly divided into five equal subranges between 69 and the usual top of 100. I even created a nice Conversion Chart for handy reference:

Conversion Chart

1-Star: This reviewer's 69-74
2-Star: 75-80
3-Star: 81-87
4-Star: 88-93
5-Star: 94-100

Not bad, but the acid test? Let's chart it all again, now using what amounts to his own scores divided into five equal ranges...

[Chart: the same scores re-binned into five equal ranges of his own 69-100 scale]

So what happened? To be fair, by using his own actual low and high, things are a bit better. Still, a notable bias to the high side remains, and there's really no way to get rid of it. Last, and no matter what, we are all forced to compare the reported scores with those of other reviewers - and they are simply not the same.
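For the curious, recreating that last chart takes only a few lines of Python; a sketch, with made-up scores standing in for the real list:

```python
# Tallying scores into the five equal 6.2-point bins over 69-100,
# as in the Conversion Chart above. The score list here is invented;
# the real one lives on the reviewer's summary page.
from collections import Counter

def stars(score: float) -> int:
    return min(5, max(1, int((score - 69.0) // 6.2) + 1))

sample = [72, 78, 84, 86, 88, 88.5, 90, 91, 92, 94, 96]  # hypothetical
tally = Counter(stars(s) for s in sample)
for star in range(1, 6):
    print(f"{star}-star: {tally.get(star, 0)} rum(s)")
```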

This reviewer's "90" is not that of most anyone else. In truth it's a whole lot closer to what others would give a 77, ie a pretty average score. So what to do? Simple. His average is 87.6, compared to a standard-scale average of 75 - a difference of 12.6 points.

Thus perhaps the fairest conversion is also the simplest: subtract 13 points from his reported scores, and realize they may still run high.

I'd give that 87.5 howls...




*******
Special Question: How the hell does anyone actually determine a half point difference on a 100 point scale? Fack if I know...

Post by sleepy »

..."Special Question: How the hell does anyone actually determine a half point difference on a 100 point scale? Fack if I know..."

Well, if you have a sample of 10,000, a difference of 1/2 point might be statistically significant - but meaningful? Damned rarely.

Actually, what you showed is something resembling a log-normal distribution (although excessively skewed) - i.e., take the natural log of his scores and plot them, and you'll get something closer to normal.

This is common when there is more space at the low end of a scale (think body weight) and a moderately hard cap at the high end.

In Wolfie's case, low scores are not permissible unless the rum comes from a sponsor's competitor, while by custom he cannot give more than 100. (Note your review of Pappy!)

...or something.
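If anyone cares to try that transform, a minimal sketch follows - synthetic, right-skewed data standing in for the real score list (numpy and scipy assumed):

```python
# A quick test of the log-normal idea: if scores are roughly log-normal,
# the natural log of the scores should look closer to normal.
# Synthetic data here, standing in for the real score list.
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(42)
scores = rng.lognormal(mean=np.log(87.6), sigma=0.3, size=119)

print(f"skewness of raw scores: {skew(scores):+.2f}")          # typically well above 0
print(f"skewness of log scores: {skew(np.log(scores)):+.2f}")  # typically near 0
```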

Then again, the first and final question is whether the scale examined measures anything of real interest. Sure, a blind man can numerically rate the color saturation of HD screens, or a deaf man, performances of the Goldberg Variations. Not much useful there.

A better example: IQ. The most repeatable and reliable (in a formal sense) measurement in psychology, but... The universe of IQ tests is based on a set of tasks that almost exclusively male, Euro-American eggheads felt best represented the important cognitive tasks that would cumulatively represent intellectual potential. Sadly, IQ is just plain UNCORRELATED with achievement. How would these "geniuses" do, cumulatively, if dropped deep in the mountains with a pocket knife and 3' of twine?

Oops - they vainly asked the wrong question incorrectly, and through nothing but vanity succeeded in burying their pitiful preconceptions in modern psychology and education.

<dope slaps self> Sorry, got off-topic. But not far.

Simply put, I hang out here because the ratings mean something. Like Tom's coffee reviews at Sweet Maria's, your clear descriptions of rums (known or unknown to me) tell me what to look for when trying something new. Screw the numbers (yours are probably log-normal as well). By highlighting the strengths and weaknesses of a spirit in your brief reviews, you provide something useful. Jimbo number, Wolfie number, IQ - alone, all are useless crap! The information is in the description.




*******
Capn's Log: Great post, see below. Actually, in the de facto American system that has now become ubiquitous in reviewing, there is indeed a hard cap at both ends of the scale. At the high end it's 100, 5 stars, A or Excellent; at the low end it's 50, 1 star, F or Poor. The granddaddy of all systems is the American Standard Grading System of A-B-C-D-F, as adopted by Robert Parker, the father of modern spirits ratings. See below...

Post by Capn Jimbo »

A brief overview...


...might now be worthwhile. The history of rating scales is very interesting and can largely be attributed to Robert Parker of The Wine Advocate. There is a wonderful book called "Questions of Taste: The Philosophy of Wine", edited by Barry Smith - quite a deep and thoughtful work that brings a great deal of light to wine, wine reviewing, and reviewing in general.

Parker first gained fame as the gentleman who went against the mass of the then prestigious wine reviewers when he praised the 1982 Bordeaux as exquisite while all the others thought it a failure. In time he was proved right, and he became the most respected judge and reviewer of wine in the world.

At about that time he established his now famous journal - The Wine Advocate - along with a rating system based on the Standard American Grading System with which all sentient persons are intimately familiar, namely A-B-C-D-F. He felt this was effective for two reasons: first, its distribution is "normal", ie the bell curve with which we are all familiar; and second, the system is by far the most common and is immediately understood by all.

When you get a "C", you know your score was "average", not "good" ("B") or "excellent" ("A"). The highest honor in education, Summa Cum Laude, is earned by only a small percentage of students; if most students earned it, the teacher would be fired. Let's continue. Parker's system was based on a 100-point scale, but all of his reported scores - like the American standard - run from 50 to 100 in five equal subranges, in exact accord with the Standard.
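To make that concrete, here's a tiny sketch of the 50-100 scale in five equal sub-ranges as just described (the helper name is mine, not Parker's):

```python
# The American-standard 50-100 scale in five equal sub-ranges of ten
# points each, per the description above. Helper name is hypothetical.
def letter_grade(score: float) -> str:
    if not 50 <= score <= 100:
        raise ValueError("standard scores run from 50 to 100")
    for grade, floor in (("A", 90), ("B", 80), ("C", 70), ("D", 60)):
        if score >= floor:
            return grade
    return "F"

print(letter_grade(75))  # "C": average, the fat middle of the bell curve
print(letter_grade(95))  # "A": reserved for the truly excellent few
```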

This was so simple and obvious that it was amazing it hadn't happened before. Following his lead, his system - with only minor variations - was so widely adopted as to become the de facto standard for rating reviews of all manner of wines, spirits, cheeses and the like.


So why is a normal distribution important?

Think about it - if a reviewer's scores are all in the 90's, you'd surely have second thoughts about his or her honesty, or ability to distinguish quality. All wines or rums cannot be great. In analyzing and reporting on any group of items, if competently examined, most should earn a middle or average score, with fewer earning higher or lower scores, and very few receiving the highest or lowest scores.

This is the common and statistically expected normal distribution which is true for almost anything: shoe sizes, IQ, penis length (be honest) and yes, spirit scores, lol....

While a reviewer can set up any scale they'd like, it's only really useful and fully understandable if it follows the American standard. For example, a reviewer could score rums from 1029 down to 990, with "990" being the best score. Really, you could do that. But in reality such a custom scale would be completely counterintuitive, with the lowest score being the best rum. If on top of that all the rums scored in the low 990's, it would hardly be helpful. Capish? And because such a unique system does not relate to the common American standard, any score this reviewer reported would be hard to quote, to understand, or to compare.
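And note that merely quoting such a score on the familiar scale takes a conversion all its own; a sketch, where the backwards 1029-to-990 scale is of course the hypothetical one above:

```python
# Linearly rescaling the imagined backwards 1029-to-990 scale
# (990 is best) onto the familiar 50-100 scale (100 is best).
def to_standard(score: float, worst: float = 1029.0, best: float = 990.0) -> float:
    fraction = (worst - score) / (worst - best)  # 0.0 at worst, 1.0 at best
    return 50.0 + 50.0 * fraction

print(to_standard(990.0))            # 100.0: their best is our best
print(to_standard(1029.0))           # 50.0: their worst is our floor
print(round(to_standard(992.0), 1))  # 97.4: "low 990's" crowd the very top
```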


So when is a "90" not a 90?


The problem with the instant reviewer's scale is that his "90" is almost everyone else's "77". To him an 85 is a mixer, and primary sippers only begin at 90. Almost any rum will garner an 87 - a very good number under normal circumstances. But the general public, expecting standard-system scoring, simply can't compute these numbers. Ask yourself: is it accidental that his bias is toward scores many will interpret as "good" and "excellent", with fewer "average" scores? Is it coincidental that he receives hundreds of free samples? That he was invited to visit Guyana's El Dorado? That he was likewise invited to judge at the Rum Renaissance and became one of Burr's inner circle?

You decide.

Spirits - their sales, marketing, distribution and promotion - have become a Big Boyz Club, where the price of admission is an "...it's all good or great" attitude. The only difference among commercially oriented reviewers is where and how to draw the line so as to get the product and invitations they need. Is this the case here?

You decide. In our case we are rarely offered samples, and we don't seek them out - and is anyone really surprised? On the one or two occasions when we did get a sample, I can report a guilty sense of obligation: it's harder to be honestly negative with a freebie. We like our independence, and our ability to provide a place where honest and intelligent people can share ideas and perspectives without fear or intimidation.

But I digress.


A need for common scales? Yes!

Using a common scale is particularly important because most consumers compare ratings among different reviewers, most of whom use the most common system - the de facto American standard - and whose reviews are fairly and believably distributed. Most of us seek out a reviewer or two we come to respect, and use his or her scores to point us toward new "good" or "great" finds.

For example, it is natural to follow the numbers and to seek out and read especially the 4 and 5-star reviews. No one hopes to add to their 3-star collection, do we? Instead we tend to seek the top-quality, especially wonderful and interesting spirits suggested by a reviewer we know and trust, and whose distributions are normal.

So "points" absolutely do matter as a form of shorthand, but only in context.


A standard system and points alone are not enough...

This is Sleepy's point and it's quite valid.

Even Parker would agree, and takes Sleepy's point even further, to wit: "There can never be any substitute for your own palate nor any education better than tasting the wine yourself". Still, it is fair to say that in this economy most of us are hesitant to spend good money experimenting in Parker's fashion, and first try to get a handle on the possibilities via a good and fair review.

If a reviewer you respect publishes a fascinating and appealing review along with a rare great score, well, that gets our attention and probably our dollars. A low score and a disenchanting review have exactly the opposite effect.


Speaking of reviewers and trust...

Robert Parker is one of the few reviewers' reviewers, and perhaps the most trusted name in wine, for a number of reasons:

1. He truly loves wine and is both a taster and a passionate drinker.

2. He considers himself a consumer advocate.

3. He is identified as a gentleman of honor.

4. His publication is by subscription and he does not take advertising.

5. He does not accept gifts or invitations to visit.

6. He does not speculate in wine.

7. He insists on tasting in the privacy of his own home, alone and without any social or business pressure.

8. His reports are simple and understandable, without the typical vague verbosity.

9. He challenges the system and does not respect or participate in the politics or hierarchies of wine, but remains stubbornly independent.

10. He has a reputation for accuracy and consistency, honesty and independence.

Descriptions alone are not enough. I once posted three reviews which by description appeared to be of three very different rums. All were of the very same rum. Without a simple point score it would be far more difficult to assess a candidate for purchase. We want our trusted reviewer to commit, and to put an overall number on it as a measure of confidence.

To me the ideal review is by a reviewer whose distributions are normal enough that I know his or her best scores and high recommendations are truly earned; whose reviews are accessible, competent, precise and understandable; and last, whose tastes match my own to some degree. I am speaking of the Parkers, and of reviewers like Robinson, Jackson, Broom and Ralfy, to name a few. There are others - my apologies.

Once we have been fortunate enough to identify a trusted reviewer, we need to be open-minded enough to then learn from him or her, and to take a chance now and again.