[eDebate] My thoughts 50 point scale at Wake

William J Repko repkowil at msu.edu
Thu Nov 1 19:27:19 CDT 2007


For reasons that I'll spell-out in a moment, I do not support a long-term
move to a 50 point scale.

That said, the most important item I'd like to raise in this post is that --
irrespective of my personal opinion of the system -- I will abide by the
guidelines set-forth by the tournament.

Point # 1 Going unilateral = bad

Seems to ruin the experiment. Even if you sense the experiment will prove a
bad idea, doesn't it still seem useful to have good data on the
effectiveness of the *experiment* ?...

Also seems to defy the basic idea of being *invited* to a tourney. Wake
asked us to their party -- we should strive to be reasonably gracious
guests.

Point # 2 Trying to avoid accidentally jacking the scale

There simply won't be too many iconoclasts actively seeking to ruin the new
scale.

But, I fear non-iconoclasts could accidentally hurt the experiment. In hopes
of sparking a consistent read of the new scale, I've included my read of it
below. If people read Ross's scale differently, let's talk about it before --
not after -- the tourney.

Assuming the scale Ross posted earlier today holds:
(http://www.ndtceda.com/pipermail/edebate/2007-November/072820.html)

a) I'll probably issue zero or next-to-zero 49's or 50's. More likely the
former.

It's fairly easy to "imagine a better performance". If the tabsheet has as
many 50's as it typically has 30's, or (gulp) shows as many 49's as 29's,
then people probably missed the point.

b) I won't import the 30 point model.

There are debaters to whom I consistently issue a 28.5, but who are sorta
"closer" to a 29.0 than a 28.0.

Suppose one of these debaters really stepped-it-up. I might have given them
a 29.0, but I don't think I am going to give them a 49.0.

c) I'll keep in mind that roughly 30 teams break to elims of the NDT.

Thus the second category "NDT elim worthy performance" is larger than one
might expect.

In fact, in *many* debates I judge, at least one person has at least an
"early NDT elim round performance".

Therefore, I think it's more likely-than-not that most of my ballots will
award at least one of the competitors a 47 (or higher).

That said, a close read of Ross's scale has a 47 applying to BOTH the bottom
of the second tier and the top of the third tier. This makes sense, as
roughly as many teams clear at WFU as clear at the NDT.

It seems, then, that a 48 (for me) will be issued to students who give a
performance worthy of quarters or later at the NDT/Wake. A 47 will be for
something akin to an octas/doubles performance. A 46 seems to be warranted
if someone put-forth a "bubble-clearing" performance for a typical major.
A 45 feels like a performance akin to that of a strong 4-4ish team or a
weak 5-3ish team.

d) If I judge two completely likeable but inexperienced debaters, I will
not be afraid to issue 42 points.

If I sense they put-forth a 2-6ish style showing, that seems to be a
generous read of the new scale.

MANY such points should be issued at the Wake tourney.

To put it another way, it would be highly abnormal to judge a 4 round
commitment at the Wake tournament and NOT issue MORE THAN one set of 43's
(or lower). You are QUITE LIKELY to judge a few teams that are currently
"below-average" relative to the field. If you refuse to give such points,
you are inflating the scale.

Point # 3 -- the reason I do not favor a long-term move to a 50-point scale.

I favor a different experiment. It uses a revised judge-variance scheme for
issuing speaker points (and as the first tiebreaker for clearing).

Ross's post begins with a critique of "judge variance".

I think judge variance is only meaningless b/c the sample size for variance
is currently set to track only variance *within* the Shirley tournament (or
any given tourney).

I, however, feel that judge variance can be tweaked. I'd prefer an
experiment that uses season-long (or, even better, career-long) judge
variance by accessing archives from Bruschke's system.

Under this system, it would not matter if the Russian judge (Hardy) gives
everyone "low-points" (9.3 or lower to every gymnast). What matters is that
variation is meaningful within that judge's scale.
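
A rough sketch of what "variation within that judge's scale" could look
like in practice. The post does not spell out a formula, so this assumes
the simplest reading: express each score as the number of standard
deviations above or below that judge's own long-run average. The function
name, judge labels, and point histories are all made up for illustration;
they are not drawn from the Bruschke archives or any real ballots.

```python
from statistics import mean, pstdev

def judge_relative_points(history, judge, raw_points):
    """Return raw_points in standard deviations from this judge's own mean.

    `history` maps a judge label to every score that judge has issued
    (season-long or career-long, depending on what the archive holds).
    This z-score reading is an assumption, not the post's stated method.
    """
    past = history[judge]
    spread = pstdev(past)
    if spread == 0:
        return 0.0  # judge always gives the same score, so no signal
    return (raw_points - mean(past)) / spread

# Hypothetical archives for two invented judges (not real data).
history = {
    "low_pointer": [27.0, 27.5, 27.5, 28.0, 28.0, 28.5],
    "candy_judge": [28.5, 29.0, 29.0, 29.0, 29.5, 29.5],
}

# The same raw 29.0 from each judge.
low = judge_relative_points(history, "low_pointer", 29.0)
candy = judge_relative_points(history, "candy_judge", 29.0)
print(low > candy)
```

Under these invented numbers the 29.0 from the low-pointer lands well above
that judge's average, while the same 29.0 from the other judge lands
slightly below theirs. The larger the archive per judge, the less a single
odd tournament moves the baseline, which is the sample-size point above.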

To contextualize, many of us like Aaron Hardy as a judge and we KNOW that a
29.0 from him means SO MUCH MORE than a 29.0 from someone who gives them out
like candy. Why should speaker awards (or clearing) be a referendum on the luck
of which judges you got during the prelims ?... Worse, why should it
discourage us from pref-ing some of our favorite critics solely b/c they
tend to be "low-pointers" ?..

The 50 point scale remedies none of this -- and only would do so if you
believed that inflation is somehow NOT the result of very human and very
foreseeable long-term variables.

From diving to gymnastics, points have always gradually inflated.

To me, the 50 point scale may be a useful experiment -- but is ultimately
cut from the same cloth. In time, the scale will ride-up for some, but only
some. We will be left with comparable frustrations.

The heart of the issue is that it is exceptionally difficult to get a large
group of judges to look at any speaker-point scale in the same way. It is
more workable, however, to ask them to have stare decisis within their own
scale.

At worst, I fear the 50 point experiment will be counter-productive. I even
think there may be a disad to the perm of trying both experiments.

Specifically, I fear it may trade-off with a broader move towards using
(career-long) judge variance.

I know of two tournaments that have both been seriously toying with the idea
of loading all of the Bruschke archives, creating a far larger pool of data
from which to draw "variance", and using career-long judge variance as the
standard for speaker awards and-or clearing.

One such tournament director opted against this experiment b/c the Wake
system (50 points) changes the baseline and makes variance a touch odd.

Another tournament director may still proceed with the experiment, but would
need to exclude the Wake '07 data (which is a really, really large and
useful pool for sample size).

In the end, I will support the Wake system -- over time, Ross (and others)
have used the Wake tourney to experiment and I think the community has grown
because of it.

But, I would encourage other tourneys to consider the proposal on the table.

It seems to move-away from the unworkable notion that we will read the scale
the same, and move-towards a model where we read OUR OWN scale consistently.


-- Will




