BritDisc article #1124

From britdisc-owner@csv.warwick.ac.uk  Mon Nov  3 11:35:13 1997
Message-Id: <m0xSKWs-0003J4C@wol.ra.phy.cam.ac.uk>
Date: Mon, 3 Nov 97 11:18 GMT
From: mackay@mrao.cam.ac.uk (David J.C. MacKay)
To: britdisc@csv.warwick.ac.uk, frisbee@mrao.cam.ac.uk
Subject: SE regs 97
Sender: owner-britdisc@warwick.ac.uk
Precedence: bulk

------------------------------------------------------------------------
Southeast Regional Outdoor Ultimate Championships
Cambridge, November 1-2 1997 
------------------------------------------------------------------------

Results:
            Team             infera rating
           
          1 UTI              11.2 +/- 0.2
          2 Red Shift        10.9 +/- 0.2
          3 First Touch      10.8 +/- 0.2
          4 Slowhawks        10.2 +/- 0.2
          5 Doh'hawks        10.1 +/- 0.2
          6 Strange Blue 2    9.7 +/- 0.2
          7 Skunks 1          9.5 +/- 0.2
          8 Strange Blue 1    9.2 +/- 0.2
          9 Skunks 2          8.3 +/- 0.2

Spirit of the game: Doh'hawks

------------------------------------------------------------------------

The start of the 97 regionals featured thick fog - fog thicker than
the winds of the 96 regionals were windy.  But all the teams managed
occasionally to find the disc, each other, and the endzones during
their initial games. The sun burst through after an hour or two, and
the weather from then on was pleasant and gentle.

The Spirit in the tournament was really good, and I think everyone had
a good time. Skunks and Mohawks led the Halloween partying at Darwin
College, and UTI and First Touch were able to return to London for
their Halloween parties without adverse effects on their games.

More than half of the teams had a substantial number of beginners, and 
I think that they really enjoyed learning from playing with the 
top teams of the region. 

Many thanks to everyone who came. Thank you for playing your games on time,
and for being clean and tidy guests. I think the Cambridge Rugby Club will 
be willing to have us all again. Maybe a Summer tournament next year?

-------------------------------------------------------------------------

There now follow:

(A) Lost property.
(B) Discussion of the 'infera' software used to process the scores
	and spit out the above ranking and ratings.
(C) Scores of all games.
(D) Photos from the tournament.

-------------------------------------------------------------------------

(A)  =====================================================================
Lost property:	found --
		one key on chain with hospice medallion
		one black hat, one pair black gloves
		three shirts
		one pair black tracksuit trousers

(B)  =====================================================================
Discussion of Infera's ranks
==========================================================================

0) Background: Infera is a program which infers the most probable
 ranking, by ability, of a set of teams based on the scores of any 
 games they have played. It is applicable to any tournament format.
 Teams may have played different numbers of games, the games may have been 
 of different durations, and teams need not have been arranged 
 in equal strength pools. 

 A description of the program can be found on the web here:
 http://wol.ra.phy.cam.ac.uk/ultimate/infera/

 The basic idea is that the score of any one game provides information
 about the relative rank of the two teams.  A single game does not
 settle the matter, though; if two teams are close in ability, the
 game might go either way. So a close win for team A over team B does
 not show for certain that A is better than B; it's just more
 probable.  The longer the game, the more information it gives about
 the teams' relative abilities.  Infera uses probability theory to
 figure out the most probable ranking.
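
 To make the "more probable, not certain" idea concrete, here is a
 minimal sketch in Python. It is NOT Infera's actual model (the web
 page above describes that). It assumes, purely for illustration, that
 each point is won independently with a probability given by a logistic
 function of the gap in ability, and it weighs just two hypotheses: A is
 better than B by a fixed gap, or the reverse. The function names and
 the gap of 0.5 are made up for this example.

```python
import math

def point_win_prob(gap):
    """Chance that A wins one point when A's ability exceeds B's by `gap`.
    The logistic form is an assumption, not taken from Infera."""
    return 1.0 / (1.0 + math.exp(-gap))

def log_likelihood(gap, points_a, points_b):
    """Log-probability of the score under `gap`, treating points as independent.
    The binomial coefficient is dropped: it is the same for every gap."""
    p = point_win_prob(gap)
    return points_a * math.log(p) + points_b * math.log(1.0 - p)

def prob_a_better(points_a, points_b, gap=0.5):
    """Posterior probability of 'A better by gap' versus 'B better by gap',
    starting from equal prior belief in the two hypotheses."""
    log_ratio = (log_likelihood(gap, points_a, points_b)
                 - log_likelihood(-gap, points_a, points_b))
    return 1.0 / (1.0 + math.exp(-log_ratio))

for score in [(6, 5), (12, 10), (15, 3)]:
    print(score, round(prob_a_better(*score), 3))
```

 Under these assumptions a 6-5 win leaves roughly a 62% chance that the
 winner is the better team, a 12-10 win (same ratio, twice as long)
 roughly 73%, and a 15-3 win well over 99%.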

 Infera was first used for real at the 1997 Southeast Regionals in 
 Cambridge. The tournament format was in fact a round robin, so it is
 possible to compare infera's ranking with the rankings given by 
 more traditional methods which can be applied to round robins.

-----------------------------------------------------------------------
Results:
  Team             abbr  infera score  games won  goal difference
1 UTI               UTI  11.2 +/- 0.2          8               67
2 Red Shift          RS  10.9 +/- 0.2          7               44
3 First Touch        FT  10.8 +/- 0.2          6               48
4 Slowhawks          M1  10.2 +/- 0.2          5                8
5 Doh'hawks          M2  10.1 +/- 0.2          4                1
6 Strange Blue 2    SB2   9.7 +/- 0.2          2              -21
7 Skunks 1          SK1   9.5 +/- 0.2          3              -34
8 Strange Blue 1    SB1   9.2 +/- 0.2          1              -43
9 Skunks 2          SK2   8.3 +/- 0.2          0              -70

------------------------------------------------------------------------

1) Comparison with goal difference

 After a round robin, one possible way to rank teams is by goal
 difference.  In this tournie, it turned out that infera's rankings
 were almost the same as the rankings you would get from goal
 difference, except Red Shift and First Touch (who have goal
 differences 44 and 48) came out switched.  Maybe the reason that Red
 Shift had a slightly poorer goal difference than FT is that UTI
 really pulled out the stops in their last game, against RS. And this
 final game was longer in duration by 33% than all other games in the
 tournament, so this game has a slightly disproportionate effect on
 the ranking by goal difference.
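
 The effect of that last game on goal difference can be checked by hand
 from the scores in section (C). A small Python sketch (the variable and
 function names are illustrative) that totals Red Shift's goal
 difference with and without the game against UTI:

```python
# Red Shift's eight results, copied from the scores in section (C):
# (opponent, goals for, goals against)
rs_games = [
    ("M1", 7, 3), ("SB1", 13, 1), ("FT", 8, 6), ("M2", 9, 4),
    ("SB2", 13, 1), ("SK2", 13, 3), ("SK1", 13, 2), ("UTI", 3, 15),
]

def goal_difference(games, skip=None):
    """Sum of (goals for - goals against), optionally leaving out one opponent."""
    return sum(gf - ga for opp, gf, ga in games if opp != skip)

print(goal_difference(rs_games))              # 44, as in the table
print(goal_difference(rs_games, skip="UTI"))  # 56 before the last game
```

 So before the last game Red Shift stood at +56, ahead of First Touch's
 +48; the 15-3 result alone accounts for the switch.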

2) Comparison with traditional (win/lose) rankings.

 Another traditional (and rather crude) performance measure for a round robin
 is number of games won. In this tournament, it gives a clean ranking
 of the teams, but a _different_ one from infera's.  Skunks1 come ahead
 of SB2 by `games won', because the SK1/SB2 result was SK1 6 SB2 5.
 But Infera gave SB2 a score of 9.7 +/- 0.2, and SK1 a score of 9.5
 +/- 0.2.

 So, why did infera put SB2 slightly ahead of SK1?  The SB2/SK1 result
 was obviously close.  (Only a draw could have been closer,
 and the hooter happened to go during an odd point.)  So to rank SB2
 relative to SK1, Infera takes into account not only the SK1/SB2
 result, but also the results against other teams, and the ranks of
 those other teams.

 So let's look at the other scores of SK1 and SB2 ...

SK1 3  UTI 13    SB2 2  UTI 13
SK1 2  RS 13     SB2 1  RS 13
SK1 5  FT 13     SB2 8  FT 10   <<<<<<<<<<<<<<<<
SK1 6  M1 11     SB2 6  M1 13
SK1 4  M2 13     SB2 4  M2 7    <<<<<<<<<
SK1 6  SB1 4     SB2 9  SB1 6
SK1 9  SK2 3     SB2 13 SK2 1   <

Clearly, SB2 did much better against FT and against M2.

It's because of these strong results that SB2 was ranked a tiny bit higher.
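
The same comparison can be put as numbers. The Python sketch below
takes the seven pairs of results in the table above and totals each
team's margin against their common opponents. The names and the
5-goal cut-off for a "much better" result are choices made for this
example only; this is a plain tally, not what Infera computes.

```python
# (opponent, SK1 for, SK1 against, SB2 for, SB2 against), from the table above
common = [
    ("UTI", 3, 13, 2, 13),
    ("RS",  2, 13, 1, 13),
    ("FT",  5, 13, 8, 10),
    ("M1",  6, 11, 6, 13),
    ("M2",  4, 13, 4, 7),
    ("SB1", 6, 4,  9, 6),
    ("SK2", 9, 3, 13, 1),
]

sk1_total = sum(f - a for _, f, a, _, _ in common)
sb2_total = sum(f - a for _, _, _, f, a in common)
print(sk1_total, sb2_total)   # -35 -20

# Opponents against whom SB2's margin beat SK1's by 5 goals or more
big_gaps = [opp for opp, sf, sa, bf, ba in common
            if (bf - ba) - (sf - sa) >= 5]
print(big_gaps)               # ['FT', 'M2', 'SK2']
```

Against the same seven opponents SB2 finished 15 goals better off than
SK1, and the three large gaps are the three rows marked with arrows.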

Which is the fairer ranking? Infera reports what it reckons is the
_most_probable_ ranking, and it takes into account more than just the
simple win/lose outcome. Its estimate is that it is more probable,
given all the results, that SB2 was stronger than SK1, and that the
SK1/SB2 game happened to go the other way, rather than the alternative
hypothesis, that SK1 is better than SB2, but SB2 managed by fluke to
get a much better result against FT. What do you think? 

==========================================================================

3) Counterfactuals concerning the final:

Just as the close outcome of the single SB2/SK1 game was overruled by
evidence from other games, the ranking by infera of the number 1 and 2 teams
isn't simply determined by the result of the "final" game that they
play against each other. 

So people might be interested to know:

What would have happened if the score in the final had been closer?  I
have plugged in a few alternative scores (the true score was UTI 15,
RS 3) to see how big a win the finalists needed to guarantee the
number 1 ranking. (RS went into the final with a slightly higher
ranking than UTI, on the basis of the previous 35 games.)

If the score had been UTI 15, RS 14 then the ranks would have come out:

1         RS  11.08  
2        UTI  11.04  
3         FT  10.82  
4         M1  10.19  
5         M2  10.14  
6        SB2   9.68  
7        SK1   9.52  
8        SB1   9.18  
9        SK2   8.35  

So winning by one point would not have been enough for UTI to be
ranked number 1 (though it must be emphasised that the difference
between 11.08 and 11.04 is utterly tiny - and a sensible idea would be
to have the option of calling the overall outcome a tie when the
differences are so small relative to the remaining uncertainty).
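
One way such a tie rule could look, as a Python sketch. Everything
here is an assumption for illustration: the function name, the 0.1
threshold, and the choice to compare each team only with the team
directly above it. The ratings are the hypothetical ones listed above
for a 15-14 final.

```python
def shared_ranks(ratings, threshold=0.1):
    """Rank teams from highest rating down; a team shares the rank of the
    team just above it when their ratings differ by less than `threshold`."""
    ordered = sorted(ratings.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    prev_rating = None
    prev_rank = 0
    for position, (team, rating) in enumerate(ordered, start=1):
        if prev_rating is not None and prev_rating - rating < threshold:
            rank = prev_rank      # too close to call: share the rank
        else:
            rank = position
        ranks[team] = rank
        prev_rating, prev_rank = rating, rank
    return ranks

hypothetical = {"RS": 11.08, "UTI": 11.04, "FT": 10.82, "M1": 10.19,
                "M2": 10.14, "SB2": 9.68, "SK1": 9.52, "SB1": 9.18,
                "SK2": 8.35}
print(shared_ranks(hypothetical))
```

With these numbers RS and UTI share first place, FT is third, and M1
and M2 share fourth. Because only neighbours are compared, a long run
of closely spaced teams would all share one rank; a rule based on the
reported uncertainties would be a natural refinement.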

The critical score in this case is 15-13. If UTI won by more than two
points, then they got the number 1 ranking from infera.

We could ask, why this difference? Why did Red Shift come into the
final with a head start in the rankings? There are two simple
explanations: [i] UTI accidentally turned up late for their game with
Mohawks 1, and generously conceded five points, making the final score
13-7 instead of 13-2. [ii] UTI played a friendly game with Skunks 2
(with Nick Haslam switching sides) which ended with a score of 8-2.
If we 'correct' these two exceptional events, by entering the Mohawks
game as a 13-2 result, and, say, omitting the UTI/Skunks2 game from
the data, we find that it is now _UTI_ who enter the final with the
highest rank, and Red Shift would have had to beat them better than
15-13 for infera to be persuaded that Red Shift were the number 1
team.
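
Preparing such what-if data is just a matter of editing the list of
scores before it is handed to the ranking program. A Python sketch of
that bookkeeping follows; the helper name and its arguments are
invented for this example, and how Infera itself is run on the result
is not shown.

```python
# UTI's results as recorded in section (C): (team, goals, team, goals)
uti_games = [
    ("UTI", 13, "SB1", 3), ("UTI", 13, "M1", 7), ("UTI", 13, "SK1", 3),
    ("UTI", 13, "M2", 5), ("UTI", 9, "FT", 5), ("UTI", 8, "SK2", 2),
    ("UTI", 13, "SB2", 2), ("UTI", 15, "RS", 3),
]

def with_changes(games, replace=None, omit=()):
    """Return a new list of games with some scores replaced and some games
    left out. `replace` maps an opponent to a substitute
    (UTI goals, opponent goals); `omit` lists opponents whose game is dropped."""
    replace = replace or {}
    edited = []
    for team_a, goals_a, team_b, goals_b in games:
        if team_b in omit:
            continue
        if team_b in replace:
            goals_a, goals_b = replace[team_b]
        edited.append((team_a, goals_a, team_b, goals_b))
    return edited

# What if the final had been 15-14?
closer_final = with_changes(uti_games, replace={"RS": (15, 14)})

# The two 'corrections': M1 game entered as 13-2, SK2 friendly left out
corrected = with_changes(uti_games, replace={"M1": (13, 2)}, omit=("SK2",))

for game in corrected:
    print(*game)
```

The printed lines have the same layout as the scores in section (C),
so they can stand in for UTI's original lines in the full list.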

In conclusion,

(1) I think infera worked just fine and gave rankings that made
 complete sense. I'll put in data from other tournaments if people
 send it to me in the right format.

(2) When close hypothetical scores (eg 15-13 or 13-15) are put in for
 a game between two teams (eg the final), the outcomes of other games
 could sometimes overrule the outcome of that game. To ensure that
 these effects are not spurious, I would recommend that when infera is
 used, teams should not have penalties put onto their score for
 turning up late to games, or totally failing to show up; this mucking
 with the scores is the sort of thing which might on rare occasions
 cause infera to be confused. It should be easy to find other ways to
 penalise late teams, if necessary!  Incidentally, I didn't include
 any penalties in the tournament rules, and almost all the games at
 the 97 Cambridge tournament ran on time.

(3) It might be good to declare two teams to have equal rank, if their
 infera ratings are closer than, say, 0.1.

(4) If you have any comments on infera, I may well have responded to
 them already on the web pages I mentioned above. There is a huge
 number of ways you can use infera; for example, if you want to tell
 it only the win/lose/draw outcomes, instead of the actual scores, you
 may. My chief reason for recommending it is that it allows you to choose
 arbitrary tournament formats no matter how many teams turn up and
 what games are played, and still easily get rankings out at the end
 of the day. But you could use it, for example, to rank teams in a
 long term league of several tournaments (even if some teams have
 attended different numbers of tournaments). Alternatively you could
 use infera to determine the number of tour points allocated to teams
 for their performance at a tournament. Infera gives each team a rating
 such that teams judged to have been very close end up with similar
 ratings, and teams that are far better than the others get
 proportionally bigger ratings. Instead of using some arbitrary fixed
 numbers like 1st=200, 2nd=120, 3rd=80, the infera ratings would
 return numbers that reflect how close the number 2 team came to the
 number 1 team, etc.
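
 As an illustration of the tour points idea, here is one possible
 mapping from ratings to points, sketched in Python. The linear
 rescaling (best team gets 200, weakest gets 0) and the function name
 are assumptions made for this example; nothing above fixes a
 particular formula. The ratings are the ones from this tournament.

```python
ratings = {"UTI": 11.2, "RS": 10.9, "FT": 10.8, "M1": 10.2, "M2": 10.1,
           "SB2": 9.7, "SK1": 9.5, "SB1": 9.2, "SK2": 8.3}

def tour_points(ratings, top_points=200):
    """Linearly rescale ratings so the best team gets `top_points` and the
    weakest gets 0. The linear rule is an assumption made for this example."""
    best, worst = max(ratings.values()), min(ratings.values())
    span = best - worst
    if span == 0:
        # All teams rated equal: give everyone the same points
        return {team: top_points for team in ratings}
    return {team: round(top_points * (r - worst) / span)
            for team, r in ratings.items()}

print(tour_points(ratings))
```

 With this rule RS and FT come out 7 points apart (179 and 172),
 instead of the 40 points that separate 2nd and 3rd under a fixed
 200/120/80 scheme.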

(C) ============== Scores ============================
 Tournament schedule can be seen here:
           http://wol.ra.phy.cam.ac.uk/ultimate/schedule/9.html
 Here are the scores:
# saturday nov 1 97 #
RS 7 M1 3
RS 13 SB1 1
RS 8 FT 6
RS 9 M2 4
FT 13 SB1 3
SB1 11 SK2 4
UTI 13 SB1 3
SK2 2 M2 10
SB2 13 SK2 1
FT 13 SK2 0
UTI 13 M1 7
# note: M1 were given 5 points by UTI as an apology for being late
M1 13 SB2 6
SB2 8 FT 10
SK1 6 SB2 5
SK1 6 M1 11
UTI 13 SK1 3
M2 13 SK1 4
UTI 13 M2 5
# sunday
UTI 9 FT 5
RS 13 SB2 1
M2 8 SB1 0
M2 7 SB2 4
RS 13 SK2 3
M1 10 SB1 5
SK1 6 SB1 4
SK2 1 M1 9
RS 13 SK1 2
UTI 8 SK2 2
FT 13 M2 2
UTI 13 SB2 2
FT 13 M1 3
SK1 9 SK2 3
M1 8 M2 5
SB1 6 SB2 9
FT 13 SK1 5
# Final ( to 15 points )
UTI 15 RS 3
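
The "games won" and "goal difference" columns in section (B) can be
rebuilt from the lines above. A Python sketch follows; the function
name is illustrative, and the parsing simply assumes the layout seen
here: "team goals team goals", with lines starting with '#' treated
as comments.

```python
from collections import defaultdict

SCORES = """\
# saturday nov 1 97 #
RS 7 M1 3
RS 13 SB1 1
RS 8 FT 6
RS 9 M2 4
FT 13 SB1 3
SB1 11 SK2 4
UTI 13 SB1 3
SK2 2 M2 10
SB2 13 SK2 1
FT 13 SK2 0
UTI 13 M1 7
# note: M1 were given 5 points by UTI as an apology for being late
M1 13 SB2 6
SB2 8 FT 10
SK1 6 SB2 5
SK1 6 M1 11
UTI 13 SK1 3
M2 13 SK1 4
UTI 13 M2 5
# sunday
UTI 9 FT 5
RS 13 SB2 1
M2 8 SB1 0
M2 7 SB2 4
RS 13 SK2 3
M1 10 SB1 5
SK1 6 SB1 4
SK2 1 M1 9
RS 13 SK1 2
UTI 8 SK2 2
FT 13 M2 2
UTI 13 SB2 2
FT 13 M1 3
SK1 9 SK2 3
M1 8 M2 5
SB1 6 SB2 9
FT 13 SK1 5
# Final ( to 15 points )
UTI 15 RS 3
"""

def tabulate(text):
    """Return {team: [games won, goal difference]} from the score lines."""
    table = defaultdict(lambda: [0, 0])
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        team_a, goals_a, team_b, goals_b = line.split()
        goals_a, goals_b = int(goals_a), int(goals_b)
        table[team_a][1] += goals_a - goals_b
        table[team_b][1] += goals_b - goals_a
        if goals_a > goals_b:
            table[team_a][0] += 1
        elif goals_b > goals_a:
            table[team_b][0] += 1
    return dict(table)

for team, (won, diff) in sorted(tabulate(SCORES).items(),
                                key=lambda kv: -kv[1][0]):
    print(f"{team:4s} {won} {diff:+d}")
```

Sorting by games won puts SK1 (3 wins) above SB2 (2 wins), which is
the one place where the win count and the infera ranking disagree.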

(D) =====================================================================
	Photos
=========================================================================

A few photos of fog, discs, teams and human pyramids are on the web here:

http://wol.ra.phy.cam.ac.uk/ultimate/pics/seregs97/

==========================================================================
David J.C. MacKay        email: mackay@mrao.cam.ac.uk                     
                           www: http://wol.ra.phy.cam.ac.uk/mackay/
Cavendish Laboratory,      tel: (01223) 339852 fax: 354599  home: 276411
Madingley Road,                 international code: +44 1223
Cambridge CB3 0HE. U.K.   room: 982 Rutherford Building