Up: Issue 7 Next: Paper 3 Previous: Paper 1

Voting matters - Issue 7, September 1996

Large elections by computer

B A Wichmann

Introduction

By a large election, in this article we mean elections in which there are a large number of candidates, say over 100. Such an election was reported in reference 1, in which the periodicals to be retained in a library were to be decided. In that case, the Meek algorithm was used, but on re-running the same data with the Newland-Britton (ERS) rules, a disturbing fact was noted. Towards the end of the count, none of the remaining candidates were credited with any votes at all, so that the last few 'seats' were filled at random from the remaining candidates. This was quite inappropriate, since the number of journals that received some support in the votes was more than enough to fill all the places. Hence Woodall has defined the property No-support in reference 2 to cover this issue.

In this paper, we are concerned not with the limiting case of the ERS rules electing candidates without support, but with other large elections in which some candidates are elected despite having less than half the quota. In such situations, it might appear that the ERS rules might elect the 'wrong' person. Unfortunately, it is not easy to devise a means of determining the 'right' choice. Here we use random ballot papers with some characteristics of a real election.

UKCC

The United Kingdom Central Council for Nursing, Mid-wifery and Health Visiting (UKCC) election is the largest one conducted by Electoral Reform Ballot Services Ltd (at least, using STV). It is possible that other such elections could arise of this type if multi-national organisations undertake employee-council elections to satisfy the 'Social Chapter'.

The data from the last UKCC election is impressive: 129 candidates for 7 seats with 62,216 ballot papers. The election is conducted to the ERS rules assisted by Rosenstiel's program.

Mr Wadsworth of ERBS has kindly given me the information above and also a print-out from the Rosenstiel program which gives for the seven elected candidates:

            First   Stage    Votes
         Preference  when     when
Candidate   Votes/ elected   elected/
            Quota            Quota

A          112.8%      1     100.0%
B           17.3%    122      47.7%
C           19.8%    121      53.2%
D           21.0%    121      50.5%
E           16.0%    123      34.9%
F           11.1%    123      36.4%
G           11.0%    123      42.6%

The concern here is that since one candidate was elected on only about one third of the votes that had to be retained by the most popular candidate, can one be sure the correct choice was made? The result of that particular election is not being questioned, but the choice of algorithm for elections of this type.

Computer processing

Since the computer programs to conduct elections are not used for the large public elections, there is no experience in using these programs for very large elections. As noted above, Rosenstiel's program was used for UKCC, but this program is for assisting a manual count, and could not be used for the Meek algorithm (for instance). Although the programs for the ERS rules (by I D Hill) or that for Meek do not have hard limits, it is not immediately obvious that they could be used for elections as large as that for UKCC.

To determine the feasibility of using these programs on a PC (personal computer) for elections like UKCC, a program was written to construct a large number of random ballot papers. Of course, real ballot papers were not available, and even if they were, the data preparation problem would be formidable.

At this point, a major problem arose. Both Meek and the ERS computer programs allow for the storage of the complete set of preferences. If this information is written to temporary disc storage, then the programs will run quite slowly. However, the total storage for UKCC-like elections is around 8Mbytes, which is only just within the reach of current PCs. The obvious solution was to undertake modifications to both programs to take advantage of the fact that only a small fraction of the total possible number of preferences would be specified. In fact, the modification to Meek was very easy and undertaken, but that for ERS (which is much more complex in computer terms) was too difficult. In any case, both programs were successfully run with random data on my home computer.

The conclusion from this study was that running the Meek or ERS rules on a modern PC would be possible for large elections. However, program modifications would be desirable to ensure that the programs kept within system limits. It was also observed that both programs produced result files which were excessive in size (and too big to print with convenience). With the preferences kept in main store, the time both programs took to execute was limited by the speed of processing; moreover, it was linear in the number of ballot papers. The time taken for the programs on my home computer was about 500 seconds per 10,000 papers for Meek and about ten times faster than that for the ERS rules. These times are clearly minor compared with the data preparation overheads in undertaking such counts.

Random UKCC-like data

Having determined that it is feasible to undertake UKCC-like elections on a computer with either Meek or the ERS rules, we now wish to see if there is a significant risk of either algorithm producing the 'wrong' result.

For this part of the study we use simulated data with only 1,000 papers, rather than the 62,000 that were actually recorded for UKCC. The reason for this reduction is to save on the computer time required, since many elections must be analysed (in fact, 100 elections were used). However, to have a realistic chance of determining the effect of using either algorithm, it is clear that the ballot papers must adhere to some of the characteristics of the real data.

The method used to construct the papers was to use a random number generator, but to use some of the characteristics of the UKCC election to determine the distribution functions used. The two major parameters are the popularity of each candidate and the length of each ballot paper. We can estimate the popularity of each candidate in the real election by means of their (known) number of first preference votes. Hence the popularities of the candidates in the simulated elected were adjusted so that the leading candidate had more than the quota of first-preference votes, candidates numbered 2 to 20 had reducing popularity of 95% of the previous candidate, and the remaining candidates had a constant popularity of 95% of the 20th candidate. The reason for this constant tail is that if the 95% rule was carried on, it was observed that the lower candidates had virtually no votes at all.

The distribution of the length of preferences chosen was as follows: For those expressing a single preference: 8.0% of the papers; for two preferences, 8.7%; for 3: 9.4%; for 4: 10.1%; for 5: 10.9%; for 6: 11.6%, for 7: 12.3%, and for 8 to 11 preferences: 7.2%. This distribution increases linearly to 7, the number of candidates then drops to a constant amount.

We can now compare a randomly produced set of papers with those above from UKCC. In this case with random ballots, the quota becomes 1000/8=125, instead of 62216/8=7777 for the real election. The table entries below and for the comparative table for UKCC are expressed in proportion to the quota to give directly comparable data.

            First   Stage    Votes
         Preference  when     when
Candidate   Votes/ elected  elected/
            Quota            Quota

A          129.6%      1     100.0%
B           16.8%    119      53.3%
C           10.4%    121      46.1%
D           13.6%    121      46.9%
E           12.0%    121      45.3%
F            8.8%    121      44.1%
G            9.6%    121      42.2%

The pattern is clearly similar. We need not be concerned about minor differences, since the study is of elections of this general type. To generate each set of ballot papers merely requires as input the three integer seeds for the random number generator. In consequence, all the data presented here which is based upon a set of 100 elections can be recomputed from 300 integers. The seeds for the election in the above table were 1, 1 and 18.

Comparative tests: Meek versus ERS

We now have the ability to generate large election data and process the results with two algorithms: Meek and the ERS rules. The remaining problem is to determine characteristics of the results which would decide between the two. In fact, four different tests were applied as follows:

Non-transferables: In this test, the number of non- transferable votes of each algorithm are compared. The 'better' algorithm is the one which gives the lower figure.
Condorcet: In this test, we take those elections produced in which the two algorithms elected different candidates. We then compare the first candidate elected by Meek who was not elected by ERS with the first candidate elected by ERS who was not elected by Meek. The comparison is by Condorcet. Since there is no correlation between the votes for different candidates, the winning algorithm for this test is the one which has the higher number of Condorcet winners.
No-hopers: In this test, we eliminate the candidates with no realistic hope of being elected, namely the candidates numbered 21-129, so there are 20 candidates. Again, since there is no correlation between the votes of different candidates, the winning algorithm for this test is the one for whom this change makes the least difference. In other words, one is expecting the removal of the no- hope candidates to make no difference. Steadiness: This test is that specified by I D Hill in reference 3. The test is applied when there is only one pair of candidates elected differently by Meek and ERS. The election is then re-run with only 8 candidates. The winner is the algorithm for which this makes the least difference to the result.

At this point, the author thinks that readers should reflect upon the tests above. If the results are against your favourite algorithm, will you be convinced that your algorithm should not be used for such elections?

We now consider the results of each of these tests:

Non-transferables: There is a consistent pattern with the number of non-transferable votes with each algorithm which can be summarised as follows:
Meek 559.0 (±18.8); ERS 482.6 (±13.7); Meek/ERS 1.159 (±0.031); where the range represents two standard deviations. Hence Meek consistently gives 16% more non-transferable votes.
Condorcet: Out of the 100 elections constructed with the random ballot papers, 30 produced a different result. Hence for these 30, the Condorcet test could be applied. The results were that for 24 cases, Meek elected the Condorcet winner, and for 6 cases, ERS elected the Condorcet winner.
No-hopers: In this test, we wish to know if the elimination of the no-hope candidates changed those that were elected. For Meek no change occurred for any of the 29 cases examined, but there were changes for all but three cases with ERS. Hence Meek is a clear winner here.
Steadiness: This test is applied to the 29 cases in which there was one difference between the two algorithms. To pass the test, the result of the election with just eight candidates must be the same as for the full election. Meek passes the test for all of the 29 cases, and ERS 6 times (and failed 23 times). Again, Meek is the clear winner of this test.

The above analysis understates the differences between the two algorithms. Of the 29 cases that can be compared for steadiness, the following table indicates how the results compared in 20 cases:

               Meek Elects   ERS Elects

129 candidates  [S]+A            [S]+B
20 candidates   [S]+A            [S]+A
8 candidates    [S]+A            [S]+A

Here [S] represents a set of six candidates and A and B are different candidates not in the set [S]. In other words, ERS reverts to the Meek result when the no-hope candidates are removed, and this reversion is retained when only 8 candidates are considered. This is clearly strong evidence that the full election using the ERS rules produces the 'wrong' result.

Conclusions

The study indicates that it is feasible to use computer algorithms such as Meek on a PC for elections as large as that for UKCC (although the data preparation problem has not been considered). Furthermore, a comparison between Meek and ERS shows that Meek is superior except for the number of non-transferable votes. The increased number of non-transferable votes is clearly secondary to producing the 'correct' result, and from that perspective, the Meek algorithm appears to be superior. The fact that random papers from no-hope candidates can change the result is strong evidence against the ERS rules.

Of course, this study only relates to elections with a large number of candidates. It can hardly be considered a criticism of Newland and Britton, since it is doubtful whether they ever conducted an election of the size considered here.

Acknowledgements

I should like to thank Joe Wadsworth of ERBS for providing me with a summary of the UKCC election result 'sheet'. Also, David Hill and Douglas Woodall provided many comments on a draft of this paper which I hope has resulted in a clearer presentation here.

References

B A Wichmann. Two STV Elections. Voting matters, Issue 2, pp7-9, September 1994.
D R Woodall. Properties of Preferential Election Rules. Voting matters, Issue 3. pp8-15, December 1994.
I D Hill. The comparative steadiness test of electoral methods. Voting matters, Issue 3, p5, December 1994.
B L Meek. A New Approach to the Single Transferable Vote. Reprinted in Voting matters, Issue 1, pp1-10, March 1994.
R A Newland and F S Britton. How to conduct an election by the Single Transferable Vote, second edition, ERS, 1976.

Up: Issue 7 Next: Paper 3 Previous: Paper 1