Distributing Election Volunteers In Polling Stations

There’s an upcoming election in my country, and I’m a member of the governing body of one of the new parties. As we have a lot of focus on technology (and e-governance), our internal operations are also benefiting from some IT skills. The particular task at hand these days was to distribute a number of election day volunteers (that help observe the fair election process) to polling stations. And I think it’s an interesting technical task, so I’ll try to explain the process.

First – data sources. We have an online form for gathering volunteer requests. And second, we have local coordinators that collect volunteer declarations and send them centrally. Collecting all the data is problematic (to this moment), because filling the online form doesn’t make you eligible – you also have to mail a paper declaration to the central office (horrible bureaucracy).

Then there’s the volunteer preferences – in the form they’ve filled whether they are willing to travel, or they prefer their closest poling station. And then there’s the “priority” polling stations, which are considered to be more risky and therefore we need volunteers there.

I decided to do the following:

  • Create a database table “volunteers” that holds all the data about all prospective volunteers
  • Import all data – using apache CSV parser, parse the CSV files (converted from Google sheets) with the 1. online form 2. data from the received paper declarations
  • Match the entries from the two sources by full name (as the declarations cannot contain an email, which would otherwise be the primary key)
  • Geocode the addresses of people
  • Import all polling stations and their addresses (public data by the central election commission)
  • Geocode the addresses of the polling stations
  • Find the closest polling station address for each volunteer

All of the steps are somewhat trivial, except the last part, but I’ll still explain in short. The CSV parsing and importing is straightfoward. The only thing one has to be careful is have the ability to insert additional records on a later date, because declarations are being received as I’m writing.

Geocoding is a bit trickier. I used the OpenStreetMap initially, but it managed to find only a fraction of the addresses (which are not normalized – volunteers and officials are sometimes careless about the structure of the addresses). The OpenStreetMap API can be found here. It’s basically calling http://nominatim.openstreetmap.org/search.php?q=address&format=json with the address. I tried cleaning up some of the addresses automatically, which lead to a couple more successful geocodings, but not much.

The rest of the coordinates I obtained through Google maps. I extract all the non-geocoded addresses and their corresponding primary keys (for volunteers – the full name; for polling stations – the hash of a semi-normalized address), parse them with javascript, which then invokes the Google Maps API. Something like this:

<script type="text/javascript" src="jquery.csv.min.js"></script>
<script type="text/javascript">
	var idx = 1;
	function initMap() {
        var map = new google.maps.Map(document.getElementById('map'), {
          zoom: 8,
          center: {lat: -42.7339, lng: 25.4858}
        });
        var geocoder = new google.maps.Geocoder();

		$.get("geocode.csv", function(csv) {
			var stations = $.csv.toArrays(csv);
			for (var i = 1; i < stations.length; i ++) {
				setTimeout(function() {
					geocodeAddress(geocoder, map, stations[idx][1], stations[idx][0]);
					idx++;
				}, i * 2000);
			}
		});
      }

      function geocodeAddress(geocoder, resultsMap, address, label) {
        geocoder.geocode({'address': address}, function(results, status) {
          if (status === 'OK') {
            $("#out").append(results[0].geometry.location.lat() + "," + results[0].geometry.location.lng() + ",\"" + label.replace('"', '""').trim() + "\"<br />");
          } else {
            console.log('Geocode was not successful for the following reason: ' + status);
          }
        });
      }
</script>

This spits out CSV on the screen. Which I then took and transformed with regex replace (Notepad++) to update queries:

Find: (\d+\.\d+),(\d+\.\d+),(".+")
Replace: UPDATE addresses SET lat=$1, lon=$2 WHERE hash=$3

Now that I had most of the addresses geocoded, the distance searching had to begin. I used the query from this SO question to come up with this (My)SQL query:

SELECT MIN(distance), email, names, stationCode, calc.address FROM
(SELECT email, codePrefix, addresses.address, names, ( 3959 * acos( cos( radians(volunteers.lat) ) * cos( radians( addresses.lat ) )
   * cos( radians(addresses.lon) - radians(volunteers.lon)) + sin(radians(volunteers.lat))
   * sin( radians(addresses.lat)))) AS distance
 from (select address, hash, stationCode, city, lat, lon FROM addresses JOIN stations ON addresses.hash = stations.addressHash GROUP BY hash) as addresses
 JOIN volunteers WHERE addresses.lat IS NOT NULL AND volunteers.lat IS NOT NULL ORDER BY distance ASC) as calc
GROUP BY names;

This spits out the closest polling station to each of the volunteers. It is easily turned into an update query to set the polling station code to each of the volunteers in a designated field.

Then there’s some manual amendments to be made, based on traveling preferences – if the person is willing the travel, we pick one of the “priority stations” and assign it to them. Since these are a small number, it’s not worth automating it.

Of course, in reality, due to data collection flaws, the above idealized example was accompanied by a lot of manual labour of checking paper declarations, annoying people on the phone multiple times and cleaning up the data, but in the end a sizable portion of the volunteers were distributed with the above mechanism.

Apart from being an interesting task, I think it shows that programming skills are useful for practically every task nowadays. If we had to do this manually (and even if we had multiple people with good excel skills), it would be a long and tedious process. So I’m quite in favour of everyone being taught to write code. They don’t have to end up being a developer, but the way programming helps non-trivial tasks is enormously beneficial.

3 thoughts on “Distributing Election Volunteers In Polling Stations”

  1. What is the number of volunteers? (The question is inspired by the prospect of minimizing the total distance between volunteers and polling stations.)

  2. That was not a concern. We had volunteers in fewer than 1/8th of the polling stations

  3. HI, the problem is definitely every interesting and it is a sub problem of a traveling salesman problem.

    If you are interested, take a look https://csv2geo.com. This a batch geocoder tool that will take unparsed addresses and help you assemble them in any way you want. Then will process them and the final result will be full valid address with latitude and longitude attached to it. The tool is very cheap – starts from free and then dirt cheap.
    It can save you ton of work to process addresses in advance.
    The geolocation is always live and accurate.

    Another tool that can help you is https://walklists.com
    That is a combination of a batch geocoder and canvassing tool.
    The output is a multi pdf/electronic maps
    that groups together map markers around something. It your case could be a station, etc…

Leave a Reply

Your email address will not be published. Required fields are marked *