There’s an upcoming election in my country, and I’m a member of the governing body of one of the new parties. As we have a lot of focus on technology (and e-governance), our internal operations are also benefiting from some IT skills. The particular task at hand these days was to distribute a number of election day volunteers (that help observe the fair election process) to polling stations. And I think it’s an interesting technical task, so I’ll try to explain the process.
First – data sources. We have an online form for gathering volunteer requests. And second, we have local coordinators that collect volunteer declarations and send them centrally. Collecting all the data is problematic (to this moment), because filling the online form doesn’t make you eligible – you also have to mail a paper declaration to the central office (horrible bureaucracy).
Then there’s the volunteer preferences – in the form they’ve filled whether they are willing to travel, or they prefer their closest poling station. And then there’s the “priority” polling stations, which are considered to be more risky and therefore we need volunteers there.
I decided to do the following:
- Create a database table “volunteers” that holds all the data about all prospective volunteers
- Import all data – using apache CSV parser, parse the CSV files (converted from Google sheets) with the 1. online form 2. data from the received paper declarations
- Match the entries from the two sources by full name (as the declarations cannot contain an email, which would otherwise be the primary key)
- Geocode the addresses of people
- Import all polling stations and their addresses (public data by the central election commission)
- Geocode the addresses of the polling stations
- Find the closest polling station address for each volunteer
All of the steps are somewhat trivial, except the last part, but I’ll still explain in short. The CSV parsing and importing is straightfoward. The only thing one has to be careful is have the ability to insert additional records on a later date, because declarations are being received as I’m writing.
Geocoding is a bit trickier. I used the OpenStreetMap initially, but it managed to find only a fraction of the addresses (which are not normalized – volunteers and officials are sometimes careless about the structure of the addresses). The OpenStreetMap API can be found here. It’s basically calling
http://nominatim.openstreetmap.org/search.php?q=address&format=json with the address. I tried cleaning up some of the addresses automatically, which lead to a couple more successful geocodings, but not much.
This spits out CSV on the screen. Which I then took and transformed with regex replace (Notepad++) to update queries:
Find: (\d+\.\d+),(\d+\.\d+),(".+") Replace: UPDATE addresses SET lat=$1, lon=$2 WHERE hash=$3
Now that I had most of the addresses geocoded, the distance searching had to begin. I used the query from this SO question to come up with this (My)SQL query:
SELECT MIN(distance), email, names, stationCode, calc.address FROM (SELECT email, codePrefix, addresses.address, names, ( 3959 * acos( cos( radians(volunteers.lat) ) * cos( radians( addresses.lat ) ) * cos( radians(addresses.lon) - radians(volunteers.lon)) + sin(radians(volunteers.lat)) * sin( radians(addresses.lat)))) AS distance from (select address, hash, stationCode, city, lat, lon FROM addresses JOIN stations ON addresses.hash = stations.addressHash GROUP BY hash) as addresses JOIN volunteers WHERE addresses.lat IS NOT NULL AND volunteers.lat IS NOT NULL ORDER BY distance ASC) as calc GROUP BY names;
This spits out the closest polling station to each of the volunteers. It is easily turned into an update query to set the polling station code to each of the volunteers in a designated field.
Then there’s some manual amendments to be made, based on traveling preferences – if the person is willing the travel, we pick one of the “priority stations” and assign it to them. Since these are a small number, it’s not worth automating it.
Of course, in reality, due to data collection flaws, the above idealized example was accompanied by a lot of manual labour of checking paper declarations, annoying people on the phone multiple times and cleaning up the data, but in the end a sizable portion of the volunteers were distributed with the above mechanism.
Apart from being an interesting task, I think it shows that programming skills are useful for practically every task nowadays. If we had to do this manually (and even if we had multiple people with good excel skills), it would be a long and tedious process. So I’m quite in favour of everyone being taught to write code. They don’t have to end up being a developer, but the way programming helps non-trivial tasks is enormously beneficial.