Why All The Fear of Electronic Voting?

October 2, 2015

Due to an upcoming referendum in Bulgaria about whether we want “remote electronic voting”, I, as a technical person and at the same time a government adviser, argue a lot about electronic voting. A year and a half ago I gave a very brief overview of what I think has to be done and is doable.

But now I’d like to ask a more general question – why all this fear of electronic voting? I have heard literally hundreds of versions of the same bunch of arguments: anonymity is not guaranteed, someone can change everything with one command, I can’t be sure what happens to my vote, it’s a black box, someone may easily compromise everything with a virus, you’ll be DDoSed, etc.

This, for example, is one giant strawman video. Every single bit in it, spoken very quickly and very assertively, is wrong. Votes can be anonymous, you can verify your vote, there are ways to prevent massive result changes, there are ways to protect even the clients (hardware keypads, virtualized voting environments), good operational security gives you, on top of the essential security, a way to detect every attempt to attack the system, and there are ways to prevent DDoS.

And horribly few people have actually read anything on the topic. I recommend, for example, reading this 136-page report. And these papers: paper, paper, paper, paper, paper, paper.

But you won’t, will you? Because you are sure that it can never be secure, that ballot secrecy cannot be guaranteed, and that you can’t be sure whether your vote is counted – even though the literature hints otherwise. Let me just outline a few key things:

  • One man – one vote. This relies on a national e-id infrastructure, which some countries (like Estonia, Belgium, and soon Bulgaria) have. The id card has a smartcard built in, and that guarantees the one man – one vote principle: a person can only have a single id card, with a single keypair to use for voting.
  • Ballot secrecy. David Chaum has proposed the so-called “blind signature” which, using cryptography, allows the voting officials to “stamp” your vote without seeing it and mark you as “already voted”; in a second step you send your stamped vote, without the identifying information (see the sketch after this list). There’s also the double-envelope approach used in postal voting, which can be applied to electronic voting (but it relies on good operational security). And then there are the anonymous credentials schemes, which I haven’t looked into in detail.
  • Mass replacement of votes. Sending an SQL query to the vote count server is probably what you imagine can happen. OpSec, of course, is tasked with preventing it, but that is not enough – a very determined attacker can break into almost any system. But recent research looks at using the bitcoin blockchain data structure – a de-facto distributed, unchangeable database. The votes on the central server can then be compared to the public ledger, in order to verify that there are no discrepancies. Which is, by the way, what happens with paper voting, as you’ll see below.
  • Black box. Of course, proprietary, closed-source solutions are a no-go. But a fully open-source, peer-reviewed, pilot-tested, in-person tested (as recommended in the report) system is not a black box. It is a fully and constantly auditable system.
  • I don’t know whether my vote is counted at all. That’s a major concern, addressed by a lot of E2E (end-to-end verifiable voting) research. All sorts of approaches exist – for example a receipt that you can later verify against a central system. The receipt doesn’t have to contain the actual vote (because someone may have paid for it and would then want to check), but a number that you get while voting should match a number that you see later on a website. The receipt can be issued via a smartphone app, sms, the screen, or any combination of those for a higher level of assurance.
  • Client side malware. Now that’s hard. But there are ways to address it. Hardware keypads for entering the PIN for the smartcard, with a small screen to show the actual information that is to be signed/encrypted. Then comes multi-factor authentication and validation. You can use a mobile phone, to which receipts (as mentioned above) are sent. If malware replaces your vote (assuming you are allowed to cast replacement votes, methods for which are also described in papers), you’ll get notified. You may even have to cast your vote from two devices – one computer and one smartphone (identification with a smartphone is a separate topic). That way a large-scale malware attack becomes unlikely. Add to that that the client-side software used for voting can be digitally signed, or can be changing itself constantly, and a generalized malware has to target millions of combinations of versions of desktop and mobile OSs, the voting software, etc. And if you instead vote from a remote virtualized environment, to which you log in via a sort of VPN client (with a reasonable assurance on the other end that it is not a fake virtualized environment), then yes – individuals can be targeted, but large-scale attacks may hit a brick wall.
  • How do we avoid coercion and vote-buying in remote, uncontrolled environments? That’s a good question, and although it doesn’t sound technical, it is. First, a biometric factor as part of the identification may defend against mass collection of smartcards for voting. Then there’s the concept of a “panic PIN”, which allows a coerced voter to appear to have voted while instead sending an alarm to the authorities that he is being coerced; this has been discussed in papers as well.
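
To make the blind signature idea a bit more concrete, here is a minimal sketch of Chaum-style RSA blinding in Java (textbook RSA without padding, purely for illustration – a real voting protocol adds padding, proofs and a full protocol around it; all class and variable names are mine):

import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.SecureRandom;
import java.security.interfaces.RSAPrivateKey;
import java.security.interfaces.RSAPublicKey;

public class BlindSignatureSketch {
    public static void main(String[] args) throws Exception {
        // The voting authority's RSA key pair (textbook RSA, no padding - illustration only)
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        KeyPair kp = gen.generateKeyPair();
        RSAPublicKey pub = (RSAPublicKey) kp.getPublic();
        RSAPrivateKey priv = (RSAPrivateKey) kp.getPrivate();
        BigInteger n = pub.getModulus();
        BigInteger e = pub.getPublicExponent();
        BigInteger d = priv.getPrivateExponent();

        // The voter's ballot, encoded as a number smaller than the modulus
        BigInteger ballot = new BigInteger(1, "candidate #3".getBytes(StandardCharsets.UTF_8));

        // 1. The voter blinds the ballot with a random factor r: blinded = ballot * r^e mod n
        SecureRandom rnd = new SecureRandom();
        BigInteger r;
        do {
            r = new BigInteger(n.bitLength() - 1, rnd);
        } while (!r.gcd(n).equals(BigInteger.ONE));
        BigInteger blinded = ballot.multiply(r.modPow(e, n)).mod(n);

        // 2. The authority signs the blinded value without seeing the ballot,
        //    and marks the (identified) voter as "already voted"
        BigInteger blindSignature = blinded.modPow(d, n);

        // 3. The voter unblinds: signature = blindSignature * r^-1 mod n == ballot^d mod n
        BigInteger signature = blindSignature.multiply(r.modInverse(n)).mod(n);

        // 4. Anyone can verify the authority's stamp on the now-anonymous ballot
        boolean valid = signature.modPow(e, n).equals(ballot);
        System.out.println("stamp valid: " + valid);
    }
}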

You probably won’t notice the last recommendation of the report, which says that at the present moment there is no voting system that is secure enough to be deployed for national elections. And that is true (as are the other recommendations). Yes, it is very hard to build a proper e-voting system. You have to take into account at least all the things listed in the 136-page report. And even more. You have to be paranoid and expect state-level attacks, insider attacks, botnets, etc. But that makes it very hard, not “a bad idea” or “impossible”. I’ll quote the comment by Matthew Proorok on the above youtube video:

The thing is, none of this makes electronic voting a bad idea. It makes electronic voting a problem with a lot of hurdles to overcome. After all, you start out the video pointing out that physical voting, too, has its weaknesses. And that attempt after attempt has been made to defraud the system. And that, over time, we’ve found ways to defend against those attempts. Effectively, you’re saying that electronic voting hasn’t had that kind of proving period yet, and thus it’s a bad idea, and thus we shouldn’t use it. That sounds like a great mindset for NEVER DOING ANYTHING NEW.

And at the same time nobody realizes how flawed the paper-based system is, and how the same type of loosely defined arguments can be used against the paper-based system as well. Saying that “we’ve found ways to counter all types of fraud in a paper-based system” is entirely wrong, as I can prove to you if you come and visit just a single Bulgarian election. By the way, do you know that at the moment paper voting results are eventually combined on a computer? Possibly using Excel somewhere. How are we sure these computer systems are not attacked? How are we sure that the computers that send the protocols from the local centers to the central committees are not compromised with malware? There is a paper trail, I hear. Recounting rarely happens, and discrepancies, even if discovered, are often buried, because otherwise the whole election may have to be rerun. My point is, these problems are not inherent only to remote, electronic voting. They exist even now.

So, ultimately, I don’t understand all the fear of e-voting, even from people who are moderately tech-savvy. The mantra “if it’s a computer, it can break” is in fact “if it is anything in the real world, it can break”. But when has that stopped us from progressing and fixing broken systems? (And paper voting is broken; the fact that it doesn’t appear so in western democracies is because society doesn’t exploit it, not because it’s unexploitable.)

But I do understand the psychology that leads to accepting all the pseudo-arguments thrown in the air as a massive FUD campaign (sometimes even a coordinated one, by the way) – it is way easier to throw these fears around than to debunk them one by one, especially when debunking them requires linking scientific papers. It’s easy to tell people “this can’t be done”, because sometimes it sounds counterintuitive that it can, and then it’s hard to explain why it can.

I’m not saying we should all be voting online by now; I’m saying we should push in that direction, and we should agree that this is the direction to push, because it feels like it’s right around the corner and it’s a way to increase participation, especially for future generations, and therefore to enhance not only the legitimacy of democracy, but also the opportunities for more direct democracy.

And it will come down to trust in the system, for which the whole FUD-then-technical-explanation cycle will be repeated many times. But I believe that in due time we will come to trust such systems (as we do many other electronic systems), and that will enable us to do more with our democratic rights.


Common Sense Driven Development [talk]

September 17, 2015

A few months ago I spoke at a conference (jPrime) about common sense driven development (and what it isn’t).

Here’s the video:

And here are the slides:

As you’ll see, I am not a very good and engaging speaker. I hope at least the content is meaningful, although this time it felt a bit chaotic. Nevertheless, the points I’m trying to make are still worth noting, and I hope they help identify the lack of common sense we are often surrounded by.


“Forget me” and Tests

September 10, 2015

Your users have profiles on your web application. And normally you should give them a way to delete their profiles (at least that’s what the European Court has decided).

That “simply” means you need to have a /forget-me endpoint which deletes every piece of data for the current user. From the database, from the file storage, from the search engine, etc. Apart from giving your users at least partial control over their own data (whether you can have it or not is their decision), it is also a benefit for developers.

Apart from your isolated unit tests, containing a lot of mocks, you have other sorts of tests – integration tests, acceptance tests, Selenium tests. All of these need a way to leave the database in the same state it was in before they were executed. In some cases you can use read-only transactions (e.g. with spring-test you get that automatically), or you can use an in-memory database and hope it will work the same way as your production one, or you can drop the database and recreate it on each run. But these are partial solutions with some additional complexity.

The best way, I think, is to just reuse the “forget me” functionality. From your acceptance/selenium tests you can call the /forget-me endpoint at the end of the test (tearDown), and from your integration tests you can invoke the same functionality directly. If you distribute client-side APIs (or a third party is building them against a test deployment of your system), you can again call the forget-me endpoint.
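
A minimal sketch of what that tearDown could look like, assuming JUnit 4 and plain HttpURLConnection – the URL, the userId parameter and the omitted authentication are illustrative assumptions, not a prescribed API:

import org.junit.After;
import org.junit.Test;
import java.net.HttpURLConnection;
import java.net.URL;

public class CheckoutAcceptanceTest {

    // hypothetical test user, created by the test fixture / setUp
    private final String testUserId = "selenium-test-user";

    @Test
    public void userCanCompleteCheckout() {
        // ... drive the UI or the API with the test user ...
    }

    @After
    public void tearDown() throws Exception {
        // delete everything the test user left behind, reusing the production endpoint
        // (authentication against the test deployment omitted for brevity)
        URL url = new URL("https://test.example.com/forget-me?userId=" + testUserId);
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("DELETE");
        if (connection.getResponseCode() != 200) {
            throw new IllegalStateException("forget-me failed, database left dirty");
        }
    }
}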

That, of course, doesn’t cover non-user-related data that you need in the database. If you have such data (apart from enumerations and data that should always be there), you have to take care of it separately.

Doesn’t that bring some additional complexity as well, and the constant need to update your forget-me functionality? Isn’t having read-only transactions, or a shell script that recreates the database after each run, simpler to support? Assuming that you need to have a properly working forget-me functionality anyway – no. It’s better to reuse it. That would also make sure the endpoint is indeed working properly, and your users can be fully forgotten.


The “Software Engineer” Mindset

September 7, 2015

What is a software engineer? And what is a senior software engineer? Many companies define a “senior software engineer” as a person who has spent more than 6 years as a programmer. And that’s not always correct.

The other day I was asked whether I recommend becoming a “generalist” or a “specialist”. Whether one should stay focused on one particular technology and become really proficient with it, or do a little bit of everything. I once wrote that if you do a little bit of everything, you become no expert at all. And while I still partially hold that view, it needs to be elaborated.

The software engineer mindset never results in narrow specialists. But that doesn’t mean you don’t “drill” into a particular technology. In fact, you drill into many particular technologies/frameworks/levels of abstraction. You become proficient with them, then you move on to the next one. Probably with side projects, probably when transitioning from one job to another, where something unfamiliar is used alongside the known bits. Over time, you gather enough experience that each new technology is familiar and you get into it pretty quickly. On the other hand, staying focused on one particular technology for a long time doesn’t let you see the full spectrum of possible solutions to a problem. So no, doing mostly jQuery/Rails/Spring/Android/… for 15 years doesn’t make you a “senior software engineer”.

The software engineer mindset is about solving the problem. The more senior you are, the faster you are in finding simpler solutions. The more technologies you are familiar with, the more non-localized solutions you are able to produce – in a multi-technology project (web, android and iOS frontends, with a java backend, a public API, for example) a solution that looks okay in one particular technology, may be a hack in the rest.

The software engineer mindset is not saying “I don’t know about that, another colleague was doing it”. I’ve been getting answers like this in interviews – people who have even been implementing JSR specs, and only knew the part they had been working on for the past 2 years. How it fits with the rest is what the software engineer should be concerned with.

Isn’t that the role of the architect, some people may ask. But the architect is a role, not a job. Each software engineer with the right mindset and knowledge is an architect, and should be. Maybe one of them will represent the team in front of committees (if such are needed at all), but the top-down architect approach is broken – mostly because an architect-only position doesn’t get to write code and soon loses its grip on reality.

Maybe I’m trying to label what I like doing (going into all parts of the application, from the high-level architecture to the low-level details) as a “software engineering mindset”. And maybe I’m just adding yet another synonym for the “full-stack developer” cliché. Anyway, I think it’s good to encourage people to see the broader technology landscape, and it is equally important to encourage them to spend time focusing on particular problems and technologies. Otherwise they may become one of those architects and seniors who pretend to know a lot, but haven’t actually seen the intricate details. And the devil is in the details. The software engineer has both.


Comments on The Twelve-Factor App

August 22, 2015

The Twelve-Factor App is a recent methodology (and/or a manifesto) for writing web applications that is hopefully getting quite popular. Although I don’t agree 100% with the recommendations, I’ll quickly go through all 12 factors and discuss them in the light of the Java ecosystem, mentioning the absolute “musts” and the points where I disagree. For more, visit the 12factor.net site.

  1. Codebase – one codebase, multiple deploys. This means you must not have different codebases for different versions. Branches are okay, different repos are not. I’d even go further and not recommend Subversion. Not because it isn’t fine, but because git and mercurial do the same and much more. You can use git/mercurial the way you use SVN, but not the other way around. And the tools for DVCS (e.g. SourceTree) are already quite good.
  2. Dependencies – obviously, you must put as many dependencies in your manifests (e.g. pom.xml) as possible. The manifesto advises against relying on pre-installed software, for example ImageMagick or Pandoc, but I wouldn’t be that strict. If your deployments are automated and you guarantee the presence of a given tool, you shouldn’t spend days trying to wrap it in a library for your working language. If it’s as easy as putting an executable script in a jar file and then extracting it, that’s fine. But if it requires installation, and you really need it (ImageMagick is a good example indeed), I don’t think it’s wrong to expect it to be installed. Just check on startup whether it’s present and fail fast if it’s not.
  3. Config – the most important rule here is: never commit your environment-specific configuration (most importantly: passwords) to the source code repo. Otherwise your production system may be vulnerable, as are probably at least a third of these wordpress deployments (and yes, mysql probably won’t allow external connections, but I bet nobody has verified that).

    But from there on my opinion differs from that of the 12-factor app. No, you shouldn’t use environment variables for your configuration. Because when you have 15 variables, managing them becomes way easier if they are in a single file. You could have a shell script that sets them all, but that goes against the OS independence. Having a key-value .properties file (for which Java has native support), and only passing the absolute path to that file as an environment variable (or JVM param), is a better approach, I think. I’ve discussed it previously. E.g. CONFIG_PATH=/var/conf/app.properties, which you load on startup (a sketch follows this list).

    And in your application you can keep a blank app.example.properties which contains a list of all properties to be configured – database credentials, keys and secrets for external systems, etc. (without any values). That way you have all the properties in one place and it’s very easy to discover what you may need to add/reconfigure in a given scenario. If you use environment variables, you’d have to have a list of them in a txt file in order to make them “discoverable”, or alternatively, let the developers dig into the code to find out which properties are available.

    And last, but not least – when I said that you shouldn’t commit properties files to source control, there is one very specific exception. You can choose to version your environment configurations. It must be a private repo, with limited access and all that, but the (Dev)Ops can have a place where they keep the properties and other specifics for each environment, versioned. It’s easier to have that with a properties file (not impossible with env variables, but then again you need a shell script).

    The 12-factor app authors warn about an explosion of environments. If you have a properties file for each environment, these may grow. But they don’t have to. You can change the values in a properties file exactly the way you would manage the environment variables.

  4. Backing Services – this is about treating the external services that your application depends on the same way, regardless of whether you manage them or another party does. From the application’s perspective that should not matter. What I can add here is that you should try to minimize the number of such services. If an in-memory queue would do, don’t deploy a separate MQ. If an in-memory cache would do, don’t deploy a redis instance. If an embedded database would do, don’t manage a DB installation (e.g. neo4j offers an embedded variant). And so on. But if you do need the full-featured external service, make the path/credentials to it configurable as if it were external (rather than, for example, pointing to localhost by default).
  5. Build, release, run – it is well described on the page. It is great to have such a lifecycle. But it takes time and resources to set it up. Depending on your constraints, you may not have the full pipeline, and some stages may be more manual and fluid than ideal. Sometimes, for example in the early stages of a startup, it may be beneficial to be able to swap class files or web pages on a running production server, rather than going through a full release process (which you haven’t had the time to fully automate). I know this sounds like heresy, and one should strive for a fully automated and separated process, but before getting there, don’t entirely throw away the option of manually dropping a fixed file in production. As long as you don’t do it all the time and you don’t end up with a production environment for which you have no idea what version of the codebase is running.
  6. Processes – this is about being stateless, and also about not relying on any state being present in memory or on the file system. And indeed, state does not belong in the code.

    However, there’s something I don’t agree with. The 12-factor preferred way of packaging your assets is at build time (merging all css files into one, for example). That has several drawbacks – you can’t combine assets dynamically, e.g. if you have 6 scripts, and on one page you need 4, while on another page you need 2 of the ones used on the first page plus another 2, then you have to build all these permutations beforehand. Which is fine and works, but why is it needed? There is no apparent benefit. And depending on the tools you use, it may be easier to work with a CDN if you are dynamically generating the bundles.

    Another point where further Java-related details can be given is “sticky sessions”. It’s not a good idea to have them, but note that you can still use your session to store data about the user in memory. You just have to configure your servlet container (or application server) to share that state. Basically, under the hood it still uses a distributed cache like memcached or ehcache (I guess you could also use a redis implementation of session clustering). It’s just transparent to the developer, who can still use the session store.

  7. Port Binding – this is about having your application standalone, instead of relying on a running instance of an application server to which you deploy. While that seems easier to manage, it isn’t – starting a servlet container and pushing a deployment is just as easy. But in order to have your application bind to a port, you need the tooling for that. They mention jetty, and there is also an embedded version of tomcat, and spring-boot (which wraps both); a second sketch after this list shows the spring-boot variant. And while I’m not against port binding, I’d say it’s equally good to have it the other way around. Container configuration is done equally easily, regardless of whether you drop an environment-specific xml file or do it programmatically and load the properties from the file mentioned in point 3. The point is – it doesn’t matter – do whichever is easier for you. Not to mention that you may need some apache/nginx functionality anyway.
  8. Concurrency – it’s about using native processes. This, I think, isn’t so relevant for a Java runtime, which uses threads under the hood and hides away the unix process. By the way, this is another explicit reference to unix (rather than staying OS-independent).
  9. Disposability – that’s about embracing failure. Your system must work fine even if one or more application instances die. And that’s bound to happen, especially “in the cloud”. They mention SIGTERM, which is a *nix-specific signal, whereas the general idea of the 12-factor app is to be OS-independent. There is an apparent leaning towards Linux, which is fine, though.
  10. Dev/prod parity – your development environment should be almost identical to the production one (for example, to avoid some “works on my machine” issues). That doesn’t mean your OS has to be the OS running in production, though. You can run Windows, for example, and have your database, MQ, etc. running on a local virtual machine (like my setup). This also underlines the OS-independence of your application. Just keep in mind to keep the versions the same.
  11. Logs – the 12-factor app recommends writing all logging information to the system out. A Java developer will rightly disagree. With tools like logback/slf4j you can manage the logging aspects within the application, rather than relying on 3rd-party tools to do that – e.g. log rotation and cleanup, or sending to a centralized logging facility. It’s much easier to configure a graylog or splunk adapter than to have another process gather that from system out and push it. There can be environment-specific log configurations, which is again just one file bundled together with the app.properties. If that seems complicated, consider the complications of setting up whatever is going to capture the output.
  12. Admin processes – generally agreed, but in addition I’d say it’s preferable to execute migrations on deployment or startup, rather than manually, and that manually changing “stuff” on production should preferably be done through something like capistrano in order to make sure it’s identical on all instances.
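
To illustrate point 3, here is a minimal sketch of loading a .properties file whose absolute path is passed via the CONFIG_PATH environment variable (or JVM param), as in the example above – the property names and class name are illustrative:

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class AppConfig {

    public static Properties load() throws IOException {
        // CONFIG_PATH=/var/conf/app.properties, set as an env variable or as a -DCONFIG_PATH=... JVM param
        String path = System.getProperty("CONFIG_PATH", System.getenv("CONFIG_PATH"));
        if (path == null) {
            throw new IllegalStateException("CONFIG_PATH is not set - cannot start without configuration");
        }
        Properties config = new Properties();
        try (InputStream in = new FileInputStream(path)) {
            config.load(in);
        }
        return config;
    }

    public static void main(String[] args) throws IOException {
        Properties config = load();
        // all environment-specific values (db credentials, API keys, ...) come from the single file
        System.out.println("db.url = " + config.getProperty("db.url"));
    }
}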
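And to illustrate point 7, a minimal spring-boot application that binds to a port itself instead of being deployed to a separately running container (assuming the spring-boot web starter is on the classpath; the class name is mine):

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@SpringBootApplication
@RestController
public class StandaloneApplication {

    @RequestMapping("/health")
    public String health() {
        return "ok";
    }

    public static void main(String[] args) {
        // starts an embedded servlet container bound to port 8080 by default;
        // the port can be overridden, e.g. with --server.port=9000 or a property
        SpringApplication.run(StandaloneApplication.class, args);
    }
}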

Overall, it’s a good set of advice and an approach to building apps that I’d recommend, with the above comments in mind.


A Software Engineer As a High-Level Government Adviser

August 13, 2015

Two months ago I took the job of adviser to the cabinet of the deputy prime minister of my country (the Republic of Bulgaria, an EU member). And I’d like to share my perspective of a technical person, as well as some of my day-to-day activities which might be of interest.

How does a software engineer get to such a position in the first place? Some people from an NGO that I used to be part of (including myself) communicated our open-source campaign to the interim government and also built the OpenData portal of Bulgaria (based on CKAN). We continued the communication with the newly elected government and helped with opendata-related stuff, so several months later we got a proposal for a part-time advisory position. And I took it, reducing my software engineering job to four hours. I don’t have to mention that hiring a 27-year-old software engineer isn’t something a typical government would do, so that’s progress already.

What do I see? Slow waterfall processes, low-quality software, abandonware. Millions spent on hardware and software licenses which are then underutilized (to say the least). I knew that before, hence the push for open source and more agile processes. Basically, the common perception of the regular citizen is that millions have been spent on the so-called “e-government” and nothing has come out of it. And that’s mostly correct.

To be honest, I cannot yet tell why exactly. Is it the processes, is it the lack of technical expertise on the side of the civil service, is it the businesses’ inability to provide quality software, or is it corruption? Maybe a little bit of each. But as you can imagine, a part-time adviser cannot fix any of these things at scale. So what do I do?

Currently the most important task is finalizing two laws – the changes to the law for e-governance and the law for electronic identification. The former introduces an “e-governance” agency, which will oversee all government software projects in the country, and the latter is about a scheme that allows citizens to be identified online (a step in the direction of my campaign for electronic identification throughout the EU). I’m not a lawyer, so the technical aspects that go into the laws get phrased by lawyers.

I have to say that I’m not the main driver of all this – there are other advisers that do a lot of the work, one of whom is way more technically experienced than me (though not particularly in writing software).

The agency that is being introduced is supposed to act as something like a CIO, and we are defining what it can and must do. Among more strategic things, we also plan to task it with the development of open data, including providing help to all administrations (which is currently something we do, see below), as well as standardizing an open-source development workflow for bespoke software. As I’ve written in a previous post, we already have EU-approved requirements for software – it should be built in the open from day one. And the point of that is the long-term stability of the software projects. Whether it’s going to be pushed directly to GitHub, or replicated there from an on-premise deployment (or vice versa), is a matter of discussion.

The electronic identity is about giving each citizen the means to identify themselves online in order to get services from the government. This includes the right of every citizen to access all the data the government holds about them, and to request correction or possibly even deletion. I am not a Big Brother fan, and I’m trying to push things in a direction where convenience doesn’t mean a breach of privacy.

I try to map the existing infrastructure to an idea of an architecture and act whenever there’s a discrepancy or a missing bit. For example, an important “quest” of mine is to allow each administration to access the data it needs about each citizen online. That may sound like the opposite direction of the last sentence in the previous paragraph, but it isn’t. The government already has that data, and with due procedures each civil servant can access it. What I’m trying to do is automate that access, again preserving all the due legal requirements (civil servants can only access data that they need by law, in order to fulfil a given service), and also keeping a log entry for each access. That log will be visible to citizens when they identify with their e-id card, so whenever someone looks up data about you, you will be notified.

The security aspect is the most serious one and the most overlooked one, so I’m putting a lot of thought into that. Nobody should be able to just get a pair of credentials and read their neighbour’s medical record.

In order to get to such a time-saving, semi-automated solution, I speak to the companies that develop the software in the existing infrastructure and advise on some tweaks. Things are a bit fuzzy, because very minor things (like not using digital signatures to sign information requests) can break the whole idea. And that’s why, I think, a technical person is needed at such a high level, so that we don’t get yet another piece of abandonware just because a hundred lines of code are missing.

Other things that I do:

  • open data – whenever an administration needs technical help with exporting, I should help. For example, I’ve written a php converter from Excel documents to proper CSV, because Excel’s “save as .csv” functionality is broken – it saves files in non-UTF-8 encodings and uses semicolons instead of commas (depending on regional settings). And since much of the data is currently in Excel files, exporting to a machine-readable csv should go through some “correction” script. Another thing is helping with “big, fat” SQL queries to extract relevant data from ages-old databases. So, actual programming stuff.
  • a case study for introducing an electronic document process in the administration of the Council of Ministers. That is more on the business analysis side, but it still needs technical “eyes”.
  • ongoing projects – as mentioned above, I speak to companies that are creating software for the government and give feedback. This is rather “rudimentary”, as I don’t have an official say in what should and what should not be done, but I hope fellow software engineers see it as good input, rather than an attempt at interference.
  • some low-hanging fruit. For example, I wrote an automated test of a list of 600 government websites and it turned out that 10% do not work either with or without “www” in the URL (a sketch of such a check follows this list). Two are already fixed, and we are proceeding to instruct the institutions to fix the rest.
  • I try to generate new project ideas that can help. One of them is the development portal. Currently companies communicate in an ad-hoc way, which means that if you need a library for accessing a given service, you call the other company and they send you a jar via email. Or if you have a question, only you get to know the answer, and other companies must ask for themselves. The dev portal is meant to be a place for providing SDKs for inter-system communication and also to serve as a Q&A site, where answers are accessible to all the companies that work on e-government projects.
  • Various uncategorizable activities, like investigating current EU projects, discussing the budgeting of software projects, writing an egov roadmap, and general “common sense” stuff.
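
The “www” check mentioned above can be as simple as the following sketch – the domain list, timeouts and success criteria here are placeholders for illustration, not the actual script:

import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Arrays;
import java.util.List;

public class WwwCheck {

    public static void main(String[] args) {
        // placeholder list - the real check ran over ~600 government domains
        List<String> domains = Arrays.asList("example.government.bg", "another.government.bg");
        for (String domain : domains) {
            boolean plain = responds("http://" + domain);
            boolean www = responds("http://www." + domain);
            if (!plain || !www) {
                System.out.println(domain + " -> without www: " + plain + ", with www: " + www);
            }
        }
    }

    private static boolean responds(String address) {
        try {
            HttpURLConnection connection = (HttpURLConnection) new URL(address).openConnection();
            connection.setConnectTimeout(5000);
            connection.setReadTimeout(5000);
            connection.setInstanceFollowRedirects(true);
            return connection.getResponseCode() < 400;
        } catch (Exception e) {
            return false; // connection refused, DNS failure, timeout, etc.
        }
    }
}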

I use a private Trello board to organize the tasks, because they are really diverse and I would surely forget some of the 6 ongoing ones. And that’s, by the way, one of the challenges – things happen slowly, so my trello column “Waiting for” is as full as the “In Progress” one. And that’s expected – I can’t just add two points to a draft law and forget about it – it has to follow the due process.

So it may seem that so far I haven’t achieved anything. But “wheels are in motion”, if I may quote my ever more favourite “Yes, Minister” series. And my short-term goal is to deploy usable systems which the administration can really use, in order to make both their own lives and the lives of citizens easier, by not asking people to fill in dozens of documents with data that is already available on a server two streets away. And if it happens that I have to write a piece of code in order to achieve that, rather than go through a 9-month official “upgrade” procedure, I’m willing to do that. And fortunately I feel I have the freedom to.

In software development, starting from a grand plan usually doesn’t get you anywhere. You should start small and expand. But at the same time you should have a grand plan in mind, so that along the way you don’t make stupid decisions. Usable, workable pieces of the whole puzzle can be deployed, and that’s what I’m “advising” (and acting) for. And it’s surprisingly interesting so far. Maybe because I still have my software development job and I don’t get to miss writing code.


Events Don’t Eliminate Dependencies

August 2, 2015

Event (or message) driven systems (in their two flavors) have some benefits. I’ve already discussed why I think they are overused. But that’s not what I’m going to write about now.

I’m going to write (very briefly) about “dependencies” and “coupling”. It may seem that when we eliminate the compile-time dependencies, we eliminate coupling between components. For example:

class CustomerActions {
  private PurchaseService purchaseService;
  private int userId; // the current user, e.g. taken from the session
  void purchaseItem(int itemId) {
    purchaseService.makePurchase(itemId, userId);
  }
}

class CustomerActions {
  private MessageQueue queue;
  private int userId; // the current user, e.g. taken from the session
  void purchaseItem(int itemId) {
    queue.sendMessage(new PurchaseItemMessage(itemId, userId));
  }
}

It looks as though your CustomerActions class no longer depends on a PurchaseService. It doesn’t care who will process the PurchaseItem message. There will certainly be some PurchaseService out there that will handle the message, but the former class is not tied to it at compile time. Which may look like a good example of “loose coupling”. But it isn’t.

First, the two classes may be loosely coupled in the first place. The fact that one interacts with the other doesn’t mean they are coupled – they are free to change independently, as long as the PurchaseService maintains the contract of its makePurchase method.

Second, having eliminated the compile-time dependency doesn’t mean we have eliminated the logical dependency. The event is sent, and we need something to receive and process it. In many cases that’s a single target class, within the same VM/deployment. And the wikipedia article defines a way to measure coupling in terms of the data. Is it different in the two approaches above? No – in the first case we would have to change the method definition, and in the second case – the event class definition. And we would still have a processing class whose logic we may also have to change after changing the contract. In a way, the former class still depends logically on the latter, even though that’s not explicitly expressed at compile time.

The point is, the logical coupling remains. And simply moving it into an event doesn’t give the “promised” benefits. In fact, it makes the code harder to read and trace. While in the former case you’d simply ask your IDE for a call hierarchy, it may be harder to trace who produces and who consumes a given message. The event approach has some pluses – events may be pushed to a queue, but so can direct invocations (through a proxy, for example, as spring does with just a single @Async annotation).
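
For reference, here is roughly what the spring @Async variant looks like – a direct, traceable method call that is executed asynchronously through a proxy, without introducing an event (a minimal sketch; the class names are illustrative):

import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.Async;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.stereotype.Service;

@Configuration
@EnableAsync
class AsyncConfig {
    // enables proxy-based @Async processing with the default executor
}

@Service
class PurchaseService {

    @Async
    public void makePurchase(int itemId, int userId) {
        // executed on a separate thread; the caller still has a plain,
        // traceable compile-time dependency on PurchaseService
    }
}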

Of course, that’s a simplified use-case. More complicated ones would benefit from an event-driven approach, but in my view such use-cases rarely cover the whole application architecture; they are most often better suited to specific problems, e.g. the NIO library. And I’ll continue to perpetuate this common-sense mantra – don’t do something unless you know exactly what benefits it gives you.


Government Abandonware

July 25, 2015

Governments order software for their allegedly very specific needs. And the EU government (The European Commission) orders “visionary” software that will supposedly one day be used by many member states that will be able to communicate with each other.

Unfortunately, a lot of that software is abandonware. It gets built, the general public doesn’t see the results, and it dies. A while ago I listed a couple of abandoned projects of the EC. Some of them, I guess, are really research projects and their results may be important for further development. But I couldn’t find the results. All there is are expired domains and possibly lengthy documents. But 100-page documents about software are not software. I haven’t seen any piece of code whatsoever. The only repo that I found is that of OntoGov, but all it contains are zip files with “bin” in their name.

Even though unused, the software maybe isn’t lost (although the agency behind some of the projects doesn’t exist anymore), and may be utilized by new projects or even businesses. But it’s just not accessible to the general public. It’s probably hidden in a desk, in a dark basement, with a leopard warning on the door (as in The Hitchhiker’s Guide to the Galaxy).

But the problem is not only at the EU level; it exists at the national level as well. Since June I’ve been an adviser to the deputy prime minister of Bulgaria (which I’ll write about another time), and from the inside it’s apparent that software has been built over the years and then forgotten. I’m pretty sure this is true in most other countries as well.

Why is this bad? Not only because of the suboptimal public spending, but also because a lot of effort sinks into a bottomless pit. Software, even if not useful at a given moment, may be used as a base or a building block for future projects that will really be used. Now everything has to start from scratch.

A solution I’ve been advocating for a while is open source. And, getting back to the EU, there’s an “Open Source observatory” to which I’m subscribed, and I get news of all sorts of state-level and EU-level open-source initiatives. And the other day I saw one pretty good example of the state of affairs.

I read that an open-source tool for collaboratively editing laws is available. Which sounded great, because in the past weeks I’ve been participating in law-making, and the process of sending Word documents with “Track changes” via email among a dozen people is not the most optimal editing process a developer like me can imagine. So I eagerly opened the link, and…

It got to the front page of /r/opensource, and eventually, after my request to the admins, the page became accessible. And guess what? There is no link to the code or to the product, and the contact person in the link hasn’t answered my email. Yes, it’s holiday season, but that’s not the point. The point is that this is not real open source. All the dead software from my tweet above is gone and useless, even though, maybe, you can contact someone somewhere to get it. Update: a while later the source finally appeared, but as a zip file, rather than an open-source repo.

Not only that, but currently important EU projects like the ones under e-SENS (an umbrella for several cross-border access projects, e.g. e-procurement, e-justice, e-authentication (Stork)) are practically closed to the general public and even to governments – it took me 10 days to get the reference implementation (which, according to the implementors, will be made public once the instructions for its use are ready, in a few months).

I hope I’m wrong, but my prediction is that these new projects will follow the fate of countless other abandoned projects. Unless we change something. The solution is not just to write “open source” in the title. It should be really open source.

The good news, for Bulgaria at least, is that all new publicly funded projects will have to be open source from day one. Thus not only is the work less likely to die, but the process will also be transparent: what software gets built, with what level of quality, and for how much money.

We are currently in the process of selecting the best solution to host all the repositories – either on-premise or SaaS – and we are also including requirements for supporting this solution in the proposed amendments to the e-governance law.

I hope that this way we’ll get way more reusable code and way less abandonware. And I hope to extend this policy to an EU level – not just claiming it’s open in the title, but being really open. Not for the sake of being open, but for the sake of higher quality. It won’t necessarily prevent abandonware altogether (because useless software gets created all the time, and useful software gets forgotten), but it will reduce the wasted effort.


Tomcat’s Default Connector(s)

July 15, 2015

Tomcat has a couple of connectors to choose from. I’ll leave aside the APR connector, and focus on the BIO and NIO.

The BIO connector (blocking I/O) is blocking – it uses a thread pool where each thread receives a request, handles it, responds, and is returned to the pool. During blocking operations (e.g. reading from the database or calling an external API) the thread stays blocked.

The NIO connector (non-blocking I/O) is a bit more complicated. It uses the java NIO library and multiplexes between requests. It has two thread pools – one holds the poller threads, which handle all incoming requests and push these requests to be handled by worker threads, held in the other pool. Both pool sizes are configurable.

When to prefer NIO over BIO depends on the use case. If you mostly have regular request-response usage, then it doesn’t matter much, and BIO might even be the better choice (as seen in my previous benchmarks). If you have long-living connections, then NIO is the better choice, because it can serve more concurrent users without the need to dedicate a blocked thread to each one. The poller threads handle the sending of data back to the client, while the worker threads handle new requests. In other words, neither poller nor worker threads are blocked and reserved by a single user.

With the introduction of async servlet processing it became easier to have the latter scenario from the previous paragraph. And maybe that was one of the reasons for switching the default connector from BIO to NIO in Tomcat 8. It’s an important thing to keep in mind, especially because they didn’t exactly change the “default value”.

The default value is always “HTTP/1.1”, but in Tomcat 7 that “uses an auto-switching mechanism to select either a blocking Java based connector or an APR/native based connector”, while in Tomcat 8 it “uses an auto-switching mechanism to select either a non blocking Java NIO based connector or an APR/native based connector”. And to make things even harder, they introduced a NIO2 connector. And to be honest, I don’t know which of the two NIO connectors is used by default.
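
If you don’t want to rely on the auto-switching default, you can always pin the protocol explicitly. Below is a minimal embedded-Tomcat sketch; the same protocol class names (org.apache.coyote.http11.Http11NioProtocol, Http11Nio2Protocol, and the blocking Http11Protocol on Tomcat 7/8) can also be put in the protocol attribute of the Connector element in server.xml of a standalone installation. The webapp path is a placeholder:

import org.apache.catalina.connector.Connector;
import org.apache.catalina.startup.Tomcat;

public class ExplicitNioConnector {

    public static void main(String[] args) throws Exception {
        Tomcat tomcat = new Tomcat();

        // explicitly request the NIO connector instead of the auto-switched default;
        // "org.apache.coyote.http11.Http11Nio2Protocol" would select NIO2 instead
        Connector connector = new Connector("org.apache.coyote.http11.Http11NioProtocol");
        connector.setPort(8080);
        tomcat.getService().addConnector(connector);

        // placeholder path to an existing (exploded) web application directory
        tomcat.addWebapp("", "/path/to/webapp");

        tomcat.start();
        tomcat.getServer().await();
    }
}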

So even if you are experienced with tomcat configuration, keep this change of defaults in mind. (And generally I’d recommend reading the documentation for all the properties and playing with them on your servers.)


Blue-Green Deployment With a Single Database

June 23, 2015

A blue-green deployment is a way to roll out incremental updates to your production stack without downtime and without the complexity of properly handling rolling updates (including the rollback functionality).

I don’t need to repeat this wonderful explanation or Martin Fowler’s original piece. But I’ll expand on them.

A blue-green deployment is one where there is an “active” and a “spare” set of servers – the active set running the current version, and the spare set being ready to run any newly deployed version. “Active” and “spare” is slightly different from “blue” and “green”, because one set is always “blue” and one is always “green”, while the “active” and “spare” labels change.

On AWS, for example, you can script the deployment by having two child stacks of your main stack – active and spare (indicated by a stack label) – each having one (or more) auto-scaling group for your application layer, and a script that does the following (applicable to non-AWS setups as well; a sketch using the AWS SDK follows the list):

  • push build to an accessible location (e.g. s3)
  • set the spare auto-scaling group size to the desired value (the spare stays at 0 when not used)
  • make it fetch the pushed build on startup
  • wait for it to start
  • run sanity tests
  • switch DNS to point to an ELB in front of the spare ASG
  • switch the labels to make the spare one active and vice versa
  • set the previously active ASG size to 0
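
As a rough sketch (not a complete script), the core of the switch could look like this with the AWS SDK for Java – the group names, hosted zone id, domain and ELB address are placeholders, and the waiting and sanity-test steps are omitted:

import com.amazonaws.services.autoscaling.AmazonAutoScalingClient;
import com.amazonaws.services.autoscaling.model.SetDesiredCapacityRequest;
import com.amazonaws.services.route53.AmazonRoute53Client;
import com.amazonaws.services.route53.model.*;

public class BlueGreenSwitch {

    public static void main(String[] args) {
        AmazonAutoScalingClient autoScaling = new AmazonAutoScalingClient();
        AmazonRoute53Client route53 = new AmazonRoute53Client();

        // 1. bring up the spare group (its instances fetch the pushed build on startup)
        autoScaling.setDesiredCapacity(new SetDesiredCapacityRequest()
                .withAutoScalingGroupName("app-spare-asg")   // placeholder name
                .withDesiredCapacity(2));

        // ... wait for the instances to become healthy, run sanity tests ...

        // 2. switch DNS to the ELB in front of the (previously) spare group
        ResourceRecordSet recordSet = new ResourceRecordSet()
                .withName("app.example.com.")                // placeholder domain
                .withType(RRType.CNAME)
                .withTTL(60L)
                .withResourceRecords(new ResourceRecord("spare-elb-1234.eu-west-1.elb.amazonaws.com"));
        route53.changeResourceRecordSets(new ChangeResourceRecordSetsRequest()
                .withHostedZoneId("ZXXXXXXXXXXXXX")          // placeholder hosted zone
                .withChangeBatch(new ChangeBatch().withChanges(
                        new Change(ChangeAction.UPSERT, recordSet))));

        // 3. switch the active/spare labels, then scale the previously active group down to 0
        autoScaling.setDesiredCapacity(new SetDesiredCapacityRequest()
                .withAutoScalingGroupName("app-active-asg")  // placeholder name
                .withDesiredCapacity(0));
    }
}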

The application layer is stateless, so it’s easy to do hot-replaces like that.

But (as Fowler indicated) the database is the trickiest component. If you have 2 databases, where the spare one is a slave replica of the active one (and that changes every time you switch), the setup becomes more complicated. And you’ll still have to do schema changes. So using a single database, if possible, is the easier approach, regardless of whether you have a “regular” database or a schemaless one.

In fact, it boils down to having your application modify the database on startup in a way that works with both versions. This includes schema changes – table creation (or the relevant term in the schemaless db), field addition/removal and inserting new data (e.g. enumerations). And it can go wrong in many ways, depending on the data and the datatypes – some nulls, some datatype change that makes a few values unparseable, etc.

Of course, it’s harder to do with a regular SQL database. As suggested in the post I linked earlier, you can use stored procedures (which I don’t like), or you can use a database migration tool. For a schemaless database you must do things manually, but fewer actions are normally needed – you don’t have to alter tables or explicitly create new ones, as everything is handled automatically. And the most important thing is to not break the running version.
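
For example, with a migration tool like Flyway (one option among several) the application can apply pending, backwards-compatible migrations on startup, before the new version starts serving traffic – a minimal sketch using Flyway’s pre-5.x API, with placeholder datasource values:

import org.flywaydb.core.Flyway;

public class StartupMigration {

    public static void main(String[] args) {
        // the migrations themselves (e.g. V12__add_nullable_column.sql) must be written so that
        // both the old (active) and the new (spare) version of the application can work with them
        Flyway flyway = new Flyway();
        flyway.setDataSource("jdbc:postgresql://db.example.com/app", "app", "secret"); // placeholders
        flyway.migrate();

        // ... then bootstrap the rest of the application ...
    }
}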

But how to make sure everything works?

  • test on staging – preferably with a replica of the production database
  • (automatically) run your behaviour/acceptance/sanity test suites against the not-yet-active new deployment before switching the DNS to point to it. Stop the process if they fail.

Only after these checks pass, switch the DNS and point your domain to the previously spare group, thus promoting it to “active”. Switching can be done manually, or automatically with the deployment script. The “switch” can be other than a DNS one (as you need a low TTL for that). It can be a load-balancer or a subnet configuration, for example – the best option depends on your setup. And while it is good to automate everything, having a few manual steps isn’t necessarily a bad thing.

Overall, I’d recommend the blue-green deployment approach in order to achieve zero downtime upgrades. But always make sure your database is properly upgraded, so that it works with both the old and the new version.