Electronic Signature Using The WebCrypto API

June 11, 2017

Sometimes we need to let users sign something electronically. Often people understand that as placing your handwritten signature on the screen somehow. Depending on the jurisdiction, that may be fine, or it may not be sufficient to just store the image. In Europe, for example, there’s Regulation 910/2014, which defines what electronic signatures are. As can be expected from a legal text, the definition is rather vague:

‘electronic signature’ means data in electronic form which is attached to or logically associated with other data in electronic form and which is used by the signatory to sign;

Yes, read it a few more times, say “wat” a few more times, and let’s discuss what that means. And it can mean basically anything. It is technically acceptable to just attach an image of the drawn signature (e.g. using an html canvas) to the data and that may still count.

But when we get to the more specific types of electronic signature – the advanced and qualified electronic signatures, things get a little better:

An advanced electronic signature shall meet the following requirements:
(a) it is uniquely linked to the signatory;
(b) it is capable of identifying the signatory;
(c) it is created using electronic signature creation data that the signatory can, with a high level of confidence, use under his sole control; and
(d) it is linked to the data signed therewith in such a way that any subsequent change in the data is detectable.

That looks like a proper “digital signature” in the technical sense – e.g. using a private key to sign and a public key to verify the signature. The “qualified” signatures need to be issued by a qualified provider, which is basically a trusted Certificate Authority. The keys for placing qualified signatures have to be issued on secure devices (smart cards and HSMs) so that nobody but the owner can have access to the private key.

But the legal distinction between advanced and qualified signatures isn’t entirely clear – the Regulation explicitly states that non-qualified signatures also have legal value. Working with qualified signatures (with smartcards) in browsers is a horrifying user experience – in most cases it goes through a Java Applet, which works basically just on Internet Explorer and a special build of Firefox nowadays. Alternatives include desktop software and local service JWS applications that handle the signing, but smartcards are a big issue and off-topic at the moment.

So, how do we allow users to “place” an electronic signature? I had an idea that this could be done entirely using the WebCrypto API that’s more or less supported in browsers these days. The idea is as follows:

  • Let the user type in a password for the purpose of signing
  • Derive a key from the password (e.g. using PBKDF2)
  • Sign the contents of the form that the user is submitting with the derived key
  • Store the signature alongside the rest of the form data
  • Optionally, store the derived key for verification purposes

Here’s a javascript gist with implementation of that flow.

Many of the pieces are taken from the very helpful webcrypto examples repo. The hex2buf, buf2hex and str2ab functions are utilities (that sadly are not standard in js).

What the code does is straightforward, even though it’s a bit verbose. All the operations are chained using promises and “then”, which to be honest is a bit tedious to write and read (but inevitable I guess):

  • The password is loaded as a raw key (after transforming to an array buffer)
  • A secret key is derived using PBKDF2 (with 100 iterations)
  • The secret key is used to do an HMAC “signature” on the content filled in by the user
  • The signature and the key are stored (in the UI in this example)
  • Then the signature can be verified using: the data, the signature and the key
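
For illustration, the steps above could look roughly like this with the WebCrypto API. This is a minimal sketch, not the gist referenced earlier: the function names, the salt parameter (PBKDF2 requires one, the steps above don't say where it comes from) and the choice of SHA-256 are assumptions. The iteration count defaults to the 100 mentioned above, although a much higher value is advisable for real passwords.

```javascript
// Browsers expose crypto.subtle directly; the fallback lets the sketch run on Node.js too.
const subtle = globalThis.crypto?.subtle ?? require('node:crypto').webcrypto.subtle;
const enc = new TextEncoder();

// Hypothetical helper, similar in spirit to the buf2hex utility mentioned above.
const toHex = (buf) =>
  Array.from(new Uint8Array(buf), (b) => b.toString(16).padStart(2, '0')).join('');

// Load the password as a raw key, then derive an HMAC key from it with PBKDF2.
async function deriveSigningKey(password, salt, iterations = 100) {
  const baseKey = await subtle.importKey(
    'raw', enc.encode(password), 'PBKDF2', false, ['deriveKey']);
  return subtle.deriveKey(
    { name: 'PBKDF2', salt: enc.encode(salt), iterations, hash: 'SHA-256' },
    baseKey,
    { name: 'HMAC', hash: 'SHA-256', length: 256 },
    true, // extractable, so the derived key can optionally be stored
    ['sign', 'verify']);
}

// HMAC "signature" over the submitted content, returned as hex.
async function signContent(key, content) {
  return toHex(await subtle.sign('HMAC', key, enc.encode(content)));
}

// Verification needs the data, the signature and the (derived) key.
async function verifyContent(key, signatureHex, content) {
  const sig = Uint8Array.from(signatureHex.match(/.{2}/g), (h) => parseInt(h, 16));
  return subtle.verify('HMAC', key, sig, enc.encode(content));
}
```

A caller would await `deriveSigningKey(password, salt)`, then `signContent(key, formData)`, and store the returned hex string next to the form data.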

Having the signature stored should be enough to fulfill the definition of “electronic signature”. The fact that it’s a secret password known only to the user may even mean this is an “advanced electronic signature”. Storing the derived secret key is questionable – if you store it, it means you can “forge” signatures on behalf of the user. But not storing it means you can’t verify the signature – only the user can. Depending on the use-case, you can choose one or the other.

Now, I have to admit I tried deriving an asymmetric keypair from the password (both RSA and ECDSA). The WebCrypto API doesn’t allow that out of the box. So I tried “generating” the keys using deriveBits(), e.g. setting the “n” and “d” values for RSA, and the x, y and d values for ECDSA (which can be found here, after a bit of searching). But I failed – you can’t specify just any values as importKey parameters, and the constraints are not documented anywhere, except for the low-level algorithm details, and that was a bit out of the scope of my experiment.

The goal was that if we only derive the private key from the password, we can easily derive the public key from the private key (but not vice-versa) – then we store the public key for verification, and the private key remains really private, so that we can’t forge signatures.

I have to add a disclaimer here that I realize this isn’t very secure. To begin with, deriving a key from a password is questionable in many contexts. However, in this context (placing a signature), it’s fine.

As a side note – working with the WebCrypto API is tedious. Maybe because nobody has actually used it yet, so googling for errors basically gives you the source code of Chromium and nothing else. It feels like uncharted territory (although the documentation and examples are good enough to get you started).

Whether it will be useful to do electronic signatures in this way, I don’t know. I implemented it for a use case where it actually made sense (party membership declaration signature). Whether it’s better than a hand-drawn signature on a canvas – I think it is (unless you derive the key from the image, in which case the handwritten one is better due to its higher entropy).


“Architect” Should Be a Role, Not a Position

May 31, 2017

What happens when a senior developer becomes…more senior? It often happens that they get promoted to “architect”. Sometimes an architect doesn’t have to have been a developer, if they see “the bigger picture”. In the end, there’s often a person that holds the position of “architect”; a person who makes decisions about the architecture of the system or systems being developed. In bigger companies there are “architect councils”, where the designated architects of each team gather and decide wise things…

But I think it’s a bad idea to have a position of “architect”. Architect is a position in construction – and it makes sense there, as you can’t change and tweak the architecture mid-project. But software architecture is flexible and should not be defined strictly upfront. And development and architecture are so intertwined, it doesn’t make much sense to have someone who “says what’s to be done” and others who “do it”. It creates all sorts of problems, mainly coming from the fact that the architect often doesn’t fully imagine how the implementation will play out. If the architect hasn’t written code for a long time, they tend to disregard “implementation details” and go for just the abstraction. However, abstractions leak all the time, and it’s rarely a workable solution to just think of the abstraction without the particular implementation.

That’s my first claim – you cannot be a good architect without knowing exactly how to write the whole code underneath. And no, too often it’s not “simple coding”. And if you have been an architect for years, and so you haven’t written code in years, you are almost certainly not a good architect.

Yes, okay, YOU in particular may be a good architect. Maybe in your particular company it makes sense to have someone sitting in an ivory tower of abstractions and mandating how the peons have to integrate this and implement that. But even in that case, there’s a better approach.

The architect should be a role. Every senior team member can and should take the role of an architect. It doesn’t have to be one person per team. In fact, it’s better to have more. Architecture should be discussed in meetings similar to the feature design meetings, with the clear idea that it will be you who is going to implement the whole thing. Any overdesign (of which architects are often guilty) will have to be justified in front of yourself – “do I want to write all this boilerplate stuff, or is there a simple and more elegant way”.

The position is “software engineer”. The role can be “scrum master”, “architect”, “continuous integration officer”, etc. If the company needs an “architects’ council” to decide “bigger picture” integrations between systems, the developers can nominate someone to attend these meetings – possibly the most knowledgeable. (As many commenters put it – the architect is often a business analyst, a manager, a tech lead, most of the time. Basically “a senior developer with soft skills”. And that’s true – being “architect” is just one of their roles, again.)

I know what the architects are thinking now – that there are high-level concerns that developers either don’t understand or shouldn’t be bothered with. Wrong. If your developers don’t understand the higher level architectural concerns, you have a problem that would manifest itself sooner or later. And yes, they should be bothered with the bigger picture, because in order to fit the code that you are writing into the bigger picture, you should be pretty familiar with it.

There’s another aspect, and that’s team member attitudes and interaction dynamics. If someone gets “promoted” to an “architect”, without being a particularly good or respected developer, that may damage the team morale. On the other hand, one promoted to “architect” can become too self-confident and push design decisions just because they think so, and despite good arguments against them.

So, ideally (and that’s my second claim) – get rid of the architect position. Make sure your senior team members are engaged in architectural discussions and decision making – they will be more motivated that way, and they will have a more clear picture of what their development efforts should achieve. And most importantly – the architectural decisions won’t be detached from the day-to-day development “real world”, nor will they be unnecessarily complicated.


Overview of Message Queues [slides]

May 23, 2017

Yesterday I gave a talk that went through all the aspects of using message queues. I’ve previously written that “you probably don’t need a message queue” – now the conclusion is a bit more nuanced, but I still stand by the simplicity argument.

The talk goes through the various benefits and use cases of using message queues, and discusses alternatives to the typical “message queue broker” architecture. The slides are available here.

One maybe strange approach that I propose is to use distributed locks (e.g. with Hazelcast) for distributed batch processing – you lock on a particular id (possibly organization id / client id, rather than an individual record id) thus allowing multiple processors to run in parallel without stepping on each other’s toes (one node picks the first entry, the other one tries the first, but fails, and picks the second one).

Something I missed as a benefit of brokers like RabbitMQ is the available tooling – you can monitor and debug your queues pretty easily.

I wasn’t able to focus in much detail on any of the concepts – e.g. how brokerless works, or how to deploy a broker (e.g. RabbitMQ) in multiple data centers (availability zones), or how akka (and akka cluster) fits in the “message queue” landscape. But I hope it’s a good overview that lets everyone have a clear picture of the options, which they can then analyze and assess in more detail.

And I’ll finish with the Dijkstra quote from the slides:

Simplicity is prerequisite for reliability


Event Logs

May 12, 2017

Most systems have some sort of event log – i.e. what has happened in the system and who did it. And sometimes it has a dual existence – once as an “audit log”, and once as an event log, which is used to replay what has happened.

These are actually two separate concepts:

  • the audit log is the trace that every action leaves in the system so that the system can later be audited. It’s preferable that this log is somehow secured (will discuss that another time)
  • the event log is a crucial part of the event-sourcing model where the database only stores modifications, rather than the current state. The current state is then obtained by applying all the stored modifications up to the present moment. This allows seeing the state of the data at any moment in the past.

There are a bunch of ways to get that functionality. For audit logs there is Hibernate Envers, which stores all modifications in a separate table. You can also have a custom solution using spring aspects or JPA listeners (that store modifications in an audit log table whenever a change happens). You can store the changes in multiple ways – as key/value rows (one row per modified field), or as objects serialized to JSON.

Event-sourcing can be achieved by always inserting a new record instead of updating or deleting (and incrementing a version and/or setting a “current” flag). There are some event-sourcing-native databases – Datomic and Event Store. (Note: “event-sourcing” isn’t equal to “insert-only”, but the approach is very similar)
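
To make the idea concrete, here is a minimal in-memory sketch of an insert-only log from which the state at any moment is rebuilt. The function names and the event shape (entityId, version, changes, timestamp) are assumptions made for illustration; a real system would persist the events in a database and add snapshots.

```javascript
// Insert-only: every modification is appended with an incremented version,
// nothing is ever updated or deleted.
function append(log, entityId, changes, timestamp) {
  const version = log.filter((e) => e.entityId === entityId).length + 1;
  log.push(Object.freeze({ entityId, version, changes, timestamp }));
}

// The state at a given moment is obtained by applying, in order,
// all modifications stored up to that moment.
function stateAt(log, entityId, moment = Infinity) {
  return log
    .filter((e) => e.entityId === entityId && e.timestamp <= moment)
    .reduce((state, e) => ({ ...state, ...e.changes }), {});
}

const log = [];
append(log, 'basket-1', { status: 'OPEN', items: 1 }, 100);
append(log, 'basket-1', { items: 3 }, 200);
append(log, 'basket-1', { status: 'CHECKED_OUT' }, 300);

console.log(stateAt(log, 'basket-1'));      // { status: 'CHECKED_OUT', items: 3 }
console.log(stateAt(log, 'basket-1', 250)); // { status: 'OPEN', items: 3 }
```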

They still seem pretty similar – both track what has happened on the system in a detailed way. So can an audit log be used as an event log, and can event logs be used as audit logs? And is there a difference?

Audit logs have specific features that you don’t need for event sourcing – storing business actions and being secure. When a user logs in, or views a given item, that wouldn’t issue an update (well, maybe last_login_time or last_seen can be updated, but those are side-effects). But you still may want to see details about the event in your audit log – who logged in, when, from what IP, after how many unsuccessful attempts. You may also want to know who read a given piece of information – if it’s sensitive personal data, it’s important to know who has seen it and whether the access was not some form of “stalking”. In the audit log you can also have “business events” rather than individual database operations. For example “checkout a basket” involves updating the basket status, decreasing the availability of the items, updating the purchase history. And you may want to just see “user X checked out basket Y with items Z at time W”.

Another very important feature of the audit logs is their integrity. You should somehow protect them from modification (e.g. by digitally signing them). Why? Because at some point that may be evidence in court (e.g. after some employee misuses your system), so the more secure and protected it is, the better evidence it is. This isn’t needed for event-sourcing, obviously.

Event sourcing has other requirements – every modification with every field must be stored. You should also be able to have “snapshots” so that the “current state” isn’t recalculated every time. The event log isn’t just a listener or an aspect around your persistence layer – it’s at the core of your data model. So in that sense it has more impact on the whole system design. Another thing event logs are useful for is consuming events from the system. For example, if you have an indexer module and you want to index changes as they come, you can “subscribe” the indexer to the events and each insert would trigger an index operation. That is not limited to indexing, and can be used for synchronizing (parts of) the state with external systems (of which a search engine is just one example).

But due to the many similarities – can’t we have a common solution that does both? An audit log is a poor event-sourcing tool per-se, and event-sourcing models are not sufficient for audit logs.

Since event sourcing is a design choice, I think it should take the leading role – if you use event sourcing, then you can add some additional inserts for non-data-modifying business-level operations (login, view), and have your high-level actions (e.g. checkout) represented as events. You can also get a scheduled job to sign the entries (that may mean doing updates in an insert-only model, which would be weird, but still). And you will get a full-featured audit log with a little additional effort over the event sourcing mechanism.

The other way around is a bit tricky – you can still use the audit log to see the state of a piece of data in the past, but that doesn’t mean your data model will be insert-only. If you rely too much on the audit log as if it was event sourcing, maybe you need event sourcing in the first place. If it’s just occasional interest “what happened here two days ago”, then it’s fine having just the audit log.

The two concepts are overlapping so much, that it’s tempting to have both. But in conclusion, I’d say that:

  • you always need an audit log while you don’t necessarily need event sourcing.
  • if you have decided on using event-sourcing, walk the extra mile to get a proper audit log

But of course, as usual, it all depends on the particular system and its use-cases.


Spring Boot, @EnableWebMvc And Common Use-Cases

April 21, 2017

It turns out that Spring Boot doesn’t mix well with the standard Spring MVC @EnableWebMvc. What happens when you add the annotation is that spring boot autoconfiguration is disabled.

The bad part (that cost me a few hours) is that no guide states that explicitly. In this guide it says that Spring Boot adds it automatically, but doesn’t say what happens if you follow your previous experience and just put the annotation.

In fact, people that are having issues stemming from this automatically disabled autoconfiguration, are trying to address it in various ways. Most often – by keeping @EnableWebMvc, but also extending Spring Boot’s WebMvcAutoConfiguration. Like here, here and somewhat here. I found them after I got the idea and implemented it that way. Then realized doing it is redundant, after going through Spring Boot’s code and seeing that an inner class in the autoconfiguration class has a single-line javadoc stating

Configuration equivalent to {@code @EnableWebMvc}.

That answered my question whether spring boot autoconfiguration misses some of the EnableWebMvc “features”. And it’s good that they extended the class that provides EnableWebMvc, rather than mirroring the functionality (which is obvious, I guess).

What should you do when you want to customize your beans? As usual, extend WebMvcConfigurerAdapter (annotate the new class with @Component) and do your customizations.

So, bottom line of the particular problem: don’t use @EnableWebMvc in spring boot, just include spring-webmvc as a maven/gradle dependency (e.g. through spring-boot-starter-web) and it will be autoconfigured.

The bigger picture here resulted in me adding a comment in the main configuration class detailing why @EnableWebMvc should not be put there. So the autoconfiguration magic saved me doing a lot of stuff, but I still added a line explaining why something isn’t there.

And that’s because of the common use-cases – people are used to using @EnableWebMvc. So the most natural and common thing to do is to add it, especially if you don’t know how spring boot autoconfiguration works in detail. And they will keep doing it, and wasting a few hours before realizing they should remove it (or before extending a bunch of boot’s classes in order to achieve the same effect).

My suggestion in situations like this is: log a warning. And require explicitly disabling autoconfiguration in order to get rid of the warning. I had to turn on debug to see what gets autoconfigured, and then explore a bunch of classes to check the necessary conditions in order to figure out the situation.

And one of Spring Boot’s main use-cases is jar-packaged web applications. That’s what most tutorials are for, and that’s what it’s mostly used for, I guess. So there should be special treatment for this common use case – with additional logging and logged information helping people get through the maze of autoconfiguration.

I don’t want to be seen as “lecturing” the Spring team, who have done an amazing job in having good documentation and straightforward behaviour. But in this case, where multiple sub-projects “collide”, it seems it could be improved.


Distributed Cache – Overview

April 8, 2017

What’s a distributed cache? A solution that is “deployed” in an application (typically a web application) and that makes sure data is loaded from memory, rather than from disk (which is much slower), in order to improve performance and response time.

That looks easy if the cache is to be used on a single machine – you just load your most active data from the database in memory (e.g. a Guava Cache instance), and serve it from there. It becomes a bit more complicated when this has to work in a cluster – e.g. 5 application nodes serving requests to users in a round-robin fashion.

You have to update the in-memory cache on all machines each time a piece of data is updated by a request to one of the machines. If you just load all the data in memory and don’t invalidate it, the cache won’t be “coherent” – it will have stale values and requests to different application nodes will have different results, which you most certainly want to avoid. Or you can have a single big cache server with tons of memory, but it can die – and that may disrupt the smooth operation, so you’d want to have at least 2 machines in a cluster.

You can get a distributed cache in different ways. To list a few: Infinispan (which I’ve covered previously), Terracotta/Ehcache, Hazelcast, Memcached, Redis, Cassandra, ElastiCache (by Amazon). The first three are Java-specific (all JCache compliant), but the rest can be used in any setup. Cassandra wasn’t initially meant to be a cache solution, but it can easily be used as such.

All of these have different configurations and different options, even different architectures. For example, you can run Ehcache with a central Terracotta server, or with a peer-to-peer gossip protocol. The in-process approach is also applicable for Infinispan and Hazelcast. Alternatively, you can rely on a cloud-provided service, like ElastiCache, or you can set up your own cluster of Memcached/Redis/Cassandra servers. Having the cache on the application nodes themselves is slightly faster than having a dedicated memory server (cluster), because it avoids the network overhead.

The cache is structured around keys and values – there’s a cached entry for each key. When you want to load something from the database, you first check whether the cache has an entry with that key (based on the ID of your database record, for example). If a key exists in the cache, you don’t do a database query.

But how is the data “distributed”? It wouldn’t make sense to load all the data on all nodes at the same time, as the duplication is a waste of space and keeping the cache coherent would again be a problem. Most of the solutions rely on the so called “consistent hashing”. When you look for a particular key, its hash is calculated and (depending on the number of machines in the cache cluster), the cache solution knows exactly on which machine the corresponding value is located. You can read more details in the wikipedia article, but the approach works even when you add or remove nodes from the cache cluster, i.e. the cluster of machines that hold the cached data.
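
A toy version of consistent hashing shows why adding or removing nodes is cheap. This is a sketch under assumptions, not how any of the products above implement it: the class and method names are made up, the hash is a simple FNV-1a-style function, and each node is placed on the ring 100 times to even out the distribution.

```javascript
// FNV-1a-style 32-bit hash over UTF-16 code units; real caches use stronger hashes.
function hash32(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

class HashRing {
  constructor(nodes, replicas = 100) {
    this.replicas = replicas;
    this.ring = []; // sorted array of { point, node }
    nodes.forEach((n) => this.add(n));
  }

  // Each node owns several points on the ring.
  add(node) {
    for (let i = 0; i < this.replicas; i++) {
      this.ring.push({ point: hash32(`${node}#${i}`), node });
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  remove(node) {
    this.ring = this.ring.filter((e) => e.node !== node);
  }

  // The owner of a key is the first point at or after the key's hash,
  // wrapping around to the start of the ring.
  nodeFor(key) {
    const h = hash32(key);
    const entry = this.ring.find((e) => e.point >= h) ?? this.ring[0];
    return entry.node;
  }
}

const ring = new HashRing(['cache-1', 'cache-2', 'cache-3']);
console.log(ring.nodeFor('user:42')); // always the same node for the same key
```

When a node is removed, only the keys it owned move to other nodes; all other keys stay where they were, which is what keeps the rest of the cache warm.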

Then, when you do an update, you also update the cache entry, which the caching solution propagates in order to make the cache coherent. Note that if you do manual updates directly in the database, the cache will have stale data.

On the application level, you’d normally want to abstract the interactions with the cache. You’d end up with something like a generic “get” method that checks the cache and only then goes to the database. This is true for get-by-id queries, but caches can also be applied for other SELECT queries – you just use the query as a key, and the query result as value. For updates it would similarly update the database and the cache.
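
Such an abstraction could be sketched like this. The class name and the db interface (findById, save) are hypothetical, and a plain Map stands in for the distributed cache client; with a real cache and database the calls would be asynchronous.

```javascript
class CachedRepository {
  constructor(cache, db) {
    this.cache = cache;
    this.db = db;
  }

  // Check the cache first and go to the database only on a miss.
  get(id) {
    if (this.cache.has(id)) return this.cache.get(id);
    const record = this.db.findById(id);
    if (record !== undefined) this.cache.set(id, record);
    return record;
  }

  // Updates go to both the database and the cache, so the cache does not go stale.
  update(id, record) {
    this.db.save(id, record);
    this.cache.set(id, record);
  }
}

// Stand-in for a real database that counts the queries it receives.
const db = {
  rows: new Map([[1, { id: 1, name: 'Alice' }]]),
  queries: 0,
  findById(id) { this.queries++; return this.rows.get(id); },
  save(id, record) { this.rows.set(id, record); },
};

const repo = new CachedRepository(new Map(), db);
repo.get(1);
repo.get(1);
console.log(db.queries); // 1 – the second read was served from the cache
```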

Some frameworks offer these abstractions out of the box – ORMs like Hibernate have the so-called cache providers, so you just add a configuration option saying “I want to use (2nd level) cache” and your ORM operations automatically consult the configured cache provider before hitting the database. Spring has the @Cacheable annotation which can cache method invocations, using their parameters as keys. Other frameworks and languages usually have something like that as well.

An important side-note – you may need to pre-load the cache. On a fresh startup of a cluster (e.g. after a new deployment in a blue-green deployment scheme/crash/network failure/etc.) the system may be slow to fill the cache. So you may have a batch job run on startup to fetch some pieces of data from the database and put them in the cache.

It all sounds easy and that’s good – the basic setup covers most of the cases. In reality it’s a bit more complicated to tweak the cache configurations. Even when you manage to set up a cluster (which can be challenging in, say, AWS), your cache strategies may not be straightforward. You have to answer a couple of questions that may be important – how long do you want your cache entries to live? How big do you need your cache to be? How should elements expire (cache eviction strategy) – least recently used, least frequently used, first-in-first-out?

These options often sound arbitrary. You normally go with the defaults and don’t bother. But you have to constantly keep an eye on the cache statistics, do performance tests, measure and tweak. Sometimes a single configuration option can have a big impact.

One important note here on the measuring side – you don’t necessarily want to cache everything. You can start that way, but it may turn out that for some types of entries you are actually better off without a cache. Normally these are the ones that are updated too often and read not that often, so constantly updating the cache with them is an overhead that doesn’t pay off. So – measure, tweak, repeat.

Distributed caches are an almost mandatory component of web applications. Yet, I’ve had numerous discussions about how a given system is very big and needs a lot of hardware, where all the data would fit in my laptop’s memory. So, much to my surprise, it turns out it’s not yet a universally known concept. I hope I’ve given a short, but sufficient overview of the concept and the various options.


Distributing Election Volunteers In Polling Stations

March 20, 2017

There’s an upcoming election in my country, and I’m a member of the governing body of one of the new parties. As we have a lot of focus on technology (and e-governance), our internal operations are also benefiting from some IT skills. The particular task at hand these days was to distribute a number of election day volunteers (that help observe the fair election process) to polling stations. And I think it’s an interesting technical task, so I’ll try to explain the process.

First – data sources. We have an online form for gathering volunteer requests. And second, we have local coordinators that collect volunteer declarations and send them centrally. Collecting all the data is problematic (to this moment), because filling the online form doesn’t make you eligible – you also have to mail a paper declaration to the central office (horrible bureaucracy).

Then there are the volunteer preferences – in the form they’ve filled in whether they are willing to travel, or they prefer their closest polling station. And then there are the “priority” polling stations, which are considered to be more risky and therefore we need volunteers there.

I decided to do the following:

  • Create a database table “volunteers” that holds all the data about all prospective volunteers
  • Import all data – using apache CSV parser, parse the CSV files (converted from Google sheets) with the 1. online form 2. data from the received paper declarations
  • Match the entries from the two sources by full name (as the declarations cannot contain an email, which would otherwise be the primary key)
  • Geocode the addresses of people
  • Import all polling stations and their addresses (public data by the central election commission)
  • Geocode the addresses of the polling stations
  • Find the closest polling station address for each volunteer

All of the steps are somewhat trivial, except the last part, but I’ll still explain in short. The CSV parsing and importing is straightforward. The only thing one has to be careful about is being able to insert additional records at a later date, because declarations are being received as I’m writing.

Geocoding is a bit trickier. I used OpenStreetMap initially, but it managed to find only a fraction of the addresses (which are not normalized – volunteers and officials are sometimes careless about the structure of the addresses). The OpenStreetMap API can be found here. It’s basically calling http://nominatim.openstreetmap.org/search.php?q=address&format=json with the address. I tried cleaning up some of the addresses automatically, which led to a couple more successful geocodings, but not many.

The rest of the coordinates I obtained through Google maps. I extract all the non-geocoded addresses and their corresponding primary keys (for volunteers – the full name; for polling stations – the hash of a semi-normalized address), parse them with javascript, which then invokes the Google Maps API. Something like this:

<script type="text/javascript" src="jquery.csv.min.js"></script>
<script type="text/javascript">
    // Assumes jQuery and the Google Maps JavaScript API are loaded on the page,
    // with initMap registered as the Maps API callback.
    function initMap() {
        var map = new google.maps.Map(document.getElementById('map'), {
            zoom: 8,
            center: {lat: 42.7339, lng: 25.4858}
        });
        var geocoder = new google.maps.Geocoder();

        // geocode.csv rows: primary key, address. Row 0 is skipped, as in the original loop.
        $.get("geocode.csv", function(csv) {
            var stations = $.csv.toArrays(csv);
            stations.slice(1).forEach(function(row, i) {
                // One request every 2 seconds, to avoid hitting the geocoder's rate limit
                setTimeout(function() {
                    geocodeAddress(geocoder, map, row[1], row[0]);
                }, (i + 1) * 2000);
            });
        });
    }

    function geocodeAddress(geocoder, resultsMap, address, label) {
        geocoder.geocode({'address': address}, function(results, status) {
            if (status === 'OK') {
                var location = results[0].geometry.location;
                // Print one CSV line: lat,lng,"label" (all quotes in the label are doubled)
                $("#out").append(location.lat() + "," + location.lng() + ",\"" + label.replace(/"/g, '""').trim() + "\"<br />");
            } else {
                console.log('Geocode was not successful for the following reason: ' + status);
            }
        });
    }
</script>

This spits out CSV on the screen. Which I then took and transformed with regex replace (Notepad++) to update queries:

Find: (\d+\.\d+),(\d+\.\d+),(".+")
Replace: UPDATE addresses SET lat=$1, lon=$2 WHERE hash=$3

Now that I had most of the addresses geocoded, the distance searching had to begin. I used the query from this SO question to come up with this (My)SQL query:

SELECT MIN(distance), email, names, stationCode, calc.address FROM
(SELECT email, stationCode, addresses.address, names,
   -- 3959 is the Earth's radius in miles, so the distance is in miles
   ( 3959 * acos( cos( radians(volunteers.lat) ) * cos( radians( addresses.lat ) )
   * cos( radians(addresses.lon) - radians(volunteers.lon)) + sin(radians(volunteers.lat))
   * sin( radians(addresses.lat)))) AS distance
 FROM (SELECT address, hash, stationCode, city, lat, lon FROM addresses JOIN stations ON addresses.hash = stations.addressHash GROUP BY hash) AS addresses
 JOIN volunteers WHERE addresses.lat IS NOT NULL AND volunteers.lat IS NOT NULL ORDER BY distance ASC) AS calc
GROUP BY names;

This spits out the closest polling station to each of the volunteers. It is easily turned into an update query to set the polling station code to each of the volunteers in a designated field.
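
For readers who prefer to see the distance formula outside SQL, the same calculation can be sketched in JavaScript. This is an illustration, not what I ran: the function names and the object shape ({lat, lon}) are assumptions, while the formula and the 3959-mile Earth radius are the ones from the query.

```javascript
const EARTH_RADIUS_MILES = 3959;
const rad = (deg) => (deg * Math.PI) / 180;

// Spherical law of cosines, term by term the same as in the SQL query.
function distanceMiles(a, b) {
  const cosAngle =
    Math.cos(rad(a.lat)) * Math.cos(rad(b.lat)) * Math.cos(rad(b.lon) - rad(a.lon)) +
    Math.sin(rad(a.lat)) * Math.sin(rad(b.lat));
  // Rounding can push the value slightly outside [-1, 1], where acos returns NaN.
  return EARTH_RADIUS_MILES * Math.acos(Math.min(1, Math.max(-1, cosAngle)));
}

// Returns { station, distance } for the nearest geocoded station, or null if there is none.
function closestStation(volunteer, stations) {
  return stations
    .filter((s) => s.lat != null && s.lon != null)
    .reduce((best, s) => {
      const d = distanceMiles(volunteer, s);
      return best === null || d < best.distance ? { station: s, distance: d } : best;
    }, null);
}

const stations = [
  { stationCode: 'A', lat: 42.1354, lon: 24.7453 },
  { stationCode: 'B', lat: 42.6980, lon: 23.3200 },
  { stationCode: 'C', lat: null, lon: null }, // not geocoded yet, skipped
];
console.log(closestStation({ lat: 42.6977, lon: 23.3219 }, stations).station.stationCode); // B
```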

Then there are some manual amendments to be made, based on traveling preferences – if the person is willing to travel, we pick one of the “priority stations” and assign it to them. Since these are a small number, it’s not worth automating it.

Of course, in reality, due to data collection flaws, the above idealized example was accompanied by a lot of manual labour of checking paper declarations, annoying people on the phone multiple times and cleaning up the data, but in the end a sizable portion of the volunteers were distributed with the above mechanism.

Apart from being an interesting task, I think it shows that programming skills are useful for practically every task nowadays. If we had to do this manually (and even if we had multiple people with good Excel skills), it would be a long and tedious process. So I’m quite in favour of everyone being taught to write code. They don’t have to end up being a developer, but the way programming helps with non-trivial tasks is enormously beneficial.


“Infinity” is a Bad Default Timeout

March 17, 2017

Many libraries wrap some external communication, be it a REST-like API, a message queue, a database, a mail server or something else. And therefore you have to have some timeout – for connecting, for reading, writing or idling. And sadly, many libraries have their default timeouts set to “0” or “-1” which means “infinity”.

And that is a useless and even harmful default. There isn’t a practical use case where you’d want to hang on forever waiting for a resource. And there are tons of situations where this can happen, e.g. the other end gets stuck. In the past 3 months I had 2 libraries that have a default timeout of “infinity” and that eventually led to production problems because we’d forgotten to configure them properly. Sometimes you don’t even see the problem, until a thread pool gets exhausted.

So, I have a request to API/library designers (as I’ve done before – against property maps and encoding other than UTF-8). Never have “infinity” as a default timeout. Otherwise your library will cause lots of production issues.
Also note that it’s sometimes an underlying HTTP client (or Socket) that doesn’t have a reasonable default – it’s still your job to fix that when wrapping it.

What default should you provide? Reasonable. 5 seconds maybe? You may (rightly) say you don’t want to impose an arbitrary timeout on your users. In that case I have a better proposal:

Explicitly require a timeout for building your “client” (because these libraries are most often clients for some external system). E.g. Client.create(url, credentials, timeout). And fail if no timeout is provided. That makes the users of the client actively consider what is a good timeout for their use case – without imposing anything, and most importantly – without risking stuck connections in production. Additionally, you can still present them with a “default” option, while still making them explicitly choose it. For example:

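// builder method names below are illustrative
Client client = ClientBuilder.create(url).withTimeouts(timeout).build();
// OR
Client client = ClientBuilder.create(url).withDefaultTimeouts().build();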

The builder above should require “timeouts” to be set, and should fail if neither of the two methods was invoked. Even if you don’t provide these options, at least have a good way of specifying timeouts – some libraries require reflection to set the timeout of their underlying client.
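
To make the idea concrete, here is a minimal sketch of such a builder. Everything in it is hypothetical: Client, ClientBuilder, the two methods withTimeouts and withDefaultTimeouts, and the 5-second default are assumptions for illustration, not the API of any real library.

```java
import java.time.Duration;

// Hypothetical client; only the timeout handling matters here
final class Client {
    final String url;
    final Duration timeout;

    Client(String url, Duration timeout) {
        this.url = url;
        this.timeout = timeout;
    }
}

final class ClientBuilder {
    // An explicit, documented default instead of a silent "infinity"
    static final Duration DEFAULT_TIMEOUT = Duration.ofSeconds(5);

    private final String url;
    private Duration timeout; // stays null until the caller makes a choice

    private ClientBuilder(String url) {
        this.url = url;
    }

    static ClientBuilder create(String url) {
        return new ClientBuilder(url);
    }

    ClientBuilder withTimeouts(Duration timeout) {
        if (timeout == null || timeout.isZero() || timeout.isNegative()) {
            throw new IllegalArgumentException("timeout must be positive");
        }
        this.timeout = timeout;
        return this;
    }

    ClientBuilder withDefaultTimeouts() {
        this.timeout = DEFAULT_TIMEOUT;
        return this;
    }

    Client build() {
        // Fail fast rather than fall back to waiting forever
        if (timeout == null) {
            throw new IllegalStateException(
                    "No timeout configured: call withTimeouts(...) or withDefaultTimeouts()");
        }
        return new Client(url, timeout);
    }
}
```

The point of the design is that the failure happens at construction time, during development, and not as a stuck thread in production.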

I believe this is one of those issues that look tiny, but cause a lot of problems in the real world. And it can (and should) be solved by the library/client designers.

But since it isn’t always the case, we must make sure that timeouts are configured every time we use a 3rd party library.
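
As an example on the consuming side, the JDK’s own HttpURLConnection uses 0 (no timeout) for both connect and read unless told otherwise. A small sketch of wrapping it so the timeouts are always set could look like this. The class name and the concrete values are assumptions, so pick what suits your use case.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class TimeoutDefaults {
    static final int CONNECT_TIMEOUT_MS = 5_000;  // assumed value
    static final int READ_TIMEOUT_MS = 10_000;    // assumed value

    /** Prepares a connection with both timeouts set; nothing is sent over the network yet. */
    static HttpURLConnection open(String url) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            // Both default to 0, which means "wait forever"
            connection.setConnectTimeout(CONNECT_TIMEOUT_MS);
            connection.setReadTimeout(READ_TIMEOUT_MS);
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```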


Protecting Sensitive Data

March 12, 2017

If you are building a service that stores sensitive data, your number one concern should be how to protect it. What IS sensitive data? There are some obvious examples, like medical data or bank account data. But would you consider a dating site database as sensitive data? Based on a recent leak from a big dating site I’d say yes. Is a cloud turn-by-turn navigation database sensitive? Most likely, as users’ journeys are stored there. Facebook messages, emails, etc – all of that can and should be considered sensitive. And therefore must be highly protected. If you’re not sure if the data you store is sensitive, assume it is, just in case. Otherwise a subsequent breach can bring your business down easily.

Now, protecting data is no trivial feat. And certainly cannot be covered in a single blog post. I’ll start with outlining a few good practices:

  • Don’t dump your production data anywhere else. If you want a “replica” for testing purposes, obfuscate the data – replace the real values with fake ones.
  • Make sure access to your servers is properly restricted. This includes using a “bastion” host, proper access control settings for your administrators, key-based SSH access.
  • Encrypt your backups – if your system is “perfectly” secured, but your backups lie around unencrypted, they would be the weak spot. The decryption key should be as protected as possible (will discuss it below)
  • Encrypt your storage – especially if using a cloud provider, assume you can’t trust it. AWS, for example, offers EBS encryption, which is quite good. There are other approaches as well, e.g. using LUKS with keys stored within your organization’s infrastructure. This and the previous point are about “Data at rest” encryption.
  • Monitor all access and audit operations – there shouldn’t be an unaudited command issued on production.
  • In some cases, you may even want to use split keys for logging into a production machine – meaning two administrators have to come together in order to gain access.
  • Always be up-to-date with software packages and libraries (well, maybe wait a few days/weeks to make sure no new obvious vulnerability has been introduced)
  • Encrypt internal communication between servers – the fact that your data is encrypted “at rest”, may not matter, if it’s in plain text “in transit”.
  • In rare cases, when only the user has to be able to see their data and it’s very confidential, you may encrypt it with a key based (in part) on their password. The password alone does not make a good encryption key, but there are key-derivation functions (e.g. PBKDF2) that are created to turn low-entropy passwords into fair keys. The key can be combined with another part, stored on the server side. Thus only the user can decrypt their content, as their password is not stored anywhere in plain text and can’t be accessed even in case of a breach.
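
To illustrate the last point, here is a minimal sketch of deriving an AES key from a password with PBKDF2, using the standard JCA algorithm name PBKDF2WithHmacSHA256. The class name, the iteration count and the salt size are assumptions, and the sketch leaves out the optional server-side key part.

```java
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

final class PasswordKeys {
    static final int ITERATIONS = 100_000; // assumed work factor; tune for your hardware
    static final int KEY_BITS = 256;

    /** Random per-user salt; it is not secret and is stored next to the ciphertext. */
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    /** Derives a 256-bit AES key from the password with PBKDF2-HMAC-SHA256. */
    static SecretKeySpec deriveKey(char[] password, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
        try {
            byte[] keyBytes = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec)
                    .getEncoded();
            return new SecretKeySpec(keyBytes, "AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2 is not available", e);
        } finally {
            // wipe the copy of the password held by the spec
            spec.clearPassword();
        }
    }
}
```

The same password and salt always give the same key, so only the salt has to be stored; the key itself is re-derived whenever the user logs in.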

You see there’s a lot of encryption happening, but with encryption there’s one key question – who holds the decryption key. If the key is stored in a configuration file on one of your servers, the attacker that has gained access to your infrastructure, will find that key as well, get it, get the whole db, and then happily wait for it to be fully decrypted on his own machines.

To store a key securely, it has to be in tamper-proof storage, for example an HSM (Hardware Security Module). If you don’t have HSMs, Amazon offers them as part of AWS. It also offers key management as a service, but the particular provider is not important – the concept is important. You need to have a securely stored key on a device that doesn’t let the key out under any circumstances, even a breach (HSM vendors claim that’s the case).

Now, how to use these keys depends on the particular case. Normally, you wouldn’t use the HSM itself to decrypt data, but rather to decrypt the decryption key, which in turn is used to decrypt the data. If all the sensitive data in your database is encrypted, even if the attacker gains SSH access and thus gains access to the database (because your application needs unencrypted data to work with; homomorphic encryption is not yet here), he’ll have to get hold of the in-memory decryption key. And if you’re using envelope encryption, it will be even harder for an attacker to just dump your data and walk away.
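
Here is a rough sketch of envelope encryption with AES-GCM from the standard Java crypto APIs. Note the big simplification: the master key below is an ordinary in-memory key standing in for the HSM, whereas with a real HSM the wrap and unwrap steps would happen on the device and the master key would never be visible to the application. All class and method names are made up for the example.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

final class EnvelopeEncryption {
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;
    private static final SecureRandom RANDOM = new SecureRandom();

    /** What gets persisted: the wrapped data key next to the encrypted payload. */
    static final class Envelope {
        final byte[] wrappedDataKey;
        final byte[] ciphertext;

        Envelope(byte[] wrappedDataKey, byte[] ciphertext) {
            this.wrappedDataKey = wrappedDataKey;
            this.ciphertext = ciphertext;
        }
    }

    static SecretKey newAesKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Encrypts with a fresh data key, then wraps that data key with the master key. */
    static Envelope seal(SecretKey masterKey, byte[] plaintext) {
        SecretKey dataKey = newAesKey();
        return new Envelope(encrypt(masterKey, dataKey.getEncoded()), encrypt(dataKey, plaintext));
    }

    /** Unwraps the data key with the master key, then decrypts the payload. */
    static byte[] open(SecretKey masterKey, Envelope envelope) {
        SecretKey dataKey = new SecretKeySpec(decrypt(masterKey, envelope.wrappedDataKey), "AES");
        return decrypt(dataKey, envelope.ciphertext);
    }

    // AES-GCM; the random IV is prepended to the output
    private static byte[] encrypt(SecretKey key, byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] encrypted = cipher.doFinal(plaintext);
            byte[] out = new byte[IV_BYTES + encrypted.length];
            System.arraycopy(iv, 0, out, 0, IV_BYTES);
            System.arraycopy(encrypted, 0, out, IV_BYTES, encrypted.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the IV from the first 12 bytes; fails if the key is wrong or the data was altered
    private static byte[] decrypt(SecretKey key, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, input, 0, IV_BYTES));
            return cipher.doFinal(input, IV_BYTES, input.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

An attacker who only dumps the database gets the ciphertext and the wrapped data keys, neither of which is useful without access to the master key.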

Note that the encryption and decryption here are at the application level – so the encrypted data can be not simply “per storage” or “per database”, but also per column – usernames don’t have to be kept so secret, but the associated personal data (in the next 3 database columns) should. So you can plug the encryption mechanism in your pre-persist (and decryption – in post-load) hooks. If speed is an issue, i.e. you don’t want to do the decryption in real-time, you may have a (distributed) cache of decrypted data that you can refresh with a background job.

But if your application has to know the data, an attacker that gains full control for an unlimited amount of time, will also have the full data eventually. No amount of enveloping and layers of encryption can stop that, it can only make it harder and slower to obtain the dump (even if the master key, stored on HSM, is not extracted, the attacker will have an interface to that key to use it for decrypting the data). That’s why intrusion detection is key. All of the above steps combined with an early notification of intrusion can mean your data is protected.

As we are all well aware, there is never a 100% secure system. Our job is to make bulk data extraction nearly impossible. And that includes proper key management, proper system-level and application-level handling of encryption and proper monitoring and intrusion detection.


A Case For Native Smart Card Support in Browsers

February 22, 2017

A smart card is a device that holds a private key securely without letting it out of its storage. The chip on your credit card is a “smart card” (yup, terminology is ambiguous – the card and the chip are interchangeably called “smart card”). There are smaller USB-pluggable hardware readers that only hold the chip (without an actual card – e.g. this one).

But what’s the use? This w3c workshop from several years ago outlines some of them: multi-factor authentication, state-accepted electronic identification, digital signatures. All these are part of a bigger picture – that using the internet is now the main means of communication. We are moving most of our real-world activities online, so having a way to identify who we are online (e.g. to a government, to a bank), or being able to sign documents online (with legal value) is crucial.

That’s why the EU introduced the eIDAS regulation which defines (among other things) electronic identification and digital signatures. The framework laid there is aimed at having legally binding electronic communication, which is important in so many cases. Have you ever done the print-sign-scan exercise? Has your e-banking been accessed by an unauthorized person? Well, the regulation is supposed to fix these and many more issues.

Two-factor authentication is another, broader concept, which has tons of sub-optimal solutions: OTP tokens, Google Authenticator, SMS code confirmation. All these have issues (e.g. clock syncing, SMS interception, cost). There are hardware tokens like YubiKey, but they offer only a subset of the features a smart card does.

But it’s not just about legally-recognized actions online and two-factor authentication. It opens up other possibilities, like a more secure online credit card payment – e.g. you put your card in a reader and type your PIN, rather than entering the card number, CVC, date, names, 3d password and whatnot.

With this long introduction I got to the problem: browsers don’t support smart cards. In the EU, where electronic signatures are legally recognized, there is always the struggle of making them work with browsers. The solution so far: Java applets. A Java applet can interact with the smart card through the Java crypto APIs, and thus provide signing features. However, with the deprecation of Java applets this era of constant struggle will end soon (and it is a struggle – having to click at least 2 confirmations and keep your Java up to date, which even for developers is a hassle). There used to be a way to do it a few years ago in Firefox and IE, using window.crypto and CAPICOM APIs, but these got deprecated.

Recently the trend has been to use a “cloud-based” approach, where the keys reside on an HSM. That’s of course useful, but the problem with identification remains – getting access to your keys on the HSM requires, again, two factor authentication. Having the hardware token “in your hands” is what adds the security.

Smart people in Estonia (which has the most digital government in the world) had a better solution than Java or HSM – browser plugins that allow interaction with their ID card (which is/has a smart card). The solution is here and here. This has worked pretty well – you install the plugins once (with an all-in-one installer) and you can sign documents with JavaScript. You also get the proper PKCS libraries installed, and the root certificates needed to allow TLS 1.2 authentication with the hardware token (identification and authentication vs signing). The small downside of this approach is that it is somewhat fragile and dependent on browser whims – the plugins have to be upgraded constantly and are at risk of being completely broken if some browser decides to deprecate some plugin APIs.

Another approach is the “local service” approach, which has two flavours. One is – you install a local application that exposes an HTTP interface and using JavaScript and proper same-origin configuration you send the files needed for signing to the service, and then get the result as an HTTP response, which you can then, again using JavaScript, append to the page that requested the signing. The downside here – getting a service installed to listen to a given port without administrator rights. The other approach is having an application hooked to a custom protocol (e.g. signature://). So whenever the page wants the user to sign something, it opens signature://path-to-document-to-sign, which is intercepted by the locally installed application, digital signing is performed, and the result is pushed to a (one-time) URL specified in the metadata of the document to sign. Something like that is implemented by 4identity.eu and it actually works.

Now, signature is one thing, identification (TLS client auth) is another. Allegedly, things should work there – PKCS#11 is a standard that should allow TLS client auth to happen with a smart card. Reality is – it doesn’t. You often need a vendor-specific PKCS#11 library. OpenSC, which is a cool tool that works with many smart cards, only works with Firefox and Safari. Charismatics is a commercial piece of software that is supposed to work with all smart cards out there – well, it doesn’t always.

And the problem here is the smart card vendors. The need for OpenSC and Charismatics arises because even though there are a few PKCS standards, smart cards are a complete mess. Not only is it a mess, but it’s a closed, secretive mess. APDUs (the commands you send to the smartcard in order to communicate with it) are in most cases secret. You don’t get to know them even if you purchase tens of thousands of cards – you only get a custom vendor software that knows them. Then you have to reverse-engineer them to know how to actually talk to the cards. And they differ not only across vendors, but across card models of the same vendor. For that reason the Estonian approach was a bit simpler to implement – they had only one type of smart card, given to all citizens and they were mostly in control. In other countries it’s a … mess. At least a dozen different types of cards to be supported.

So my first request is to smart card vendors (which are not that many) – please, please fix your mess. Get rid of that extra bit of “security through obscurity” to allow browsers to communicate with you without extra shenanigans.

My second request is to browser vendors – please do support smart card crypto natively. Unfortunately, due to the smart card mess above (among other things), hardware crypto has explicitly been excluded from the Web Crypto API. As a follow-up to that, there’s the Hardware security working group, but afaik it’s still “work in progress”, and my feeling is it’s not that much yet. In W3C it’s important that browser vendors agree to implement something before it’s a standard, and I’ve heard that some are opposing the smart card integration. Due to the aforementioned mess, I guess.

You may say – standardization will fix this. Well, it hasn’t so far. The EU officials are aware of the problem, and that the eIDAS regulation may be thwarted by these technical issues, but they are powerless, as the EU is not a standardization body.

So it all comes down to having a joint effort between browser and smart card vendors to fix this thing once and for all. So, please do that in order to enable a more secure and legally-compliant web.