KISS With Essential Complexity

April 14, 2015

Accidental complexity, in a broader sense, is the complexity that developers add to their code and that is not necessary for the code to work. That may include overengineering, overuse of design patterns, poor choice of tools, frameworks and paradigms, or writing snippets of code in a hard-to-read way. For example, if you can do a project with a simple layered architecture, doing it with microservices (and having to decide on granularity, coordinate them, etc.) or a message-driven architecture (with setting up the broker and all sorts of queue configurations) increases the complexity of the software unnecessarily, and is therefore accidental complexity. If you want to parse XML and convert it to objects, using SAX adds a lot of accidental complexity compared to an XML-to-object mapper (e.g. JAXB), where you just add a few annotations (hopefully). If your logic can be expressed with a few lambda expressions, but instead you write several nested for-loops with if clauses inside, that’s accidental complexity. For me at least, accidental complexity is about making code hard to read, maintain and deploy with no good reason, apart from not knowing better.
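To make the JAXB point concrete, here’s a minimal sketch – the Order class and its fields are made up for the example – with a couple of annotations and one unmarshal call, instead of hand-written SAX handlers:

import javax.xml.bind.JAXBContext;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;
import java.io.File;

// A hypothetical "order" document mapped to an object with a couple of annotations
@XmlRootElement(name = "order")
@XmlAccessorType(XmlAccessType.FIELD)
public class Order {
    private String customer;
    private int quantity;

    public static Order parse(File xml) throws Exception {
        // JAXB does the parsing; no hand-written SAX handlers in our code
        return (Order) JAXBContext.newInstance(Order.class)
                .createUnmarshaller().unmarshal(xml);
    }
}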

Essential complexity on the other hand comes from the world you are trying to model with your software. It’s about the inevitable edge cases you have to handle if you want your software to be fit for actual use. Essential complexity can and does make your code harder to read and maintain. It makes it look like “legacy” code, but as Spolsky points out, that’s the way of things, and that’s the way it should be. Unexpected API calls, classes that exist for some bizarre edge-case that you discovered after half a year of actual use, ifs and fors that you think you can just remove – these are the marks of real software.

(I’m aware of another view of accidental complexity – that it still adds value, but is not the problem that you are solving. That’s a long discussion, but I think that anything that is inherently complex and needs to be done (e.g. rolling updates) is essential complexity, i.e. you can’t do without it.)

If the business process you are modeling has a lot of branches and even loops, and it can’t be optimized, then the code that handles that business process has to be “hairy”. When you have to run your software on a device that can lose connectivity or have poor connectivity, and can be restarted at any moment, then the code for retrying, for re-applying offline steps, and the like, is necessary, even if it’s huge and hard to follow.

But is that it? Is it that we can’t do anything about our essential complexity, and can only leave the ugly bits of code there, shrugging and saying “well, I know it’s bad, man, but what can you do – essential complexity”?

Well, we can’t get rid of it. But we can make it slightly friendlier. I have two specific approaches.

Document the scenarios that require the complexity. Either directly in the code, or linked in the code. Most of the code that looks “WTF” can look completely logical if you know why it’s there. If we make sure all the bizarre code makes sense to everyone, by explaining the business reason behind it, then we have solved part of the problem.

But that’s just on the surface. Can we actually follow the “Keep it simple, stupid” (KISS) principle when it comes to essential complexity? Yes, to an extent. You can’t make complexity simple, but you can present it in a simpler way. What we want to achieve is to reduce the perceived complexity, to make it easier to follow and reason about.

The first thing to look for is any accidental complexity that you have introduced around the essential one. It usually happens that essential complexity makes accidental complexity more likely to appear, probably because all of the developer’s focus goes into grasping every aspect of the scenario they’re working on, and good practices get forgotten. But eliminating that is not enough either.

Ironically, here is where common (design) patterns and specific frameworks come in handy. You need to represent a complex sequence of states of your application? Use a finite state machine implementation, rather than bits and pieces here and there. You need to represent a complex business process? Use a business process management framework, rather than just flow control structures. You have a lot of dependencies in your classes (even though your classes are designed and packaged well)? Use a dependency injection framework. Or in many cases – just refactor. I know that answer is the most obvious thing, but we’ve all seen complex methods that just do a lot of stuff and do not follow that approach – because they grew over time and nobody realized how big they had become.
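As a small illustration of the state machine suggestion, here’s a minimal sketch (the states are hypothetical) of keeping states and allowed transitions in one place, instead of scattering flags and if-clauses around the code:

import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical order states; the allowed transitions live in one place
public enum OrderState {
    NEW, PAID, SHIPPED, DELIVERED, CANCELLED;

    private static final Map<OrderState, Set<OrderState>> TRANSITIONS =
            new EnumMap<>(OrderState.class);
    static {
        TRANSITIONS.put(NEW, EnumSet.of(PAID, CANCELLED));
        TRANSITIONS.put(PAID, EnumSet.of(SHIPPED, CANCELLED));
        TRANSITIONS.put(SHIPPED, EnumSet.of(DELIVERED));
        TRANSITIONS.put(DELIVERED, EnumSet.noneOf(OrderState.class));
        TRANSITIONS.put(CANCELLED, EnumSet.noneOf(OrderState.class));
    }

    // One method answers "can we go from here to there", instead of ifs everywhere
    public OrderState transitionTo(OrderState next) {
        if (!TRANSITIONS.get(this).contains(next)) {
            throw new IllegalStateException(this + " -> " + next + " is not allowed");
        }
        return next;
    }
}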

But apart from a couple of examples, I cannot give a general rule. Reducing the perceived complexity is (obviously) highly dependent on the perception of the one reducing it. But as one-line advice – always think of how you can rearrange the code around the inherent, essential complexity of your application to make it look less complex.


Getting Notified About RabbitMQ Cluster Partitioning

April 6, 2015

If you are running RabbitMQ in a cluster, it is not unlikely that the cluster gets partitioned (part of the cluster losing connection to the rest). The basic commands to show the status and configure the behaviour are explained in the linked page above. And when partitioning happens, you want first to be notified about it, and second – to resolve it.

RabbitMQ actually automatically handles the second, with the cluster_partition_handling configuration. It has three values: ignore, pause_minority and autoheal. The partitions guide linked above explains that as well (“Which mode should I pick?”). Note that whatever you choose, you have a problem and you have to restore the connectivity. For example, in a multi-availability-zone setup I explained a while ago it’s probably better to use pause_minority and then to manually reconnect.

Fortunately, it’s rather simple to detect partitioning. The cluster status output has an empty “partitions” element if there is no partitioning, and either a non-empty partitions element or no such element at all if there are partitions. So this line does the detection:

clusterOK=$(sudo rabbitmqctl cluster_status | grep "{partitions,\[\]}" | wc -l)

You would want to schedule that script to run every minute, for example. What to do with the result depends on the tool you use (Nagios, CloudWatch, etc). For Nagios there is a ready-to-use plugin, actually. And if it’s AWS CloudWatch, then you can do as follows:

if [ "$clusterOK" -eq "0" ]; then
	echo "RabbitMQ cluster is partitioned"
	aws cloudwatch put-metric-data --metric-name $METRIC_NAME --namespace $NAMESPACE --value 1 --dimensions Stack=$STACKNAME --region $REGION
	aws cloudwatch put-metric-data --metric-name $METRIC_NAME --namespace $NAMESPACE --value 0 --dimensions Stack=$STACKNAME --region $REGION

When partitioning happens, the important thing is getting notified about it. After that it depends on the particular application, problem, and configuration of queues (durable, mirrored, etc.).


A Non-Blocking Benchmark

March 23, 2015

A couple of weeks ago I asked the question “Why non-blocking?”. And I didn’t reach a definitive answer, although it seemed that writing non-blocking code is not the better option – it’s not supposed to be faster or have higher throughput, even though conventional wisdom says it should.

So, leaving behind the theoretical questions, I decided to do a benchmark. The code is quite simple – it reads a 46KB file into memory and then writes it to the response. That’s the simplest scenario that’s still close to the regular use case of a web application – reading stuff from the database, performing some logic on it, and then writing a view to the client (it’s disk I/O vs network I/O in case the database is on another server, but let’s disregard that for now).
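For reference, the servlet scenario looks roughly like this – a simplified sketch, not the exact benchmark code (which is on GitHub); the file path is made up:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Reads the file from disk and writes it to the response on every request
@WebServlet("/benchmark")
public class FileServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        byte[] content = Files.readAllBytes(Paths.get("/tmp/data.html")); // the ~46KB file
        resp.setContentType("text/html");
        resp.getOutputStream().write(content);
    }
}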

There are 5 distinct scenarios: Servlet using BIO connector, Servlet using NIO connector, Node.js, Node.js using sync file reading and Spray (a scala non-blocking web framework). Gatling was used to perform the tests, and was run on a t2.small AWS instance; the application code was run on a separate m3.large instance.

The code used in the benchmark as well as the full results are available on GitHub. (Note: please let me know if you spot something really wrong with the benchmark that skews the results)

What do the results tell us? That it doesn’t matter whether it’s blocking or non-blocking. Differences in response time and requests/sec (as well as the other factors) are negligible.

Spray appears to be slightly better when the load is not so high, whereas BIO happens to have more errors on a really high load (while being fastest at the same time). Node.js is surprisingly fast for a JavaScript runtime (kudos to Google for V8).

The differences in the different runs are way more likely to be due to the host VM current CPU and disk utilization or the network latency, rather than the programming model or the framework used.

After reaching this conclusion, the fact that spray is seemingly faster bugged me (especially given that I executed the spray tests half an hour after the rest), so I wanted to rerun the tests this morning. And my assumption about the role of infrastructure factors could not have been proven more right. I ran the 60 thousand requests test and the mean time was 3 seconds (for both spray and servlet), with a couple of hundred failures and only 650 requests/sec. This aligned with my observation that AWS works a lot faster when I start and delete cloud formation stacks early in the morning (GMT+2, when Europe is still sleeping and the US is already in bed).

The benchmark is still valid, as I executed it within 1 hour on a Sunday afternoon. But the whole experiment convinced me even more of what I concluded in my previous post – that non-blocking doesn’t have visible benefits and one should not force himself to use the possibly unfriendly callback programming model for the sake of imaginary performance gains. Niche cases aside, for the general scenario you should pick the framework, language and programming model that people in the team are most comfortable with.


How to Land a Software Engineering Job?

March 13, 2015

The other day I read this piece by David Byttow on “How to land an engineering job”. And I don’t fully agree with his assertions.

I do agree, of course, that one must always be writing code. Not writing code is the worst that can happen to a software engineer.

But some details are where our opinions diverge. I don’t agree that you should know the complexities of famous algorithms and data structures by heart, and I do not agree that you should be able to implement them from scratch. He gives no justification for this advice, and just says “do so”. And don’t get me wrong – you should know what computational complexity is, and what algorithms there are for traversing graphs and trees. But implementing them yourself? What for? I have implemented sorting algorithms, tree structures and the like a couple of times, just for the sake of it. Two years later I can’t do it again without checking an example or a description. Why? Because you never need those things in your day-to-day programming. And why would you memorize the complexity of a graph search algorithm if you can look it up in 30 seconds?

The other thing I don’t agree with is solving TopCoder-like problems. Yes, they probably help you improve your algorithm-writing skills, but spending time on that, rather than writing actual code (e.g. as side projects), is to me a bit of a waste. Not something you should avoid, but something that you don’t have to do. If you like solving those types of problems – by all means, do it. But don’t insist that “real programmers solve non-real-world puzzles”. Especially when the question is how to get a software engineering job.

Because software engineering, as I again agree with David Byttow, is a lot more than writing code. It’s contemplating all aspects of a software system, using many technologies and many levels of abstraction. But what he insists is that you must focus on the lower levels (e.g. data structures) and be an expert there. I think you are free to choose the levels of abstraction you are an expert in, as long as you have a good overview of those below/above.

And let’s face it – getting an engineering job is easy. The demand for engineers is way higher than the supply, so you have to be really incompetent not to be able to get any job. How to get an interesting and highly-paid job is a different thing, but I can assure you that there are enough of those as well, and not all of them require you to solve freshman-year-style problems at interviews. And I see that there is this trend, especially in Silicon Valley, to demand knowing the computer science components of software engineering by heart. I don’t particularly like it, but probably if you want a job at Google or Facebook, then you do have to know the complexities of popular algorithms and be able to implement a red-black tree on a whiteboard. But that doesn’t mean every interesting company out there requires those things, or that you are not a worthy engineer otherwise.

One final disagreement – not knowing exact details about the company you are applying to (or that is recruiting you) is fine. Maybe companies are obsessed with themselves, but when you go to a small-to-medium-sized company that does not have world-wide fame, not knowing the competition in their niche is mostly fine. (And it makes a difference whether you applied, or they headhunted you.)

But won’t my counter-advice land you a mediocre job? No. There are companies doing “cool stuff” that don’t care if you know Dijkstra’s algorithm by heart. As long as you demonstrate the ability to solve problems, broad expertise, and passion about programming, you are in. That includes (among others) TomTom, eBay, Rakuten, Ericsson (those I’ve interviewed with or worked at). It may not land you a job at Google, but should we focus on being good engineers, or on fulfilling Silicon Valley’s artificial interview criteria?

So far I’ve mostly disagreed, but I didn’t actually give a bullet-point how-to. So in addition to the things I agree with in David’s article, here are some more:

  • know a technology well – if you’ve worked with a given technology for the past year, you have to know it in depth; otherwise you seem like that guy that doesn’t actually know what he’s doing, but still gets some of the boilerplate/easy tasks.
  • show that software engineering is not a 9-to-5 thing for you. Staying up-to-date with latest trends, having a blog, GitHub contributions, own side projects, talks, meetups – all of these count.
  • have broad expertise – being just a “very good Spring/Rails/Akka/…” developer doesn’t cut it. You have to know how software is designed, deployed, managed. You don’t need to have written millions of lines of CloudFormation, or supported a Puppet installation by yourself, but at least you have to know what infrastructure and deployment automation is. (Whew, I managed to avoid the “full-stack” buzzword)
  • know the basics – as pointed out above, you don’t have to know complexities and implementations by heart. But not knowing what a hashtable or a linked list is (in terms of usage patterns at least) hurts your chances significantly. Knowing that something exists when you need it is the practical compromise between knowing how to write it and not having the faintest idea about it.
  • be able to solve problems – interviewers will usually ask a hypothetical question (often one that they recently faced) and see how you attack the problem. Don’t say you don’t have enough information or you don’t know – just try to solve it. It may not be correct, but a well-thought-out attempt still counts.
  • be respectful. That doesn’t mean overly-professional or shy, but assume that the people interviewing you are just like you – good developers that love creating software.

That won’t guarantee you a job, of course. And it won’t get you a job at Google. But you can land a job where you can do pretty interesting things on a large scale.


Why Non-Blocking?

March 2, 2015

I’ve been writing non-blocking, asynchronous code for the past year. Learning how it works and how to write it is not hard. What I don’t understand is where the benefits come from. Moreover, there is so much hype surrounding some programming models that you have to be pretty good at telling marketing from rumours from facts.

So let’s first start with clarifying the terms. Non-blocking applications are written in a way that threads never block – whenever a thread would have to block on I/O (e.g. reading/writing from/to a socket), it instead gets notified when new data is available. How that is implemented is out of the scope of this post. Non-blocking applications are normally implemented with message passing (or events). “Asynchronous” is related to that (in fact, in many cases it’s a synonym for “non-blocking”), as you send your request events and then get responses to them in a different thread, at a different time – asynchronously. And then there’s the “reactive” buzzword, which I honestly can’t explain – on one hand there’s reactive functional programming, which is rather abstract; on the other hand there’s the reactive manifesto, which defines three requirements for practically every application out there (responsive, elastic, resilient) and one implementation detail (message-driven), which is there for no apparent reason. And how does the whole thing relate to non-blocking/asynchronous programming? Probably through the message-driven part, but often the three go together in the buzzword-driven marketing jargon.

Two examples of frameworks/tools that are used to implement non-blocking (web) applications are Akka (for Scala and Java) and Node.js. I’ve been using the former, but most of the things are relevant to Node as well.

Here’s a rather simplified description of how it works. It uses the reactor pattern (ahaa, maybe that’s where “reactive” comes from?) where one thread serves all requests by multiplexing between tasks and never blocks anywhere – whenever something is ready, it gets processed by that thread (or a couple of threads). So, if two requests are made to a web app that reads from the database and writes the response, the framework reads the input from each socket (by getting notified on incoming data, switching between the two sockets), and when it has read everything, passes a “here’s the request” message to the application code. The application code then sends a message to a database access layer, which in turn sends a message to the database (driver), and gets notified whenever reading the data from the database is complete. In the callback it in turn sends a message to the frontend/controller, which in turn writes the data as response, by sending it as message(s). Everything consists of a lot of message passing and possibly callbacks.
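To make the callback shape more concrete, here’s a minimal sketch of such a flow in plain Java with CompletableFuture – the actual examples above are Akka and Node.js, so this only illustrates the style, and the method and type names are made up:

import java.util.concurrent.CompletableFuture;

// Each step returns immediately with a future; the callbacks run whenever
// the underlying I/O completes, on whatever thread the framework provides.
public class NonBlockingFlow {
    public CompletableFuture<Void> handleRequest(String userId, Response response) {
        return readUserFromDatabase(userId)               // non-blocking DB call
                .thenApply(user -> renderView(user))      // runs when the DB result arrives
                .thenAccept(html -> response.write(html)) // write the response when rendered
                .exceptionally(ex -> { response.error(ex); return null; });
    }

    // Stubs standing in for a non-blocking driver and a framework response object
    private CompletableFuture<String> readUserFromDatabase(String id) {
        return CompletableFuture.supplyAsync(() -> "user-" + id);
    }
    private String renderView(String user) { return "<html>" + user + "</html>"; }
    interface Response { void write(String body); void error(Throwable t); }
}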

One problem with that setup is that if at any point the code blocks the thread, then the whole thing goes to hell. But let’s assume all your code and 3rd party libraries are non-blocking and/or you have some clever way to avoid blocking everything (e.g. an internal thread pool that handles the blocking part).

That brings me to another point – whether only reading and writing the socket is non-blocking, as opposed to the whole application being non-blocking. For example, Tomcat’s NIO connector is non-blocking, but (afaik, via a thread pool) the application code can be executed in the “good old” synchronous way. Though I admit I don’t fully understand that part, we have to distinguish asynchronous application code from asynchronous I/O provided by the infrastructure.

And another important distinction – the fact that your server code is non-blocking/asynchronous doesn’t mean your application is asynchronous to the client. The two things are related, but not the same – if your client uses a long-lived connection where it expects new data to be pushed from the server (e.g. websockets/comet), then the asynchronicity goes outside your code and becomes a feature of your application from the perspective of the client. And that can be achieved in multiple ways, including a Java Servlet with asyncSupported=true (which uses a non-blocking model so that long-lived connections do not each hold a blocked thread).
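For the servlet case, that looks roughly like the following – a simplified sketch of Servlet 3.0 async support; the EventSource is a made-up stand-in for whatever produces the pushed data:

import java.io.IOException;
import java.util.function.Consumer;
import javax.servlet.AsyncContext;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// asyncSupported=true lets the container release its thread while we wait for data to push
@WebServlet(urlPatterns = "/events", asyncSupported = true)
public class PushServlet extends HttpServlet {

    // Stand-in for whatever actually produces the data to be pushed to the client
    interface EventSource { void onNewData(Consumer<String> listener); }
    private final EventSource eventSource = listener -> listener.accept("hello");

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
        AsyncContext ctx = req.startAsync();
        ctx.setTimeout(30_000);
        // the response is completed later, when there is something to send
        eventSource.onNewData(data -> {
            try {
                ctx.getResponse().getWriter().write(data);
            } catch (IOException ignored) {
            }
            ctx.complete();
        });
    }
}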

Okay, now we know roughly how it works, and we can even write code in that paradigm. We can pass messages around, write callbacks, or get notified with a different message (i.e. akka’s “ask” vs “tell” pattern). But again – what’s the point?

That’s where it gets tricky. You can experiment with googling for things like “benefits of non-blocking/NIO”, benchmarks, “what is faster – blocking or non-blocking”, etc. People will say non-blocking is faster, or more scalable, that it requires less memory for threads, has higher throughput, or any combination of these. Are they true? Nobody knows. It indeed makes sense that by not blocking your threads, and by not having a thread per socket, you can have fewer threads service more requests. But is that faster or more memory-efficient? Do you reach the maximum number of threads in a big thread pool before you max out the CPU, network I/O or disk I/O? Is the bottleneck in a regular web application really the thread pool? Possibly, but I couldn’t find a definitive answer.

This benchmark shows raw servlets are faster than Node (and when spray (akka) was present in that benchmark, it was also slower). This one shows that the NIO Tomcat connector gives worse throughput. My own benchmark (which I lost) of spray vs spring-mvc showed that spray started returning 500 (Internal Server Error) responses with way fewer concurrent requests than spring-mvc. I would bet there are counter-benchmarks that “prove” otherwise.

The most comprehensive piece on the topic is the “Thousands of Threads and Blocking I/O” presentation from 2008, which says something I myself felt – that everyone “knows” non-blocking is better and faster, but nobody actually tested it, and that people sometimes confuse “fast” and “scalable”. And that blocking servers actually perform ~20% faster. That presentation, complemented by this “Avoid NIO” post, claims that the non-blocking approach is actually worse in terms of scalability and performance. And this paper (from 2003) claims that “Events Are A Bad Idea (for high-concurrency servers)”. But is all this objective? Does it hold true only for the Java NIO library or for the non-blocking approach in general? Does it apply to Node.js and akka/spray, and how do applications that are asynchronous from the client perspective fit into the picture? I honestly don’t know.

It feels like the old, thread-pool-based, blocking approach is at least good enough, if not better. Despite the “common knowledge” that it is not.

And to complicate things even further, let’s consider use cases. Maybe you should use a blocking approach for a RESTful API with a traditional request/response paradigm, but make a high-speed trading web application non-blocking, because of its asynchronous nature. Should you have only your “connector” (in Tomcat terms) non-blocking, and the rest of your application blocking… except for the asynchronous (from the client’s perspective) part? It gets really complicated to answer.

And even “it depends” is not a good enough answer. Some people would say that you should do your own benchmark, for your use case. But for a benchmark you need an actual application, written in all possible ways. Yes, you can use some prototype with basic functionality, but choosing the programming paradigm must happen very early (and it’s hard to refactor later). So, which approach is more performant, scalable, memory-efficient? I don’t know.

What I do know, however, is which is easier to program, easier to test and easier to support. And that’s the blocking paradigm, where you simply call methods on objects, not caring about callbacks and handling responses. Synchronous, simple, straightforward. This is actually one of the points in both the presentation and the paper I linked above – that it’s harder to write non-blocking code. And given the unclear benefits (if any), I would say that programming, testing and supporting the code is the main distinguishing feature. Whether you are going to be able to serve 10,000 or 11,000 concurrent users from a single machine doesn’t really matter. Hardware is cheap. (Unless it’s 1,000 vs 10,000, of course.)

But why is the non-blocking, asynchronous, event/message-driven programming paradigm harder? For me, at least, even after a year of writing in that paradigm, it’s still messier. First, it is way harder to trace the program flow. With synchronous code you would just tell your IDE to fetch the call hierarchy (or find the usages of a given method if your language is not IDE-friendly), and see where everything comes from and goes to. With events it’s not that trivial. Who constructs this message? Where is it sent to / who consumes it? How is the response obtained – via callback, via another message? When is the response message constructed and who actually consumes it? And no, that’s not “loose coupling”, because your code is still pretty logically (and compilation-wise) coupled – it’s just harder to read.

What about thread safety? Message passing allegedly ensures that no contention, deadlocks, or race conditions occur. Well, even that’s not necessarily true. You have to be very careful with callbacks (unless you really have one thread, as in Node) and your “actor” state. Which piece of code is executed by which thread is important (in akka at least), and you can still have shared state even though only a few threads do the work. And for the synchronous approach you just have to follow one simple rule – state does not belong in the code, period. No instance variables and you are safe, regardless of how many threads execute the same piece of code. The presentation above also mentions immutable and concurrent data structures that are inherently thread-safe and can be used in either of the paradigms. So in terms of concurrency, it’s pretty easy, from the perspective of the developer.

Testing complicated message-passing flows is a nightmare, really. And whereas test code is generally less readable than the production code, test code for a non-blocking application is, in my experience, much uglier. But that’s subjective again, I agree.

I wouldn’t like to finish this long and unfocused piece with “it depends”. I really think the synchronous/blocking programming model, with a thread pool and no message passing in the business logic, is the simpler and more straightforward way to write code. And if, as pointed out by the presentation and paper linked above, it’s also faster – great. And when you really need to asynchronously send responses to clients – consider the non-blocking approach only for that part of the functionality. Ultimately, given similar performance, throughput and scalability (and ignoring the marketing buzz), I think one should choose the programming paradigm that is easier to write, read and test. Because it takes 30 minutes to start another server, but accidental complexity can burn weeks and months of programming effort. For me, the blocking/synchronous approach is the easier one to write, read and test, but that isn’t necessarily universal. I would just not base my choice of a programming paradigm on vague claims about performance and scalability.


My Development Setup

February 23, 2015

I think I may have a pretty non-standard development setup (even for a Java-and-Scala developer). I use Windows, which I guess almost no “real” developer does. I’ve tried Linux (Ubuntu) a couple of times, but it ruins my productivity (or what’s left of it after checking all social networks).

But how do I manage to get anything done? How do I write scripts that I deploy on servers, how do I run non-Windows software, how do I manage to work on a project where all other developers use either Linux or Mac?

It’s actually quite simple. It’s Windows + Cygwin + VirtualBox with a Linux distro. For most of the things that a Java developer needs, Windows is just fine. IDEs and servlet containers run well, so no issue there. Some project automation is done with shell scripts, but whenever I need to execute them, Cygwin works pretty well. Same goes for project deployment scripts and the like (and I generally prefer using a class with a main method rather than sed, awk, curl, etc., to test stuff). As for software that doesn’t run on Windows (e.g. Riak doesn’t have a Windows distribution), that goes on the VirtualBox. I always have a virtual machine running with the appropriate software installed and listening on some port, so that I can run any application locally.

No need to mention Git, as there is a git console for Windows, but there’s also SourceTree, which is a pretty neat UI for the day-to-day tasks. Newlines are automatically handled by git, and even when that’s not enough (or is not working, as Cygwin needs the Linux line endings), Notepad++ has a pretty easy EOL conversion.

What about viruses? Using Firefox with NoScript, combined with good internet habits, means I haven’t had a single virus, ever. Well, maybe I’ve had some very secret one that never manifested itself, who knows.

That may sound like an unnecessary complication – so many components just to achieve what a Linux installation would give out-of-the-box. Well, no. First, it takes 30 minutes to set up, and second, I wouldn’t go for Linux on a desktop. It’s just too unfriendly and you waste so much time fixing little things that usually go wrong. Like when intensive I/O gets your UI completely stuck, or when the wifi doesn’t work because of those-three-commands-you-have-to-execute-to-add-this-to-the-proper-config. In other words, I get the end-user robustness of Windows (and no, it doesn’t give BSOD anymore, that was true 10 years ago) combined with the tools of Linux.

With that I’m not saying that everyone should migrate to my setup tomorrow. I’m just pointing to a good alternative.


Do It Either Way, We’ll Refactor It Later

February 15, 2015

It often happens that a new piece of functionality is discussed within a team and different developers have a different preference over how it should be implemented. “But what if in the future…” is a typical argument, as well as “that way it’s going to be more extensible”.

Well, usually it doesn’t matter. One should rather focus on writing it well – readable, documented and tested – rather than trying to predict future use cases. You can’t in most cases anyway, so spending time and effort in discussions about which is the perfect way to implement something is fruitless. I’m not advocating a ‘quick and dirty’ approach, but don’t try to predict the future.

Sometimes there is more than one approach to an implementation and you just can’t decide which one is better. Again, it doesn’t matter. Just build whichever you think would take less time, but don’t waste too much time in arguments.

Why? Because of refactoring (if you are using a statically-typed language, that is). Whenever a new use-case arises, or you spot a problem with the initial design, you can simply change it. It’s quite a regular scenario – something turns out to be imperfect – improve it.

Of course, in order to feel free to refactor, one should have a lot of tests. So having tests is more important than implementing it right the first time.

There is one exception, however – public APIs. You cannot change those, at least not easily. If you have a public API, make sure you design it in the best way, especially consider if any given method should be exposed or not. You cannot remove it afterwards. In other words, whenever a 3rd party depends on your decisions, make sure you get it right the first time. It’s still not completely possible, but it’s worth the long discussions.

I generally refrain from long discussions about exact implementations and accept the opinions of others. Not because mine is wrong, but because it doesn’t matter. It’s better to save the overhead of guessing what will be needed in one year and just implement it. That, of course, has to be balanced with the overhead of refactoring – it’s not exactly free.


Time Zone Use Cases

February 9, 2015

Time zones are a tricky thing. Jon Skeet gives a pretty good overview of the horrors (at 20:10).

But besides the inherent complexity of having time zones and DSTs in the first place, and in hope that these things are abstracted away from us by our tools (joda-time, noda-time, whatever), we need to use them properly. And I’m not talking about “configuring the default timezone for the JVM” type of “properly”.

I’m talking about use-cases involving timezones. Most web applications have registered users. And many web applications need to display time. But even with a jQuery plugin like timeago that displays “5 hours ago” rather than an exact time, you still need to know the users’ timezones. How do you get it? It’s complicated.

Fortunately, there is this little library (jsTimezoneDetect) that does the job almost perfectly for you. How it works is beyond the scope of this post, but you can check the code, and then immediately remember Jon Skeet’s talk.

Anyway, the first point is – always obtain your users’ time zone – you will need it. Obtain it on registration and store it (making it configurable in a ‘profile settings’ page), but also do that for unregistered users and store it in their session. Even detect it for registered users and use it for the active session instead of the default, because users tend to change timezones (traveling around with their portable device).

Okay, we’ve covered displaying times to users in their current timezone. Why do we have to store their default timezone then? Because in many cases we need to do something about our users even when they are not logged in. For example – send them regular email updates or even SMS.

I’m mentioning SMS specifically in the light of an airbnb bug that I reported 6 months ago and which is not fixed yet (I happen to like pointing out bugs of popular startups). So, I’m sleeping one night, having disabled my “silent sleep” app, and at 4 in the morning I get an SMS. From airbnb, saying my reservation in 3 days is approaching. Nice.

We all have these scheduled jobs that run every couple of hours and send notifications (via email or SMS) to our users. But if we do not respect their timezone, they will hate us. Not everyone is sending SMS, but some people even have sound for their email notifications. And even if users don’t get woken up during the night, usually marketing people analyze the perfect time to send marketing messages, reminders, newsletters, etc. E.g. “our main group of users is usually commuting to work between 8 and 9, so if we send them the newsletter at 7:40, they will read it”. And that goes to hell if your user is in UTC+3 rather than UTC.

The solution is now obvious – when doing scheduled batch operations with your users’ profiles, always respect their timezone. E.g. if you want to send a notification at a predefined time or time window, then for each user check whether currentTime.withTimeZone(user.getTimeZone()).equals(sendingTime).
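Here’s a minimal sketch of that check with java.time (the class and method names are hypothetical):

import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

public class NotificationJob {

    // The local time at which the message should reach each user
    private static final LocalTime SENDING_TIME = LocalTime.of(7, 40);

    // Called by the scheduler every minute; notifies only users for whom it is 07:40 local time
    public void run(Iterable<User> users) {
        for (User user : users) {
            ZonedDateTime nowForUser = ZonedDateTime.now(ZoneId.of(user.getTimeZone()));
            if (nowForUser.toLocalTime().truncatedTo(ChronoUnit.MINUTES).equals(SENDING_TIME)) {
                sendNotification(user);
            }
        }
    }

    private void sendNotification(User user) { /* email or SMS */ }

    // Minimal stand-in for the stored user profile with its timezone, e.g. "Europe/Sofia"
    interface User { String getTimeZone(); }
}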

I wouldn’t be writing these obvious things if I hadn’t seen them ignored so many times. Time zones are hard (I couldn’t even spell them consistently throughout the post), but let’s try to apply some common sense to them.


I Am Not Allowed to Post My Content on Reddit

January 23, 2015

I’ve once been ninja-banned (I can still post and comment, but nobody else sees it) on reddit (/r/programming, more precisely), and recently my submissions (from this blog) that got to the front page (of /r/programming) were removed by moderators. Without any warning, of course. After my 2nd message to them “why”, I finally got an answer – my posts are considered spam and self-promotion.

Now, I do agree that most of my submissions to /r/programming are to my own posts. And it’s true that it has happened a few times that when I realize I’ve posted something at a very unpopular time (e.g. people in the US have not woken up yet, and people in Europe are out for lunch), I delete it and post it later. All of that being the case, I can hardly protest the removals and bans. And I take note not to do it again.

And I do agree people using reddit for marketing is damaging. And one should not spam endlessly with their own content.

However, I don’t agree to applying the general rule without considering particular cases and without assessing the impact. Do I skew or burden the system with 2-3 submissions monthly? No. Should I post random content just in order to get my own articles below 10%? No. Is it spam, if I don’t even have ads on the blog, making it purely informative and a personal brand component (which doesn’t have much to do with reddit upvotes)? No, it’s not spam.

I do not consider my content outstanding in any way, but I think it’s noteworthy. And sometimes (roughly 50% of the posts) I share it. And then, again roughly 50% of the time, it gets to the front page. Probably because it’s interesting and/or discussion-worthy for the community.

Being accused of spam, I have to say I do participate in the discussions below my submissions, and upvote and downvote comments of others (unlike “spammers”). I also read /r/programming, and that’s where I find interesting /r/programming-worthy articles (in addition to Hacker News and Twitter). So, I guess I am supposed to change my habits and find other sources of interesting posts, so that I can post them to reddit, so that in turn I can post my thoughts that happen to be interesting to the community, 2-3 times a month.

And of course I’m not worried that “the community will lose my great posts”, and of course it is about being read by more people, for the sake of it. But how does that harm reddit?

So, now I’m not allowed to post to /r/programming. And I accept that. I’ll leave it at the discretion of people who read my posts and are simultaneously reddit users whether to do that. I’m just not sure how that policy helps /r/programming in the long run.


The Internet Is Pseudo-Decentralized

January 19, 2015

We mostly view the internet as something decentralized and resilient. And from a practical point of view, it almost is. The Web is decentralized – websites reside on many different servers in many different networks, and opening each site does not rely on a central server or authority. And there are all these cool peer-to-peer technologies like BitTorrent and Bitcoin that do not rely on any central server or authority.

Let’s take a look at these two examples more closely. And it may sound like common sense, but bear with me. If you don’t know the IP address of the server, you need to type its alias, i.e. the domain name. And then, in order to resolve that domain name, the whole DNS resolution procedure takes place. So, if all caches are empty, you must go to the root. The root is not a single server, of course, but around 500. There are 13 logical root DNS servers (lucky number for the internet), operated by 12 different companies/institutions, most of them in the US. 500 servers are enough for the whole internet, because each intermediate DNS server on the way from your machine to the root has caches, and also because the root itself does not have all the names in a database – it only has the addresses of the top level domains’ name servers.

What about BitTorrent and Bitcoin? In newer, DHT-based BitTorrent clients you can even do full-text search in a decentralized way, only among your peers, without any need for a central coordinator (or tracker). In theory, even if all the cables under the Atlantic get destroyed, we will just have a fragmented distributed hash table, which will still work with all the peers inside it. That’s because these Bit* technologies create so-called overlay networks. They do not rely on the web, on DNS or anything else to function – each node (user) has a list of the IP addresses of its peers, so any node that is in the network has all the information it needs to perform its operations – seed files, confirm transactions in the blockchain, etc. But there’s a caveat. How do you get to join the overlay network in the first place? DNS. Each BitTorrent and Bitcoin client has a list of domain names hardcoded in it, which use round-robin DNS to point to multiple nodes that each new node connects to first. The nodes in the DNS record are already in the network, so they (sort of) provide a list of peers to the newcomer, and only then does it become a real member of the overlay network. So, even these fully-decentralized technologies rely on DNS, and in turn on the root name servers. (Note: if the DNS lookup fails, the Bit* clients have a small hardcoded list of IP addresses of nodes that are supposed to be always up.) And DNS is required even here, because (to my knowledge) there is no way for a machine to broadcast to the entire world “hey, I have this software running, if anyone else is running it, send me your IP, so that we can form a network” (and thank goodness), though asking your neighbouring IP range “hey, I’m running this software; if anyone else is, let me know, so that we can form a network” might work.
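The bootstrap step itself is easy to picture – resolving one of the hardcoded seed names gives the newcomer its first few peers (the hostname below is made up):

import java.net.InetAddress;
import java.net.UnknownHostException;

public class Bootstrap {
    public static void main(String[] args) throws UnknownHostException {
        // A round-robin DNS name returns several A records, i.e. several already-connected nodes
        InetAddress[] seeds = InetAddress.getAllByName("seed.example-p2p-network.org");
        for (InetAddress seed : seeds) {
            System.out.println("initial peer: " + seed.getHostAddress());
            // connect to the peer and ask it for more peers, then drop the DNS dependency
        }
    }
}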

So, even though the internet is decentralized, practically all services on top of it that we use (the web and overlay-network-based software) do rely on a central service. Which, fortunately, is highly available, spread across multiple companies and physical sites, and rarely accessed directly. But that doesn’t make it really decentralized. It is a wonderful practical solution that is “good enough”, so we can rely on DNS as de-facto decentralized – but will that hold in extreme circumstances?

Here are three imaginary scenarios:

Imagine someone writes a virus that slowly and invisibly infects all devices on the network. Then the virus disables all caching functionality, and suddenly billions of requests go to the root servers. The root servers will then fail, and the internet will be broken.

Imagine the US becomes “unstable”, the government seizes all US companies running the root servers (including ICANN) and starts blackmailing other countries to allow requests to the root servers (only 3 are outside the US).

Imagine you run a website, or even a distributed network, that governments really don’t want you to run. If it’s something that serious, for example – an alternative, global democracy that ignores countries and governments – then they can shut you down. Even if the TLD name server does resolve your domain, the root server can decide not to resolve the TLD, until it stops resolving your domain.

None of that is happening, because it doesn’t make sense to go to such lengths in order to just generate chaos (V and The Joker are fictional characters, right?). And probably it’s not happening anytime soon. So, practically, we’re safe. Theoretically, and especially if we have conspiracies stuck in our heads, we may have something to worry about.

Would there be a way out of these apocalyptic scenarios? I think existing overlay networks that are big enough can be used to “rebuild” the internet even if DNS is dead. But let’s not think about the details for now, as hopefully we won’t need them.