I Am Not Allowed to Post My Content on Reddit

January 23, 2015

I was once ninja-banned (I could still post and comment, but nobody else saw it) on reddit (/r/programming, more precisely), and recently my submissions (from this blog) that got to the front page (of /r/programming) were removed by the moderators. Without any warning, of course. After my second message asking “why”, I finally got an answer – my posts are considered spam and self-promotion.

Now, I do agree that most of my submissions to /r/programming are to my own posts. And it’s true that it has happened a few times that, when I realized I had posted something at a very unpopular time (e.g. people in the US are not awake yet, and people in Europe are out for lunch), I deleted it and posted it later. All of that being the case, I can hardly protest the removals and bans. And I take note not to do it again.

And I do agree people using reddit for marketing is damaging. And one should not spam endlessly with their own content.

However, I don’t agree with applying the general rule without considering particular cases and without assessing the impact. Do I skew or burden the system with 2-3 submissions monthly? No. Should I post random content just in order to keep my own articles below 10%? No. Is it spam if I don’t even have ads on the blog, making it purely informative and a personal-brand component (which doesn’t have much to do with reddit upvotes)? No, it’s not spam.

I do not consider my content outstanding in any way, but I think it’s noteworthy. And sometimes (roughly 50% of the posts) I share it. And then, again roughly 50% of the time, it gets to the front page. Probably because it’s interesting and/or discussion-worthy for the community.

Being accused of spam, I have to say I do participate in the discussions below my submissions, and I upvote and downvote comments of others (unlike “spammers”). I also read /r/programming, and that’s where I find interesting /r/programming-worthy articles (in addition to Hacker News and twitter). So, I guess I am supposed to change my habits and find other sources of interesting posts, so that I can post them to reddit, so that in turn I can post my own thoughts that happen to be interesting to the community, 2-3 times a month.

And of course I’m not worried that “the community will lose my great posts”, and of course it is about being read by more people, for the sake of it. But how does that harm reddit?

So, now I’m not allowed to post to /r/programming. And I accept that. I’ll leave it at the discretion of people who read my posts and are simultaneously reddit users whether to do that. I’m just not sure how that policy helps /r/programming in the long run.


The Internet Is Pseudo-Decentralized

January 19, 2015

We mostly view the internet as something decentralized and resilient. And from a practical point of view, it almost is. The Web is decentralized – websites reside on many different servers in many different networks, and opening each site does not rely on a central server or authority. And there are all these cool peer-to-peer technologies like BitTorrent and Bitcoin that do not rely on any central server or authority.

Let’s take a look at these two examples more closely. It may sound like common sense, but bear with me. If you don’t know the IP address of the server, you type its alias, i.e. the domain name. Then, in order to resolve that domain name, the whole DNS resolution procedure takes place. So, if all caches are empty, you must go to the root. The root is not a single server, of course, but around 500 instances. There are 13 logical root DNS servers (lucky number for the internet), operated by 12 different companies/institutions, most of them in the US. Around 500 servers are enough for the whole internet, because each intermediate DNS server on the way from your machine to the root has a cache, and also because the root itself does not have all the names in a database – it only has the addresses of the top-level domains’ name servers.

What about BitTorrent and Bitcoin? In newer, DHT-based BitTorrent clients you can even do full-text search in a decentralized way, only among your peers, without any need for a central coordinator (or tracker). In theory, even if all the cables under the Atlantic get destroyed, we will just have a fragmented distributed hash table, which will still work with all the peers inside it. That’s because these Bit* technologies create so-called overlay networks. They do not rely on the web, on DNS or on anything else to function – each node (user) has a list of the IP addresses of its peers, so any node that is in the network has all the information it needs to perform its operations – seed files, confirm transactions in the blockchain, etc. But there’s a caveat. How do you join the overlay network in the first place? DNS. Each BitTorrent and Bitcoin client has a list of domain names hardcoded in it, which use round-robin DNS to point to multiple nodes that each new node connects to first. The nodes behind the DNS record are already in the network, so they (sort of) provide a list of peers to the newcomer, and only then does it become a real member of the overlay network. So even these fully decentralized technologies rely on DNS, and in turn on the root name servers. (Note: if the DNS lookup fails, the Bit* clients have a small hardcoded list of IP addresses of nodes that are supposed to be always up.) And DNS is required even here, because (to my knowledge) there is no way for a machine to broadcast to the entire world “hey, I have this software running, if anyone else is running it, send me your IP, so that we can form a network” (and thank goodness) (though asking your neighbouring IP range “hey, I’m running this software; if anyone else is, let me know, so that we can form a network” might work).
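
To make that bootstrap step concrete, here is a minimal Java sketch of the idea (the seed domain is invented; real clients ship their own hardcoded seed domains): a single ordinary DNS lookup returns the addresses of several nodes that are already inside the overlay.

import java.net.InetAddress;

public class OverlayBootstrap {
    public static void main(String[] args) throws Exception {
        // One ordinary DNS lookup; the record is configured (round-robin / multiple A records)
        // to return the addresses of several nodes that are already in the overlay network.
        InetAddress[] peers = InetAddress.getAllByName("dnsseed.example-overlay.net");
        for (InetAddress peer : peers) {
            System.out.println("Candidate first peer: " + peer.getHostAddress());
            // A real client would now connect to one of these nodes and ask it for a longer
            // list of peers, after which DNS is no longer needed at all.
        }
    }
}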

So, even though the internet itself is decentralized, practically all the services on top of it that we use (the web and overlay-network-based software) do rely on a central service. Fortunately that service is highly available, spread across multiple companies and physical sites, and rarely accessed directly. But that doesn’t make it truly decentralized. It is a wonderful practical solution that is “good enough”, so we can treat DNS as de-facto decentralized – but will it hold up under extreme circumstances?

Here are three imaginary scenarios:

Imagine someone writes a virus that slowly and invisibly infects all devices on the network. Then the virus disables all caching functionality, and suddenly billions of requests go to the root servers. The root servers will then fail, and the internet will be broken.

Imagine the US becomes “unstable”, the government seizes all US companies running the root servers (including ICANN) and starts blackmailing other countries to allow requests to the root servers (only 3 are outside the US).

Imagine you run a website, or even a distributed network, that governments really don’t want you to run. If it’s something that serious – for example, an alternative, global democracy that ignores countries and governments – then they can shut you down. Even if the TLD name server does resolve your domain, the root servers can decide not to resolve the TLD at all, until the TLD operator stops resolving your domain.

None of that is happening, because it doesn’t make sense to go to such lengths just to generate chaos (V and The Joker are fictional characters, right?). And it probably won’t happen anytime soon. So, practically, we’re safe. Theoretically, and especially if we have conspiracies stuck in our heads, we may have something to worry about.

Would there be a way out of these apocalyptic scenarios? I think existing overlay networks that are big enough can be used to “rebuild” the internet even if DNS is dead. But let’s not think about the details for now, as hopefully we won’t need them.


My Most Frequent Code Review Comment

January 14, 2015

The title implies one prerequisite – you must have code reviews for every piece of code written. On that topic, Fog Creek recently published a list of best practices for code reviews. I will expand on one of the points:

Do comments exist and describe the intent of the code?

This is, in my experience, the problem I’m making review comments about most often. The code itself is good, well-written, tested, follows style guides, handles edge cases, and everything, but sometimes it lacks a description that would be useful to someone who is looking at a given method for the first time in months. So I often add something like:

Could you add a comment describing the intent of this code?

I’ve actually written about that before, and it’s common knowledge that you should add a comment “why” this code is there (rather than what it’s doing). And yet, even my own code sometimes lacks sufficient intent-describing comments.

That’s most probably because, when writing the code, the intent is obvious to you. Only if some logic gets rather convoluted do you realize that it would be hard to figure out later. But normally you consider your code straightforward (and you disagree with yourself when you look at that code in half a year). To correct that, you must do a self-review after you are done, and ask the same questions that a reviewer would ask.
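
Here is a made-up Java example of the kind of comment I mean – the class, the method and the 90-day rule are all invented for illustration, but they show the “why” that the code alone would lose:

import java.util.List;
import java.util.stream.Collectors;

public class ReportGenerator {

    // WHY: the upstream system purges its own records after 90 days, so including older
    // entries here would produce totals that can never be reconciled against it.
    // The filter is intentional, not an arbitrary cutoff.
    public List<String> visibleEntryIds(List<Entry> entries) {
        return entries.stream()
                .filter(e -> e.ageInDays() <= 90)
                .map(Entry::id)
                .collect(Collectors.toList());
    }

    public interface Entry {
        int ageInDays();
        String id();
    }
}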

But whether it’s a preliminary self-review or the actual code review, the review seems to be the ideal moment to remind ourselves to add comments describing why a given piece of code is there and what purpose it serves in the big picture. So don’t spare review comments like “please add a description of the intent of this piece of code”.


The Hash Challenge

January 8, 2015

The facebook discussion about my previous blog post went off-topic and resulted in an interesting challenge. One of the commenters (Stilgar) was not convinced that passwords hashed with MD5 (or SHA) are so easy to crack if there is a salt. So he posted a challenge:

“I’ll post the hash of a “password” which is going to be a dictionary word with or without mixed case letters and up to 3 digits. There will be salt – 8 random symbols before the password. Whoever posts the right password first gets 100 euro (via PayPal). I accept that I may be wrong, but I would risk 100 euro. In the real world, most sites would sell out a user for 100 euro. If I lose, I’ll admit I’m not right [..]

salt: fe#ad746
MD5: CD7B1E790D877EE64613D7F0FD38932A
Code to generate it: https://dotnetfiddle.net/st0RfL

Bonus for 50 EUR more (another password, same salt)
salt: fe#ad746
SHA1: FE3463DC8B98D33C1DD8D823B0D49DCD991E6627

We must note that the salt is really small, which distorts the experiment a bit, but anyway, here’s what happened:

The original challenge was posted at 00:20

At 5:20 Rumen posted:

The password is: DeveloP9

my code: http://pastebin.com/enzgq1iz

So, here’s my PayPal: ….

At 6:15 Petar posted:

fe3463dc8b98d33c1dd8d823b0d49dcd991e6627:fe#ad746:Techno21

And the times reported for cracking the challenge on a desktop computer were 1-3 minutes.
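
The participants’ actual code is linked above; what follows is only a rough Java sketch of the general idea (tiny word list, simplified case and digit handling), to show why a salted MD5 of a dictionary-based password falls so quickly:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class DictionaryCrack {

    private static final String SALT = "fe#ad746";
    private static final String TARGET = "cd7b1e790d877ee64613d7f0fd38932a";

    public static void main(String[] args) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        // A real run would load a full word list; this array is just for illustration.
        String[] dictionary = {"develop", "techno", "password"};
        for (String word : dictionary) {
            // Simplified: only lowercase and first-letter-capitalized variants are tried here,
            // and only plain suffixes "" and "0".."999"; the real challenge allows any mixed case.
            for (String variant : new String[] {word, capitalize(word)}) {
                for (int suffix = -1; suffix < 1000; suffix++) {
                    String candidate = variant + (suffix < 0 ? "" : String.valueOf(suffix));
                    byte[] digest = md5.digest((SALT + candidate).getBytes(StandardCharsets.UTF_8));
                    if (toHex(digest).equals(TARGET)) {
                        System.out.println("Found: " + candidate);
                        return;
                    }
                }
            }
        }
    }

    private static String capitalize(String word) {
        return Character.toUpperCase(word.charAt(0)) + word.substring(1);
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}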

So, Stilgar lost 150 euro (around 180 USD). But hopefully that’s a valuable story – using the output of plain hashing algorithms for storing passwords is wrong. Because if anyone gets hold of the hash, he has the password. And that may be worth way more than 150 euro, especially given the tendency of users to reuse passwords. If your site stores passwords incorrectly, an attacker may get the accounts of the same users on many other sites.

For precisely that reason, last year I wrote this simple project to verify your site’s password storage mechanism and emphasized the fact that bcrypt is the bare minimum for your website’s security.
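
For completeness, here is a minimal sketch of what that bare minimum can look like with the jBCrypt library (the work factor of 12 is just a reasonable starting point – tune it to your hardware):

import org.mindrot.jbcrypt.BCrypt;

public class PasswordStorage {

    // A per-password random salt and a work factor of 12; both are embedded in the
    // resulting string, so no separate salt column is needed.
    public static String hash(String plainTextPassword) {
        return BCrypt.hashpw(plainTextPassword, BCrypt.gensalt(12));
    }

    // Verification is deliberately slow - that slowness is what makes brute-forcing
    // a leaked hash orders of magnitude harder than with MD5 or SHA-1.
    public static boolean verify(String plainTextPassword, String storedHash) {
        return BCrypt.checkpw(plainTextPassword, storedHash);
    }
}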


Proposal for an E-Government Architecture

January 5, 2015

Having worked on a project that was part of the Bulgarian e-government in the past, I have a good overview of what can and should be implemented, and how, in order for the administration of a country to function without paper.

That said, I fully acknowledge that the challenges of digitalizing the administration, and in that sense the “government” or in fact the whole country, are only 10% technological and 90% social, legal and administrative. And although I have some knowledge of the legal framework(s) and of administrative processes, I’m not an expert, so I won’t comment on that.

But I am a software engineering expert, so here is my proposal for solving the 10% of the problem. It is a 10-page architectural overview, with some implementation details included, specifying broadly the way systems in the government should be implemented and connected to each other.

The purpose of all that is to make communication between different branches of the administration automatic, so that nobody has to send paper mail containing formal requests, due in 2 weeks, in order to provide a simple certificate to a citizen. As a simple example, if you are being hired, the company has to inform the income agency, which in turn must verify whether you have been convicted, whether you are physically fit for the job and whether you have outstanding liabilities (e.g. as the owner of a bankrupted enterprise). In order to do that, it needs to get this information. In the worst case, it would require it from you – you go to the court and get a certificate that you haven’t been convicted recently; you go to the medical center and get a certificate that there are no known serious diseases in your medical history; and you go to the trade registry to get certified that you don’t have a bankrupted company. In a slightly better scenario, the income administration would request all of these on your behalf, by sending postal requests, waiting for 2 weeks and compiling everything when the responses come. In the perfect scenario, the software of the income administration sends automatic requests, given your personal identifier, and gets automatic responses from the software of the court, the health registry and the trade registry.

The whole idea of the proposal is to rely on a distributed architecture, where each information system directly invokes the services it needs. It also takes into account the setup of the Estonian e-government, as the state of the art. There are a few key points in my proposal:

  • most services actually simply provide data in response to a lookup for a (primary key, data type) pair, e.g. “I want to know whether the citizen with ID=X has been convicted” (response: true/false), or “I want to know the current address of the company with identifier=X” (response: an address). This type of service can be consumed via a standard protocol, so that the consumer doesn’t actually care where exactly the data is located (see the sketch after this list). For full-featured services (that may involve manual steps in the providing administration), full-featured WSDL (or REST) based web services are used, defined by the providing administration
  • there is a central infrastructure, but it does not act as an ESB. It just knows where a given data type is located, what services each administration exposes and which administrations can participate, defining access control lists. Additionally, it logs transactional data.
  • each request for citizens’ or businesses’ information is logged, and the respective entity is notified. That way no government official can secretly spy on citizens by requesting their data from other administrations. In other words, the point is that it is easy for the government to get any information about you that you have already provided to it (practicality), but you always know when that happens, so that inappropriate checks on you are detected as early as possible. Also, only approved consumers can see a given data type (e.g. not everyone in the government can see your medical record). And yes, the government already has all the data about you, so let’s make it practical, without sacrificing our privacy.
  • everything is encrypted and digitally signed, and the government has its own certification authority
  • administrations don’t need to support complicated deployments or implement their systems from scratch. All that is needed is an adapter layer that can accept requests on a set of standard endpoints and make requests to such endpoints.
  • a base SDK in popular languages is to be provided together with the implementation of the central infrastructure, so that all aspects of the communication are already implemented, and administration information systems only need to initialize the SDK client and use it in a developer-friendly way. Each administration providing custom full-featured services is expected to extend the base SDK and publish the extension for others to use.
  • administrations that do not yet have the proper information systems can use an administrative portal to get access to the data they are allowed to, in order to accomplish their tasks (e.g. manually get the data through the same channel that their information system would use)
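
As an illustration of the first point, here is a hypothetical Java sketch of what the lookup part of the base SDK could look like – the names are mine and not taken from the actual proposal document:

public class EGovSdkSketch {

    // The data types a consumer can look up; access to each is governed by the central
    // access control lists, and every lookup is logged and the data subject notified.
    public enum DataType {
        CONVICTION_STATUS,
        COMPANY_ADDRESS,
        OUTSTANDING_LIABILITIES
    }

    public static class LookupResponse {
        public final boolean found;
        public final String value;

        public LookupResponse(boolean found, String value) {
            this.found = found;
            this.value = value;
        }
    }

    // The standard lookup protocol: (primary key, data type) -> data. The consumer does not
    // know or care which administration actually holds the data - the central infrastructure
    // only resolves where the data type lives.
    public interface DataLookupClient {
        LookupResponse lookup(String primaryKey, DataType dataType);
    }
}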

Feel free to provide feedback about the architecture proposal – I will be glad to accommodate use cases that I haven’t considered.


Handling Edge Cases

December 26, 2014

The other day I decided to try Uber. I already had a registration, so, while connected to the Wi-Fi at the house I was staying in, I requested a car. It came quickly, so I went out of the house and got into it. The driver (who didn’t speak proper English at all, although we were in New York) confirmed the address and we were on our way. A minute later he pointed to his phone and said (or meant to say, comprendo?) that the ride was over and the fee was only 8 dollars (rather than the estimated 35 I saw in the app). I could not open my Uber app, because I don’t have mobile data (mobile roaming in the US when coming from the EU is quite expensive). The receipt email I got later confirmed that the ride was only 45 seconds long (ending only two blocks away from the starting point). My dashboard does not show any cancellation, neither from my side nor from the driver’s side, and my email to Uber’s support resulted in something like “we have no idea what happened, we are sorry”.

Since they have no idea what happened, I can only speculate, but I would assume that the problem was due to my disconnecting from the internet (it could be that the driver did something malicious and pretended not to speak English, but that is just another aspect of the same problem I’m going to describe). So, let’s talk about edge cases. (This post is not meant to be Uber-bashing; after all, I’m not sure what exactly happened – it’s just that my Uber experience triggered it.)

Handling edge cases might seem like the obvious thing to do, but (if my assumption above is right) a company with a 40 billion dollar valuation didn’t get it right (in addition to the story above, a friend of mine shared a story about Uber and poor connectivity that resulted in an inconsistent registration). And ironically, I’ve been working on edge cases in device-to-server connectivity for the past weeks, so I have a few tips.

If your application (a smartphone app or a rich web client) relies on communication with a server, it must hold a connection state. It should know when it’s “Connected” or “Disconnected”. This can be achieved in a couple of ways, one of which is to invoke a “ping” endpoint on the server every X seconds, and if a request (or two subsequent requests) fails, set the current state to “Disconnected”. Knowing what state you are in is key to adequate edge-case handling.
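
Here is a minimal Java sketch of such connection-state tracking, assuming a “/ping”-style endpoint exists on the server (the URL, interval and failure threshold are illustrative):

import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ConnectionMonitor {
    public enum State { CONNECTED, DISCONNECTED }

    private volatile State state = State.CONNECTED;
    private final AtomicInteger consecutiveFailures = new AtomicInteger();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public void start() {
        // Ping every 10 seconds; two consecutive failures flip the state to DISCONNECTED.
        scheduler.scheduleAtFixedRate(this::ping, 0, 10, TimeUnit.SECONDS);
    }

    private void ping() {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL("https://example.com/ping").openConnection();
            conn.setConnectTimeout(3000);
            conn.setReadTimeout(3000);
            if (conn.getResponseCode() == 200) {
                consecutiveFailures.set(0);
                state = State.CONNECTED;
            } else {
                registerFailure();
            }
        } catch (Exception e) {
            registerFailure();
        }
    }

    private void registerFailure() {
        if (consecutiveFailures.incrementAndGet() >= 2) {
            state = State.DISCONNECTED;
        }
    }

    public State getState() {
        return state;
    }
}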

Then, if you have something to send to the server, you need a queue of pending operations. The command pattern comes in handy here, but you can simply use a single-threaded executor service and a synchronization aid to block until the connection is back, and then proceed with executing the queued commands. The Gmail app for Android is a very positive example of that – it works perfectly both online and offline, and synchronizes the contents without any issues upon getting reconnected.
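
A rough sketch of such a pending-operations queue, built on a single-threaded executor and a simple wait/notify synchronization aid (the class is illustrative and would be wired to the connection monitor above):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PendingOperations {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Object connectionLock = new Object();
    private volatile boolean connected = true;

    // Commands are submitted in order; the single worker thread guarantees sequential execution.
    public void submit(Runnable command) {
        executor.submit(() -> {
            awaitConnection();
            command.run();
        });
    }

    private void awaitConnection() {
        synchronized (connectionLock) {
            while (!connected) {
                try {
                    connectionLock.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

    // Call these from the connection monitor when the state changes.
    public void onDisconnected() {
        connected = false;
    }

    public void onReconnected() {
        synchronized (connectionLock) {
            connected = true;
            connectionLock.notifyAll();
        }
    }
}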

One very important note is that your server should not rely on a 100% stable connection, and therefore a disconnect alone should not trigger any business logic. In the Uber example above, it might be that after 30 seconds of an unresponsive client app, the server decides the ride is over (e.g. assumes I am trying to trick it).

Another aspect is poor connection quality – the user may be connected, but not all requests may succeed, or they may take a lot of time. Tweaking timeouts and having the ability to retry failed operations is key. You don’t want to assume there is no connection just because a request fails once – retry it a couple of times (and only then add it to the “pending” queue). At the same time, on the server, you should use some sort of transactions. E.g. in the case of the Uber registration on a poor connection, my friend received an email confirmation for the registration, but his account was actually not created (and the activation link failed). Maybe the client made two requests, one of which failed, but the server assumed that if one of them went through, the other one did too.
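
A small illustrative retry helper along those lines – the attempt count and backoff are arbitrary; on final failure the caller would add the operation to the pending queue:

public class Retry {
    // Try the operation up to maxAttempts times with a linearly growing pause between attempts.
    public static boolean attempt(Runnable operation, int maxAttempts, long backoffMillis) {
        for (int i = 0; i < maxAttempts; i++) {
            try {
                operation.run();
                return true;
            } catch (RuntimeException e) {
                try {
                    Thread.sleep(backoffMillis * (i + 1));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false; // the caller should now queue the operation as "pending"
    }
}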

Edge cases are of course not limited to connectivity ones, and my list of “tips” is not at all exhaustive. Edge cases also include, for example, attempts by users to trick the system, so these must also be accommodated in a sensible way – e.g. do not assume by default that the user is trying to trick the system, but do give yourself the facilities to investigate that afterwards. Having adequate logging, both on the server and on the client, is very important, so that after an unforeseen edge case happens, you can investigate, rather than reply “we have no idea” (those were not Uber’s exact words, but that’s what it practically meant).

While handling edge cases, though, we must not forget our default use case. Optimize for the default use case, be sure that your “happy flow” makes users happy. But things break, and “unhappy flows” must not leave users unhappy.

Handling all edge cases might seem like cluttering your code. It will be filled with if-clauses, event handling, “retry pending” constructs, and what not. But that’s okay. It’s actually a sign of a mature product. As Spolsky has pointed out, all these ugly-looking pieces of code are actually there to serve particular use cases that you could not think of when starting from scratch.

Unfortunately, many of these edge cases cannot be tested automatically, so extensive manual testing is needed to ensure that the application works well not only in a perfect environment.

Tired of reading obvious things? Well, go and implement them.


In Favour of Self-Signed Certificates?

December 18, 2014

Today I watched the Google I/O presentation about HTTPS everywhere and read a couple of articles saying that Google is going to rank sites using HTTPS higher. Apart from that, SPDY has mandatory usage of TLS, and it’s very likely the same will be true for HTTP/2. Chromium proposes marking non-HTTPS sites as non-secure.

And that’s perfect. Except it’s not very nice for small site owners. In the presentation above, the speakers say “it’s very easy” multiple times. And it is – you just have to follow a dozen checklists with a dozen items each, run your site through a couple of tools and pay a CA 30 bucks per year. I have run a couple of personal sites over HTTPS (non-commercial, so using a free StartCom certificate), and I still shiver at the thought of setting up a certificate. You may say that’s because I’m an Ops newbie, but it’s just a tedious process.

But let’s say every site owner has a webmaster on contract who will renew the certificate every year. What’s the point? The talk rightly points out three main aspects – data integrity, authentication and encryption. And the speakers also rightly point out that no matter how mundane your site is, there is information that can be extracted from your visits there (well, even with HTTPS the host name is visible, so it doesn’t matter which torrents you downloaded – it is obvious you were visiting ThePirateBay).

But does it really matter if my blog properly authenticates itself to the user? Does it matter if the website of the local hairdresser may suffer a man-in-the-middle attack with someone posing as the site? Arguably not. If there is a determined attacker who wants to observe what recipes you are cooking right now, I bet he would find it easier to just install a keylogger.

Anyway, do we have any data on how many sites are actually just static websites or blogs? How many websites don’t have anything more than a contact form (if that)? 22% of newly registered domains in the U.S. are using WordPress. That doesn’t tell us much, as you can build quite interactive sites with WordPress, but it is probably an indication. My guess is that the majority of sites are simple content sites that you do not interact with, or where interaction is limited to posting an anonymous comment. Do these sites need to go through the complexity and cost of providing an SSL certificate? Certification Authorities may be rejoicing already – forcing HTTPS means there will be inelastic demand for certificates, which means prices are not guaranteed to drop.

If HTTPS is to be forced upon every webmaster (which should be the case, and I firmly support that), we should have a free, effortless way for the majority of sites to comply. And the only option that comes to mind is self-signed certificates. They do not guarantee there is no man-in-the-middle, but they do allow encrypting the communication, making it impossible for a passive attacker to see what you are browsing or posting. Server software (apache, nginx, tomcat, etc.) could have a “use self-signed certificate” switch, and automatically generate and store the key pair on the server (a single server, of course, as traffic is unlikely to be high for these kinds of sites).

Browsers must change, however. They should no longer report self-signed certificates as insecure. At least not until the user tries to POST data to the server (and especially if there is a password field on the page). Upon POSTing data to the server, the browser should warn the user that it cannot verify the authenticity of the certificate, and that he should proceed only if he thinks the data is not sensitive. Or even passing any parameters (be it GET or POST) could trigger a warning. That won’t be sufficient, as one can issue a GET request for site.com/username/password, or even embed an image or use javascript. That’s why the heuristics to detect and show a warning could include submitting forms, changing src and href with javascript, etc. Can that cover every possible case, and won’t it defeat the purpose? I don’t know.

Even small, content-only, CMS-based sites have admin panels, and that means the owner sends a username and password. Doesn’t this invalidate the whole point made above? It would, if there weren’t an easy fix – certificate pinning. Even now this approach is employed by mobile apps in order to skip the full certificate checks (including revocation). In short, the owner of the site can take the certificate generated by the webserver, import it in the browser (pin it), and be sure that the browser will warn him if someone tries to intercept his traffic. If he hasn’t imported the certificate, the browser would warn him upon submission of the login form, possibly with instructions on what to do.
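
To illustrate what pinning boils down to (mobile apps do essentially this), here is a minimal Java sketch that trusts exactly one known – possibly self-signed – certificate instead of the CA chain; the PEM file path is an example:

import java.io.FileInputStream;
import java.security.KeyStore;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

public class PinnedClient {
    public static SSLContext pinnedContext(String pemPath) throws Exception {
        // Load the single pinned certificate (exported once from the server).
        X509Certificate pinned;
        try (FileInputStream in = new FileInputStream(pemPath)) {
            pinned = (X509Certificate) CertificateFactory.getInstance("X.509").generateCertificate(in);
        }

        // Put it into an in-memory keystore that contains nothing else.
        KeyStore keyStore = KeyStore.getInstance(KeyStore.getDefaultType());
        keyStore.load(null, null);
        keyStore.setCertificateEntry("pinned", pinned);

        // Trust managers built from this keystore reject any other certificate,
        // including ones signed by "real" CAs - which is exactly what pinning means.
        TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(keyStore);

        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, tmf.getTrustManagers(), null);
        return context;
    }
}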

(When speaking about trust, I must mention PGP. I am not aware whether there is a way to use web-of-trust verification of the server certificate, instead of the CA verification, but it’s worth mentioning as a possible alternative)

So, are self-signed certificates for small, read-only websites secure enough? Certainly not. But relying on a CA isn’t 100% secure either (breaches have happened). And the balance between security and usability is hard.

I’m not saying my proposal in favour of self-signed certificates is the right way to go. But it’s food for thought. The exploits such an approach could bring even to properly secured sites (via MITM) are to be considered with utter seriousness.

Interactive sites – especially online shops, social networks and payment providers – must obviously use a full-featured, CA-issued HTTPS certificate, and even that is only the bare minimum. But let’s not force that upon the website of the local bakery and the millions of sites like it. And let’s not penalize them for not purchasing a certificate.

P.S. EFF’s Let’s Encrypt looks like a really promising alternative.

P.P.S. See also in-session key negotiation.


Algorithmic Music Composition [paper]

December 11, 2014

After I wrote my first post about computoser.com, many were interested in the code. Then I open-sourced it.

And now, to complete my contribution, I wrote a paper about my approach and findings. The paper is on Academia.edu and also on arxiv. I’d be happy to get honest peer reviews.

It’s not a great novelty, I agree, but I think it’s an improvement over existing attempts and possibly an approach to build upon in the future.


Open Source for the Government [presentation]

December 9, 2014

A month ago I gave a talk at OpenFest (a Bulgarian conference for open technologies). The talk was in Bulgarian, but I’ve translated the slides, so here they are.

Currently, many governments order custom software and companies implement it, but it’s usually of low quality, low applicability, or both. The software is often abandoned. And we don’t even know what bugs and security holes are lurking inside (I found a security hole in the former egov.bg portal that allowed me to extract all documents in the system, containing personal data and whatnot). And it’s all because it’s a black box.

In short, I propose that all software ordered by our governments be open source. I’m obviously not the first to have that idea (there are success stories in some countries, as you’ll see on the slides), but I think the idea is worth pursuing not only in my country, but in many other countries where the government still orders companies to build closed-source software.

The process is simple – the company that builds the software (or customizes an existing open-source product) works in a public SCM repo (git or mercurial) that everyone can trace. That way we can not only monitor the development process, but also have a more transparent view of public spending on software.

Here in Bulgaria the idea has already been embraced by some government representatives and is likely to gain traction in the next few years.


Static Typing Is Not for Type Checking

December 2, 2014

In his post “Strong Typing vs. Strong Testing”, Bruce Eckel described the idea that statically (or strongly) typed languages don’t give you much, because you should verify your programs with tests anyway, and those tests will check the types as well – no need for the compiler to do that (especially if it makes you less productive with the language).

While this looks like a very good point initially, I have some objections.

First, his terminology is not the popularly agreed one. This stackoverflow answer outlines the difference between statically typed (types are checked at compile time) and strongly typed (no or few implicit type conversions) languages. And to clarify this about the language used in the article – Python – this page tells us Python is a dynamically and strongly typed language.

But let’s not nitpick about terminology. I have an objection to the claim that static typing simply gives you some additional tests that you should have written anyway.

In a project written in a dynamic language, can you see the callers of a method? Who calls the speak method in his example? You’ll do a text search? Well, what if you have many methods with the same name (iterator(), calculate(), handle(), execute())? You would name them differently, maybe? And be sure you never reuse a method name in the whole project? The ability to quickly navigate through the code of a big project is one of the most important ones in terms of productivity. And it’s not that vim with nice plugins doesn’t allow you to navigate through classes and search for methods – it’s just not possible to make it as precise in a dynamic language as it can be in a static one.
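
As a tiny, approximate Java rendering of that speak() example (the class names are mine), the point is that because the parameter type is declared, the IDE can list every caller of speak() precisely, and the compiler rejects an incompatible argument before any test runs:

interface Pet {
    String speak();
}

class Dog implements Pet {
    public String speak() { return "Arf!"; }
}

class Robot {
    // no speak() method and does not implement Pet
}

public class Speaker {
    static void command(Pet pet) {
        System.out.println(pet.speak());
    }

    public static void main(String[] args) {
        command(new Dog());
        // command(new Robot()); // does not compile - caught before any test runs
    }
}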

Then, I want to know what I can call on a given object – to do API “discovery” while I write the code. How often, in a big project, are you absolutely sure which method you want to invoke on an object of a class you see for the first time? Go to that class? Oh, which one is it, since all you know is that it has the calculate() method (which is called in the current method)? Writing a test that validates whether you can invoke a given method is fine, but it doesn’t let you discover what your options are at that point – which method would do the job best for you. Here comes autocomplete with inline documentation. It’s not about saving keystrokes, it’s about knowing what is allowed at this point in the code. I guess constantly opening API documentation pages or other class definitions would work, but it’s tedious.

Then comes refactoring. Yes, you knew I’d bring that up. But it’s the single most important activity that the tests we write enable us to do – we have all our tests so that we can guarantee that when we change something, the code will continue to work correctly. And yet, only in a statically typed language is it possible to do full-featured refactoring: adding arguments to a method, moving a method to another class, even renaming a method without collateral damage. And yes, there are multiple heuristics that can be employed to make refactoring somewhat possible in Ruby or Python (and JetBrains are trying), but by definition it cannot be as good. Does it matter? And even if the refactoring doesn’t happen automatically, tests will catch the breakage, right? If you have 100% coverage, they will. But that doesn’t mean it will take less time to make the change, as opposed to a couple of keystrokes in a static language.

And what are those “mythical big projects” where all the features above are game-changers? Well, most projects with a lifespan of more than 6 months, in my experience.

So, no, static typing is not about the type checks. It’s about being able to comprehend a big, unfamiliar (or forgotten) codebase faster and with a higher level of certainty, to make your way through it and to change it more safely and faster. Type checking comes as a handy bonus, though. I won’t employ the “statically typed languages have faster runtimes” argument. (And by all this I don’t mean to dismiss dynamically typed languages, even though I very much prefer static and strong typing.)

And then people may say “your fancy tools and IDEs try to compensate for language deficiencies”. Not at all – my fancy tools are built on top of the language’s strengths. The tools would not exist if the language didn’t make it possible for them to exist. A language that allows powerful tools to be built for it is a powerful one, and that’s the strength of statically typed languages, in my view.
