Load-Testing Guidelines

September 18, 2014

Load-testing is not trivial. It’s often not just about downloading JMeter or Gatling, recording some scenarios and then running them. Well, it might be just that, but you are lucky if it is. And although this may sound like “Captain Obvious speaking”, it’s good to be reminded of a few things that can potentially waste your time.

So, when you run the tests, eventually you will hit a bottleneck, and then you’ll have to figure out where it is. It can be:

  • client bottleneck – if your load-testing tool uses HttpURLConnection, the number of requests sent by the client is quite limited. You have to start there and make sure enough requests are leaving your load-testing machine(s)
  • network bottlenecks – check if your outbound connection allows the desired number of requests to reach the server
  • server machine bottleneck – check the number of open files that your (most probably) Linux server allows. For example, if the default is 1024, you can have at most 1024 concurrent connections, so increase it in limits.conf (see the example after this list)
  • application server bottleneck – if the thread pool that handles requests is too small, requests may be kept waiting. If some other seemingly minor configuration switch (e.g. whether to use NIO, which is worth a separate article) has the wrong value, that may reduce performance. You’d have to be familiar with the performance-related configuration of your server.
  • database bottleneck – check the CPU usage and response times of your database to see whether it is the one slowing down the requests. A misconfigured database, or DB servers that are too small or too few, can obviously be a bottleneck
  • application bottleneck – these you’d have to investigate yourself, possibly using a performance monitoring tool (but be careful when choosing one, as there are many “new and cool” but unstable and useless ones). We can divide this type into two:
    • framework bottleneck – if a framework you are using has problems. This might be a web framework, a dependency injection framework, an actor system, an ORM, or even a JSON serialization tool
    • application code bottleneck – if you are misusing a tool/framework, have blocking code, or just wrote horrible code with unnecessarily high computational complexity
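
For the open-files limit above, the change is typically a couple of lines in /etc/security/limits.conf – “appuser” below is a placeholder for the user that runs your server, and the value is just an example:

# /etc/security/limits.conf
appuser    soft    nofile    65535
appuser    hard    nofile    65535

You can check the effective limit for a shell session with ulimit -n (the new values apply to new logins).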

You’d have to constantly monitor the CPU, memory, network and disk I/O usage of the machines, in order to understand when you’ve hit the hardware bottleneck.

One important aspect is being able to bombard your servers with enough requests. A single load-generating machine may well be insufficient, especially if you are a big company whose product is likely to attract a lot of customers right at the start, or if making a request itself needs some processing power, e.g. for encryption. So you may need a cluster of machines to run your load tests. The tool you are using may not support that out of the box, so you may have to coordinate the cluster manually.

As a result of your load tests, you’d have to consider how long it makes sense to keep connections waiting, and when to reject them. That is controlled by the connect timeout on the client and the registration timeout (or pool borrow timeout) on the server. Also keep that in mind when viewing the results – a too-slow response and a rejected connection are practically the same thing: your server is not able to service the request.
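
On the load-generating (client) side these are just a few configuration calls – a minimal sketch with Apache HttpClient (4.3+), with made-up millisecond values:

// connect timeout: how long to wait for the TCP connection to be established
// socket timeout: how long to wait for data once connected
// connection request timeout: how long to wait to borrow a connection from the client's pool
RequestConfig requestConfig = RequestConfig.custom()
        .setConnectTimeout(1000)
        .setSocketTimeout(2000)
        .setConnectionRequestTimeout(500)
        .build();

CloseableHttpClient client = HttpClients.custom()
        .setDefaultRequestConfig(requestConfig)
        .build();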

If you are on AWS, there are some specifics. Leaving auto-scaling aside (you should probably disable it for at least some of the runs), you need to keep in mind that the ELB needs warming up. Run the tests a couple of times to warm it up (many requests will fail until it is). Also, when you use a load balancer and long-lived connections are left open (with WebSocket, for example), the load balancer may keep its own connections to the servers behind it open indefinitely and reuse them when a new request for a long-lived connection comes in.

Overall, load (performance) testing and analysis is not straightforward and there are many possible problems, but it is something you must do before release. Well, unless you don’t expect more than 100 users. And the next time I do it, I will use my own article as a reference, to make sure I’m not missing something.

EasyCamera Now in Maven Central

September 11, 2014

Several months ago I created the EasyCamera project (GitHub). As there has been a lot of interest, I just released it to Maven Central, so that you don’t need to check it out and build it yourself. The packaging has also been changed to aar.

I would appreciate any reported issues and feedback about the API. Let me just recap the usage:

EasyCamera camera = DefaultEasyCamera.open();
CameraActions actions = camera.startPreview(surface);
PictureCallback callback = new PictureCallback() {
    public void onPictureTaken(byte[] data, CameraActions actions) {
        // store the picture
    }
};
actions.takePicture(Callbacks.create().withJpegCallback(callback));

I won’t paste the code that you would normally have to write – it takes about 10 steps – but I believe the EasyCamera API is a bit friendlier, easier to discover and harder to misuse.

Musical Scale Generator

September 7, 2014

We all know the C major scale: do-re-mi-fa-sol-la-ti-do. But what’s behind it? And how many other scales are there? It’s complicated. Let me give a brief introduction to the theory first, without trying to be precise or complete.

More than a dozen scales are in use, and the popular ones in the Western world are the major and minor (natural, harmonic) scales, and the “old modes”: Dorian, Lydian, Locrian, etc. All of these are heptatonic (7-tone) scales. There are also pentatonic (5-tone) scales, as well as other scales like Turkish, Indian and Arabic ones. All of them share a common purpose: to constrain melodies in order to make them sound pleasant. The notes in each scale have a different level of consonance with each other, which in turn provides a different “feel”. The predominant scales all fall within the so-called chromatic scale, which consists of all 12 notes of an octave on a piano keyboard (counting both white and black keys).

How are the scales derived? There are two main aspects: the harmonic series and temperament. The harmonic series (closely related to the concept of an overtone) is derived from the physical behaviour of musical instruments, and more precisely from oscillation (e.g. of a string). The harmonic (or overtone) series produces ever-increasing pitches, which are then transposed into a single octave (the pitch space between the fundamental frequency and twice that frequency). This is roughly how the chromatic scale is obtained. Then there is temperament – although the purely physical explanation sounds like a perfect way to link nature and music, in practice the frequencies obtained this way are not practical to play on musical instruments, and also yield some dissonances. That’s why musicians tune their instruments by adjusting the frequencies obtained from the harmonic series. There are multiple ways to do that, one of which is the 12-tone equal temperament, where an octave is divided into 12 parts that are equal on a logarithmic scale (because pitch is perceived roughly as the logarithm of frequency).
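
As an illustration of the equal-temperament formula, here is a minimal sketch (not the generator’s code) of computing the 12 equal-tempered steps of an octave:

// each step multiplies the frequency by the 12th root of 2,
// so 12 steps give exactly one octave (a doubling of the frequency)
double fundamental = 262.626; // fundamental frequency in Hz (the default mentioned below)
int divisions = 12;           // number of equal parts per octave

for (int step = 0; step <= divisions; step++) {
    double frequency = fundamental * Math.pow(2, (double) step / divisions);
    System.out.printf("step %2d: %.2f Hz%n", step, frequency);
}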

But what does that have to do with programming? Computers can generate an almost infinite number of musical scales that follow the rules of the scales already proven to be good. Why limit ourselves to 7-tone scales out of 12 tones, when we can divide the octave into 24 parts and make a scale of 15 tones? In fact, some composers and instrument makers, most notably Harry Partch, have experimented with such an approach, and music has been written in such “new” scales (although not everyone would call it “pleasant”). But with computers we can test new scales in seconds, and write music in them (or let the computer write it) in minutes. In fact, I see this as one way of advancing the musical landscape with the help of computers (algorithmic composition aside).

That’s why I wrote a scale generator. It takes a few input parameters – the fundamental frequency on which you want to base the scale (by default C=262.626); the size of the scale (by default 7); the size of the ‘chromatic scale’ out of which the scale will be drawn (by default 12); and whether to use equal temperament or not.

The process, in a few sentences: it starts by calculating the overtones (harmonics), skipping the 7th (for reasons I don’t fully understand). It then transposes all of them into the same octave: for a given harmonic it calculates the ratio to its tonic (the closest power-of-two multiple of the fundamental frequency) and applies that ratio to the fundamental frequency itself. It does that until the “chromatic scale size” parameter value is reached. Then it finds the perfect interval (the perfect fifth in the case of a heptatonic (diatonic) scale), i.e. the one with a ratio of 3/2. If equal temperament is enabled, the chromatic scale obtained so far is replaced with an equal-tempered one. Then the algorithm makes a “circle” from the tones in the chromatic scale (the circle of fifths is one example), based on the perfect interval, and, starting from the tone before the fundamental frequency, enumerates N tones, where N is the size of the scale. This is the newly formed scale. Note that starting from each note in the scale we just obtained (and continuing into the next octave when we run out of tones) would yield a completely different scale (this is the difference between C major and A minor – they use the same notes).
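
A rough sketch of the “transpose the harmonics into one octave” step described above (the names, the duplicate handling and the way the 7th harmonic is skipped are simplifications for illustration, not the generator’s actual code):

double fundamental = 262.626;  // fundamental frequency, Hz
int chromaticSize = 12;        // how many distinct tones to collect

List<Double> chromatic = new ArrayList<>();
int harmonic = 0;
while (chromatic.size() < chromaticSize) {
    harmonic++;
    if (harmonic % 7 == 0) {
        continue; // skip the 7th harmonic (and, here, its multiples)
    }
    double frequency = fundamental * harmonic;
    // transpose down into the octave [fundamental, 2 * fundamental)
    while (frequency >= 2 * fundamental) {
        frequency /= 2;
    }
    if (!chromatic.contains(frequency)) {
        chromatic.add(frequency);
    }
}
Collections.sort(chromatic); // the tones of the resulting chromatic scale, ascending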

Finally, my tool plays the generated scale (using low-level sound wave generation, which I copied from somewhere and which is beyond the scope of this discussion) and also, using a basic form of my music composition algorithm, composes a melody in the given scale. It sounds terrible at first, because it’s not using any instrument, but it gives a good “picture” of the result. And the default arguments result in the familiar major scale being played.

Why is this interesting? Because hopefully music will evolve, and we will be able to find richer scales pleasant to listen to, giving composers even more material to work with.

Caveats of HttpURLConnection

September 5, 2014

Does this piece of code look ok to you?

HttpURLConnection connection = null;
try {
    connection = (HttpURLConnection) url.openConnection();
    try (InputStream in = connection.getInputStream()) {
        return streamToString(in);
    }
} finally {
    if (connection != null) connection.disconnect();
}

Looks good – it opens a connection, reads from it, closes the input stream, releases the connection, and that’s it. But while running some performance tests, and trying to figure out a bottleneck issue, we found out that disconnect() is not as benign as it seems – when we stopped disconnecting our connections, there were twice as many outgoing connections. Here’s the javadoc:

Indicates that other requests to the server are unlikely in the near future. Calling disconnect() should not imply that this HttpURLConnection instance can be reused for other requests.

And on the class itself:

Calling the disconnect() method may close the underlying socket if a persistent connection is otherwise idle at that time.

This is still unclear, but gives us a hint that there’s something more. After reading a couple of StackOverflow and java.net answers (1, 2, 3, 4), as well as the Android documentation of the same class (which is actually a different implementation from Oracle’s), it turns out that .disconnect() actually closes (or may close, in the case of Android) the underlying socket.

Then we can find this bit of documentation (it is linked in the javadoc, but it’s not immediately obvious that it matters when calling disconnect), which gives us the whole picture:

The http.keepAlive property (default: true) indicates that sockets can be reused by subsequent requests. This works by leaving the connection to the server (which must support keep-alive) open, so the overhead of opening a new socket is avoided. By default, up to 5 such idle sockets are kept per destination. You can increase this pool size by setting the http.maxConnections property. However, after increasing that to 10, 20 and 50, there was no visible improvement in the number of outgoing requests.
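
Both are plain JVM system properties, so they can be passed as -D flags or set early at startup, before the first connection is made (the values below are just examples):

System.setProperty("http.keepAlive", "true");     // on by default
System.setProperty("http.maxConnections", "50");  // idle sockets kept per destination (default is 5)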

However, when we switched from HttpURLConnection to Apache HttpClient with a pooled connection manager, we got 3 times more outgoing connections per second. And that’s without fine-tuning it.

Load testing, i.e. bombarding a target server with as many requests as possible, sounds like a niche use-case. But in fact, if your application invokes a web service, either within your stack, or an external one, as part of each request, then you have the same problem – you will be able to make fewer requests per second to the target server, and consequently, respond to fewer requests per second to your users.

The advice here is: almost always prefer Apache HttpClient – it has a much better API and seemingly much better performance, without requiring you to understand exactly how it works underneath. But watch out for the same caveats there as well – check the pool size and connection reuse. If you do use HttpURLConnection, do not disconnect your connections after you read their response, consider increasing the socket pool size, and be mindful of related problems.
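
For reference, a minimal sketch of such a pooled setup with Apache HttpClient 4.3+ (the pool sizes are arbitrary examples, not recommendations):

PoolingHttpClientConnectionManager connectionManager = new PoolingHttpClientConnectionManager();
connectionManager.setMaxTotal(200);           // total connections across all routes
connectionManager.setDefaultMaxPerRoute(50);  // connections per target host

CloseableHttpClient client = HttpClients.custom()
        .setConnectionManager(connectionManager)
        .build();

try (CloseableHttpResponse response = client.execute(new HttpGet("http://example.com/"))) {
    // consuming the entity returns the connection to the pool instead of closing it
    EntityUtils.consume(response.getEntity());
}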

Open-Sourcing My Music Composition Algorithm

August 19, 2014

Less than two years ago I wrote about the first version of my algorithm for music composition. Since then computoser.com got some interest and the algorithm was incrementally improved.

Now, on my birthday, I decided it’s time to make it open-source. So it’s on GitHub.

It contains both the algorithm and the supporting code to run it on a website (written with Spring and Hibernate). The algorithm itself is in the com.music package and everything else is in subpackages, so it is easy to identify.

It isn’t a perfect piece of code, but I think it’s readable, if you happen to know some music theory. I am now preparing a paper to present my research (as some research is involved in the creation) as well as how the algorithm functions. Opening the code is part of the preparation for the paper – it will be noted there as a reference implementation.

The license is AGPL – as far as I know, that should not allow closed-source use of my algorithm on the server-side.

I don’t think making it open-source is such a significant step, but I hope it will somehow help algorithmic music composition advance further than it is today.

Get Rid of the URL Pollution

August 13, 2014

You want to copy the URL of a nice article/video/picture you’ve just opened and send it to friends in Skype chats, WhatsApp, other messengers or social networks. And you realize the URL looks like this:

http://somesite.com/articles/title-of-the-article?utm_campaign=fsafser454fasfdsaffffas&utm_bullshit=543fasdfafd534254543&somethingelse=uselessstuffffsafafafad&utm_source=foobar

What are these parameters that pollute the URL? The above example uses some of the Google Analytics parameters (utm*), but other analytics tools use the same approach, and probably other tools as well. How are these parameters useful? They tell Google Analytics (which runs with JavaScript) details about the current campaign, probably where the user is coming from, and other stuff that I, and especially users, don’t really care about.

And that’s ugly. I myself always delete the meaningless parts of the URL, so that in the end people see only “http://somesite.com/articles/title-of-the-article”. But that’s me – a software engineer who can distinguish the useless parts of a URL. Not many people can, and even fewer can be bothered to cut parts of the URL, which results in looong and ugly URLs being pasted around. Why is that bad?

  • website owners have put effort into making their URLs pretty. With “URL pollution” that effort goes to waste
  • defeating the purpose of the parameters – when you copy-paste such a URL, all the people that open it may be counted as, for example, coming from a specific AdWords campaign. Or from a source that’s actually wrong (because they got the URL in Skype, for example, but utm_source is ‘facebook’)
  • lower likelihood of clicking on a hairy URL with meaningless stuff in it (at least I find myself more hesitant)

If you have a website, what can you do about this URL pollution, without breaking your analytics tool? You can get rid of them with javascript:

    window.history.replaceState(null, null, 
        window.location.href.replace("utm_source=....", ""));

This won’t trigger fake analytics results (for GA, at least, as it requires manual work to trigger it after pushState). Now there are three questions: how to get the exact parameters, when to run the above code, and is it worth it?

You can get all parameters (as shown here) and then either remove some blacklisted ones (utm_source, utm_campaign, etc.), or remove all except your whitelisted ones. If your application isn’t using GET parameters at all, that’s easy. If it is, keeping a whitelist in sync would be tedious, so probably go for the blacklist.

When should you do that? A little after the page loads and the analytics tool has done its job. When exactly that is – I don’t know. Maybe on window.load, maybe you have to wait for a second and then remove the parameters. You’d have to experiment.

And is it worth it? I think yes. Fewer useless parameters, less noise, nicer, friendlier URLs (that’s why you spent time prettifying them, right?), and fewer incorrect analytics results due to copy-pasted long URLs.

And I have a request to Google and all other providers of similar tools – please clean up your “mess” after you read it, so that we don’t have to do it ourselves.

Generating equals(..), hashCode() and toString()

August 10, 2014

You most probably need to override hashCode(), equals(..) and toString() – I won’t go into details about when and why, but you need it (OK, just a reminder – always implement hashCode and equals together, and you most likely need these methods if you are going to look up objects of a given class in a HashMap or an ArrayList). And you have plenty of options for doing it:

  • Manually implement the methods – that’s sort of OK for toString() and quite impractical for hashCode() and equals(..). Unless you are certain that you want a custom, well-considered hash function, you should rely on another, more practical mechanism
  • Use the IDE – all IDEs can generate the three methods, asking you to specify the fields you want to base them on. The hash function is usually good enough, and the rest just saves you from the headache of writing boilerplate comparisons, ifs and elses. But when you add a field, you shouldn’t forget to regenerate the methods.
  • commons-lang – there’s EqualsBuilder, HashCodeBuilder and ToStringBuilder there, which help you write the methods quickly, either with manual append(field).append(field) calls, or with reflection, e.g. reflectionEquals(..) – see the example after this list. Adding a field again requires modifications, and it’s easy to forget that.
  • guava – very similar to commons-lang, with all the pros and cons. Guava has Objects and MoreObjects, with helper functions for equals(..) and hashCode and a builder for toString() – you still have to manually add/compare each field you want to include.
  • project lombok – it plugs into the compiler and turns some annotations into actual implementations, sparing you from writing the boilerplate code completely. For example, if you annotate the class with @EqualsAndHashCode, Lombok will generate the two methods using all the fields in the class (you can customize that). Other annotations are @ToString, @Value (for immutables) and @Data (for value objects). You just have to put a jar on your compile-time classpath, and it should work.
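
For illustration, this is roughly what the commons-lang (org.apache.commons.lang3.builder) approach looks like – the Person class and its fields are made up for the example:

public class Person {
    private String name;
    private int age;

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof Person)) return false;
        Person other = (Person) obj;
        return new EqualsBuilder()
                .append(name, other.name)
                .append(age, other.age)
                .isEquals();
    }

    @Override
    public int hashCode() {
        return new HashCodeBuilder(17, 31)
                .append(name)
                .append(age)
                .toHashCode();
    }

    @Override
    public String toString() {
        return new ToStringBuilder(this)
                .append("name", name)
                .append("age", age)
                .toString();
    }
}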

Which of these should you use? I generally exclude the manual approach, as well as guava and commons-lang – they require too much manual work for a task you shouldn’t need to care about in 99% of the cases. The reflection option in commons-lang sounds interesting, but it also sounds like performance overhead.

I’ve always used the IDE – the only downside is that you have to regenerate the methods when fields change. Sometimes you may forget, and that may yield unexpected behaviour. But apart from that, it’s a quick and robust approach.

Project Lombok seems to eliminate the risk of forgetting to regenerate, but that sometimes has the opposite side effect – you may not want all new fields automatically included, and you can forget to exclude them. My personal reluctance to use Lombok, though, is based on a sort of superstition – it does “black magic” by plugging into the compiler. It does work, but you don’t know exactly how it manages to handle the Eclipse compiler, javac and the IntelliJ compiler; will it always work with Maven, including in your CI environment? Will it survive a major/minor compiler version upgrade? Obviously it does, and I have no rational argument against it. And it has some more useful features as well.

So, it’s up to you to pick either of the two approaches. But do not implement it manually, and I don’t think the helper functions/builders are that practical.

Suggestion for Spam Filters

August 4, 2014

One of the issues with spam is false positives. “Did you check your spam folder” is often a question to ask if your email is not received on the other end.

I’m not a machine learning expert, I’ve never made a spam filter, and I only know the naive Bayes approach. So this suggestion is not a machine-learning “breakthrough”. But what I do know about classification algorithms is that they usually provide a likelihood of an item being in one group or another. Some items are not identified as spam with absolute certainty – they are 51% likely to be spam, for example.

My suggestion is: for borderline items (where there is lower certainty that they should be classified as spam), the spam filter should send an email back to the sender indicating that the message was considered spam. A genuine sender will probably take additional steps, like sending another short email or calling/messaging the recipient (“click here to confirm you are not spam” won’t work, because it would easily be automated).

It’s rather a usability suggestion than a technical one, and I’m sure there are some issues that I’m missing. But I thought it’s at least worth sharing.

RabbitMQ in Multiple AWS Availability Zones

July 17, 2014

When working with AWS, in order to have a highly-available setup, one must have instances in more than one availability zone (AZ ≈ data center). If one AZ dies (which may happen), your application should continue serving requests.

It’s simple to set up your application nodes in multiple AZs (if they are properly written to be stateless), but it’s trickier for databases, message queues and everything else that has state. So let’s see how to configure RabbitMQ. The first steps are relevant not only to RabbitMQ, but to any persistent data solution.

First (no matter whether using CloudFormation or manual setup), you must:

  • Have a VPC. It might be possible without a VPC, but I can’t guarantee that, especially regarding the DNS hostnames discussed below
  • Declare private subnets (for each AZ)
  • Declare the RabbitMQ autoscaling group (recommended to have one) to span multiple AZs, using:
            "AvailabilityZones" : { 
              "Fn::GetAZs" : {
                "Ref": "AWS::Region"
              }
            }
            
  • Declare the RabbitMQ autoscaling group to span multiple subnets using the VPCZoneIdentifier property
  • Declare the LoadBalancer in front of your RabbitMQ nodes (that is the easiest way to ensure even distribution of load to your Rabbit cluster) to span all the subnets
  • Declare the LoadBalancer to be "CrossZone": true

Then comes the RabbitMQ-specific configuration. Generally, you have two options – clustering and federation:

Clustering is not recommended over a WAN, but the connection between availability zones can be viewed (maybe a bit optimistically) as a LAN. (This detailed post assumes otherwise, but this thread hints that using a cluster across multiple AZs is fine.)

With federation, you declare your exchanges to send all messages they receive to another node’s exchange. This is pretty useful in a WAN, where network disconnects are common and speed is not so important. But it may still be applicable in a multi-AZ scenario, so it’s worth investigating. Here is an example, with exact commands to execute, of how to achieve that using the federation plugin. The tricky part with federation is auto-scaling – whenever you need to add a new node, you have to modify (some of) your existing nodes’ configuration in order to set the new node as their upstream. You may also need to allow other machines to connect as guest to RabbitMQ ([{rabbit, [{loopback_users, []}]}] in your rabbitmq config file), or find a way to configure a custom username/password pair for federation to work.

With clustering, it’s a bit different, and in fact simpler to set up. All you have to do is write a script that automatically joins the cluster on startup. This might be a shell script or a Python script using the AWS SDK. The main steps in such a script (which, frankly, isn’t that simple) are:

  • Find all running instances in the RabbitMQ autoscaling group (using the AWS API filtering options)
  • If this is the first node (the order is random and doesn’t matter), assume it’s the “seed” node for the cluster and all other nodes will connect to it
  • If this is not the first node, connect to the first node (using rabbitmqctl join_cluster rabbit@{node}), where {node} is the instance private DNS name (available through the SDK)
  • Stop RabbitMQ while doing all the configuration, and start it again after you are done

In all cases (clustering or federation), RabbitMQ relies on domain names. The easiest way to make it work is to enable DNS hostnames in your VPC: "EnableDnsHostnames": true. There’s a little hack here when it comes to joining a cluster – the AWS API may return the fully qualified domain name, which includes something like “.eu-west-1.compute.internal” in addition to the ip-xxx-xxx-xxx-xxx part. So when joining the RabbitMQ cluster, you should strip this suffix, otherwise it doesn’t work.
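
For illustration only, the instance-discovery part of such a script might look roughly like this with the AWS SDK for Java (the author suggests shell or Python; the group name and the tag filter below are assumptions, so treat this as a sketch):

// find the running instances of the RabbitMQ autoscaling group and derive their short node names
AmazonEC2 ec2 = new AmazonEC2Client();
DescribeInstancesRequest request = new DescribeInstancesRequest().withFilters(
        new Filter("tag:aws:autoscaling:groupName").withValues("rabbitmq-group"),
        new Filter("instance-state-name").withValues("running"));

List<String> nodeNames = new ArrayList<>();
for (Reservation reservation : ec2.describeInstances(request).getReservations()) {
    for (Instance instance : reservation.getInstances()) {
        // strip the ".eu-west-1.compute.internal"-style suffix, as described above
        nodeNames.add(instance.getPrivateDnsName().split("\\.")[0]);
    }
}
// treat the first name (in some stable order) as the "seed" node; the others join it with
// rabbitmqctl join_cluster rabbit@<short-name>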

The end result should be a cluster where, if a node dies and another one is spawned by the auto-scaling group, the cluster continues to function properly.

Comparing the two approaches with PerfTest yields better throughput for the clustering option – about 1/3 fewer messages were processed with federation, and latency was a bit higher. The tests should be executed from an application node towards the RabbitMQ ELB (otherwise you are testing just one node). You can get PerfTest and execute it with something like this (where the amqp address is the DNS name of the RabbitMQ load balancer):

wget http://www.rabbitmq.com/releases/rabbitmq-java-client/v3.3.4/rabbitmq-java-client-bin-3.3.4.tar.gz
tar -xvf rabbitmq-java-client-bin-3.3.4.tar.gz
cd rabbitmq-java-client-bin-3.3.4
sudo sh runjava.sh com.rabbitmq.examples.PerfTest -x 10 -y 10 -z 10 -h amqp://internal-foo-RabbitMQEl-1GM6IW33O-1097824.eu-west-1.elb.amazonaws.com:5672

Which of the two approaches you pick depends on your particular case, but I would generally recommend the clustering option – it’s a bit more performant and a bit easier to set up and support in a cloud environment, where nodes spawn and die often.

The Cloud Beyond the Buzzword [presentation]

July 14, 2014

The other day I gave a presentation about “The Cloud”. I talked about buzzwords, incompetence, classification, and most importantly – embracing failure.

Here are the slides (the talk was not in English). I didn’t have time to go into too much detail, but I hope it’s a nice overview.
