Recently I’ve undertaken a personal project (Computoser) that is a bit strange. I tried to write software that generates music at random. Good music, that is, because “random” usually generates noise. The idea, of course, is not at all new – there has been research on the topic and there is software that attempts to generate music. I generally group this software into three categories:
- software for helping composers – the composer still does most of the work, but the software can suggest motifs, variations, etc.
- programs that take functions or other mathematical concepts and try to transform them into frequencies, based on some arbitrary mapping
- programs that generate music adhering to composition rules
Out of these three, the first is very specialized and not suitable for a large audience, and the second does not produce results that a sane person would listen to (the mathematical/physical rules that these algorithms try to employ are already at work at a lower level of music – e.g. note frequencies double each octave; additionally, the mapping from functions or sequences to notes is often arbitrary, so even though the Fibonacci numbers are “nice”, there is no definitive way of mapping them to music that is guaranteed to sound good). The third group can generate “listenable” music, but it needs to properly encode the rules that composers adhere to when writing music. Some of these programs use fractals, others use pictures, text, or whatever else to generate music. In fact, these are all just different ways to seed the randomness.
So I went for the third option and implemented the algorithm, which is obviously the hard part. There is more music than programming to it, so it would probably suffice to say that there is a lot of random.nextInt() all over the place to pick different routes in the generation process, so that each piece may sound a bit different. The choices include the note lengths, the scale, the interval between consecutive pitches (notes), maintaining a proper contour, and more. Later I realized I had used something similar to Markov chains, because there are probabilities attached to these decisions. But in addition to the algorithm, there was a technical challenge: how do I play the generated MIDI file in the browser?
- browsers don’t play MIDI natively, and neither does Flash
- generating an mp3 from MIDI is a CPU-heavy process, and if 10 people open the page at the same time and trigger the music generation, the server dies (a micro EC2 instance, yes, but that wouldn’t scale even on a medium instance)
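Circling back to the generation step for a moment: the probability-weighted decisions mentioned above (note length, interval, and so on) boil down to something like the following sketch. The option names and weights here are invented for illustration; the actual probabilities in Computoser are different.

```java
import java.util.Random;

public class WeightedChoice {
    // Pick an option according to the given weights (which must sum to 1.0).
    // 'r' is a uniform random number in [0, 1), injected for testability.
    static String pick(double r, String[] options, double[] weights) {
        double cumulative = 0;
        for (int i = 0; i < options.length; i++) {
            cumulative += weights[i];
            if (r < cumulative) {
                return options[i];
            }
        }
        return options[options.length - 1]; // guard against rounding error
    }

    public static void main(String[] args) {
        Random random = new Random();
        // e.g. choosing the next note length, with quarter notes most likely
        String[] lengths = {"eighth", "quarter", "half"};
        double[] weights = {0.3, 0.5, 0.2};
        System.out.println(pick(random.nextDouble(), lengths, weights));
    }
}
```

Chaining many such choices, where the weights depend on what was picked before, is what makes the process resemble a Markov chain.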
I considered a lot of options. The Web Audio browser API is a workable one, and with the help of Michael Deal’s JavaScript library one can indeed play MIDI in the browser. However, there are complications – you’d have to load the soundbank on the client, and cross-browser support is not exactly guaranteed. Initially the library did not support multiple channels, but now it does, so it looks like a very viable option. It is always possible to just offer the MIDI file for download and let the user play it with whatever player is installed on their machine, but that’s not a good user experience.
So I went for a different approach – a scheduled job runs every couple of minutes, generates a new track, renders it to mp3 (via wav) and stores it. Then whenever a user comes, they get served a newly generated track. If a lot of users come at the same time, this is recorded, and the scheduled job starts generating more tracks. If it still can’t cope, old tracks are served, but only ones that have been listened to by no more than one or two other people.
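A minimal sketch of that serving strategy – prefer a fresh track, fall back to the least-listened old one. The class and field names here are my own invention, not the actual implementation:

```java
import java.util.*;

public class TrackPool {
    // A pre-generated track and how many listeners it has been served to.
    static class Track {
        final String key; // e.g. the S3 object key of the rendered mp3
        int listeners = 0;
        Track(String key) { this.key = key; }
    }

    private final Deque<Track> fresh = new ArrayDeque<>();
    private final List<Track> served = new ArrayList<>();

    // Called by the scheduled job after rendering a new track.
    synchronized void add(String key) { fresh.addLast(new Track(key)); }

    // Serve a fresh track if one is available; otherwise fall back to
    // the old track with the fewest listeners so far.
    synchronized String serve() {
        Track t = fresh.pollFirst();
        if (t == null) {
            t = served.stream()
                      .min(Comparator.comparingInt(x -> x.listeners))
                      .orElseThrow(() -> new IllegalStateException("no tracks"));
        } else {
            served.add(t);
        }
        t.listeners++;
        return t.key;
    }

    public static void main(String[] args) {
        TrackPool pool = new TrackPool();
        pool.add("t1.mp3");
        pool.add("t2.mp3");
        System.out.println(pool.serve()); // prints t1.mp3 (oldest fresh track)
    }
}
```

In the real system, a scheduled job (Spring’s @Scheduled or a ScheduledExecutorService would both do) would call add() every couple of minutes after rendering a new mp3, generating faster when demand is recorded as high.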
The end result, Computoser, is now live, and I think it is capable of generating pretty nice music.
The UI uses Bootstrap (I’m not much of a designer) and the player is jPlayer, which plays the mp3. The whole thing runs on Amazon EC2 (free tier, for now) and stores the generated music in an S3 bucket. The programming stack is Java and Spring MVC, with MySQL and EhCache. Nothing fancy, but the focus was on the music generation rather than the architecture, and as I didn’t want to spend time learning new technologies, I picked the ones I’m familiar with. One of the reasons to pick Java was that there are nice high-level APIs for working with MIDI – jMusic (the one I’m using) and jFugue – which let you work in proper musical terms (scales, chords, instruments, notes, rests, etc.) rather than with low-level MIDI instructions. Oh, and the track titles are generated by a tiny linguistic algorithm that tries to construct fairly proper English phrases based on a small dictionary and a couple of predefined structures (it needs a lot more work).
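To illustrate the title idea – fill predefined phrase structures with random words from a dictionary. The dictionary and templates below are invented and much smaller than the real ones:

```java
import java.util.Map;
import java.util.Random;

public class TitleGenerator {
    // Tiny illustrative dictionary, keyed by part of speech.
    static final Map<String, String[]> DICT = Map.of(
        "adjective", new String[]{"silent", "distant", "amber"},
        "noun",      new String[]{"river", "evening", "echo"});

    // A couple of predefined phrase structures.
    static final String[][] TEMPLATES = {
        {"adjective", "noun"},
        {"noun", "of", "the", "noun"}};

    static String generate(Random random) {
        String[] template = TEMPLATES[random.nextInt(TEMPLATES.length)];
        StringBuilder title = new StringBuilder();
        for (String slot : template) {
            String[] words = DICT.get(slot);
            // literal words ("of", "the") pass through unchanged
            String word = (words == null) ? slot : words[random.nextInt(words.length)];
            title.append(title.length() == 0 ? "" : " ").append(word);
        }
        // capitalize the first letter
        return Character.toUpperCase(title.charAt(0)) + title.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(generate(new Random()));
    }
}
```

Making the phrases sound natural is mostly a matter of growing the dictionary and the set of templates – which is exactly the part that still needs work.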
P.S. The code is now on GitHub, and a paper with details is linked in the README.