Managing CSS And Javascript In Production Is Hard

May 16, 2012

You develop your site, and you are going to deploy it to production. Assuming you have a lot of traffic, you’d need to optimize your assets (javascript and css). Some of these things are also valid for images, but there’s more to be said there, so I’ll focus on css and javascript. It’s way more complicated than you’d like. Here are the steps:

  • merge – all your javascript and css files should be merged into a single file so that the browser fires only one request. Frameworks sometimes have tags to do that, otherwise you have to merge them in your build and then use the merged version.
  • minify – before or after the files are merged you need to minify them. That is, get rid of all the unneeded characters (whitespace, comments) that make the files readable but increase their size. This normally happens when you build your application, though minifying on the fly and then caching the result is also a good option.
  • version/hash – normally your assets should be cached by browsers for a long time, so when you make a change you need a good way to make the browsers refresh them. That’s why you append a version (/styles/1/main.css) which you can configure in your app, or you compute a hash of the resource and use that as automatic version. (Of course, always set a long expiry header to assets, so that they are requested only once by each user)
  • gzip – after the files are merged and minified you should gzip them before sending them to the browser. This is handled by most servers and/or frameworks, so it sounds like an easy bit. But read on for some complications
  • CDN / asset server / server cache – that’s the hard part that messes things up. You don’t want your assets to be loaded from disk, merged, minified and gzipped each time they are served, because this takes resources, and assets don’t change that often anyway. That’s why you put them in a CDN or use a custom asset server with some cache like Varnish in front. Ideally, you should be able to point the CDN to your server, which does all of the above dynamically. But often you have to pre-generate the versioned, gzipped, merged and minified version of the asset and deploy it somewhere.
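As a rough illustration of the version/hash step, here is a minimal sketch of deriving the version from the file content. The class name, method names and the choice of a truncated SHA-256 digest are my assumptions for the example, not what any particular framework does:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class AssetVersioner {

    // Returns the first 8 hex characters of the SHA-256 digest of the content.
    // The truncation length is an arbitrary choice for this sketch.
    public static String contentHash(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.substring(0, 8);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform
            throw new IllegalStateException(e);
        }
    }

    // Builds a path of the form <directory>/<hash>/<fileName>
    public static String versionedPath(String directory, String fileName, byte[] content) {
        return directory + "/" + contentHash(content) + "/" + fileName;
    }

    public static void main(String[] args) {
        byte[] css = "body { margin: 0; }".getBytes(StandardCharsets.UTF_8);
        System.out.println(versionedPath("/styles", "main.css", css));
    }
}
```

Because the path changes only when the content changes, the long expiry header stays safe: browsers keep the old file until a new hash shows up in the page.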

There’s another complication: development mode. Your application should also support going through the above steps dynamically, without caching. Many utilities exist to help you with that. RoR has the asset pipeline, Java has jawr (which I don’t recommend, btw), and others have their options as well. But there is one important thing that you should try to follow: always prefer dynamically generating the final form of the assets (by the site/framework), and avoid performing any of the above operations in the build process. If you put it in the build, it automatically becomes a lot more tedious – you need to ship two versions (gzipped and non-gzipped), merge the resources outside a web context (the build is web-agnostic), and append the asset version during the build (if you don’t use the automatic versioning based on the hash). Then the packaged assets need to be somehow distributed to the CDN or your asset server. And if you make it all dynamic, all you need is for your framework to support an “asset url” property that is used in the webpage. How that works:

  • when your page is generated, the asset url is used to point to the asset server / CDN.
  • your application is still able to serve the assets dynamically, so that the CDN/asset server can pick them up

This means you should package the assets in your application, rather than outside of it. This is debatable, and some people prefer to package them separately and distribute them, but with all the steps given above, it becomes harder to manage. The last point here is how you deploy changes to the assets – is it together with the whole application, or just the css and js files? If assets are packaged separately, you can deploy them separately. Otherwise you’d have to deploy the whole app (or manually copy files). Choose which one fits you better. But have in mind that it’s not easy and you should put some thought into it early in the development process.
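The “asset url” property mentioned earlier could be as small as the following sketch. The class name and the example CDN host are hypothetical; the only idea taken from the text is that one configured base URL decides whether pages point at the application itself or at the CDN / asset server:

```java
public class AssetUrls {

    // Empty in development (assets are served by the application itself);
    // something like "https://cdn.example.com" (a made-up host) in production.
    private final String assetBaseUrl;

    public AssetUrls(String assetBaseUrl) {
        // Drop a trailing slash so that paths can always be appended with one
        this.assetBaseUrl = assetBaseUrl.endsWith("/")
                ? assetBaseUrl.substring(0, assetBaseUrl.length() - 1)
                : assetBaseUrl;
    }

    // Returns the URL that should be written into the generated page
    public String url(String path) {
        String normalized = path.startsWith("/") ? path : "/" + path;
        return assetBaseUrl + normalized;
    }

    public static void main(String[] args) {
        System.out.println(new AssetUrls("").url("/styles/main.css"));
        System.out.println(new AssetUrls("https://cdn.example.com/").url("styles/main.css"));
    }
}
```

The application keeps serving the same paths, so the CDN can fetch them from it on a cache miss.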


ORM Haters Don’t Get It

May 9, 2012

I’ve seen tons of articles and comments (especially comments) that tell us how bad, crappy and wrong the concept of ORM (object-relational mapping) is. Here are the usual claims, and my comments on them:

  • “they are slow” – there is some overhead in mapping, but it is nothing serious. Chances are you will have much slower pieces of code.
  • “they generate bad queries which hurts performance” – first, an ORM generates better queries than the average developer would write, and second, it generates bad queries only if you use bad mappings
  • “they deprive you of control” – you are free to execute native queries
  • “you don’t need them, plain SQL and XDBC is fine” – no, but I’ll discuss this in the next paragraph
  • “they force you to have getters and setters which is bad” – your entities are simple value objects, and having setters/getters is fine there. More on this below
  • “database upgrade is hard” – there are a lot of tools around the ORMs that make schema transition easy. Many ORMs have these tools built-in

But why do you need an ORM in the first place? Assume you decided not to use one. You write your query and get the result back, in the form of a ResultSet (or whatever it looks like in the language you use). There you can access each column by its name. The result is a type-unsafe, map-like structure. But the rest of your system requires objects – your front-end components take objects, your service methods need objects as parameters, etc. These objects are simple value objects, and there is nothing wrong with exposing their state via getters. They don’t have any logic that operates on their state, they are just used to transfer that state. If you are using a statically-typed language, you are most likely using objects rather than type-unsafe structures around your code, not to mention that these structures are database-access interfaces, and you wouldn’t have them in your front-end code. So then a brilliant idea comes to your mind – “I will create a value object and transfer everything from the result set to it. Now I have the data in an object, and I don’t need database-access specific interfaces to pass around in my code”. That’s a great step. But soon you realize that this is a repetitive task – you are creating a new object and manually, field by field, transferring the result from your SQL query to that object. And you devise some clever reflection utility that reads the object fields, assumes you have the same column names in the DB, reads the result set and populates the object. Well, guess what – ORMs have been doing the same thing for years and years now. I bet theirs are better and work in many scenarios that you don’t suspect you’ll need. (And I will just scratch the surface of how odd the process of maintaining native queries is – some put them in one huge text file (ugly), others put them inline (how can the DBAs optimize them now?))

To summarize the previous paragraph – you will create some sort of ORM in your project, but yours will suck more than anything out there, and you won’t admit it’s ORM.
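For illustration, the “clever reflection utility” typically ends up looking something like the sketch below. Everything in it is hypothetical: the class names are invented, and a plain Map stands in for one row of a result set so that the example runs without a database. It assumes column names equal field names and does no type conversion, which is exactly where homegrown mappers start to hurt:

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

public class NaiveRowMapper {

    // A simple value object; its field names are assumed to match the column names
    public static class User {
        private long id;
        private String name;

        public long getId() { return id; }
        public String getName() { return name; }
    }

    // Copies every column whose name matches a declared field into a new instance
    public static <T> T map(Map<String, Object> row, Class<T> type) {
        try {
            T target = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                if (field.isSynthetic() || !row.containsKey(field.getName())) {
                    continue;
                }
                field.setAccessible(true);
                field.set(target, row.get(field.getName()));
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot map row to " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("id", 42L);
        row.put("name", "Alice");
        User user = map(row, User.class);
        System.out.println(user.getId() + " " + user.getName());
    }
}
```

Note what is missing: inherited fields, differing column names, type conversion, nulls in primitive fields, joins. Each of those is something an ORM already handles.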

This is a good place to mention a utility called commons-dbutils (Java). It is a simple tool to map database results to objects that covers the basic cases. It is not an ORM, but it does what an ORM does – maps the database to your objects. But there’s something missing in the basic column-to-field mapper, and that’s foreign keys and joins. With an ORM you can get the User’s address in an Address field even though a JOIN would be required to fetch it. That’s both a strength and a major weakness of ORMs. The *ToOne mappings are generally safe. But *ToMany collections can be very tricky, and they are very often misused. This is partly the fault of ORMs as they don’t warn you in any way about the consequences of mapping a collection of, say, all orders belonging to a company. You will never and must never need to access that collection, but you can map it. This is an argument I’ve never heard from ORM haters, because they didn’t get to this point.

So, are ORMs basically dbutils plus the evil and risky collection mapping? No, they give you many extras that you need. Dialects – you write your code in a database-agnostic way, and although you are probably not going to change your initially selected database vendor, it is much easier to use any database without every developer learning the quirks of its syntax. I’ve worked with MSSQL and Oracle, and I barely felt the pain in working with them. Another very, very important thing is caching. Would you execute the same query twice? I guess not, but if it happens to be in two separate methods invoked by a third method, it might be hard to catch, or hard to avoid. Here comes the session caching, and it saves you all duplicated queries to get some row (object) from the database. There is one more criticism of ORMs here – the session management is too complicated. I have mainly used JPA, so I can’t tell about others, but it is really tricky to get the session management right. It is all for very good reasons (the aforementioned cache, transaction management, lazy mappings, etc.), but it is still too complicated. You would need at least one person on the team who has a lot of experience with a particular ORM to set it up right.
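The idea behind the session cache can be shown with a toy sketch. This is not how JPA or any specific ORM implements it; the class and the loader function are made up for the example, and the loader stands in for “run the query for this id”:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class SessionCache<K, V> {

    private final Map<K, V> loaded = new HashMap<>();
    private final Function<K, V> loader; // stands in for the actual database query
    private int loadCount = 0;

    public SessionCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the already-loaded object for this id, or loads it exactly once
    public V get(K id) {
        if (!loaded.containsKey(id)) {
            loadCount++;
            loaded.put(id, loader.apply(id));
        }
        return loaded.get(id);
    }

    // How many times the loader (i.e. the database) was actually hit
    public int getLoadCount() {
        return loadCount;
    }

    public static void main(String[] args) {
        SessionCache<Long, String> session = new SessionCache<>(id -> "user-" + id);
        session.get(1L); // first call goes to the loader
        session.get(1L); // second call is served from the session
        System.out.println("loads: " + session.getLoadCount());
    }
}
```

Two methods that each ask for the same row within one session trigger a single load, without either of them knowing about the other.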

But there’s also the 2nd level cache, which is significantly more important. This sort of thing is what allows services like facebook and twitter to exist – you stuff your rarely-changing data in (distributed) memory and instead of querying the database every time, you get the object from memory, which is many times faster. Why is this related to ORMs? Because the caching solution can usually be plugged into the ORM and you can store the very same objects that the ORM generated, in memory. This way caching becomes completely transparent to your database-access code, which keeps it simple and yet performant.

So, to summarize – ORMs are doing what you would need to do anyway, but it is almost certain that a framework that’s been around for 10 years is better than your homegrown mapper, and they are providing a lot of necessary and important extras on top of their core functionality. They also have two weak points (they both practically say “you need to know what you are doing”):

  • they are easy to misuse, which can lead to fetching huge, unnecessary results from the database. You can very easily create a crappy mapping which can slow down your application. Of course, it is your responsibility to have a good mapping, but ORMs don’t really give you a hand there
  • their session management is complicated, and although it is for very good reasons, it may require a very experienced person on the team to set things up properly

I’ve never seen these two being used as arguments against ORMs, whereas the wrong ones in the beginning of this article are used a lot, which leads me to believe that people raging against ORMs rarely know what they are talking about.


Javascript-off Unfriendly, And Proud About It

April 30, 2012

Your site should work even with javascript turned off, they say. I strongly disagree. First, it is a lot of effort to make a function-heavy site work without javascript. You’ve built the thing to work with ajax, cool controls and lots of dialogs. And you should build an entirely new version for those 2 percent that don’t have javascript? The effort just isn’t worth it. Not even for facebook, youtube and twitter – none of them works without javascript.

So, when I develop a web app that fails to work properly without javascript (like welshare), I’m very much aware of that, and it’s done on purpose – to save unnecessary effort, and not to provide a crappy version of the UI. Of course, completely ignoring the fact that javascript may be turned off would be too one-sided. So:

  • make the basic home page and the signup page workable without javascript. Some users (like me) block scripts with add-ons, for the sake of security. If your homepage is blank when my javascript is off, I’d be hesitant to enable scripts for it. But if I see it’s a trustworthy service, I may go and actually sign up. Of course, ajax validation is a must nowadays, but you should have server-side validation anyway, so make your signup form work.
  • display a message “This site requires JavaScript to function properly”. That’s what many sites (including youtube, stackoverflow, and my aforementioned welshare) do. It’s easily handled by the <noscript> tag.
  • content-heavy websites are generally less reliant on javascript. They may use it for paging or comments but it’s not their main functionality, it’s just an “extra”. Such minor features can be supported, it costs you close to nothing (you’d have your 2nd page available for SEO reasons anyway)

Apart from that, the vast majority of services that rely heavily on javascript (because they are more applications than sites) should not bother making a javascript-off version.


Runtime Classpath vs Compile-Time Classpath

April 25, 2012

This should really be a simple distinction, but I’ve been answering a slew of similar questions on Stackoverflow, and often people misunderstand the matter.

So, what is a classpath? A set of all the classes (and jars with classes) that are required by your application. But there are two, or actually three distinct classpaths:

  • compile-time classpath. Contains the classes that you’ve added in your IDE (assuming you use an IDE) in order to compile your code. In other words, this is the classpath passed to “javac” (though you may be using another compiler).
  • runtime classpath. Contains the classes that are used when your application is running. That’s the classpath passed to the “java” executable. In the case of web apps this is your /lib folder, plus any other jars provided by the application server/servlet container
  • test classpath – this is also a sort of runtime classpath, but it is used when you run tests. Tests do not run inside your application server/servlet container, so their classpath is a bit different

Maven defines dependency scopes that are really useful for explaining the differences between the different types of classpaths. Read the short description of each scope.

Many people assume that if they successfully compiled the application with a given jar file present, it means that the application will run fine. But it doesn’t – you need the same jars that you used to compile your application to be present on your runtime classpath as well. Well, not necessarily all of them, and not necessarily only them. A few examples:

  • you compile the code with a given library on the compile-time classpath, but forget to add it to the runtime classpath. The JVM throws NoClassDefFoundError, which means that a class that was present when the code was compiled is missing at runtime. This error is a clear sign that you are missing a jar file on your runtime classpath that you have on your compile-time classpath. It is also possible that a jar you depend on in turn depends on a jar that you don’t have anywhere. That’s why libraries (must) have their dependencies declared, so that you know which jars to put on your runtime classpath
  • containers (servlet containers, application servers) have some libraries built-in. Normally you can’t override the built-in dependencies, and even when you can, it requires additional configuration. So, for example, you use Tomcat, which provides the servlet-api.jar. You compile your application with the servlet-api.jar on your compile-time classpath, so that you can use HttpServletRequest in your classes, but do not include it in your WEB-INF/lib folder, because tomcat will put its own jar in the runtime classpath. If you duplicate the dependency, you may get bizarre results, as classloaders get confused.
  • a framework you are using (let’s say spring-mvc) relies on another library to do JSON serialization (usually Jackson). You don’t actually need Jackson on your compile-time classpath, because you are not referring to any of its classes or even spring classes that refer to them. But spring needs Jackson internally, so the jackson jar must be in WEB-INF/lib (runtime classpath) for JSON serialization to work.
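A quick way to see the runtime side of this is to ask the classloader for a class by name. The sketch below uses a helper name I made up, and `com.example.missing.NotThere` is a deliberately nonexistent class. Note that an explicit lookup like this fails with ClassNotFoundException, whereas NoClassDefFoundError is what you get when the JVM itself resolves a reference that was compiled into your code:

```java
public class ClasspathCheck {

    // Returns true if the named class can be found on the runtime classpath.
    // The class is looked up without being initialized.
    public static boolean isOnRuntimeClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError, e.g. when the class is
            // present but something it depends on is missing
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isOnRuntimeClasspath("java.util.ArrayList"));
        System.out.println(isOnRuntimeClasspath("com.example.missing.NotThere"));
    }
}
```

Since the class name is only a string, this compiles without the library being anywhere on the compile-time classpath, which is the same reason a framework can need a jar at runtime that your own code never references.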

The cases might be complicated even further, when you consider compile-time constants and version mismatches, but the general point is this: the classpaths that you use for compiling and for running the application are different, and you should be aware of that.


Keep As Much Stuff As Possible In The Application Itself

April 22, 2012

There’s a lot of Ops work to every project. Setting up server machines, and clusters of them, managing the cloud instances, setting up the application servers, HAProxy, load balancers, database clusters, message queues, search engine, DNS, alerts, and whatnot. That’s why the Devops movement is popular – there’s a lot more happening outside the application that is vital to its success. But unix/linux is tedious. I hate it, to be honest. Shell script is ugly and I would rather invent a new language and write a compiler for it, than write a shell script. I know many “hackers” will gasp at this statement, but let’s face it – it should be used only as a really last resort, because it will most likely stay out of the application’s repository, it is not developer friendly, and it’s ugly (yes, you can version it, you can write it with good comments, and still…)

But enough about my hate for shell scripting (and command-line executed perl scripts for that matter). That’s not the only thing that should be kept to a minimum. (Btw, this is the ‘whining’ paragraph, you can probably skip it). The “Getting Started” guide of all databases, message queues, search engines, servers, etc. says “easy to install”. Sure, you just apt-get install it, then go to /usr/lib/foo/bar and change a configuration file, then give permissions to a newly-created user that runs it, oh, and you customize the shell-script to do something, and you’re there. Oh, and /usr/lib/foo/bar – that’s different depending on how you install it and who has installed it. I’ve seen tomcat installed in at least 5 different ways. One time all of its folders (bin, lib, conf, webapps, logs, temp) were in a completely different location on the server. And of course somebody decided to use the built-in connection pool, so the configuration has to be done in the servlet container itself. Use the defaults. Put that application server there and leave it alone. But we need a message queue. And a NoSQL database in addition to MySQL. And our architects say “no, this should not be run in embedded mode, it will couple the components”. So a whole new slew of configurations and installations for stuff that can very easily be run inside our main application virtual machine/process. And when you think the external variables are just too many – then comes URL rewriting. “Yes, that’s easy, we will just add another rewrite rule”. 6 months later some unlucky developer will be browsing through the code wondering for hours why the hell this doesn’t open. And then he finally realizes it is outside the application, opens the apache configuration file, and he sees wicked signs all over.

To summarize the previous paragraph – there’s just too much to do on the operations side, and it is (obviously) not programming. Ops people should be really strict about versioning configuration and even versioning whole environment setups (Amazon’s cloud gives a nice option to make snapshots and then deploy them on new instances). But then, when something “doesn’t work”, it’s back to the developers to find the problem in the code. And it’s just not there.

That’s why I have always strived to keep as much stuff as possible in the application itself. NoSQL store? Embedded, please. Message queue? Again. URL rewrites – your web framework does that. Application server configurations? None, if possible, you can do them per-application. Modifications of the application server startup script? No, thanks. Caching? It’s in-memory anyway, why would you need a separate process. Every external configuration needed goes to a single file that resides outside the application, and Ops (or devs, or devops) can change that configuration. No more hidden stones to find in /usr/appconf, apache or whatever. Consolidate as much as possible in the piece that you are familiar and experienced with – the code.
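The “single file that resides outside the application” can be handled with nothing more than java.util.Properties. A minimal sketch, with a class name and property keys that are made up for the example:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

public class ExternalConfig {

    private final Properties properties = new Properties();

    // The reader would normally be opened on the one external file,
    // whose location is passed to the process at startup
    public ExternalConfig(Reader source) throws IOException {
        properties.load(source);
    }

    // Every setting has a default in code, so the external file stays small
    public String get(String key, String defaultValue) {
        return properties.getProperty(key, defaultValue);
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for the external file in this sketch
        ExternalConfig config = new ExternalConfig(
                new StringReader("queue.embedded=true\nasset.url=/static\n"));
        System.out.println(config.get("queue.embedded", "false"));
        System.out.println(config.get("cache.size", "1000"));
    }
}
```

Ops change one file, and anything not mentioned in it falls back to a default that lives in the repository with the rest of the code.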

Obviously, not everything can be there. Some databases you can’t run embedded, or you really want to have separate machines. You need a load balancer, and it has to be in front of the application. You need to pass initialization parameters for the virtual machine / process, in the startup script. But stick to the bare minimum. If you need to make something transparent to the application, do it with a layer of code, not with scripts and configurations. I don’t know if that aligns somehow with the devops philosophy, because it is more “dev” and less “ops”, but it actually allows developers to do the ops part, because it is kept down to a minimum. And it does not involve ugly scripting languages and two-line long shell commands.

I know I sound like a big *nix noob. And I truly am. But as most of these hacks can be put up in the application and be more predictable and easy to read and maintain – I prefer to stay that way. If it is not possible – let them be outside it, but really version them, even in the same repository as the code, and document them.

The main purpose of all that is to improve maintainability and manageability. You have a lot of tools, infrastructure and processes around your code, so make use of them for as much as possible.


Replacing a JSON Message Converter With MessagePack

April 17, 2012

You may be using JSON to transfer data (we were using it in our message queue). While this is fine, its main benefit is being human-readable. If you don’t care about readability, you’d probably want to use a more efficient serialization mechanism. Multiple options exist: protobuf, MessagePack, protostuff, java serialization. The easiest of them to use is java serialization, but it is less efficient (in both memory and time) than the other solutions. There are some benchmarks that will help you choose the most efficient solution, but if you want an easy, almost drop-in replacement for your JSON solution, MessagePack might be the best option.

I made a simple test to compare the JSON output to the MessagePack output in terms of size: 2300 vs 150 bytes for a simple message. That’s a pretty good reduction, and if you have a lot of messages, optimizing is a must.

However, you need to register all classes in the message pack. There are two options:

  • use @Message on all the objects in the serialized graph. This is a bit tedious, especially if you already have a lot of classes that are transferred. You have to go through the whole graph
  • you can manually register all classes with the MessagePack. Again tedious, because you also have to register all classes that the message class contains as a field (recursively)

That’s why I wrote the following code to loop all our message classes, and register them with the message pack on startup. It partly relies on spring classes, but if you are not using spring, you can replace them:

	private MessagePack serializer = new MessagePack();
	private ClassMapper classMapper = new DefaultClassMapper();

	@PostConstruct
	public void init() {
		// we need to find all messages, and register their classes, and also all their fields' recursively
		ClassPathScanningCandidateComponentProvider provider = new ClassPathScanningCandidateComponentProvider(false);
		Set<BeanDefinition> classes = provider.findCandidateComponents("com.foo.bar.messages");

		// hacking MessagePack to allow Set handling
		Field fld = ReflectionUtils.findField(MessagePack.class, "registry");
		ReflectionUtils.makeAccessible(fld);
		TemplateRegistry registry = (TemplateRegistry) ReflectionUtils.getField(fld, serializer);
		registry.register(Set.class, new SetTemplate(new AnyTemplate(registry)));
		registry.registerGeneric(Set.class, new GenericCollectionTemplate(registry, SetTemplate.class));

		try {
			for (BeanDefinition def : classes) {
				Class<?> clazz = Class.forName(def.getBeanClassName());
				registerHierarcy(clazz, serializer, Sets.<Class<?>>newHashSet());
			}
		} catch (ClassNotFoundException e) {
			throw new IllegalStateException(e);
		}
	}

	private void registerHierarcy(Class<?> clazz, MessagePack serializer, Set<Class<?>> handledClasses) {
		if (!isEligibleForRegistration(clazz)) {
			return;
		}
		Class<?> currentClass = clazz;
		while (currentClass != null && !currentClass.isEnum() && currentClass != Object.class) {
			for (Field field : currentClass.getDeclaredFields()) {
				registerHierarcy(field.getType(), serializer, handledClasses);

				// type parameters
				Type type = field.getGenericType();
				if (type instanceof ParameterizedType) {
					for (Type typeParam : ((ParameterizedType) type).getActualTypeArguments()) {
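						// wildcards, type variables and nested generics are not Class instances; skip them
						if (!(typeParam instanceof Class)) {
							continue;
						}
						// avoid circular generics references, resulting in stackoverflow
						Class<?> typeParamClass = (Class<?>) typeParam;
						if (!handledClasses.contains(typeParamClass)) {
							handledClasses.add(typeParamClass);
							registerHierarcy(typeParamClass, serializer, handledClasses);
						}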
					}
				}
			}
			currentClass = currentClass.getSuperclass();
		}

		try {
			serializer.register(clazz);
		} catch (Exception ex) {
			// pass the exception itself so that its message and stack trace are logged
			logger.warn("Problem registering class " + clazz, ex);
		}
	}

	private boolean isEligibleForRegistration(Class<?> clazz) {
		return !(clazz.isAnnotationPresent(Entity.class) || clazz == Class.class || Type.class.isAssignableFrom(clazz) || clazz.isInterface() || clazz.isArray() || ClassUtils.isPrimitiveOrWrapper(clazz) || clazz == String.class || clazz == Date.class || clazz == Object.class);
	}

Programming Puns

April 5, 2012

Though not strictly puns, I tried to make them sort-of funny:

- What are you doing on this bench with a bank slip and a marker?
- Benchmarking our transactions.

What’s “on” on Earth? The air conditional.

What do communists and functional programmers have in common? They hate classes.

“You shall not pass by reference”, said Gandalf to James Gosling.

I’ll create a programming language named obl. That way, when using a for-loop, I will be obl-iterating.

Why are “i”, “j” and “k” the most used letters for loop variables? Ask Dijkstra.

What’s common about basements and maven repositories? They hold jars.

I’ve written a movie about insects. Here’s the ant script.

- What are you doing with this bucket of paint at the construction site?
- Making the build green.

Write string concatenation in PHP on the dotted line.

A garbage collector joined Men in Black to help them with erasing memory.

I asked the Oracle what is my future, and it responded with ORA-27102: out of memory


Code Reviews Do Not Guarantee A Good Product

March 29, 2012

If you have a decent development process, you most probably have code reviews. But it appears many people (especially management) believe that code reviews make sure the end product is good, and bug-free. This has nothing to do with reality, as you can probably guess. The purpose of code reviews is to:

  • make sure coding style is consistent throughout the application
  • basic things are done properly. For example – don’t use Strings for numbers, don’t synchronize access to singletons, use StringBuilder inside loops, etc.
  • no code duplication exists
  • layer boundaries are preserved – no database access in the view layer, no UI code in the service layer, etc.

By doing that, code reviews try to minimize the possibility of generating a lot of technical debt. That debt in turn may lead to problematic development of new features and maintenance. But preventing this does not guarantee the product will be good. It certainly does not mean that it will be as if the code was written by the reviewer. The code review cannot (and should not) catch problems with the business logic and the program flow. Exceptions flying all over is something not easily detectable by the reviewer. Depending on the structure this should be caught by QA, the developers themselves or in most cases – both.

So, when management asks you “how come the product is crap if we had code reviews?”, you can say “how come my car breaks all the time if all parts were inspected when built?”.


My Problem With Your Interviews

March 21, 2012

This article comes right after Facebook rejected me after 3 phone interviews, but it is not going to be a hate-post. In fact, I’ve been planning to write it for a couple of months. But now onto the topic: tech companies (Google, Facebook, VMWare, at least, but certainly many more) are all trying to find the best technical talent. (So they contacted me and asked if I’m interested in “exploring opportunities” with them). But how do they do that?

The typical interview (be it a phone screen, or an onsite interview) consists of solving a problem. Some call these problems “puzzles”. They are usually non-real world problems that aim to verify your algorithmic skills and your computer science knowledge. The simple ones include recursion, binary search, basic data structures (linked list, hashtable, trees). The more complex ones require red-black trees, Dijkstra, knowledge of NP-completeness, etc. If you are on the phone, you write the code in a shared document. If onsite – you write it on a whiteboard. So, these puzzles should verify your computer science and algorithm skills. But let’s step back a little and see the picture from another angle.

  • what you do on these interviews is something you never, ever do in real life: you write code without using any compiler or debugger. You do that in a limited time, with people watching you / waiting for you on the line. But let’s put that aside for now. Let’s assume that writing code without being able to run it is fine for interview purposes.
  • the skills that these puzzles are testing are skills that the majority of developers have never needed. Most people are writing business software, and it does not require red-black trees. When was the last time you used recursion in your business software? So the last time you did anything like that was in college. And many of these problems are really simple if you are a freshman, because you did them as homework just the other day. But later it becomes a bit more tedious to write even things as simple as a binary search, because you just didn’t do it yesterday. Of course you will be able to do it, but you will need a little more time to remember, and for sure a compiler. (By the way, the puzzles at facebook were really simple. I didn’t do them perfectly though, which is my bad, perhaps due to interview anxiety or because I just haven’t done anything like that for the past 3 years)
  • the skills tested are rarely what you will do in your daily work anyway. Even in these cool companies like Google and Facebook, there are still pretty regular projects that require coding to APIs, supporting existing code, etc. I don’t think you will be allowed to tweak the search engine in your first week, no matter how great you did on the interview
  • interview preparation is suggested and actually required before these interviews. Exactly as if it is a college exam. But that’s dumb – you don’t want people to study to match your artificial interview criteria. You want them to be…good programmers.
  • focusing on these computer science skills means these companies will probably miss good engineers that are simply not so interested in the low-level details.

Btw, here’s an excerpt from my feedback after my first phone interview with Facebook:

On the other hand, I don’t think having 1st year CS homework problems on interviews for senior developers is a great idea. One thing is – most people (including me) haven’t done this since university, and it looks a bit like trivia questions rather than actual programming.

The problems outlined above are what I don’t like about these types of interviews. And that’s obviously because I don’t like solving these sorts of puzzles. I just don’t like them; they are not interesting to me. You could argue that in addition to your daily job, you can participate in programming competitions (like TopCoder) in order to keep your algorithm skills trained.

I’ll tell a short story about my high-school years. There were two student competitions. One was about exactly these types of programming puzzles: you are given a number of them for a fixed period of time, and you must submit a solution that covers as many of the pre-defined (but unknown to you) test cases as possible. The other competition was about creating a piece of software at home, and then presenting it in front of a jury. I was a top competitor in the latter, and sucked quite a lot in the former. Why? Because I hated solving useless, unrealistic problems for the sake of solving them. I liked building software instead. I would probably be good at solving puzzles if I liked them. I just don’t.

And these are not two levels of skill, where one person can solve complex algorithmic puzzles (superior) and another can’t and therefore builds some piece of software (inferior). These are two different types of skills, and both of them are very useful in the process of creating good software. One person writes the low-level stuff; the other designs the APIs, the architecture, the deployment scheme, and manages abstraction in the code. So, to get back to the question of what I do now in addition to my daily job – I build stuff. I’ve worked on a few personal projects that I enjoyed. Way more than I would’ve enjoyed a TopCoder competition.

Unfortunately these cool companies are hiring primarily the TopCoder type of people. Which probably works for them, because they have a lot of candidates and can afford a lot of false negatives. But many smaller companies adopt these interview practices, and so they fail to get the best technical talent. The best article on software engineer interviewing I’ve read appeared just a few weeks ago: Jeff Atwood advised how to hire a programmer, and I completely support his approach.

And my problem with interviews is that they don’t actually verify if you can do real programming work. And obviously my problem is that I don’t like low-level and algorithmic stuff, so I wouldn’t be able to work for cool companies like Google and Facebook.

Important note: I’m not saying you should not know what computational complexity is, how a hashtable works, or how to write recursion. You should, because that is basic stuff that you need in order to be able to write good code. But focusing too much on these things is what I consider irrelevant to day-to-day programming. (And for the trolls: I wouldn’t have passed the 2 phone screens if I was a complete dumbass who can only write websites in PHP and thinks a hashtable is some sort of furniture)


Is Moving All Jars To a Shared lib Folder Beneficial?

February 28, 2012

We will be running more than one web application on the same Tomcat (7.0.22), so we wondered whether there would be a benefit in moving all jars to tomcat/lib instead of having them in each application’s WEB-INF/lib (since our applications have almost identical dependencies). The one sure benefit, even without testing, is that the classes are loaded only once, by the parent classloader, rather than separately by each app’s classloader.

Here are the results on my machine, with 2 applications:

Startup times (4 runs):

WEB-INF/lib
INFO: Server startup in 77554 ms
INFO: Server startup in 62391 ms
INFO: Server startup in 62598 ms
INFO: Server startup in 61002 ms

tomcat/lib
INFO: Server startup in 72321 ms
INFO: Server startup in 71151 ms
INFO: Server startup in 69841 ms
INFO: Server startup in 71047 ms
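
To compare the two sets of timings at a glance, here is a small sketch that computes the mean and median of the values above. The class and method names are made up for illustration; only the eight numbers come from the logs. With four runs per setup, and a first WEB-INF/lib run that looks like a cold-start outlier, the result is indicative at best.

```java
import java.util.Arrays;

// Hypothetical helper: summarizes the startup timings copied from the logs above.
final class StartupStats {

    // Arithmetic mean of the timings, or NaN for an empty array.
    static double mean(long[] ms) {
        return Arrays.stream(ms).average().orElse(Double.NaN);
    }

    // Median of the timings (average of the two middle values for an even count).
    static double median(long[] ms) {
        if (ms.length == 0) {
            return Double.NaN;
        }
        long[] sorted = ms.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        long[] webInfLib = {77554, 62391, 62598, 61002};
        long[] tomcatLib = {72321, 71151, 69841, 71047};
        System.out.printf("WEB-INF/lib: mean=%.1f ms, median=%.1f ms%n",
                mean(webInfLib), median(webInfLib));
        System.out.printf("tomcat/lib:  mean=%.1f ms, median=%.1f ms%n",
                mean(tomcatLib), median(tomcatLib));
    }
}
```

On these numbers the shared tomcat/lib setup does not start faster; its mean and median are both several seconds higher than with WEB-INF/lib.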

Memory in MB (2 runs):

WEB-INF/lib
PermGen: Size=278, Used=198
Heap: Size=1,029, Used=465

PermGen: Size=276, Used=198
Heap: Size=1,049, Used=501

tomcat/lib
PermGen: Size=199, Used=151
Heap: Size=1,043, Used=412

PermGen: Size=195, Used=151
Heap: Size=1,035, Used=418
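
The post does not say which tool these figures were read with. One way to get comparable numbers from inside the JVM is the standard java.lang.management API, sketched below. The class name MemoryPools is made up for illustration, and the sketch assumes it runs inside the Tomcat JVM (for example, called from a servlet), since it only reports on the JVM it runs in. Pool names depend on the JVM and garbage collector; PermGen exists only up to Java 7 and was replaced by Metaspace in Java 8.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists committed ("Size") and used memory per JVM memory pool, in MB.
final class MemoryPools {
    private static final long MB = 1024 * 1024;

    static List<String> report() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            lines.add(String.format("%s (%s): Size=%d, Used=%d",
                    pool.getName(), pool.getType(),
                    usage.getCommitted() / MB, usage.getUsed() / MB));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : report()) {
            System.out.println(line);
        }
    }
}
```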

The only notable gain is in the used PermGen size, which differs by ~50 MB per app (Size differs by ~80 MB). Used heap goes up and down, hence the difference there. These numbers were taken without working with the app, which would add a bit more difference, but most of our classes are loaded eagerly by the Spring context. So this is not enough benefit to justify the deployment complications it introduces: you can no longer package your application directly from Maven/Ant, you have to post-process it to remove the jars, and you also have to make sure your environments have up-to-date tomcat/lib folders.
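
To give an idea of what that post-processing step involves, here is a minimal sketch that copies a WAR while dropping every jar under WEB-INF/lib. The class name WarLibStripper and its command-line arguments are invented for illustration, and this is not how our build actually did it; it uses only the standard java.util.zip classes and does not preserve entry metadata such as timestamps.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: rewrites a WAR without the jars bundled in WEB-INF/lib.
final class WarLibStripper {

    static boolean isBundledJar(String entryName) {
        return entryName.startsWith("WEB-INF/lib/") && entryName.endsWith(".jar");
    }

    // Copies every entry except bundled jars from the input WAR to the output stream.
    static void strip(InputStream war, OutputStream out) throws IOException {
        try (ZipInputStream in = new ZipInputStream(war);
             ZipOutputStream zipOut = new ZipOutputStream(out)) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (isBundledJar(entry.getName())) {
                    continue; // skipped: expected to be provided by tomcat/lib
                }
                zipOut.putNextEntry(new ZipEntry(entry.getName()));
                int read;
                while ((read = in.read(buffer)) != -1) {
                    zipOut.write(buffer, 0, read);
                }
                zipOut.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: WarLibStripper <in.war> <out.war>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]));
             OutputStream out = Files.newOutputStream(Paths.get(args[1]))) {
            strip(in, out);
        }
    }
}
```

The stripped jars still have to be copied into tomcat/lib on every environment, which is the part that is easy to get out of sync.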
