The DSL Jungle
DSLs are common in the programming world nowadays. Many frameworks and tools decide to build a DSL for their…specific things. Build tools are the primary candidates, but testing frameworks, web frameworks and whatnot also define DSLs. With these DSLs you define build steps, web routing rules, test acceptance criteria, etc.
What do all these DSLs have in common? Two things. First, they are predominantly about configuration: some specific way of configuring something specific to the tool or framework. The second thing is that you copy-paste code. Every time I’m confronted with some DSL that is meant to help with my programming task, I end up copy-pasting examples or existing code, and then modifying it. Even though I’ve been working with a DSL for 8 months (from time to time), I just don’t remember its syntax.
And you may say “yeah, that’s because you use bad DSLs”. Well, then I haven’t seen a good one yet. I’m currently using sbt, spray routing and Cucumber for Scala. Previously I’ve used Groovy and Grails DSLs, and a few others along the way.
But is it bad that you copy-paste existing pieces of code? Not always. You can, of course, base your configuration on existing, working pieces. But there are three issues – duplicate code, autocomplete and exploration. You know copy-pasting is wrong and leads to duplication. Not only that, but you may forget to change or remove something in the pasted code. And if you want to add some property, it would be good to be able to auto-complete it, rather than mistyping it or forgetting whether it was “filePath”, “filepath”, “file-path” or just “path”. With 2-3 DSLs in different parts of a big project, you can’t remember all the property names, so the alternative is to go and check the documentation (if you don’t have a working piece with that particular property to copy-paste from). Exploration is an even bigger issue. Especially when learning, or remembering, how to do certain things with a given DSL, it is crucial to be able to explore the possibilities. What properties does this have that might be useful? What exactly does this property do, and does it have subproperties? What can I nest under this item? This is very important, regardless of how well you know the tool or framework.
But with most DSLs you don’t have that. They either have some bizarre syntax, or they are JSON-based, or they look like the language you are using, but not quite, and hence even an IDE finds it difficult to understand them (spray being one such example). You either look at the documentation, or you copy-paste, or both. And you end up lost in this DSL jungle of ever “cooler” DSLs that do a wide variety of things.
And now I’ll drop the X-bomb. I love XML. Trusting the “XML configuration files are evil” meme has led to many incomprehensible configurations that are “short and easy to read and write”. Easy, if you remember what those double percentage signs mean compared to the single percentage signs, and where exactly to put the parentheses.
In almost every scenario where someone decided that a DSL is a good idea, XML would have worked brilliantly. Using an XSD schema (which, I agree, is a bit tedious to write) you can turn any XML-aware tool into an IDE for configuration. Take the Maven POM file, for example. Did you forget what element you could nest under “build”? Hit CTRL+space and you’ll find out. And because the format is unified, you can read the XML configuration of any framework or tool that uses it, not just this particular one, which happens to be the n-th DSL in a single project. While XML is verbose, it is straightforward and standard. (To make a distinction: your application properties file is fine with key-value pairs, YAML, or something like Typesafe config, but that’s not coming from a framework, and it’s not a DSL in the narrower sense.)
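To make the schema argument a bit more concrete, here is a minimal sketch using the JDK’s standard javax.xml.validation API. The “build”/“filePath” schema and the class name ConfigSchemaCheck are made up for illustration and do not belong to any real tool; a real schema would be much larger and would live in its own .xsd file.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class ConfigSchemaCheck {
    // Hypothetical schema: a <build> element with exactly one <filePath> child.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"build\">"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name=\"filePath\" type=\"xs:string\"/>"
      + "</xs:sequence></xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    /** Returns null if the XML conforms to the schema, otherwise the validator's error message. */
    static String validate(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no custom ErrorHandler set, validation errors are thrown as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException | IOException e) {
            return String.valueOf(e.getMessage());
        }
    }

    public static void main(String[] args) {
        // Correctly spelled property: accepted.
        System.out.println(validate("<build><filePath>target/out</filePath></build>"));
        // Misspelled property ("filepath"): rejected with a message naming the unexpected element.
        System.out.println(validate("<build><filepath>target/out</filepath></build>"));
    }
}
```

The same schema that rejects the misspelled “filepath” here is what lets an XML-aware editor offer “filePath” on CTRL+space, so the tool author writes the rules once and gets both validation and exploration.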
So if you are writing a tool, and can’t make some configuration available via annotations or via very simple code (builders, setters, fluent interfaces), don’t go for a DSL. Don’t write DSLs where you can easily use XML. A DSL will look good in your README.md, but your users will copy-paste all the time and may actually hate it. So please don’t contribute to the DSL jungle.
And do you know why that is? Remember the initial note that these are DSLs you use when programming. Well, DSLs are not for programmers. DSLs are for non-programmers to express business logic in (almost) prose. Or at least their usage should be limited to that, where they can really excel. If you are making a tool for business analysts, feel free to design the most awesome DSL. If you are building a tool for programmers, don’t.
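For the “very simple code” option mentioned above (builders, setters, fluent interfaces), a rough sketch might look like the following. BuildConfig and its properties are hypothetical names invented for this example, not the API of any real build tool.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical configuration object for an imaginary build tool.
final class BuildConfig {
    private final String filePath;
    private final List<String> steps;

    private BuildConfig(String filePath, List<String> steps) {
        this.filePath = filePath;
        // Defensive copy so the built configuration cannot be changed afterwards.
        this.steps = Collections.unmodifiableList(new ArrayList<>(steps));
    }

    static Builder builder() {
        return new Builder();
    }

    String filePath() { return filePath; }
    List<String> steps() { return steps; }

    static final class Builder {
        private String filePath = "target"; // assumed default, for illustration only
        private final List<String> steps = new ArrayList<>();

        // Each method returns the builder itself, which is what makes the calls chainable.
        Builder filePath(String filePath) { this.filePath = filePath; return this; }
        Builder step(String name) { steps.add(name); return this; }
        BuildConfig build() { return new BuildConfig(filePath, steps); }
    }

    public static void main(String[] args) {
        BuildConfig config = BuildConfig.builder()
            .filePath("target/out")
            .step("compile")
            .step("test")
            .build();
        System.out.println(config.filePath() + " " + config.steps()); // prints "target/out [compile, test]"
    }
}
```

Because this is plain Java, the IDE already knows that the property is called filePath and not “file-path”, and typing a dot after builder() lists everything that can be configured.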
All good points, but I think you are missing one of the main points of DSLs. For static information they are mostly useless, because at the end of the day it really doesn’t matter in what format I express application configuration or spell out the dependencies for a Java project. DSLs start to shine in situations where you also have some kind of dynamic component. A well-designed DSL can generate a whole bunch of boilerplate that you would otherwise have had to type yourself in a more general-purpose language. That is, of course, if you follow the conventions of the DSL and the semantics of your domain map well enough to the domain modelled by the DSL.
Examples of good DSLs that I have come across are regular expressions, parser combinators, and type annotations. In each of those instances you get a boost in expressivity if your problem can be modelled by the concepts that the DSL exposes.
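As a small illustration of the regular expression case, here is a sketch in Java. The “major.minor.patch” version format and the VersionParser class are assumptions made up for the example; the point is only that one pattern replaces the character-by-character scanning loop you would otherwise write by hand.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VersionParser {
    // Hypothetical format: three dot-separated groups of digits, e.g. "1.20.3".
    private static final Pattern VERSION = Pattern.compile("(\\d+)\\.(\\d+)\\.(\\d+)");

    /** Parses "major.minor.patch" into three ints, or throws IllegalArgumentException. */
    static int[] parse(String text) {
        Matcher m = VERSION.matcher(text);
        if (!m.matches()) { // matches() requires the whole string to fit the pattern
            throw new IllegalArgumentException("Not a version: " + text);
        }
        return new int[] {
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3))
        };
    }

    public static void main(String[] args) {
        int[] v = parse("1.20.3");
        System.out.println(v[0] + " " + v[1] + " " + v[2]); // prints "1 20 3"
    }
}
```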
Fair point. But most DSLs nowadays have next to no dynamic components.
I am not sure you have reached the real underlying cause for your frustration. There are several things you list:
1. Many DSLs are underdocumented.
2. Many DSLs lack support from IDE perspective – sometimes even highlighting, much less autocomplete/exploration
3. Unusual/weird syntax, unexpected behavior, hard to get right
4. There are way too many of them
5. Finally – it is you that gets the great pleasure to use them
You propose several fixes:
A. Write these DSLs as internal fluent/builder sub-languages – there is the familiarity and IDE support.
B. Use XML, possibly adding XML schemas for typing especially for configurations
C. Don’t write DSLs at all
D. Finally – if you cannot stand the urge, deliver them unto a group of people you automatically fall out of.
I agree with points 1-5, but I would also note that 1, 4 and 5 are characteristic of many libraries and tools as well, and 3, especially the “hard to get right” part, also applies to some libraries. 2 is characteristic even of a mainstream language when there is a new major version and tools need to catch up.
I would venture that your major pain is that you have had to deal with too many DSLs, because in a large project you always get some component that is of lower quality, or less pleasant to work with, than the rest. A single DSL is tolerable even if, as you mention, the standard for quality and documentation is too low. Perhaps too many straws broke the camel’s back.
For your solutions:
(A) – fine when possible, especially when designed properly.
(B) – I can only presume that you mean a limited subset of XML without DTDs, external entities, entity resolvers and the rest. XML Schemas can be really tricky and hard – I’ve seen plenty of schemas that were unreadable, unmaintainable and plainly wrong. And the XML constant FEATURE_SECURE_PROCESSING exists in Java for a reason. XML is OK for me too, but I have stayed in its murky waters long enough to know my way around. I can understand people’s frustration with it.
(C) – You’ve got to be kidding. This will be as effective as “Don’t have premarital sex”
(D) – I would say ‘NO’ – if even I can see that the way such tools are created produces balls of mud for people to swing around, pushing them onto another group of people will not solve the mud part.
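To illustrate the “limited subset of XML” from (B), here is a minimal sketch of a hardened parser setup in Java. XMLConstants.FEATURE_SECURE_PROCESSING is the standard JAXP constant mentioned above; the disallow-doctype-decl feature URI is a Xerces feature, and the sketch assumes the parser in use is the JDK’s bundled (Xerces-derived) one. SafeXmlConfig is a made-up class name.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class SafeXmlConfig {
    /** Parses configuration XML with DTDs rejected and secure processing limits enabled. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Standard JAXP switch that tells the parser to apply its security limits.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Xerces-specific feature (assumed to be supported by the parser in use):
        // reject any document that contains a DOCTYPE declaration.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setExpandEntityReferences(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<build><filePath>target/out</filePath></build>");
        System.out.println(doc.getDocumentElement().getTagName()); // prints "build"
    }
}
```

With this setup a document that declares its own entities in a DOCTYPE is refused at parse time instead of being expanded, which covers the “without DTDs, external entities” part of the subset.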
Well, the real solution would be – be a better programmer and inflict less pain on the world, no matter whether you deliver a library, a DSL or a full-blown language. Perhaps you have auto-excluded this one as impossible, haven’t you?
And about the number of DSLs – this ship has already sailed.
Thus far we have seen the next 700 programming languages, the next 700 data description languages, the next 700 markup languages, the next 700 Byzantine fault-tolerant protocols, the next 700 Krivine machines, the next 700 theorem provers, the next 700 separation logics, the next 700 asynchronous programming models, the next 700 slicing criteria and even more.
But if the producers of DSLs are too promiscuous, perhaps the consumers can be pickier?
Just a clarification: “Be a better programmer” was not aimed at you but at all programmers, like a slogan.