A Serious Flaw of PaaS
PaaS (Platform as a Service) is a marketing term for a sandboxed environment where applications can be deployed. The sandbox usually includes a lot of third-party software that the application can use. Notable (Java-related) PaaS offerings include Google App Engine, Heroku, OpenShift, CloudBees, Cloud Foundry and Jelastic.
It all sounds very nice – you configure your application via a web console, you include the services you need (application server, database, message queue, search engine), add a few configuration hooks and everything is up and running. No need to install anything manually, no need to think about scaling, as this may be handled automatically by the platform – you simply add more instances/engines/dynos/gears/whatever to serve the requests.
But I’ve evaluated a couple of them over the past few years and encountered one serious flaw that was essentially a deal-breaker for most projects: software versions.
During the development of each project, a part of the stack is upgraded to the latest versions. That’s due to important bug fixes, performance improvements, or simply the wish to stay up to date. In virtually all projects I’ve worked on, we have found a need to upgrade one or more components. The message queue has a serious bug, then a NoSQL database needs performance improvements from a new version that covers the specific case used in your project, then the search engine offers a nice distributed feature that you’d rather use.
And yet, the managed environments rarely offer the latest versions (if they offer the 3rd party software at all). They might upgrade after a while, or they might not. You don’t have control over that. In fact (correct me if I’m wrong), most of them don’t have the notion of “version” for the components they offer. This is sometimes circumvented by adding the new version as a completely separate component, which might mean that even if a new version appears, you need to reconfigure everything.
In most cases you don’t have SSH access, and even if you do, it’s rather limited and you only have access to your application files. Although unrelated, I can’t omit the fact that PaaS providers usually have a command-line interface that you download in order to manage your deployment. There’s nothing bad in command-line management, but what they do is replace native Linux command-line management with a proprietary tool that does roughly the same thing. Not much of a gain.
The last thing I’ve tried is OpenShift, but I hit the brick wall pretty early – they only support Maven 3.0.3 (latest being 3.1.0) and some of the plugins used in the project require at least 3.0.4. With Google App Engine I was forced to use Servlet 2.5 rather than 3.0. With Heroku there was something similar that I don’t even remember. Amazon’s EC2 is not PaaS (it’s IaaS), but it has Beanstalk, which is sort of a PaaS layer on top – there the latest version of Tomcat wasn’t available.
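A check like this can be done before committing to a platform. The following is only a minimal sketch: the function names and the dictionary layout are made up for illustration, and it assumes plain dotted numeric versions (no qualifiers such as "-SNAPSHOT" or "-RC1").

```python
def parse_version(text):
    """Turn a dotted version such as '3.0.4' into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.split("."))


def unmet_requirements(provided, required):
    """Return the components whose platform version is below the required minimum.

    Both arguments map a component name to a dotted version string.
    A component the platform does not offer at all is reported with None.
    """
    problems = {}
    for name, minimum in required.items():
        offered = provided.get(name)
        if offered is None or parse_version(offered) < parse_version(minimum):
            problems[name] = (offered, minimum)
    return problems


# Versions taken from the OpenShift example above; the dict layout is hypothetical.
platform = {"maven": "3.0.3"}
project = {"maven": "3.0.4"}
print(unmet_requirements(platform, project))  # {'maven': ('3.0.3', '3.0.4')}
```

Comparing tuples of integers rather than raw strings matters here, because as strings "3.0.10" would sort before "3.0.4".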
And if you are really convinced that you should use PaaS, you start making changes to the application to make it conform to the platform’s sandbox. You downgrade a plugin here, you write an ugly workaround there, you write more configuration, where a new version might require less. And if it’s about pet projects (like in my case), that might be fine. But with commercial projects with deadlines, written by a big team, this is a deal-breaker.
I’m not an anti-PaaS advocate, but I think providers should really address the version issue (and it’s non-trivial). Whether PaaS has enough benefits to ignore the version issue is a separate discussion (related SO question), but my personal approach is either getting (cheap) virtual machines or using an Infrastructure-as-a-Service cloud offer where you have more control.
Typically, an application developed under a particular PaaS should be compliant with the versions of the tools provided by that PaaS. Since PaaS tools provide APIs, they are intended to be designed so that they are backward compatible, hence version updates should not impact already deployed apps. However, as you noted, this is not always the case. Consider, for example, the versions of Python: 2.5, 2.6 and 3.1 are considered completely separate languages due to the lack of backward compatibility. Indeed, in this case, the PaaS provider should consider separate sandboxes, with the ability to smoothly migrate your application to a new sandbox. Fully automated migration of deployed applications is not feasible in such cases.
What do you think about docker?
http://www.docker.io
An excellent point, and one we’ve been encountering at the Python-based PaaS I work for — from the PaaS provider’s perspective, there’s the problem that users will build their apps to the versions of the tools you provide, and if you upgrade then you can wind up breaking existing apps, which understandably makes people annoyed.
Python does make things a bit easier, though — it has a tool called a virtualenv which lets each user set up multiple Python environments with different versions of different dependencies. So we encourage our users to use virtualenvs to insulate themselves against system version changes and to use later versions of tools than the system default. This is doable because we support ssh to a bash command line inside the sandbox, so they can run the virtualenv command-line tools from there.
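For readers who haven't used one, here is a rough sketch of the idea. It uses the standard library's venv module rather than the virtualenv tool mentioned above (the two serve the same purpose), and the function names are made up for illustration.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_isolated_env(target_dir):
    """Create a virtual environment in target_dir and return the path to its config file."""
    # with_pip=False keeps the example fast and offline; a real setup would
    # normally want pip so that newer dependency versions can be installed.
    venv.create(target_dir, with_pip=False)
    return Path(target_dir) / "pyvenv.cfg"


def inside_virtual_env():
    """Report whether the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        config = create_isolated_env(Path(scratch) / "env")
        print("environment created:", config.exists())
        print("currently inside a virtual environment:", inside_virtual_env())
```

Packages installed into such an environment live under its own directory, so a system-wide upgrade of the same package does not touch them.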
It’s still not ideal — we’re working on making it easier for us to upgrade system packages without breaking existing web apps by having multiple baseline images so that new users can get the latest versions of everything while existing users keep the version that was current when they signed up (unless they explicitly say they want to upgrade).
As the anonymous commenter above says, it’s possible that docker.io will make that easier to achieve.
Yup, and in addition to the problem of old versions and lack of updates: new versions can cause problems as well. The claimed utility of PaaS is in part that it doesn’t require you to upgrade software, since the provider handles that. The problem is of course that sometimes new versions either have new bugs that break your software or have some incompatible change that breaks your software. So if you do have a working version on a PaaS, the provider might potentially break it. With your own environment you can test your software with new versions before upgrading.
Ideally a PaaS should provide an automated framework that runs any tests you have in order to decide whether to run your application in an upgraded environment, and otherwise keeps running it in a legacy environment until you say you are ready to switch. The provider would then need to give you access to test in the new environment separately. It isn’t clear how many different legacy environments PaaS vendors would want to keep running.
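The decision described here could look roughly like the sketch below. Everything in it is hypothetical: the function name, the idea of describing an environment as a dictionary, and the choice to stay on the legacy environment when the application ships no tests at all.

```python
def choose_environment(tests, legacy_env, upgraded_env, ready_to_switch=False):
    """Pick the environment an application should run in.

    tests is a list of callables; each takes an environment description and
    returns True when the application behaves correctly in that environment.
    """
    if ready_to_switch:
        # The user explicitly asked for the upgrade, so honour that.
        return upgraded_env
    if not tests:
        # Assumption: without tests there is no evidence the upgrade is safe.
        return legacy_env
    try:
        passed = all(test(upgraded_env) for test in tests)
    except Exception:
        # A crashing test counts as a failure in the upgraded environment.
        passed = False
    return upgraded_env if passed else legacy_env


# Hypothetical environments and a hypothetical test that only accepts Python 2.6.
legacy = {"python": "2.6"}
upgraded = {"python": "2.7"}
needs_python_26 = lambda env: env["python"] == "2.6"

print(choose_environment([needs_python_26], legacy, upgraded))  # {'python': '2.6'}
```

The open question from the comment remains: each application that stays behind keeps one more legacy environment alive on the provider's side.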
PaaS isn’t a “marketing term”, it’s one of the three cloud computing service models defined by an international coalition led by the US Dept of Commerce National Institute of Standards and Technology. PaaS is way more than what you mention here.
Could you give some links, to extend on my deliberately short description?
Thanks for mentioning Jelastic as notable :)
I was surprised to see this reposted on DZone “brought to you in partnership with DZone and Red Hat. Try their Openshift cloud platform…” Did someone not read the part about the brick wall you hit?
In any case, I was wondering if you’ve tried Stackato yet. It addresses a lot of the shortcomings you mention, namely:
* SSH/SCP access to application containers (LXC)
* generally much more up-to-date supporting software (database and language runtime releases)
* some choice of language version (Java 6 or 7, Python 2.7 or 3.2, Node.js 6,8, or 10, etc.)
* customizable application containers using Heroku buildpacks (i.e. build-it-yourself frameworks)
Using a PaaS *should* be cheaper (Linux containers vs. full virtual machines) and easier (developer-centric user experience) than dealing with an IaaS, but I’d have to agree that most of the PaaS players are not quite there yet.
That’s no different, IME, from the computational platforms enterprises operate. In any large company, you have to develop and deploy to whatever software components the enterprise has decided to support. It doesn’t matter that Java 7 has been out for quite a while; if IT in a company has not yet adopted that version, you’re stuck with Java 6. The same goes for Tomcat versions, JBoss versions, PHP versions, database versions.
Only, with PaaS, you get rid of the manual administration effort for individual machines, and still get a better optimized utilization of the infrastructure you pay for.
What I agree on is that PaaS offerings at the moment each have serious limitations of one sort or another. What I don’t agree with is that other deployment models (IaaS included) are free of such limitations. And SaaS doesn’t yet have a standardized infrastructure to rely on that would let you mix and match various SaaS offerings to build your infrastructure out of services, rather than mainly applications.
Have you tried other PaaS providers (MS Azure, SAP HCP, etc.)? What are your impressions, compared to the ones you have already tried?