Some Technical Details About My Startup
Today I launched a “field trial” of my sort-of-a-startup (http://welshare.com), and here are some technical details about it.
The stack:
- Java – the language and the platform I’m most familiar with. The JVM also has a reputation for being very performant, so that’s welcome.
- Spring – all internal components of the application are wired together by the Spring framework. Again, I’m more or less a ‘Spring expert’. Very interesting portfolio projects are coming out now (spring-mobile, spring-data, etc.), so again, that’s welcome.
- spring-mvc – this is the web framework. It makes a RESTful style of programming easy.
- jQuery – obviously, some client-side richness is needed, and jQuery is the industry standard. I didn’t have any prior experience with jQuery, so the client-side code needs some polishing.
- MySQL – well, no NoSQL for now. I’ll discuss this decision in a moment.
- Hibernate – the most common choice for object-relational mapping. Why object-relational mapping in the first place? I’m used to it, and it makes development easy. It also offers some very good features out of the box, like caching, search, sharding, etc.
- Lucene – search is a must, and Lucene is, again, an industry standard. Why bare Lucene and not Solr? More on that in a minute.
- Ehcache – caching is a must. Both Hibernate and Spring have good caching options, and both support Ehcache. Ehcache is a standard caching solution for Java, much like memcached is for LAMP. (Memcached does have a Java API, and it differs from Ehcache at least in terms of replication, but that’s out of the scope of this article.)
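The cache-aside pattern that a cache like Ehcache typically serves can be sketched in plain Java. This is an illustration under assumptions, not Ehcache’s API: a `LinkedHashMap` in access order stands in for the cache, and `LruCache`, `CachedLookup` and the loader function (imagine a DAO call hitting MySQL) are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// A tiny LRU map standing in for a real cache such as Ehcache (hypothetical stand-in).
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true gives least-recently-used iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the least recently used entry once full
    }
}

// Cache-aside: look in the cache first, fall back to the (slow) loader on a miss.
class CachedLookup<K, V> {
    private final LruCache<K, V> cache;
    private final Function<K, V> loader; // e.g. a DAO call that queries MySQL
    int misses = 0;

    CachedLookup(int maxEntries, Function<K, V> loader) {
        this.cache = new LruCache<>(maxEntries);
        this.loader = loader;
    }

    V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            misses++;
            value = loader.apply(key);
            cache.put(key, value);
        }
        return value;
    }
}

class CacheDemo {
    public static void main(String[] args) {
        CachedLookup<Long, String> users = new CachedLookup<>(2, id -> "user-" + id);
        users.get(1L); // miss: loaded through the loader
        users.get(1L); // hit: served from the cache
        System.out.println("misses = " + users.misses); // prints "misses = 1"
    }
}
```

A real cache adds what this sketch lacks: expiry, invalidation on writes, and, in Ehcache’s case, optional replication.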
Now, that’s a typical web-application architecture: a presentation layer (the browser, using jQuery, sends data to and gets data from spring-mvc controllers), a service layer, which is invoked by the controllers and contains all the logic, and a DAO layer, which contains all the database access.
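A stripped-down sketch of those three layers, in plain Java without Spring. `UserDao`, `InMemoryUserDao`, `UserService` and `UserController` are hypothetical names; the in-memory DAO stands in for a Hibernate/MySQL one, and the constructor wiring in `main` is what Spring would normally do.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// DAO layer: the only place that knows how data is stored.
interface UserDao {
    Optional<String> findNameById(long id);
    void save(long id, String name);
}

// Stand-in for a Hibernate/MySQL-backed DAO (in memory, for illustration only).
class InMemoryUserDao implements UserDao {
    private final Map<Long, String> rows = new HashMap<>();
    public Optional<String> findNameById(long id) { return Optional.ofNullable(rows.get(id)); }
    public void save(long id, String name) { rows.put(id, name); }
}

// Service layer: business logic, depends only on the DAO interface.
class UserService {
    private final UserDao dao;
    UserService(UserDao dao) { this.dao = dao; }

    void register(long id, String name) {
        if (name == null || name.isBlank()) {
            throw new IllegalArgumentException("name must not be empty");
        }
        dao.save(id, name.trim());
    }

    String displayName(long id) {
        return dao.findNameById(id).orElse("unknown user");
    }
}

// Presentation layer: in the real app, a spring-mvc controller answering jQuery's ajax calls.
class UserController {
    private final UserService service;
    UserController(UserService service) { this.service = service; }

    String getUser(long id) {
        // naive JSON building, no escaping; a real controller would use a JSON library
        return "{\"name\":\"" + service.displayName(id) + "\"}";
    }
}

class LayersDemo {
    public static void main(String[] args) {
        UserService service = new UserService(new InMemoryUserDao()); // the wiring Spring would do
        service.register(1L, " alice ");
        System.out.println(new UserController(service).getUser(1L)); // prints {"name":"alice"}
    }
}
```

The service never sees a `Map`, a SQL statement or a Hibernate session; that separation is what keeps later storage changes confined to the DAO layer.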
And yes, it is trivial and boring. If I want this application to become “big”, I need something more... scalable... and fashionable, like NoSQL, messaging, or a distributed file system. Well, no.
I’ve already expressed my views about NoSQL for startups. In short, it was way easier and faster to write the thing with MySQL than to cope with NoSQL. I tried Cassandra (and spent some more time researching why I should try it), but it wasn’t a good match.
Messaging. Twitter uses message queues to balance the load and to hold messages until they can be handled. I started the project with JMS (ActiveMQ) and coded to that paradigm for a short while. Then I realized that I don’t need this complication yet, so I dropped it.
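For readers unfamiliar with the idea, here is roughly what queueing writes looks like, as a minimal in-process sketch. It uses a single-thread `ExecutorService` (which works off an unbounded FIFO queue) rather than JMS/ActiveMQ, so it has no persistence or distribution; `QueuedWriter` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// The service submits work and returns immediately; one worker thread performs the
// DAO writes in FIFO order. In-process only: unlike ActiveMQ/JMS, pending messages
// are lost if the JVM dies.
class QueuedWriter<T> {
    // newSingleThreadExecutor uses one worker thread operating off an unbounded queue
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Consumer<T> daoWrite;

    QueuedWriter(Consumer<T> daoWrite) {
        this.daoWrite = daoWrite;
    }

    // Called from the service layer instead of calling the DAO directly.
    void submit(T message) {
        worker.execute(() -> daoWrite.accept(message));
    }

    // Stops accepting new messages and waits for the queued ones to be written.
    boolean drain(long timeoutSeconds) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

class QueueDemo {
    public static void main(String[] args) {
        // Thread-safe list standing in for the DAO's table
        List<String> table = Collections.synchronizedList(new ArrayList<>());
        QueuedWriter<String> writer = new QueuedWriter<>(table::add);
        writer.submit("status 1");
        writer.submit("status 2");
        writer.drain(5);
        System.out.println(table); // prints [status 1, status 2]
    }
}
```

Because the service only calls `submit`, replacing this with a real broker later would not change the service code, only the class behind it.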
I mentioned that search is a must. The typical choice for a search engine is Solr, which is built on Lucene. But I chose Hibernate Search with Lucene. Why? Because Hibernate Search synchronizes the Lucene index with the database automatically. I don’t have to write any code to communicate with Solr, let alone deploy and manage Solr.
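The idea behind that automatic synchronization (re-index an entity right after it is written) can be sketched with a post-write hook. This is not the Hibernate Search or Lucene API: `PostWriteListener`, `MessageDao` and the toy inverted index in `InMemoryIndex` are hypothetical stand-ins.

```java
import java.util.*;

// Hypothetical hook: called after the DAO has written an entity.
interface PostWriteListener {
    void afterSave(long id, String text);
}

// Toy inverted index standing in for Lucene (lowercased, whitespace-tokenized).
class InMemoryIndex implements PostWriteListener {
    private final Map<String, Set<Long>> postings = new HashMap<>();

    @Override
    public void afterSave(long id, String text) {
        // A real implementation would also drop terms from the entity's previous version.
        for (String term : text.toLowerCase().split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
    }

    Set<Long> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }
}

// DAO that notifies listeners after every write, so callers never touch the index.
class MessageDao {
    private final Map<Long, String> rows = new HashMap<>();
    private final List<PostWriteListener> listeners = new ArrayList<>();

    void addListener(PostWriteListener listener) { listeners.add(listener); }

    void save(long id, String text) {
        rows.put(id, text);                            // the database write
        for (PostWriteListener listener : listeners) { // then the post-processing step
            listener.afterSave(id, text);
        }
    }
}

class IndexDemo {
    public static void main(String[] args) {
        InMemoryIndex index = new InMemoryIndex();
        MessageDao dao = new MessageDao();
        dao.addListener(index);
        dao.save(1L, "Launching welshare today");
        dao.save(2L, "Lucene is fast");
        System.out.println(index.search("lucene")); // prints [2]
    }
}
```

With this shape, moving to Solr would mean registering a different listener that sends each document to Solr; `MessageDao` and its callers would not change.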
I guess the picture is clear by now. I have created a minimum viable product in terms of technology (and a not-so-minimal one in terms of features). If I had to use and deploy NoSQL, Solr, a message queue and whatnot, I wouldn’t have been able to launch.
Now, I agree, this sounds like “that guy has made some crappy piece of software that will never be capable of growing”. But the simple architecture, combined with clean code, is rather extensible. What I mean is:
- When I choose to (or have to) switch to a NoSQL store for something, the only code I have to change is a few DAOs. Everything else stays the same. I’ve been strictly keeping the layer boundaries, so the service layer has absolutely no knowledge of which storage mechanism is used. In fact, I plan to move the URL shortening to a key-value store really soon, and the user relationships (followings and friendships) to a graph database in the near future. With the separated layers and with the help of the spring-data project, I see this as “not a big deal”. Certainly, it will be time-consuming, at least because of the learning curve.
- A message queue can be incorporated into the architecture really easily. I’ll just have to plug it in between the service and the DAO layers. Not much hassle, especially given the limited scope of the MQ. Of course, it will take time to test and measure, but it is not a complex task.
- Solr – if Lucene with Hibernate Search turns out to have performance and scalability problems, all that will be required is to write a post-processor (run after the DAOs finish the database work) that communicates with Solr. Currently this post-processor is provided by Hibernate Search, and it does not communicate with Solr. Not a small task, but one requiring minimal changes to the existing code.
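A sketch of the storage swap described in the first point, using the planned URL-shortening move as the example. `ShortUrlDao`, `KeyValueShortUrlDao` and `UrlShortenerService` are hypothetical; a `ConcurrentHashMap` stands in for a real key-value store client, and the counter-plus-base-62 key scheme is an assumption, not necessarily how welshare generates short URLs.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Storage contract the service depends on; only implementations change when the store changes.
interface ShortUrlDao {
    void put(String key, String longUrl);
    Optional<String> get(String key);
}

// Stand-in for a key-value store client (e.g. one provided through spring-data).
class KeyValueShortUrlDao implements ShortUrlDao {
    private final ConcurrentMap<String, String> store = new ConcurrentHashMap<>();
    public void put(String key, String longUrl) { store.put(key, longUrl); }
    public Optional<String> get(String key) { return Optional.ofNullable(store.get(key)); }
}

// Service layer: unaware of whether the DAO talks to MySQL or to a key-value store.
class UrlShortenerService {
    private static final String ALPHABET =
            "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private final ShortUrlDao dao;
    private final AtomicLong counter = new AtomicLong(); // hypothetical id source

    UrlShortenerService(ShortUrlDao dao) { this.dao = dao; }

    String shorten(String longUrl) {
        String key = toBase62(counter.incrementAndGet());
        dao.put(key, longUrl);
        return key;
    }

    Optional<String> resolve(String key) { return dao.get(key); }

    // Encodes a non-negative number in base 62 (most significant digit first).
    static String toBase62(long n) {
        StringBuilder sb = new StringBuilder();
        do {
            sb.append(ALPHABET.charAt((int) (n % 62)));
            n /= 62;
        } while (n > 0);
        return sb.reverse().toString();
    }
}

class ShortenerDemo {
    public static void main(String[] args) {
        UrlShortenerService service = new UrlShortenerService(new KeyValueShortUrlDao());
        String key = service.shorten("http://welshare.com");
        System.out.println(key + " -> " + service.resolve(key).orElse("?")); // prints "1 -> http://welshare.com"
    }
}
```

Switching from a MySQL-backed DAO to this one is a one-line change in the wiring; `UrlShortenerService` is untouched.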
You get the picture: I’ve postponed some time-consuming tasks, but I have taken measures to make them really easy to do later. It’s all about avoiding over-architecture and over-design, which has always brought trouble, so I strove not to overdesign things.
So what will happen if I get millions of users overnight? I won’t 🙂 But I have written the code so that it is prepared to sustain eventual growth.
(With one exception, which is really a horrible mistake: I don’t have enough tests. A few unit tests and a few Selenium tests just aren’t enough. I hope that doesn’t come back to bite me.)
Finally, a few operational details: I’m currently using Amazon EC2, but without being coupled to any other Amazon service. I’m storing files on S3, but with a single configuration change I can switch to the file system (and I do so for development).
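A sketch of that one-configuration-switch storage setup. `FileStorage`, `LocalFileStorage`, `StorageFactory` and the `storage.type`/`storage.dir` property names are hypothetical, and the S3 branch is left unimplemented because it would need the AWS SDK.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// What the rest of the application sees: store and load bytes under a key.
interface FileStorage {
    void store(String key, byte[] content);
    byte[] load(String key);
}

// Development implementation: plain files under a base directory.
class LocalFileStorage implements FileStorage {
    private final Path baseDir;

    LocalFileStorage(Path baseDir) { this.baseDir = baseDir; }

    public void store(String key, byte[] content) {
        try {
            Path target = baseDir.resolve(key);
            Files.createDirectories(target.getParent()); // e.g. creates baseDir/avatars
            Files.write(target, content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public byte[] load(String key) {
        try {
            return Files.readAllBytes(baseDir.resolve(key));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Picks the implementation from a single configuration property (hypothetical names).
class StorageFactory {
    static FileStorage create(Properties config) {
        String type = config.getProperty("storage.type", "filesystem");
        switch (type) {
            case "filesystem":
                return new LocalFileStorage(Path.of(config.getProperty("storage.dir", "uploads")));
            case "s3":
                // An S3-backed FileStorage would wrap the AWS SDK here; omitted in this sketch.
                throw new UnsupportedOperationException("S3 storage is not included in this sketch");
            default:
                throw new IllegalArgumentException("Unknown storage.type: " + type);
        }
    }
}
```

Callers only ever hold a `FileStorage`, so production and development differ in one property value rather than in code.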
Builds are done via Maven and Hudson. I don’t really use Hudson for deploying, at least for now. Deploying without downtime is a wonderful feature of Tomcat 7: you can have two versions of the same application running simultaneously. The older version stays around until all active sessions to it expire.
Monitoring is done via JMX and Amazon’s CPU monitoring.
In conclusion, I hope I’m right with my decisions. Time and server monitoring will tell.
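A sketch of exposing an application metric over JMX, using only the JDK’s `javax.management` API. It implements `DynamicMBean` instead of the more usual Standard MBean interface pair, because a Standard MBean interface must be public (and so live in its own source file); the `AppStats` class, the `RequestsServed` attribute and the `com.example.welshare:type=AppStats` object name are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.Attribute;
import javax.management.AttributeList;
import javax.management.AttributeNotFoundException;
import javax.management.DynamicMBean;
import javax.management.JMException;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanInfo;
import javax.management.ObjectName;
import javax.management.ReflectionException;

// Exposes a read-only "RequestsServed" counter to JMX clients such as JConsole.
class AppStats implements DynamicMBean {
    private final AtomicLong requestsServed = new AtomicLong();

    void requestServed() { requestsServed.incrementAndGet(); } // e.g. called from a servlet filter

    @Override
    public Object getAttribute(String name) throws AttributeNotFoundException {
        if ("RequestsServed".equals(name)) {
            return requestsServed.get();
        }
        throw new AttributeNotFoundException(name);
    }

    @Override
    public void setAttribute(Attribute attribute) throws AttributeNotFoundException {
        throw new AttributeNotFoundException("read-only attribute: " + attribute.getName());
    }

    @Override
    public AttributeList getAttributes(String[] names) {
        AttributeList result = new AttributeList();
        for (String name : names) {
            try {
                result.add(new Attribute(name, getAttribute(name)));
            } catch (AttributeNotFoundException ignored) {
                // unknown attributes are simply left out of the result
            }
        }
        return result;
    }

    @Override
    public AttributeList setAttributes(AttributeList attributes) {
        return new AttributeList(); // nothing is writable
    }

    @Override
    public Object invoke(String action, Object[] params, String[] signature) throws ReflectionException {
        throw new ReflectionException(new NoSuchMethodException(action));
    }

    @Override
    public MBeanInfo getMBeanInfo() {
        MBeanAttributeInfo counter = new MBeanAttributeInfo(
                "RequestsServed", "long", "HTTP requests served", true, false, false);
        return new MBeanInfo(AppStats.class.getName(), "Application statistics",
                new MBeanAttributeInfo[] {counter}, null, null, null);
    }

    // Registers the bean with the JVM's platform MBean server under the given name.
    static ObjectName register(AppStats stats, String name) {
        try {
            ObjectName objectName = new ObjectName(name);
            ManagementFactory.getPlatformMBeanServer().registerMBean(stats, objectName);
            return objectName;
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Once registered, the counter appears under the MBeans tab of JConsole when it is connected to the application’s JVM.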
If you need more scalability and throughput for your DB, feel free to try Xeround Cloud DB, which is 100% compatible with MySQL and has many great features like high availability, auto scaling and more…
Hi, thanks for sharing ideas for your start-up. I would try to impress users much more on the front page; it seems like there is nothing there. Describe your service in the first line of your front page, in big letters, otherwise people will just leave. I understand you just started the project, but still, the better users understand what service you are offering, the more likely they are to pay you back somehow (as users or as QA)…
Cheers, good luck with it.
P.S. I don’t think lines like “easier than Facebook or Twitter” are good, because you use those services on your site, while Facebook users do much more on Facebook.
Try to explain why users should join your service at all, and what real difference it will make to create yet another account on a website that cannot beat Twitter or Facebook in fair competition.
@EugeneK – thank you very much for your feedback. I will try to work in that direction.
Page loads are pretty fast (63ms for the login page). One thing you may consider is loading jQuery from a CDN instead of your /static folder. Wishing you success! Do you know if anyone uses Lucene with Microsoft Entity Framework?
Hi, good luck with your project. I’ll try to participate as much as I can.
And thanks for sharing your thoughts. I have extracted some things from this article to put into my study schedule.
Great, I really like what you shared here, and I will try it in my product. I already tried welshare, and the pages load fast enough. If you don’t mind, would you share which view engine you use (JSP/FreeMarker/Velocity/Thymeleaf)? Good luck.
Plain old JSP 🙂 with lots of jQuery and Ajax, and some Velocity for the email templates.
Hi, we are planning to use Ehcache as the caching solution and Lucene to perform indexing and full-text searches on the cache. The data changes very infrequently (almost never). Can you share your experience with Ehcache and Lucene in your own product? Thanks.
There have been a lot of changes to these products since I last used them, so you’d better check the current documentation and ask the community.