Monday, April 16, 2012

Inefficient Session Implementation on Google App Engine / Java & GAE Criticism

Almost two months ago I released my first Google App Engine (GAE) application, www.krisentalk.de.
It is generally best practice to profile an application and check for bottlenecks before going to production, so I did.

GAE offers a nice tool called “Appstats” which profiles current requests.
Studying the profiling results, I had an unpleasant surprise. From time to time a single request contained around 40 RPCs (remote procedure calls, i.e. calls to remote GAE services). That was strange, because I never implemented more than 4 RPCs per request. The following screenshot shows this:



What was happening?

It is noticeable that there are many pairs of memcache/datastore gets. It turns out that each pair is an attempt to read the session state. But why does the same page sometimes cause just one session read and sometimes around 40 (as in the screenshot above)?

A little more digging showed what is going on: whenever a session is invalid, there are many unsuccessful attempts to read the session state. In other words, the browser sends a JSESSIONID cookie, but the session on GAE has already been invalidated (e.g. it timed out). My application uses Spring MVC and Freemarker. These frameworks (like many others) try to read the session more than once per request, and each failed attempt triggers another memcache/datastore pair.

First Conclusion

Google App Engine has a grossly inefficient implementation for reading (invalidated) session state. Whenever a session is invalid, RPCs are issued again and again within the same request just to confirm that there is no valid session. There shouldn’t be more than one attempt per request to read the session via remote calls.
I filed a ticket with GAE for this: #7355
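To illustrate the "at most one remote read per request" idea, here is a minimal sketch in plain Java. It is not GAE's actual session code: the class name `RequestScopedSessionLookup` and the `backend` function (standing in for the memcache/datastore pair) are assumptions made for this example, and it uses Java 8 features for brevity. One instance would live for the duration of a single request and remember every lookup result, including misses.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch: remembers session lookups (hits AND misses) for one request.
class RequestScopedSessionLookup {
    // Stand-in for the remote memcache/datastore read; assumed, not a GAE API.
    private final Function<String, Optional<Map<String, Object>>> backend;
    private final Map<String, Optional<Map<String, Object>>> seen = new HashMap<>();

    RequestScopedSessionLookup(Function<String, Optional<Map<String, Object>>> backend) {
        this.backend = backend;
    }

    Optional<Map<String, Object>> find(String sessionId) {
        // Optional.empty() is a non-null value, so computeIfAbsent stores it too:
        // a miss is cached and the backend is not asked again for the same id.
        return seen.computeIfAbsent(sessionId, backend);
    }
}
```

With such a wrapper, a framework could ask for the session 40 times in one request and a stale JSESSIONID would still cost only a single remote read.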

Workaround

As a workaround I implemented a request interceptor that checks whether a user sends a JSESSIONID cookie that refers to an invalidated session. If so, the user’s JSESSIONID cookie is removed. Hence, follow-up requests from this user no longer lead to the unnecessary RPCs. The source in a Spring HandlerInterceptorAdapter looks like this:

 
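// Inside a subclass of Spring's HandlerInterceptorAdapter.
// MyWebUtils is a helper class that is not shown here.
@Override
public void postHandle(HttpServletRequest request, HttpServletResponse response, Object handler, ModelAndView modelAndView) throws Exception {
  Cookie cookie = WebUtils.getCookie(request, WebUtils.COOKIE_SESSION_NAME);
  // The browser sent a session cookie, but there is no valid session on the server
  if (cookie != null && request.getSession(false) == null) {
    // Remove the stale cookie so follow-up requests skip the session lookups
    MyWebUtils.deleteCookie("JSESSIONID", request, response);
  }
}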
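
The deleteCookie helper is not shown in this post. At the HTTP level, removing a cookie means sending a cookie with the same name and path whose lifetime has already ended. The following plain-Java sketch builds such a Set-Cookie header value; the class `CookieHeaders` and its method `expire` are hypothetical names for illustration only. In servlet code the same effect is usually achieved with a `Cookie` whose `setMaxAge(0)` and `setPath(...)` are set before calling `response.addCookie(...)`.

```java
// Hypothetical helper for illustration; not the MyWebUtils class used above.
final class CookieHeaders {
    private CookieHeaders() {}

    // Builds a Set-Cookie header value that tells the browser to discard the cookie.
    // The path must match the path the cookie was originally set with.
    static String expire(String name, String path) {
        return name + "=; Max-Age=0; Expires=Thu, 01 Jan 1970 00:00:00 GMT; Path=" + path;
    }
}
```

For example, `CookieHeaders.expire("JSESSIONID", "/")` yields a header value that clears a session cookie that was set for the root path.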


General GAE criticism

One of my biggest criticisms (and, I think, that of most other GAE users) is that Google makes its customers pay when there is an inefficiency or failure on the GAE platform. You can imagine what GAE’s bad session implementation shown above would cost a customer with a high-traffic application! On the GAE user group there was a nice joke that sums this up:

thecheatah: Why are my requests taking > 1 second all of a sudden?
Jeff: It's the end of the financial quarter and GAE needed to make revenue targets ;-)

The second (even worse) complaint is Google’s very poor support for and communication with GAE customers. Google seems to have a policy of “ignore customer complaints”. If you have problems, such as rising costs or an application that fails because of GAE’s reliability problems, you are lost!
A quote from somebody that summarizes the feeling of affected GAE users:

“Some attention from google after a long time is nice to see. but still not clear when this issue get resolved.
More importantly concerns regarding credibility still in the air. I think many including us are planning to get out of GAE. Not because of the issue but the way such a critical issue is being handeled.”


See the following issue for the quote above and for a serious reliability problem that Google did not respond to for more than a month:
http://code.google.com/p/googleappengine/issues/detail?id=7133

Compared to other platforms like Grails (from SpringSource), these are two totally separate worlds regarding “community treatment”.

Monday, March 5, 2012

Google App Engine & Two new Babies

My gosh! A lot has been going on over the last few months. First, our lovely new son arrived:



Then I finally released a new website, www.krisentalk.de. It’s all about the financial crisis in Europe and the rest of the world ("Krise" is German for "crisis"). The content is community-driven.



I developed krisentalk.de on Google App Engine (GAE) / Java. After a short evaluation of cloud platforms, I decided to go with GAE.
All in all it was a fun and expedient experience. Nevertheless, there are pitfalls, and Google’s cloud platform still feels “green” around the edges. I will blog about some of the pitfalls later on.

I chose the following frameworks, which work well for me.

  • GAE Java Runtime
  • Spring 3.0.x (particularly Spring MVC)
  • Freemarker
  • Sitemesh
  • GAE’s Java Datastore API (wrapped with the good old Data Access Objects [DAO pattern])
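
The DAO wrapping mentioned in the last list item can be sketched as follows. This is only an illustration of the pattern, not the code of krisentalk.de: the `Post` entity, the `PostDao` interface and the in-memory implementation are made-up names, and a GAE-backed implementation would call the Datastore API where the sketch uses a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical entity used only for this example.
final class Post {
    final long id;
    final String title;

    Post(long id, String title) {
        this.id = id;
        this.title = title;
    }
}

// The rest of the application depends only on this interface.
interface PostDao {
    void save(Post post);

    Optional<Post> findById(long id);
}

// In-memory stand-in; a Datastore-backed class would implement the same interface.
final class InMemoryPostDao implements PostDao {
    private final Map<Long, Post> store = new LinkedHashMap<>();

    @Override
    public void save(Post post) {
        store.put(post.id, post);
    }

    @Override
    public Optional<Post> findById(long id) {
        return Optional.ofNullable(store.get(id));
    }
}
```

Keeping the storage calls behind an interface like this means controllers and templates never touch the GAE-specific API directly, which also makes them testable without a running datastore.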


Grails wasn’t an option because it doesn’t work very well on GAE.

My positive résumé on Google App Engine:

  • Setup, deployment and scaling are a no-brainer
  • The GAE-specific API is clean and easy to use
  • The local development environment based on Eclipse works fine
  • Administration tools and monitoring capabilities are available out of the box
  • Common services like emailing, asynchronous queuing, mem-caching, authentication etc. are available and easy to use
  • Scaling works well (as far as my JMeter tests predict)


My negative résumé on Google App Engine:

  • Some restrictions on the usage of Java’s SDK API (and thus some frameworks do not work well with GAE)
  • Feedback and support from Google and the community could be better compared to others (e.g. Grails)
  • Critical bugs sometimes take too long to be acknowledged and fixed by Google (e.g. the website broke because of a wrong CSS MIME type)
  • Small downtimes of 2 to 5 minutes per 24 hours during the first days (running on the HR Datastore)
  • Whenever Google’s resource/scaling management misbehaves, the customer has to pay. Nice business model: introduce a misbehavior on the customer’s platform and revenue goes up. ;) (e.g. scheduler behavior and slow requests). I will blog about another example in the next few days.