Monday, April 16, 2012

Inefficient Session Implementation on Google App Engine / Java & GAE Critics

Almost two months ago I released my first Google App Engine (GAE) application called www.krisentalk.de.
In general it is best practice to do some profiling and check for bottlenecks before going to production. So did I.

GAE offers a nice tool called “Appstats” which profiles current requests.
Studying the profiling results I had an unpleasant surprise. From time to time one request contained around 40 RPC (RPC=Remote Procedure Calls=Remote GAE Services). That was strange, because I never implemented more than 4 RPC per request. The following screenshot shows this:



What was happening?

It is noticeable that there are many pairs of memcache/datastore-Gets. It turns out that each pair is an attempt to read the session-state. But why there are for the same page sometimes just one session-read and sometime around 40 (like in the screen above)?

A little more digging has shown what’s going on: Whenever there is an invalid session there are many unsuccessful readings for session-state. In other words: The Browser sends a JSESSION cookie, but the session on the GAE is already invalidated (e.g. session got a timeout). My application is using Spring MVC and Freemarker. These frameworks (as many others) try to read the session more than once per request.

First Conclusion

Google App Engine has a grossly inefficient implementation for reading (invalided) session state. Whenever a session is invalid, multiple times RPCs are used just for checking again and again that there is no valid session. There shouldn’t be more than one attempt to read the session via Remote-Calls per request.
I issued a ticket on GAE for this: #7355

Workaround

As a workaround I implemented a request interceptor that checks if a user sends a JSESSION cookie that refers to an invalidated session. If this is the case the user’s JSESSION cookie is removed. Hence, the follow-up requests of this user don’t lead to the unnecessary RP-Calls anymore. The source in a Spring’s HandlerInterceptorAdapte looks like this:

 
void post(HttpServletRequest request, HttpServletResponse response, Object handler, ModelAndView modelAndView) throws Exception {
  Cookie cookie = WebUtils.getCookie(request, WebUtils.COOKIE_SESSION_NAME);
  if ( cookie != null  && request.getSession(false) == null) {                        
    MyWebUtils.deleteCookie("JSESSIONID", request, response);
  }
}


General GAE critics

One of my (and I think most other GAE users) biggest critics is that Google let their customers pay if they have an inefficiency or failure on their GAE platform. You can imagine what GAE’s bad session-implementation shown above would cost a customer with a high traffic application! On the GAE-user-group there was a nice joke that summarizes this to the point:

thecheatah: Why are my requests taking > 1 second all of a sudden?
Jeff: It's the end of the financial quarter and GAE needed to make revenue targets ;-)

The second (even worse) complain is the very bad user support/communication of Google regarding GAE’s customer. Google seems to have a politics of “ignore customer complains”. If you have problems like your costs go up or your application fails because of GAE’s reliability problems you are lost!
A quote of somebody that summarizes the feeling of affected GAE users:

“Some attention from google after a long time is nice to see. but still not clear when this issue get resolved.
More importantly concerns regarding credibility still in the air. I think many including us are planning to get out of GAE. Not because of the issue but the way such a critical issue is being handeled.”


Check the following issue for the quote above and for a serious reliability issue that Google did not response for more than one month.
http://code.google.com/p/googleappengine/issues/detail?id=7133

Comparing this to other platforms like Grails (from Spring Source) there are two totally separate worlds regarding “community-treatment”.