Play Framework (5) – performance optimization (1)

It is a long road. Finishing a feature is only the start; the rest of the work is making it better and better, and that means caring more about performance. Here I write down several things I have run into.

  1. Deployment method

    1. Don’t use start or run to deploy to production. It is totally wrong. You need to use:
      activator clean; activator dist

      to package your project, and then unzip the archive, which is under

      unzip #your_project_path/target/universal/  

      and then cd into the unzipped folder. Under bin/ you will find a script that starts the application. My usual JVM parameters look like this. (Note that my project uses Nginx as the web server and New Relic to collect application performance data, so you need to adjust the JVM parameters for your own project.)

      nohup bin/#YourApp -Dhttp.port=9081 -Dhttp.address=#MyPrivateServerIP -J-Xms4096m -J-Xmx4096m -J-Xmn2048m -J-javaagent:../../../newrelic/newrelic.jar &
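
To confirm that the -J flags above actually reached the JVM, something like the following could be called once at startup and logged. This is only a sketch: the object name JvmSettingsSketch and its methods are made up for illustration, and it uses nothing but the standard JVM management API.

```scala
import java.lang.management.ManagementFactory

// Hypothetical helper, not part of Play or of the original project.
object JvmSettingsSketch {
  // Maximum heap the JVM will attempt to use, in megabytes.
  def maxHeapMb: Long = Runtime.getRuntime.maxMemory / (1024L * 1024L)

  // Raw JVM arguments as the JVM received them (e.g. -Xms, -Xmx, -Xmn, -javaagent).
  def jvmArgs: Seq[String] = {
    val args = ManagementFactory.getRuntimeMXBean.getInputArguments
    (0 until args.size).map(i => args.get(i))
  }

  // One-line summary suitable for a startup log message.
  def report: String = s"max heap: $maxHeapMb MB, args: ${jvmArgs.mkString(" ")}"
}
```

The reported maximum heap can be slightly lower than the -Xmx value, so treat it as a sanity check rather than an exact figure.
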
  2. Third-party Tool

    1. Check CPU and memory usage with New Relic (I have another post that explains how to use New Relic for free)
  3. Command Tool

    When you really hit a problem, New Relic only tells you that your app is not behaving normally. That is not enough to solve it, so what remains is to dig out the real cause. Here are some useful commands that will help you a lot.

    1. htop     This gives more detailed information than New Relic. You can perform some actions in your application and monitor it with htop to see whether those actions change its performance.
    2. jcmd     This lists all JVM applications together with their PIDs. jps -l does the same.
    3. jcmd <PID> help  This lists the commands you can use to get the information you want.
      • The following commands are available:
        1. VM.native_memory
        2. VM.commercial_features
        3. GC.rotate_log
        4. ManagementAgent.stop
        5. ManagementAgent.start_local
        6. ManagementAgent.start
        7. Thread.print
        8. GC.class_histogram
        9. GC.heap_dump
        10. GC.run_finalization
        11. GC.run
        12. VM.uptime
        13. VM.flags
        14. VM.system_properties
        15. VM.command_line
        16. VM.version
        17. GC.class_stats    : print the name of the ClassLoader
    4. jcmd <PID> ###     Here ### is one of the commands from the list above. I care about CPU usage, so I normally use Thread.print to check what the threads are doing.
    5. jstack <PID>     This prints the stack traces of the running threads directly. You can also write the output to a local file with jstack <PID> > #YourFile
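
The same kind of per-thread information can also be read from inside the application. The sketch below is not a replacement for jcmd or jstack; it assumes the JVM supports per-thread CPU timing, and the name BusyThreadsSketch is hypothetical.

```scala
import java.lang.management.ManagementFactory

// Hypothetical helper that ranks live threads by the CPU time they have consumed.
object BusyThreadsSketch {
  // Returns (thread name, CPU time in milliseconds) for the `n` busiest threads.
  // Returns an empty list if the JVM cannot measure per-thread CPU time.
  def top(n: Int): Seq[(String, Long)] = {
    val mx = ManagementFactory.getThreadMXBean
    if (!mx.isThreadCpuTimeSupported || !mx.isThreadCpuTimeEnabled) Seq.empty
    else {
      mx.getAllThreadIds.toSeq.flatMap { id =>
        val info = mx.getThreadInfo(id)   // null if the thread has already ended
        val cpuNanos = mx.getThreadCpuTime(id) // -1 if the thread is not alive
        if (info != null && cpuNanos >= 0)
          Some((info.getThreadName, cpuNanos / 1000000L))
        else None
      }.sortBy(t => -t._2).take(n)
    }
  }
}
```

Calling BusyThreadsSketch.top(5) from a debug endpoint or a log statement gives a rough view of which thread names to look for in the Thread.print output.
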
  4. Problems which I met

    Once you have found the root cause, the last step is to fix it.

    1. Today my problem was that I use Akka to run some scheduled tasks, but each time the scheduler finished a task it did not shut down its ActorSystem. The more tasks it ran, the more threads it opened. Eventually CPU usage reached 100% and the application stopped responding. (All of this was diagnosed with the commands above.) I used those commands to find which actors were still running, and I now shut them down when they finish. My project’s CPU usage is now only about 1.0%.
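
The leak pattern can be illustrated without Akka. In the sketch below a plain java.util.concurrent thread pool stands in for the ActorSystem, which is an assumption made only to keep the example self-contained; in Akka the corresponding step would be shutting down or terminating the ActorSystem, depending on the version. The names PoolLeakSketch and runTask are hypothetical.

```scala
import java.util.concurrent.{ExecutorService, Executors, TimeUnit}

// Stand-in for "create a system per task": a fresh thread pool per call.
object PoolLeakSketch {
  // Runs one trivial task on a new pool. The pool is shut down only when
  // `cleanUp` is true; otherwise its threads stay alive after the task ends.
  def runTask(cleanUp: Boolean): ExecutorService = {
    val pool = Executors.newFixedThreadPool(2)
    pool.submit(new Runnable { def run(): Unit = () }).get() // wait for the task
    if (cleanUp) {
      pool.shutdown()
      pool.awaitTermination(5, TimeUnit.SECONDS)
    }
    pool
  }
}
```

Every call with cleanUp = false leaves idle threads behind, so a scheduler that does this repeatedly accumulates threads in the way described above.
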
    2. Everything seemed perfect, and I thought my code would not hit performance problems again. But nothing is ever perfect, so I have come back to this post to add more problems I met. This one appeared when I migrated the application from an old single-CPU server to a new 8-core server. On the old server the application used only about 0.5% CPU day to day. On the new one it used almost 400% (with 8 cores, that means half of the machine's CPU). Using the method above, I found that the root cause was a HashMap.
      The easy fix is to replace the HashMap with a ConcurrentHashMap. After this change, CPU usage went back to normal, 0.5%.

      var schedulerIDs = new ConcurrentHashMap[String, Cancellable]().asScala

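      ConcurrentHashMap solves the problem of a thread stuck at 100% CPU inside HashMap. HashMap is not thread-safe: when several threads modify it at the same time, its internal structure can be corrupted and a later lookup can loop forever. For details, see: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6423457
      In Java 7 and earlier, ConcurrentHashMap is divided into segments, 16 by default (the concurrency level), so up to 16 threads can write to it at the same time without blocking each other. (Java 8 replaced the segments with finer-grained locking.) This is the big advantage of ConcurrentHashMap over Hashtable, which locks the whole table for every operation.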
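
A minimal sketch of several threads writing to one ConcurrentHashMap at the same time is shown below. It uses the Java class directly and leaves out the asScala wrapper so that it does not depend on a particular Scala version; the names ConcurrentMapSketch and fill are hypothetical.

```scala
import java.util.concurrent.{ConcurrentHashMap, CountDownLatch}

object ConcurrentMapSketch {
  // Starts `threads` writers that each insert `perThread` distinct keys,
  // waits for all of them, and returns the shared map.
  def fill(threads: Int, perThread: Int): ConcurrentHashMap[String, Int] = {
    val map = new ConcurrentHashMap[String, Int]()
    val start = new CountDownLatch(1) // releases all writers at once
    val workers = (0 until threads).map { t =>
      new Thread(new Runnable {
        def run(): Unit = {
          start.await()
          var i = 0
          while (i < perThread) {
            map.put(s"$t-$i", i) // keys are unique per thread and index
            i += 1
          }
        }
      })
    }
    workers.foreach(_.start())
    start.countDown()
    workers.foreach(_.join())
    map
  }
}
```

Because every key is distinct and ConcurrentHashMap is safe for concurrent writes, the final size equals threads * perThread. The same experiment with a plain HashMap gives no such guarantee.
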

    3. My performance issues continued, and this time the root cause was harder to find. I tried to read more to understand things better, but it was really hard. This time memory kept growing, triggered by some actions that were hard to pin down. To be honest, I found a hint in the Thread.print output, which mentioned “ForkJoinPool” many times. So I went back to my code to check which threads are created by a ForkJoinPool and how a ForkJoinPool manages memory. In fact, my old code used the execution context from the Scala standard library, and that caused the growing memory. When I changed it to the one in play.api.libs, things got better. Here is the explanation I read:
    4. In Play 2.3.x and prior, play.core.Execution.Implicits.internalContext is a ForkJoinPool with fixed constraints on size, used internally by Play. You should never use it for your application code. From the docs:
      Play Internal Thread Pool - This is used internally by Play. No application code should ever be executed by a thread in this thread pool, and no blocking should ever be done in this thread pool. Its size can be configured by setting internal-threadpool-size in application.conf, and it defaults to the number of available processors.
      Instead, you would use play.api.libs.concurrent.Execution.Implicits.defaultContext, which uses an ActorSystem.
      In 2.4.x, they both use the same ActorSystem. This means that Akka will distribute work among its own pool of threads, but in a way that is invisible to you (other than configuration). Several Akka actors can share the same thread.
      scala.concurrent.ExecutionContext.Implicits.global is an ExecutionContext defined in the Scala standard library. It is a special ForkJoinPool that can use the blocking method to handle potentially blocking code by spawning new threads in the pool. You really shouldn't use this in a Play application, as Play will have no control over it. It also has the potential to spawn a lot of threads and use a ton of memory if you're not careful.
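
The general idea of passing an explicit, bounded ExecutionContext instead of importing the global one can be sketched with the standard library alone. This is not Play's defaultContext; a fixed-size thread pool is used here purely as a stand-in so the example runs without Play, and the names BoundedContextSketch and distinctThreads are hypothetical.

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BoundedContextSketch {
  // Runs `tasks` futures on a pool capped at `poolSize` threads and returns
  // how many distinct threads actually executed them.
  def distinctThreads(poolSize: Int, tasks: Int): Int = {
    val pool = Executors.newFixedThreadPool(poolSize)
    // Explicit context: futures below run here, not on ExecutionContext.global.
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    val seen = ConcurrentHashMap.newKeySet[String]()
    try {
      val work = Future.sequence((1 to tasks).map { _ =>
        Future { seen.add(Thread.currentThread().getName); () }
      })
      Await.result(work, 10.seconds)
      seen.size
    } finally pool.shutdown()
  }
}
```

However many futures are submitted, the number of threads that run them never exceeds the pool size, which is the kind of control the quoted explanation says you give up with the global context.
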

To be honest, there is no general method for optimizing performance. The more experience you have, the fewer errors you make. When you hit a performance issue, don’t be afraid; focus on finding the root cause step by step. Only when you know the cause can you find the right fix. Until then, any guess or borrowed recipe is useless.
