Category Archives: Visualization

Scala – Performance Optimization

In earlier posts we covered a lot of ground on performance optimization, including Slick’s connection pool tuning, JavaScript high-performance tips, MySQL performance tuning, and Play Framework Tuning (1) and Tuning (2). It would be naive to think that is enough. In this post I list performance optimization tips for Scala.

  1. par
    If we have N tasks that are independent of each other (no required order, no shared variables), we can consider using a parallel collection to speed up the computation.
    A parallel collection runs with as much concurrency as the number of cores allows, which can greatly improve throughput.

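    // Scala 2.13+: .par needs the scala-parallel-collections module and
    // import scala.collection.parallel.CollectionConverters._
    val list = List[String]("a", "b", "c", ...)
    list.par.foreach( r => {
      ... // your task.
    })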

    Sometimes the sequential implementation performs better than the parallel one. That is because a parallel collection has overhead for distributing (fork) and gathering (join) the data between cores. Parallel collections therefore pay off mainly when each task is computationally heavy.

  2. Future
    Future can serve the same purpose as par.

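    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global // Future needs an implicit ExecutionContext
    import scala.concurrent.duration._
    val arr = List[String]("a", "b", "c")
    // start one Future per element
    val futures = arr.map { r => Future {
      ... // your task.
    } }
    val future = Future.sequence(futures)
    Await.result(future, 1.hour)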
  3. Avoid unnecessary loops
    Even with parallelism available to speed things up, it is still worth avoiding unnecessary loops. In my code, I select tokens at random so that resources are shared fairly among customers. This works because of probability: over many requests, random selection spreads the load evenly.
  4. Print the thread name to debug parallel execution
  5. Separate ExecutionContext
    If you don’t want to affect the default ExecutionContext, you can create an additional one to keep the work separate.

    implicit val ec = ExecutionContext.fromExecutor(Executors.newCachedThreadPool())

    newCachedThreadPool vs newFixedThreadPool
    newFixedThreadPool: Creates a thread pool that reuses a fixed number of threads operating off a shared unbounded queue. At any point, at most nThreads threads will be active processing tasks. If additional tasks are submitted when all threads are active, they will wait in the queue until a thread is available. If any thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks. The threads in the pool will exist until it is explicitly shut down.
    newCachedThreadPool: Creates a thread pool that creates new threads as needed, but will reuse previously constructed threads when they are available. These pools will typically improve the performance of programs that execute many short-lived asynchronous tasks. Calls to execute will reuse previously constructed threads if available. If no existing thread is available, a new thread will be created and added to the pool. Threads that have not been used for sixty seconds are terminated and removed from the cache. Thus, a pool that remains idle for long enough will not consume any resources. Note that pools with similar properties but different details (for example, timeout parameters) may be created using ThreadPoolExecutor constructors.
    If you have a huge number of long-running tasks, I would suggest the FixedThreadPool, because a cached pool puts no upper bound on the number of threads. Otherwise, choose the CachedThreadPool.
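
Tips 4 and 5 can be combined in a small sketch. It assumes a fixed pool of 4 threads and a hypothetical task that only reports the thread it ran on; the pool size and the names ThreadNameDemo and run are illustrative, not taken from any real project.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ThreadNameDemo {
  // Runs one Future per item on a dedicated fixed pool and returns
  // the name of the thread that handled each item.
  def run(items: List[String], poolSize: Int = 4): List[String] = {
    // Separate pool, so this work does not compete with the default ExecutionContext.
    val pool = Executors.newFixedThreadPool(poolSize)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      val futures = items.map { item =>
        Future {
          val name = Thread.currentThread().getName
          println(s"$item handled by $name") // debug output: which thread ran the task
          name
        }
      }
      Await.result(Future.sequence(futures), 10.seconds)
    } finally {
      pool.shutdown() // a fixed pool keeps its threads alive until it is shut down
    }
  }

  def main(args: Array[String]): Unit = {
    val names = run(List("a", "b", "c"))
    println(s"distinct threads used: ${names.distinct.size}")
  }
}
```

If every task prints the same thread name, the work is not running in parallel, which is usually the first thing to check.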

Performance Control

There are several ways to do performance control. In past posts, I already covered different ways to inspect your server’s performance and how to tune it. In this post, we focus on automatic control of performance.

As we know, the more transactions come in, the more pressure the server is under. We therefore need an automatic way to learn the server’s status. Here we introduce two ways: one is crontab, the other is a loop.


  • crontab: several important commands we need to know.
    # edit the crontab script
    crontab -e
    # list active crontabs
    crontab -l
    # view the log file to check crontab's status
    sudo grep cron /var/log/syslog


  • Step1: write a wrapper script that contains the following line:
    nohup sh -c 'while true; do <your_script>.sh >> <your_log>.txt; sleep 1800; done' &

    sleep takes its argument in seconds, so 1800 equals 30 minutes.
    Please note that <your_script>.sh here is the script that does the real work, not the wrapper script itself.
    In this line, we use four important elements:

    • nohup
      It keeps your job running when the process receives SIGHUP, for example when you log out.
    • &
      It returns control to you immediately and lets the command complete in the background.
    • while …  do …  done
      It repeats the enclosed commands forever.
    • sleep 1800
      It pauses for 1800 seconds between two runs.
  • Step2: run this script.
  • Step3: check your log, you will see what you want to print out.
    vi <your_log>.txt

Scala (20) – Execution Context

Execution Context:

  • An ExecutionContext is similar to an Executor: it is free to execute computations in a new thread, in a pooled thread or in the current thread (although executing the computation in the current thread is discouraged)

The Global Execution Context:

  • ExecutionContext.global is an ExecutionContext backed by a ForkJoinPool. It should be sufficient for most situations but requires some care.
    A ForkJoinPool manages a limited number of threads (the maximum number of threads being referred to as the parallelism level). The number of concurrently blocking computations can exceed the parallelism level only if each blocking call is wrapped inside a blocking { ... } call (scala.concurrent.blocking). Otherwise, there is a risk that the thread pool in the global execution context is starved, and no computation can proceed.
  • By default, ExecutionContext.global sets the parallelism level of its underlying fork-join pool to the number of available processors (Runtime.availableProcessors). This configuration can be overridden by setting the following VM attributes: scala.concurrent.context.minThreads, scala.concurrent.context.numThreads, scala.concurrent.context.maxThreads.
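
A minimal sketch of wrapping blocking calls, assuming the default global ExecutionContext. Thread.sleep stands in for any blocking operation, and the names BlockingDemo, slowCall and runAll as well as the task count of 32 are made up for illustration.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object BlockingDemo {
  // Hypothetical blocking operation; Thread.sleep stands in for I/O.
  def slowCall(i: Int): Int = {
    Thread.sleep(100)
    i * 2
  }

  def runAll(n: Int): Seq[Int] = {
    val futures = (1 to n).map { i =>
      Future {
        // blocking { ... } tells the pool that this thread is about to block,
        // so the global context may start extra threads beyond the parallelism level.
        blocking { slowCall(i) }
      }
    }
    Await.result(Future.sequence(futures), 30.seconds)
  }

  def main(args: Array[String]): Unit =
    println(runAll(32).sum)
}
```

Without the blocking wrapper the same code still finishes, but at most as many sleeps run at once as the parallelism level allows.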

Thread Pool:

  • When each incoming request results in a multitude of requests to yet another tier of systems, thread pools must be managed so that they are balanced according to the ratios of requests in each tier: mismanagement of one thread pool bleeds into another.

Scala (19) – Futures


  • Futures hold the promise for the result of a computation that is not yet complete. They are a simple container, a placeholder. A computation could fail of course, and this must also be encoded. A Future can be in exactly one of 3 states:
    • pending
    • failed
    • completed
  • With flatMap we can define a Future that is the result of two futures sequenced, the second future computed based on the result of the first one.
  • Future defines many useful methods:
    • Use Future.value() and Future.exception() to create pre-satisfied futures (these are methods of Twitter’s com.twitter.util.Future; the scala.concurrent equivalents are Future.successful and Future.failed)
    • Future.collect() and Future.join() provide combinators that turn many futures into one (i.e. the gather part of a scatter-gather operation); in scala.concurrent, Future.sequence and zip play this role
  • By default, futures and promises are non-blocking, making use of callbacks instead of typical blocking operations. Scala provides combinators such as flatMap, foreach and filter to compose futures in a non-blocking way.
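
The notes above can be sketched with the standard-library scala.concurrent.Future. This is only an illustration: fetchUserId and fetchBalance are hypothetical stand-ins for real asynchronous calls, and the pre-satisfied futures use Future.successful and Future.failed.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureCombinators {
  // Hypothetical asynchronous lookups, used only for illustration.
  def fetchUserId(name: String): Future[Int] = Future(name.length)
  def fetchBalance(userId: Int): Future[Double] = Future(userId * 10.0)

  // flatMap: the second future is computed from the result of the first one.
  def balanceOf(name: String): Future[Double] =
    fetchUserId(name).flatMap(id => fetchBalance(id))

  // Pre-satisfied futures: one already completed, one already failed.
  val ready: Future[Int] = Future.successful(42)
  val broken: Future[Int] = Future.failed(new IllegalStateException("boom"))

  // Gather step: turn many futures into one future holding all results.
  def balances(names: List[String]): Future[List[Double]] =
    Future.sequence(names.map(balanceOf))

  def main(args: Array[String]): Unit = {
    println(Await.result(balances(List("ann", "bob", "carol")), 5.seconds))
    // recover supplies a fallback value for the failed state.
    println(Await.result(broken.recover { case _: IllegalStateException => -1 }, 5.seconds))
  }
}
```

Await.result appears only in main to print the results; the combinators themselves never block.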

Akka (7) -Configuration

There are several things we can configure in Akka:

  • log level and logger backend
  • enable remote
  • message serializers
  • definition of routers
  • tuning of dispatchers

Two important concepts we need to understand when we do configuration:

  • Throughput
    It defines the number of messages that are processed in a batch before the thread is returned to the pool.
  • parallelism factor
    The parallelism factor is used to determine thread pool size using the following formula: ceil (available processors * factor). Resulting size is then bounded by the parallelism-min and parallelism-max values.
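
The sizing formula can be checked with a few lines of Scala. This is a sketch of the arithmetic only, not Akka’s own code; the function name poolSize and the values 3.0, 8 and 64 are example inputs chosen for illustration.

```scala
object DispatcherSizing {
  // ceil(available processors * factor), then bounded by parallelism-min and parallelism-max.
  def poolSize(processors: Int, factor: Double, parallelismMin: Int, parallelismMax: Int): Int = {
    val scaled = math.ceil(processors * factor).toInt
    math.min(math.max(scaled, parallelismMin), parallelismMax)
  }

  def main(args: Array[String]): Unit = {
    val cores = Runtime.getRuntime.availableProcessors()
    // Example values only; read the real ones from your dispatcher configuration.
    println(poolSize(cores, factor = 3.0, parallelismMin = 8, parallelismMax = 64))
  }
}
```

With these example values, a 4-core machine gets 12 threads, a 2-core machine is raised to the minimum of 8, and a 32-core machine is capped at 64.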

Play Framework (11) – Basic Concept

Play Framework is an event-driven server, as is Node.js; the alternative is a threaded server.


Non-blocking IO

  • built on top of Netty
    Netty is an asynchronous event-driven network application framework for rapid development of maintainable high performance protocol servers and clients.
  • no sensitivity to downstream slowness
  • easy to parallelize IO and to issue parallel requests
  • supports many concurrent and long-running connections, enabling WebSockets, Comet, and server-sent events.

JVM Memory Management (2)

This is a study note from “Everything I Ever Learned about JVM Performance Tuning @Twitter“.

  • Adaptive Sizing Policy:
    Throughput collectors can automatically tune themselves:

    • -XX:+UseAdaptiveSizePolicy
    • -XX:MaxGCPauseMillis=…(e.g. 100)
    • -XX:GCTimeRatio=…(e.g. 19)
  • tuning the young generation:
    • enable -XX:+PrintGCDetails, -XX:+PrintHeapAtGC, and -XX:+PrintTenuringDistribution
    • watch survivor size
    • watch the tenuring threshold; you might need to tune it to tenure long-lived objects faster.
  • some rules:
    • Too many live objects during young generation GC:
      Reduce NewSize, reduce survivor spaces, reduce tenuring threshold.
    • Too many threads:
      Find the minimal concurrency level, or split the service into several JVMs.
    • Eden should be big enough to hold the objects of more than one transaction. In that case, there are no stop-the-world collections and throughput is high.
    • Each survivor space should be big enough to hold both the live objects and the objects that are aging toward tenure.
    • Lowering the tenuring threshold moves long-lived objects to the old generation sooner, which frees up survivor space.
  • We now use a 64-bit JVM. 64-bit pointers take more room than 32-bit pointers, so fewer of them fit in the CPU cache. Enabling -XX:+UseCompressedOops compresses 64-bit pointers to 32 bits, while the JVM still uses a 64-bit address space.
    An object stored in memory is split into 3 parts:

    • header: mark word + klass pointer
      • mark word stores running data for object itself.
      • klass pointer points to the object’s class metadata.
    • instance data: the values of the object’s fields (shown in a screenshot in the original post)
    • padding: 0 <= padding < 8
      (header + instance data + padding) % 8 == 0
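
The padding rule is simple arithmetic, sketched here in Scala. The 12-byte header (8-byte mark word plus 4-byte compressed klass pointer) is an assumed example, and the names ObjectLayout and padding are made up for illustration.

```scala
object ObjectLayout {
  val Alignment = 8

  // Bytes of padding needed so that header + instance data + padding is a multiple of 8.
  def padding(headerBytes: Int, instanceBytes: Int): Int = {
    val used = headerBytes + instanceBytes
    (Alignment - used % Alignment) % Alignment
  }

  def main(args: Array[String]): Unit = {
    // Assumed example: 12-byte header and a single 1-byte field.
    val pad = padding(12, 1)
    println(s"padding = $pad, total = ${12 + 1 + pad}")
  }
}
```

An object with a 12-byte header and one 1-byte field therefore occupies 16 bytes, 3 of which are padding.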