Server Protection and Monitor

As we all know, there is a huge DDoS attack recently which influences lots of websites. At first beginning, our server didn’t get influenced by it. But last week, I allow ssh server by password.(In the past, we only allow the user to use the public key to ssh server. But last week, we just want to allow a user to log in simply and fast. We open it temporarily and forget to close it) This week, the server is totally attacked. Anyway, the final problem is that our server is blocked by increasing useless threads and bandwidth is used up. So here I list how I find these problems and how we try to fix it.

Step1: check network status

According to check network status, you will find bandwidth is too high which influences other normal customers to use this server’s resources.

sudo apt-get install nethogs
sudo nethogs
sudo nethogs eth0 eth1

Step2: find exact threads

Except knowing the bandwidth status, you also need to know thread status. By viewing real-time thread status, you will know there are many malicious threads which take too many resources. For my case, I find there are 300 malicious threads which are created every 2 minutes. It is not hard to understand that the final server is slow enough to undertake these increasing malicious threads.

sudo apt-get install htop

Step3: kill useless threads

After known these malicious threads, we need to kill them. In fact, killing them can solve this problem by root cause. Because for now I only know its effect, not the root cause. Luckily we find a bash script which causes it. So I kill this bash script together. Until now, it looks like we already finish everything. But things haven’t done. After 8 hours stable network, the server is attacked again. I can’t ssh into the server. So final solution is to shut down and rebuild the server. So for now, I don’t know the root cause.

sudo kill -9 $(pgrep <useless_threads_main_name>)

Step4: add your own public key to ssh

In order to avoid the attack happens again, I involve public key back. Because I’m sure this attack happened when I open password login.

ssh <user_name>@<server_ip> 'mkdir -p ~/.ssh;cat >> ~/.ssh/authorized_keys' < ~/.ssh/

Here is your own public key file.

Step5: disable password to ssh

sudo vi /etc/ssh/sshd_config
# change to no to disable tunnelled clear text passwords
PasswordAuthentication no
PubkeyAuthentication yes
service ssh restart


Using high-level security configuration is needed to avoid this kind of attack. Once this malicious attack happens, the best way is to backup your data as soon as possible and rebuild your server. (During investigation process, I also install several kinds of maldetect tools. I’m trying to use these tools to scan out the malicious scripts/codes/software. But unfortunately, they all failed.)

I still need more knowledge to help me understand and find out the root cause. Keep learning.


Performance Control

There are several ways to do performance control. In past blogs, I already mentioned different ways to see your server’s performance and how to tune it. In this blog, we focus on automatically control of the performance.

As we know, the more transactions come, the more pressure the server undertakes. In this case, we need an automatical way to know the server’s status. Here we introduce two ways, one is crontab, one is loop.


  • several important commands which we need to know.
    // edit crontab script
    crontab -e 
    // list active crontabs
    crontab -l
    // view log file to check crontab's status
    sudo grep cron /var/log/syslog


  • Step1: you need to write a script, for example its name is
    nohup sh -c `while true; do <your_script>.sh >> <your_log>.txt; sleep 1800; done` &

    Here 1800’s unit is seconds, so it is equal to 30 minutes.
    Please note, here <your_script>.sh is the real content which you want to do, not
    In this line, we use three important commands:

    • nohup
      It makes your job keep running in the background when the process gets sighup.
    • &
      It essentially returns control to you immediately and allows the command to complete in the background.
    • while …  do …  done
    • sleep 1800
  • Step2: run this script.
  • Step3: check your log, you will see what you want to print out.
    vi <your_log>.txt

Text Extraction From URL by Scala

In this post, we talk about how to extract text from URL. Please note, we not involve special pages (e.g. Facebook posts, Facebook comments, etc) into this talk. But in another post, I will write a solution for Facebook posts extracting.

We know there are many types for a url, like text/*, application/xml, application/xhtml+xml, application/pdf, image, etc. In this post, we only support the types which we list.

There are three parts for the code snippet, one is for  text/*, application/xml, application/xhtml+xml, see Case 1, the other is for application/pdf, see Case 2. The other is for image, like png, jpg, etc. see Case 3.

Case 1:

import org.jsoup.Jsoup
val doc = Jsoup.connect(<your_url>).get()

According to Jsoup, we get doc, but within the doc, there are many useless elements, like footer, header, etc. In fact, we don’t need them, we just want to obtain pure meaningful content. So here we do some filters. Please note, we all know we can’t filter all, because we don’t know which part is useful, which is not. What we can do is to try all our best to remove common known useless parts. 

import org.jsoup.nodes.Document
private def getTextByDoc(doc: Document): String = {


  val paragraphs ="p")
  val divs ="div")

  paragraphs.text() + divs.text()

Case 2:

For pdf url, it is a little complex. First we need to get its content type to make sure it is “application/pdf” and then we create a local temporary file and then to extract local pdf to obtain pure text. Finally, we delete this temporary file.

val url = new URL(<your_url>)
val conn = url.openConnection()
val contentType = conn.getContentType

contentType match {
  case "application/pdf" =>
    val fileName = Random.alphanumeric.take(5).mkString + ".pdf"
    url #> new File(fileName) !!
    val texts = getTextFromPDF(None, None, fileName)
    val of = new File(fileName)
  case _ => None

Here is to extract text from local pdf file. Here because I don’t know its start page and end page, I just skip it. By default, it will fetch all.

import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.util.PDFTextStripper
private def getTextFromPDF(startPage: Option[Int], endPage: Option[Int], fileName: String): Option[String] = {
  try {
    val pdf = PDDocument.load(new File(fileName))
    val stripper = new PDFTextStripper()
    startPage match {
      case Some(startInt) => stripper.setStartPage(startInt)
      case None =>
    endPage match {
      case Some(endInt) => stripper.setEndPage(endInt)
      case None =>
  } catch {
    case e: Throwable => None

Case 3:

For image, it involves into a new technology, named ‘OCR’ which can help to parse image’s content. So we need a java-ocr-api into system.


In build.sbt to add one line to add dependence.

libraryDependencies += "com.asprise.ocr" % "java-ocr-api" % "[15,)"


To import library:

import com.asprise.ocr.Ocr


Here is the code snippet to show how to implement it. Please note: here <your_file> is a File type. If you only have fileName/filePath, you need to use new File(<file_name>) to convert it. 

try {
  // Image
  val ocr = new Ocr
  ocr.startEngine("eng", Ocr.SPEED_FASTEST)
  val files = List(<your_file>)
  val outputString = ocr.recognize(files.toArray, Ocr.RECOGNIZE_TYPE_ALL, Ocr.OUTPUT_FORMAT_PLAINTEXT)
} catch {
  case e: Exception => None // todo: to support multiple file types

Play Framework(12)-template engine


Templates are complied as standard Scala functions, following a simple naming convention. If you create a views/Application/index.scala.html template file, it will generate a views.html.Application.index class that has an apply() method.

Special Character :

Scala template uses @ as the single special character. Every time this character is encountered, it indicates the beginning of a dynamic statement. If you want to insert a multi-token statement, explicitly mark it using brackets/curly brackets. Because @ is a special character, you’ll sometimes need to escape it by using @@.

Make sure that { is on the same line with for to indicate that expression continues to next line.


A template is like a function, so it needs parameters, which must be declared at the top of the template file.

You can write server side block comments in templates using @@.

Dynamic content parts are escaped according to the template type’s (e.g. HTML or XML) rules. If you want to output a raw content fragment, wrap it in the template content type.



  • left join = A
    select <select_list> from tableA A left join tableB B on A.key=B.key
  • inner join = (common part between A and B)
    select <select_list> from tableA A inner join tableB B on A.key=B.key
  • right join = B
    select <select_list> from tableA A right join tableB B on A.key=B.key
  • A – (common part between A and B)
    select <select_list> from tableA A left join tableB B on A.key=B.key 
    where B.key is NULL
  • B – (common part between A and B)
    select <select_list> from tableA A right join tableB B on A.key=B.key
    where A.key is NULL
  • A + B
    select <select_list> from tableA A full outer join tableB B on 
  • A + B – (common part between A and B)
    select <select_list> from tableA A full outer join tableB B on 
    A.key=B.key where A.key is NULL or B.key is NULL

Scala (20) – Execution Context

Execution Context:

  • An ExecutionContext is similar to an Executor: it is free to execute computations in a new thread, in a pooled thread or in the current thread (although executing the computation in the current thread is discouraged)

The Global Execution Context:

  • is an ExecutionContext backed by a ForkJoinPool. It should be sufficient for most situations but requires some care.
    A ForkJoinPool manages a limited amount of threads (the maximum amount of thread being referred to as parallelism level). The number of concurrently blocking computations can exceed the parallelism level only if each blocking call is wrapped inside a blocking call. Otherwise, there is a risk that the thread pool in the global execution context is starved, and no computation can process.
  • By default, the sets the parallelism level of its underlying fork-join-pool to the amount of available processors (Runtime.availableProcessors).  This configuration can be overridden by  setting the following VM attributes: scala.concurrent.context.minThreads, scala.concurrent.context.numThreads, scala.concurrent.context.maxThreads.

Thread Pool:

  • If each incoming request results in a multitude of requests to get another tier of systems, in these systems, thread pools must be managed so that they are balanced according to the ratios of requests in each tier: mismanagement of one thread pool bleeds into another.

Scala (19) – Futures


  • They hold the promise for the result of a computation that is not yet complete. They are a simple container- a placeholder. A computation could fail of course, and this must also be encoded. a Future can be in exactly one of 3 states:
    • pending
    • failed
    • completed
  • With flatMap we can define a Future that is the result of two futures sequenced, the second future computed based on the result of the first one.
  • Future defines many useful methods:
    • Use Future.value() and Future.exception() to create pre-satisfied futures
    • Future.collect(), Future.join() and provide combinators that turn many futures into one (i.e. the gather part of a scatter-gather operation)
  • By default, futures and promises are non-blocking, making use of callbacks instead of typical blocking operations. Scala provides combinators such as flatMap, foreach and filter used to compose futures in a non-blocking method.