Scala (15): Regular Expressions

Scala supports regular expressions through the scala.util.matching.Regex class. The examples below go through its main methods one by one.

findFirstIn

import scala.util.matching.Regex
val pattern = "Scala".r // <=> val pattern = new Regex("Scala")
val str = "Scala is very cool"
val result = pattern findFirstIn str
result match {
  case Some(v) => println(v)
  case _ =>
} // output: Scala

Please note: findFirstIn returns an Option[String] (Some with the first match, or None if nothing matches), so you need pattern matching or a method such as getOrElse to get the real value.
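
As a complement to the match expression above, here is a minimal sketch of handling the Option with getOrElse and map instead. The input strings and the "no match" fallback are only illustrative choices.

```scala
import scala.util.matching.Regex

val pattern: Regex = "Scala".r

// getOrElse supplies a fallback value when there is no match
val found: String = pattern.findFirstIn("Scala is very cool").getOrElse("no match")
val missing: String = pattern.findFirstIn("Java is verbose").getOrElse("no match")

// map transforms the matched text only if a match exists
val length: Option[Int] = pattern.findFirstIn("Scala is very cool").map(_.length)

println(found)   // Scala
println(missing) // no match
println(length)  // Some(5)
```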

findAllIn

import scala.util.matching.Regex
val pattern = "Scala".r
val str = "Scala is very cool"
val result = (pattern findAllIn str).mkString(",")
println(result) // Output: Scala

Please note: findAllIn returns an iterator over all matches (it is empty when nothing matches); you can use mkString to join them into one string.
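
The example above has only one match, so here is a small sketch with several matches and with none. The digit pattern and the sample text are assumptions made for illustration; findAllMatchIn is used to get at match positions.

```scala
val digits = """\d+""".r
val text = "Scala 2.13 and Scala 3"

// toList drains the iterator into a list of matched strings
val all: List[String] = digits.findAllIn(text).toList
val none: List[String] = digits.findAllIn("no numbers here").toList

// findAllMatchIn yields Match objects, which also expose positions
val starts: List[Int] = digits.findAllMatchIn(text).map(_.start).toList

println(all)    // List(2, 13, 3)
println(none)   // List()
println(starts) // List(6, 8, 21)
```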

replaceFirstIn/replaceAllIn

import scala.util.matching.Regex
val pattern = "Scala".r
val str = "Scala is very cool"
val result = pattern replaceFirstIn(str, "Java") // String = Java is very cool
val result1 = pattern replaceAllIn(str, "Java") // String = Java is very cool

Please note: both replaceFirstIn and replaceAllIn return a new String; the original string is not changed.
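
The replacement does not have to be a fixed string. A short sketch, assuming a date pattern and sample sentences chosen only for illustration: the replacement string can refer to captured groups with $1, $2, and so on, and replaceAllIn also accepts a function from Regex.Match to String.

```scala
import scala.util.matching.Regex

val date = """(\d{4})-(\d{2})-(\d{2})""".r

// $1, $2, $3 in the replacement string refer to the captured groups
val reordered: String = date.replaceAllIn("Released on 2014-11-20", "$3/$2/$1")

// The function overload receives each match and returns its replacement
val word = """\b[a-z]+\b""".r
val shouted: String =
  word.replaceAllIn("Scala is very cool", (m: Regex.Match) => m.matched.toUpperCase)

println(reordered) // Released on 20/11/2014
println(shouted)   // Scala IS VERY COOL
```

Because $ and \ are special in replacement strings, literal text that may contain them can be escaped with Regex.quoteReplacement.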

Pattern Matching

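// Each capture group is bound to a String variable
val date = """(\d\d\d\d)-(\d\d)-(\d\d)""".r
"2014-11-20" match {
  case date(year, month, day) => "hello"
} // output: hello

// The extractor only succeeds if the whole string matches the pattern
val pattern = "(boo|foo).*".r
"boo123" match {
  case pattern(m) => m
  case _ => "no match"
} // output: boo

// Renamed so that both patterns can live in the same scope
val pattern2 = "(boo|foo).{4}".r
"boo1234" match {
  case pattern2(m) => m
  case _ => "no match"
} // output: boo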
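
Two details of the extractor syntax are worth a sketch: the captured groups are Strings (so numbers need converting), and by default the whole input must match. The helper names describe and containsDate are hypothetical; unanchored is the Regex method that relaxes the whole-input requirement.

```scala
val date = """(\d{4})-(\d{2})-(\d{2})""".r

// Groups are bound as Strings; toInt converts them to numbers
def describe(s: String): String = s match {
  case date(year, month, day) =>
    s"year=${year.toInt}, month=${month.toInt}, day=${day.toInt}"
  case _ => "not a date"
}

// unanchored lets the extractor succeed on a match anywhere in the input
val loose = date.unanchored
def containsDate(s: String): Boolean = s match {
  case loose(_, _, _) => true
  case _ => false
}

println(describe("2014-11-20"))        // year=2014, month=11, day=20
println(describe("on 2014-11-20"))     // not a date
println(containsDate("on 2014-11-20")) // true
```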

As these examples show, regular expressions in Scala are often combined with pattern matching to get a task done. Here is a summary of some useful regular expressions:

  1. “^the”: a string that begins with “the”
  2. “of material$”: a string that ends with “of material”
  3. “^abc$”: a string that both starts and ends with “abc”, i.e. exactly the string “abc”
  4. “ab*”: an “a” followed by zero or more “b”, such as “a”, “ab”, “abb”, etc
  5. “ab+”: an “a” followed by one or more “b”, such as “ab”, “abb”, etc
  6. “ab?”: an “a” followed by zero or one “b”, i.e. “a” or “ab”
  7. “a?b+$”: zero or one “a” followed by one or more “b” at the end of the string
  8. “ab{2}”: an “a” followed by exactly two “b”, i.e. “abb”
  9. “ab{2,}”: an “a” followed by two or more “b”, such as “abb”, “abbb”, etc
  10. “ab{3,5}”: an “a” followed by three to five “b”: “abbb”, “abbbb”, “abbbbb”
  11. “a(bc)*”: an “a” followed by zero or more repetitions of “bc”
  12. “hi|hello”: “hi” or “hello”
  13. “(b|cd)ef”: “bef” or “cdef”
  14. “.”: any single character except a line terminator such as ‘\n’
  15. “a.[0-9]”: an “a”, then any single character, then one digit from 0 to 9

So we only need to know the meaning of the important characters: “^”, “$”, “?”, “.”, “+”, “{}”, “|”, “[]” and “*”. That is enough to read most everyday patterns.
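
Several entries from the list can be checked directly. The sketch below uses String.matches, which tests the whole input against the pattern; the inputs and expected values are chosen here for illustration. It also shows that a pattern without “^” and “$” can match in the middle of a string when used with findFirstIn.

```scala
// (pattern, input, whether the whole input should match)
val cases: List[(String, String, Boolean)] = List(
  ("ab*", "a", true),
  ("ab+", "a", false),
  ("ab?", "abb", false),
  ("ab{2}", "abb", true),
  ("ab{2,}", "ab", false),
  ("ab{3,5}", "abbbb", true),
  ("a(bc)*", "abcbc", true),
  ("hi|hello", "hello", true),
  ("(b|cd)ef", "cdef", true),
  ("a.[0-9]", "ax7", true)
)

// String.matches tests the entire input against the pattern
val results: List[Boolean] = cases.map { case (p, in, _) => in.matches(p) }

cases.zip(results).foreach { case ((p, in, expected), actual) =>
  println(s"$p on $in: matched=$actual, expected=$expected")
}

// Without ^ and $, findFirstIn looks for the pattern anywhere in the input
val anywhere: Option[String] = "ab*".r.findFirstIn("xxabbb")  // Some(abbb)
val anchored: Option[String] = "^ab*$".r.findFirstIn("xxabbb") // None
```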
