Parsing XML with Scala

Recently I needed to build some social media collectors that crawl data from multiple sources, such as Twitter, Google News, and RSS feeds. Twitter already has a nice Java library for fetching data, named twitter4j. For the other sources, we have to parse the data ourselves. Most of them, including Google News and RSS, can return XML, so all that remains is knowing how to parse XML. (Note: even though most sources use XML, the detailed structure sometimes differs. For example, some use a “channel” -> “item” structure, while others use an “entry” structure.)
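
To make the structural difference concrete, here is a minimal sketch that reads titles from either layout. The two feed snippets are made-up samples, itemTitles is a hypothetical helper name, and the code assumes the scala-xml library is on the classpath.

```scala
import scala.xml.{Elem, XML}

// Made-up sample feeds, for illustration only.
val rssFeed: Elem = XML.loadString(
  """<rss><channel>
    |  <item><title>First RSS item</title></item>
    |  <item><title>Second RSS item</title></item>
    |</channel></rss>""".stripMargin)

val atomFeed: Elem = XML.loadString(
  """<feed>
    |  <entry><title>First Atom entry</title></entry>
    |</feed>""".stripMargin)

// Hypothetical helper: try the "channel" -> "item" layout first,
// then fall back to top-level "entry" elements.
def itemTitles(feed: Elem): Seq[String] = {
  val rssItems = feed \ "channel" \ "item"
  val items = if (rssItems.nonEmpty) rssItems else feed \ "entry"
  items.map(item => (item \ "title").text)
}
```

Calling itemTitles(rssFeed) and itemTitles(atomFeed) goes through the same code path, so the collector only has to know about the layout in one place.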

  1. Load the data. Here sourceURL is the XML/RSS link. For example, RSS/XML EXAMPLE

      import scala.xml.XML
      // sourceURL can be a String address or a java.net.URL
      val xml = XML.load(sourceURL)
  2. Extract each tag

      val forecast = xml \ "channel" \ "item" \ "forecast"
      // if you don't know the full path, search all descendants instead
      val allForecasts = xml \\ "forecast"
  3. Extract each attribute under one tag

      val url = xml \ "channel" \ "item" \ "forecast" \ "@url"
      // if you know the url attribute is unique, you can shorten it to this
      val anyUrl = xml \\ "@url"
      // convert NodeSeq to String
      val urlString = (forecast \ "@url").text
      // obtain the Node label (label is defined on Node, not NodeSeq, so take the first match)
      val forecastLabel = forecast.headOption.map(_.label)
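
The three steps can be put together in one self-contained sketch. The document below is a made-up sample parsed with XML.loadString instead of a live link, and the day attribute and example.com URLs are assumptions added only for illustration.

```scala
import scala.xml.XML

// Made-up sample document; real feeds will differ.
val xml = XML.loadString(
  """<rss><channel><item>
    |  <forecast day="Mon" url="http://example.com/mon">Sunny</forecast>
    |  <forecast day="Tue" url="http://example.com/tue">Rain</forecast>
    |</item></channel></rss>""".stripMargin)

// Step 2: select the forecast elements by full path or by deep search.
val byPath = xml \ "channel" \ "item" \ "forecast"
val bySearch = xml \\ "forecast"

// Step 3: read label, attributes and text from each node individually.
val forecasts = bySearch.map { node =>
  (node.label, (node \ "@day").text, (node \ "@url").text, node.text)
}

forecasts.foreach(println)
```

Reading the attributes node by node keeps the result predictable when the selection holds more than one element, which is the usual case for feed items.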

Read More:

http://alvinalexander.com/scala/xml-parsing-xpath-extract-xml-tag-attributes

http://alvinalexander.com/scala/how-to-extract-data-from-xml-nodes-in-scala
