FlashRSS Reader Pg.6
source: http://www.thegoldenmean.com
6 — Storing Data in Arrays
Version One: Using Recursion and Arrays
Restating The Objective:
We want to present data using a TextArea component in a Flash Movie, formatted with HTML tags and structured like this:
<headline><a href="LINK">TITLE</a></headline><p>DESCRIPTION</p>
(where the words in all caps above are the actual content from an RSS document). There might be three or thirty entries. The script shouldn’t care. We simply want to target the <item> nodes and from them extract the text data contained in their <title>, <link> and <description> child nodes.
The First Pass
Long ago I read some code posted by Peter Hall for dealing with XML. It was well beyond me at the time but I tucked it away for the day when it would make sense. This project finally forced me to sit down and study the code until it did make sense. It is a great example of using a recursive function to walk the XML node hierarchy. Mr. Hall’s original example was an ActionScript 1.0 prototype; I modified it to conform better to an ActionScript 2.0 method. As the headline suggests, we make one pass through the XML document to harvest the <item> nodes, and push them into an array. The method is as follows:
private function getNodes(node:XMLNode, name:String):Array {
    var nodes:Array = new Array();
    var c:XMLNode = node.firstChild;
    while (c) {
        if (c.nodeType != 3) {
            if (c.nodeName == name) {
                nodes.push(c);
            }
            nodes = nodes.concat(getNodes(c, name));
        }
        c = c.nextSibling;
    }
    return nodes;
}
There is a lot of power in that brief block of code! We will examine it line by line in a moment, but the overall picture of what it does is this: it travels the length of the document’s first node branch to its very end, collecting and storing any nodes that match “name” as it goes. When it reaches the end of one branch, it starts again with the next and so on until there are no more branches to explore. Fortunately most RSS documents have a relatively simple structure, but this recursive approach is capable of scouring even the most convoluted structure.
Notice that the method expects to have two arguments passed to it: an XML or an XMLNode Object to search in, and a string to search for. Let’s say we store our XML Object in a variable we call “_xml”, and we want to find all of the “item” nodes. We would invoke this method by writing:
getNodes(_xml, "item");
That says: look in "_xml" for matches to a node with the name “item”.
Let’s examine what follows line by line:
- A new Array is declared and stored in the variable “nodes”
- We learned earlier that we reference the content of an XML Object by looking at its first child. So a local variable (“c”) is created to store that first child.
- A “while” loop continues as long as its condition remains true. (Remember a few paragraphs back I cautioned against creating recursive scripts with no means of terminating them? This while condition is the brake on our recursive engine.) The loop checks whether “c” evaluates as true. Initially it does, because the XML node we are about to examine does have a first child.
- XML nodes of type 3 are text nodes. We can ignore text nodes at this point, so this check saves significant time by skipping them entirely.
- Now we look to see if the current node’s nodeName matches what we are looking for. If it does, push it into the array.
- Here comes the recursive part. Ready?
getNodes() calls itself from within itself, passing the current XMLNode as the new parameter! As long as there continue to be child nodes, the function will keep calling itself on the current node. That is how it travels the length of one branch to its very end.
- The concat() method of the Array Object is interesting. If you have an array and push() another array into it, you wind up with a nested array. concat() instead adds the contents of one array to another as individual elements. An example might make it clearer:
var smooth:Array = ["circle", "ellipse"];
var sharp:Array = ["rectangle", "triangle"];
var shapes:Array = smooth.concat(sharp);
//shapes is now ["circle", "ellipse", "rectangle", "triangle"];
- When the script has reached the end of a branch (when there are no more child nodes to examine), the variable “c” is updated to the next sibling and on it goes until there are no more siblings either. At that point nextSibling returns null, “c” evaluates as false, and the while loop terminates.
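Because ActionScript 2.0 is ECMAScript-based, the same walk can be sketched in plain JavaScript over a minimal stand-in node structure. The makeNode() helper and the sample tree below are invented purely for illustration; they mimic the firstChild/nextSibling/nodeType properties of Flash’s XMLNode:

```javascript
// Minimal stand-in for Flash's XMLNode: each node has nodeName,
// nodeType (1 = element, 3 = text), firstChild and nextSibling.
function makeNode(name, children) {
  var node = { nodeName: name, nodeType: 1, firstChild: null, nextSibling: null };
  var prev = null;
  for (var i = 0; i < (children || []).length; i++) {
    if (prev) { prev.nextSibling = children[i]; } else { node.firstChild = children[i]; }
    prev = children[i];
  }
  return node;
}

// Direct port of getNodes(): walk every branch, collecting matches.
function getNodes(node, name) {
  var nodes = [];
  var c = node.firstChild;
  while (c) {
    if (c.nodeType != 3) {
      if (c.nodeName == name) {
        nodes.push(c);
      }
      nodes = nodes.concat(getNodes(c, name));
    }
    c = c.nextSibling;
  }
  return nodes;
}

// A toy RSS-like tree: rss > channel > two item nodes.
var doc = makeNode("rss", [
  makeNode("channel", [
    makeNode("item", [makeNode("title")]),
    makeNode("item", [makeNode("title")])
  ])
]);

console.log(getNodes(doc, "item").length); // 2
```

Even though the two <item> nodes sit one level down inside <channel>, the recursion finds them both.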
I just think that is pretty amazing. After one small block of code we end up with an array consisting of all the nodes named “item”. We are far from done at this point however. Now we have to make a second pass to extract the content from the title, link and description nodes. The method for this task is very similar to the one we just examined.
The Second Pass
The first function has found all the <item> nodes and inserted them into an array, but each of these array elements is itself quite a complex XML construct containing the three child nodes we are actually interested in plus quite a bit more. We still need to present the information, and to do that we need to pull each element out of the “nodes” array and extract its text data. It is for this purpose that I wrote another method I called “extractContent()”. This is a modified version of getNodes(). Like getNodes(), it searches the source node recursively for a match to "name", but what it returns when it does find a match is the text content of that node. I don’t feel the need to go line by line since it is so similar to the previous method. Here it is:
private function extractContent(source:XMLNode, name:String):String {
    var nodeTxt:String = "";
    var c:XMLNode = source.firstChild;
    while (c) {
        if (c.nodeType != 3) {
            if (c.nodeName == name) {
                nodeTxt = c.firstChild.nodeValue;
            }
            nodeTxt += extractContent(c, name);
        }
        c = c.nextSibling;
    }
    return nodeTxt;
}
To use this, we would define three new arrays to hold specific content. The extractContent() method would be invoked three times on each <item> element: once for each piece of content we want. (Since we want title, link and description, we call it three times.) Note that in this method we get the actual text rather than a node (using c.firstChild.nodeValue).
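The second pass can be sketched the same way in plain JavaScript. The element() and textNode() helpers below are invented stand-ins for Flash’s XMLNode, and the sample <item> content is made up; the extractContent() function itself is a direct port of the AS2 method above:

```javascript
// Minimal XMLNode stand-ins (invented for illustration).
function textNode(value) {
  return { nodeType: 3, nodeValue: value, firstChild: null, nextSibling: null };
}
function element(name, children) {
  var node = { nodeName: name, nodeType: 1, firstChild: null, nextSibling: null };
  for (var i = children.length - 1; i >= 0; i--) {
    children[i].nextSibling = node.firstChild;
    node.firstChild = children[i];
  }
  return node;
}

// Port of extractContent(): find the named child and return its text.
function extractContent(source, name) {
  var nodeTxt = "";
  var c = source.firstChild;
  while (c) {
    if (c.nodeType != 3) {
      if (c.nodeName == name) {
        nodeTxt = c.firstChild.nodeValue;
      }
      nodeTxt += extractContent(c, name);
    }
    c = c.nextSibling;
  }
  return nodeTxt;
}

// One <item> node; the second pass calls extractContent() three times on it.
var item = element("item", [
  element("title", [textNode("Hello")]),
  element("link", [textNode("http://example.com/")]),
  element("description", [textNode("A story.")])
]);

var titles = [], links = [], descriptions = [];
titles.push(extractContent(item, "title"));
links.push(extractContent(item, "link"));
descriptions.push(extractContent(item, "description"));
console.log(titles[0]); // "Hello"
```

In the real movie this trio of calls would sit inside a loop over the array of <item> nodes that getNodes() returned.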
Summary
Assuming your head is swimming at this point, here is a summary of how this approach attacks the problem:
There are a total of four arrays: one to hold the <item> nodes, and one each to hold the text contents of the <title>, <link> and <description> child nodes. In the first pass, getNodes() travels the entire document searching for <item> nodes and adding them to an array. In the second pass extractContent() is called on each of those <item> Objects as many times as there are things we are interested in, adding that extracted text content to the other arrays.
Once that second pass has completed it is easy to build the output string which is passed on to the TextArea component for your site’s visitor to read.
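That final assembly step can be sketched in plain JavaScript as well. The sample array contents are invented; the loop simply interleaves the three arrays into the HTML format shown at the top of this page:

```javascript
// Sample content as it might sit in the three arrays after the second pass.
var titles = ["First story", "Second story"];
var links = ["http://example.com/1", "http://example.com/2"];
var descriptions = ["Details one.", "Details two."];

// Assemble the HTML string in the format the TextArea expects:
// <headline><a href="LINK">TITLE</a></headline><p>DESCRIPTION</p>
var output = "";
for (var i = 0; i < titles.length; i++) {
  output += "<headline><a href=\"" + links[i] + "\">" + titles[i]
          + "</a></headline><p>" + descriptions[i] + "</p>";
}
console.log(output);
```

Because the three arrays were filled in step, index i always refers to the title, link and description of the same <item>.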
Of course there is more to making a Flash movie than this parsing engine. The project files download has a fully functioning RSS reader based on this code which should make perfect sense to you now.
This approach works, and works quite well. It certainly works far better than my first inflexible, rules-based approach. We could stop here and be pleased with ourselves, but it bothered me that the data had to be evaluated multiple times in order to get at what we want, and it bothered me that there was the residue of four arrays which stored redundant information. Was there a way to get the stuff we want in one pass without storing it in arrays along the way? The next page shows a way to do just that.