Tuesday 25 January 2011

Recusrsion -tree walkers

There was these tree walkers fell off a cliff. Boom, boom -splat!  I'll get my coat.

Much data in computers (including this web page) is stored in tree (or graph, a tree is a particular kind of graph) structures.
Tree Structure of an HTML Page

In the diagram above the <html> element is the parent of the <head> element and the <title> element is one of the children of that <head> element.

So how do you find elements in a tree? Say I want to do something to all the <p> paragraph tags (which are at different levels) , the only way I can manipulate them is to check every element and see if it's a <p> tag and then do what I want to do.

Recursion is a classic method of walking a tree elegantly and with a minimal knowledge of the size and configuration of that tree. Recursive functions are ones that call themselves again and again to process data -but (and it'd a big but, bigger than Mr Creosote's) they know when to stop. A simple algorithm to process <p> elements might look like :

    func find_p($element) {
              if($element->name == 'p') {
                   do_stuff()
              }
              while($child = $element->get_child()) {
                     find_p($child);
              }
         }

You would kick this off by calling it with the <html> element, it's not a <p> so it won't do any processing, just get all of <html> element's children in turn and call itself for each of them.for each of those children it will check if they are a <p> and then get each of their children....

If an element has no children then the function just returns, effectively going back up a level.

Hopefully you can see that the function will go down the tree left side first, so <html> -> <head>-><title>, <title> has no children so the function returns to the calling while loop and <head>s next child <link> ...

At no point in the program have I had to know how many children an element has or how wide or deep the tree is.

You can see recursion in action  in the Biomorph program where the method draw_morph() calls itself recursively to create the branching morphs.

NB in the real XML/HTML world you will probably find that the hard work is done for you, if I wanted to process all the <p> tags in a document I'd perhaps use XPath . If you do need to parse an XML document element by element (and I've had to) then recursion would be the way to go.

A word of warning, recursion can be resource heavy, all those function call will reserve resources that won't be freed up until the function returns, if you have a lot of depth and are passing around a lot of data the effect can be significant and your computer could freeze.

No comments:

Post a Comment

linkedin