HXT Arrow Lessons


0. Arrows

Albert Y. C. Lai

What Are Arrows?

Many Haskellers begin by asking "what are arrows?" I won't answer this question. It distracts from the theory and practice.

First Arrow Program

Take a deep breath.

Now we plunge right into our first program using arrows. Don't panic! It is short, and I will explain what it does and why.

So, take a deep breath again, and here we go! You can also download it as lesson-0.hs.

import Text.XML.HXT.Core

play :: Integer -> IO ()
play arg =
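    do { results <- runX (dumb arg)  -- run the arrow, collect every output
       ; print results
       }

dumb :: Integer -> IOSArrow XmlTree Integer
dumb n =
    arr (const n) <+> arr (const (n+1))  -- emit n and n+1
    >>>
    returnA <+> arr (* 2)                -- for each, emit it and its double
    >>>
    isA (<= 60)                          -- keep only numbers <= 60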

This program inputs an integer and outputs a list of more integers. If the input is n, the output is [n,2n,n+1,2n+2] --- but with a catch: numbers above 60 are thrown away. This dumb example performs no XML processing, but it helps bootstrap a mental model of arrows in HXT. If you see what's going on in this dumb exercise, the real XML processors will fail to intimidate you! So, please try to enjoy it...

How to enjoy it? Load it at a GHCi prompt and call play:

    Prelude> :load lesson-0.hs
    *Main> play 30
    [30,60,31]

Run it with various inputs and watch the outputs, modify it for variations, stare at it, look up the arrow and HXT docs for the functions used... until you are thoroughly satisfied or utterly confused. Or just itchingly impatient. Then you're ready for my explanation.

Anatomy of The Program

Now let's examine the program in pieces.

dumb :: Integer -> IOSArrow XmlTree Integer
dumb 30 :: IOSArrow XmlTree Integer

I'll spend some time on this type signature first. It is very important to know how to read it, because similar type signatures are everywhere among the real XML arrows. I show the type of dumb itself and the type of dumb applied to an integer. The latter says: an HXT arrow that takes a document tree as input and produces integers as output.

In general, most HXT arrows have types of the form IOSArrow x y and it says: an HXT arrow that takes input of type x and produces output of type y. As a first step, you can understand it as a function from x to y. What's more, in the case of HXT, it is a multiple-valued function: internally it produces a list of y's rather than a single y. This is useful in many ways. Beware: I am speaking of HXT arrows specifically here, not arrow types in general; not all arrow types fit this mental model.
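To make the "multiple-valued function" reading concrete without needing HXT installed, here is a minimal sketch that assumes we model an arrow from x to y as a plain function x -> [y], wrapped in the Kleisli [] type from base's Control.Arrow. ListArrow, neighbours and runList are hypothetical names for illustration only; this model keeps HXT's list behaviour but none of its state or IO.

```haskell
import Control.Arrow (Kleisli (..))

-- Hypothetical model (not an HXT type): an arrow from x to y is a
-- function from x to a list of y's, wrapped in base's Kleisli newtype.
type ListArrow x y = Kleisli [] x y

-- A multiple-valued "function": one input, two outputs.
neighbours :: ListArrow Integer Integer
neighbours = Kleisli (\n -> [n - 1, n + 1])

-- Running the model just unwraps the function and applies it.
runList :: ListArrow x y -> x -> [y]
runList = runKleisli
```

Loaded in GHCi, runList neighbours 30 should give [29,31]: one input, a list of outputs.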

(IOSArrow x y is a shorthand for IOStateArrow () x y. More on this in a later lesson.)

In the case of dumb, y is Integer by our choice, but x has to be XmlTree because we use the HXT function runX to run dumb, and runX requires that. For other arrows, such as those inside dumb, we are free to choose x, as long as everything fits together.

An HXT arrow is not just a pure function, but also capable of various side effects. We'll have a chance to meet them in later lessons.

Now is a good time to see what our dumb arrow does.

dumb n =
    arr (const n) ...  -- this line is IOSArrow XmlTree Integer

The job of arr is to convert an ordinary function into an arrow in the most expected way: the arrow behaves like the function. Our function here has type XmlTree -> Integer, and so the arrow has type IOSArrow XmlTree Integer. Our function is a constant function, mapping everything to the number 30 (let's say n is 30), and so the arrow outputs 30 for any input. (Yes, we are ignoring the input here.) But an HXT arrow is supposed to output a list. So the actual output is [30].
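Under the same list-valued model (an assumption: base's Kleisli [] standing in for IOSArrow), arr wraps the function so that each input yields a one-element list. The sketch leaves the input type polymorphic where HXT would fix it to XmlTree, to show that const really ignores it; constArrow is a hypothetical name.

```haskell
import Control.Arrow (Kleisli (..), arr)

-- arr lifts an ordinary function; for Kleisli [] the single result is
-- wrapped in a one-element list. The input type is left polymorphic
-- (HXT would use XmlTree) because const ignores the input anyway.
constArrow :: Integer -> Kleisli [] input Integer
constArrow n = arr (const n)
```

In GHCi, both runKleisli (constArrow 30) () and runKleisli (constArrow 30) "anything" should give [30].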

(Although we don't use the input, you may be itching to know what is in it. It is provided by runX, and it is an XmlTree document tree consisting of one root node with no children.)

    arr (const n) <+> arr (const (n+1))

The job of <+>, in HXT, is to run two arrows with the same input and concatenate the two output lists. (Again, this mental model is specific to HXT and does not apply to all arrow types.) Thus, the arrow on the left produces [30], and the arrow on the right produces [31], and so the whole line produces [30,31].
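Here is a minimal sketch of the same line in the Kleisli [] model (again an assumption, not HXT itself). base defines ArrowPlus for Kleisli m whenever m is a MonadPlus, and for lists mplus is (++), so <+> behaves as described: same input to both arrows, outputs concatenated. bothConstants is a hypothetical name.

```haskell
import Control.Arrow (Kleisli (..), arr, (<+>))

-- For Kleisli [], (<+>) gives the same input to both arrows and
-- appends their output lists (mplus for lists is (++)).
bothConstants :: Integer -> Kleisli [] input Integer
bothConstants n = arr (const n) <+> arr (const (n + 1))
```

With n = 30, runKleisli (bothConstants 30) () should give [30,31], matching the [30] ++ [31] reasoning above.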

    arr (const n) <+> arr (const (n+1))
    >>>

The job of >>> is to chain up arrows. If you write f>>>g, the output of f (upstream) becomes the input of g (downstream). But wait, f outputs a list, and g takes only one item, what's going on? Answer: g will be run multiple times, once for each item in the output list of f; furthermore, the output lists from the multiple runs of g are concatenated to form one big output list. Whew! (If this reminds you of the list monad, yes it's the same deal.) If this is still unclear, it will become apparent when we look at one more line of code for concreteness:
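The list-monad remark can be checked directly in the Kleisli [] model (an assumption standing in for HXT arrows): composing Kleisli arrows with >>> is the monad's bind, which for lists is concatMap. In this sketch f and g are hypothetical small arrows, chosen so each stage produces more than one output.

```haskell
import Control.Arrow (Kleisli (..), (>>>))

-- Hypothetical upstream: ignores its input and emits two numbers.
f :: Kleisli [] () Integer
f = Kleisli (\_ -> [1, 2])

-- Hypothetical downstream: two outputs for every input.
g :: Kleisli [] Integer Integer
g = Kleisli (\x -> [x, x * 10])

-- Chaining with (>>>): g runs once per output of f, results concatenated.
viaArrow :: [Integer]
viaArrow = runKleisli (f >>> g) ()

-- The same computation by hand: concatMap, i.e. the list monad's (>>=).
viaConcatMap :: [Integer]
viaConcatMap = concatMap (runKleisli g) (runKleisli f ())
```

Both values should be [1,10,2,20]: g turns 1 into [1,10] and 2 into [2,20], and the two lists are joined.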

    arr (const n) <+> arr (const (n+1))
    >>>
    returnA <+> arr (* 2)  -- :: IOSArrow Integer Integer

So here I have to explain two things: what the downstream does in isolation, and what the chaining does in whole.

What the downstream does: returnA passes the input to the output without change (except the output has to be a list), so if you input 30 you get [30]. arr (* 2) multiplies the input by 2, so if you input 30 you get [60]. Combining these two with <+>, if you input 30 you get [30,60].

What the chaining does: The upstream emits [30,31]. Give 30 to the downstream, get [30,60]; give 31 to the downstream, get [31,62]. Combine, get [30,60,31,62]. Whew!
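The same two stages can be replayed in the Kleisli [] model (an assumption: it reproduces HXT's list behaviour, not its IO or state) to confirm the arithmetic. firstTwoStages is a hypothetical name; the layout mirrors dumb, and Control.Arrow's fixities (<+> binds tighter than >>>) make each line one stage.

```haskell
import Control.Arrow (Kleisli (..), arr, returnA, (<+>), (>>>))

-- The first two stages of dumb, without the final filter.
firstTwoStages :: Integer -> Kleisli [] input Integer
firstTwoStages n =
    arr (const n) <+> arr (const (n + 1))  -- upstream: [n, n+1]
    >>>
    returnA <+> arr (* 2)                  -- per item: [x, 2x]
```

runKleisli (firstTwoStages 30) () should give [30,60,31,62], and the downstream alone applied to 31 should give [31,62].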

So up to now you have a pretty good idea why I promised the program to output [n,2n,n+1,2n+2]. But I also promised to throw numbers above 60 away, so let's see how.

    ... something outputting four numbers
    >>>
    isA (<= 60)  -- :: IOSArrow Integer Integer

isA tests the input against the given predicate, in this case (<= 60): if the test passes, the input is passed to the output; if the test fails, the output list is empty. So for example, with input 30, the output is [30]; with input 62, the output is []. Combined with >>>, the effect is letting through certain inputs and blocking others. E.g., if the upstream gives [30,60,31,62], the downstream is run four times, once for each of the numbers, and the outputs are [30], [60], [31], and [], respectively; combining, the result is [30,60,31].
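To see the filtering and the whole pipeline together without HXT, here is a sketch in the Kleisli [] model. isA' is a hypothetical stand-in for HXT's isA, written to match the behaviour described above (input passed through as a one-element list if the predicate holds, empty list otherwise); dumbModel is likewise a hypothetical name.

```haskell
import Control.Arrow (Kleisli (..), arr, returnA, (<+>), (>>>))

-- Hypothetical stand-in for isA: [x] if the predicate holds, else [].
isA' :: (b -> Bool) -> Kleisli [] b b
isA' p = Kleisli (\x -> [x | p x])

-- The whole lesson-0 pipeline in the model.
dumbModel :: Integer -> Kleisli [] input Integer
dumbModel n =
    arr (const n) <+> arr (const (n + 1))
    >>>
    returnA <+> arr (* 2)
    >>>
    isA' (<= 60)
```

runKleisli (dumbModel 30) () should give [30,60,31], the same list play 30 prints.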

Now that we see what the dumb arrow can do, let's see how to bring it to life.

play :: Integer -> IO ()
play arg =
    do { results <- runX (dumb arg)
       ; print results
       }

runX is the most convenient way to execute HXT arrows. Its type signature says a lot:

    runX :: IOSArrow XmlTree y -> IO [y]

It executes an HXT arrow and brings the results back to the IO world. Since an HXT arrow outputs a list internally, the IO world receives a list too. The input type of the arrow has to be XmlTree, and the input value is provided by runX. Usually this input goes unused (e.g., the arrow will read an XML file elsewhere instead), so we will pay little attention to it. (But again, if you're curious, it's a document tree with a root and no children.)
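As an analogy only, a runX-like runner for the Kleisli [] model can be sketched: it feeds a fixed dummy input (here () plays the role of runX's empty document root) and returns the whole output list in IO. runModel and playModel are hypothetical names; the real runX additionally sets up HXT's state and runs IO effects.

```haskell
import Control.Arrow (Kleisli (..), arr, (<+>))

-- Hypothetical analogue of runX: supply a fixed dummy input and
-- return every output, as a list, in IO.
runModel :: Kleisli [] () y -> IO [y]
runModel a = return (runKleisli a ())

-- Hypothetical analogue of play: run the arrow, then print the results.
playModel :: Integer -> IO ()
playModel n = do
  results <- runModel (arr (const n) <+> arr (const (n + 1)))
  print results
```

playModel 30 should print [30,31].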

Our dumb arrow outputs [30,60,31] (if n is 30). runX runs our arrow and ends up with that list. Then we print it and that's what we get.

I encourage you to experiment with various modifications to this program to increase and verify your understanding. Add more stuff or block more stuff in the arrow, for example.
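As one possible variation, sketched in the Kleisli [] model so it runs without HXT: add n+2 upstream ("add more stuff") and drop odd numbers at the end ("block more stuff"). isA' and dumbVariant are hypothetical names; in lesson-0.hs itself the same edit should work by writing isA where this sketch uses isA'.

```haskell
import Control.Arrow (Kleisli (..), arr, returnA, (<+>), (>>>))

-- Hypothetical stand-in for isA: [x] if the predicate holds, else [].
isA' :: (b -> Bool) -> Kleisli [] b b
isA' p = Kleisli (\x -> [x | p x])

dumbVariant :: Integer -> Kleisli [] input Integer
dumbVariant n =
    arr (const n) <+> arr (const (n + 1)) <+> arr (const (n + 2))
    >>>
    returnA <+> arr (* 2)
    >>>
    isA' (<= 60)
    >>>
    isA' even
```

With n = 30 the stages produce [30,31,32], then [30,60,31,62,32,64], then [30,60,31,32], and finally [30,60,32].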

New Friends from This Lesson

    Name     From module                        Summary
    arr      Control.Arrow (GHC)                converts an ordinary function to an arrow
    returnA  Control.Arrow (GHC)                no-op pass-through arrow
    >>>      Control.Arrow (GHC)                chains arrows
    <+>      Control.Arrow (GHC)                fission: runs two arrows on the same input and concatenates their outputs
    isA      Control.Arrow.ArrowList (HXT)      filtering by a predicate
    runX     Text.XML.HXT.Arrow.XmlStateArrow   executes an HXT arrow and brings results back to the IO world