Tuesday, 29 May 2007

Poor Web Standards in WebCT Content/Learning Modules

One annoyance we've encountered a number of times with WebCT (both our "old" WebCT 4 Campus Edition and "new" WebCT Vista) is its surprisingly poor support for basic web standards in its Learning Modules (formerly known as Content Modules).

For those who are unaware, a "Learning Module" is essentially a bundle of learning materials that WebCT aggregates into a tree structure for easy navigation. At its simplest, this allows you to build up some kind of structure from a bunch of disparate standalone resources such as PDF files, PowerPoint presentations, images and suchlike. At its richest, it allows you to build complex learning structures from linked hypertext resources like HTML files. It's at this end of the spectrum that significant flaws start to appear in WebCT's delivery of these modules.

The content we deliver in Physics 1A is highly granular, with a lot of links to related material. The core of the material is highly mathematical and is deployed as standards-compliant XHTML+MathML using our Aardvark Course Content Management System.

While we found it easy to get WebCT to deliver "single pages" of this type of content, we soon realised it was impossible to properly use this type of material inside Learning Modules.

How WebCT Learning Modules Work

WebCT classifies content inside Learning Modules as either:
  1. "HTML"
  2. Not "HTML"
Both types of content can be added to the module's Table of Contents but WebCT does different things with them:
  1. HTML content is dynamically altered as it is delivered to add in some JavaScript and rewrite hyperlinks so that the WebCT navigation and breadcrumb frames are all updated correctly. Unfortunately, WebCT does this in a bizarrely cack-handed fashion and breaks all standards-compliant HTML or XHTML in the process. Most people don't notice this as most of the web is made of broken HTML and browsers are therefore very good at handling this kind of stuff by going into something usually called "quirks" mode. But this is a showstopper for us as our mathematically rich content must be delivered as well-formed XHTML+MathML in order for browsers to render it correctly. (It also doesn't help in making your content accessible.)
  2. Non-HTML content is delivered unchanged by WebCT so doesn't get mangled like HTML content and actually displays correctly. However, hyperlinks followed from these pages do not correctly update the WebCT navigation and breadcrumb frames and, for complex bodies of material like ours, this is a huge usability flaw.
As a result, we ended up having to recreate the Learning Module functionality using a lot of client-side JavaScript trickery and link to it from WebCT. (A positive outcome from this is that our content can be deployed to any web server as a rich, fully integrated frameset without requiring any server-side software, which is actually quite nice.)
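
To make the well-formedness point a bit more concrete, here is a minimal Java sketch that checks whether a chunk of markup parses as XML, which is roughly the test a browser applies before it will render XHTML+MathML. The class name WellFormedCheck and the sample strings are made up for illustration; in particular, the "mangled" sample is not real WebCT output, just the kind of unescaped inline script that stops an XML parser in its tracks.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {

    /** Returns true if the markup parses as well-formed XML, false otherwise. */
    static boolean isWellFormed(String markup) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        // DefaultHandler ignores content and rethrows fatal parse errors.
        reader.setContentHandler(new DefaultHandler());
        reader.setErrorHandler(new DefaultHandler());
        try {
            reader.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical page fragment as authored.
        String original = "<p>Kinetic energy<br/></p>";
        // Hypothetical fragment after a script with a raw '&&' has been injected.
        String mangled = "<p>Kinetic energy<script>if (a && b) {}</script></p>";
        System.out.println("original well-formed: " + isWellFormed(original));
        System.out.println("mangled well-formed:  " + isWellFormed(mangled));
    }
}
```

A quirks-mode HTML parser would shrug off the second fragment, which is why most people never notice this sort of damage.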

If you reflect on this for a minute though, this situation is actually absurd: a web-based Virtual Learning Environment that claims to be serious about supporting standards can't even support the most basic web standard - HTML - correctly. The sad thing is, it would be reasonably easy for WebCT to be a bit more sophisticated about its HTML handling and fix this issue. Let's see if it ever happens...

More details on this issue can be found at a short note we wrote.

Wednesday, 23 May 2007

XML Pipelining in Aardvark

One thing I've tried to do with Aardvark is identify where bits of code are truly reusable and factor them out so that they can be used in other projects. From this, we've built up some nice general utility classes for doing stuff with Strings and Objects (everyone else has probably done this too!), some helper classes for doing nice things with XML and a simple framework for doing cheap and efficient databinding (that is, converting Objects to and from XML). These reusable classes are collected under the uk.ac.ed.ph.aardvark.commons package hierarchy.

One such generalisation that's proved really useful is our class for doing XML pipelining (uk.ac.ed.ph.aardvark.commons.xml.XMLPipeline).

What's XML pipelining?

All of the text-based Knowledge Objects in Aardvark are ultimately stored as XML, which is great for representing the underlying structure of the content. (For example, lists, paragraphs, key points, mathematics, ...). On its own, this XML is a bit abstract so needs to be processed to turn it into the various outputs that Aardvark produces (e.g. nice web pages, digital overheads, PDF files). XML pipelining basically works like a traditional factory conveyor belt: the raw XML gets passed along the conveyor belt and gets gradually refined into the target output format. Why do it like this? Well, the factory analogy applies here too. People in a factory generally get very good at doing one thing repetitively and that works with XML pipelining too - we can create "pipeline steps" that do a single thing rather well, and then join all of the required steps together to build up something more complex. This is good for a number of reasons:
  • Breaking a complex process down into steps makes it easier to work with;
  • Individual steps are usually simple so can be verified to work correctly and do their job well;
  • Steps can be reused in related pipelines;
  • Steps are often so general that they can be refined for reuse in other projects.

How XML Pipelining works

(Warning: the rest of the post is very geeky!)

An XML pipeline normally consists of 3 components:
  1. A "source": that is, information flowing into the pipeline. In Aardvark, we assume that this is something which generates a stream of SAX events. (e.g. a SAX parser)
  2. Zero or more "handlers": these take incoming SAX events, do stuff to them, and send possibly different SAX events on to the next handler.
  3. A "serializer": This takes incoming SAX events and turns them into some kind of finished article. For example, it might create an XML document file or even use the incoming SAX events to build a Java Object or perform some kind of configuration work.
Not all components are necessary. For example, you can have a pipeline with no serializer. In this case, all of the data will "wash away" as it falls out the bottom of the pipeline. That sounds daft but can be useful if some of the handlers are building up information about the incoming data, such as hunting out hyperlinks or suchlike. An explicit source is also optional: we can simply fire SAX events directly at the first handler in the pipeline. We can also have pipelines with no handlers, which means that the data flowing out will be exactly the same as the data flowing in. Again, this sounds daft but can be a simple way of turning incoming SAX events into an XML document and is used in the Aardvark databinding classes. (The vanilla XML APIs in Java make this more awkward than it should be!)
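
As a rough illustration of the "no source, no handlers" case, the sketch below fires a handful of SAX events straight at a serializer using only the standard JAXP classes. The class name SaxToXml and the little <note> document are made up for this example, and this is not the Aardvark databinding code, just the same idea in miniature.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

class SaxToXml {

    /** Fires hand-made SAX events at an identity serializer and returns the XML text. */
    static String buildDocument() throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        // With no stylesheet, the handler just copies incoming events to its Result.
        TransformerHandler serializer = factory.newTransformerHandler();
        serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.setResult(new StreamResult(out));

        // No parser involved: we are the "source" of the SAX events.
        serializer.startDocument();
        AttributesImpl attrs = new AttributesImpl();
        attrs.addAttribute("", "id", "id", "CDATA", "n1");
        serializer.startElement("", "note", "note", attrs);
        char[] text = "Hello".toCharArray();
        serializer.characters(text, 0, text.length);
        serializer.endElement("", "note", "note");
        serializer.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(buildDocument());
    }
}
```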

What kind of handlers can we use?

Handlers generally fall into 2 categories:
  1. A SAX filter. This is a low-level filter that simply receives SAX events, does stuff to them and fires out new SAX events. SAX filters are great if you want to make minor perturbations to a document (e.g. do something to hyperlinks, miss sections out).
  2. An XSLT transform. This lets you make really major changes to the incoming data. In Aardvark, we use these to go from the "raw" document formats to more polished output formats. XSLT is much more expensive than SAX but is often necessary and actually performs very well, especially if you reuse your stylesheets.
It's common for there to be a mixture of these two types of handler in a pipeline. Be aware that most XSLT processors will build a DOM tree from incoming SAX events so it makes sense to group XSLT handlers together and have SAX stuff before and/or after all of the XSLT.
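
Here is a small sketch of the first kind of handler: a SAX filter that "misses sections out" by swallowing every event inside a named element. SkipElementFilter and the helper method textAfterFiltering are names invented for this example and are not part of Aardvark.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

class SkipElementFilter extends XMLFilterImpl {
    private final String skippedName;
    private int depth = 0; // greater than 0 while inside a skipped element

    SkipElementFilter(String skippedName) {
        this.skippedName = skippedName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (depth > 0 || qName.equals(skippedName)) {
            depth++; // swallow this element and track nesting
        } else {
            super.startElement(uri, localName, qName, atts);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (depth > 0) {
            depth--;
        } else {
            super.endElement(uri, localName, qName);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (depth == 0) {
            super.characters(ch, start, length);
        }
    }

    /** Parses the XML through the filter and returns the text that survives. */
    static String textAfterFiltering(String xml, String skippedName) throws Exception {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        SkipElementFilter filter = new SkipElementFilter(skippedName);
        filter.setParent(parser);
        StringBuilder text = new StringBuilder();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(
                textAfterFiltering("<doc>keep <aside>drop <b>this</b></aside>too</doc>", "aside"));
    }
}
```

Because the filter only tracks a nesting counter, it never holds more than one event in memory, which is what makes this kind of step so cheap compared with XSLT.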

XML pipelining in Java

It's possible and fairly easy to do XML pipelining using the existing Java APIs but it's not quite as nice as it should be. One reason for this is that setting up a pipeline often requires a mixture of the standard SAX API and Java TrAX API (used for XSLT) and, being designed by two completely different bodies, they're not at all alike: a filter handler is represented by the org.xml.sax.XMLFilter interface; an XSLT handler is represented by the javax.xml.transform.sax.TransformerHandler interface. Making the pipeline work consists of configuring each handler to ensure it passes its output on to the next handler in the pipeline and the resulting code can be a bit messy. This is where XMLPipeline comes in.
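
For comparison, a hand-wired pipeline using nothing but the standard APIs might look something like the sketch below: a parser feeds a SAX filter, which feeds an XSLT step that writes to a String. The class name ManualPipeline, the upper-casing filter and the tiny stylesheet are all invented for the example. Note how the filter is wired with setContentHandler() while the XSLT step is wired with setResult().

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

class ManualPipeline {
    // Illustrative stylesheet: turns each <item> of a <list> into a <p>.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/list'><body><xsl:apply-templates select='item'/></body></xsl:template>"
        + "<xsl:template match='item'><p><xsl:value-of select='.'/></p></xsl:template>"
        + "</xsl:stylesheet>";

    static String run(String xml) throws Exception {
        // Source: a namespace-aware SAX parser (XSLT needs namespace information).
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true);
        XMLReader parser = parserFactory.newSAXParser().getXMLReader();

        // Handler 1: a SAX filter that upper-cases all character data.
        XMLFilterImpl upperCase = new XMLFilterImpl(parser) {
            @Override
            public void characters(char[] ch, int start, int length) throws SAXException {
                char[] upper = new String(ch, start, length)
                        .toUpperCase(Locale.ROOT).toCharArray();
                super.characters(upper, 0, upper.length);
            }
        };

        // Handler 2: an XSLT transform, which also serializes to a String here.
        SAXTransformerFactory tf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler transform =
                tf.newTransformerHandler(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transform.setResult(new StreamResult(out));

        // Wiring: two different APIs, two different ways of saying "send output here".
        upperCase.setContentHandler(transform);
        upperCase.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<list><item>a</item><item>b</item></list>"));
    }
}
```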

Our uk.ac.ed.ph.aardvark.commons.xml.XMLPipeline class

The design of XMLPipeline is intentionally simple. (My first stab at this tried too hard to be clever and suffered as a result, so I learned from the mistakes made there!) It follows the 'builder' design pattern and is just a thin wrapper over all of the gubbins we usually need to do pipelining. Its main advantage is that it makes it really easy to assemble a pipeline, making the resulting code very easy to understand, less prone to errors and easier to change later.

To get started, create a new XMLPipeline(). You can then build the pipeline by adding a number of handlers using zero or more of the following methods:
  1. addFilterStep() lets you add a SAX filter to the pipeline. This is overloaded to accept either a "standard" org.xml.sax.helpers.XMLFilterImpl or a more general "lexical" filter (uk.ac.ed.ph.aardvark.commons.xml.XMLLexicalFilterImpl). The difference between the 2 filters is that the latter also receives information about comments, entities and DTDs.
  2. addTransformStep() lets you add an XSLT transform to the pipeline. This is overloaded to take either an implementation of javax.xml.transform.Source, which locates the stylesheet to be read in or loaded, or a javax.xml.transform.Templates, which is a stylesheet that has already been compiled for reuse.
Calls to these methods simply ensure that each handler gets configured to pass its output to the next handler "downstream".

Once you've added a number of handlers, you can choose to terminate the pipeline as follows:
  1. addSerializer() will serialize the resulting XML into the javax.xml.transform.Result you pass to this method. This is the most common way of terminating the pipeline - passing a javax.xml.transform.stream.StreamResult allows you to save the resulting XML to a String or file, which is a common use scenario.
  2. addTerminalStep() takes a generic SAX org.xml.sax.ContentHandler or org.xml.sax.ext.LexicalHandler and makes that the receiver of the pipeline's output. This can be useful if you want to plug a pipeline into another pipeline or someone else's SAX input.
Once you've added a terminal step, the pipeline will not allow you to add any more handlers. You can also choose not to terminate the pipeline, as mentioned earlier.

Once set up, you can run the pipeline in two ways:
  • Call execute() passing either a java.io.File or org.xml.sax.InputSource. This will parse the incoming XML and pass it through the pipeline.
  • Call getStep(0) to receive the first handler in the pipeline and fire your own SAX events at it. (This is how our Object -> XML databinding works.)
And that's it! It's nice and easy. The XMLPipeline class also tries to help with any runtime XSLT errors by unravelling any Exceptions that are produced; in normal pipelines they tend to get wrapped up by each step in the pipeline and get lost in stacktrace noise. For other goodies, have a look at the JavaDoc or source.
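
To give a flavour of the builder idea without reproducing the real thing, here is a stripped-down sketch written purely for this post. SketchPipeline is NOT the Aardvark XMLPipeline class: it borrows a few of the method names described above, but the signatures, the internals and the demo() helper are guesses and inventions for illustration only. It leaves out lexical filters, addTerminalStep(), getStep() and the Exception unravelling.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical mini builder; not the Aardvark implementation. */
class SketchPipeline {
    private final SAXTransformerFactory tf =
            (SAXTransformerFactory) TransformerFactory.newInstance();
    private ContentHandler first;              // head of the pipeline
    private Consumer<ContentHandler> linkNext; // points the current tail at a new step
    private boolean terminated;

    private void append(ContentHandler step, Consumer<ContentHandler> linker) {
        if (terminated) throw new IllegalStateException("Pipeline already terminated");
        if (first == null) first = step; else linkNext.accept(step);
        linkNext = linker;
    }

    SketchPipeline addFilterStep(XMLFilterImpl filter) {
        append(filter, filter::setContentHandler);
        return this;
    }

    SketchPipeline addTransformStep(Source stylesheet) throws TransformerConfigurationException {
        TransformerHandler handler = tf.newTransformerHandler(stylesheet);
        append(handler, next -> handler.setResult(new SAXResult(next)));
        return this;
    }

    SketchPipeline addSerializer(Result result) throws TransformerConfigurationException {
        TransformerHandler serializer = tf.newTransformerHandler(); // identity copy
        serializer.setResult(result);
        append(serializer, next -> { throw new IllegalStateException("terminated"); });
        terminated = true;
        return this;
    }

    void execute(InputSource input) throws Exception {
        // An unterminated pipeline lets its output "wash away" into a no-op handler.
        if (!terminated && linkNext != null) linkNext.accept(new DefaultHandler());
        SAXParserFactory pf = SAXParserFactory.newInstance();
        pf.setNamespaceAware(true);
        XMLReader reader = pf.newSAXParser().getXMLReader();
        reader.setContentHandler(first != null ? first : new DefaultHandler());
        reader.parse(input);
    }

    /** Demo: filter step, then XSLT step, then serialization to a String. */
    static String demo() throws Exception {
        String xslt =
              "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:template match='item'><p><xsl:value-of select='.'/></p></xsl:template>"
            + "</xsl:stylesheet>";
        StringWriter out = new StringWriter();
        new SketchPipeline()
                .addFilterStep(new XMLFilterImpl()) // pass-through filter
                .addTransformStep(new StreamSource(new StringReader(xslt)))
                .addSerializer(new StreamResult(out))
                .execute(new InputSource(new StringReader("<list><item>a</item></list>")));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

The point of the exercise is the demo() method: the wiring differences between filters and transforms are hidden inside the builder, so assembling the pipeline reads as a plain list of steps.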

Friday, 11 May 2007

Learning Content 2.0

There's been a lot of excitement about "Web 2.0" in e-Learning circles over the last year or so. Some of this is undoubtedly hot air but there's a lot of interesting stuff happening and I find the social side of things very interesting. (At least, interesting enough to be trying it all out here!)

One area of e-Learning where Web 2.0 hasn't really permeated yet is in good-old "Learning Content". (This is the name we'll use for the stuff we supply to our students to support their learning.) I've always felt that Learning Content is thought of as untrendy and uninteresting in e-Learning circles and it doesn't get the attention it deserves. There are probably many reasons for this. In some disciplines, Learning Content is not actually all that important and the role of the educator is more concerned with "facilitating" or managing students into finding, curating and assimilating materials from elsewhere, be it the web or the library. Therefore, Learning Content is not seen as important as it's always something that can be imported from elsewhere.

In Physical and Mathematical sciences domains like ours, we've traditionally attached more importance to Learning Content and it's common for this to form a fundamental component of the teaching experience we give to students. Why's this? One reason is that, especially in early undergraduate courses teaching fundamental concepts, we feel that it's important to be very clear about what students need to know and the Learning Content we give them quite explicitly maps out the required depth and breadth profile they are required to follow.

The traditional form of Learning Content in our domain has been good old "Printed Lecture Notes". These are normally produced by lecturers, written in LaTeX and turned into PDF files using the normal LaTeX processing workflow. Early use of the web for teaching involved dumping these files on the web, which is equivalent to the (sadly) ubiquitous "PowerPoint slides on the web" that permeates a lot of so-called e-Learning even to this day. Many Courses haven't gone any further than this (and some don't really need to). With Physics 1A, we've gone further by offering richer, more interactive XHTML+MathML-based content for a good few years now. We think it's quite good and the students like it too.

Over the last few months, we've been thinking about how this can be improved. One criticism of the existing content is that it's fairly static and rigid. This was actually intentional as it allows the content to be deployed online (both within and outside VLEs) or offline with virtually no fuss and no effort. We now think this can be done better and that we can make it easier for students to navigate and organise what is actually quite a substantial body of content. So we're going to look at improving things for the students in this area. Example possibilities are letting students build "To Do lists" or attach notes or reminders to pages, better breadcrumb and contextual navigation and revision aids like "Build me a random self test and let me know my score".

We also want to see if we can make it easier for students to "make the content their own".
There are lots of things we can do here. One is to allow students to weave private annotations into the notes, for example, by writing down the conclusion of a rare but important "A ha!" or even "D'oh" moment. Similarly, we could also allow students to make public comments about sections in the notes in order to get feedback or assistance from their peers, making the notes a bit bloggy. Educators could use this feedback to improve notes for subsequent years. This draws on the increased use of Wikis in educational contexts and even things like the MySQL online handbook, which has been around for years now and is often highly praised for allowing the community to build on and enhance the core material within. There are lots of other possibilities which we plan to look into. All of these are made do-able by the way the existing material is constructed and deployed (using Aardvark) so we've already got a good foundation to build on.

Another aspect worth looking at is whether we can exploit students' use of social networking tools within an educational context. Our Physics 1A course usually attracts around 250 students and it's very difficult to form a sense of community in such a large course. If lots of our students currently use Facebook, then it's worth looking at whether we can use Facebook to help bond a little better. Public annotations in notes could very easily link to a little "profile" page for each student that has links to their favourite networking sites. That avoids us having to try to build our own networking tools (which probably won't work as well and won't be used) and is actually quite cheap for us to implement. Will students like this? Or will they resent us encroaching into aspects of their online lives that they consider to be separate from learning?

We're calling all of this "Learning Content 2.0" and will hopefully be able to study this in more detail pending funding becoming available! I'll leave it up to a tag cloud expert to create a formal definition of Learning Content 2.0... I'm off to have lunch, which I've been looking forward to all morning.

Thursday, 10 May 2007

Aardvark 0.13.3 hits the wild!

After a couple of months of fairly intensive work, I finally got the latest release of Aardvark out the door onto our teaching server last month. Yay! Today sees the 3rd update since then, which adds a number of conveniences when creating new Nodes that might benefit people who are new to Aardvark. It also adds a bit of debugging code to diagnose a bit of occasional (but minor) strangeness that has been identified on the teaching server. Hmmm...

I think there will be a few more minor releases over the next few weeks as Aardvark is currently being used in anger by a new user for a brand new Course and it's been quite interesting getting some feedback from newbies. (Actually, I've already found a silly bug in the caching code in the Aardvark Content Manager that will be fixed tomorrow so expect 0.13.4 to appear very soon. D'oh!)

Want to find out what's new in Aardvark? Then read the release notes. If you don't know what Aardvark is, you can find out at our e-Learning site.

Erm, welcome?!

Despite worries about contributing to global warming in the blogosphere, we thought it might be a fun idea to start a blog for our group in the School of Physics at the University of Edinburgh.

What possessed us? Well, it seemed like a good idea at the time! It's also a potentially useful experiment in a number of ways:
  • We're planning to do some work over the next few months concerning the use of social software and so-called "Web 2.0" tools by our students. Attempting to use this kind of stuff ourselves is the best way of finding out how well it all works. (Or doesn't...)
  • It's interesting to see how well we can use tools that are not explicitly provided or supported by the University's computing services for work purposes. How do we integrate them? How do we manage having some of our work stuff scattered over the internet? How do we export stuff from these tools if something else comes along? ...
  • I'm interested in how well a Blog works as a tool for reflection, learning and discussion within (and outwith) our group.
So, erm, welcome!