Saturday, 20 June 2009

LaTeX and ASCIIMathML to Content MathML and Maxima syntax

One of the slightly experimental new features in SnuggleTeX 1.1.0 is an attempt to convert a limited but hopefully useful subset of math-mode LaTeX into "more semantic" formats, such as Content MathML and Maxima input syntax. (This release of SnuggleTeX also includes some equally experimental support for doing the same thing with the raw output produced by ASCIIMathML.)

The context of this work was the JISC MathAssess project, where we looked at the feasibility of allowing students of "foundation level" mathematics to input mathematics into computer-aided assessment software using "lax" input syntaxes such as LaTeX or ASCIIMathML. This idea was considered as a possible alternative to using Excel-like formats, or requiring students to learn the syntax for Computer Algebra Systems such as Maxima, Maple or Mathematica.

In general, this "up-conversion" approach (going from "low semantics" such as LaTeX to "higher semantics" such as Content MathML) is not possible. Why not? Well, you don't have to look far to see why! Consider the mathematical symbol e. In some contexts, this might represent the exponential constant 2.718..., but it might also represent the identity element in a group, or some physical quantity. So context is clearly important! Another very simple example is to compare the written mathematical expressions f(x+2) and a(x+2). Anyone who has studied some mathematics will probably read the first as the function f applied at x+2, and the second as the product of a and x+2. So, again, the underlying context is vitally important, but it can sometimes be inferred by assuming and following certain conventions. (This is, however, complicated by the fact that mathematical notations are localised, so notations common in the UK are not necessarily common anywhere else!)
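To make the f(x+2) versus a(x+2) point concrete, here is a toy Python sketch of convention-based inference. It is purely illustrative and is not how SnuggleTeX does it: the set of "function-like" names and the apply(...)/times(...) output notation are assumptions made up for this example.

```python
# Hypothetical convention table: identifiers that a reader would usually
# take to be function names. This is an assumption for illustration only.
ASSUMED_FUNCTION_NAMES = {"f", "g", "h", "sin", "cos", "ln"}


def interpret_juxtaposition(head: str, argument: str) -> str:
    """Guess the meaning of `head(argument)` from naming conventions alone."""
    if head in ASSUMED_FUNCTION_NAMES:
        # Conventionally a function name, so read it as function application.
        return f"apply({head}, {argument})"
    # Anything else is read as an implicit product.
    return f"times({head}, {argument})"
```

With these assumed conventions, interpret_juxtaposition("f", "x+2") gives "apply(f, x+2)" and interpret_juxtaposition("a", "x+2") gives "times(a, x+2)". Change the table and the "meaning" changes with it, which is exactly why the conventions have to be stated up front.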

The approach we take is to look at only a very restricted subset of symbols and constructs, using conventions that are considered common, sensible and familiar in the UK. In practice, this subset covers a pretty reasonable spectrum of the mathematical contexts that we're aiming at. From this base, it is possible to convert the simple, display-oriented Presentation MathML we expect to get from SnuggleTeX and ASCIIMathML into a more semantic Presentation MathML representation that renders the same way, before converting this to Content MathML and then finally into other formats such as Maxima.
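As a rough illustration of the final step in that chain (Content MathML to Maxima), here is a minimal Python sketch. It is not the SnuggleTeX implementation, which is written in XSLT 2.0. The function names, the handful of operators covered and the decision to parenthesise every operation are all assumptions made to keep the example short.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "{http://www.w3.org/1998/Math/MathML}"

# Maxima infix spellings for a few Content MathML operator elements.
# Only this small subset is handled; it is an assumption for the sketch.
INFIX_OPERATORS = {"plus": "+", "minus": "-", "times": "*", "divide": "/", "power": "^"}


def local_name(element):
    """Return the element's tag with the MathML namespace stripped."""
    return element.tag.replace(MATHML_NS, "")


def to_maxima(element):
    """Convert a tiny subset of Content MathML into Maxima input syntax."""
    name = local_name(element)
    if name == "math":
        children = list(element)
        if len(children) != 1:
            raise ValueError("expected exactly one child inside <math>")
        return to_maxima(children[0])
    if name in ("ci", "cn"):
        # Identifiers and numbers pass through unchanged.
        return (element.text or "").strip()
    if name == "apply":
        operator, *operands = list(element)
        operator_name = local_name(operator)
        arguments = [to_maxima(operand) for operand in operands]
        if operator_name in INFIX_OPERATORS:
            if operator_name == "minus" and len(arguments) == 1:
                return f"(-{arguments[0]})"  # unary minus
            # Parenthesise everything so operator precedence never matters.
            return "(" + INFIX_OPERATORS[operator_name].join(arguments) + ")"
        if operator_name == "ci":
            # <apply><ci>f</ci>...</apply> is read as function application.
            return f"{to_maxima(operator)}({', '.join(arguments)})"
        raise ValueError(f"unsupported operator: {operator_name}")
    raise ValueError(f"unsupported element: {name}")


# Content MathML for the product a(x+2) from the earlier example.
EXAMPLE = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<apply><times/><ci>a</ci>"
    "<apply><plus/><ci>x</ci><cn>2</cn></apply>"
    "</apply></math>"
)
print(to_maxima(ET.fromstring(EXAMPLE)))  # prints (a*(x+2))
```

The point of the sketch is that once you have Content MathML, the hard decisions (is it a product or a function application?) have already been made, so producing Maxima syntax is a fairly mechanical tree walk.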

More details on the mechanics of this process can be found in the SnuggleTeX documentation under Semantic Up-Conversion. Techy folks interested in the actual implementation might want to know that it's all done using XSLT 2.0, which is well suited to these types of conversions and is an absolute joy to use. You're welcome to rip off our XSLT and perhaps use it as a basis for similar processes, if useful. It's all in the "full" ZIP distribution of SnuggleTeX. Feel free to ask if you want more information...
