1 package Text::ASCIIMathML;
3 # The Text::ASCIIMathML module is copyright (c) 2006 Mark Nodine,
4 # USA. All rights reserved.
6 # You may use and distribute them under the terms of either the GNU
7 # General Public License or the Artistic License, as specified in the
10 # Included in the MultiMarkdown package by Fletcher T. Penney
12 # $Id: ASCIIMathML.pm 464 2007-12-16 15:34:39Z fletcher $
14 # MultiMarkdown Version 2.0.b5
20 Text::ASCIIMathML - Perl extension for parsing ASCIIMathML text into MathML
24 use Text::ASCIIMathML;
26 $parser = Text::ASCIIMathML->new();
28 $parser->SetAttribute(ForMoz => 1);
30 $ASCIIMathML = "int_0^1 e^x dx";
31 $mathML = $parser->TextToMathML($ASCIIMathML);
32 $mathML = $parser->TextToMathML($ASCIIMathML, [title=>$ASCIIMathML]);
33 $mathML = $parser->TextToMathML($ASCIIMathML, undef, [displaystyle=>1]);
35 $mathMLTree = $parser->TextToMathMLTree($ASCIIMathML);
36 $mathMLTree = $parser->TextToMathMLTree($ASCIIMathML, [title=>$ASCIIMathML]);
37 $mathMLTree = $parser->TextToMathMLTree($ASCIIMathML,undef,[displaystyle=>1]);
39 $mathML = $mathMLTree->text();
40 $latex = $mathMLTree->latex();
44 Text::ASCIIMathML is a parser for ASCIIMathML text which produces
45 MathML XML markup strings that are suitable for rendering by any
46 MathML-compliant browser.
48 The parser uses the following attributes, which can be set through
49 the SetAttribute method:
55 Specifies that the fonts should be optimized for Netscape/Mozilla/Firefox.
59 The output of the TextToMathML method always follows the schema
60 <math><mstyle>...</mstyle></math>
61 The first argument of TextToMathML is the ASCIIMathML text to be
62 parsed into MathML. The second argument is a reference to an array of
63 attribute/value pairs to be attached to the <math> node and the third
64 argument is a reference to an array of attribute/value pairs for the
65 <mstyle> node. Common attributes for the <math> node are "title" and
66 "xmlns"=>"&mathml;". Common attributes for the <mstyle> node are
67 "mathcolor" (for text color), "displaystyle"=>"true" for using display
68 style instead of inline style, and "fontfamily".
70 =head2 ASCIIMathML markup
72 The syntax is very permissive and does not generate syntax
73 errors. This allows mathematically incorrect expressions to be
74 displayed, which is important for teaching purposes. It also causes
75 less frustration when previewing formulas.
77 If you encode 'x^2' or 'a_(mn)' or 'a_{mn}' or '(x+1)/y' or 'sqrtx',
78 you pretty much get what you expect. The choice of grouping
79 parentheses is up to you (they don't have to match either). If the
80 displayed expression can be parsed uniquely without them, they are
81 omitted. Most LaTeX commands are also supported, so the last two
82 formulas above can also be written as '\frac{x+1}{y}' and '\sqrt{x}'.
84 The parser uses no operator precedence and only respects the grouping
85 brackets, subscripts, superscripts, fractions and (square) roots. This
86 is done for reasons of efficiency and generality. The resulting MathML
87 code can quite easily be processed further to ensure additional
88 syntactic requirements of any particular application.
92 Here is a definition of the grammar used to parse
93 ASCIIMathML expressions. In the Backus-Naur form given below, the
94 letter on the left of the C<::=> represents a category of symbols that
95 could be one of the possible sequences of symbols listed on the right.
96 The vertical bar C<|> separates the alternatives.
100 c ::= [A-Za-z] | numbers | greek letters | other constant symbols
102 u ::= 'sqrt' | 'text' | 'bb' | other unary symbols for font commands
103 b ::= 'frac' | 'root' | 'stackrel' | 'newcommand' | 'newsymbol'
105 l ::= ( | [ | { | (: | {: left brackets
106 r ::= ) | ] | } | :) | :} right brackets
107 S ::= c | lEr | uS | bSS | "any" simple expression
108 E ::= SE | S/S | S_S | S^S | S_S^S expression (fraction, sub-,
109 super-, subsuperscript)
113 =head3 The translation rules
115 Each terminal symbol is translated into a corresponding MathML
116 node. The constants are mostly converted to their respective Unicode
117 symbols. The other expressions are converted as follows:
121 lSr -> <mrow>lSr</mrow>
122 (note that any pair of brackets can be used to
123 delimit subexpressions; they don't have to match)
124 sqrt S -> <msqrt>S'</msqrt>
125 text S -> <mtext>S'</mtext>
126 "any" -> <mtext>any</mtext>
127 frac S1 S2 -> <mfrac>S1' S2'</mfrac>
128 root S1 S2 -> <mroot>S2' S1'</mroot>
129 stackrel S1 S2 -> <mover>S2' S1'</mover>
130 S1/S2 -> <mfrac>S1' S2'</mfrac>
131 S1_S2 -> <msub>S1 S2'</msub>
132 S1^S2 -> <msup>S1 S2'</msup>
133 S1_S2^S3 -> <msubsup>S1 S2' S3'</msubsup> or
134 <munderover>S1 S2' S3'</munderover> (in some cases)
138 In the rules above, the expression C<S'> is the same as C<S>, except that if
139 C<S> has an outer level of brackets, then C<S'> is the expression inside these brackets.
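The bracket-stripping rule for C<S'> can be sketched on plain strings. This is only an illustration under simplifying assumptions: the module works on a tree of nodes rather than on text, and the helper name C<strip_outer_brackets> is hypothetical. It removes one outer pair of brackets when that pair encloses the whole expression, without requiring the two brackets to be of the same kind.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text inside one outer level of brackets,
# or the text unchanged if the first bracket does not enclose everything.
sub strip_outer_brackets {
    my ($s) = @_;
    # Must start with a left bracket and end with a right bracket.
    return $s unless $s =~ /^[(\[{].*[)\]}]$/s;
    my $depth = 0;
    my @chars = split //, $s;
    for my $i (0 .. $#chars) {
        $depth++ if $chars[$i] =~ /[(\[{]/;
        $depth-- if $chars[$i] =~ /[)\]}]/;
        # The opening bracket was closed before the end: not an outer pair.
        return $s if $depth == 0 && $i < $#chars;
    }
    return substr($s, 1, -1);
}

print strip_outer_brackets('(x+1)'),   "\n";   # x+1
print strip_outer_brackets('[x+1)'),   "\n";   # x+1 (brackets need not match)
print strip_outer_brackets('(a)+(b)'), "\n";   # (a)+(b), left unchanged
```

With this rule, an input such as C<frac (x+1) y> yields a numerator of C<x+1> rather than C<(x+1)>, since the fraction bar already makes the grouping visible.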
144 A simple syntax for matrices is also recognized:
146 l(S11,...,S1n),(...),(Sm1,...,Smn)r
148 l[S11,...,S1n],[...],[Sm1,...,Smn]r.
150 Here C<l> and C<r> stand for any of the left and right
151 brackets (just like in the grammar they do not have to match). Both of
152 these expressions are translated to
154 <mrow>l<mtable><mtr><mtd>S11</mtd>...
155 <mtd>S1n</mtd></mtr>...
156 <mtr><mtd>Sm1</mtd>...
157 <mtd>Smn</mtd></mtr></mtable>r</mrow>.
159 Note that each row must have the same number of expressions, and there
160 should be at least two rows.
162 LaTeX matrix commands are not recognized.
166 The input formula is broken into tokens using a "longest matching
167 initial substring search". Suppose the input formula has been
168 processed from left to right up to a fixed position. The longest
169 string from the list of constants (given below) that matches the
170 initial part of the remainder of the formula is the next token. If
171 there is no matching string, then the first character of the remainder
172 is the next token. The symbol table at the top of the ASCIIMathML.js
173 script specifies whether a symbol is a math operator (surrounded by a
174 C<< <mo> >> tag) or a math identifier (surrounded by a C<< <mi> >>
175 tag). For single character tokens, letters are treated as math
176 identifiers, and non-alphanumeric characters are treated as math
177 operators. For digits, see "Numbers" below.
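The longest-match rule described above can be sketched in a few lines of Perl. This is a minimal illustration, not the module's actual lexer: the reduced symbol list C<@symbols> and the helpers C<next_token> and C<tokenize> are hypothetical names chosen for this example, and the real symbol table is far larger.

```perl
use strict;
use warnings;

# Hypothetical, much-reduced symbol list (assumption for this sketch).
my @symbols = ('sum', 'sub', 'sube', 'in', '!in', '<=', '<=>', '=>', 'xx');

# Put longer alternatives first so that 'sube' is tried before 'sub'.
my $symbol_re = join '|',
    map { quotemeta($_) } sort { length($b) <=> length($a) } @symbols;

# Returns (token, kind) for the start of $str, or an empty list at the end.
sub next_token {
    my ($str) = @_;
    $str =~ s/^\s+//;
    return if $str eq '';
    return ($1, 'mn')     if $str =~ /^(\d+(?:\.\d+)?)/;   # number
    return ($1, 'symbol') if $str =~ /^($symbol_re)/;      # longest known symbol
    return ($1, 'mi')     if $str =~ /^([A-Za-z])/;        # single letter
    return (substr($str, 0, 1), 'mo');                     # any other character
}

# Splits a whole formula into a list of tokens.
sub tokenize {
    my ($str) = @_;
    my @tokens;
    while (my ($tok, $kind) = next_token($str)) {
        push @tokens, $tok;
        $str =~ s/^\s+//;
        $str = substr($str, length $tok);
    }
    return @tokens;
}

print join(' | ', tokenize('a sube b')), "\n";   # a | sube | b
print join(' | ', tokenize('x<=>3.5y')), "\n";   # x | <=> | 3.5 | y
print join(' | ', tokenize('su b')), "\n";       # s | u | b
```

The last call shows why spaces matter: the blank between C<su> and C<b> keeps the input from matching C<sub>, so three single-letter identifiers are produced instead.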
179 Spaces are significant when they separate characters and thus prevent
180 a certain string of characters from matching one of the
181 constants. Multiple spaces and end-of-line characters are equivalent to a single space.
186 A string of digits, optionally followed by a decimal point (a period)
187 and another string of digits, is parsed as a single token and
188 converted to a MathML number, i.e., enclosed with the C<< <mn> >> tag.
195 =item Lowercase letters
197 C<alpha> C<beta> C<chi> C<delta> C<epsilon> C<eta> C<gamma> C<iota>
198 C<kappa> C<lambda> C<mu> C<nu> C<omega> C<phi> C<pi> C<psi> C<rho>
199 C<sigma> C<tau> C<theta> C<upsilon> C<xi> C<zeta>
201 =item Uppercase letters
203 C<Delta> C<Gamma> C<Lambda> C<Omega> C<Phi> C<Pi> C<Psi> C<Sigma>
208 C<varepsilon> C<varphi> C<vartheta>
212 =head3 Standard functions
214 sin cos tan csc sec cot sinh cosh tanh log ln det dim lim mod gcd lcm
217 =head3 Operation symbols
219 Type Description Entity
226 xx Cross product ×
227 -: Divided by ÷
228 @ Compose functions ∘
229 o+ Circle with plus ⊕
230 ox Circle with x ⊗
231 o. Circle with dot ⊙
232 sum Sum for sub- and superscript ∑
233 prod Product for sub- and superscript ∏
235 ^^^ Logic "and" for sub- and superscript ⋀
237 vvv Logic "or" for sub- and superscript ⋁
238 nn Logic "intersect" ∩
239 nnn Logic "intersect" for sub- and superscript ⋂
240 uu Logic "union" ∪
241 uuu Logic "union" for sub- and superscript ⋃
243 =head3 Relation symbols
245 Type Description Entity
250 <= Less than or equal ≤
251 >= Greater than or equal ≥
252 -lt Precedes ≺
253 >- Succeeds ≻
255 !in Not an element of ∉
258 sube Subset or equal ⊆
259 supe Superset or equal ⊇
260 -= Equivalent ≡
261 ~= Congruent to ≅
262 ~~ Asymptotically equal to ≈
263 prop Proportional to ∝
265 =head3 Logical symbols
267 Type Description Entity
273 iff If and only if ⇔
275 EE There exists ∃
276 _|_ Perpendicular, bottom ⊥
278 |-- Right tee ⊢
279 |== Double right tee ⊨
281 =head3 Grouping brackets
283 Type Description Entity
290 (: Left angle bracket ⟨
291 :) Right angle bracket ⟩
292 {: Invisible left grouping element
293 :} Invisible right grouping element
295 =head3 Miscellaneous symbols
297 Type Description Entity
299 oint Contour integral ∮
300 del Partial derivative ∂
301 grad Gradient ∇
302 +- Plus or minus ±
305 aleph Hebrew letter aleph ℵ
307 :. Therefore ∴
309 cdots Three centered dots ⋯
310 \<sp> Non-breaking space (<sp> means space)
311 quad Quad space
312 diamond Diamond ⋄
313 square Square □
314 |__ Left floor ⌊
315 __| Right floor ⌋
316 |~ Left ceiling ⌈
317 ~| Right ceiling ⌉
318 CC Complex numbers ℂ
319 NN Natural numbers ℕ
320 QQ Rational numbers ℚ
321 RR Real numbers ℝ
326 Type Description Entity
328 darr Down arrow ↓
329 rarr Right arrow →
330 -> Right arrow →
331 larr Left arrow ←
332 harr Horizontal (two-way) arrow ↔
333 rArr Right double arrow ⇒
334 lArr Left double arrow ⇐
335 hArr Horizontal double arrow ⇔
339 Type Description Output
340 hat x Hat over x <mover><mi>x</mi><mo>^</mo></mover>
341 bar x Bar over x <mover><mi>x</mi><mo>¯</mo></mover>
342 ul x Underbar under x <munder><mi>x</mi><mo>_</mo></munder>
343 vec x Right arrow over x <mover><mi>x</mi><mo>→</mo></mover>
344 dot x Dot over x <mover><mi>x</mi><mo>.</mo></mover>
345 ddot x Double dot over x <mover><mi>x</mi><mo>..</mo></mover>
351 bbb A Double-struck A
352 cc A Calligraphic (script) A
353 tt A Teletype (monospace) A
357 =head3 Defining new commands and symbols
359 It is possible to define new commands and symbols using the
360 'newcommand' and 'newsymbol' binary operators. The former defines a
361 macro that gets expanded and reparsed as ASCIIMathML and the latter
362 defines a constant that gets used as a math operator (C<< <mo> >>)
363 element. Both of the arguments must be text, optionally enclosed in
364 grouping operators. The 'newsymbol' operator also allows the
365 second argument to be a group of two text strings where the first is
366 the MathML operator and the second is the LaTeX code to be output.
368 For example, 'newcommand "DDX" "{:d/dx:}"' would define a new command
369 'DDX'. It could then be invoked like 'DDXf(x)', which would
370 expand to '{:d/dx:}f(x)'. The text 'newsymbol{"!le"}{"≰"}'
371 could be used to create a symbol you could invoke with '!le', as in 'a !le b'.
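The expansion step for 'newcommand' can be pictured as a textual substitution. The sketch below is an assumption-laden simplification: the definition table C<%definitions> and the helper C<expand_commands> are hypothetical names, and the substitution is done in a single pass over the text, whereas the module expands a command when it is encountered and reparses the result as ASCIIMathML.

```perl
use strict;
use warnings;

# Hypothetical definition table: command name => replacement text.
my %definitions = ('DDX' => '{:d/dx:}');

# Replaces every occurrence of a defined command name with its replacement,
# trying longer names first so that overlapping names resolve predictably.
sub expand_commands {
    my ($text, $defs) = @_;
    return $text unless %$defs;
    my $re = join '|',
        map { quotemeta($_) } sort { length($b) <=> length($a) } keys %$defs;
    $text =~ s/($re)/$defs->{$1}/g;
    return $text;
}

print expand_commands('DDXf(x)', \%definitions), "\n";   # {:d/dx:}f(x)
```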
374 =head2 Attributes for <math>
380 The title attribute for the element, if specified. In many browsers,
381 this string will appear if you hover over the MathML markup.
385 The id attribute for the element, if specified.
389 The class attribute for the element, if specified.
393 =head2 Attributes for <mstyle>
397 =item C<displaystyle>
399 The displaystyle attribute for the element, if specified. One of the
400 values "true" or "false". If the displaystyle is false, then fractions
401 are represented with a smaller font size and the placement of
402 subscripts and superscripts of sums and integrals changes.
406 The mathvariant attribute for the element, if specified. One of the
407 values "normal", "bold", "italic", "bold-italic", "double-struck",
408 "bold-fraktur", "script", "bold-script", "fraktur", "sans-serif",
409 "bold-sans-serif", "sans-serif-italic", "sans-serif-bold-italic", or "monospace".
414 The mathsize attribute for the element, if specified. Either "small",
415 "normal" or "big", or of the form "number v-unit".
419 A string representing the font family.
423 The mathcolor attribute for the element, if specified. It should be in one of
424 the forms "#rgb" or "#rrggbb", or an html-color-name.
426 =item C<mathbackground>
428 The mathbackground attribute for the element, if specified. It should
429 be in one of the forms "#rgb" or "#rrggbb", or an html-color-name, or
430 the keyword "transparent".
433 =head1 BUGS AND SUGGESTIONS
435 If you find bugs, think of anything that could improve Text::ASCIIMathML
436 or have any questions related to it, feel free to contact the author.
440 Mark Nodine <mnodine@alum.mit.edu>
445 <http://www1.chapman.edu/~jipsen/mathml/asciimathsyntax.xml>
447 =head1 ACKNOWLEDGEMENTS
449 This Perl module has been created by modifying Peter Jipsen's
450 ASCIIMathML.js script. He deserves full credit for the original
451 implementation; any bugs have probably been introduced by me.
455 The Text::ASCIIMathML module is copyright (c) 2006 Mark Nodine,
456 USA. All rights reserved.
458 You may use and distribute them under the terms of either the GNU
459 General Public License or the Artistic License, as specified in the
467 our $VERSION = '0.5';
469 # Creates a new Text::ASCIIMathML parser object
472 return bless {}, $class;
475 # Sets an attribute to a given value
476 # Arguments: Attribute name, attribute value
478 # Supported attributes:
479 # ForMoz Boolean to optimize for Netscape/Mozilla/Firefox
480 sub SetAttribute : method {
481 my ($self, $attr, $val) = @_;
482 $self->{attr}{$attr} = $val;
485 # Converts an ASCIIMathML string to a MathML one
486 # Arguments: ASCIIMathML string,
487 # optional ref to array of attribute/value pairs for math node,
488 # optional ref to array of attribute/value pairs for mstyle node
489 # Returns: MathML string
490 sub TextToMathML : method {
491 my $tree = TextToMathMLTree(@_);
492 return $tree ? $tree->text : '';
495 # Converts an ASCIIMathML string to a tree of MathML nodes
496 # Arguments: ASCIIMathML string,
497 # optional ref to array of attribute/value pairs for math node,
498 # optional ref to array of attribute/value pairs for mstyle node
499 # Returns: top Text::ASCIIMathML::Node object or undefined
500 sub TextToMathMLTree : method {
501 my ($self, $expr, $mathAttr, $mstyleAttr) = @_;
502 $expr = '' unless defined $expr;
503 my $mstyle = $self->_createElementMathML('mstyle');
504 $mstyle->setAttribute(@$mstyleAttr) if $mstyleAttr;
505 $self->{nestingDepth} = 0;
507 $mstyle->appendChild(($self->_parseExpr($expr, 0))[0]);
508 return unless $mstyle->childNodes > 0;
509 my $math = $self->_createMmlNode('math', $mstyle);
511 $math->setAttribute(@$mathAttr) if $mathAttr;
518 # Creates a Text::ASCIIMathML::Node object with no tag
520 # Returns: node object
521 sub _createDocumentFragment : method {
523 return Text::ASCIIMathML::Node->new($self);
526 # Creates a Text::ASCIIMathML::Node object
528 # Returns: node object
529 sub _createElementMathML : method {
531 return Text::ASCIIMathML::Node->new($self, $t);
534 # Creates a Text::ASCIIMathML::Node object and appends a node as a child
535 # Arguments: tag, node
536 # Returns: node object
537 sub _createMmlNode : method {
538 my ($self, $t, $obj) = @_;
539 my $node = Text::ASCIIMathML::Node->new($self, $t);
540 $node->appendChild($obj);
544 # Creates a Text::ASCIIMathML::Node text object with the given text
546 # Returns: node object
547 sub _createTextNode : method {
548 my ($self, $text) = @_;
549 return Text::ASCIIMathML::Node->newText ($self, $text);
552 # Finds maximal initial substring of str that appears in names
553 # return null if there is none
555 # Returns: matched input, entry from AMSymbol (if any)
556 sub _getSymbol : method {
558 my ($input, $symbol) = $self->_getSymbol_(@_);
559 $self->{previousSymbol} = $symbol->{ttype} if $symbol;
560 return $input, $symbol;
564 # character lists for Mozilla/Netscape fonts
565 my $AMcal = [0xEF35,0x212C,0xEF36,0xEF37,0x2130,0x2131,0xEF38,0x210B,0x2110,0xEF39,0xEF3A,0x2112,0x2133,0xEF3B,0xEF3C,0xEF3D,0xEF3E,0x211B,0xEF3F,0xEF40,0xEF41,0xEF42,0xEF43,0xEF44,0xEF45,0xEF46];
566 my $AMfrk = [0xEF5D,0xEF5E,0x212D,0xEF5F,0xEF60,0xEF61,0xEF62,0x210C,0x2111,0xEF63,0xEF64,0xEF65,0xEF66,0xEF67,0xEF68,0xEF69,0xEF6A,0x211C,0xEF6B,0xEF6C,0xEF6D,0xEF6E,0xEF6F,0xEF70,0xEF71,0x2128];
567 my $AMbbb = [0xEF8C,0xEF8D,0x2102,0xEF8E,0xEF8F,0xEF90,0xEF91,0x210D,0xEF92,0xEF93,0xEF94,0xEF95,0xEF96,0x2115,0xEF97,0x2119,0x211A,0x211D,0xEF98,0xEF99,0xEF9A,0xEF9B,0xEF9C,0xEF9D,0xEF9E,0x2124];
569 # Create closure for static variables
571 "sqrt" => { tag=>"msqrt", output=>"sqrt", tex=>'', ttype=>"UNARY" },
572 "root" => { tag=>"mroot", output=>"root", tex=>'', ttype=>"BINARY" },
573 "frac" => { tag=>"mfrac", output=>"/", tex=>'', ttype=>"BINARY" },
574 "/" => { tag=>"mfrac", output=>"/", tex=>'', ttype=>"INFIX" },
575 "stackrel" => { tag=>"mover", output=>"stackrel", tex=>'', ttype=>"BINARY" },
576 "_" => { tag=>"msub", output=>"_", tex=>'', ttype=>"INFIX" },
577 "^" => { tag=>"msup", output=>"^", tex=>'', ttype=>"INFIX" },
578 "text" => { tag=>"mtext", output=>"text", tex=>'', ttype=>"TEXT" },
579 "mbox" => { tag=>"mtext", output=>"mbox", tex=>'', ttype=>"TEXT" },
580 "\"" => { tag=>"mtext", output=>"mbox", tex=>'', ttype=>"TEXT" },
583 "newcommand" => { ttype=>"BINARY"},
584 "newsymbol" => { ttype=>"BINARY" },
587 "alpha" => { tag=>"mi", output=>"α", tex=>'', ttype=>"CONST" },
588 "beta" => { tag=>"mi", output=>"β", tex=>'', ttype=>"CONST" },
589 "chi" => { tag=>"mi", output=>"χ", tex=>'', ttype=>"CONST" },
590 "delta" => { tag=>"mi", output=>"δ", tex=>'', ttype=>"CONST" },
591 "Delta" => { tag=>"mo", output=>"Δ", tex=>'', ttype=>"CONST" },
592 "epsi" => { tag=>"mi", output=>"ε", tex=>"epsilon", ttype=>"CONST" },
593 "varepsilon" => { tag=>"mi", output=>"ɛ", tex=>'', ttype=>"CONST" },
594 "eta" => { tag=>"mi", output=>"η", tex=>'', ttype=>"CONST" },
595 "gamma" => { tag=>"mi", output=>"γ", tex=>'', ttype=>"CONST" },
596 "Gamma" => { tag=>"mo", output=>"Γ", tex=>'', ttype=>"CONST" },
597 "iota" => { tag=>"mi", output=>"ι", tex=>'', ttype=>"CONST" },
598 "kappa" => { tag=>"mi", output=>"κ", tex=>'', ttype=>"CONST" },
599 "lambda" => { tag=>"mi", output=>"λ", tex=>'', ttype=>"CONST" },
600 "Lambda" => { tag=>"mo", output=>"Λ", tex=>'', ttype=>"CONST" },
601 "mu" => { tag=>"mi", output=>"μ", tex=>'', ttype=>"CONST" },
602 "nu" => { tag=>"mi", output=>"ν", tex=>'', ttype=>"CONST" },
603 "omega" => { tag=>"mi", output=>"ω", tex=>'', ttype=>"CONST" },
604 "Omega" => { tag=>"mo", output=>"Ω", tex=>'', ttype=>"CONST" },
605 "phi" => { tag=>"mi", output=>"ϕ", tex=>'', ttype=>"CONST" },
606 "varphi" => { tag=>"mi", output=>"φ", tex=>'', ttype=>"CONST" },
607 "Phi" => { tag=>"mo", output=>"Φ", tex=>'', ttype=>"CONST" },
608 "pi" => { tag=>"mi", output=>"π", tex=>'', ttype=>"CONST" },
609 "Pi" => { tag=>"mo", output=>"Π", tex=>'', ttype=>"CONST" },
610 "psi" => { tag=>"mi", output=>"ψ", tex=>'', ttype=>"CONST" },
611 "Psi" => { tag=>"mi", output=>"Ψ", tex=>'', ttype=>"CONST" },
612 "rho" => { tag=>"mi", output=>"ρ", tex=>'', ttype=>"CONST" },
613 "sigma" => { tag=>"mi", output=>"σ", tex=>'', ttype=>"CONST" },
614 "Sigma" => { tag=>"mo", output=>"Σ", tex=>'', ttype=>"CONST" },
615 "tau" => { tag=>"mi", output=>"τ", tex=>'', ttype=>"CONST" },
616 "theta" => { tag=>"mi", output=>"θ", tex=>'', ttype=>"CONST" },
617 "vartheta" => { tag=>"mi", output=>"ϑ", tex=>'', ttype=>"CONST" },
618 "Theta" => { tag=>"mo", output=>"Θ", tex=>'', ttype=>"CONST" },
619 "upsilon" => { tag=>"mi", output=>"υ", tex=>'', ttype=>"CONST" },
620 "xi" => { tag=>"mi", output=>"ξ", tex=>'', ttype=>"CONST" },
621 "Xi" => { tag=>"mo", output=>"Ξ", tex=>'', ttype=>"CONST" },
622 "zeta" => { tag=>"mi", output=>"ζ", tex=>'', ttype=>"CONST" },
624 # binary operation symbols
625 "*" => { tag=>"mo", output=>"⋅", tex=>"cdot", ttype=>"CONST" },
626 "**" => { tag=>"mo", output=>"⋆", tex=>"star", ttype=>"CONST" },
627 "//" => { tag=>"mo", output=>"/", tex=>'', ttype=>"CONST" },
628 "\\\\" => { tag=>"mo", output=>"\\", tex=>"backslash", ttype=>"CONST" },
629 "setminus" => { tag=>"mo", output=>"\\", tex=>'', ttype=>"CONST" },
630 "xx" => { tag=>"mo", output=>"×", tex=>"times", ttype=>"CONST" },
631 "-:" => { tag=>"mo", output=>"÷", tex=>"div", ttype=>"CONST" },
632 "@" => { tag=>"mo", output=>"∘", tex=>"circ", ttype=>"CONST" },
633 "o+" => { tag=>"mo", output=>"⊕", tex=>"oplus", ttype=>"CONST" },
634 "ox" => { tag=>"mo", output=>"⊗", tex=>"otimes", ttype=>"CONST" },
635 "o." => { tag=>"mo", output=>"⊙", tex=>"odot", ttype=>"CONST" },
636 "sum" => { tag=>"mo", output=>"∑", tex=>'', ttype=>"UNDEROVER" },
637 "prod" => { tag=>"mo", output=>"∏", tex=>'', ttype=>"UNDEROVER" },
638 "^^" => { tag=>"mo", output=>"∧", tex=>"wedge", ttype=>"CONST" },
639 "^^^" => { tag=>"mo", output=>"⋀", tex=>"bigwedge", ttype=>"UNDEROVER" },
640 "vv" => { tag=>"mo", output=>"∨", tex=>"vee", ttype=>"CONST" },
641 "vvv" => { tag=>"mo", output=>"⋁", tex=>"bigvee", ttype=>"UNDEROVER" },
642 "nn" => { tag=>"mo", output=>"∩", tex=>"cap", ttype=>"CONST" },
643 "nnn" => { tag=>"mo", output=>"⋂", tex=>"bigcap", ttype=>"UNDEROVER" },
644 "uu" => { tag=>"mo", output=>"∪", tex=>"cup", ttype=>"CONST" },
645 "uuu" => { tag=>"mo", output=>"⋃", tex=>"bigcup", ttype=>"UNDEROVER" },
647 # binary relation symbols
648 "!=" => { tag=>"mo", output=>"≠", tex=>"ne", ttype=>"CONST" },
649 ":=" => { tag=>"mo", output=>":=", tex=>'', ttype=>"CONST" },
650 #"lt" => { tag=>"mo", output=>"<", tex=>'', ttype=>"CONST" },
651 "lt" => { tag=>"mo", output=>"<", tex=>'', ttype=>"CONST" },
652 "<=" => { tag=>"mo", output=>"≤", tex=>"le", ttype=>"CONST" },
653 "lt=" => { tag=>"mo", output=>"≤", tex=>"leq", ttype=>"CONST", latex=>1 },
654 ">=" => { tag=>"mo", output=>"≥", tex=>"ge", ttype=>"CONST" },
655 "geq" => { tag=>"mo", output=>"≥", tex=>'', ttype=>"CONST", latex=>1 },
656 "-<" => { tag=>"mo", output=>"≺", tex=>"prec", ttype=>"CONST", latex=>1 },
657 "-lt" => { tag=>"mo", output=>"≺", tex=>'', ttype=>"CONST" },
658 ">-" => { tag=>"mo", output=>"≻", tex=>"succ", ttype=>"CONST" },
659 "in" => { tag=>"mo", output=>"∈", tex=>'', ttype=>"CONST" },
660 "!in" => { tag=>"mo", output=>"∉", tex=>"notin", ttype=>"CONST" },
661 "sub" => { tag=>"mo", output=>"⊂", tex=>"subset", ttype=>"CONST" },
662 "sup" => { tag=>"mo", output=>"⊃", tex=>"supset", ttype=>"CONST" },
663 "sube" => { tag=>"mo", output=>"⊆", tex=>"subseteq", ttype=>"CONST" },
664 "supe" => { tag=>"mo", output=>"⊇", tex=>"supseteq", ttype=>"CONST" },
665 "-=" => { tag=>"mo", output=>"≡", tex=>"equiv", ttype=>"CONST" },
666 "~=" => { tag=>"mo", output=>"≅", tex=>"cong", ttype=>"CONST" },
667 "~~" => { tag=>"mo", output=>"≈", tex=>"approx", ttype=>"CONST" },
668 "prop" => { tag=>"mo", output=>"∝", tex=>"propto", ttype=>"CONST" },
671 "<" => { tag=>"mo", output=>"<", tex=>'', ttype=>"CONST" },
672 "gt" => { tag=>"mo", output=>">", tex=>'', ttype=>"CONST" },
673 ">" => { tag=>"mo", output=>">", tex=>'', ttype=>"CONST" },
676 "and" => { tag=>"mtext", output=>"and", tex=>'', ttype=>"SPACE" },
677 "or" => { tag=>"mtext", output=>"or", tex=>'', ttype=>"SPACE" },
678 "not" => { tag=>"mo", output=>"¬", tex=>"neg", ttype=>"CONST" },
679 "=>" => { tag=>"mo", output=>"⇒", tex=>"implies", ttype=>"CONST" },
680 "if" => { tag=>"mo", output=>"if", tex=>'if', ttype=>"SPACE" },
681 "<=>" => { tag=>"mo", output=>"⇔", tex=>"iff", ttype=>"CONST" },
682 "AA" => { tag=>"mo", output=>"∀", tex=>"forall", ttype=>"CONST" },
683 "EE" => { tag=>"mo", output=>"∃", tex=>"exists", ttype=>"CONST" },
684 "_|_" => { tag=>"mo", output=>"⊥", tex=>"bot", ttype=>"CONST" },
685 "TT" => { tag=>"mo", output=>"⊤", tex=>"top", ttype=>"CONST" },
686 "|--" => { tag=>"mo", output=>"⊢", tex=>"vdash", ttype=>"CONST" },
687 "|==" => { tag=>"mo", output=>"⊨", tex=>"models", ttype=>"CONST" },
690 "(" => { tag=>"mo", output=>"(", tex=>'', ttype=>"LEFTBRACKET" },
691 ")" => { tag=>"mo", output=>")", tex=>'', ttype=>"RIGHTBRACKET" },
692 "[" => { tag=>"mo", output=>"[", tex=>'', ttype=>"LEFTBRACKET" },
693 "]" => { tag=>"mo", output=>"]", tex=>'', ttype=>"RIGHTBRACKET" },
694 "{" => { tag=>"mo", output=>"{", tex=>'', ttype=>"LEFTBRACKET" },
695 "}" => { tag=>"mo", output=>"}", tex=>'', ttype=>"RIGHTBRACKET" },
696 "|" => { tag=>"mo", output=>"|", tex=>'', ttype=>"LEFTRIGHT" },
697 # {input:"||", tag:"mo", output:"||", tex:null, ttype:LEFTRIGHT},
698 "(:" => { tag=>"mo", output=>"〈", tex=>"langle", ttype=>"LEFTBRACKET" },
699 ":)" => { tag=>"mo", output=>"〉", tex=>"rangle", ttype=>"RIGHTBRACKET" },
700 "<<" => { tag=>"mo", output=>"〈", tex=>'langle', ttype=>"LEFTBRACKET" },
701 ">>" => { tag=>"mo", output=>"〉", tex=>'rangle', ttype=>"RIGHTBRACKET" },
702 "{:" => { tag=>"mo", output=>"{:", tex=>'', ttype=>"LEFTBRACKET", invisible=>"true" },
703 ":}" => { tag=>"mo", output=>":}", tex=>'', ttype=>"RIGHTBRACKET", invisible=>"true" },
705 # miscellaneous symbols
706 "int" => { tag=>"mo", output=>"∫", tex=>'', ttype=>"CONST" },
707 "dx" => { tag=>"mi", output=>"{:d x:}", tex=>'', ttype=>"DEFINITION" },
708 "dy" => { tag=>"mi", output=>"{:d y:}", tex=>'', ttype=>"DEFINITION" },
709 "dz" => { tag=>"mi", output=>"{:d z:}", tex=>'', ttype=>"DEFINITION" },
710 "dt" => { tag=>"mi", output=>"{:d t:}", tex=>'', ttype=>"DEFINITION" },
711 "oint" => { tag=>"mo", output=>"∮", tex=>'', ttype=>"CONST" },
712 "del" => { tag=>"mo", output=>"∂", tex=>"partial", ttype=>"CONST" },
713 "grad" => { tag=>"mo", output=>"∇", tex=>"nabla", ttype=>"CONST" },
714 "+-" => { tag=>"mo", output=>"±", tex=>"pm", ttype=>"CONST" },
715 "O/" => { tag=>"mo", output=>"∅", tex=>"emptyset", ttype=>"CONST" },
716 "oo" => { tag=>"mo", output=>"∞", tex=>"infty", ttype=>"CONST" },
717 "aleph" => { tag=>"mo", output=>"ℵ", tex=>'', ttype=>"CONST" },
718 "..." => { tag=>"mo", output=>"...", tex=>"ldots", ttype=>"CONST" },
719 ":." => { tag=>"mo", output=>"∴", tex=>"therefore", ttype=>"CONST" },
720 "/_" => { tag=>"mo", output=>"∠", tex=>"angle", ttype=>"CONST" },
721 "\\ " => { tag=>"mo", output=>" ", tex=>'\,', ttype=>"CONST" },
722 "quad" => { tag=>"mo", output=>"  ", tex=>'', ttype=>"CONST" },
723 "qquad" => { tag=>"mo", output=>"    ", tex=>'', ttype=>"CONST" },
724 "cdots" => { tag=>"mo", output=>"⋯", tex=>'', ttype=>"CONST" },
725 "vdots" => { tag=>"mo", output=>"⋮", tex=>'', ttype=>"CONST" },
726 "ddots" => { tag=>"mo", output=>"⋱", tex=>'', ttype=>"CONST" },
727 "diamond" => { tag=>"mo", output=>"⋄", tex=>'', ttype=>"CONST" },
728 "square" => { tag=>"mo", output=>"□", tex=>'', ttype=>"CONST" },
729 "|__" => { tag=>"mo", output=>"⌊", tex=>"lfloor", ttype=>"CONST" },
730 "__|" => { tag=>"mo", output=>"⌋", tex=>"rfloor", ttype=>"CONST" },
731 "|~" => { tag=>"mo", output=>"⌈", tex=>"lceil", ttype=>"CONST" },
732 "~|" => { tag=>"mo", output=>"⌉", tex=>"rceil", ttype=>"CONST" },
733 "CC" => { tag=>"mo", output=>"ℂ", tex=>'', ttype=>"CONST" },
734 "NN" => { tag=>"mo", output=>"ℕ", tex=>'', ttype=>"CONST" },
735 "QQ" => { tag=>"mo", output=>"ℚ", tex=>'', ttype=>"CONST" },
736 "RR" => { tag=>"mo", output=>"ℝ", tex=>'', ttype=>"CONST" },
737 "ZZ" => { tag=>"mo", output=>"ℤ", tex=>'', ttype=>"CONST" },
738 "f" => { tag=>"mi", output=>"f", tex=>'', ttype=>"UNARY", func=>"true" },
739 "g" => { tag=>"mi", output=>"g", tex=>'', ttype=>"UNARY", func=>"true" },
742 "lim" => { tag=>"mo", output=>"lim", tex=>'', ttype=>"UNDEROVER" },
743 "Lim" => { tag=>"mo", output=>"Lim", tex=>'', ttype=>"UNDEROVER" },
744 "sin" => { tag=>"mo", output=>"sin", tex=>'', ttype=>"UNARY", func=>"true" },
745 "cos" => { tag=>"mo", output=>"cos", tex=>'', ttype=>"UNARY", func=>"true" },
746 "tan" => { tag=>"mo", output=>"tan", tex=>'', ttype=>"UNARY", func=>"true" },
747 "sinh" => { tag=>"mo", output=>"sinh", tex=>'', ttype=>"UNARY", func=>"true" },
748 "cosh" => { tag=>"mo", output=>"cosh", tex=>'', ttype=>"UNARY", func=>"true" },
749 "tanh" => { tag=>"mo", output=>"tanh", tex=>'', ttype=>"UNARY", func=>"true" },
750 "cot" => { tag=>"mo", output=>"cot", tex=>'', ttype=>"UNARY", func=>"true" },
751 "sec" => { tag=>"mo", output=>"sec", tex=>'', ttype=>"UNARY", func=>"true" },
752 "csc" => { tag=>"mo", output=>"csc", tex=>'', ttype=>"UNARY", func=>"true" },
753 "log" => { tag=>"mo", output=>"log", tex=>'', ttype=>"UNARY", func=>"true" },
754 "ln" => { tag=>"mo", output=>"ln", tex=>'', ttype=>"UNARY", func=>"true" },
755 "det" => { tag=>"mo", output=>"det", tex=>'', ttype=>"UNARY", func=>"true" },
756 "dim" => { tag=>"mo", output=>"dim", tex=>'', ttype=>"CONST" },
757 "mod" => { tag=>"mo", output=>"mod", tex=>'', ttype=>"CONST" },
758 "gcd" => { tag=>"mo", output=>"gcd", tex=>'', ttype=>"UNARY", func=>"true" },
759 "lcm" => { tag=>"mo", output=>"lcm", tex=>'', ttype=>"UNARY", func=>"true" },
760 "lub" => { tag=>"mo", output=>"lub", tex=>'', ttype=>"CONST" },
761 "glb" => { tag=>"mo", output=>"glb", tex=>'', ttype=>"CONST" },
762 "min" => { tag=>"mo", output=>"min", tex=>'', ttype=>"UNDEROVER" },
763 "max" => { tag=>"mo", output=>"max", tex=>'', ttype=>"UNDEROVER" },
766 "uarr" => { tag=>"mo", output=>"↑", tex=>"uparrow", ttype=>"CONST" },
767 "darr" => { tag=>"mo", output=>"↓", tex=>"downarrow", ttype=>"CONST" },
768 "rarr" => { tag=>"mo", output=>"→", tex=>"rightarrow", ttype=>"CONST" },
769 "->" => { tag=>"mo", output=>"→", tex=>"to", ttype=>"CONST", latex=>1 },
770 "|->" => { tag=>"mo", output=>"↦", tex=>"mapsto", ttype=>"CONST" },
771 "larr" => { tag=>"mo", output=>"←", tex=>"leftarrow", ttype=>"CONST" },
772 "harr" => { tag=>"mo", output=>"↔", tex=>"leftrightarrow", ttype=>"CONST" },
773 "rArr" => { tag=>"mo", output=>"⇒", tex=>"Rightarrow", ttype=>"CONST", latex=>1 },
774 "lArr" => { tag=>"mo", output=>"⇐", tex=>"Leftarrow", ttype=>"CONST" },
775 "hArr" => { tag=>"mo", output=>"⇔", tex=>"Leftrightarrow", ttype=>"CONST", latex=>1 },
777 # commands with argument
779 "hat" => { tag=>"mover", output=>"^", tex=>'', ttype=>"UNARY", acc=>"true" },
780 "bar" => { tag=>"mover", output=>"¯", tex=>"overline", ttype=>"UNARY", acc=>"true" },
781 "vec" => { tag=>"mover", output=>"→", tex=>'', ttype=>"UNARY", acc=>"true" },
782 "dot" => { tag=>"mover", output=>".", tex=>'', ttype=>"UNARY", acc=>"true" },
783 "ddot" => { tag=>"mover", output=>"..", tex=>'', ttype=>"UNARY", acc=>"true" },
784 "ul" => { tag=>"munder", output=>"̲", tex=>"underline", ttype=>"UNARY", acc=>"true" },
786 "bb" => { tag=>"mstyle", atname=>"fontweight", atval=>"bold", output=>"bb", tex=>'', ttype=>"UNARY" },
787 "mathbf" => { tag=>"mstyle", atname=>"fontweight", atval=>"bold", output=>"mathbf", tex=>'', ttype=>"UNARY" },
788 "sf" => { tag=>"mstyle", atname=>"fontfamily", atval=>"sans-serif", output=>"sf", tex=>'', ttype=>"UNARY" },
789 "mathsf" => { tag=>"mstyle", atname=>"fontfamily", atval=>"sans-serif", output=>"mathsf", tex=>'', ttype=>"UNARY" },
790 "bbb" => { tag=>"mstyle", atname=>"mathvariant", atval=>"double-struck", output=>"bbb", tex=>'', ttype=>"UNARY", codes=>$AMbbb },
791 "mathbb" => { tag=>"mstyle", atname=>"mathvariant", atval=>"double-struck", output=>"mathbb", tex=>'', ttype=>"UNARY", codes=>$AMbbb },
792 "cc" => { tag=>"mstyle", atname=>"mathvariant", atval=>"script", output=>"cc", tex=>'', ttype=>"UNARY", codes=>$AMcal },
793 "mathcal" => { tag=>"mstyle", atname=>"mathvariant", atval=>"script", output=>"mathcal", tex=>'', ttype=>"UNARY", codes=>$AMcal },
794 "tt" => { tag=>"mstyle", atname=>"fontfamily", atval=>"monospace", output=>"tt", tex=>'', ttype=>"UNARY" },
795 "mathtt" => { tag=>"mstyle", atname=>"fontfamily", atval=>"monospace", output=>"mathtt", tex=>'', ttype=>"UNARY" },
796 "fr" => { tag=>"mstyle", atname=>"mathvariant", atval=>"fraktur", output=>"fr", tex=>'', ttype=>"UNARY", codes=>$AMfrk },
797 "mathfrak" => { tag=>"mstyle", atname=>"mathvariant", atval=>"fraktur", output=>"mathfrak", tex=>'', ttype=>"UNARY", codes=>$AMfrk },
800 # Preprocess AMSymbol for lexer regular expression
801 # Preprocess AMSymbol for tex input
802 my %AMTexSym = map(($AMSymbol{$_}{tex} || $_, $_),
803 grep($AMSymbol{$_}{tex}, keys %AMSymbol));
804 my $Ident_RE = join '|', map("\Q$_\E",
805 sort {length($b) - length($a)} (keys %AMSymbol,
808 sub _getSymbol_ : method {
809 my ($self, $str) = @_;
811 /^(\d+(\.\d+)?)/ || /^(\.\d+)/
812 and return $1, {tag=>'mn', output=>$1, ttype=>'CONST'};
814 return $1,$AMTexSym{$1} ? $AMSymbol{$AMTexSym{$1}} : $AMSymbol{$1};
815 $self->{Definition_RE} && /^($self->{Definition_RE})/ and
816 return $1, $self->{Definitions}{$1};
818 return $1, {tag=>'mi', output=>$1, ttype=>'CONST'};
820 return $1 eq '-' && defined $self->{previousSymbol} &&
821 $self->{previousSymbol} eq 'INFIX' ?
822 ($1, {tag=>'mo', output=>$1, ttype=>'UNARY', func=>"true"} ) :
823 ($1, {tag=>'mo', output=>$1, ttype=>'CONST'});
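
=pod

The matching order in C<_getSymbol_> (numbers first, then known symbols
tried longest-first, then a single letter, then any single character) can
be sketched on its own. This is a minimal, standalone illustration and not
the module's code: the four-entry symbol table, the token type names, and
the function name C<next_token> are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical symbol table; the real %AMSymbol is far larger.
my %symbols = map { $_ => 1 } qw(in int infty sin);

# Longest-first alternation, so that "int" is tried before "in".
my $ident_re = join '|', map { quotemeta($_) }
    sort { length($b) <=> length($a) } keys %symbols;

# Returns (matched text, token type) for the start of $str.
sub next_token {
    my ($str) = @_;
    return ($1, 'number')
        if $str =~ /^(\d+(?:\.\d+)?)/ || $str =~ /^(\.\d+)/;
    return ($1, 'symbol')   if $str =~ /^($ident_re)/;
    return ($1, 'ident')    if $str =~ /^([A-Za-z])/;
    return ($1, 'operator') if $str =~ /^(.)/s;
    return;
}

# Show the first token of a few sample inputs.
for my $input ('3.14x', '.5', 'int_0^1', 'x+1', '+1') {
    my ($text, $type) = next_token($input);
    printf "%-8s %-8s %s\n", $input, $text, $type;
}
```

Sorting the alternatives by decreasing length matters because Perl's
alternation takes the first branch that matches, not the longest one.

=cut
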
827 # Used so that Text::ASCIIMathML::Node can get access to the symbol table
833 # Parses an E expression
834 # Arguments: string to parse, whether to look for a right bracket
835 # Returns: parsed node (if successful), remaining unparsed string
836 sub _parseExpr : method {
837 my ($self, $str, $rightbracket) = @_;
838 my $newFrag = $self->_createDocumentFragment();
839 my ($node, $input, $symbol);
841 $str = _removeCharsAndBlanks($str, 0);
842 ($node, $str) = $self->_parseIexpr($str);
843 ($input, $symbol) = $self->_getSymbol($str);
844 if (defined $symbol && $symbol->{ttype} eq 'INFIX' && $input eq '/') {
845 $str = _removeCharsAndBlanks($str, length $input);
846 my @result = $self->_parseIexpr($str);
848 _removeBrackets($result[0]);
850 else { # show box in place of missing argument
851 $result[0] = $self->_createMmlNode
('mo', $self->_createTextNode('&#x25A1;'));
855 _removeBrackets($node);
856 $node = $self->_createMmlNode($symbol->{tag}, $node);
857 $node->appendChild($result[0]);
858 $newFrag->appendChild($node);
859 ($input, $symbol) = $self->_getSymbol($str);
861 elsif (defined $node) {
862 $newFrag->appendChild($node);
864 } while (defined $symbol && ($symbol->{ttype} ne 'RIGHTBRACKET' &&
865 ($symbol->{ttype} ne 'LEFTRIGHT' ||
867 || $self->{nestingDepth} == 0) &&
868 $symbol->{output} ne '');
869 if (defined $symbol && $symbol->{ttype} =~ /RIGHTBRACKET|LEFTRIGHT/) {
870 my @childNodes = $newFrag->childNodes;
871 if (@childNodes > 1 &&
872 $childNodes[-1]->nodeName eq 'mrow' &&
873 $childNodes[-2]->nodeName eq 'mo' &&
874 $childNodes[-2]->firstChild->nodeValue eq ',') { # matrix
875 my $right = $childNodes[-1]->lastChild->firstChild->nodeValue;
876 if ($right =~ /[\)\]]/) {
877 my $left = $childNodes[-1]->firstChild->firstChild->nodeValue;
878 if ("$left$right" =~ /^\(\)$/ && $symbol->{output} ne '}' ||
879 "$left$right" =~ /^\[\]$/) {
880 my @pos; # positions of commas
883 for (my $i=0; $matrix && $i < $m; $i += 2) {
885 $node = $childNodes[$i];
887 $node->nodeName eq 'mrow' &&
889 $node->nextSibling->nodeName eq 'mo' &&
890 $node->nextSibling->firstChild->nodeValue eq ',')&&
891 $node->firstChild->firstChild->nodeValue eq $left&&
892 $node->lastChild->firstChild->nodeValue eq $right
895 for (my $j=0; $j<($node->childNodes); $j++) {
896 if (($node->childNodes)[$j]->firstChild->
898 push @{$pos[$i]}, $j;
902 if ($matrix && $i > 1) {
903 $matrix = @{$pos[$i]} == @{$pos[$i-2]};
907 my $table = $self->_createDocumentFragment();
908 for (my $i=0; $i<$m; $i += 2) {
909 my $row = $self->_createDocumentFragment();
910 my $frag = $self->_createDocumentFragment();
911 # <mrow>(-,-,...,-,-)</mrow>
912 $node = $newFrag->firstChild;
913 my $n = $node->childNodes;
915 $node->removeChild($node->firstChild); # remove (
916 for (my $j=1; $j<$n-1; $j++) {
917 if ($k < @{$pos[$i]} && $j == $pos[$i][$k]) {
920 ($self->_createMmlNode('mtd', $frag));
921 $frag = $self->_createDocumentFragment();
925 $frag->appendChild($node->firstChild);
927 $node->removeChild($node->firstChild);
930 ($self->_createMmlNode('mtd', $frag));
931 if ($newFrag->childNodes > 2) {
932 # remove <mrow>)</mrow>
933 $newFrag->removeChild($newFrag->firstChild);
935 $newFrag->removeChild($newFrag->firstChild);
938 ($self->_createMmlNode('mtr', $row));
940 $node = $self->_createMmlNode('mtable', $table);
941 $node->setAttribute('columnalign', 'left')
942 if $symbol->{invisible};
943 $newFrag->replaceChild($node, $newFrag->firstChild);
948 $str = _removeCharsAndBlanks($str, length $input);
949 if (! $symbol->{invisible}) {
950 $node = $self->_createMmlNode
951 ('mo', $self->_createTextNode($symbol->{output}));
952 $newFrag->appendChild($node);
955 return $newFrag, $str;
958 # Parses an I expression
959 # Arguments: string to parse
960 # Returns: parsed node (if successful), remaining unparsed string
961 sub _parseIexpr : method {
962 my ($self, $str) = @_;
963 $str = _removeCharsAndBlanks($str, 0);
964 my ($in1, $sym1) = $self->_getSymbol($str);
966 ($node, $str) = $self->_parseSexpr($str);
967 my ($input, $symbol) = $self->_getSymbol($str);
968 if (defined $symbol && $symbol->{ttype} eq 'INFIX' && $input ne '/') {
969 # if (symbol.input == "/") result = AMparseIexpr(str); else ...
970 $str = _removeCharsAndBlanks($str, length $input);
971 my @result = $self->_parseSexpr($str);
973 _removeBrackets($result[0]);
975 else { # show box in place of missing argument
976 $result[0] = $self->_createMmlNode
('mo', $self->_createTextNode("&#x25A1;"));
981 my ($in2, $sym2) = $self->_getSymbol($str);
982 my $underover = $sym1->{ttype} eq 'UNDEROVER';
984 $str = _removeCharsAndBlanks($str, length $in2);
985 my @res2 = $self->_parseSexpr($str);
986 _removeBrackets($res2[0]);
988 $node = $self->_createMmlNode
989 ($underover ? 'munderover' : 'msubsup', $node);
990 $node->appendChild($result[0]);
991 $node->appendChild($res2[0]);
992 $node = $self->_createMmlNode('mrow',$node); # so sum does not stretch
995 $node = $self->_createMmlNode
996 ($underover ? 'munder' : 'msub', $node);
997 $node->appendChild($result[0]);
1001 $node = $self->_createMmlNode($symbol->{tag}, $node);
1002 $node->appendChild($result[0]);
1008 # Parses an S expression
1009 # Arguments: string to parse
1010 # Returns: parsed node (if successful), remaining unparsed string
1011 sub _parseSexpr : method {
1012 my ($self, $str) = @_;
1013 my $newFrag = $self->_createDocumentFragment();
1014 $str = _removeCharsAndBlanks($str, 0);
1015 my ($input, $symbol) = $self->_getSymbol($str);
1016 return (undef, $str)
1017 if ! defined $symbol ||
1018 $symbol->{ttype} eq 'RIGHTBRACKET' && $self->{nestingDepth} > 0;
1019 if ($symbol->{ttype} eq 'DEFINITION') {
1020 $str = $symbol->{output} . _removeCharsAndBlanks($str, length $input);
1021 ($input, $symbol) = $self->_getSymbol($str);
1023 my $ttype = $symbol->{ttype};
1024 if ($ttype =~ /UNDEROVER|CONST/) {
1025 $str = _removeCharsAndBlanks($str, length $input);
1027 $self->_createMmlNode($symbol->{tag},
1028 $self->_createTextNode($symbol->{output})),
1031 if ($ttype eq 'LEFTBRACKET') {
1032 $self->{nestingDepth}++;
1033 $str = _removeCharsAndBlanks($str, length $input);
1034 my @result = $self->_parseExpr($str, 1);
1035 $self->{nestingDepth}--;
1037 if ($symbol->{invisible}) {
1038 $node = $self->_createMmlNode('mrow', $result[0]);
1041 $node = $self->_createMmlNode
1042 ('mo', $self->_createTextNode($symbol->{output}));
1043 $node = $self->_createMmlNode('mrow', $node);
1044 $node->appendChild($result[0]);
1046 return $node, $result[1];
1048 if ($ttype eq 'TEXT') {
1049 $str = _removeCharsAndBlanks($str, length $input) unless $input eq '"';
1051 ($input, $st) = ($1, $2)
1052 if $str =~ /^(\"()\")/ || $str =~ /^(\"((?:\\\\|\\\"|.)+?)\")/;
1053 ($input, $st) = ($1, $2)
1054 if ($str =~ /^(\((.*?)\))/ ||
1055 $str =~ /^(\[(.*?)\])/ ||
1056 $str =~ /^(\{(.*?)\})/);
1057 ($input, $st) = ($str) x 2 unless defined $st;
1058 if (substr($st, 0, 1) eq ' ') {
1059 my $node = $self->_createElementMathML('mspace');
1060 $node->setAttribute(width=>'1ex');
1061 $newFrag->appendChild($node);
1063 $newFrag->appendChild
1064 ($self->_createMmlNode($symbol->{tag},
1065 $self->_createTextNode($st)));
1066 if (substr($st, -1) eq ' ') {
1067 my $node = $self->_createElementMathML('mspace');
1068 $node->setAttribute(width=>'1ex');
1069 $newFrag->appendChild($node);
1071 $str = _removeCharsAndBlanks($str, length $input);
1072 return $self->_createMmlNode('mrow', $newFrag), $str;
1074 if ($ttype eq 'UNARY') {
1075 $str = _removeCharsAndBlanks($str, length $input);
1076 my @result = $self->_parseSexpr($str);
1077 return ($self->_createMmlNode
1079 $self->_createTextNode($symbol->{output})), $str)
1080 if ! defined $result[0];
1081 if ($symbol->{func}) {
1082 return ($self->_createMmlNode
1084 $self->_createTextNode($symbol->{output})), $str)
1085 if $str =~ m!^[\^_/|]!;
1086 my $node = $self->_createMmlNode
1087 ('mrow', $self->_createMmlNode
1088 ($symbol->{tag}, $self->_createTextNode($symbol->{output})));
1089 $node->appendChild($result[0]);
1090 return $node, $result[1];
1092 _removeBrackets($result[0]);
1093 if ($symbol->{acc}) { # accent
1094 my $node = $self->_createMmlNode($symbol->{tag}, $result[0]);
1096 ($self->_createMmlNode
1097 ('mo', $self->_createTextNode($symbol->{output})));
1098 return $node, $result[1];
1100 if ($symbol->{atname}) { # font change command
1101 if ($self->{attr}{ForMoz} && $symbol->{codes}) {
1102 my @childNodes = $result[0]->childNodes;
1103 my $nodeName = $result[0]->nodeName;
1104 for (my $i=0; $i<@childNodes; $i++) {
1105 if ($childNodes[$i]->nodeName eq 'mi'||$nodeName eq 'mi') {
1106 my $st = $nodeName eq 'mi' ?
1107 $result[0] ->firstChild->nodeValue :
1108 $childNodes[$i]->firstChild->nodeValue;
1109 $st =~ s/([A-Z])/sprintf "&#x%X;",$symbol->{codes}[ord($1)-65]/ge;
1110 if ($nodeName eq 'mi') {
1111 $result[0] = $self->_createTextNode($st);
1114 $result[0]->replaceChild
1115 ($self->_createTextNode($st), $childNodes[$i]);
1120 my $node = $self->_createMmlNode($symbol->{tag}, $result[0]);
1121 $node->setAttribute($symbol->{atname}=>$symbol->{atval});
1122 return $node, $result[1];
1124 return $self->_createMmlNode($symbol->{tag}, $result[0]), $result[1];
1126 if ($ttype eq 'BINARY') {
1127 $str = _removeCharsAndBlanks($str, length $input);
1128 my @result = $self->_parseSexpr($str);
1129 return ($self->_createMmlNode
1130 ('mo', $self->_createTextNode($input)), $str)
1131 if ! defined $result[0];
1132 _removeBrackets($result[0]);
1133 my @result2 = $self->_parseSexpr($result[1]);
1134 return ($self->_createMmlNode
1135 ('mo', $self->_createTextNode($input)), $str)
1136 if ! defined $result2[0];
1137 _removeBrackets($result2[0]);
1138 if ($input =~ /new(command|symbol)/) {
1140 # Look for text in both arguments
1141 my $text1 = $result[0];
1142 my $haveTextArgs = 0;
1143 $text1 = $text1->firstChild while $text1->nodeName eq 'mrow';
1144 if ($text1->nodeName eq 'mtext') {
1145 my $text2 = $result2[0];
1146 $text2 = $text2->firstChild while $text2->nodeName eq 'mrow';
1148 if ($result2[0]->childNodes > 1 && $input eq 'newsymbol') {
1149 # Process the latex string for a newsymbol
1150 my $latexdef = $result2[0]->child(1);
1151 $latexdef = $latexdef->firstChild
1152 while $latexdef->nodeName eq 'mrow';
1153 $latex = $latexdef->firstChild->nodeValue;
1155 if ($text2->nodeName eq 'mtext') {
1156 $self->{Definitions}{$text1->firstChild->nodeValue} = {
1158 output=>$text2->firstChild->nodeValue,
1159 ttype =>$what eq 'symbol' ? 'CONST' : 'DEFINITION',
1161 $self->{Definition_RE} = join '|',
1162 map("\Q$_\E", sort {length($b) - length($a)}
1163 keys %{$self->{Definitions}});
1164 $self->{Latex}{$text2->firstChild->nodeValue} = $latex
1169 if (! $haveTextArgs) {
1170 $newFrag->appendChild($self->_createMmlNode
1171 ('mo', $self->_createTextNode($input)),
1172 $result[0], $result2[0]);
1173 return $self->_createMmlNode('mrow', $newFrag), $result2[1];
1175 return undef, $result2[1];
1177 if ($input =~ /root|stackrel/) {
1178 $newFrag->appendChild($result2[0]);
1180 $newFrag->appendChild($result[0]);
1181 if ($input eq 'frac') {
1182 $newFrag->appendChild($result2[0]);
1184 return $self->_createMmlNode($symbol->{tag}, $newFrag), $result2[1];
1186 if ($ttype eq 'INFIX') {
1187 $str = _removeCharsAndBlanks($str, length $input);
1188 return $self->_createMmlNode
1189 ('mo', $self->_createTextNode($symbol->{output})), $str;
1191 if ($ttype eq 'SPACE') {
1192 $str = _removeCharsAndBlanks($str, length $input);
1193 my $node = $self->_createElementMathML('mspace');
1194 $node->setAttribute('width', '1ex');
1195 $newFrag->appendChild($node);
1196 $newFrag->appendChild
1197 ($self->_createMmlNode($symbol->{tag},
1198 $self->_createTextNode($symbol->{output})));
1199 $node = $self->_createElementMathML('mspace');
1200 $node->setAttribute('width', '1ex');
1201 $newFrag->appendChild($node);
1202 return $self->_createMmlNode('mrow', $newFrag), $str;
1204 if ($ttype eq 'LEFTRIGHT') {
1205 $self->{nestingDepth}++;
1206 $str = _removeCharsAndBlanks($str, length $input);
1207 my @result = $self->_parseExpr($str, 0);
1208 $self->{nestingDepth}--;
1209 my $st = $result[0]->lastChild ?
1210 $result[0]->lastChild->firstChild->nodeValue : '';
1211 my $node = $self->_createMmlNode
1212 ('mo',$self->_createTextNode($symbol->{output}));
1213 $node = $self->_createMmlNode('mrow', $node);
1214 if ($st eq '|') { # it's an absolute value subterm
1215 $node->appendChild($result[0]);
1216 return $node, $result[1];
1221 $str = _removeCharsAndBlanks($str, length $input);
1222 return $self->_createMmlNode
1223 ($symbol->{tag}, # it's a constant
1224 $self->_createTextNode($symbol->{output})), $str;
1227 # Removes brackets at the beginning or end of an mrow node
1228 # Arguments: node object
1230 # Side-effects: may change children of node object
1231 sub _removeBrackets {
1233 if ($node->nodeName eq 'mrow') {
1234 my $st = $node->firstChild->firstChild->nodeValue;
1235 $node->removeChild($node->firstChild) if $st =~ /^[\(\[\{]$/;
1236 $st = $node->lastChild->firstChild->nodeValue;
1237 $node->removeChild($node->lastChild) if $st =~ /^[\)\]\}]$/;
1241 # Removes the first n characters and any following blanks
1242 # Arguments: string, n
1243 # Returns: resultant string
1244 sub _removeCharsAndBlanks {
1246 my $st = substr($str,
1247 substr($str, $n) =~ /^\\[^\\ ,]/ ? $n+1 : $n);
1248 $st =~ s/^[\x00-\x20]+//;
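
=pod

The behaviour described above (drop the first C<n> characters, skip one
more character when what follows is a backslash escape, then strip leading
blanks and control characters) can be tried in isolation. The sketch below
is a standalone copy of that logic under the hypothetical name
C<remove_chars_and_blanks>; the argument unpacking and the final C<return>
are assumptions about the lines not shown here.

```perl
use strict;
use warnings;

sub remove_chars_and_blanks {
    my ($str, $n) = @_;
    # Skip one extra character when the remainder starts with a backslash
    # followed by anything other than a backslash, space, or comma.
    my $start = substr($str, $n) =~ /^\\[^\\ ,]/ ? $n + 1 : $n;
    my $st = substr($str, $start);
    # Strip leading blanks and control characters.
    $st =~ s/^[\x00-\x20]+//;
    return $st;
}

print remove_chars_and_blanks('sin   x', 3), "\n";   # prints "x"
print remove_chars_and_blanks('a\\b', 1), "\n";      # prints "b"
```

=cut
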
# Removes outermost parentheses
1254 # Returns: string with parentheses removed
1257 $s =~ s!^(<mrow>)<mo>[\(\[\{]</mo>!$1!;
1258 $s =~ s!<mo>[\)\]\}]</mo>(</mrow>)$!$1!;
1263 my %Conversion = ('<'=>'lt', '>'=>'gt', '"'=>'quot', '&'=>'amp');
# Encodes special XML characters
1267 # Returns: encoded string
1270 $s =~ s/([<>\"&])/&$Conversion{$1};/g;
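
=pod

The substitution above does the whole encoding in a single pass, so an
ampersand produced by one replacement is never encoded a second time. A
minimal standalone sketch follows; the names C<%entity_for> and
C<xml_encode> are chosen for the example and are not the module's.

```perl
use strict;
use warnings;

# Map each special character to its XML entity name.
my %entity_for = ('<' => 'lt', '>' => 'gt', '"' => 'quot', '&' => 'amp');

sub xml_encode {
    my ($s) = @_;
    # One pass over the string: every special character becomes &name;
    $s =~ s/([<>"&])/&$entity_for{$1};/g;
    return $s;
}

print xml_encode('a < b & "c"'), "\n";   # prints a &lt; b &amp; &quot;c&quot;
```

=cut
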
1275 package Text::ASCIIMathML::Node;
1278 # Create a closure for the following attributes
1281 # Creates a new Text::ASCIIMathML::Node object
1282 # Arguments: Text::ASCIIMathML object, optional tag
1283 # Returns: new object
1285 my ($class, $parser, $tag) = @_;
1286 my $obj = bless { children=>[] }, $class;
1287 if (defined $tag) { $obj->{tag} = $tag }
1288 else { $obj->{frag} = 1 }
1289 $parser_of{$obj} = $parser;
1293 # Creates a new Text::ASCIIMathML::Node text object
1294 # Arguments: Text::ASCIIMathML object, text
1295 # Returns: new object
1297 my ($class, $parser, $text) = @_;
1298 $text =~ s/^\s*(.*?)\s*$/$1/; # Delete leading/trailing spaces
1299 my $obj = bless { text=>$text }, $class;
1300 $parser_of{$obj} = $parser;
$Null = Text::ASCIIMathML::Node->new;
1310 # Appends one or more node objects to the children of an object
1311 # Arguments: list of objects to append
1313 sub appendChild : method {
1315 my @new = map $_->{frag} ? @{$_->{children}} : $_, @_;
1316 push @{$self->{children}}, @new;
$Parent{$_} = $self foreach @new;
# Returns the value for an attribute of a node object
1322 # Arguments: Attribute name
1323 # Returns: Value for the attribute
1325 my ($self, $attr) = @_;
1326 return $self->{attr}{$attr};
1329 # Returns a list of the attributes of a node object
1331 # Returns: Array of attribute names
1334 return $self->{attrlist} ? @{$self->{attrlist}} : ();
1337 # Returns a child with a given index in the array of children of a node
# Returns: node object, or the null node if the index is out of range
1341 my ($self, $index) = @_;
1342 return $self->{children} && @{$self->{children}} > $index ?
1343 $self->{children}[$index] : $Null;
1346 # Returns an array of children of a node
1348 # Returns: Array of node objects
1351 return $self->{children} ? @{$self->{children}} : ();
1354 # Returns the first child of a node; ignores any fragments
# Returns: node object, or the null node if there are no children
1359 return $self->{children} && @{$self->{children}} ?
1360 $self->{children}[0] : $Null;
1363 # Returns true if the object is a fragment
1367 return $_[0]->{frag};
1370 # Returns true if the object is a named node
1374 return $_[0]->{tag};
1377 # Returns true if the object is a text node
1381 return defined $_[0]->{text};
1384 # Returns the last child of a node
# Returns: node object, or the null node if there are no children
1389 return $self->{children} && @{$self->{children}} ?
1390 $self->{children}[-1] : $Null;
1394 # Creates closure for following "static" variables
1395 my (%LatexSym, %LatexMover, %LatexFont, %LatexOp);
1397 # Returns a latex representation of a node object
1399 # Returns: Text string
1400 sub latex : method {
1403 my $parser = $parser_of{$self};
1405 # Build the entity to latex symbol translator
1406 my $amsymbol = Text::ASCIIMathML::_get_amsymbol_();
1407 foreach my $sym (keys %$amsymbol) {
1408 next unless (defined $amsymbol->{$sym}{output} &&
1409 $amsymbol->{$sym}{output} =~ /&\#x.*;/);
1410 my ($output, $tex) = map $amsymbol->{$sym}{$_}, qw(output tex);
1411 next if defined $LatexSym{$output} && ! $amsymbol->{$sym}{latex};
1412 $tex = $sym if $tex eq '';
1413 $LatexSym{$output} = "\\$tex";
1415 my %math_font = (bbb => 'mathds',
1420 mathfrak => 'mathfrak',
1422 # Add character codes
1423 foreach my $coded (grep $amsymbol->{$_}{codes}, keys %$amsymbol) {
1424 @LatexSym{map(sprintf("&#x%X;", $_),
1425 @{$amsymbol->{$coded}{codes}})} =
1426 map("\\$math_font{$coded}\{$_}", ('A' .. 'Z'));
1428 # Post-process protected symbols
1429 $LatexSym{$_} =~ s/^\\\\/\\/ foreach keys %LatexSym;
1430 %LatexMover = ('^' => '\hat',
1431 '\overline' => '\overline',
1434 '\rightarrow' => '\vec',
1438 %LatexFont = (bold => '\bf',
1439 'double-struck' => '\mathds',
1440 fraktur => '\mathfrak',
1442 'sans-serif' => '\sf',
1445 %LatexOp = (if => '\mbox{if }',
1446 lcm => '\mbox{lcm}',
1447 newcommand => '\mbox{newcommand}',
1448 "\\" => '\backslash',
1454 if (defined $self->{text}) {
1455 my $text = $self->{text};
$text =~ s/([{}])/\\$1/g;
1457 $text =~ s/(&\#x.*?;)/
1458 defined $parser->{Latex}{$1} ? $parser->{Latex}{$1} :
1459 defined $LatexSym{$1} ? $LatexSym{$1} : $1/eg;
1462 my $tag = $self->{tag};
1465 if (@{$self->{children}}) {
1466 foreach (@{$self->{children}}) {
1467 push @child_str, $_->latex($parser);
1473 # Need to distinguish bmod from pmod
1474 my $parent = $self->parent;
1475 return $self eq $parent->child(1) &&
1476 $parent->firstChild->firstChild->{text} eq '('
1478 if $child_str[0] eq 'mod';
1479 return $LatexOp{$child_str[0]} if $LatexOp{$child_str[0]};
1480 return $child_str[0] =~ /^\w+$/ ? "\\$child_str[0]" : $child_str[0];
1484 if ($tag eq 'mrow') {
1485 @child_str = grep $_ ne '', @child_str;
1486 # Check for pmod function
1487 if (@child_str > 1 && $child_str[1] eq '\pmod') {
1488 pop @child_str if $child_str[-1] eq ')';
1489 splice @child_str, 0, 2;
1490 return "\\pmod{@child_str}";
1492 # Check if we need \left ... \right
1493 my $is_tall = grep(/[_^]|\\(begin\{array\}|frac|sqrt|stackrel)/,
1495 if ($is_tall && @child_str > 1 &&
1496 ($child_str[0] =~ /^([\(\[|]|\\\{)$/ ||
1497 $child_str[-1] =~ /^([\)\]|]|\\\})$/)) {
1498 if ($child_str[0] =~ /^([\(\[|]|\\\{)$/) {
1499 $child_str[0] = "\\left$child_str[0]";
1502 unshift @child_str, "\\left.";
1504 if ($child_str[-1] =~ /^([\)\]|]|\\\})$/) {
1505 $child_str[-1] = "\\right$child_str[-1]";
1508 push @child_str, "\\right.";
1511 return "@child_str";
1519 if ($tag =~ /^m([in]|ath|row|td)$/) {
1520 @child_str = grep $_ ne '', @child_str;
1521 return "@child_str";
1528 if ($tag =~ /^(msu[bp](sup)?|munderover)$/) {
1529 my $base = shift @child_str;
1530 $base = '\mbox{}' if $base eq '';
1531 # Put {} around arguments with more than one character
1532 @child_str = map length($_) > 1 ? "{$_}" : $_, @child_str;
1533 return ($tag eq 'msub' ? "${base}_$child_str[0]" :
1534 $tag eq 'msup' ? "${base}^$child_str[0]" :
1535 "${base}_$child_str[0]^$child_str[1]");
1539 if ($tag eq 'mover') {
1540 # Need to special-case math mode accents
1542 ($child_str[1] eq '\overline' && length($child_str[0]) == 1 ?
1543 "\\bar{$child_str[0]}" :
1544 $LatexMover{$child_str[1]} ?
1545 "$LatexMover{$child_str[1]}\{$child_str[0]\}" :
1546 "\\stackrel{$child_str[1]}{$child_str[0]}");
1550 if ($tag eq 'munder') {
1551 return $child_str[1] eq '\underline' ? "$child_str[1]\{$child_str[0]}"
1552 : "$child_str[0]_\{$child_str[1]\}";
1556 if ($tag eq 'mfrac') {
1557 return "\\frac{$child_str[0]}{$child_str[1]}";
1561 if ($tag eq 'msqrt') {
1562 return "\\sqrt{$child_str[0]}";
1566 if ($tag eq 'mroot') {
1567 return "\\sqrt[$child_str[1]]{$child_str[0]}";
1571 if ($tag eq 'mtext') {
1572 my $text = $child_str[0];
1573 my $next = $self->nextSibling;
1574 my $prev = $self->previousSibling;
1575 if (defined $next->{tag} && $next->{tag} eq 'mspace') {
1578 if (defined $prev->{tag} && $prev->{tag} eq 'mspace') {
1581 $text = ' ' if $text eq ' ';
1582 return "\\mbox{$text}";
1587 if ($tag eq 'mspace') {
1592 if ($tag eq 'mtable') {
1593 my $cols = ($child_str[0] =~ tr/&//) + 1;
1594 my $colspec = ($self->{attr}{columnalign} || '') eq 'left' ? 'l' : 'c';
1595 my $colspecs = $colspec x $cols;
1596 return ("\\begin{array}{$colspecs}\n" .
1597 join('', map(" $_ \\\\\n", @child_str)) .
1602 if ($tag eq 'mtr') {
1603 return join ' & ', @child_str;
1607 if ($tag eq 'mstyle') {
1608 @child_str = grep $_ ne '', @child_str;
1609 if ($self->parent->{tag} eq 'math') {
1610 push @child_str, ' ' unless @child_str;
1611 # The top-level mstyle
1612 return (defined $self->{attr}{displaystyle} &&
1613 $self->{attr}{displaystyle} eq 'true') ?
1614 "\$\$@child_str\$\$" : "\$@child_str\$";
1617 # It better be a font changing command
1618 return $child_str[0] if $self->{attr}{mathvariant};
1619 my ($attr) = map($self->{attr}{$_},
1620 grep $self->{attr}{$_},
1621 qw(fontweight fontfamily));
1622 return $attr && $LatexFont{$attr} ?
1623 "$LatexFont{$attr}\{$child_str[0]}" : $child_str[0];
1629 # Returns the next sibling of a node
1631 # Returns: node object or undef
1634 my $parent = $self->parent;
1635 for (my $i=0; $i<@{$parent->{children}}; $i++) {
1636 return $parent->{children}[$i+1] if $self eq $parent->{children}[$i];
1641 # Returns the tag of a node
1644 sub nodeName : method {
1645 return $_[0]{tag} || '';
1648 # Returns the text of a text node
1651 sub nodeValue : method {
1652 return $_[0]{text} || '';
1655 # Returns the parent of a node
1657 # Returns: parent node object or undef
1658 sub parent : method {
1659 return $Parent{$_[0]} || $Null;
1662 # Returns the previous sibling of a node
1664 # Returns: node object or undef
1665 sub previousSibling {
1667 my $parent = $self->parent;
1668 for (my $i=1; $i<@{$parent->{children}}; $i++) {
1669 return $parent->{children}[$i-1] if $self eq $parent->{children}[$i];
1674 # Removes a given child node from a node
1675 # Arguments: child node
1677 # Side-effects: May affect children of the node
1678 sub removeChild : method {
1679 my ($self, $child) = @_;
1680 @{$self->{children}} = grep $_ ne $child, @{$self->{children}}
1681 if $self->{children};
1682 delete $Parent{$child};
1685 # Replaces one child node object with another
# Arguments: new child node object, old child node object
1688 sub replaceChild : method {
1689 my ($self, $new, $old) = @_;
1690 @{$self->{children}} = map $_ eq $old ? $new : $_, @{$self->{children}};
1691 delete $Parent{$old};
1692 $Parent{$new} = $self;
1695 # Sets one or more attributes on a node object
1696 # Arguments: set of attribute/value pairs
1698 sub setAttribute : method {
1701 $self->{attr} = {} unless $self->{attr};
1702 $self->{attrlist} = [] unless $self->{attrlist};
1704 while (my($aname, $aval) = splice(@_, 0, 2)) {
1706 push @{$self->{attrlist}}, $aname unless defined $self->{attr}{$aname};
1707 $self->{attr}{$aname} = $aval;
# Returns the MathML (XML) text representation of a node object
1713 # Returns: Text string
1716 return $self->{text} if defined $self->{text};
1717 my $tag = $self->{tag};
1718 my $attr = join '', map(" $_=\"" .
1719 ($_ eq 'xmlns' ? $self->{attr}{$_} :
1720 Text::ASCIIMathML::_xml_encode($self->{attr}{$_})) .
1721 "\"", @{$self->{attrlist}})
1723 if (@{$self->{children}}) {
1725 foreach (@{$self->{children}}) {
1726 $child_str .= $_->text;
1728 return $tag ? "<$tag$attr>$child_str</$tag>" : $child_str;
1730 return $tag ? "<$tag$attr/>" : '';
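
=pod

The serialization rule used by C<text> (a text node yields its text, an
element with children yields an open tag, the concatenated children, and a
close tag, and an empty element yields a self-closing tag) can be sketched
with plain hashes. This is only an illustration: the helpers C<el>, C<tx>,
and C<node_text> are hypothetical, the nodes are unblessed hashes rather
than Text::ASCIIMathML::Node objects, and the attribute escaping that the
module applies is left out.

```perl
use strict;
use warnings;

# Hypothetical constructors for element and text nodes.
sub el { my ($tag, @kids) = @_; return { tag => $tag, children => [@kids] } }
sub tx { return { text => $_[0] } }

# Serializes a node tree to markup, following the rule described above.
sub node_text {
    my ($node) = @_;
    return $node->{text} if defined $node->{text};
    my $tag  = $node->{tag};
    # Attributes are emitted in the order recorded in attrlist.
    my $attr = join '', map(sprintf(' %s="%s"', $_, $node->{attr}{$_}),
                            @{ $node->{attrlist} || [] });
    my @kids = @{ $node->{children} || [] };
    if (@kids) {
        my $inner = join '', map(node_text($_), @kids);
        return $tag ? "<$tag$attr>$inner</$tag>" : $inner;
    }
    return $tag ? "<$tag$attr/>" : '';
}

my $tree = el('math',
              el('mfrac', el('mn', tx('1')), el('mn', tx('2'))),
              el('mspace'));
$tree->{attr}     = { title => 'half' };
$tree->{attrlist} = ['title'];
print node_text($tree), "\n";
```

Keeping a separate C<attrlist> alongside the C<attr> hash is what lets the
output preserve the order in which attributes were set, since Perl hashes
are unordered.

=cut
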