http://www.karpfenteich.net/peter/ElvisMeetsTeX.html
Last modified:     22nd February, 2002
Version:           1.04
	

Elvis - LaTeX Browse Mode

Do you like to edit your LaTeX files with Vi/Elvis/Vim? Would you love to have a quick overview of the plain text without all the "disturbing" LaTeX tags and without having to load the DVI previewer? Elvis already has a basic LaTeX browse mode, but it needs some improvements to be usable in daily work. This page aims to help Steve Kirkendall find all the relevant LaTeX structures and work out how they could be handled in Elvis. Steve plans to make some improvements to the markup display modes, but he needs our help, as he doesn't use LaTeX himself. I'm very grateful that Steve is adding and improving LaTeX support in Elvis :-)

If you'd like to suggest how specific LaTeX commands could be represented, take a look at Implementation ideas for the LaTeX browse mode.

If you're also interested in how Elvis handles the markup display modes, you'll find some Elvis insights from Steve.

Possible implementation rules for a LaTeX browse mode in Elvis

When editing a LaTeX file, it could be helpful to have a command that hides all or most of the LaTeX tags. This would give a better impression of the final document and make it faster to find a specific piece of text. In addition, Elvis could preformat the LaTeX code with some elementary rules into a pseudo DVI preview. The LaTeX browse mode is not meant to make a real DVI previewer obsolete!

To get a first impression of a (rudimentary) working LaTeX browse mode you can take a look at the vi clone Elvis.

Below you'll find an overview of the different categories, showing what the problem is and which solutions could be used. If you're aware of another category that must be handled, or you'd like to suggest an improvement or an idea for how a specific problem could be handled, you can e-mail it to Peter.

Preamble

Problem
Is there a way to detect where the preamble ends? A way that would work in all TeX files?
Most of the weird stuff seems to happen in the preamble. If you can recommend a good way to detect the end of the preamble, then I could define a rule along the lines of "Hide everything in the preamble, but show all unknown markups after the preamble." I suspect that the TITLE attribute may provide a way to implement this -- we could say that the first known markup which has TITLE='N' marks the end of the preamble.
Solution
\begin{document} closes the preamble
I think that most commands in the preamble should just be skipped. But taking care of some arguments may be necessary. (Example: the language)
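A minimal sketch of the rule above, assuming the source is available as one NUL-terminated string. The function name preamble_end() is hypothetical and not part of elvis. It returns -1 when there is no \begin{document} at all, which covers plain TeX files and \input fragments, and it ignores a marker that sits behind a % comment on the same line.

```c
#include <stddef.h>
#include <string.h>

/* Return the offset just past "\begin{document}", or -1 if the marker is
 * absent.  Hypothetical helper, not elvis code.  A marker that follows an
 * unescaped '%' on the same line counts as commented out and is skipped. */
long preamble_end(const char *src)
{
    static const char marker[] = "\\begin{document}";
    const size_t mlen = sizeof marker - 1;
    const char *p = src;

    while ((p = strstr(p, marker)) != NULL) {
        const char *q = p;
        int commented = 0;

        /* Walk back to the start of the line, looking for a '%'. */
        while (q > src && q[-1] != '\n') {
            q--;
            if (*q == '%' && (q == src || q[-1] != '\\')) {
                commented = 1;
                break;
            }
        }
        if (!commented)
            return (long)(p - src) + (long)mlen;
        p += mlen;              /* keep searching after the commented one */
    }
    return -1;
}
```

Everything before the returned offset could then be hidden, while arguments of interest (such as the language option) would still have to be picked out of the skipped text separately.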

Special characters as the umlauts

Problem
How should the different forms of special characters be handled? In German we have umlauts such as ä. The LaTeX tag for such a character looks like "a for an ä when the option "german" is set in the preamble. It is also possible to use an 8-bit character encoding, in which case the character in the source is already the right one. The pure TeX way is to write the character ä as \"a, which can be shortened to "a when the german package is loaded.
  1. When "german" is set, how do you enter a double-quote followed by a normal, non-literal 'a' character?
  2. I assume that "german" mode doesn't convert ~n to ñ, but that "spanish" mode would. Also, I assume "spanish" mode would *not* convert ~a to ã since Spanish doesn't use ã. Is that correct?
  3. The best idea I can come up with for handling this sort of thing would be for the digraph table to mark some digraphs as being "easy" in TeX, and the rest requiring the full \"{a} display mode. Would that be good enough? Can you suggest something better? Maybe the best way would be to teach Elvis all different forms of digraphs?
Solution
Elvis has pretty good digraph support generally. The "tex" display mode currently supports digraphs of the form \"{a} but not \"a (without the braces), and certainly not just "a.
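To make the three spellings concrete, here is a small sketch that recognizes \"{a}, \"a and, only when a "german" flag is set, the short form "a. The function tex_umlaut() and its flag are hypothetical, and limiting the short form to the vowels a, o, u is an assumption made for the example; it does not answer the open questions above.

```c
#include <ctype.h>
#include <string.h>

/* If s starts with an umlaut markup, store the base letter in *base and
 * return the number of source bytes consumed; otherwise return 0.
 * Hypothetical helper: the "german" flag and the vowel list are
 * assumptions made for this sketch. */
int tex_umlaut(const char *s, int german, char *base)
{
    if (s[0] == '\\' && s[1] == '"') {
        /* full form: \"{a} */
        if (s[2] == '{' && isalpha((unsigned char)s[3]) && s[4] == '}') {
            *base = s[3];
            return 5;
        }
        /* pure TeX form without braces: \"a */
        if (isalpha((unsigned char)s[2])) {
            *base = s[2];
            return 3;
        }
        return 0;
    }
    /* short form "a, only meaningful when the german option is active */
    if (german && s[0] == '"' && s[1] != '\0' && strchr("aouAOU", s[1])) {
        *base = s[1];
        return 2;
    }
    return 0;
}
```

The caller would still have to look up the resulting accent/letter pair in the digraph table to get the character to display.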

Pictures, Graphs

Problem
References to pictures should simply be left as they are. For example: \begin{figure}[hptb] \begin{center} \includegraphics[width = 5 cm]{sinc} \caption{Sinc-Funktion} \label{sinc} \end{center} \end{figure} should appear as it is. Maybe the code for a picture could be shown in another color, but that's not really important.
The challenge is that there are so many variations that it is hard to define what a picture is. Will there always be a \end{figure} markup? Most TeX documents I've seen have many \begin's and very few \ends.
Solution
Every \begin structure is closed with an \end structure.
I suppose I could add a new state flag to indicate whether included text should have its tags shown or not. Then I could make \begin{figure} turn on that flag, and \end{figure} turn it off. [The flag idea seems a good idea for the customization.]
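The flag idea could look roughly like this. It is only a sketch: update_show_tags() and the raw_envs list are hypothetical names, and which environments belong in the list is exactly the kind of thing that would later be configurable.

```c
#include <stddef.h>
#include <string.h>

/* Environments whose contents are shown as raw source.  The list is an
 * assumption for this sketch; only "figure" comes from the text above. */
static const char *const raw_envs[] = { "figure" };

/* Return the new value of the "show tags" flag after seeing one markup.
 * \begin{env} of a listed environment turns the flag on, the matching
 * \end{env} turns it off, and every other markup leaves it unchanged. */
int update_show_tags(int show_tags, const char *markup)
{
    size_t i;

    for (i = 0; i < sizeof raw_envs / sizeof raw_envs[0]; i++) {
        size_t n = strlen(raw_envs[i]);

        if (strncmp(markup, "\\begin{", 7) == 0 &&
            strncmp(markup + 7, raw_envs[i], n) == 0 &&
            markup[7 + n] == '}')
            return 1;
        if (strncmp(markup, "\\end{", 5) == 0 &&
            strncmp(markup + 5, raw_envs[i], n) == 0 &&
            markup[5 + n] == '}')
            return 0;
    }
    return show_tags;
}
```

Because the flag is a single bit, it adds very little to the state that has to be stored per line.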

Equations

Problem
Equations should also be left as they are (because all other ideas, such as imitating Roff and trying to show equations with ASCII characters, would be too much hassle, and the DVI previewer shouldn't be robbed of its job ;-)
Maybe one would choose to strip the delimiters of equations: $ or $$, \( ... \), \[ ... \], and \begin{math} ... \end{math}, and likewise for the equation and eqnarray environments (\begin{equation} and so on). Showing equations in a different color would also be an option.
Solution
I could make $ and $$ set that new "don't hide the markups" flag. That would help. [sounds good]
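As a sketch of how $ and $$ could drive such a flag, the function below marks every byte of a line as math ('m') or text ('.'). The name mark_math() is hypothetical. It assumes that \$ is an escaped dollar sign and never toggles, and it does not handle \( \), \[ \] or the math environments.

```c
#include <stddef.h>

/* Fill mask (which must hold strlen(s) + 1 bytes) with 'm' for bytes
 * inside math, delimiters included, and '.' for ordinary text.
 * Hypothetical helper, not elvis code. */
void mark_math(const char *s, char *mask)
{
    int in_math = 0;    /* 0 = text, 1 = inside $...$, 2 = inside $$...$$ */
    size_t i = 0;

    while (s[i] != '\0') {
        if (s[i] == '\\' && s[i + 1] != '\0') {
            /* a backslash-escaped character such as \$ never toggles */
            mask[i] = mask[i + 1] = in_math ? 'm' : '.';
            i += 2;
        } else if (s[i] == '$') {
            /* "$$" counts as one delimiter unless we are inside $...$ */
            size_t len = (s[i + 1] == '$' && in_math != 1) ? 2 : 1;
            size_t k;

            for (k = 0; k < len; k++)
                mask[i + k] = 'm';
            in_math = in_math ? 0 : (int)len;
            i += len;
        } else {
            mask[i] = in_math ? 'm' : '.';
            i++;
        }
    }
    mask[i] = '\0';
}
```

A display mode could show the 'm' bytes unchanged (or in another color) and apply the normal tag hiding to the rest.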

Document structure, Titles

Problem
Structures such as chapters, \part, \section, \subsection, and \subsubsection should also be recognized.
Solution
I didn't know about \part. It should be easy to add.
The only weird thing about TeX titles is that the title text is embedded in the markup itself (in the braces), so the markup needs a custom function just to extract that text so it can be output. Since I'm forced to have a custom function for outputting titles anyway, I can display them however I like. That's why \title's are centered in TeX, but not in HTML or man.
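Extracting the title text from the braces could be sketched as follows. brace_arg() is a hypothetical stand-in for whatever the custom function does internally; it keeps nested braces such as {\TeX} intact and does not treat \{ or \} specially.

```c
#include <stddef.h>

/* Copy the contents of the first brace group in src into out (at most
 * outsize - 1 bytes, always NUL-terminated once a '{' was found).
 * Returns the number of bytes copied, or -1 if there is no balanced
 * group.  Hypothetical helper, not elvis code. */
int brace_arg(const char *src, char *out, size_t outsize)
{
    size_t n = 0;
    int depth;

    while (*src != '\0' && *src != '{')
        src++;
    if (*src != '{' || outsize == 0)
        return -1;

    for (src++, depth = 1; *src != '\0'; src++) {
        if (*src == '{') {
            depth++;
        } else if (*src == '}' && --depth == 0) {
            out[n] = '\0';
            return (int)n;      /* found the matching close brace */
        }
        if (n + 1 < outsize)
            out[n++] = *src;    /* silently truncate overlong titles */
    }
    out[n] = '\0';
    return -1;                  /* ran off the end: unbalanced braces */
}
```

For \section{Intro} this yields "Intro", which the title function can then center, embolden, or otherwise display as it likes.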

Emphasized text

Problem
The \emph tag is used very often and could be represented like this: This *important* word can't be seen in Elvis version 2.1.
Do you mean that in TeX source, * toggles the emphasized font in much the same way that $ toggles the monospaced font?
Solution
Elvis supports more than a dozen ways to emphasize text. I'll be glad when this is configurable at run-time, so that users can add their own favorite method.
In TeX the $ doesn't toggle the monospaced font, but switches to math mode. With good markup a word is emphasized like \emph{this}, although sometimes a word also could be "hardcoded" like \textit{example words}. The character * would be one possible way to emphasize *text* visually with pure ASCII characters. Another way would be the use of colors.
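The pure-ASCII variant can be sketched in a few lines. emph_to_stars() is a hypothetical name; the sketch assumes the argument of \emph contains no nested braces and leaves an unclosed \emph{ untouched.

```c
#include <stddef.h>
#include <string.h>

/* Copy src to out, rewriting each \emph{text} as *text*.  The result is
 * never longer than the input, so out needs strlen(src) + 1 bytes.
 * Hypothetical helper; nested braces in the argument are not handled. */
void emph_to_stars(const char *src, char *out)
{
    while (*src != '\0') {
        const char *close;

        if (strncmp(src, "\\emph{", 6) == 0 &&
            (close = strchr(src + 6, '}')) != NULL) {
            size_t len = (size_t)(close - (src + 6));

            *out++ = '*';
            memcpy(out, src + 6, len);
            out += len;
            *out++ = '*';
            src = close + 1;
        } else {
            *out++ = *src++;
        }
    }
    *out = '\0';
}
```

Swapping the two '*' writes for a font change would give the color or italic variant instead.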

Text attributes

Problem
Which text attributes exist and should be recognized?
Solution
Currently \textsl{} is implemented. It displays the argument text in italics. When tex mode is configurable, you'll be able to select any font you want.

Unknown tags/commands

Problem
It's nearly impossible to describe a behavior for every tag. Unknown LaTeX tags should just be left like they are and this example sentence would then look like
    This \emph{important} word can't be seen in Elvis 2.1.
	    
Solution
Some unimportant (for the understanding of the text) tags should just be "skipped". Which tags should belong to this type?

Kinds of tags

Parsing is still rather awful, mostly because elvis' markup mode is oriented toward HTML-style markups. The parser must do the following:

Problem
Which sort of tags should be defined?
Solution

Customization

Problem
Maybe it would be possible to set up a file where the user could list unknown tags and specify how they should be handled, i.e. which tag group they should belong to.

Editing in TeX display mode

Problem
Maybe it would be cleaner not to allow editing in TeX display mode at all, because this would ensure that the tags aren't messed up when I edit at the beginning or end of the "reach" of a tag. If the display mode could be switched via a function key, this wouldn't be a real drawback.
Solution
What if I force elvis to reparse the file when you exit input mode? This could almost be done via a :map!
Sounds like a good (IMO the best) idea.

Notes

By the way, ":set list" currently causes all markups to be visible. Maybe that will help you use elvis 2.1's tex mode.

Elvis internals

How the markup display modes work and ideas how to handle an improved markup display mode

The big change will be user-configurable markup display modes. Currently (in 2.1) there are three markup modes: html, man, and tex. They all share a single text output routine, which adds words or whitespace to the output and also interprets the markups. Each mode, however, has its own setup function and its own "get next token" function, which is responsible for parsing the next markup or plain text word. Also, markups can invoke a custom C function.

Getting HTML tokens is simplest -- in part because of the nature of the tags, but mostly this is because I wrote the markup display mode for HTML first, and then wedged the man & tex display modes into it as best I could. To get an HTML token, I just check the next character, and that determines whether it is whitespace, normal text, or (if it begins with '<') a markup. Markups always end at the next '>'. Very simple.
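The character-based classification just described can be sketched like this. It is not the elvis source: tokkind_t and html_token() are hypothetical names, and entities such as &amp; are ignored.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum { TOK_END, TOK_SPACE, TOK_TEXT, TOK_MARKUP } tokkind_t;

/* Classify the token starting at s by its first character and store its
 * length in *len.  Hypothetical sketch of an HTML "get next token". */
tokkind_t html_token(const char *s, size_t *len)
{
    size_t n = 0;

    if (s[0] == '\0') {
        *len = 0;
        return TOK_END;
    }
    if (s[0] == '<') {
        /* a markup always ends at the next '>' (or at end of input) */
        while (s[n] != '\0' && s[n] != '>')
            n++;
        if (s[n] == '>')
            n++;
        *len = n;
        return TOK_MARKUP;
    }
    if (isspace((unsigned char)s[0])) {
        while (isspace((unsigned char)s[n]))
            n++;
        *len = n;
        return TOK_SPACE;
    }
    /* anything else is a plain word, up to whitespace or the next '<' */
    while (s[n] != '\0' && s[n] != '<' && !isspace((unsigned char)s[n]))
        n++;
    *len = n;
    return TOK_TEXT;
}
```

Calling it in a loop and advancing by *len each time walks the whole buffer; the man and tex tokenizers need far more context than this single-character test.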

Man mode is the next simplest. A few markups begin with a backslash, but most of them begin with a '.' at the front of a line, and end at the end of that same line. A few of the '.' commands also read the following line, but that's as tricky as it gets. Oh, and blank lines are treated as paragraph breaks.

The tex mode's "get next token" is very complex. Some markups cause the line to be read and ignored, some are followed by optional text in braces (which may be displayed normally, displayed in a special font, or ignored), some use a combination of brackets and braces, etc. Basically, this one is a mess.

But all of that is intended to deal with the differences in how the markups are parsed. The code for displaying the text is common to all of them. Each markup has a small structure associated with it (by the "get next token" function), which describes what effect it has on the output. Each of the three display modes' "get next token" functions contains a table of those data structures, describing the known markups.

For 2.1, those tables are hardcoded. For 2.2, I'd like to make them be configurable. Instead of just HTML, it could handle MIME rich text, XML, and SGML. Instead of just nroff -man, it could handle -mm, and the various -ms implementations. Instead of just (a tiny subset of) TeX, it could handle a *configurable* (but still tiny subset of) TeX, or RTF. Clearly this is desirable!

My first thought was to store those tables in a file, sort of like the "elvis.syn" file for the syntax display modes. However, it occurs to me that if the table was configured via commands, then elvis could scan the file when loading it, and conditionally execute commands based on the presence of certain text. For example, this would allow XML documents to refer to a style sheet, and have elvis configure itself to resemble the output of that style sheet.

Currently, the data structure that describes a markup's behavior looks like this:

    typedef struct markup_s
    {
        char    *name;                  /* name of the markup */
        char    attr[8];                /* attributes of markup */
        twrap_t (*fn)P_((TOKEN *));     /* ptr to special function */
    } MARKUP;
    #define TITLE   attr[0]         /* in title: -, N, Y */
    #define BREAKLN attr[1]         /* line break: -, 0, 1, 2, c, or p */
    #define INDENT  attr[2]         /* -, <, >, or a number */
    #define LIST    attr[3]         /* in list: -, N, Y, # */
    #define FONT    attr[4]         /* font: -,=,n,b,u,i,f,e,N,B,U,I,F,E */
    #define FILL    attr[5]         /* Y=fill, N=preformatted, -=no chg. */
    #define DEST    attr[6]         /* S=section, P=paragraph, T=<tab> key */
	

The attributes deserve some comments:
TITLE is used when printing the document; the title is displayed in the header at the top of each page. To find the title, the print routine starts reading from the top of the file, past any markups which have TITLE='Y', until it encounters non-whitespace text. It then accumulates the text up to the next markup which has TITLE='N'.

BREAKLN is used for causing line breaks. '-' is used for markups which don't cause a line break. '0', '1', and '2' cause a line break, plus that many blank lines. 'p' causes a page break when printing, or two blank lines on the screen. 'c' is a conditional page break; it acts like 'p' near the bottom of a page, or '2' elsewhere on a page.

INDENT adjusts indentation. Indentation is always done in increments of half of the value of elvis' "shiftwidth" option. '-' leaves it unchanged, '>' increases indentation, '<' reduces indentation, and any digit sets the indentation to the given number. For example, '0' forces text to be flush against the left edge. The default indentation for most text is '2', which means a full shiftwidth.
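The INDENT codes translate into very little code. In this sketch, apply_indent() and indent_columns() are hypothetical names; that '>' and '<' move by exactly one half-shiftwidth step, and that the level is clamped at zero, are assumptions not stated in the text.

```c
/* Apply an INDENT attribute code to the current indentation level.
 * Levels count half-shiftwidth steps.  Assumptions for this sketch:
 * '>' and '<' move by one step, and the level never drops below 0. */
int apply_indent(int level, char code)
{
    if (code == '>')
        return level + 1;
    if (code == '<')
        return level > 0 ? level - 1 : 0;
    if (code >= '0' && code <= '9')
        return code - '0';      /* a digit sets the level absolutely */
    return level;               /* '-' or anything unknown: unchanged */
}

/* Convert a level to screen columns: each step is half of shiftwidth. */
int indent_columns(int level, int shiftwidth)
{
    return level * shiftwidth / 2;
}
```

With shiftwidth=8, the default level '2' gives 8 columns, matching "a full shiftwidth".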

LIST is used for starting or ending lists. '-' doesn't change the list status, 'Y' starts a non-numbered list, '#' starts a numbered list, and 'N' ends either type of list. Lists can be nested, but there is only one item counter, so you can't nest one numbered list inside another; the inner lists are always displayed as non-numbered.

FONT is used for changing the font. Elvis actually has two variables for storing fonts — one for the current displayed font, and another for storing a "normal" font for this context. '-' leaves both fonts unchanged. '=' sets the current font to the context's normal font. Lowercase letters set the current font to one of elvis' 6 main fonts, but leave the context's normal font unchanged. Uppercase letters set both font variables.
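The two-variable font scheme might be modeled like this. fontstate_t and apply_font() are hypothetical names; storing the font as its lowercase letter is a simplification made for the sketch.

```c
#include <ctype.h>

/* The two font variables described above, each holding a lowercase
 * font letter.  Hypothetical stand-in for the real state. */
typedef struct {
    char current;   /* font currently used for output */
    char normal;    /* "normal" font of the surrounding context */
} fontstate_t;

/* Apply one FONT attribute code to the font state. */
void apply_font(fontstate_t *fs, char code)
{
    if (code == '-')
        return;                         /* leave both unchanged */
    if (code == '=') {
        fs->current = fs->normal;       /* back to the context's font */
    } else if (islower((unsigned char)code)) {
        fs->current = code;             /* temporary change only */
    } else if (isupper((unsigned char)code)) {
        fs->current = fs->normal = (char)tolower((unsigned char)code);
    }
}
```

So a 'b' followed by '=' returns to whatever the context's normal font was, while 'B' changes what '=' returns to.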

(Note: This is going to change. For 2.2, I intend to support a large number of fonts. They will be identified by a name, not just a letter. The :color attribute will associate attributes to a given font. By the time I get around to making the markup display modes be configurable, the FONT field will have to be radically changed.)

FILL controls whether the whitespace and line breaks should be adjusted. Filling is always enabled by default, but you'll want to disable filling for preformatted sections of text so that their formatting remains intact. '-' doesn't affect filling, 'Y' enables filling, and 'N' disables filling.

DEST affects the behavior of some cursor-movement commands. It was never fully implemented in elvis 2.1, but it is intended to work like this: '-' has no effect, 'S' marks a stopping point for the [[ and ]] section commands, 'P' marks a stopping point for the { and } paragraph commands, and 'T' marks a stopping point for the <Tab> key. Currently, only the <Tab> key support is implemented; the "html" display mode uses it to move to the next link or the next field of a form.
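To show how the seven attribute characters fit together, here is a self-contained sketch of a markup table and lookup. The struct is a cut-down stand-in for MARKUP without the function pointer, and the attribute strings for \section, \emph and \begin{verbatim} are invented for illustration; they are not the values elvis actually uses.

```c
#include <stddef.h>
#include <string.h>

/* Cut-down stand-in for the MARKUP structure (no function pointer). */
typedef struct {
    const char *name;
    char        attr[8];    /* seven attribute codes plus the NUL */
} markup_t;

#define TITLE   attr[0]
#define BREAKLN attr[1]
#define INDENT  attr[2]
#define LIST    attr[3]
#define FONT    attr[4]
#define FILL    attr[5]
#define DEST    attr[6]

/* Illustrative entries only; the attribute values are assumptions.
 * Column order: TITLE BREAKLN INDENT LIST FONT FILL DEST */
static const markup_t sample_markups[] = {
    { "\\section",         "Nc0NbYS" },
    { "\\emph",            "----i--" },
    { "\\begin{verbatim}", "-1---N-" },
};

/* Linear search by name; returns NULL for an unknown markup. */
const markup_t *find_markup(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof sample_markups / sizeof sample_markups[0]; i++)
        if (strcmp(sample_markups[i].name, name) == 0)
            return &sample_markups[i];
    return NULL;
}
```

Because of the #define names, a field reads as m->FONT or m->DEST, which is how a shared output routine can stay independent of the display mode.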

The "fn" function pointer, if non-NULL, points to a function which is called after the normal "attr" operations. It is passed the input token which includes both the markup data structure, and a buffer containing the source text for that token with all its arguments. It returns one of the following:

    TWRAP_NO      This is the usual return value.  It indicates that no
                  extra processing was necessary.

    TWRAP_BEFORE  This indicates that elvis' text-fill routine should
                  perform a linewrap, and then call the markup function
                  again.

                  The man display mode uses this for ".I italic text".
                  If the italic text fits on the current line then it'll
                  output it and return TWRAP_NO.  If it is too long for
                  the current line then it'll return TWRAP_BEFORE.

    TWRAP_AFTER   This indicates that elvis' text-fill routine should
                  perform a linewrap, but the markup function should *NOT*
                  be called again.

                  The man display mode uses this for ".IP label text".
                  This starts a new paragraph, outputs the label text,
                  and then increases the indentation for the rest of the
                  paragraph.  If the label text is short enough, then
                  the paragraph continues on the same line and the function
                  returns TWRAP_NO.  Otherwise it returns TWRAP_AFTER so
                  the paragraph continues on the next line.
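The decision logic behind the two examples can be sketched as below. Only the three TWRAP_ names come from the text; the enum values, the function names and the parameters are hypothetical, and the real functions also output the text, which is omitted here.

```c
#include <stddef.h>

/* Stand-in for elvis' twrap_t; the numeric values are assumptions. */
typedef enum { TWRAP_NO, TWRAP_BEFORE, TWRAP_AFTER } twrap_t;

/* Like ".I italic text": ask for a wrap first if the text does not fit
 * in the rest of the line.  At column 0 the text is accepted even if it
 * is wider than the line, so the second call cannot loop forever (an
 * assumption made for this sketch). */
twrap_t italic_wrap(size_t col, size_t textlen, size_t width)
{
    if (col > 0 && col + textlen > width)
        return TWRAP_BEFORE;
    return TWRAP_NO;
}

/* Like ".IP label text": a label shorter than the hanging indent lets
 * the paragraph continue on the same line; otherwise wrap afterwards. */
twrap_t label_wrap(size_t labellen, size_t indent)
{
    return labellen < indent ? TWRAP_NO : TWRAP_AFTER;
}
```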
	

Obviously these functions are allowed to output text. I now have a pool of functions to support the most common formatting tasks. The names of these functions indicate which display mode they're intended to be used with, but (unless they need to parse the arguments) can be used with any markup display mode. The current functions are:

    htmlimg()    For <img>, output "alt=" text if any, else output "src=" text
    htmlpre()    For <pre>, check for elvis' "graphic" argument
    htmlli()     For <li>, output a bullet or number
    htmlinput()  For <input> and similar form tags, simulate part of a form
    htmla()      For <a>, switch to underline if href, but not if anchor
    htmlhr()     For <hr> or TeX's \hline, output a horizontal rule
    manTH()      For .TH, which is invisible but defines the title
    manSH()      For .SH or .SS, output text as a header
    manBI()      For .BI and similar font commands, output text or change font
    manIP()      For .IP or .TP, start a hanging paragraph
    texscope()   For {, save current font
    texoutput()  For \mathreltext, output the text
    texitem()    For \item[label] (optional label in brackets, else use normal)
    textitle()   For \title{text}, output the text
    texdigraph() For \{axb}, combine 'a' & 'b' via digraph table, output it.
	

The texscope() function only stores the font. Unfortunately there is no place to store the rest of the state, such as indentation. It is important for the state information to remain small, because elvis must allocate a structure for each line of text, storing state at the start of that line. Elvis needs this information so it can refresh the display starting on any line, without reparsing the entire file each time you hit a key.

So that's my starting point. I wrote this down mostly to help me think about how it is now, how it should be extended, and what changes would be required for those extensions. I'm also hoping you can comment on it from a TeX perspective, but really anything you say is a bonus.

Design goals

  1. It should be general enough to handle tex, man, and html. I'm willing to have some tex-specific code, but I'd like to minimize that, since most elvis users don't use tex.
  2. The amount of stored state should be kept small. Elvis needs to store a copy of the state for every line. When you make a change, it must reformat the text from that point forward, until it hits a line where the new state is identical to that line's old state.
  3. Most markups simply adjust the state, but it is possible to add a custom function to do anything. Balancing these two options, in light of items 1 and 2, is the main goal.
  4. The suggestions must be concrete enough for me to code. (Or even better, for someone else to code after I've changed the way fonts and colors are handled.) A perfect example of this is the means of detecting the end of the preamble.
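Design goal 2 amounts to the following loop, shown here as a sketch under strong simplifications. linestate_t, lineeffect_t and reparse_from() are hypothetical; the real state has more fields, and a line's effect on the state comes from parsing its markups rather than from a precomputed table.

```c
#include <stddef.h>

/* State stored at the start of each line (kept deliberately small). */
typedef struct { int font; int indent; } linestate_t;

/* Simplified effect of one line on the state: set_font < 0 means
 * "font unchanged".  In reality this comes from parsing the line. */
typedef struct { int set_font; int indent_delta; } lineeffect_t;

static linestate_t next_state(linestate_t s, lineeffect_t e)
{
    if (e.set_font >= 0)
        s.font = e.set_font;
    s.indent += e.indent_delta;
    return s;
}

/* After line `changed` was edited, recompute the stored start-of-line
 * states of the following lines.  Stops at the first line whose stored
 * state already equals the new one and returns its index, or nlines if
 * the end of the buffer was reached. */
size_t reparse_from(linestate_t *start, const lineeffect_t *fx,
                    size_t nlines, size_t changed)
{
    linestate_t s;
    size_t line;

    if (changed >= nlines)
        return nlines;
    s = start[changed];
    for (line = changed + 1; line < nlines; line++) {
        s = next_state(s, fx[line - 1]);
        if (start[line].font == s.font && start[line].indent == s.indent)
            return line;        /* old and new agree: the rest is valid */
        start[line] = s;
    }
    return nlines;
}
```

The smaller the state, the cheaper both the per-line storage and this comparison become, which is why texscope() saving only the font is a real constraint.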

I'm sure we'll have more to talk about when I make my first attempt to translate the current html, man, and tex descriptions into the external configuration file (in whatever format that file uses).

Phases/Plans

Serge and I have talked about revamping the window system to allow splits within a window, vertically or horizontally, in addition to the current method of creating a whole new window. I'd like to modify the syntax coloring to add syntax-sensitive autoindenting. I'd also like to modify ctags to use "elvis.syn" to guide the way it parses files, so it could generate tags for any kind of file that's described in elvis.syn. And then there's the change in the way fonts and colors will be handled. I'll probably do the font and color thing first, since it affects windows, syntax coloring, and markup display modes.

Now that 2.1_4 is out of the way, I've been looking into the new :color semantics, which support a large number of fonts, and allow you to change the attributes and colors of them independently. This is definitely a good thing. The color changes will require some changes to the markup display code, and while I'm at it I believe this is a good time to add a small stack of fonts — probably just 4 items (current font, two previous fonts, and a default font for if you try to return after nesting too far). Since 2.1_4 has only a current font and a default font, this should be a big improvement.
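One possible reading of the "small stack of fonts" is sketched below: three slots for the current font and two previous ones, plus a default that is used when you return more often than the stack can remember. The names fontstack_t, fs_init(), fs_push() and fs_pop() are hypothetical, and fonts are plain integers here.

```c
/* Small font stack: current font, two previous fonts, and a default
 * for returning after nesting too far.  Hypothetical sketch. */
typedef struct {
    int slot[3];    /* slot[0] = current, slot[1..2] = previous fonts */
    int saved;      /* number of valid previous fonts, 0..2 */
    int deflt;      /* fallback when nothing is left to return to */
} fontstack_t;

void fs_init(fontstack_t *fs, int deflt)
{
    fs->slot[0] = fs->slot[1] = fs->slot[2] = deflt;
    fs->saved = 0;
    fs->deflt = deflt;
}

/* Enter a new font; the oldest remembered font is dropped if full. */
void fs_push(fontstack_t *fs, int font)
{
    fs->slot[2] = fs->slot[1];
    fs->slot[1] = fs->slot[0];
    fs->slot[0] = font;
    if (fs->saved < 2)
        fs->saved++;
}

/* Return to the previous font, or to the default if none is saved.
 * The new current font is returned. */
int fs_pop(fontstack_t *fs)
{
    if (fs->saved > 0) {
        fs->slot[0] = fs->slot[1];
        fs->slot[1] = fs->slot[2];
        fs->saved--;
    } else {
        fs->slot[0] = fs->deflt;
    }
    return fs->slot[0];
}
```

Four integers of this kind still fit comfortably in a per-line state record.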

Also, fonts can be combined to some degree. For example, if an HTML file contains "<a href=#foo>Have you seen the <strong>foo</strong>?</a>" then "foo" would be displayed as a combination of "link" and "bold" fonts, instead of just the "bold" font.

The ability to combine fonts is related to another goal: I'd like to be able to have the background color change to indicate whether the window has input focus or not. I intend to do this by defining both a "normal" font and an "idle" font; each window is drawn using one or the other as the base font, and any special text's font is combined with the base font. Your .elvisrc file could contain...

    color normal=black on white
    color idle=black on gray
    color bold=bold
    color link=underlined green
	

... in which case hypertext links would be underlined green on white if the window has input focus, or underlined green on gray otherwise.
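A sketch of how such a combination could work: colors set by the special font override the base font, unset colors fall through, and attributes accumulate. font_t, the ATTR_ bits and combine_fonts() are hypothetical and say nothing about how the real :color command stores its data.

```c
#include <stddef.h>

#define ATTR_BOLD       1u
#define ATTR_UNDERLINED 2u

/* Hypothetical font description: NULL means "color not specified". */
typedef struct {
    const char *fg;
    const char *bg;
    unsigned    attrs;
} font_t;

/* Combine a special font with a base font such as "normal" or "idle". */
font_t combine_fonts(font_t base, font_t special)
{
    font_t out = base;

    if (special.fg != NULL)
        out.fg = special.fg;
    if (special.bg != NULL)
        out.bg = special.bg;
    out.attrs |= special.attrs;
    return out;
}
```

With the settings above, combining "link" with "normal" gives underlined green on white, and with "idle" underlined green on gray; applying "bold" on top adds the bold attribute without touching the colors.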

We expect to do some fancy things with windows (separate, vertical panes, horizontal panes, and all combinations) which will be nice, but I'm worried that people will become confused about window focus. I know I will. My X server is configured to have focus follow the mouse, so I don't need to click on a window to type into it. But if a single application window contains multiple text panes, I'll need some reminder about which one I'm typing into.

I have *NOT* yet started to design the user-configurable version of the markup display modes. I just looked into it enough to reassure myself that the color changes I'm going to be making soon will not be discarded completely when I do add the user-configurable markups. The little stack is just about the only change I'll be making for the colors, and that's almost unavoidable because the current (2.1_4) code only uses 3 bits to store a font, and I need a lot more to support the larger number of fonts.

Oh, by the way, another reason for the color change was so that users could have more control of coloring. There will be a lot more colors available, and the colors won't inherently be associated with bold, italic, or underlined attributes.


Send feedback on this page to Peter Karp