http://www.karpfenteich.net/peter/ElvisMeetsTeX.html
Last modified: 22nd February, 2002
Version: 1.04
Do you like to edit your LaTeX files with Vi/Elvis/Vim? Would you love to have a quick overview of the plain text without all the "disturbing" LaTeX tags and without needing to load the DVI previewer? Elvis already has a basic LaTeX browse mode, but it needs some improvements to be usable in daily work. This page aims to help Steve Kirkendall find all possible LaTeX structures and work out how they could be realized in Elvis. Steve plans to make some improvements to the markup display modes, but he needs our help, as he doesn't use LaTeX himself. I'm very grateful that Steve is adding and improving LaTeX support in Elvis :-)
If you'd like to suggest how specific LaTeX commands could be represented, take a look at Implementation ideas for the LaTeX browse mode.
If you're interested in how Elvis handles its markup display modes, take a look at some Elvis insights from Steve.
When editing a LaTeX file, it could be helpful to have a command that hides all or most of the LaTeX tags. This would give a better impression of the final document and make it faster to find a specific piece of text. In addition, Elvis could preformat the LaTeX code, using a few elementary rules, into a pseudo DVI preview. The LaTeX browse mode is not meant to make a real DVI previewer obsolete!
To get a first impression of a (rudimentary) working LaTeX browse mode you can take a look at the vi clone Elvis.
Below you'll find an overview of the different categories, what the problem is in each case, and which solutions could be used. If you know of another category that must be handled, or you would like to suggest an improvement or idea for handling a specific problem, you can e-mail your suggestion to Peter.
This \emph{important} word can't be seen in Elvis 2.1.
Parsing is still rather awful, mostly because elvis' markup mode is oriented toward HTML-style markups. The parser must do the following:
By the way, ":set list" currently causes all markups to be visible. Maybe that will help you use elvis 2.1's tex mode.
The big change will be user-configurable markup display modes. Currently (in 2.1) there are three markup modes: html, man, and tex. They all share a single text output routine, which adds words or whitespace to the output and also interprets the markups. Each mode, however, has its own setup function and its own "get next token" function, which is responsible for parsing the next markup or plain text word. Also, markups can invoke a custom C function.
Getting HTML tokens is simplest, in part because of the nature of the tags, but mostly because I wrote the markup display mode for HTML first, and then wedged the man & tex display modes into it as best I could. To get an HTML token, I just check the next character, and that determines whether it is whitespace, normal text, or (if it begins with '<') a markup. Markups always end at the next '>'. Very simple.
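The HTML scanning rule just described can be sketched in C as follows. This is a minimal illustration assuming a NUL-terminated buffer, not elvis' actual code; the `tok_kind` enum and `html_next_token()` are hypothetical names.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds; elvis' real TOKEN type is richer. */
typedef enum { TOK_END, TOK_SPACE, TOK_WORD, TOK_MARKUP } tok_kind;

/* Classify the token starting at pos by looking at its first character,
 * store its length in *len, and return its kind. */
static tok_kind html_next_token(const char *pos, size_t *len)
{
    size_t n = 0;

    if (pos[0] == '\0') {
        *len = 0;
        return TOK_END;
    }
    if (pos[0] == '<') {
        /* A markup always ends at the next '>' (or at end of input). */
        while (pos[n] != '\0' && pos[n] != '>')
            n++;
        if (pos[n] == '>')
            n++;
        *len = n;
        return TOK_MARKUP;
    }
    if (isspace((unsigned char)pos[0])) {
        while (pos[n] != '\0' && isspace((unsigned char)pos[n]))
            n++;
        *len = n;
        return TOK_SPACE;
    }
    /* Plain text runs until whitespace or the start of a markup. */
    while (pos[n] != '\0' && pos[n] != '<' && !isspace((unsigned char)pos[n]))
        n++;
    *len = n;
    return TOK_WORD;
}
```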
Man mode is the next simplest. A few markups begin with a backslash, but most of them begin with a '.' at the front of a line, and end at the end of that same line. A few of the '.' commands also read the following line, but that's as tricky as it gets. Oh, and blank lines are treated as paragraph breaks.
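A similar sketch for the man-mode rules above, again with hypothetical names (`man_classify_line()`, `man_command_name()`) and assuming the source is handed over one line at a time without its newline. It ignores the inline backslash markups and the few commands that also read the following line.

```c
#include <string.h>

/* Hypothetical line categories for a man-style ("nroff -man") source. */
typedef enum { MAN_TEXT, MAN_COMMAND, MAN_PARAGRAPH } man_line;

/* Classify one source line (without its trailing newline). */
static man_line man_classify_line(const char *line)
{
    /* A blank line acts as a paragraph break. */
    if (line[strspn(line, " \t")] == '\0')
        return MAN_PARAGRAPH;
    /* A '.' in column 1 starts a markup that runs to the end of the line. */
    if (line[0] == '.')
        return MAN_COMMAND;
    /* Anything else is text, possibly containing inline \ escapes. */
    return MAN_TEXT;
}

/* For a command line, copy the command name (e.g. "SH" from ".SH NAME")
 * into buf, which must hold at least size bytes (size >= 1). */
static void man_command_name(const char *line, char *buf, size_t size)
{
    size_t n = strcspn(line + 1, " \t");
    if (n >= size)
        n = size - 1;
    memcpy(buf, line + 1, n);
    buf[n] = '\0';
}
```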
The tex mode's "get next token" is very complex. Some markups cause the line to be read and ignored, some are followed by optional text in braces (which may be displayed normally, displayed in a special font, or ignored), some use a combination of brackets and braces, etc. Basically, this one is a mess.
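To illustrate why the tex tokenizer is harder, here is a sketch that parses just one common shape, `\name[optional]{argument}`, tracking nested braces. The `tex_cmd` struct and both functions are hypothetical; real LaTeX has many more forms (starred commands, multiple arguments, environments), which this ignores.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical description of one parsed TeX command such as
 * \item[label] or \emph{text}. Offsets are relative to the input. */
typedef struct {
    size_t name_start, name_len; /* command name after the backslash */
    size_t opt_start, opt_len;   /* text inside [...], if present */
    size_t arg_start, arg_len;   /* text inside {...}, if present */
    int has_opt, has_arg;
    size_t total;                /* characters consumed */
} tex_cmd;

/* s[i] is `open`; return the index of the matching `close`, or of the
 * terminating NUL if the group is unterminated. */
static size_t tex_group_end(const char *s, size_t i, char open, char close)
{
    int depth = 0;
    for (; s[i] != '\0'; i++) {
        if (s[i] == open)
            depth++;
        else if (s[i] == close && --depth == 0)
            return i;
    }
    return i;
}

/* Parse "\name[opt]{arg}" at s; returns 0 if s does not start with '\'. */
static int tex_parse_cmd(const char *s, tex_cmd *c)
{
    size_t i = 1, end;

    if (s[0] != '\\')
        return 0;
    c->opt_start = c->opt_len = c->arg_start = c->arg_len = 0;
    c->name_start = 1;
    while (isalpha((unsigned char)s[i]))
        i++;
    if (i == 1 && s[i] != '\0')
        i++; /* control symbol such as \{ : a single non-letter */
    c->name_len = i - 1;

    c->has_opt = (s[i] == '[');
    if (c->has_opt) {
        end = tex_group_end(s, i, '[', ']');
        c->opt_start = i + 1;
        c->opt_len = end - i - 1;
        i = (s[end] != '\0') ? end + 1 : end;
    }
    c->has_arg = (s[i] == '{');
    if (c->has_arg) {
        end = tex_group_end(s, i, '{', '}');
        c->arg_start = i + 1;
        c->arg_len = end - i - 1;
        i = (s[end] != '\0') ? end + 1 : end;
    }
    c->total = i;
    return 1;
}
```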
But all of that is intended to deal with the differences in how the markups are parsed. The code for displaying the text is common to all of them. Each markup has a small structure associated with it (by the "get next token" function), which describes what effect it has on the output. Each of the three display modes' "get next token" functions contains a table of those data structures, describing the known markups.
For 2.1, those tables are hardcoded. For 2.2, I'd like to make them be configurable. Instead of just HTML, it could handle MIME rich text, XML, and SGML. Instead of just nroff -man, it could handle -mm, and the various -ms implementations. Instead of just (a tiny subset of) TeX, it could handle a *configurable* (but still tiny subset of) TeX, or RTF. Clearly this is desirable!
My first thought was to store those tables in a file, sort of like the "elvis.syn" file for the syntax display modes. However, it occurs to me that if the tables were configured via commands, then elvis could scan the file when loading it, and conditionally execute commands based on the presence of certain text. For example, this would allow XML documents to refer to a style sheet, and have elvis configure itself to resemble the output of that style sheet.
Currently, the data structure that describes a markup's behavior looks like this:
typedef struct markup_s
{
char *name; /* name of the markup */
char attr[8]; /* attributes of markup */
twrap_t (*fn)P_((TOKEN *));/* ptr to special function */
} MARKUP;
#define TITLE attr[0] /* in title: -, N, Y */
#define BREAKLN attr[1] /* line break: -, 0, 1, 2, c, or p */
#define INDENT attr[2] /* -, <, >, or a number */
#define LIST attr[3] /* in list: -, N, Y, # */
#define FONT attr[4] /* font: -,=,n,b,u,i,f,e,N,B,U,I,F,E */
#define FILL attr[5] /* Y=fill, N=preformatted, -=no chg. */
#define DEST attr[6] /* S=section, P=paragraph, T=<tab> key */
The attributes deserve some comments:
TITLE is used when printing the document; the title is displayed in the header at the top of each page. To find the title, the print routine starts reading from the top of the file, past any markups which have TITLE='Y', until it encounters non-whitespace text. It then accumulates the text up to the next markup which has TITLE='N'.
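The title search can be sketched over a simplified token array. The `title_tok` type and `find_title()` are hypothetical, and the text above does not say what happens if a TITLE='N' markup appears before any text; this sketch assumes that means the document has no title.

```c
#include <stddef.h>

/* Hypothetical simplified token: either a markup carrying its TITLE
 * attribute ('-', 'N' or 'Y'), or a run of text or whitespace. */
typedef struct {
    int is_markup;
    char title;     /* TITLE attribute; only meaningful for markups */
    int is_space;   /* for text tokens: whitespace only? */
} title_tok;

/* Store the half-open token range [*first, *last) of the title and
 * return 1, or return 0 if no title was found. */
static int find_title(const title_tok *t, size_t n, size_t *first, size_t *last)
{
    size_t i = 0;

    /* Skip whitespace and markups (other than TITLE='N') until real text. */
    while (i < n && (t[i].is_markup ? t[i].title != 'N' : t[i].is_space))
        i++;
    if (i == n || t[i].is_markup)
        return 0; /* assumption: TITLE='N' before any text means no title */
    *first = i;
    /* Accumulate everything up to the next markup with TITLE='N'. */
    while (i < n && !(t[i].is_markup && t[i].title == 'N'))
        i++;
    *last = i;
    return 1;
}
```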
BREAKLN is used for causing line breaks. '-' is used for markups which don't cause a line break. '0', '1', and '2' cause a line break, plus that many blank lines. 'p' causes a page break when printing, or two blank lines on the screen. 'c' is a conditional page break; it acts like 'p' near the bottom of a page, or '2' elsewhere on a page.
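One way to turn a BREAKLN code into concrete output is sketched below. `breakln_effect()` and the `line_break` struct are hypothetical, and the caller is assumed to decide what "near the bottom of a page" means by passing the `near_bottom` flag.

```c
/* Hypothetical description of what a BREAKLN code asks for. */
typedef struct {
    int newline;      /* does the markup end the current line? */
    int blank_lines;  /* blank lines to emit after the break */
    int page_break;   /* start a new page (printing only) */
} line_break;

static line_break breakln_effect(char code, int printing, int near_bottom)
{
    line_break b = {0, 0, 0};

    if (code == 'c')               /* conditional page break */
        code = (printing && near_bottom) ? 'p' : '2';
    switch (code) {
    case '0': case '1': case '2':
        b.newline = 1;
        b.blank_lines = code - '0';
        break;
    case 'p':
        b.newline = 1;
        if (printing)
            b.page_break = 1;
        else
            b.blank_lines = 2;     /* on screen, 'p' shows two blank lines */
        break;
    default:                       /* '-' : no line break */
        break;
    }
    return b;
}
```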
INDENT adjusts indentation. Indentation is always done in increments of half of the value of elvis' "shiftwidth" option. '-' leaves it unchanged, '>' increases indentation, '<' reduces indentation, and any digit sets the indentation to the given number. For example, '0' forces text to be flush against the left edge. The default indentation for most text is '2', which means a full shiftwidth.
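A sketch of INDENT handling, keeping the indentation as a count of half-shiftwidth units. The function names are hypothetical, and two details are assumptions: that '>' and '<' move by exactly one unit, and that indentation never goes below zero.

```c
/* Apply an INDENT code to a level measured in half-shiftwidth units. */
static int apply_indent(int level, char code)
{
    if (code == '>')
        return level + 1;
    if (code == '<')
        return level > 0 ? level - 1 : 0;  /* assumption: clamp at zero */
    if (code >= '0' && code <= '9')
        return code - '0';                 /* absolute level */
    return level;                          /* '-' : unchanged */
}

/* Convert a level to screen columns: each unit is half of shiftwidth. */
static int indent_columns(int level, int shiftwidth)
{
    return level * shiftwidth / 2;
}
```

With shiftwidth=8, the default level '2' gives 8 columns, which matches "a full shiftwidth".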
LIST is used for starting or ending lists. '-' doesn't change the list status, 'Y' starts a non-numbered list, '#' starts a numbered list, and 'N' ends either type of list. Lists can be nested, but there is only one item counter, so you can't nest one numbered list inside another; the inner lists are always displayed as non-numbered.
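The single-counter limitation for lists can be modelled like this. `list_state`, `list_apply()` and `list_item_number()` are hypothetical; the sketch assumes only the outermost numbered list is numbered, so any list opened inside it gets bullets, and the outer numbering resumes when the inner list ends.

```c
/* Hypothetical list state: nesting depth plus the single item counter. */
typedef struct {
    int depth;          /* how many lists are open */
    int numbered;       /* nonzero while the one numbered list is active */
    int numbered_depth; /* depth at which that numbered list was opened */
    int counter;        /* the single item counter */
} list_state;

static void list_apply(list_state *s, char code)
{
    switch (code) {
    case '#':
        s->depth++;
        if (!s->numbered) {      /* only the outermost numbered list counts */
            s->numbered = 1;
            s->numbered_depth = s->depth;
            s->counter = 0;
        }
        break;
    case 'Y':
        s->depth++;
        break;
    case 'N':
        if (s->depth == 0)
            break;               /* stray end-of-list: ignore */
        if (s->numbered && s->depth == s->numbered_depth)
            s->numbered = 0;
        s->depth--;
        break;
    default:                     /* '-' : no change */
        break;
    }
}

/* Return the number for the next item, or 0 if it gets a bullet instead. */
static int list_item_number(list_state *s)
{
    if (s->numbered && s->depth == s->numbered_depth)
        return ++s->counter;
    return 0;
}
```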
FONT is used for changing the font. Elvis actually has two variables for storing fonts — one for the current displayed font, and another for storing a "normal" font for this context. '-' leaves both fonts unchanged. '=' sets the current font to the context's normal font. Lowercase letters set the current font to one of elvis' 6 main fonts, but leave the context's normal font unchanged. Uppercase letters set both font variables.
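The two font variables can be sketched as follows. The `font_state` struct and `font_apply()` are hypothetical; the font letters are stored in lowercase and their meanings are left abstract.

```c
#include <ctype.h>

/* Hypothetical font state: the currently displayed font and the context's
 * "normal" font, each stored as one of the lowercase letters n,b,u,i,f,e. */
typedef struct {
    char current;
    char normal;
} font_state;

static void font_apply(font_state *f, char code)
{
    if (code == '-')
        return;                          /* leave both unchanged */
    if (code == '=') {
        f->current = f->normal;          /* back to the context's normal font */
    } else if (islower((unsigned char)code)) {
        f->current = code;               /* change the displayed font only */
    } else if (isupper((unsigned char)code)) {
        f->current = f->normal = (char)tolower((unsigned char)code);
    }
}
```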
(Note: This is going to change. For 2.2, I intend to support a large number of fonts. They will be identified by a name, not just a letter. The :color command will associate attributes with a given font. By the time I get around to making the markup display modes configurable, the FONT field will have to be radically changed.)
FILL controls whether the whitespace and line breaks should be adjusted. Filling is always enabled by default, but you'll want to disable filling for preformatted sections of text so that their formatting remains intact. '-' doesn't affect filling, 'Y' enables filling, and 'N' disables filling.
DEST affects the behavior of some cursor-movement commands. It was never fully implemented in elvis 2.1, but it is intended to work like this: '-' has no effect, 'S' marks a stopping point for the [[ and ]] section commands, 'P' marks a stopping point for the { and } paragraph commands, and 'T' marks a stopping point for the <Tab> key. Currently, only the <Tab> key support is implemented; the "html" display mode uses it to move to the next link or the next field of a form.
The "fn" function pointer, if non-NULL, points to a function which is called after the normal "attr" operations. It is passed the input token, which includes both the markup data structure and a buffer containing the source text for that token with all its arguments. It returns one of the following:
TWRAP_NO This is the usual return value. It indicates that no extra processing was necessary.
TWRAP_BEFORE This indicates that elvis' text-fill routine should perform a linewrap, and then call the markup function again. The man display mode uses this for ".I italic text". If the italic text fits on the current line then it'll output it and return TWRAP_NO. If it is too long for the current line then it'll return TWRAP_BEFORE.
TWRAP_AFTER This indicates that elvis' text-fill routine should perform a linewrap, but the markup function should *NOT* be called again. The man display mode uses this for ".IP label text". This starts a new paragraph, outputs the label text, and then increases the indentation for the rest of the paragraph. If the label text is short enough, then the paragraph continues on the same line and the function returns TWRAP_NO. Otherwise it returns TWRAP_AFTER so the paragraph continues on the next line.
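The following sketch shows how a text-fill loop might honour these return values. The `fill_state` struct, `linewrap()`, `run_markup()` and the `italic_text()` example are hypothetical stand-ins loosely modelled on the ".I italic text" case; they only track a column counter instead of drawing anything, and the enum merely reuses the names above without claiming elvis' actual values.

```c
#include <string.h>

typedef enum { TWRAP_NO, TWRAP_BEFORE, TWRAP_AFTER } twrap_t;

/* Hypothetical output state for the text-fill routine. */
typedef struct {
    int col;    /* current output column */
    int width;  /* usable line width */
    int lines;  /* number of line wraps performed so far */
} fill_state;

typedef twrap_t (*markup_fn)(fill_state *out, const char *arg);

static void linewrap(fill_state *out)
{
    out->lines++;
    out->col = 0;
}

/* Call a markup function and act on its return value. */
static void run_markup(fill_state *out, markup_fn fn, const char *arg)
{
    switch (fn(out, arg)) {
    case TWRAP_BEFORE:
        linewrap(out);
        (void)fn(out, arg);   /* called again, now on a fresh line */
        break;
    case TWRAP_AFTER:
        linewrap(out);        /* wrap, but do NOT call fn again */
        break;
    case TWRAP_NO:
        break;
    }
}

/* ".I"-style example: output the text if it fits, else request a wrap. */
static twrap_t italic_text(fill_state *out, const char *arg)
{
    int len = (int)strlen(arg);
    if (out->col > 0 && out->col + len > out->width)
        return TWRAP_BEFORE;
    out->col += len;          /* stand-in for actually drawing the text */
    return TWRAP_NO;
}
```

Because `italic_text()` only asks for a wrap when the column is past zero, the second call after TWRAP_BEFORE always succeeds, which avoids an endless wrap loop for text wider than the screen.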
Obviously these functions are allowed to output text. I now have a pool of functions to support the most common formatting tasks. The names of these functions indicate which display mode they're intended to be used with, but (unless they need to parse the arguments) can be used with any markup display mode. The current functions are:
htmlimg() For <img>, output "alt=" text if any, else output "src=" text
htmlpre() For <pre>, check for elvis' "graphic" argument
htmlli() For <li>, output a bullet or number
htmlinput() For <input> and similar form tags, simulate part of a form
htmla() For <a>, switch to underline if href, but not if anchor
htmlhr() For <hr> or TeX's \hline, output a horizontal rule
manTH() For .TH, which is invisible but defines the title
manSH() For .SH or .SS, output text as a header
manBI() For .BI and similar font commands, output text or change font
manIP() For .IP or .TP, start a hanging paragraph
texscope() For {, save current font
texoutput() For \mathreltext, output the text
texitem() For \item[label] (optional label in brackets, else use normal)
textitle() For \title{text}, output the text
texdigraph() For \{axb}, combine 'a' & 'b' via digraph table, output it.
The texscope() function only stores the font. Unfortunately there is no place to store the rest of the state, such as indentation. It is important for the state information to remain small, because elvis must allocate a structure for each line of text, storing state at the start of that line. Elvis needs this information so it can refresh the display starting on any line, without reparsing the entire file each time you hit a key.
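To see why the saved state must stay small, consider a per-line snapshot packed into bit-fields. The `line_state` layout and `alloc_line_states()` are hypothetical; only the 3-bit font field mirrors the 2.1_4 font storage Steve mentions later, and the other fields are assumptions about what a fuller snapshot might hold.

```c
#include <stdlib.h>

/* Hypothetical snapshot of the markup parser's state at the start of a
 * line. Bit-fields keep it small because there is one per line. */
typedef struct {
    unsigned font   : 3;  /* current font (3 bits, as in 2.1_4) */
    unsigned indent : 4;  /* assumption: level in half-shiftwidths */
    unsigned fill   : 1;  /* assumption: filling enabled? */
    unsigned list   : 2;  /* assumption: 0=none, 1=bullets, 2=numbered */
} line_state;

/* One zero-initialised snapshot per line; states[n] describes the state
 * at the start of line n, so redisplay can restart at any line. */
static line_state *alloc_line_states(size_t nlines)
{
    return calloc(nlines, sizeof(line_state));
}
```

With these widths the whole snapshot fits in a single `unsigned` on common compilers; every extra field, such as indentation, makes each of those per-line entries larger.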
So that's my starting point. I wrote this down mostly to help me think about how it is now, how it should be extended, and what changes would be required for those extensions. I'm also hoping you can comment on it from a TeX perspective, but really anything you say is a bonus.
I'm sure we'll have more to talk about when I make my first attempt to translate the current html, man, and tex descriptions into the external configuration file (in whatever format that file uses).
Serge and I have talked about revamping the window system to allow splits within a window, vertically or horizontally, in addition to the current method of creating a whole new window. I'd like to modify the syntax coloring to add syntax-sensitive autoindenting. I'd also like to modify ctags to use "elvis.syn" to guide the way it parses files, so it could generate tags for any kind of file that's described in elvis.syn. And then there's the change in the way fonts and colors will be handled. I'll probably do the font and color thing first, since it affects windows, syntax coloring, and markups display modes.
Now that 2.1_4 is out of the way, I've been looking into the new :color semantics, which support a large number of fonts and allow you to change their attributes and colors independently. This is definitely a good thing. The color changes will require some changes to the markup display code, and while I'm at it I believe this is a good time to add a small stack of fonts: probably just 4 items (the current font, two previous fonts, and a default font for when you try to return after nesting too far). Since 2.1_4 has only a current font and a default font, this should be a big improvement.
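The proposed 4-item font stack can be sketched as a fixed-size structure rather than a growing stack. `font_stack` and its functions are hypothetical, and fonts are represented as plain integers for illustration.

```c
/* Hypothetical 4-slot font stack: the current font, two previous fonts,
 * and a default used when returning after nesting too deeply. */
typedef struct {
    int current;
    int prev[2];   /* prev[0] is the most recently saved font */
    int deflt;
} font_stack;

static void font_stack_init(font_stack *s, int deflt)
{
    s->current = s->prev[0] = s->prev[1] = s->deflt = deflt;
}

static void font_push(font_stack *s, int font)
{
    s->prev[1] = s->prev[0];  /* the oldest saved font is forgotten */
    s->prev[0] = s->current;
    s->current = font;
}

static void font_pop(font_stack *s)
{
    s->current = s->prev[0];
    s->prev[0] = s->prev[1];
    s->prev[1] = s->deflt;    /* beyond two levels, fall back to default */
}
```

Pushing more than two levels deep silently forgets the oldest saved font, so popping back past it yields the default font rather than the original one.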
Also, fonts can be combined to some degree. For example, if an HTML file contains "<a href=#foo>Have you seen the <strong>foo</strong>?</a>" then "foo" would be displayed as a combination of "link" and "bold" fonts, instead of just the "bold" font.
The ability to combine fonts is related to another goal: I'd like to be able to have the background color change to indicate whether the window has input focus or not. I intend to do this by defining both a "normal" font and an "idle" font; each window is drawn using one or the other as the base font, and any special text's font is combined with the base font. Your .elvisrc file could contain...
color normal=black on white
color idle=black on gray
color bold=bold
color link=underlined green
... in which case hypertext links would be underlined green on white if the window has input focus, or underlined green on gray otherwise.
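Combining a window's base font with a special font, as in the .elvisrc example above, might look like this. The `font_spec` type, the color enum and `font_combine()` are hypothetical; the rule assumed here is that attributes accumulate and any color left unset is inherited from the font underneath.

```c
/* Hypothetical representation of a :color setting. COLOR_UNSET means
 * "inherit from the font underneath". */
enum { ATTR_BOLD = 1, ATTR_UNDERLINE = 2, ATTR_ITALIC = 4 };
enum color { COLOR_UNSET, BLACK, WHITE, GRAY, GREEN };

typedef struct {
    unsigned attrs;       /* OR of ATTR_* bits */
    enum color fg, bg;
} font_spec;

/* Layer `top` over `base`: attributes accumulate, and colors that `top`
 * leaves unset are taken from `base`. */
static font_spec font_combine(font_spec base, font_spec top)
{
    font_spec r;
    r.attrs = base.attrs | top.attrs;
    r.fg = (top.fg != COLOR_UNSET) ? top.fg : base.fg;
    r.bg = (top.bg != COLOR_UNSET) ? top.bg : base.bg;
    return r;
}
```

Combining "link" and then "bold" over "normal" also reproduces the <strong> inside <a> case mentioned earlier: underlined, bold and green on white.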
We expect to do some fancy things with windows (separate, vertical panes, horizontal panes, and all combinations) which will be nice, but I'm worried that people will become confused about window focus. I know I will. My X server is configured to have focus follow the mouse, so I don't need to click on a window to type into it. But if a single application window contains multiple text panes, I'll need some reminder about which one I'm typing into.
I have *NOT* yet started to design the user-configurable version of the markup display modes. I just looked into it enough to reassure myself that the color changes I'm going to be making soon will not be discarded completely when I do add the user-configurable markups. The little stack is just about the only change I'll be making for the colors, and that's almost unavoidable because the current (2.1_4) code only uses 3 bits to store a font, and I need a lot more to support the larger number of fonts.
Oh, by the way, another reason for the color change was so that users could have more control of coloring. There will be a lot more colors available, and the colors won't inherently be associated with bold, italic, or underlined attributes.
Send feedback on this page to Peter Karp