Using Org-Mode Table Formatting Functions
Introduction
In large-scale data analysis, one often associates integers with
parameter name rather than strings. The performance and storage
difference is significant for multi-gigabyte data sets. But integers
are not at all convenient or descriptive. Systems like R provide a
factor data type that translates the stored integers into user-level
strings. Emulating that construct in SQL is handy for data transfer
with SQLite or out-of-core analysis in R using a SQL back-end. That
leaves the problem of maintaining a registry of integer level codes,
string names, and documentation.
Org-mode provide a fast, light-weight table mechanism that can be sent in email, bundled with code, or embedded in documentation. The tables can be transformed and placed elsewhere. It sounds like a perfect registry, generating documentation and code from one data table. Similar techniques could be used in a multi-lingual document to store many translations in one table and send them to sections in specific languages.
We start with a simple table:
| Name | Level | Description |
|---|---|---|
| normx | 1 | norm(x, ∞) |
| normb | 2 | norm(b, ∞) |
| normA | 3 | norm(A, ∞) |
That one source table contains the documentation in the first and third column:
#+BEGIN RECEIVE ORGTBL exdoc
| Name | Description |
|---|---|
normx |
norm(x, ∞) |
normb |
norm(b, ∞) |
normA |
norm(A, ∞) |
#+END RECEIVE ORGTBL exdoc
The first two columns of the source table provide the data we must
transfer the SQL. The third column can be used to embed some
documentation into the table itself after the string is sanitized for
SQL. The remaining necessary information, the SQL destination table
name, can be provided as a parameter to #+ORGTBL: SEND, producing the
following code chunk:
#+BEGIN RECEIVE ORGTBL exsql
BEGIN TRANSACTION; INSERT INTO extbl( Name, Level, Description ) VALUES ( 'normx' , 1 , 'norm(x, infty)' ); INSERT INTO extbl( Name, Level, Description ) VALUES ( 'normb' , 2 , 'norm(b, infty)' ); INSERT INTO extbl( Name, Level, Description ) VALUES ( 'normA' , 3 , 'norm(A, infty)' ); COMMIT;
#+END RECEIVE ORGTBL exsql
We will explain the parameters used to produce both outputs. The SQL
insertion statements use functions as formatting parameters, some of
which are called purely for the side-effect of gathering the header
fields. The SQL-generating code is distributed with org-mode in
contrib/lisp/orgtbl-sqlinsert.el.
Specifying multiple destinations for a single table
Sending one table to multiple destinations is straight-forward. Add one
SEND directive for each destination. For example, the first table has
the following two directives prepended, with parameters described later:
#+ORGTBL: SEND exdoc orgtbl-to-orgtbl ... #+ORGTBL: SEND exsql orgtbl-to-sqlinsert ...
The documentation removes the second column and adds fiddly formatting parameters with
:skipcols (2) :fmt (1 "=%s=") :hfmt (1 "%s")
The SQL-generating line gathers the destination table name and passes integers through unchanged with the parameters
:sqlname "extbl" :fmt (2 "%s")
The SQL table name defaults to the name of the target, exsql in this
case. And the default formatting used for other columns is
orgtbl-sql-strip-and-quote. That routine only removes potentially
non-portable constructs; it is not designed to prevent insertion
attacks.
We could apply orgtbl-sql-strip-and-quote to the first column of the
documentation table to ensure the strings match exactly, but it easier
to use simple, non-mangled strings as names.
By default, a block of insertions is wrapped in BEGIN TRANSACTION and
COMMIT statements. These can be supressed by setting :tstart and
:tend to nil. The example used in this document uses a
"double-embedding" trick to wrap the statement in an org-mode code
block:
:tstart "#+BEGIN_EXAMPLE\nBEGIN TRANSACTION;" :tend "COMMIT;\n#+END_EXAMPLE"
Similar wrapping can embed the SQL statements into literate programs.
There is built-in support for Noweb with the :nowebname parameter.
Setting :nowebname to a string wraps the insertions in a Noweb code
chunk named with the string.
Formatting with functions for side effects and display
The orgtbl-to-sqlinsert routine calls orgtbl-to-generic for all the
generic table parsing. The parameters provide an example of using
functions for gathering data as well as formatting. Emacs Lisp's
dynamic binding allows manipulating any symbols in the current
environment, so the formatting functions do not need to pass parameters
through the outer functions.
The default :tstart parameter is one example used strictly for
formatting. After the :nowebname parameter is decoded and bound to
nowebname, it can be checked within a thunk to produce the starting
string:
:tstart (lambda () (concat (if nowebname
(format "<<%s>>= \n" nowebname)
"")
"BEGIN TRANSACTION;"))
The functions need not be pure. The header formatting gathers the
first header line into the variable hdrlist with
:hfmt (lambda (f) (progn (if firstheader (push f hdrlist)) ""))
Then each line is preceded with a function that uses hdrlist to ensure
data values are associated with named columns rather than just
positions.
:lstart (lambda () (concat "INSERT INTO "
sqlname "( "
(mapconcat 'identity (reverse hdrlist)
", ")
" )" (if breakvals "\n" " ")
"VALUES ( "))
Note that orgtbl-to-sqlinsert takes advantage of org-mode's applying
the formatting to each cell before checking for a line-formatting
function. The header line itself and sectioning line are suppressed
with the settings
:hlfmt (lambda (lst) (setq firstheader nil)) :hline nil :remove-nil-lines t
Similar techniques could be used to generate a table's SQL definition from the second header line.
Current limitations
One current limitation is that all the tables are in the same Emacs buffer and hence the same text file. A literate programming mechanism like Noweb can separate the chunks.
Also, the tables must be sent manually. Writing a function that scans
an entire buffer for all SEND and RECEIVE pairs is feasible, as is
using overlays to manage automatic updates.