Using Org-Mode Table Formatting Functions
Introduction
In large-scale data analysis, one often associates integers with
parameter name rather than strings. The performance and storage
difference is significant for multi-gigabyte data sets. But integers
are not at all convenient or descriptive. Systems like R provide a
factor
data type that translates the stored integers into user-level
strings. Emulating that construct in SQL is handy for data transfer
with SQLite or out-of-core analysis in R using a SQL back-end. That
leaves the problem of maintaining a registry of integer level codes,
string names, and documentation.
Org-mode provide a fast, light-weight table mechanism that can be sent in email, bundled with code, or embedded in documentation. The tables can be transformed and placed elsewhere. It sounds like a perfect registry, generating documentation and code from one data table. Similar techniques could be used in a multi-lingual document to store many translations in one table and send them to sections in specific languages.
We start with a simple table:
Name | Level | Description |
---|---|---|
normx | 1 | norm(x, ∞) |
normb | 2 | norm(b, ∞) |
normA | 3 | norm(A, ∞) |
That one source table contains the documentation in the first and third column:
#+BEGIN RECEIVE ORGTBL exdoc
Name | Description |
---|---|
normx |
norm(x, ∞) |
normb |
norm(b, ∞) |
normA |
norm(A, ∞) |
#+END RECEIVE ORGTBL exdoc
The first two columns of the source table provide the data we must
transfer the SQL. The third column can be used to embed some
documentation into the table itself after the string is sanitized for
SQL. The remaining necessary information, the SQL destination table
name, can be provided as a parameter to #+ORGTBL: SEND
, producing the
following code chunk:
#+BEGIN RECEIVE ORGTBL exsql
BEGIN TRANSACTION; INSERT INTO extbl( Name, Level, Description ) VALUES ( 'normx' , 1 , 'norm(x, infty)' ); INSERT INTO extbl( Name, Level, Description ) VALUES ( 'normb' , 2 , 'norm(b, infty)' ); INSERT INTO extbl( Name, Level, Description ) VALUES ( 'normA' , 3 , 'norm(A, infty)' ); COMMIT;
#+END RECEIVE ORGTBL exsql
We will explain the parameters used to produce both outputs. The SQL
insertion statements use functions as formatting parameters, some of
which are called purely for the side-effect of gathering the header
fields. The SQL-generating code is distributed with org-mode in
contrib/lisp/orgtbl-sqlinsert.el
.
Specifying multiple destinations for a single table
Sending one table to multiple destinations is straight-forward. Add one
SEND
directive for each destination. For example, the first table has
the following two directives prepended, with parameters described later:
#+ORGTBL: SEND exdoc orgtbl-to-orgtbl ... #+ORGTBL: SEND exsql orgtbl-to-sqlinsert ...
The documentation removes the second column and adds fiddly formatting parameters with
:skipcols (2) :fmt (1 "=%s=") :hfmt (1 "%s")
The SQL-generating line gathers the destination table name and passes integers through unchanged with the parameters
:sqlname "extbl" :fmt (2 "%s")
The SQL table name defaults to the name of the target, exsql
in this
case. And the default formatting used for other columns is
orgtbl-sql-strip-and-quote
. That routine only removes potentially
non-portable constructs; it is not designed to prevent insertion
attacks.
We could apply orgtbl-sql-strip-and-quote
to the first column of the
documentation table to ensure the strings match exactly, but it easier
to use simple, non-mangled strings as names.
By default, a block of insertions is wrapped in BEGIN TRANSACTION
and
COMMIT
statements. These can be supressed by setting :tstart
and
:tend
to nil
. The example used in this document uses a
"double-embedding" trick to wrap the statement in an org-mode code
block:
:tstart "#+BEGIN_EXAMPLE\nBEGIN TRANSACTION;" :tend "COMMIT;\n#+END_EXAMPLE"
Similar wrapping can embed the SQL statements into literate programs.
There is built-in support for Noweb with the :nowebname
parameter.
Setting :nowebname
to a string wraps the insertions in a Noweb code
chunk named with the string.
Formatting with functions for side effects and display
The orgtbl-to-sqlinsert
routine calls orgtbl-to-generic
for all the
generic table parsing. The parameters provide an example of using
functions for gathering data as well as formatting. Emacs Lisp's
dynamic binding allows manipulating any symbols in the current
environment, so the formatting functions do not need to pass parameters
through the outer functions.
The default :tstart
parameter is one example used strictly for
formatting. After the :nowebname
parameter is decoded and bound to
nowebname
, it can be checked within a thunk to produce the starting
string:
:tstart (lambda () (concat (if nowebname (format "<<%s>>= \n" nowebname) "") "BEGIN TRANSACTION;"))
The functions need not be pure. The header formatting gathers the
first header line into the variable hdrlist
with
:hfmt (lambda (f) (progn (if firstheader (push f hdrlist)) ""))
Then each line is preceded with a function that uses hdrlist
to ensure
data values are associated with named columns rather than just
positions.
:lstart (lambda () (concat "INSERT INTO " sqlname "( " (mapconcat 'identity (reverse hdrlist) ", ") " )" (if breakvals "\n" " ") "VALUES ( "))
Note that orgtbl-to-sqlinsert
takes advantage of org-mode's applying
the formatting to each cell before checking for a line-formatting
function. The header line itself and sectioning line are suppressed
with the settings
:hlfmt (lambda (lst) (setq firstheader nil)) :hline nil :remove-nil-lines t
Similar techniques could be used to generate a table's SQL definition from the second header line.
Current limitations
One current limitation is that all the tables are in the same Emacs buffer and hence the same text file. A literate programming mechanism like Noweb can separate the chunks.
Also, the tables must be sent manually. Writing a function that scans
an entire buffer for all SEND
and RECEIVE
pairs is feasible, as is
using overlays to manage automatic updates.