Org tutorial on table lookup functions

Introduction

Org provides three different functions for performing searches and data dependent calculations in tables. These functions can, among other things, be used to implement associative arrays, count matching cells, rank results, or group data. The following examples will hopefully help you in getting started with these functions.

Associative array with unique keys

The most straightforward use of lookups is to treat part of an org table as an associative array: a key can be used to look up a corresponding value.

Say you are taking a trip to Scandinavia, and you want to keep track of how much money you have spent on the trip. You decide to convert all sums to euros. Before your trip you write down the following table of approximate currency rates.

 #+TBLNAME: rates
| currency        | abbreviation | euros |
|-----------------+--------------+-------|
| euro            | eur          |     1 |
| Norwegian krone | nok          |  0.14 |
| Swedish krona   | sek          |  0.12 |
| US dollar       | usd          |  0.77 |

In what follows we will use the function org-lookup-first and the previous table rates to automatically convert the sums in different currencies to euros. The signature of function org-lookup-first looks as follows:

(org-lookup-first VAL S-LIST R-LIST &optional PREDICATE)

Assuming that PREDICATE is nil, in which case the default predicate equal is used, this function does a search for the first instance of VAL in S-LIST and returns the a value from the corresponding position in R-LIST. In the table below, each sum is assigned a currency abbreviation; a lookup is done in table rates above in the second column for the corresponding abbreviation, and then the corresponding rate is returned from the third column. For each row only the first four columns need to filled; columns 5 and 6 are calculated automatically. Notice that an error results if the key is not found: in the last row, an empty key is being searched for.

|  date | expense          |  sum | currency |   rate |  euros |
|-------+------------------+------+----------+--------+--------|
|  1.3. | flights          |  324 | eur      |      1 |    324 |
|  4.6. | books and maps   |  243 | usd      |   0.77 | 187.11 |
| 30.7. | rental car       | 8300 | sek      |   0.12 |   996. |
|  2.7. | hotel            | 1150 | sek      |   0.12 |   138. |
|  2.7. | lunch            |  190 | sek      |   0.12 |   22.8 |
|  3.7. | fishing licenses | 1400 | nok      |   0.14 |   196. |
|  3.7. | gasoline         |  340 |          | #ERROR | #ERROR |
 #+TBLFM: $5='(org-lookup-first $4 '(remote(rates,@2$2..@>$2)) '(remote(rates,@2$3..@>$3)))::$6=$5*$3

Multiple matches with preferred ordering

A common task for teachers is the assignment of exam grades from total marks. The starting point for such grading is a table with grade boundaries. Below is one such table, with the rows in increasing order of the lower bound required for a particular grade.

 #+TBLNAME: grade-boundaries
| lower bound | grade |
|-------------+-------|
|           0 | F     |
|          10 | D     |
|          20 | C     |
|          30 | B     |
|          40 | A     |

We will use the function org-lookup-last and the previous table grade-boundaries to assign grades to students based on their marks. The signature of function org-lookup-last is exactly like the signature of org-lookup-first:

(org-lookup-last VAL S-LIST R-LIST &optional PREDICATE)

However, this function does a search for the last match in S-LIST and returns the a value from the corresponding position in R-LIST. Here the idea of the lookup used in assigning the grade is as follows. Say a student's exam result is 33 marks. We look for the last row in the table for which the student's marks are greater than or equal to the lower bound; in this case it is the row with lower bound 30. The student's grade is the corresponding element from the second column, in this case a B.

Thus, given the number of marks VAL of a student, we find the last row of the first column of table grade-boundaries for which the lower bound S fulfils (>= VAL S). Thus we will use >= as PREDICATE to perform the matching. Note that VAL and S are assigned as arguments to the predicate in the order they are in the signature of org-lookup-last, where VAL precedes S-LIST. The following table does the conversion from total marks to the final grade. Notice the literal interpolation L of table values into the Elisp formula, which is needed because some values are numbers and some are symbols.

| student | marks | grade |
|---------+-------+-------|
| X       |    30 | B     |
| Y       |    29 | C     |
| Z       |     5 | F     |
| W       |    55 | A     |
 #+TBLFM: $3='(org-lookup-last $2 '(remote(grade-boundaries,@2$1..@>$1)) '(remote(grade-boundaries,@2$2..@>$2)) '>=);L

Counting matching cells

The function org-lookup-all can not be used by itself in a table equation, because it returns a list of values. However, powerful lookup tasks can be performed by combining the function with other Elisp functions.

As a simple example consider counting the number of missing values in a table. The signature of function org-lookup-all is exactly like the signatures of the other two lookup functions:

(org-lookup-all VAL S-LIST R-LIST &optional PREDICATE)

However, this function does a search for the all matches in S-LIST and returns the all corresponding values from the corresponding positions in R-LIST. As is the case with org-lookup-first and org-lookup-last, if R-LIST is nil, then the corresponding matching values of S-LIST are returned directly. Notice the use of the E flag to retain empty fields in the range. Also notice that in this case we are doing the lookup in a true two-dimensional range, which is thus also possible

| group | round 1 | round 2 |
|-------+---------+---------|
| A     |         |     2.4 |
| B     |     4.7 |      11 |
| C     |         |         |
| D     |       5 |         |
| E     |         |     7.2 |
| F     |     3.2 |     4.3 |
| G     |         |     4.4 |
| H     |         |       8 |
|-------+---------+---------|
| total | missing |       7 |
 #+TBLFM: @>$3='(length(org-lookup-all "" '(@2$2..@-1$3) nil));E

Ranking results

Another example application of org-lookup-all is an automatic ranking of results. In the table below, a larger total number of marks is better. Notice that the Elisp expression also automatically takes care of ties.

| group | marks | rank |
|-------+-------+------|
| A     |    22 |    2 |
| B     |    22 |    2 |
| C     |    14 |    4 |
| D     |    28 |    1 |
| E     |     9 |    5 |
 #+TBLFM: $3='(+ 1 (length (org-lookup-all $2 '(@2$2..@>$2) nil '<)));N

Frequency counts from raw data

A common situation in the analysis of data is the classification (grouping) of raw data values for, e.g., visualisation. Often this is done by counting the frequencies of observations within certain bounds. The function org-lookup-all, combined with other Elisp functions, can be used to perform this task. This example also shows how to construct more complicated lookup rules using multiple values from a table.

Consider the following table with different results from different groups A-I.

 #+TBLNAME: raw-data
| group | result |
|-------+--------|
| A     |    2.3 |
| B     |    4.2 |
| C     |    1.1 |
| D     |    3.6 |
| E     |    4.5 |
| F     |    2.4 |
| G     |    1.0 |
| H     |    2.3 |
| I     |    2.8 |

We will classify the results into different, mutually exclusive classes. For example, the observations that will belong to the first class are in the interval [1, 1.9] (endpoints included). In order to perform this classification, we define the following two-place predicate function in-interval. Notice that the first parameter of this function is a pair whose first element is the lower bound and second member the upper bound of the interval.

#+BEGIN_SRC emacs-lisp
  (defun in-interval (bounds el)
    (and (>= el (car bounds)) (<= el (cadr bounds))))
#+END_SRC

#+RESULTS:
: in-interval

Using this predicate function, we can construct a table with class boundaries and corresponding frequencies. Note that the first argument to the function org-lookup-all, which is then passed over as the first argument to the predicate in-interval, is the pair of bounds.

| lower bound | upper bound | frequency |
|-------------+-------------+-----------|
|           1 |         1.9 |         2 |
|           2 |         2.9 |         4 |
|           3 |         3.9 |         1 |
|           4 |         4.9 |         2 |
 #+TBLFM: $3='(length (org-lookup-all '($1 $2) '(remote(raw-data,@2$2..@>$2)) nil 'in-interval));N

Conclusion

The org lookup functions can be used for a large number of different data-dependent calculations. For example, the following spreadsheet operations familiar to libreoffice or Excel users can be implemented using them: HLOOKUP, VLOOKUP, COUNTIF, SUMIF and FREQUENCY. If you have other interesting examples of the use of these functions, feel free to send them to the org mailing list and we will be happy to add them on this page.