Maxim Nikulin <manikulin@gmail.com> writes:

> I am trying to minimize number of regexp searches. Mostly it is applied 
> when information concerning multiple headings is required (agenda, 
> refile targets). It unlikely will get some benefits during interactive 
> calls related to single heading.

> Having the tree, it is possible to instantly find heading for 
> *arbitrary* position in the buffer. Likely the next operation is goto to 
> the heading or to it parent and parsing the line for more detailed 
> properties. The latter is cacheable, structure for heading properties 
> can be expanded.

Thanks for the explanation! I very much like this idea, though I do not
think that it is going to be practical on actual large files. Your code
runs for 0.25sec on my largest file:

| scan x10                         | 2.482179163 | 1 | 0.38996822200002157 |
| btree x10                        | 0.103329151 | 0 |                 0.0 |
| scan+btree x10                   | 2.666915096 | 1 |  0.3657946569999808 |
| find random x10 000              | 0.048668756 | 0 |                 0.0 |
| nodes                            |       16641 |   |                     |

> Since there is no operations as insertion or deletion of nodes from 
> tree, no complex code is required to implement balancing rotations. That 
> is why I think that avl-tree is an overkill.

Yet, if we could keep the tree in sync with buffer modifications... And
we can, using org-element-cache. It already uses AVL-tree and already
syncs it automatically on buffer edits. The only problem is that it does
not work with headlines. However, I wrote some code adding headline
support to org-element-cache in https://github.com/yantar92/org.

Some benchmarks:

| scan x10                                 |        2.179845106 | 0 |               0.0 |
| btree x10                                |        0.474887597 | 1 | 0.364201286999986 |
| scan+btree x10                           |        2.318029902 | 0 |               0.0 |
| scan using org-element-cache x10         | 2.8941826990000004 | 0 |               0.0 |
| populating the cache x1                  |       11.332366941 | 5 | 2.040159146999997 |
| find random x10 000                      |        0.053074866 | 0 |               0.0 |
| find random in org-element-cache x10 000 |        0.007963375 | 0 |               0.0 |
| nodes                                    |              16641 |   |                   |

Scan through the file takes pretty much the same time with the btree
approach. Sometimes more, sometimes less. And most of the time is really
just re-search-forward. Find random showcases that avl-tree vs. btree
really makes a difference on such large number of headings.

On the worse side, initial population of the org-element-cache takes
quite a bit of time. However, it is normally done asynchronously during
idle time and does not really affect the user except the first loading
(which might be further optimised if we store cache between Emacs
sessions). Later, cache is updated automatically with buffer
modifications.

Moreover, the org-element-cache will already provide the parsed
headlines, including titles (and tags and other useful staff). No need
to match headline text with regexp.

For me, the idea of storing headlines in cache looks promising. I can
imagine many other Org functions using the cache after trivial
modifications. At least org-get-tags, org-get-category, org-entry-get
can be easily adapted to use the cache. What do you think?

> See the attachment for experimental (thus almost untested) code. Likely 
> you will find code style quite ugly. I am not happy with 0.1 sec for a 
> moderately large file. It is close to the limit for comfortable 
> interactive operations.

No worries about the style. It is testing anyway.

I modified your code slightly for testing the org-element-cache. See the
attached diff.

Best,
Ihor