Org-mode mailing list
 help / color / mirror / Atom feed
From: Ihor Radchenko <yantar92@gmail.com>
To: Maxim Nikulin <manikulin@gmail.com>
Cc: emacs-orgmode@gnu.org
Subject: Re: [PATCH] Use cache in org-up-heading-safe
Date: Mon, 10 May 2021 23:14:14 +0800
Message-ID: <87eeeey62h.fsf@localhost> (raw)
In-Reply-To: <s75sla$1fm$1@ciao.gmane.io>

[-- Attachment #1: Type: text/plain, Size: 3913 bytes --]

Maxim Nikulin <manikulin@gmail.com> writes:

> I am trying to minimize number of regexp searches. Mostly it is applied 
> when information concerning multiple headings is required (agenda, 
> refile targets). It unlikely will get some benefits during interactive 
> calls related to single heading.

> Having the tree, it is possible to instantly find heading for 
> *arbitrary* position in the buffer. Likely the next operation is goto to 
> the heading or to it parent and parsing the line for more detailed 
> properties. The latter is cacheable, structure for heading properties 
> can be expanded.

Thanks for the explanation! I very much like this idea, though I do not
think that it is going to be practical on actual large files. Your code
runs for 0.25sec on my largest file:

| scan x10                         | 2.482179163 | 1 | 0.38996822200002157 |
| btree x10                        | 0.103329151 | 0 |                 0.0 |
| scan+btree x10                   | 2.666915096 | 1 |  0.3657946569999808 |
| find random x10 000              | 0.048668756 | 0 |                 0.0 |
| nodes                            |       16641 |   |                     |

> Since there is no operations as insertion or deletion of nodes from 
> tree, no complex code is required to implement balancing rotations. That 
> is why I think that avl-tree is an overkill.

Yet, if we could keep the tree in sync with buffer modifications... And
we can, using org-element-cache. It already uses AVL-tree and already
syncs it automatically on buffer edits. The only problem is that it does
not work with headlines. However, I wrote some code adding headline
support to org-element-cache in https://github.com/yantar92/org.

Some benchmarks:

| scan x10                                 |        2.179845106 | 0 |               0.0 |
| btree x10                                |        0.474887597 | 1 | 0.364201286999986 |
| scan+btree x10                           |        2.318029902 | 0 |               0.0 |
| scan using org-element-cache x10         | 2.8941826990000004 | 0 |               0.0 |
| populating the cache x1                  |       11.332366941 | 5 | 2.040159146999997 |
| find random x10 000                      |        0.053074866 | 0 |               0.0 |
| find random in org-element-cache x10 000 |        0.007963375 | 0 |               0.0 |
| nodes                                    |              16641 |   |                   |

Scan through the file takes pretty much the same time with the btree
approach. Sometimes more, sometimes less. And most of the time is really
just re-search-forward. Find random showcases that avl-tree vs. btree
really makes a difference on such large number of headings.

On the worse side, initial population of the org-element-cache takes
quite a bit of time. However, it is normally done asynchronously during
idle time and does not really affect the user except the first loading
(which might be further optimised if we store cache between Emacs
sessions). Later, cache is updated automatically with buffer
modifications.

Moreover, the org-element-cache will already provide the parsed
headlines, including titles (and tags and other useful staff). No need
to match headline text with regexp.

For me, the idea of storing headlines in cache looks promising. I can
imagine many other Org functions using the cache after trivial
modifications. At least org-get-tags, org-get-category, org-entry-get
can be easily adapted to use the cache. What do you think?

> See the attachment for experimental (thus almost untested) code. Likely 
> you will find code style quite ugly. I am not happy with 0.1 sec for a 
> moderately large file. It is close to the limit for comfortable 
> interactive operations.

No worries about the style. It is testing anyway.

I modified your code slightly for testing the org-element-cache. See the
attached diff.

Best,
Ihor


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: diff-btree.diff --]
[-- Type: text/x-diff, Size: 3548 bytes --]

diff -u /tmp/nm-btree-1.org /tmp/nm-btree.org
--- /tmp/nm-btree-1.org	2021-05-10 23:04:21.286395360 +0800
+++ /tmp/nm-btree.org	2021-05-10 23:02:17.586396326 +0800
@@ -1,3 +1,6 @@
+:PROPERTIES:
+:ID:       0aba93db-491d-46f7-b952-e138ba179a12
+:END:
 
 #+begin_src elisp :results none
 
@@ -38,6 +41,26 @@
 		(push props parents))))
 	  (and headings (cons headings count)))))))
 
+(defun nm-buffer-headings-reversed-cache (buffer)
+  (with-current-buffer buffer
+    (save-restriction
+      (save-excursion
+	(widen)
+	(goto-char (point-min))
+	(let ((count 0)
+	      (headings ())
+	      (parents ()))
+	  (while (re-search-forward org-outline-regexp-bol nil t)
+            (let ((cached (org-element--cache-find (line-beginning-position))))
+              (if (eq (org-element-property :begin cached) (line-beginning-position))
+                  (push cached headings)
+                (push (org-element-at-point) headings)))
+            
+            ;; (save-excursion
+            ;;   (beginning-of-line)
+            ;;   (org-element-at-point 'cached))
+            ))))))
+
 
 ;; binary search tree
 (defun nm-btree-new-node ()
@@ -103,6 +126,7 @@
 
 #+begin_src elisp
 (byte-compile #'nm-buffer-headings-reversed)
+(byte-compile #'nm-buffer-headings-reversed-cache)
 (byte-compile #'nm-btree-from-reversed)
 (byte-compile #'nm-btree-find-left)
 
@@ -125,13 +149,44 @@
 	     (let* ((scan-result1 (nm-buffer-headings-reversed buffer))
 		    (tree1 (nm-btree-from-reversed scan-result1)))
 	       tree1)))
+   (append '("scan using org-element-cache x10")
+           (progn
+             (let ((org-element-cache-sync-duration 1000.0))
+               (org-element-cache-reset)
+	       (org-element--cache-buffer-stealthly))
+	     (benchmark-run 10
+	       (nm-buffer-headings-reversed-cache buffer))))
+   (append '("populating the cache x1")
+	   (benchmark-run 1
+             (with-current-buffer buffer
+               ;; Disable async processing.
+               (let ((org-element-cache-sync-duration 1000.0))
+                 (org-element-cache-reset)
+	         (org-element--cache-buffer-stealthly)))))
    (append '("find random x10 000")
 	   (benchmark-run 10000
 	     (nm-btree-find-left tree (random lim))))
+   (append '("find random in org-element-cache x10 000")
+	   (benchmark-run 10000
+             (org-element-lineage
+	      (org-element--cache-find (random lim))
+              '(headlines) t)))
    (list "nodes" (cdr scan-result) "" "")))
 #+end_src
 
 #+RESULTS:
+| scan x10                                 |          1.693757321 | 0 |                 0.0 |
+| btree x10                                |          0.552682783 | 1 | 0.43690057000000593 |
+| scan+btree x10                           |   2.3174109880000002 | 0 |                 0.0 |
+| scan using org-element-cache x10         |   2.9140550789999997 | 0 |                 0.0 |
+| populating the cache x1                  |          10.83575134 | 9 |  3.9872376689999953 |
+| find random x10 000                      | 0.061988949999999994 | 0 |                 0.0 |
+| find random in org-element-cache x10 000 |          0.003262685 | 0 |                 0.0 |
+| nodes                                    |                16641 |   |                     |
+
+
+
+
 | scan x10            |   0.8611382689999999 | 0 |                 0.0 |
 | btree x10           |  0.07705962400000001 | 1 | 0.05648322199999978 |
 | scan+btree x10      |          0.940467238 | 1 | 0.05685373699999996 |

Diff finished.  Mon May 10 23:04:27 2021

  reply	other threads:[~2021-05-10 15:11 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-04 15:08 Ihor Radchenko
2021-05-05 16:40 ` Maxim Nikulin
2021-05-06 14:34   ` Ihor Radchenko
2021-05-06 17:02     ` Maxim Nikulin
2021-05-07  2:08       ` Ihor Radchenko
2021-05-08 11:28         ` Maxim Nikulin
2021-05-10 15:14           ` Ihor Radchenko [this message]
2021-05-15 11:58         ` Bastien
2021-05-16  6:15 ` Bastien
2021-05-16  6:36   ` Ihor Radchenko
2021-05-16  8:53     ` Bastien

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://orgmode.org

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87eeeey62h.fsf@localhost \
    --to=yantar92@gmail.com \
    --cc=emacs-orgmode@gnu.org \
    --cc=manikulin@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Org-mode mailing list

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://orgmode.org/list/0 list/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 list list/ https://orgmode.org/list \
		emacs-orgmode@gnu.org
	public-inbox-index list

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.yhetil.org/yhetil.emacs.orgmode
	nntp://news.gmane.io/gmane.emacs.orgmode


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git