Org-mode mailing list
 help / color / mirror / Atom feed
From: Nicolas Goaziou <mail@nicolasgoaziou.fr>
To: Ihor Radchenko <yantar92@gmail.com>
Cc: emacs-orgmode@gnu.org
Subject: Re: [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers
Date: Tue, 19 May 2020 15:07:47 +0200
Message-ID: <87o8qkhy3g.fsf@nicolasgoaziou.fr> (raw)
In-Reply-To: <87tv0d2nk7.fsf@localhost> (Ihor Radchenko's message of "Tue, 19 May 2020 00:52:08 +0800")

Hello,

Ihor Radchenko <yantar92@gmail.com> writes:

>> As you noticed, using Org Element is a no-go, unfortunately. Parsing an
>> element is a O(N) operation by the number of elements before it in
>> a section. In particular, it is not bounded, and not mitigated by
>> a cache. For large documents, it is going to be unbearably slow, too.
>
> Ouch. I thought it is faster.
> What do you mean by "not mitigated by a cache"?

Parsing starts from the closest headline, every time. So, if Org parses
the Nth element in the entry two times, it really parses 2N elements.

With a cache, assuming the buffer wasn't modified, Org would parse
N elements only. With a smarter cache, with fine grained cache
invalidation, it could also reduce the number of subsequent parsed
elements.

> The reason I would like to utilise org-element parser to make tracking
> modifications more robust. Using details of the syntax would make the
> code fragile if any modifications are made to syntax in future.

I don't think the code would be more fragile. Also, the syntax we're
talking about is not going to be modified anytime soon. Moreover, if
folding breaks, it is usually visible, so the bug will not be unnoticed.

This code is going to be as low-level as it can be.

> Debugging bugs in modification functions is not easy, according to my
> experience.

No, it's not. 

But this is not really related to whether you use Element or not.

> One possible way to avoid performance issues during modification is
> running parser in advance. For example, folding an element may
> as well add information about the element to its text properties.
> This will not degrade performance of folding since we are already
> parsing the element during folding (at least, in
> org-hide-drawer-toggle).

We can use this information stored at fold time. But I'm not even sure
we need it.

> The problem with parsing an element during folding is that we cannot
> easily detect changes like below without re-parsing.

Of course we can. It is only necessary to focus on changes that would
break the structure of the element. This does not entail a full parsing.

> :PROPERTIES: <folded>
> :CREATED: [2020-05-18 Mon]
> :END: <- added line
> :ID: test
> :END:
>
> or even
>
> :PROPERTIES:
> :CREATED: [2020-05-18 Mon]
> :ID: test
> :END: <- delete this line
>
> :DRAWER: <folded, cannot be unfolded if we don't re-parse after deletion>
> test
> :END:

Please have a look at the "sensitive parts" I wrote about. This takes
care of this kind of breakage.

> The re-parsing can be done via regexp, as you suggested, but I don't
> like this idea, because it will end up re-implementing
> org-element-*-parser.

You may have misunderstood my suggestion. See below.

> Would it be acceptable to run org-element-*-parser
> in after-change-functions?

I'd rather not do that. This is unnecessary consing, and matching, etc.

> If I understand correctly, it is not as easy.
> Consider the following example:
>
> :PROPERTIES:
> :CREATED: [2020-05-18 Mon]
> <region-beginning>
> :ID: example
> :END:
>
> <... a lot of text, maybe containing other drawers ...>
>
> Nullam rutrum.
> Pellentesque dapibus suscipit ligula.
> <region-end>
> Proin quam nisl, tincidunt et, mattis eget, convallis nec, purus.
>
> If the region gets deleted, the modification hooks from chars inside
> drawer will be called as (hook-function <region-beginning>
> <region-end>). So, there is still a need to find the drawer somehow to
> mark it as about to be modified (modification hooks are ran before
> actual modification).

If we can stick with `after-change-functions' (or local equivalent),
that's better. It is more predictable than `before-change-functions' and
alike.

If it is a deletion, here is the kind of checks we could do, depending
on when they are performed.

Before actual changes :

  1. The deletion is happening within a folded drawer (unnecessary step
     in local functions).
  2. The change deleted the sensitive line ":END:".
  3. Conclusion : unfold.

Or, after actual changes :

  1. The deletion involves a drawer.
  2. Text properties indicate that the beginning of the propertized part
     of the buffer start with org-drawer-regexp, but doesn't end with
     `org-property-end-re'. A "sensitive part" disappeared!
  3. Conclusion : unfold

This is far away from parsing. IMO, a few checks cover all cases. Let me
know if you have questions about it.

Also, note that the kind of change you describe will happen perhaps
0.01% of the time. Most change are about one character, or a single
line, long.

> The only difference between using modification hooks and
> before-change-functions is that modification hooks will trigger less
> frequently. 

Exactly. Much less frequently. But extra care is required, as you noted
already.

> Considering the performance of org-element-at-point, it is
> probably worth doing. Initially, I wanted to avoid it because setting a
> single before-change-functions hook sounded cleaner than setting
> modification-hooks, insert-behind-hooks, and insert-in-front-hooks.

Well, `before-change-fuctions' and `after-change-functions' are not
clean at all: you modify an unrelated part of the buffer, but still call
those to check if a drawer needs to be unfolded somewhere.

And, more importantly, they are not meant to be used together, i.e., you
cannot assume that a single call to `before-change-functions' always
happens before calling `after-change-functions'. This can be tricky if
you want to use the former to pass information to the latter.

But I understand that they are easier to use than their local
counterparts. If you stick with (before|after)-change-functions, the
function being called needs to drop the ball very quickly if the
modification is not about folding changes. Also, I very much suggest to
stick to only `after-change-functions', if feasible (I think it is), per
above.

> Moreover, these text properties would be copied by default if one uses 
> buffer-substring. Then, the hooks will also trigger later in the yanked
> text, which may cause all kinds of bugs.

Indeed, that would be something to handle specifically. I.e.,
destructive modifications (i.e., those that unfold) could clear such
properties.

It is more work. I don't know if it is worth the trouble if we can get
out quickly of `after-change-functions' for unrelated changes.

> It was mostly an annoyance, because they returned different results on
> the same element. Specifically, they returned different :post-blank and
> :end properties, which does not sound right.

OK. If you have a reproducible recipe, I can look into it and see what
can be done.

Regards,

-- 
Nicolas Goaziou


  reply	other threads:[~2020-05-19 13:08 UTC|newest]

Thread overview: 85+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-24  6:55 Ihor Radchenko
2020-04-24  8:02 ` Nicolas Goaziou
2020-04-25  0:29   ` stardiviner
2020-04-26 16:04   ` Ihor Radchenko
2020-05-04 16:56     ` Karl Voit
2020-05-07  7:18       ` Karl Voit
2020-05-09 15:43       ` Ihor Radchenko
2020-05-07 11:04     ` Christian Heinrich
2020-05-09 15:46       ` Ihor Radchenko
2020-05-08 16:38     ` Nicolas Goaziou
2020-05-09 13:58       ` Nicolas Goaziou
2020-05-09 16:22         ` Ihor Radchenko
2020-05-09 17:21           ` Nicolas Goaziou
2020-05-10  5:25             ` Ihor Radchenko
2020-05-10  9:47               ` Nicolas Goaziou
2020-05-10 13:29                 ` Ihor Radchenko
2020-05-10 14:46                   ` Nicolas Goaziou
2020-05-10 16:21                     ` Ihor Radchenko
2020-05-10 16:38                       ` Nicolas Goaziou
2020-05-10 17:08                         ` Ihor Radchenko
2020-05-10 19:38                           ` Nicolas Goaziou
2020-05-09 15:40       ` Ihor Radchenko
2020-05-09 16:30         ` Ihor Radchenko
2020-05-09 17:32           ` Nicolas Goaziou
2020-05-09 18:06             ` Ihor Radchenko
2020-05-10 14:59               ` Nicolas Goaziou
2020-05-10 15:15                 ` Kyle Meyer
2020-05-10 16:30                 ` Ihor Radchenko
2020-05-10 19:32                   ` Nicolas Goaziou
2020-05-12 10:03                     ` Nicolas Goaziou
2020-05-17 15:00                     ` Ihor Radchenko
2020-05-17 15:40                       ` Ihor Radchenko
2020-05-18 14:35                         ` Nicolas Goaziou
2020-05-18 16:52                           ` Ihor Radchenko
2020-05-19 13:07                             ` Nicolas Goaziou [this message]
2020-05-23 13:52                               ` Ihor Radchenko
2020-05-23 13:53                                 ` Ihor Radchenko
2020-05-23 15:26                                   ` Ihor Radchenko
2020-05-26  8:33                                 ` Nicolas Goaziou
2020-06-02  9:21                                   ` Ihor Radchenko
2020-06-02  9:23                                     ` Ihor Radchenko
2020-06-02 12:10                                       ` Bastien
2020-06-02 13:12                                         ` Ihor Radchenko
2020-06-02 13:23                                           ` Bastien
2020-06-02 13:30                                             ` Ihor Radchenko
2020-06-02  9:25                                     ` Ihor Radchenko
2020-06-05  7:26                                     ` Nicolas Goaziou
2020-06-05  8:18                                       ` Ihor Radchenko
2020-06-05 13:50                                         ` Nicolas Goaziou
2020-06-08  5:05                                           ` Ihor Radchenko
2020-06-08  5:06                                             ` Ihor Radchenko
2020-06-08  5:08                                             ` Ihor Radchenko
2020-06-10 17:14                                             ` Nicolas Goaziou
2020-06-21  9:52                                               ` Ihor Radchenko
2020-06-21 15:01                                                 ` Nicolas Goaziou
2020-08-11  6:45                                               ` Ihor Radchenko
2020-08-11 23:07                                                 ` Kyle Meyer
2020-08-12  6:29                                                   ` Ihor Radchenko
2020-09-20  5:53                                                     ` Ihor Radchenko
2020-09-20 11:45                                                       ` Kévin Le Gouguec
2020-09-22  9:05                                                         ` Ihor Radchenko
2020-09-22 10:00                                                           ` Ihor Radchenko
2020-09-23  6:16                                                             ` Kévin Le Gouguec
2020-09-23  6:48                                                               ` Ihor Radchenko
2020-09-23  7:09                                                                 ` Bastien
2020-09-23  7:30                                                                   ` Ihor Radchenko
2020-09-24 18:07                                                                 ` Kévin Le Gouguec
2020-09-25  2:16                                                                   ` Ihor Radchenko
2020-12-15 17:38                                                                     ` [9.4] Fixing logbook visibility during isearch Kévin Le Gouguec
2020-12-16  3:15                                                                       ` Ihor Radchenko
2020-12-16 18:05                                                                         ` Kévin Le Gouguec
2020-12-17  3:18                                                                           ` Ihor Radchenko
2020-12-17 14:50                                                                             ` Kévin Le Gouguec
2020-12-18  2:23                                                                               ` Ihor Radchenko
2020-12-24 23:37                                                                                 ` Kévin Le Gouguec
2020-12-25  2:51                                                                                   ` Ihor Radchenko
2020-12-25 10:59                                                                                     ` Kévin Le Gouguec
2020-12-25 12:32                                                                                       ` Ihor Radchenko
2020-12-25 21:35                                                                                     ` Kévin Le Gouguec
2020-12-26  4:14                                                                                       ` Ihor Radchenko
2020-12-26 11:44                                                                                         ` Kévin Le Gouguec
2020-12-26 12:22                                                                                           ` Ihor Radchenko
2020-12-04  5:58                                                       ` [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers Ihor Radchenko
2021-03-21  9:09                                                         ` Ihor Radchenko
2021-05-03 17:28                                                           ` Bastien

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://orgmode.org

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87o8qkhy3g.fsf@nicolasgoaziou.fr \
    --to=mail@nicolasgoaziou.fr \
    --cc=emacs-orgmode@gnu.org \
    --cc=yantar92@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Org-mode mailing list

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://orgmode.org/list/0 list/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 list list/ https://orgmode.org/list \
		emacs-orgmode@gnu.org
	public-inbox-index list

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.yhetil.org/yhetil.emacs.orgmode
	nntp://news.gmane.io/gmane.emacs.orgmode


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git