From: Ihor Radchenko
To: Nicolas Goaziou
Subject: Re: [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers
In-Reply-To: <87o8qkhy3g.fsf@nicolasgoaziou.fr>
Cc: emacs-orgmode@gnu.org
Date: Sat, 23 May 2020 21:52:40 +0800
Message-ID: <87sgfqu5av.fsf@localhost>

Hello,

[The patch itself will be provided in the following email]

I have five updates since the previous version of the patch:

1. I implemented a simplified version of element parsing to detect changes in folded drawers or blocks. Computationally expensive calls to org-element-at-point or org-element-parse-buffer are no longer needed.

2. The patch is now compatible with master (commit 2e96dc639). I reverted the earlier change in folding drawers and blocks.
Now, they are back to using 'org-hide-block and 'org-hide-drawer. Using 'outline would achieve nothing when we use text properties.

3. The 'invisible text property can now be nested. This is important, for example, when text inside drawers contains fontified links (which also use the 'invisible text property to hide parts of the link). The old 'invisible spec is now restored after unfolding.

4. Some outline-* functions called from Org rely on the outline-flag-region implementation, which is not in sync with org-flag-region in this patch. I have implemented org-* versions of these functions and replaced the calls throughout the .el files. Some org-* versions were actually already implemented in Org, but were not used for some reason (or not mentioned in the manual). I have updated the relevant sections of the manual. These changes might be relevant to Org independently of this feature branch.

5. I have managed to get a working version of outline folding via text properties. However, that approach has a big downside: the folding state cannot differ between an indirect buffer and its base buffer, because text properties are shared between them. I have seen packages relying on this feature of Org, and I do not see any obvious way to get a different folding state in an indirect buffer while using text properties for outline folding.

-----------------------------------------------------------------------
-----------------------------------------------------------------------

More details on the new implementation for tracking changes:

> Of course we can. It is only necessary to focus on changes that would
> break the structure of the element. This does not entail a full parsing.

I have limited parsing to matching the beginning and end of a drawer/block. The basic functions are org--get-element-region-at-point, org--get-next-element-region-at-point, and org--find-elements-in-region. They are simplified versions of the org-element-* parsers and do not require parsing everything from the beginning of the section.
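As a rough illustration of "matching only the beginning and end of a drawer", here is a minimal sketch in Python that treats the buffer as a list of lines. It is not the code from the patch. The helper names (section_bounds, drawer_regions, drawer_region_at, drawers_in_region) are hypothetical stand-ins for org--get-element-region-at-point and org--find-elements-in-region, the regexps only approximate Org's drawer syntax, and blocks are ignored. Unlike the functions described above, it does scan from the start of the section. However, it only tests lines against the delimiter regexps, which keeps it consistent with the rule that a drawer ends at the first :END: (the second example is the :BLAH: case discussed in this message).

```python
import re

# Assumed approximations of Org's drawer delimiters. They are modelled on,
# but not copied from, org-drawer-regexp and the ":END:" line regexp.
DRAWER_BEGIN = re.compile(r"^[ \t]*:([-\w]+):[ \t]*$")
DRAWER_END = re.compile(r"^[ \t]*:END:[ \t]*$", re.IGNORECASE)
HEADING = re.compile(r"^\*+ ")


def is_drawer_begin(line):
    # ":END:" also matches the generic ":NAME:" pattern, so exclude it.
    return bool(DRAWER_BEGIN.match(line)) and not DRAWER_END.match(line)


def section_bounds(lines, i):
    """Return [start, stop) line indices of the section containing line i."""
    start = i
    while start > 0 and not HEADING.match(lines[start]):
        start -= 1
    stop = i + 1
    while stop < len(lines) and not HEADING.match(lines[stop]):
        stop += 1
    return start, stop


def drawer_regions(lines, start, stop):
    """Yield (begin, end) line pairs of the drawers in lines[start:stop].

    Only delimiter lines are tested; drawer contents are never parsed.
    Blocks (#+begin_...) are ignored in this simplified model.
    """
    i = start
    while i < stop:
        if is_drawer_begin(lines[i]):
            # A drawer is closed by the first ":END:" before the next heading;
            # without one, the opening line is just ordinary text.
            for k in range(i + 1, stop):
                if HEADING.match(lines[k]):
                    break
                if DRAWER_END.match(lines[k]):
                    yield (i, k)
                    i = k  # drawers do not nest: resume after ":END:"
                    break
        i += 1


def drawer_region_at(lines, i):
    """Return (begin, end) of the drawer containing line i, or None."""
    start, stop = section_bounds(lines, i)
    for begin, end in drawer_regions(lines, start, stop):
        if begin <= i <= end:
            return (begin, end)
        if begin > i:
            break
    return None


def drawers_in_region(lines, beg, end):
    """Return the drawers intersecting lines beg..end (inclusive)."""
    start = section_bounds(lines, beg)[0]
    stop = section_bounds(lines, end)[1]
    return [(b, e) for b, e in drawer_regions(lines, start, stop)
            if b <= end and e >= beg]


# Two examples from this thread, as lists of lines.
example = [
    "* Heading",
    ":PROPERTIES:",
    ":CREATED: [2020-05-18 Mon]",
    ":END:",
    ":ID: test",
    ":END:",
]
blah = [":BLAH:", "", ":DRAWER:", "Donec at pede.", ":END:"]

print(drawer_region_at(example, 4))  # None: ":ID: test" is outside the drawer
print(drawer_region_at(blah, 2))     # (0, 4): ":DRAWER:" belongs to BLAH
```

The first call shows that, once an extra :END: line is added, ":ID: test" is no longer part of the property drawer. The second shows why typing a drawer opener far above a folded drawer changes which drawer the folded text belongs to.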
For now, I keep everything in org.el, but those simplified parsers probably belong in org-element.el.

> If we can stick with `after-change-functions' (or local equivalent),
> that's better. It is more predictable than `before-change-functions' and
> alike.

For now, I still use the before/after-change-functions combination. I see the following problems with using only after-change-functions:

1. They are not guaranteed to be called after every single change. From (elisp) Change Hooks:

   "... some complex primitives call ‘before-change-functions’ once before making changes, and then call ‘after-change-functions’ zero or more times"

   As a consequence, the region passed to after-change-functions can be quite large (covering all the individual changes, even if they are far apart). This region may contain both changed and unchanged drawers, so it has to be parsed to determine which drawers need to be re-folded.

> And, more importantly, they are not meant to be used together, i.e., you
> cannot assume that a single call to `before-change-functions' always
> happens before calling `after-change-functions'. This can be tricky if
> you want to use the former to pass information to the latter.

The fact that before-change-functions can be called multiple times before after-change-functions is trivially solved by using a buffer-local register of changes (see org--modified-elements). The register is populated by before-change-functions and cleared by after-change-functions.

> Well, `before-change-functions' and `after-change-functions' are not
> clean at all: you modify an unrelated part of the buffer, but still call
> those to check if a drawer needs to be unfolded somewhere.

2. As you pointed out, instead of the global before-change-functions, we can use the modification-hooks text property on sensitive parts of the drawers/blocks.
This would work, but I am concerned about one annoying special case:

-------------------------------------------------------------------------
:BLAH:
:DRAWER:
Donec at pede.
:END:
-------------------------------------------------------------------------

In this example, the user would not be able to unfold the folded DRAWER, because it would technically become part of a new giant BLAH drawer. This may be especially annoying if the text between :BLAH: and :DRAWER: is more than one screen long, since there is then no easy way to identify why unfolding does not work (with point at :DRAWER:).

Because of this scenario, limiting before-change-functions to folded drawers is not sufficient: any change in the text may need to trigger unfolding. In the patch, I always register possible modifications in the blocks/drawers intersecting the modified region, plus the drawer/block right next to the region.

-----------------------------------------------------------------------
-----------------------------------------------------------------------

More details on the nested 'invisible text property implementation.

The idea is to keep a stack of 'invisible property values, pushing onto it and popping from it as we add/remove the 'invisible text property. All the work is done in org-flag-region.

This was originally intended for folding outlines via text properties. Since using text properties for folding outlines is not a good idea, nested text properties have much less use. As I mentioned, they do preserve link fontification, but I am not sure it is worth the extra overhead in org-flag-region. I am keeping this here mostly in case someone has ideas on how it can be useful.

-----------------------------------------------------------------------
-----------------------------------------------------------------------

More details on replaced outline-* -> org-* function calls.
I have implemented org-* versions of the following functions:
- outline-hide-entry
- outline-hide-subtree
- outline-hide-sublevels
- outline-show-heading
- outline-show-branches

The org-* versions simply use org-flag-region in place of outline-flag-region.

I replaced the outline-* calls where org-* versions were already available:
- outline-show-entry
- outline-show-all
- outline-show-subtree

I documented the new (and the already available) functions in the manual and removed some defalias forms from org-compat.el where they are no longer needed.

-----------------------------------------------------------------------
-----------------------------------------------------------------------

Further work:

1. The after-change functions use org-hide-drawer-toggle/org-hide-block-toggle to fold/unfold after a modification. However, I just found that these call org-element-at-point, which slows down modifications in folded drawers/blocks. For example, realigning a long table inside a folded drawer takes more than 1 sec, while it is instant in an unfolded drawer.

2. org-toggle-custom-properties is terribly slow on large Org documents, just like folded drawers on master. It should be relatively easy to use text properties there instead of overlays.

3. Multiple calls to before/after-change-functions are still a problem. I am looking into the following ways to reduce their number:
   - reduce the number of elements registered as potentially modified:
     + do not add duplicates to org--modified-elements
     + do not add unfolded elements to org--modified-elements
     + register the after-change function as a post-command hook and remove it from the global after-change-functions. This way, it will be called only twice per command.
   - determine a common region containing org--modified-elements. If a change happens within that region, there is no need to parse the drawers/blocks there again.

P.S.

>> It was mostly an annoyance, because they returned different results on
>> the same element.
>> Specifically, they returned different :post-blank and
>> :end properties, which does not sound right.
>
> OK. If you have a reproducible recipe, I can look into it and see what
> can be done.

Here is a recipe that makes (org-element-at-point) and (org-element-parse-buffer 'element) return different results:

-------------------------------------------------------------------------
:PROPERTIES:
:CREATED: [2020-05-23 Sat 02:32]
:END:
-------------------------------------------------------------------------

Best,
Ihor

Nicolas Goaziou writes:

> Hello,
>
> Ihor Radchenko writes:
>
>>> As you noticed, using Org Element is a no-go, unfortunately. Parsing an
>>> element is a O(N) operation by the number of elements before it in
>>> a section. In particular, it is not bounded, and not mitigated by
>>> a cache. For large documents, it is going to be unbearably slow, too.
>>
>> Ouch. I thought it is faster.
>> What do you mean by "not mitigated by a cache"?
>
> Parsing starts from the closest headline, every time. So, if Org parses
> the Nth element in the entry two times, it really parses 2N elements.
>
> With a cache, assuming the buffer wasn't modified, Org would parse
> N elements only. With a smarter cache, with fine grained cache
> invalidation, it could also reduce the number of subsequent parsed
> elements.
>
>> The reason I would like to utilise org-element parser to make tracking
>> modifications more robust. Using details of the syntax would make the
>> code fragile if any modifications are made to syntax in future.
>
> I don't think the code would be more fragile. Also, the syntax we're
> talking about is not going to be modified anytime soon. Moreover, if
> folding breaks, it is usually visible, so the bug will not be unnoticed.
>
> This code is going to be as low-level as it can be.
>
>> Debugging bugs in modification functions is not easy, according to my
>> experience.
>
> No, it's not.
>
> But this is not really related to whether you use Element or not.
>
>> One possible way to avoid performance issues during modification is
>> running parser in advance. For example, folding an element may
>> as well add information about the element to its text properties.
>> This will not degrade performance of folding since we are already
>> parsing the element during folding (at least, in
>> org-hide-drawer-toggle).
>
> We can use this information stored at fold time. But I'm not even sure
> we need it.
>
>> The problem with parsing an element during folding is that we cannot
>> easily detect changes like below without re-parsing.
>
> Of course we can. It is only necessary to focus on changes that would
> break the structure of the element. This does not entail a full parsing.
>
>> :PROPERTIES:
>> :CREATED: [2020-05-18 Mon]
>> :END: <- added line
>> :ID: test
>> :END:
>>
>> or even
>>
>> :PROPERTIES:
>> :CREATED: [2020-05-18 Mon]
>> :ID: test
>> :END: <- delete this line
>>
>> :DRAWER:
>> test
>> :END:
>
> Please have a look at the "sensitive parts" I wrote about. This takes
> care of this kind of breakage.
>
>> The re-parsing can be done via regexp, as you suggested, but I don't
>> like this idea, because it will end up re-implementing
>> org-element-*-parser.
>
> You may have misunderstood my suggestion. See below.
>
>> Would it be acceptable to run org-element-*-parser
>> in after-change-functions?
>
> I'd rather not do that. This is unnecessary consing, and matching, etc.
>
>> If I understand correctly, it is not as easy.
>> Consider the following example:
>>
>> :PROPERTIES:
>> :CREATED: [2020-05-18 Mon]
>>
>> :ID: example
>> :END:
>>
>> <... a lot of text, maybe containing other drawers ...>
>>
>> Nullam rutrum.
>> Pellentesque dapibus suscipit ligula.
>>
>> Proin quam nisl, tincidunt et, mattis eget, convallis nec, purus.
>>
>> If the region gets deleted, the modification hooks from chars inside
>> drawer will be called as (hook-function
>> ).
>> So, there is still a need to find the drawer somehow to
>> mark it as about to be modified (modification hooks are ran before
>> actual modification).
>
> If we can stick with `after-change-functions' (or local equivalent),
> that's better. It is more predictable than `before-change-functions' and
> alike.
>
> If it is a deletion, here is the kind of checks we could do, depending
> on when they are performed.
>
> Before actual changes :
>
> 1. The deletion is happening within a folded drawer (unnecessary step
>    in local functions).
> 2. The change deleted the sensitive line ":END:".
> 3. Conclusion : unfold.
>
> Or, after actual changes :
>
> 1. The deletion involves a drawer.
> 2. Text properties indicate that the beginning of the propertized part
>    of the buffer start with org-drawer-regexp, but doesn't end with
>    `org-property-end-re'. A "sensitive part" disappeared!
> 3. Conclusion : unfold
>
> This is far away from parsing. IMO, a few checks cover all cases. Let me
> know if you have questions about it.
>
> Also, note that the kind of change you describe will happen perhaps
> 0.01% of the time. Most change are about one character, or a single
> line, long.
>
>> The only difference between using modification hooks and
>> before-change-functions is that modification hooks will trigger less
>> frequently.
>
> Exactly. Much less frequently. But extra care is required, as you noted
> already.
>
>> Considering the performance of org-element-at-point, it is
>> probably worth doing. Initially, I wanted to avoid it because setting a
>> single before-change-functions hook sounded cleaner than setting
>> modification-hooks, insert-behind-hooks, and insert-in-front-hooks.
>
> Well, `before-change-functions' and `after-change-functions' are not
> clean at all: you modify an unrelated part of the buffer, but still call
> those to check if a drawer needs to be unfolded somewhere.
>
> And, more importantly, they are not meant to be used together, i.e., you
> cannot assume that a single call to `before-change-functions' always
> happens before calling `after-change-functions'. This can be tricky if
> you want to use the former to pass information to the latter.
>
> But I understand that they are easier to use than their local
> counterparts. If you stick with (before|after)-change-functions, the
> function being called needs to drop the ball very quickly if the
> modification is not about folding changes. Also, I very much suggest to
> stick to only `after-change-functions', if feasible (I think it is), per
> above.
>
>> Moreover, these text properties would be copied by default if one uses
>> buffer-substring. Then, the hooks will also trigger later in the yanked
>> text, which may cause all kinds of bugs.
>
> Indeed, that would be something to handle specifically. I.e.,
> destructive modifications (i.e., those that unfold) could clear such
> properties.
>
> It is more work. I don't know if it is worth the trouble if we can get
> out quickly of `after-change-functions' for unrelated changes.
>
>> It was mostly an annoyance, because they returned different results on
>> the same element. Specifically, they returned different :post-blank and
>> :end properties, which does not sound right.
>
> OK. If you have a reproducible recipe, I can look into it and see what
> can be done.
>
> Regards,
>
> -- 
> Nicolas Goaziou

-- 
Ihor Radchenko,
PhD,
Center for Advancing Materials Performance from the Nanoscale (CAMP-nano)
State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an, China
Email: yantar92@gmail.com, ihor_radchenko@alumni.sutd.edu.sg