Hi all,

I noticed over the weekend that the `<title>` tag in an ox-html document is
populated with HTML when inline formatting like bold or italics is used. I'm
running org 9.4.4 but even on HEAD
<https://code.orgmode.org/bzg/org-mode/src/9ea7ff5e2f8f9f280f8022cade62c1a3bba1478c/lisp/ox-html.el#L2092-L2111>
this behavior seems to be the same where `(org-export-data title …)` is used
to populate the title rather than something that would render the title
textually.

My understanding from MDN
<https://developer.mozilla.org/en-US/docs/Web/HTML/Element/title> is that
`title` tags should never contain markup.

Is this something we'd take a patch for?

My thought is that the title should be converted using the text backend by
default, perhaps with a configuration variable that would let you specify a
different one (for instance markdown) but I wouldn't go that far at first. Of
course at minimum since org is textual it could just use the title without
exporting at all but I think that doesn't go far enough. The other
possibility would be to somehow strip all special characters from the title
but I think that goes much too far.

Thanks in advance!

--
In Christ,

Timmy V.

https://blog.twonegatives.com
http://five.sentenc.es
Tim Visher writes:

> Hi all,
>
> I noticed over the weekend that the `<title>` tag in an ox-html document is
> populated with HTML when inline formatting like bold or italics is used.
[...]
> Is this something we'd take a patch for?

Thanks for reporting.  Is this addressed by the in-progress series at
<https://orgmode.org/list/87o8hwpz34.fsf@gmail.com/>?
On Mon, Jan 11, 2021 at 8:19 PM Kyle Meyer <kyle@kyleam.com> wrote:

> Thanks for reporting. Is this addressed by the in-progress series at
> <https://orgmode.org/list/87o8hwpz34.fsf@gmail.com/>?

IIUC yes. I believe the following section of the diff should address it.

```
+    (let* ((title (org-html-plain-text
+                   (org-element-interpret-data (plist-get info :title)) info))
+           ;; Set title to an invisible character instead of leaving it
+           ;; empty, which is invalid.
+           (title (if (org-string-nw-p title) title "‎"))
```

If I'm reading the code correctly, `org-html-plain-text` is a specialized
form of converting org data into a plain text string with no markup. If I
have that correct then I believe you're right.

Is that your read as well?
Tim Visher writes:

> On Mon, Jan 11, 2021 at 8:19 PM Kyle Meyer <kyle@kyleam.com> wrote:
>> Thanks for reporting. Is this addressed by the in-progress series at
>> <https://orgmode.org/list/87o8hwpz34.fsf@gmail.com/>?
>
> IIUC yes. I believe the following section of the diff should address it.
[...]
> If I'm reading the code correctly, `org-html-plain-text` is a specialized
> form of converting org data into a plain text string with no markup. If I
> have that correct then I believe you're right.
>
> Is that your read as well?

Yep.  And as a light test:

    #+title: a *b* c

exports

    <title>a *b* c</title>

rather than

    <title>a <b>b</b> c</title>
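
The same check can also be scripted rather than done by hand. What follows is
only a minimal sketch, assuming `ox-html' can be loaded in the running Emacs;
the helper name `my/exported-html-title' is hypothetical and not part of Org.
It exports a string as a complete HTML document and returns whatever ended up
inside the `<title>` element.

```elisp
(require 'ox-html)

;; Hypothetical helper, not part of Org: export ORG-TEXT as a complete
;; HTML document and return the text found inside its <title> element.
(defun my/exported-html-title (org-text)
  "Return the contents of <title> after exporting ORG-TEXT to HTML."
  (let ((html (org-export-string-as org-text 'html)))
    ;; The title is emitted on a single line, so a non-greedy match
    ;; between the opening and closing tags is enough for this check.
    (when (string-match "<title>\\(.*?\\)</title>" html)
      (match-string 1 html))))

;; Inspect the result for a title that contains inline markup.
(my/exported-html-title "#+title: a *b* c\n\nSome body text.\n")
```

Going by the test above, a build that includes the series should give back the
title with its asterisks and no `<b>` tags, while an older build should still
show the HTML markup.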
On Tue, Jan 12, 2021 at 10:43 PM Kyle Meyer <kyle@kyleam.com> wrote:

> Yep. And as a light test:
>
>     #+title: a *b* c
>
> exports
>
>     <title>a *b* c</title>
>
> rather than
>
>     <title>a <b>b</b> c</title>

Nice! I don't know enough about `org-export` but FWIW the use case I have
is not to have an explicit `title` property but instead just the default
title of the heading contents. I assume that's all handled transparently by
the `(plist-get …` section.

Do you have any idea the timeline for getting that patch merged?
Tim Visher writes:
> Nice! I don't know enough about `org-export` but FWIW the use case I have
> is not to have an explicit `title` property but instead just the default
> title of the heading contents. I assume that's all handled transparently by
> the `(plist-get …` section.
>
> Do you have any idea the timeline for getting that patch merged?
It's been applied to master (f4b9f9808). Please report back if you
still encounter the problem in your use case.
On Wed, Jan 20, 2021 at 11:10 PM Kyle Meyer <kyle@kyleam.com> wrote:

> It's been applied to master (f4b9f9808). Please report back if you
> still encounter the problem in your use case.

I (finally) got around to testing this out. Initially I thought it had been
released in 9.4.5 but AFAICT that's not the case. Does org not get released
from `master`?

Anyway, we're a step further now in that the title appears to be set using
no markup, so that's 👍.

Unfortunately, the title now is essentially the exact text of the org
heading, which is awkward in terms of readability for a general audience
(and probably for SEO etc.). I know I said in my original message that I
think stripping all the markup characters would be going too far but now I
think I've come full circle and rendering the title as nothing but the
plain text without any markup information feels like the right solution
given what the title is supposed to convey.

So, would we be willing to accept a patch to that effect? :)
On Tue, Mar 30, 2021 at 6:58 PM Tim Visher <tim.visher@gmail.com> wrote:

> Unfortunately, the title now is essentially the exact text of the org
> heading, which is awkward in terms of readability for a general audience
> (and probably for SEO etc.).
[...]
> So, would we be willing to accept a patch to that effect? :)

Ping again on this. Any interest in a patch that would transform the heading
into only plaintext without any markup characters for use in the <title>
element?
Tim Visher writes:

> On Wed, Jan 20, 2021 at 11:10 PM Kyle Meyer <kyle@kyleam.com> wrote:
>>
>> It's been applied to master (f4b9f9808). Please report back if you
>> still encounter the problem in your use case.
>
> I (finally) got around to testing this out. Initially I thought it had been
> released in 9.4.5 but AFAICT that's not the case. Does org not get released
> from `master`?

For version X.Y.Z, Z ticks happen from maint.

> Unfortunately, the title now is essentially the exact text of the org
> heading, which is awkward in terms of readability for a general audience
> (and probably for SEO etc.). I know I said in my original message that I
> think stripping all the markup characters would be going too far but now I
> think I've come full circle and rendering the title as nothing but the
> plain text without any markup information feels like the right solution
> given what the title is supposed to convey.
>
> So, would we be willing to accept a patch to that effect? :)

I don't have an informed opinion about the above, but providing a patch
might prompt those that do (including TEC, the author of the above
commit, as well as Jens, who provided reviews) to give their input.
On 2021-04-19, Kyle Meyer wrote:

> Tim Visher writes:
>
>> Unfortunately, the title now is essentially the exact text of the org
>> heading, which is awkward in terms of readability for a general audience
>> (and probably for SEO etc.).
[...]
>> So, would we be willing to accept a patch to that effect? :)
>
> I don't have an informed opinion about the above, but providing a patch
> might prompt those that do (including TEC, the author of the above
> commit, as well as Jens, who provided reviews) to give their input.

The following is not a strong opinion: The author writes “what the
title is supposed to convey”.  If there is *emphasis*, why not
export that as ASCII markup to HTML?

With an additional option, authors could choose.

Best wishes
Jens
Thanks so much for getting back to me, Jens.

On Tue, Apr 20, 2021 at 12:59 AM Jens Lechtenboerger
<jens.lechtenboerger@wi.uni-muenster.de> wrote:

> The following is not a strong opinion: The author writes “what the
> title is supposed to convey”. If there is *emphasis*, why not
> export that as ASCII markup to HTML?
>
> With an additional option, authors could choose.

I guess I don't have a super strong opinion here either. Ironically, coming
from someone who spends a significant portion of their day reading raw
org/markdown documents with no problem, something about even the ASCII style
markup in the title looks wrong to me. I don't do SEO and nothing in MDN's
article <https://developer.mozilla.org/en-US/docs/Web/HTML/Element/title>
indicates that non-word characters have a negative effect anyway but it just
looks strange to my eyes.

I guess regardless it sounds like if I were to go to the trouble of making a
patch for this it would be good to make sure that it was behind an option
and probably defaulting to the current HEAD behavior of including the ASCII
markup with an option to strip the non-word characters from it.

--
In Christ,

Timmy V.

https://blog.twonegatives.com
http://five.sentenc.es
On 2021-04-20, Tim Visher wrote:

> I guess regardless it sounds like if I were to go to the trouble of making
> a patch for this it would be good to make sure that it was behind an option
> and probably defaulting to the current HEAD behavior of including the ASCII
> markup with an option to strip the non-word characters from it.

That would be great.

Best wishes
Jens
Jens Lechtenboerger <lechten@wi.uni-muenster.de> writes:

> On 2021-04-20, Tim Visher wrote:
>
>> I guess regardless it sounds like if I were to go to the trouble of making
>> a patch for this it would be good to make sure that it was behind an option
>> and probably defaulting to the current HEAD behavior of including the ASCII
>> markup with an option to strip the non-word characters from it.
>
> That would be great.

It is something that could also benefit the LaTeX export. Having special
characters in the pdftitle can make the export fail, but those
characters (like @@latex:\something@@) can make the latex-compilation
fail.

Best wishes,
Arne
--
Being apolitical means being political without noticing it
On Wed, Apr 21, 2021 at 2:39 AM Dr. Arne Babenhauserheide <arne_bab@web.de>
wrote:

> It is something that could also benefit the LaTeX export. Having special
> characters in the pdftitle can make the export fail, but those
> characters (like @@latex:\something@@) can make the latex-compilation
> fail.

Awesome. Do you know whether there's an official way to share this sort of
behavior between ox backends or is it just creating a function and calling
it from both places or something?

--
In Christ,

Timmy V.

https://blog.twonegatives.com
http://five.sentenc.es
Hello,
Tim Visher <tim.visher@gmail.com> writes:
> Awesome. Do you know whether there's an official way to share this sort of
> behavior between ox backends or is it just creating a function and calling
> it from both places or something?
Do you want to remove all markup from some parsed text?
You could define temporary export back-end with
`org-export-create-backend', and apply it with
`org-export-data-with-backend'. See for example, how
`org-latex-headline' formats headings (`text' binding in the function).
If that's the case, you need to know exactly what you want. It is pretty
obvious for bold markup, but what would happen to, e.g., a_b or \alpha
or <<target>>?
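
To see which objects such a title actually contains before deciding what to
do with each, one can ask the parser. This is only a rough sketch, assuming
the title is parsed with the same `keyword' restriction the exporter applies
to parsed keywords; `my/title-object-types' is a hypothetical name, and it is
best evaluated from an Org buffer.

```elisp
(require 'org-element)

;; Hypothetical helper, not part of Org: list the object types the
;; parser finds in TITLE, including objects nested inside other objects.
(defun my/title-object-types (title)
  "Return the types of all Org objects found in TITLE, a string."
  (org-element-map
      (org-element-parse-secondary-string
       title (org-element-restriction 'keyword))
      org-element-all-objects
    #'org-element-type))

;; The cases mentioned above, plus plain bold for comparison.
(my/title-object-types "a_b \\alpha <<target>> *bold*")
```

The exact list depends on settings such as `org-use-sub-superscripts', but
each type it reports is one for which a stripping rule has to be chosen.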
HTH,
--
Nicolas Goaziou
On Thu, Apr 22, 2021 at 9:52 AM Nicolas Goaziou <mail@nicolasgoaziou.fr>
wrote:

> Do you want to remove all markup from some parsed text?
>
> You could define temporary export back-end with
> `org-export-create-backend', and apply it with
> `org-export-data-with-backend'. See for example, how
> `org-latex-headline' formats headings (`text' binding in the function).
>
> If that's the case, you need to know exactly what you want. It is pretty
> obvious for bold markup, but what would happen to, e.g., a_b or \alpha
> or <<target>>?

Thanks for the tip. What I'm thinking more is somehow getting the heading
pre-output generation, stripping any characters that `org` would consider
special (I'm hoping there's already a function that can at least mark
'markup' text in a given org string), and _then_ passing it to whatever ox
function is responsible for using the title. That way it's as generic as it
can possibly be.

I confess though that I don't follow exactly what you're talking about
defining a temporary export back-end. Why would that be necessary or
beneficial to the end of teaching org how to use only the 'plain text' of a
heading for the title in N ox backends?

--
In Christ,

Timmy V.

https://blog.twonegatives.com
http://five.sentenc.es
Tim Visher <tim.visher@gmail.com> writes:

> Thanks for the tip. What I'm thinking more is somehow getting the heading
> pre-output generation, stripping any characters that `org` would consider
> special (I'm hoping there's already a function that can at least mark
> 'markup' text in a given org string), and _then_ passing it to whatever ox
> function is responsible for using the title. That way it's as generic as it
> can possibly be.

What format has "heading pre-output generation"? Is it a string or is it
parsed data? The first part of your paragraph sounds like you want to
rewrite an Org parser.

How do you pass it to ox function responsible for using the title? I.e.,
who/what is responsible for making the change to the title? Is it the
user?

You may need to clarify your specifications.

> I confess though that I don't follow exactly what you're talking about
> defining a temporary export back-end.

In `org-html--build-meta-info' from "ox-html.el", replace the following

    (org-html-plain-text
     (org-element-interpret-data (plist-get info :title)) info)

with

    (org-export-data-with-backend (plist-get info :title)
                                  (org-export-create-backend
                                   :transcoders
                                   '((bold . (lambda (_ c _) c))
                                     (italic . (lambda (_ c _) c))))
                                  info)

Now re-evaluate the function `org-html--build-meta-info' and try
exporting a document to HTML with a title containing bold and italic
syntax, even nested, e.g.

    #+title: /Some *bold* text/

> Why would that be necessary or beneficial to the end of teaching org
> how to use only the 'plain text' of a heading for the title in N ox
> backends?

Adding the function `org-export-strip-syntax' below to "ox.el"

    (defun org-export-strip-syntax (data info)
      (org-export-data-with-backend data
                                    (org-export-create-backend
                                     :transcoders
                                     '((bold . (lambda (_ c _) c))
                                       (italic . (lambda (_ c _) c))))
                                    info))

you can now call it from any export back-end whenever it needs to
remove syntax from a piece of code.

You can also drop the info argument and add it to "org-element.el". But
it depends on what you want to obtain. Also, some syntax is not obvious
to strip, as I suggested in my previous message.

Regards,
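
For illustration only, here is one way the less obvious cases could be
decided in a wider variant of that function. Every choice here is an
assumption rather than something settled in this thread: emphasis, subscripts
and superscripts keep their contents, code and verbatim keep their literal
value, entities use their UTF-8 form, links fall back to their raw target
when they have no description, and any object type without a transcoder (a
target, for example) is dropped. The name `my/org-strip-markup' is
hypothetical.

```elisp
(require 'ox)

;; Hypothetical variant of the function above, not part of Org.
(defun my/org-strip-markup (data info)
  "Export DATA, a secondary string, keeping its text but not its markup.
INFO is the export communication channel."
  (let ((keep-contents (lambda (_object contents _info) contents))
        (keep-value (lambda (object _contents _info)
                      (org-element-property :value object))))
    (org-export-data-with-backend
     data
     (org-export-create-backend
      :transcoders
      `((bold . ,keep-contents)
        (italic . ,keep-contents)
        (underline . ,keep-contents)
        (strike-through . ,keep-contents)
        ;; Assumption: a_b becomes "ab"; other policies are possible.
        (subscript . ,keep-contents)
        (superscript . ,keep-contents)
        (code . ,keep-value)
        (verbatim . ,keep-value)
        ;; Assumption: \alpha is rendered with its UTF-8 character.
        (entity . ,(lambda (object _contents _info)
                     (org-element-property :utf-8 object)))
        ;; Use the description when there is one, else the raw target.
        (link . ,(lambda (object contents _info)
                   (or contents
                       (org-element-property :raw-link object))))))
     info)))
```

In `org-html--build-meta-info' it would be called the same way as above,
i.e. as (my/org-strip-markup (plist-get info :title) info).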
On Thu, Apr 22, 2021 at 11:36 AM Nicolas Goaziou <mail@nicolasgoaziou.fr>
wrote:

> In `org-html--build-meta-info' from "ox-html.el", replace the following
[...]
> You can also drop the info argument and add it to "org-element.el". But
> it depends on what you want to obtain. Also, some syntax is not obvious
> to strip, as I suggested in my previous message.

Awesome! This is a ton of great info. If I decide to bite this off I'll be
sure to reference this. :)