From: "Mark E. Shoulson"
Subject: Re: "Smart" quotes
Date: Mon, 28 May 2012 21:30:20 -0400
Message-ID: <4FC426AC.2030109@kli.org>
In-Reply-To: <87r4u75tg9.fsf@gmail.com>
To: emacs-orgmode@gnu.org

On 05/26/2012 02:48 AM, Nicolas Goaziou wrote:
> Hello,
>
> "Mark E. Shoulson" writes:
>
>>> The regexp may be able to tell level 1 from level 2 quotes.
>> Do you mean that the author would use the same characters for both
>> first and second level quotes, and the regexp would be smart enough to
>> distinguish which level each was at? I don't think that's possible,
>> and you probably don't either.
> Actually, I do. Since you can tell an opening quote from a closing one
> by the position of the white space (or parenthesis, beginning/end of
> line) near it, I think you can deduce the quote level. I may be wrong,
> though.

Maybe, if it's all on one line.
But if the quote is several lines long, can you sensibly count the levels? I guess it doesn't actually matter, but it starts to get weird if you find yourself looking arbitrarily far back, and then you start building in exceptions for crossing paragraph boundaries... And then there's the fact that multi-paragraph quotes usually have an open-quote for each paragraph but only one close-quote at the end...

Actually keeping count of what level you're at, accurately, is a classic example of a non-regular language; you need a push-down automaton to keep count, and regular expressions don't cut it. Then again, Emacs regexps are more powerful than simple regular expressions, and we only would want to keep track of even vs odd level anyway. I'm rambling.

In sum, I'm going to start off /not/ trying to solve that problem, and assume the writer is going to use alternating " and ' as typography requires and not try to second-guess what level we're at. As that progresses, maybe I'll come to understand better what can and can't (and should and shouldn't) be deduced by the regexps.

>> "this is a 'quote', and that's all you need to know."
>>
>> becoming, for instance
>>
>> «this is a ‹quote›, and that’s all you need to know.»
> "this is a "quote", and that's all you need to know" is as parsable to
> me.
>
> As a side note, at least in French, many typographers would recommend
> "this is a /quote/, and that's all you need to know" here. Oh, and
> I know that was just an example.

I see; because I can tell that the second " must be an open-quote and not closing the first, due to its position relative to the spaces. It does seem possible, but I think I'm going to try not solving that problem first.

(And French typography raises other problems, since French puts lots of space around the quote-marks, to the extent that French typists typing plain-text will often put a space on both sides of a quote-mark, making it hard to see whether it opens or closes... another issue, not necessarily solvable, to watch for.)

~mark
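
For illustration only, here is a minimal sketch of the simpler approach described above: decide open versus close purely from the neighbouring character, trust the writer to alternate " and ', and make no attempt to count nesting levels. It is written in Python rather than Emacs Lisp, it is not code from Org-mode, and the names `STYLES`, `OPENERS` and `smarten` are hypothetical.

```python
# Hypothetical style table: (open double, close double, open single, close single).
STYLES = {
    "english": ("\u201c", "\u201d", "\u2018", "\u2019"),
    "guillemets": ("\u00ab", "\u00bb", "\u2039", "\u203a"),
}
APOSTROPHE = "\u2019"

# Characters after which a quote mark is assumed to be opening (an assumption).
OPENERS = "([{\"'"


def smarten(text, style="english"):
    """Replace straight quotes using only the characters on either side.

    No nesting level is tracked: " always maps to the double pair and '
    to the single pair, so the writer is assumed to alternate them.
    Known gaps: a leading apostrophe ('tis) is read as an opening quote,
    and French-style spacing on both sides of a mark defeats the heuristic.
    """
    open_d, close_d, open_s, close_s = STYLES[style]
    out = []
    for i, ch in enumerate(text):
        if ch not in "\"'":
            out.append(ch)
            continue
        # Treat the start and end of the text like whitespace.
        prev = text[i - 1] if i > 0 else " "
        nxt = text[i + 1] if i + 1 < len(text) else " "
        if ch == "'" and prev.isalnum() and nxt.isalnum():
            # Inside a word: an apostrophe, not a quotation mark.
            out.append(APOSTROPHE)
        elif (prev.isspace() or prev in OPENERS) and not nxt.isspace():
            out.append(open_d if ch == '"' else open_s)
        else:
            out.append(close_d if ch == '"' else close_s)
    return "".join(out)


print(smarten('"this is a \'quote\', and that\'s all you need to know."',
              "guillemets"))
```

Run on the example from the thread, this prints «this is a ‹quote›, and that’s all you need to know.» The same position test also classifies each mark in Nicolas's all-double-quote variant as opening or closing, but choosing a different glyph per level from that would need the depth counting that this sketch deliberately leaves out.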