Org-mode mailing list
 help / color / Atom feed
From: Paul Sexton <psexton@xnet.co.nz>
To: emacs-orgmode@gnu.org
Subject: Re: Context-sensitive word count in org mode (elisp)
Date: Sun, 20 Feb 2011 21:49:16 +0000 (UTC)
Message-ID: <loom.20110220T224023-532@post.gmane.org> (raw)
In-Reply-To: <87zkptbee7.fsf@gnu.org>

Bastien <bastien.guerry <at> wikimedia.fr> writes:
> #+begin_src emacs-lisp
>   (when (looking-at org-bracket-link-analytic-regexp)
>     (match-string-no-properties 5))
> #+end_src emacs-lisp

Thanks. Here is version 3 if the function, which is now able to count 
words in link descriptions.

The code to advance to the next word has been moved to the end of the 
loop, which improves accuracy.

Paul

----------------------------------------------------------------------

(defun org-word-count (beg end
                           &optional count-latex-macro-args?
                           count-footnotes?)
  "Report the number of words in the Org mode buffer or selected region.
Ignores:
- comments
- tables
- source code blocks (#+BEGIN_SRC ... #+END_SRC, and inline blocks)
- hyperlinks (but does count words in hyperlink descriptions)
- tags, priorities, and TODO keywords in headers
- sections tagged as 'not for export'.

The text of footnote definitions is ignored, unless the optional argument
COUNT-FOOTNOTES? is non-nil.

If the optional argument COUNT-LATEX-MACRO-ARGS? is non-nil, the word count
includes LaTeX macro arguments (the material between {curly braces}).
Otherwise, and by default, every LaTeX macro counts as 1 word regardless
of its arguments."
  (interactive "r")
  (unless mark-active
    (setf beg (point-min)
	  end (point-max)))
  (let ((wc 0)
	(latex-macro-regexp "\\\\[A-Za-z]+\\(\\[[^]]*\\]\\|\\){\\([^}]*\\)}"))
    (save-excursion
      (goto-char beg)
      (while (< (point) end)
        (cond
         ;; Ignore comments.
         ((or (org-in-commented-line) (org-at-table-p))
          nil)
         ;; Ignore hyperlinks. But if link has a description, count
         ;; the words within the description.
         ((looking-at org-bracket-link-analytic-regexp)
          (when (match-string-no-properties 5)
            (let ((desc (match-string-no-properties 5)))
              (save-match-data 
                (incf wc (length (remove "" (org-split-string
                                             desc "\\W")))))))
          (goto-char (match-end 0)))
         ((looking-at org-any-link-re)
          (goto-char (match-end 0)))
         ;; Ignore source code blocks.
         ((org-in-regexps-block-p "^#\\+BEGIN_SRC\\W" "^#\\+END_SRC\\W")
          nil)
         ;; Ignore inline source blocks, counting them as 1 word.
         ((save-excursion
            (backward-char)
            (looking-at org-babel-inline-src-block-regexp))
          (goto-char (match-end 0))
          (setf wc (+ 2 wc)))
         ;; Count latex macros as 1 word, ignoring their arguments.
         ((save-excursion
            (backward-char)
            (looking-at latex-macro-regexp))
          (goto-char (if count-latex-macro-args?
                         (match-beginning 2)
                       (match-end 0)))
          (setf wc (+ 2 wc)))
         ;; Ignore footnotes.
         ((and (not count-footnotes?)
               (or (org-footnote-at-definition-p)
                   (org-footnote-at-reference-p)))
          nil)
         (t
          (let ((contexts (org-context)))
            (cond
             ;; Ignore tags and TODO keywords, etc.
             ((or (assoc :todo-keyword contexts)
                  (assoc :priority contexts)
                  (assoc :keyword contexts)
                  (assoc :checkbox contexts))
              nil)
             ;; Ignore sections marked with tags that are
             ;; excluded from export.
             ((assoc :tags contexts)
              (if (intersection (org-get-tags-at) org-export-exclude-tags
                                :test 'equal)
                  (org-forward-same-level 1)
                nil))
             (t
              (incf wc))))))
        (re-search-forward "\\w+\\W*")))
    (message (format "%d words in %s." wc
                     (if mark-active "region" "buffer")))))

  reply index

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-02-16  3:51 Paul Sexton
2011-02-16  9:12 ` Christian Moe
2011-02-16  9:47   ` Dan Davison
2011-02-16  9:45 ` Christian Moe
2011-02-16 20:34   ` Paul Sexton
2011-02-17 10:02     ` Christian Moe
2011-02-17 18:57       ` Eric Schulte
2011-02-16 10:14 ` Bastien
2011-02-16 18:15   ` Samuel Wales
2011-02-16 13:03 ` Joost Kremers
2011-02-16 23:28 ` Paul Sexton
2011-02-17 16:50   ` Samuel Wales
2011-02-17 18:55     ` Paul Sexton
2011-03-27 19:40       ` [Orgmode] " Samuel Wales
2011-02-18 14:34   ` Bastien
2011-02-20 21:49     ` Paul Sexton [this message]
2011-02-21 23:30       ` Samuel Wales
     [not found]     ` <4D601314.8000701@xnet.co.nz>
2011-02-22 11:28       ` Bastien
2011-02-16 16:22 Benjamin Beckwith
2011-02-16 23:31 ` Paul Sexton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://orgmode.org

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=loom.20110220T224023-532@post.gmane.org \
    --to=psexton@xnet.co.nz \
    --cc=emacs-orgmode@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Org-mode mailing list

Archives are clonable:
	git clone --mirror https://orgmode.org/list/0 list/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 list list/ https://orgmode.org/list \
		emacs-orgmode@gnu.org
	public-inbox-index list

Example config snippet for mirrors

Newsgroups are available over NNTP:
	nntp://news.yhetil.org/yhetil.emacs.orgmode
	nntp://news.gmane.io/gmane.emacs.orgmode


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git