CHARACTER-ENCODING

how bytes are parsed into characters
Major Section:  IO

When the Common Lisp reader comes across bytes in a file or at the terminal, they are parsed into characters. The simplest case is when each byte that is read is a standard character (see standard-char-p). It is actually quite common that each byte that is read corresponds to a single character. The parsing of bytes into characters is based on a character encoding, that is, a mapping that associates one or more bytes with each legal character.

In order to help guarantee the portability of files (including books), ACL2 installs a common character encoding for reading files, often known as iso-8859-1 or latin-1. For some host Lisps this character encoding is also used for reading from the terminal; but, sadly, this may not hold for all host Lisps, and may not even be possible for some of them.

The use of the above encoding could in principle cause problems if one's editor produces files using an encoding other than iso-8859-1, at least if one uses non-standard characters. In particular, the default Emacs buffer encoding may be utf-8. If your file has non-standard characters, then in Emacs you can evaluate the form

(setq save-buffer-coding-system 'iso-8859-1)
before saving the buffer into a file. This will happen automatically for users who load distributed file emacs/emacs-acl2.el into their Emacs sessions.

For an example of character encodings in action, see the distributed book books/misc/character-encoding-test.lisp.