<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html><head><meta content="text/html; charset=utf-8" http-equiv="content-type"><title>Encoding</title>
<link rel="stylesheet" href="styles.css" type="text/css">
</head>
<body>
<h1>Encoding</h1>
<p>
Text
can be encoded in several ways. Many older text files use an
encoding commonly called ANSI (in practice, one of several Windows code pages),
which can represent only a limited set of characters but is often
sufficient for text written in a single language. Unicode encodings, by
contrast, can represent a far larger set of characters, allowing a single
file to contain many languages at once, sometimes at the cost of a larger
file. Notepad++ automatically tries to detect the encoding when it opens a
file, but lets you change it while editing. To change only how the text is
displayed (without modifying the underlying bytes), select one of the&nbsp;<span class="menu_item">Format-&gt;Encode in</span>
options. To convert the text to a different encoding, select one of
the&nbsp;<span class="menu_item">Format-&gt;Convert to</span> options.<p>
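The difference between the two menus can be illustrated with a short Python analogy (this is not Notepad++ code). It assumes Windows-1252 (Python codec name <code>cp1252</code>) as the ANSI code page; other systems use other code pages. Reinterpreting keeps the bytes and changes the characters shown, while converting keeps the characters and rewrites the bytes.

```python
# Python analogy (not Notepad++ code) for "Encode in" versus "Convert to".
# Assumption: cp1252 (Western European Windows) stands in for the ANSI code page.

text = "caf\u00e9"  # "café"

# The file as saved in UTF-8: "é" becomes the two bytes c3 a9.
utf8_bytes = text.encode("utf-8")

# "Encode in ANSI": keep the bytes, change only how they are interpreted.
reinterpreted = utf8_bytes.decode("cp1252")

# "Convert to ANSI": decode with the current encoding, re-encode in the target.
converted_bytes = utf8_bytes.decode("utf-8").encode("cp1252")

print(utf8_bytes.hex(" "))       # 63 61 66 c3 a9
print(reinterpreted)             # cafÃ©
print(converted_bytes.hex(" "))  # 63 61 66 e9
```

<p>The first line shows that "é" occupies two bytes in UTF-8. Reading those same bytes as ANSI produces the familiar garbled "Ã©", while converting replaces them with the single ANSI byte e9.
<p>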
It
can happen that a file saved in one encoding is detected as a different
encoding when it is reopened in Notepad++. This is a technical limitation:
the same sequence of bytes can be valid in more than one encoding (a file
containing only plain ASCII characters, for example, is byte-for-byte
identical in ANSI and in UTF-8 without BOM), so the contents alone do not
always reveal which encoding was used. The problem is most noticeable when
the file is saved without a BOM (Byte Order Mark), a short signature at the
start of the file that identifies the encoding.<p>Notepad++ offers the following encoding schemes:
<dl>
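<dt>ANSI
<dd> Older encoding. It usually produces the smallest files, but it is error-prone because the same byte can stand for different characters depending on which code page is in use.
<dt>UTF-8
<dd> Unicode encoding. ASCII characters (unaccented Latin letters, digits, and common punctuation) take one byte each. All other characters take two to four bytes: accented Latin letters take two, and most Chinese, Japanese, and Korean characters take three. A three-byte BOM is added on save.
<dt>UTF-8 without BOM
<dd> Like UTF-8, but no BOM is added. This saves three bytes but makes encoding detection harder.
<dt>UTF-16 Little Endian
<dd> Most characters take two bytes. Characters outside the Basic Multilingual Plane (such as many emoji) take four, stored as a surrogate pair. Each two-byte unit is stored with its least significant byte first. A two-byte BOM is added on save.
<dt>UTF-16 Big Endian
<dd> Same as UTF-16 Little Endian, except that each two-byte unit is stored with its most significant byte first. A two-byte BOM is added on save.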
</dl>
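<p>The byte sizes and BOMs listed above can be checked with a small Python sketch. The sample characters are arbitrary choices. The sketch uses Python's <code>utf-16-le</code> and <code>utf-16-be</code> codecs because they encode without writing a BOM, so the counts cover the characters alone.

```python
import codecs

# Sample characters (arbitrary choices): ASCII "A", accented "é", CJK "中",
# and an emoji outside the Basic Multilingual Plane.
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]

# Python codec names. The -le/-be codecs do not write a BOM themselves.
codec_names = {"UTF-8": "utf-8", "UTF-16 LE": "utf-16-le", "UTF-16 BE": "utf-16-be"}

# The BOM each scheme writes at the start of a file.
boms = {
    "UTF-8": codecs.BOM_UTF8,          # ef bb bf
    "UTF-16 LE": codecs.BOM_UTF16_LE,  # ff fe
    "UTF-16 BE": codecs.BOM_UTF16_BE,  # fe ff
}

sizes = {}
for scheme, name in codec_names.items():
    # Encoded length of each sample character under this scheme.
    sizes[scheme] = [len(ch.encode(name)) for ch in samples]
    print(scheme, "| BOM:", boms[scheme].hex(" "),
          "| bytes per sample:", sizes[scheme],
          "| 'A' stored as:", "A".encode(name).hex(" "))

# "UTF-8 without BOM" saves exactly the three BOM bytes.
with_bom = "A".encode("utf-8-sig")
without_bom = "A".encode("utf-8")
print("BOM overhead:", len(with_bom) - len(without_bom), "bytes")
```

<p>The output reports sizes of [1, 2, 3, 4] bytes for UTF-8 and [2, 2, 2, 4] bytes for both UTF-16 variants. The letter "A" is stored as 41 00 in Little Endian and as 00 41 in Big Endian.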
<p>In addition, since version 5.6, Notepad++ lets you change the character set used to display the text, much as most web browsers do. These character sets are available from the <span class="menu_item">Character sets</span> menu entry, which comes right after the <span class="menu_item">Encode in ...</span> family of items.
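<p>Choosing a character set changes which character each byte stands for. This is also why ANSI is described above as error-prone. The Python sketch below decodes a single byte under three Windows code pages. The code pages are example choices, and the names are Python codec names rather than Notepad++ menu labels.

```python
# One byte, 0xE9, as it might appear in an ANSI file.
raw = bytes([0xE9])

# Example Windows code pages: Western European, Cyrillic, Greek.
decoded = {cp: raw.decode(cp) for cp in ["cp1252", "cp1251", "cp1253"]}

for cp, ch in decoded.items():
    # Show the code page, the resulting Unicode code point, and the character.
    print(cp, "U+%04X" % ord(ch), ch)
# cp1252 U+00E9 é
# cp1251 U+0439 й
# cp1253 U+03B9 ι
```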
<p>Note that for HTML and XML files, Notepad++ attempts to detect the encoding when the file is opened. This helps you catch encoding errors that might otherwise not appear until the file is served from a web server.
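<p>The detection problems described earlier can be illustrated with a deliberately simplified detector. This is a hypothetical Python sketch, not Notepad++'s actual algorithm. It checks for the three BOMs of the schemes listed above, then tries strict UTF-8 decoding, and otherwise falls back to ANSI. The function name <code>guess_encoding</code> is made up for this example.

```python
import codecs

def guess_encoding(data):
    """Hypothetical detector: BOM first, then strict UTF-8, else ANSI."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16 Little Endian"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16 Big Endian"
    try:
        data.decode("utf-8")  # strict mode: raises on invalid byte sequences
    except UnicodeDecodeError:
        return "ANSI"
    # Valid UTF-8 with no BOM. Pure ASCII also ends up here,
    # whichever encoding it was actually saved in.
    return "UTF-8 without BOM"

# Saved as ANSI, but contains only ASCII characters.
print(guess_encoding("plain text".encode("cp1252")))   # UTF-8 without BOM
# Saved as ANSI with an accented letter (byte e9 is not valid UTF-8 here).
print(guess_encoding("caf\u00e9".encode("cp1252")))    # ANSI
# Saved as UTF-8 with a BOM.
print(guess_encoding("caf\u00e9".encode("utf-8-sig"))) # UTF-8
# Saved as UTF-16 Little Endian with its BOM.
print(guess_encoding(codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")))  # UTF-16 Little Endian
```

<p>The first call shows the ambiguity. A file saved as ANSI that contains only ASCII characters cannot be told apart from UTF-8 without BOM, so it may be detected with a different encoding when reopened.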
</body></html>