<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
< html > < head > < meta content = "text/html; charset=utf-8" http-equiv = "content-type" > < title > Encoding< / title >
< link rel = "stylesheet" href = "styles.css" type = "text/css" >
< / head >
< body >
< h1 > Encoding< / h1 >
< p >
Text can be encoded in multiple ways. Most older text files use an
encoding named ANSI, which has room for only a limited number of different
characters, but is often sufficient to display all the text. However,
Unicode encodings allow for a much richer set of characters,
allowing a single file to contain many languages at once, at the cost
of an increase in file size. Notepad++ automatically tries to
detect the encoding used when opening a file, but lets you
change it while editing. To change only the displayed encoding
(without modifying the actual text), select one of the < span class = "menu_item" > Format-> Encode in< / span >
options from the Format menu. To convert the text to a certain
encoding, select one of the < span class = "menu_item" > Format-> Convert to< / span > options in the Format menu.< p >
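The difference between the two menu families can be illustrated outside Notepad++. The following is a minimal Python sketch, not Notepad++ code: the sample text is hypothetical, and Windows-1252 (<code>cp1252</code>) is only assumed as a stand-in for ANSI, whose actual codepage depends on the system.
< pre >
```python
# Hypothetical sample: the bytes of "café" as they would be stored in a UTF-8 file.
raw = "caf\u00e9".encode("utf-8")

# "Encode in": the bytes stay the same, only their interpretation changes.
as_utf8 = raw.decode("utf-8")    # the intended reading
as_ansi = raw.decode("cp1252")   # same bytes read as Windows-1252 (assumed ANSI)

# "Convert to": the text stays the same, the bytes are rewritten.
converted = as_utf8.encode("utf-16-le")

print(repr(as_utf8), repr(as_ansi))
print(len(raw), "bytes before conversion,", len(converted), "bytes after")
```
< / pre >
Reading the bytes with the wrong encoding garbles the accented character without touching the file, whereas converting keeps the text intact and changes the stored bytes.
< p >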
A file saved with one encoding may be detected as another encoding
when it is reopened in Notepad++. This is a technical limitation:
for some content, different encodings produce exactly the same bytes,
so the resulting files do not differ. This is most noticeable if the
file is saved without a BOM (Byte Order Mark) indicating the encoding
used.< p > Notepad++ offers the following encoding schemes:
< dl >
< dt > ANSI
< dd > Older encoding with the smallest file size, but error-prone because different systems use different codepages
< dt > UTF-8
< dd > Unicode encoding. Most Western characters take one byte,
but other characters take more, from two to four bytes. A three-byte
BOM is added upon save.
< dt > UTF-8 without BOM
< dd > Like UTF-8, but no BOM is added. Saves three bytes, but makes encoding detection harder.
< dt > UTF-16 Little Endian
< dd > Most characters are two bytes in size (characters outside the Basic Multilingual Plane take four), and byte pairs are stored in Little Endian order. A two-byte BOM is added upon save.
< dt > UTF-16 Big Endian
< dd > Most characters are two bytes in size (characters outside the Basic Multilingual Plane take four), and byte pairs are stored in Big Endian order. A two-byte BOM is added upon save.
< / dl >
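< p > The size and BOM differences listed above can be checked with a short Python sketch. It is an illustration only: the sample text is hypothetical, Windows-1252 (<code>cp1252</code>) is assumed as the ANSI codepage, and the BOMs are prepended by hand to mimic what the list describes.
< pre >
```python
import codecs

text = "A\u20ac"  # hypothetical sample: "A" followed by the euro sign

variants = {
    "ANSI (cp1252 assumed)": text.encode("cp1252"),
    "UTF-8": codecs.BOM_UTF8 + text.encode("utf-8"),
    "UTF-8 without BOM": text.encode("utf-8"),
    "UTF-16 Little Endian": codecs.BOM_UTF16_LE + text.encode("utf-16-le"),
    "UTF-16 Big Endian": codecs.BOM_UTF16_BE + text.encode("utf-16-be"),
}

# Show the byte count and the hexadecimal bytes for each scheme.
for name, data in variants.items():
    print(f"{name:24} {len(data)} bytes  {data.hex()}")

# Pure ASCII text gives identical bytes in ANSI and in UTF-8 without BOM,
# so the two cannot be told apart from the file contents alone.
ascii_text = "plain text"
same = ascii_text.encode("cp1252") == ascii_text.encode("utf-8")
print("ASCII-only file identical in both encodings:", same)
```
< / pre >
The last check shows why a file saved without a BOM may be detected with a different encoding when it is reopened.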
< p > In addition, since version 5.6, Notepad++ supports changing the character set used to display the text, just as you can in most web browsers. These encodings are available through the < span class = "menu_item" > Character sets< / span > menu entry, which comes right after the < span class = "menu_item" > Encode in ...< / span > family of items.
< p > Note that, for HTML and XML files, Notepad++ attempts to detect the encoding being used when the file is opened, thus avoiding a number of errors that might otherwise not show up until the file is used on a server.
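< p > How Notepad++ performs this detection is not described here. One plausible approach, sketched below in Python purely as an assumption, is to look for the encoding declared near the start of the file. The helper <code>declared_encoding</code>, its <code>limit</code> parameter, and the sample inputs are all hypothetical.
< pre >
```python
import re

# Hypothetical helper for illustration; this is not Notepad++'s actual code.
# It matches charset=... (HTML meta element) or encoding=... (XML declaration).
DECLARED = re.compile(
    rb"(?:charset|encoding)\s*=\s*[\"']?([A-Za-z0-9._-]+)", re.IGNORECASE
)

def declared_encoding(data, limit=1024):
    """Return the encoding name declared in the first bytes of data, or None."""
    match = DECLARED.search(data[:limit])
    return match.group(1).decode("ascii").lower() if match else None

# Sample header fragments (tag brackets omitted for brevity).
html_head = b'meta http-equiv="content-type" content="text/html; charset=utf-8"'
xml_head = b'?xml version="1.0" encoding="ISO-8859-1"?'

print(declared_encoding(html_head))
print(declared_encoding(xml_head))
print(declared_encoding(b"no declaration here"))
```
< / pre >
A declaration found this way is only a hint: it states what the author intended, which may differ from the bytes actually stored in the file.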
< / body > < / html >