This PR fixes UTF-8 detection for 4-byte characters (code from 2002 used by Notepad++ assumed that any character longer than 3 bytes is invalid). As a result, such files are no longer erroneously displayed as ANSI.
Steps to reproduce:
1. Create a new UTF-8 file (without BOM).
2. Paste a 4-byte character, e.g. 🍪, and save.
3. Reopen the file.
Prior to this PR, the file is detected as ANSI (even if Notepad++ is configured to assume UTF-8 by default!). After this fix, the file is correctly opened as UTF-8.
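For context, here is a minimal sketch of the kind of lead-byte check involved (illustrative only, not the actual Notepad++ detection code): a byte of the form `11110xxx` starts a valid 4-byte UTF-8 sequence, which the old code rejected.

```cpp
#include <cstddef>

// Sketch only -- not the actual Notepad++ code. Returns the expected
// length of a UTF-8 sequence from its lead byte, or 0 if the byte
// cannot start a sequence.
static size_t utf8SequenceLength(unsigned char lead)
{
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx: the case fixed here
    return 0;                             // continuation or invalid lead byte
}
```

With the old 3-byte cap, the lead byte `0xF0` of 🍪 (encoded as `F0 9F 8D AA`) was treated as invalid, so the heuristic fell back to ANSI.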
Fixes #4730, Fixes #3986, Fixes #3441, Fixes #3405, Closes #4922
(Fixes #725)
Opening a 3-byte file with '\0' in the middle shows only 1 character
in the editor.
Such a file is detected as UTF-16 without BOM, which leads to a wrong
length interpretation. Adding a "len mod 2 == 0" condition to the
detection is the only solution I have found so far.
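A minimal sketch of the idea (function and variable names are hypothetical, not the real `Utf8_16_Read` code): UTF-16 encodes text in 2-byte code units, so a buffer with an odd byte count cannot be well-formed BOM-less UTF-16.

```cpp
#include <cstddef>

// Sketch only -- names are hypothetical, not the real detection code.
// UTF-16 code units are 2 bytes each, so a buffer whose length is odd
// cannot be well-formed BOM-less UTF-16.
static bool mightBeUtf16LE(const unsigned char* buf, size_t len)
{
    if (len % 2 != 0)  // the added "len mod 2 == 0" condition
        return false;
    // Toy heuristic: ASCII text encoded as UTF-16LE has '\0' in every
    // high byte (odd offsets), e.g. "ab" -> 61 00 62 00.
    for (size_t i = 1; i < len; i += 2)
        if (buf[i] != '\0')
            return false;
    return len != 0;
}
```

Without the length check, a 3-byte buffer like `61 00 62` passes such a heuristic and gets interpreted as UTF-16, halving the apparent character count.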
When loading a file via `FileManager::loadFileData`, a fixed-length buffer
is filled via `fread`. Then, in some cases, a conversion is done with the help
of `Utf8_16_Read`. However, the method `Utf8_16_Read::convert` performs a call
to `strlen` on this buffer. This is wrong on two counts: a `\0` character
should be accepted (even if a bit unusual), and the buffer is not
zero-terminated in the first place. The change simply adds a `length`
parameter so that the size of the buffer no longer has to be guessed.
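For illustration, a minimal before/after sketch (signatures simplified and hypothetical, not the exact Notepad++ ones):

```cpp
#include <cstddef>
#include <cstring>

// Before (buggy): the data size is guessed with strlen(), which stops
// at the first '\0' and, worse, reads out of bounds when the
// fread()-filled buffer contains no '\0' at all.
size_t convertOld(char* buf)
{
    size_t len = strlen(buf);  // wrong for binary data / unterminated buffers
    // ... convert `len` bytes ...
    return len;
}

// After (fixed): the caller passes the byte count returned by fread(),
// so embedded '\0' bytes survive and no out-of-bounds read can occur.
size_t convertNew(char* buf, size_t length)
{
    // ... convert exactly `length` bytes ...
    (void)buf;
    return length;
}
```

The caller in `FileManager::loadFileData` already knows the exact byte count from `fread`'s return value, so passing it down removes the guesswork.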