Detecting Unicode in files in Windows 10


1 reply to this topic

#1 GomesEagle

Posted 09 February 2021 - 04:32 AM

Windows 10 Notepad no longer requires Unicode files to start with a BOM, and by default it no longer writes one. This breaks our existing code, which checks for the BOM to decide whether a file is Unicode. How can I now tell in C++ whether a file is Unicode?
This is the code we currently use to detect Unicode:
int IsUnicode(const BYTE p2bytes[3])
{
    // Compare the leading bytes against the known byte order marks (BOMs).
    if (p2bytes[0] == 0xEF && p2bytes[1] == 0xBB && p2bytes[2] == 0xBF)
        return 1; // UTF-8
    if (p2bytes[0] == 0xFE && p2bytes[1] == 0xFF)
        return 2; // UTF-16 (BE)
    if (p2bytes[0] == 0xFF && p2bytes[1] == 0xFE)
        return 3; // UTF-16 (LE)

    return 0; // no BOM found
}
If this is such a pain, why isn't there a standard function for determining the encoding?
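
One possible approach when there is no BOM, offered as a rough sketch and not a definitive answer: keep the BOM check, and if it finds nothing, test whether the bytes are well-formed UTF-8. The names Encoding, IsValidUtf8 and GuessEncoding are made up for this example; the return values are chosen to match IsUnicode above. It assumes the whole file (or a chunk that does not end in the middle of a multi-byte character) is already in memory. It is only a heuristic: plain ASCII is reported as UTF-8, and UTF-16 without a BOM is not detected (the Win32 function IsTextUnicode may be worth a look for that case).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result codes, matching the return values of IsUnicode.
enum Encoding { kUnknown = 0, kUtf8 = 1, kUtf16BE = 2, kUtf16LE = 3 };

// Returns true if buf[0..len) is well-formed UTF-8. Rejects overlong forms,
// surrogate code points, and values above U+10FFFF.
bool IsValidUtf8(const unsigned char* buf, std::size_t len)
{
    std::size_t i = 0;
    while (i < len) {
        const unsigned char c = buf[i];
        if (c < 0x80) { ++i; continue; }           // single-byte (ASCII)

        std::size_t extra;                          // continuation bytes expected
        std::uint32_t cp;                           // decoded code point
        std::uint32_t min;                          // smallest value allowed for this length
        if      ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else return false;                          // stray continuation or invalid lead byte

        if (len - i <= extra) return false;         // sequence runs past the end of the buffer
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += extra + 1;
    }
    return true;
}

// BOM check first, then a UTF-8 validity check as the fallback.
Encoding GuessEncoding(const unsigned char* buf, std::size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) return kUtf8;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) return kUtf16BE;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) return kUtf16LE;

    // Assumption: a NUL byte means this is not UTF-8 text (it may be
    // BOM-less UTF-16 or binary data), so we decline to guess.
    for (std::size_t i = 0; i < len; ++i)
        if (buf[i] == 0) return kUnknown;

    return IsValidUtf8(buf, len) ? kUtf8 : kUnknown;
}
```

Text in a legacy code page (Windows-1252, for example) that contains accented characters will almost always fail the validity check, which is what makes this fallback useful in practice.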


Edited by Chris Cosgrove, 09 February 2021 - 04:36 AM.
Moved from Win 10 support to Programming.



#2 GomesEagle

Posted 11 February 2021 - 12:09 AM

I'm happy to see the issue is now solved. 


Edited by GomesEagle, 11 February 2021 - 12:10 AM.




