identifying file types


4 replies to this topic

#1 BobLewiston

BobLewiston

  • Members
  • 69 posts
  • OFFLINE
  • Local time:11:50 PM

Posted 28 January 2009 - 06:04 PM

Within a C# program, is there any way to tell if a file is a text file or not? I mean a real way, not just basing your conclusion on a file name extension.


#2 groovicus

groovicus

  • Security Colleague
  • 9,963 posts
  • OFFLINE
  • Gender:Male
  • Location:Centerville, SD
  • Local time:09:50 PM

Posted 28 January 2009 - 06:31 PM

Yes, but I couldn't tell you how to do it. Many file formats start with a header (a "magic number") that identifies what type of file it is.
http://en.wikipedia.org/wiki/Magic_number_...umbers_in_files
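For anyone who wants to try the magic-number approach in C#, here is a minimal sketch. The class name `MagicNumberSniffer` and its signature table are illustrative assumptions, not a standard .NET API, and the table covers only a handful of well-known formats. A match suggests the file is probably not plain text; no match proves nothing either way.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: compares the first bytes of a file to a few known signatures.
static class MagicNumberSniffer
{
    // Illustrative, far from exhaustive.
    static readonly KeyValuePair<string, byte[]>[] Signatures =
    {
        new KeyValuePair<string, byte[]>("PNG",  new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A }),
        new KeyValuePair<string, byte[]>("PDF",  new byte[] { 0x25, 0x50, 0x44, 0x46 }),  // "%PDF"
        new KeyValuePair<string, byte[]>("ZIP",  new byte[] { 0x50, 0x4B, 0x03, 0x04 }),  // "PK" 03 04
        new KeyValuePair<string, byte[]>("GIF",  new byte[] { 0x47, 0x49, 0x46, 0x38 }),  // "GIF8"
        new KeyValuePair<string, byte[]>("JPEG", new byte[] { 0xFF, 0xD8, 0xFF }),
    };

    // Returns the name of the first matching signature, or null if none match.
    public static string Identify(byte[] header)
    {
        foreach (KeyValuePair<string, byte[]> sig in Signatures)
        {
            if (StartsWith(header, sig.Value)) return sig.Key;
        }
        return null;
    }

    // Reads up to 8 bytes from the start of the file and checks them.
    public static string IdentifyFile(string path)
    {
        byte[] header = new byte[8];
        using (FileStream fs = File.OpenRead(path))
        {
            int read = fs.Read(header, 0, header.Length);
            Array.Resize(ref header, read);  // keep only the bytes actually read
        }
        return Identify(header);
    }

    static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

Calling `MagicNumberSniffer.IdentifyFile(path)` on a plain text file will normally return null, since plain text carries no signature of its own.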

#3 Platypus

Platypus

  • Global Moderator
  • 15,181 posts
  • OFFLINE
  • Gender:Male
  • Location:Australia
  • Local time:02:50 PM

Posted 28 January 2009 - 09:02 PM

I don't think there's actually anything about a file that will uniquely identify it as a text file. The only thing that makes it a text file is that the ASCII values of the relevant bytes form recognisable text. There's no header required for plain ASCII text, but finding a regular pattern of line breaks and a single terminating EOF character would indicate it is probably intended as a text file. Finding multiple EOF characters, or a single EOF in the body of the file, would indicate it's either not a text file or not plain ASCII text. But I can't think of a characteristic that would identify a file as definitely being a plain ASCII text file.
Top 5 things that never get done:

1.

#4 Billy O'Neal

Billy O'Neal

    Visual C++ STL Maintainer


  • Malware Response Team
  • 12,304 posts
  • OFFLINE
  • Gender:Male
  • Location:Redmond, Washington
  • Local time:08:50 PM

Posted 30 January 2009 - 08:08 PM

.TXT files have no EOF character.

You could look for files that consist entirely of printable characters; however, that leaves the problem of different encodings, which can appear to contain nonprintable characters under certain conditions (e.g. reading Windows Unicode (UTF-16) as ASCII).
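A rough C# sketch of that idea, for illustration only. The class name `TextHeuristic` is made up, and treating a byte order mark as proof of text is an assumption of this sketch rather than a guarantee. Note that UTF-16 text saved without a byte order mark fails the ASCII test because of its zero bytes, which is exactly the encoding problem described above.

```csharp
using System;

// Hypothetical helper: guesses whether a buffer looks like text. Heuristic only.
static class TextHeuristic
{
    public static bool LooksLikeText(byte[] data)
    {
        // Assumption: a UTF-16 (FF FE / FE FF) or UTF-8 (EF BB BF) byte order mark
        // is taken as a strong hint that the content is text in that encoding.
        if (HasPrefix(data, 0xFF, 0xFE) ||
            HasPrefix(data, 0xFE, 0xFF) ||
            HasPrefix(data, 0xEF, 0xBB, 0xBF))
        {
            return true;
        }

        // Otherwise require every byte to be printable ASCII or tab, LF, or CR.
        // An empty buffer passes trivially.
        foreach (byte b in data)
        {
            bool printableAscii = b >= 0x20 && b <= 0x7E;
            bool whitespace = b == 0x09 || b == 0x0A || b == 0x0D;
            if (!printableAscii && !whitespace) return false;
        }
        return true;
    }

    static bool HasPrefix(byte[] data, params byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

In practice you would probably pass only the first few kilobytes of the file, e.g. a buffer filled from a `FileStream`, rather than reading the whole thing.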

Billy3
Twitter - My statements do not establish the official position of Microsoft Corporation, and are my own personal opinion. (But you already knew that, right?)

#5 Platypus

Platypus

  • Global Moderator
  • 15,181 posts
  • OFFLINE
  • Gender:Male
  • Location:Australia
  • Local time:02:50 PM

Posted 31 January 2009 - 08:52 AM

Billy O'Neal wrote: ".TXT files have no EOF character."

Heh, showing my age by unconsciously assuming CP/M backward compatibility? :thumbsup:

An EOF marker isn't required under DOS/Windows, but one can be present, so it's not safe to assume a .TXT file will never contain one.

I should of course have been more precise and said "finding a regular pattern of line breaks and at most a single terminating EOF character", as my point was that finding multiple or out-of-position EOF characters could indicate a file was not a plain text file, even if it showed a regular CRLF pattern.
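As a sketch of the "at most a single terminating EOF" rule in C#, assuming the EOF character meant here is the CP/M and DOS Ctrl-Z byte (0x1A). The class and method names are hypothetical, and this covers only the EOF part of the test, not the line-break pattern.

```csharp
using System;

// Hypothetical helper for the EOF-marker part of the heuristic.
static class EofMarkerCheck
{
    const byte CtrlZ = 0x1A;  // Ctrl-Z, the CP/M and DOS end-of-file marker

    // True if the buffer has no Ctrl-Z at all, or exactly one as the final byte.
    public static bool HasPlausibleEofUsage(byte[] data)
    {
        // Array.IndexOf returns the FIRST occurrence, or -1 if there is none.
        // If the first occurrence is the last byte, it is the only one.
        int index = Array.IndexOf(data, CtrlZ);
        return index < 0 || index == data.Length - 1;
    }
}
```

A file that fails this check is suspicious, but one that passes is not thereby proven to be text, so it is best combined with other tests.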

Edited by Platypus, 31 January 2009 - 08:53 AM.




