Editorial Team
Asked: May 11, 2026

I am really used to doing grep -iIr on the Unix shell but I haven’t been able to get a PowerShell equivalent yet.

Basically, the above command searches the target folders recursively and ignores binary files because of the “-I” option. This option is also equivalent to the --binary-files=without-match option, which says “treat binary files as not matching the search string”

So far I have been using Get-ChildItem -r | Select-String as my PowerShell grep replacement with the occasional Where-Object added. But I haven’t figured out a way to ignore all binary files like the grep -I command does.

How can binary files be filtered or ignored with PowerShell?

So for a given path, I only want Select-String to search text files.

EDIT: A few more hours on Google produced this question: “How to identify the contents of a file is ASCII or Binary”. The question says “ASCII” but I believe the writer meant “Text Encoded”, like myself.

EDIT: It seems that an isBinary() needs to be written to solve this issue. Probably a C# commandline utility to make it more useful.

EDIT: It seems that what grep is doing is checking for an ASCII NUL byte or a UTF-8 overlong encoding. If either exists, it considers the file binary. This is a single memchr() call.

1 Answer

  1. Editorial Team
    Added an answer on May 11, 2026 at 10:31 pm

    On Windows, file extensions are usually good enough:

    # ss is not a built-in alias (the stock one is sls), so define it first
    Set-Alias ss Select-String

    # all C# and related files (projects, source control metadata, etc.)
    dir -r -Filter *.cs* | ss foo

    # exclude the binary types most likely to pollute your development workspace
    dir -r -exclude *.exe, *.dll, *.pdb | ss foo

    # stick the next three lines in your $profile (refining them over time)
    # FileInfo.Extension includes the leading dot, so the list must too
    $bins = New-Object 'System.Collections.Generic.List[string]'
    $bins.AddRange( [string[]]@(".exe", ".dll", ".pdb", ".png", ".mdf", ".docx") )
    function IsBin([System.IO.FileInfo]$item) { $bins.Contains($item.Extension.ToLower()) }
    dir -r -File | ? { !(IsBin $_) } | ss foo
    

    But of course, file extensions are not perfect. Nobody likes typing long lists, and plenty of files are misnamed anyway.

    I don’t think Unix has any special binary vs text indicators in the filesystem. (Well, VMS did, but I doubt that’s the source of your grep habits.) I looked at the implementation of grep -I, and apparently it’s just a quick-n-dirty heuristic based on the first chunk of the file. Turns out that’s a strategy I have a bit of experience with. So here’s my advice on choosing a heuristic function that is appropriate for Windows text files:

    • Examine at least 1KB of the file. Lots of file formats begin with a header that looks like text but will bust your parser shortly afterward. The way modern hardware works, reading 50 bytes has roughly the same I/O overhead as reading 4KB.
    • If you only care about straight ASCII, exit as soon as you see something outside the printable character range [32-126 plus TAB, CR and LF]. You might accidentally exclude some clever ASCII art, but trying to separate those cases from binary junk is nontrivial.
    • If you want to handle Unicode text, let MS libraries handle the dirty work. It’s harder than you think. From PowerShell you can easily access the IMultiLang2 interface (COM) or Encoding.GetEncoding static method (.NET). Of course, they are still just guessing. Raymond’s comments on the Notepad detection algorithm (and the link within to Michael Kaplan) are worth reviewing before deciding exactly how you want to mix & match the platform-provided libraries.
    • If the outcome is important (i.e., a flaw will do something worse than just clutter up your grep console), then don’t be afraid to hard-code some file extensions for the sake of accuracy. For example, *.PDF files occasionally have several KB of text at the front despite being a binary format, leading to the notorious bugs linked above. Similarly, if you have a file extension that is likely to contain XML or XML-like data, you might try a detection scheme similar to Visual Studio’s HTML editor. (SourceSafe 2005 actually borrows this algorithm for some cases.)
    • Whatever else happens, have a reasonable backup plan.
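As a much narrower stand-in for the library-based detection mentioned in the list, the sketch below only asks .NET's StreamReader whether the file starts with a byte order mark. It is not IMultiLang2 and it does not guess at BOM-less encodings. The function name Get-BomEncodingName and the ASCII fallback are assumptions made for this illustration.

```powershell
# Hypothetical helper: reports the encoding implied by a BOM, if there is one.
function Get-BomEncodingName {
    param([string]$Path)

    # .NET resolves relative paths against the process directory, so expand first.
    $fullPath = Convert-Path -LiteralPath $Path

    # The third argument ($true) enables detection from byte order marks;
    # ASCII is only the fallback reported when no BOM is found.
    $reader = New-Object System.IO.StreamReader($fullPath, [System.Text.Encoding]::ASCII, $true)
    try {
        $null = $reader.Peek()   # forces the first buffer read so detection runs
        $reader.CurrentEncoding.WebName
    }
    finally {
        $reader.Dispose()
    }
}
```

A file that reports the fallback name has no BOM, which says nothing yet about whether it is text; it still needs one of the byte-level heuristics.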

    As an example, here’s the quick ASCII detector:

    function IsAscii([System.IO.FileInfo]$item)
    {
        begin
        {
            # bytes accepted as text: TAB, LF, CR and printable ASCII
            $validList = New-Object 'System.Collections.Generic.List[byte]'
            $validList.AddRange([byte[]] (9,10,13) )
            $validList.AddRange([byte[]] (32..126) )
        }

        process
        {
            $reader = $null
            try
            {
                # OpenRead asks for read access only, unlike Open(FileMode)
                $reader = $item.OpenRead()
                $bytes = New-Object byte[] 1024
                $numRead = $reader.Read($bytes, 0, $bytes.Length)

                # reject the file at the first byte outside the valid list
                for($i=0; $i -lt $numRead; ++$i)
                {
                    if (!$validList.Contains($bytes[$i]))
                        { return $false }
                }
                $true
            }
            finally
            {
                if ($reader)
                    { $reader.Dispose() }
            }
        }
    }
    

    The usage pattern I’m targeting is a where-object clause inserted in the pipeline between “dir” and “ss”. There are other ways, depending on your scripting style.

    Improving the detection algorithm along one of the suggested paths is left to the reader.

    edit: I started replying to your comment in a comment of my own, but it got too long…

    Above, I looked at the problem from the POV of whitelisting known-good sequences. In the application I maintained, incorrectly storing a binary as text had far worse consequences than vice versa. The same is true for scenarios where you are choosing which FTP transfer mode to use, or what kind of MIME encoding to send to an email server, etc.

    In other scenarios, blacklisting the obviously bogus and allowing everything else to be called text is an equally valid technique. While U+0000 is a valid code point, it’s pretty much never found in real world text. Meanwhile, \00 is quite common in structured binary files (namely, whenever a fixed-byte-length field needs padding), so it makes a great simple blacklist. VSS 6.0 used this check alone and did ok.
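A minimal sketch of that NUL-byte blacklist, for illustration only. The function name Test-HasNulByte and the 4KB default sample size are assumptions, not something taken from VSS or grep.

```powershell
# Hypothetical helper: $true when the sampled prefix of the file contains a NUL byte.
function Test-HasNulByte {
    param(
        [string]$Path,
        [int]$SampleSize = 4096   # assumed default; any prefix length works
    )

    # .NET resolves relative paths against the process directory, so expand first.
    $stream = [System.IO.File]::OpenRead((Convert-Path -LiteralPath $Path))
    try {
        $buffer = New-Object byte[] $SampleSize
        $numRead = $stream.Read($buffer, 0, $buffer.Length)
        # IndexOf returns -1 when no NUL occurs in the bytes actually read.
        return ([System.Array]::IndexOf($buffer, [byte]0, 0, $numRead) -ge 0)
    }
    finally {
        $stream.Dispose()
    }
}
```

It slots into the pipeline the same way as the whitelist version, for example: dir -r -File | ? { -not (Test-HasNulByte $_.FullName) } | Select-String foo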

    Aside: *.zip files are a case where checking for \0 is riskier. Unlike most binaries, their structured “header” (footer?) block is at the end, not the beginning. Assuming ideal entropy compression, the chance of no \0 in the first 1KB is (1-1/256)^1024 or about 2%. Luckily, simply scanning the rest of the 4KB cluster that NTFS reads anyway will drive the risk down to 0.00001% without having to change the algorithm or write another special case.
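The two percentages can be checked with a few lines of arithmetic, assuming each byte is uniformly random and independent (the "ideal entropy compression" case). The function name Get-NoNulProbability is made up for this check.

```powershell
# Chance that n uniformly random bytes contain no NUL byte: (255/256)^n
function Get-NoNulProbability([int]$ByteCount) {
    [math]::Pow(255 / 256, $ByteCount)
}

$p1k = Get-NoNulProbability 1024   # roughly 0.018, i.e. about 2%
$p4k = Get-NoNulProbability 4096   # roughly 1.1e-7, i.e. about 0.00001%
'{0:P2} for 1KB, {1:E1} for 4KB' -f $p1k, $p4k
```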

    To exclude invalid UTF-8, add \C0-\C1 and \F8-\FF (once you’ve seeked past the possible BOM) to the blacklist. This is very incomplete since you’re not actually validating the sequences, but close enough for your purposes. If you want to get any fancier than this, it’s time to call one of the platform libraries like IMultiLang2::DetectInputCodepage.
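Sketched as code, that byte blacklist might look like the following. The function name Test-HasInvalidUtf8Byte is hypothetical, and skipping only the UTF-8 BOM (EF BB BF) is an assumption; a UTF-16 BOM (FF FE or FE FF) would trip the check and has to be recognised beforehand if such files should count as text.

```powershell
# Hypothetical helper: $true when the buffer holds a byte that never occurs in valid UTF-8.
# This does not validate multi-byte sequences; it only applies the blacklist.
function Test-HasInvalidUtf8Byte {
    param([byte[]]$Bytes)

    # Skip a UTF-8 BOM (EF BB BF) if present.
    $start = 0
    if ($Bytes.Length -ge 3 -and $Bytes[0] -eq 0xEF -and $Bytes[1] -eq 0xBB -and $Bytes[2] -eq 0xBF) {
        $start = 3
    }

    for ($i = $start; $i -lt $Bytes.Length; ++$i) {
        $b = $Bytes[$i]
        # 0xC0 and 0xC1 could only begin overlong sequences; 0xF8-0xFF are never used.
        if ($b -eq 0xC0 -or $b -eq 0xC1 -or $b -ge 0xF8) { return $true }
    }
    return $false
}
```

For example, the pair C8 80 passes the check, while any buffer containing C0 or FF is rejected.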

    Not sure why \C8 (200 decimal) is on Grep’s list. It’s not an overlong encoding. For example, the sequence \C8 \80 represents Ȁ (U+0200). Maybe something specific to Unix.
