The Archive Base

Asked by Editorial Team on May 12, 2026 (2026-05-12T14:06:01+00:00)

I have a data import project in which the clients send ANSI-latin1 encoded files

I have a data import project in which the clients send ANSI Latin-1 encoded files (ISO-8859-1). However, roughly once a week we get a surprise file that is not in the correct format; the import dies horribly and needs manual intervention to recover and move on. The most common bad formats seem to be Excel files, compressed files, and XML/HTML files.

So, to reduce the human intervention, I would like to reasonably determine whether we have a strong ANSI candidate file before going through each line of the file looking for 1 of 64 bad characters and then making a guesstimate, based on the number of bad characters found, as to whether the whole line or file is bad.

I was thinking of doing a Unicode/UTF check and/or a magic number check, or even checking for a few specific application types. The files have no file extensions, so any check has to examine the content. Any fast way to rule out a file as non-ANSI would be perfect, since the import needs to process 100-500 records a second.

NOTE: Over 100 different types of bad files have been sent to us, including images and PDFs. So the concern is whether you can easily and quickly rule out LOTS of different non-ANSI types rather than specifically targeting just a few.

1 Answer

  1. Editorial Team
    Added an answer on May 12, 2026 at 2:06 pm (2026-05-12T14:06:01+00:00)

    Given your example “bad” file types, I’d put in a series of quick checks on the first few bytes of the file:

    • Is it a UTF-16 BOM?
    • Is it “<html” or “<!DOCTYPE”?
    • Is it “<?xml”?
    • Does it have a NUL character?
    • Is it “PK\003\004” (the zip file header)?
    • Is it whatever Excel files start with? (You’ll have to look that one up 😎)
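
The checks above could be sketched in Python roughly as follows. This is a minimal illustration rather than a tested import filter: the names `SIGNATURES`, `MARKUP_PREFIXES`, `reject_reason` and `sniff_file` are hypothetical, and the 512-byte sample size is an arbitrary assumption. For the Excel case, legacy .xls files normally begin with the OLE2 compound-file signature `D0 CF 11 E0 A1 B1 1A E1`, while .xlsx files are zip containers and so are already caught by the `PK\003\004` check. The gzip and PDF signatures are additions beyond the list above, included only because the question mentions compressed files and PDFs.

```python
from typing import Optional

# (magic prefix, label) pairs compared against the very start of the file.
# The gzip and PDF entries are additions beyond the list in the answer.
SIGNATURES = [
    (b"\xff\xfe", "UTF-16 LE BOM"),
    (b"\xfe\xff", "UTF-16 BE BOM"),
    (b"PK\x03\x04", "zip container (also .xlsx)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (legacy .xls)"),
    (b"\x1f\x8b", "gzip"),
    (b"%PDF-", "PDF"),
]

# Markup markers, matched case-insensitively after leading ASCII whitespace.
MARKUP_PREFIXES = [b"<?xml", b"<!doctype", b"<html"]


def reject_reason(head: bytes) -> Optional[str]:
    """Return a short reason if `head` does not look like plain Latin-1 text,
    or None if none of the quick checks fire."""
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    stripped = head.lstrip().lower()
    for prefix in MARKUP_PREFIXES:
        if stripped.startswith(prefix):
            return "markup: " + prefix.decode("ascii")
    # Plain text essentially never contains NUL; most binary formats do.
    if b"\x00" in head:
        return "NUL byte"
    return None


def sniff_file(path: str, sample_size: int = 512) -> Optional[str]:
    """Read only the first `sample_size` bytes of the file and run the checks."""
    with open(path, "rb") as f:
        return reject_reason(f.read(sample_size))
```

A caller would run `sniff_file(path)` once per incoming file and set the file aside whenever the result is not `None`. Because only the first few hundred bytes are read, the cost is paid once per file, not once per record. A `None` result only means that none of these particular checks fired, so the per-line bad-character scan described in the question would still be needed as a second stage.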

© 2021 The Archive Base. All Rights Reserved