The Archive Base

Editorial Team
Asked: May 12, 2026

I am struggling a bit with how I can unit test parsing a file…


I am struggling a bit with how to unit test parsing a file. Let’s say I have a file with 25 columns that could be anywhere from 20 to 1000 records long. How do I write a unit test against that? The function takes the file as a string parameter and returns a DataTable with the file contents.

The best I can come up with is parsing a 4-record file and only checking the top-left and bottom-right ‘corners’, e.g. the first few fields in the 2 top records and the last few fields of the 2 bottom records. I couldn’t imagine having to tediously hand-type assert statements for every single field in the file. And checking every field of just one record seems just as weak, since it doesn’t account for multi-record files or unexpected data.

That seemed ‘good enough’ at the time. However, now I’m working on a new project which is essentially the parsing of various PDF files coming in from 10 different sources. Each source has 4-6 different formats for its files, so that is about 40-60 parsing routines. We may eventually fully automate 25 additional sources down the road. We take the PDF and convert it to Excel using a 3rd party tool. Then we sit and analyze the patterns in the output, and write the code that calls the API of the tool, takes the Excel file and parses it: stripping out the garbage, rearranging data that’s in different places, cleaning it, etc.

How realistically can I unit test something like this?


1 Answer

  1. Editorial Team
    Added an answer on May 12, 2026 at 11:43 pm

    I am not sure I fully understand the problem, but here is one idea. Collect a bunch of sample files that represent diverse formats and edge cases. Run the conversion to your DataTables and manually inspect the DataTables the first time to ensure they are correct. Then serialize the DataTables to XML format and store them in your unit test suite along with your test case PDF files.

    Your automated unit tests could perform the conversion from PDF to DataTable and compare the results against the respective “approved” serialized DataTable representation.

    You could build up a library of test documents over time using this method. Failures in your unit tests would indicate that changes to the parsing routines have broken a particular edge case.
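
    The approach above is sometimes called approval or golden-master testing. Here is a minimal sketch of the idea, written in Python rather than .NET purely for illustration. The names `parse_file`, `record_approved`, and `matches_approved` are hypothetical, the parser is a plain CSV stand-in for the real routine, and JSON takes the place of the XML-serialized DataTable.

```python
import csv
import io
import json
from pathlib import Path


def parse_file(text: str) -> list[dict[str, str]]:
    """Hypothetical stand-in for the real parsing routine (here: plain CSV)."""
    return list(csv.DictReader(io.StringIO(text)))


def record_approved(sample_text: str, approved_path: Path) -> None:
    """Serialize the parser output once; a human inspects the file before trusting it."""
    rows = parse_file(sample_text)
    approved_path.write_text(json.dumps(rows, indent=2), encoding="utf-8")


def matches_approved(sample_text: str, approved_path: Path) -> bool:
    """Re-run the parser and compare the result with the approved snapshot."""
    approved = json.loads(approved_path.read_text(encoding="utf-8"))
    return parse_file(sample_text) == approved
```

    Each sample file gets its own approved snapshot, so a unit test reduces to one `matches_approved` call per sample, and no field-by-field asserts need to be typed by hand.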

    “There’s one ‘catch’ though. In my first example I was talking of a .NET application. However, this new project with the possibly 40 ‘scrubbing scripts’ is written in VBA. The input is an Excel spreadsheet and the output is an Excel spreadsheet. How could I serialize this? Maybe do a checksum on the entire file?”

    For the second example, if the Excel spreadsheets are not too complicated, you could try to create a cell-by-cell comparison routine like this one; perhaps you could wrap this into a custom Assert.AreExcelWorksheetsEqual(). You are right though, a checksum might work just as well.
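
    A cell-by-cell comparison could look roughly like the sketch below. It is written in Python for illustration, not VBA or .NET, and it assumes each worksheet has already been loaded into a list of rows; how the cells are read from Excel is left out. The names `diff_worksheets`, `assert_worksheets_equal`, and `column_letter` are hypothetical.

```python
from itertools import zip_longest

# Assumption: a worksheet is represented as a list of rows, each row a list of cell values.
Grid = list[list[object]]


def column_letter(index: int) -> str:
    """Convert a zero-based column index to an Excel-style letter (0 -> A, 26 -> AA)."""
    letters = ""
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def diff_worksheets(expected: Grid, actual: Grid) -> list[str]:
    """Return one message per differing cell; missing cells are treated as None."""
    differences = []
    rows = zip_longest(expected, actual, fillvalue=())
    for row_number, (exp_row, act_row) in enumerate(rows, start=1):
        cells = zip_longest(exp_row, act_row, fillvalue=None)
        for col_index, (exp, act) in enumerate(cells):
            if exp != act:
                address = f"{column_letter(col_index)}{row_number}"
                differences.append(f"{address}: expected {exp!r}, got {act!r}")
    return differences


def assert_worksheets_equal(expected: Grid, actual: Grid) -> None:
    """Raise AssertionError listing every differing cell address."""
    differences = diff_worksheets(expected, actual)
    if differences:
        raise AssertionError("Worksheets differ:\n" + "\n".join(differences))
```

    Compared with a checksum, which can only report that two files differ, this kind of routine also reports which cells differ, and that tends to make a failing test easier to diagnose.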


© 2021 The Archive Base. All Rights Reserved