The Archive Base


Editorial Team
Asked: May 13, 2026


Our product is a distributed system. The modules I work on are fairly new, quite rigorous, well tested. They were developed with recent best practices in mind. Other modules can be considered as legacy software.

While I’m vigilant about everything that happens within the modules I’m responsible for, I’m under constant pressure to work with bad data sent to me from the other modules. At heart, I’m a “fail fast” developer, and as a result, when problems arise I am usually able to rule out my modules as the source of the error. It’s not so much about blame as about saving the effort wasted on chasing bugs in the wrong places.

But the argument I keep coming up against is: “We can’t let this stuff fail in production. The customer expects this to work, so why don’t you work around this problem?” This is an argument for robustness: be liberal in what you accept, conservative in what you send.

I should also note that these are mostly intermittent problems. We see them in integration tests but they are hard to reproduce. Timing and concurrency are involved.

I’m having a hard time balancing between the two principles. Part of it is my worry that if I start allowing and propagating exceptional data, I’m inviting trouble and I won’t have as much confidence in my system. But I can’t argue against keeping the system working even if other modules are sending me wrong data. The reason other modules aren’t getting fixed is that they are too complex and fragile, while mine still appear clear and safe. But if I don’t resist the pressure, my modules will slowly be saddled with the same problems I’ve been rejecting until now.

I should say that the system is not “crashing” in production; rather, my module may simply display an error to the operator and ask them to contact support. A crash would be a big problem, but if I’m reporting the error clearly, isn’t this the right thing to do? I suspect that my peers just don’t want the customer to see any problems, period. But my module is rejecting data from other modules within our product, not customer input. So it seems to me that we are simply not tackling the underlying problems.
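To make the scenario concrete, here is a minimal Python sketch of the kind of boundary check described above. It is only an illustration under assumptions: the `Reading` message shape, the `BoundaryError` type, and the function names are hypothetical and not taken from the actual product.

```python
from dataclasses import dataclass


class BoundaryError(ValueError):
    """Raised when data from another module violates the agreed contract."""

    def __init__(self, source: str, field: str, reason: str):
        self.source = source
        self.field = field
        self.reason = reason
        super().__init__(f"Invalid data from {source}: {field} {reason}")


@dataclass(frozen=True)
class Reading:
    # Hypothetical message shape, used only for illustration.
    sensor_id: str
    value: float


def parse_reading(raw: dict, source: str) -> Reading:
    """Validate at the boundary and reject anything that breaks the contract."""
    sensor_id = raw.get("sensor_id")
    if not isinstance(sensor_id, str) or not sensor_id:
        raise BoundaryError(source, "sensor_id", "is missing or empty")
    value = raw.get("value")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise BoundaryError(source, "value", "is not a number")
    return Reading(sensor_id, float(value))


def operator_message(err: BoundaryError) -> str:
    """Turn a rejection into a clear report for the operator, not a crash."""
    return f"{err}. Please contact support."
```

The point of the sketch is that the rejection names the sending module and the offending field, so the report points at where to look rather than just stating that something failed.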

So, do I need to be more pragmatic or hold my ground?

1 Answer

  1. Editorial Team
    Added an answer on May 13, 2026 at 2:27 pm

    Thanks, everyone. The case that prompted this question ended well, partly thanks to insights I got from the answers above.

    My initial reaction was to stick to fail fast, but I thought about this some more and reached the conclusion that one of the roles of my module is to provide a stabilizing anchor for the rest of the system. That does not necessarily mean accepting bad data; it means surfacing problems, isolating them, and handling them in a transparent manner until we find a solution.

    I planned to add a new handler and code path for this case, which would execute properly, as if it were a special use case that was previously undocumented.
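A rough Python sketch of what such a dedicated code path could look like follows. It is an illustration under assumptions, not the handler actually planned: the message fields (`seq`, `payload`), the anomaly rule, and every function name here are hypothetical.

```python
import logging
from typing import Callable

log = logging.getLogger("boundary")  # hypothetical logger name

Message = dict
Handler = Callable[[Message], str]


def is_valid(msg: Message) -> bool:
    # Assumed contract: every message carries a non-negative integer "seq".
    seq = msg.get("seq")
    return isinstance(seq, int) and not isinstance(seq, bool) and seq >= 0


def is_known_anomaly(msg: Message) -> bool:
    # Assumed anomaly: the upstream module sometimes omits "seq" entirely.
    return "seq" not in msg and "payload" in msg


def handle_normal(msg: Message) -> str:
    return f"processed seq={msg['seq']}"


def handle_missing_seq(msg: Message) -> str:
    # The special case is explicit, named, and logged, not silently tolerated.
    log.warning("Known anomaly: message without seq, payload=%r", msg["payload"])
    return "processed without seq"


def dispatch(msg: Message) -> str:
    if is_valid(msg):
        return handle_normal(msg)
    if is_known_anomaly(msg):
        return handle_missing_seq(msg)
    # Anything else is still rejected immediately.
    raise ValueError(f"Unrecognized invalid message: {msg!r}")
```

Only the one documented anomaly gets its own path here. Every other invalid input still fails fast, so validation is narrowed for a known case and not switched off.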

    We had a discussion in which I reiterated the need to deal with the problem at the boundary, but also made clear that I was willing to help. I outlined my plan to the other side because I suspected that my position was viewed as overly pedantic, and that the fix was perceived as me merely turning off spurious validation of harmless, if incorrect, data. In reality, though, the way I work is largely data-driven, so I explained why the data has to be correct, how behavior is driven by it, and how accommodating this data would mean implementing a special code path.

    I think this gave weight to my position, and it led to a more thorough discussion of the other side’s aversion to fixing the data. It turned out that it was more a weariness of dealing with an error-prone legacy system than an actual obstacle. There was a relatively simple solution; it was just scary to make a change, a mindset that’s fairly entrenched.

    But having aired all challenges and possible solutions, we eventually agreed to fix the data, and so far it seems to have solved our problem. Our integration tests are now passing consistently, but we have also added logging and will continue to monitor it.
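As an illustration of the kind of logging that can support this sort of monitoring, here is a small Python sketch that counts anomalies per sending module. The `AnomalyMonitor` class, the logger name, and the source and kind labels are assumptions made for the example, not the logging we actually added.

```python
import logging
from collections import Counter

log = logging.getLogger("boundary.monitor")  # hypothetical logger name


class AnomalyMonitor:
    """Counts boundary anomalies so that a recurrence shows up in the logs."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record(self, source: str, kind: str) -> None:
        # Key by (sending module, anomaly kind) and log the running total.
        self.counts[(source, kind)] += 1
        log.warning(
            "anomaly source=%s kind=%s total=%d",
            source, kind, self.counts[(source, kind)],
        )

    def total(self, source: str) -> int:
        # Sum over all anomaly kinds reported for one sending module.
        return sum(n for (s, _), n in self.counts.items() if s == source)
```

If the count for a module stays at zero across integration runs, that is some evidence the data fix is holding; any increase is logged with enough context to trace it.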

    In summary, I think that for me, the synthesis of both principles is that fail fast is essential for surfacing problems. But once they do surface, robustness means providing a transparent path to continue operation in a way that does not compromise the system. I was able to offer that, and by doing so, won some goodwill from the other side and got the data fixed in the end.

    Again, thanks to everyone who responded. I’m too new to rate comments, but I do appreciate all the perspectives presented.

