The Archive Base

Editorial Team
Asked: May 31, 2026 (2026-05-31T23:14:30+00:00)

We have an external service that continuously sends us data. For the sake of simplicity, let's say each record consists of three tab-delimited strings:

datapointA datapointB datapointC

This data is received by one of our servers and then is forwarded to a processing engine where something meaningful is done with this dataset.

One of the requirements is that the processing engine must not process duplicate records. For instance, suppose that on day 1 the processing engine received A B C, and on day 243 the server received the same A B C again. In that situation, the processing engine will emit a warning, "record already processed", and will not process that record.

There may be a few ways to solve this issue:

  • Store the incoming data in an in-memory HashSet; set membership
    then indicates whether a particular record has already been
    processed. Problems arise because the service runs with zero
    downtime: depending on surges in the data, the collection can
    exceed the available memory. Also, to survive system outages, the
    data needs to be persisted somewhere.

  • Store the incoming data in a database, and process each new record
    only if it is not already present there. This keeps the history
    durable in case of a catastrophe, but adds the overhead of
    maintaining proper indexes, and of aggressive sharding if
    performance becomes an issue.
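The database option can be sketched roughly as follows. This is a minimal illustration, not a recommendation of a particular database: it assumes SQLite (via Python's standard `sqlite3` module) purely for convenience, and the table name `processed` and the helpers `record_key`, `open_store`, and `mark_if_new` are hypothetical names invented for this example. The primary key doubles as the index, and `INSERT OR IGNORE` makes "check and mark" a single statement.

```python
import hashlib
import sqlite3


def record_key(record: str) -> str:
    """Hash the raw tab-delimited line so every key has the same fixed size."""
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the store; the PRIMARY KEY provides the lookup index."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS processed (key TEXT PRIMARY KEY)")
    return conn


def mark_if_new(conn: sqlite3.Connection, record: str) -> bool:
    """Return True if the record was new, False if it was already recorded."""
    with conn:  # commits on success, rolls back on an exception
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed (key) VALUES (?)",
            (record_key(record),),
        )
    # One row changed means the key was inserted; zero means it already existed.
    return cur.rowcount == 1


if __name__ == "__main__":
    conn = open_store()  # pass a file path instead of ":memory:" for durability
    for line in ["A\tB\tC", "X\tY\tZ", "A\tB\tC"]:
        if mark_if_new(conn, line):
            print("processing", line.split("\t"))
        else:
            print("record already processed")
```

Storing a fixed-size hash instead of the full record keeps the index small; whether that trade-off is acceptable depends on how long the real records are.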

...or some other technique.

Can somebody point out some case studies, established patterns, or practices for solving this particular issue?

Thanks


1 Answer

  1. Editorial Team
    Added an answer on May 31, 2026 at 11:14 pm (2026-05-31T23:14:31+00:00)

    You need some kind of backing store for persistence, whatever the solution, so that work has to be done regardless. But it doesn't have to be an SQL database for something so simple; an alternative to memcached that can persist to disk would be enough.
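As a rough sketch of what a non-SQL, disk-backed store could look like, the example below uses Python's standard `dbm` module as a stand-in for a persistent key-value store. The answer does not name a specific product, so this choice is an assumption, and the class `SeenStore` and its method `check_and_mark` are hypothetical names.

```python
import dbm
import hashlib


class SeenStore:
    """Disk-backed set of already-processed records (illustrative only)."""

    def __init__(self, path: str):
        # "c" opens the database for read/write, creating it if missing.
        self._db = dbm.open(path, "c")

    def check_and_mark(self, record: str) -> bool:
        """Return True if the record is new (and mark it), False if seen."""
        key = hashlib.sha256(record.encode("utf-8")).digest()
        if key in self._db:
            return False
        self._db[key] = b"1"  # the value is irrelevant; only the key matters
        return True

    def close(self) -> None:
        self._db.close()
```

Because the keys live on disk, the history survives a restart: reopening the same path and calling `check_and_mark` on an old record returns `False`. This sketch is not safe for concurrent writers; a real deployment would need a store that is.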

    In addition to that, you could consider Bloom filters to reduce the in-memory footprint. These can give false positives (but never false negatives), so whenever the filter reports a match you would need to fall back to a second, slower but reliable layer (which could be the disk store).
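The two-layer idea can be sketched as below. This is a simplified, hand-rolled Bloom filter for illustration, not a tuned implementation: the sizes (`num_bits`, `num_hashes`) are arbitrary assumptions, the names `BloomFilter` and `Deduplicator` are hypothetical, and `store` is a stand-in for the reliable disk layer (anything supporting `in` and `.add()`, such as a plain `set` in a test).

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array; may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive all bit positions from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


class Deduplicator:
    """Bloom filter in front of a slower, authoritative store."""

    def __init__(self, store):
        self.bloom = BloomFilter()
        self.store = store      # stand-in for the reliable disk-backed layer
        self.slow_lookups = 0   # counts how often the slow layer was consulted

    def is_duplicate(self, record: str) -> bool:
        if not self.bloom.might_contain(record):
            # Definitely new: no need to read the slow layer at all.
            self.bloom.add(record)
            self.store.add(record)
            return False
        # Possibly seen: only the slow layer can confirm.
        self.slow_lookups += 1
        if record in self.store:
            return True
        # False positive from the filter: record it as new.
        self.bloom.add(record)
        self.store.add(record)
        return False
```

The filter only saves reads for records that are new; every record is still written to the slow layer so that the history is durable. After a restart the filter would have to be rebuilt from that layer, since the bit array here lives only in memory.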

    Finally, the need for idempotent behaviour is really common in messaging and enterprise systems, so searching for "idempotent" turns up more papers and ideas (not sure if you're aware that it is a useful search term).

© 2021 The Archive Base. All Rights Reserved