The Archive Base

Editorial Team
Asked: May 27, 2026 (2026-05-27T02:48:43+00:00)

I was writing a generic enumerator to scrape sites as an exercise and I

I wrote a generic enumerator for scraping sites as an exercise. It is complete and works fine, but I have a question about its design. The code is at https://github.com/mindreader/scrape-enumerator if you want to look at it.

The basic idea: I wanted an enumerator that yields site-defined entries from paginated sources such as search engines and blogs, where each fetched page holds, say, 25 entries and you want them one at a time. I also didn’t want to rewrite the plumbing for every site, so I wanted a generic interface. This is what I came up with (it uses type families):

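{-# LANGUAGE TypeFamilies #-}

class SiteEnum a where
  -- The site-specific type of a single scraped entry.
  type Result a :: *
  -- Where the enumerator gets its page URLs from.
  urlSource :: a -> InputUrls (Int,Int)
  -- Extract the entries from the body of a fetched page.
  enumResults :: a -> L.ByteString -> Maybe [Result a]

data InputUrls state
  = UrlSet [URL]                                      -- pregenerated (possibly infinite) list
  | UrlFunc state (state -> (state,URL))              -- initial state plus a URL generator
  | UrlPageDependent URL (L.ByteString -> Maybe URL)  -- first URL plus a next-link finder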

To work for every type of site, the enumerator needs a URL source, which can take one of three forms: a list (possibly infinite) of pregenerated URLs; an initial state plus a function that generates URLs from it (for example, when the URLs contain &page=1, &page=2, and so on); or, for really awkward sites like Google, an initial URL plus a function that searches the page body for the next link. A site makes a data type an instance of SiteEnum and gives Result a site-dependent type; the enumerator then handles all the I/O, so you don’t have to think about it. This works perfectly, and I have implemented one site with it.
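As a rough illustration of the three kinds of URL source, here is a minimal sketch. Several things in it are assumptions and not taken from the repository: URL is a plain String, L is Data.ByteString.Lazy.Char8, the example.com addresses are made up, the (Int,Int) state is guessed to mean (page number, entries per page), and the "next:" line convention and the firstUrl helper are hypothetical.

```haskell
import Data.Maybe (listToMaybe)
import qualified Data.ByteString.Lazy.Char8 as L

-- Assumption: the real URL type is not shown, so String stands in for it.
type URL = String

-- Same shape as the datatype in the question.
data InputUrls state
  = UrlSet [URL]
  | UrlFunc state (state -> (state, URL))
  | UrlPageDependent URL (L.ByteString -> Maybe URL)

-- 1. A fixed list of URLs known up front (hypothetical addresses).
fixedPages :: InputUrls ()
fixedPages = UrlSet ["http://example.com/a", "http://example.com/b"]

-- 2. An initial state plus a step function that yields the next state and URL.
--    The pair is guessed to mean (page number, entries per page).
pagedSearch :: InputUrls (Int, Int)
pagedSearch = UrlFunc (1, 25) step
  where
    step (page, perPage) =
      ((page + 1, perPage), "http://example.com/search?page=" ++ show page)

-- 3. A start URL plus a function that finds the next link in a page body.
--    Hypothetical convention: the first line of the body is "next:<url>".
followLinks :: InputUrls ()
followLinks = UrlPageDependent "http://example.com/start" nextLink
  where
    nextLink body =
      case L.lines body of
        (l:_) | L.pack "next:" `L.isPrefixOf` l -> Just (L.unpack (L.drop 5 l))
        _ -> Nothing

-- Hypothetical helper: the first URL each kind of source would fetch.
firstUrl :: InputUrls state -> Maybe URL
firstUrl (UrlSet urls)          = listToMaybe urls
firstUrl (UrlFunc s step)       = Just (snd (step s))
firstUrl (UrlPageDependent u _) = Just u

main :: IO ()
main = do
  print (firstUrl fixedPages)
  print (firstUrl pagedSearch)
  print (firstUrl followLinks)
```

Note that fixedPages and followLinks need some concrete type in place of state even though neither constructor uses it; () is chosen here only because something has to be written.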

My question is about an annoyance in this implementation: the InputUrls datatype. When I use UrlFunc, everything is golden. When I use UrlSet or UrlPageDependent, the state type variable is not determined by anything, and I have to annotate the value as :: InputUrls () for it to compile. This seems totally unnecessary, because the way the program is written, that type variable will never be used for the majority of sites, but I don’t know how to get around it. I find that I want to use types like this in many different contexts, and I always end up with stray type variables that are needed only by certain constructors of the datatype. It doesn’t feel like I should be using them this way. Is there a better way of doing this?
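One way this kind of error can arise is sketched below. It assumes the code consuming an InputUrls value puts a class constraint on the state type; the describe function and its Show constraint are hypothetical, the datatype is cut down to two constructors, and the actual repository may hit the problem differently.

```haskell
-- Assumption: the real URL type is not shown, so String stands in for it.
type URL = String

-- Cut-down version of the datatype; the page-dependent case is omitted.
data InputUrls state
  = UrlSet [URL]
  | UrlFunc state (state -> (state, URL))

-- Hypothetical consumer with a class constraint on the state type.
describe :: Show state => InputUrls state -> String
describe (UrlSet urls) = show (length urls) ++ " fixed urls"
describe (UrlFunc s _) = "generated urls, starting from state " ++ show s

main :: IO ()
main = do
  -- Fine: the state type is fixed by the first argument of UrlFunc.
  putStrLn (describe (UrlFunc (1 :: Int) (\n -> (n + 1, "page" ++ show n))))
  -- Without the annotation, compiled GHC reports an ambiguous type variable:
  -- nothing in `UrlSet [...]` says which Show instance to pick.
  putStrLn (describe (UrlSet ["a", "b"] :: InputUrls ()))
```

The annotation carries no information the program needs; it only gives the type checker a concrete type for a variable that the UrlSet constructor never mentions.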

1 Answer

  1. Editorial Team
    Added an answer on May 27, 2026 at 2:48 am (2026-05-27T02:48:44+00:00)

    Why do you need the UrlFunc case at all? From what I understand, the only thing you’re doing with the state function is using it to build a list like the one in UrlSet anyway, so instead of storing the state function, just store the resulting list. That way, you can eliminate the state type variable from your data type, which should eliminate the ambiguity problems.
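A minimal sketch of that suggestion follows. It assumes URL is a plain String and L is Data.ByteString.Lazy.Char8; the urlFunc smart constructor and the example.com address are hypothetical names introduced for illustration, not part of the original code.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L

-- Assumption: the real URL type is not shown, so String stands in for it.
type URL = String

-- InputUrls with the UrlFunc case, and with it the state type variable, removed.
data InputUrls
  = UrlSet [URL]
  | UrlPageDependent URL (L.ByteString -> Maybe URL)

-- Hypothetical smart constructor standing in for UrlFunc: it runs the step
-- function lazily and keeps only the resulting (infinite) list of URLs.
urlFunc :: state -> (state -> (state, URL)) -> InputUrls
urlFunc s0 step = UrlSet (go s0)
  where
    go s = let (s', url) = step s in url : go s'

-- The state type (Int here) is visible only inside this definition.
pagedSearch :: InputUrls
pagedSearch =
  urlFunc (1 :: Int)
          (\page -> (page + 1, "http://example.com/search?page=" ++ show page))

main :: IO ()
main =
  case pagedSearch of
    UrlSet urls              -> mapM_ putStrLn (take 3 urls)
    UrlPageDependent start _ -> putStrLn start
```

Because the list is built lazily, an unbounded generator is still fine as long as the enumerator consumes URLs one at a time and never forces the whole list.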

© 2021 The Archive Base. All Rights Reserved