One day on #haskell, someone mentioned the idea that a string’s type should change when the string changes. This reminded me of some code in my project. It keeps bugging me, and I couldn’t articulate why. The reason, I now surmise, is that I am not implementing this idea. Here’s the code, followed by some ideas for how I can begin to change it for the better. What I would like is some input to the effect of “You’re on the right track,” “No, way off,” or “Here’s this other thing you should be mindful of.”
> processHTML :: String -> [[String]]
> processHTML htmlFILE =
>   let parsedHTML    = parseTags htmlFILE
>       allTagOpens   = sections (~== TagOpen "a" [("href","")]) parsedHTML
>       taggedTEXT    = head $ map (filter isTagOpen) allTagOpens
>       allHREFS      = map (fromAttrib "href") taggedTEXT
>       allPotentials = map (dropWhile (/= '?')) allHREFS
>       removedNulls  = filter (not . null) allPotentials
>       removedQs     = map (drop 1) removedNulls
>   in  map (splitOn "&") removedQs
The idea here is I’m taking raw HTML and filtering out everything I don’t want until I get what I do want. Each let binding represents a stage in filtering. This could be the foundation of a data structure, like so:
> data Stage = Stage1 Foo
>            | Stage2 Bar
>            | Stage3 Baz
Where Foo, Bar, and Baz are the appropriate data types (a String or a TagOpen, for example), depending on what stage I am at in the filtering process. I could use this data type to get precise information when I add the error-handling code. Plus, it could help me keep track of what is happening when.
Feedback appreciated.
You’re on the right track.
First of all, when you’re building a long pipeline like this, you may prefer to compose functions directly:
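A sketch of what that composed pipeline could look like, assuming the question’s processHTML relies on the tagsoup package (Text.HTML.TagSoup) and on splitOn from the split package (Data.List.Split). Both imports are assumptions, since the question does not show them. Each binding of the let block becomes one step in the chain:

```haskell
import Control.Category ((>>>))
import Data.List.Split (splitOn)   -- assumed source of splitOn
import Text.HTML.TagSoup           -- assumed source of parseTags, sections, etc.

-- Same stages as the let-based version, composed left to right.
processHTML :: String -> [[String]]
processHTML =
      parseTags
  >>> sections (~== TagOpen "a" [("href", "")])
  >>> map (filter isTagOpen)
  >>> head                          -- still partial: fails if no <a href> is found
  >>> map (fromAttrib "href")
  >>> map (dropWhile (/= '?'))      -- keep everything from the first '?'
  >>> filter (not . null)           -- drop hrefs without a query string
  >>> map (drop 1)                  -- strip the leading '?'
  >>> map (splitOn "&")
```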
This uses Control.Category.(>>>), which is just (at least in this case) flipped function composition.
Now for your actual question, it looks like you’re using the tagsoup package for parsing tags. This already does some type changing throughout the pipeline: parseTags generates a Tag, some functions operate on it, and then fromAttrib goes back to a String.
Depending on how much work you’ll be doing, I might create a newtype:
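A minimal sketch of that idea, assuming QE stands for one query element and simply wraps a String. The wrapped type and the imports are assumptions. It is the question’s processHTML written as a composed pipeline, with the result type changed to [[QE]]:

```haskell
import Control.Category ((>>>))
import Data.List.Split (splitOn)   -- assumed source of splitOn
import Text.HTML.TagSoup           -- assumed source of parseTags, sections, etc.

-- One query element such as "key=value" (assumed to wrap a plain String).
newtype QE = QE String
  deriving (Eq, Show)

processHTML :: String -> [[QE]]
processHTML =
      parseTags
  >>> sections (~== TagOpen "a" [("href", "")])
  >>> map (filter isTagOpen)
  >>> head                          -- still partial: fails if no <a href> is found
  >>> map (fromAttrib "href")
  >>> map (dropWhile (/= '?'))
  >>> filter (not . null)
  >>> map (drop 1)
  >>> map (map QE . splitOn "&")    -- tag every element with the QE newtype
```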
Only the last line has changed here, to add the QE newtype tags to each element.
Depending on your use case, you could take a different approach. For example, you may want to add more information to the URI instead of just collecting the query variables. Or you might want to fold over the query items and produce a Map String String directly.
Finally, if you’re trying to gain type safety, you usually wouldn’t make a sum type such as your
Stage. This is because each constructor creates a value of the same type, so the compiler can’t do any extra checking. Instead you’d create a separate type for each stage, so that passing a value from the wrong stage becomes a compile-time error.
It’s easy to create very fine-grained classes and data structures, but at some point they get out of hand. For example, in your functions allPotentials, removedNulls, and removedQs, you may want to just work on Strings. There isn’t a lot of semantic meaning that can be attached to the output of those stages, especially as they’re partial steps within a slightly larger process.
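To make the last two suggestions concrete, here is a minimal sketch using only base and containers. The names Href, QueryString, queryOf, splitOnChar, and queryMap are hypothetical, chosen for illustration. The two stages that carry real meaning get their own newtype, the small intermediate steps stay on plain Strings, and the query items are folded into a Map String String:

```haskell
import qualified Data.Map as Map

-- Hypothetical stage types: one newtype per meaningful stage.
newtype Href        = Href String        deriving (Eq, Show)
newtype QueryString = QueryString String deriving (Eq, Show)

-- Take the part after the first '?', if there is one.
-- The dropWhile / drop steps stay on plain Strings inside this function.
queryOf :: Href -> Maybe QueryString
queryOf (Href url) =
  case dropWhile (/= '?') url of
    []       -> Nothing
    _ : rest -> Just (QueryString rest)

-- Base-only stand-in for splitOn with a single-character separator.
splitOnChar :: Char -> String -> [String]
splitOnChar sep s =
  case break (== sep) s of
    (chunk, [])       -> [chunk]
    (chunk, _ : rest) -> chunk : splitOnChar sep rest

-- Fold the "key=value" items into a Map. Empty items are skipped.
-- Because this is a right fold, the leftmost occurrence of a repeated key wins.
queryMap :: QueryString -> Map.Map String String
queryMap (QueryString q) =
  foldr insertItem Map.empty (filter (not . null) (splitOnChar '&' q))
  where
    insertItem item acc =
      let (key, value) = break (== '=') item
      in  Map.insert key (drop 1 value) acc
```

With these types, passing an Href where a QueryString is expected, as in queryMap (Href "..."), is rejected by the compiler. A single Stage sum type cannot give you that check.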