I know that flattening a nested list has been covered in great detail before; however, I think my task is a bit different, and I couldn’t find anything that covers it.
I am writing a scraper, and as output I get a nested list. The top level list elements are supposed to become rows for data in spreadsheet form. However, since the nested lists are often of different lengths, I need to expand them before flattening the list.
Here’s an example. I have
[ [ "id1", [["x", "y", "z"], [1, 2]], ["a", "b", "c"]],
[ "id2", [["x", "y", "z"], [1, 2, 3]], ["a", "b"]],
[ "id3", [["x", "y"], [1, 2, 3]], ["a", "b", "c", ""]] ]
The output I ultimately want is
[[ "id1", "x", "y", "z", 1, 2, "", "a", "b", "c", ""],
[ "id2", "x", "y", "z", 1, 2, 3, "a", "b", "", ""],
[ "id3", "x", "y", "", 1, 2, 3, "a", "b", "c", ""]]
However, an intermediate list like this
[ [ "id1", [["x", "y", "z"], [1, 2, ""]], ["a", "b", "c", ""]],
[ "id2", [["x", "y", "z"], [1, 2, 3]], ["a", "b", "", ""]],
[ "id3", [["x", "y", ""], [1, 2, 3]], ["a", "b", "c", ""]] ]
which I can then simply flatten would also be fine.
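For illustration, flattening such an already padded intermediate list could look like the minimal sketch below. The helper name flatten_row is hypothetical, and the sketch assumes every nested container is a plain Python list.

```python
def flatten_row(row):
    """Recursively flatten one row; assumes nested containers are lists."""
    flat = []
    for item in row:
        if isinstance(item, list):
            # Nested list: flatten it and splice its items in place.
            flat.extend(flatten_row(item))
        else:
            # Scalar cell (string, number, or "" padding): keep as-is.
            flat.append(item)
    return flat


# The padded intermediate list from the example above.
padded = [
    ["id1", [["x", "y", "z"], [1, 2, ""]], ["a", "b", "c", ""]],
    ["id2", [["x", "y", "z"], [1, 2, 3]], ["a", "b", "", ""]],
    ["id3", [["x", "y", ""], [1, 2, 3]], ["a", "b", "c", ""]],
]

table = [flatten_row(row) for row in padded]
```

Each entry of table is then one spreadsheet row of equal length.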
The top-level list elements (rows) are built in every iteration, and appended to the full list. I guess it is easier to transform the full list at the end?
The structure in which elements are nested should be the same in every row; however, I cannot be certain of it at this point. I guess I have a problem if the structure were to look like this:
[ [ "id1", [["x", "y", "z"], [1, 2]], ["a", "b", "c"]],
[ "id2", [["x", "y", "z"], [1, 2, 3]], ["bla"], ["a", "b"]],
[ "id3", [["x", "y"], [1, 2, 3]], ["a", "b", "c", ""]] ]
which should become
[[ "id1", "x", "y", "z", 1, 2, "", "", "a", "b", "c", ""],
[ "id2", "x", "y", "z", 1, 2, 3, "bla", "a", "b", "", ""],
[ "id3", "x", "y", "", 1, 2, 3, "", "a", "b", "c", ""]]
Thanks for any comments, and please excuse me if this is trivial; I am rather new to Python.
I’ve got a simple solution for the “same structure” case, using a recursive generator and the izip_longest function from itertools. This code is for Python 2, but with a few tweaks (noted in comments) it can be made to work on Python 3:
In Python 3.3 it will become even simpler, thanks to PEP 380, which will allow the recursive step, for e in _flattengen(element): yield e, to become yield from _flattengen(element).
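A minimal Python 3 sketch of the “same structure” approach described above. It is an illustration under stated assumptions, not necessarily the exact code this answer refers to: the name _flattengen comes from the text, the wrapper name flatten is my own, every row is assumed to nest its lists in the same way, and "" is assumed to be the padding value.

```python
from itertools import zip_longest  # named izip_longest on Python 2


def _flattengen(rows):
    """Yield one tuple per output column, padding short lists with ""."""
    # zip_longest walks all rows in parallel, position by position.
    for element in zip_longest(*rows, fillvalue=""):
        if isinstance(element[0], list):
            # A tuple of sublists (one per row): recurse to pad and expand.
            # Assumes that if the first row has a list here, all rows do.
            yield from _flattengen(element)
        else:
            # A tuple of scalars: this is one finished output column.
            yield element


def flatten(nested_list):
    """Hypothetical wrapper: transpose the columns back into rows."""
    return [list(row) for row in zip(*_flattengen(nested_list))]


data = [
    ["id1", [["x", "y", "z"], [1, 2]], ["a", "b", "c"]],
    ["id2", [["x", "y", "z"], [1, 2, 3]], ["a", "b"]],
    ["id3", [["x", "y"], [1, 2, 3]], ["a", "b", "c", ""]],
]

result = flatten(data)
```

Because the generator produces columns rather than rows, the padding falls out of zip_longest for free, and the final zip(*...) only has to transpose. The sketch does not handle the case where one row contains an extra sublist that the others lack.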