This relates to a project to convert a 2-way ANOVA program in SAS to Python.
I pretty much started trying to learn the language on Thursday, so I know I have a lot of room for improvement. If I’m missing something blatantly obvious, by all means, let me know. I haven’t got Sage or NumPy up and running yet, so right now this is all quite vanilla (portable) Python 2.6.1.
Primary query: I need a good set of list comprehensions that can extract the data as lists of samples, grouped by factor A, by factor B, overall, and by each combination of levels of factors A and B (AxB).
After some work, the data is in the following form (3 layers of nested lists):
response[a][b][n]
(meaning [a1[b1[n1, …, nN], …, bB[n1, …, nN]], …, aA[b1[n1, …, nN], …, bB[n1, …, nN]]].
Hopefully that’s clear.)
Factor levels in my example case: A=3 (0-2), B=8 (0-7), N=8 (0-7)
byA= [[a[i] for i in range(b)] for a[b] in response]
(Can someone explain why this syntax works? I stumbled into it trying to see what the parser would accept. I haven’t seen that syntax attached to that behavior elsewhere, but it’s really nice. Any good links to sites or books on the topic would be appreciated. Edit: persistence of variables between runs explained this oddity. It doesn’t actually work.)
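For what it’s worth, for a[b] in response parses because a for target can be any assignment target, including a subscription: each element of response is simply stored into a[b] in turn, so it only appeared to work while a and b were left over from an earlier run. A minimal sketch of a working byA, assuming a small made-up response with A=2, B=3, N=2:

```python
# Hypothetical data laid out as response[a][b][n] with A=2, B=3, N=2.
response = [[[1, 2], [3, 4], [5, 6]],
            [[7, 8], [9, 10], [11, 12]]]

# For each level of A, gather every observation across all B levels and replicates.
byA = [[n for rowB in rowA for n in rowB] for rowA in response]
print(byA)  # [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]

# Why "for a[b] in ..." was accepted: the loop target may be a subscription,
# so each element is assigned to a[b], overwriting it on every pass.
a = [None]
b = 0
for a[b] in ['first', 'second']:
    pass
print(a)  # ['second']
```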
byB=lstcrunch([[Bs[i] for i in range(len(Bs)) ]for Bs in response])
(It bears noting that zip(*response) almost does what I want. The above version isn’t actually working, as I recall. I haven’t run it through a careful test yet.)
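zip(*response) regroups the outer two levels, so its j-th tuple holds response[0][j], response[1][j], and so on; it only “almost” works because each tuple still contains separate replicate lists. A sketch that flattens each tuple into one list per level of B, assuming the same kind of made-up response:

```python
# Hypothetical data laid out as response[a][b][n] with A=2, B=3, N=2.
response = [[[1, 2], [3, 4], [5, 6]],
            [[7, 8], [9, 10], [11, 12]]]

# zip(*response) yields one tuple per level of B, each holding that B cell
# for every level of A: ([1, 2], [7, 8]), ([3, 4], [9, 10]), ([5, 6], [11, 12]).
cells_by_b = list(zip(*response))

# Flatten each tuple so all observations at one level of B sit in a single list.
byB = [[n for cell in group for n in cell] for group in zip(*response)]
print(byB)  # [[1, 2, 7, 8], [3, 4, 9, 10], [5, 6, 11, 12]]
```

Within each group the values are ordered by level of A; for sums, means, and sums of squares the order doesn’t matter.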
byAxB= [item for sublist in response for item in sublist]
(Stolen from a response by Alex Martelli on this site. Again, could someone explain why it works? List comprehension syntax is not very well explained in the texts I’ve been reading.)
ByO= [item for sublist in byAxB for item in sublist]
(Obviously, I simply reused the former comprehension here, because it did what I needed.)
I’d like these to end up as the same data types, at least when looped through by the factor in question, so that the same average/sum/SS/etc. functions can be applied to each.
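Because every grouping comes out as a list of flat lists of numbers, one set of helpers can be applied to each group unchanged. A minimal sketch, assuming hypothetical mean and ss (sum of squared deviations from the mean) helpers and a small made-up response with A=2, B=2, N=2:

```python
def mean(xs):
    """Arithmetic mean; float() keeps Python 2 from doing integer division."""
    return float(sum(xs)) / len(xs)

def ss(xs):
    """Sum of squared deviations from the group mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

# Hypothetical data laid out as response[a][b][n].
response = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
byA = [[n for rowB in rowA for n in rowB] for rowA in response]
overall = [n for rowA in response for rowB in rowA for n in rowB]

# The same functions work on every group and on the overall list.
print([mean(g) for g in byA])  # [2.5, 6.5]
print([ss(g) for g in byA])    # [5.0, 5.0]
print(ss(overall))             # 42.0
```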
This could easily be replaced by something cleaner:
def lstcrunch(Dlist):
    """Returns a list containing the entire
    contents of whatever is passed in,
    reduced by one level.
    If a rectangular array, it reduces a dimension by one.
    lstcrunch(DataSet[a][b]) -> DataOutput[a]
    [[1, 2], [[2, 3], [2, 4]]] -> [1, 2, [2, 3], [2, 4]]
    """
    flat = []
    if islist(Dlist):  # top level is a list
        for i in Dlist:
            if islist(i):
                flat += i        # splice the sub-list's elements in
            else:
                flat.append(i)   # keep non-list items as they are
        return flat
    else:
        return [Dlist]           # wrap a non-list input in a list
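For the rectangular case, one cleaner option is itertools.chain.from_iterable from the standard library (available since Python 2.6), which removes exactly one level of nesting. A sketch, with lstcrunch_chain as a hypothetical name; unlike lstcrunch, it assumes every element is itself a list, so a bare number would raise TypeError:

```python
import itertools

def lstcrunch_chain(dlist):
    """Hypothetical variant of lstcrunch: remove one level of nesting.

    Assumes every element of dlist is iterable (e.g. a list of lists);
    non-iterable items such as bare numbers raise TypeError.
    """
    return list(itertools.chain.from_iterable(dlist))

data = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]   # data[a][b] -> list of replicates
print(lstcrunch_chain(data))          # [[1, 2], [3, 4], [5, 6], [7, 8]]
print(lstcrunch_chain([[1, 2], [3]]))  # [1, 2, 3]
```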
Oh, while I’m on the topic, what’s the preferred way of identifying a variable as a list?
I have been using:
def islist(a):
    "Returns 'True' if input is a list and 'False' otherwise"
    return type(a) == type([])
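The usual idiom is the built-in isinstance, which, unlike comparing type() results, also accepts subclasses of list. A sketch of islist rewritten that way (MyList is a hypothetical subclass used only for the demonstration):

```python
def islist(a):
    """Return True if a is a list or a subclass of list, False otherwise."""
    return isinstance(a, list)

class MyList(list):
    """Hypothetical list subclass, only for the comparison below."""
    pass

print(islist([1, 2]))              # True
print(islist((1, 2)))              # False: a tuple is not a list
print(islist(MyList()))            # True
print(type(MyList()) == type([]))  # False: the type() comparison rejects subclasses
```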
Parting query:
Is there a way to explicitly force a shallow copy to convert to a deep copy? Or, similarly, when copying into a variable, is there a way of declaring that the assignment is supposed to replace the pointer too, and not merely the value (so that the assignment won’t propagate to other shallow copies)? The sharing behavior might be useful from time to time as well, so being able to control when it does or doesn’t occur sounds really nice.
(I really stepped all over myself when I prepared my table for inserting by calling:
response=[[[0]*N]*B]*A
)
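Multiplying a list repeats references to the same inner list rather than copying it, which is what made response=[[[0]*N]*B]*A misbehave. A sketch with small hypothetical sizes, showing the aliasing, a comprehension that builds independent rows, and copy.deepcopy from the standard library, which recursively copies nested lists:

```python
import copy

A, B, N = 2, 2, 2   # small hypothetical sizes

# List multiplication repeats references: every row below is the same object.
aliased = [[[0] * N] * B] * A
aliased[0][0][0] = 99
print(aliased)      # [[[99, 0], [99, 0]], [[99, 0], [99, 0]]]

# A comprehension evaluates [0] * N afresh each time, giving independent rows.
independent = [[[0] * N for _ in range(B)] for _ in range(A)]
independent[0][0][0] = 99
print(independent)  # [[[99, 0], [0, 0]], [[0, 0], [0, 0]]]

# Shallow copy: new outer list, shared inner lists. Deep copy: nothing shared.
original = [[1, 2], [3, 4]]
shallow = list(original)
deep = copy.deepcopy(original)
original[0][0] = 'x'
print(shallow[0][0])  # x
print(deep[0][0])     # 1
```

Plain assignment (x = ...) always rebinds the name and never touches other aliases; it is mutation through an index (x[i] = ...) that shows up in every shallow copy.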
Edit:
Further investigation led to most of this working fine. I’ve since made the class and tested it, and it works fine. I’ll leave the list comprehension forms intact for reference.
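def byB(array_a_b_c):
    y = range(len(array_a_b_c))       # indices for levels of A
    x = range(len(array_a_b_c[0]))    # indices for levels of B
    # one list per level of B, holding every replicate at every level of A
    return [[array_a_b_c[i][j][k]
             for k in range(len(array_a_b_c[0][0]))
             for i in y]
            for j in x]
def byA(array_a_b_c):
    # one list per level of A, holding every replicate at every level of B
    return [[repn for rowB in rowA for repn in rowB]
            for rowA in array_a_b_c]
def byAxB(array_a_b_c):
    # one list of replicates per A x B cell
    return [rowB for rowA in array_a_b_c
            for rowB in rowA]
def byO(array_a_b_c):
    # a single flat list of every observation
    return [rep
            for rowA in array_a_b_c
            for rowB in rowA
            for rep in rowB]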
def gen3d(row, col, inner):
    """Produces a 3D nested array without any naughty shallow copies.
    Indexed as [row][col][inner] so that lprn can split on the outermost
    level for easy display."""
    return [[[k for k in range(inner)]
             for i in range(col)]
            for j in range(row)]
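def lprn(X):
    """This prints a list by lines.
    Not fancy, but works"""
    if isiterable(X):
        for line in X:
            print(line)
    else:
        print(X)
def isiterable(a):
    # True for lists, tuples, dicts, etc. In Python 2, str has no
    # __iter__, so strings count as non-iterable here.
    return hasattr(a, "__iter__")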
Thanks to everyone who responded. I already see a noticeable improvement in my code quality due to improvements in my gnosis. Further thoughts are still appreciated, of course.
I am sure A.M. will be able to give you a good explanation. Here is my stab at it while waiting for him to turn up.
I would approach this from left to right. Take these four words: for sublist in response
I hope you can see the resemblance to a regular for loop. These four words are doing the groundwork for performing some action on each sublist in response. It appears that response is a list of lists. In that case, sublist would be a list for each iteration through response.
Next comes for item in sublist. This is again another for loop in the making. Given that we first heard about sublist in the previous “loop”, this would indicate that we are now traversing through sublist, one item at a time. Written out without a comprehension, these become two nested for loops, with for item in sublist inside for sublist in response.
Next, we look at the remaining words: [, item and ]. This effectively means: collect the items in a list and return the resulting list.
Whenever you have trouble creating or understanding list comprehensions, write the relevant for loops out and then compress them.
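Applied to the comprehension from the question, the write-out-then-compress approach looks like this sketch (response here is a small made-up list of lists):

```python
# Hypothetical stand-in for a list of lists.
response = [[1, 2], [3, 4, 5], [6]]

# The comprehension written out as ordinary nested for loops.
flat_loops = []
for sublist in response:        # outer loop: the first "for" clause
    for item in sublist:        # inner loop: the second "for" clause
        flat_loops.append(item)

# The same loops compressed: the "for" clauses keep their left-to-right
# nesting order, and the expression being collected moves to the front.
flat_comp = [item for sublist in response for item in sublist]

print(flat_loops)               # [1, 2, 3, 4, 5, 6]
print(flat_comp == flat_loops)  # True
```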
Dive Into Python has a section dedicated to list comprehensions. There is also this nice tutorial to read through.
Update
I forgot to say something. List comprehensions are another way of achieving what has traditionally been done using map and filter. It would be a good idea to understand how map and filter work if you want to improve your comprehension-fu.
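For comparison, a sketch pairing map and filter with equivalent comprehensions (the values list is made up for illustration; list() is only needed on Python 3, where map and filter return iterators, and is harmless on Python 2):

```python
values = [1, 2, 3, 4, 5, 6]

# map: apply a function to every element.
squares_map = list(map(lambda x: x * x, values))
squares_comp = [x * x for x in values]

# filter: keep only the elements for which the predicate is true.
evens_filter = list(filter(lambda x: x % 2 == 0, values))
evens_comp = [x for x in values if x % 2 == 0]

# Both together: filter first, then map; a comprehension does this with an "if" clause.
both_mapfilter = list(map(lambda x: x * x, filter(lambda x: x % 2 == 0, values)))
both_comp = [x * x for x in values if x % 2 == 0]

print(both_comp)  # [4, 16, 36]
```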