I have a text file (test.data) that contains some values and a class name, for example:
4.5,3.5,U1
4.5,10.5,U2
4.5,6,U1
3.5,10.5,U2
3.5,10.5,U2
5,7,U1
7,6.5,U1
I need to arrange this data as a matrix, where the first columns of each row are data and the last column is the class. So I started with this code:
import csv

result = []
with open('test.data', 'r') as f:
    for row in csv.reader(f):
        result.append(row)
print(result)
output:
[['4.5', '3.5', 'U1'], ['4.5', '10.5', 'U2'], ['4.5', '6', 'U1'], ['3.5', '10.5', 'U2'], ['3.5', '10.5', 'U2'], ['5', '7', 'U1'], ['7', '6.5', 'U1']]
This all works fine, but now I need to group this data into one matrix per class. In this case I want to build:
test data=[data1, data2,.....,class name1]
test data2=[data1, data2,.....,class name2]...
I need this “matrix” (test data, test data2) because I will then pick 2/3 of the rows from each test data set, which will be named “chosen”; the other 1/3 must stay in the test data.
So what I need as output:
chosen=[data,data,......class name1] # 2/3 from every test data
test data=[data1, data2,.....,class name1] # other 1/3 from test data
test data2=[data1, data2,.....,class name2] # other 1/3 from test data 2
...
Many thanks for your help.
EDIT2:
If I use your code I get:
{
    'U1': [
        ['4.5','3.5'],
        ['4.5','6'],
        ['5','7'],
        ['7','6.5']
    ],
    'U2': [
        ['4.5','10.5'],
        ['3.5','10.5'],
        ['3.5','10.5']
    ]
}
But my data does not always look like this:
4.5,3.5,U1
4.5,10.5,U2
4.5,6,U1
3.5,10.5,U2
3.5,10.5,U2
5,7,U1
7,6.5,U1
I also have:
4.5,3.5,4.5,10.5, U1
3.5,10.5,3.5,10.5,U2
4.5,12.5,3.5,12.5,U2
……
So I don't know in advance that the class is at index 2, as your code assumes, but I do know that the last column is always the class.
So how can I change your code:
import csv

reader = csv.reader(open('test.data', 'r'))
result = {}
for row in reader:
    uclass = row[2]  # -------> must be the last column, not index 2!
    if uclass in result:
        # ----> not just 2 columns; other data has, for example, 5 columns
        result[uclass].append([row[0], row[1]])
    else:
        # ----> same problem here: only the first 2 columns are kept
        result[uclass] = [[row[0], row[1]]]
print(repr(result))
Edit: Original code snippet modified to handle N-column input (last is class). This requirement was mentioned by the OP in a later reply.
I’m still not completely sure about the second half of your problem.
Perhaps something is missing from your explanation.
If I understand correctly, you want a separate list for each class?
If this is the case a dictionary should do what you want:
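The listing that belonged here appears to be missing, so the following is only a sketch of the dictionary approach, not necessarily the original code. It assumes Python 3, that the last field of every row is the class, and that every other field is data. The helper name group_by_class and the inline sample lines are made up for illustration.

```python
import csv


def group_by_class(rows):
    """Group rows by their last field (hypothetical helper, not from the original post).

    rows is any iterable of lists of strings, for example a csv.reader.
    Returns a dict mapping class name -> list of data rows (class removed).
    """
    result = {}
    for row in rows:
        if not row:  # csv.reader yields an empty list for blank lines
            continue
        # strip() tolerates a stray space before the class, as in ", U1"
        uclass = row[-1].strip()
        # row[:-1] keeps every column except the last, however many there are
        result.setdefault(uclass, []).append(row[:-1])
    return result


# csv.reader accepts any iterable of lines, so a list stands in for the file here.
sample = [
    "4.5,3.5,U1",
    "4.5,10.5,U2",
    "4.5,6,U1",
    "3.5,10.5,U2",
    "3.5,10.5,U2",
    "5,7,U1",
    "7,6.5,U1",
]
result = group_by_class(csv.reader(sample))
print(result)
```

To read the real file instead, pass csv.reader(f) for a file object f opened on 'test.data'. Because the code indexes from the end with row[-1] and row[:-1], it does not need to know how many data columns a row has.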
result will look like:
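The listing seems to have been lost here as well. For the seven-row sample at the top of the question, grouping on the last column should give the structure below, which matches what the asker reports in EDIT2. The variable name result follows the surrounding text.

```python
# Grouping of the seven sample rows: class name -> list of data rows.
result = {
    'U1': [
        ['4.5', '3.5'],
        ['4.5', '6'],
        ['5', '7'],
        ['7', '6.5'],
    ],
    'U2': [
        ['4.5', '10.5'],
        ['3.5', '10.5'],
        ['3.5', '10.5'],
    ],
}
```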
To skip through the data, you can use the step (stride) part of list slicing, available in newer Pythons:
so on a list like:
using a step slice such as:
gives:
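The three listings for this part are missing too, so here is a hedged reconstruction rather than the original example. It assumes Python 3. The list numbers and the helper split_two_thirds are invented for illustration, and the sketch reads "2/3 chosen, 1/3 left over" as "every third row stays behind", which is only one possible interpretation of the question.

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# seq[start:stop:step]: a step of 3 keeps every third item, starting at index 0.
every_third = numbers[::3]
print(every_third)  # [1, 4, 7]


def split_two_thirds(rows):
    """Return (chosen, remaining) for a list of rows (hypothetical helper).

    remaining holds every third row (indexes 2, 5, 8, ...);
    chosen holds all the other rows, roughly two thirds of the input.
    """
    remaining = rows[2::3]
    chosen = [row for i, row in enumerate(rows) if i % 3 != 2]
    return chosen, remaining


chosen, remaining = split_two_thirds(numbers)
print(chosen)     # [1, 2, 4, 5, 7, 8]
print(remaining)  # [3, 6, 9]
```

Applying split_two_thirds to each list in the per-class dictionary would give a chosen part and a remaining part for every class. The split is deterministic; if a random selection is wanted, shuffling a copy of each list with random.shuffle before splitting is one option.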