I have a collection of data, for example:
1.00 3 4
1.00 0 1
51.00 1 4
84.00 3 4
95.00 0 2
110.00 2 4
120.00 0 1
121.00 1 2
124.00 2 4
158.00 3 4
159.00 1 3
172.00 0 4
214.00 0 4
223.00 2 4
224.00 1 2
228.00 1 4
229.00 0 1
232.00 2 3
233.00 3 4
233.00 1 3
246.00 0 2
292.00 0 3
294.00 0 4
294.00 2 4
294.00 3 4
318.00 1 2
331.00 0 1
383.00 2 4
402.00 3 4
The output I want to generate looks like this:
node_src node_dst time_repeated time1 time2 ... average_time ß
Details:
*node_src = 2nd column
*node_dst = 3rd column
*time_repeated = the number of times the same node pair is repeated, e.g. 3 4 is repeated 5 times (it appears on 6 lines)
*time1, time2, ... = the values of column 1 for that pair
*average_time = the average length of the intervals between consecutive times
(see the example below)
*ß = time_repeated / average_time
My attempt generated this result:
node1 node2 nbrepeated time1 time2 time3 time4 time5 time6 time7 average ß
2 4 6 110.0 124.0 223.0 294.0 383.0 461.0 543.0 6.0 0
2 3 1 232.0 402.0 0.0 0.0 0.0 0.0 0.0 1.0 0
1 3 2 159.0 233.0 521.0 0.0 0.0 0.0 0.0 2.0 4
1 2 4 121.0 224.0 318.0 461.0 573.0 0.0 0.0 4.0 5
0 4 4 172.0 214.0 294.0 415.0 543.0 0.0 0.0 4.0 5
0 2 5 95.0 246.0 415.0 536.0 572.0 588.0 0.0 5.0 :
0 3 3 292.0 403.0 455.0 588.0 0.0 0.0 0.0 3.0 :
1 4 2 51.0 228.0 494.0 0.0 0.0 0.0 0.0 2.0 :
0 1 4 1.0 120.0 229.0 331.0 536.0 0.0 0.0 4.0 :
3 4 6 1.0 84.0 158.0 233.0 294.0 402.0 431.0 6.0 :
I was unable to find the average time and ß because of the complexity of the calculation.
The average time is computed like this, for the times
121.0 224.0 318.0 461.0 573.0
avg_time = ((224-121)+(318-224)+(461-318)+(573-461))/4
The challenge is to do this dynamically, since the number of time fields is not known in advance.
It is written in bash.
Here is the code, thanks to glenn jackman:
#!/bin/bash
declare -A t
# group the times by "f1:f2" pair
while read tm f1 f2; do
    t["$f1:$f2"]+=" $tm"
done < "$1"
# the largest group decides how many timeN columns are printed
max=0
for key in "${!t[@]}"; do
    set -- ${t[$key]}
    [[ $# -gt $max ]] && max=$#
done
{
    printf "field1 field2 nbrepeated"
    for i in $(seq $max); do printf " %s" time$i; done
    echo " average_time beta"
    for key in "${!t[@]}"; do
        f1=${key%:*}
        f2=${key#*:}
        set -- ${t[$key]}
        f3=$(($# - 1))   # nbrepeated
        f4=$(($# - 1))   # placeholder: average_time is not computed yet
        f5=1             # placeholder: beta is not computed yet
        printf "%d %d %d" $f1 $f2 $f3
        for i in $(seq $max); do
            printf " %.1f" ${1-0}
            shift
        done
        printf " %.1f %.1f" $f4 $f5
        echo ""
    done
} | column -t
Modifications still needed:
- find the average time: avg_time
- find the beta
P.S.: normally the average is computed as sum/NR, but that is not what is needed here.
Case solved. Here is the output:
field1 field2 nbrepeated time1 time2 time3 time4 time5 time6 time7 average_time beta
2 4 6 110.0 124.0 223.0 294.0 383.0 461.0 543.0 72.16 0.08
2 3 1 232.0 402.0 0.0 0.0 0.0 0.0 0.0 170.00 0.00
1 3 2 159.0 233.0 521.0 0.0 0.0 0.0 0.0 181.00 0.01
1 2 4 121.0 224.0 318.0 461.0 573.0 0.0 0.0 113.00 0.03
First, note that the average formula can be simplified. For example:
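Written out, the inner terms cancel in pairs (a telescoping sum), so only the last and the first time remain. Using the times of the 1 2 row from the question:

$$
\text{avg\_time} = \frac{(224-121)+(318-224)+(461-318)+(573-461)}{4} = \frac{573-121}{4} = \frac{452}{4} = 113
$$

In general, for times $t_1, \dots, t_{n+1}$ with $n$ = time_repeated, this gives $\text{avg\_time} = (t_{n+1} - t_1)/n$, which matches the 113.00 shown for that row in the output above.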
I have added the following section to calculate the average and beta:
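A minimal sketch of such a section, not necessarily the exact code originally posted: it assumes awk is available for the floating-point division (bash arithmetic is integer-only), and the helper name avg_beta is made up for illustration. It uses the simplified formula (last - first) / n.

```bash
#!/bin/bash
# Hypothetical helper: prints "<average_time> <beta>" for one group of times.
# Arguments are the timestamps of one node pair, in ascending order.
avg_beta() {
    local first=$1
    local last=${!#}          # last positional parameter
    local n=$(( $# - 1 ))     # number of intervals = nbrepeated
    awk -v first="$first" -v last="$last" -v n="$n" 'BEGIN {
        if (n == 0) { printf "%.2f %.2f\n", 0, 0; exit }  # one line only: no interval
        avg  = (last - first) / n
        beta = (avg == 0) ? 0 : n / avg
        printf "%.2f %.2f\n", avg, beta
    }'
}

avg_beta 121.0 224.0 318.0 461.0 573.0   # prints: 113.00 0.04
```

Note that printf rounds to two decimals, so the last digit can differ from the output quoted in the question, where the values appear to be truncated (0.03 rather than 0.04 for this row).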
The complete script becomes:
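As a sketch under the same assumptions (awk for the division, bash 4+ for associative arrays, column installed), the script from the question with the two placeholders replaced could look like this. The wrapper function name summarize is hypothetical; it is only there so the logic can be called on any file.

```bash
#!/bin/bash
# Sketch of the complete script; summarize is a hypothetical wrapper name.
summarize() {
    local -A t
    local tm f1 f2 key max=0 i n avg beta

    # Group the timestamps by "src:dst" pair.
    while read -r tm f1 f2; do
        [[ -n $tm ]] && t["$f1:$f2"]+=" $tm"
    done < "$1"

    # The widest group decides how many timeN columns are printed.
    for key in "${!t[@]}"; do
        set -- ${t[$key]}
        (( $# > max )) && max=$#
    done

    {
        printf "field1 field2 nbrepeated"
        for i in $(seq "$max"); do printf " time%s" "$i"; done
        echo " average_time beta"
        for key in "${!t[@]}"; do
            set -- ${t[$key]}
            n=$(( $# - 1 ))
            # Telescoped average (last - first) / n, then beta = n / average.
            read -r avg beta < <(awk -v a="$1" -v b="${!#}" -v n="$n" 'BEGIN {
                avg = (n > 0) ? (b - a) / n : 0
                printf "%.2f %.2f\n", avg, (avg > 0) ? n / avg : 0
            }')
            printf "%d %d %d" "${key%:*}" "${key#*:}" "$n"
            for i in $(seq "$max"); do
                printf " %.1f" "${1-0}"   # pad missing times with 0.0
                shift
            done
            printf " %s %s\n" "$avg" "$beta"
        done
    } | column -t
}

# Run only when a data file is given on the command line.
if [[ $# -gt 0 ]]; then summarize "$1"; fi
```

Saved as, say, script.sh, it would be run as `bash script.sh datafile`. Rows come out in the arbitrary order of the associative array keys, as in the original script.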
Output