# Awk conditional sum from a CSV file

## awk sum if

For your last question, you can use `split` and print the two columns simultaneously:

```
cat filename | awk '{split($2,tab,":"); id = tab[1]; print id " -> " $3;}'
```

That prints:

```
1 -> 1.6
2 -> 1.1
2 -> 3.4
3 -> -1.3
3 -> 6.0
3 -> 1.1
4 -> -1.0
5 -> 10.9
```
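A self-contained way to check this, with the question's sample rows fed through a here-document (note the id comes from `tab[1]`, the first colon-separated piece):

```shell
# Print "id -> value" for each row; $2 is the colon-separated field,
# $3 the value. Sample data is inlined instead of a real file.
awk '{ split($2, tab, ":"); print tab[1] " -> " $3 }' <<'EOF'
2    1:2010-1-bla:bla    1.6
2    2:2010-1-bla:bla    1.1
2    2:2010-1-bla:bla    3.4
2    3:2010-1-bla:bla    -1.3
2    3:2010-1-bla:bla    6.0
2    3:2010-1-bla:bla    1.1
2    4:2010-1-bla:bla    -1.0
2    5:2010-1-bla:bla    10.9
EOF
# prints "1 -> 1.6", "2 -> 1.1", ... down to "5 -> 10.9"
```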

For the complete result, you can use:

```
awk -F, '{ split($1, line, " "); split(line[2], tab, ":"); id = tab[1]; if (sums[id] == "") sums[id] = 0; sums[id] += line[3] } END { for (i = 1; i <= length(sums); i++) print i " -> " sums[i] }' < test
```

That prints:

```
1 -> 1.6
2 -> 4.5
3 -> 5.8
4 -> -1
5 -> 10.9
```
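One caveat: `length()` applied to an array is a GNU awk extension. A portable variant just walks the array with `for (key in array)` and sorts the output, since for-in traversal order is unspecified (sample data inlined via a here-document):

```shell
# Portable rewrite: no length(array); accumulate per id, then
# sort numerically because for-in order is unspecified.
awk '{ split($2, tab, ":"); sums[tab[1]] += $3 }
     END { for (id in sums) print id " -> " sums[id] }' <<'EOF' | sort -n
2    1:2010-1-bla:bla    1.6
2    2:2010-1-bla:bla    1.1
2    2:2010-1-bla:bla    3.4
2    3:2010-1-bla:bla    -1.3
2    3:2010-1-bla:bla    6.0
2    3:2010-1-bla:bla    1.1
2    4:2010-1-bla:bla    -1.0
2    5:2010-1-bla:bla    10.9
EOF
```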

I have a CSV file from which I would like to extract some information: for each distinct value in one column, I would like to compute the sum of the corresponding values in another column. I could fall back to Python, but I believe there is a simple solution using `awk`.

This could be the CSV file:

```
2    1:2010-1-bla:bla    1.6
2    2:2010-1-bla:bla    1.1
2    2:2010-1-bla:bla    3.4
2    3:2010-1-bla:bla    -1.3
2    3:2010-1-bla:bla    6.0
2    3:2010-1-bla:bla    1.1
2    4:2010-1-bla:bla    -1.0
2    5:2010-1-bla:bla    10.9
```

I would like to get:

```
1    1.6
2    4.5
3    5.8
4    -1.0
5    10.9
```

For now, I can only extract:

a) the values of the first column:

```
awk -F ' ' '{print $2}' MyFile.csv | awk -F ':' '{print $1}'
```

and then get:

```
1
2
2
3
3
3
4
5
```

b) and the values equal to, say, `1.1` in the last column with:

```
awk -F ' ' '{print $NF}' MyFile.csv | awk '$1 == 1.1'
```

and then get:

```
1.1
1.1
```
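This two-stage pipeline can be collapsed into a single awk invocation that filters on the last field directly (sample rows inlined here instead of `MyFile.csv`):

```shell
# One awk process: select rows whose last field equals 1.1
# and print that field.
awk '$NF == 1.1 { print $NF }' <<'EOF'
2    1:2010-1-bla:bla    1.6
2    2:2010-1-bla:bla    1.1
2    2:2010-1-bla:bla    3.4
2    3:2010-1-bla:bla    -1.3
2    3:2010-1-bla:bla    6.0
2    3:2010-1-bla:bla    1.1
2    4:2010-1-bla:bla    -1.0
2    5:2010-1-bla:bla    10.9
EOF
# prints "1.1" twice
```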

I am not able to extract both columns of interest simultaneously, which might be the missing step. Here is a sample output which may ease the computation of the sums (I don't know):

```
1    1.6
2    1.1
2    3.4
3    -1.3
3    6.0
3    1.1
4    -1.0
5    10.9
```
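Producing that two-column form needs only one awk process, e.g. with a field separator that covers both the colons and the whitespace (the separator choice is an assumption about the data; sample rows are inlined):

```shell
# Split on runs of colon, space, or tab: $2 is then the id and
# $NF the value, giving the id/value pairs directly.
awk -F'[: \t]+' '{ print $2, $NF }' <<'EOF'
2    1:2010-1-bla:bla    1.6
2    2:2010-1-bla:bla    1.1
2    2:2010-1-bla:bla    3.4
2    3:2010-1-bla:bla    -1.3
2    3:2010-1-bla:bla    6.0
2    3:2010-1-bla:bla    1.1
2    4:2010-1-bla:bla    -1.0
2    5:2010-1-bla:bla    10.9
EOF
```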

Edit: Thanks to Elenaher, we can take the input to be the file above.

```
{
    b = $2              # assign column 2 to the variable 'b'
    sub(/:.*/, "", b)   # get rid of everything after the first colon in b
    results[b] += $3
}
END { for (result in results) print result " " results[result] }
```
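To run the script above, save it to a file (the name `sumby.awk` is arbitrary) and pass it to awk with `-f`; since for-in order is unspecified, pipe through `sort -n` for stable output:

```shell
# Save the per-id summing script, then run it on the sample file.
cat > sumby.awk <<'EOF'
{
    b = $2              # column 2, e.g. "1:2010-1-bla:bla"
    sub(/:.*/, "", b)   # keep only the part before the first colon
    results[b] += $3    # accumulate column 3 per id
}
END { for (result in results) print result " " results[result] }
EOF
awk -f sumby.awk MyFile.csv | sort -n
```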

If Perl is an option:

`perl -F'[\s:]+' -lane '$h{$F[1]} += $F[-1]; END{print "$_ $h{$_}" for sort keys %h}' file`

output:

```
1 1.6
2 4.5
3 5.8
4 -1
5 10.9
```

These command-line options are used:

• `-n` loop around every line of the input file
• `-l` removes newlines before processing, and adds them back in afterwards
• `-a` autosplit mode – split input lines into the `@F` array. Defaults to splitting on whitespace.
• `-e` execute the perl code
• `-F` autosplit modifier, in this case splitting on a colon or a run of whitespace

• `@F` is the array of words in each line, indexed starting with `$F[0]`
• `$F[-1]` is the last word
• results are accumulated in the hash `%h`
• at the `END`, iterate through the sorted keys of the hash
• print each key `$_` followed by its hash value `$h{$_}`

So, assuming that your input looks like this:

```
unique_col, to_sum
1.3, 1 2 3
1.3, 5 6 7
1.4, 2 3 4
```

Then this should do the trick:

```
$ awk -F, '{ if (seen[$1] == "") { split($2, to_sum, " "); seen[$1] = 0; for (x in to_sum) seen[$1] += to_sum[x]; }} END { for (x in seen) { if (x != "") { print x " " seen[x]; }}}' < input
1.3 6
1.4 9
```
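Note two quirks of this approach: only the first row per key is summed (`1.3` gives 1+2+3 = 6, and the `5 6 7` row is skipped), and the `x != ""` guard does not actually filter out the header row, so it is dropped here for the check:

```shell
# Sum the space-separated values of the FIRST row seen for each key;
# header row omitted, output sorted for a stable order.
awk -F, '{ if (seen[$1] == "") { split($2, to_sum, " "); seen[$1] = 0
           for (x in to_sum) seen[$1] += to_sum[x] } }
     END { for (x in seen) print x " " seen[x] }' <<'EOF' | sort -n
1.3, 1 2 3
1.3, 5 6 7
1.4, 2 3 4
EOF
```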

A shorter alternative splits on colons and whitespace in one pass:

```
$ awk -F"[: \t]+" '{ a[$2] += $NF } END { for (i in a) print i, a[i] }' file
```
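The same one-liner can be checked against the question's sample rows via a here-document; since for-in order is unspecified, the result is piped through `sort -n`:

```shell
# Field separator covers colons, spaces, and tabs, so $2 is the id
# and $NF the value; sums accumulate in array a.
awk -F"[: \t]+" '{ a[$2] += $NF } END { for (i in a) print i, a[i] }' <<'EOF' | sort -n
2    1:2010-1-bla:bla    1.6
2    2:2010-1-bla:bla    1.1
2    2:2010-1-bla:bla    3.4
2    3:2010-1-bla:bla    -1.3
2    3:2010-1-bla:bla    6.0
2    3:2010-1-bla:bla    1.1
2    4:2010-1-bla:bla    -1.0
2    5:2010-1-bla:bla    10.9
EOF
```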