I am running Hive 071 I have a table, with mulitple rows, with the same column value e.g.
| x | y |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 2 | 2 |
| 3 | 2 |
| 3 | 1 |
I want to have the x column unique, and remove rows that have the same x val e.g.
| x | y |
| 1 | 2 |
| 2 | 2 |
| 3 | 2 |
or
| x | y |
| 1 | 4 |
| 2 | 2 |
| 3 | 1 |
are both good as distinct works only on the whole rs in hive, I couldn’t find a way to do it
help please Tx
Some options:
1) This will give you the max value of y for each value of x
Equally you could use avg() or min()
2) OR, you could collect all the values of y in a list:
This will give you: