I know we can use "<" or ">" to compare partitions in a Hive table, even when the partition column (pt) is of type string and represents a date, like this:
WHERE page_views.date >= '2008-03-01' AND page_views.date <= '2008-03-31'
Hive returns exactly the partitions I expect.
My question is: how does Hive do this, and why does comparing the date strings give the correct result?
Thanks!
In Hive, partitions are a way to achieve selective scans: each partition is stored as its own directory containing one or more files. When you filter on a partition column, the query is faster because Hive knows which partitions' files it needs to scan and which it can skip entirely.
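As a rough illustration of that selective scan, here is a simplified Python model of partition pruning. It is a sketch, not Hive's actual planner code: the partition paths are hypothetical and merely follow Hive's "column=value" directory naming, and the filter compares the partition value as a plain string, just as the WHERE clause above does for a string column.

```python
# Hypothetical partition directories, modeled on Hive's "column=value" layout.
partitions = [
    "page_views/date=2008-02-28",
    "page_views/date=2008-03-01",
    "page_views/date=2008-03-15",
    "page_views/date=2008-03-31",
    "page_views/date=2008-04-01",
]


def prune(paths, low, high):
    """Keep only the partition paths whose date value lies in [low, high].

    The value is compared as a string, like a string partition column,
    so no file inside a rejected partition would ever need to be read.
    """
    selected = []
    for path in paths:
        value = path.split("date=", 1)[1]  # text after "date=", e.g. "2008-03-15"
        if low <= value <= high:
            selected.append(path)
    return selected


for path in prune(partitions, "2008-03-01", "2008-03-31"):
    print(path)
```

Only the three March directories survive the filter; the February and April partitions are skipped without being opened.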
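Hive gets the order right in your case, even though the values are strings, because your field uses the 'yyyy-MM-dd' format, whose lexicographical (character-by-character) order matches chronological order: the year comes first, then the month, then the day, each zero-padded to a fixed width, so the most significant part is compared first. With another date format, for instance 'MM-dd-yyyy', it will not work.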
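A quick way to check this claim is to sort the same dates by their string form in both formats and compare the result with true chronological order. The sketch below uses Python, whose string comparison is also character-by-character. It assumes Hive compares string values the same way. The helper name `string_order_matches_date_order` is made up for this example.

```python
from datetime import datetime

# A few sample dates, deliberately not in chronological order.
dates = [datetime(2008, 3, 1), datetime(2007, 12, 31), datetime(2008, 1, 15)]


def string_order_matches_date_order(fmt):
    """Return True if sorting the dates by their formatted strings
    gives the same order as sorting the dates themselves."""
    by_string = sorted(dates, key=lambda d: d.strftime(fmt))
    return by_string == sorted(dates)


print(string_order_matches_date_order("%Y-%m-%d"))  # yyyy-MM-dd
print(string_order_matches_date_order("%m-%d-%Y"))  # MM-dd-yyyy
```

The first call prints True. The second prints False, because as strings "12-31-2007" sorts after "03-01-2008" even though it is the earlier date.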