Here’s a query string in a bash script that I wrote.
"SELECT day,xxx,yyy,zzz,if(count>$threshold,keyword,'_other') as keyword, sum(count) as searches
FROM
(SELECT
LEFT(FORMAT_UTC_USEC(UTC_USEC_TO_DAY(timestamp*1000000)),10) as day,
xxx, yyy, zzz,
REGEXP_EXTRACT(actiondata,'wvq=([^&])') as keyword,
COUNT() as count
FROM [table.$dir_prefix]
WHERE product='myproduct'
AND LEFT(FORMAT_UTC_USEC(timestamp*1000000),10) = '$1'
AND REGEXP_MATCH(actiondata,'wvq=')
GROUP BY day,xxx, yyy,zzz,keyword
)
GROUP BY day,xxx,yyy,zzz, keyword
ORDER BY searches DESC;"
When I echo this string, the output is:
"SELECT day,xxx,yyy,zzz,if(count>50,keyword,'_other') as keyword, sum(count) as searches
FROM
(SELECT
LEFT(FORMAT_UTC_USEC(UTC_USEC_TO_DAY(timestamp*1000000)),10) as day,
xxx, yyy, zzz,
REGEXP_EXTRACT(actiondata,'wvq=([^&])') as keyword,
COUNT() as count
FROM 1
WHERE product='myproduct'
AND LEFT(FORMAT_UTC_USEC(timestamp*1000000),10) = '2012-11-28'
AND REGEXP_MATCH(actiondata,'wvq=')
GROUP BY day,xxx, yyy,zzz,keyword
)
GROUP BY day,xxx,yyy,zzz, keyword
ORDER BY searches DESC;"
Isolating the string and echoing "[table.$dir_prefix]" on its own outputs the expected string, [table.20121128]. Can anyone explain why it is evaluated as '1' inside the larger string?
Escaping the square brackets (\[table.$dir_prefix\]) does not solve the problem.
More details:
$dir_prefix and $threshold are set to 20121128 and 50, respectively.
The string is being set like so:
to_echo=\
"SELECT day,xxx,yyy,zzz,if(count>$threshold,keyword,'_other') as keyword, sum(count) as searches FROM (SELECT LEFT(FORMAT_UTC_USEC(UTC_USEC_TO_DAY(timestamp*1000000)),10) as day, xxx, yyy, zzz, REGEXP_EXTRACT(actiondata,'wvq=([^&])') as keyword, COUNT() as count FROM [table.$dir_prefix] WHERE product='myproduct' AND LEFT(FORMAT_UTC_USEC(timestamp*1000000),10) = '$1' AND REGEXP_MATCH(actiondata,'wvq=') GROUP BY day,xxx, yyy,zzz,keyword ) GROUP BY day,xxx,yyy,zzz, keyword
ORDER BY searches DESC;"
Update 2
The script only has the issue on this specific server (running Ubuntu). My other server (running Red Hat) does not have the problem and outputs what’s expected. It must be something configuration-related. I can work around it, but I really just want to know what’s behind this.
Here’s the exact script:
#!/bin/bash
dir_prefix=`date --date "$1" +%Y%m%d`;
threshold=$2;
query="SELECT day,xxx,yyy,zzz,if(count>$threshold,keyword,'_other') as keyword, sum(count) as searches FROM (SELECT LEFT(FORMAT_UTC_USEC(UTC_USEC_TO_DAY(timestamp*1000000)),10) as day, xxx, yyy, zzz, REGEXP_EXTRACT(actiondata,'wvq=([^&])') as keyword, COUNT() as count FROM [table.$dir_prefix] WHERE product='myproduct' AND LEFT(FORMAT_UTC_USEC(timestamp*1000000),10) = '$1' AND REGEXP_MATCH(actiondata,'wvq=') GROUP BY day,xxx, yyy,zzz,keyword ) GROUP BY day,xxx,yyy,zzz, keyword
ORDER BY searches DESC;"
echo $query;
Solution: More Info
Here’s a good write-up that goes into detail on preventing globbing when you don’t want it. I went with the set -f option.
http://blog.edwards-research.com/2011/05/preventing-globbing/
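As a rough illustration of the set -f option mentioned above, here is a minimal sketch. The scratch directory, the shortened query string, and the variable names noglob_out and glob_out are made up for the demo and are not part of the original script.

```shell
#!/bin/bash
# Work in a scratch directory so the demo does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

touch 1                          # stray one-character file, as on the Ubuntu server
query="FROM [table.20121128]"    # shortened stand-in for the real query

set -f                           # turn off pathname expansion (noglob)
noglob_out=$(echo $query)        # still unquoted, but no globbing happens
set +f                           # turn pathname expansion back on
glob_out=$(echo $query)          # unquoted with globbing: the bracket matches "1"

echo "with set -f: $noglob_out"
echo "with set +f: $glob_out"

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

With set -f in effect the bracketed table name survives; after set +f it turns back into 1. Note that set -f only disables globbing: an unquoted variable is still split into words, so newlines inside the query are collapsed into single spaces.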
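You have a file named 1 in the current directory, and it’s being globbed. Because $query is unquoted in echo $query, bash performs pathname expansion on each word of the expanded string. [table.20121128] is a bracket expression that matches any single character from the set t, a, b, l, e, ., 2, 0, 1, 8, so it matches the file named 1 and is replaced by that name. Presumably the other server has no such file in the working directory, in which case the pattern matches nothing and is left unchanged. Quoting the variable (echo "$query") prevents the expansion.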
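The behaviour can be reproduced in isolation. The following is a small sketch, assuming a scratch directory created with mktemp and a shortened stand-in for the query; the names workdir, unquoted, and quoted are hypothetical and exist only for this demo.

```shell
#!/bin/bash
# Work in a scratch directory so the demo does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

touch 1                          # the stray file that the bracket pattern matches
dir_prefix=20121128
query="FROM [table.$dir_prefix] WHERE x"

unquoted=$(echo $query)          # word splitting and pathname expansion both apply
quoted=$(echo "$query")          # the expanded value is passed through untouched

echo "unquoted: $unquoted"
echo "quoted:   $quoted"

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

The unquoted form prints FROM 1 WHERE x, while the quoted form prints FROM [table.20121128] WHERE x. Quoting the variable is usually the simpler fix, since it also preserves the line breaks in the query.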