Issue
This content is from Stack Overflow. Question asked by zhangzhexin.
I want to write a regex in Spark SQL that returns the rows whose value in a given column contains three or more consecutive digits.
For example:
with temp as (
select '12' col
union
select '12a' col
union
select '1234' col --need to return
union
select 'ab234' col --need to return
union
select '33345abc' col --need to return
)
select col from temp
where col regexp '.*\d{3,}'
When I run this script in Spark SQL, I get no results. But when I test it in Hive SQL, it works fine.
So, is there a logic error in my regexp expression?
Solution
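You may need to double-escape \d. By default, Spark SQL's parser unescapes string literals before the regex engine sees them, and an unrecognized escape such as \d loses its backslash and becomes a plain d:
SELECT col
FROM temp
WHERE col REGEXP '\\d{3,}';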
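Alternatively, use [0-9] instead of \d. The character class contains no backslash, so it is unaffected by how Spark SQL handles backslashes in string literals: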
SELECT col
FROM temp
WHERE col REGEXP '[0-9]{3,}';
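The following is a minimal Python sketch of why the question's pattern returns no rows. It assumes a simplified model of Spark SQL's default literal handling (spark.sql.parser.escapedStringLiterals left at false): a backslash before a character that is not a recognized escape, such as d, is dropped, so '.*\d{3,}' reaches the regex engine as .*d{3,}, which looks for three d characters. The helpers unescape_sql_literal and regexp_filter are hypothetical, octal and \u escapes are not modeled, and Python's re.search stands in for Spark's Java regex; for these ASCII rows and patterns the two behave the same.

```python
import re

# Simplified model (assumption): escapes Spark SQL's parser recognizes in a
# string literal by default. Octal and \uXXXX escapes are not modeled.
KNOWN_ESCAPES = {
    "0": "\0", "'": "'", '"': '"', "b": "\b", "n": "\n",
    "r": "\r", "t": "\t", "Z": "\x1a", "\\": "\\",
    "%": "\\%", "_": "\\_",
}


def unescape_sql_literal(body: str) -> str:
    """Hypothetical helper: unescape the text between the quotes of a literal."""
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            # An unrecognized escape such as \d keeps only the character
            # after the backslash, so \d turns into a plain d.
            out.append(KNOWN_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


def regexp_filter(literal_body: str, rows: list[str]) -> tuple[str, list[str]]:
    """Hypothetical helper: the pattern the regex engine sees, plus the rows
    for which `col REGEXP '<literal_body>'` would be true."""
    pattern = unescape_sql_literal(literal_body)
    return pattern, [row for row in rows if re.search(pattern, row)]


rows = ["12", "12a", "1234", "ab234", "33345abc"]

# Raw strings hold exactly the characters typed between the SQL quotes.
print(regexp_filter(r".*\d{3,}", rows))   # ('.*d{3,}', [])
print(regexp_filter(r"\\d{3,}", rows))    # ('\\d{3,}', ['1234', 'ab234', '33345abc'])
print(regexp_filter("[0-9]{3,}", rows))   # ('[0-9]{3,}', ['1234', 'ab234', '33345abc'])
```

Under this model, '\\d{3,}' collapses to \d{3,} and [0-9]{3,} passes through unchanged, so both return the three expected rows.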
Note that prefacing the pattern with .* is not needed, because REGEXP succeeds when the pattern matches any part of the input (a partial match).
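A minimal sketch of the partial-match point, assuming (as in Spark, which evaluates REGEXP/RLIKE with a Java Matcher.find style search) that the pattern only has to occur somewhere in the value. Python's re.search plays the role of that search and re.fullmatch the role of a whole-string match; the helper names are hypothetical.

```python
import re

rows = ["12", "12a", "1234", "ab234", "33345abc"]


def search_matches(pattern: str, rows: list[str]) -> list[str]:
    """Hypothetical helper: rows where the pattern occurs anywhere."""
    return [row for row in rows if re.search(pattern, row)]


def full_matches(pattern: str, rows: list[str]) -> list[str]:
    """Hypothetical helper: rows the pattern must cover from start to end."""
    return [row for row in rows if re.fullmatch(pattern, row)]


# With search semantics, a leading .* changes nothing.
print(search_matches(r"[0-9]{3,}", rows))      # ['1234', 'ab234', '33345abc']
print(search_matches(r".*[0-9]{3,}", rows))    # ['1234', 'ab234', '33345abc']

# A whole-string match would need .* on both sides to find embedded digits.
print(full_matches(r"[0-9]{3,}", rows))        # ['1234']
print(full_matches(r".*[0-9]{3,}.*", rows))    # ['1234', 'ab234', '33345abc']
```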
This question was asked on Stack Overflow by zhangzhexin and answered by Tim Biegeleisen. It is licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.