[SOLVED] Regex expression not working in Spark SQL

Issue

This content is from Stack Overflow. Question asked by zhangzhexin.

I want to write a regex for Spark SQL that returns the rows whose value in a given column contains three or more consecutive digits.

For example:

with temp as (

    select '12' col
    union
    select '12a' col
    union
    select '1234' col  --need to return
    union
    select 'ab234' col --need to return
    union
    select '33345abc' col --need to return
)
select col from temp
where col regexp '.*\d{3,}'

When I run this query in Spark SQL, I get no results.

Is there a logic error in my regexp expression?

When I test it in Hive SQL, however, it works fine.



Solution

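You may need to double-escape \d. Since Spark 2.0, the Spark SQL parser unescapes string literals (including regex patterns), and for an escape it does not recognize, such as \d, it drops the backslash. The literal '\d{3,}' therefore reaches the regex engine as d{3,}, which matches only runs of the letter d:

SELECT col
FROM temp
WHERE col REGEXP '\\d{3,}';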

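Or, you can use [0-9] instead of \d. The character class contains no backslash, so it is unaffected by how the SQL parser handles escapes in string literals: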

SELECT col
FROM temp
WHERE col REGEXP '[0-9]{3,}';
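To see why the original pattern returns nothing, here is a small Python sketch. It assumes a simplified model of Spark's string-literal parsing: a few known escapes (such as \\ and \n) are translated, and for any other character the backslash is dropped, so \d becomes a plain d. The helper names unescape_sql_literal and matching_rows are hypothetical, and Python's re.search stands in for the Java regex engine Spark actually uses; for these ASCII sample rows the patterns behave the same way.

```python
import re

# Simplified model (an assumption, not Spark's code) of Spark SQL 2.0+
# string-literal unescaping. Only a few escapes are modelled; for any other
# character after a backslash, the backslash is dropped and the character kept.
KNOWN_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\", "'": "'", '"': '"'}

# Sample rows from the question's CTE.
ROWS = ["12", "12a", "1234", "ab234", "33345abc"]


def unescape_sql_literal(body):
    """Return the text the regex engine would receive for the literal '<body>'."""
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            # Unknown escape such as \d: keep the character, lose the backslash.
            out.append(KNOWN_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


def matching_rows(body):
    """Rows for which `col REGEXP '<body>'` would be true under this model."""
    pattern = re.compile(unescape_sql_literal(body))
    # re.search looks for the pattern anywhere in the row (partial match).
    return [row for row in ROWS if pattern.search(row)]


if __name__ == "__main__":
    # Raw strings hold exactly the characters typed between the SQL quotes.
    for body in [r".*\d{3,}", r"\\d{3,}", "[0-9]{3,}"]:
        print(f"'{body}' -> regex {unescape_sql_literal(body)} -> {matching_rows(body)}")
```

Run as a script, it prints one line per literal: the single-backslash form turns into .*d{3,} and matches no rows, while the other two both keep the digit pattern intact. If rewriting the queries is not practical, Spark also documents the spark.sql.parser.escapedStringLiterals setting, which falls back to the Spark 1.6 behavior of not unescaping string literals.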

Note that the leading .* is not needed, because REGEXP succeeds when the pattern matches any part of the input, not only the whole string.
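The difference between partial and full matching can be illustrated with Python's re.search and re.fullmatch. This sketch assumes that REGEXP behaves like re.search (Spark's RLIKE is built on Java's Matcher.find, which also looks for the pattern anywhere in the input); re.fullmatch is included only for contrast.

```python
import re

# Sample rows from the question's CTE.
ROWS = ["12", "12a", "1234", "ab234", "33345abc"]
PATTERN = "[0-9]{3,}"

# Partial-match semantics (assumed to mirror REGEXP): the pattern may occur anywhere.
partial = [row for row in ROWS if re.search(PATTERN, row)]

# Full-match semantics, for contrast: the pattern must cover the whole string,
# so it needs .* on both sides to accept rows with surrounding letters.
full_bare = [row for row in ROWS if re.fullmatch(PATTERN, row)]
full_wrapped = [row for row in ROWS if re.fullmatch(".*" + PATTERN + ".*", row)]

print(partial)
print(full_bare)
print(full_wrapped)
```

With search semantics, prepending .* changes nothing. Under full-match semantics even the question's .*\d{3,} would still miss '33345abc', since nothing in that pattern accounts for the trailing letters.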


This question was asked on Stack Overflow by zhangzhexin and answered by Tim Biegeleisen. It is licensed under the terms of CC BY-SA 2.5 / CC BY-SA 3.0 / CC BY-SA 4.0.
