I have a dataframe with a column of array (or list) type, where each element is a map of String to a complex data type (meaning String, nested map, list, etc.; you may assume the column data type is similar to List[Map[String, AnyRef]]).
Now I want to query this table like:

select * from tableX where column.<any of the array element>['someArbitraryKey'] in ('a','b','c')

I am not sure how to represent <any of the array element> in Spark SQL. Need help.
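For concreteness, here is a minimal Scala sketch of such a table. The column name col1, the table name tableX, and the plain string map values are assumptions for illustration (real data would carry more complex values):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("tableX").getOrCreate()
import spark.implicits._

// Hypothetical sample data: each row holds a list of maps.
// String values stand in for the more complex types from the question.
val df = Seq(
  Seq(Map("k1" -> "x", "k2" -> "a")),       // some element contains key k2
  Seq(Map("k1" -> "y"), Map("k3" -> "z"))   // no element contains key k2
).toDF("col1")

df.createOrReplaceTempView("tableX")
df.printSchema()  // col1: array<map<string,string>>
```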
The idea is to transform the list of maps into a list of booleans, where each boolean indicates whether the respective map contains the wanted key (k2 in the code below). After that, all we have to do is check whether the boolean array contains at least one true element.
select * from tableX where array_contains(transform(col1, map -> map_contains_key(map, 'k2')), true)
I have assumed that the name of the column holding the list of maps is col1.
The second parameter of the transform function could be replaced by any expression that returns a boolean value. In this example map_contains_key is used, but any check resulting in a boolean value would work.
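To make that concrete, here is a runnable continuation of the sketch above (still assuming the tableX view and col1 column; the lambda variable is renamed to m). The first statement runs the query from this answer; the second swaps in a boolean expression that also checks the value, matching the in ('a','b','c') condition from the question. Note that map_contains_key is available from Spark 3.3 onwards.

```scala
// The answer's query: keep rows where any element map contains key 'k2'.
spark.sql(
  """select * from tableX
    |where array_contains(transform(col1, m -> map_contains_key(m, 'k2')), true)
    |""".stripMargin).show(false)

// Variant matching the original question: does any element map 'k2'
// to one of ('a','b','c')? m['k2'] is null when the key is absent;
// null in (...) yields null, and rows where no element evaluates to
// true are filtered out, since WHERE treats null like false.
spark.sql(
  """select * from tableX
    |where array_contains(transform(col1, m -> m['k2'] in ('a','b','c')), true)
    |""".stripMargin).show(false)
```

Spark also offers the higher-order exists function, so exists(col1, m -> map_contains_key(m, 'k2')) would express the same check without the transform/array_contains pair.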
A bit unrelated: I believe that the data type of the map cannot be Map[String,AnyRef], as there is no encoder for AnyRef.
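A quick sketch of that encoder point: Spark's implicits provide encoders for concrete types such as Map[String, String], but not for Map[String, AnyRef], so the second expression below fails to compile.

```scala
import spark.implicits._

// OK: an implicit Encoder for Seq[Map[String, String]] exists.
Seq(Seq(Map("k2" -> "a"))).toDF("col1")

// Does not compile: no implicit Encoder for Map[String, AnyRef].
// Seq(Seq(Map[String, AnyRef]("k2" -> new Object))).toDF("col1")
```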
This question was asked on Stack Overflow by chendu and answered by werner. It is licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.