[SOLVED] SparkSQL: How to query a column with datatype: List of Maps


This Content is from Stack Overflow. Question asked by chendu

I have a dataframe with a column of type array (or list), where each element is a map from String to a complex data type (String, nested map, list, etc.; you may assume the column's data type is roughly List[Map[String,AnyRef]]).

Now I want to query this table like:

select * from the tableX where column.<any of the array element>['someArbitaryKey'] in ('a','b','c')

I am not sure how to represent <any of the array element> in Spark SQL. Need help.


The idea is to transform the list of maps into a list of booleans, where each boolean indicates whether the respective map contains the wanted key (k2 in the code below). After that, all we have to do is check whether the boolean array contains at least one true element.

select * from tableX where array_contains(transform(col1, map -> map_contains_key(map, 'k2')), true)
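The mechanics of transform plus array_contains can be illustrated with a plain-Python sketch. This is a row-by-row simulation of what the SQL does, not Spark code, and the sample rows are made up:

```python
# Hypothetical sample rows; col1 holds a list of maps, as in the question.
rows = [
    {"id": 1, "col1": [{"k1": "a"}, {"k2": "b"}]},   # one map contains 'k2'
    {"id": 2, "col1": [{"k1": "x"}, {"k3": "y"}]},   # no map contains 'k2'
]

def matches(maps, key):
    # transform(col1, map -> map_contains_key(map, key)):
    # list of maps -> list of booleans
    flags = [key in m for m in maps]
    # array_contains(flags, true): true if at least one map has the key
    return any(flags)

# The "where" clause keeps only rows whose flag list contains true.
selected = [r["id"] for r in rows if matches(r["col1"], "k2")]
print(selected)  # [1]
```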

I have assumed that the name of the column holding the list of maps is col1.

The second parameter of the transform function can be replaced by any expression that returns a boolean value. In this example map_contains_key is used, but any check yielding a boolean would work.

A bit unrelated: I believe that the data type of the map cannot be Map[String,AnyRef] as there is no encoder for AnyRef available.

This question was asked on Stack Overflow by chendu and answered by werner. It is licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, or CC BY-SA 4.0.
