[SOLVED] Pyarrow Join (int8 and int16)

Issue

This Content is from Stack Overflow. Question asked by daniel guo

I have two PyArrow tables and want to join them.

A.join(
    right_table=B, keys="A_id", right_keys="B_id"
)

This raises the following error:

{ArrowInvalid} Incompatible data types for corresponding join field keys: FieldRef.Name(A_id) of type int8 and FieldRef.Name(B_id) of type int16

What is the preferred way to solve this issue?

I did not find a way to cast a single column of a PyArrow Table to either int8 or int16.

Thanks



Solution

You need to change the type of the key field in one of your tables so that both join keys have the same type.

How to change the type of the 'A_id' field in table A:

import pyarrow as pa

# change the type of 'A_id' from int8 to int16 so that it matches 'B_id'
schema = A.schema
for num, field in enumerate(schema):
    if field.name == 'A_id':
        new_field = field.with_type(pa.int16())  # copy of the field with the new type
        schema = schema.set(num, new_field)  # replace the old field at the same position
        break

A = A.cast(target_schema=schema)  # cast table A to the updated schema
# join tables
A.join(
    right_table=B, keys="A_id", right_keys="B_id"
)


This question was asked on Stack Overflow by daniel guo and answered by Lucas M. Uriarte. It is licensed under the terms of CC BY-SA 2.5 - CC BY-SA 3.0 - CC BY-SA 4.0.
