[SOLVED] Keeping the schema types from Pandas dataframe to Snowpark dataframe

Issue

This content is from Stack Overflow. Question asked by elongl.

Snowpark has a problem/bug: it does not preserve column types when converting between pandas and Snowpark DataFrames, nor does it let you set the schema manually.

For instance,

df1 = session.sql(sql).to_pandas()
df2 = session.create_dataframe(df1)

The timestamp field, which has TimestampType in the original query result, becomes LongType on df2.
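The mismatch is easy to see by printing the schemas on both sides of the round trip (a quick sketch; the table timestamp_test and its column T are illustrative names):

orig_df = session.sql("select * from timestamp_test")
print(orig_df.schema)  # e.g. StructType([StructField('T', TimestampType(), ...)])

rt_df = session.create_dataframe(orig_df.to_pandas())
print(rt_df.schema)    # the same column now reports LongType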

I’ve also tried storing the schema and passing it back in, but got the same result.

df1 = session.sql(sql)
df1_schema = df1.schema
df1 = df1.to_pandas()
df2 = session.create_dataframe(df1, schema=df1_schema)

Has anyone managed to work around this?
It blocks me from writing the DataFrame back to the table, since the column needs to be TimestampType rather than LongType.
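For reference, the write-back I’m blocked on looks roughly like this (the table name is illustrative):

df2.write.mode("overwrite").save_as_table("timestamp_test")
# With T as LongType, this no longer matches the table's TIMESTAMP column.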



Solution

I tried to recreate this in Snowpark, and it does seem that TimestampType is internally changed to LongType when a pandas DataFrame is converted to a Snowpark DataFrame via the create_dataframe() method.

Also, specifying the schema parameter of create_dataframe() makes no difference in this scenario.
So, one workaround is to explicitly convert the column back to a timestamp with the to_timestamp() function.

from snowflake.snowpark.functions import sql_expr

df1 = session.sql("select * from timestamp_test")
df1 = df1.to_pandas()
df2 = session.create_dataframe(df1)  # column T comes back as LongType
# Cast the long value to a string, then parse it back into a timestamp.
col_cast = df2.with_column("T", sql_expr("to_timestamp(T::string)"))
col_cast.show()
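The same cast can also be written with Snowpark's column API instead of a raw SQL expression (a sketch under the same assumptions, i.e. a long epoch column named T):

from snowflake.snowpark.functions import col, to_timestamp
from snowflake.snowpark.types import StringType

# Stringify the raw epoch value, then parse it back into a timestamp.
col_cast = df2.with_column("T", to_timestamp(col("T").cast(StringType())))
col_cast.show()

Once the column is TimestampType again, the DataFrame can be written back to the table.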


This question was asked on Stack Overflow by elongl and answered by PooMac. It is licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
