How to solve the “simulations array must contain numerical values” error in Jupyter Notebook when my CSV files are already in the proper format?

Issue

This content is from Stack Overflow. Question asked by Harith S.

I am trying to evaluate a temperature dataset that I extracted from a GCM against my observed data. I used exactly the same script for my precipitation files and it worked well, but when I run it for temperature it raises an error. I prepared the temperature input files in exactly the same format as the precipitation files, so I expected it to work. The error is as follows:
“TypeError: simulations array must contain numerical values”
All my files (the simulated file, the station file and the output file I am supposed to get) are in CSV format.
I am sharing the script below. Please have a look and help me out. My simulated and observed files are here: https://drive.google.com/drive/folders/1u5kgCSVbReDzv1bgh1l_YJjmv1iwatlh?usp=sharing

Thank you so much.

for file in os.listdir("input/sim"):
    if file.endswith(".csv"):
        simulated_data=pd.read_csv(os.path.join("input/sim", file))
    simulated_data['Date']=pd.to_datetime(simulated_data['Date'])
    simulated_data.index=simulated_data['Date']
    simulated_data.drop(['Date'],axis=1,inplace=True)
    simulated_data.index
       
    for s in lat_lon['Stations']:
        Ob_data=pd.DataFrame(Observed_data[str(s)])
        sim_data=pd.DataFrame(simulated_data[str(s)])
        for_cor=pd.concat([Ob_data,sim_data],axis=1,copy=True)
        nse = he.evaluator(he.nse,Ob_data, sim_data)
        nse1=pd.DataFrame(nse, columns=['NSE'], index=[str(s)])
        R2=my_rsquare(Ob_data, sim_data)
        R22=pd.DataFrame(R2, columns=['R2'], index=[str(s)])
        MSE=mean_squared_error(Ob_data, sim_data)
        MSE1=pd.DataFrame(MSE, columns=['MSE'], index=[str(s)])
        RMSE=math.sqrt(MSE)
        RMSE1=pd.DataFrame(RMSE, columns=['RMSE'], index=[str(s)])
        corr = for_cor.corr()
        corr1=pd.DataFrame(corr.iloc[0,1], columns=['Pearson_R'], index=[str(s)])
        mae=mean_absolute_error(Ob_data, sim_data)
        mae1=pd.DataFrame(mae, columns=['MAE'], index=[str(s)])
        kge, r, alpha, beta = he.evaluator(he.kge, Ob_data, sim_data)
        kge_results=pd.DataFrame([kge], columns=['kge'],index=[str(s)])
        globals()['kge_'+str(s)]=kge_results
        Perf=pd.concat([nse1,R22,MSE1,corr1,kge_results,mae1,RMSE1],axis=1,copy=True)
        globals()['perform_'+str(s)]=Perf
    
    for s in lat_lon['Stations']:   
        All_stations=pd.concat([globals()['perform_'+str(s)] for s in lat_lon['Stations']],axis=0,copy=True)
        globals()['result']=All_stations
    
        final=pd.concat([result,lat_lon],axis=1,copy=True,sort=True)
        final.drop('Stations',axis=1,inplace=True)
    
        new_path = os.path.join("output/", file)
        final.to_csv(new_path)

        result['Stations']=result.index
        result.index=result['Stations']
        result.drop('Stations',axis=1,inplace=True)    

The error log:

TypeError                                 Traceback (most recent call last)
Input In [49], in <cell line: 1>()
     11 sim_data=pd.DataFrame(simulated_data[str(s)])
     12 for_cor=pd.concat([Ob_data,sim_data],axis=1,copy=True)
---> 13 nse = he.evaluator(he.nse,Ob_data, sim_data)
     14 nse1=pd.DataFrame(nse, columns=['NSE'], index=[str(s)])
     15 R2=my_rsquare(Ob_data, sim_data)

File D:\Program Files\Python\ANACONDA\lib\site-packages\hydroeval\hydroeval.py:158, in evaluator(obj_fn, simulations, evaluation, axis, transform, epsilon)
    156     raise TypeError('simulations must be an array')
    157 if not np.issubdtype(simulations.dtype, np.number):
--> 158     raise TypeError('simulations array must contain numerical values')
    159 evaluation = np.asarray(evaluation)
    160 if not evaluation.shape:

TypeError: simulations array must contain numerical values  

This is how I have defined my own NSE function:

#NSE Function

import statistics
import pandas as pd

def my_nse(arr1,arr2):
    
    numsum=densum=0
    
    my_new=pd.DataFrame()
    my_new['Observed_Discharge']=arr1
    my_new['Simulated_Discharge']=arr2
    
    mean_val_obs=statistics.mean(my_new['Observed_Discharge'])

    i=0
    while i<len(my_new['Simulated_Discharge'].values):
        
        num=(my_new['Observed_Discharge'][i])-(my_new['Simulated_Discharge'][i])
        num=num*num

        den=(my_new['Observed_Discharge'][i])-mean_val_obs
        den=den*den

        numsum=numsum+num
        densum=densum+den

        i=i+1

    cons=numsum/densum
    nse=1-cons
    
    return nse
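
For comparison, the same Nash-Sutcliffe formula can be written without an explicit loop. This is only a sketch: the name `nse_vectorized` and the sample values are made up for illustration, and it assumes both inputs are equal-length one-dimensional series with no missing values. Converting with `dtype=float` also means a stray non-numeric entry fails immediately with a `ValueError` instead of deep inside the calculation.

```python
import numpy as np

def nse_vectorized(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    # dtype=float raises ValueError if an entry cannot be parsed as a number.
    obs = np.asarray(observed, dtype=float).ravel()
    sim = np.asarray(simulated, dtype=float).ravel()
    if obs.shape != sim.shape:
        raise ValueError("observed and simulated must have the same length")
    numerator = np.sum((obs - sim) ** 2)           # squared model error
    denominator = np.sum((obs - obs.mean()) ** 2)  # variance of observations (unnormalised)
    return 1.0 - numerator / denominator

# Illustrative values only, not taken from the question's files.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.1, 1.9, 3.2, 3.8]
print(nse_vectorized(obs, sim))  # about 0.98
```

Because it works on positions rather than index labels, this version does not depend on the DataFrame having a 0..n-1 index, which the `[i]` lookups in the loop above do.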



Solution

Check the Answers
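
One way to narrow this down, offered as a sketch rather than a confirmed fix: the traceback shows that `he.evaluator` rejects any input whose dtype fails `np.issubdtype(dtype, np.number)`. That usually means pandas read at least one column as text, for example because of a stray character or a placeholder for missing data. The traceback also shows the signature as `evaluator(obj_fn, simulations, evaluation, ...)`, so in the script above `Ob_data` is the array being checked as "simulations". The offending value may therefore be in the observed file rather than the simulated one. The snippet below uses a made-up station column `ST1` with a made-up `-` entry to show how to detect and locate such values; neither is taken from the asker's files.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for one station column; the "-" entry is an assumed
# example of a stray non-numeric value, not taken from the asker's files.
csv_text = "Date,ST1\n2000-01-01,25.1\n2000-01-02,-\n2000-01-03,26.4\n"
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

def is_numeric(values):
    """Same dtype test that appears in the traceback (hydroeval.py, line 157)."""
    return np.issubdtype(np.asarray(values).dtype, np.number)

print(data.dtypes)              # ST1 is read as text, not float64
print(is_numeric(data["ST1"]))  # False

# Coerce to numbers; anything unparseable becomes NaN so it can be inspected.
cleaned = pd.to_numeric(data["ST1"], errors="coerce")
bad_rows = data.loc[cleaned.isna(), "ST1"]
print(bad_rows)                 # the raw entries that could not be parsed
print(is_numeric(cleaned))      # True
```

Running `print(Observed_data.dtypes)` and `print(simulated_data.dtypes)` on the real files should show which station columns are not `float64`. How to treat the rows that become NaN (drop, fill, or fix at the source) depends on the data and is left open here.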

This question and answer were collected from Stack Overflow and tested by the JTuto community, and are licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, CC BY-SA 4.0.
