My dataframe has N rows.
I have M centroids. Each centroid is the same shape as a dataframe-row.
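I need to create an N-by-M matrix (N rows, M columns), where entry (n, m) is the distance between the n-th dataframe row and the m-th centroid, so the m-th column comes from applying the m-th centroid to every row of the dataframe.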
My solution involves pre-creating the output matrix and filling it one column at a time as we manually iterate over centroids.
It feels clumsy. But I can’t see clearly how to do it ‘properly’.
import numpy as np

def getDistanceMatrix(df, centroids):
    distanceMatrix = np.zeros((len(df), len(centroids)))
    # distance = number of positions where centroid and row differ
    distFunc = lambda centroid, row: sum(centroid != row)
    iCentroid = 0
    for _, centroid in centroids.iterrows():
        distanceMatrix[:, iCentroid] = df.apply(
            lambda row: distFunc(centroid, row), axis=1
        )
        iCentroid += 1
    return distanceMatrix

distanceMatrix = getDistanceMatrix(df, centroids)
It feels like some kind of outer-product-with-a-custom-function.
What’s a good way to write this?
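One possible direction, offered as a sketch rather than a confirmed answer: since the distance here is just a count of mismatched positions, NumPy broadcasting can compute the whole matrix without an explicit loop. The function name `get_distance_matrix` and the sample data are hypothetical, and the sketch assumes `df` and `centroids` have the same columns in the same order, because it compares by position rather than by label.

```python
import numpy as np
import pandas as pd

def get_distance_matrix(df, centroids):
    # Hypothetical helper: assumes both frames share the same column order.
    rows = df.to_numpy()          # shape (N, K)
    cents = centroids.to_numpy()  # shape (M, K)
    # Broadcast (N, 1, K) against (1, M, K) to get an (N, M, K) boolean array,
    # then count mismatches along the last axis to get an (N, M) matrix.
    return (rows[:, None, :] != cents[None, :, :]).sum(axis=2)

# Made-up example data, for illustration only.
df = pd.DataFrame({"a": [0, 1, 1], "b": [1, 1, 0], "c": [0, 0, 1]})
centroids = pd.DataFrame({"a": [0, 1], "b": [1, 0], "c": [0, 1]})

distance_matrix = get_distance_matrix(df, centroids)
print(distance_matrix)
```

The trade-off is memory: the intermediate array has N x M x K elements, so for large inputs the column-by-column loop, or processing the rows in chunks, may still be the better choice. For a distance function that cannot be expressed with elementwise array operations, this trick does not apply directly.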