I have categorical variables like
FT, PT, PIT, FY, TY, HS, XS etc [2/3 Lettered Strings]
I want to encode them in such a way that FT, PT would be more closer as they have T in common. This would enable me to feed them into ML model. As my model relies on the closeness of these variables, encoding them in such manner is necessary.
I have tried all types of encoding namely Label, Hashing, One Hot (Don’t want to increase dimensionality), Binary, etc .
- Is it possible?
- What other approach can I try?
This question is not yet answered, be the first one who answer using the comment. Later the confirmed answer will be published as the solution.