Categorical Variable Encoding in Python


This Content is from Stack Overflow. Question asked by spd

I have categorical variables such as FT, PT, PIT, FY, TY, HS, XS, etc. (two- or three-letter strings).

I want to encode them so that FT and PT end up closer to each other because they have T in common. I would then feed the encoded values into an ML model. Since my model relies on the closeness of these variables, encoding them this way is necessary.

I have tried several kinds of encoding, namely label, hashing, one-hot (I don’t want to increase dimensionality), binary, etc.

  1. Is it possible?
  2. What other approach can I try?
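
One possible direction, offered only as a minimal sketch and not as a confirmed answer: encode each code as a multi-hot vector over the letters it contains, so that codes sharing a letter differ in fewer positions. The sketch assumes that "closeness" means sharing letters regardless of their position in the string, which the question does not state explicitly. The names `codes`, `alphabet`, `encode`, `vectors` and `distance` are hypothetical and exist only for this illustration.

```python
from math import dist

# Example categories taken from the question.
codes = ["FT", "PT", "PIT", "FY", "TY", "HS", "XS"]

# Distinct letters seen across all codes, in sorted order.
# The vector width equals the number of distinct letters, not the number of codes.
alphabet = sorted({ch for code in codes for ch in code})

def encode(code):
    """Return a multi-hot vector: 1 where a letter occurs in the code, else 0."""
    letters = set(code)
    return [1 if ch in letters else 0 for ch in alphabet]

# Assumption: letter position is ignored, so FT and TY also count as sharing T.
vectors = {code: encode(code) for code in codes}

def distance(a, b):
    """Euclidean distance between the encoded vectors of two codes."""
    return dist(vectors[a], vectors[b])

print(alphabet)
print("FT vs PT:", distance("FT", "PT"))  # differ in two letters (F, P)
print("FT vs HS:", distance("FT", "HS"))  # differ in four letters (F, T, H, S)
```

With this encoding FT and PT are closer than FT and HS, because they differ in two letter positions instead of four. The width is bounded by the number of distinct letters (at most 26 for uppercase A to Z) however many categories exist, so it grows more slowly than one-hot encoding when there are many codes. If position matters (for example, T as the last letter only), the vector would need one block of letters per position, which is a different trade-off.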


This question and answer are collected from Stack Overflow and tested by the JTuto community, and are licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.