Issue
This Content is from Stack Overflow. Question asked by Atom
I am trying to train a spaCy model with a small dataset in spaCy 2.2. It is overfitting, so I want to customize the architecture of the TextCategorizer. I referred to this post on GitHub:
https://github.com/explosion/spaCy/issues/3320
However, I am unable to get the example from that post to work:
    from spacy.pipeline import TextCategorizer
    from thinc.api import layerize
    from spacy.language import Language

    class StupidTextCategorizer(TextCategorizer):
        name = 'stupid_textcat'

        @classmethod
        def Model(cls, nr_class, **cfg):
            return create_dummy_model(nr_class, cfg.get('preferred_class', 0))

    def create_dummy_model(nr_class, preferred_class):
        """Create a Thinc model that always predicts the same class."""
        def dummy_model(docs, drop=0.):
            scores = model.ops.allocate((len(docs), nr_class))
            scores[:, preferred_class] = 1.0
            return scores
        model = layerize(dummy_model)
        return model
However, when I try to pass it to my training script, it throws this error, which I can’t seem to understand:
"[E002] Can't find factory for 'stupid_textcat'. This usually happens when spaCy calls `nlp.create_pipe` with a component name that's not built in - for example, when constructing the pipeline from a model's meta.json. If you're using a custom component, you can write to `Language.factories['stupid_textcat']` or remove it from the model meta and add it via `nlp.add_pipe` instead."
PS: I am still learning spaCy, and I can’t find any helpful documentation or tutorial for the above.
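The error text itself points at one possible way forward: spaCy 2.x resolves the name passed to `nlp.create_pipe` through the `Language.factories` dictionary, and 'stupid_textcat' has no entry there. Below is a minimal sketch of registering one, assuming spaCy 2.2 and the `StupidTextCategorizer` class from the question. The helper names `register_factory` and `demo` are hypothetical and not part of spaCy, and this is not a confirmed fix for the training script.

```python
def register_factory(language_cls, name, component_cls):
    """Store a factory under `name` in `language_cls.factories`.

    Hypothetical helper, not a spaCy function. In spaCy 2.x,
    `nlp.create_pipe(name)` looks the name up in `Language.factories`
    and calls the stored function as `factory(nlp, **cfg)`.
    """
    def factory(nlp, **cfg):
        # TextCategorizer subclasses take the shared vocab first.
        return component_cls(nlp.vocab, **cfg)

    language_cls.factories[name] = factory
    return factory


def demo(component_cls):
    """Build a blank pipeline that contains the custom component.

    Assumes spaCy 2.2 is installed. It is only imported when this
    function is called, so the helper above has no dependencies.
    """
    import spacy
    from spacy.language import Language

    register_factory(Language, component_cls.name, component_cls)
    nlp = spacy.blank("en")
    # With the factory registered, this lookup should no longer raise E002.
    nlp.add_pipe(nlp.create_pipe(component_cls.name))
    return nlp
```

Calling `demo(StupidTextCategorizer)` before training would register the factory and return a pipeline that includes the component. The other route the error message mentions is to skip the factory and pass an instance directly, for example `nlp.add_pipe(StupidTextCategorizer(nlp.vocab))`. A saved model would presumably still need the factory registered before `spacy.load` can rebuild the pipeline from meta.json.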
This question and answer were collected from Stack Overflow and tested by the JTuto community, and are licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.