Customizing spaCy’s Text Categorizer

Issue

This content is from Stack Overflow. Question asked by Atom.

I am trying to train a spaCy model on a small dataset in spaCy 2.2. It is overfitting, so I want to customize the architecture of the TextCategorizer. I referred to this post on GitHub:

https://github.com/explosion/spaCy/issues/3320

However, I am unable

from spacy.pipeline import TextCategorizer
from thinc.api import layerize  # Thinc 7.x, the version used by spaCy 2.2
from spacy.language import Language


class StupidTextCategorizer(TextCategorizer):
    name = 'stupid_textcat'

    @classmethod
    def Model(cls, nr_class, **cfg):
        # Called (e.g. by begin_training()) to build the component's model
        return create_dummy_model(nr_class, cfg.get('preferred_class', 0))


def create_dummy_model(nr_class, preferred_class):
    """Create a Thinc model that always predicts the same class."""
    def dummy_model(docs, drop=0.):
        scores = model.ops.allocate((len(docs), nr_class))
        scores[:, preferred_class] = 1.0

        def backprop(d_scores, sgd=None):
            # The model has no weights, so there is nothing to update
            return None

        # A layerize()d function must return (output, backprop callback)
        return scores, backprop

    model = layerize(dummy_model)
    return model
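To see what `create_dummy_model` computes without installing Thinc, the NumPy sketch below reproduces its forward pass together with a no-op backward callback. It assumes `model.ops.allocate` returns a zero-filled float32 array, as Thinc 7's NumPy ops do; `dummy_begin_update` is a hypothetical stand-in, not a Thinc function.

```python
import numpy as np


def dummy_begin_update(docs, nr_class, preferred_class, drop=0.0):
    """Hypothetical helper: forward pass of the dummy text classifier,
    returning the (output, backprop) pair a layerize()-wrapped function
    is expected to return."""
    # Zero-filled (n_docs, nr_class) score matrix, like ops.allocate() (assumed).
    scores = np.zeros((len(docs), nr_class), dtype="float32")
    # Every document puts all of its score on the same class.
    scores[:, preferred_class] = 1.0

    def backprop(d_scores, sgd=None):
        # No weights to update, so the incoming gradient is discarded.
        return None

    return scores, backprop


docs = ["first doc", "second doc", "third doc"]
scores, backprop = dummy_begin_update(docs, nr_class=3, preferred_class=1)
print(scores)
backprop(np.ones_like(scores))
```

Because the output depends neither on the input nor on training, a model like this mainly serves to confirm that a custom `Model` is actually being picked up; it does not by itself address overfitting.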

However, when I pass this component to my training script, it throws the following error, which I can’t understand:

"[E002] Can't find factory for 'stupid_textcat'. This usually happens when spaCy calls `nlp.create_pipe` with a component name that's not built in - for example, when constructing the pipeline from a model's meta.json. If you're using a custom component, you can write to `Language.factories['stupid_textcat']` or remove it from the model meta and add it via `nlp.add_pipe` instead."
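The E002 message comes from a name-to-factory lookup: when spaCy builds a pipeline from a component name (for example, one listed in a model's meta.json), it looks that name up in `Language.factories`, and a custom class that was never registered there cannot be found. The toy sketch below models that lookup in plain Python so it runs without spaCy; `TOY_FACTORIES`, `create_pipe_toy`, `ToyStupidTextcat`, and `ToyNLP` are hypothetical names, not spaCy's code.

```python
# Toy model of a name -> factory registry, loosely mirroring how spaCy 2.x
# resolves pipe names. All names here are hypothetical, not spaCy's API.

TOY_FACTORIES = {
    # Built-in components are registered up front.
    "textcat": lambda nlp, **cfg: ("TextCategorizer", cfg),
}


class ToyStupidTextcat:
    """Hypothetical stand-in for a custom component class."""

    name = "stupid_textcat"

    def __init__(self, vocab, **cfg):
        self.vocab = vocab
        self.cfg = cfg


def create_pipe_toy(nlp, name, config=None):
    """Look a component name up in the registry and build it."""
    if name not in TOY_FACTORIES:
        # Plays the role of spaCy's E002 "Can't find factory" error.
        raise KeyError(f"Can't find factory for '{name}'")
    return TOY_FACTORIES[name](nlp, **(config or {}))


class ToyNLP:
    """Hypothetical stand-in for an nlp object; only .vocab is used."""

    vocab = "toy-vocab"


nlp = ToyNLP()

# Before registration, the lookup fails, just like E002.
try:
    create_pipe_toy(nlp, "stupid_textcat")
except KeyError as err:
    print("lookup failed:", err)

# Registering a factory (a callable taking nlp and config) fixes the lookup.
TOY_FACTORIES["stupid_textcat"] = lambda nlp, **cfg: ToyStupidTextcat(nlp.vocab, **cfg)
pipe = create_pipe_toy(nlp, "stupid_textcat", {"preferred_class": 1})
print(type(pipe).__name__, pipe.cfg)
```

In spaCy 2.x, the real counterpart is to register the component before calling `nlp.create_pipe('stupid_textcat')` or loading a saved model that lists it, e.g. `Language.factories['stupid_textcat'] = lambda nlp, **cfg: StupidTextCategorizer(nlp.vocab, **cfg)`, which follows the pattern spaCy 2.x uses for its built-in components. As the error message notes, the alternative is to construct the component yourself and pass it to `nlp.add_pipe`.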

PS: I’m still learning spaCy, but I can’t find any helpful documentation or tutorial on this.



Solution

This question has not been answered yet. Be the first to answer it in the comments; the confirmed answer will later be published as the solution.

This question and answer were collected from Stack Overflow and tested by the JTuto community, and are licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
