Bug/Quirk

#5
by deepanwa - opened

Hello Numind,

Thanks for this model.

I would like to report a bug/quirk.

Bug description: The model does not recognize a partial mention of an entity that it already recognized with high confidence earlier in the text.
In the example below, the first occurrence of John Doe was recognized correctly, but the next 3 occurrences of John were completely missed. This can be replicated with any name, as long as the first occurrence is the full name and the subsequent occurrences are partial (either first or last name). In my debugging analysis I found that the confidence for the subsequent occurrences is very low (close to 0, less than 0.5). Let me know if you have any insights on this.

In [1]: import gliner

In [2]: gliner.__version__
Out[2]: '0.2.3'

In [3]: from gliner import GLiNER

In [4]: model = GLiNER.from_pretrained("numind/NuNerZero")

In [5]: t = "John Doe is suffering from broken heart syndrome. John visited Japan. John also likes Sushi. John was born in 1924."

In [6]: labels = ("person",)

In [7]: model.predict_entities(t,labels)
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Out[7]:
[{'start': 0,
  'end': 4,
  'text': 'John',
  'label': 'person',
  'score': 0.9987418055534363},
 {'start': 5,
  'end': 8,
  'text': 'Doe',
  'label': 'person',
  'score': 0.9984360337257385}]
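
To make the gap easier to see, here is a minimal sketch that lists the mentions the model skipped. It is plain Python and does not call the model: it takes the text and the prediction list copied from the output above and searches for other occurrences of the words that were already recognized. The helper name `find_missed_mentions` is hypothetical and is not part of gliner.

```python
import re


def find_missed_mentions(text, entities):
    """List occurrences of already-recognized entity words that were not predicted.

    Hypothetical helper for illustration only; not part of the gliner API.
    """
    # Character spans that the model did predict.
    predicted = [(e["start"], e["end"]) for e in entities]
    # Distinct words taken from the predicted entities, e.g. {"John", "Doe"}.
    words = {w for e in entities for w in e["text"].split()}

    missed = []
    for word in sorted(words):
        for m in re.finditer(r"\b" + re.escape(word) + r"\b", text):
            # Skip occurrences that fall inside a predicted span.
            if any(s <= m.start() and m.end() <= e for s, e in predicted):
                continue
            missed.append({"start": m.start(), "end": m.end(), "text": word})
    return sorted(missed, key=lambda d: d["start"])


t = ("John Doe is suffering from broken heart syndrome. John visited Japan. "
     "John also likes Sushi. John was born in 1924.")

# Predictions copied from Out[7] above.
entities = [
    {"start": 0, "end": 4, "text": "John", "label": "person", "score": 0.9987418055534363},
    {"start": 5, "end": 8, "text": "Doe", "label": "person", "score": 0.9984360337257385},
]

missed = find_missed_mentions(t, entities)
for m in missed:
    print(m)
```

For this text it reports the three later "John" mentions that are absent from the prediction list. It is only a string-matching check for spotting the problem, not a fix for the model's low scores.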
