match_dict.json format

Here is a minimal match_dict.json:

```
{
  "extract-revenge": {
    "patterns": [
      {
        "LEMMA": "extract",
        "TEMPLATE_ID": 1
      }
    ],
    "suggestions": [
      [
        {
          "TEXT": "exact",
          "FROM_TEMPLATE_ID": 1
        }
      ]
    ],
    "match_hook": [
      {
        "name": "succeeded_by_phrase",
        "args": "revenge",
        "match_if_predicate_is": true
      }
    ],
    "test": {
      "positive": [
        "And at the same time extract revenge on those he so despises?",
        "Watch as Tampa Bay extracts revenge against his former Los Angeles Rams team."
      ],
      "negative": ["Mother flavours her custards with lemon extract."]
    }
  }
}
```

The top-level key, extract-revenge, must be unique (as must any dictionary key). The name is used as a unique identifier, but is never shown.
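Uniqueness matters in practice because Python's standard json module does not complain about a repeated key: it silently keeps the last value. The sketch below shows one way to catch this before the rules reach replaCy. It assumes the file is parsed with the standard json module; the helper names reject_duplicate_keys and load_match_dict_strict are made up for illustration and are not part of replaCy.

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that raises if a JSON object repeats a key."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key in match dict: {key!r}")
        seen[key] = value
    return seen


def load_match_dict_strict(text):
    # json.loads calls the hook once per JSON object, so a repeated key
    # is caught at the top level and inside any rule as well.
    return json.loads(text, object_pairs_hook=reject_duplicate_keys)


# Two rules that accidentally share the same name.
duplicated = '{"extract-revenge": {"patterns": []}, "extract-revenge": {"patterns": []}}'
try:
    load_match_dict_strict(duplicated)
except ValueError as err:
    print(err)
```

Without the hook, json.loads would return a dict containing only the second extract-revenge rule, and the first would be lost without any warning.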
The inner keys are as follows:
- patterns - A list of spaCy Matcher patterns (actually, a superset of a spaCy Matcher pattern), which may look like e.g. [{"LOWER": "hello"}, {"IS_PUNCT": true}, {"LOWER": "world"}]. What makes it a superset is that you can add "TEMPLATE_ID": int to some of the dicts. This labels that part of the match as a template to be inflected, such as a verb to conjugate or a noun to pluralize. In the above example, the lemma extract is labeled with a TEMPLATE_ID of 1.
- suggestions - a list of lists of dicts. Each dict has one or two keys, in one of these combinations:
  - just "TEXT" (str), which will be used verbatim in the suggestion,
  - just "PATTERN_REF" (int), which will copy the token at that position in the pattern from the matched text,
  - both "TEXT": "sometext" and "FROM_TEMPLATE_ID": int, which will apply the conjugation/pluralization of the token labeled with that TEMPLATE_ID to "TEXT". In the above example, suggestions is [[{"TEXT":"exact","FROM_TEMPLATE_ID":1}]], which means exact is conjugated the same way as the matched form of extract from the step above (a match on extracts suggests exacts),
  - both "PATTERN_REF" (int) and "INFLECTION" (str), an explicit POS tag. Used when you want to reference the PATTERN_REF's token from the pattern, but conjugate it to a different form (so far I have only seen this used for grammar rules). Example: {"PATTERN_REF": 1, "INFLECTION": "VBN"} will take the second token from the matched pattern and conjugate it into the past participle.
- match_hook - (despite the singular name) A list of "match hooks". These are Python functions which refine matches. See the following section.
- test - has positive and negative keys. positive is a list of strings which this rule SHOULD match against, negative is a list of strings which SHOULD NOT match. Used for testing now, but we have plans to infer rules from this section.
- (optional) comment - a string for other humans to read; ignored by replaCy
- (optional) anything - you can add any extra structure here, and replaCy will attempt to tag matching spans with this information using the spaCy custom extension attributes namespace span._ (see the spaCy docs). For example, you can add the key oogly with value "boogly" to a rule whose pattern matches "LOWER": "secret password". Then if you call span = rmatcher("This is the secret password.")[0], span._.oogly == "boogly". replaCy tries to pick sensible default values for user-defined extensions. If you have a match with the key-value pair "coolness": 10, replaCy will infer that coolness is an int. When it adds coolness to all spaCy spans, it makes span._.coolness default to 0. This way, you can check every span with if span._.coolness > THRESHOLD without causing an AttributeError. You can change the default the way you would change any spaCy custom attribute, e.g.
```
  from spacy.tokens import Span

  Span.set_extension("coolness", default=9000)
```
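To make the default-value behaviour more concrete, here is a minimal sketch of how a neutral default could be derived from a rule's value. This is an assumption for illustration, not replaCy's actual code: the text above only confirms that an int value leads to a default of 0, and the helper name infer_default and the handling of other types are hypothetical.

```python
def infer_default(value):
    """Guess a neutral default with the same type as `value` (hypothetical helper)."""
    if value is None:
        return None
    # Calling a built-in type with no arguments yields its "empty" value:
    # int() == 0, str() == "", list() == [], bool() is False.
    return type(value)()


# Extra keys as they might appear on a rule in match_dict.json.
rule_extras = {"coolness": 10, "oogly": "boogly"}
defaults = {key: infer_default(val) for key, val in rule_extras.items()}
print(defaults)  # {'coolness': 0, 'oogly': ''}
```

With a default of the same type in place, a comparison such as span._.coolness > THRESHOLD is well defined on every span, including spans the rule never matched.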

Between match hooks and custom span attributes, replaCy lets you control much of your NLP application's behavior from a single JSON file.
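Because so much lives in one file, it can help to sanity-check a rule before loading it. The sketch below labels each dict inside suggestions according to the four key combinations listed under the suggestions key. It mirrors only that list; replaCy itself may accept other combinations, and the names SUGGESTION_SHAPES and describe_suggestions are invented for this example.

```python
# Hypothetical labels for the four key combinations described above.
SUGGESTION_SHAPES = {
    frozenset({"TEXT"}): "literal text",
    frozenset({"PATTERN_REF"}): "copy of a matched token",
    frozenset({"TEXT", "FROM_TEMPLATE_ID"}): "text inflected like a template",
    frozenset({"PATTERN_REF", "INFLECTION"}): "matched token, re-inflected",
}


def describe_suggestions(match_dict):
    """Map each rule name to the shapes of its suggestion parts.

    Raises ValueError for a key combination that is not in the list above.
    """
    report = {}
    for rule_name, rule in match_dict.items():
        report[rule_name] = []
        for suggestion in rule.get("suggestions", []):
            labels = []
            for part in suggestion:
                keys = frozenset(part)
                if keys not in SUGGESTION_SHAPES:
                    raise ValueError(f"{rule_name}: unexpected keys {sorted(keys)}")
                labels.append(SUGGESTION_SHAPES[keys])
            report[rule_name].append(labels)
    return report


match_dict = {
    "extract-revenge": {
        "patterns": [{"LEMMA": "extract", "TEMPLATE_ID": 1}],
        "suggestions": [[{"TEXT": "exact", "FROM_TEMPLATE_ID": 1}]],
    }
}
print(describe_suggestions(match_dict))
# {'extract-revenge': [['text inflected like a template']]}
```

A misspelled key such as "FROM_TEMPLATE" would raise a ValueError here instead of surfacing later as a missing or uninflected suggestion.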
