How to Write a Template?¶
As we have stated, the template (which could consist of concrete textual tokens or abstract new tokens; the only difference is the initialization) is one of the most important modules in a prompt-learning framework. In this tutorial, we introduce how to write a template and set the corresponding attributes for a Template class.
Our template language borrows Python's dict grammar to make it easy to learn. We use a `meta` key to denote the original text input, a part of the input, or other key information. A `mask` key denotes the position of the token(s) to be predicted. A `soft` key denotes soft tokens, while textual tokens can simply be written down directly.
Textual Template¶
A simple template for binary sentiment classification, where `sentence` denotes the original input and `mask` is the target position:
{"meta": "sentence"}. It is {"mask"}.
Here is a basic template for news topic classification, where one example contains two parts, a `title` and a `description`:
A {"mask"} news : {"meta": "title"} {"meta": "description"}
In entity typing, the `entity` is a key piece of information, and we want to copy it into the template:
{"meta": "sentence"} {"text": "In this sentence,"} {"meta": "entity"} {"text": "is a"} {"mask"},
# you can also omit the `text` key
{"meta": "sentence"}. In this sentence, {"meta": "entity"} is a {"mask"},
Easy, huh? We can also specify that, in topic classification, the title should not be truncated:
a {"mask"} news: {"meta": "title", "shortenable": False} {"meta": "description"}
Soft & Mixed Template¶
Enough of textual templates; let's try some soft tokens. If you use `{"soft"}`, the token will be randomly initialized. If you put textual tokens in the value position, the soft token(s) will be initialized with the embeddings of these tokens. Note that textual tokens are optimized together with the model, while soft tokens are optimized separately.
{"meta": "premise"} {"meta": "hypothesis"} {"soft": "Does the first sentence entails the second?"} {"mask"} {"soft"}.
We can mix them up, too. Note that if two soft tokens have the same `soft_id`, they will share embeddings:
{"meta": "premise"} {"meta": "hypothesis"} {"soft": "Does"} {"soft": "the", "soft_id": 1} first sentence entails {"soft_id": 1} second?
If you want to define 10000 soft tokens, use the `duplicate` key:
{"soft": None, "duplicate": 10000} {"meta": "text"} {"mask"}
If you want the 10000 soft tokens to be identical, use the `same` key:
{"soft": None, "duplicate": 10000, "same": True}
Post-processing¶
We also support post-processing (e.g., a lambda expression that strips the final punctuation from the data):
{"meta": "context", "post_processing": lambda s: s.rstrip(string.punctuation)}. {"soft": "It was"} {"mask"}
You can also apply an MLP to post-process your tokens:
{"text": "This sentence is", "post_processing": "mlp"} {"soft": None, "post_processing": "mlp"}
Our flexible template language supports token-level configuration in prompt learning, so you can easily develop the complex templates you need with OpenPrompt. Try it out!