Verbalizer¶
Overview¶
The verbalizer is one of the most important modules in prompt-learning: it projects the original labels to a set of label words.
We implement common verbalizer classes in OpenPrompt.
One to One Verbalizer¶
The basic one to one Verbalizer.
- class One2oneVerbalizer(tokenizer: transformers.tokenization_utils.PreTrainedTokenizer, num_classes: Optional[int] = None, classes: Optional[List] = None, label_words: Optional[Union[Sequence[str], Mapping[str, str]]] = None, prefix: Optional[str] = ' ', multi_token_handler: Optional[str] = 'first', post_log_softmax: Optional[bool] = True)[source]¶
The basic manually defined verbalizer class. This class inherits from the Verbalizer class and restricts the use of label words to one word per label. For a verbalizer with fewer constraints, please use the basic ManualVerbalizer.
- Parameters
tokenizer (PreTrainedTokenizer) – The tokenizer of the current pre-trained model, used to point out the vocabulary.
classes (List[Any]) – The classes (or labels) of the current task.
num_classes (int, optional) – The number of classes of the verbalizer. Only one of classes and num_classes should be used.
label_words (Union[Sequence[str], Mapping[str, str]], optional) – The label words that are projected by the labels.
prefix (str, optional) – The prefix string of the verbalizer (used in PLMs such as RoBERTa, which are sensitive to the prefix space).
multi_token_handler (str, optional) – The handling strategy for multiple tokens produced by the tokenizer.
post_log_softmax (bool, optional) – Whether to apply log softmax post-processing on label_logits. Defaults to True.
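A minimal construction sketch, for orientation only; the model choice and the sentiment classes/label words below are illustrative assumptions, not taken from this page:
from openprompt.plms import load_plm
from openprompt.prompts import One2oneVerbalizer

plm, tokenizer, model_config, WrapperClass = load_plm("roberta", "roberta-large")

# Exactly one label word per class; each word should map to a single token.
verbalizer = One2oneVerbalizer(
    tokenizer=tokenizer,
    classes=["negative", "positive"],
    label_words={"negative": "terrible", "positive": "great"},
)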
- static add_prefix(label_words, prefix)[source]¶
Add the prefix to label words. For example, if a label word is in the middle of a template, the prefix should be ' '.
- Parameters
label_words (Union[Sequence[str], Mapping[str, str]], optional) – The label words that are projected by the labels.
prefix (str, optional) – The prefix string of the verbalizer.
- Returns
New label words with prefix.
- Return type
Sequence[str]
- generate_parameters() List [source]¶
In the basic manual verbalizer, the parameters are generated from the label words directly. In this implementation, the label_words should not be tokenized into more than one token.
- project(logits: torch.Tensor, **kwargs) torch.Tensor [source]¶
Project the labels; the return value is the normalized (sum to 1) probabilities of the label words.
- Parameters
logits (torch.Tensor) – The original logits of label words.
- Returns
The normalized logits of label words.
- Return type
torch.Tensor
- process_logits(logits: torch.Tensor, **kwargs)[source]¶
A whole framework to process the original logits over the vocabulary, which contains the following steps:
Project the logits into logits of label words
if self.post_log_softmax is True:
Normalize over all label words
Calibrate (optional)
- Parameters
logits (torch.Tensor) – The original logits.
- Returns
The final processed logits over the label words set.
- Return type
torch.Tensor
- normalize(logits: torch.Tensor) torch.Tensor [source]¶
Given logits regarding the entire vocabulary, return the probs over the label words set.
- Parameters
logits (torch.Tensor) – The logits over the entire vocabulary.
- Returns
The probabilities over the label words set.
- Return type
torch.Tensor
- calibrate(label_words_probs: torch.Tensor, **kwargs) torch.Tensor [source]¶
- Parameters
label_words_probs (torch.Tensor) – The probability distribution of the label words, with shape [batch_size, num_classes, num_label_words_per_class].
- Returns
The calibrated probability of label words.
- Return type
torch.Tensor
Manual Verbalizer¶
The basic manually defined Verbalizer.
- class ManualVerbalizer(tokenizer: transformers.tokenization_utils.PreTrainedTokenizer, classes: Optional[List] = None, num_classes: Optional[int] = None, label_words: Optional[Union[Sequence[str], Mapping[str, str]]] = None, prefix: Optional[str] = ' ', multi_token_handler: Optional[str] = 'first', post_log_softmax: Optional[bool] = True)[source]¶
The basic manually defined verbalizer class. This class inherits from the Verbalizer class.
- Parameters
tokenizer (PreTrainedTokenizer) – The tokenizer of the current pre-trained model, used to point out the vocabulary.
classes (List[Any]) – The classes (or labels) of the current task.
label_words (Union[List[str], List[List[str]], Dict[str, List[str]]], optional) – The label words that are projected by the labels.
prefix (str, optional) – The prefix string of the verbalizer (used in PLMs such as RoBERTa, which are sensitive to the prefix space).
multi_token_handler (str, optional) – The handling strategy for multiple tokens produced by the tokenizer.
post_log_softmax (bool, optional) – Whether to apply log softmax post-processing on label_logits. Defaults to True.
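A minimal construction sketch; the model choice, classes, and label words are illustrative assumptions:
from openprompt.plms import load_plm
from openprompt.prompts import ManualVerbalizer

plm, tokenizer, model_config, WrapperClass = load_plm("roberta", "roberta-large")

# Each class may map to several label words; their scores are aggregated per class.
verbalizer = ManualVerbalizer(
    tokenizer=tokenizer,
    classes=["negative", "positive"],
    label_words={"negative": ["bad", "terrible"], "positive": ["good", "great"]},
)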
- static add_prefix(label_words, prefix)[source]¶
Add the prefix to label words. For example, if a label word is in the middle of a template, the prefix should be ' '.
- Parameters
label_words (Union[Sequence[str], Mapping[str, str]], optional) – The label words that are projected by the labels.
prefix (str, optional) – The prefix string of the verbalizer.
- Returns
New label words with prefix.
- Return type
Sequence[str]
- generate_parameters() List [source]¶
In the basic manual verbalizer, the parameters are generated from the label words directly. In this implementation, the label_words should not be tokenized into more than one token.
- project(logits: torch.Tensor, **kwargs) torch.Tensor [source]¶
Project the labels; the return value is the normalized (sum to 1) probabilities of the label words.
- Parameters
logits (torch.Tensor) – The original logits of label words.
- Returns
The normalized logits of label words.
- Return type
torch.Tensor
- process_logits(logits: torch.Tensor, **kwargs)[source]¶
A whole framework to process the original logits over the vocabulary, which contains four steps:
Project the logits into logits of label words
if self.post_log_softmax is True:
Normalize over all label words
Calibrate (optional)
Aggregate (for multiple label words)
- Parameters
logits (torch.Tensor) – The original logits.
- Returns
The final processed logits over the labels (classes).
- Return type
torch.Tensor
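As a rough sketch of how these steps chain together (a paraphrase of the step list above in terms of the methods documented below, not the verbatim source):
import torch

def process_logits_sketch(verbalizer, logits: torch.Tensor) -> torch.Tensor:
    label_words_logits = verbalizer.project(logits)        # vocabulary logits -> label-word logits
    if verbalizer.post_log_softmax:
        label_words_probs = verbalizer.normalize(label_words_logits)  # probs over label words
        # calibrate() would be applied here once calibration logits have been registered
        label_words_logits = torch.log(label_words_probs + 1e-15)
    return verbalizer.aggregate(label_words_logits)         # aggregate to per-class logits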
- normalize(logits: torch.Tensor) torch.Tensor [source]¶
Given logits regarding the entire vocabulary, return the probs over the label words set.
- Parameters
logits (torch.Tensor) – The logits over the entire vocabulary.
- Returns
The probabilities over the label words set.
- Return type
torch.Tensor
- aggregate(label_words_logits: torch.Tensor) torch.Tensor [source]¶
Use the weights to aggregate the logits of the label words.
- Parameters
label_words_logits (torch.Tensor) – The logits of the label words.
- Returns
The aggregated logits from the label words.
- Return type
torch.Tensor
- calibrate(label_words_probs: torch.Tensor, **kwargs) torch.Tensor [source]¶
- Parameters
label_words_probs (torch.Tensor) – The probability distribution of the label words, with shape [batch_size, num_classes, num_label_words_per_class].
- Returns
The calibrated probability of label words.
- Return type
torch.Tensor
Automatic Verbalizer¶
The Automatic Verbalizer defined in Automatically Identifying Words That Can Serve as Labels for Few-Shot Text Classification.
- class AutomaticVerbalizer(tokenizer: Optional[transformers.tokenization_utils.PreTrainedTokenizer] = None, num_candidates: Optional[int] = 1000, label_word_num_per_class: Optional[int] = 1, num_searches: Optional[int] = 1, score_fct: Optional[str] = 'llr', balance: Optional[bool] = True, num_classes: Optional[int] = None, classes: Optional[List[str]] = None, init_using_split: Optional[str] = 'train', **kwargs)[source]¶
This implementation differs slightly from the original code in three ways: 1) We allow re-selecting the verbalizer after a fixed number of training steps; the original implementation performs the selection only once, after obtaining the initial logits on the training data. To reproduce their behaviour, call optimize() only after the first pass over the training data. 2) We strictly follow the probability calculation in Equation (3) of the paper, which takes a softmax over the logits. 3) We do not implement the ``combine_patterns'' branch, since it is not a pure verbalizer type and does not yield much improvement; however, the same effect can be achieved by using EnsembleTrainer to pass text wrapped by multiple templates together with this verbalizer.
We use a probs_buffer to store the probabilities \(q_{P,t}(1|\mathbf{x})\) and a labels_buffer to store the labels \(y\); both are used in the later verbalizer selection.
- Parameters
num_candidates (int, optional) – The number of candidates for further selection, based on Section 4.1.
label_word_num_per_class (int, optional) – Set to a value greater than 1 to support Multi-Verbalizers as in Section 4.2.
num_searches (int, optional) – The maximum number of label-word searches. After reaching this number, the verbalizer keeps the same label words as in the previous iterations.
search_id (int, optional) – The id of the current search, used to determine when to stop searching for label words.
score_fct (str, optional) – The scoring function for label word selection. llr means log-likelihood ratio, corresponding to Equation (7); ce means cross entropy, corresponding to Equation (6). As the paper points out, llr is significantly better than ce; ce is only kept to match the original code.
balance (bool, optional) – Whether to perform normalization for an unbalanced training dataset, as in Equation (5).
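A minimal construction sketch; the model choice, classes, and hyperparameter values are illustrative assumptions:
from openprompt.plms import load_plm
from openprompt.prompts import AutomaticVerbalizer

plm, tokenizer, model_config, WrapperClass = load_plm("roberta", "roberta-large")

# No label words are given up front; they are searched from the training data.
verbalizer = AutomaticVerbalizer(
    tokenizer=tokenizer,
    classes=["negative", "positive"],
    num_candidates=1000,          # candidate pool size (Section 4.1)
    label_word_num_per_class=3,   # >1 enables the multi-verbalizer variant (Section 4.2)
    score_fct="llr",
)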
- project(logits: torch.Tensor, **kwargs) torch.Tensor [source]¶
When this verbalizer has not yet performed optimize(), it has no label_words_ids; it will therefore give random predictions and should have no connection to the model that could produce (misleading) gradients.
- Parameters
logits (torch.Tensor) – The original logits over the vocabulary.
- Returns
The projected logits of label words.
- Return type
torch.Tensor
- optimize_to_initialize()[source]¶
This is an epoch-level optimization. If used at the batch level like an ordinary gradient-descent optimizer, the result may not be very satisfying, since the accumulated examples (i.e., the probs_buffer and the labels_buffer) are not sufficient when the batch size is small.
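A hedged sketch of where this call typically sits; the prompt_model, dataloader, loss function, and optimizer names are placeholders, not taken from this page:
def train_one_epoch_then_select(prompt_model, verbalizer, train_dataloader, loss_func, optimizer):
    # Forward passes during training fill the probs_buffer / labels_buffer;
    # the label words are then (re-)selected once per epoch.
    for batch in train_dataloader:
        logits = prompt_model(batch)
        loss = loss_func(logits, batch["label"])
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    verbalizer.optimize_to_initialize()   # epoch-level label-word selection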
- from_file(path: str, choice: Optional[int] = 0)[source]¶
Load the predefined label words from a verbalizer file. Three file formats are currently supported: 1. a .jsonl or .json file containing a single verbalizer in dict format; 2. a .jsonl or .json file containing a list of verbalizers in dict format; 3. a .txt or .csv file, in which the label words of each class are listed on one line, separated by commas, and a new verbalizer begins after an empty line. The last format is recommended when you don't know the name of each class.
The details of verbalizer format can be seen in How to Write a Verbalizer?.
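An illustration of the third format; the file name and label words are made up for this sketch, and verbalizer stands for any verbalizer instance (e.g., the one constructed above):
# Hypothetical contents of my_verbalizer.txt (one class per line, words comma-separated;
# an empty line would start a second verbalizer in the same file):
#   bad,terrible,awful
#   good,great,wonderful
verbalizer.from_file("my_verbalizer.txt", choice=0)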
Knowledgeable Verbalizer¶
The Knowledgeable Verbalizer defined in Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification.
- class KnowledgeableVerbalizer(tokenizer: Optional[transformers.tokenization_utils.PreTrainedTokenizer] = None, classes: Optional[Sequence[str]] = None, prefix: Optional[str] = ' ', multi_token_handler: Optional[str] = 'first', max_token_split: Optional[int] = - 1, verbalizer_lr: Optional[float] = 0.05, candidate_frac: Optional[float] = 0.5, pred_temp: Optional[float] = 1.0, **kwargs)[source]¶
This is the implementation of the knowledgeable verbalizer, which uses external knowledge to expand the set of label words. This class inherits from the ManualVerbalizer class.
- Parameters
tokenizer (PreTrainedTokenizer) – The tokenizer of the current pre-trained model, used to point out the vocabulary.
classes (Sequence[str]) – The classes (or labels) of the current task.
prefix (str, optional) – The prefix string of the verbalizer.
multi_token_handler (str, optional) – The handling strategy for multiple tokens produced by the tokenizer.
max_token_split (int, optional) –
verbalizer_lr (float, optional) – The learning rate of the verbalizer optimization.
candidate_frac (float, optional) –
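A minimal construction sketch; the topic classes and the label-word file are illustrative assumptions (in practice, the expanded label words come from an external knowledge base and are loaded with from_file):
from openprompt.plms import load_plm
from openprompt.prompts import KnowledgeableVerbalizer

plm, tokenizer, model_config, WrapperClass = load_plm("roberta", "roberta-large")

verbalizer = KnowledgeableVerbalizer(
    tokenizer=tokenizer,
    classes=["politics", "sports", "business", "technology"],
    candidate_frac=0.5,   # keep this fraction of candidate words after prior-based filtering
).from_file("knowledgeable_verbalizer.txt")   # hypothetical file of expanded label words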
- static add_prefix(label_words, prefix)[source]¶
Add the prefix to label words. For example, if a label word is in the middle of a template, the prefix should be ' '.
- generate_parameters() List [source]¶
In the basic manual verbalizer, the parameters are generated from the label words directly. In this implementation, the label_words should not be tokenized into more than one token.
- register_calibrate_logits(logits: torch.Tensor)[source]¶
For the knowledgeable verbalizer, it is necessary to filter out words with a low prior probability. Therefore, we re-compute the label words after registering the calibration logits.
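A hedged sketch of how the calibration logits might be obtained; the support dataloader is a placeholder and the averaging strategy is an assumption modeled on the contextualized-prior idea:
import torch

def register_contextual_prior(verbalizer, prompt_model, support_dataloader):
    # Average the <mask> logits over a small support set, then register them so
    # that label words with a low prior probability are filtered out.
    all_logits = []
    prompt_model.eval()
    with torch.no_grad():
        for batch in support_dataloader:
            all_logits.append(prompt_model.forward_without_verbalize(batch))
    verbalizer.register_calibrate_logits(torch.cat(all_logits, dim=0).mean(dim=0))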
PTR Verbalizer¶
The verbalizer of PTR from PTR: Prompt Tuning with Rules for Text Classification.
- class PTRVerbalizer(tokenizer: transformers.tokenization_utils.PreTrainedTokenizer, classes: Optional[Sequence[str]] = None, num_classes: Optional[int] = None, label_words: Optional[Union[Sequence[Sequence[str]], Mapping[str, Sequence[str]]]] = None)[source]¶
In PTR, each prompt has more than one <mask> token, and different <mask> tokens have different label words. The final label is predicted jointly from these label words using logic rules.
- Parameters
tokenizer (PreTrainedTokenizer) – A tokenizer to appoint the vocabulary and the tokenization strategy.
classes (Sequence[str]) – A sequence of classes that need to be projected.
label_words (Union[Sequence[Sequence[str]], Mapping[str, Sequence[str]]], optional) – The label words that are projected by the labels.
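A minimal sketch for a prompt with two <mask> tokens; the classes and their per-mask label words are illustrative assumptions, not taken from PTR:
from openprompt.plms import load_plm
from openprompt.prompts import PTRVerbalizer

plm, tokenizer, model_config, WrapperClass = load_plm("roberta", "roberta-large")

# Every class lists one label word per <mask>; the per-mask predictions are
# combined into a single class prediction.
verbalizer = PTRVerbalizer(
    tokenizer=tokenizer,
    classes=["no_relation", "org:founded_by"],
    label_words={
        "no_relation": ["irrelevant", "irrelevant"],
        "org:founded_by": ["organization", "person"],
    },
)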
- process_logits(logits: torch.Tensor, batch: Union[Dict, openprompt.data_utils.utils.InputFeatures], **kwargs)[source]¶
Process the vocabulary logits of each <mask> into label logits for that <mask>.
Combine these logits into the label logits of the whole task.
- Parameters
logits (torch.Tensor) – The vocabulary logits of each <mask> (shape: [batch_size, num_masks, vocab_size]).
- Returns
The label logits of the whole task (shape: [batch_size, label_size of the whole task]).
- Return type
torch.Tensor
Generation Verbalizer¶
This verbalizer empowers the "generation for all the tasks" paradigm.
- class GenerationVerbalizer(tokenizer: transformers.tokenization_utils.PreTrainedTokenizer, classes: Optional[List[str]] = None, num_classes: Optional[int] = None, is_rule: Optional[bool] = False, label_words: Optional[dict] = None)[source]¶
This verbalizer is useful when the label prediction is better defined by a piece of the input. For example, in coreference resolution, the tgt_text is a proper noun mentioned in the text, and there is no fixed mapping between a class label and its label words. This verbalizer can be used as the verbalizer of the COPA and WSC datasets in SuperGLUE. It is especially powerful when combined with the All NLP Tasks Are Generation Tasks paradigm (also see CrossFit): it can make any piece of text the tgt_text, which is then filled into the {"mask"}.
For example:
when the label word is "good", the tgt_text is "good";
when the label word is {"text": "good"}, the tgt_text is also "good";
when the label word is {"meta": "choice1"}, the tgt_text is the meta['choice1'] field of the InputExample;
when the label word is {"meta": "choice1"} {"placeholder": "text_a"} ., the tgt_text is the meta['choice1'] field of the InputExample, followed by the text_a field of the InputExample, and then a '.'.
A use case can be seen in Tutorial 4.1.
- Parameters
tokenizer (PreTrainedTokenizer) – The tokenizer of the current pre-trained model, used to point out the vocabulary.
classes (List[Any]) – The classes (or labels) of the current task.
prefix (str, optional) – The prefix string of the verbalizer (used in PLMs such as RoBERTa, which are sensitive to the prefix space).
is_rule (bool, optional) – Whether the verbalizer uses the rule syntax of MixTemplate.
label_words (dict, optional) – The label words of the generation verbalizer.
Example: to use this verbalizer to train a T5 model to predict an answer and an explanation using two masks, the input example and the template (defined by MixedTemplate) can be:
>>> input_example = InputExample(text_a="Can fish run?", meta={"answer": "no", "explanation": "The fish have no legs"}, label=0)
>>> template = "{'placeholder':'text_a'} answer: {'mask'} explanation: {'mask'}"
The verbalizer can be:
>>> label_words = {0: ["no", "{'meta':'explanation'}"], 1: ["yes", "{'meta':'explanation'}"]}
>>> verbalizer = GenerationVerbalizer(tokenizer, classes=None, is_rule=True, label_words=label_words)
Soft Verbalizer¶
- class SoftVerbalizer(tokenizer: Optional[transformers.tokenization_utils.PreTrainedTokenizer], model: Optional[transformers.modeling_utils.PreTrainedModel], classes: Optional[List] = None, num_classes: Optional[int] = None, label_words: Optional[Union[Sequence[str], Mapping[str, str]]] = None, prefix: Optional[str] = ' ', multi_token_handler: Optional[str] = 'first')[source]¶
The implementation of the verbalizer in WARP.
- Parameters
tokenizer (PreTrainedTokenizer) – The tokenizer of the current pre-trained model, used to point out the vocabulary.
classes (List[Any]) – The classes (or labels) of the current task.
label_words (Union[List[str], List[List[str]], Dict[str, List[str]]], optional) – The label words that are projected by the labels.
prefix (str, optional) – The prefix string of the verbalizer (used in PLMs such as RoBERTa, which are sensitive to the prefix space).
multi_token_handler (str, optional) – The handling strategy for multiple tokens produced by the tokenizer.
post_log_softmax (bool, optional) – Whether to apply log softmax post-processing on label_logits. Defaults to True.
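A minimal construction sketch; the model choice, classes, and label words are illustrative assumptions. The PLM itself is passed in so the verbalizer can access its head, whose last layer becomes the trainable per-class projection (as suggested by the group_parameters properties below):
from openprompt.plms import load_plm
from openprompt.prompts import SoftVerbalizer

plm, tokenizer, model_config, WrapperClass = load_plm("roberta", "roberta-large")

verbalizer = SoftVerbalizer(
    tokenizer=tokenizer,
    model=plm,                            # the PLM is needed to locate its head
    classes=["negative", "positive"],
    label_words={"negative": ["bad"], "positive": ["good"]},
)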
- property group_parameters_1¶
Include the parameters of the head's layers except the last layer. In the soft verbalizer, note that some heads may contain modules other than the final projection layer; the parameters of these parts should be optimized (or frozen) together with the PLM.
- property group_parameters_2¶
Include the last layer’s parameters
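A hedged sketch of how the two groups are typically placed in separate optimizer parameter groups; the learning rates and the AdamW choice are illustrative, and verbalizer refers to a SoftVerbalizer instance such as the one above:
from torch.optim import AdamW

# Group 1: head parameters other than the last layer -- usually tuned like the PLM.
# Group 2: the last projection layer -- usually tuned with a larger learning rate.
optimizer = AdamW([
    {"params": verbalizer.group_parameters_1, "lr": 3e-5},
    {"params": verbalizer.group_parameters_2, "lr": 3e-4},
])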
- static add_prefix(label_words, prefix)[source]¶
Add the prefix to label words. For example, if a label word is in the middle of a template, the prefix should be ' '.
- Parameters
label_words (Union[Sequence[str], Mapping[str, str]], optional) – The label words that are projected by the labels.
prefix (str, optional) – The prefix string of the verbalizer.
- Returns
New label words with prefix.
- Return type
Sequence[str]
- generate_parameters() List [source]¶
In the basic manual verbalizer, the parameters are generated from the label words directly. In this implementation, the label_words should not be tokenized into more than one token.
- process_outputs(outputs: torch.Tensor, batch: Union[Dict, openprompt.data_utils.utils.InputFeatures], **kwargs)[source]¶
By default, the verbalizer will process the logits of the PLM’s output.
- Parameters
logits (torch.Tensor) – The current logits generated by pre-trained language models.
batch (Union[Dict, InputFeatures]) – The input features of the data.
- gather_outputs(outputs: transformers.file_utils.ModelOutput)[source]¶
Retrieve useful output for the verbalizer from the whole model output. By default, it will only retrieve the logits.
- Parameters
outputs (ModelOutput) –
- Returns
The gathered output, which should be of shape (batch_size, seq_len, any).
- Return type
torch.Tensor