Verbalizer¶
Overview¶
The verbalizer is one of the most important modules in prompt-learning: it projects the original labels to a set of label words.
We implement common verbalizer classes in OpenPrompt.
One to One Verbalizer¶
The basic one to one Verbalizer.
- class One2oneVerbalizer(tokenizer: transformers.tokenization_utils.PreTrainedTokenizer, num_classes: Optional[int] = None, classes: Optional[List] = None, label_words: Optional[Union[Sequence[str], Mapping[str, str]]] = None, prefix: Optional[str] = ' ', multi_token_handler: Optional[str] = 'first', post_log_softmax: Optional[bool] = True)[source]¶
The basic manually defined verbalizer class. This class inherits from the Verbalizer class and restricts the use of label words to one word per label. For a verbalizer with fewer constraints, please use the basic ManualVerbalizer.
- Parameters
tokenizer (PreTrainedTokenizer) – The tokenizer of the current pre-trained model, used to point out the vocabulary.
classes (List[Any]) – The classes (or labels) of the current task.
num_classes (int, optional) – The number of classes of the verbalizer. Only one of classes and num_classes should be used.
label_words (Union[Sequence[str], Mapping[str, str]], optional) – The label words that are projected by the labels.
prefix (str, optional) – The prefix string of the verbalizer (used in PLMs such as RoBERTa, which are sensitive to the prefix space).
multi_token_handler (str, optional) – The handling strategy for multiple tokens produced by the tokenizer.
post_log_softmax (bool, optional) – Whether to apply log softmax post-processing on label_logits. Defaults to True.
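A minimal construction sketch, for orientation only; the model choice and the sentiment classes/label words below are illustrative assumptions, not taken from this page:
from openprompt.plms import load_plm
from openprompt.prompts import One2oneVerbalizer

plm, tokenizer, model_config, WrapperClass = load_plm("roberta", "roberta-large")

# Exactly one label word per class; each word should map to a single token.
verbalizer = One2oneVerbalizer(
    tokenizer=tokenizer,
    classes=["negative", "positive"],
    label_words={"negative": "terrible", "positive": "great"},
)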
- static add_prefix(label_words, prefix)[source]¶
Add the prefix to label words. For example, if a label word is in the middle of a template, the prefix should be ' '.
- Parameters
label_words (Union[Sequence[str], Mapping[str, str]], optional) – The label words that are projected by the labels.
prefix (str, optional) – The prefix string of the verbalizer.
- Returns
New label words with prefix.
- Return type
Sequence[str]
- generate_parameters() List [source]¶
In the basic manual verbalizer, the parameters are generated from the label words directly. In this implementation, the label_words should not be tokenized into more than one token.
- project(logits: torch.Tensor, **kwargs) torch.Tensor [source]¶
Project the labels; the return value is the normalized (sum to 1) probabilities of the label words.
- Parameters
logits (torch.Tensor) – The original logits of label words.
- Returns
The normalized logits of label words.
- Return type
torch.Tensor
- process_logits(logits: torch.Tensor, **kwargs)[source]¶
A whole framework to process the original logits over the vocabulary, which contains the following steps:
Project the logits into logits of label words
if self.post_log_softmax is True:
Normalize over all label words
Calibrate (optional)
- Parameters
logits (torch.Tensor) – The original logits.
- Returns
The final processed logits over the label words set.
- Return type
torch.Tensor
- normalize(logits: torch.Tensor) torch.Tensor [source]¶
Given logits regarding the entire vocabulary, return the probs over the label words set.
- Parameters
logits (torch.Tensor) – The logits over the entire vocabulary.
- Returns
The probabilities over the label words set.
- Return type
torch.Tensor
- calibrate(label_words_probs: torch.Tensor, **kwargs) torch.Tensor [source]¶
- Parameters
label_words_probs (torch.Tensor) – The probability distribution of the label words, with shape [batch_size, num_classes, num_label_words_per_class].
- Returns
The calibrated probability of label words.
- Return type
torch.Tensor
Manual Verbalizer¶
The basic manually defined Verbalizer.
- class ManualVerbalizer(tokenizer: transformers.tokenization_utils.PreTrainedTokenizer, classes: Optional[List] = None, num_classes: Optional[int] = None, label_words: Optional[Union[Sequence[str], Mapping[str, str]]] = None, prefix: Optional[str] = ' ', multi_token_handler: Optional[str] = 'first', post_log_softmax: Optional[bool] = True)[source]¶
The basic manually defined verbalizer class. This class inherits from the Verbalizer class.
- Parameters
tokenizer (PreTrainedTokenizer) – The tokenizer of the current pre-trained model, used to point out the vocabulary.
classes (List[Any]) – The classes (or labels) of the current task.
label_words (Union[List[str], List[List[str]], Dict[str, List[str]]], optional) – The label words that are projected by the labels.
prefix (str, optional) – The prefix string of the verbalizer (used in PLMs such as RoBERTa, which are sensitive to the prefix space).
multi_token_handler (str, optional) – The handling strategy for multiple tokens produced by the tokenizer.
post_log_softmax (bool, optional) – Whether to apply log softmax post-processing on label_logits. Defaults to True.
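A minimal construction sketch; the model choice, classes, and label words are illustrative assumptions:
from openprompt.plms import load_plm
from openprompt.prompts import ManualVerbalizer

plm, tokenizer, model_config, WrapperClass = load_plm("roberta", "roberta-large")

# Each class may map to several label words; their scores are aggregated per class.
verbalizer = ManualVerbalizer(
    tokenizer=tokenizer,
    classes=["negative", "positive"],
    label_words={"negative": ["bad", "terrible"], "positive": ["good", "great"]},
)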
- static add_prefix(label_words, prefix)[source]¶
Add the prefix to label words. For example, if a label word is in the middle of a template, the prefix should be ' '.
- Parameters
label_words (Union[Sequence[str], Mapping[str, str]], optional) – The label words that are projected by the labels.
prefix (str, optional) – The prefix string of the verbalizer.
- Returns
New label words with prefix.
- Return type
Sequence[str]
- generate_parameters() List [source]¶
In the basic manual verbalizer, the parameters are generated from the label words directly. In this implementation, the label_words should not be tokenized into more than one token.
- project(logits: torch.Tensor, **kwargs) torch.Tensor [source]¶
Project the labels; the return value is the normalized (sum to 1) probabilities of the label words.
- Parameters
logits (torch.Tensor) – The original logits of label words.
- Returns
The normalized logits of label words.
- Return type
torch.Tensor
- process_logits(logits: torch.Tensor, **kwargs)[source]¶
A whole framework to process the original logits over the vocabulary, which contains four steps:
Project the logits into logits of label words
if self.post_log_softmax is True:
Normalize over all label words
Calibrate (optional)
Aggregate (for multiple label words)
- Parameters
logits (torch.Tensor) – The original logits.
- Returns
The final processed logits over the labels (classes).
- Return type
torch.Tensor
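As a rough sketch of how these steps chain together (a paraphrase of the step list above in terms of the methods documented below, not the verbatim source):
import torch

def process_logits_sketch(verbalizer, logits: torch.Tensor) -> torch.Tensor:
    label_words_logits = verbalizer.project(logits)        # vocabulary logits -> label-word logits
    if verbalizer.post_log_softmax:
        label_words_probs = verbalizer.normalize(label_words_logits)  # probs over label words
        # calibrate() would be applied here once calibration logits have been registered
        label_words_logits = torch.log(label_words_probs + 1e-15)
    return verbalizer.aggregate(label_words_logits)         # aggregate to per-class logits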
- normalize(logits: torch.Tensor) torch.Tensor [source]¶
Given logits regarding the entire vocabulary, return the probs over the label words set.
- Parameters
logits (torch.Tensor) – The logits over the entire vocabulary.
- Returns
The probabilities over the label words set.
- Return type
torch.Tensor
- aggregate(label_words_logits: torch.Tensor) torch.Tensor [source]¶
Use the weights to aggregate the logits of the label words.
- Parameters
label_words_logits (torch.Tensor) – The logits of the label words.
- Returns
The aggregated logits from the label words.
- Return type
torch.Tensor
- calibrate(label_words_probs: torch.Tensor, **kwargs) torch.Tensor [source]¶
- Parameters
label_words_probs (torch.Tensor) – The probability distribution of the label words, with shape [batch_size, num_classes, num_label_words_per_class].
- Returns
The calibrated probability of label words.
- Return type
torch.Tensor
Automatic Verbalizer¶
The Automatic Verbalizer defined in Automatically Identifying Words That Can Serve as Labels for Few-Shot Text Classification.
- class AutomaticVerbalizer(tokenizer: Optional[transformers.tokenization_utils.PreTrainedTokenizer] = None, num_candidates: Optional[int] = 1000, label_word_num_per_class: Optional[int] = 1, num_searches: Optional[int] = 1, score_fct: Optional[str] = 'llr', balance: Optional[bool] = True, num_classes: Optional[int] = None, classes: Optional[List[str]] = None, init_using_split: Optional[str] = 'train', **kwargs)[source]¶
This implementation differs slightly from the original code in three ways: 1) We allow re-selecting the verbalizer after a fixed number of training steps; the original implementation performs the selection only once, after obtaining the initial logits on the training data. To reproduce their behaviour, call optimize() only after the first pass over the training data. 2) We strictly follow the probability calculation in Equation (3) of the paper, which takes a softmax over the logits. 3) We do not implement the ``combine_patterns'' branch, since it is not a pure verbalizer type and does not yield much improvement; however, the same effect can be achieved by using EnsembleTrainer to pass text wrapped by multiple templates together with this verbalizer.
We use a probs_buffer to store the probabilities \(q_{P,t}(1|\mathbf{x})\) and a labels_buffer to store the labels \(y\); both are used in the later verbalizer selection.
- Parameters
num_candidates (int, optional) – The number of candidates for further selection, based on Section 4.1.
label_word_num_per_class (int, optional) – Set to a value greater than 1 to support Multi-Verbalizers as in Section 4.2.
num_searches (int, optional) – The maximum number of label-word searches. After reaching this number, the verbalizer keeps the same label words as in the previous iterations.
search_id (int, optional) – The id of the current search, used to determine when to stop searching for label words.
score_fct (str, optional) – The scoring function for label word selection. llr means log-likelihood ratio, corresponding to Equation (7); ce means cross entropy, corresponding to Equation (6). As the paper points out, llr is significantly better than ce; ce is only kept to match the original code.
balance (bool, optional) – Whether to perform normalization for an unbalanced training dataset, as in Equation (5).
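A minimal construction sketch; the model choice, classes, and hyperparameter values are illustrative assumptions:
from openprompt.plms import load_plm
from openprompt.prompts import AutomaticVerbalizer

plm, tokenizer, model_config, WrapperClass = load_plm("roberta", "roberta-large")

# No label words are given up front; they are searched from the training data.
verbalizer = AutomaticVerbalizer(
    tokenizer=tokenizer,
    classes=["negative", "positive"],
    num_candidates=1000,          # candidate pool size (Section 4.1)
    label_word_num_per_class=3,   # >1 enables the multi-verbalizer variant (Section 4.2)
    score_fct="llr",
)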
- project(logits: torch.Tensor, **kwargs) torch.Tensor [source]¶
When this verbalizer has not yet performed optimize(), it has no label_words_ids; it will therefore give random predictions and should have no connection to the model that could produce (misleading) gradients.
- Parameters
logits (torch.Tensor) – The original logits over the vocabulary.
- Returns
The projected logits of label words.
- Return type
torch.Tensor
- optimize_to_initialize()[source]¶
This is an epoch-level optimization. If used at the batch level like an ordinary gradient-descent optimizer, the result may not be very satisfying, since the accumulated examples (i.e., the probs_buffer and the labels_buffer) are not sufficient when the batch size is small.
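A hedged sketch of where this call typically sits; the prompt_model, dataloader, loss function, and optimizer names are placeholders, not taken from this page:
def train_one_epoch_then_select(prompt_model, verbalizer, train_dataloader, loss_func, optimizer):
    # Forward passes during training fill the probs_buffer / labels_buffer;
    # the label words are then (re-)selected once per epoch.
    for batch in train_dataloader:
        logits = prompt_model(batch)
        loss = loss_func(logits, batch["label"])
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    verbalizer.optimize_to_initialize()   # epoch-level label-word selection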
- from_file(path: str, choice: Optional[int] = 0)[source]¶
Load the predefined label words from a verbalizer file. Three file formats are currently supported: 1. a .jsonl or .json file containing a single verbalizer in dict format; 2. a .jsonl or .json file containing a list of verbalizers in dict format; 3. a .txt or .csv file, in which the label words of each class are listed on one line, separated by commas, and a new verbalizer begins after an empty line. The last format is recommended when you don't know the name of each class.
The details of verbalizer format can be seen in How to Write a Verbalizer?.
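An illustration of the third format; the file name and label words are made up for this sketch, and verbalizer stands for any verbalizer instance (e.g., the one constructed above):
# Hypothetical contents of my_verbalizer.txt (one class per line, words comma-separated;
# an empty line would start a second verbalizer in the same file):
#   bad,terrible,awful
#   good,great,wonderful
verbalizer.from_file("my_verbalizer.txt", choice=0)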
Knowledgeable Verbalizer¶
The Knowledgeable Verbalizer defined in Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification.
- class KnowledgeableVerbalizer(tokenizer: Optional[transformers.tokenization_utils.PreTrainedTokenizer] = None, classes: Optional[Sequence[str]] = None, prefix: Optional[str] = ' ', multi_token_handler: Optional[str] = 'first', max_token_split: Optional[int] = - 1, verbalizer_lr: Optional[float] = 0.05, candidate_frac: Optional[float] = 0.5, pred_temp: Optional[float] = 1.0, **kwargs)[source]¶
This is the implementation of the knowledgeable verbalizer, which uses external knowledge to expand the set of label words. This class inherits from the ManualVerbalizer class.
- Parameters
tokenizer (PreTrainedTokenizer) – The tokenizer of the current pre-trained model, used to point out the vocabulary.
classes (Sequence[str]) – The classes (or labels) of the current task.
prefix (str, optional) – The prefix string of the verbalizer.
multi_token_handler (str, optional) – The handling strategy for multiple tokens produced by the tokenizer.
max_token_split (int, optional) –
verbalizer_lr (float, optional) – The learning rate of the verbalizer optimization.
candidate_frac (float, optional) –
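A minimal construction sketch; the topic classes and the label-word file are illustrative assumptions (in practice, the expanded label words come from an external knowledge base and are loaded with from_file):
from openprompt.plms import load_plm
from openprompt.prompts import KnowledgeableVerbalizer

plm, tokenizer, model_config, WrapperClass = load_plm("roberta", "roberta-large")

verbalizer = KnowledgeableVerbalizer(
    tokenizer=tokenizer,
    classes=["politics", "sports", "business", "technology"],
    candidate_frac=0.5,   # keep this fraction of candidate words after prior-based filtering
).from_file("knowledgeable_verbalizer.txt")   # hypothetical file of expanded label words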
- static add_prefix(label_words, prefix)[source]¶
Add the prefix to label words. For example, if a label word is in the middle of a template, the prefix should be ' '.
- generate_parameters() List [source]¶
In the basic manual verbalizer, the parameters are generated from the label words directly. In this implementation, the label_words should not be tokenized into more than one token.
- register_calibrate_logits(logits: torch.Tensor)[source]¶
For the knowledgeable verbalizer, it is necessary to filter out words with a low prior probability. Therefore, we re-compute the label words after registering the calibration logits.
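A hedged sketch of how the calibration logits might be obtained; the support dataloader is a placeholder and the averaging strategy is an assumption modeled on the contextualized-prior idea:
import torch

def register_contextual_prior(verbalizer, prompt_model, support_dataloader):
    # Average the <mask> logits over a small support set, then register them so
    # that label words with a low prior probability are filtered out.
    all_logits = []
    prompt_model.eval()
    with torch.no_grad():
        for batch in support_dataloader:
            all_logits.append(prompt_model.forward_without_verbalize(batch))
    verbalizer.register_calibrate_logits(torch.cat(all_logits, dim=0).mean(dim=0))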
PTR Verbalizer¶
The verbalizer of PTR from PTR: Prompt Tuning with Rules for Text Classification.
- class PTRVerbalizer(tokenizer: transformers.tokenization_utils.PreTrainedTokenizer, classes: Optional[Sequence[str]] = None, num_classes: Optional[int] = None, label_words: Optional[Union[Sequence[Sequence[str]], Mapping[str, Sequence[str]]]] = None)[source]¶
In PTR, each prompt has more than one <mask> token, and different <mask> tokens have different label words. The final label is predicted jointly from these label words using logic rules.
- Parameters
tokenizer (PreTrainedTokenizer) – A tokenizer to appoint the vocabulary and the tokenization strategy.
classes (Sequence[str]) – A sequence of classes that need to be projected.
label_words (Union[Sequence[Sequence[str]], Mapping[str, Sequence[str]]], optional) – The label words that are projected by the labels.
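A minimal sketch for a prompt with two <mask> tokens; the classes and their per-mask label words are illustrative assumptions, not taken from PTR:
from openprompt.plms import load_plm
from openprompt.prompts import PTRVerbalizer

plm, tokenizer, model_config, WrapperClass = load_plm("roberta", "roberta-large")

# Every class lists one label word per <mask>; the per-mask predictions are
# combined into a single class prediction.
verbalizer = PTRVerbalizer(
    tokenizer=tokenizer,
    classes=["no_relation", "org:founded_by"],
    label_words={
        "no_relation": ["irrelevant", "irrelevant"],
        "org:founded_by": ["organization", "person"],
    },
)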
- process_logits(logits: torch.Tensor, batch: Union[Dict, openprompt.data_utils.utils.InputFeatures], **kwargs)[source]¶
Process the vocabulary logits of each <mask> into label logits for that <mask>.
Combine these logits into the label logits of the whole task.
- Parameters
logits (torch.Tensor) – The vocabulary logits of each <mask> (shape: [batch_size, num_masks, vocab_size]).
- Returns
The label logits of the whole task (shape: [batch_size, label_size of the whole task]).
- Return type
torch.Tensor
Generation Verbalizer¶
This verbalizer empowers the "generation for all the tasks" paradigm.
- class GenerationVerbalizer(tokenizer: transformers.tokenization_utils.PreTrainedTokenizer, classes: Optional[List[str]] = None, num_classes: Optional[int] = None, is_rule: Optional[bool] = False, label_words: Optional[dict] = None)[source]¶
This verbalizer is useful when the label prediction is better defined by a piece of the input. For example, in coreference resolution, the tgt_text is a proper noun mentioned in the text, and there is no fixed mapping between a class label and its label words. This verbalizer can be used as the verbalizer of the COPA and WSC datasets in SuperGLUE. It is especially powerful when combined with the All NLP Tasks Are Generation Tasks paradigm (also see CrossFit): it can make any piece of text the tgt_text, which is then filled into the {"mask"}.
For example:
when the label word is "good", the tgt_text is "good";
when the label word is {"text": "good"}, the tgt_text is also "good";
when the label word is {"meta": "choice1"}, the tgt_text is the meta['choice1'] field of the InputExample;
when the label word is {"meta": "choice1"} {"placeholder": "text_a"} ., the tgt_text is the meta['choice1'] field of the InputExample, followed by the text_a field of the InputExample, and then a '.'.
A use case can be seen in Tutorial 4.1.
- Parameters
tokenizer (PreTrainedTokenizer) – The tokenizer of the current pre-trained model, used to point out the vocabulary.
classes (List[Any]) – The classes (or labels) of the current task.
prefix (str, optional) – The prefix string of the verbalizer (used in PLMs such as RoBERTa, which are sensitive to the prefix space).
is_rule (bool, optional) – Whether the verbalizer uses the rule syntax of MixTemplate.
label_words (dict, optional) – The label words of the generation verbalizer.
Example: to use this verbalizer to train a T5 model to predict an answer and an explanation using two masks, the input example and the template (defined by MixedTemplate) can be:
>>> input_example = InputExample(text_a="Can fish run?", meta={"answer": "no", "explanation": "The fish have no legs"}, label=0)
>>> template = "{'placeholder':'text_a'} answer: {'mask'} explanation: {'mask'}"
The verbalizer can be:
>>> label_words = {0: ["no", "{'meta':'explanation'}"], 1: ["yes", "{'meta':'explanation'}"]}
>>> verbalizer = GenerationVerbalizer(tokenizer, classes=None, is_rule=True, label_words=label_words)
Soft Verbalizer¶
- class SoftVerbalizer(tokenizer: Optional[transformers.tokenization_utils.PreTrainedTokenizer], model: Optional[transformers.modeling_utils.PreTrainedModel], classes: Optional[List] = None, num_classes: Optional[int] = None, label_words: Optional[Union[Sequence[str], Mapping[str, str]]] = None, prefix: Optional[str] = ' ', multi_token_handler: Optional[str] = 'first')[source]¶
The implementation of the verbalizer in WARP.
- Parameters
tokenizer (PreTrainedTokenizer) – The tokenizer of the current pre-trained model, used to point out the vocabulary.
classes (List[Any]) – The classes (or labels) of the current task.
label_words (Union[List[str], List[List[str]], Dict[str, List[str]]], optional) – The label words that are projected by the labels.
prefix (str, optional) – The prefix string of the verbalizer (used in PLMs such as RoBERTa, which are sensitive to the prefix space).
multi_token_handler (str, optional) – The handling strategy for multiple tokens produced by the tokenizer.
post_log_softmax (bool, optional) – Whether to apply log softmax post-processing on label_logits. Defaults to True.
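A minimal construction sketch; the model choice, classes, and label words are illustrative assumptions. The PLM itself is passed in so the verbalizer can access its head, whose last layer becomes the trainable per-class projection (as suggested by the group_parameters properties below):
from openprompt.plms import load_plm
from openprompt.prompts import SoftVerbalizer

plm, tokenizer, model_config, WrapperClass = load_plm("roberta", "roberta-large")

verbalizer = SoftVerbalizer(
    tokenizer=tokenizer,
    model=plm,                            # the PLM is needed to locate its head
    classes=["negative", "positive"],
    label_words={"negative": ["bad"], "positive": ["good"]},
)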
- property group_parameters_1¶
Include the parameters of the head's layers except the last layer. In the soft verbalizer, note that some heads may contain modules other than the final projection layer; the parameters of these parts should be optimized (or frozen) together with the PLM.
- property group_parameters_2¶
Include the last layer’s parameters
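A hedged sketch of how the two groups are typically placed in separate optimizer parameter groups; the learning rates and the AdamW choice are illustrative, and verbalizer refers to a SoftVerbalizer instance such as the one above:
from torch.optim import AdamW

# Group 1: head parameters other than the last layer -- usually tuned like the PLM.
# Group 2: the last projection layer -- usually tuned with a larger learning rate.
optimizer = AdamW([
    {"params": verbalizer.group_parameters_1, "lr": 3e-5},
    {"params": verbalizer.group_parameters_2, "lr": 3e-4},
])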
- static add_prefix(label_words, prefix)[source]¶
Add the prefix to label words. For example, if a label word is in the middle of a template, the prefix should be ' '.
- Parameters
label_words (Union[Sequence[str], Mapping[str, str]], optional) – The label words that are projected by the labels.
prefix (str, optional) – The prefix string of the verbalizer.
- Returns
New label words with prefix.
- Return type
Sequence[str]
- generate_parameters() List [source]¶
In the basic manual verbalizer, the parameters are generated from the label words directly. In this implementation, the label_words should not be tokenized into more than one token.
- process_outputs(outputs: torch.Tensor, batch: Union[Dict, openprompt.data_utils.utils.InputFeatures], **kwargs)[source]¶
By default, the verbalizer will process the logits of the PLM’s output.
- Parameters
logits (torch.Tensor) – The current logits generated by pre-trained language models.
batch (Union[Dict, InputFeatures]) – The input features of the data.
- gather_outputs(outputs: transformers.file_utils.ModelOutput)[source]¶
Retrieve useful output for the verbalizer from the whole model output. By default, it will only retrieve the logits.
- Parameters
outputs (ModelOutput) –
- Returns
The gathered output, which should be of shape (batch_size, seq_len, any).
- Return type
torch.Tensor