Base Classes

Overview

This page introduces the base classes of the prompt-learning framework. Generally, to conduct prompt-learning, a PretrainedModel is selected together with its corresponding pre-training task, a Template class is established to wrap the original text, and a Verbalizer class (if needed) is defined to project the labels to label words in the vocabulary. In OpenPrompt, the specific prompt-related classes inherit from these base classes.
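
As a hedged illustration of how these pieces fit together (assuming the load_plm helper, ManualTemplate, and ManualVerbalizer classes shipped with OpenPrompt), a classification pipeline might be assembled roughly like this:

   # A minimal sketch; class and helper names follow OpenPrompt's manual prompt classes.
   from openprompt.plms import load_plm
   from openprompt.prompts import ManualTemplate, ManualVerbalizer
   from openprompt import PromptDataLoader, PromptForClassification
   from openprompt.data_utils import InputExample

   # 1. Select a pre-trained model together with its tokenizer and wrapper class.
   plm, tokenizer, model_config, WrapperClass = load_plm("bert", "bert-base-cased")

   # 2. A Template wraps the original text around a mask position.
   template = ManualTemplate(
       tokenizer=tokenizer,
       text='{"placeholder": "text_a"} It was {"mask"}.',
   )

   # 3. A Verbalizer projects labels to label words in the vocabulary.
   verbalizer = ManualVerbalizer(
       tokenizer=tokenizer,
       classes=["negative", "positive"],
       label_words={"negative": ["bad"], "positive": ["good", "wonderful"]},
   )

   # 4. Everything is combined into a data loader and a prompt model.
   dataset = [InputExample(guid=0, text_a="A touching and well-acted film.")]
   data_loader = PromptDataLoader(dataset=dataset, template=template, tokenizer=tokenizer,
                                  tokenizer_wrapper_class=WrapperClass)
   prompt_model = PromptForClassification(plm=plm, template=template, verbalizer=verbalizer)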

Prompt Base

Base classes of Template and Verbalizer.

class Template(tokenizer: transformers.tokenization_utils.PreTrainedTokenizer, placeholder_mapping: dict = {'<text_a>': 'text_a', '<text_b>': 'text_b'})[source]

Base class for all templates. Most of its methods are abstract, with a few exceptions that hold methods common to all templates, such as loss_ids, save, and load.

Parameters
  • tokenizer (PreTrainedTokenizer) – A tokenizer to appoint the vocabulary and the tokenization strategy.

  • placeholder_mapping (dict) – A mapping that tells the template which placeholder token stands for which attribute of the original input text.

get_default_loss_ids() List[int][source]

Get the default loss indices for the template, derived from the mask token. e.g. when self.text is '{"placeholder": "text_a"}. {"meta": "word"} is {"mask"}.', the output is [0, 0, 0, 0, 1, 0].

Returns

A list of integers in the range [0, 1]:

  • 1 for a masked token.

  • 0 for an ordinary sequence token.

Return type

List[int]

get_default_shortenable_ids() List[int][source]

Every template needs shortenable_ids, denoting which parts of the template can be truncated to fit the language model’s max_seq_length. Default: the input text is shortenable, while the template text and other special tokens are not shortenable.

e.g. when self.text is '{"placeholder": "text_a"} {"placeholder": "text_b", "shortenable": False} {"meta": "word"} is {"mask"}.', output is [1, 0, 0, 0, 0, 0, 0].

Returns

A list of integers in the range [0, 1]:

  • 1 for the input tokens.

  • 0 for the template sequence tokens.

Return type

List[int]
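
To make the two default rules above concrete, here is a hedged sketch over a hand-written parsed template (the list-of-dicts representation below is illustrative; real parsing happens inside the Template class):

   from typing import Dict, List

   # Hypothetical parsed template, one dict per segment.
   parsed_template: List[Dict] = [
       {"placeholder": "text_a"},                        # input text, shortenable by default
       {"placeholder": "text_b", "shortenable": False},  # input text, explicitly not shortenable
       {"meta": "word"},
       {"text": "is"},
       {"mask": True},
       {"text": "."},
   ]

   def default_loss_ids(parsed: List[Dict]) -> List[int]:
       # 1 where the loss is computed (mask positions), 0 elsewhere.
       return [1 if "mask" in seg else 0 for seg in parsed]

   def default_shortenable_ids(parsed: List[Dict]) -> List[int]:
       # Placeholders (input text) are shortenable unless overridden; template text is not.
       return [int(seg.get("shortenable", "placeholder" in seg)) for seg in parsed]

   print(default_loss_ids(parsed_template))         # [0, 0, 0, 0, 1, 0]
   print(default_shortenable_ids(parsed_template))  # [1, 0, 0, 0, 0, 0]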

get_default_soft_token_ids() List[int][source]

This function identifies which tokens are soft tokens.

Sometimes tokens in the template are not from the vocabulary but are a sequence of soft tokens. In this case, you need to implement this function.

Raises

NotImplementedError – If soft tokens are needed, add soft_token_ids to the registered_inputflag_names attribute of the Template class and implement this method.

wrap_one_example(example: openprompt.data_utils.utils.InputExample) List[Dict][source]

Given an input example that contains the input text, which can be referenced by the values of self.template.placeholder_mapping, this function processes the example into a list of dicts. Each dict functions as a group and carries the same properties, such as whether it is shortenable, whether it is the masked position, whether it is a soft token, etc. Since the text will be tokenized in the subsequent processing procedure, these attributes are broadcast along the tokenized sentence.

Parameters

example (InputExample) – An InputExample object, which should have attributes that are able to be filled in the template.

Returns

A list of dict of the same length as self.text. e.g. [{"loss_ids": 0, "text": "It was"}, {"loss_ids": 1, "text": "<mask>"}, ]

Return type

List[Dict]
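
A hedged usage sketch, continuing the names from the overview example:

   from openprompt.data_utils import InputExample

   example = InputExample(guid=0, text_a="The movie was a delight.", label=1)

   # Each dict carries one template segment plus broadcastable flags such as loss_ids.
   wrapped = template.wrap_one_example(example)
   # e.g. [{"loss_ids": 0, "text": "The movie was a delight."},
   #       {"loss_ids": 0, "text": " It was"},
   #       {"loss_ids": 1, "text": "<mask>"},
   #       {"loss_ids": 0, "text": "."}, ...]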

abstract process_batch(batch)[source]

A template should override this method if it needs to process the batch input, e.g. substituting embeddings.

post_processing_outputs(outputs)[source]

Post-process the outputs of the language model according to the needs of the template. Most templates don’t need post-processing. A template like SoftTemplate, which appends the soft template as a module (rather than as a sequence of input tokens) to the input, should remove the outputs at these positions to keep the seq_len unchanged.

save(path: str, **kwargs) None[source]

A save method API.

Parameters

path (str) – A path to save your template.

safe_on_text_set() None[source]

With this wrapper function, setting text inside on_text_set() will not trigger on_text_set() again, preventing endless recursion.
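
The guard can be pictured with a small, self-contained sketch (illustrative only, not the actual OpenPrompt internals):

   class TextHolderSketch:
       """Illustrative recursion guard around an on_text_set() hook."""

       def __init__(self):
           self._in_on_text_set = False
           self._text = None

       @property
       def text(self):
           return self._text

       @text.setter
       def text(self, value):
           self._text = value
           if not self._in_on_text_set:       # suppress re-entry
               self._in_on_text_set = True
               try:
                   self.on_text_set()         # may assign self.text again without recursing
               finally:
                   self._in_on_text_set = False

       def on_text_set(self):
           # e.g. normalize the text; this assignment does not re-trigger the hook
           self.text = self._text.strip()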

abstract on_text_set()[source]

A hook to do something when the template text is set. The designer of the template should explicitly know what should be done when the template text is set.

from_file(path: str, choice: int = 0)[source]

Read the template from a local file.

Parameters
  • path (str) – The path of the local template file.

  • choice (int) – The index of the line to read from the file.
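
A hedged sketch of a template file and of loading its second line (continuing the earlier names):

   # manual_template.txt, one template per line:
   #   {"placeholder": "text_a"} It was {"mask"}.
   #   {"placeholder": "text_a"} All in all, it was {"mask"}.
   template = ManualTemplate(tokenizer=tokenizer).from_file("manual_template.txt", choice=1)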

classmethod from_config(config: yacs.config.CfgNode, **kwargs)[source]

Load a template from the template’s configuration node.

Parameters
  • config (CfgNode) – the sub-configuration of template, i.e. config[config.template] if config is a global config node.

  • kwargs – Other kwargs that might be used to initialize the template. The actual values should match the arguments of the __init__ function.

class Verbalizer(tokenizer: Optional[transformers.tokenization_utils.PreTrainedTokenizer] = None, classes: Optional[Sequence[str]] = None, num_classes: Optional[int] = None)[source]

Base class for all the verbalizers.

Parameters
  • tokenizer (PreTrainedTokenizer) – A tokenizer to appoint the vocabulary and the tokenization strategy.

  • classes (Sequence[str]) – A sequence of classes that need to be projected.

property label_words

Label words are the words in the vocabulary onto which labels are projected. E.g., if we want to establish a projection in sentiment classification: positive \(\rightarrow\) {wonderful, good}, then wonderful and good are label words.
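
For example, assuming a ManualVerbalizer as in the overview sketch, label words can also be assigned through this property:

   verbalizer.label_words = {"negative": ["bad", "terrible"],
                             "positive": ["good", "wonderful"]}
   print(verbalizer.label_words)   # the words each label is projected onto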

on_label_words_set()[source]

A hook to do something when textual label words were set.

abstract generate_parameters(**kwargs) List[source]

The verbalizer can be seen as an extra layer on top of the original pre-trained model. In a manual verbalizer, it is a fixed one-hot vector of dimension vocab_size, with the position of the label word being 1 and 0 everywhere else. In other situations, the parameters may be a continuous vector over the vocabulary, with each dimension representing a weight for that token. Moreover, the parameters may be set to trainable to allow label word selection.

Therefore, this function serves as an abstract method for generating the parameters of the verbalizer, and must be implemented in any derived class.

Note that the parameters need to be registered as part of the PyTorch module; this can be achieved by wrapping a tensor with nn.Parameter().
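
A hedged sketch of a derived verbalizer that builds a fixed one-hot projection matrix and registers it with nn.Parameter (class and attribute names below are illustrative; it assumes the base Verbalizer is a torch.nn.Module, as in OpenPrompt):

   import torch
   import torch.nn as nn
   from openprompt.prompt_base import Verbalizer

   class OneHotVerbalizerSketch(Verbalizer):
       """Illustrative subclass: one one-hot row per class over the vocabulary."""

       def generate_parameters(self, label_word_ids):
           # label_word_ids: one token id per class (single-token label words assumed)
           weight = torch.zeros(len(label_word_ids), self.tokenizer.vocab_size)
           for row, token_id in enumerate(label_word_ids):
               weight[row, token_id] = 1.0
           # registering with nn.Parameter makes it part of the module's state;
           # requires_grad=False keeps the projection fixed
           self.label_word_matrix = nn.Parameter(weight, requires_grad=False)

       def project(self, logits, **kwargs):
           # logits over the vocab -> logits over the classes
           return logits @ self.label_word_matrix.t()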

register_calibrate_logits(logits: torch.Tensor)[source]

This function registers logits that need to be calibrated, and detaches the original logits from the current graph.

process_outputs(outputs: torch.Tensor, batch: Union[Dict, openprompt.data_utils.utils.InputFeatures], **kwargs)[source]

By default, the verbalizer will process the logits of the PLM’s output.

Parameters
  • outputs (torch.Tensor) – The current logits generated by the pre-trained language model.

  • batch (Union[Dict, InputFeatures]) – The input features of the data.

gather_outputs(outputs: transformers.file_utils.ModelOutput)[source]

Retrieve the useful output for the verbalizer from the whole model output. By default, it will only retrieve the logits.

Parameters

outputs (ModelOutput) –

Returns

The gathered output (torch.Tensor), of shape (batch_size, seq_len, any).

static aggregate(label_words_logits: torch.Tensor) torch.Tensor[source]

Aggregate the logits of multiple label words into the label’s logits. Basic aggregator: the mean of each label’s label-word logits. Can be re-implemented in advanced verbalizers.

Parameters

label_words_logits (torch.Tensor) – The logits of the label words only.

Returns

The final logits calculated by the label words.

Return type

torch.Tensor
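
A hedged numeric sketch of the basic mean aggregator (the shape is assumed to be (batch_size, num_classes, num_label_words)):

   import torch

   label_words_logits = torch.tensor([[[2.0, 4.0],    # class 0: logits of its two label words
                                       [1.0, 3.0]]])  # class 1: logits of its two label words
   label_logits = label_words_logits.mean(dim=-1)     # basic aggregator: mean per class
   print(label_logits)                                # tensor([[3., 2.]])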

normalize(logits: torch.Tensor) torch.Tensor[source]

Given logits over the entire vocabulary, calculate the probabilities over the label word set using softmax.

Parameters

logits (Tensor) – The logits of the entire vocab.

Returns

The probability distribution over the label words set.

Return type

Tensor

abstract project(logits: torch.Tensor, **kwargs) torch.Tensor[source]

This method receives input logits of shape [batch_size, vocab_size], and uses the parameters of this verbalizer to project the logits over the entire vocabulary into the logits of the label words.

Parameters

logits (Tensor) – The logits over entire vocab generated by the pre-trained language model with shape [batch_size, max_seq_length, vocab_size]

Returns

The normalized probabilities (summing to 1) of each label.

Return type

Tensor

handle_multi_token(label_words_logits, mask)[source]

Support multiple methods to handle the multiple sub-tokens produced by the tokenizer. We suggest using ‘first’ or ‘max’ if some parts of the tokenization are not meaningful. Can broadcast to a 3-d tensor.

Parameters

label_words_logits (torch.Tensor) –

Returns

torch.Tensor
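
A hedged sketch of the ‘first’, ‘max’, and ‘mean’ strategies over the sub-token dimension (shapes and masking are simplified relative to the real method):

   import torch

   # Logits of one label word split into sub-tokens by the tokenizer;
   # mask marks real sub-tokens (1) versus padding (0).
   sub_token_logits = torch.tensor([1.5, 0.2, -3.0])
   mask = torch.tensor([1.0, 1.0, 0.0])

   first = sub_token_logits[..., 0]                                  # 'first': 1.5
   maxed = (sub_token_logits - 1e4 * (1 - mask)).max(dim=-1).values  # 'max' over real sub-tokens: 1.5
   meaned = (sub_token_logits * mask).sum(-1) / mask.sum(-1)         # 'mean' over real sub-tokens: 0.85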

classmethod from_config(config: yacs.config.CfgNode, **kwargs)[source]

Load a verbalizer from the verbalizer’s configuration node.

Parameters
  • config (CfgNode) – the sub-configuration of verbalizer, i.e. config[config.verbalizer] if config is a global config node.

  • kwargs – Other kwargs that might be used to initialize the verbalizer. The actual values should match the arguments of the __init__ function.

from_file(path: str, choice: Optional[int] = 0)[source]

Load the predefined label words from a verbalizer file. Three types of file format are currently supported: 1. a .jsonl or .json file containing a single verbalizer in dict format; 2. a .jsonl or .json file containing a list of verbalizers in dict format; 3. a .txt or .csv file, in which the label words of each class are listed on one line, separated by commas, and an empty line begins a new verbalizer. The last format is recommended when you don’t know the name of each class.

The details of verbalizer format can be seen in How to Write a Verbalizer?.

Parameters
  • path (str) – The path of the local verbalizer file.

  • choice (int) – The choice of verbalizer in a file containing multiple verbalizers.

Returns

self object

Return type

Verbalizer
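
A hedged example of the .txt format described above, where each line lists one class’s label words and an empty line starts a new verbalizer (the file name is illustrative):

   # sentiment_verbalizer.txt:
   #   bad,terrible
   #   good,wonderful
   #
   #   awful
   #   great
   verbalizer = ManualVerbalizer(tokenizer=tokenizer, num_classes=2).from_file(
       "sentiment_verbalizer.txt", choice=0)   # choice selects the first verbalizer block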

Pipeline Base

Base classes of PromptDataLoader and PromptModel, PromptForClassification and PromptForGeneration.

class PromptDataLoader(dataset: Union[torch.utils.data.dataset.Dataset, List], template: openprompt.prompt_base.Template, tokenizer_wrapper: Optional[openprompt.plms.utils.TokenizerWrapper] = None, tokenizer: Optional[transformers.tokenization_utils.PreTrainedTokenizer] = None, tokenizer_wrapper_class=None, verbalizer: Optional[openprompt.prompt_base.Verbalizer] = None, max_seq_length: Optional[str] = 512, batch_size: Optional[int] = 1, shuffle: Optional[bool] = False, teacher_forcing: Optional[bool] = False, decoder_max_length: Optional[int] = - 1, predict_eos_token: Optional[bool] = False, truncate_method: Optional[str] = 'tail', drop_last: Optional[bool] = False, **kwargs)[source]

PromptDataLoader wraps the original dataset. The input data is first wrapped with the prompt’s template, and is then tokenized by a wrapped tokenizer.

Parameters
  • dataset (Dataset or List) – Either a Dataset object or a list containing the input examples.

  • template (Template) – A derived class of Template

  • tokenizer (PretrainedTokenizer) – The pretrained tokenizer.

  • tokenizer_wrapper_class (TokenizerWrapper) – The class of the tokenizer wrapper.

  • max_seq_length (int, optional) – The max sequence length of the input ids. It’s used to truncate sentences.

  • batch_size (int, optional) – The batch size of the data loader.

  • teacher_forcing (bool, optional) – Whether to fill the mask with target text. Set to True when training a generation model.

  • decoder_max_length (int, optional) – the decoder maximum length of an encoder-decoder model.

  • predict_eos_token (bool, optional) – Whether to predict the <eos> token. Suggested to be True for generation.

  • truncate_method (str, optional) – The truncation method to use; select from head, tail, balanced.

  • kwargs – Other kwargs that might be passed into a tokenizer wrapper.
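
A hedged construction sketch, continuing the names from the overview example:

   data_loader = PromptDataLoader(
       dataset=dataset,                        # a list of InputExample
       template=template,
       tokenizer=tokenizer,
       tokenizer_wrapper_class=WrapperClass,   # returned by load_plm
       max_seq_length=256,
       batch_size=4,
       shuffle=True,
       truncate_method="tail",
   )
   for batch in data_loader:
       ...   # each batch is ready to be fed into a PromptModel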

wrap()[source]

A simple interface to pass the examples to the prompt and wrap the text with the template.

tokenize() None[source]

Pass the wrapped text into a prompt-specialized tokenizer; the actual PretrainedTokenizer inside the wrapper is flexible, e.g. ALBERT, BERT, T5, …

class PromptModel(plm: transformers.utils.dummy_pt_objects.PreTrainedModel, template: openprompt.prompt_base.Template, freeze_plm: bool = False, plm_eval_mode: bool = False)[source]

PromptModel is the encapsulation of a Template and a pre-trained model; with OpenPrompt, these modules can be flexibly combined. This class is the base class of PromptForClassification and PromptForGeneration.

Parameters
  • plm (PreTrainedModel) – The pre-trained language model for the current prompt-learning task.

  • template (Template) – The Template object to wrap the input data.

  • freeze_plm (bool) – whether or not to freeze the pretrained language model

  • plm_eval_mode (bool) – A stronger freezing mode than freeze_plm: the dropout of the model is turned off, regardless of whether the other parts are set to training mode.

train(mode: bool = True)[source]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module

forward(batch: Union[Dict, openprompt.data_utils.utils.InputFeatures]) torch.Tensor[source]

This is a forward method to make wrapped input data go through the model, and return the output logits. Typically, this function aims to predict the <mask> position.

Parameters

batch (Union[Dict, InputFeatures]) – The input features of batchified data sequences.

prepare_model_inputs(batch: Union[Dict, openprompt.data_utils.utils.InputFeatures]) Dict[source]

Will be used in generation

class PromptForClassification(plm: transformers.utils.dummy_pt_objects.PreTrainedModel, template: openprompt.prompt_base.Template, verbalizer: openprompt.prompt_base.Verbalizer, freeze_plm: bool = False, plm_eval_mode: bool = False)[source]

PromptModel with a classification head on top. The classification head will map the logits at all positions of the sequence (the return value of a PromptModel) into the logits of the labels, using a verbalizer.

Parameters
  • plm (PretrainedModel) – A pre-trained model you decide to use for classification, e.g. BERT.

  • template (Template) – A Template object you use to wrap the input text for classification, e.g. ManualTemplate.

  • verbalizer (Verbalizer) – A Verbalizer object you use to project the labels to label words for classification, e.g. ManualVerbalizer.

  • freeze_plm (bool) – whether or not to freeze the pretrained language model

  • plm_eval_mode (bool) – A stronger freezing mode than freeze_plm: the dropout of the model is turned off, regardless of whether the other parts are set to training mode.

property device

Register the device parameter.

extract_at_mask(outputs: torch.Tensor, batch: Union[Dict, openprompt.data_utils.utils.InputFeatures])[source]

Get outputs at all <mask> tokens. E.g., project logits of shape (batch_size, max_seq_length, vocab_size) into logits of shape (batch_size, num_mask_token, vocab_size) if num_mask_token > 1, or (batch_size, vocab_size) if num_mask_token = 1.

Parameters
  • outputs (torch.Tensor) – The original outputs (possibly processed by the verbalizer’s gather_outputs beforehand) of the whole sequence.

  • batch (Union[Dict, InputFeatures]) – The original batch

Returns

The extracted outputs of <mask> tokens.

Return type

torch.Tensor

forward(batch: Union[Dict, openprompt.data_utils.utils.InputFeatures]) torch.Tensor[source]

Get the logits of label words.

Parameters

batch (Union[Dict, InputFeatures]) – The original batch

Returns

The logits of the label words (obtained by the current verbalizer).

Return type

torch.Tensor
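
A hedged training-step sketch around this forward, continuing the names from the overview example:

   import torch

   optimizer = torch.optim.AdamW(prompt_model.parameters(), lr=1e-5)
   loss_fn = torch.nn.CrossEntropyLoss()

   for batch in data_loader:
       label_logits = prompt_model(batch)            # (batch_size, num_classes)
       loss = loss_fn(label_logits, batch["label"])
       loss.backward()
       optimizer.step()
       optimizer.zero_grad()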

property tokenizer

Utility property, to get the tokenizer more easily.

parallelize(device_map=None)[source]

Parallelize the model across devices.

deparallelize()[source]

Deparallelize the model across devices.

class PromptForGeneration(plm: transformers.utils.dummy_pt_objects.PreTrainedModel, template: openprompt.prompt_base.Template, freeze_plm: bool = False, plm_eval_mode: bool = False, gen_config: Optional[yacs.config.CfgNode] = None, tokenizer: Optional[transformers.tokenization_utils.PreTrainedTokenizer] = None)[source]

PromptModel with generation loss calculation and generation utils integrated.

Parameters
  • plm (PretrainedModel) – A pre-trained model you decide to use for generation, e.g. GPT.

  • template (Template) – A Template object you use to wrap the input text for generation, e.g. PrefixTemplate.

  • tokenizer (Tokenizer) – A Tokenizer of the current model.

  • gen_config (CfgNode) – The generation configs to pass into GenerationMixin.generate

  • freeze_plm (bool) – whether or not to freeze the pretrained language model

  • plm_eval_mode (bool) – A stronger freezing mode than freeze_plm: the dropout of the model is turned off, regardless of whether the other parts are set to training mode.

shift_logits_and_labels(logits, loss_ids, reference_ids)[source]

Left-shift the labels, and set the labels at positions that are not loss positions to -100, which is the ignore index in PyTorch’s loss functions.

Parameters
  • logits (torch.Tensor) –

  • loss_ids (torch.Tensor) – Indicators of the loss positions in the sequence.

  • reference_ids (torch.Tensor) – The reference (target) token ids of the batchified data sequences.

Returns

  • shift_logits (torch.Tensor) – The left-shifted logits.

  • shift_input_ids (List[int]) – The corresponding labels, with -100 at positions that are not loss positions.
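
A hedged sketch of the shifting and -100 masking described here (a standalone function with illustrative names, not the class’s actual implementation):

   import torch

   def shift_for_lm_loss(logits, loss_ids, reference_ids, ignore_index=-100):
       # Predict token t+1 from position t: drop the last logit and the first label.
       shift_logits = logits[..., :-1, :].contiguous()
       shift_labels = reference_ids[..., 1:].contiguous()
       shift_loss_ids = loss_ids[..., 1:]
       # Non-loss positions are set to the ignore index of CrossEntropyLoss.
       shift_labels = shift_labels.masked_fill(shift_loss_ids == 0, ignore_index)
       return shift_logits, shift_labels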

forward(*args, **kwargs)[source]

During generation, this uses the PLM’s forward function. This is because, in the first step, we directly call the process_batch function to generate the initial input with the template; after that, the whole template has been processed into past_key_values, and we can use the normal generation function. During learning, forward is linked to the _forward function, in which the loss is calculated for all positions at the same time.

generate(batch: Union[Dict, openprompt.data_utils.utils.InputFeatures], verbose: Optional[bool] = False, **generation_kwargs)[source]

This function wraps the generate() method of the parent class GenerationMixin. Forward uses the PretrainedModel’s forward method. generation_kwargs includes all the parameters that are passed on to transformers.generation_utils.GenerationMixin.generate.

Parameters
  • batch (Union[Dict, InputFeatures]) – The input features of batchified data sequences.

  • verbose (Optional[bool]) – Set to True to print the generated sentences.

Returns

  • output_sequences (List[torch.Tensor]) – The raw sequences generated by the generation model.

  • generated_sentences (List[torch.Tensor]) – The generated sentences that have been post-processed.
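
A hedged usage sketch, assuming a PromptForGeneration built from the overview names; the extra kwargs are forwarded to the underlying HuggingFace generate:

   from openprompt import PromptForGeneration

   generation_model = PromptForGeneration(plm=plm, template=template, tokenizer=tokenizer)
   generation_kwargs = {"max_length": 64, "num_beams": 5}

   for batch in data_loader:
       output_sequences, generated_sentences = generation_model.generate(
           batch, verbose=False, **generation_kwargs)
       print(generated_sentences[0])   # post-processed text for the first example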

post_processing(output_sequences, input_lengths)[source]

Post-process the sequences generated by the generation model.

Parameters
  • output_sequences (torch.Tensor) – The raw sequences generated by the generation model.

  • input_lengths (int or list) – The length(s) of the input sequence.

Returns

The generated sentences that have been post-processed.

Return type

List

prepare_inputs_for_generation(input_ids: Optional[torch.Tensor] = None, **model_kwargs)[source]

This function wraps the prepare_inputs_for_generation function in the huggingface transformers.

When past is not in model_kwargs, we prepare the input from scratch. When past is in model_kwargs, we don’t need to prepare the template-wrapped input; instead, we use the inner pre-trained model’s function to prepare the next step’s input. model_kwargs includes all the arguments passed in the batch (InputFeatures) except input_ids, as long as they do not conflict with keywords in generation_kwargs.

Parameters

input_ids (torch.Tensor) – Indices of input sequence tokens in the vocabulary.

parallelize(device_map=None)[source]

Parallelize the model across devices.

deparallelize()[source]

Deparallelize the model across devices.