Base Classes¶
Overview¶
This page introduces the base classes of the prompt-learning framework.
Generally, to conduct prompt-learning, a PretrainedModel
is selected together with its corresponding pre-training task,
a Template
class is established to wrap the original text,
and a Verbalizer
class (if needed) is defined to project the labels onto label words in the vocabulary.
In OpenPrompt, the specific prompt-related classes inherit from these base classes.
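For orientation, here is a minimal sketch of how these pieces typically fit together for a sentiment-classification task. It assumes the tutorial-style ManualTemplate and ManualVerbalizer and a BERT backbone; the dataset, template text, and label words are purely illustrative:

    from openprompt.plms import load_plm
    from openprompt.prompts import ManualTemplate, ManualVerbalizer
    from openprompt import PromptDataLoader, PromptForClassification
    from openprompt.data_utils import InputExample

    # Illustrative two-class sentiment task.
    classes = ["negative", "positive"]
    dataset = [InputExample(guid=0, text_a="The movie was surprisingly touching.")]

    # 1. A PretrainedModel together with its pre-training task (here, masked LM).
    plm, tokenizer, model_config, WrapperClass = load_plm("bert", "bert-base-cased")

    # 2. A Template that wraps the original text around a {"mask"} slot.
    template = ManualTemplate(
        text='{"placeholder":"text_a"} It was {"mask"}.',
        tokenizer=tokenizer,
    )

    # 3. A Verbalizer that projects the labels onto label words in the vocabulary.
    verbalizer = ManualVerbalizer(
        classes=classes,
        label_words={"negative": ["bad"], "positive": ["good", "wonderful"]},
        tokenizer=tokenizer,
    )

    # Combine everything into a prompt model and a data loader.
    prompt_model = PromptForClassification(plm=plm, template=template, verbalizer=verbalizer)
    data_loader = PromptDataLoader(dataset=dataset, template=template, tokenizer=tokenizer,
                                   tokenizer_wrapper_class=WrapperClass)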
Prompt Base¶
Base classes of Template
and Verbalizer
.
- class Template(tokenizer: transformers.tokenization_utils.PreTrainedTokenizer, placeholder_mapping: dict = {'<text_a>': 'text_a', '<text_b>': 'text_b'})[source]¶
Base class for all templates. Most of the methods are abstract, with a few exceptions that hold the methods common to all templates, such as loss_ids, save, and load.
- Parameters
tokenizer (PreTrainedTokenizer) – A tokenizer to appoint the vocabulary and the tokenization strategy.
placeholder_mapping (dict) – A mapping from the placeholders (e.g. <text_a>) to the corresponding attributes of the original input text.
- get_default_loss_ids() List[int] [source]¶
Get the loss indices for the template using the mask. E.g., when self.text is
'{"placeholder": "text_a"}. {"meta": "word"} is {"mask"}.'
, the output is [0, 0, 0, 0, 1, 0].
- Returns
A list of integers in the range [0, 1]:
1 for masked tokens (where the loss is computed).
0 for ordinary sequence tokens.
- Return type
List[int]
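As a purely illustrative sketch of this rule (not the library's internal code), the default loss_ids could be derived from the parsed template parts like this; the parsed representation below is hypothetical:

    # Hypothetical parsed form of '{"placeholder": "text_a"}. {"meta": "word"} is {"mask"}.'
    parsed_template = [
        {"placeholder": "text_a"},
        {"text": "."},
        {"meta": "word"},
        {"text": "is"},
        {"mask": None},
        {"text": "."},
    ]

    # Default rule: 1 at the {"mask"} parts (where the loss is computed), 0 elsewhere.
    loss_ids = [1 if "mask" in part else 0 for part in parsed_template]
    print(loss_ids)  # [0, 0, 0, 0, 1, 0]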
- get_default_shortenable_ids() List[int] [source]¶
Every template needs shortenable_ids, denoting which parts of the template can be truncated to fit the language model's
max_seq_length
. Default: the input text is shortenable, while the template text and other special tokens are not. E.g., when self.text is
'{"placeholder": "text_a"} {"placeholder": "text_b", "shortenable": False} {"meta": "word"} is {"mask"}.'
, the output is [1, 0, 0, 0, 0, 0, 0].
- Returns
A list of integers in the range [0, 1]:
1 for the input (shortenable) tokens.
0 for the template sequence tokens.
- Return type
List[int]
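A similarly illustrative sketch of the default rule, assuming the same hypothetical parsed-template representation as in the loss_ids example above:

    def default_shortenable_ids(parsed_template):
        # Default rule: placeholders (the input text) are shortenable unless marked
        # otherwise; template text, meta and mask parts are not shortenable.
        ids = []
        for part in parsed_template:
            if "placeholder" in part:
                ids.append(1 if part.get("shortenable", True) else 0)
            else:
                ids.append(0)
        return ids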
- get_default_soft_token_ids() List[int] [source]¶
This function identifies which tokens are soft tokens.
Sometimes tokens in the template are not from the vocabulary, but a sequence of soft tokens. In this case, you need to implement this function.
- Raises
NotImplementedError – if needed, add soft_token_ids into the registered_inputflag_names attribute of the Template class and implement this method.
- wrap_one_example(example: openprompt.data_utils.utils.InputExample) List[Dict] [source]¶
Given an input example that contains the input text, which can be referenced by the values of self.placeholder_mapping, this function processes the example into a list of dicts. Each dict functions as a group that shares the same properties, such as whether it is shortenable, whether it is the masked position, whether it is a soft token, etc. Since the text will be tokenized in the subsequent processing procedure, these attributes are broadcast along the tokenized sentence.
- Parameters
example (InputExample) – An InputExample object, which should have attributes that can be filled into the template.
- Returns
A list of dicts of the same length as self.text, e.g.
[{"loss_ids": 0, "text": "It was"}, {"loss_ids": 1, "text": "<mask>"}, ]
- Return type
List[Dict]
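A usage sketch, reusing the template from the overview example; the exact keys of the returned dicts may vary across versions:

    from openprompt.data_utils import InputExample

    example = InputExample(guid=0, text_a="The movie was surprisingly touching.", label=1)
    wrapped = template.wrap_one_example(example)
    # Roughly: [{"text": "The movie was surprisingly touching.", "loss_ids": 0, "shortenable_ids": 1, ...},
    #           {"text": " It was", "loss_ids": 0, "shortenable_ids": 0, ...},
    #           {"text": "<mask>", "loss_ids": 1, ...}, ...]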
- abstract process_batch(batch)[source]¶
A template should override this method if it needs to process the batch input, e.g. to substitute embeddings.
- post_processing_outputs(outputs)[source]¶
Post-process the outputs of the language model according to the needs of the template. Most templates don't need post-processing. A template like SoftTemplate, which appends the soft prompt as a module (rather than as a sequence of input tokens) to the input, should remove the outputs at those positions to keep seq_len the same.
- save(path: str, **kwargs) None [source]¶
A save method API.
- Parameters
path (str) – A path to save your template.
- safe_on_text_set() None [source]¶
With this wrapper function, setting text inside
on_text_set()
will not trigger on_text_set()
again to prevent endless recursion.
- abstract on_text_set()[source]¶
A hook to do something when the template text is set. The designer of the template should explicitly know what should be done when the template text is set.
- classmethod from_config(config: yacs.config.CfgNode, **kwargs)[source]¶
Load a template from the template's configuration node.
- Parameters
config (CfgNode) – the sub-configuration of the template, i.e. config[config.template] if config is a global config node.
kwargs – Other kwargs that might be used when initializing the template. The actual values should match the arguments of the __init__ function.
- class Verbalizer(tokenizer: Optional[transformers.tokenization_utils.PreTrainedTokenizer] = None, classes: Optional[Sequence[str]] = None, num_classes: Optional[int] = None)[source]¶
Base class for all the verbalizers.
- Parameters
tokenizer (PreTrainedTokenizer) – A tokenizer to appoint the vocabulary and the tokenization strategy.
classes (Sequence[str]) – A sequence of classes that need to be projected.
- property label_words¶
Label words are the words in the vocabulary onto which the labels are projected. E.g., if we want to establish a projection in sentiment classification: positive \(\rightarrow\) {wonderful, good}, then wonderful and good are the label words.
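For instance, the sentiment projection above can be declared on a ManualVerbalizer (a subclass of this base class); the label-word choices here are illustrative:

    from openprompt.prompts import ManualVerbalizer

    verbalizer = ManualVerbalizer(
        classes=["negative", "positive"],
        label_words={"negative": ["bad", "terrible"], "positive": ["wonderful", "good"]},
        tokenizer=tokenizer,  # the PLM's tokenizer
    )
    print(verbalizer.label_words)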
- abstract generate_parameters(**kwargs) List [source]¶
The verbalizer can be seen as an extra layer on top of the original pre-trained model. In a manual verbalizer, it is a fixed one-hot vector of dimension
vocab_size
, with the position of the label word being 1 and 0 everywhere else. In other situations, the parameters may be a continuous vector over the vocabulary, with each dimension representing a weight for that token. Moreover, the parameters may be set to trainable to allow label-word selection.
Therefore, this function serves as an abstract method for generating the parameters of the verbalizer, and must be implemented in any derived class.
Note that the parameters need to be registered as part of a PyTorch module; this can be achieved by wrapping a tensor with
nn.Parameter()
.
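A minimal sketch of what such parameters could look like in a derived class, assuming one label word per class and a fixed (non-trainable) one-hot projection; this is illustrative, not the library's implementation:

    import torch
    import torch.nn as nn

    class OneHotVerbalizerSketch(nn.Module):
        def __init__(self, label_word_ids, vocab_size):
            super().__init__()
            # One row per class: 1.0 at the label word's vocabulary index, 0 elsewhere.
            proj = torch.zeros(len(label_word_ids), vocab_size)
            for class_idx, word_id in enumerate(label_word_ids):
                proj[class_idx, word_id] = 1.0
            # Wrapping in nn.Parameter registers it with the module;
            # requires_grad=True would make the label-word weights trainable.
            self.projection = nn.Parameter(proj, requires_grad=False)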
- register_calibrate_logits(logits: torch.Tensor)[source]¶
This function aims to register logits that need to be calibrated, and detach the original logits from the current graph.
- process_outputs(outputs: torch.Tensor, batch: Union[Dict, openprompt.data_utils.utils.InputFeatures], **kwargs)[source]¶
By default, the verbalizer will process the logits of the PLM’s output.
- Parameters
logits (torch.Tensor) – The current logits generated by pre-trained language models.
batch (Union[Dict, InputFeatures]) – The input features of the data.
- gather_outputs(outputs: transformers.file_utils.ModelOutput)[source]¶
Retrieve useful outputs for the verbalizer from the whole model output. By default, it will only retrieve the logits.
- Parameters
outputs (ModelOutput) –
- Returns
torch.Tensor: The gathered output, which should be of shape (batch_size, seq_len, any).
- static aggregate(label_words_logits: torch.Tensor) torch.Tensor [source]¶
Aggregate the logits of multiple label words into the label's logits. Basic aggregator: the mean of each label's label-word logits. Can be re-implemented in advanced verbalizers.
- Parameters
label_words_logits (torch.Tensor) – The logits of the label words only.
- Returns
The final logits calculated by the label words.
- Return type
torch.Tensor
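The basic aggregator described above amounts to a mean over the label-word dimension; a plain-PyTorch sketch (ignoring padding masks for classes with different numbers of label words):

    import torch

    def aggregate_mean(label_words_logits: torch.Tensor) -> torch.Tensor:
        # label_words_logits: (batch_size, num_classes, num_label_words_per_class)
        # returns label logits: (batch_size, num_classes)
        return label_words_logits.mean(dim=-1)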
- normalize(logits: torch.Tensor) torch.Tensor [source]¶
Given logits regarding the entire vocab, calculate the probs over the label words set by softmax.
- Parameters
logits (Tensor) – The logits of the entire vocab.
- Returns
The probability distribution over the label words set.
- Return type
Tensor
- abstract project(logits: torch.Tensor, **kwargs) torch.Tensor [source]¶
This method receives input logits of shape
[batch_size, vocab_size]
, and uses the parameters of this verbalizer to project the logits over the entire vocabulary into the logits of the label words.
- Parameters
logits (Tensor) – The logits over the entire vocabulary generated by the pre-trained language model, with shape [batch_size, max_seq_length, vocab_size].
- Returns
The normalized probs (sum to 1) of each label.
- Return type
Tensor
- handle_multi_token(label_words_logits, mask)[source]¶
Support multiple methods to handle the multiple tokens produced by the tokenizer. We suggest using 'first' or 'max' if some parts of the tokenization are not meaningful. Can broadcast to a 3-d tensor.
- Parameters
label_words_logits (torch.Tensor) –
- Returns
torch.Tensor
- classmethod from_config(config: yacs.config.CfgNode, **kwargs)[source]¶
Load a verbalizer from the verbalizer's configuration node.
- Parameters
config (CfgNode) – the sub-configuration of the verbalizer, i.e. config[config.verbalizer] if config is a global config node.
kwargs – Other kwargs that might be used when initializing the verbalizer. The actual values should match the arguments of the __init__ function.
- from_file(path: str, choice: Optional[int] = 0)[source]¶
Load the predefined label words from a verbalizer file. Currently three types of file format are supported:
1. a .jsonl or .json file containing a single verbalizer in dict format;
2. a .jsonl or .json file containing a list of verbalizers in dict format;
3. a .txt or .csv file, in which the label words of each class are listed on one line, separated by commas. An empty line begins a new verbalizer. This format is recommended when you don't know the name of each class.
The details of verbalizer format can be seen in How to Write a Verbalizer?.
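For example, a plain-text verbalizer file in the third format might look like the following, where each line lists the label words of one class and an empty line starts a new verbalizer (the words are illustrative):

    bad, terrible, awful
    good, wonderful, great

    boring, dull
    exciting, thrilling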
Pipeline Base¶
Base classes of PromptDataLoader
and PromptModel
, PromptForClassification
and PromptForGeneration
.
- class PromptDataLoader(dataset: Union[torch.utils.data.dataset.Dataset, List], template: openprompt.prompt_base.Template, tokenizer_wrapper: Optional[openprompt.plms.utils.TokenizerWrapper] = None, tokenizer: Optional[transformers.tokenization_utils.PreTrainedTokenizer] = None, tokenizer_wrapper_class=None, verbalizer: Optional[openprompt.prompt_base.Verbalizer] = None, max_seq_length: Optional[str] = 512, batch_size: Optional[int] = 1, shuffle: Optional[bool] = False, teacher_forcing: Optional[bool] = False, decoder_max_length: Optional[int] = - 1, predict_eos_token: Optional[bool] = False, truncate_method: Optional[str] = 'tail', drop_last: Optional[bool] = False, **kwargs)[source]¶
PromptDataLoader wraps the original dataset. The input data is first wrapped with the prompt's template and then tokenized by a wrapped tokenizer.
- Parameters
dataset (Dataset or List) – Either a Dataset object or a list containing the input examples.
template (Template) – A derived class of Template.
tokenizer (PretrainedTokenizer) – The pretrained tokenizer.
tokenizer_wrapper_class (TokenizerWrapper) – The class of the tokenizer wrapper.
max_seq_length (int, optional) – The max sequence length of the input ids. It's used to truncate sentences.
batch_size (int, optional) – The batch size of the data loader.
teacher_forcing (bool, optional) – Whether to fill the mask with the target text. Set to True when training a generation model.
decoder_max_length (int, optional) – The maximum decoder length of an encoder-decoder model.
predict_eos_token (bool, optional) – Whether to predict the <eos> token. It is suggested to set this to True for generation.
truncate_method (str, optional) – The truncation method to use. Select from head, tail, balanced.
kwargs – Other kwargs that might be passed into a tokenizer wrapper.
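A usage sketch with a few of the common arguments, continuing the names from the overview example; the concrete values are illustrative:

    data_loader = PromptDataLoader(
        dataset=dataset,                       # a list of InputExample objects
        template=template,
        tokenizer=tokenizer,
        tokenizer_wrapper_class=WrapperClass,  # returned by load_plm
        max_seq_length=256,
        batch_size=4,
        shuffle=True,
        truncate_method="head",
    )

    for batch in data_loader:
        logits = prompt_model(batch)           # see PromptForClassification.forward below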
- class PromptModel(plm: transformers.utils.dummy_pt_objects.PreTrainedModel, template: openprompt.prompt_base.Template, freeze_plm: bool = False, plm_eval_mode: bool = False)[source]¶
PromptModel
is the encapsulation of a Template and a pre-trained model. With OpenPrompt, these modules can be flexibly combined. This class is the base class of PromptForClassification and PromptForGeneration.
- Parameters
plm (PreTrainedModel) – The pre-trained language model for the current prompt-learning task.
template (Template) – The Template object used to wrap the input data.
freeze_plm (bool) – Whether or not to freeze the pre-trained language model.
plm_eval_mode (bool) – A stronger freezing mode than freeze_plm: the dropout of the model is also turned off, no matter whether the other parts are set to training mode.
- train(mode: bool = True)[source]¶
Sets the module in training mode.
This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g.
Dropout
, BatchNorm
, etc.
- Parameters
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns
self
- Return type
Module
- forward(batch: Union[Dict, openprompt.data_utils.utils.InputFeatures]) torch.Tensor [source]¶
This is a forward method to make wrapped input data go through the model, and return the output logits. Typically, this function aims to predict the
<mask>
position.
- Parameters
batch (Union[Dict, InputFeatures]) – The input features of batchified data sequences.
- class PromptForClassification(plm: transformers.utils.dummy_pt_objects.PreTrainedModel, template: openprompt.prompt_base.Template, verbalizer: openprompt.prompt_base.Verbalizer, freeze_plm: bool = False, plm_eval_mode: bool = False)[source]¶
PromptModel
with a classification head on top. The classification head maps the logits at all positions of the sequence (the return value of a PromptModel) into the logits of the labels, using a verbalizer.
- Parameters
plm (PretrainedModel) – A pre-trained model you decide to use for classification, e.g. BERT.
template (Template) – A Template object you use to wrap the input text for classification, e.g. ManualTemplate.
verbalizer (Verbalizer) – A Verbalizer object you use to project the labels to label words for classification, e.g. ManualVerbalizer.
freeze_plm (bool) – Whether or not to freeze the pre-trained language model.
plm_eval_mode (bool) – A stronger freezing mode than freeze_plm: the dropout of the model is also turned off, no matter whether the other parts are set to training mode.
- property device¶
Register the device parameter.
- extract_at_mask(outputs: torch.Tensor, batch: Union[Dict, openprompt.data_utils.utils.InputFeatures])[source]¶
Get the outputs at all <mask> tokens. E.g., project the logits of shape (batch_size, max_seq_length, vocab_size) into logits of shape (batch_size, num_mask_token, vocab_size) if num_mask_token > 1, or into logits of shape (batch_size, vocab_size) if num_mask_token = 1.
- Parameters
outputs (torch.Tensor) – The original outputs of the whole sequence (possibly processed by the verbalizer's gather_outputs beforehand).
batch (Union[Dict, InputFeatures]) – The original batch.
- Returns
The extracted outputs of
<mask>
tokens.
- Return type
torch.Tensor
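A plain-PyTorch illustration of this extraction (not necessarily the library's exact implementation), assuming loss_ids marks the <mask> positions with 1:

    import torch

    def extract_at_mask_sketch(outputs: torch.Tensor, loss_ids: torch.Tensor) -> torch.Tensor:
        # outputs:  (batch_size, max_seq_length, vocab_size)
        # loss_ids: (batch_size, max_seq_length), 1 at <mask> positions
        outputs = outputs[torch.where(loss_ids > 0)]             # (batch_size * num_mask_token, vocab_size)
        outputs = outputs.view(loss_ids.shape[0], -1, outputs.shape[1])
        if outputs.shape[1] == 1:                                # num_mask_token == 1
            outputs = outputs.view(outputs.shape[0], outputs.shape[2])
        return outputs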
- forward(batch: Union[Dict, openprompt.data_utils.utils.InputFeatures]) torch.Tensor [source]¶
Get the logits of label words.
- Parameters
batch (Union[Dict, InputFeatures]) – The original batch.
- Returns
The logits of the label words (obtained by the current verbalizer).
- Return type
torch.Tensor
- property tokenizer¶
Utility property, to get the tokenizer more easily.
- class PromptForGeneration(plm: transformers.utils.dummy_pt_objects.PreTrainedModel, template: openprompt.prompt_base.Template, freeze_plm: bool = False, plm_eval_mode: bool = False, gen_config: Optional[yacs.config.CfgNode] = None, tokenizer: Optional[transformers.tokenization_utils.PreTrainedTokenizer] = None)[source]¶
PromptModel
with generation loss calculation and generation utils integrated.- Parameters
plm (PretrainedModel) – A pre-trained model you decide to use for generation, e.g. GPT.
template (Template) – A Template object you use to wrap the input text for generation, e.g. PrefixTemplate.
tokenizer (Tokenizer) – A Tokenizer of the current model.
gen_config (CfgNode) – The generation configs to pass into GenerationMixin.generate.
freeze_plm (bool) – Whether or not to freeze the pre-trained language model.
plm_eval_mode (bool) – A stronger freezing mode than freeze_plm: the dropout of the model is also turned off, no matter whether the other parts are set to training mode.
- shift_logits_and_labels(logits, loss_ids, reference_ids)[source]¶
Left-shift the labels, and set the labels of positions that are not loss positions to -100, which is the ignore index of PyTorch's loss functions.
- Parameters
logits (torch.Tensor) –
batch (InputFeatures) – The input features of batchified data sequences.
- Returns
shift_logits (torch.Tensor), shift_input_ids (List[int])
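A plain-PyTorch sketch of the shifting logic described above, assuming loss_ids marks the positions where the generation loss should be computed; the names follow the signature but the implementation is illustrative:

    import torch

    def shift_logits_and_labels_sketch(logits, loss_ids, reference_ids):
        # Logits at position i predict the token at position i + 1, so drop the
        # last logit position and shift the labels/loss flags one step to the left.
        shift_logits = logits[..., :-1, :].contiguous()
        shift_loss_ids = loss_ids[..., 1:].contiguous()
        shift_input_ids = reference_ids[..., 1:].contiguous()
        # Non-loss positions get label -100, PyTorch's ignore index.
        shift_input_ids = torch.where(shift_loss_ids > 0, shift_input_ids,
                                      torch.full_like(shift_input_ids, -100))
        return shift_logits, shift_input_ids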
- forward(*args, **kwargs)[source]¶
In the generation process, this uses the plm's forward function. This is because, in the first step, we directly call the process_batch function to generate the initial input with the template; after that, the whole template has been processed into the past_key_value, and we can use the normal generation function. In the learning process, forward is linked to the
_forward
function, in which the loss is calculated for all positions at the same time.
- generate(batch: Union[Dict, openprompt.data_utils.utils.InputFeatures], verbose: Optional[bool] = False, **generation_kwargs)[source]¶
This function wraps the generate() method of the parent class
GenerationMixin
. Forward uses the PretrainedModel's forward method. generation_kwargs include all the parameters that are passed into transformers.generation_util.GenerationMixin.generate.
- Parameters
batch (Union[Dict, InputFeatures]) – The input features of batchified data sequences.
verbose (Optional[bool]) – Set to True to print the generated sentences.
- Returns
output_sequences (List[torch.Tensor]): The raw sequences generated by the generation model.
generated_sentences (List[torch.Tensor]): The generated sentences that have been post-processed.
- post_processing(output_sequences, input_lengths)[source]¶
Post-process the sequences generated by the generation model.
- Parameters
output_sequences (torch.Tensor) – The raw sequences generated by the generation model.
input_lengths (int or list) – The length(s) of the input sequence.
- Returns
The generated sentences that have been post-processed.
- Return type
List
- prepare_inputs_for_generation(input_ids: Optional[torch.Tensor] = None, **model_kwargs)[source]¶
This function wraps the
prepare_inputs_for_generation
function in the huggingface transformers.
When past is not in model_kwargs, we prepare the input from scratch (the template-wrapped input). When past is in model_kwargs, we don't need to prepare the template-wrapped input; instead, we use the inner pretrained model's own function to prepare the next step's input. model_kwargs includes all the arguments passed in the batch (InputFeatures), except input_ids, as long as they do not conflict with keywords in generation_kwargs.
- Parameters
input_ids (torch.Tensor) – Indices of input sequence tokens in the vocabulary.