Play with Configuration

OpenPrompt encourages the use of configuration files to develop your own prompt-learning pipelines. We provide a config_default.yaml file that implements the common attributes of prompt-learning, which will be introduced in detail below. For a prompt-learning pipeline derived from OpenPrompt, you can write a model-specific config file to set the corresponding specific attributes.

We provide a unified entry point for experiments with OpenPrompt. Just run the following command in the root directory.

python experiments/cli.py --config_yaml experiments/classification_manual_prompt.yaml

You can use one of the configuration files we provide, or write your own.

Next, we will introduce the meaning of each configuration parameter in the default configuration. In your own experiments, you can create a yaml file containing a subset of configuration parameters that you want to change or specify.
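
For example, a minimal user-written configuration that only overrides a handful of defaults (the file name and values below are hypothetical) might look like this:

# my_experiment.yaml -- overrides a subset of config_default.yaml;
# all unspecified attributes keep their default values
plm:
    model_name: bert
    model_path: bert-base-cased
train:
    num_epochs: 3
    batch_size: 16

You would then pass it to the unified entry point, e.g. python experiments/cli.py --config_yaml my_experiment.yaml.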

Default Configuration

Now we introduce the details of the default configuration of OpenPrompt.

Environment

In environment, we set the attributes of the training and inference environment.

  • num_gpus (int): The number of GPUs during training and evaluation.

  • cuda_visible_devices: Which devices are visible during training and evaluation.

  • local_rank: The index of the device for the current process (used in distributed training).

Example:

environment:
    num_gpus: 1
    cuda_visible_devices:
        - 0
    local_rank: 0
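
For a multi-GPU run (hypothetical values, shown only to illustrate how cuda_visible_devices lists several devices), the same attributes extend naturally:

environment:
    num_gpus: 2
    cuda_visible_devices:
        - 0
        - 1
    local_rank: 0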

Reproduce

The reproduce configuration controls the key attributes that determine the reproducibility of a prompt-learning pipeline, specifically the seeds for all potential sources of randomness.

  • seed: If this seed is set and the other seeds are unset, all the seeds will use this value.

Example:

reproduce:
    seed: 100

PLM

The plm configuration implements the attributes of the pre-trained language model, including the model's type, path, and optimization.

  • model_name: The name of the pre-trained model.

  • model_path: The path of the pre-trained model.

  • optimize:
    • freeze_para: Whether the parameters of the model are frozen.

    • loss_function: The loss function during training.

    • no_decay: The names of the parameter groups that are exempt from weight decay (e.g., bias and LayerNorm.weight).

    • lr: The learning rate during training.

    • weight_decay: The weight decay applied during optimization.

    • scheduler:
      • type: The scheduler type.

      • num_warmup_steps: The number of steps for warming up.

Example:

plm:
    model_name:
    model_path:
    optimize:
        freeze_para: False
        no_decay:
            - bias
            - LayerNorm.weight
        lr: 0.0005
        weight_decay: 0.01
        scheduler:
            type:
            num_warmup_steps: 500

Pipeline

This part contains the attributes of the train, dev, and test phases.

  • train
    • num_epochs: The number of epochs during training.

    • batch_size: The batch size during training.

    • shuffle_data: If True, the data will be shuffled during training.

    • teacher_forcing: If True, teacher forcing will be used during training.

    • clean: If True, checkpoints are not saved and TensorBoard logs are not written. Note that testing will then use the last model rather than the best model on validation.

  • dev
    • batch_size: The batch size during validation.

    • shuffle_data: If True, the data will be shuffled during validation.

  • test
    • batch_size: The batch size during testing.

    • shuffle_data: If True, the data will be shuffled during testing.

Example:

train:
    num_epochs: 5
    batch_size: 2
    shuffle_data: False
    teacher_forcing: False
    clean: False

dev:
    batch_size: 2
    shuffle_data: False

test:
    batch_size: 2
    shuffle_data: False

Task

The configuration of the current task. A parent configuration key task determines the current type of task, e.g. classification. For a specific task, you can then set the corresponding attributes.

Example:

task: classification
classification:
    parent_config: task
    metric:
        - micro-f1
    loss_function: cross_entropy ## select from cross_entropy

generation:
    parent_config: task
    gen_max_length: 128
    decoding_strategy: greedy

relation_classification:
    parent_config: task

Dataloader

This is the configuration of the dataloader, which sets attributes such as max_seq_length.

Example:

dataloader:
    max_seq_length: 256
    decoder_max_length: 256
    predict_eos_token: False  # necessary to set to true in generation.
    truncate_method: "head" # choosing from balanced, head, tail

Learning Setting

Configuration of the learning setting, chosen from full, few-shot, and zero-shot.

Example:

learning_setting:   # selecting from "full", "zero-shot", "few-shot"

zero_shot:
    parent_config: learning_setting

few_shot:
    parent_config: learning_setting
    few_shot_sampling:

sampling_from_train:
    parent_config: few_shot_sampling
    num_examples_per_label: 10
    also_sample_dev: True
    num_examples_per_label_dev: 10
    seed:
        - 123
        - 456
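
For instance, a sketch of a complete few-shot setup (the sampling values below are hypothetical) could look like this:

learning_setting: few-shot
few_shot:
    parent_config: learning_setting
    few_shot_sampling: sampling_from_train
sampling_from_train:
    parent_config: few_shot_sampling
    num_examples_per_label: 16
    also_sample_dev: True
    num_examples_per_label_dev: 16
    seed:
        - 123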

Prompt-specific Config

Configuration of templates and verbalizers. Different classes have different attributes, and the top-level template and verbalizer keys select which class to use. Here are some examples:

template:
verbalizer:
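
For instance, to pair a manual template with a manual verbalizer (a typical combination, shown here for illustration), you would set:

template: manual_template
verbalizer: manual_verbalizer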

manual_template:
    parent_config: template
    text:
    mask_token: <mask>
    placeholder_mapping:
        <text_a>: text_a
        <text_b>: text_b
    file_path:
    choice: 0
    optimize:  # the parameters related to optimizing the template
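
As a filled-in illustration, the template text below is a hypothetical example written with the <text_a>/<mask> placeholder syntax defined by placeholder_mapping above:

manual_template:
    parent_config: template
    text: <text_a> It was <mask> .
    mask_token: <mask>
    placeholder_mapping:
        <text_a>: text_a
        <text_b>: text_b
    choice: 0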


automatic_verbalizer:
    parent_config: verbalizer
    num_cadidates: 1000
    label_word_num_per_class: 1
    num_searches: 1
    score_fct: llr
    balance: true
    optimize:
        level: epoch
    num_classes:
    init_using_split: valid

one2one_verbalizer:
    parent_config: verbalizer
    label_words:
    prefix: " "
    multi_token_handler: first
    file_path:
    choice:
    num_classes:
    optimize:

manual_verbalizer:
    parent_config: verbalizer
    label_words:
    prefix: " "
    multi_token_handler: first
    file_path:
    choice:
    num_classes:
    optimize:
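
As a filled-in illustration, a manual_verbalizer for a hypothetical two-class sentiment task (the label words are chosen for illustration only) could be:

manual_verbalizer:
    parent_config: verbalizer
    label_words:  # one list of label words per class
        - - bad
        - - good
    prefix: " "
    multi_token_handler: first
    num_classes: 2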

prefix_tuning_template:
    parent_config: template
    text:
    mask_token: <mask>
    num_token: 5
    placeholder_mapping:
        <text_a>: text_a
        <text_b>: text_b
    prefix_dropout: 0.0
    optimize:
        lr: 0.0001