Tuning AI Models for Item Generation

Published on November 19, 2021

Tuning AI Models for Assessment Content Generation

By Charles Foster

At Finetune, we've been building AI solutions to address some of the most challenging problems in education technology, including automated content generation and AI-powered learning resource classification and recommendation. Because the subject matter our tools must handle spans from K-12 through workforce development, we're investing heavily in methods that allow us to scale up the breadth and depth of what our models cover. Key components of this approach are flexible methods to train specialized neural networks in domains where general-purpose models are insufficient. In this blog post, I would like to share a bit of our journey exploring these methods.

Fine-tuning

Typical fine-tuning of neural language models involves simultaneously optimizing all of their trainable parameters, which can run into many billions for networks such as GPT-J. At this scale, both the fine-tuning and inference processes are nontrivial, making widespread deployment of these models difficult. In our own investigations, a few key issues loomed largest:

  • Simply running these transformer models already presses up against the limits of GPU memory (VRAM), and during fine-tuning there is a direct relationship between the number of parameters being optimized and the amount of additional memory consumed (see the back-of-envelope sketch after this list).
  • Modifying all of the parameters in the network can disrupt the information flow learned during pre-training, resulting in forgetting and loss of few-shot capabilities.
  • Serving a customized multi-gigabyte model for each use case would create unacceptable latency and cost burdens.
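
To make the first concern concrete, here is a minimal back-of-envelope calculation. It assumes an Adam-style optimizer that keeps one fp32 gradient plus two fp32 statistics per trained parameter; the parameter counts are illustrative round numbers, not measurements from our systems.

```python
# Rough *extra* memory needed to optimize parameters during fine-tuning,
# beyond the memory already used to store the model weights themselves.
GPTJ_PARAMS = 6_000_000_000   # GPT-J has roughly six billion parameters
ADAPTER_PARAMS = 10_000_000   # illustrative count for a small set of adapters
BYTES_FP32 = 4

def tuning_overhead_gb(trainable: int) -> float:
    # one fp32 gradient + two fp32 Adam statistics per trainable parameter
    return trainable * 3 * BYTES_FP32 / 1e9

print(f"full fine-tune:  ~{tuning_overhead_gb(GPTJ_PARAMS):,.0f} GB extra")    # ~72 GB
print(f"adapter tuning: ~{tuning_overhead_gb(ADAPTER_PARAMS):,.2f} GB extra")  # ~0.12 GB
```

Whatever the exact optimizer, the pattern holds: optimizer state scales with the number of trainable parameters, so shrinking that number is the most direct lever on memory.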

These combined concerns motivated us to explore other methods from the recent literature for tuning our neural language models. Luckily, within the past year the natural language processing research community has developed a bevy of methods to cut down the cost of customizing the behavior of pre-trained language models.

Prompt Tuning

The original approach we pursued is called Prompt Tuning or Soft Prompting (Lester et al., 2021). In this method, the parameters of the network from pre-training are held frozen. Instead, we prepend a small number of learnable embedding vectors (typically 10 to 20) in front of the input prompt tokens, and tune these embeddings with the usual language modeling objective on a fine-tuning dataset. These embeddings do not represent tokens of language; we can think of them instead as a dense store of context that the network can condition on, via the attention mechanism, as it makes predictions about the tokens in the sequence.
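
For concreteness, the following is a minimal PyTorch sketch of this mechanism, using gpt2 as a small stand-in model. This is not our production setup, and Lester et al. discuss more careful initialization schemes; the point is simply that the soft prompt is the only trainable tensor.

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Freeze every pre-trained parameter; only the soft prompt will be tuned.
for param in model.parameters():
    param.requires_grad = False

NUM_SOFT_TOKENS = 20
d_model = model.config.hidden_size
soft_prompt = nn.Parameter(torch.randn(NUM_SOFT_TOKENS, d_model) * 0.02)

def loss_with_soft_prompt(input_ids: torch.Tensor) -> torch.Tensor:
    # Look up ordinary token embeddings, then prepend the learnable prefix.
    token_embeds = model.get_input_embeddings()(input_ids)     # (B, T, D)
    prefix = soft_prompt.unsqueeze(0).expand(input_ids.size(0), -1, -1)
    inputs_embeds = torch.cat([prefix, token_embeds], dim=1)   # (B, P+T, D)
    # Mask the prefix positions out of the language modeling loss.
    ignore = torch.full((input_ids.size(0), NUM_SOFT_TOKENS), -100)
    labels = torch.cat([ignore, input_ids], dim=1)
    return model(inputs_embeds=inputs_embeds, labels=labels).loss

optimizer = torch.optim.AdamW([soft_prompt], lr=1e-3)
batch = tokenizer("A nurse is assessing a client ...", return_tensors="pt")
loss = loss_with_soft_prompt(batch["input_ids"])
loss.backward()
optimizer.step()
```

At serving time the frozen base model is shared across all use cases, and each task contributes only its few kilobytes of prefix vectors.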


Prompt tuning adds only a small runtime cost to the model, since the soft prompts are in the kilobyte range and can be run through the network in parallel. These features make the technique attractive for serving many concurrent users, as recent deployments in AI storytelling have indicated. However, integrating soft prompts into popular frameworks like HuggingFace's transformers is complex, as the interfaces are largely designed to operate on sequences of token indices rather than dense vectors. In addition, as more context is added between the soft prompt and the generation, we begin to see imbalances between the strength of conditioning on the soft prompt and on the token context. Retaining the ability to flexibly add hundreds of tokens of context at runtime was important for us, as it provides additional fine-grained levers of controllability in the item authoring process. Whether we want to guide the model to focus on content from a particular page of a textbook, to author a reading comprehension item, or to provide few-shot examples, long-form contextualization matters.

Low Rank Adapters (LoRA)

We later transitioned to a method called LoRA, or Low-Rank Adapters (Hu et al., 2021). This technique was developed by researchers at Microsoft working on GPT-3-sized models, and it builds on earlier adapter approaches. If we think of a transformer as progressively refining its token latent states with each residual layer, the concept of an adapter is to add a small, input-dependent delta (initialized to a no-op) to those latents at a given layer. This gentle nudge can then modulate the network's behavior downstream by, say, emphasizing the parts of the input that are relevant to the task.


Low-rank adapters are a kind of adapter that targets a low-rank subspace, which cuts down the number of new parameters we need to train (from D² to 2 × D × r, where D is in the thousands and r is far smaller). As with soft prompting, we hold the original parameters of the network frozen to preserve whatever knowledge they contain from pre-training, and we adjust only the new adapter parameters. In our internal tests, we have seen good indicators from LoRA. Beyond enabling us to tune large models on small hardware budgets, models with adapter layers interspersed also retain much of their original few-shot ability while still adapting to the target domain. Notably, integrating low-rank adapters into other frameworks is straightforward: we can simply swap existing linear layers for linear + adapter layers as needed, as sketched below.
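
Here is a minimal LoRA-style module in PyTorch, a hedged sketch rather than the reference implementation from Hu et al. The base weights stay frozen, and the zero-initialized B matrix makes the adapter a no-op at the start of training, matching the initialization described above.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank delta."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for param in self.base.parameters():
            param.requires_grad = False              # preserve pre-trained weights
        # 2 * D * r new parameters instead of D^2 for a full D x D update.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # no-op init
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = (x @ self.lora_A.T) @ self.lora_B.T  # low-rank, input-dependent
        return self.base(x) + self.scale * delta

# Swapping an adapter into an existing model is then a matter of replacing
# selected nn.Linear layers (e.g., attention projections) with LoRALinear.
layer = LoRALinear(nn.Linear(4096, 4096), r=8)
out = layer(torch.randn(2, 4096))                    # same output shape as before
```

Because the delta starts at zero, the adapted model's behavior is identical to the pre-trained model's on step one, and training only nudges it away where the fine-tuning data demands.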

Moving Forward

To give a flavor of the kind of improvements we have seen in our explorations, below we share a small selection of generations from models trained with the different tuning approaches. These items were all generated using the same two input prompts pertaining to the delivery of nursing care: the first focused on prioritizing the needs of multiple clients and the second on taking action in a team setting.

Item #1: A nurse is assessing a client who is scheduled for outpatient surgery. Which of the following patients would the nurse evaluate next? (A) A patient with type 1 diabetes (B) A patient with acute pancreatitis (C) A patient who requires intravenous antibiotic therapy (D) A patient with Crohn's disease

Item #2: A client is admitted to the hospital with the diagnosis of congestive heart failure. The client's mother is refusing treatment. Which of the following actions should the nurse take? (A) Notify the provider of the situation. (B) Check the patient's medication list. (C) Check the client's medical chart. (D) Notify the client's doctor of the client's condition.

Item #3: A nurse is evaluating a patient on the pediatric unit. The client is scheduled for a diagnostic workup, including a chest radiograph. Which of the following patients should the nurse be planning to evaluate next? (A) A child who has pneumonia and is experiencing fever (B) A child who has a rash and is not responding to a medication (C) A child who has a fever and has a temperature >100.4 (D) A child who has a fever and a rash

Item #4: A client with renal failure is in the hospital and is scheduled to undergo a renal transplant. The client has a strong belief that the transplant surgeon is incompetent. The client refuses to allow a doctor from another hospital to perform the transplant. Which of the following actions should the charge nurse take? (A) Take a statement from the client that the transplant surgeon is competent and give it to the transplant surgeon. (B) Ask the client for a signed statement to back up her assertion. (C) Contact the transplant surgeon and find out why the client refuses to allow the surgeon from another hospital to perform the transplant. (D) Inform the transplant surgeon that the client is refusing the transplant.

Item #5: A nurse is planning to assign clients to various bedside stations. Which of the following clients should the nurse assign to the bedside station closest to the nurses' station? (A) A client who has a suspected diagnosis of a pulmonary embolism and requires a computed tomography (CT) scan (B) A client who needs assistance with a bowel movement (C) A client who has an emergency indication for a chest X-ray (D) A client who requires a chest X-ray because she has a cough

Item #6: An LPN is caring for a client with an indwelling urinary catheter and is asking the nurse to help her clean the catheter. Which of the following actions should the nurse take? (A) Explain to the client that she will need to provide the cleaning solution and will need to obtain the client's consent for the procedure. (B) Ask the LPN for assistance. (C) Offer to help the client clean the catheter. (D) Assure the LPN that the nurse will assist her.

We can observe that, while the best items from the baseline model are largely fluent and logically coherent, they tend to be underconstrained (as in #1) or to call for very little knowledge of the nursing domain (as in #3). In comparison to the baseline items, items from the prompt tuning and low-rank adapter models contain greater detail in their stimuli, stems, and options. The subject matter is relevant to the domain, calling for specific knowledge of the management of nursing care rather than relying on background knowledge alone. Moreover, the items from the low-rank adapter model have a more consistent form. For instance, they consistently refer to the “client” as opposed to the “patient”, in accordance with the language that would likely appear in assessments (compare #5 to #1 and #3). The model also successfully tracks references to multiple individuals within a scenario (compare #6 to #4).

Improvements to domain coverage, stylistic consistency, and logical coherence can translate into significant improvements in the usefulness of neural language models. This is only the beginning: as the technology matures, even more methods will be discovered to create customized, controllable natural language models at scale. And as those methods are discovered, we will continue to incorporate the best from academia, industry, and independent research into Finetune products.


Sincere thanks to Nick Koprowicz, Jesse Hamer, Saad Khan, and Ogden Morse for providing kind, helpful feedback in the development of this blog post.


References

Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., … & Chen, W. (2021). LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685.

Lester, B., Al-Rfou, R., & Constant, N. (2021). The power of scale for parameter-efficient prompt tuning. arXiv preprint arXiv:2104.08691.