split out backpropagation into finding the gradient, and using it in eg gradient descent (page on the latter); sketch of the split after this list
* page on directly solving neural networks instead of training them
* page/h3 on adapting when the network is non-differentiable? just refer to optimisation without differentiation, eg probabilistic (simulated annealing) or non-probabilistic (coordinate descent)
* split out alternatives, including the extreme learning machine
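A minimal sketch of that split on a toy one-parameter-pair model (names and numbers illustrative, not any particular library): `backprop` only computes the gradient; a separate `gradient_descent_step` consumes it, so the optimiser could be swapped out.

```python
import numpy as np

def forward(w, b, x):
    # toy one-layer "network": y_hat = w*x + b
    return w * x + b

def backprop(w, b, x, y):
    # finding the gradient: mean squared error, gradients worked out by hand
    err = forward(w, b, x) - y
    grad_w = np.mean(2 * err * x)
    grad_b = np.mean(2 * err)
    return grad_w, grad_b

def gradient_descent_step(w, b, grad_w, grad_b, lr=0.1):
    # using the gradient: one optimiser update, independent of how it was computed
    return w - lr * grad_w, b - lr * grad_b

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1                      # target: w = 2, b = 1
w, b = 0.0, 0.0
for _ in range(200):
    gw, gb = backprop(w, b, x, y)
    w, b = gradient_descent_step(w, b, gw, gb)
print(round(w, 2), round(b, 2))    # approaches 2.0 and 1.0
```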
epoch: one complete pass of the entire training dataset through the network, each batch going forwards and then backwards
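A sketch of that definition, assuming a batched loop where `train_step` (a stand-in name) does the forward and backward pass on one batch:

```python
def run_epoch(dataset, batch_size, train_step):
    # one epoch: every example is seen exactly once,
    # each batch passed forwards and then backwards
    for start in range(0, len(dataset), batch_size):
        train_step(dataset[start:start + batch_size])

# usage: 3 epochs over a toy dataset (train_step is a no-op placeholder)
data = list(range(10))
for epoch in range(3):
    run_epoch(data, batch_size=4, train_step=lambda batch: None)
```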
neural network temperature; perplexity
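A sketch of both on a toy next-token distribution (illustrative only): temperature divides the logits before the softmax, so higher temperature flattens the distribution; perplexity is the exponential of the mean negative log-probability the model assigns to the observed tokens.

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    # higher temperature flattens the distribution, lower sharpens it
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

def perplexity(token_probs):
    # exp of the mean negative log-probability of the observed tokens
    return float(np.exp(-np.mean(np.log(token_probs))))

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 0.5))   # sharper
print(softmax_with_temperature(logits, 2.0))   # flatter
print(perplexity([0.5, 0.25, 0.1]))            # ~4.3; lower is better
```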
fine-tuning with human feedback vs fine-tuning without a human in the loop (eg asking a model to evaluate the outputs)
h3 regular:
+ gates and skipping; sketch of both after this list
  * highway network
  * residual network (ResNet)
  * both these can be used to address vanishing gradients?
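A minimal sketch of the two skip styles (numpy, names illustrative): a residual block adds the input straight onto the transformed output, while a highway block uses a learned gate to mix transformed output and carried input; in both, the near-identity path gives gradients a short route back, which is the vanishing-gradient argument.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def residual_block(x, W):
    # ResNet-style: output = f(x) + x, identity skip connection
    return np.tanh(W @ x) + x

def highway_block(x, W_h, W_t):
    # highway-style: transform gate t decides how much transformed vs carried input
    t = sigmoid(W_t @ x)              # transform gate
    h = np.tanh(W_h @ x)              # candidate transformation
    return t * h + (1.0 - t) * x      # carry gate is (1 - t)

rng = np.random.default_rng(0)
x = rng.normal(size=4)
W_h = rng.normal(size=(4, 4)) * 0.1
W_t = rng.normal(size=(4, 4)) * 0.1
print(residual_block(x, W_h))
print(highway_block(x, W_h, W_t))
```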
Universal approximation theorem
+ a feedforward network with a single hidden layer (and enough hidden units) can approximate any continuous function on a compact domain to arbitrary accuracy
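One common form of the statement (single hidden layer, sigmoidal or more generally non-polynomial activation); the exact hypotheses vary between versions of the theorem:

```latex
% For any continuous f on a compact K \subset \mathbb{R}^n and any \varepsilon > 0,
% there exist N, weights w_i \in \mathbb{R}^n, biases b_i and coefficients c_i such that
\[
  \left| \, f(x) - \sum_{i=1}^{N} c_i \, \sigma(w_i^\top x + b_i) \, \right| < \varepsilon
  \quad \text{for all } x \in K .
\]
```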
One-shot learning; few-shot learning; is zero-shot learning a thing? (yes: generalising to classes or tasks with no labelled examples at all)
thing on quantisation and GPTQ in particular
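Not GPTQ itself (GPTQ quantises weights layer by layer, using approximate second-order information to compensate the remaining weights for the error introduced), but a minimal round-to-nearest int8 sketch to pin down what weight quantisation means; all names here are illustrative.

```python
import numpy as np

def quantise_int8(w):
    # symmetric round-to-nearest: map float weights onto integer levels in [-127, 127]
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantise(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(4, 4)).astype(np.float32)
q, scale = quantise_int8(w)
w_hat = dequantise(q, scale)
print("max error:", np.abs(w - w_hat).max())   # roughly bounded by scale / 2
```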
Low-Rank Adaptation (LoRA)
* alternative to fine-tuning the full checkpoint
* smaller outputs, faster
* LoRAs can be combined at runtime
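A minimal sketch of the LoRA idea (illustrative, not any particular library's API): the base weight W stays frozen while a low-rank update B @ A is trained and added on top; combining two LoRAs at runtime is just summing their scaled updates.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 6, 8, 2

W = rng.normal(size=(d_out, d_in))          # frozen pretrained weight
A = rng.normal(size=(rank, d_in)) * 0.01    # trainable low-rank factor
B = np.zeros((d_out, rank))                 # starts at zero: no change initially
alpha = 1.0                                 # scaling factor for the update

def lora_forward(x):
    # base output plus the low-rank adaptation
    return W @ x + alpha * (B @ (A @ x))

# combining two LoRAs at runtime: add both low-rank updates on the same frozen W
A2 = rng.normal(size=(rank, d_in)) * 0.01
B2 = rng.normal(size=(d_out, rank)) * 0.01
def combined_forward(x):
    return W @ x + alpha * (B @ (A @ x)) + alpha * (B2 @ (A2 @ x))

x = rng.normal(size=d_in)
print(lora_forward(x).shape, combined_forward(x).shape)
```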