Attention in neural networks and Transformers; Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-trained Transformers (GPT)
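The core operation shared by Transformers, BERT, and GPT is scaled dot-product attention: each query position is compared against all key positions, the similarities are turned into a probability distribution with a softmax, and that distribution weights the value vectors. A minimal sketch (the function name, shapes, and toy data here are illustrative assumptions, not any particular library's API):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for single-head attention."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy example: 3 query positions attending over 4 key/value positions, d_k = 2.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 2))
K = rng.normal(size=(4, 2))
V = rng.normal(size=(4, 2))
out, w = scaled_dot_product_attention(Q, K, V)
assert out.shape == (3, 2)                    # one output vector per query
assert np.allclose(w.sum(axis=-1), 1.0)       # weights form a distribution per query
```

The `1/sqrt(d_k)` scaling keeps the dot products from growing with the key dimension, which would otherwise push the softmax into a near-one-hot regime and shrink its gradients. BERT applies this bidirectionally (every token attends to every other token), while GPT masks the scores so each position attends only to earlier positions.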