Hi John, thanks for the blog post. A few things I'd like to comment on:
1) Llama 2 comes in several sizes - apart from the 70B model you mention, there are also 7B and 13B variants, which require far less GPU memory.
2) Thanks to cloud providers, hosting these models is not prohibitively expensive. You mention Google Colab, but there are alternatives, e.g. Amazon SageMaker (a minimal deployment sketch is below): https://aws.amazon.com/blogs/machine-learning/llama-2-foundation-models-from-meta-are-now-available-in-amazon-sagemaker-jumpstart/
3) Re fine-tuning - there are alternatives to updating all of the model's weights. Parameter-efficient fine-tuning techniques like QLoRA let you train Llama 2 with a much smaller GPU memory footprint (see the second sketch below): https://www.philschmid.de/sagemaker-llama2-qlora
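To make point 2 concrete, here is a minimal sketch of deploying Llama 2 via SageMaker JumpStart, roughly along the lines of the linked AWS post. The model id and the generation parameters are assumptions on my part - check the JumpStart catalog for the current values:

```python
# Minimal sketch: deploy a Llama 2 endpoint with SageMaker JumpStart.
# model_id is an assumption -- verify it against the JumpStart catalog.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="meta-textgeneration-llama-2-7b")  # 7B variant; 13B/70B ids follow the same pattern
predictor = model.deploy()  # provisions a real-time endpoint with the model's default instance type

response = predictor.predict(
    {"inputs": "What is parameter-efficient fine-tuning?", "parameters": {"max_new_tokens": 128}},
    custom_attributes="accept_eula=true",  # Llama 2 requires accepting Meta's EULA
)
print(response)
```

Remember to delete the endpoint afterwards (predictor.delete_endpoint()) so you don't keep paying for the instance.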
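And for point 3, a rough QLoRA-style sketch using Hugging Face transformers + peft + bitsandbytes. This is not the exact code from Phil Schmid's post; the hyperparameters and target modules are illustrative assumptions:

```python
# Sketch: QLoRA-style fine-tuning setup -- 4-bit base model, small trainable LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-2-7b-hf"  # gated repo; requires accepting Meta's license on the Hub

# Load the base model in 4-bit NF4 quantization -- this is what shrinks the memory footprint.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Attach LoRA adapters; the 4-bit base weights stay frozen, only the adapters are trained.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections; an illustrative choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the total parameters

# From here you train as usual, e.g. with the transformers Trainer or TRL's SFTTrainer.
```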