• Question: how does ChatGPT work?

    Asked by menu498bus on 26 Jan 2024.
    • Fraser Smith answered on 26 Jan 2024:


      Great question. The AI system underlying ChatGPT is based on learning patterns in human language. It’s trained on an enormous set of text data (books, webpages, blogs – more or less whatever is on the internet), and its job is to predict the next word in sentences. After this, it’s then trained to produce sensible answers to the questions humans give it (this part uses a technique called reinforcement learning from human feedback).
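
      To make “predict the next word” concrete, here’s a toy sketch in Python. It just counts which word follows which in a tiny pretend training text and predicts the most common follower. ChatGPT learns these patterns with a huge neural network over billions of sentences rather than simple counts, but the core job is the same:

          from collections import Counter, defaultdict

          # Tiny pretend "training corpus". Real systems train on a huge
          # slice of the internet, not ten words.
          corpus = "the cat sat on the mat the cat ate the fish".split()

          # Count which word follows which.
          following = defaultdict(Counter)
          for word, next_word in zip(corpus, corpus[1:]):
              following[word][next_word] += 1

          def predict_next(word):
              # Predict the word that most often followed `word` in training.
              return following[word].most_common(1)[0][0]

          print(predict_next("the"))  # -> 'cat' (it followed 'the' twice)
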
      It’s important to realize that it gives a kind of average answer based on all its training material, so it can quite easily produce incorrect information (known as hallucinations), although work is well underway to improve this aspect.

    • Carl Peter Robinson answered on 26 Jan 2024:


      The thing is, these language models have been around for a little while now (look up BERT, GPT-2, and GPT-3 for starters). They’ve slowly been evolving through different model architectures, from things called recurrent neural networks (RNNs) to the new kid on the block: transformer-based models.

      Regarding ChatGPT, the model architecture is proprietary, meaning OpenAI haven’t released it. So we don’t know what that architecture actually looks like; we can only speculate based on our knowledge of similar systems and on evidence from using ChatGPT.

      Now, the biggest thing that makes ChatGPT so impressive is that it has been trained on a vast amount of data (whether that data was provided willingly or scraped from the web without permission from its authors is a debate for a different question). For a model to perform its task effectively, it requires training, and training a model on such a huge dataset requires a vast amount of hardware. This is where ChatGPT has been “lucky”: thanks to funding from Microsoft, OpenAI were able to put together the required hardware set-up (lots of NVIDIA A100 GPUs, complemented by other computer hardware) for a feasible cost. This enabled OpenAI to undertake the training and evaluation process needed to optimise the ChatGPT model successfully.

      The training process used a self-supervised learning method, a form of unsupervised learning where the “labels” come free with the data because the model predicts words that are hidden from it (the example in the first Royal Institution Christmas Lecture on unsupervised learning is really good). This training process includes methods and techniques that map the words in the dataset to numeric values (called vectors). The mapping is done such that the vectors for words with similar meanings or contexts are placed close to each other on the map. When I say “map”, imagine a graph that you draw in your maths class, with an X and a Y axis. Only this map has lots and lots of axes, creating what we call dimensions. Having all these dimensions is what enables the system to map all of the data in the dataset effectively.
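
      To picture that word-vector “map”, here’s a small Python sketch. The three-dimensional vectors below are made up purely for illustration (real models learn their vectors during training and use hundreds or thousands of dimensions), but they show how closeness between vectors captures similarity in meaning:

          import numpy as np

          # Made-up 3-dimensional word vectors; a real model learns these
          # during training and uses far more dimensions.
          vectors = {
              "cat": np.array([0.9, 0.1, 0.0]),
              "dog": np.array([0.8, 0.2, 0.1]),
              "car": np.array([0.0, 0.9, 0.8]),
          }

          def cosine_similarity(a, b):
              # Close to 1 means the vectors point the same way (similar words).
              return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

          print(cosine_similarity(vectors["cat"], vectors["dog"]))  # ~0.98: close on the map
          print(cosine_similarity(vectors["cat"], vectors["car"]))  # ~0.08: far apart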

      Regarding the ChatGPT model itself, it uses the transformer architecture as part of its model, to predict the sequence of words it gives in response to the prompt (your question) it receives as input. Because of the size of the model and the hardware it runs on, it can keep a long history of words in view (its context window), making prediction of the next word much easier. Additionally, it uses a mechanism called “self-attention” to weight the words: this assigns a level of importance to each word in the text that is being put together as output, helping the prediction process.
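
      If you’re curious what “self-attention” looks like in code, here’s a minimal Python sketch. It skips the learned query/key/value projections and the multiple attention heads a real transformer uses, and the word vectors are random stand-ins, but it shows the core trick: every word scores every other word for relevance, and each word’s updated representation is a weighted mix of the others:

          import numpy as np

          def softmax(x):
              # Turn raw scores into weights that are positive and sum to 1.
              e = np.exp(x - x.max(axis=-1, keepdims=True))
              return e / e.sum(axis=-1, keepdims=True)

          def self_attention(X):
              d = X.shape[-1]
              scores = X @ X.T / np.sqrt(d)  # how relevant each word is to every other word
              weights = softmax(scores)      # the "importance" weighting, one row per word
              return weights @ X             # mix the word vectors using those weights

          rng = np.random.default_rng(0)
          X = rng.normal(size=(4, 8))        # 4 words, each an 8-dimensional vector (random stand-ins)
          print(self_attention(X).shape)     # (4, 8): one updated vector per word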

      There is a lot more to it, but I feel my answer has become very long, as usual! Hope that helps a bit.
