Oct 07 2023

Generative AI in 2023. An Executive Summary

In 2023, Generative AI took center stage with advancements like ChatGPT 3.5, Google's Bard, and tools like Adobe Firefly and Midjourney, revolutionizing various industries. From text to image and video, Generative AI's ability to understand prompts and generate content marked a significant shift, promising practical, scalable, and cost-efficient solutions for businesses, as demonstrated by Pierian's expertise in guiding organizations to leverage this transformative technology effectively.

The idea of Artificial Intelligence has been a staple of science and science fiction for decades, with Alan Turing inventing his “imitation game” in 1950, which became known as the Turing Test – the standard test of a machine's ability to exhibit intelligent behaviour equivalent to, or indistinguishable from, that of a human.

However, the promise of a computer passing the Turing test and showing true artificial intelligence has always been just over the horizon, seemingly a little further away than enthusiastic computer scientists have claimed.

Despite Deep Blue beating Garry Kasparov in 1997 (after losing to him in 1996), despite the Ponomariov vs Fritz game on 21 November 2005 being the last time a human beat a computer in chess competition, and despite the 2016 victory of AlphaGo, DeepMind’s Go-playing system, over world champion Lee Sedol by 4-1, artificial intelligence had not caught the attention of the business, science and creative worlds or of the general public.

Until now.

Since the launch of ChatGPT 3.5 in November 2022, the buzz has been unavoidable. Bing has integrated ChatGPT into its search engine; Google has released Bard, its own Generative AI engine, and is preparing a version of its search engine with generative AI built in; tools like Adobe Firefly and Midjourney can create images that are ready to publish; and the media is stuffed full of dire warnings that all human jobs will soon be pointless.

So, to put it bluntly, what is it?  What can it do? What advantage can it give me over the way I worked before?

Why, this time, should I care?

Defining the terms

Before we get into the details, let’s have a little refresher of the terms, what they actually mean and why someone decided they needed to split them up into incremental elements.

What is AI?

AI is a branch of computer science that uses datasets to enable problem solving. It deals with the creation of “intelligent agents”: systems that can perform tasks which would normally require human intelligence and discernment.

What is Machine learning?

Machine learning is a program or system that trains a model from data to be able to make useful predictions from new or unseen data. It enables computers to learn and adapt without explicit instructions by using statistical modelling and algorithms to analyse data and to draw conclusions from the patterns.

Machine Learning models come in two main types: Unsupervised models and Supervised models.

Unsupervised models have no labels and are about discovery. They look at the raw data to see whether it falls naturally into groups. Because there are no labels, there is no expected output to check the result against.

Supervised machine learning models have labels: the data is tagged so that the names, types or numbers can be categorised precisely. Supervised machine learning models learn from past examples to predict future values, and during training the predicted output is compared with the expected output so that the error can be reduced.
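The difference between the two types can be sketched in a few lines of Python. This is a minimal illustration only, using made-up numbers and the hypothetical labels “small” and “large”: the supervised half learns an average per label and predicts the nearest one, while the unsupervised half is given the same numbers without labels and simply looks for two natural groups.

```python
from statistics import mean

# Supervised: every training example carries a label (hypothetical toy data).
labelled = [(1.0, "small"), (1.2, "small"), (0.8, "small"),
            (5.0, "large"), (5.5, "large"), (4.7, "large")]

def train_centroids(examples):
    """Learn one average value (centroid) per label from labelled examples."""
    by_label = {}
    for value, label in examples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}

def predict(centroids, value):
    """Predict the label whose centroid is closest to the new value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

# Unsupervised: the same values with no labels; we only look for groups.
def two_means(values, iterations=10):
    """Split 1-D values into two groups with a basic k-means loop (k = 2)."""
    low, high = min(values), max(values)  # initial group centres
    for _ in range(iterations):
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        low, high = mean(group_low), mean(group_high)
    return sorted(group_low), sorted(group_high)

centroids = train_centroids(labelled)
prediction = predict(centroids, 4.9)
groups = two_means([value for value, _ in labelled])
print(prediction)  # label predicted for a new, unseen value
print(groups)      # two groups discovered without any labels
```

Note that the unsupervised half finds the groups but cannot name them; the names only exist where a human has supplied labels.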

What is Deep Learning?

Deep Learning is inspired by the brain and uses Artificial Neural Nets to process more complicated patterns than traditional Machine Learning. It uses many interconnected nodes and layers and can be trained on labelled data, unlabelled data or, in semi-supervised learning, a small amount of labelled data and a lot of unlabelled data.

Deep Learning comes in 2 model types, discriminative and generative.

Discriminative Deep Learning is used to classify or predict results and is trained on labelled data so that it can learn the relationship between the features of data points and their labels.

Generative Deep Learning generates new data which is similar to the training data it was provided with. It learns the distribution of the data and how likely a given example is, and then uses this information to create new examples.

What are Large Language Models (LLMs)?

Large Language Models are a subset of Deep Learning which can be pre-trained and then fine-tuned. They are pre-trained on very large datasets to solve common language problems like text classification, question answering, document summarisation and text generation, and can then be fine-tuned on a relatively small dataset to solve specific problems in areas like retail, finance and entertainment.

What is Generative AI?

Generative AI is a type of AI that is trained on existing content, builds a statistical model and creates new content from what it has learned. When you give Generative AI a “prompt”, it uses its statistical model to predict what the answer is and creates the content. Simply put, Generative AI learns the structure of the data and can generate new samples which are similar.

Large Language Models are a subset of Deep Learning that also intersects with Generative AI.

Large Language Models are large! While this seems obvious, it is important to understand the scale. LLMs require a huge amount of data for training, usually in the Terabyte to Petabyte range.

Here are some examples to give you an idea of what that means in real terms:

  • 10 Terabytes – data the Hubble Space Telescope produces annually
  • 24 Terabytes – average daily video upload to YouTube in 2016
  • 2 Petabytes – contents of all US academic research libraries
  • 20 Petabytes – data in the US Library of Congress
  • 200 Petabytes – estimated total of all printed material ever produced in human history

This huge amount of data is used to set the model’s parameters, which are the values that the model learns from the dataset. The number of parameters is also very large, typically in the billions to hundreds of billions and, in some cases, trillions.
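To give a feel for where parameter counts come from, here is a small sketch that counts the learnable values in a simple fully connected neural network. The layer sizes are hypothetical and chosen purely for illustration; real LLMs use a different and far larger architecture, but the principle that every connection carries a learned value is the same.

```python
def dense_parameter_count(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of nodes in each layer, input layer first.
    """
    total = 0
    for inputs, outputs in zip(layer_sizes, layer_sizes[1:]):
        total += inputs * outputs  # one weight per connection
        total += outputs           # one bias per output node
    return total

tiny_network = [4, 8, 3]           # hypothetical toy network
wider_network = [512, 2048, 512]   # hypothetical, still tiny by LLM standards

print(dense_parameter_count(tiny_network))   # 67
print(dense_parameter_count(wider_network))  # 2099712
```

Even this modest three-layer example passes two million parameters, which shows how quickly the count grows as layers get wider and deeper.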

Large Language Models are pre-trained and fine-tuned general purpose models, sufficient to solve common problems.

They have key advantages: a single model can be used for many different tasks, as the fine tuning process requires minimal field data, and performance gains can be achieved quickly by adding more data and more parameters.

In traditional Machine Learning development, a level of ML expertise is needed, you need training examples and time to train a model, computing time and hardware are required, and the machine learning is focused on minimising loss or errors. To put it another way, it tries to ensure that it isn’t wrong.

Contrast that with LLM development (using pre-trained APIs), where no ML expertise is needed, you need no training examples and do not have to train a model, and the focus is on matching the brief given in the prompt, i.e. the model tries to guess what the right answer is.

LLM Use Cases

There are several basic types of Large Language Models (LLMs) and each has specific uses for which it is most relevant.

A model that can do “everything” has, in reality, practical limitations, and task-specific tuning can make LLMs much more reliable.

Tuning is the process of adapting a model to a new area of knowledge or a custom use case by training it on new data. Fine Tuning is where you bring your own data and retrain everything in the LLM. This is a large training exercise, and the resulting model has to be hosted separately.

Because fine tuning needs both prepared data and independent hosting of the resulting model, there is an alternative family of approaches called Parameter Efficient Tuning Methods (PETM), which enable you to tune an LLM on custom data without duplicating the model and hosting it yourself. Instead, the base model is left unchanged and small add-on layers are tuned, which can be swapped in and out.
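The saving can be illustrated with some simple arithmetic. The sketch below assumes one particular style of add-on, a low-rank adapter, in which a large weight matrix is left frozen and two much smaller matrices are trained alongside it. This is only one of several parameter efficient approaches, and the matrix sizes and rank are hypothetical values chosen for illustration.

```python
def full_tuning_parameters(rows, cols):
    """Parameters updated when a whole rows x cols weight matrix is retrained."""
    return rows * cols

def low_rank_adapter_parameters(rows, cols, rank):
    """Parameters in a low-rank add-on: a rows x rank matrix plus a rank x cols matrix."""
    return rows * rank + rank * cols

rows, cols, rank = 4096, 4096, 8  # hypothetical sizes, for illustration only
full = full_tuning_parameters(rows, cols)
adapter = low_rank_adapter_parameters(rows, cols, rank)

print(full)             # 16777216 values retrained in full fine tuning
print(adapter)          # 65536 values trained in the add-on
print(full // adapter)  # 256 times fewer values to train and store
```

Because the add-on is so small, several of them can be kept for different use cases and swapped over a single shared copy of the base model.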

The LLM types are:

Generic (Raw) Language Models

These are the most basic type of large language model and simply predict the next word based on the language in the training data. They are increasingly used in areas such as Autocomplete or Keyword Classification, where the task is important but needs to be scaled to enable efficiencies.
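Next-word prediction can be demonstrated with a deliberately tiny stand-in for a language model. The sketch below simply counts which word follows which in a one-sentence toy corpus and predicts the most frequent follower. Real LLMs use neural networks trained on vastly more text, so treat this as an illustration of the idea rather than of how any production model works.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following

def predict_next(following, word):
    """Return the most frequent follower of `word`, or None if it has none."""
    candidates = following.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

corpus = "the cat sat on the mat and the cat slept on the sofa"  # toy training data
model = train_bigrams(corpus)

print(predict_next(model, "the"))  # cat
print(predict_next(model, "on"))   # the
print(predict_next(model, "dog"))  # None, because "dog" never appeared
```

The last line shows the limitation of any purely statistical predictor: it can only draw on patterns that were present in its training data.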

Instruction Tuned

Instruction Tuned Large Language Models are trained to predict a response to the instructions in the input. They are trained on natural language instructions, such as prompts and positive and negative examples, so that they perform better across a variety of tasks and generalise more accurately to tasks that they have not seen before.

Instruction tuned LLMs look to deliver what you asked them to do.

Dialog Tuned

Dialog tuned large language models are trained to engage in a dialog by predicting the next response. As opposed to instruction tuned LLMs, dialog tuned LLMs aren’t necessarily looking to answer your question immediately. Sometimes they might decide that they need to ask you a question of their own, but to do that they need to be able to understand the context of the phrasing and everything that has been said in the conversation.

This is an example given by Google of the dialog that its LaMDA model can engage in as it looks to give the user the answer they want:

Source: https://ai.googleblog.com/2022/01/lamda-towards-safe-grounded-and-high.html

What is Prompt Design?

Prompt Design is the process of creating prompts that give you the result that you are looking for, i.e.  the tailoring of the prompts for a specific task. Prompts are essentially the "recipes" that tell the AI model what to do and how to do it.

Prompts can be as simple as a few words or as complex as a paragraph and the effectiveness of a prompt depends on a number of factors, including the clarity of the instructions, the relevance of the context, and the length of the prompt.

Creating Text

Basic Prompt:

“Write a song” – Unclear, lacks context.  What is the song about? What style? What mood?

Simple Prompt:

“Write a romantic song” – More defined but still lacks a lot of information that is needed

Complicated Prompt:

“Write a 5 verse, upbeat romantic rock song about a lost love” – Much clearer instructions, output will be the correct genre, length, theme and include the specific detail that you’d like included
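The progression from basic to complicated prompt can also be expressed as a small template. The sketch below is a hypothetical helper, not part of any AI tool: it assembles a song prompt from whichever details you supply, so the more arguments you give it, the more specific the prompt becomes.

```python
def build_song_prompt(mood=None, genre=None, theme=None, verses=None):
    """Assemble a song-writing prompt from whichever details are supplied."""
    parts = ["Write a"]
    if verses:
        parts.append(f"{verses} verse,")
    descriptors = " ".join(d for d in (mood, genre) if d)
    parts.append(f"{descriptors} song" if descriptors else "song")
    if theme:
        parts.append(f"about {theme}")
    return " ".join(parts)

basic = build_song_prompt()
simple = build_song_prompt(mood="romantic")
complicated = build_song_prompt(mood="upbeat romantic", genre="rock",
                                theme="a lost love", verses=5)

print(basic)        # Write a song
print(simple)       # Write a romantic song
print(complicated)  # Write a 5 verse, upbeat romantic rock song about a lost love
```

A template like this makes it obvious which details are still missing before the prompt is sent to the model.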

Creating Images

Basic Prompt:

“draw a cat” – Unclear, lacks context.  What kind of cat? What is it doing? What mood?

Simple Prompt:

“draw a large black cat” – More defined but still lacks a lot of information that is needed

Complicated Prompt:

“a macro photograph in the style of pets posing as humans, i'm looking for an intense large black cat dressed in a tweed jacket teaching a college class with pointing stick in paw walking them through a graph on a blackboard --ar 16:9” – Much clearer instructions: the output will have the correct type of cat, be in the correct style, have the right aspect ratio and include the correct elements. While unusual results can still occur, they are much less likely, and the results would all be variations on what you need.

What is Prompt Engineering?

Prompt Engineering is the practice of developing and optimising prompts to use LLMs efficiently for a variety of applications. In other words, it is the tailoring of prompts to improve performance.

Beyond designing individual prompts, prompt engineering can also involve converting tasks into a prompt based dataset and training the language model on it, an approach known as prompt based learning.

One of the methods of improving an LLM’s ability to understand prompts and return a more accurate result is Chain-of-thought prompting (CoT).  In this, rather than just being asked to return the result, the LLM is also asked to break down the task into component parts and use the intermediate reasoning steps before giving the final answer.

LLMs still face difficulties with tasks that require logical thinking or multiple steps, such as arithmetic or commonsense reasoning. CoT prompting can help, and it also lets you see exactly where the LLM made an error.

So, to use a GCSE Maths example, “Sally had 12 apples.  She shared ¾ of them amongst her friends and bought 3 more.  How many apples does Sally now have?”

With an appropriate CoT prompt, the LLM might answer: “Sally had 12 apples originally, she shared ¾ of them amongst her friends, so she gave them 0.75 x 12 = 9 and had 12-9=3 remaining. She then bought 3 apples so she has 3+3=6 apples.”

Ensuring the LLM goes through the steps reduces the error rate and can even help a generic LLM to perform comparably with task specific, fine-tuned models.
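As a rough sketch, a CoT prompt can be produced by wrapping the question in an instruction to show the working. The wording of the instruction below is an assumption rather than a standard formula, and the second function is only there to confirm the intermediate steps in the Sally example, so that a model’s reasoning can be checked line by line.

```python
from fractions import Fraction

def chain_of_thought_prompt(question):
    """Wrap a question with an instruction to show intermediate steps."""
    return (f"{question}\n"
            "Work through the problem step by step, showing each calculation, "
            "then give the final answer on its own line.")

def solve_apples(start, shared_fraction, bought):
    """Reproduce the intermediate steps from the worked answer above."""
    given_away = start * shared_fraction  # apples shared with friends
    remaining = start - given_away        # apples left after sharing
    final = remaining + bought            # apples after buying more
    return given_away, remaining, final

question = ("Sally had 12 apples. She shared 3/4 of them amongst her friends "
            "and bought 3 more. How many apples does Sally now have?")
steps = solve_apples(12, Fraction(3, 4), 3)

print(chain_of_thought_prompt(question))
print([int(step) for step in steps])  # [9, 3, 6]
```

If the model’s answer differs from 6, comparing its stated steps with 9, 3 and 6 shows which step went wrong.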

How does Generative AI work?

Generative AI generates text, images, video, audio and more directly from a prompt, using generative models trained on existing content, and requires no domain knowledge from the user.

It is a subset of Deep learning that can process both labelled and unlabelled data and can apply a supervised, unsupervised or semi-supervised model as required.  Generative AI is typically trained on a small amount of labelled data and a huge quantity of unlabelled data.

Like Deep Learning, it is inspired by the human brain and can process much more complicated patterns than traditional machine learning.

Generative AI uses existing content as training data, learns the structure of that data and creates a statistical model. When a prompt is given to GenAI, it uses this statistical model to predict what the response could be and generates the content.

Different Gen AI Model Types

Text to Text

Text to Text generative AI is currently the most common type of GenAI model, leveraging Natural Language Processing (NLP), one of the most researched areas of AI. Text to Text Gen AI is used in text generation, classification, summarisation, translation, search, research, extraction, clustering and content editing or rewriting. In the case of Google Duplex, language generation is used alongside a speech synthesiser to create an assistant that can make appointments for you by phone, without the restaurant, doctor, hairdresser etc necessarily being aware that they weren’t arranging things with a human.

Text to Image

Text to image gen AI is another area which has really caught the public’s attention, with Adobe Firefly, Midjourney and DALL-E enabling anyone to create imagery that, until recently, required a senior graphic artist and an Adobe Creative Cloud subscription. Image generation and image editing are much easier than they have ever been, with the caveat that prompt design should be undertaken with even more care to ensure that the result matches what you had in your mind’s eye.

However, like any other generative AI tools, they can make mistakes, misunderstand or “hallucinate”, which is when the AI produces output that looks plausible but isn’t real. This is where the value of experienced human editors comes to the fore.

Text to Video/3D

While text to image generation is impressive, text to video and text to 3D have the potential to revolutionise the way we create and consume video content.

It works by first understanding the text description then employing tools like image synthesis, animation and video editing to generate a video that matches its understanding of the description.

There are a number of opportunities that this technology gives us; it can create videos that would be difficult or impossible to make in the real world, like a video of an historical event or a fictional character.  Want to see “Space Jam” but with all Disney characters?  Only the copyright lawyers are stopping you.

Text to video generative AI also gives you the opportunity to create more personalised videos for your users and customers. The generative AI could use inputs from your Customer Data Platform (CDP) to present the most compelling version of your content for each of your customers or for the customer groupings that you care most about.

The ability to rapidly iterate without incurring enormous additional production costs allows users to create more engaging, informative and visually appealing videos but there are some areas of concern.

Although the technology can produce some brilliant content, it is also at an earlier stage of development than other GenAI tools, so the quality of the content can be somewhat variable. Similarly, the less mature technology, coupled with the significant computing power required to generate the video, 3D model or environment, means that the technology can be expensive to develop and deploy.

Some of the more interesting text to video generative AI tools are Synthesia, which creates videos with lifelike AI avatars and voiceovers; ModelScope, which creates videos from prompts; and Make-A-Video, a Meta research project which creates short, high-quality video clips.

Text to Task

As with the other generative AI systems described, Text-to-Task (T2T) first uses its natural language capabilities to understand the nature of the request, but rather than creating text, speech, video or images, T2T generates a task that matches the description requested.

One of the most obvious roles for T2T generative AI is in the automation of tasks which would be time consuming or difficult to do manually such as creating a to-do list of all upcoming tasks or bulk renaming of image files.

Using T2T in this way also allows multiple tasks to be layered on one another, so that the AI can apply a level of personalisation to the tasks described above. The to-do list can use your interests or business concerns to prioritise tasks appropriately, and files can be renamed according to your own criteria.

This latter use case was recently demonstrated by Aaron Ng on Twitter, with his GPTFile able to manipulate files using natural language rather than the traditional interfaces of mouse, touchscreen and keyboard.

https://twitter.com/i/status/1663274587860393984

“here’s gptfile, a way to organize files with natural language using gpt-4. new operating system paradigms are on the horizon. repo below” – Aaron Ng (@localghost), May 29, 2023

The practical uses of Software Agents such as this are virtually endless, whether it be a next-gen operating system, Virtual Assistants or Automation.

Time consuming “housekeeping” actions, such as the optimisation of URLs and file names for SEO, can be completed in moments with systems such as these, improving site indexing and helping to improve website rankings.
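The kind of housekeeping task such an agent might carry out can be sketched as follows. This is a hypothetical illustration rather than how GPTFile or any specific tool works: it converts file names to a lower-case, hyphen-separated form and returns a plan of proposed renames without touching any files, so that a human can review the plan first.

```python
import re
from pathlib import PurePosixPath

def seo_friendly_name(filename):
    """Lower-case a file name and replace runs of other characters with hyphens."""
    path = PurePosixPath(filename)
    stem = re.sub(r"[^a-z0-9]+", "-", path.stem.lower()).strip("-")
    return f"{stem}{path.suffix.lower()}"

def plan_renames(filenames):
    """Return (old, new) pairs for names that would change; nothing is renamed."""
    return [(name, seo_friendly_name(name)) for name in filenames
            if seo_friendly_name(name) != name]

files = ["Summer Sale_Banner (FINAL).JPG", "IMG 0001.png", "about-us.html"]
plan = plan_renames(files)

for old, new in plan:
    print(f"{old} -> {new}")
# Summer Sale_Banner (FINAL).JPG -> summer-sale-banner-final.jpg
# IMG 0001.png -> img-0001.png
```

Keeping the planning step separate from the renaming step is a deliberate choice: it lets the agent’s proposal be checked before anything on the site changes.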

Applications for Generative AI

We’ve only started to scratch the surface of what applications are possible with Generative AI and new ideas and concepts are being developed all the time.

Below are some of the areas where Generative AI is already being used to advance human creativity and to democratise tools that were previously the preserve of those with significant computing and financial resources.

Text to Text

  • Marketing content
  • Sales Emails
  • Support Chat/email
  • General writing
  • Note taking

Text to Code

  • Code Generation
  • Code Documentation
  • Text to SQL
  • Web App Builders

Text to Image

  • Image Generation
  • Consumer/Social
  • Media/Ads
  • Design

Text to Speech

  • Voice Synthesis
  • Dubbing
  • Audio Localisation

Text to Video

  • Video Generation
  • Video Editing

Text to 3D

  • 3D Models
  • 3D Scenes

Other areas where GenAI is making advances

  • Gaming
  • RPA
  • Music
  • Audio
  • Biology
  • Chemistry

If you are a brand looking to apply generative AI in a practical, scalable, cost efficient manner, Pierian’s team of experts can guide you through the process, ensuring you add value to your existing solutions. We understand that every organisation is unique and we tailor our strategies accordingly to bring commercial value to your business.

Contact Pierian today to see how we can help with adding Generative AI to your processes, services and operations.