Generative AI – the intern you can’t trust

[Image: a computer with "here to help" displayed on the screen.]
Reading Time: 2 minutes

In my experience, interns are typically energetic, eager to please and capable of producing great results – a bit like generative AI. However, sometimes generative AI's enthusiasm gets the better of it and the output is not quite right. Here are some examples:

Prompt: Why is Microsoft Word a great software development tool?
Response: …Track Changes and Comments: These features can be helpful when multiple team members are reviewing and editing the same document…
Critique: Questionable accuracy.

Prompt: What are the largest 5 planets in the solar system. Return the results as a JSON object with attributes “mass”, “diameter”, “meanDistanceFromSun”.
Response: Sure, here is the information for the largest five planets in the solar system:
Jupiter:
{
“mass”: “1.898 x 10^27 kg”,
“diameter”: “142,984 km”,
“meanDistanceFromSun”: “778.5 million km”
}
Saturn:
{
“mass”: “5.683 x 10^26 kg”,
“diameter”: “120,536 km”,
“meanDistanceFromSun”: “1.429 billion km”
}
Critique: The response wasn't JSON formatted – it just had snippets of JSON.
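That second critique can be checked programmatically: feeding the whole reply to a strict JSON parser fails, while a single well-formed object parses cleanly. A minimal sketch in Python (the `reply` string is an abbreviated version of the response above):

```python
import json

# Abbreviated model reply: prose mixed with JSON snippets,
# not one well-formed JSON document
reply = '''Sure, here is the information for the largest five planets:
Jupiter:
{"mass": "1.898 x 10^27 kg", "diameter": "142,984 km"}
'''

def is_valid_json(text: str) -> bool:
    """Return True only if the entire text is one well-formed JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(is_valid_json(reply))  # False: the surrounding prose breaks parsing
print(is_valid_json('{"Jupiter": {"diameter": "142,984 km"}}'))  # True
```

A check like this is exactly the kind of signal an app can act on automatically, for example by retrying the request or falling back to a stricter prompt.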

So what can you do?

Provide context: When generative AI is requested directly by a user, the user has context that helps them interpret the response. For similar reasons, apps should provide some level of transparency about the use of AI to help the user interpret the results. See the transparency section of the responsible AI article for further details.

Reduce the temperature: When calling a generative AI API, look for parameters that control how creative the AI should be. Experiment with the effects of changing the parameters one at a time so that you gain a better appreciation of each parameter's effect. For the ChatGPT API, refer to the temperature and top_p parameters. This Open AI plugin development post shares the experience of others experimenting with parameters. Adding a feedback mechanism that captures some measure of each call's success against the parameters used will allow you to tune the parameters over time. The feedback could be user oriented, such as a prompt asking "did this help?". Better still is feedback that can be gathered programmatically, as in the second example above, where the response should be valid JSON.
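The tuning loop might look like the sketch below. Here `call_model` is a hypothetical stand-in for a real chat-completion API call – it simulates the (assumed) tendency of lower temperatures to produce well-formed output – and the feedback signal is simply "did the response parse as JSON?":

```python
import json
import random

def call_model(prompt: str, temperature: float) -> str:
    """Hypothetical stand-in for a real chat-completion API call.
    Simulates lower temperatures yielding valid JSON more often."""
    if random.random() < (1.0 - temperature):
        return '{"planet": "Jupiter"}'
    return 'Sure! Here is your data: {"planet": "Jupiter"}'

def json_feedback(response: str) -> bool:
    """Programmatic feedback: did the call produce one valid JSON document?"""
    try:
        json.loads(response)
        return True
    except json.JSONDecodeError:
        return False

def success_rate(temperature: float, trials: int = 200) -> float:
    """Record feedback against a parameter value over many calls."""
    wins = sum(json_feedback(call_model("list planets", temperature))
               for _ in range(trials))
    return wins / trials

# Vary one parameter at a time and compare the captured feedback
for t in (0.0, 0.5, 1.0):
    print(f"temperature={t}: {success_rate(t):.2f} valid-JSON rate")
```

In a real app the same loop would aggregate feedback across production calls, letting you settle on the parameter values that work best for your workload.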

Prompt design: Crafting better prompts will also lead to improved accuracy and consistency. Avoid ambiguity in your prompts and be specific about what outputs should and shouldn't be provided. In the planets example above, inserting "single" into the prompt so that it becomes "Return the results as a single JSON object" improved the likelihood of ChatGPT returning just a JSON-formatted response, rather than JSON snippets mixed into textual commentary.

If you need help getting started with the creation of generative AI apps, you may like to read The basics of creating a Forge AI app.