With prompts, one word can make all the difference. If you don’t articulate your incantation just right, the model may misunderstand and return something unexpected. (Lowering the temperature helps, but only so much.)
Conceptual conflicts are even stranger: a model may already have its own idea of how to do something, and it may munge your results or override them entirely.
For example, if a model already has a way to handle some concept (e.g., math) and you provide it with a custom tool that has an overlapping name or description (e.g., a custom “add” function), the model may take over the work you intended your tool to handle. It may pre-process the data and pass incorrect arguments to your function, override your answers, or ignore your tool altogether.
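To make that concrete, here is a minimal sketch of two tool definitions in an OpenAI-style function-calling schema (the names, descriptions, and parameters are hypothetical, not from any particular codebase): the first overlaps with the model’s built-in notion of arithmetic, while the second is worded so the model has less room to “help.”

```python
# Overlapping: "add" collides with the model's built-in idea of arithmetic,
# so the model may compute the sum itself or mangle the arguments it passes.
overlapping_tool = {
    "type": "function",
    "function": {
        "name": "add",
        "description": "Add two numbers.",
        "parameters": {
            "type": "object",
            "properties": {
                "a": {"type": "number"},
                "b": {"type": "number"},
            },
            "required": ["a", "b"],
        },
    },
}

# Distinct: the name and description make clear this is not generic math,
# which gives the model less reason to substitute its own answer.
distinct_tool = {
    "type": "function",
    "function": {
        "name": "ledger_append_amount",
        "description": (
            "Append an amount (in cents) to the company ledger and return "
            "the new running total. Always call this tool instead of "
            "computing totals yourself."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "amount_cents": {"type": "integer"},
            },
            "required": ["amount_cents"],
        },
    },
}
```

The difference is not the schema but the wording: a name and description that clearly point outside the model’s own competence make it likelier that the call actually reaches your code.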
How do you know when your model is actually doing what you asked it to do?
How do you know when you have a poorly articulated prompt?
How do you find the perfect articulation?
(Should you?)