In my latest article, I put five of the most popular AI tools through a South African legal obstacle course to see how they perform in reasoning through real legal scenarios. The idea was simple: can generative AI — not trained specifically on South African law — reason like a local lawyer?
The results were illuminating. Some models impressed. Others, well, should probably be held in contempt of court.
The study covered three scenarios drawn from private law:
1. What happens when a dachshund bites someone?
2. Do you have to pay if you refuse to take your bakkie back after it’s been serviced?
3. Who’s liable when a veldfire gets out of control?
This post focuses on that third scenario. If you’d like to see how they handled the sausage dog and the car dispute, you’ll find all the details (and comparative scores) in the full article here.
Setting the scene: fire in the Midlands
Imagine Jacob, a cattle farmer in the KZN Midlands, decides to burn dry grass on his farm to clear it for new growth. His neighbour Maria has warned him, repeatedly, about the risk — the wind tends to carry embers across the fence. Jacob proceeds anyway. Predictably, the fire jumps the fence, damages Maria’s grazing land, and injures two of her prized Nguni cattle.
Maria demands compensation. Jacob says it was an accident.
Now, this is no hypothetical exam question — it’s a legal minefield, blending common law delict and the National Veld and Forest Fire Act 101 of 1998 (NVFFA).
What the law requires (and what AI needs to spot)
At common law, this is a textbook case of actio legis Aquiliae — delictual liability for patrimonial loss. The plaintiff must show five elements: conduct, wrongfulness, fault, causation, and damage.
But the NVFFA raises the stakes. Under section 34(1), there’s a statutory presumption of negligence if a veldfire spreads from one property and causes harm. This flips the burden: unless Jacob can show he took reasonable precautions, he is presumed negligent. That statutory overlay is not optional — it defines how a South African court would approach the matter.
In this scenario, I was especially interested to see which AI models could:
- Identify actio legis Aquiliae as the correct cause of action,
- Recognise the relevance of the NVFFA,
- Incorporate the statutory presumption into their analysis,
- And apply it coherently to the facts.
Spoiler: only one of them did all of this.
Claude — the star pupil
Claude not only identified the actio legis Aquiliae correctly, but it also engaged with the NVFFA in a legally accurate way. It recognised the statutory presumption of negligence, discussed section 12(1) (the duty to maintain firebreaks), and even flagged whether Jacob belonged to a fire protection association.
Claude’s analysis wasn’t just correct — it was legally structured, cited real case law, and anticipated counterarguments. This is the only model I would even consider letting draft a first-year exam answer — let alone a client memo.
ChatGPT — clever, but forgot about statute law
ChatGPT correctly identified the delictual claim and applied the five elements sensibly, even citing Kruger v Coetzee appropriately for the test of negligence. But it missed the NVFFA entirely. That omission significantly weakens the analysis, as it ignores the shift in evidentiary burden and the statutory duty of care.
Still, its output was coherent, reasonably structured, and persuasive — as long as the question doesn't turn on a statute, or you explicitly prompt it to consider one.
DeepSeek — close, but misses the mark
DeepSeek followed a similar pattern to ChatGPT: good grasp of delictual structure, but no engagement with the statute. It also relied on real case law, though its application of legal principles was occasionally vague. Competent, but not reliable if the issue involves anything beyond textbook delict.
Grok and Gemini — not ready for the bar
Both Grok and Gemini performed poorly. Grok referred to a “delict of negligence” — a fundamental misunderstanding of how South African law frames fault. Neither model identified actio legis Aquiliae. Neither mentioned the NVFFA. Case law citations were weak or missing. These models felt like overseas exchange students bluffing their way through a South African law tutorial. Politely put: not helpful.
What this tells us about AI in legal research
The veldfire scenario offers a revealing stress test for generative AI. It shows that while large models can replicate form, their depth of legal reasoning varies wildly — especially when statutory law modifies common-law doctrine.
A few takeaways:
- Don’t assume AI knows the law. Sometimes it does, sometimes it doesn’t, and sometimes its knowledge is only partial.
- Citation ≠ comprehension. Some models cite real cases but don’t understand them; others hallucinate entirely.
- Structured reasoning is rare. Only one of the five models showed a true grasp of how common law, statute, and fact must interact in legal analysis.
Want more?
The full article contains all three scenarios, a comparative table of how the five models performed across seven legal criteria, and more detailed observations about hallucinated case law, doctrinal confusion, and where AI shows promise (and where it absolutely doesn’t).