I gave ChatGPT a Professional Geologist exam. It passed*
But it's still a long way from actual geospatial reasoning
Less than a year after its release, we have seen ChatGPT pass various exams: not just beating college students at year-end tests but even passing professional licensure exams¹. Recently, OpenAI added the ability to pass images to ChatGPT, allowing us to see how it does on more visual tasks.
As a geologist working in the tech/AI space, my mind immediately leapt to how ChatGPT would perform on a geologic exam. Some items, like image recognition of minerals, seem right in the wheelhouse of new AIs, while others, like the interpretation of a geologic map, require higher-order spatial reasoning. To test out these capabilities, I gave ChatGPT a professional geologist exam. Surprisingly, it passed, though that comes with MANY caveats.
The Exam
Geologists around the country receive “Professional Geologist” (aka “PG”) certifications that allow them to conduct official, regulated work like analyzing the soil and bedrock around an environmental contaminant². Each state board has its own set of requirements, but they mostly rely on the Association of State Boards of Geology (ASBOG) exam for the technical portion of the licensure.
Luckily for us, the ASBOG provides sample questions in its online preparation handbook. These came to a total of 71: not as many as the real exam, but enough for us to get a sense of how ChatGPT will do.
The Results
I was blown away: overall, ChatGPT had a very strong score, getting 84.5% of questions correct. Unfortunately, the ASBOG does not publish the scores required to pass, using instead a “minimum competency” threshold and weighting different sections differently, none of which it makes public. However, by doing some googling and Reddit spelunking, I found scores in the 65-70% range quoted as passing, so my estimate is that 84.5% is high enough to pass.
HOWEVER — ChatGPT did not do nearly as well on the questions that had a visual component, as you can see from the chart below: 88% success rate on non-visual questions, 64% on visual questions.
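The figures above are percentages only; the per-category question counts are not given. As a rough sanity check, the sketch below searches for whole-number splits of the 71 questions that reproduce all three reported figures. It assumes the percentages are rounded (84.5% to one decimal, 88% and 64% to whole numbers), and the function names are illustrative.

```python
TOTAL_QUESTIONS = 71   # number of sample questions, from the post
OVERALL_PCT = 84.5     # reported overall score (assumed rounded to one decimal)
NON_VISUAL_PCT = 88    # reported (assumed rounded to a whole percent)
VISUAL_PCT = 64        # reported (assumed rounded to a whole percent)


def matches(correct, total, reported_pct, tolerance):
    """True if correct/total is within `tolerance` points of the reported percentage."""
    return abs(100 * correct / total - reported_pct) < tolerance


def consistent_splits():
    """Return (visual_total, visual_correct, non_visual_total, non_visual_correct)
    tuples that reproduce all three reported percentages."""
    splits = []
    for total_correct in range(TOTAL_QUESTIONS + 1):
        if not matches(total_correct, TOTAL_QUESTIONS, OVERALL_PCT, 0.05):
            continue
        for visual_total in range(1, TOTAL_QUESTIONS):
            non_visual_total = TOTAL_QUESTIONS - visual_total
            for visual_correct in range(visual_total + 1):
                non_visual_correct = total_correct - visual_correct
                if not 0 <= non_visual_correct <= non_visual_total:
                    continue
                if matches(visual_correct, visual_total, VISUAL_PCT, 0.5) and matches(
                    non_visual_correct, non_visual_total, NON_VISUAL_PCT, 0.5
                ):
                    splits.append(
                        (visual_total, visual_correct, non_visual_total, non_visual_correct)
                    )
    return splits


if __name__ == "__main__":
    for split in consistent_splits():
        print(split)
```

Under those rounding assumptions, the only split this search finds is 7 of 11 visual questions and 53 of 60 non-visual questions correct. That is an inference from the rounded figures, not a count taken from the exam run itself.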
If we dig a little deeper, we can see some serious problems. This question was not on the sample exam I ended up using, but it shows ChatGPT’s challenges. The diagram below shows the “law of cross-cutting relationships”: if one geologic feature is “cut” by another, the feature which is “cut” is older and the feature that does the “cutting” is younger. ChatGPT knows this from a textbook perspective, but whiffs on actually being able to identify which features cut each other:
In this case, Dike 1 clearly cuts Faults A & B, rather than being cut by them (it has no separation or offset caused by the movement of the faults). Dike 2 is clearly not continuous across Fault B, so Fault B cuts Dike 2 and is therefore younger. However, ChatGPT completely gets these wrong—literally a geology 101 concept.
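The rule itself is mechanical once the cutting relationships have been read correctly off the diagram, which is the step ChatGPT missed. As a minimal sketch, each "X cuts Y" observation described above can be encoded as "Y is older than X" and then sorted topologically. The feature names come from the diagram description; the dictionary layout and function name are illustrative, and `graphlib` needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Each key is a feature that does the cutting; the set holds the features it cuts.
# Per the law of cross-cutting relationships, every cut feature is older than its cutter.
CUTS = {
    "Dike 1": {"Fault A", "Fault B"},  # Dike 1 is not offset by either fault
    "Fault B": {"Dike 2"},             # Dike 2 is not continuous across Fault B
}


def relative_ages(cuts):
    """Return the features in one valid oldest-to-youngest order.

    TopologicalSorter treats each set as the key's predecessors, so the
    cut (older) features always come out before the feature that cuts them.
    """
    return list(TopologicalSorter(cuts).static_order())


if __name__ == "__main__":
    print(" -> ".join(relative_ages(CUTS)))
```

Fault A is only constrained to be older than Dike 1, so its position relative to Dike 2 and Fault B can vary between valid orderings. Dike 2, then Fault B, then Dike 1 always appear in that sequence.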
Let’s pick another question. In this one, the exam provides a geologic map and asks you to interpret the correct stratigraphic column.
In this case, ChatGPT struggles to even just read the correct order of letters and reproduce them in the text. For instance, for answer “A”, ChatGPT leaves off the first “X”. For answer “B”, it swaps the order of “Z” and “X”, causing answers B & C to be equivalent. If ChatGPT can’t read one order of letters from an image and reproduce it in text, you have to be doubtful that it can handle geologic unconformities or faulting.
Wild speculation
Clearly, even if ChatGPT can “pass” a professional geologist exam, it’s still a loooong ways from competing with geologists in the workplace. Personally, I have found it useful for catching up on “textbook” concepts that I haven’t studied since undergrad, but that is not the same as augmenting complex spatial reasoning.
Some questions regarding building to a “generalized” geological intelligence:
Is the dataset there? Much of the corpus of geologic data and interpretation is not on the internet, sitting instead on company- or organization-specific servers. Many countries have strict data residency requirements that make it literally illegal to put their geologic data on the internet, to say nothing of the commercial risks associated with publishing your national datasets. And then so much of the data is sitting in file cabinets or salt storage caverns, probably never to see the light of day…
Is tuning the problem? For instance, ChatGPT 3.5 Turbo without the RLHF (reinforcement learning from human feedback) performs MUCH better at chess than ChatGPT 4.0 with RLHF. Worth additional investigation from someone with more time than me!
Can we accomplish generalized spatial reasoning… at all? Generalized spatial reasoning is a very different problem than text or image prediction. ChatGPT does well on some logical questions but totally whiffs on others. Geology has some of the most complex spatial problems in any field, so we will likely see progress on simpler tasks from other domains first.
I’d love it if other geologists working in this area have any experiences to share. I know there have been some really nice “AI” successes in subspecialties like petrophysical log analysis, seismic data processing, and grain identification, but I haven’t seen any “higher-order” spatial thinking successes yet. Prove me wrong!
¹ See https://engineering.uiowa.edu/news-all/2023/04/ui-engineering-study-finds-chatgpt-could-pass-exam-needed-become-licensed-engineer and https://www.bestcolleges.com/news/what-exams-has-chatgpt-passed/
² I never ended up pursuing my Professional Geologist license, because it’s not really valued in oil and gas, at least in Houston.
Great post Ted. Here's something I'd like to see ChatGPT do before I start worrying about my work in drill-in fluids. At the highest level, which point I give myself credit for attaining, formation damage assessments are done to determine the amount of damage done to rock core, mostly in open hole completions. Damage mechanisms can be very elusive, and the care and expertise of the person doing the analysis are definitely called upon to figure out a way forward that qualifies a fluid to drill the interval with minimal invasion or damage. I absolutely defy Chat to perform this task. Cheers!
Ha! Yeah that is always fun 😆