Blackstone vs. Artificial Intelligence

Late one afternoon about a month ago, one of our analysts pulled up a report on a guy’s 2022 Tundra. On the oil slip, the customer mentioned that he was sharing his Blackstone report in one of his YouTube videos — which we generally really like since it gets our name out there. But then we watched the video. He had fed our report through AI…and AI got it wrong. The chatbot had incorrectly analyzed our test results.

We were able to reach out to the client and explain what was wrong, and they added the correction to their video. But what about the other cases out there, where someone uses AI to get a second opinion on their results? Well, as you can imagine, AI can be useful, but it’s not always right: it can overlook real problems, and it can cause anxiety over results that don’t warrant it.

AI’s answers — where do they come from?
AI chatbots like ChatGPT and Claude are “generative artificial intelligence,” which (as we understand it) takes all the data it can get from the internet or wherever else and uses it to generate answers, responses, or even pictures and videos for whatever prompt you enter, whether that data is right or wrong, old or new.

That’s fine in some applications, but when it comes to oil analysis, you’re better off going straight to the source: our analysts, who have been looking at oil samples from engines, transmissions, gear boxes, etc. for years. We train our analysts using knowledge from engineers, people in the oil-blending business, and our clients — institutional knowledge that’s been growing since 1985.

What can AI get wrong?
For one, the wear metals. In the video we mentioned, AI told the client that copper, lead, and tin are bearing metals in their new Tundra’s engine, but that’s wrong: that engine has aluminum bearings. Of course, sometimes AI gets it right, too, so we decided to test ChatGPT ourselves, using results that could look concerning to the untrained eye.

Report showing lead at 8 ppm, which we fed through AI.
Figure 1 shows the results from a 6.0L Power Stroke engine. Lead is the only wear metal in bold, and it indicates some extra bearing wear. We didn’t make a big deal of that in the comments, but we did suggest they watch for low oil pressure as a precaution. Potassium was pretty high in this sample, but we didn’t call coolant contamination for two reasons: they noted that Archoil AR9100 additive was used, which has a lot of potassium and boron in it, and the other coolant marker (sodium) was low. In our comments, we called this a pretty good report overall, and lead was the only thing to check back on.
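The potassium reasoning above can be sketched in a few lines of code. This is a simplified illustration only: the `Sample` type, the `coolant_suspected` function, and both threshold numbers are hypothetical and are not Blackstone's actual limits or process. The point is that a high reading is judged together with the rest of the report and with what the customer said about the oil, not on its own.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    potassium_ppm: float
    sodium_ppm: float
    # Additives the customer listed on the oil slip.
    additives: set[str] = field(default_factory=set)


# Hypothetical cutoffs for illustration only; not real lab limits.
POTASSIUM_FLAG_PPM = 10.0
SODIUM_FLAG_PPM = 20.0

# The article notes that this additive contains a lot of potassium and boron.
POTASSIUM_BEARING_ADDITIVES = {"Archoil AR9100"}


def coolant_suspected(sample: Sample) -> bool:
    """Flag possible coolant only when high potassium has no other explanation."""
    high_potassium = sample.potassium_ppm > POTASSIUM_FLAG_PPM
    high_sodium = sample.sodium_ppm > SODIUM_FLAG_PPM
    additive_explains_potassium = bool(
        sample.additives & POTASSIUM_BEARING_ADDITIVES
    )

    if not high_potassium:
        return False
    # High potassium, a known potassium-bearing additive, and low sodium:
    # the additive is the likelier source, so coolant is not called.
    if additive_explains_potassium and not high_sodium:
        return False
    return True
```

With made-up readings, `coolant_suspected(Sample(50, 3, {"Archoil AR9100"}))` is not flagged, while the same numbers with no additive listed are. A chatbot that sees only the numbers, or that wrongly believes the additive contains no potassium, has no way to make that distinction.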

AI with our comments
Then we uploaded the report to ChatGPT, both with and without our comments. We found that AI did a pretty good job summarizing the results when the comments were included (see Figure 2).

We disagree with listing potassium under the “Mild concerns” heading, but other than that, it looks pretty good — though note that it referenced our comments in its summary.

AI without our comments
Without our comments, AI’s interpretation of the report was a completely different story: AI didn’t read the data correctly. It said iron was at 42 ppm and silicon was at 20 ppm (see Figure 3).
Excerpt from AI's comments about our report.
Excerpt from AI's interpretation of our report, where it's getting the facts wrong.
It also incorrectly read the potassium result when we asked about it (see Figure 4). After we corrected it, it did call the iron reading normal, but it still interpreted the potassium incorrectly and called it a “serious red flag” (see Figure 5). And when we asked whether using the Archoil additive affected anything, it said “it does NOT contain potassium as a meaningful component,” which is just wrong.

More wrong stuff from the robot!

We’ve also seen cases where AI was unnecessarily alarmist, telling one BMW owner his bearings were “screaming” when only one metal was mildly out of line. It seems to be hard for AI to give a person good context or an answer to, “How worried do I need to be about this?” That understanding — having a clear idea of how much of a problem something might be — is something that only comes with time and experience, from evaluating hundreds of thousands of samples, talking with customers, and putting the results into context given the circumstances surrounding the sample, the engine’s history, and how it’s used.

The human element
Of course, humans aren’t perfect either. But we have been doing this for decades now. Even with new engines and new types of oil coming out all the time, we here at Blackstone generally have a pretty good idea of whether something looks like a problem or not.

At Blackstone, there are humans behind everything we do. You will always get a live person when you call, and we’re happy to have you speak with the analyst who wrote your report. Our analysts are looking at your report, analyzing the data, and thinking about what the numbers mean in the context of what you’ve told us about the engine and its history. We will go over your results with you, answer any questions you may have, talk about the reasons for unusual readings, and make suggestions. AI might be fun to play around with, but if you want solid, factual information, it’s better to go straight to the source and rely on our comments or call/email us with questions. You can rely on Blackstone Labs: the humans behind the analysis.

 

About the Author

Kristin deviated from the family flock by attending Indiana University, earning an English degree. She worked as an editor and writer in Colorado and Michigan before the siren call of Blackstone brought her back to Indiana. Kristin started at Blackstone in 2002 and has since learned to love the intoxicating world of oil analysis. When she’s not working on the website, creating newsletters, doing HR stuff, or writing reports, Kristin enjoys running, swimming, gardening, and working on visiting all 50 states with her husband and kids.
