Blackstone vs. Artificial Intelligence
Late one afternoon about a month ago, one of our analysts pulled up a report on a guy’s 2022 Tundra. On the oil slip, the customer mentioned that he was sharing his Blackstone report in one of his YouTube videos — which we generally really like since it gets our name out there. But then we watched the video. He had fed our report through AI…and AI got it wrong. The chatbot had incorrectly analyzed our test results.
We were able to reach out to the client and explain what was wrong, and he added the correction to his video. But what about the other cases out there, where someone uses AI to get a second opinion on their results? Well, as you can imagine, AI can be useful, but it’s not always right: it can overlook real problems, and it can cause anxiety over results that don’t warrant it.
AI’s answers — where do they come from?
AI chatbots like ChatGPT and Claude are “generative artificial intelligence,” which (as we understand it) takes all the data it can get from the internet or wherever else and uses it to generate answers, responses, or even pictures and videos for whatever prompt you enter, regardless of whether that data is right or wrong, old or new.
That’s fine in some applications, but when it comes to oil analysis, you’re better off going straight to the source: our analysts, who have been looking at oil samples from engines, transmissions, gear boxes, etc. for years. We train our analysts using knowledge from engineers, people in the oil-blending business, and our clients — institutional knowledge that’s been growing since 1985.
What can AI get wrong?
For one, the wear metals. In the video we mentioned, AI told the client that copper, lead, and tin are bearing metals in their new Tundra’s engine, but that’s wrong: that engine has aluminum bearings. Of course, sometimes AI gets it right, too, so we decided to test ChatGPT ourselves, using results that could look concerning to the untrained eye.

Figure 1 shows the results from a 6.0L Power Stroke engine. Lead is the only wear metal in bold, and it indicates some extra bearing wear. We didn’t make a big deal of that in the comments, but we did suggest they watch for low oil pressure as a precaution. Potassium was pretty high in this sample, but we didn’t call coolant contamination for two reasons: they noted that Archoil AR9100 additive was used, which has a lot of potassium and boron in it, and the other coolant marker (sodium) was low. In our comments, we called this a pretty good report overall, and lead was the only thing to check back on.
AI with our comments
Then we uploaded the report to ChatGPT, both with and without our comments. We found that AI did a pretty good job summarizing the results when the comments were included (see Figure 2).

We disagree with listing potassium under the “Mild concerns” heading, but other than that, it looks pretty good — though note that it referenced our comments in its summary.
AI without our comments
Without our comments, AI’s interpretation of the report was a completely different story: AI didn’t read the data correctly. It said iron was at 42 ppm and silicon was at 20 ppm (see Figure 3).


It also incorrectly read the potassium result when we asked about it (see Figure 4). After we corrected it, it did call the iron reading normal, but it still interpreted the potassium incorrectly and called it a “serious red flag” (see Figure 5). And after we asked whether using the Archoil additive affected anything, it said “it does NOT contain potassium as a meaningful component,” which is just wrong.

We’ve also seen cases where AI was unnecessarily alarmist, telling one BMW owner his bearings were “screaming” when only one metal was mildly out of line. It seems to be hard for AI to give a person good context or an answer to, “How worried do I need to be about this?” That understanding — having a clear idea of how much of a problem something might be — is something that only comes with time and experience, from evaluating hundreds of thousands of samples, talking with customers, and putting the results into context given the circumstances surrounding the sample, the engine’s history, and how it’s used.
The human element
Of course, humans aren’t perfect either. But we have been doing this for decades now. Even with new engines and new types of oil coming out all the time, we here at Blackstone generally have a pretty good idea of whether something looks like a problem or not.
At Blackstone, there are humans behind everything we do. You will always get a live person when you call, and we’re happy to have you speak with the analyst who wrote your report. Our analysts are looking at your report, analyzing the data, and thinking about what the numbers mean in the context of what you’ve told us about the engine and its history. We will go over your results with you, answer any questions you may have, talk about the reasons for unusual readings, and make suggestions. AI might be fun to play around with, but if you want solid, factual information, it’s better to go straight to the source and rely on our comments or call/email us with questions. You can rely on Blackstone Labs: the humans behind the analysis.