What Is AI Snake Oil?

Posted by Graeme Woods, Global Business Analyst

An upcoming book called AI Snake Oil: What Artificial Intelligence Can Do, What It Can't, and How to Tell the Difference by two Princeton computer scientists, Arvind Narayanan and Sayash Kapoor, outlines some of the issues with AI hype.

The term "snake oil" describes any product, idea, or promise that is promoted as a miracle solution but is fraudulent or ineffective. AI becomes snake oil when it is used to sell a product that is claimed to work but doesn't: it replaces a person's judgement with that of a model that is poorly designed or tested, that carries biases from a poor choice of training data, or that is simply inaccurate.

This book is already gaining interest, but the risk is that many people will want a simple answer, won’t read the book and will just take away the idea “AI = snake oil”. This is not what the authors present. Mature, well-designed AI can be very useful and is often embedded in the products we use every day, such as grammar checkers. Computer vision AI is widely used in manufacturing and security and is another example of mature AI technology.

The authors present a nuanced view and don’t discredit AI but discuss how it needs to be used carefully and with proper oversight. They explain that AI is an umbrella term covering several technologies that are at different stages of maturity and can be used either appropriately or inappropriately.

One example highlighted in the book as snake oil AI is the predictive models used to estimate recidivism among prisoners and inform parole decisions. Firstly, these models are not properly tested and are often biased and not accurate enough to be useful. Secondly, the model replaces human judgement that should consider broad context and guidelines. The model is used to make decisions that potentially have a high cost to individuals and the community if they are incorrect.

Understanding AI

To understand how AI should and shouldn’t be used, it is important to understand how AI works.

Some AI is stochastic, meaning it can produce different results for the same input. This is due to random initialization or to probabilistic methods used during training or when generating the output. This behavior can be useful for creative tasks.

For AI that outputs a decision, most people want a deterministic rather than a stochastic result. This means that for a given input, they want to always have the same output.

In addition, AI learns from data rather than from predetermined rules, so it may not be 100% accurate. Standard machine learning metrics such as accuracy, precision, recall, F1 score and AUC can give an indication of how the solution will perform in real-world usage.
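
As a rough illustration, the sketch below computes these standard metrics with scikit-learn. The labels and scores are made-up placeholders, not results from any real system.

```python
# Minimal sketch: computing standard classification metrics with scikit-learn.
# The labels and scores below are invented placeholders for illustration only.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                    # ground truth from a held-out test set
y_score = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.8, 0.3]    # model's predicted probabilities
y_pred  = [1 if s >= 0.5 else 0 for s in y_score]     # hard decisions at a 0.5 threshold

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_score))
```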

Before using the results of an AI, the accuracy should be measured. If the accuracy is less than that of a person and the result is important (it affects someone’s life or livelihood), a person needs to make the final, informed decision, considering the context, guidelines, regulations, best practices and all other facts.

Even highly accurate AI models should be regularly reviewed by a human to determine if the model is showing drift. Human review is also valuable to check edge cases and ensure that AI is being used ethically.
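
One simple thing such a review loop might track is shown in the sketch below: accuracy on recently human-reviewed cases compared against a baseline measured at deployment. The baseline, tolerance and sample data are illustrative assumptions, not a prescribed monitoring scheme.

```python
# Minimal sketch of a drift check: compare accuracy on recent, human-reviewed
# samples against the accuracy measured when the model was deployed.
# The threshold, window size and data below are illustrative assumptions.
from sklearn.metrics import accuracy_score

BASELINE_ACCURACY = 0.95   # accuracy measured on the original validation set
DRIFT_TOLERANCE   = 0.05   # allowed drop before humans investigate

def check_for_drift(recent_labels, recent_predictions):
    """Flag the model for review if accuracy on recent, human-labelled
    samples falls noticeably below the deployment baseline."""
    recent_accuracy = accuracy_score(recent_labels, recent_predictions)
    drifted = recent_accuracy < BASELINE_ACCURACY - DRIFT_TOLERANCE
    return recent_accuracy, drifted

# Example with placeholder data: 20 recent cases reviewed by a person.
labels      = [1] * 10 + [0] * 10
predictions = [1] * 8 + [0] * 2 + [0] * 9 + [1] * 1
acc, drifted = check_for_drift(labels, predictions)
print(f"recent accuracy={acc:.2f}, needs review={drifted}")
```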

AI maturity

Different types of AI have different maturity profiles. Mature AI is technology that is well understood, already useful, and providing business value.

The type of AI used for image recognition is mature and accurate. Examples include solutions that use artificial intelligence for video surveillance and physical threat detection, including weapon detection and perimeter intrusion detection. A vendor will painstakingly collate thousands of images or videos and test their system against them. A well-trained system will achieve above-human levels of accuracy, and the results are normally repeatable. With this technology, a vendor who has tested their model can confidently sell their solution.

Even with this mature technology, it is possible to design a poor system. A vendor developing face recognition for access control could use a training set that is biased towards white males, leading to a good fit for them but a poor fit for women of color. This could result in women of color being denied access to secure areas at their work, based on a poorly designed and tested solution.
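
One way to surface this kind of bias is to slice evaluation results by demographic group rather than reporting a single overall accuracy. The sketch below shows the idea with invented records; the group names and outcomes are purely illustrative.

```python
# Minimal sketch: measuring accuracy separately for each demographic group
# to surface the kind of bias described above. The records are invented
# placeholders, not real evaluation data.
from collections import defaultdict

# Each record: (demographic_group, person_should_be_granted_access, model_granted_access)
results = [
    ("white_male",     True,  True),  ("white_male",     True,  True),
    ("white_male",     False, False), ("white_male",     True,  True),
    ("woman_of_color", True,  False), ("woman_of_color", True,  True),
    ("woman_of_color", True,  False), ("woman_of_color", False, False),
]

correct = defaultdict(int)
total   = defaultdict(int)
for group, truth, prediction in results:
    total[group] += 1
    correct[group] += int(truth == prediction)

# A large gap between groups indicates a biased training set or model.
for group in total:
    print(f"{group}: accuracy={correct[group] / total[group]:.2f}")
```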

Where there is an application such as a self-driving car, extra care needs to be taken to make sure that the training set is exhaustive and that appropriate algorithms are used. This type of application requires a very high standard of reliability and accuracy because autonomous driving is far more complex than simply recognizing an image. Currently, self-driving vehicle technology is not mature or reliable in real world conditions.

The quality of predictive AI depends on the type of problem, the model chosen, the features used, and the approach used to train the model. Some problems may be inherently unpredictable, such as the stock market, or it may not be possible to extract or encode relevant features effectively. For instance, a model that predicts a candidate's fit for a position from a video interview may struggle to turn the video and audio into features that have real predictive value.

In these cases, the vendor has the responsibility to extensively test the model, disclose any potential shortcomings, and implement ongoing monitoring to ensure continued performance. This is particularly crucial with predictive models that affect insurance claims, justice or legal matters, or impact an individual's health or well-being.

Poorly tested or inadequately disclosed AI systems in these domains could lead to unfair denials of insurance claims, biased legal outcomes, or medical misdiagnoses. As such, adherence to machine learning best practices, industry standards and regulatory guidelines is essential to ensure responsible AI development and deployment and protect the community.

For generative AI, which includes Large Language Models (LLMs), care is needed as the technology is still maturing.

These models can summarize text and even help to write code, but the downside is that they can "hallucinate", or make up false responses. Another issue is that training these models is very difficult and costly, so public pre-trained models are used. LLMs are stochastic because each response is sampled from a probability distribution, so if you ask the same question twice, you may get different answers. This is not ideal in most applications.
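
The toy example below illustrates why sampling makes the output vary between runs. The four-word vocabulary and its probabilities are invented for demonstration; real LLMs sample from distributions over tens of thousands of tokens, but the principle is the same.

```python
# Toy illustration of why LLM output varies between runs: the next token is
# sampled from a probability distribution rather than chosen deterministically.
# The vocabulary and probabilities are invented for demonstration.
import numpy as np

vocabulary    = ["secure", "safe", "protected", "guarded"]
probabilities = [0.4, 0.3, 0.2, 0.1]   # model's distribution over the next word

rng = np.random.default_rng()
for run in range(3):
    # Sampling can pick a different word each time, even for an identical prompt.
    sampled = rng.choice(vocabulary, p=probabilities)
    print(f"run {run + 1}: {sampled}")

# A greedy (deterministic) decoder would always return the most likely word:
print("greedy:", vocabulary[int(np.argmax(probabilities))])
```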

Generative models are being used to embed AI capabilities in applications. Sometimes this is misused to make a snake oil product that can look convincing on the surface but can hide some problems underneath.

Treading carefully is important with new technology. Google rushed AI into production without proper checking, and the Gemini image generator produced inappropriate and offensive images, such as depicting people of color as Nazis and generating Native American or female popes. Conversely, Apple's considered approach, as announced at WWDC, seems to provide a usable solution that will meet customer needs.

The right and wrong way to solve an AI problem

Take, for example, the business problem of recognizing when someone is openly carrying a knife. A knife detection system like this needs to be very accurate.

The right approach is to select a proven image recognition algorithm, compile training data showing people carrying knives and people not carrying knives (with different lighting conditions and camera angles), train a model, then test it on validation data and measure the accuracy.
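
A hedged sketch of that workflow is shown below, fine-tuning a pretrained ResNet-18 with PyTorch and torchvision (assuming a recent torchvision release). The folder paths, class names and hyperparameters are illustrative assumptions, not any vendor's actual pipeline.

```python
# Sketch of the "right approach": fine-tune a proven, pretrained image
# classifier on curated knife / no-knife images and measure validation
# accuracy. Paths and hyperparameters are illustrative assumptions.
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Hypothetical curated datasets with "knife" and "no_knife" class folders,
# covering different lighting conditions and camera angles.
train_data = datasets.ImageFolder("data/train", transform=transform)
val_data   = datasets.ImageFolder("data/val", transform=transform)
train_loader = DataLoader(train_data, batch_size=32, shuffle=True)
val_loader   = DataLoader(val_data, batch_size=32)

# Start from a proven architecture and replace the final layer for 2 classes.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(5):                       # train on the curated examples
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

model.eval()                                 # measure accuracy on held-out data
correct = total = 0
with torch.no_grad():
    for images, labels in val_loader:
        predictions = model(images).argmax(dim=1)
        correct += (predictions == labels).sum().item()
        total += labels.size(0)
print(f"validation accuracy: {correct / total:.3f}")
```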

The wrong approach is to use an unproven public generative model as a "zero-shot" shortcut to classify frames and hope that it works, testing only a few images. A "zero-shot" solution attempts to perform a task without any specific training for that task, relying instead on general knowledge from its training data.
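
For contrast, the snippet below sketches what that shortcut might look like using a publicly available zero-shot image classifier. The model name and file path are just examples; the point is that nothing here has been trained or formally tested for the security task at hand.

```python
# Illustration of the "wrong approach": calling a public zero-shot model on a
# frame and hoping it works, with no task-specific training or formal testing.
# The model name and frame path are examples, not a recommended solution.
from transformers import pipeline
from PIL import Image

classifier = pipeline("zero-shot-image-classification",
                      model="openai/clip-vit-base-patch32")

frame = Image.open("camera_frame.jpg")       # a single frame from a video feed
results = classifier(frame, candidate_labels=["person carrying a knife",
                                              "person without a weapon"])
print(results)   # scores reflect general pretraining, not measured accuracy
                 # on this security task, lighting, or camera angle
```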

Generative models can become an untrusted "black box" in these situations:

● Where the underlying training data is biased (for example, it includes data that is racist or sexist)

● Where the model input is not "grounded" in existing, trusted data

● Where the public model underpinning the solution shifts over time without the change being detected

● Where there is no easy way to measure the quality and accuracy of the results

● Where there is no large-scale, formal performance testing of the quality and accuracy of the output

The case above of using an existing public model to analyze frames illustrates the misuse of a generative model. There is no proper testing; any testing would cover a few cases rather than thousands of examples. The vendor could not be confident about the performance of the model, and if it is deployed, the purchaser may not be aware of the deficiencies until it fails to work at a critical moment.

This approach can be used for a non-critical application where there is proper human oversight, but it would be inappropriate for medical, legal or security use cases, where accuracy is very important because of the high human cost of errors. For instance, in healthcare, using an untested AI model to diagnose serious illnesses could lead to missed diagnoses or unnecessary treatments, potentially harming patients.

Where predictive AI is being used in justice and HR applications (for example, to review parole applications or resumes), and the decision affects a person's life, the solution must again be used with human review. These applications raise significant ethical concerns, including potential perpetuation of existing biases and the risk of reducing complex human situations to oversimplified data points.

In these cases, models need to be developed with curated data and carefully trained and tested. Where accuracy is not at the human level, a person must be kept in the loop to validate the input before it is used. It is also crucial to implement ongoing monitoring and regular updates of deployed AI models to ensure they maintain their accuracy and relevance over time, adapting to any shifts in the underlying data or environment.

By following these guidelines, organizations can harness the power of AI while minimizing risks and ensuring responsible, ethical implementation in critical decision-making processes.

How does Scylla use AI?

Scylla develops video analytics AI to help protect human life. This is a responsibility that Scylla takes very seriously. Scylla does not use public generative models as a shortcut for AI development; it has carefully curated data, developed proprietary algorithms, and extensively refined and tested its own models. This means that the entire end-to-end application is managed and controlled by Scylla.

All the models use appropriate and mature technology. Solutions are properly developed and tested using machine learning best practices. Scylla data sets are unbiased and include different ethnicities, and both males and females. Performance results are measured and disclosed to purchasers to ensure transparency.

This painstaking work leads to a good result. Scylla solutions offer market-leading accuracy and outstanding real-world performance across varied environments.

Conclusion

Snake oil AI is a product whose vendor has used AI for marketing reasons without taking responsibility for the performance of the product.

Snake oil AI is fundamentally dishonest because it doesn't keep its promises. The lack of quality and accuracy harms those it purports to help. The vendor has taken shortcuts and is indifferent or cynical about the outcomes of the product, which may result in someone not being granted parole, a person not getting a job, or, in the case of security video analytics, someone being hurt while the security officer never becomes aware that they need help.

Even if a product is not AI snake oil, it could still be unusable. Deploying functional and accurate AI can be challenging, even when the vendor is well intentioned. There may be issues due to using an inappropriate approach, not having enough data or having biased data, or simply model drift. The model may try to predict a chaotic system or use the wrong features. In contrast, well-designed and well-crafted AI does work as advertised. It is properly tested and is a known quantity. It uses technology appropriately and professionally.

Accuracy and performance are just two aspects of AI quality. Other aspects include transparency, fairness and equity, protection of privacy, appropriate use of data (such as data scraped from the Internet), and oversight of solutions to ensure that they continue to operate correctly.
