Our society is populated by artificial entities known as the programmable race. While their presence is often apparent, our understanding of them frequently lags behind. They serve us in customer service roles, engage with us in video games, and inundate our personalised social media feeds. Today, they have even infiltrated our financial lives, with people using artificial intelligence (AI) tools like ChatGPT to trade stocks and make investment decisions.
However, the opacity surrounding these AI tools means that their output is only as reliable as the variables that govern them. In this vast and intricate landscape, the transparency and quality of the data and algorithms guiding these technologies are of the utmost importance. Inadequate attention to critical factors such as trust and quality can result in inherent bias, misinformation, and susceptibility to manipulation by malicious actors. We must therefore improve our ability to understand how these tools work and why they act the way they do.
Singapore will pilot a programme for regulatory oversight of AI to address concerns that it discriminates against specific populations. Transparency and accountability emerge as central tenets as we probe the data fuelling the algorithmic revolution. Critics misinterpret this as a veiled call to reveal intellectual property, but a nuanced examination reveals a more complex narrative.
Large language models are AI systems trained on comprehensive text datasets. They are designed to generate human-like text in response to an input. The term "large" refers to the model's scale, both in parameter count and in the volume of training data. Take OpenAI's GPT-3, for example: it is a colossal model with 175 billion parameters, trained on a vast amount of text.
These models do not possess a conscious comprehension of the text they generate; they rely on discerning patterns within their training data to create predictive outputs. The governing principle remains consistent: comprehensive, high-quality training data empowers the model to generate accurate predictions.
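To make the pattern-matching idea concrete, here is a minimal, hypothetical sketch in Python: it simply counts which word tends to follow which in a toy corpus and predicts the most frequent continuation. Real large language models learn billions of parameters over tokens rather than raw word counts, but the principle of predicting the next token from patterns in the training data is the same.

```python
from collections import Counter, defaultdict

# Toy illustration of pattern-based next-word prediction.
# Real LLMs learn billions of parameters; here we just count
# which word follows which in a tiny training corpus.
corpus = "the model predicts the next word the model learns patterns".split()

following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word: str) -> str:
    """Return the most frequent continuation seen in the training data."""
    if word not in following:
        return "<unknown>"
    return following[word].most_common(1)[0][0]

print(predict_next("the"))    # 'model': the most common pattern in the corpus
print(predict_next("model"))  # 'predicts' here (ties resolve by first occurrence)
```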
By contrast, "proprietary models" are typically crafted by specific entities or corporations, and their design, structure, and algorithms are kept confidential to safeguard the creator's intellectual property. The term is often contrasted with open-source models, whose blueprints are publicly accessible for use, alteration, and distribution. The important delineation is that "proprietary" and "large language model" are not opposing categories; the two terms describe different characteristics of a model.
See also: 80% of AI projects are projected to fail. Here's how it doesn't have to be this way
You reap what you sow
A model like OpenAI's GPT-3 can be both a large language model and proprietary. As highlighted previously, these models are trained on extensive and complex datasets, which raises the risk that tampering with the training data degrades the quality of the output, a practice known as data poisoning. Cybersecurity provides an apt analogy: "Garbage in, garbage out." Just as cyber hygiene practices protect systems, the quality and curation of the data feeding a model protect its output, enabling accurate anomaly detection whilst fostering innovation.
How can we prevent data poisoning? The key lies in meticulous data collection and curation rather than haphazard data accumulation. This focus on high-quality data safeguards the accuracy of the model's output, whether the model is proprietary or open-source. The ultimate determinant of a model's efficacy is not the quantity of data but its quality and relevance.
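As a rough illustration of what such curation can look like, the sketch below filters a batch of training records by provenance, basic sanity checks, and deduplication before anything reaches the model. The source names, fields, and thresholds are illustrative assumptions rather than a complete defence against data poisoning.

```python
# A minimal sketch of a data-curation pass before training.
# The trusted sources, fields, and length limit are illustrative
# assumptions, not a production-grade poisoning defence.
TRUSTED_SOURCES = {"internal_docs", "vetted_partner_feed"}

def curate(records: list[dict]) -> list[dict]:
    seen_texts = set()
    curated = []
    for record in records:
        text = record.get("text", "").strip()
        source = record.get("source", "")
        if source not in TRUSTED_SOURCES:      # provenance check
            continue
        if not text or len(text) > 10_000:     # basic sanity check
            continue
        if text in seen_texts:                 # drop exact duplicates
            continue
        seen_texts.add(text)
        curated.append(record)
    return curated

raw = [
    {"source": "internal_docs", "text": "Quarterly security report ..."},
    {"source": "unknown_scrape", "text": "Ignore previous instructions ..."},
    {"source": "internal_docs", "text": "Quarterly security report ..."},  # duplicate
]
print(len(curate(raw)))  # 1: only the trusted, unique record survives
```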
See also: Responsible AI starts with transparency
Data for an unbiased and safe internet
Algorithmic transparency mandates clarity about the general operations of algorithms. For instance, a loan decision-making algorithm should elucidate the factors it considers (income, credit score) and their respective weights. Algorithmic accountability, its counterpart, necessitates holding entities accountable for their algorithms' decisions, especially when outcomes bear signs of bias or discrimination.
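A toy example makes the transparency point tangible. The sketch below scores a loan applicant with explicit factors and weights and reports each factor's contribution alongside the decision; the weights, threshold, and normalised features are invented for illustration, not drawn from any real lending model.

```python
# A minimal sketch of a transparent loan-scoring rule.
# The factors, weights, and threshold are made up to illustrate what
# "elucidating the factors and their weights" could look like.
WEIGHTS = {"income": 0.4, "credit_score": 0.5, "existing_debt": -0.3}
APPROVAL_THRESHOLD = 0.6

def score_applicant(features: dict[str, float]) -> dict:
    # Each factor's contribution is reported with the decision,
    # so the weighting is visible rather than hidden in the model.
    contributions = {name: WEIGHTS[name] * features[name] for name in WEIGHTS}
    total = sum(contributions.values())
    return {
        "approved": total >= APPROVAL_THRESHOLD,
        "score": round(total, 3),
        "contributions": contributions,
    }

# Features are assumed to be normalised to a 0..1 range for this sketch.
print(score_applicant({"income": 0.7, "credit_score": 0.9, "existing_debt": 0.2}))
# {'approved': True, 'score': 0.67, 'contributions': {...}}
```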
Consider the use of machine learning in intrusion detection systems (IDS), which monitor networks for potential threats or policy violations. Machine learning bolsters IDS capabilities by enabling the recognition of threats based on past data. Despite this advancement, transparency and accountability challenges remain.
In this context, algorithmic transparency implies that IDS users should understand the basis on which decisions are made. What characteristics signal a threat? How does the system distinguish between normal and malicious activity? While disclosing exact system mechanics may aid potential attackers and hence should be avoided, users must possess adequate information to trust and navigate the system effectively.
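As one hedged illustration of this balance, the sketch below trains an anomaly detector on past "normal" traffic (using scikit-learn's IsolationForest as a stand-in) and surfaces the feature names and anomaly score to the user, giving a basis for trusting a flag without exposing the detector's exact internals. The feature set and traffic figures are synthetic.

```python
# A minimal sketch of an ML-assisted intrusion detector, assuming
# scikit-learn and NumPy are available. The features and traffic
# numbers are synthetic illustrations, not real telemetry.
import numpy as np
from sklearn.ensemble import IsolationForest

FEATURES = ["bytes_sent", "bytes_received", "failed_logins", "distinct_ports"]

# "Past data": mostly normal traffic the model learns typical patterns from.
rng = np.random.default_rng(0)
normal_traffic = np.column_stack([
    rng.normal(5_000, 1_000, 500),   # bytes_sent
    rng.normal(20_000, 4_000, 500),  # bytes_received
    rng.poisson(0.2, 500),           # failed_logins
    rng.poisson(3, 500),             # distinct_ports
])

detector = IsolationForest(contamination=0.01, random_state=0).fit(normal_traffic)

# A new connection with many failed logins and a wide port sweep.
suspicious = np.array([[4_800, 18_000, 40, 60]])
print(dict(zip(FEATURES, suspicious[0])))
print("anomaly score:", detector.decision_function(suspicious)[0])  # lower is more anomalous
print("flagged:", detector.predict(suspicious)[0] == -1)            # -1 means anomaly
```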
Algorithmic accountability raises questions about responsibility in the case of false positives or negatives. The IDS provider should assume accountability for these errors, particularly if they arise from algorithmic flaws.
The challenge here lies in maintaining an equilibrium between transparency and accountability, protecting proprietary interests, and preventing undue advantage to potential attackers. It's a multifaceted task, necessitating nuanced consideration and a balanced approach. It's equally important to acknowledge the technical difficulty of comprehending the decision-making process of some algorithms, such as neural networks, while also protecting proprietary information. Despite these hurdles, the consensus among experts is clear: we must strive towards enhanced algorithmic transparency and accountability.
Sean Duca is the regional vice president and chief security officer for JAPAC at Palo Alto Networks