It seems that every legacy app is adding AI features, and almost every new app being created right now is including AI. Of course AI is benefiting the users of these applications - or at least should. The question is, what is the cost to us, as the end user?
Do you wonder where your data is going when you engage with AI?
Of primary concern right now is not the utility of adding AI features. The most pressing concern is the security of the data, and how it might be used.
Technological Advances
There have been several technological advances that have made the use of AI features such as Generative AI available and practical for everyday use. These include:
- the development of the Transformer architecture (that’s the T in ChatGPT),
- the development of processors capable of the complex and rapid calculations needed, and
- access to vast troves of data that can be used to train the models.
This data has been sourced from the open and highly available repository of the internet - Wikipedia, Reddit, and publicly accessible websites in general.
How can a model be made smarter?
Making the models smarter, though, requires even more data - ideally data that is real, current, and includes feedback. If the web has already been scraped and ingested, where is this additional data going to come from? You, the user.
As the big players in the AI game fight for dominance (to be the best, fastest, most accurate, and most human-like), this data and feedback become more and more important. I am not going to go so far as to accuse any of those corporations of misleading us or misusing our data, but until we can all come to grips with how and where the data is used, mistakes can and will be made. And there is even a chance that some things are being done intentionally.
Feedback process
The feedback process is probably familiar to you - if you enter a query into a generative AI chat, you might provide feedback if the answer is obviously inaccurate or doesn’t make sense. More passive feedback might include not asking a follow-up question to clarify your query, which might indicate that the answer was the one you were looking for, or at least wasn't completely wrong.
Are your emails & documents being read and used as training data?
But what of new data? It stands to reason that the passages of text you are entering might be used as examples of sentence structure and usage. If you are running an AI model within your text editor or email client, are your emails and documents being read and used as training data? Possibly, and even if your answer is “probably not”, there remains a chance.
Is it possible, then, that the tool you are using to interrogate your video calls is storing and using the meeting as training data?
What about passwords or secret keys that might be stored in code that a co-pilot is helping you to write?
I don’t mean to be alarmist, or to make it sound like you are being spied on, but a certain degree of vigilance and understanding is required. And I am definitely not suggesting that we don’t use the tools available - this is all about using them safely.
Caution: Use AI tools Safely!
So if this is the case, what practices can we put in place to use these tools safely? Here are a few that you might want to adopt yourself:
- Assume that any data you are entering into an AI model (written, spoken, images or video) is as open and public as something you might post on social media or publish on a public website.
- Don’t enter data, in any of the forms above, that might contain personal information, such as full names, addresses or ID numbers (to name just a few types).
- Never enter data that might contain confidential or proprietary company information (that is, any information that is not in the public domain, such as trade secrets or internal-only information).
- Never enter information that describes a company's systems, processes, internal policies, etc.
- Don’t enter anything that is not directly owned by you or provided to you to use as you wish.
You might be able to think of some other things here that you should not use. If you are not sure, in general, don’t use it. You can get advice from the person responsible in your company if needed.
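For code in particular, a quick automated check before pasting anything into an AI assistant can catch the most obvious passwords and keys. The following is a minimal Python sketch, not a complete scanner: the function name find_secrets and the two patterns are hypothetical and chosen only for illustration, and a clean result does not mean the text is safe to share.

```python
import re

# Hypothetical, deliberately small rule set for illustration only.
# Dedicated secret scanners use far more rules than these two.
SECRET_PATTERNS = {
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "hard-coded credential": re.compile(
        r"(?i)(password|passwd|secret|api_key|token)\w*\s*[:=]\s*['\"][^'\"]+['\"]"
    ),
}


def find_secrets(text):
    """Return (line_number, label) pairs for lines that look like secrets."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, label))
    return findings


if __name__ == "__main__":
    snippet = 'user = "alice"\ndb_password = "hunter2"\n'
    for line_number, label in find_secrets(snippet):
        print(f"line {line_number}: possible {label}")
```

If the check reports anything, remove or replace the value before sharing the code. If it reports nothing, the rules above still apply.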
What Are the Risks?
If you provide data to an AI model that then uses it as training data, future responses to similar queries may expose that private or personal information to other users.
To give an extreme example for illustration - imagine you have written a document explaining a new method to sort and filter information that you intend to build for a client. The document is intended only for that client and your internal team. It refers to research that the client has paid for, and is unique to them. Before sending the document, you upload it to an AI model to check for typos and fix formatting. It does so, the document looks great, and you send it on to the client.
A year later a different, unrelated user of that AI model submits a query about how to sort information. Because the model used your document as training data and did not flag it as proprietary, parts of your solution are now given to that new user, for free, and in violation of your agreement with the client.
How Can You Use AI Models Without Endangering Data Security?
The intention here is not to make you too scared to use the tools available - they can be a great advantage to you in production, and can help you to get more done with higher quality.
Here are a few ideas of things you might consider:
- Use AI models that include clear and published terms of use or privacy statements - they might declare up front that they will use data for further learning. If the terms are vague, difficult to understand, or non-existent, assume the worst-case scenario.
- Do not introduce AI tools into business applications unless you are very, very sure of the data security. Definitely only use tools that have been approved for use by your company.
- In any case, mask or de-identify personal or private information before submitting it.
- Do not use anything that might be covered by an NDA anywhere near AI tools that process in the cloud.
- For business applications, find or commission tools that learn from or process data in your own private infrastructure, using your own embedded AI tools.
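To illustrate the masking idea from the list above, here is a minimal Python sketch that replaces email addresses, phone-like numbers and a caller-supplied list of names with placeholders before text is submitted. The function name mask_text, the known_names parameter and the patterns are all assumptions made for this example. Simple patterns like these will miss many kinds of personal information, so treat this as a starting point rather than a guarantee.

```python
import re

# Illustrative patterns only: real de-identification needs much broader
# coverage (addresses, ID numbers, account numbers and so on).
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def mask_text(text, known_names=()):
    """Replace emails, phone-like numbers and listed names with placeholders."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    # Names cannot be found reliably with a pattern, so the caller lists them.
    for name in known_names:
        text = re.sub(re.escape(name), "[NAME]", text, flags=re.IGNORECASE)
    return text


if __name__ == "__main__":
    draft = "Contact Jane Doe at jane.doe@example.com or +44 20 7946 0958 today"
    print(mask_text(draft, known_names=["Jane Doe"]))
    # prints: Contact [NAME] at [EMAIL] or [PHONE] today
```

Reading the masked text yourself before submitting it is still worthwhile, since anything the patterns do not recognise passes through unchanged.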
Conclusion
This is a fast-moving subject; what is true today may not be so tomorrow. The onus is definitely on the user to stay alert and not to make assumptions about how secure the data may be. Certainly for the time being you should assume that any data fed to a public AI model is treated as public information.
Using AI tools can be an incredible benefit to you and your business, but use them sensibly.