Tadej Justin & Bojan Miličić, 13 July 2023

How to Use Large Language Models Responsibly

Protecting Your Sensitive Data

Remember HAL 9000 from 2001: A Space Odyssey? That intelligent machine that could chat like a human? Well, now we have real-life HALs everywhere, and while they're very cool and helpful, they feed on enormous amounts of our data.

Our biggest challenge today isn't keeping these machines in check but figuring out how to keep all that data safe and secure. From individual preferences to the intellectual property of large businesses, our digital footprints have made data security and privacy more important than ever before.

From the European Union's General Data Protection Regulation (GDPR) to infamous scandals like Cambridge Analytica, the world has come to terms with the realities of the data-centric era we're living in. As a result, companies today invest not just in data but in securing and protecting it.


LLMs: Revolutionizing Data Interaction

The technological landscape is continually evolving, and the rise of Large Language Models (LLMs) like OpenAI's GPT-4 has significantly changed how data is exchanged and processed. These models generate human-like text from the prompt and any context supplied with it, a capability now used across many industries.

Pseudonymization and Anonymization: What's the Difference?

Understanding the concepts of pseudonymization and anonymization is crucial when dealing with such sophisticated models.

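Pseudonymization replaces personally identifiable information fields within a data record with artificial identifiers, or pseudonyms, making the record less identifying. LLMs can still produce coherent, contextually rich output from pseudonymized inputs, so pseudonymization often reduces privacy risk without sacrificing much usefulness. It does not make privacy concerns disappear, though: whoever holds the additional information that links pseudonyms back to real identities can re-identify the data, which is why the GDPR still treats pseudonymized data as personal data.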
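
As a rough illustration of field-level pseudonymization, the sketch below replaces a record's identifying fields with keyed-hash pseudonyms (HMAC-SHA256). The secret key, the set of identifying fields, and the `ID_` prefix are assumptions chosen for the example, not part of any particular product or of the authors' setup.

```python
import hashlib
import hmac

# Hypothetical secret key. In practice it would live in a secrets manager,
# stored separately from the pseudonymized data.
SECRET_KEY = b"replace-with-a-long-random-key"

# Assumption for illustration: these fields are treated as direct identifiers.
PII_FIELDS = {"name", "email", "phone"}


def pseudonym(value: str, key: bytes = SECRET_KEY) -> str:
    """Derive a stable artificial identifier: same value + same key -> same pseudonym."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "ID_" + digest[:12]


def pseudonymize_record(record: dict, fields=PII_FIELDS) -> dict:
    """Return a copy of the record with identifying fields replaced by pseudonyms."""
    return {k: pseudonym(str(v)) if k in fields else v for k, v in record.items()}


customer = {"name": "Jane Doe", "email": "jane@example.com", "plan": "premium", "tickets": 3}
print(pseudonymize_record(customer))
```

Because the same input always yields the same pseudonym, records stay linkable, and anyone who holds the key can recompute pseudonyms for known values and match them. That recoverability is exactly what separates pseudonymization from anonymization.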

Anonymization, on the other hand, removes personally identifiable information from data altogether, so the individual the data describes can no longer be identified. Does a large language model like ChatGPT, then, anonymize prompts? The answer is both yes and no.


Yes, because if you instruct it to, it can find and redact personal data in a prompt. Be aware, however, that to do so you must first send that sensitive data to another entity's API or servers. LLMs offered as ‘software as a service’ are complex systems that may exchange data with other services and databases to a degree you cannot see from the outside.

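And the answer is also ‘no’, because ChatGPT does not anonymize prompts automatically. Nor should you assume that nothing you send is kept. Under OpenAI's policies at the time of writing, data sent through the API is not used to train its models by default but may be retained for up to 30 days for abuse monitoring, while conversations in the ChatGPT app itself may be used to improve the models unless you opt out. Not being used for training adds a layer of protection, but it is not the same as not being stored.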

In the end, the responsibility for making sure sensitive data is not unintentionally exposed while interacting with LLMs falls on the user or the organization. It is crucial to understand that sending data from inside your company (or any enterprise environment) to an external API means handing that information to another entity.
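
One practical way to act on that responsibility is to strip obvious identifiers locally, before a prompt ever leaves your environment. The sketch below redacts e-mail addresses and phone-like numbers with regular expressions and keeps no mapping, so the removal cannot be undone. The patterns and labels are illustrative assumptions: they miss names, addresses, and indirect identifiers, so on their own they fall short of true anonymization, which usually also needs named-entity recognition and a review of what remains.

```python
import re

# Illustrative patterns (assumptions): they catch e-mail addresses and
# phone-like digit runs, may over-match long numbers such as dates, and
# miss names, street addresses, and indirect identifiers entirely.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str) -> str:
    """Replace every match with a generic label; no mapping is kept, so it cannot be reversed."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


prompt = "Summarize the complaint from jane.doe@example.com, phone +386 1 234 5678."
print(redact(prompt))  # Summarize the complaint from [EMAIL], phone [PHONE].
```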

Staying in Control of Your Data

The million-dollar question is: How do we control our data when using ChatGPT and other LLMs? It starts with understanding how these models function, instituting rigorous data processing policies, and being proactive about pseudonymizing or anonymizing data before interacting with LLMs. Only by adopting such measures can we unlock the potential of LLMs without jeopardizing data privacy and security.
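
A minimal sketch of what being proactive can look like in practice: pseudonymize the prompt locally, send only the masked text to the model, and restore the original values in the reply. `safe_query`, `fake_llm`, and the `<EMAIL_n>` placeholder format are hypothetical names for illustration; `fake_llm` stands in for whatever API client you actually use, and the approach assumes the model copies placeholders into its answer verbatim.

```python
import re
from typing import Callable, Dict, List, Tuple

# Illustrative pattern: only e-mail addresses are pseudonymized here.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def pseudonymize(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace each distinct e-mail with a numbered placeholder.

    Returns the masked text and a placeholder -> original mapping,
    which stays on your side and is never sent to the provider.
    """
    forward: Dict[str, str] = {}  # original value -> placeholder

    def substitute(match) -> str:
        value = match.group(0)
        if value not in forward:
            forward[value] = f"<EMAIL_{len(forward) + 1}>"
        return forward[value]

    masked = EMAIL.sub(substitute, text)
    return masked, {placeholder: value for value, placeholder in forward.items()}


def restore(text: str, mapping: Dict[str, str]) -> str:
    """Put the original values back wherever the model echoed a placeholder."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text


def safe_query(prompt: str, call_llm: Callable[[str], str]) -> str:
    """Hypothetical wrapper: only the masked prompt reaches call_llm."""
    masked, mapping = pseudonymize(prompt)
    reply = call_llm(masked)
    return restore(reply, mapping)


# Hypothetical stand-in for a hosted LLM: it records what it receives and echoes it.
sent_to_provider: List[str] = []


def fake_llm(prompt: str) -> str:
    sent_to_provider.append(prompt)
    return "Echo: " + prompt


answer = safe_query("Write a reply to jane@example.com and cc bob@example.org.", fake_llm)
print(sent_to_provider[0])  # Write a reply to <EMAIL_1> and cc <EMAIL_2>.
print(answer)               # Echo: Write a reply to jane@example.com and cc bob@example.org.
```

The mapping from placeholders to real addresses never leaves your process, so the provider only ever sees `<EMAIL_1>` and `<EMAIL_2>`. A production version would need far broader detection than a single e-mail pattern.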

So as we plunge into this exciting new frontier, let's ensure we're navigating responsibly and not leaving our data – our gold – unprotected.
