Promoting Responsible AI: Content Moderation in ChatGPT

Explore the ethical landscape of AI with a focus on content moderation in ChatGPT. Learn about OpenAI's Moderation API, real-world examples, and best practices for responsible AI development.
Sep 2023  · 11 min read

The first recorded usage of the term artificial intelligence was at an academic conference in 1956 in a speech by John McCarthy, referencing the simulation of human intelligence by machines. Thus, he’s often credited as one of the founding fathers of AI.

But the actual journey to understand whether machines can think began well before that. In 1950, English mathematician Alan Turing published a paper called Computing Machinery and Intelligence, which proposed a test to determine whether a computer can think.

Nowadays, questions around Artificial Intelligence (AI) are different. The technology has grown in popularity due to the increased data volumes we produce, advanced algorithms, and significant improvements in computing power. We are now at a place where it’s reasonable to consider the ethical and societal issues raised by using such tools.

While AI technology can improve efficiency and productivity, it also has the capacity to create harmful content, hold bias, and violate data privacy. This has kicked off interesting conversations regarding the practice of building AI responsibly, such that it empowers employees, impacts customers positively, and enables companies to scale AI ethically.

The Evolution of ChatGPT and Content Moderation

One of the most talked about AI advances in popular culture is ChatGPT. ChatGPT is an AI chatbot developed by OpenAI to emulate human-like interactions with users.

ChatGPT held the title of fastest-adopted online service for a few months after gaining one million users in five days. The technology behind it has gone through several iterations since its initial debut in June 2018.

At the time, the model was called Generative Pre-trained Transformer 1 (GPT-1). It was the first large language model (LLM) developed by OpenAI, in response to Google’s invention of the transformer architecture in 2017. This model formed the foundations of ChatGPT as we know it today, but there were further iterations before we got there.

For example, in 2019, OpenAI released the more powerful GPT-2 model. The updates focused on enhancing the model's language understanding capabilities, which meant training the model on a much larger dataset and opening up fine-tuning so users could customize the model for their specific use case. Shortly after, in 2020, there was GPT-3.

At the time of writing, ChatGPT is regarded as one of the most powerful language models available. It can perform natural language processing tasks such as translation, text completion, question answering, and text generation.

But that’s not to say it’s without its problems.

The power and limitations of ChatGPT

ChatGPT as we know it today is based on the GPT-3.5 architecture, with paid subscribers gaining access to GPT-4. The main differences from the earlier GPT models are the dataset used and the fact that it was optimized for conversational use cases, which offers users a more personalized experience when they interact with the model through a chat interface.

These changes have improved the efficiency of communication between humans and computer programs. They have also drastically improved how information is processed and consumed, as well as the customer experience.

But it’s not without its flaws. Several discussions have recently shed light on the potential of ChatGPT to generate inappropriate or biased responses. Our guide on the ethics of generative AI explores these in more detail.

This issue is rooted in how ChatGPT was built — the LLM was trained using the collective writings of individuals from diverse backgrounds. Though their diversity aids the model in its understanding, it also has the potential to introduce biases and prejudices into the work it generates.

The need for active moderation

Active content moderation describes the process of reviewing and monitoring user-generated content. The purpose is to ensure user-generated content meets certain standards and guidelines, which protects your brand from being associated with negativity and users from seeing content that may be offensive.

For example, one aspect of active content moderation includes removing inappropriate content from platforms, while another involves enforcing community guidelines to prevent things such as bullying.

Mechanisms of ChatGPT Content Moderation

OpenAI acknowledged the threat posed by user-generated content (e.g., destroying the reputation of the application, causing harm to users, etc.) and quickly swooped in to provide a tool that helps prevent inappropriate content from coming from either a language model or the user.

This tool is neatly packaged into the Moderation API, which enables developers to check their content against OpenAI’s usage policies. These policies seek to eradicate inappropriate language, such as:

  • Hate speech
  • Threatening language
  • Harassment
  • Self-harm (intent or instructions)
  • Graphic content (including pornography and violence)
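
As a rough illustration of how a developer might work with the Moderation API, the sketch below prepares a request for the `/v1/moderations` endpoint and reads back which categories a response marks as flagged. The helper names `build_moderation_request` and `flagged_categories` are hypothetical, and the sample response is hand-written with made-up values, so treat the category names as examples and check OpenAI's current API reference for the full list.

```python
import json
import urllib.request

MODERATION_URL = "https://api.openai.com/v1/moderations"


def build_moderation_request(text: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request asking the API to classify `text`."""
    payload = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        MODERATION_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def flagged_categories(response: dict) -> list[str]:
    """Return the names of the categories marked True in the first result."""
    result = response["results"][0]
    return sorted(name for name, hit in result["categories"].items() if hit)


# Hand-written sample in the general shape of a moderation response
# (an assumption for illustration; the values are made up).
sample_response = {
    "results": [
        {
            "flagged": True,
            "categories": {"hate": False, "harassment": True, "violence": True},
            "category_scores": {"hate": 0.01, "harassment": 0.92, "violence": 0.88},
        }
    ]
}

print(flagged_categories(sample_response))
```

To send the request for real, one option is to pass the prepared object to `urllib.request.urlopen` and parse the body with `json.load`; that step needs a valid API key, so it is left out here.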

However, it’s also possible for models such as ChatGPT to emit biased or inaccurate output when influenced by unfiltered user input. Thus, control measures are implemented to prevent the model from accidentally spreading false information.

Content is thus controlled on two fronts:

User input control

Techniques to monitor, filter, and manage content generated by users; the goal is to empower businesses to maintain integrity, safety, and moral standards when building LLM applications.

Output model control

Policies and procedures that make it possible to keep track of and filter the replies the model generates when interacting with users; they enable developers to address potential problems with the model output, such as bias.

Implementing input and output control puts the ethical responsibilities of LLMs on the business; it becomes the business's responsibility to ensure that users have a positive experience with its products and services and that AI is used responsibly.
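
One way to picture the two fronts is a thin wrapper that checks the user's message before it reaches the model and checks the model's reply before it reaches the user. This is a minimal sketch rather than OpenAI's implementation: `moderated_reply`, `echo_model`, and `toy_check` are hypothetical names, and the toy check stands in for a real moderation call.

```python
from typing import Callable

REFUSAL = "Sorry, I can't help with that request."


def moderated_reply(
    user_message: str,
    generate: Callable[[str], str],
    is_flagged: Callable[[str], bool],
) -> str:
    """Run input control, call the model, then run output control."""
    # Input control: stop flagged prompts before they reach the model.
    if is_flagged(user_message):
        return REFUSAL
    reply = generate(user_message)
    # Output control: withhold flagged model replies from the user.
    if is_flagged(reply):
        return REFUSAL
    return reply


# Stand-ins for a real model and a real moderation check (both hypothetical).
def echo_model(prompt: str) -> str:
    return f"You said: {prompt}"


def toy_check(text: str) -> bool:
    return "badword" in text.lower()


print(moderated_reply("hello", echo_model, toy_check))
print(moderated_reply("a BADWORD here", echo_model, toy_check))
```

Passing the model and the check in as functions keeps the wrapper independent of any particular provider, so the same flow works whichever moderation service the business chooses.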

Methods of Content Moderation

There are six common types of moderation:


Pre-moderation

Before content is made visible to the world, it must be checked. This is called pre-moderation. The benefit of pre-moderation is that content deemed unsuitable or undesirable for an online platform can be prohibited before it does harm.

It’s an extremely popular choice of moderation for online communities targeting a younger audience. However, many think it kills the instantly gratifying nature of online communities since content must be cleared before it’s shared.


Post-moderation

The opposite of pre-moderation is post-moderation. From a user-experience perspective, this tends to be preferred since it means content can be published immediately after submission before entering the queue to be moderated.

The major benefit of this approach is that users can communicate in real time. However, the cost can become prohibitive: website operators, who are the legal publishers of content, are liable for what's shown on their platform, and the more people who see destructive content, the larger the damages would be should the case go to court.
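
The difference between the two approaches comes down to when a submission becomes visible. The toy `Board` class below is a hypothetical sketch of that timing, not a description of any real platform: with `pre_moderation=True` a post waits for approval, and with `pre_moderation=False` it goes live immediately and is taken down if a reviewer rejects it.

```python
from dataclasses import dataclass, field


@dataclass
class Board:
    """Toy message board that supports pre- or post-moderation."""

    pre_moderation: bool
    visible: list[str] = field(default_factory=list)
    review_queue: list[str] = field(default_factory=list)

    def submit(self, post: str) -> None:
        # Every submission is queued for review in both modes.
        self.review_queue.append(post)
        # Post-moderation publishes immediately; pre-moderation waits for review.
        if not self.pre_moderation:
            self.visible.append(post)

    def review(self, post: str, approved: bool) -> None:
        self.review_queue.remove(post)
        if approved and post not in self.visible:
            self.visible.append(post)  # pre-moderation: publish on approval
        elif not approved and post in self.visible:
            self.visible.remove(post)  # post-moderation: take down on rejection
```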

Reactive moderation

Another option is to put the power in the hands of the community members. This means users can flag content that breaches the rules or members behaving inappropriately.

It’s common to see reactive moderation utilized simultaneously with pre- or post- moderation as an extra safety net for content that slips past the moderators. The big advantage of this approach is that it can scale as the community grows without putting extra strain on your resources.
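
In practice, reactive moderation often comes down to counting flags and escalating a post once enough distinct users have reported it. The sketch below assumes a cutoff of three flags; both `FLAG_THRESHOLD` and the `flag` helper are hypothetical, and a real community would tune the number to its size.

```python
from collections import defaultdict

FLAG_THRESHOLD = 3  # assumed cutoff; a real community would tune this

# Maps each post ID to the set of users who have flagged it.
flags: dict[str, set[str]] = defaultdict(set)


def flag(post_id: str, user_id: str) -> bool:
    """Record a flag and report whether the post now needs moderator review."""
    flags[post_id].add(user_id)  # a set ignores repeat flags from the same user
    return len(flags[post_id]) >= FLAG_THRESHOLD
```

Storing user IDs in a set means one persistent user cannot push a post over the threshold alone.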

Distributed moderation

Distributed moderation is a rare form of content filtering. It depends on a rating system that members of the community use to vote on whether a submission falls in line with the guidelines of the community.

Companies tend to refrain from putting the onus on community members to self-moderate as it poses legal and branding risks.

Automated moderation

As user-generated content increases, it becomes harder for human moderators to step in and manage the load. Not only is it hard for individuals to manually parse through tons of content, but it's also demoralizing; nobody goes to work excited to parse a pile of distressing content.

Adding automated moderation to human-powered moderation is a super valuable workaround. It consists of various technical tools, such as word filters, to reject and approve submissions.
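
A word filter is the simplest of these tools. The sketch below approves or rejects a submission based on a blocklist; the terms in `BLOCKLIST` are placeholders and `auto_moderate` is a hypothetical helper, so read it as an illustration of the idea rather than a production filter.

```python
import re

BLOCKLIST = {"spamword", "slur"}  # placeholder terms, not a real list


def auto_moderate(text: str) -> str:
    """Return "reject" if any blocklisted word appears, otherwise "approve"."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return "reject" if words & BLOCKLIST else "approve"


print(auto_moderate("Buy this SPAMWORD now!"))
print(auto_moderate("A friendly hello"))
```

Matching whole words avoids rejecting harmless text that merely contains a blocked term as a substring, but deliberate misspellings slip through, which is one reason such filters are paired with human review.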

No moderation

Content moderators would likely advise against the no-moderation approach. However, they also understand there may be good reasons why people may choose not to have content in their community moderated.

For example, maybe there are not enough resources to do so, or the notion of “free speech” is what your community attempts to embrace. Essentially, forfeiting moderation is giving up control, leaving platforms vulnerable, which is typically a turnoff for new users.

Real-World Content Moderation

Most platforms today, like Facebook and YouTube, leverage automated content moderation to block certain types of content. This typically involves the use of AI with the specific aim of improving content moderation efforts and enhancing the user experience on their platforms.

However, while AI can detect inappropriate content and take over the majority of the tedious work usually assigned to humans, it does have faults. Machines often miss important nuances when judging misinformation, bias, and hate speech, which means completely eliminating human moderators is not ideal.

In other words, automated moderation using AI is often paired with human-powered moderation to ensure platforms are safe for users and the brand is intact. Here are a few real-world examples:

Abusive content

Twitter (now X) faced severe criticism for not being able to efficiently respond to online harassment. However, since the Elon Musk takeover, the company has been leaning heavily on automation to moderate content. The result is tools such as Quality filter, which uses techniques such as natural language processing to limit the visibility of low-quality content.

Adult content

YouTube has a strict policy against nudity and sexual content; users who post such content risk having their channel terminated. The way YouTube catches such content is twofold: automated and reactive moderation.

In the automated approach, Google deploys algorithms that leverage techniques such as computer vision to check for nudity. If the content evades their algorithms, however, YouTube relies on viewers to flag content as inappropriate for it to be reviewed.

Fake and misleading content

Instagram relies on AI and human moderators to find, review, and take action on content that spreads misleading information – this was particularly evident during the Covid outbreak.

According to the Instagram Help Centre, “[Instagram’s] AI can detect and remove content that goes against our Community Guidelines before anyone reports it. At other times, our technology sends content to human review teams to take a closer look and make a decision on it.”


Conclusion

As we advance from the foundational theories of Turing and McCarthy to the practical applications of ChatGPT, the ethical landscape of AI has become increasingly complex. While AI technologies like ChatGPT offer transformative potential, they also pose ethical challenges, including content moderation and bias.

OpenAI's Moderation API is a significant stride toward responsible AI, but it's not a one-size-fits-all solution. Businesses and developers share the ethical responsibility to implement a multi-faceted approach to moderation, combining automated systems with human oversight. Real-world examples from platforms like Twitter and YouTube demonstrate the effectiveness of this hybrid model.

For those interested in diving deeper into responsible AI, we have several resources to help, such as a podcast on the future of responsible AI, a webinar on evaluating machine learning models in Python, and a course on AI ethics.

The ethical management of AI is a collective endeavor that requires ongoing vigilance from all stakeholders. As we push the boundaries of AI capabilities, let's also push for ethical integrity to ensure that technology serves as a force for good.

Kurtis Pykes