Skip to main content
HomeBlogArtificial Intelligence (AI)

Bard vs ChatGPT for Data Science

Bard is the latest text generation LLM AI, created by Google. In this article, find out how Bard compares to the current reigning solution for Data Science, ChatGPT.
Mar 2023  · 15 min read

Google's Bard large language model has improved since its launch earlier in 2023, recently adding 'real-time' responses similar to other models. 

The current leader in this space is OpenAI's ChatGPT, and I wanted to know how Bard would match up. Google invented the transformer models that power these generative AIs and has huge resources available to try and win back the lead. On the other hand, OpenAI has the advantage of several years of user feedback from its GPT-2, GPT-3, GPT-3.5, and GPT-4 models.

Recently, DataCamp released a ChatGPT Cheat Sheet for Data Science that provides over 60 examples of real-world uses of ChatGPT for data science. In the cheat sheet, ChatGPT (powered by GPT-3.5-turbo, OpenAI's second-best model) performs well in all the tasks.

While Google explicitly mentions that Bard is not yet optimized for coding tasks, in this article, we'll test selected data science tasks to show where Bard performs well—and where it doesn't—to see if the crown has changed hands. You can find a more general comparison of ChatGPT vs Google Bard in a separate article. 

Introduction to ChatGPT Course

Get Started with ChatGPT

Start Now

Programming Workflows

Where does Bard perform well?

Code optimization is a common but tricky task. There are often many possible ways to make code more efficient or run faster, and there's usually no single correct answer. A good response to a code optimization request should provide various solutions. The task asks Bard to optimize some SQL code.

image19.jpg

Bard gives 10 options here. They are all sensible, though some of them are variations on a theme (five types of creating an index), and all are centered on database management rather than improving the query code.

Writing unit tests is an important task that few people enjoy, so having AI do this for you is potentially a blessing. Of course, it has to write good tests! In this example, Bard is asked to write tests for a function that calculated factorials.

image24.jpg

The code is well structured, and most of the tests are sensible. One area that may be of concern is that the test for string inputs asks the function to return numeric values. A human programmer would probably write tests that raise an error when a string was inputted since the factorial of text is meaningless. The prompt didn't specify what the behavior of a string input should be, so this problem could likely be corrected with a more detailed prompt, however, I'd consider this response good but not perfect.

Where does Bard perform badly?

Perhaps the most widely used data science use case for ChatGPT is explaining errors in code. That means I expected Bard to ace a task where it was asked to explain a simple syntax error in some R code.

image15.jpg

Unfortunately, Bard refused to answer the question (and gave a similar response when asked to explain Python and SQL errors). This is disappointing. However, the silver lining is that refusing to answer is the best possible way to fail. A big problem with AI models is when they confidently spout wrong answers that can mislead you. Well done to Google for imbuing Bard with caution.

Bard gave a similar response when asked to explain some Python code.

image21.jpg

Verdict: ChatGPT wins

While some of Bard's answers to programming questions were decent, it has some very obvious gaps around code and error explanation, making it unsuitable for assisting you in learning about your code.

Data Analysis Workflows

Where does Bard perform well?

Using natural language (i.e., human sentences) to write SQL queries is a hugely important use case because it enables less technically savvy business users to get access to data in databases. Here, Bard is asked to write a simple query that aggregates customer details.

image22.jpg

The code and explanations are correct, and it goes a step further by sorting the rows from most common email type to least common, which is useful for whoever has to look at the results.

Common table expressions (CTEs) are SQL's equivalent of Python or R's variables: they let you build up longer queries in stages. Rewriting complex queries that contain nested subqueries is a useful trick to make your code more readable. In the following example, I wanted Bard to move the subquery (the inner code that starts with SELECT) to a common table expression.

image6.jpg

Here, Bard completely misunderstood the problem, and moved everything into a CTE. However, Bard has a trick up its sleeve. whenever you ask Bard to generate a response, it actually generates three responses and lets you pick one. Here, the third response is correct. 

image27.jpg

Another interesting use case for data analysis is cleaning data. It's not a fun task for humans, so it's helpful if AI can do it for you. However, there are often multiple ways of cleaning the data rather than there being a single solution. 

image17.jpg

In this case, the code is correct but has two notable quirks. Firstly, it assumes that customer_id is in the DataFrame index. While some pandas users consider this to be the best way to store data, the prompt did say that customer_id was a column. Secondly, the .replace()/.isnull() combination is clunkier than using .fillna(). Again, Bard's "three solutions" approach comes to its rescue; the second solution addresses both of my complaints.

image3.jpg

Where does Bard perform badly?

As an education company, at DataCamp, we have an almost unlimited need for new datasets to let users perform analysis on. That means that having AI be able to generate datasets instantly is a game-changing feature. For everyone else, being able to generate datasets to test their analysis code against quickly is a big benefit. ChatGPT does this very well, but let's see how Bard performs.

image5.jpg

Again, it refuses to answer the question. As with the programming examples, I'm impressed with the restraint the AI shows but a little disappointed that it can't complete the task.

Since a large part of data analysis is data manipulation, it would be great if AI could write your code for you. The following example is slightly tricky because it requires the AI to understand that the year needs to be calculated from the join date.

image10.jpg

Unfortunately, this subtlety wasn't picked up on, and the code is wrong. In fact, all three of the generated solutions are wrong. This is a bad case of being wrong, too: the AI is confidently wrong in its response.

Verdict: ChatGPT wins

I love that Bard provides three solutions, so you have a better chance of getting something suitable. However, the inability to generate datasets and perform (slightly tricky) data manipulations is a blocker.

Data Visualization Workflows

Where does Bard perform well?

I tried a simple "create a scatter plot" task with a twist: changing the color theme of the plot. The news site FiveThirtyEight publishes its own color theme, which is available in the popular Matplotlib package.

image4.jpg

The code is correct, though it has a quirk that mcolors is imported but never used.

Where does Bard perform badly?

Another fairly straightforward plotting task is to draw a line plot with multiple lines on it. The dataset used is called gapminder, and it's commonly used for demonstration purposes. 

image14.jpg

Here, the code is almost correct but includes a similar problem as before: Bard is unsure whether country should be a column or an index (even though it is stated in the prompt). The row subsetting in the plt.plot() line thinks country is in the index, which is inconsistent with the previous line.

I asked Bard to perform the same task using R's ggplot2 package. 

image12.jpg

Bard politely refuses here. I'm slightly surprised at this since ggplot2 has been the most popular plotting package in R for over a decade, and there are lots of code examples available.

Verdict: ChatGPT wins

I might forgive Bard's confusion around using pandas indexes, but the inability to draw a simple ggplot2 plot renders it unsuitable for R users.

Machine Learning Workflows

Where does Bard perform well?

I tried a few tasks, and unfortunately, all of them were problematic. This is a major red flag for Bard since machine learning in Python is wildly popular.

Where does Bard perform badly?

Perhaps the most obvious place to start for machine learning is training a model and making predictions in Python. 

image7.jpg

The code is mostly good, but there's a problem with how the training and testing sets are created. It's good practice to randomly select rows to go into one set or the other, but Bard just picks the last thousand rows to go into the testing set (and puts everything else in training. That would only be OK if the movies in the dataset were randomly ordered.

Another big area of interest is model explainability. Decision trees are fairly easy to explain using Shap values, and there's a fairly standard workflow for doing this. 

image26.jpg

Unfortunately, the code includes a call to shap.explain(), a function that doesn't exist. This sort of problem is fairly easy to spot because you can try to run the code, and it will fail. However, generating non-working code is a waste of your time.

I also wanted to try a more conceptual machine-learning task. Much of the art of machine learning is in using domain knowledge to create meaningful features to input to your models. Coming up with ideas for features can be tricky, so it's an area where AI can be very helpful. 

image11.jpg

Here, the response isn't terrible since it does come up with two useful ideas for features (genre and budget). However, the other two ideas were just those specified in the prompt, and the second list wasn't requested. It also refers to "data engineering techniques" rather than "machine learning techniques," which is incorrect.

Verdict: ChatGPT wins

Bard performed very badly here; it's no contest at all.

Time Series Analysis Workflows

Where does Bard perform well?

Unfortunately, I couldn't persuade Bard to complete any time series tasks.

Where does Bard perform badly?

I tried data manipulation tasks for time series involving changing daily data into weekly and monthly data. 

image25.jpg

Bard refused.

Next, I tried to get it to perform a test for stationarity. 

image8.jpg

Again, Bard refused. As before, I'm pleased with the restraint built into the AI to Bard to reduce the number of wrong answers it generates.

Verdict: ChatGPT wins

It looks like time series analysis isn't a thing that Bard can do yet.

Natural Language Processing Workflows

Where does Bard perform well?

You might imagine that large language models would work well with language. Sentiment analysis (determining whether text has a positive or negative vibe) is important for quantifying product reviews on shopping sites and is considered table stakes for text analysis. This task was made slightly tricky by including an emoji and a slang phrase. 

image1.jpg

Bard's first response is OK, but it missed the idea that each line in the prompt should be considered separately. (In fairness, this wasn't made explicit, though a human might have intuitively assumed that.) Bard's third response did pick up on this detail. 

image20.jpg

In both cases, the emoji and slang were correctly interpreted.

Where does Bard perform badly?

Translating text from one human language to another is the original use case for transformer models. I asked Bard to provide a response in French. 

image9.jpg

Unfortunately, Bard provided an English explanation of machine learning and the translation of the phrase "machine learning' into French rather than providing the whole output in French.

Verdict: ChatGPT wins

The translation failure was a big surprise to me.

Conceptual and Career-Oriented Tasks

Where does Bard perform well?

Explaining technical ideas from your analysis to people without a data background is one of the harder parts of data science. I asked Bard to explain A/B testing to an executive. 

image2.jpg

The explanation is pretty good. There's a slightly dubious assumption that the new font resulted in a higher conversion rate (it depends on how the test was set up; it could just be a different conversion rate). The other two responses did pick up on this subtlety, so I won't dock points here.

Another useful AI task is to summarize longer works. Here, I asked Bard to summarize the original journal article about transformer models (written by Google employees). 

image18.jpg

It's a good, easy-to-read summary. One quirk is that some links are included to blog articles about the paper rather than to the original paper itself.

For people wanting a career in data, getting ideas on how to get that dream job is vital career advice. I asked Bard to come up with project ideas. This is another creative task where there is no single right answer. 

image13.jpg

Finally, providing introductions to longer pieces of work is one of those slightly annoying tasks that I'd prefer to outsource to AI. 

image23.jpg

Again, with only minimal context, the AI provides a well-written piece.

Where does Bard perform badly?

In this case, Bard performed well on all tasks I threw at it.

Verdict: It's a tie

While it's difficult to quantify whether ChatGPT or Bard is better for these creative tasks, I'm pleased to have found one areas where Bard consistently provided high-quality output.

Summary

It's very early days for Bard, having only been released to the public for less than 48 hours. There are some notable gaps in its knowledge of data science workflows: being unable to explain code errors and performing badly at Python machine learning tasks are big issues.

I love that you get three responses to choose the best one, and I love that it is restrained enough to know its limits. Refusing to answer a question is much preferable to giving a wrong answer.

Ultimately, ChatGPT reigns supreme for now, but I eagerly await watching Bard improve. Competition in the space of generative AI can only benefit the world.

Take it to the next level

If you want to learn more about using generative AI for data science, check out the following resources: 

Introduction to ChatGPT Course

Get Started with ChatGPT

Topics
Related

LLMOps Essentials: A Practical Guide to Operationalizing Large Language Models

Explore the essentials of LLMOps with our guide on operationalizing Large Language Models for efficiency and reliability in AI applications.
Andrea Valenzuela's photo

Andrea Valenzuela

15 min

How AI is Changing Cybersecurity with Brian Murphy, CEO of ReliaQuest

Adel and Brian explore the evolution of cybersecurity tools, the challenges faced by cybersecurity teams, how generative AI can be used both defensively and offensively in cybersecurity, and much more. 
Adel Nehme's photo

Adel Nehme

32 min

What is Google Gemini? Everything You Need To Know About Google’s ChatGPT Rival

Gemini defines a family of multimodal LLMs capable of understanding texts, images, videos, and audio. It’s also said to be capable of performing complex tasks in math and physics, as well as being able to generate high-quality code in several programming languages.
Kurtis Pykes 's photo

Kurtis Pykes

9 min

A Comprehensive Guide to Databricks Lakehouse AI For Data Scientists

This tutorial dives into the Databricks approach to AI & Machine Learning in the Databricks Lakehouse and introduces its latest features.
Arunn Thevapalan's photo

Arunn Thevapalan

12 min

How to Build a Chatbot Using the OpenAI API & Pinecone

Step-by-step guide for building LLM-powered Chatbot on your own custom data, leveraging RAG techniques using OpenAI and Pinecone in Python.
Moez Ali's photo

Moez Ali

12 min

An Introduction to LMQL: The Bridge Between SQL and Large Language Models

Discover everything you need to know about LMQL, short for Language Models Query Language, an innovative programming language for LLMs.
Kurtis Pykes 's photo

Kurtis Pykes

12 min

See MoreSee More