Bard vs ChatGPT for Data Science
Google's Bard large language model has improved since its launch in 2023, recently adding "real-time" responses similar to other models.
The current leader in this space is OpenAI's ChatGPT, and I wanted to know how Bard would match up. Google invented the transformer models that power these generative AIs and has huge resources available to try to win back the lead. On the other hand, OpenAI has the advantage of several years of user feedback from its GPT-2, GPT-3, GPT-3.5, and GPT-4 models.
Recently, DataCamp released a ChatGPT Cheat Sheet for Data Science that provides over 60 examples of real-world uses of ChatGPT for data science. In the cheat sheet, ChatGPT performs well in all the tasks.
While Google explicitly mentions that Bard is not yet optimized for coding tasks, in this article, we'll test selected data science tasks to show where Bard performs well—and where it doesn't—to see if the crown has changed hands. You can find a more general comparison of ChatGPT vs Google Bard in a separate article.
Programming Workflows
Where does Bard perform well?
Code optimization is a common but tricky task. There are often many possible ways to make code more efficient or run faster, and there's usually no single correct answer. A good response to a code optimization request should provide various solutions. The task asks Bard to optimize some SQL code.
Bard gives 10 options here. They are all sensible, though some are variations on a theme (five of them involve creating an index), and all are centered on database management rather than improving the query code itself.
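If you haven't worked with indexes before, here's a minimal sketch of the kind of suggestion Bard made. The `orders` table and its columns are hypothetical, not taken from the original prompt:

```python
import sqlite3

# Illustrative sketch only: the orders table and its columns are
# hypothetical, not taken from the prompt Bard was given.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)")

# An index on the filtered column lets the engine seek matching rows
# directly instead of scanning the whole table.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# The query planner should report that the index is used for this lookup.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT total FROM orders WHERE customer_id = 42"
).fetchall()
print(plan)
```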
Writing unit tests is an important task that few people enjoy, so having AI do this for you is potentially a blessing. Of course, it has to write good tests! In this example, Bard is asked to write tests for a function that calculates factorials.
The code is well structured, and most of the tests are sensible. One area of concern is that the test for string inputs expects the function to return numeric values. A human programmer would probably write tests that expect an error to be raised when a string is passed in, since the factorial of text is meaningless. The prompt didn't specify what the behavior for a string input should be, so this problem could likely be corrected with a more detailed prompt. Overall, I'd consider this response good but not perfect.
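To make the critique concrete, here's roughly what a human-written test suite might look like, using pytest. The `factorial` implementation and its error-raising behavior are assumptions for illustration, not Bard's actual output:

```python
import math
import pytest

def factorial(n):
    """Stand-in for the function under test; Bard's original isn't shown."""
    if not isinstance(n, int):
        raise TypeError("factorial() requires an integer")
    if n < 0:
        raise ValueError("factorial() is undefined for negative numbers")
    return math.factorial(n)

def test_base_cases():
    assert factorial(0) == 1
    assert factorial(1) == 1

def test_typical_value():
    assert factorial(5) == 120

def test_negative_input_raises():
    with pytest.raises(ValueError):
        factorial(-3)

def test_string_input_raises():
    # The behavior a human tester would expect: text has no factorial.
    with pytest.raises(TypeError):
        factorial("five")
```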
Where does Bard perform badly?
Perhaps the most widely used data science use case for ChatGPT is explaining errors in code. That means I expected Bard to ace a task where it was asked to explain a simple syntax error in some R code.
Unfortunately, Bard refused to answer the question (and gave a similar response when asked to explain Python and SQL errors). This is disappointing. However, the silver lining is that refusing to answer is the best possible way to fail. A big problem with AI models is when they confidently spout wrong answers that can mislead you. Well done to Google for imbuing Bard with caution.
Bard gave a similar response when asked to explain some Python code.
Verdict: ChatGPT wins
While some of Bard's answers to programming questions were decent, it has some very obvious gaps around code and error explanation, making it unsuitable for assisting you in learning about your code.
Data Analysis Workflows
Where does Bard perform well?
Using natural language (i.e., human sentences) to write SQL queries is a hugely important use case because it enables less technically savvy business users to get access to data in databases. Here, Bard is asked to write a simple query that aggregates customer details.
The code and explanations are correct, and it goes a step further by sorting the rows from most common email type to least common, which is useful for whoever has to look at the results.
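The exact prompt and output aren't reproduced here, but the query was along these lines. This sketch uses a hypothetical `customers` table in SQLite and extracts the email domain with string functions:

```python
import sqlite3

# Hypothetical customers table; the real prompt and schema aren't shown.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, email TEXT);
    INSERT INTO customers VALUES
        (1, 'ann@gmail.com'), (2, 'bob@yahoo.com'), (3, 'cat@gmail.com');
""")

query = """
    SELECT substr(email, instr(email, '@') + 1) AS email_domain,
           COUNT(*) AS n_customers
    FROM customers
    GROUP BY email_domain
    ORDER BY n_customers DESC  -- most common domain first, as Bard did
"""
for row in conn.execute(query):
    print(row)  # ('gmail.com', 2) then ('yahoo.com', 1)
```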
Common table expressions (CTEs) are SQL's equivalent of Python or R's variables: they let you build up longer queries in stages. Rewriting complex queries that contain nested subqueries is a useful trick to make your code more readable. In the following example, I wanted Bard to move the subquery (the inner code that starts with SELECT) to a common table expression.
Here, Bard completely misunderstood the problem and moved everything into a CTE. However, Bard has a trick up its sleeve: whenever you ask it to generate a response, it actually generates three responses and lets you pick one. Here, the third response is correct.
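For anyone unfamiliar with the transformation, here's a generic before-and-after sketch. The tables and query are invented stand-ins, not the ones from my prompt:

```python
import sqlite3

# Invented tables purely to demonstrate the rewrite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (1, 10), (1, 5), (2, 7);
""")

nested = """
    SELECT name, total_spend
    FROM (
        SELECT customer_id, SUM(amount) AS total_spend  -- nested subquery
        FROM orders
        GROUP BY customer_id
    ) AS spend
    JOIN customers ON customers.id = spend.customer_id
"""

with_cte = """
    WITH spend AS (  -- the subquery, lifted into a named CTE
        SELECT customer_id, SUM(amount) AS total_spend
        FROM orders
        GROUP BY customer_id
    )
    SELECT name, total_spend
    FROM spend
    JOIN customers ON customers.id = spend.customer_id
"""

# The two forms return the same rows; only the readability changes.
assert sorted(conn.execute(nested)) == sorted(conn.execute(with_cte))
```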
Another interesting use case for data analysis is cleaning data. It's not a fun task for humans, so it's helpful if AI can do it for you. However, there are often multiple ways of cleaning the data rather than there being a single solution.
In this case, the code is correct but has two notable quirks. Firstly, it assumes that `customer_id` is in the DataFrame index. While some pandas users consider this to be the best way to store data, the prompt did say that `customer_id` was a column. Secondly, the `.replace()`/`.isnull()` combination is clunkier than using `.fillna()`. Again, Bard's "three solutions" approach comes to its rescue; the second solution addresses both of my complaints.
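To illustrate the second complaint, here's a minimal example, assuming made-up data in the shape the prompt described. A single `.fillna()` call does the whole job:

```python
import numpy as np
import pandas as pd

# Made-up data in the shape the prompt described: customer_id is a
# regular column, not the index.
df = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "n_purchases": [3, np.nan, 7],
})

# One call replaces the clunkier .replace()/.isnull() combination.
df["n_purchases"] = df["n_purchases"].fillna(0)
print(df)
```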
Where does Bard perform badly?
As an education company, at DataCamp we have an almost unlimited need for new datasets for users to perform analysis on. That means that having AI generate datasets instantly is a game-changing feature. For everyone else, being able to quickly generate datasets to test analysis code against is a big benefit. ChatGPT does this very well, but let's see how Bard performs.
Again, it refuses to answer the question. As with the programming examples, I'm impressed with the restraint the AI shows but a little disappointed that it can't complete the task.
Since a large part of data analysis is data manipulation, it would be great if AI could write your code for you. The following example is slightly tricky because it requires the AI to understand that the year needs to be calculated from the join date.
Unfortunately, this subtlety wasn't picked up on, and the code is wrong. In fact, all three of the generated solutions are wrong. This is a bad case of being wrong, too: the AI is confidently wrong in its response.
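For reference, here's one way the calculation could be done correctly in pandas. The column names are guesses at the shape of the task, not the actual prompt data:

```python
import pandas as pd

# The column names here are guesses at the shape of the task.
customers = pd.DataFrame({
    "customer_id": [1, 2],
    "join_date": pd.to_datetime(["2019-06-01", "2022-01-15"]),
})

# The subtlety Bard missed: the membership length must be derived
# from join_date rather than read from an existing column.
today = pd.Timestamp("2023-06-01")  # fixed so the example is reproducible
customers["years_as_member"] = (today - customers["join_date"]).dt.days // 365
print(customers)
```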
Verdict: ChatGPT wins
I love that Bard provides three solutions, so you have a better chance of getting something suitable. However, the inability to generate datasets and perform (slightly tricky) data manipulations is a blocker.
Data Visualization Workflows
Where does Bard perform well?
I tried a simple "create a scatter plot" task with a twist: changing the color theme of the plot. The news site FiveThirtyEight publishes its own color theme, which is available in the popular Matplotlib package.
The code is correct, though it has a quirk: `mcolors` is imported but never used.
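For comparison, here's a cleaned-up version of the task without the stray import. The data is synthetic; Matplotlib ships the theme as the named style "fivethirtyeight":

```python
import matplotlib.pyplot as plt
import numpy as np

# Matplotlib ships the theme as a named style, so no extra color
# imports are needed.
plt.style.use("fivethirtyeight")

rng = np.random.default_rng(42)  # synthetic data for illustration
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)

plt.scatter(x, y)
plt.xlabel("x")
plt.ylabel("y")
plt.title("Scatter plot in the FiveThirtyEight theme")
plt.show()
```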
Where does Bard perform badly?
Another fairly straightforward plotting task is to draw a line plot with multiple lines on it. The dataset used is called gapminder, and it's commonly used for demonstration purposes.
Here, the code is almost correct but includes a similar problem as before: Bard is unsure whether `country` should be a column or an index (even though it is stated in the prompt). The row subsetting in the `plt.plot()` line thinks `country` is in the index, which is inconsistent with the previous line.
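A consistent version of the task looks something like this, with a toy stand-in for the gapminder data and `country` treated as a column throughout:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for the gapminder data, with country kept as a column
# throughout, exactly as the prompt stated.
gapminder = pd.DataFrame({
    "country": ["France"] * 3 + ["Japan"] * 3,
    "year": [1997, 2002, 2007] * 2,
    "lifeExp": [78.6, 79.6, 80.7, 80.7, 82.0, 82.6],
})

for country, group in gapminder.groupby("country"):
    plt.plot(group["year"], group["lifeExp"], label=country)

plt.xlabel("Year")
plt.ylabel("Life expectancy")
plt.legend()
plt.show()
```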
I asked Bard to perform the same task using R's `ggplot2` package.
Bard politely refuses here. I'm slightly surprised at this since ggplot2 has been the most popular plotting package in R for over a decade, and there are lots of code examples available.
Verdict: ChatGPT wins
I might forgive Bard's confusion around using pandas indexes, but the inability to draw a simple ggplot2 plot renders it unsuitable for R users.
Machine Learning Workflows
Where does Bard perform well?
I tried a few tasks, and unfortunately, all of them were problematic. This is a major red flag for Bard since machine learning in Python is wildly popular.
Where does Bard perform badly?
Perhaps the most obvious place to start for machine learning is training a model and making predictions in Python.
The code is mostly good, but there's a problem with how the training and testing sets are created. It's good practice to randomly select rows to go into one set or the other, but Bard just picks the last thousand rows to go into the testing set (and puts everything else in training). That would only be OK if the movies in the dataset were randomly ordered.
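For comparison, scikit-learn's `train_test_split` shuffles rows by default, so the split doesn't depend on row order. The movie DataFrame below is a dummy stand-in for the real dataset:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Dummy stand-in for the movie dataset; the real features aren't shown.
movies = pd.DataFrame({"budget": range(5000), "rating": range(5000)})
target = movies.pop("rating")

# Rows are shuffled by default, so the 1,000-row test set is a random
# sample rather than whatever happens to come last in the file.
X_train, X_test, y_train, y_test = train_test_split(
    movies, target, test_size=1000, random_state=42
)
print(len(X_train), len(X_test))  # 4000 1000
```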
Another big area of interest is model explainability. Decision trees are fairly easy to explain using SHAP values, and there's a fairly standard workflow for doing this.
Unfortunately, the code includes a call to `shap.explain()`, a function that doesn't exist. This sort of problem is fairly easy to spot because you can try to run the code, and it will fail. However, generating non-working code is a waste of your time.
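For reference, the standard workflow uses `shap.TreeExplainer` rather than the nonexistent `shap.explain()`. Here's a minimal sketch on a public scikit-learn dataset standing in for the original task:

```python
import shap
from sklearn.datasets import load_diabetes
from sklearn.tree import DecisionTreeRegressor

# A public dataset stands in for the one used in the prompt.
X, y = load_diabetes(return_X_y=True, as_frame=True)
model = DecisionTreeRegressor(max_depth=3).fit(X, y)

# TreeExplainer is the standard entry point for tree models.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Summary plot of which features drive the model's predictions.
shap.summary_plot(shap_values, X)
```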
I also wanted to try a more conceptual machine-learning task. Much of the art of machine learning is in using domain knowledge to create meaningful features to input to your models. Coming up with ideas for features can be tricky, so it's an area where AI can be very helpful.
Here, the response isn't terrible since it does come up with two useful ideas for features (genre and budget). However, the other two ideas were just those specified in the prompt, and the second list wasn't requested. It also refers to "data engineering techniques" rather than "feature engineering techniques," which is incorrect.
Verdict: ChatGPT wins
Bard performed very badly here; it's no contest at all.
Time Series Analysis Workflows
Where does Bard perform well?
Unfortunately, I couldn't persuade Bard to complete any time series tasks.
Where does Bard perform badly?
I tried data manipulation tasks for time series involving changing daily data into weekly and monthly data.
Bard refused.
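For reference, the task Bard refused is a one-liner per frequency in pandas, shown here on synthetic daily data standing in for the series from my prompt:

```python
import numpy as np
import pandas as pd

# Synthetic daily data standing in for the series in my prompt.
daily = pd.Series(
    np.random.default_rng(0).integers(0, 100, size=90),
    index=pd.date_range("2023-01-01", periods=90, freq="D"),
)

weekly = daily.resample("W").sum()   # daily -> weekly totals
monthly = daily.resample("M").sum()  # daily -> monthly totals
print(weekly.head())
print(monthly)
```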
Next, I tried to get it to perform a test for stationarity.
Again, Bard refused. As before, I'm pleased with the restraint built into Bard to reduce the number of wrong answers it generates.
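Again for reference, the usual approach is an augmented Dickey-Fuller test from statsmodels, sketched here on synthetic data rather than the series from my prompt:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# White noise standing in for the series from my prompt; it should
# look stationary to the test.
series = np.random.default_rng(1).normal(size=200)

statistic, p_value, *_ = adfuller(series)
print(f"ADF statistic: {statistic:.2f}, p-value: {p_value:.3f}")
# A small p-value rejects the unit-root null, suggesting stationarity.
```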
Verdict: ChatGPT wins
It looks like time series analysis isn't a thing that Bard can do yet.
Natural Language Processing Workflows
Where does Bard perform well?
You might imagine that large language models would work well with language. Sentiment analysis (determining whether text has a positive or negative vibe) is important for quantifying product reviews on shopping sites and is considered table stakes for text analysis. This task was made slightly tricky by including an emoji and a slang phrase.
Bard's first response is OK, but it missed the idea that each line in the prompt should be considered separately. (In fairness, this wasn't made explicit, though a human might have intuitively assumed that.) Bard's third response did pick up on this detail.
In both cases, the emoji and slang were correctly interpreted.
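As a point of comparison, here's what a conventional (non-LLM) baseline for the same task looks like, scoring each line separately with NLTK's VADER analyzer. The review texts are invented; VADER handles some slang and emoticons, though its emoji coverage is patchier than an LLM's:

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)

# Score each line separately, which is what the prompt implicitly wanted.
sia = SentimentIntensityAnalyzer()
reviews = [
    "This product is amazing!",
    "Total waste of money, ugh.",
]
for review in reviews:
    scores = sia.polarity_scores(review)
    print(review, "->", scores["compound"])  # +1 positive, -1 negative
```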
Where does Bard perform badly?
Translating text from one human language to another is the original use case for transformer models. I asked Bard to provide a response in French.
Unfortunately, Bard provided an English explanation of machine learning and a French translation of the phrase "machine learning" rather than providing the whole output in French.
Verdict: ChatGPT wins
The translation failure was a big surprise to me.
Conceptual and Career-Oriented Tasks
Where does Bard perform well?
Explaining technical ideas from your analysis to people without a data background is one of the harder parts of data science. I asked Bard to explain A/B testing to an executive.
The explanation is pretty good. There's a slightly dubious assumption that the new font resulted in a higher conversion rate (it depends on how the test was set up; it could just be a different conversion rate). The other two responses did pick up on this subtlety, so I won't dock points here.
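To make that caveat concrete, deciding whether a conversion rate actually changed is usually done with a two-proportion test. The visitor and conversion counts below are invented for illustration:

```python
from statsmodels.stats.proportion import proportions_ztest

# Invented numbers: 150 of 1,000 visitors converted with the new font,
# 120 of 1,000 with the old one.
conversions = [150, 120]
visitors = [1000, 1000]

stat, p_value = proportions_ztest(conversions, visitors)
print(f"z = {stat:.2f}, p = {p_value:.3f}")
# A small p-value says the rate *differs*; whether it's higher still
# needs a look at the direction, which is the caveat above.
```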
Another useful AI task is to summarize longer works. Here, I asked Bard to summarize the original journal article about transformer models (written by Google employees).
It's a good, easy-to-read summary. One quirk is that some links are included to blog articles about the paper rather than to the original paper itself.
For people wanting a career in data, getting ideas on how to get that dream job is vital career advice. I asked Bard to come up with project ideas. This is another creative task where there is no single right answer.
Finally, providing introductions to longer pieces of work is one of those slightly annoying tasks that I'd prefer to outsource to AI.
Again, with only minimal context, the AI provides a well-written piece.
Where does Bard perform badly?
In this case, Bard performed well on all tasks I threw at it.
Verdict: It's a tie
While it's difficult to quantify whether ChatGPT or Bard is better for these creative tasks, I'm pleased to have found one area where Bard consistently provided high-quality output.
Summary
There are some notable gaps in Bard's knowledge of data science workflows: being unable to explain code errors and performing badly at Python machine learning tasks are big issues.
I love that you get three responses to choose from, and I love that Bard is restrained enough to know its limits. Refusing to answer a question is much preferable to giving a wrong answer.
Ultimately, ChatGPT reigns supreme for now, but I eagerly await watching Bard improve. Competition in the space of generative AI can only benefit the world.
Take it to the next level
If you want to learn more about using generative AI for data science, check out the following resources:
- Learn how to use ChatGPT effectively in the Introduction to ChatGPT course.
- Check out our Guide to Using ChatGPT For Data Science Projects.
- Download this handy reference cheat sheet of ChatGPT prompts for data science.
- Listen to this podcast episode on How ChatGPT and GPT-3 Are Augmenting Workflows to understand how ChatGPT can benefit your business.
- Learn more about What is GPT-4 and Why Does it Matter?