Kaggle Competitions: The Complete Guide
Table of Contents
- What are Kaggle Competitions?
- Is Participating In Kaggle Competitions Worth It?
- When Should You Participate In A Data Science Competition?
- What Are DataCamp Competitions?
- How Do You Find The Right Kaggle Competition For Your Level? (From Beginners To Advanced Skills)
- Kaggle Competition Tips
- Choose a programming language
- Participate in Competitions tagged with "Getting Started"
- Do not use Kaggle exclusively
- Focus on learning
- Study other public notebooks
- Read the competition rules
- Share your solutions
- Take this course in DataCamp
- What Are Kaggle Rankings?
- Kaggle Medals
- Competition Medals
- Dataset Medals
- Notebook Medals
- Discussion Medals
- Performance Tiers
- Novice
- Contributor
- Expert
- Master
- Grandmaster
- How Long Does It Take To Move From Contributor To Expert?
- Kaggle Ranking Tips
- Conclusion
What are Kaggle Competitions?
Now that we’ve covered Kaggle basics like notebooks and datasets, we can touch upon the most frequently asked questions about Kaggle competitions: Who is organizing them? Can I compete? Why should I compete?
That last question is the key. Kaggle competitions are challenging machine learning tasks organized by Kaggle itself or by large companies, organizations, and universities. Participants compete with other data scientists on the platform by building machine learning models and submitting the predictions those models make. After each submission, a score that reflects how well the model performs is calculated automatically.
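To make the idea of an automatically calculated score concrete, here is a minimal Python sketch. It assumes a classification task scored by accuracy; every competition defines its own metric, so the `accuracy` function and the labels below are purely illustrative and not Kaggle's actual scoring code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one label")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical hidden labels and a hypothetical submission for five test rows.
true_labels = [1, 0, 0, 1, 1]
submission = [1, 0, 1, 1, 0]

score = accuracy(true_labels, submission)
print(f"accuracy = {score:.2f}")
```

In a real competition the true labels stay hidden on the platform, and only the resulting score is shown on the leaderboard.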
It is a common misconception that it is necessary to take probability and statistics courses or to have a deep understanding of certain machine learning libraries before taking part in these competitions. The truth is that you can, and should, participate in Kaggle Competitions regardless of your level. Competitions are not exclusive to experts, and everybody can gain valuable experience from them and even leverage these to build a data science portfolio.
Is Participating In Kaggle Competitions Worth It?
- No matter how experienced you are in data science, you can improve your skills by participating in competitions in this continuously growing and developing field. These data science competitions will challenge you within your own capabilities. The more time and effort you put into Kaggle or DataCamp data science competitions, the quicker you will get comfortable with the libraries and programming languages that you use.
- You will earn your ranking amongst data scientists of all levels worldwide.
- You will have the opportunity to review the winning solutions, and have access to different approaches to the same problem. This will make it easier for you to analyze the challenging tasks from different perspectives.
- You will be provided with top-quality datasets. This will help you to focus entirely on the solution because you will not need to be thinking about cleaning the data, gathering related data, or creating a consistent and well-tagged dataset yourself.
- You will have the chance to participate in a technical discussion with the winners of the competitions and other top-level data scientists. This will help expand your network.
- Working on real problems will motivate you and give you insight on the day-to-day work and responsibilities of a data scientist.
- There is a clear financial incentive.
Participating in Kaggle or DataCamp competitions is definitely worth it. Regardless of your level of expertise, you will surely find at least one of the listed benefits relevant for you. There is a broad variety of data science competitions, and new competitions are published regularly. Even if you are not interested in them right now, it is recommended to follow the competitions that are published as at least one will likely become relevant to you at some point in the future.
When Should You Participate In A Data Science Competition?
Before entering a competition, take the following three criteria into account:
- Will the amount of time and work spent on this competition be balanced with the improvement that you can gain from it?
- What is the financial incentive behind a win, and is it worth your time?
- Will the competition work, research, and contribution be satisfying to you as a data professional?
Some data scientists are generally happy to participate in any competition just to expand their network and practice. However, most people need to evaluate the criteria above in order to decide whether a competition will be worth it or not. You will also need to find your own criteria and consider them before joining a competition, always taking into account that the more competitions you participate in, the more benefits and learning experiences you can get from Kaggle.
What Are DataCamp Competitions?
DataCamp Competitions and Kaggle Competitions have many similarities. Just like in Kaggle, in DataCamp you have the chance to examine the publicly shared notebooks, and DataCamp competitions also have prizes. If you rank in the competitions, you win a 1-year premium DataCamp subscription and you can win cash prizes as well. DataCamp also has a very similar environment to Kaggle in which you will be able to come together with other data scientists of all levels through the discussion pages.
However, there are a couple of important differences between Kaggle and DataCamp competitions. On the one hand, Kaggle competitions focus more on machine learning, while DataCamp focuses on testing your analytical, storytelling, and visualization skills in a broader context. On the other hand, you have a bigger chance of winning DataCamp competitions as it is an up-and-coming platform with relatively few participants.
In any case, you should never participate in the competitions with the sole intention of winning. You will get much more out of them if you focus on progressing instead, and in this regard, DataCamp also offers great possibilities for improvement which you can easily take advantage of.
How Do You Find The Right Kaggle Competition For Your Level? (From Beginners To Advanced Skills)
Kaggle lets you search competitions by title or keyword, so it's easy to find those that interest you the most: just enter the title or keywords in the search bar.
Figure 5.1: Competition Searching
You can also filter them by competition tags.
Figure 5.2: Competition Filtering
Apart from titles, keywords and tags, there are three other main filters: “Status”, “Prizes and Awards”, and “Categories”.
Prizes and Awards:
- Monetary: Contests with this tag are usually shared by big, well-known companies. If you rank in these competitions, you will win a cash prize. The lowest prizes are between $5,000 and $10,000. Competitions with prizes between $50,000 and $100,000 are the most common. The biggest prizes go up to 1 million dollars.
- Medals: You get Kaggle medals as a reward, depending on the score you reach in the competition. With these medals, your ranking increases. Kaggle Ranking is explained in detail in section 6.
- Other: These competitions reward participants with Kaggle merchandise, like t-shirts or stickers.
Categories:
- Featured: These competitions are usually published by large companies, organizations and even governments. Their cash prizes are much bigger than those offered in other categories.
- Research: These are research-themed competitions. There is little or no prize money.
- Getting Started: These don't include any rewards. They are generally competitions created for educational purposes. At the end of this section, you will find a sample competition with the tag “Getting Started”. You will see not only the sample, but also a tutorial on how to use a notebook and how to submit the results, among other relevant steps.
- Playground: These are suitable competitions for those who want to gain some experience and continue to improve their skills. The prizes are usually Kaggle merchandise (like t-shirts and stickers). These competitions are often fun and gamified.
- InClass: These are competitions that are usually hosted by universities and their participants are their machine learning students. Their objective is to engage and inspire these students.
- Analytics: These are data analysis competitions.
- Simulations: Unlike the traditional supervised machine learning challenges on Kaggle, these competitions are built around reinforcement learning tasks. Competitors develop models and let them compete in a simulated environment.
Besides the main filters you also have some others that allow you to sort the competitions by additional parameters, like “Hotness”, “Recently Launched”, “Closing Soon”, “Reward”, and “Total Teams”.
Figure 5.3: Competition Sorting
Kaggle Competition Tips
1. Choose a programming language
Python and R are the most frequently used programming languages in the field of data science, especially when it comes to visualizations and machine learning tasks. If you’re wondering about MATLAB, you can use it for data science tasks on your local computer, but Kaggle notebooks only support Python, R and Julia.
If you use a different programming language, some tasks may be easier for you, but what makes a programming language powerful is the community and the open-source library support behind it. Another advantage of the more popular languages is that any library you can import on your local computer, you will also be able to import in Kaggle.
If you are a beginner, Python and R are both great places to start, and you can stick with either one as you grow in the field. Bear in mind that many notebooks shared on Kaggle are written in Python, so if you need to study them, you will be able to do it comfortably if you know the language.
2. Participate in Competitions tagged with "Getting Started"
Getting Started is perfect for beginners. If you complete these competitions and review the notebooks shared by others, you will learn a lot, and relatively quickly.
Here are some Getting Started competitions we recommend depending on your knowledge:
- If you are familiar with Classification algorithms at a basic level, you should try out Titanic.
- If you have some experience with regression, you can take a look at the House Prices Advanced Regression Techniques competition.
- If you are interested in the field of computer vision, you should participate in the Digit Recognizer.
- If you have some experience with image processing, we recommend Facial Keypoints Detection.
- If you are interested in natural language processing, check out Bag of Words Meets Bags of Popcorn.
3. Do not use Kaggle exclusively
Once you try out one of the basic level competitions listed above, you will have a bit more experience and it may be good for you to look for competitions on different platforms.
On DataCamp, you can participate in data science competitions such as “Designing a promo strategy for a drinks company”, which will require you to do some research and go the extra mile within your capabilities. What makes this competition special is that it is not focused only on machine learning, but also challenges participants to significantly improve their analysis, storytelling, and visualization skills. You can view the shared notebooks for this competition in the entries tab.
In order to maximize your chances of success and make the most of this competition, it is recommended that you complete the following courses first:
- Exploratory Data Analysis in Python
- Statistical Thinking in Python
- Data Manipulation with Pandas
- Customer Segmentation in Python
- Cluster Analysis in Python
4. Focus on learning
The prize money in Kaggle Competitions is significant. This may tempt you to deviate from your goal of learning. Do not focus on the prize, but prioritize learning and improving. Once you get enough experience, you will have time to think about how to get to the top position.
5. Study other public notebooks
Studying the notebooks shared in the competitions will help you learn different ways of solving the same problem.
6. Read the competition rules
Read the competition information and rules before you decide to participate, and make sure that you fully understand them before joining the competition.
7. Share your solutions
Sharing your solutions will increase your interaction with other data scientists, and you’ll be able to get feedback from others. By opening discussion topics about your solutions, you will earn medals faster and you’ll be able to quickly increase your Kaggle ranking. More information on Kaggle Ranking is provided in section 6.
8. Take this course in DataCamp
DataCamp’s Winning a Kaggle Competition in Python course will teach you how to approach and structure any data science competition entry. By taking this course, you will learn all the fundamental techniques used in competitions, like how to validate machine learning models, and how to avoid overfitting.
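Model validation, mentioned above, usually means scoring a model on data it was not trained on. One common way to do that is k-fold cross-validation. The sketch below is a minimal, library-free illustration of how rows can be split into folds; the function name `k_fold_indices` is hypothetical and this is not taken from the course itself.

```python
def k_fold_indices(n_rows, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds.

    Each row lands in the validation set exactly once, so every
    prediction is scored by a model that never saw that row.
    """
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        valid_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_rows))
        yield train_idx, valid_idx
        start = stop


folds = list(k_fold_indices(n_rows=10, k=3))
for train_idx, valid_idx in folds:
    print("train:", train_idx, "valid:", valid_idx)
```

In practice you would normally shuffle the rows first and average the validation scores across folds. A large gap between the training score and the validation score is a typical sign of overfitting.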
What Are Kaggle Rankings?
The Kaggle ranking system is a live leaderboard that ranks data scientists of all levels of expertise based on their different types of contributions to Kaggle, from commenting to participating in Kaggle competitions.
Besides the main leaderboard, there are four category rankings: “Competition”, “Dataset”, “Notebook”, and “Discussion”. You can see your level in each of these categories in your own profile. As you win medals in the categories above, your rank and tier increase. Remember that medals are obtained through competition results and upvotes.
There are five major tiers in Kaggle: “Novice”, “Contributor”, “Expert”, “Master” and “Grandmaster”. As of now, there are only 241 data scientists in the tier of “Kaggle Grandmaster”, which is the top league. This proves how difficult it is to become a part of it. As for the rest, right now there are 1,668 masters, 7,206 experts, 64,668 contributors, and 92,747 novices. The tier you are in, just like the number of medals that you earn, will prove to be very advantageous in moving your career forwards.
Kaggle Medals
Medals represent a singular achievement in a category. This achievement can be a great competition result, a popular notebook, a useful dataset or an insightful comment, to name a few. Your achievements are standardized and a ranking system is created by making comparisons with other data scientists’ contributions.
Competition Medals
Competition medals are determined by your ranking in competitions. Remember that you do not win medals from competitions in the InClass, Playground, and Getting Started categories.
In competitions with 0-99 teams, being in the top 40% will give you a bronze medal. You will get a silver medal if you are in the top 20%, and a gold medal if you are in the top 10%. As the number of teams increases, the distribution of medals also changes. For example, when participating in a competition with 1000 or more teams, the top 10% will be awarded a bronze medal, the top 5% will be awarded a silver medal, and the top 10 teams will be awarded a gold medal.
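As a rough illustration of the rules above, here is a minimal Python sketch. It covers only the two team-count bands described in this section (fewer than 100 teams, and 1,000 or more teams) and refuses to guess for anything in between. The function name `competition_medal` is hypothetical, and treating a rank that falls exactly on a percentage boundary as inside the band is an assumption.

```python
def competition_medal(rank, num_teams):
    """Return 'gold', 'silver', 'bronze', or None for a final leaderboard rank."""
    if not 1 <= rank <= num_teams:
        raise ValueError("rank must be between 1 and num_teams")

    def in_top_percent(pct):
        # Integer arithmetic avoids floating-point surprises at the boundary.
        return rank * 100 <= pct * num_teams

    if num_teams < 100:
        if in_top_percent(10):
            return "gold"
        if in_top_percent(20):
            return "silver"
        if in_top_percent(40):
            return "bronze"
        return None

    if num_teams >= 1000:
        if rank <= 10:
            return "gold"
        if in_top_percent(5):
            return "silver"
        if in_top_percent(10):
            return "bronze"
        return None

    # The text does not describe the bands between 100 and 999 teams.
    raise NotImplementedError("band not covered by this sketch")


print(competition_medal(rank=6, num_teams=50))
print(competition_medal(rank=150, num_teams=2000))
```

For example, finishing 6th out of 50 teams falls inside the top 20% but outside the top 10%, while finishing 150th out of 2,000 teams falls inside the top 10% but outside the top 5%.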
Dataset Medals
The more popular the datasets you share, the more upvotes you will receive from others. Your dataset medals are determined by the number of these upvotes: a dataset earns a bronze medal with at least 5 votes, a silver medal with at least 20 votes, and a gold medal with 50 or more votes. Votes given by novice-level users are not included in the calculation.
Notebook Medals
The same rules as dataset medals apply to notebook medals. A notebook earns a bronze medal with at least 5 votes, a silver medal with at least 20 votes, and a gold medal with 50 or more votes. Votes given by novice-level users are not included in the calculation.
Discussion Medals
Discussion medals are based on net votes, which are calculated by subtracting downvotes from upvotes. Votes on your old posts and votes from novice-level users are not included in the calculation. One net vote is enough to get a bronze medal, 5 net votes earn a silver medal, and 10 or more net votes earn a gold medal.
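The vote thresholds for dataset, notebook, and discussion medals can be summarized in a short Python sketch. The function names are hypothetical, and the code assumes that a vote count sitting exactly on a boundary (for example 20 votes) earns the higher medal. It also assumes you pass in only the votes that count, that is, with novice votes and votes on old posts already excluded.

```python
def medal_for_votes(votes, bronze, silver, gold):
    """Return the highest medal whose threshold the vote count reaches."""
    if votes >= gold:
        return "gold"
    if votes >= silver:
        return "silver"
    if votes >= bronze:
        return "bronze"
    return None


def dataset_or_notebook_medal(qualifying_upvotes):
    # Thresholds from the text: 5 (bronze), 20 (silver), 50 (gold).
    return medal_for_votes(qualifying_upvotes, bronze=5, silver=20, gold=50)


def discussion_medal(qualifying_upvotes, qualifying_downvotes):
    # Discussions use net votes: 1 (bronze), 5 (silver), 10 (gold).
    net_votes = qualifying_upvotes - qualifying_downvotes
    return medal_for_votes(net_votes, bronze=1, silver=5, gold=10)


print(dataset_or_notebook_medal(23))
print(discussion_medal(qualifying_upvotes=7, qualifying_downvotes=3))
```

A notebook with 23 qualifying upvotes lands in the silver band, while a comment with 7 upvotes and 3 downvotes has 4 net votes and stays at bronze.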
Performance Tiers
You are assigned a performance tier for each ranking category (“Competitions”, “Datasets”, “Notebooks” and “Discussions”). Your highest tier in all categories is displayed as the main tier on your profile.
Novice
You automatically receive this tier when you register on the platform.
Contributor
The conditions for becoming a “Contributor” are the following:
- Run one notebook or script
- Make one competition or task submission
- Make one comment
- Give one upvote
Expert
The requirements for becoming an “Expert” depend on the category: at least 2 bronze medals in competitions, at least 3 bronze medals in datasets, at least 5 bronze medals in notebooks, or at least 50 bronze medals in discussions.
Master
The “Master” tier also has one requirement per category: at least 1 gold and 2 silver medals in competitions; at least 1 gold and 4 silver medals in datasets; at least 10 silver medals in notebooks; or at least 200 medals in discussions, of which at least 50 must be silver.
Grandmaster
To become a “Grandmaster” in a given category, you need at least 5 gold medals in competitions, at least 1 of which must be a solo gold medal; at least 5 gold and 5 silver medals in datasets; at least 15 gold medals in notebooks; or at least 500 medals in discussions, of which at least 50 must be gold.
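The medal requirements for the Expert, Master, and Grandmaster tiers can be expressed as a small lookup table. The Python sketch below is one possible reading of the rules above, not Kaggle's implementation: the names `Medals`, `TIER_RULES`, and `medal_tier` are hypothetical, and it assumes that a higher medal also counts toward a lower requirement (for example, an extra gold medal can stand in for a silver one).

```python
from dataclasses import dataclass


@dataclass
class Medals:
    gold: int = 0
    silver: int = 0
    bronze: int = 0
    solo_gold: int = 0  # subset of `gold`; only relevant for competitions

    @property
    def silver_or_better(self):
        return self.gold + self.silver

    @property
    def total(self):
        return self.gold + self.silver + self.bronze


# Rules are listed from the highest tier down; the first match wins.
TIER_RULES = {
    "competitions": [
        ("Grandmaster", lambda m: m.gold >= 5 and m.solo_gold >= 1),
        ("Master", lambda m: m.gold >= 1 and m.silver_or_better >= 3),
        ("Expert", lambda m: m.total >= 2),
    ],
    "datasets": [
        ("Grandmaster", lambda m: m.gold >= 5 and m.silver_or_better >= 10),
        ("Master", lambda m: m.gold >= 1 and m.silver_or_better >= 5),
        ("Expert", lambda m: m.total >= 3),
    ],
    "notebooks": [
        ("Grandmaster", lambda m: m.gold >= 15),
        ("Master", lambda m: m.silver_or_better >= 10),
        ("Expert", lambda m: m.total >= 5),
    ],
    "discussions": [
        ("Grandmaster", lambda m: m.total >= 500 and m.gold >= 50),
        ("Master", lambda m: m.total >= 200 and m.silver_or_better >= 50),
        ("Expert", lambda m: m.total >= 50),
    ],
}


def medal_tier(category, medals):
    """Return the highest medal-based tier reached, or None if below Expert."""
    for tier, rule in TIER_RULES[category]:
        if rule(medals):
            return tier
    return None


print(medal_tier("competitions", Medals(gold=1, silver=2)))
print(medal_tier("notebooks", Medals(bronze=5)))
```

A result of `None` means the medal count is still below Expert, so the tier in that category is Novice or Contributor, which depend on activity rather than medals.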
How Long Does It Take To Move From Contributor To Expert?
It all depends on how much you persevere and contribute to Kaggle. On average, it takes around 1 year to move from contributor to expert, according to this analysis. This timeframe depends on different levels of effort that each data scientist is willing to invest in Kaggle. For example, if you are investing time in learning and improving, you might take longer to get the expert badge, but if you are just trying to earn as many medals as you can as fast as possible, then this can take a shorter period of time.
It’s usually better to invest time in making real progress and not just on earning medals. To improve your knowledge and skills to progress in your career, it’s the real experience that counts.
Kaggle Ranking Tips
As previously mentioned, your goal should be to use Kaggle or DataCamp to expand your network, improve your skills and learn as much as you can.
However, earning medals is also valuable, of course, and since the conditions for earning medals also depend on high upvote numbers, you should consider increasing the number of upvotes you receive. The following tips will help you increase your medal count:
- Do not ask for upvotes. If you make an effort to provide useful information, upvotes should come naturally. In fact, asking people for upvotes can be counterproductive and will most probably get you downvoted.
- You may get more upvotes if you credit authors. But do so selectively, and only when it adds real value to the post, so you do not run the risk of spamming.
- Remember: focus solely on improving your skills and the medals will follow.
Kaggle tiers and medals are tangible manifestations of your real achievements. As you progress, seeing your achievements materialize will most likely motivate you.
Conclusion
Competing in Kaggle or DataCamp data science competitions is fun, and it is one of the tools that can keep you motivated on your data science journey. Although there are extraordinary data scientists who count competition prizes as part of their income, the aim of most data scientists is to learn as much as possible from the competitions and gain real experience.
The most helpful element here is the ambition to climb the competition rankings. With this ambition, data scientists look at other people's notebooks and study different code and strategies. The suggestions you receive on your code after the competition can even be considered a form of free mentoring.
For a step-by-step walkthrough of how to analyze a dataset for a competition, check out our Kaggle Competition Tutorial.
Rogelio Montemayor