More and more people are breaking into data science every day, and there are several reasons for this trend. First, booming demand: data science remains one of the fastest-growing jobs on the market according to LinkedIn's 2020 Emerging Jobs Report, despite the industry slowdown caused by the COVID-19 pandemic.
Next, salary expectations: Glassdoor 2020 statistics show that the salary of an entry-level data scientist is about US$89,000. And finally, remarkable popularity: in a memorable 2012 article, the Harvard Business Review declared data scientist the sexiest job of the 21st century.
If you are considering starting a new adventure in the field of data science, don’t hesitate—go for it. To help you succeed in your journey, we have prepared a list of 10 lessons and practical tips that will help you navigate and find your place in the wonderful world of data science.
- Demystifying data science
- What is a programming language?
- In the beginning, it will hurt. But be patient.
- You are not alone: data science resources
- The art of coding
- Where do I start learning?
- Keep learning
- Data science is a means to an end
- With great power comes great responsibility
- Conclusion: (You will) be data, my friend
Demystifying data science
Data science is often discussed alongside other tech buzzwords like big data, artificial intelligence and machine learning, making it difficult to get a clear understanding of what data science actually is.
In a few words, data science is an interdisciplinary field that combines scientific methods, programming, algorithms, and statistics to extract knowledge from data. Data science comprises a set of powerful tools and methodologies to deal with data that can be used in nearly every industry. The possibilities range widely, from basic exploratory data analysis and data collection techniques like web scraping to some of the most valuable domain applications, such as recommendation engines, computer vision, autonomous cars, and natural language processing, where machine learning and deep learning play a critical role.
While data science is a natural choice for professionals with IT or programming backgrounds, the field is rapidly evolving, and today it is fair to say that everyone is welcome, no matter where you come from. The reason? As data science breaks into new disciplines, knowing the essential aspects of a certain area or field of inquiry is critical. In addition to technical and coding skills, data scientists should always have a certain degree of business domain expertise so they can understand what they are doing. This includes evaluating the input data, assessing the value and validity of the insights, and discerning what makes sense and what does not.
What is a programming language?
Learning to code is a necessary step to becoming a data scientist. There’s no alternative answer, despite the recent surge of “no-code” data science and AI platforms. While these solutions allow non-technical business users to build applications and software (in an attempt to address the software developer skills shortage), the competencies, resources and mindset that data scientists provide are hardly replaceable, at least for now.
Programming is a central part of the daily life of a data scientist. But what is programming anyway? And what is a programming language?
Programming is the practice of writing instructions that a computer system executes to carry out tasks automatically. To communicate with a computer, we use programming languages. A programming language is a set of semantic and syntactic rules that programmers use to write those instructions, which together implement an algorithm, so that a computer can complete a specific task.
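To make the idea of "instructions for a computer" concrete, here is a minimal sketch in Python, a language widely used in data science. The function name and the sample numbers are made up for illustration; the point is only to show a short algorithm (add up the values, then divide by how many there are) written in a form a computer can execute.

```python
def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    total = 0
    for value in values:  # visit every number once
        total += value    # keep a running sum
    return total / len(values)


# Hypothetical sample data: ages of five survey respondents.
ages = [23, 35, 31, 28, 43]
print(average(ages))  # prints 32.0
```

In practice you would rarely write this loop yourself, since Python's standard library already offers `statistics.mean`, but spelling out the steps shows what "instructions" means here.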
There are hundreds of programming languages. In data science, the two most popular programming languages are Python and R. Both languages are great for any data science task you may think of. They are often portrayed as rivals, but a smarter approach is to see them as complementary languages; allies that can be combined to exploit their full potential and respective advantages. Fortunately, DataCamp has a large catalog of courses where you can learn both Python and R.
In the beginning, it will hurt. But be patient.
Let's be honest: learning to code is hard. This applies to everyone, whatever their background. It’s time to stop thinking that a person who studied computer science or math is a better-suited candidate for data science than a liberal arts graduate. The only difference between them is that the former probably started programming in college and the latter likely did not. But rest assured that the former also struggled with coding at some point, especially at the beginning.
Coding is like going to the gym. The first few days, it hurts. Your muscles feel sore and stiff. You are not used to that pain and, while lying on the couch, you may be tempted to quit. But if you don’t give up, if you keep exercising, things will gradually improve. After some weeks, you will find yourself beating fitness milestones that seemed unattainable not long ago. Eventually, going to the gym will become a part of your routine, and one day you will realize you enjoy working out.
For most babies, it takes between nine and 14 months to start talking. Fortunately, a programming language is much simpler and more rudimentary than a human language. If you are determined, you should be able to write basic scripts within a few months. Just like going to the gym, you have to be patient when learning a programming language.
You are not alone: data science resources
Your data science adventure will be full of obstacles. You may get stuck while writing your code, sometimes you will not understand why your script is not running properly, and there will be times when you simply have no clue how to start a certain data science task.
No need to stress: you are not alone. One of the coolest things in data science—and, more broadly, the programming ecosystem—is that the internet is full of resources and information that can help you overcome the challenges you may encounter. You just have to ask the right questions to get the right answers.
Here is a list of resources that will come to your rescue throughout your data science journey:
- Stack Overflow: The Oracle of Delphi for programmers. With more than 16 million users, Stack Overflow is a public question-and-answer platform for programmers. If you have a problem with your Python or R script, you will likely end up looking for solutions on Stack Overflow.
- Tutorials: Trouble with regression analysis? Don’t know where to start with web scraping? Reading a tutorial on the subject can be a great starting point. You can find comprehensive tutorials on a wide range of topics on well-established platforms, such as DataCamp, and even YouTube.
- Online courses: If you want to become a subject expert, sharpen your coding skills, or just broaden your data science horizons, a course is probably what you are looking for. There are many options on the market, including DataCamp. Don’t miss the opportunity to explore our large list of courses.
- Data science books: Books will always be a great source of information. A growing number of data science books have been published over the last few years, and many of them can be found online for free. Here’s a recommendation: O’Reilly books.
- Documentation: Last but not least, we need to mention package documentation. Documentation is one of the most important aspects of a good package or library, and it is the primary source for users to understand the aim of a package and how it works. Although it may not be your most exciting read, in many cases the solution to your problem will be a package function or parameter you were not aware of until you read the documentation.
The art of coding
After some months of coding workouts, you will feel more confident in your skills. By then you will have internalized a lot of processes, syntax, and commands. Without noticing, you will develop coding habits that make your scripting more fluid.
Eventually, you will start seeing the big picture: you will discover that coding is an art. For example, you will understand that there are many ways to solve a programming problem, but some are more efficient than others. Indeed, the search for efficiency will guide your work, both in terms of writing and running your code. This will lead you to learn new programming strategies.
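As an illustration of how two correct solutions to the same problem can differ in efficiency, consider checking whether a list contains a repeated value. The sketch below is only an example: the function names and the sample data are hypothetical, and the faster version assumes the items can be stored in a Python `set` (that is, they are hashable).

```python
def has_duplicates_slow(items):
    """Compare every pair of items: simple, but the work grows quadratically."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_fast(items):
    """Remember the items already seen in a set: a single pass over the data."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


user_ids = [101, 205, 333, 205, 480]  # hypothetical sample data
print(has_duplicates_slow(user_ids))  # prints True
print(has_duplicates_fast(user_ids))  # prints True
```

Both functions give the same answer. On five values the difference is invisible, but on millions of values the pairwise version performs vastly more comparisons than the single-pass version.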
Another important aspect you will start addressing is readability. Remember that readability is important not only for other programmers who may have to deal with your scripts, but also for the “future you.” By making small changes to elements such as syntax structure, variable and function naming, and spacing and indentation, you can make your code look better and be easier to understand. Commenting your code and documenting your functions will also make life much easier for you and other readers.
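To show what these small readability changes can look like, here is a hedged before-and-after sketch in Python. Both functions do exactly the same thing; the names, the threshold, and the temperature readings are invented for the example.

```python
# Hard to read: cryptic names, cramped spacing, no explanation.
def f(x,y):
    return [a for a in x if a>y]


# Easier to read: descriptive names, a docstring, and consistent spacing.
def filter_above_threshold(values, threshold):
    """Return the values that are strictly greater than threshold."""
    return [value for value in values if value > threshold]


temperatures = [18.5, 22.1, 19.8, 25.3]  # hypothetical readings in Celsius
print(filter_above_threshold(temperatures, 20))  # prints [22.1, 25.3]
```

The computer treats the two versions identically. The second one is written for the human who has to read it six months later.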
Where do I start learning?
We previously defined data science as an interdisciplinary field that combines scientific methods, programming, algorithms, and statistics to extract knowledge from data. Yes, that’s a lot to start with. Data science is a demanding field, and it’s easy for newcomers to get overwhelmed. If it is any comfort, know that everyone breaking into data science, irrespective of background, has to go through a learning process: it’s simply impossible to know everything straight away.
Where does one start, then? There is no single answer, but you won’t get very far in your career without a foundation in programming, statistics, and math. Regarding programming, besides Python or R, knowing SQL is certainly a must. As for math and statistics, don’t be scared. It may take more time, but you will learn what a p-value or an artificial neural network is at your own pace.
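For readers who have never seen SQL, the following minimal sketch gives a taste of what a query looks like. It uses Python's built-in `sqlite3` module with an in-memory database so that it runs without any setup; the `orders` table, its columns, and the rows are hypothetical and exist only for this example.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 20.0), ("ben", 15.0), ("ana", 30.0)],  # made-up rows
)

# Total spent per customer, largest total first.
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
print(rows)  # prints [('ana', 50.0), ('ben', 15.0)]
conn.close()
```

The query describes what result is wanted (one total per customer, sorted) and leaves the "how" to the database, which is a large part of why SQL is so widely used for working with structured data.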
In the meantime, you can do other things to increase your visibility and chances of getting hired. A great idea is to create a portfolio. In this regard, we highly recommend trying our recently launched DataCamp Workspace, an online environment to write code, apply your skills by analyzing interesting datasets, and build out your data science portfolio. In addition, you could write articles about the field, participate in data science competitions, or take data science certifications.
Keep learning
If you think that finding your first data science job is the end of the adventure, you’re wrong. Data science is a dynamic and rapidly evolving field. What is popular today can be obsolete tomorrow. To get an idea of the state of the data science landscape, see this picture.
We can draw two conclusions from this image. First, data science is a lifelong learning process. You must keep learning or you risk becoming obsolete.
Second, it’s impossible to know all the programming languages and technologies out there. So choose what to learn according to the needs in your job and what you are most passionate about.
Data science is a means to an end
What’s the value of your work as a data scientist if you can’t communicate the relevance of the projects you’re working on? What’s the point of doing a deep and thorough data analysis if nobody understands what you’re doing?
The goal of data science is to extract insights from data and apply those insights to create value. In other words, data science is not an end in itself, but a means to create value. Skills such as good communication, storytelling, and creative thinking are key to translating insights into value. In this regard, as mentioned above, the industry can benefit from people with non-technical backgrounds.
With great power comes great responsibility
This is probably the most important lesson. Data science is behind some of the most valuable applications and inventions in our lives. Our societies are rapidly changing, propelled by the disruptive force of data science and artificial intelligence, among other technologies.
In this context of uncertainty and deep change, it’s important to stay critical and cautious. Our daily work as data scientists involves dealing with tons of data, building models, and translating insights into value, but we should always try to look beyond our computers and ask ourselves about the societal impact of our work.
Being critical and accountable is the first step to preventing industry abuses and ensuring a fair future.
Conclusion: (You will) be data, my friend
To conclude this post, we want to share with you one last lesson: data science is not only changing society, but it will also change your life forever.
The world we live in is complex. We constantly deal with processes and systems that are beyond our comprehension. To address that complexity, we work with models, which can be defined as simplified descriptions of reality. In this vein, data science provides a good number of models that can help us understand our world. For example, the relational model for database management can be very handy to structure information and describe complex phenomena. Social network models can help us understand how information flows from person to person or exploit the potential of the networks we are part of. Even more fascinating, the mathematical models that power machine learning applications are not only crucial for computers to learn, they also provide new perspectives on human intelligence, and thus a deeper understanding of ourselves.
Finally, there’s another reason why data science will change you. We live in the age of Big Data. Tons of data are created and collected every day. Data is nothing but information of many types, from many sources. Reading allows us to learn from books and other source texts, and data literacy allows us to learn from data. This matters, because data offers unprecedented ways to study any domain you may think of, from medicine, psychology, and art, to climate change, astronomy, and history.
Data science provides you with the tools to analyze data; what you do with them depends solely on your imagination and curiosity. So let’s continue the adventure: there is a whole world out there waiting to be (re)discovered by you. Make a start by building your data skills online.