Quick Guide to Data Journalism
November 4th, 2016 in General
With a renewed focus on data storytelling in the data science industry, the approach to data science as a team sport, and big investigations carried out and published by data journalists, such as the Panama Papers, the 2016 U.S. Election Forecast or the Airbnb effect, the interest in data journalism is on the rise.
But what is it exactly and how do you become a data journalist?
Today’s blog post will try to give more insights and answers to these questions.
Towards A Definition of The Hottest Trend In Journalism
There are many definitions out there, and it’s hard to see the forest for the trees. Some say that data journalism is the same as data-driven journalism (DDJ); others insist that the two are different disciplines.
In this article, we will consider these two as the same discipline.
Then, it's easy, right? A definition isn’t so hard at all! It's just journalism done with data.
But what does journalism exactly mean and what is data?
Data journalism is generally defined as “the new possibilities that open up when you combine the traditional ‘nose for news’ and ability to tell a compelling story, with the sheer scale and range of digital information now available”.
However, this definition perhaps obscures the fact that there's a workflow: in that sense, Mirko Lorenz’ definition, where the discipline is described as a “workflow where data is the basis for analysis, visualization and –most importantly- storytelling”, could be more accurate.
Note that you'll often see the term "CAR" (Computer-Assisted Reporting) passing by. This was the first organized, systematic approach to using computers to collect and analyze data to improve the news.
How To Become a Data Journalist
Now that the context and definition are clear, you can start thinking about what it takes to become a data journalist. The following section gives you more insight into what you need to do to become one, along with the outline of a step-by-step plan that you can follow, including the best resources.
What Does It Take To Become A Data Journalist?
You may have read some quotes on this, such as “To become a good data journalist, it helps to begin by becoming a good journalist” (Meredith Broussard) or “Computers don’t make a bad reporter into a good reporter. What they do is make a good reporter better” (Elliott Jaspin), but what in the end do you need to become a data journalist?
According to Scott Klein, Deputy Managing Editor at ProPublica and Co-Founder of DocumentCloud, candidates should possess 1. Journalism skills, 2. Design talent, 3. Coding acumen. That seems fairly simple, but what about the educational background, and what exactly is meant by ‘journalism skills’, ‘design talent’ or ‘coding acumen’?
As for the first aspect, you might think that you need a journalism degree. Scott Klein confirms that most people on his team have degrees in journalism, but it’s certainly not a prerequisite. There are examples of data journalists who have a math or computer science background.
And it works well, also because, according to Klein, “journalism is a natural fit for mathletes who want to make the world a better place”. However, what Klein’s looking for in candidates also gives away that the educational background doesn’t necessarily play a big part, as long as you possess the three things he’s looking for.
And, coming from the other side, it's certainly possible to become a data journalist if you haven't got any technical background.
Whatever your background is, you'll need to take it into account in your quest to acquire the three skills you need to become a data journalist!
And these three skills don’t come easy. Unfortunately, there aren’t a lot of universities or courses out there that can teach you all three and most people confirm that you really need to learn a lot on your own.
You can follow a MOOC such as this one or this Big Data University course, or workshops taught by external data journalists, but the offer is quite scarce and doesn't come cheap. Training for professional data journalists often consists of collaborations between data and journalism teams, calling on support networks, data bootcamps, ...
But mainly, it’s just about teaching yourself.
And this is where you need a step-by-step plan for yourself, complete with resources, to get where you need to be to become a data journalist or, if you already are one, to keep on educating yourself.
A Step-By-Step Plan
This step-by-step plan contains the first pointers to get you started. You will need to personalize this guide according to your educational background and your learning style.
Here are the eight steps that are included in the plan to become a data journalist:
- Develop A Broad Knowledge Base
- Write, Write, Write
- Learn (Some) Programming Languages
- Discover Data Journalism Workflow
- Build Your Toolbox
- Start Building Your Network
- Continue Your Learning
- Go For it!
Develop A Broad Knowledge Base
Journalists are naturally people that have to be able to adjust their skills whenever new topics come around. Furthermore, the subjects that data journalists cover can vary so much that you have to be able to cover a wide range, even wider than typical journalists.
The key to developing a broad knowledge base is probably different for everyone and depends on your learning style.
One way to get there, though, is by reading, listening and watching a lot. But in the end, your attitude is probably the most important thing to get where you need to be. It’s definitely a plus if you’re curious by nature and have something that drives you to discover and learn new things all the time.
The broad knowledge base that this first step refers to doesn’t only cover knowledge of current affairs, but also knowledge of quantitative topics. You shouldn’t only be aware of, let’s say, politics and not know anything about statistics, because this would undoubtedly interfere with your ability to analyze political data. One piece of advice from data journalists and editors that keeps coming back in articles is: take stats classes. If you’re looking to get started, make sure to check out OpenIntro and the stats courses that DataCamp offers.
Lastly, consider getting a bit more background on the discipline itself:
- Listen to this interesting talk of Scott Klein on the history of data journalism
- Check out what your colleagues have been doing:
  - The animal extinction project by Anna Flagg (ProPublica)
  - The most dangerous jobs in America by Christopher Cannon, Alex McIntyre and Adam Pearce (Bloomberg)
  - The NSA files decoded, by Ewen Macaskill, Gabriel Dance, Feilding Cage and Greg Chen (The Guardian)
  - Visualizing the Iraq war logs, by Jonathan Stray and Julian Burgess
Write, Write, Write
You might think that this step is sort of negligible in your plan to become a data journalist, but it really isn’t. Writing well is hard to teach and requires a lot of practice if you want to write fast and accurately while still targeting your audience and fitting the medium you write in, which might be a blog, a newspaper, … It takes skill to write for an audience and not only for yourself. What you think is accessible and easy to read may well not be so for others.
So make sure to take your time for this step. Luckily, there are quite a few online courses on this topic, so you won’t be left out in the cold:
- EdX has a considerable offer for those who want to learn more about journalism. The “Journalism for Social Change” course could interest you, but also the “English for Journalists: Key Concepts” undoubtedly will be of great help.
- Also, Coursera offers journalism courses and a whole specialization track to launch your career in journalism.
- Check out the training courses of Mediabistro.
If you’re more into live training, you might consider one of The Guardian’s masterclasses or do a Google search to see if there are any classes in your neighborhood. There are a lot of universities and organizations out there that offer journalism courses.
Learn (Some) Programming Languages
Even though you can do a lot of things with tools such as Microsoft Excel and programming isn't a requirement to become a data journalist per se, learning how to code at this (early) stage will benefit you.
Contrary to what you might expect, the goal of learning to program here is not only to make sure you can gather information, but also that you’re able to display it.
Note that the choice of a certain language here is dependent on where you want to work and what data/story you're working on.
Some jobs as a data journalist require you to know more about web development than about gathering, transforming and modeling data and vice versa. To start, it’s probably best to have a basis in both and then develop your proficiency in whatever interests you more, since that will also play a part in which jobs you’re going to apply for.
Next, skills with the Django (Python) and Ruby web frameworks are also in high demand. If you're looking to learn how to program in both, you should consider the CodeSchool courses.
Lastly, R/SAS/SPSS and Python should also be on a data journalist’s to-learn list. These languages differ from the other languages and the Django framework mentioned above in that they excel at analyzing and modeling data. Courses that might come in handy here are DataCamp’s Introduction to R and Introduction to Python for Data Science courses. They are tailored to beginners and take your programming skills to the next level, step by step. For SAS training, you can go here and for SPSS, you can go here.
Discover Data Journalism Workflow
Knowing the workflow and having a toolbox at your disposal to tackle it is an essential step in your learning. There aren’t a lot of requirements to get started with this step, but most data journalists agree that you should be able to work with Microsoft Excel. So, if you have no idea about how spreadsheets work, you should make sure that you have a working proficiency with Excel before you start anything else.
If you have that basis, you can start looking into the data journalism workflow.
Very much as in the data science workflow, data journalists go through the steps of data collection, data wrangling, analysis, and data visualization and reporting.
However, the focus of this process will be less on modeling the data but will instead be more on the other steps, with a specific emphasis on reporting or storytelling.
Some resources that might help you get started on understanding data wrangling, data analysis, data visualization and reporting are:
- Edward Tufte’s books: these books are great if you have no idea about presenting data and information. They are an excellent resource for anybody that wants to brush up their knowledge of visualization and visualization theory. Also worth checking out are "The Functional Art" by Alberto Cairo and "Information Dashboard Design" by Stephen Few.
- "The Signal and the Noise: Why So Many Predictions Fail – But Some Don’t" by Nate Silver: a must-read for anyone that is interested in data or analysis.
- There are many resources out there that focus specifically on data collection, wrangling and visualization with Python and R. Read more about those here.
Build Your Toolbox
You need to have the right tools to tackle the workflow that was described above. Luckily, there is a wide range of tools at data journalists’ disposal. The choice for a certain tool naturally depends on the context you’re working in: a look at the job postings for data journalists teaches us that the tools can vary from job to job, and the context of the story and the data will also have some effect on your choice. Lastly, you might also prefer a certain tool simply because you are proficient with it.
And this is where your attitude comes in.
You don't need to know how to work with every tool out there, but you should be up to speed with what each tool can offer you and your story.
You should possess the capability to pick up skills and a willingness to learn.
Below, we list some examples of tools that are often mentioned on forums such as Quora and in job postings. This overview follows the phases of the workflow that was described above and is not meant to be exhaustive: it is just meant to give you pointers on where to start.
Setting up your workspace. This is probably the first step in the workflow. You’re probably going to need a code editor if you’re planning on programming. Consider Vim, TextMate, Sublime Text or a fully-fledged IDE such as RStudio, Spyder, … You can also consider installing Git or some other version control system to manage your source code.
Getting your data. The basic building block of your workflow is data. So you first need to have an idea of where you can obtain this data. The first way to get a hold of your data is through your network. Sources are very important because the data doesn’t always come to you but the story does. Then it’s your job to find the data to corroborate the story you hear from your sources.
The second way to obtain your data is through open data platforms. There are a lot of them out there, but resources that stand out are the World Bank and United Nations websites, Data Portals, and DataHub. You also have the U.S. Government's open data and the U.K. Government's Open data, and many other governmental websites with open data.
The Guardian’s Datablog could also be an interesting place to start as an aspiring data journalist, as it presents the data, context and questions regarding the data journalism process.
Datasets are also made available through mailing lists, such as the NICAR listserv, or forums.
Besides the datasets that are readily available to you through open data platforms, mailing lists, forums or sources, you could also get your data through web scraping. Here, you will need to make use of the packages or libraries that programming languages such as Python and R offer you. Or you can resort to tools that are specifically made for this purpose and that don’t require you to be proficient in programming.
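To give an idea of what scraping with a programming language involves, here is a minimal sketch that pulls the rows out of an HTML table using only Python's standard library. The HTML snippet, the city names and the numbers are invented for illustration; a real scraper would first download the page and would more likely rely on a dedicated scraping package.

```python
from html.parser import HTMLParser

# Hypothetical HTML snippet; a real scraper would first download the page.
page = """
<table>
  <tr><th>City</th><th>Violations</th></tr>
  <tr><td>Springfield</td><td>42</td></tr>
  <tr><td>Shelbyville</td><td>16</td></tr>
</table>
"""

class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])          # start a new row
        elif tag in ("td", "th"):
            self._in_cell = True
            self.rows[-1].append("")      # start a new, empty cell

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()

parser = TableParser()
parser.feed(page)
header, *body = parser.rows
print(header, body)
```

Note that everything comes back as text, so numeric columns still need to be converted before you can analyze them.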
On another note, it’s also possible to retrieve data from databases. SQL will come in handy at this point. If you would consider learning this skill, try focusing on MySQL, PostgreSQL or SQL Server. Consider taking a couple of tutorials on TutorialsPoint to find out more about databases and about SQL.
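As a minimal sketch of what retrieving data with SQL looks like, the example below uses Python's built-in sqlite3 module with an in-memory database. SQLite stands in for MySQL, PostgreSQL or SQL Server here only because it needs no server; the table name `inspections`, its columns and its rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database: a stand-in for a real database server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inspections (city TEXT, year INTEGER, violations INTEGER)")

# Hypothetical rows, invented purely for illustration.
rows = [
    ("Springfield", 2015, 12),
    ("Springfield", 2016, 30),
    ("Shelbyville", 2015, 7),
    ("Shelbyville", 2016, 9),
]
conn.executemany("INSERT INTO inspections VALUES (?, ?, ?)", rows)

# Total violations per city, largest first.
query = """
    SELECT city, SUM(violations) AS total
    FROM inspections
    GROUP BY city
    ORDER BY total DESC
"""
totals = conn.execute(query).fetchall()

for city, total in totals:
    print(city, total)

conn.close()
```

The SELECT statement with GROUP BY is the part that carries over to other database systems; the connection code will differ per system.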
You immediately see that having a network is important here. We'll get back to this later.
In all of this, it’s very important that you don’t forget the legal considerations. Always check what you are allowed to do with the dataset you have obtained and whom you need to credit. If you can’t get a hold of the data that you need, you can always file a Freedom of Information (FOI) request (also called ‘wobbing’), which gives you the opportunity to request access to recorded information held by public sector organizations, such as the police, schools or publicly owned museums.
Get your data in the workspace. To get your data into your workspace, you can first resort to the most basic way, namely working with Excel. However, once you’re more advanced, you should be able to make use of the libraries or packages of the coding language(s) you’re using. You can use special functions to import data from .csv, .txt or other file types into your workspace, but there is also specific material out there to help you collect data from the Internet through web scraping. Python and R have libraries/packages that are designed for that purpose, such as rvest for R. To scrape data from PDFs, you can use Tabula. To extract web data, you can also make use of import.io.
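To make the import step concrete, here is a minimal sketch that reads comma-separated data with Python's standard csv module. The column names and values are invented, and the data is read from an in-memory string so the example runs as-is; with a real .csv file you would pass an open file handle to the reader instead.

```python
import csv
import io

# Hypothetical CSV content; in practice this would come from a .csv file on disk.
raw = """city,year,population
Springfield,2016,30000
Shelbyville,2016,21000
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, converting numeric columns to int."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in reader:
        row["year"] = int(row["year"])
        row["population"] = int(row["population"])
        rows.append(row)
    return rows

records = load_rows(raw)
print(records[0]["city"], records[0]["population"])
```

The explicit int conversions matter: a CSV reader hands you every value as text, and forgetting to convert is a common source of wrong sums and sort orders.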
Wrangle your data. To wrangle your data means to manipulate, clean and reshape it so that it is ready for analysis. For Python, the Pandas package is the way to go. For R, you can fall back on tidyr. As for out-of-the-box tools, OpenRefine (formerly Google Refine) is a popular tool for cleaning and transforming data, but you can also use DataWrangler or CSVKit.
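The sketch below shows what "clean and reshape" can mean in practice. It deliberately uses only the Python standard library so that it runs without installing anything; a dedicated data-wrangling package would express the same steps more compactly. The records and field names are invented for illustration.

```python
# Hypothetical messy records: inconsistent casing, stray whitespace, a missing value.
messy = [
    {"city": " springfield ", "year": "2015", "violations": "12"},
    {"city": "Springfield", "year": "2016", "violations": "30"},
    {"city": "SHELBYVILLE", "year": "2015", "violations": ""},
    {"city": "Shelbyville", "year": "2016", "violations": "9"},
]

def clean(record):
    """Normalize the city name and convert numeric fields; empty counts become None."""
    count = record["violations"].strip()
    return {
        "city": record["city"].strip().title(),
        "year": int(record["year"]),
        "violations": int(count) if count else None,
    }

def to_wide(records):
    """Reshape from one row per (city, year) to one entry per city with a value per year."""
    wide = {}
    for r in records:
        wide.setdefault(r["city"], {})[r["year"]] = r["violations"]
    return wide

tidy = [clean(r) for r in messy]
wide = to_wide(tidy)
print(wide)
```

Keeping the missing value as None rather than silently turning it into 0 is a design choice worth making explicitly: a zero would be reported as a fact, while a gap should be reported as a gap.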
Analyze your data. Here's probably where your stats skills will come in handy. Make use of R and Python to model your data with packages such as statmod. You can also make use of tools such as DataRobot, Knime or RapidMiner.
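A first analysis often involves nothing more than normalizing and summarizing. The following minimal sketch, using Python's standard statistics module, turns raw counts into rates per 10,000 residents and compares them. The cities and figures are invented, and `rate_per_10k` is a hypothetical helper name.

```python
import statistics

# Hypothetical counts and populations, invented for illustration.
cities = {
    "Springfield": {"incidents": 90, "population": 30000},
    "Shelbyville": {"incidents": 84, "population": 21000},
    "Capital City": {"incidents": 200, "population": 100000},
}

def rate_per_10k(incidents, population):
    """Incidents per 10,000 residents, so cities of different sizes are comparable."""
    return incidents * 10_000 / population

rates = {name: rate_per_10k(c["incidents"], c["population"]) for name, c in cities.items()}
median_rate = statistics.median(rates.values())
highest = max(rates, key=rates.get)

print(rates, median_rate, highest)
```

In this made-up data the city with the most incidents in absolute terms has the lowest rate, which is exactly the kind of difference between a raw count and a normalized figure that can change the story.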
Report your results. You can build dashboards with Tableau or QlikView, make an infographic with Adobe Illustrator, Adobe InDesign or Adobe Photoshop, or put your code and visualizations in a notebook. You have the web-based Jupyter notebook for your Python code. For R, you might consider making an R Markdown document.
Start Building Your Network
Building your network will be important if you want to become a data journalist because that’s how you can find inspiration and mentorship. Your network will allow you to learn from the best.
Start by following some of the key people in data journalism and the industry on Twitter:
This is just a list to get you started.
Note that this list excludes the people that have already been mentioned throughout this post.
In addition, you can also join groups on Reddit or LinkedIn to stay up to date with the latest news: consider following the subreddits /r/theydidthemath or /r/datasets, but also take a look at the more specific Python or R subreddits. The language-specific Reddit and LinkedIn groups are listed here.
Furthermore, you could also consider going to Meetups like this one, and keep an eye out for events and/or conferences in your region through the Data Driven Journalism or the European Journalism Center sites. Also, consider joining the Knight-Mozilla Open News community.
Continue Your Learning
Your learning will never be done. There is always so much more to discover and to do, especially when you want to start in this field or even when you already have a job.
You’ll always be learning.
Some additional resources that you can check out:
Follow and discover interesting sites. Visit blogs, such as FiveThirtyEight, The Upshot (New York Times) or the ProPublica Nerd Blog, or other interesting sites about data visualization, such as EagerEyes or FlowingData. Also, don't forget to check out the blogs and portfolios of data journalists or data freelancers, such as Maarten Lambrechts, Alberto Lucas López or John Burn-Murdoch.
Listen to podcasts. If you like podcasts, you'll be excited to tune into Data Stories. It’s a great podcast on data visualization. Don’t miss this great episode where Scott Klein talks about his team and how they work at ProPublica. Also make sure to check out Partially Derivative and the podcast of FiveThirtyEight.
Read some books. Recommended books are, among many others, this book by Claire Miller and "Numbers in the Newsroom: Using Math and Statistics in News" by Sarah Cohen.
Discover some other training materials. Check out this list of training materials. And consider whether you're already comfortable with the tools of your toolbox. If you're not, consider taking some tutorials. You can find more R or Python tutorials here.
Start doing data journalism. Start small and start with a project on your own. Take a dataset and get started on analyzing, visualizing and reporting on the results. You can find projects on Kaggle or DrivenData. As a next step, you can start a blog. Just like those that are mentioned above. It’s a great way of showcasing your talent and a great addition to your resume. And you make sure that your results are shared with the world!
Go For it!
When you have gone through the previous steps, you might want to get yourself a job as a data journalist.
Some Last Advice
In the end, the best advice to become a data journalist is that of Maarten Lambrechts: just start doing data journalism.
In addition, here are some more tips:
Don't get discouraged. At the start, you’ll most likely run into problems, but that's no reason to give up. You learn by doing and this takes some time.
Don’t be afraid to start small. There are newsrooms out there that don’t have big data teams yet. Take this into account.
Take your time. It will take some experience to judge whether a certain project is worth it. Sometimes, you will work on data and it just won’t make it as a story. Also take your time to build up your network, to learn from others, to build experience in the whole workflow.