
Top 10 Breakthroughs in Big Data Science in 2017

These are the 10 biggest breakthroughs that illustrate the potential of data science in harvesting usable information from big data.
Jan 2018  · 11 min read

Data, data, data. I cannot make bricks without clay. - Sherlock Holmes, in The Adventure of the Copper Beeches.

A simple illustration of the fact that we are in the age of big data is that a Google search for either "big data" or "Star Wars" returns approximately 400 million results. As more people and businesses connect to the internet and the data they generate gets stored, with a projected 4,300 percent increase in annual data production according to Forbes, a big data explosion is inevitable. Forbes predicts that by 2020, 30 billion devices will be connected to the internet, resulting in 40 zettabytes of data. To compare: this was only 10 zettabytes in 2015. In parallel, experts expect the market value of big data utilization with predictive analytics or data science to grow from 130.1 billion dollars in 2016 to more than 203 billion dollars in 2020.
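To put the figures above on a common footing, the short sketch below converts each start and end value into an implied compound annual growth rate. The helper name annual_growth_rate is made up for this illustration, and the calculation assumes smooth year-over-year growth, which the quoted forecasts do not themselves claim.

```python
def annual_growth_rate(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate as a fraction (0.10 means 10%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the paragraph above.
data_growth = annual_growth_rate(10, 40, 2020 - 2015)        # zettabytes
market_growth = annual_growth_rate(130.1, 203, 2020 - 2016)  # billion dollars

print(f"Data volume:  {data_growth:.1%} per year")
print(f"Market value: {market_growth:.1%} per year")
```

Under that assumption, data volume grows by roughly a third each year, while the market value grows by a little under 12 percent each year.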

Big data and data science will impact healthcare, the economy, politics, lifestyle and more. This article discusses the top 10 data science breakthroughs that unfolded in 2017, and yet it's only the tip of the iceberg!

1. Are we alone in the universe?

This was the title of an opinion essay penned by none other than Winston Churchill, the well-known former prime minister of the United Kingdom, who was an advocate of science. In the essay, Churchill agrees with other scientists, of his time and ours, that life in other parts of the universe is probable, but that Earth-like life will require Earth-like conditions. In December 2017, Google developed data science algorithms and applied them to signals collected by NASA's Kepler telescope, identifying a star system much like our own Solar System, called Kepler-90, elsewhere in the universe. Similar to our Solar System, the Kepler-90 star system, which is 2,200 light years away, houses 8 planets, some of which may potentially have Earth-like conditions. This is the first evidence of another star system with as many planets as our own. The Google team applied a neural network algorithm to scan the Kepler telescope data from NASA and detect this multi-planet star system, which had not been achieved by human visual scanning of the Kepler telescope signal.

2. Wonder how the dolphins are doing?

The Deepwater Horizon oil spill of 2010 in the Gulf of Mexico is considered one of the worst marine oil spills. There has been great interest in understanding the long-term impact of the spill on sea life and potentially counteracting its adverse environmental effects. Dolphins use echolocation, or sonar, which helps them "see" better underwater. Thus, scientists at the Scripps Institution of Oceanography in California used sensors to collect acoustic signals from the Gulf of Mexico, hoping to detect dolphin sonar signals in order to track the animals and evaluate their well-being. However, it was challenging to distinguish the dolphin sonar signals from the milieu of other acoustic signals collected. It is hard to say whether the scientists were inspired by the 2016 movie "Deepwater Horizon", about the 2010 oil spill, to make this breakthrough. Nevertheless, the scientists recently overcame this challenge by using an unsupervised machine learning algorithm to pick out discrete dolphin sonar signals from the other sounds collected by the sensors. With this new approach, scientists hope to learn more about dolphins, which arguably match us humans in intelligence.

3. Will this cancer medication really work?

Precision medicine is a modern approach to treatment, in which doctors select the best course of treatment for a patient based on the patient's personal genetic information. Our genetic information is like a fingerprint that is unique to every individual and determines how our body works. Moreover, the cancers and tumors that originate in our body also have unique genetic signatures. Because our bodies and tumors have unique genetic signatures, it is logical to expect that they will behave differently in response to drugs, and that not all of us will respond equally to the same drug. Based on this rationale, researchers collected a huge amount of data: genetic information from cancer patients and their recorded responses to drugs or chemicals. In October 2017, researchers applied two data science methods, Support Vector Machine (SVM) and Recursive Feature Elimination (RFE), to this gene-patient-drug interaction data to predict personalized drug responses from the genetic information of patients. This analysis is now available as open source and can be used for the first time to predict the best drugs for treating a patient's cancer, just by collecting the patient's genetic profile.
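To give a feel for how SVM and RFE work together, here is a minimal sketch using scikit-learn on synthetic data. It is not the researchers' pipeline: the random table stands in for real genetic profiles, and the sample size, feature counts and step size are arbitrary choices made for this illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for a patients-by-genes table: 200 "patients" and
# 50 "gene features", of which only 5 actually carry signal.
X, y = make_classification(
    n_samples=200, n_features=50, n_informative=5,
    n_redundant=0, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# RFE needs a model that exposes feature weights, so use a linear-kernel SVM.
# Each round fits the SVM and drops the 5 features with the smallest weights,
# repeating until only 5 features remain.
selector = RFE(estimator=SVC(kernel="linear"), n_features_to_select=5, step=5)
selector.fit(X_train, y_train)

selected = [i for i, keep in enumerate(selector.support_) if keep]
accuracy = selector.score(X_test, y_test)  # SVM accuracy on the kept features

print("Selected feature indices:", selected)
print(f"Held-out accuracy: {accuracy:.2f}")
```

The appeal of this combination is that the model ends up relying on a small set of features, which in the real setting would correspond to a short list of genes that a clinician can inspect.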

4. Can we predict the next severe thunderstorm and tornado?

Severe weather takes a huge toll on human life and causes billions of dollars in damage worldwide. Thus, it's not surprising that ever since the development of computers, weather forecasting has been a significant research endeavor. This breakthrough may seem straight out of the movie "Twister", in which scientists used climate data to make calculations and predict storms. In May 2017, researchers at Penn State's College of Information Sciences and Technology and AccuWeather Inc. published pioneering work that utilized the power of big data and data science to predict severe weather in the global climate system. The researchers relied on the "bow echo", a signature signal that shows up on radar before a severe thunderstorm, hurricane or tornado develops. Though the bow echo signal is easily missed by human eyes, catching it early can help predict severe weather. By harnessing the vast data collected by the National Oceanic and Atmospheric Administration (NOAA), the researchers used machine learning to accurately and efficiently detect bow echoes and automatically predict severe thunderstorms, tornadoes and hurricanes. This is a new beginning, and weather forecast systems worldwide are starting to adopt big data and data science approaches to unravel the mysteries of global climate systems and reduce losses through prediction of severe weather.

5. How to increase safety and minimize crime?

Prediction of crime seems futuristic and has filled pop culture with films such as RoboCop and Minority Report. By harnessing archived crime records, data scientists this year have empowered authorities in Vancouver to predict break-ins with an algorithm and thereby improve the safety of the city. Machine learning was applied to archives of recorded break-ins from 2012 to 2015, which enabled modelling and prediction of future break-ins. Every 2 hours, the algorithm now provides police officers on patrol with an updated map, marking hot spots where these crimes are likely to occur.
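The idea of a hot-spot map can be illustrated with a much simpler stand-in than the machine learning model described above: count past incidents per grid cell and flag the busiest cells. The function name, the grid size and the coordinates below are all invented for this sketch and have nothing to do with the real Vancouver data.

```python
from collections import Counter


def hot_spots(incidents, cell_size, top_n=3):
    """Bin (x, y) incident locations into square grid cells and return
    the top_n cells with the most incidents, busiest first."""
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in incidents
    )
    return counts.most_common(top_n)


# Made-up break-in locations, in metres from an arbitrary origin.
past_break_ins = [
    (120, 80), (130, 95), (140, 60),   # cluster in cell (0, 0)
    (610, 420), (620, 430),            # cluster in cell (3, 2)
    (900, 50),                         # isolated incident
]

busiest = hot_spots(past_break_ins, cell_size=200, top_n=2)
print(busiest)
```

A real system would also take time of day, day of week and recency into account, which is where a learned model earns its keep over plain counting.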

Tip: if you want to know more about how police officers can make use of data science in their daily lives, check out this article.

6. Is this a new species of plant?

Scientists have so far discovered around 2 million species of plants, animals and other lifeforms, while millions more are estimated to remain undiscovered. Given this large number of species, it is challenging for taxonomists to determine whether a specimen at hand is a new discovery or a species already known. Efforts geared towards digitizing images of plants and other discovered species have resulted in a big database containing 150 million images. Using this image data, data scientists developed a deep learning algorithm in May 2017 that can classify a specimen, or flag it as a possible new species, when new image data of the specimen in question is provided. The researchers anticipate that in addition to facilitating identification of new species, the specimen image data, which have been collected over centuries, can eventually help determine patterns of species adaptation in plants and animals as the climate changes over time.

7. Can molecular structure predict the smell of a molecule?

Presently, the fragrance industry invests millions of dollars and a great deal of time to design and develop novel aromatic molecules. Even once a molecule is designed, there is no guarantee that it will smell good without an actual smell test with humans. Researchers from IBM and Rockefeller University teamed up to design a computational algorithm that can better predict smell from molecular structure. To train the algorithm, around 50 human volunteers were asked to smell around 500 molecules and classify the smells into "sweet", "decayed" and other smell groups. The structural properties of these molecules, such as atom types and functional groups, were available. The results, published in February 2017, usher in a paradigm shift in our understanding of the basic interactions between odor molecules and how humans perceive them.

8. Where did the constitution come from?

As nations have grown, countries have connected and shared ideas and cultures. It is thus plausible to speculate that the constitutions of countries across the world have inspired each other. In November 2017, researchers at Dartmouth College published work that utilized big data analytics to map the evolution of constitutions across the world. National constitutions written from 1789 to 2008 were digitized so that data science methods could be applied. The textual content of this data came from 591 national constitutions, including different versions of the constitution of the same country. This textual content was then analyzed using a machine learning algorithm to identify connections between words and between constitutions, and these connections were statistically scored. This study shows the power of text-based data science methods, which in future could be applied to understand the evolution of historic texts from ancient civilizations, such as the Egyptian, Mesopotamian, Indus Valley and Chinese civilizations.
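One basic way to score how closely two texts are connected is to compare their word counts with cosine similarity. The sketch below shows that idea only; it is a deliberately simple stand-in, not the algorithm used in the Dartmouth study, and the three short passages are invented for the example.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count each alphabetic word."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Invented snippets standing in for passages from three documents.
doc_a = word_counts("The people shall have the right to free speech and assembly")
doc_b = word_counts("Every citizen shall have the right to free speech")
doc_c = word_counts("Taxes are collected by the treasury each year")

sim_ab = cosine_similarity(doc_a, doc_b)
sim_ac = cosine_similarity(doc_a, doc_c)
print(f"a vs b: {sim_ab:.2f}")
print(f"a vs c: {sim_ac:.2f}")
```

The first pair shares most of its vocabulary and scores much higher than the second. Scoring every pair of documents this way gives a similarity table from which clusters of related texts can be read off.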

9. Which is the best city to celebrate Christmas in the U.S.?

WalletHub harnessed big data from 100 U.S. cities, covering 29 parameters such as affordability, number of Christmas events and shopping deals, to develop and apply data science algorithms. The results predict that Chicago is the best place to celebrate Christmas, based on the 29 parameters fed into the algorithm.
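A ranking of this kind can be built by rescaling every parameter to a common range and taking a weighted average. The sketch below assumes min-max scaling and hand-picked weights; the city names, metrics, values and weights are all hypothetical, and this is not WalletHub's actual methodology.

```python
def composite_scores(cities, weights, lower_is_better=frozenset()):
    """Return a 0-100 score per city from a weighted average of
    min-max scaled metrics."""
    total_weight = sum(weights.values())
    scores = {name: 0.0 for name in cities}
    for metric, weight in weights.items():
        values = [cities[name][metric] for name in cities]
        low, span = min(values), max(values) - min(values)
        for name in cities:
            # Scale to 0..1; if every city ties on a metric, give all 0.5.
            scaled = 0.5 if span == 0 else (cities[name][metric] - low) / span
            if metric in lower_is_better:
                scaled = 1.0 - scaled  # e.g. cheaper is better
            scores[name] += weight * scaled / total_weight
    return {name: round(100 * score, 1) for name, score in scores.items()}


# Hypothetical cities and metrics, for illustration only.
cities = {
    "City A": {"events": 40, "avg_gift_cost": 30, "shops_per_capita": 5},
    "City B": {"events": 10, "avg_gift_cost": 20, "shops_per_capita": 9},
    "City C": {"events": 25, "avg_gift_cost": 50, "shops_per_capita": 1},
}
weights = {"events": 2, "avg_gift_cost": 1, "shops_per_capita": 1}

scores = composite_scores(cities, weights, lower_is_better={"avg_gift_cost"})
best_city = max(scores, key=scores.get)
print(scores)
print("Best city:", best_city)
```

The choice of weights drives the outcome: doubling the weight on events is what puts City A ahead here, so any such ranking is only as objective as its weighting scheme.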

10. What's the risk of a second heart attack?

After a first heart attack, patients who experience irregular heartbeats are more likely to have a second heart attack. Thus, regular monitoring of the patient's heart rate by doctors is a major part of the recovery process. Cardiologists and data scientists at Stanford University and the University of California, San Francisco have developed a data science algorithm that uses patient electronic health records, especially heartbeat records along with records of other risk factors, to predict a second heart attack for a patient. This is advantageous because doctors don't need to examine the patient in person to identify an elevated risk of heart attack, allowing hospitals to save valuable time and resources for other aspects of patient recovery. As more and more patient electronic health records become available, it can be envisioned that predictive analysis will become a regular part of health care in the future.

In summary, these top 10 examples illustrate the potential of data science in harvesting usable information from enormous amounts of data, which humans could not otherwise comprehend. Big data and data science are here to stay. To borrow an analogy from Star Wars, big data is the force that connects all of us in a big way!

"May the force (of big data) be with you" in 2018!
