“Data is the new oil, we need to find it, extract it, refine it, distribute it and monetize it.” – David Buckingham
Data science is the process of extracting the insights and trends hidden in data. Machine learning, a method of data analysis, uses statistical techniques and predictive models to give a system the ability to learn from data. The data scientist’s job has been acclaimed as the sexiest job of the 21st century. So what exactly do data scientists do? They collect data from various sources, clean it for uniformity, and then apply various algorithms and statistical models. Finally, they identify patterns and trends and provide business solutions to their clients. Sounds cool, doesn’t it?
The basic entity – data:
Data comes in structured and unstructured forms. Structured data is information with a high degree of organization, so that it can be stored in a database and analyzed readily; unstructured data is essentially the opposite. An email illustrates the difference: it holds structured information such as the time sent, the subject, and the sender, but the content of the message is not so easily broken down and categorized. That free-form content is what makes unstructured data hard to fit into the fixed structure of a relational database system.
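To make the email example concrete, here is a minimal sketch using Python's standard `email` module. The message text and the dictionary keys (`sender`, `subject`, `sent`) are invented for illustration; the point is only that the header fields drop neatly into named columns while the body remains free text.

```python
from email import message_from_string

# Hypothetical raw message, made up for this illustration.
RAW = """\
From: alice@example.com
To: bob@example.com
Subject: Quarterly numbers
Date: Mon, 02 Apr 2018 09:30:00 +0000

Hi Bob, the numbers look better than last quarter, but shipping is still slow.
"""

msg = message_from_string(RAW)

# Structured part: each header maps cleanly to a database column.
structured = {
    "sender": msg["From"],
    "subject": msg["Subject"],
    "sent": msg["Date"],
}

# Unstructured part: free text with no fixed schema.
body = msg.get_payload().strip()

print(structured)
print(body)
```

Answering a question like "is the sender unhappy about shipping?" would need text analysis of `body`, whereas "who sent it and when?" is a simple lookup in `structured`.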
Deliverables of data science:
By now, you must be wondering about the applications of this booming field. Read on to find out how data science has become a valuable resource for the companies that rule our minds and hearts.
Optimization of Search Engine Results
Search engines use data science algorithms to deliver results for our queries within a fraction of a second. Two concepts, web crawling and indexing, make this possible: crawlers discover pages, and an index records which pages contain which terms so that results can be looked up quickly.
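The indexing idea can be sketched with a toy inverted index. This is only an illustration under simplifying assumptions: the pages in `PAGES` are hypothetical stand-ins for crawled content, and `build_index` and `search` are names chosen here, not part of any search engine's API. Real engines also rank results, which this sketch does not attempt.

```python
from collections import defaultdict

# Hypothetical "crawled" pages: URL -> page text.
PAGES = {
    "example.com/a": "data science extracts insights from data",
    "example.com/b": "machine learning models learn from data",
    "example.com/c": "search engines crawl and index the web",
}

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query, sorted."""
    words = query.lower().split()
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)

INDEX = build_index(PAGES)
print(search(INDEX, "data"))
print(search(INDEX, "learn from data"))
```

Because the index is built once ahead of time, answering a query is a handful of set lookups rather than a scan of every page, which is why results can come back so quickly.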
Mental Health Care
Ginger.io performs behavioral analytics on its users to determine how they are feeling. According to Ginger’s website, its “behavioral analytics engine, built from years of research at the MIT Media Lab, aggregates, encrypts, and anonymizes patient data before running it through statistical analysis to create meaningful insights.”
Recommendation Systems
Tech giants like Amazon, Google and Netflix use users’ viewing history to suggest new products, services, movies and more. One approach, known as content-based recommendation, suggests items similar to those a user has already chosen.
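A content-based recommender can be sketched in a few lines. Everything here is assumed for illustration: the film titles, the three genre weights per film, and the `recommend` function are hypothetical, and production systems at the companies named above are far more elaborate. The sketch averages the feature vectors of what a user has watched and ranks unseen titles by cosine similarity to that average.

```python
import math

# Hypothetical catalogue: title -> genre weights (action, comedy, drama).
CATALOGUE = {
    "Film A": [1.0, 0.0, 0.2],
    "Film B": [0.9, 0.1, 0.0],
    "Film C": [0.0, 1.0, 0.1],
    "Film D": [0.1, 0.2, 1.0],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend(history, catalogue, k=1):
    """Rank unseen titles by similarity to the average of the watched titles."""
    if not history:
        return []
    watched = [catalogue[title] for title in history]
    # The user's taste profile is the per-feature mean of the watched items.
    profile = [sum(column) / len(watched) for column in zip(*watched)]
    unseen = [title for title in catalogue if title not in history]
    ranked = sorted(unseen, key=lambda t: cosine(profile, catalogue[t]), reverse=True)
    return ranked[:k]

print(recommend(["Film A"], CATALOGUE))
```

A user who watched the action-heavy "Film A" is pointed to "Film B", the other action-heavy title, rather than the comedy or the drama.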
Genetics and Genomics
Genomics is closely related to the field of precision medicine, an approach that takes genetics, behavior, and environment into account to predict the proper treatment, in contrast to a one-size-fits-all approach. Researchers are working to predict, before a child is born, whether he or she will develop serious health issues. The parents’ DNA is studied, and predictions are made based on that data.
Detecting Vehicle Insurance Fraud
Insurance companies receive fraudulent claims that have cost them billions of dollars. As a result, these companies have turned to machine learning to detect fraud using predictive models.
Detecting Financial Frauds
Credit card companies use customers’ transaction details, such as amount, merchant, location and time, to classify transactions as fraudulent or legitimate.
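As a rough illustration of using such details, the sketch below flags a transaction when its amount is far above the customer's usual spending and its location has not been seen before. This is a simple z-score rule swapped in for clarity, not the trained predictive models that card companies actually use; the transaction history, the threshold of 3.0, and the `classify` function are all assumptions made for this example.

```python
from statistics import mean, stdev

# Hypothetical past transactions for one customer: (amount, location).
HISTORY = [
    (25.0, "City A"),
    (40.0, "City A"),
    (32.5, "City B"),
    (28.0, "City A"),
    (35.0, "City A"),
]

def classify(amount, location, history, z_threshold=3.0):
    """Label a transaction 'suspicious' or 'legitimate' with a simple rule.

    Requires at least two past transactions so a standard deviation exists.
    """
    amounts = [past_amount for past_amount, _ in history]
    mu, sigma = mean(amounts), stdev(amounts)
    # How many standard deviations above the customer's average is this amount?
    z = (amount - mu) / sigma if sigma else 0.0
    unusual_amount = z > z_threshold
    new_location = location not in {past_location for _, past_location in history}
    return "suspicious" if unusual_amount and new_location else "legitimate"

print(classify(950.0, "City X", HISTORY))
print(classify(30.0, "City A", HISTORY))
```

In practice a model would be trained on many labeled transactions and would weigh merchant and time as well, but the inputs and the two-way output are the same in spirit.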
Machine Learning in Airline Industries
Delta uses a ‘collect and analyze’ system to track baggage and relay that information to customers via an app. Southwest Airlines has been saving millions of dollars by tracking the fuel usage of its Boeing planes. Japan Airlines predicts technical problems from data collected by sensors on its planes and so prevents costly flight cancellations.
Facebook’s data leak and Cambridge Analytica Fallout
It was recently in the news that the British political consulting firm Cambridge Analytica was alleged to have performed analytics on Facebook’s data to strategize Donald Trump’s win in the presidential election. The company has, in fact, revealed that it ran the digital and television campaign and that its data provided all the strategy.
These are only some of the applications. There are hundreds of others, and almost every domain is using data science to develop and flourish.
Skills required to become a data scientist
Data scientists are deep thinkers. Inquisitiveness drives them to ask new questions and then find the answers. They are highly creative people who can turn results into an attractive, easy-to-grasp report or visual. They have multi-modal communication skills. Beyond all of these, the most important quality is technical acumen. According to Udacity’s blog, the following 8 technical skill sets are required to master data science:
- Programming Skills
- Statistics
- Machine Learning
- Multivariable Calculus & Linear Algebra
- Data Wrangling
- Data Visualization & Communication
- Software Engineering
- Data Intuition
Rucha is a gregarious person pursuing engineering in Information Technology. She has worked as an editor for college magazines and as a freelance technology journalist. She loves to explore technology and elucidate it for her readers through her writing. She believes that if science and technology are made available in a lucid manner, they can be perceived as an art by all.