Data Science for beginners!

I think the best way to start this blog is to define what a data scientist is, and there is no better definition than the one given by D.J. Patil and Thomas Davenport:

“It’s a high-ranking professional with the training and curiosity to make discoveries in the world of big data”

Big Data has opened up a world of new possibilities for statisticians, who have developed technologies and analytical tools to extract relevant information for analysis and evaluation. These data are collected in datasets that usually contain heterogeneous information from different sources, often unstructured (e.g. images, GPS coordinates, or posts and comments on social networks). Handling them requires non-conventional tools to extract and manage information. In the past, collected data were analyzed with descriptive statistics, the domain of business intelligence, using limited datasets and simple models. Today's datasets are far more complex and require inferential statistics and non-linear models.

According to a widely quoted estimate, 90% of all the data in the world has been generated over the last two years, a sign of how strongly the Internet of Things has boosted the production of unstructured data. Extracting information from data is a job nearly 200 years old, but only recently have those who work in this field had to face new problems that call for a new professional figure with experience across many fields. The data scientist's work is multidisciplinary: it combines computer and programming skills with knowledge of economics and sociology, supported by statistical tools and purpose-built software. Traditional statistics struggled to cope with such volumes of data, and Data Science emerged in response. It proposes models and algorithms able to investigate data in quantities that were unusual for statisticians, and it has brought to the fore disciplines such as Data Mining and Machine Learning.

But who exactly is a data scientist, and how do they work? We can describe the job through three macro-skills (see here):

  • Hacking skills: needed to work with massive amounts of electronic data that must be acquired, cleaned and manipulated.
  • Math and Statistics knowledge: allows a data scientist to choose appropriate methods and tools to extract insight from data.
  • Substantive expertise: knowledge of a scientific field is crucial for generating motivating questions and hypotheses and for interpreting results.

Data scientists with these capabilities are increasingly in demand in the corporate world. Labor market analysts agree that in the coming years there will be a large gap between the low supply of data scientists and the high demand for them. Companies (as well as governments) know they can gain a competitive advantage from the data stored every day on the Internet. For this reason, universities have recently launched degree programs to train these professionals, who until now were self-taught through experience and practice.

“The data scientist is an evolution of the quantitative analyst, distinguished by a combination of varied skills and a more distinctly business-oriented mindset”

In a famous Harvard Business Review article, data scientist is called the sexiest job of the 21st century. Its appeal comes from being a new discipline, still in full transformation and open to people from many backgrounds: mathematicians, statisticians and astrophysicists, but also engineers, economists and political scientists. Above all, what makes this work interesting is discovering the world through data, navigating an ocean of numbers in search of the essential information. Mastering the tools of this profession is an excellent calling card in an increasingly complex and difficult labor market.

I created this blog to explore this discipline in depth, starting from the basics, with a “layman” approach. I have a humanities degree, far removed from the numbers and analytical tools that are the daily bread of this profession, and I have always stayed in my comfort zone; now it’s time to change. As far as possible, I’ll frequently publish news, tutorials and useful exercises for those who, like me, want to pursue this career.