The concepts of Data Science right from basics to actual coding by Vishrut Solanki

Data Science

"Hey, I want to learn to write algorithms for data science, but I don’t know where to start. I want to enter the world of data analysis, but I have no background in programming or computer science."

These were some of my thoughts before I began the IE Data Science Bootcamp three weeks ago. I am sure there are many people out there who would have similar thoughts. I hope that reading from here onward helps you get some clarity.

As a metallurgist who has spent all my professional years in Project Management, I had been intrigued by data analytics for a long time. In April 2017 I found an opportunity to work on a Business Intelligence project, and later another opportunity in Social Media Analytics. Both experiences increased my curiosity tenfold, and I started researching how to acquire the core skills of a data scientist. Through a lot of research, ranging from reading about the world of big data to meeting various data scientists, I eventually learned about the Bootcamp and decided to enrol in it. These three weeks have been exactly what I was looking for.

The Bootcamp is an intense 11-week period of learning the concepts of Data Science, right from the basics to actual coding. My classmate Inna wrote a wonderful blog post introducing the program and explaining the first two weeks. The third week of the program involved learning to use the ‘dplyr’, ‘lubridate’ and ‘data.table’ libraries in R. After learning the basic functions of R in the first two weeks, we were taught how to use these libraries to work faster while handling and structuring data. One can choose to work with data.table or dplyr based on personal preference; the functions of both libraries are taught so that we become familiar with each and understand where it applies.

After various class exercises and a control exam, I feel confident about using these libraries, but of course there is more to discover through practice and experimentation.

In Python, we began working with arrays, which are the building blocks for working with pandas, a library built around structured data tables. Having learnt how to structure data in R, I found it easier to grasp the functions and their application in Python. With exercises in every session, we learnt about ordering the data and performing mathematical operations on different parts of it. Studying each of these languages for three hours every day helps me absorb and digest the logic. There are moments in the day when I mix up functions between the two languages, only to realise my mistake seconds later when the program does not work. Keeping a glossary of commands has helped me control these errors.
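To make the Python part a little more concrete, here is a minimal sketch of the kind of exercise described above: building a table from an array, ordering it, and doing arithmetic on only part of it. The column names and numbers are invented purely for illustration and are not taken from the course material.

```python
import numpy as np
import pandas as pd

# Hypothetical values, made up for this example.
values = np.array([41.0, 35.5, 52.0, 47.5])

# A DataFrame wraps array-like columns in a labelled table.
df = pd.DataFrame({
    "label": ["A", "B", "C", "D"],
    "value": values,
})

# Ordering the data: sort rows by a column, largest value first.
ordered = df.sort_values("value", ascending=False).reset_index(drop=True)

# A mathematical operation on one part of the data:
# the mean of only those rows whose value is above 40.
high_mean = df.loc[df["value"] > 40, "value"].mean()

print(ordered)
print(high_mean)
```

For anyone coming from R, `sort_values` plays roughly the role of dplyr's `arrange`, and the `df.loc[...]` selection is close in spirit to `filter` followed by `summarise`.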

Using what we have learnt in both languages, we began working on our Capstone Project with our client Minsait. The project is about understanding the pollution caused by cars in the city of Madrid and designing a model to predict rises in air contamination, so that city officials can take the necessary action. The project involves working with real data, data that has an impact on a real-life situation.

Working on this project gives my team and me not only hands-on experience but also a sense of responsibility, since the result can have a social and environmental effect. Similarly, other groups have projects with a business impact in their respective sectors.

As we begin the fourth week, we will learn about data acquisition in R and SQL and explore pandas in Python. I can see the improvement in my understanding of data science and programming, and I look forward to the challenges I will face in the coming weeks.