Data Science Club #2

Jakub Macina | Data Science

The Data Science Club continued on Monday, 30th of October, in a lecture hall at the Slovak University of Technology in Bratislava. The big announcement is that Exponea has joined forces with the Personalized Web Research Group from the Faculty of Informatics and Information Technologies in Bratislava to collaborate on preparing content and to make the Data Science Club a part of the student research seminars.

The presentation topics covered two very practical aspects of data science: storing the data you are dealing with and understanding it.

In the first talk, Juraj Sottnik talked about the best ways of storing data. We touched on the types of problems we would like to solve and the various types of data we deal with. The most commonly used approaches to data storage in the industry were discussed, with a focus on relational and NoSQL databases, along with small examples of their usage. Juraj, who has several years of experience in building data storage solutions, used the example of a simple e-shop web application to show what kinds of data storage issues you can run into. He also gave suggestions for picking suitable data storage for common problems (e.g. payment data must be stored reliably, so a relational database is a good fit).
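To make the payment example a bit more concrete, here is a minimal sketch of what "reliable" storage can look like in a relational database. It is not taken from Juraj's slides: the `orders` and `payments` tables, the `make_shop_db` and `record_payment` helpers, and the use of SQLite via Python's built-in `sqlite3` module are all assumptions made for illustration.

```python
import sqlite3


def make_shop_db():
    """Create an in-memory database with a hypothetical e-shop schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
    conn.executescript(
        """
        CREATE TABLE orders (
            id     INTEGER PRIMARY KEY,
            status TEXT NOT NULL DEFAULT 'new'
        );
        CREATE TABLE payments (
            id           INTEGER PRIMARY KEY,
            order_id     INTEGER NOT NULL REFERENCES orders(id),
            amount_cents INTEGER NOT NULL CHECK (amount_cents > 0)
        );
        INSERT INTO orders (id) VALUES (1);
        """
    )
    return conn


def record_payment(conn, order_id, amount_cents):
    """Mark an order as paid and store the payment as one atomic unit."""
    # `with conn` commits if the block succeeds and rolls back if it raises,
    # so the order can never end up 'paid' without a matching payment row.
    with conn:
        conn.execute("UPDATE orders SET status = 'paid' WHERE id = ?", (order_id,))
        conn.execute(
            "INSERT INTO payments (order_id, amount_cents) VALUES (?, ?)",
            (order_id, amount_cents),
        )


if __name__ == "__main__":
    conn = make_shop_db()
    try:
        record_payment(conn, 1, -500)  # violates the CHECK constraint
    except sqlite3.IntegrityError as err:
        print("payment rejected:", err)
    # The failed attempt was rolled back, so the order is still 'new'.
    print(conn.execute("SELECT status FROM orders WHERE id = 1").fetchone()[0])
```

The point of the sketch is the combination of constraints and transactions: the database itself refuses invalid payments and undoes half-finished updates, which is the kind of guarantee you want for payment data.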

The goal of the exploratory data analysis talk by Jakub Sevcech, a researcher at the Slovak University of Technology, was to explain various techniques for describing and visualizing data (both continuous and categorical variables) and for significance testing. What was really cool about this presentation was that he used interactive code samples which could easily be run and modified. The most interesting part for me was statistical testing, where I learned which statistical test to choose, depending on the data, to show statistical significance. This comes in handy when you are going to evaluate the results of an A/B test. The funny part of the talk was an XKCD comic describing how badly scientists want to publish their results and what common mistakes they make.
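As an illustration of the A/B test case, here is a minimal sketch of one common choice, a two-proportion z-test for comparing conversion rates. It is not one of the code samples from the talk: the function name `two_proportion_z_test` and the numbers in the usage example are made up, and the test assumes large samples of independent users.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.

    conv_a, conv_b: number of converted users in variants A and B
    n_a, n_b:       total number of users in variants A and B
    Returns (z, p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under the null hypothesis both variants share one conversion rate,
    # so the standard error is computed from the pooled proportion.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value of the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical A/B test: 100 of 1000 users converted in A, 150 of 1000 in B.
    z, p = two_proportion_z_test(100, 1000, 150, 1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

The normal approximation behind this test is only reasonable when each variant has plenty of conversions and non-conversions. For small samples, an exact test such as Fisher's exact test is the safer choice, which is exactly the kind of assumption worth checking before you pick a test.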

To sum it up, I’m excited about the collaboration between academia and industry to share ideas and knowledge, and that is exactly what the Data Science Club is about! During the final networking portion of the evening, I met many interesting people who are passionate about data science and machine learning.

Key takeaways:

  • The right approach to storing data depends on your business problem.
  • Try to understand the data before digging deeper into the task.
  • Correlation is not causation.
  • Choose the right statistical test and be aware of its assumptions about the data.

If you didn’t have the opportunity to attend, you can find the content online: https://github.com/exponea/data-science-club and https://www.slideshare.net/data-science-club.

Don’t forget to book your seat for the upcoming Data Science Club no. 3!