Data Science Talks is a series of events aimed at revealing the techniques, tips and tricks that teams working in the field of data science have developed or use on a daily basis. Talks will cover a range of fields, including data mining, machine learning, probability models, predictive and prescriptive analytics, data engineering, pattern recognition, data visualization, data warehousing and other fields concerned with extracting knowledge from data.

dst#1: Predictive modeling using R

Boris Cergol, Ektimo CEO: In this talk I will share some of the most valuable things I have learned about building predictive models. I will show you that, with the right packages, R can become a highly useful machine intelligence tool. Using a case study on predicting future stock returns, I will guide you through a data scientist's example workflow that includes steps such as:

– data preprocessing and normalization,
– generating and selecting the right features,
– splitting data into training, validation and test sets,
– training a classification model,
– fine-tuning your model,
– evaluating how good the model actually is.

The case study involves a big-ish dataset, so I will also touch upon how parallel computation can make you more efficient and which packages can bail you out when your data grows too big to fit in memory. Working example code will be provided to all participants.


