Mysteries behind Train and Test split in Machine Learning

Speaker: Kalyan Prasad

Track: Data Science

Type: Remote Talk

Abstract: Splitting your data into training and test sets can be disastrous if not done correctly. There are some best practices and secrets you should know to split a dataset successfully.

Description: One of the key aspects of supervised machine learning is model evaluation and validation. When you evaluate the predictive performance of your model, it’s essential that the process be unbiased. Using train_test_split() from the data science library scikit-learn, you can split your dataset into subsets that minimize the potential for bias in your evaluation and validation process.
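As a minimal sketch of the kind of split described above (not taken from the talk itself), the snippet below calls scikit-learn's train_test_split() on a small synthetic dataset. The toy data, the 25% test size, the random_state value, and the choice to stratify on the labels are all assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 samples, 2 features, imbalanced binary labels
# (80 samples of class 0, 20 samples of class 1).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.array([0] * 80 + [1] * 20)

# Hold out 25% of the rows for testing.
# random_state makes the shuffle reproducible; stratify=y keeps the class
# ratio the same in the training and test subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

print("train shape:", X_train.shape, "test shape:", X_test.shape)
print("positive rate (train):", y_train.mean())
print("positive rate (test):", y_test.mean())
```

Stratifying is one possible design choice here: without it, a random split of imbalanced labels can leave the test set with a noticeably different class ratio from the training set, which is one way an evaluation can become biased.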

In this talk we are going to cover the following points:
• Overview of Machine Learning Process
• Importance of Data Splitting
• Understanding Train & Test split methods
• How to choose the right methods
• Hidden secrets behind train & test methods
• Case study
• Conclusions

By the end of the talk, attendees will have a clear picture of how the train and test split fits into the ML process.


Prerequisites:
• Curiosity to learn something new
• Familiarity with Python
• Foundation-level understanding of Machine Learning
