Machine Learning Tutorial Python – 8: Logistic Regression (Binary Classification)


Logistic regression is used for classification problems in machine learning. This tutorial shows you how to use sklearn's LogisticRegression class to solve a binary classification problem: predicting whether a customer will buy life insurance. At the end there is an interesting exercise for you to solve.
Supervised machine learning problems usually fall into two types: (1) regression, such as linear regression, where the predicted value is continuous, and (2) classification, where the predicted value is categorical. Logistic regression is mainly used for classification problems.
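
As a rough illustration of the workflow described above, here is a minimal sketch. The data frame is a made-up stand-in, not the tutorial's CSV, and the column names `age` and `bought_insurance` are assumed from the comments further down the page.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical toy data (NOT the tutorial's CSV): age and whether the person bought insurance.
df = pd.DataFrame({
    "age": [22, 25, 47, 52, 46, 56, 55, 60, 62, 61,
            18, 28, 27, 29, 49, 55, 25, 58, 19, 18],
    "bought_insurance": [0, 0, 1, 0, 1, 1, 0, 1, 1, 1,
                         0, 0, 0, 0, 1, 1, 1, 1, 0, 0],
})

# Hold out 20% of the rows for testing; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    df[["age"]], df["bought_insurance"], test_size=0.2, random_state=0
)

model = LogisticRegression()
model.fit(X_train, y_train)

predictions = model.predict(X_test)      # array of 0/1 labels, one per test row
accuracy = model.score(X_test, y_test)   # fraction of test rows predicted correctly
```

Note that the features are passed as `df[["age"]]` (double brackets), which keeps them two-dimensional as sklearn expects.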

#MachineLearning #PythonMachineLearning #MachineLearningTutorial #Python #PythonTutorial #PythonTraining #MachineLearningCource #LogisticRegression

Exercise: Open the above notebook from GitHub and go to the end.

Topics covered in this video:
0:01 – Theory (difference between linear regression and classification)
1:18 – What is logistic regression?
1:26 – Classification types (binary vs multiclass classification)
1:53 – Explanation of logistic regression using the example of whether a person will buy insurance based on their age
5:38 – Sigmoid or Logit function
8:18 – Coding (using the example of whether a person will buy insurance based on their age)
14:36 – sklearn predict_proba() function
15:49 – Exercise (predict employee retention based on salary, distance to work, promotion, department, etc.)
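
For the sigmoid topic listed above, a small sketch of the two functions may help. Strictly speaking they are not the same thing: the sigmoid (logistic) function maps a real number to a probability, and the logit is its inverse. The function names here are just illustrative.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p: float) -> float:
    """Inverse of the sigmoid: maps a probability in (0, 1) back to a real number."""
    return math.log(p / (1.0 - p))

print(sigmoid(0.0))          # 0.5
print(logit(sigmoid(2.0)))   # approximately 2.0
```

Logistic regression feeds a linear combination of the features through the sigmoid, which is why its output can be read as a probability.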

Next Video:
Machine Learning Tutorial Python – 8 Logistic Regression (Multiclass Classification):

Popular Playlists:
Data Science Full Course:

Data Science Project:

Machine learning tutorials:




Jupyter Notebook:

To download the CSV files and code for all tutorials, go to the repository, click the green button to clone or download the entire repository, and then go to the relevant folder to get that specific file.





  1. At 14:00, why did you use model.score(x_test, y_test)? Shouldn't x_test be replaced with the prediction from the model, like this:
    prediction = model.predict(x_test)
    model.score(prediction, y_test)

    Then you can compare the prediction with the real value and get the actual score.
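
On the question above: sklearn's `score(X, y)` takes the test features and the true labels, calls `predict` internally, and returns the accuracy, so passing predictions as the first argument is not needed. A small sketch with made-up data shows that it matches the manual comparison:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Hypothetical toy data, for illustration only.
X_train = np.array([[18], [22], [25], [28], [46], [52], [56], [60]])
y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X_test = np.array([[20], [30], [50], [58]])
y_test = np.array([0, 1, 1, 1])

model = LogisticRegression().fit(X_train, y_train)

# score() predicts on X_test itself and compares the result with y_test.
score = model.score(X_test, y_test)

# The same comparison done by hand: predict first, then measure accuracy.
manual = accuracy_score(y_test, model.predict(X_test))

print(score == manual)   # True
```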

  2. Very good video! You are a great teacher!
    model.predict(25) doesn't work for me. I get the following error message
    ValueError: Expected 2D array, got scalar array instead:
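
The error above comes from sklearn expecting input of shape (n_samples, n_features), even for a single value. A minimal sketch, using made-up ages rather than the tutorial's data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[18], [22], [25], [28], [46], [52], [56], [60]])  # hypothetical ages
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
model = LogisticRegression().fit(X, y)

# One sample with one feature is written as a list of lists: [[25]].
single = model.predict([[25]])

# Several ages: reshape a flat array into one column.
several = model.predict(np.array([25, 52]).reshape(-1, 1))
```

If the model was trained on a DataFrame, passing a one-row DataFrame with the same column name also works and avoids a feature-name warning.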

  3. Sir, in what cases do we not do a train and test split, as in some of the literature on logistic regression? Quantity or something?

  4. When you start searching for videos with codebasic (———-). I got a problem while visualizing the logistic curve. Sir, could you please share an email ID or blog page for questions or help?

  5. Sir, I have a CSV file on the desktop. How do I import the file into Python? That is, what should I put as the address in pd.read_csv()?
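
One way to answer the question above is to build the path with `pathlib`. This is a sketch only: the file name `insurance_data.csv` is a placeholder, and it assumes the desktop folder sits directly under the home directory.

```python
from pathlib import Path

import pandas as pd

# Hypothetical file name; replace it with the name of your own file.
csv_path = Path.home() / "Desktop" / "insurance_data.csv"

if csv_path.exists():
    df = pd.read_csv(csv_path)
    print(df.head())
else:
    print(f"No file found at {csv_path}")
```

On Windows you can also pass the full path as a raw string (prefix `r`) so that backslashes are not treated as escape characters.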

  6. One question though: in the employee retention exercise, why did you not dummify the dept variable and include it in the features? Couldn't it be one of the important features we should include? I included it and got a better score as well.
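
For readers who want to try what the comment above describes, here is a sketch of turning a department column into dummy variables. The rows and column names are invented for illustration and may not match the exercise's CSV.

```python
import pandas as pd

# Hypothetical rows; the column names are assumptions, not the exercise's CSV.
df = pd.DataFrame({
    "satisfaction_level": [0.38, 0.80, 0.11, 0.72],
    "department": ["sales", "technical", "sales", "hr"],
    "left": [1, 0, 1, 0],
})

# One 0/1 indicator column per department. drop_first removes one
# redundant column (here "hr") to avoid the dummy variable trap.
features = pd.get_dummies(
    df.drop(columns="left"), columns=["department"], drop_first=True, dtype=int
)

print(list(features.columns))
# ['satisfaction_level', 'department_sales', 'department_technical']
```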

  7. Thank you very much for all the help and support. Can you please also make a video on the mathematical explanation of the sigmoid and logit functions?

  8. You are the best teacher! I love the exercises at the end of each topic, which strengthens our understanding of what we learnt!!! Thank you so much! 🙂

  9. for seaborn pairplot, I'm getting this kind of error:
    'RuntimeError: Selected KDE bandwidth is 0. Cannot estiamte density.'
    Can anyone help?

  10. Sir, how do I predict a single value? Let's say age 36: will they buy the insurance or not? I have tried model.predict(36) but it is not working. But sir, the video was nice.
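
For a single age such as 36, `predict_proba` (covered at 14:36 in the video) gives the class probabilities as well as the label. A sketch with made-up training data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[18], [22], [25], [28], [46], [52], [56], [60]])  # hypothetical ages
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
model = LogisticRegression().fit(X, y)

# The single age still has to be two-dimensional: [[36]].
label = model.predict([[36]])[0]          # 0 or 1
proba = model.predict_proba([[36]])[0]    # [P(will not buy), P(will buy)]
```

The two probabilities sum to 1, and the predicted label is the class with the larger one.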

  11. Sir, kindly confirm whether you already have a video on how to do exploratory data analysis and feature selection. Thank you.

  12. Dear Sir
    What a beautiful datasheet you have provided for practice with this video.
    Spent more than two days to play with it.
    Playing with the datasheet opened another dimension of the learning curve.
    Thank you very much for providing relevant exercises like this as a challenge!

  13. I got 0.21 accuracy that a person will leave the job. I took all factors into consideration using dummy variables.

  14. Thank you very much for this tutorial. I have a question about the exercise, since we have multiple independent variables. How can I check which variable is making the employees leave their company? Hope you will answer my question.
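
One rough way to approach the question above is to standardize the features and compare the fitted coefficients. The sketch below uses synthetic data in which only one feature matters; the feature names are assumptions. Coefficients show association, not cause.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic data: only satisfaction actually drives whether someone leaves.
n = 500
satisfaction = rng.uniform(0, 1, n)
monthly_hours = rng.uniform(100, 300, n)
left = (satisfaction + rng.normal(0, 0.1, n) < 0.4).astype(int)

X = np.column_stack([satisfaction, monthly_hours])
X_scaled = StandardScaler().fit_transform(X)   # put both features on one scale

model = LogisticRegression().fit(X_scaled, left)

names = ["satisfaction_level", "average_monthly_hours"]
coefficients = dict(zip(names, model.coef_[0]))
print(coefficients)
```

A large negative coefficient means higher values of that feature go with a lower chance of leaving. A coefficient near zero suggests the feature adds little once the others are known.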

  15. Sir, when I use model.predict(52) I get the following error: Expected 2D array, got scalar array instead:


  16. I know it sounds odd and a little out of context, but here in India, life insurance companies do not give a s**t about data scientists, they only hire street-smart salespeople who can hard-sell (or in most cases, mis-sell) insurance products to customers.

  17. Sir, actually I am getting two errors while doing the exercise.
    First, when I use the concat function of pandas, I get multiple columns of the same.
    Second, I get an error while dropping the salary column (an error like "salary column not found in axis").
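
Without seeing the commenter's notebook, the causes can only be guessed. Two common ones are re-running the concat cell while assigning back to the same variable, and dropping a column that was already dropped or calling `drop` without `columns=`. A defensive sketch with made-up rows:

```python
import pandas as pd

# Hypothetical rows, for illustration only.
df = pd.DataFrame({"salary": ["low", "high", "medium"], "left": [1, 0, 0]})

dummies = pd.get_dummies(df["salary"], prefix="salary", dtype=int)

# Assign to a NEW name so re-running this cell never stacks dummies onto dummies.
merged = pd.concat([df, dummies], axis="columns")

# columns= targets column labels; errors="ignore" makes a repeated drop harmless.
final = merged.drop(columns=["salary"], errors="ignore")
final = final.drop(columns=["salary"], errors="ignore")   # second call is a no-op

# If duplicates already crept in, keep only the first copy of each column name.
final = final.loc[:, ~final.columns.duplicated()]

print(list(final.columns))
# ['left', 'salary_high', 'salary_low', 'salary_medium']
```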

  18. This video is fantastic. I'm teaching myself machine learning and this was one of the most helpful resources I've found online. Excited to watch/work-through the rest of the videos! Thank you so much

  19. Hello Sir! Great work by you.
    There is a problem in your code, maybe due to the Python version.
    Only if we use X_train, X_test, y_train, y_test = train_test_split(df[['age']],df[['bought_insurance']],train_size=0.9)
    do we get a score of 1;
    otherwise, with your code, it is 0.66.
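
A likely explanation for the differing scores above is not the Python version but the random shuffle in `train_test_split`: each run picks different test rows. With only two or three test rows the score can only jump between values such as 0.66 and 1. The sketch below uses made-up data to show that fixing `random_state` makes the split repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(-1, 1)   # hypothetical feature column
y = np.array([0, 1] * 10)

# Without random_state, every call shuffles differently.
# With a fixed random_state, the same rows land in the test set each time.
split_a = train_test_split(X, y, train_size=0.9, random_state=42)
split_b = train_test_split(X, y, train_size=0.9, random_state=42)

same_split = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
test_rows = len(split_a[1])   # number of rows held out for testing

print(same_split, test_rows)   # True 2
```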

  20. In the exercise, one can argue that 'work accident' could also be included among the factors contributing to employee retention.

  21. You should have talked about the data type of the target variable (1 and 0) and which data type (numeric or string) is required for running the analysis.

  22. Thanks for making such great videos.
    Just one question: how can we plot a graph for logistic regression with the predicted values?

  23. The reasons behind employee retention are:

    1. Satisfaction Level

    2. Average Monthly Hours

    3. Promotion Last 5 Years

  24. It would be great if you could create a tutorial on the best way to measure accuracy for classification problems, for both balanced and unbalanced datasets.
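
To illustrate why plain accuracy can mislead on unbalanced data, as the comment above hints, here is a small sketch with invented labels. It compares accuracy with balanced accuracy, which is the mean of the per-class recall.

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score, confusion_matrix

# Hypothetical imbalanced labels: 90 negatives, 10 positives.
y_true = [0] * 90 + [1] * 10
# A useless "model" that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy_score(y_true, y_pred)                # 0.9, looks good
bal_acc = balanced_accuracy_score(y_true, y_pred)   # 0.5, no better than guessing
cm = confusion_matrix(y_true, y_pred)               # rows: true class, columns: predicted class

print(acc, bal_acc)
print(cm)
```

The confusion matrix shows the problem directly: all 10 positive cases are missed even though accuracy is 90%.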

  25. 78 percent accuracy. I do all your exercises, but in this one I learned a lot. Thank you sir for such a great series @codebasics

  26. Step by step guide on how to learn data science for free:

    Machine learning tutorials with exercises:

