Supervised Learning: Regression

1 Cars4U Project

1.1 Objective

Explore and visualize the dataset.
Build a linear regression model to predict the prices of used cars.
Generate a set of insights and recommendations that will help the business.

1.2 Data

S.No. : Serial number
Name : Name of the car, including the brand name and model name
Location : The city in which the car is being sold or is available for purchase
Year : Manufacturing year of the car
Kilometers_Driven : The total kilometers driven in the car by the previous owner(s), in km
Fuel_Type : The type of fuel used by the car (Petrol, Diesel, Electric, CNG, LPG)
Transmission : The type of transmission used by the car (Automatic / Manual)
Owner_Type : Type of ownership
Mileage : The standard mileage offered by the car company, in kmpl or km/kg
Engine : The displacement volume of the engine, in CC
Power : The maximum power of the engine, in bhp
Seats : The number of seats in the car
New_Price : The price of a new car of the same model, in INR Lakhs (1 Lakh = 100,000)
Price : The price of the used car, in INR Lakhs (1 Lakh = 100,000)

1.3 Problem definition and questions to be answered

In this project we want to analyze how different characteristics of used cars affect the Price of the car. The questions to be answered are:

Do the brand and model of the car affect the price?
Do luxury brands increase the price of the car?
How much does the total number of kilometers driven affect the price?
Is it more profitable to trade cars in some locations than in others?
What is the impact of Fuel_Type, Transmission, Mileage, Engine, Power and Seats on the price?
Is there a relationship between the price of new cars and used cars?

2 Import packages and turn off warnings

In [3]:

import warnings
warnings.filterwarnings("ignore")

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# import pandas_profiling
sns.set(color_codes=True)
%matplotlib inline

3 Import the dataset and check data quality

In [5]:

# read data from csv file
data = pd.read_csv(r"C:\Users\AndresDelgadillo\Downloads\used_cars_data.csv")

In [6]:

# get columns
data.columns

Out[6]:

Index(['S.No.', 'Name', 'Location', 'Year', 'Kilometers_Driven', 'Fuel_Type',
       'Transmission', 'Owner_Type', 'Mileage', 'Engine', 'Power', 'Seats',
       'New_Price', 'Price'],
      dtype='object')

In [7]:

# get size of dataset
data.shape

Out[7]:

(7253, 14)

In [8]:

# check dataset information
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7253 entries, 0 to 7252
Data columns (total 14 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   S.No.              7253 non-null   int64  
 1   Name               7253 non-null   object 
 2   Location           7253 non-null   object 
 3   Year               7253 non-null   int64  
 4   Kilometers_Driven  7253 non-null   int64  
 5   Fuel_Type          7253 non-null   object 
 6   Transmission       7253 non-null   object 
 7   Owner_Type         7253 non-null   object 
 8   Mileage            7251 non-null   object 
 9   Engine             7207 non-null   object 
 10  Power              7207 non-null   object 
 11  Seats              7200 non-null   float64
 12  New_Price          1006 non-null   object 
 13  Price              6019 non-null   float64
dtypes: float64(2), int64(3), object(9)
memory usage: 793.4+ KB

In [9]:

# check dataset missing values
total = data.isnull().sum().sort_values(ascending=False) # total number of null values
print(total)

New_Price            6247
Price                1234
Seats                  53
Engine                 46
Power                  46
Mileage                 2
S.No.                   0
Name                    0
Location                0
Year                    0
Kilometers_Driven       0
Fuel_Type               0
Transmission            0
Owner_Type              0
dtype: int64

There are 7253 rows and 14 columns.
'New_Price' and 'Price' have a large number of missing values (6247 and 1234 rows, respectively), which could affect the results of the analysis. A deeper study is needed to decide how to handle all the missing values.
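The raw counts are easier to judge as a share of all rows. A minimal sketch, where `missing_report` is a hypothetical helper introduced here and `toy` is made-up data standing in for `data`:

```python
import numpy as np
import pandas as pd

def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, largest first."""
    counts = df.isnull().sum()
    percent = df.isnull().mean().mul(100).round(1)  # mean of a boolean mask = share of NaN
    report = pd.DataFrame({"missing": counts, "percent": percent})
    return report.sort_values("missing", ascending=False)

# Made-up frame standing in for the real dataset
toy = pd.DataFrame({
    "New_Price": [8.61, np.nan, np.nan, np.nan],
    "Price": [1.75, 12.5, np.nan, 6.0],
    "Year": [2010, 2015, 2011, 2012],
})
print(missing_report(toy))
```

On the real dataset, `missing_report(data)` would put New_Price at 86.1% (6247 of 7253 rows) and Price at 17.0% (1234 of 7253 rows).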

4 Characteristics of the data

In [10]:

# check first rows of data
data.head()

Out[10]:

	S.No.	Name	Location	Year	Kilometers_Driven	Fuel_Type	Transmission	Owner_Type	Mileage	Engine	Power	Seats	New_Price	Price
0	0	Maruti Wagon R LXI CNG	Mumbai	2010	72000	CNG	Manual	First	26.6 km/kg	998 CC	58.16 bhp	5.0	NaN	1.75
1	1	Hyundai Creta 1.6 CRDi SX Option	Pune	2015	41000	Diesel	Manual	First	19.67 kmpl	1582 CC	126.2 bhp	5.0	NaN	12.50
2	2	Honda Jazz V	Chennai	2011	46000	Petrol	Manual	First	18.2 kmpl	1199 CC	88.7 bhp	5.0	8.61 Lakh	4.50
3	3	Maruti Ertiga VDI	Chennai	2012	87000	Diesel	Manual	First	20.77 kmpl	1248 CC	88.76 bhp	7.0	NaN	6.00
4	4	Audi A4 New 2.0 TDI Multitronic	Coimbatore	2013	40670	Diesel	Automatic	Second	15.2 kmpl	1968 CC	140.8 bhp	5.0	NaN	17.74

In [11]:

data.tail()

Out[11]:

	S.No.	Name	Location	Year	Kilometers_Driven	Fuel_Type	Transmission	Owner_Type	Mileage	Engine	Power	Seats	New_Price	Price
7248	7248	Volkswagen Vento Diesel Trendline	Hyderabad	2011	89411	Diesel	Manual	First	20.54 kmpl	1598 CC	103.6 bhp	5.0	NaN	NaN
7249	7249	Volkswagen Polo GT TSI	Mumbai	2015	59000	Petrol	Automatic	First	17.21 kmpl	1197 CC	103.6 bhp	5.0	NaN	NaN
7250	7250	Nissan Micra Diesel XV	Kolkata	2012	28000	Diesel	Manual	First	23.08 kmpl	1461 CC	63.1 bhp	5.0	NaN	NaN
7251	7251	Volkswagen Polo GT TSI	Pune	2013	52262	Petrol	Automatic	Third	17.2 kmpl	1197 CC	103.6 bhp	5.0	NaN	NaN
7252	7252	Mercedes-Benz E-Class 2009-2013 E 220 CDI Avan...	Kochi	2014	72443	Diesel	Automatic	First	10.0 kmpl	2148 CC	170 bhp	5.0	NaN	NaN

In [12]:

# get a random sample of data
np.random.seed(1) #setting the random seed via np.random.seed to get the same random results every time
data.sample(n=5)

Out[12]:

	S.No.	Name	Location	Year	Kilometers_Driven	Fuel_Type	Transmission	Owner_Type	Mileage	Engine	Power	Seats	New_Price	Price
2397	2397	Ford EcoSport 1.5 Petrol Trend	Kolkata	2016	21460	Petrol	Manual	First	17.0 kmpl	1497 CC	121.36 bhp	5.0	9.47 Lakh	6.00
3777	3777	Maruti Wagon R VXI 1.2	Kochi	2015	49818	Petrol	Manual	First	21.5 kmpl	1197 CC	81.80 bhp	5.0	5.44 Lakh	4.11
4425	4425	Ford Endeavour 4x2 XLT	Hyderabad	2007	130000	Diesel	Manual	First	13.1 kmpl	2499 CC	141 bhp	7.0	NaN	6.00
3661	3661	Mercedes-Benz E-Class E250 CDI Avantgrade	Coimbatore	2016	39753	Diesel	Automatic	First	13.0 kmpl	2143 CC	201.1 bhp	5.0	NaN	35.28
4514	4514	Hyundai Xcent 1.2 Kappa AT SX Option	Kochi	2016	45560	Petrol	Automatic	First	16.9 kmpl	1197 CC	82 bhp	5.0	NaN	6.34

'Name' could be split into separate columns: the brand, the model, and the remaining specifications of the car.
'Location', 'Fuel_Type', 'Transmission', and 'Owner_Type' could be converted to the 'category' dtype.
'Mileage', 'Engine', 'Power', and 'New_Price' should be numerical, but they are stored as 'object' because their values include units. These columns must be processed to convert them to numbers.
'S.No.' duplicates the index of the dataset, so the column can be dropped.

5 Processing columns

5.1 Mileage

This column is the standard mileage offered by the car company, in kmpl or km/kg. We split it into a numeric value and a unit to check whether the unit is related to Fuel_Type.

In [14]:

# Split Mileage into the numeric value and its unit (split on the first space only)
data[['Mileage','Unit']] = data['Mileage'].str.split(' ', n=1, expand=True)

In [15]:

# Get unique pairs of Fuel_Type and Unit
data.groupby(['Fuel_Type','Unit']).size()

Out[15]:

Fuel_Type  Unit 
CNG        km/kg      62
Diesel     kmpl     3852
LPG        km/kg      12
Petrol     kmpl     3325
dtype: int64

There is a clear relation between 'Fuel_Type' and 'Unit':

Mileage for CNG and LPG is in km/kg
Mileage for Diesel and Petrol is in kmpl

It is not necessary to convert the units, because the Fuel_Type column already identifies them.

Now we can convert Mileage to a numeric type and drop the Unit column.
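The split, the numeric conversion and the unit extraction can also be done in one vectorized pass. A minimal sketch on made-up values in the same "value unit" format. `pd.to_numeric(..., errors="coerce")` turns anything unparseable into NaN instead of raising:

```python
import numpy as np
import pandas as pd

# Made-up sample in the same "value unit" format as the Mileage column
mileage_raw = pd.Series(["26.6 km/kg", "19.67 kmpl", np.nan, "18.2 kmpl"])

# Split once on the first space: column 0 holds the number, column 1 the unit
parts = mileage_raw.str.split(" ", n=1, expand=True)
mileage = pd.to_numeric(parts[0], errors="coerce")  # unparseable text -> NaN
unit = parts[1]
print(pd.DataFrame({"Mileage": mileage, "Unit": unit}))
```

Missing entries stay NaN throughout, so no special case is needed for them.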

In [16]:

# drop the Unit column
data.drop(['Unit'], axis=1, inplace=True)

In [17]:

# Convert Mileage to Number
data['Mileage']=data['Mileage'].astype('float64')

In [18]:

# check Mileage is number
data['Mileage'].head()

Out[18]:

0    26.60
1    19.67
2    18.20
3    20.77
4    15.20
Name: Mileage, dtype: float64

5.2 Engine

The 'CC' unit string is removed and the values are converted to numbers.

In [19]:

def engine_to_num(engine):
    """This function takes in a string representing the engine and converts it to a number.
    This function returns the same engine value if the input is already numeric."""
    if isinstance(engine, str):  # checks if engine is a string
        engine_val = float(engine.replace('CC', '').strip())
    else:  # this happens when the engine is already number or nan
        engine_val = engine
    # return engine as number
    return engine_val

# apply engine_to_num function to column 'Engine'
data['Engine'] = data['Engine'].apply(engine_to_num)

In [20]:

# check Engine is number
data['Engine'].head()

Out[20]:

0     998.0
1    1582.0
2    1199.0
3    1248.0
4    1968.0
Name: Engine, dtype: float64

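5.3 Power

The 'bhp' unit string is removed and the values are converted to numbers. Entries whose value is 'null' become NaN, which is why Power has fewer non-null values after processing than before.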

In [21]:

def power_to_num(power):
    """This function takes in a string representing the power and converts it to a number.
    This function returns the same power value if the input is already numeric."""
    if isinstance(power, str):  # checks if power is a string
        power_val = power.replace('bhp', '').strip()
        if power_val != 'null': # check that there is a value
            power_val = float(power_val)
        else:
            power_val = np.nan # returns nan
    else:  # this happens when the power is already number or nan
        power_val = power
    # return power as number
    return power_val

# apply power_to_num function to column 'Power'
data['Power'] = data['Power'].apply(power_to_num)

In [22]:

# check Power is number
data['Power'].head()

Out[22]:

0     58.16
1    126.20
2     88.70
3     88.76
4    140.80
Name: Power, dtype: float64
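`engine_to_num` and `power_to_num` follow the same pattern, so one generic helper could serve both columns. A minimal sketch, where `strip_unit` is a hypothetical name and the inputs are made-up raw strings. It assumes the column still holds the original strings, because the `.str` accessor fails on an already numeric column. With `errors="coerce"`, a leftover 'null' becomes NaN without a special case:

```python
import numpy as np
import pandas as pd

def strip_unit(series: pd.Series, unit: str) -> pd.Series:
    """Remove a unit label and convert to float; anything unparseable becomes NaN."""
    cleaned = series.str.replace(unit, "", regex=False).str.strip()
    return pd.to_numeric(cleaned, errors="coerce")

# Made-up raw values in the original string format
engine_raw = pd.Series(["998 CC", "1582 CC", np.nan])
power_raw = pd.Series(["58.16 bhp", "null bhp", "126.2 bhp"])
print(strip_unit(engine_raw, "CC"))
print(strip_unit(power_raw, "bhp"))
```

On the raw data this would read `data['Engine'] = strip_unit(data['Engine'], 'CC')` and `data['Power'] = strip_unit(data['Power'], 'bhp')`.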

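5.4 New_Price

New prices are given either in Lakh or in Cr (crore). The unit strings are removed, and Cr values are multiplied by 100 so that all prices are expressed in Lakh:

1 Cr = 100 Lakh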

In [23]:

def price_to_num(price):
    """This function takes in a string representing the price and converts it to a number in Lakh.
    This function returns the same price value if the input is already numeric."""
    if isinstance(price, str):  # checks if price is a string
        # handles Cr and Lakh units (1 Cr = 100 Lakh)
        if price.endswith('Lakh'):
            multiplier = 1
        elif price.endswith('Cr'):
            multiplier = 100
        else:  # fail loudly instead of using an undefined multiplier
            raise ValueError(f"Unexpected unit in price: {price!r}")
        price_val = float(price.replace('Lakh', '').replace('Cr', '').strip()) * multiplier
    else:  # this happens when the price is already number or nan
        price_val = price
    # return price as number
    return price_val

# apply price_to_num function to column 'New_Price'
data['New_Price'] = data['New_Price'].apply(price_to_num)

In [24]:

# check New_Price is number
data['New_Price'].head()

Out[24]:

0     NaN
1     NaN
2    8.61
3     NaN
4     NaN
Name: New_Price, dtype: float64
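The same Lakh/Cr conversion can be written without a row-by-row `apply`, using a regular expression to capture the number and the unit. A minimal sketch, where `new_price_to_lakh` and `UNIT_TO_LAKH` are hypothetical names and the "1.28 Cr" and "5 Million" inputs are made up. Unlike `price_to_num`, strings with an unknown unit become NaN here instead of raising:

```python
import numpy as np
import pandas as pd

# Multipliers that express every price in Lakh (1 Cr = 100 Lakh)
UNIT_TO_LAKH = {"Lakh": 1.0, "Cr": 100.0}

def new_price_to_lakh(series: pd.Series) -> pd.Series:
    """Convert strings such as '8.61 Lakh' or '1.28 Cr' to floats in Lakh."""
    # Group 0 captures the number, group 1 the unit; non-matching rows become NaN
    parts = series.str.extract(r"^\s*([\d.]+)\s*(Lakh|Cr)\s*$")
    value = pd.to_numeric(parts[0], errors="coerce")
    multiplier = parts[1].map(UNIT_TO_LAKH)
    return value * multiplier

prices = pd.Series(["8.61 Lakh", "1.28 Cr", np.nan, "5 Million"])
print(new_price_to_lakh(prices))
```

Whether silently returning NaN or raising is preferable depends on how strictly unexpected formats should be caught.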

5.5 Feature Engineering

Name

The Name column contains the brand, model and specs of the car. We split it on the first two spaces into 3 columns: 'Brand', 'Model' and 'Specs'.

In [26]:

data[['Brand','Model','Specs']] = data['Name'].str.split(' ',n=2,expand=True)

In [27]:

data[['Name','Brand','Model','Specs']].head()

Out[27]:

	Name	Brand	Model	Specs
0	Maruti Wagon R LXI CNG	Maruti	Wagon	R LXI CNG
1	Hyundai Creta 1.6 CRDi SX Option	Hyundai	Creta	1.6 CRDi SX Option
2	Honda Jazz V	Honda	Jazz	V
3	Maruti Ertiga VDI	Maruti	Ertiga	VDI
4	Audi A4 New 2.0 TDI Multitronic	Audi	A4	New 2.0 TDI Multitronic

Now we can drop the 'Name' column and use the 'Brand', 'Model' and 'Specs' columns instead.
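Splitting on the first two spaces breaks brands whose name contains a space. In the high-New_Price rows listed later in this notebook, Land Rover cars appear with Brand 'Land' and Model 'Rover'. A minimal sketch that keeps such brands intact, assuming a hand-maintained list of multi-word brands. `split_name` and `MULTI_WORD_BRANDS` are hypothetical names, and only "Land Rover" is confirmed by the outputs shown here. Model naming inconsistencies such as 'S' / 'Class ...' versus 'S-Class' for Mercedes-Benz would still remain:

```python
import pandas as pd

# Assumption: brands whose names contain a space. Only "Land Rover" is confirmed by
# the notebook outputs; extend the tuple after inspecting data['Name'].
MULTI_WORD_BRANDS = ("Land Rover",)

def split_name(name):
    """Split a car name into (brand, model, specs), keeping multi-word brands intact."""
    # Use a known multi-word brand if the name starts with one, else the first word
    brand = next((b for b in MULTI_WORD_BRANDS if name.startswith(b + " ")), None)
    if brand is None:
        brand = name.split(" ", 1)[0]
    remainder = name[len(brand):].strip()
    model, _, specs = remainder.partition(" ")
    # Empty strings become None so absent parts show up as missing values
    return brand, model or None, specs or None

names = pd.Series(["Honda Jazz V", "Land Rover Range Rover Sport SE", "Maruti Wagon R LXI CNG"])
split = pd.DataFrame(names.apply(split_name).tolist(),
                     index=names.index, columns=["Brand", "Model", "Specs"])
print(split)
```

On the real data, the same pattern would be `pd.DataFrame(data['Name'].apply(split_name).tolist(), index=data.index, columns=['Brand', 'Model', 'Specs'])`, run before 'Name' is dropped.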

In [28]:

# drop Name column
data.drop(['Name'], axis=1, inplace=True)

5.6 Category columns

The 'Brand', 'Model', 'Specs', 'Location', 'Fuel_Type', 'Transmission', and 'Owner_Type' columns are converted to the 'category' dtype.
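The seven `astype('category')` lines can be collapsed into one call by passing a column-to-dtype mapping to `DataFrame.astype`. A minimal sketch, where `to_category` is a hypothetical helper and `toy` is made-up data:

```python
import pandas as pd

def to_category(df: pd.DataFrame, cols) -> pd.DataFrame:
    """Return a copy of df with the given columns cast to the 'category' dtype."""
    return df.astype({col: "category" for col in cols})

toy = pd.DataFrame({"Fuel_Type": ["Diesel", "Petrol", "Diesel"], "Year": [2010, 2015, 2011]})
toy = to_category(toy, ["Fuel_Type"])
print(toy.dtypes)
```

For the notebook, the call would be `data = to_category(data, ['Brand', 'Model', 'Specs', 'Location', 'Fuel_Type', 'Transmission', 'Owner_Type'])`. Each column gets its own set of categories, exactly as with the individual calls.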

In [29]:

data['Brand']=data['Brand'].astype('category')
data['Model']=data['Model'].astype('category')
data['Specs']=data['Specs'].astype('category')
data['Location']=data['Location'].astype('category')
data['Fuel_Type']=data['Fuel_Type'].astype('category')
data['Transmission']=data['Transmission'].astype('category')
data['Owner_Type']=data['Owner_Type'].astype('category')

5.7 Drop 'S.No.' column

In [30]:

data.drop(['S.No.'], axis=1, inplace=True)

5.8 Duplicate rows

In [31]:

# show all rows with duplicates
data[data.duplicated(keep=False)]

Out[31]:

	Location	Year	Kilometers_Driven	Fuel_Type	Transmission	Owner_Type	Mileage	Engine	Power	Seats	New_Price	Price	Brand	Model	Specs
6498	Mumbai	2010	52000	Petrol	Manual	First	17.0	1497.0	118.0	5.0	NaN	NaN	Honda	City	1.5 E MT
6582	Mumbai	2010	52000	Petrol	Manual	First	17.0	1497.0	118.0	5.0	NaN	NaN	Honda	City	1.5 E MT
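Exact duplicate rows can also be removed with `DataFrame.drop_duplicates`, which is equivalent to dropping the index of `data.duplicated()` when the index is unique. A minimal sketch on made-up rows:

```python
import pandas as pd

toy = pd.DataFrame({
    "Location": ["Mumbai", "Mumbai", "Pune"],
    "Year": [2010, 2010, 2015],
    "Kilometers_Driven": [52000, 52000, 41000],
})

# keep="first" retains the first occurrence of each fully identical row
deduped = toy.drop_duplicates(keep="first")
print(deduped)
print(deduped.duplicated().sum())
```

The original index labels are kept, which matches the "Index: 7252 entries, 0 to 7252" reported by `data.info()` after deduplication.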

In [32]:

# drop duplicate rows
data.drop(data[data.duplicated()].index, axis=0, inplace=True)

In [33]:

# Check there are no duplicates
data.duplicated().sum()

Out[33]:

5.9 Check characteristics of data after processing

In [34]:

data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 7252 entries, 0 to 7252
Data columns (total 15 columns):
 #   Column             Non-Null Count  Dtype   
---  ------             --------------  -----   
 0   Location           7252 non-null   category
 1   Year               7252 non-null   int64   
 2   Kilometers_Driven  7252 non-null   int64   
 3   Fuel_Type          7252 non-null   category
 4   Transmission       7252 non-null   category
 5   Owner_Type         7252 non-null   category
 6   Mileage            7250 non-null   float64 
 7   Engine             7206 non-null   float64 
 8   Power              7077 non-null   float64 
 9   Seats              7199 non-null   float64 
 10  New_Price          1006 non-null   float64 
 11  Price              6019 non-null   float64 
 12  Brand              7252 non-null   category
 13  Model              7252 non-null   category
 14  Specs              7251 non-null   category
dtypes: category(7), float64(6), int64(2)
memory usage: 665.0 KB

In [35]:

# check first rows of data
data.head()

Out[35]:

	Location	Year	Kilometers_Driven	Fuel_Type	Transmission	Owner_Type	Mileage	Engine	Power	Seats	New_Price	Price	Brand	Model	Specs
0	Mumbai	2010	72000	CNG	Manual	First	26.60	998.0	58.16	5.0	NaN	1.75	Maruti	Wagon	R LXI CNG
1	Pune	2015	41000	Diesel	Manual	First	19.67	1582.0	126.20	5.0	NaN	12.50	Hyundai	Creta	1.6 CRDi SX Option
2	Chennai	2011	46000	Petrol	Manual	First	18.20	1199.0	88.70	5.0	8.61	4.50	Honda	Jazz	V
3	Chennai	2012	87000	Diesel	Manual	First	20.77	1248.0	88.76	7.0	NaN	6.00	Maruti	Ertiga	VDI
4	Coimbatore	2013	40670	Diesel	Automatic	Second	15.20	1968.0	140.80	5.0	NaN	17.74	Audi	A4	New 2.0 TDI Multitronic

All columns now have the appropriate data types.

6 Exploratory data analysis

6.1 Pandas profiling report

We can get a first statistical and descriptive overview with pandas_profiling (the package has since been renamed ydata-profiling). The call is commented out in this run.

In [36]:

# get pandas profiling report
#pandas_profiling.ProfileReport(data)

6.2 Pairplot

A pairplot shows the distribution of each numerical column on the diagonal (univariate view) and the pairwise relationships between columns off the diagonal (bivariate view). The call is commented out in this run.

In [37]:

#sns.pairplot(data, diag_kind='kde');

6.3 Univariate analysis

6.3.1 Numerical columns

In [38]:

# Get stats for numerical columns
data.describe()

Out[38]:

	Year	Kilometers_Driven	Mileage	Engine	Power	Seats	New_Price	Price
count	7252.000000	7.252000e+03	7250.000000	7206.000000	7077.000000	7199.000000	1006.000000	6019.000000
mean	2013.365830	5.869999e+04	18.141738	1616.590064	112.764474	5.279761	22.779692	9.479468
std	3.254405	8.443351e+04	4.562492	595.324779	53.497297	0.811709	27.759344	11.187917
min	1996.000000	1.710000e+02	0.000000	72.000000	34.200000	0.000000	3.910000	0.440000
25%	2011.000000	3.400000e+04	15.170000	1198.000000	75.000000	5.000000	7.885000	3.500000
50%	2014.000000	5.342900e+04	18.160000	1493.000000	94.000000	5.000000	11.570000	5.640000
75%	2016.000000	7.300000e+04	21.100000	1968.000000	138.100000	5.000000	26.042500	9.950000
max	2019.000000	6.500000e+06	33.540000	5998.000000	616.000000	10.000000	375.000000	160.000000

In [40]:

# Get the skewness of numerical columns
data.select_dtypes(include=np.number).skew()

Out[40]:

Year                 -0.840219
Kilometers_Driven    61.578378
Mileage              -0.438397
Engine                1.412244
Power                 1.961084
Seats                 1.902039
New_Price             4.128300
Price                 3.335232
dtype: float64

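6.3.1.1 Year

The Year distribution is moderately skewed to the left (skewness -0.84). The mean is 2013.37 and the median 2014. The oldest cars (the minimum is 1996) fall below the lower boxplot whisker of 2011 - 1.5 × (2016 - 2011) = 2003.5, but they are plausible manufacturing years, so we do not treat them as outliers.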

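6.3.1.2 Kilometers_Driven

The Kilometers_Driven distribution is highly skewed to the right (skewness 61.6). The mean is about 58,700 km and the median 53,429 km, and there are several outliers, as the chart below shows. The most extreme is the maximum of 6,500,000 km, which is not plausible for a used car and is probably a data-entry error.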

In [41]:

# creating the 2 subplots: boxplot on top, histogram below
f2, (ax_box2, ax_hist2) = plt.subplots(nrows = 2, # Number of rows of the subplot grid= 2
                                       sharex = True, # x-axis will be shared among all subplots
                                       gridspec_kw = {"height_ratios": (.25, .75)});
sns.boxplot(x=data['Kilometers_Driven'], ax=ax_box2, showmeans=True, color='violet'); # boxplot
sns.histplot(x=data['Kilometers_Driven'], kde=True, ax=ax_hist2); # histogram (replaces the deprecated distplot)
ax_hist2.axvline(data['Kilometers_Driven'].mean(), color='green', linestyle='--'); # Add mean to the histogram
ax_hist2.axvline(data['Kilometers_Driven'].median(), color='black', linestyle='-'); # Add median to the histogram

6.3.1.3 Mileage

The Mileage distribution is fairly symmetrical (skewness -0.44). The mean is 18.14 and the median 18.16. However, 81 rows have a Mileage of 0, which is not physically meaningful and most likely stands for a missing value.
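If the zeros are indeed placeholders, which is an assumption rather than something the dataset documents, one option is to turn them into NaN so they are handled together with the other missing values. A minimal sketch on made-up values:

```python
import numpy as np
import pandas as pd

# Made-up Mileage values; 0.0 is assumed to mean "not recorded"
mileage = pd.Series([26.6, 0.0, 18.2, 0.0, np.nan], name="Mileage")
print((mileage == 0).sum())        # 2 zero readings

# mask() replaces values where the condition is True with NaN
mileage_clean = mileage.mask(mileage == 0)
print(mileage_clean.isna().sum())  # 3: the two zeros plus the value already missing
```

On the notebook data this would be `data['Mileage'] = data['Mileage'].mask(data['Mileage'] == 0)`. The same idea applies to the car with 0 seats discussed below.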

In [42]:

# Number of rows with mileage equals to 0
sum(data['Mileage']==0)

Out[42]:

6.3.1.4 Engine

The Engine distribution is skewed to the right (skewness 1.41). The mean is 1616 CC and the median 1493 CC.

In [43]:

# creating the 2 subplots: boxplot on top, histogram below
f2, (ax_box2, ax_hist2) = plt.subplots(nrows = 2, # Number of rows of the subplot grid= 2
                                       sharex = True, # x-axis will be shared among all subplots
                                       gridspec_kw = {"height_ratios": (.25, .75)});
sns.boxplot(x=data['Engine'], ax=ax_box2, showmeans=True, color='violet'); # boxplot
sns.histplot(x=data['Engine'], kde=True, ax=ax_hist2); # histogram (replaces the deprecated distplot)
ax_hist2.axvline(data['Engine'].mean(), color='green', linestyle='--'); # Add mean to the histogram
ax_hist2.axvline(data['Engine'].median(), color='black', linestyle='-'); # Add median (pandas skips NaN; np.median would return nan)

The boxplot flags several high Engine values. However, they correspond to large-displacement models such as the Porsche Cayenne, Audi Q7 or Bentley Flying Spur listed below, so we do not treat them as outliers.
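The values "flagged by the boxplot" are those beyond the whiskers, which by default lie 1.5 times the interquartile range (IQR) beyond the first and third quartiles. A minimal sketch of that rule, where `iqr_flags` is a hypothetical helper and the series is made-up data:

```python
import numpy as np
import pandas as pd

def iqr_flags(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR] (the usual boxplot whisker rule)."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    # NaN compares as False, so missing values are never flagged
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(s[iqr_flags(s)])
```

With Q1 = 1198 and Q3 = 1968 from `data.describe()`, the upper fence for Engine is 1968 + 1.5 × 770 = 3123 CC, close to the 3000 CC threshold used in the In [44] cell.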

In [44]:

# cars with Engine>3000
data[data['Engine']>3000]

Out[44]:

	Location	Year	Kilometers_Driven	Fuel_Type	Transmission	Owner_Type	Mileage	Engine	Power	Seats	New_Price	Price	Brand	Model	Specs
70	Mumbai	2008	73000	Petrol	Automatic	First	8.50	4806.0	500.0	5.0	NaN	14.50	Porsche	Cayenne	2009-2014 Turbo
152	Kolkata	2010	35277	Petrol	Automatic	First	7.81	5461.0	362.9	5.0	NaN	30.00	Mercedes-Benz	S	Class 2005 2013 S 500
459	Coimbatore	2016	51002	Diesel	Automatic	First	11.33	4134.0	335.2	7.0	NaN	48.91	Audi	Q7	4.2 TDI Quattro Technology
586	Kochi	2014	79926	Diesel	Automatic	First	11.33	4134.0	335.2	7.0	NaN	29.77	Audi	Q7	4.2 TDI Quattro Technology
589	Bangalore	2006	47088	Petrol	Automatic	Second	10.13	3498.0	364.9	5.0	NaN	19.00	Mercedes-Benz	S	Class 2005 2013 S 350 L
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
6011	Hyderabad	2009	53000	Petrol	Automatic	First	0.00	3597.0	262.6	5.0	NaN	4.75	Skoda	Superb	3.6 V6 FSI
6186	Mumbai	2008	65000	Petrol	Automatic	Third	10.13	3498.0	364.9	5.0	NaN	NaN	Mercedes-Benz	S	Class 2005 2013 S 350 L
6354	Bangalore	2008	31200	Petrol	Automatic	Second	10.20	5998.0	616.0	5.0	375.0	NaN	Bentley	Flying	Spur W12
6842	Kolkata	2012	14850	Petrol	Automatic	First	10.00	3696.0	328.5	2.0	NaN	NaN	Nissan	370Z	AT
7057	Delhi	2009	64000	Petrol	Automatic	First	7.94	4395.0	450.0	4.0	NaN	NaN	BMW	6	Series 650i Coupe

65 rows × 15 columns

6.3.1.5 Power

The Power distribution is skewed to the right (skewness 1.96). The mean is 112.8 bhp and the median 94 bhp.

In [45]:

# creating the 2 subplots: boxplot on top, histogram below
f2, (ax_box2, ax_hist2) = plt.subplots(nrows = 2, # Number of rows of the subplot grid= 2
                                       sharex = True, # x-axis will be shared among all subplots
                                       gridspec_kw = {"height_ratios": (.25, .75)});
sns.boxplot(x=data['Power'], ax=ax_box2, showmeans=True, color='violet'); # boxplot
sns.histplot(x=data['Power'], kde=True, ax=ax_hist2); # histogram (replaces the deprecated distplot)
ax_hist2.axvline(data['Power'].mean(), color='green', linestyle='--'); # Add mean to the histogram
ax_hist2.axvline(data['Power'].median(), color='black', linestyle='-'); # Add median (pandas skips NaN; np.median would return nan)

As with Engine, the boxplot flags several high Power values, but they are consistent with powerful car models, so we do not treat them as outliers.

6.3.1.6 Seats

6047 cars (83.4% of the 7252 rows) have 5 seats. One car has 0 seats, which is not a valid value, and 53 cars have missing values.
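Shares like the 83.4% above come from `value_counts(normalize=True)`. The percentage is taken over all rows (6047 / 7252), so the NaNs must stay in the denominator with `dropna=False`. A minimal sketch on made-up values:

```python
import numpy as np
import pandas as pd

seats = pd.Series([5, 5, 7, 5, 0, np.nan, 5, 4], name="Seats")  # made-up values

# dropna=False keeps NaN both as a row of the result and in the denominator
share = seats.value_counts(normalize=True, dropna=False).mul(100).round(1)
print(share)

# Invalid zeros and missing values, counted separately
print((seats == 0).sum(), seats.isna().sum())
```

With the default `dropna=True`, the share would instead be computed over the non-missing rows only.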

6.3.1.7 New_Price

The New_Price distribution is skewed to the right (skewness 4.13). The mean is 22.78 Lakh and the median 11.57 Lakh, and only 1006 rows have a value.

In [46]:

# creating the 2 subplots: boxplot on top, histogram below
f2, (ax_box2, ax_hist2) = plt.subplots(nrows = 2, # Number of rows of the subplot grid= 2
                                       sharex = True, # x-axis will be shared among all subplots
                                       gridspec_kw = {"height_ratios": (.25, .75)});
sns.boxplot(x=data['New_Price'], ax=ax_box2, showmeans=True, color='violet'); # boxplot
sns.histplot(x=data['New_Price'], kde=True, ax=ax_hist2); # histogram (replaces the deprecated distplot)
ax_hist2.axvline(data['New_Price'].mean(), color='green', linestyle='--'); # Add mean to the histogram
ax_hist2.axvline(data['New_Price'].median(), color='black', linestyle='-'); # Add median (pandas skips NaN; np.median would return nan)

The boxplot flags several high values, but they could correspond to luxury cars, so we do not treat them as outliers.

In [47]:

# cars with New_Price>100
data[data['New_Price']>100]

Out[47]:

	Location	Year	Kilometers_Driven	Fuel_Type	Transmission	Owner_Type	Mileage	Engine	Power	Seats	New_Price	Price	Brand	Model	Specs
148	Mumbai	2013	23000	Petrol	Automatic	First	11.05	2894.0	444.00	4.0	128.0	37.00	Audi	RS5	Coupe
327	Coimbatore	2017	97430	Diesel	Automatic	First	14.75	2967.0	245.00	7.0	104.0	62.67	Audi	Q7	45 TDI Quattro Technology
1336	Mumbai	2016	20002	Diesel	Automatic	First	14.75	2967.0	245.00	7.0	104.0	67.00	Audi	Q7	45 TDI Quattro Technology
1505	Kochi	2019	26013	Diesel	Automatic	First	12.65	2993.0	255.00	5.0	139.0	97.07	Land	Rover	Range Rover Sport SE
1885	Delhi	2018	6000	Diesel	Automatic	First	11.00	2987.0	258.00	7.0	102.0	79.00	Mercedes-Benz	GLS	350d Grand Edition
2056	Kochi	2015	29966	Diesel	Automatic	Second	16.77	2993.0	261.49	5.0	140.0	43.60	BMW	7	Series 730Ld Eminence
2095	Coimbatore	2019	2526	Petrol	Automatic	First	19.00	2996.0	362.07	2.0	106.0	83.96	Mercedes-Benz	SLC	43 AMG
2178	Mumbai	2017	35000	Diesel	Automatic	First	18.00	2993.0	255.00	7.0	127.0	41.60	Land	Rover	Discovery HSE Luxury 3.0 TD6
2528	Delhi	2016	59000	Diesel	Automatic	First	18.00	2993.0	255.00	7.0	113.0	36.75	Land	Rover	Discovery SE 3.0 TD6
3132	Kochi	2019	14298	Petrol	Automatic	First	13.33	2995.0	340.00	5.0	136.0	2.02	Porsche	Cayenne	Base
3199	Kolkata	2012	41100	Diesel	Automatic	First	16.77	2993.0	261.49	5.0	166.0	26.50	BMW	7	Series 730Ld Design Pure Excellence CBU
3752	Kochi	2015	38467	Diesel	Automatic	First	12.65	2993.0	255.00	5.0	160.0	70.66	Land	Rover	Range Rover Sport HSE
4061	Mumbai	2013	23312	Petrol	Automatic	First	11.05	2894.0	444.00	4.0	128.0	40.50	Audi	RS5	Coupe
4079	Hyderabad	2017	25000	Diesel	Automatic	First	13.33	2993.0	255.00	5.0	230.0	160.00	Land	Rover	Range Rover 3.0 Diesel LWB Vogue
4778	Bangalore	2011	47140	Diesel	Automatic	Second	13.50	2925.0	281.61	5.0	171.0	30.00	Mercedes-Benz	S-Class	S 350 d
5545	Delhi	2014	47000	Diesel	Automatic	Second	12.65	2993.0	255.00	5.0	139.0	64.75	Land	Rover	Range Rover Sport SE
6212	Chennai	2017	16000	Diesel	Automatic	First	16.77	2993.0	261.49	5.0	158.0	NaN	BMW	7	Series 730Ld DPE Signature
6354	Bangalore	2008	31200	Petrol	Automatic	Second	10.20	5998.0	616.00	5.0	375.0	NaN	Bentley	Flying	Spur W12
6960	Coimbatore	2018	18338	Petrol	Automatic	First	19.00	2996.0	362.07	2.0	106.0	NaN	Mercedes-Benz	SLC	43 AMG

6.3.1.8 Price¶

The Price distribution is right-skewed: the mean (9.4 Lakhs) is well above the median (5.6 Lakhs).
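In numbers, a right skew means the mean sits above the median and the sample skewness is positive. Here is a minimal sketch on made-up prices (illustrative values, not the Cars4U data):

```python
import pandas as pd

# Hypothetical used-car prices in INR Lakhs (illustrative only)
prices = pd.Series([2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0, 8.0, 15.0, 40.0])

mean_price = prices.mean()      # pulled upward by the few expensive cars
median_price = prices.median()  # robust middle value
skewness = prices.skew()        # pandas' bias-adjusted sample skewness

print(f"mean={mean_price:.2f}, median={median_price:.2f}, skew={skewness:.2f}")
```

On the real column, the same check would be `data['Price'].skew()` together with the mean and median already quoted above.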

In [48]:

# creating the 2 subplots: boxplot on top, histogram below, sharing the x-axis
f2, (ax_box2, ax_hist2) = plt.subplots(nrows = 2, # Number of rows of the subplot grid= 2
                                       sharex = True, # x-axis will be shared among all subplots
                                       gridspec_kw = {"height_ratios": (.25, .75)});
sns.boxplot(x=data['Price'], ax=ax_box2, showmeans=True, color='violet'); # horizontal boxplot
sns.histplot(data['Price'], kde=True, ax=ax_hist2); # histogram with KDE (replaces the deprecated distplot)
ax_hist2.axvline(np.mean(data['Price']), color='green', linestyle='--'); # Add mean to the histogram
ax_hist2.axvline(np.median(data['Price']), color='black', linestyle='-'); # Add median to the histogram

As with New_Price, the boxplot flags several values as suspicious, but they may correspond to luxury cars, so we do not treat them as outliers.
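The "suspicious" points are the ones beyond the boxplot whiskers. Matplotlib, and therefore seaborn, draws the whiskers at 1.5 × IQR by default. The helper below is a hypothetical sketch of that rule, shown on toy prices:

```python
import pandas as pd

def iqr_outlier_mask(s: pd.Series, whis: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - whis*IQR, Q3 + whis*IQR], the rule behind boxplot whiskers."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    return (s < lower) | (s > upper)

# Toy prices in INR Lakhs (illustrative only)
prices = pd.Series([2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0, 8.0, 15.0, 40.0])
flagged = prices[iqr_outlier_mask(prices)]
print(flagged.tolist())
```

Applied to the notebook, `iqr_outlier_mask(data['Price']).sum()` would count the flagged cars. As the text notes, being flagged does not by itself make a car an outlier to be removed.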

In [49]:

# cars with Price>25
data[data['Price']>25]

Out[49]:

	Location	Year	Kilometers_Driven	Fuel_Type	Transmission	Owner_Type	Mileage	Engine	Power	Seats	New_Price	Price	Brand	Model	Specs
13	Delhi	2014	72000	Diesel	Automatic	First	12.70	2179.0	187.70	5.0	NaN	27.00	Land	Rover	Range Rover 2.2L Pure
19	Bangalore	2014	78500	Diesel	Automatic	First	14.84	2143.0	167.62	5.0	NaN	28.00	Mercedes-Benz	New	C-Class C 220 CDI BE Avantgare
38	Pune	2013	85000	Diesel	Automatic	First	11.74	2987.0	254.80	5.0	NaN	28.00	Mercedes-Benz	M-Class	ML 350 CDI
62	Delhi	2015	58000	Petrol	Automatic	First	11.74	1796.0	186.00	5.0	NaN	26.70	Mercedes-Benz	New	C-Class C 200 CGI Avantgarde
67	Coimbatore	2019	15369	Diesel	Automatic	First	0.00	1950.0	194.00	5.0	49.14	35.67	Mercedes-Benz	C-Class	Progressive C 220d
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
5927	Coimbatore	2018	29091	Diesel	Automatic	First	13.22	2967.0	241.40	5.0	NaN	45.52	Audi	Q5	3.0 TDI Quattro Technology
5946	Bangalore	2016	16000	Diesel	Automatic	First	14.69	2993.0	258.00	5.0	NaN	48.00	BMW	5	Series 2013-2017 530d M Sport
5970	Kochi	2018	17773	Petrol	Automatic	First	13.70	1991.0	183.00	5.0	39.22	26.76	Mercedes-Benz	GLA	Class 200 Sport
5996	Kochi	2016	31150	Diesel	Automatic	First	16.36	2179.0	187.70	5.0	NaN	30.54	Jaguar	XF	2.2 Litre Luxury
6008	Hyderabad	2013	40000	Diesel	Automatic	Second	17.85	2967.0	300.00	4.0	NaN	45.00	Porsche	Panamera	Diesel

499 rows × 15 columns

Categorical columns¶

In [50]:

data.describe(include=["category"])

Out[50]:

	Location	Fuel_Type	Transmission	Owner_Type	Brand	Model	Specs
count	7252	7252	7252	7252	7252	7252	7251
unique	11	5	2	4	33	219	1893
top	Mumbai	Diesel	Manual	First	Maruti	Swift	VDI
freq	948	3852	5203	5951	1444	418	88

6.3.2.1 Location¶

In [51]:

p = sns.countplot(x=data['Location'], order=data['Location'].value_counts().index);
plt.xticks(rotation=45);

There are 11 distinct locations. Mumbai is the most frequent location and Ahmedabad the least frequent.

6.3.2.2 Transmission¶

In [52]:

p = sns.countplot(x=data['Transmission'], order=data['Transmission'].value_counts().index);
plt.xticks(rotation=45);

There are 2 distinct Transmission values, Manual and Automatic. Manual accounts for about 72% of the cars (5203 of 7252).
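Shares such as 72% manual or 82% first owner can be read directly from `value_counts(normalize=True)` rather than from the plot. A minimal sketch on a toy column:

```python
import pandas as pd

# Toy Transmission column (illustrative only): 18 manual, 7 automatic
transmission = pd.Series(["Manual"] * 18 + ["Automatic"] * 7, dtype="category")

# normalize=True returns proportions instead of raw counts; scale to percent
shares = transmission.value_counts(normalize=True).mul(100).round(1)
print(shares)
```

On the real data, `data['Transmission'].value_counts(normalize=True)` should give roughly 0.72 for Manual, since 5203 / 7252 ≈ 0.717.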

6.3.2.3 Owner Type¶

In [53]:

p = sns.countplot(x=data['Owner_Type'], order=data['Owner_Type'].value_counts().index);
plt.xticks(rotation=45);

There are 4 distinct owner types. First owners account for about 82% of the rows (5951 of 7252).

6.3.2.4 Fuel Type¶

In [54]:

p = sns.countplot(x=data['Fuel_Type'], order=data['Fuel_Type'].value_counts().index);
plt.xticks(rotation=45);

There are 5 distinct fuel types. Diesel is the most frequent (3852 cars), and there are only 2 electric cars.

6.3.2.5 Brand¶

In [55]:

p = sns.countplot(x=data['Brand'], order=data['Brand'].value_counts().index);
plt.xticks(rotation=90);

There are 33 distinct brands. Maruti, Hyundai, Honda and Toyota are the most common.

6.4 Bivariate analysis¶

In [57]:

# Get correlation matrix for numeric variables
data.select_dtypes(include=np.number).corr()

Out[57]:

	Year	Kilometers_Driven	Mileage	Engine	Power	Seats	New_Price	Price
Year	1.000000	-0.187884	0.322452	-0.054726	0.013448	0.008166	-0.058798	0.305327
Kilometers_Driven	-0.187884	1.000000	-0.069125	0.094816	0.030165	0.090218	-0.008221	-0.011493
Mileage	0.322452	-0.069125	1.000000	-0.593581	-0.531770	-0.310649	-0.378327	-0.306593
Engine	-0.054726	0.094816	-0.593581	1.000000	0.859777	0.399256	0.735981	0.658354
Power	0.013448	0.030165	-0.531770	0.859777	1.000000	0.095910	0.877708	0.772566
Seats	0.008166	0.090218	-0.310649	0.399256	0.095910	1.000000	-0.019459	0.052225
New_Price	-0.058798	-0.008221	-0.378327	0.735981	0.877708	-0.019459	1.000000	0.871847
Price	0.305327	-0.011493	-0.306593	0.658354	0.772566	0.052225	0.871847	1.000000

In [59]:

# Display correlation matrix in a heatmap
sns.heatmap(data.select_dtypes(include=np.number).corr(), annot=True);

Engine is strongly correlated with Power (0.86) and New_Price (0.74), and moderately with Price (0.66)
Power is strongly correlated with Engine (0.86), New_Price (0.88) and Price (0.77)
New_Price is strongly correlated with Engine (0.74), Power (0.88) and Price (0.87)
Price is correlated with Engine (0.66), Power (0.77) and New_Price (0.87)
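Reading strong pairs off a heatmap by eye is error-prone. The helper below is a hypothetical sketch that lists every pair with |r| above a threshold (0.7 is an arbitrary choice). It uses the Engine, Power, New_Price and Price values from Out[57]:

```python
import numpy as np
import pandas as pd

def strong_pairs(corr: pd.DataFrame, threshold: float = 0.7) -> pd.Series:
    """Return variable pairs whose absolute correlation exceeds `threshold` (each pair once)."""
    # keep only the upper triangle; k=1 also drops the diagonal of 1.0s
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    return pairs[pairs.abs() > threshold].sort_values(ascending=False)

# Subset of the correlation matrix reported in Out[57]
cols = ["Engine", "Power", "New_Price", "Price"]
corr = pd.DataFrame(
    [[1.000000, 0.859777, 0.735981, 0.658354],
     [0.859777, 1.000000, 0.877708, 0.772566],
     [0.735981, 0.877708, 1.000000, 0.871847],
     [0.658354, 0.772566, 0.871847, 1.000000]],
    index=cols, columns=cols,
)
result = strong_pairs(corr)
print(result)
```

On the notebook's frame this would be `strong_pairs(data.select_dtypes(include=np.number).corr())`. The high Engine/Power/New_Price correlations also hint at multicollinearity in a linear model.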

6.4.1 Engine, Power and Price relationship¶

In [60]:

sns.scatterplot(data=data, x='Power', y='Engine', hue='Price');

There is a strong correlation between Power and Engine (0.86). The chart also shows that more expensive cars tend to have high Power and Engine values.

6.4.2 Power, Seats and Price relationship¶

In [61]:

sns.scatterplot(data=data, x='Power', y='Seats', hue='Price');

There is no clear relationship between Power and Seats (correlation 0.10). However, some 2-seater cars combine high power with high prices.

6.4.3 Price and Brand¶

In [64]:

# order brands by median price, highest first (ascending=False keeps any NaN medians last)
order_by_brand = data.groupby(by=["Brand"])["Price"].median().sort_values(ascending=False).index
plt.figure(figsize=(10,6));
sns.boxplot(x=data['Brand'], y=data['Price'], order=order_by_brand);
plt.xticks(rotation=90); # rotate after plotting so the brand labels exist

The chart suggests three price tiers:

Luxury brands with high prices: BMW, Audi, Mercedes-Benz, Mini, Jaguar, Land (Land Rover), Porsche, Bentley, Lamborghini, Isuzu
Brands with medium prices: Ford, Renault, Skoda, Mahindra, Force, Mitsubishi, Toyota, ISUZU, Volvo, Jeep
Brands with low prices: Ambassador, Chevrolet, Fiat, Tata, Smart, Datsun, Maruti, Nissan, Hyundai, Volkswagen, Honda

Isuzu and ISUZU appear as separate brand labels because the make is spelled inconsistently in the data.
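The tiers above were read from the chart. One hypothetical way to make the grouping mechanical is to split brands into equal-sized bins by median price with `pd.qcut`. This is a sketch on toy listings, not the method used in this notebook:

```python
import pandas as pd

# Toy listings (illustrative prices in INR Lakhs, not the Cars4U data)
cars = pd.DataFrame({
    "Brand": ["Maruti", "Maruti", "Hyundai", "Hyundai", "Toyota", "Toyota",
              "Ford", "BMW", "BMW", "Audi"],
    "Price": [3.0, 4.0, 4.5, 5.5, 9.0, 11.0, 6.0, 25.0, 30.0, 28.0],
})

# Median price per brand, the same statistic used to order the boxplot
brand_median = cars.groupby("Brand")["Price"].median().sort_values()

# Split brands into three equal-sized tiers by median price
tiers = pd.qcut(brand_median, q=3, labels=["low", "medium", "high"])
print(tiers)
```

Equal-sized bins are a simplifying assumption. Thresholds chosen from domain knowledge may match the business view of "luxury" better.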

6.4.4 Price, Location and Fuel Type¶

In [65]:

# order fuel types by median price, highest first
order_by_fuel = data.groupby(by=["Fuel_Type"])["Price"].median().sort_values(ascending=False).index
plt.figure(figsize=(15,6));
sns.boxplot(x=data['Fuel_Type'], y=data['Price'], hue=data['Location'], order=order_by_fuel);
plt.xticks(rotation=90); # rotate after plotting so the labels exist

Electric and Diesel cars have higher prices than Petrol, CNG and LPG cars.
Cars in Bangalore, Coimbatore, Kochi and Mumbai tend to have higher prices than cars in other locations.
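To put numbers behind the location and fuel-type comparison, a median pivot table is a compact complement to the boxplot. A minimal sketch on toy rows:

```python
import pandas as pd

# Toy rows (illustrative prices in INR Lakhs)
toy = pd.DataFrame({
    "Location": ["Mumbai", "Mumbai", "Kolkata", "Kolkata", "Kochi", "Kochi"],
    "Fuel_Type": ["Diesel", "Petrol", "Diesel", "Petrol", "Diesel", "Petrol"],
    "Price": [9.0, 5.0, 4.0, 2.5, 8.0, 4.5],
})

# One median price per (Location, Fuel_Type) cell
medians = toy.pivot_table(index="Location", columns="Fuel_Type",
                          values="Price", aggfunc="median")
print(medians)
```

With the notebook's categorical columns, passing `observed=True` to `pivot_table` would skip category combinations that have no rows.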

7 Missing Value Treatment¶

First, we drop the New_Price column, since 6247 rows (86.1%) have missing values.

In [66]:

data.drop(['New_Price'], axis=1, inplace=True)
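Dropping New_Price follows a simple rule: discard a column when most of its values are missing. A minimal sketch of that rule, with a hypothetical helper, an arbitrary 50% threshold and a toy frame:

```python
import numpy as np
import pandas as pd

def drop_sparse_columns(df: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Drop columns whose fraction of missing values exceeds `max_missing`."""
    missing_share = df.isna().mean()          # fraction of NaNs per column
    to_drop = missing_share[missing_share > max_missing].index
    return df.drop(columns=to_drop)

toy = pd.DataFrame({
    "Price": [5.0, 7.5, np.nan, 3.2],             # 25% missing, kept
    "New_Price": [np.nan, np.nan, np.nan, 9.0],   # 75% missing, dropped
})
print(drop_sparse_columns(toy).columns.tolist())
```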

There are 1234 rows with a missing Price. We drop all of them: Price is the target variable we want to predict, and imputing it would inject artificial information into the model.

In [67]:

data.drop(data[data['Price'].isna()].index, axis=0, inplace=True)
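The same row filter can be written more directly with `dropna(subset=...)`, which keeps only rows where the target is present. A minimal sketch on a toy frame:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"Year": [2014, 2016, 2018], "Price": [5.0, np.nan, 7.5]})

# Equivalent to dropping toy[toy['Price'].isna()].index
with_target = toy.dropna(subset=["Price"])
print(with_target)
```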

Let's check the new data set.

In [68]:

data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 6019 entries, 0 to 6018
Data columns (total 14 columns):
 #   Column             Non-Null Count  Dtype   
---  ------             --------------  -----   
 0   Location           6019 non-null   category
 1   Year               6019 non-null   int64   
 2   Kilometers_Driven  6019 non-null   int64   
 3   Fuel_Type          6019 non-null   category
 4   Transmission       6019 non-null   category
 5   Owner_Type         6019 non-null   category
 6   Mileage            6017 non-null   float64 
 7   Engine             5983 non-null   float64 
 8   Power              5876 non-null   float64 
 9   Seats              5977 non-null   float64 
 10  Price              6019 non-null   float64 
 11  Brand              6019 non-null   category
 12  Model              6019 non-null   category
 13  Specs              6019 non-null   category
dtypes: category(7), float64(5), int64(2)
memory usage: 520.4 KB

In [69]:

# counting the number of missing values per row
num_missing = data.isnull().sum(axis=1)
num_missing.value_counts()

Out[69]:

0    5872
1     107
3      36
2       4
Name: count, dtype: int64

We are going to analyze if there is a pattern for the 36 rows with 3 missing values.

In [70]:

data[num_missing == 3]

Out[70]:

	Location	Year	Kilometers_Driven	Fuel_Type	Transmission	Owner_Type	Mileage	Engine	Power	Seats	Price	Brand	Model	Specs
194	Ahmedabad	2007	60006	Petrol	Manual	First	0.00	NaN	NaN	NaN	2.95	Honda	City	1.5 GXI
208	Kolkata	2010	42001	Petrol	Manual	First	16.10	NaN	NaN	NaN	2.11	Maruti	Swift	1.3 VXi
733	Chennai	2006	97800	Petrol	Manual	Third	16.10	NaN	NaN	NaN	1.75	Maruti	Swift	1.3 VXi
749	Mumbai	2008	55001	Diesel	Automatic	Second	0.00	NaN	NaN	NaN	26.50	Land	Rover	Range Rover 3.0 D
1294	Delhi	2009	55005	Petrol	Manual	First	12.80	NaN	NaN	NaN	3.20	Honda	City	1.3 DX
1327	Hyderabad	2015	50295	Petrol	Manual	First	16.10	NaN	NaN	NaN	5.80	Maruti	Swift	1.3 ZXI
1385	Pune	2004	115000	Petrol	Manual	Second	0.00	NaN	NaN	NaN	1.50	Honda	City	1.5 GXI
1460	Coimbatore	2008	69078	Petrol	Manual	First	0.00	NaN	NaN	NaN	40.88	Land	Rover	Range Rover Sport 2005 2012 Sport
2074	Pune	2011	24255	Petrol	Manual	First	16.10	NaN	NaN	NaN	3.15	Maruti	Swift	1.3 LXI
2096	Coimbatore	2004	52146	Petrol	Manual	First	0.00	NaN	NaN	NaN	1.93	Hyundai	Santro	LP zipPlus
2264	Pune	2012	24500	Petrol	Manual	Second	18.30	NaN	NaN	NaN	2.95	Toyota	Etios	Liva V
2325	Pune	2015	67000	Petrol	Manual	First	16.10	NaN	NaN	NaN	4.70	Maruti	Swift	1.3 VXI ABS
2335	Mumbai	2007	55000	Petrol	Manual	Second	16.10	NaN	NaN	NaN	1.75	Maruti	Swift	1.3 VXi
2530	Kochi	2014	64158	Diesel	Automatic	First	18.48	NaN	NaN	NaN	17.89	BMW	5	Series 520d Sedan
2542	Bangalore	2011	65000	Petrol	Manual	Second	0.00	NaN	NaN	NaN	3.15	Hyundai	Santro	GLS II - Euro II
2623	Pune	2012	95000	Diesel	Automatic	Second	18.48	NaN	NaN	NaN	18.00	BMW	5	Series 520d Sedan
2668	Kolkata	2014	32986	Petrol	Manual	First	16.10	NaN	NaN	NaN	4.24	Maruti	Swift	1.3 VXi
2737	Jaipur	2001	200000	Petrol	Manual	First	12.00	NaN	NaN	NaN	0.70	Maruti	Wagon	R Vx
2780	Pune	2009	100000	Petrol	Manual	First	0.00	NaN	NaN	NaN	1.60	Hyundai	Santro	GLS II - Euro II
2842	Bangalore	2012	43000	Petrol	Manual	First	0.00	NaN	NaN	NaN	3.25	Hyundai	Santro	GLS II - Euro II
3272	Mumbai	2008	81000	Diesel	Automatic	Second	18.48	NaN	NaN	NaN	10.50	BMW	5	Series 520d Sedan
3404	Jaipur	2006	125000	Petrol	Manual	Fourth & Above	16.10	NaN	NaN	NaN	2.35	Maruti	Swift	1.3 VXi
3520	Delhi	2012	90000	Diesel	Automatic	First	18.48	NaN	NaN	NaN	14.50	BMW	5	Series 520d Sedan
3522	Kochi	2012	66400	Petrol	Manual	First	0.00	NaN	NaN	NaN	2.66	Hyundai	Santro	GLS II - Euro II
3810	Kolkata	2013	27000	Petrol	Automatic	First	14.00	NaN	NaN	NaN	11.99	Honda	CR-V	AT With Sun Roof
4011	Pune	2011	45271	Diesel	Manual	First	20.30	NaN	NaN	NaN	2.60	Fiat	Punto	1.3 Emotion
4152	Mumbai	2003	75000	Diesel	Automatic	Second	0.00	NaN	NaN	NaN	16.11	Land	Rover	Range Rover 3.0 D
4229	Bangalore	2005	79000	Petrol	Manual	Second	17.00	NaN	NaN	NaN	1.65	Hyundai	Santro	Xing XG
4577	Delhi	2012	72000	Diesel	Automatic	Third	18.48	NaN	NaN	NaN	13.85	BMW	5	Series 520d Sedan
4604	Pune	2011	98000	Petrol	Manual	First	16.70	NaN	NaN	NaN	3.15	Honda	Jazz	Select Edition
4697	Kochi	2017	17941	Petrol	Manual	First	15.70	NaN	NaN	NaN	3.93	Fiat	Punto	1.2 Dynamic
4712	Pune	2003	80000	Petrol	Manual	Second	17.00	NaN	NaN	NaN	0.90	Hyundai	Santro	Xing XG
4952	Kolkata	2010	47000	Petrol	Manual	First	14.60	NaN	NaN	NaN	1.49	Fiat	Punto	1.4 Emotion
5015	Delhi	2006	63000	Petrol	Manual	First	16.10	NaN	NaN	NaN	1.60	Maruti	Swift	1.3 VXi
5185	Delhi	2012	52000	Petrol	Manual	First	16.10	NaN	NaN	NaN	3.65	Maruti	Swift	1.3 LXI
5270	Bangalore	2002	53000	Petrol	Manual	Second	0.00	NaN	NaN	NaN	1.85	Honda	City	1.5 GXI

Next, we identify which columns hold the missing values for each group of rows.

In [71]:

for n in num_missing.value_counts().sort_index().index:
    if n > 0:
        print(f'Rows with exactly {n} missing values, NAs are found in:')
        n_miss_per_col = data[num_missing == n].isnull().sum()
        print(n_miss_per_col[n_miss_per_col > 0])
        print('\n')

Rows with exactly 1 missing values, NAs are found in:
Mileage      2
Power      103
Seats        2
dtype: int64


Rows with exactly 2 missing values, NAs are found in:
Power    4
Seats    4
dtype: int64


Rows with exactly 3 missing values, NAs are found in:
Engine    36
Power     36
Seats     36
dtype: int64
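The loop groups rows by how many values they are missing. pandas can also count each distinct missingness pattern (which columns are missing together) in one call. A minimal sketch on a toy frame:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Mileage": [18.0, np.nan, 16.1, 20.3],
    "Engine": [1197.0, 1498.0, np.nan, np.nan],
    "Power": [82.0, np.nan, np.nan, np.nan],
    "Seats": [5.0, 5.0, np.nan, np.nan],
})

# Each distinct True/False row of isna() is one pattern; count how often each occurs
patterns = toy.isna().value_counts()
print(patterns)
```

Run on the notebook's data, this would show directly that the 36 rows with 3 missing values all miss Engine, Power and Seats together.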

Now, let's count the missing values per column.

In [72]:

# number of missing values per column
data.isnull().sum(axis=0)

Out[72]:

Location               0
Year                   0
Kilometers_Driven      0
Fuel_Type              0
Transmission           0
Owner_Type             0
Mileage                2
Engine                36
Power                143
Seats                 42
Price                  0
Brand                  0
Model                  0
Specs                  0
dtype: int64

Engine, Power, Seats and Mileage columns have missing values.
The Power column has the most, 143 rows (about 2.4% of the rows).
Since less than 3% of the values are missing in every column, we impute them with k-nearest neighbours, using scikit-learn's KNNImputer.
We choose k-nearest neighbours over the column mean to limit the influence of outliers in those columns.
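To show what k-NN imputation does, here is a simplified from-scratch sketch in NumPy. It is not scikit-learn's implementation. Each missing cell is filled with the mean of that column over the k nearest rows that do observe it. Distances use only the coordinates both rows observe, which up to a constant factor matches the nan-Euclidean weighting scikit-learn documents for `nan_euclidean_distances`:

```python
import numpy as np

def knn_impute(X: np.ndarray, k: int = 2) -> np.ndarray:
    """Fill NaNs with the mean of the k nearest rows that observe that column (simplified sketch)."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    observed = ~np.isnan(X)
    for i, j in zip(*np.where(~observed)):
        dists = []
        # candidate donors: other rows where column j is observed
        for r in range(len(X)):
            if r == i or not observed[r, j]:
                continue
            shared = observed[i] & observed[r]   # coordinates observed in both rows
            if not shared.any():
                continue
            diff = X[i, shared] - X[r, shared]
            dists.append((np.sqrt(np.mean(diff ** 2)), r))
        nearest = [r for _, r in sorted(dists)[:k]]
        if nearest:
            filled[i, j] = X[nearest, j].mean()  # uniform average of the donors
    return filled

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, np.nan], [10.0, 100.0]])
result = knn_impute(X, k=2)
print(result)
```

Distances are computed on raw values. In the notebook this means large-scale columns such as Kilometers_Driven dominate the neighbour search. Standardizing the columns before imputing is a common refinement, though it is not what this notebook does. KNNImputer's default is `n_neighbors=5`.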

In [74]:

# load KNNImputer 
from sklearn.impute import KNNImputer
imputer = KNNImputer()

In [75]:

# create data set with only numeric columns
data_n = data.select_dtypes(include=np.number)
data_n_cols = data_n.columns.tolist()
data_n.info()

<class 'pandas.core.frame.DataFrame'>
Index: 6019 entries, 0 to 6018
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Year               6019 non-null   int64  
 1   Kilometers_Driven  6019 non-null   int64  
 2   Mileage            6017 non-null   float64
 3   Engine             5983 non-null   float64
 4   Power              5876 non-null   float64
 5   Seats              5977 non-null   float64
 6   Price              6019 non-null   float64
dtypes: float64(5), int64(2)
memory usage: 376.2 KB

In [76]:

# impute missing values with KNNImputer
data_n = pd.DataFrame(imputer.fit_transform(data_n), columns=data_n_cols)
data_n.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6019 entries, 0 to 6018
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Year               6019 non-null   float64
 1   Kilometers_Driven  6019 non-null   float64
 2   Mileage            6019 non-null   float64
 3   Engine             6019 non-null   float64
 4   Power              6019 non-null   float64
 5   Seats              6019 non-null   float64
 6   Price              6019 non-null   float64
dtypes: float64(7)
memory usage: 329.3 KB

In [77]:

# replace columns with new imputed columns
data['Power'] = data_n['Power']
data['Mileage'] = data_n['Mileage']
data['Engine'] = data_n['Engine']
data['Seats'] = data_n['Seats']
# check there are no missing values
data.info()
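`imputer.fit_transform` returns a bare array, so `data_n` gets a fresh RangeIndex, and assigning `data_n['Power']` back into `data` aligns on index labels. This works here because `data` happens to be indexed 0 to 6018 after the row drop. With any other index, the assignment would silently produce NaNs. A minimal sketch of the pitfall on a toy frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Power": [80.0, np.nan, 120.0]}, index=[10, 11, 12])

# Stand-in for the array a scikit-learn transformer returns (no index attached)
imputed = np.array([[80.0], [100.0], [120.0]])

wrong = pd.DataFrame(imputed, columns=["Power"])                  # RangeIndex 0..2
right = pd.DataFrame(imputed, columns=["Power"], index=df.index)  # keeps 10..12

df["Power_wrong"] = wrong["Power"]   # labels 0..2 do not match 10..12: all NaN
df["Power_right"] = right["Power"]
print(df)
```

Passing `index=data_n.index` when rebuilding the imputed frame would make the notebook's assignment robust to non-contiguous indexes.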

<class 'pandas.core.frame.DataFrame'>
Index: 6019 entries, 0 to 6018
Data columns (total 14 columns):
 #   Column             Non-Null Count  Dtype   
---  ------             --------------  -----   
 0   Location           6019 non-null   category
 1   Year               6019 non-null   int64   
 2   Kilometers_Driven  6019 non-null   int64   
 3   Fuel_Type          6019 non-null   category
 4   Transmission       6019 non-null   category
 5   Owner_Type         6019 non-null   category
 6   Mileage            6019 non-null   float64 
 7   Engine             6019 non-null   float64 
 8   Power              6019 non-null   float64 
 9   Seats              6019 non-null   float64 
 10  Price              6019 non-null   float64 
 11  Brand              6019 non-null   category
 12  Model              6019 non-null   category
 13  Specs              6019 non-null   category
dtypes: category(7), float64(5), int64(2)
memory usage: 520.4 KB

There is no missing data left, so we can continue with the analysis.

8 Log Transformation¶

8.1 Kilometers_Driven¶

The Kilometers_Driven column is highly skewed. We apply a log transformation to improve the distribution.

In [78]:

sns.histplot(data['Kilometers_Driven']);

In [79]:

# distribution of the log transformation
sns.histplot(np.log(data['Kilometers_Driven']));

We can see a very good improvement in the distribution. Now, we are going to create a new column with the log of Kilometers_Driven and drop the Kilometers_Driven column
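Why the log helps: for multiplicative, right-skewed data, the log pulls in the long right tail. A minimal sketch on synthetic log-normal values (not the real column):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic right-skewed "kilometres driven" values drawn from a log-normal distribution
km = pd.Series(rng.lognormal(mean=10.8, sigma=0.7, size=5_000))

skew_before = km.skew()
skew_after = np.log(km).skew()
print(f"skew before: {skew_before:.2f}, skew after log: {skew_after:.2f}")
```

`np.log` requires strictly positive values, which holds here (the minimum of Kilometers_Driven_log is 5.14). A column containing zeros would need `np.log1p` or a similar shift. The real column ends up with skew -1.29 after the transform, so the log does not make every distribution symmetric.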

In [80]:

data['Kilometers_Driven_log'] = np.log(data['Kilometers_Driven'])
data.drop('Kilometers_Driven', axis=1, inplace=True)

In [81]:

# stats for new Kilometers_Driven_log column
data['Kilometers_Driven_log'].describe()

Out[81]:

count    6019.000000
mean       10.758780
std         0.715788
min         5.141664
25%        10.434116
50%        10.878047
75%        11.198215
max        15.687313
Name: Kilometers_Driven_log, dtype: float64

In [82]:

data['Kilometers_Driven_log'].skew()

Out[82]:

-1.29076524053299

In [83]:

# creating the 2 subplots: boxplot on top, histogram below, sharing the x-axis
f2, (ax_box2, ax_hist2) = plt.subplots(nrows = 2, # Number of rows of the subplot grid= 2
                                       sharex = True, # x-axis will be shared among all subplots
                                       gridspec_kw = {"height_ratios": (.25, .75)});
sns.boxplot(x=data['Kilometers_Driven_log'], ax=ax_box2, showmeans=True, color='violet'); # horizontal boxplot
sns.histplot(data['Kilometers_Driven_log'], kde=True, ax=ax_hist2); # histogram with KDE (replaces the deprecated distplot)
ax_hist2.axvline(np.mean(data['Kilometers_Driven_log']), color='green', linestyle='--'); # Add mean to the histogram
ax_hist2.axvline(np.median(data['Kilometers_Driven_log']), color='black', linestyle='-'); # Add median to the histogram

The boxplot flags several values of Kilometers_Driven_log as suspicious. The points above 14 are outliers, but the rest are consistent with the overall distribution of the data.

8.2 Power¶

The Power column is skewed. We apply a log transformation to improve the distribution.

In [84]:

sns.histplot(data['Power']);

In [85]:

# distribution of the log transformation
sns.histplot(np.log(data['Power']));

We can see an improvement in the distribution. Now, we are going to create a new column with the log of Power and drop the Power column

In [86]:

data['Power_log'] = np.log(data['Power'])
data.drop('Power', axis=1, inplace=True)

In [87]:

# stats for new Power_log column
data['Power_log'].describe()

Out[87]:

count    6019.000000
mean        4.635187
std         0.414201
min         3.532226
25%         4.317488
50%         4.543295
75%         4.927978
max         6.327937
Name: Power_log, dtype: float64

In [88]:

data['Power_log'].skew()

Out[88]:

0.46088996911606844

In [89]:

# creating the 2 subplots: boxplot on top, histogram below, sharing the x-axis
f2, (ax_box2, ax_hist2) = plt.subplots(nrows = 2, # Number of rows of the subplot grid= 2
                                       sharex = True, # x-axis will be shared among all subplots
                                       gridspec_kw = {"height_ratios": (.25, .75)});
sns.boxplot(x=data['Power_log'], ax=ax_box2, showmeans=True, color='violet'); # horizontal boxplot
sns.histplot(data['Power_log'], kde=True, ax=ax_hist2); # histogram with KDE (replaces the deprecated distplot)
ax_hist2.axvline(np.mean(data['Power_log']), color='green', linestyle='--'); # Add mean to the histogram
ax_hist2.axvline(np.median(data['Power_log']), color='black', linestyle='-'); # Add median to the histogram

The boxplot flags several values of Power_log as suspicious. However, those points are consistent with the overall distribution of the data.

8.3 Engine¶

Engine column is skewed. However, the log transformation does not improve the distribution

In [90]:

sns.histplot(data['Engine']);

In [91]:

# distribution of the log transformation
sns.histplot(np.log(data['Engine']));

We do not see an improvement in the distribution, and we are going to keep the original column

8.4 Price¶

Price column is skewed. We are going to use the log transformation to improve the distribution

In [92]:

sns.histplot(data['Price']);

In [93]:

# distribution of the log transformation
sns.histplot(np.log(data['Price']));

We can see an improvement in the distribution. Now, we are going to create a new column with the log of Price and drop the Price column
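Modelling Price_log means the regression will predict log-prices, so predictions need `np.exp` to return to INR Lakhs. A minimal sketch with hypothetical predictions:

```python
import numpy as np

# Hypothetical predictions of Price_log from a fitted model (natural log of INR Lakhs)
pred_log = np.array([1.0, 1.7299, 2.5])

# np.exp inverts np.log, giving prices back in INR Lakhs
pred_price = np.exp(pred_log)
print(pred_price.round(2))
```

The back-transform of a typical log value gives a median-like price, not the mean. For example, exp of the Price_log median (1.7299) is about 5.64, consistent with the 5.6 median price, while exp of the Price_log mean (1.8251) is only about 6.2, well below the arithmetic mean of 9.4.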

In [94]:

data['Price_log'] = np.log(data['Price'])
data.drop('Price', axis=1, inplace=True)

In [95]:

# stats for new Price_log column
data['Price_log'].describe()

Out[95]:

count    6019.000000
mean        1.825095
std         0.874059
min        -0.820981
25%         1.252763
50%         1.729884
75%         2.297573
max         5.075174
Name: Price_log, dtype: float64

In [96]:

data['Price_log'].skew()

Out[96]:

0.4173906918413524

In [97]:

# creating the 2 subplots: boxplot on top, histogram below, sharing the x-axis
f2, (ax_box2, ax_hist2) = plt.subplots(nrows = 2, # Number of rows of the subplot grid= 2
                                       sharex = True, # x-axis will be shared among all subplots
                                       gridspec_kw = {"height_ratios": (.25, .75)});
sns.boxplot(x=data['Price_log'], ax=ax_box2, showmeans=True, color='violet'); # horizontal boxplot
sns.histplot(data['Price_log'], kde=True, ax=ax_hist2); # histogram with KDE (replaces the deprecated distplot)
ax_hist2.axvline(np.mean(data['Price_log']), color='green', linestyle='--'); # Add mean to the histogram
ax_hist2.axvline(np.median(data['Price_log']), color='black', linestyle='-'); # Add median to the histogram

The boxplot flags several values of Price_log as suspicious. However, those points are consistent with the overall distribution of the data.

9 Outliers Treatment¶

9.1 Kilometers_Driven¶

Kilometers_Driven_log has some outliers above 14. We replace those values with the column mean.

In [98]:

# replace values above 14 with the column mean
data.loc[data['Kilometers_Driven_log']>14,'Kilometers_Driven_log'] = data['Kilometers_Driven_log'].mean()

9.2 Mileage¶

The Mileage column has several rows with a value of zero, which is not a meaningful mileage. We replace those values with the column mean.

In [99]:

# replace zeros with the column mean (the mean is computed with the zeros still included)
data.loc[data['Mileage']==0,'Mileage'] = data['Mileage'].mean()
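Because `data['Mileage'].mean()` is evaluated while the zeros are still in the column, the fill value is slightly lower than the mean of the valid readings. A hypothetical variant treats 0 as "not recorded" first. This is a sketch on toy values, not what the notebook ran:

```python
import numpy as np
import pandas as pd

mileage = pd.Series([18.0, 0.0, 22.0, 0.0, 20.0])   # toy values in kmpl

naive_mean = mileage.mean()                          # what the notebook's .mean() computes
valid_mean = mileage.replace(0, np.nan).mean()      # mean of the non-zero readings only

cleaned = mileage.replace(0, valid_mean)
print(naive_mean, valid_mean, cleaned.tolist())
```

With only a handful of zeros among 6019 rows the practical difference is small, but the variant avoids biasing the fill value toward zero.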

In [100]:

# check new distribution
sns.histplot(data['Mileage']);

In [101]:

data['Mileage'].describe()

Out[101]:

count    6019.000000
mean       18.340122
std         4.151511
min         6.400000
25%        15.400000
50%        18.150000
75%        21.100000
max        33.540000
Name: Mileage, dtype: float64

9.3 Seats¶

There is one car with 0 seats. We replace this value with the column mean.

In [102]:

data[data['Seats']==0]

Out[102]:

	Location	Year	Fuel_Type	Transmission	Owner_Type	Mileage	Engine	Seats	Brand	Model	Specs	Kilometers_Driven_log	Power_log	Price_log
3999	Hyderabad	2012	Petrol	Automatic	First	10.5	3197.0	0.0	Audi	A4	3.2 FSI Tiptronic Quattro	11.736069	5.084134	2.890372

In [103]:

# replacing zeros with mean
data.loc[data['Seats']==0,'Seats'] = data['Seats'].mean()

10 Model Building¶

First, we drop the Specs column because of its high cardinality (1893 distinct values).

In [104]:

data.drop(['Specs'], axis=1, inplace=True)
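High cardinality matters because one-hot encoding adds one column per level. Specs alone would have added 1892 dummy columns with `drop_first=True`. A minimal sketch of checking cardinality per categorical column (the threshold of 3 is purely illustrative):

```python
import pandas as pd

toy = pd.DataFrame({
    "Fuel_Type": ["Diesel", "Petrol", "Diesel", "CNG"],
    "Specs": ["VDI", "1.2 Kappa", "AT With Sun Roof", "LXI"],
}).astype("category")

# Number of distinct values per column, largest first
cardinality = toy.nunique().sort_values(ascending=False)
high_card = cardinality[cardinality > 3].index.tolist()
print(cardinality.to_dict(), high_card)
```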

In [105]:

# check there are no missing values and the columns have the correct type
data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 6019 entries, 0 to 6018
Data columns (total 13 columns):
 #   Column                 Non-Null Count  Dtype   
---  ------                 --------------  -----   
 0   Location               6019 non-null   category
 1   Year                   6019 non-null   int64   
 2   Fuel_Type              6019 non-null   category
 3   Transmission           6019 non-null   category
 4   Owner_Type             6019 non-null   category
 5   Mileage                6019 non-null   float64 
 6   Engine                 6019 non-null   float64 
 7   Seats                  6019 non-null   float64 
 8   Brand                  6019 non-null   category
 9   Model                  6019 non-null   category
 10  Kilometers_Driven_log  6019 non-null   float64 
 11  Power_log              6019 non-null   float64 
 12  Price_log              6019 non-null   float64 
dtypes: category(6), float64(6), int64(1)
memory usage: 429.4 KB

10.1 Define independent and dependent variables¶

In [106]:

ind_vars = data.drop(["Price_log"], axis=1)
dep_var = data[["Price_log"]]

10.2 Creating dummy variables

In [107]:

def encode_cat_vars(x):
    x = pd.get_dummies(
        x,
        columns=x.select_dtypes(include=["object", "category"]).columns.tolist(),
        drop_first=True,
    )
    return x


ind_vars_num = encode_cat_vars(ind_vars)
ind_vars_num.head()

Out[107]:

	Year	Mileage	Engine	Seats	Kilometers_Driven_log	Power_log	Location_Bangalore	Location_Chennai	Location_Coimbatore	Location_Delhi	...	Model_Xcent	Model_Xenon	Model_Xylo	Model_Yeti	Model_Z4	Model_Zen	Model_Zest	Model_i10	Model_i20	Model_redi-GO
0	2010	26.60	998.0	5.0	11.184421	4.063198	False	False	False	False	...	False	False	False	False	False	False	False	False	False	False
1	2015	19.67	1582.0	5.0	10.621327	4.837868	False	False	False	False	...	False	False	False	False	False	False	False	False	False	False
2	2011	18.20	1199.0	5.0	10.736397	4.485260	False	True	False	False	...	False	False	False	False	False	False	False	False	False	False
3	2012	20.77	1248.0	7.0	11.373663	4.485936	False	True	False	False	...	False	False	False	False	False	False	False	False	False	False
4	2013	15.20	1968.0	5.0	10.613246	4.947340	False	False	True	False	...	False	False	False	False	False	False	False	False	False	False

5 rows × 274 columns
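With `drop_first=True`, `get_dummies` drops the first level of each categorical variable. This avoids perfect collinearity with the intercept (the dummy variable trap), and the dropped level becomes the baseline against which the other dummies' coefficients are measured; in this dataset no `Fuel_Type_CNG` column is produced, so CNG is the fuel-type baseline. The toy frame below is illustrative and not taken from the dataset.

```python
import pandas as pd

# Toy column with three fuel-type levels; categories are sorted alphabetically
toy = pd.DataFrame({"Fuel_Type": pd.Categorical(["Petrol", "Diesel", "CNG", "Diesel"])})

full = pd.get_dummies(toy, columns=["Fuel_Type"])
reduced = pd.get_dummies(toy, columns=["Fuel_Type"], drop_first=True)

print(full.columns.tolist())     # ['Fuel_Type_CNG', 'Fuel_Type_Diesel', 'Fuel_Type_Petrol']
print(reduced.columns.tolist())  # ['Fuel_Type_Diesel', 'Fuel_Type_Petrol']
# The CNG row (index 2) is all False: it is represented by the baseline
print(reduced.loc[2].tolist())   # [False, False]
```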

In [108]:

ind_vars_num.shape

Out[108]:

(6019, 274)

The matrix of independent variables has 6019 rows and 274 columns.

10.3 Split the data into train and test

In [109]:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Create train and test data sets
x_train, x_test, y_train, y_test = train_test_split(
    ind_vars_num, dep_var, test_size=0.3, random_state=1
)


# Alternative 80/20 train/test split with a different random seed
x_train3, x_test3, y_train3, y_test3 = train_test_split(
    ind_vars_num, dep_var, test_size=0.2, random_state=10
)

In [110]:

print("Number of rows in train data =", x_train.shape[0])
print("Number of rows in test data =", x_test.shape[0])

Number of rows in train data = 4213
Number of rows in test data = 1806
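The row counts follow from how scikit-learn rounds a fractional `test_size`: the test set size is rounded up (`ceil(test_size * n_samples)`) and the training set receives the remaining rows. The helper `split_sizes` below is hypothetical and only reproduces that arithmetic for the two splits created above.

```python
import math


def split_sizes(n_samples, test_size):
    """Reproduce train/test sizes for a float test_size (test size rounded up)."""
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


print(split_sizes(6019, 0.3))  # (4213, 1806)
print(split_sizes(6019, 0.2))  # (4815, 1204)
```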

10.4 Fitting a linear model

Next, we fit a linear regression model on the training set.

In [111]:

# Run Linear Regression
lin_reg_model = LinearRegression()
lin_reg_model.fit(x_train, y_train)

Out[111]:

LinearRegression()


10.5 Performance of the model

First, we calculate $R^2$ for the train and test sets, both on the log scale and on the original price scale (by applying np.exp to the targets and predictions).
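For reference, the $R^2$ returned by `score` (and by the custom `r2` function in this section) is defined as follows, where $y_i$ are the observed values, $\hat{y}_i$ the predictions and $\bar{y}$ the mean of the observed values. The adjusted version used in the performance metrics further penalizes the number of predictors $k$ for $n$ rows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \qquad R^2_{adj} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}$$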

In [112]:

# R^2 train set
lin_reg_model.score(x_train, y_train)

Out[112]:

0.9587840089626443

In [113]:

# R^2 test set
lin_reg_model.score(x_test, y_test)

Out[113]:

0.959104503710836

In [114]:

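def r2(y,y_predict):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    e = y-y_predict   # residuals
    ym = np.mean(y)   # mean of the observed values
    v = y-ym          # deviations from the mean
    e2 = np.sum(e*e)  # residual sum of squares (SS_res)
    v2 = np.sum(v*v)  # total sum of squares (SS_tot)
    return 1-(e2/v2)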

In [115]:

r2(y_train,lin_reg_model.predict(x_train))

Out[115]:

Price_log    0.958784
dtype: float64

In [116]:

r2(y_test,lin_reg_model.predict(x_test))

Out[116]:

Price_log    0.959105
dtype: float64

In [117]:

r2(np.exp(y_train),np.exp(lin_reg_model.predict(x_train)))

Out[117]:

Price_log    0.922877
dtype: float64

In [118]:

r2(np.exp(y_test),np.exp(lin_reg_model.predict(x_test)))

Out[118]:

Price_log    0.908336
dtype: float64

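On the log scale, $R^2$ is 0.959 for both the train and the test set. After transforming targets and predictions back to the price scale with np.exp, $R^2$ is 0.923 for the train set and 0.908 for the test set. Train and test values are close in both cases, so the model shows no sign of overfitting and explains most of the variance in Price.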
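As a sanity check, the hand-written `r2` function can be compared with `sklearn.metrics.r2_score` on small arrays. The sketch below restates the function for plain NumPy arrays and uses toy values that are illustrative, not taken from the dataset.

```python
import numpy as np
from sklearn.metrics import r2_score


def r2(y, y_predict):
    """Same computation as the notebook's r2: 1 - SS_res / SS_tot."""
    e = y - y_predict            # residuals
    v = y - np.mean(y)           # deviations from the mean
    return 1 - np.sum(e * e) / np.sum(v * v)


y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
print(r2(y_true, y_pred), r2_score(y_true, y_pred))  # both about 0.9486
```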

10.5.1 Performance metrics

User-defined functions to calculate performance metrics:

In [119]:

# To check model performance
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

# Adjusted R^2
def adj_r2(ind_vars, targets, predictions):
    r2 = r2_score(targets, predictions)
    n = ind_vars.shape[0]
    k = ind_vars.shape[1]
    return 1 - ((1 - r2) * (n - 1) / (n - k - 1))


# Model performance check
def model_perf(model, inp, out):

    y_pred = model.predict(inp)
    y_act = out.values
    
    #Dictionary with metrics
    metrics = {"RMSE": np.sqrt(mean_squared_error(y_act, y_pred)),
               "MAE": mean_absolute_error(y_act, y_pred),
               "R^2": r2_score(y_act, y_pred),
               "Adjusted R^2": adj_r2(inp, y_act, y_pred)}
    return metrics

In [120]:

# Model performance on train set
model_perf(lin_reg_model, x_train, y_train)

Out[120]:

{'RMSE': 0.1770842726656583,
 'MAE': 0.12405596763815441,
 'R^2': 0.9587840089626443,
 'Adjusted R^2': 0.9559162635222594}

In [121]:

# Model performance on test set
model_perf(lin_reg_model, x_test, y_test)

Out[121]:

{'RMSE': 0.17745658563106884,
 'MAE': 0.12829130869663247,
 'R^2': 0.959104503710836,
 'Adjusted R^2': 0.9517855187446499}

All metrics are comparable between the train and test sets, so the model does not appear to overfit. Note that RMSE and MAE are in log-price units: the test MAE of 0.128 corresponds to a multiplicative error factor of about exp(0.128) ≈ 1.14 on Price.
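The adjusted $R^2$ values can be reproduced by hand from the reported $R^2$, the number of rows n, and the number of predictors k = 274, using the same formula as `adj_r2`. The helper `adjusted_r2` below is a standalone restatement for illustration.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (same formula as adj_r2)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# R^2 values reported in Out[120] and Out[121]; 274 predictors in both sets
train_adj = adjusted_r2(0.9587840089626443, n=4213, k=274)
test_adj = adjusted_r2(0.959104503710836, n=1806, k=274)
print(round(train_adj, 4), round(test_adj, 4))  # 0.9559 0.9518
```

The penalty factor (n - 1)/(n - k - 1) is about 1.07 on the train set but about 1.18 on the smaller test set, which is why the adjusted value drops more on the test set even though its plain $R^2$ is slightly higher.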

10.5.2 Residuals distribution

Train set residuals

Next, we analyze the distribution of the residuals.

In [122]:

# train set residuals distribution (residual = actual - predicted)
residuals_train = y_train - lin_reg_model.predict(x_train)
hplot = sns.histplot(residuals_train, kde=True);
hplot.set_xlim(-1,1);

In [123]:

# scatterplot between residuals and predicted values
# (reuse x_train's index so seaborn pairs predictions and residuals row by row)
y_train_predict = pd.DataFrame(lin_reg_model.predict(x_train), columns=['y_predict'], index=x_train.index)
sns.scatterplot(x=y_train_predict['y_predict'], y=residuals_train['Price_log']);

The scatter plot shows no clear pattern, which suggests that the model does not violate the homoscedasticity (constant variance) assumption.
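The visual check can be complemented with a rough numeric one: bin the fitted values into quantiles and compare the residual spread within each bin. Roughly constant spreads are consistent with homoscedasticity, while a steady trend suggests a funnel shape. This is a heuristic sketch, not a formal test such as Breusch-Pagan; the helper name `residual_spread_by_bin`, the bin count, and the synthetic data are assumptions.

```python
import numpy as np


def residual_spread_by_bin(fitted, residuals, n_bins=5):
    """Standard deviation of residuals within quantile bins of the fitted values."""
    fitted = np.asarray(fitted).ravel()
    residuals = np.asarray(residuals).ravel()
    edges = np.quantile(fitted, np.linspace(0, 1, n_bins + 1))
    # Digitizing against the inner edges assigns each point to bin 0..n_bins-1
    bins = np.digitize(fitted, edges[1:-1])
    return np.array([residuals[bins == b].std() for b in range(n_bins)])


rng = np.random.default_rng(0)
fitted = rng.uniform(0, 4, size=5000)
constant = rng.normal(0, 0.2, size=5000)      # same spread everywhere
funnel = rng.normal(0, 0.05 + 0.1 * fitted)   # spread grows with the fitted value

spread_constant = residual_spread_by_bin(fitted, constant)
spread_funnel = residual_spread_by_bin(fitted, funnel)
print(spread_constant.round(2))  # roughly flat
print(spread_funnel.round(2))    # steadily increasing
```

On the notebook's objects, a call such as `residual_spread_by_bin(lin_reg_model.predict(x_train), residuals_train)` would apply it to the training residuals, since both arrays follow the row order of `x_train`.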

Test set residuals

In [124]:

# test set residuals distribution (residual = actual - predicted)
residuals_test = y_test - lin_reg_model.predict(x_test)
hplot = sns.histplot(residuals_test, kde=True);
hplot.set_xlim(-1,1);

In [125]:

# scatterplot between residuals and predicted values
# (reuse x_test's index so seaborn pairs predictions and residuals row by row)
y_test_predict = pd.DataFrame(lin_reg_model.predict(x_test), columns=['y_predict'], index=x_test.index)
sns.scatterplot(x=y_test_predict['y_predict'], y=residuals_test['Price_log']);

The scatter plot shows no clear pattern, which suggests that the model does not violate the homoscedasticity (constant variance) assumption on the test set either.

10.6 Coefficients and Intercept of the model

In [126]:

# Create data frame with coefficients
coef_df = pd.DataFrame(
    np.append(lin_reg_model.coef_.flatten(), lin_reg_model.intercept_),
    index=x_train.columns.tolist() + ["Intercept"],
    columns=["Coefficients"],
)
# Display all coefficients
pd.set_option('display.max_rows', coef_df.shape[0]+1)
coef_df

Out[126]:

	Coefficients
Year	1.061775e-01
Mileage	1.316338e-03
Engine	-4.557528e-05
Seats	-1.807394e-04
Kilometers_Driven_log	-7.715894e-02
Power_log	3.782705e-01
Location_Bangalore	1.767534e-01
Location_Chennai	5.678747e-02
Location_Coimbatore	1.474313e-01
Location_Delhi	-8.008404e-02
Location_Hyderabad	1.564025e-01
Location_Jaipur	-1.610073e-02
Location_Kochi	-1.353527e-02
Location_Kolkata	-2.176047e-01
Location_Mumbai	-5.679099e-02
Location_Pune	-2.156988e-02
Fuel_Type_Diesel	4.966604e-02
Fuel_Type_Electric	3.062401e-01
Fuel_Type_LPG	-1.187979e-02
Fuel_Type_Petrol	-6.045009e-02
Transmission_Manual	-1.121924e-01
Owner_Type_Fourth & Above	-1.194041e-01
Owner_Type_Second	-5.739302e-02
Owner_Type_Third	-1.684767e-01
Brand_Audi	5.774404e-01
Brand_BMW	1.069879e-01
Brand_Bentley	9.889603e-01
Brand_Chevrolet	-7.607063e-01
Brand_Datsun	-1.033328e+00
Brand_Fiat	-9.095028e-01
Brand_Force	-4.784993e-02
Brand_Ford	-7.306453e-01
Brand_Hindustan	3.052961e-12
Brand_Honda	-5.494955e-01
Brand_Hyundai	-1.140057e+00
Brand_ISUZU	-3.463250e-01
Brand_Isuzu	8.891776e-13
Brand_Jaguar	8.123010e-01
Brand_Jeep	-1.848095e-02
Brand_Lamborghini	1.184292e+00
Brand_Land	4.192147e-01
Brand_Mahindra	-5.890545e-01
Brand_Maruti	-6.364073e-01
Brand_Mercedes-Benz	6.110158e-01
Brand_Mini	3.724754e-01
Brand_Mitsubishi	-6.988398e-02
Brand_Nissan	-4.932404e-01
Brand_OpelCorsa	-6.200596e-13
Brand_Porsche	7.997953e-01
Brand_Renault	-6.041900e-01
Brand_Skoda	-3.930683e-01
Brand_Smart	-2.371821e-01
Brand_Tata	-5.277481e-01
Brand_Toyota	1.267729e-02
Brand_Volkswagen	-4.056313e-01
Brand_Volvo	2.399334e-01
Model_1.4Gsi	-1.327410e-14
Model_1000	2.902123e-13
Model_3	1.064970e-01
Model_370Z	-1.439404e-13
Model_5	4.035985e-01
Model_6	1.121772e+00
Model_7	8.506502e-01
Model_800	-6.486312e-01
Model_A	-4.333749e-01
Model_A-Star	-2.464424e-01
Model_A3	-3.709362e-01
Model_A4	-2.706989e-01
Model_A6	-1.468467e-01
Model_A7	4.505197e-01
Model_A8	2.214500e-01
Model_Abarth	-5.806466e-14
Model_Accent	7.664687e-02
Model_Accord	1.786603e-01
Model_Alto	-4.737990e-01
Model_Amaze	-2.578199e-01
Model_Ameo	-3.923683e-01
Model_Aspire	1.004053e-01
Model_Aveo	-3.457447e-01
Model_Avventura	1.180748e-01
Model_B	-4.471035e-01
Model_BR-V	-6.853318e-02
Model_BRV	2.825778e-02
Model_Baleno	-1.021670e-01
Model_Beat	-3.622543e-01
Model_Beetle	-2.145506e-14
Model_Bolero	1.954131e-01
Model_Bolt	-6.321259e-01
Model_Boxster	4.052314e-15
Model_Brio	-3.555369e-01
Model_C-Class	-3.104868e-01
Model_CLA	-2.939341e-01
Model_CLS-Class	3.338558e-01
Model_CR-V	4.936109e-01
Model_Camry	2.003381e-01
Model_Captiva	1.795402e-01
Model_Captur	1.181568e-01
Model_Cayenne	-4.493680e-01
Model_Cayman	5.211655e-01
Model_Cedia	-4.966320e-01
Model_Celerio	-3.286449e-01
Model_Ciaz	1.128385e-01
Model_City	1.452770e-02
Model_Civic	-8.820436e-02
Model_Classic	-3.442637e-01
Model_Clubman	2.810297e-01
Model_Compass	-1.848095e-02
Model_Continental	9.889603e-01
Model_Cooper	1.988535e-01
Model_Corolla	-3.308798e-01
Model_Countryman	-1.074077e-01
Model_Creta	8.796960e-01
Model_CrossPolo	-3.654663e-01
Model_Cruze	8.068924e-02
Model_D-MAX	-3.463250e-01
Model_Duster	1.221770e-01
Model_Dzire	-4.513748e-02
Model_E	6.661338e-16
Model_E-Class	-1.518833e-01
Model_EON	3.452707e-02
Model_EcoSport	1.588931e-01
Model_Ecosport	2.573158e-01
Model_Eeco	-4.662428e-01
Model_Elantra	8.812794e-01
Model_Elite	5.396578e-01
Model_Endeavour	7.964040e-01
Model_Enjoy	-1.127670e-01
Model_Ertiga	2.135297e-01
Model_Esteem	-6.439154e-01
Model_Estilo	-3.688273e-01
Model_Etios	-7.384430e-01
Model_Evalia	-4.024599e-01
Model_F	5.949628e-01
Model_Fabia	-5.627410e-01
Model_Fiesta	-7.866301e-02
Model_Figo	-1.827555e-01
Model_Fluence	-6.255634e-02
Model_Flying	-1.110223e-16
Model_Fortuner	3.763759e-01
Model_Fortwo	-2.371821e-01
Model_Freestyle	4.682399e-01
Model_Fusion	1.819050e-01
Model_GL-Class	4.952753e-01
Model_GLA	-2.212246e-01
Model_GLC	1.174502e-01
Model_GLE	3.154723e-01
Model_GLS	4.387667e-01
Model_GO	-2.105149e-01
Model_Gallardo	1.184292e+00
Model_Getz	1.771284e-01
Model_Grand	2.571492e-01
Model_Grande	-1.834601e-01
Model_Hexa	3.285831e-01
Model_Ignis	-3.791245e-01
Model_Ikon	-2.971519e-01
Model_Indica	-8.287199e-01
Model_Indigo	-7.456837e-01
Model_Innova	8.701657e-02
Model_Jazz	-1.622787e-01
Model_Jeep	1.181196e-01
Model_Jetta	1.227673e-01
Model_KUV	-4.220742e-01
Model_KWID	-6.321051e-01
Model_Koleos	3.891056e-01
Model_Lancer	-1.172340e-01
Model_Land	-2.775558e-16
Model_Laura	-1.106812e-01
Model_Linea	6.946507e-03
Model_Lodgy	1.736719e-03
Model_Logan	-5.156677e-01
Model_M-Class	1.294610e-01
Model_MU	4.163336e-17
Model_MUX	2.706169e-16
Model_Manza	-6.278130e-01
Model_Micra	-3.797234e-01
Model_Mobilio	-1.055585e-01
Model_Montero	3.367374e-01
Model_Motors	9.714451e-17
Model_Mustang	1.576728e+00
Model_Nano	-1.066021e+00
Model_New	-2.606214e-01
Model_Nexon	-3.983023e-02
Model_NuvoSport	-3.193562e-01
Model_Octavia	1.391406e-01
Model_Omni	-4.721723e-01
Model_One	-4.784993e-02
Model_Optra	-1.671785e-01
Model_Outlander	-6.370752e-02
Model_Pajero	2.709521e-01
Model_Panamera	7.279978e-01
Model_Passat	5.778024e-02
Model_Petra	-2.967823e-01
Model_Platinum	8.326673e-17
Model_Polo	-3.349437e-01
Model_Prius	3.062401e-01
Model_Pulse	-2.651567e-01
Model_Punto	-2.652687e-01
Model_Q3	-2.577470e-01
Model_Q5	4.141793e-02
Model_Q7	2.518171e-01
Model_Qualis	1.120295e-01
Model_Quanto	-3.687409e-01
Model_R-Class	6.386756e-02
Model_RS5	3.304247e-01
Model_Rapid	-2.682940e-01
Model_Redi	-4.869040e-01
Model_Renault	-2.098534e-01
Model_Ritz	-2.182836e-01
Model_Rover	4.192147e-01
Model_S	1.629848e-01
Model_S-Class	2.594994e-01
Model_S-Cross	1.441084e-01
Model_S60	-1.927748e-02
Model_S80	-2.712951e-01
Model_SL-Class	7.085461e-01
Model_SLC	2.621350e-01
Model_SLK-Class	4.748246e-01
Model_SX4	-9.882907e-02
Model_Safari	2.992587e-02
Model_Sail	-1.856540e-01
Model_Santa	1.029070e+00
Model_Santro	1.564984e-01
Model_Scala	-2.755480e-01
Model_Scorpio	2.628888e-01
Model_Siena	-2.890131e-01
Model_Sonata	8.564400e-01
Model_Spark	-4.065682e-01
Model_Ssangyong	4.359764e-01
Model_Sumo	-9.638676e-02
Model_Sunny	-2.104897e-01
Model_Superb	2.042283e-01
Model_Swift	-4.020150e-02
Model_TT	3.280398e-01
Model_TUV	-8.666424e-02
Model_Tavera	5.592310e-01
Model_Teana	6.672175e-02
Model_Terrano	-1.826751e-02
Model_Thar	6.232812e-02
Model_Tiago	-6.192760e-01
Model_Tigor	-4.091198e-01
Model_Tiguan	7.408846e-01
Model_Tucson	8.083781e-01
Model_V40	6.752698e-02
Model_Vento	-2.342850e-01
Model_Venture	-4.374467e-01
Model_Verito	-2.849063e-01
Model_Verna	5.399973e-01
Model_Versa	-1.978553e-02
Model_Vitara	1.270174e-01
Model_WR-V	-1.783798e-01
Model_WRV	-4.824091e-02
Model_Wagon	-2.844928e-01
Model_X-Trail	4.509783e-01
Model_X1	1.142903e-01
Model_X3	4.470527e-01
Model_X5	7.394769e-01
Model_X6	9.909875e-01
Model_XC60	5.142907e-02
Model_XC90	4.115498e-01
Model_XE	0.000000e+00
Model_XF	-2.061667e-01
Model_XJ	4.235049e-01
Model_XUV300	3.627486e-01
Model_XUV500	3.739087e-01
Model_Xcent	2.875080e-01
Model_Xenon	-4.504703e-01
Model_Xylo	-1.931750e-01
Model_Yeti	2.052791e-01
Model_Z4	9.327412e-01
Model_Zen	-3.685581e-01
Model_Zest	-4.196436e-01
Model_i10	2.949555e-01
Model_i20	4.661499e-01
Model_redi-GO	-3.359091e-01
Intercept	-2.122822e+02

10.6.1 Coefficients Interpretation

Positive impact

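This is the list of coefficients with a positive impact on price; among them are Year, Mileage and Power_log. Because the target is Price_log, a positive coefficient means that a higher value of a numeric feature (or a value of 1 for a dummy) raises the predicted price, holding the other features constant. Coefficients of order 1e-12 or smaller, such as Brand_Hindustan and Model_MU, are numerically zero, most likely because those categories do not appear in the training set. Pairs with identical values, such as Brand_Lamborghini and Model_Gallardo, most likely come from brands with a single model in the training data: the two dummy columns are then identical, and the minimum-norm least-squares solution splits their joint effect equally between them.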
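Numerically-zero coefficients (values such as 3.05e-12) end up at the bottom of both sorted lists, so it can help to filter them out before ranking. A minimal sketch on a few rows copied from Out[126]; the tolerance of 1e-8 is an assumption, not a value from the notebook.

```python
import pandas as pd

# A few rows in the same shape as coef_df (values copied from Out[126])
coef_df = pd.DataFrame(
    {"Coefficients": [1.576728, 3.052961e-12, -1.140057, -6.200596e-13]},
    index=["Model_Mustang", "Brand_Hindustan", "Brand_Hyundai", "Brand_OpelCorsa"],
)

# Treat magnitudes below the tolerance as exact zeros
tol = 1e-8
informative = coef_df[coef_df["Coefficients"].abs() > tol]
print(informative.index.tolist())  # ['Model_Mustang', 'Brand_Hyundai']
```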

In [127]:

coef_df[coef_df['Coefficients']>0].sort_values(by='Coefficients', ascending=False)

Out[127]:

	Coefficients
Model_Mustang	1.576728e+00
Brand_Lamborghini	1.184292e+00
Model_Gallardo	1.184292e+00
Model_6	1.121772e+00
Model_Santa	1.029070e+00
Model_X6	9.909875e-01
Brand_Bentley	9.889603e-01
Model_Continental	9.889603e-01
Model_Z4	9.327412e-01
Model_Elantra	8.812794e-01
Model_Creta	8.796960e-01
Model_Sonata	8.564400e-01
Model_7	8.506502e-01
Brand_Jaguar	8.123010e-01
Model_Tucson	8.083781e-01
Brand_Porsche	7.997953e-01
Model_Endeavour	7.964040e-01
Model_Tiguan	7.408846e-01
Model_X5	7.394769e-01
Model_Panamera	7.279978e-01
Model_SL-Class	7.085461e-01
Brand_Mercedes-Benz	6.110158e-01
Model_F	5.949628e-01
Brand_Audi	5.774404e-01
Model_Tavera	5.592310e-01
Model_Verna	5.399973e-01
Model_Elite	5.396578e-01
Model_Cayman	5.211655e-01
Model_GL-Class	4.952753e-01
Model_CR-V	4.936109e-01
Model_SLK-Class	4.748246e-01
Model_Freestyle	4.682399e-01
Model_i20	4.661499e-01
Model_X-Trail	4.509783e-01
Model_A7	4.505197e-01
Model_X3	4.470527e-01
Model_GLS	4.387667e-01
Model_Ssangyong	4.359764e-01
Model_XJ	4.235049e-01
Brand_Land	4.192147e-01
Model_Rover	4.192147e-01
Model_XC90	4.115498e-01
Model_5	4.035985e-01
Model_Koleos	3.891056e-01
Power_log	3.782705e-01
Model_Fortuner	3.763759e-01
Model_XUV500	3.739087e-01
Brand_Mini	3.724754e-01
Model_XUV300	3.627486e-01
Model_Montero	3.367374e-01
Model_CLS-Class	3.338558e-01
Model_RS5	3.304247e-01
Model_Hexa	3.285831e-01
Model_TT	3.280398e-01
Model_GLE	3.154723e-01
Fuel_Type_Electric	3.062401e-01
Model_Prius	3.062401e-01
Model_i10	2.949555e-01
Model_Xcent	2.875080e-01
Model_Clubman	2.810297e-01
Model_Pajero	2.709521e-01
Model_Scorpio	2.628888e-01
Model_SLC	2.621350e-01
Model_S-Class	2.594994e-01
Model_Ecosport	2.573158e-01
Model_Grand	2.571492e-01
Model_Q7	2.518171e-01
Brand_Volvo	2.399334e-01
Model_A8	2.214500e-01
Model_Ertiga	2.135297e-01
Model_Yeti	2.052791e-01
Model_Superb	2.042283e-01
Model_Camry	2.003381e-01
Model_Cooper	1.988535e-01
Model_Bolero	1.954131e-01
Model_Fusion	1.819050e-01
Model_Captiva	1.795402e-01
Model_Accord	1.786603e-01
Model_Getz	1.771284e-01
Location_Bangalore	1.767534e-01
Model_S	1.629848e-01
Model_EcoSport	1.588931e-01
Model_Santro	1.564984e-01
Location_Hyderabad	1.564025e-01
Location_Coimbatore	1.474313e-01
Model_S-Cross	1.441084e-01
Model_Octavia	1.391406e-01
Model_M-Class	1.294610e-01
Model_Vitara	1.270174e-01
Model_Jetta	1.227673e-01
Model_Duster	1.221770e-01
Model_Captur	1.181568e-01
Model_Jeep	1.181196e-01
Model_Avventura	1.180748e-01
Model_GLC	1.174502e-01
Model_X1	1.142903e-01
Model_Ciaz	1.128385e-01
Model_Qualis	1.120295e-01
Brand_BMW	1.069879e-01
Model_3	1.064970e-01
Year	1.061775e-01
Model_Aspire	1.004053e-01
Model_Innova	8.701657e-02
Model_Cruze	8.068924e-02
Model_Accent	7.664687e-02
Model_V40	6.752698e-02
Model_Teana	6.672175e-02
Model_R-Class	6.386756e-02
Model_Thar	6.232812e-02
Model_Passat	5.778024e-02
Location_Chennai	5.678747e-02
Model_XC60	5.142907e-02
Fuel_Type_Diesel	4.966604e-02
Model_Q5	4.141793e-02
Model_EON	3.452707e-02
Model_Safari	2.992587e-02
Model_BRV	2.825778e-02
Model_City	1.452770e-02
Brand_Toyota	1.267729e-02
Model_Linea	6.946507e-03
Model_Lodgy	1.736719e-03
Mileage	1.316338e-03
Brand_Hindustan	3.052961e-12
Brand_Isuzu	8.891776e-13
Model_1000	2.902123e-13
Model_Boxster	4.052314e-15
Model_E	6.661338e-16
Model_MUX	2.706169e-16
Model_Motors	9.714451e-17
Model_Platinum	8.326673e-17
Model_MU	4.163336e-17

Negative impact

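This is the list of coefficients with a negative impact on price; among them are Kilometers_Driven_log, Engine and Seats. Higher values of these features lower the predicted price, holding the other features constant. The Intercept (-212.3) is not a feature effect: its large magnitude mainly offsets the Year term, since 0.106 × 2015 ≈ 214.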

In [128]:

coef_df[coef_df['Coefficients']<0].sort_values(by='Coefficients')

Out[128]:

	Coefficients
Intercept	-2.122822e+02
Brand_Hyundai	-1.140057e+00
Model_Nano	-1.066021e+00
Brand_Datsun	-1.033328e+00
Brand_Fiat	-9.095028e-01
Model_Indica	-8.287199e-01
Brand_Chevrolet	-7.607063e-01
Model_Indigo	-7.456837e-01
Model_Etios	-7.384430e-01
Brand_Ford	-7.306453e-01
Model_800	-6.486312e-01
Model_Esteem	-6.439154e-01
Brand_Maruti	-6.364073e-01
Model_Bolt	-6.321259e-01
Model_KWID	-6.321051e-01
Model_Manza	-6.278130e-01
Model_Tiago	-6.192760e-01
Brand_Renault	-6.041900e-01
Brand_Mahindra	-5.890545e-01
Model_Fabia	-5.627410e-01
Brand_Honda	-5.494955e-01
Brand_Tata	-5.277481e-01
Model_Logan	-5.156677e-01
Model_Cedia	-4.966320e-01
Brand_Nissan	-4.932404e-01
Model_Redi	-4.869040e-01
Model_Alto	-4.737990e-01
Model_Omni	-4.721723e-01
Model_Eeco	-4.662428e-01
Model_Xenon	-4.504703e-01
Model_Cayenne	-4.493680e-01
Model_B	-4.471035e-01
Model_Venture	-4.374467e-01
Model_A	-4.333749e-01
Model_KUV	-4.220742e-01
Model_Zest	-4.196436e-01
Model_Tigor	-4.091198e-01
Model_Spark	-4.065682e-01
Brand_Volkswagen	-4.056313e-01
Model_Evalia	-4.024599e-01
Brand_Skoda	-3.930683e-01
Model_Ameo	-3.923683e-01
Model_Micra	-3.797234e-01
Model_Ignis	-3.791245e-01
Model_A3	-3.709362e-01
Model_Estilo	-3.688273e-01
Model_Quanto	-3.687409e-01
Model_Zen	-3.685581e-01
Model_CrossPolo	-3.654663e-01
Model_Beat	-3.622543e-01
Model_Brio	-3.555369e-01
Model_D-MAX	-3.463250e-01
Brand_ISUZU	-3.463250e-01
Model_Aveo	-3.457447e-01
Model_Classic	-3.442637e-01
Model_redi-GO	-3.359091e-01
Model_Polo	-3.349437e-01
Model_Corolla	-3.308798e-01
Model_Celerio	-3.286449e-01
Model_NuvoSport	-3.193562e-01
Model_C-Class	-3.104868e-01
Model_Ikon	-2.971519e-01
Model_Petra	-2.967823e-01
Model_CLA	-2.939341e-01
Model_Siena	-2.890131e-01
Model_Verito	-2.849063e-01
Model_Wagon	-2.844928e-01
Model_Scala	-2.755480e-01
Model_S80	-2.712951e-01
Model_A4	-2.706989e-01
Model_Rapid	-2.682940e-01
Model_Punto	-2.652687e-01
Model_Pulse	-2.651567e-01
Model_New	-2.606214e-01
Model_Amaze	-2.578199e-01
Model_Q3	-2.577470e-01
Model_A-Star	-2.464424e-01
Brand_Smart	-2.371821e-01
Model_Fortwo	-2.371821e-01
Model_Vento	-2.342850e-01
Model_GLA	-2.212246e-01
Model_Ritz	-2.182836e-01
Location_Kolkata	-2.176047e-01
Model_GO	-2.105149e-01
Model_Sunny	-2.104897e-01
Model_Renault	-2.098534e-01
Model_XF	-2.061667e-01
Model_Xylo	-1.931750e-01
Model_Sail	-1.856540e-01
Model_Grande	-1.834601e-01
Model_Figo	-1.827555e-01
Model_WR-V	-1.783798e-01
Owner_Type_Third	-1.684767e-01
Model_Optra	-1.671785e-01
Model_Jazz	-1.622787e-01
Model_E-Class	-1.518833e-01
Model_A6	-1.468467e-01
Owner_Type_Fourth & Above	-1.194041e-01
Model_Lancer	-1.172340e-01
Model_Enjoy	-1.127670e-01
Transmission_Manual	-1.121924e-01
Model_Laura	-1.106812e-01
Model_Countryman	-1.074077e-01
Model_Mobilio	-1.055585e-01
Model_Baleno	-1.021670e-01
Model_SX4	-9.882907e-02
Model_Sumo	-9.638676e-02
Model_Civic	-8.820436e-02
Model_TUV	-8.666424e-02
Location_Delhi	-8.008404e-02
Model_Fiesta	-7.866301e-02
Kilometers_Driven_log	-7.715894e-02
Brand_Mitsubishi	-6.988398e-02
Model_BR-V	-6.853318e-02
Model_Outlander	-6.370752e-02
Model_Fluence	-6.255634e-02
Fuel_Type_Petrol	-6.045009e-02
Owner_Type_Second	-5.739302e-02
Location_Mumbai	-5.679099e-02
Model_WRV	-4.824091e-02
Brand_Force	-4.784993e-02
Model_One	-4.784993e-02
Model_Dzire	-4.513748e-02
Model_Swift	-4.020150e-02
Model_Nexon	-3.983023e-02
Location_Pune	-2.156988e-02
Model_Versa	-1.978553e-02
Model_S60	-1.927748e-02
Model_Compass	-1.848095e-02
Brand_Jeep	-1.848095e-02
Model_Terrano	-1.826751e-02
Location_Jaipur	-1.610073e-02
Location_Kochi	-1.353527e-02
Fuel_Type_LPG	-1.187979e-02
Seats	-1.807394e-04
Engine	-4.557528e-05
Brand_OpelCorsa	-6.200596e-13
Model_370Z	-1.439404e-13
Model_Abarth	-5.806466e-14
Model_Beetle	-2.145506e-14
Model_1.4Gsi	-1.327410e-14
Model_Land	-2.775558e-16
Model_Flying	-1.110223e-16

10.6.2 Analysis of coefficients

As expected, Year has a positive impact on Price: newer cars sell for more.
As expected, Kilometers_Driven has a negative impact on Price.
Power has a positive impact on Price.
Seats and Engine have negative but very small coefficients; the Seats effect is negligible.
Relative to the baseline location dropped by drop_first, some locations have a positive impact on Price: Bangalore, Chennai, Coimbatore and Hyderabad.
Other locations have a negative impact on Price: Delhi, Jaipur, Kochi, Kolkata, Mumbai and Pune.
Relative to CNG (the baseline fuel type), Diesel and Electric cars have a positive impact on Price.
LPG and Petrol have a negative impact on Price relative to CNG.
Manual transmission has a negative impact on Price relative to automatic transmission.
Relative to first-owner cars, second, third, and fourth-and-above ownership have a negative impact on Price.
The Brand and Model of luxury cars (Lamborghini, Jaguar, Porsche, etc.) have a strong positive impact on Price.
Economy brands and models (Datsun, Renault, Honda, Mahindra) have a negative impact on Price.
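Because the target is Price_log, the coefficients act multiplicatively on Price. For a numeric feature or a dummy, a one-unit change multiplies the predicted price by exp(b); for a log-transformed feature such as Power_log, b is an elasticity, so a 10% increase multiplies the price by 1.1^b. Brand and model dummies add on the log scale, so a car's combined effect is the sum of both coefficients, relative to the baseline brand and model. The sketch below applies this to a few coefficients copied from Out[126]; every effect assumes the other features are held constant.

```python
import numpy as np

# Coefficients copied from Out[126] (target is Price_log)
coefs = {
    "Year": 0.1061775,
    "Transmission_Manual": -0.1121924,
    "Power_log": 0.3782705,
    "Brand_Hyundai": -1.140057,
    "Model_Creta": 0.8796960,
}

# Level feature or dummy: a one-unit change multiplies price by exp(b)
pct_per_year = 100 * (np.exp(coefs["Year"]) - 1)
pct_manual = 100 * (np.exp(coefs["Transmission_Manual"]) - 1)

# Log-log feature: b is an elasticity; 10% more power multiplies price by 1.1**b
pct_power_10 = 100 * (1.1 ** coefs["Power_log"] - 1)

# Brand and model dummies add on the log scale
creta_net = coefs["Brand_Hyundai"] + coefs["Model_Creta"]

print(f"{pct_per_year:.1f}% per extra model year")         # 11.2%
print(f"{pct_manual:.1f}% for manual vs automatic")        # -10.6%
print(f"{pct_power_10:.1f}% for 10% more power")           # 3.7%
print(f"{creta_net:.3f} log points for a Hyundai Creta")  # -0.260
```

The Hyundai Creta case shows why brand and model must be read together: Brand_Hyundai has the most negative brand coefficient, but Model_Creta offsets most of it.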

11 Forward Feature Selection

11.1 Identify most important features

With 274 independent variables, it is difficult to identify all the key variables with a strong relationship to Price. Therefore, we select a subset of important features with forward feature selection, using mlxtend's SequentialFeatureSelector.
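With `forward=True` and `floating=False`, SequentialFeatureSelector performs a greedy search: at each step it adds the single remaining feature that most improves the cross-validated score. The sketch below reimplements that loop with scikit-learn's `cross_val_score` to show the idea; `forward_select` is a hypothetical helper, not mlxtend's code, and the synthetic data are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def forward_select(X, y, n_features, cv=5):
    """Greedy forward selection scored by mean cross-validated R^2."""
    selected, history = [], []
    remaining = list(range(X.shape[1]))
    for _ in range(n_features):
        # Score every candidate obtained by adding one remaining column
        scores = {
            j: cross_val_score(
                LinearRegression(), X[:, selected + [j]], y, scoring="r2", cv=cv
            ).mean()
            for j in remaining
        }
        best = max(scores, key=scores.get)  # keep the best candidate
        selected.append(best)
        remaining.remove(best)
        history.append(scores[best])
    return selected, history


# Synthetic data: 8 columns, only columns 2, 5 and 7 drive the target
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = 3 * X[:, 2] - 2 * X[:, 5] + 1.5 * X[:, 7] + rng.normal(scale=0.5, size=300)

selected, history = forward_select(X, y, n_features=3)
print(selected, np.round(history, 3))
```

This loop also explains the cost visible in the log below: with 274 candidate features and k_features set to all of them, step i evaluates 274 - i + 1 subsets (274, 273, 272, ... tasks), for 274 × 275 / 2 = 37,675 subsets in total, each fitted 5 times for cv=5.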

In [130]:

from mlxtend.feature_selection import SequentialFeatureSelector as SFS

In [133]:

reg = LinearRegression()

# Build step forward feature selection
sfs = SFS(
    reg,
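    k_features=x_train.shape[1],  # number of features to select (here all of them)
    forward=True,  # forward selection: add one feature per step
    floating=False,  # no conditional exclusion (floating) steps
    scoring="r2",
    n_jobs=-1,
    verbose=2,
    cv=5,
)

# Perform SFS (sequential forward selection)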
sfs = sfs.fit(x_train, y_train)

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    3.8s
[Parallel(n_jobs=-1)]: Done 130 tasks      | elapsed:    7.5s
[Parallel(n_jobs=-1)]: Done 274 out of 274 | elapsed:   12.2s finished

[2024-10-30 15:22:54] Features: 1/274 -- score: 0.6008605735778773[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done 228 tasks      | elapsed:    4.8s
[Parallel(n_jobs=-1)]: Done 242 out of 273 | elapsed:    5.0s remaining:    0.6s
[Parallel(n_jobs=-1)]: Done 273 out of 273 | elapsed:    5.9s finished

[2024-10-30 15:23:00] Features: 2/274 -- score: 0.8258189338417044[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done 272 out of 272 | elapsed:    5.9s finished

[2024-10-30 15:23:06] Features: 3/274 -- score: 0.843163289844291[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done 240 out of 271 | elapsed:    5.0s remaining:    0.6s
[Parallel(n_jobs=-1)]: Done 271 out of 271 | elapsed:    5.8s finished

[2024-10-30 15:23:12] Features: 4/274 -- score: 0.8607242290453196[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done 270 out of 270 | elapsed:    5.7s finished

[2024-10-30 15:23:17] Features: 5/274 -- score: 0.8673527352506463[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done 238 out of 269 | elapsed:    5.0s remaining:    0.6s
[Parallel(n_jobs=-1)]: Done 269 out of 269 | elapsed:    5.8s finished

[2024-10-30 15:23:23] Features: 6/274 -- score: 0.8733406729210295[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done 268 out of 268 | elapsed:    5.8s finished

[2024-10-30 15:23:29] Features: 7/274 -- score: 0.8779019854352988[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done 236 out of 267 | elapsed:    5.3s remaining:    0.6s
[Parallel(n_jobs=-1)]: Done 267 out of 267 | elapsed:    6.0s finished

[2024-10-30 15:23:35] Features: 8/274 -- score: 0.8821076936222507[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done 266 out of 266 | elapsed:    5.9s finished

[2024-10-30 15:23:41] Features: 9/274 -- score: 0.8865803385052942[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[2024-10-30 15:23:48] Features: 10/274 -- score: 0.892136862625631
[2024-10-30 15:23:53] Features: 11/274 -- score: 0.8997588358866588
[2024-10-30 15:24:00] Features: 12/274 -- score: 0.9045279013258785
[2024-10-30 15:24:06] Features: 13/274 -- score: 0.9077032400393492
[2024-10-30 15:24:12] Features: 14/274 -- score: 0.9113184768449554
[2024-10-30 15:24:18] Features: 15/274 -- score: 0.9144300582982374
[2024-10-30 15:24:25] Features: 16/274 -- score: 0.9165568997201585
[2024-10-30 15:24:36] Features: 17/274 -- score: 0.9187094547000717
[2024-10-30 15:24:42] Features: 18/274 -- score: 0.9210414399636491
[2024-10-30 15:24:49] Features: 19/274 -- score: 0.9231785068400782
[2024-10-30 15:24:57] Features: 20/274 -- score: 0.9244145086624963
[2024-10-30 15:25:04] Features: 21/274 -- score: 0.9254483170392269
[2024-10-30 15:25:11] Features: 22/274 -- score: 0.9264841941808273
[2024-10-30 15:25:19] Features: 23/274 -- score: 0.9274758369192984
[2024-10-30 15:25:26] Features: 24/274 -- score: 0.92844212796468
[2024-10-30 15:25:33] Features: 25/274 -- score: 0.9292203842611386
[2024-10-30 15:25:40] Features: 26/274 -- score: 0.9298510047495391
[2024-10-30 15:25:47] Features: 27/274 -- score: 0.9304019840531833
[2024-10-30 15:25:55] Features: 28/274 -- score: 0.9309305347004898
[2024-10-30 15:26:02] Features: 29/274 -- score: 0.931458673523718
[2024-10-30 15:26:12] Features: 30/274 -- score: 0.9319344324038337
[2024-10-30 15:26:24] Features: 31/274 -- score: 0.9324067034952849
[2024-10-30 15:26:35] Features: 32/274 -- score: 0.9328337130107986
[2024-10-30 15:26:42] Features: 33/274 -- score: 0.9333355895459066
[2024-10-30 15:26:50] Features: 34/274 -- score: 0.9338466167427188
[2024-10-30 15:26:57] Features: 35/274 -- score: 0.93427063728439
[2024-10-30 15:27:08] Features: 36/274 -- score: 0.9346969157079771
[2024-10-30 15:27:20] Features: 37/274 -- score: 0.9351182568635021
[2024-10-30 15:27:31] Features: 38/274 -- score: 0.9355593302089608
[2024-10-30 15:27:42] Features: 39/274 -- score: 0.9359783078777568
[2024-10-30 15:27:52] Features: 40/274 -- score: 0.9363600521469563
[2024-10-30 15:27:59] Features: 41/274 -- score: 0.9367483734642255
[2024-10-30 15:28:06] Features: 42/274 -- score: 0.9370884676559399
[2024-10-30 15:28:15] Features: 43/274 -- score: 0.9374125821578387
[2024-10-30 15:28:22] Features: 44/274 -- score: 0.9377590488212355
[2024-10-30 15:28:32] Features: 45/274 -- score: 0.9380620809895246
[2024-10-30 15:28:39] Features: 46/274 -- score: 0.9383606770154213
[2024-10-30 15:28:49] Features: 47/274 -- score: 0.9386434676769945
[2024-10-30 15:28:56] Features: 48/274 -- score: 0.938917220224309
[2024-10-30 15:29:06] Features: 49/274 -- score: 0.9391992842156454
[2024-10-30 15:29:16] Features: 50/274 -- score: 0.9394813780691168
[2024-10-30 15:29:28] Features: 51/274 -- score: 0.9397591483128049
[2024-10-30 15:29:39] Features: 52/274 -- score: 0.9400251925243157
[2024-10-30 15:29:50] Features: 53/274 -- score: 0.9403015910400038
[2024-10-30 15:30:01] Features: 54/274 -- score: 0.9405794121248693
[2024-10-30 15:30:12] Features: 55/274 -- score: 0.9408523283296407
[2024-10-30 15:30:21] Features: 56/274 -- score: 0.9411047750951868
[2024-10-30 15:30:30] Features: 57/274 -- score: 0.9413510911204437
[2024-10-30 15:30:40] Features: 58/274 -- score: 0.9415890736708146
[2024-10-30 15:30:51] Features: 59/274 -- score: 0.9418027576675003
[2024-10-30 15:31:02] Features: 60/274 -- score: 0.9419907820263911
[2024-10-30 15:31:12] Features: 61/274 -- score: 0.942173275773842
[2024-10-30 15:31:23] Features: 62/274 -- score: 0.942356160410377
[2024-10-30 15:31:36] Features: 63/274 -- score: 0.9425128317572566
[2024-10-30 15:31:48] Features: 64/274 -- score: 0.94266687641947
[2024-10-30 15:31:59] Features: 65/274 -- score: 0.9428146211526883
[2024-10-30 15:32:11] Features: 66/274 -- score: 0.9429493707252343
[2024-10-30 15:32:22] Features: 67/274 -- score: 0.9430837362751194
[2024-10-30 15:32:33] Features: 68/274 -- score: 0.9432136432224116
[2024-10-30 15:32:43] Features: 69/274 -- score: 0.943344306484075
[2024-10-30 15:32:53] Features: 70/274 -- score: 0.9434634052333731
[2024-10-30 15:33:05] Features: 71/274 -- score: 0.943577067507883
[2024-10-30 15:33:16] Features: 72/274 -- score: 0.9436891888008686
[2024-10-30 15:33:26] Features: 73/274 -- score: 0.9437981539239481
[2024-10-30 15:33:37] Features: 74/274 -- score: 0.9439100954531178
[2024-10-30 15:33:47] Features: 75/274 -- score: 0.9440163805264301
[2024-10-30 15:33:58] Features: 76/274 -- score: 0.9441303003262262
[2024-10-30 15:34:08] Features: 77/274 -- score: 0.9442572101220648
[2024-10-30 15:34:18] Features: 78/274 -- score: 0.9443594920438331
[2024-10-30 15:34:29] Features: 79/274 -- score: 0.9444757561496895
[2024-10-30 15:34:39] Features: 80/274 -- score: 0.9445896136937307
[2024-10-30 15:34:50] Features: 81/274 -- score: 0.9447056871041397
[2024-10-30 15:35:00] Features: 82/274 -- score: 0.9448220672453754
[2024-10-30 15:35:11] Features: 83/274 -- score: 0.94492005052404
[2024-10-30 15:35:21] Features: 84/274 -- score: 0.9450094008269196
[2024-10-30 15:35:32] Features: 85/274 -- score: 0.9450977522742992
[2024-10-30 15:35:43] Features: 86/274 -- score: 0.945183112223553
[2024-10-30 15:35:53] Features: 87/274 -- score: 0.9452683784764198
[2024-10-30 15:36:03] Features: 88/274 -- score: 0.9453481246814794
[2024-10-30 15:36:14] Features: 89/274 -- score: 0.9454213386716412
[2024-10-30 15:36:24] Features: 90/274 -- score: 0.9454975389541538
[2024-10-30 15:36:35] Features: 91/274 -- score: 0.9455714413525443
[2024-10-30 15:36:45] Features: 92/274 -- score: 0.9456308638597681
[2024-10-30 15:36:56] Features: 93/274 -- score: 0.9456890609297093
[2024-10-30 15:37:07] Features: 94/274 -- score: 0.9457530047073327
[2024-10-30 15:37:17] Features: 95/274 -- score: 0.9458156688654359
[2024-10-30 15:37:28] Features: 96/274 -- score: 0.9458783878178758
[2024-10-30 15:37:38] Features: 97/274 -- score: 0.9459372840659993
[2024-10-30 15:37:49] Features: 98/274 -- score: 0.9459940673854778
[2024-10-30 15:37:59] Features: 99/274 -- score: 0.9460492720394387
[2024-10-30 15:38:09] Features: 100/274 -- score: 0.9461033266642728
[2024-10-30 15:38:19] Features: 101/274 -- score: 0.9461515907857343
[2024-10-30 15:38:29] Features: 102/274 -- score: 0.9462131431251473
[2024-10-30 15:38:39] Features: 103/274 -- score: 0.9462549214668187
[2024-10-30 15:38:49] Features: 104/274 -- score: 0.9462972708555017
[2024-10-30 15:38:59] Features: 105/274 -- score: 0.9463400739021506
[2024-10-30 15:39:09] Features: 106/274 -- score: 0.9463834060610402
[2024-10-30 15:39:19] Features: 107/274 -- score: 0.9464307667944067
[2024-10-30 15:39:29] Features: 108/274 -- score: 0.9464850186244597
[2024-10-30 15:39:39] Features: 109/274 -- score: 0.9465277954018362
[2024-10-30 15:39:49] Features: 110/274 -- score: 0.9465667535032496
[2024-10-30 15:39:59] Features: 111/274 -- score: 0.9466030402172381
[2024-10-30 15:40:09] Features: 112/274 -- score: 0.9466365074034908
[2024-10-30 15:40:19] Features: 113/274 -- score: 0.9466699567782252
[2024-10-30 15:40:29] Features: 114/274 -- score: 0.9467020565119888
[2024-10-30 15:40:39] Features: 115/274 -- score: 0.9467339018748081
[2024-10-30 15:40:49] Features: 116/274 -- score: 0.9467653933027028
[2024-10-30 15:40:59] Features: 117/274 -- score: 0.9467921828397865
[2024-10-30 15:41:09] Features: 118/274 -- score: 0.9468260457095262
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.9s
[Parallel(n_jobs=-1)]: Done 156 out of 156 | elapsed:   10.0s finished

[2024-10-30 15:41:19] Features: 119/274 -- score: 0.94685019984123[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.9s
[Parallel(n_jobs=-1)]: Done 155 out of 155 | elapsed:    9.6s finished

[2024-10-30 15:41:29] Features: 120/274 -- score: 0.9468726221227485[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.8s
[Parallel(n_jobs=-1)]: Done 154 out of 154 | elapsed:    9.8s finished

[2024-10-30 15:41:39] Features: 121/274 -- score: 0.9468937940854705[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.9s
[Parallel(n_jobs=-1)]: Done 153 out of 153 | elapsed:    9.8s finished

[2024-10-30 15:41:49] Features: 122/274 -- score: 0.946912792717584[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.9s
[Parallel(n_jobs=-1)]: Done 152 out of 152 | elapsed:   10.0s finished

[2024-10-30 15:41:59] Features: 123/274 -- score: 0.9469287907503101[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.9s
[Parallel(n_jobs=-1)]: Done 151 out of 151 | elapsed:    9.7s finished

[2024-10-30 15:42:09] Features: 124/274 -- score: 0.9469436039366472[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.8s
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed:   11.1s finished

[2024-10-30 15:42:20] Features: 125/274 -- score: 0.9469570319894081[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 149 out of 149 | elapsed:   11.1s finished

[2024-10-30 15:42:31] Features: 126/274 -- score: 0.9469679475395798[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 148 out of 148 | elapsed:   10.8s finished

[2024-10-30 15:42:42] Features: 127/274 -- score: 0.946978064208601[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 147 out of 147 | elapsed:   10.7s finished

[2024-10-30 15:42:53] Features: 128/274 -- score: 0.9469880479693635[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 146 out of 146 | elapsed:   10.9s finished

[2024-10-30 15:43:04] Features: 129/274 -- score: 0.9469982149421234[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 145 out of 145 | elapsed:   10.6s finished

[2024-10-30 15:43:15] Features: 130/274 -- score: 0.9470063220552042[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 144 out of 144 | elapsed:   10.5s finished

[2024-10-30 15:43:25] Features: 131/274 -- score: 0.9470185501560527[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done 143 out of 143 | elapsed:   10.7s finished

[2024-10-30 15:43:36] Features: 132/274 -- score: 0.9470565490628251[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 142 out of 142 | elapsed:   10.4s finished

[2024-10-30 15:43:46] Features: 133/274 -- score: 0.9470944712657312[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.9s
[Parallel(n_jobs=-1)]: Done 141 out of 141 | elapsed:   10.0s finished

[2024-10-30 15:43:57] Features: 134/274 -- score: 0.947173384136591[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 140 out of 140 | elapsed:   10.1s finished

[2024-10-30 15:44:07] Features: 135/274 -- score: 0.9472745279064239[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.9s
[Parallel(n_jobs=-1)]: Done 139 out of 139 | elapsed:    9.8s finished

[2024-10-30 15:44:17] Features: 136/274 -- score: 0.9473021761095183[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.9s
[Parallel(n_jobs=-1)]: Done 138 out of 138 | elapsed:    9.8s finished

[2024-10-30 15:44:27] Features: 137/274 -- score: 0.947330640994581[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.9s
[Parallel(n_jobs=-1)]: Done 137 out of 137 | elapsed:    9.8s finished

[2024-10-30 15:44:36] Features: 138/274 -- score: 0.9473590626880469[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 136 out of 136 | elapsed:    9.8s finished

[2024-10-30 15:44:46] Features: 139/274 -- score: 0.94737424598568[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.9s
[Parallel(n_jobs=-1)]: Done 135 out of 135 | elapsed:    9.6s finished

[2024-10-30 15:44:56] Features: 140/274 -- score: 0.9473837932381318[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 134 out of 134 | elapsed:    9.9s finished

[2024-10-30 15:45:06] Features: 141/274 -- score: 0.9473929396082703[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 133 out of 133 | elapsed:    9.6s finished

[2024-10-30 15:45:16] Features: 142/274 -- score: 0.9474012770826576[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 132 out of 132 | elapsed:    9.7s finished

[2024-10-30 15:45:25] Features: 143/274 -- score: 0.9474061962769336[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 131 out of 131 | elapsed:    9.8s finished

[2024-10-30 15:45:35] Features: 144/274 -- score: 0.9474113416791115[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 130 out of 130 | elapsed:    9.5s finished

[2024-10-30 15:45:45] Features: 145/274 -- score: 0.9474160621882547[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 129 out of 129 | elapsed:    9.7s finished

[2024-10-30 15:45:55] Features: 146/274 -- score: 0.9474199201898766[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 128 out of 128 | elapsed:    9.6s finished

[2024-10-30 15:46:05] Features: 147/274 -- score: 0.9474233272155509[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 127 out of 127 | elapsed:    9.4s finished

[2024-10-30 15:46:14] Features: 148/274 -- score: 0.9474296351823636[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 126 out of 126 | elapsed:    9.5s finished

[2024-10-30 15:46:24] Features: 149/274 -- score: 0.9474395540206642[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 125 out of 125 | elapsed:    9.5s finished

[2024-10-30 15:46:33] Features: 150/274 -- score: 0.9474429828093704[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 124 out of 124 | elapsed:    9.4s finished

[2024-10-30 15:46:43] Features: 151/274 -- score: 0.9474465016426885[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 123 out of 123 | elapsed:    9.5s finished

[2024-10-30 15:46:52] Features: 152/274 -- score: 0.9474495032412944[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.3s
[Parallel(n_jobs=-1)]: Done 122 out of 122 | elapsed:   10.6s finished

[2024-10-30 15:47:03] Features: 153/274 -- score: 0.9474519156543872[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done 121 out of 121 | elapsed:    9.7s finished

[2024-10-30 15:47:13] Features: 154/274 -- score: 0.9474660378832593[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done 120 out of 120 | elapsed:    9.6s finished

[2024-10-30 15:47:23] Features: 155/274 -- score: 0.9474905332865735[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done 119 out of 119 | elapsed:    9.3s finished

[2024-10-30 15:47:32] Features: 156/274 -- score: 0.9475096523448945[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done 118 out of 118 | elapsed:    9.3s finished

[2024-10-30 15:47:41] Features: 157/274 -- score: 0.947511307748069[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 117 out of 117 | elapsed:    9.4s finished

[2024-10-30 15:47:51] Features: 158/274 -- score: 0.9475128007899316[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done 116 out of 116 | elapsed:    9.4s finished

[2024-10-30 15:48:00] Features: 159/274 -- score: 0.947514045113268[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 115 out of 115 | elapsed:    9.1s finished

[2024-10-30 15:48:10] Features: 160/274 -- score: 0.9475152239346908[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done 114 out of 114 | elapsed:    9.4s finished

[2024-10-30 15:48:19] Features: 161/274 -- score: 0.9475163807873658[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done 113 out of 113 | elapsed:    9.8s finished

[2024-10-30 15:48:29] Features: 162/274 -- score: 0.9475173605240507[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done 112 out of 112 | elapsed:   10.1s finished

[2024-10-30 15:48:39] Features: 163/274 -- score: 0.9475182819483899[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 111 out of 111 | elapsed:    9.1s finished

[2024-10-30 15:48:48] Features: 164/274 -- score: 0.9475196186641346[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 110 out of 110 | elapsed:    8.9s finished

[2024-10-30 15:48:57] Features: 165/274 -- score: 0.947559673168465[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 109 out of 109 | elapsed:    8.8s finished

[2024-10-30 15:49:06] Features: 166/274 -- score: 0.947613687662578[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 108 out of 108 | elapsed:    9.0s finished

[2024-10-30 15:49:15] Features: 167/274 -- score: 0.9476635282128644[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 107 out of 107 | elapsed:    8.4s finished

[2024-10-30 15:49:24] Features: 168/274 -- score: 0.947702904101134[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 106 out of 106 | elapsed:    8.8s finished

[2024-10-30 15:49:33] Features: 169/274 -- score: 0.9477199291545982[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done 105 out of 105 | elapsed:    8.8s finished

[2024-10-30 15:49:42] Features: 170/274 -- score: 0.9477278669523537[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.3s
[Parallel(n_jobs=-1)]: Done 104 out of 104 | elapsed:    8.9s finished

[2024-10-30 15:49:51] Features: 171/274 -- score: 0.9477322562290738[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 103 out of 103 | elapsed:    8.6s finished

[2024-10-30 15:49:59] Features: 172/274 -- score: 0.9477346568857261[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 102 out of 102 | elapsed:    8.4s finished

[2024-10-30 15:50:08] Features: 173/274 -- score: 0.9477348473305426[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done 101 out of 101 | elapsed:    8.8s finished

[2024-10-30 15:50:17] Features: 174/274 -- score: 0.9477348473305568[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done 100 out of 100 | elapsed:    8.7s finished

[2024-10-30 15:50:26] Features: 175/274 -- score: 0.9477348473305618[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.3s
[Parallel(n_jobs=-1)]: Done  99 out of  99 | elapsed:    8.8s finished

[2024-10-30 15:50:35] Features: 176/274 -- score: 0.9477348473305675[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.3s
[Parallel(n_jobs=-1)]: Done  98 out of  98 | elapsed:    8.9s finished

[2024-10-30 15:50:44] Features: 177/274 -- score: 0.9477348473305586[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done  97 out of  97 | elapsed:    8.5s finished

[2024-10-30 15:50:52] Features: 178/274 -- score: 0.9477348473305609[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done  96 out of  96 | elapsed:    8.5s finished

[2024-10-30 15:51:01] Features: 179/274 -- score: 0.9477348473305595[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.3s
[Parallel(n_jobs=-1)]: Done  95 out of  95 | elapsed:    8.2s finished

[2024-10-30 15:51:09] Features: 180/274 -- score: 0.9477348473305647[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done  94 out of  94 | elapsed:    8.3s finished

[2024-10-30 15:51:17] Features: 181/274 -- score: 0.9477348473305642[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.3s
[Parallel(n_jobs=-1)]: Done  93 out of  93 | elapsed:    8.3s finished

[2024-10-30 15:51:26] Features: 182/274 -- score: 0.9477348473305577[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.3s
[Parallel(n_jobs=-1)]: Done  92 out of  92 | elapsed:    8.1s finished

[2024-10-30 15:51:34] Features: 183/274 -- score: 0.9477348473305609[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done  91 out of  91 | elapsed:    7.8s finished

[2024-10-30 15:51:42] Features: 184/274 -- score: 0.9479327591304928[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.3s
[Parallel(n_jobs=-1)]: Done  90 out of  90 | elapsed:    8.2s finished

[2024-10-30 15:51:50] Features: 185/274 -- score: 0.9477348473305778[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.3s
[Parallel(n_jobs=-1)]: Done  89 out of  89 | elapsed:    7.9s finished

[2024-10-30 15:51:58] Features: 186/274 -- score: 0.9477348473305677[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.3s
[Parallel(n_jobs=-1)]: Done  88 out of  88 | elapsed:    7.9s finished

[2024-10-30 15:52:06] Features: 187/274 -- score: 0.9477348473305719[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done  87 out of  87 | elapsed:    7.8s finished

[2024-10-30 15:52:14] Features: 188/274 -- score: 0.9477348473305671[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.4s
[Parallel(n_jobs=-1)]: Done  86 out of  86 | elapsed:    8.0s finished

[2024-10-30 15:52:22] Features: 189/274 -- score: 0.94773484733056[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.4s
[Parallel(n_jobs=-1)]: Done  85 out of  85 | elapsed:    8.3s finished

[2024-10-30 15:52:31] Features: 190/274 -- score: 0.9477348473305632[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.3s
[Parallel(n_jobs=-1)]: Done  84 out of  84 | elapsed:    7.9s finished

[2024-10-30 15:52:39] Features: 191/274 -- score: 0.9477348473305647[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.3s
[Parallel(n_jobs=-1)]: Done  83 out of  83 | elapsed:    7.7s finished

[2024-10-30 15:52:46] Features: 192/274 -- score: 0.947734847330557[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.4s
[Parallel(n_jobs=-1)]: Done  82 out of  82 | elapsed:    8.1s finished

[2024-10-30 15:52:55] Features: 193/274 -- score: 0.9477348473305544[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.6s
[Parallel(n_jobs=-1)]: Done  81 out of  81 | elapsed:    8.3s finished

[2024-10-30 15:53:03] Features: 194/274 -- score: 0.9479431431644982[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.5s
[Parallel(n_jobs=-1)]: Done  80 out of  80 | elapsed:    8.0s finished

[2024-10-30 15:53:11] Features: 195/274 -- score: 0.9477174961107611[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.4s
[Parallel(n_jobs=-1)]: Done  79 out of  79 | elapsed:    7.5s finished

[2024-10-30 15:53:19] Features: 196/274 -- score: 0.9477220057713558[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.5s
[Parallel(n_jobs=-1)]: Done  78 out of  78 | elapsed:    7.5s finished

[2024-10-30 15:53:26] Features: 197/274 -- score: 0.9478021679359317[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.4s
[Parallel(n_jobs=-1)]: Done  77 out of  77 | elapsed:    7.9s finished

[2024-10-30 15:53:34] Features: 198/274 -- score: 0.9476871492756201[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.5s
[Parallel(n_jobs=-1)]: Done  76 out of  76 | elapsed:    7.7s finished

[2024-10-30 15:53:42] Features: 199/274 -- score: 0.9476871492756416[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.4s
[Parallel(n_jobs=-1)]: Done  75 out of  75 | elapsed:    7.4s finished

[2024-10-30 15:53:50] Features: 200/274 -- score: 0.947793040632469[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.4s
[Parallel(n_jobs=-1)]: Done  74 out of  74 | elapsed:    7.7s finished

[2024-10-30 15:53:58] Features: 201/274 -- score: 0.9477213973343659[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.6s
[Parallel(n_jobs=-1)]: Done  73 out of  73 | elapsed:    7.5s finished

[2024-10-30 15:54:05] Features: 202/274 -- score: 0.9476673271270835[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.4s
[Parallel(n_jobs=-1)]: Done  72 out of  72 | elapsed:    7.4s finished

[2024-10-30 15:54:13] Features: 203/274 -- score: 0.9476626655969114[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.5s
[Parallel(n_jobs=-1)]: Done  71 out of  71 | elapsed:    7.1s finished

[2024-10-30 15:54:20] Features: 204/274 -- score: 0.9476300273488697[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.6s
[Parallel(n_jobs=-1)]: Done  70 out of  70 | elapsed:    7.2s finished

[2024-10-30 15:54:27] Features: 205/274 -- score: 0.9474341605681742[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.5s
[Parallel(n_jobs=-1)]: Done  69 out of  69 | elapsed:    7.1s finished

[2024-10-30 15:54:34] Features: 206/274 -- score: 0.9474283068356218[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.5s
[Parallel(n_jobs=-1)]: Done  68 out of  68 | elapsed:    7.1s finished

[2024-10-30 15:54:42] Features: 207/274 -- score: 0.9474175553568045[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.5s
[Parallel(n_jobs=-1)]: Done  67 out of  67 | elapsed:    7.0s finished

[2024-10-30 15:54:49] Features: 208/274 -- score: 0.9474659427760701[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.6s
[Parallel(n_jobs=-1)]: Done  66 out of  66 | elapsed:    7.0s finished

[2024-10-30 15:54:56] Features: 209/274 -- score: 0.9474492362265832[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.5s
[Parallel(n_jobs=-1)]: Done  65 out of  65 | elapsed:    6.9s finished

[2024-10-30 15:55:03] Features: 210/274 -- score: 0.9472195020915131[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.5s
[Parallel(n_jobs=-1)]: Done  64 out of  64 | elapsed:    6.9s finished

[2024-10-30 15:55:10] Features: 211/274 -- score: 0.9472840750057999[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.4s
[Parallel(n_jobs=-1)]: Done  63 out of  63 | elapsed:    6.8s finished

[2024-10-30 15:55:17] Features: 212/274 -- score: 0.9470945712133542[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.7s
[Parallel(n_jobs=-1)]: Done  62 out of  62 | elapsed:    6.8s finished

[2024-10-30 15:55:24] Features: 213/274 -- score: 0.9471612135864695[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.6s
[Parallel(n_jobs=-1)]: Done  61 out of  61 | elapsed:    6.6s finished

[2024-10-30 15:55:30] Features: 214/274 -- score: 0.9469856521781331[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.6s
[Parallel(n_jobs=-1)]: Done  60 out of  60 | elapsed:    6.6s finished

[2024-10-30 15:55:37] Features: 215/274 -- score: 0.9469172110948165[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.6s
[Parallel(n_jobs=-1)]: Done  59 out of  59 | elapsed:    6.5s finished

[2024-10-30 15:55:44] Features: 216/274 -- score: 0.9468774297337254[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.6s
[Parallel(n_jobs=-1)]: Done  58 out of  58 | elapsed:    6.5s finished

[2024-10-30 15:55:50] Features: 217/274 -- score: 0.9468858969393554[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.9s
[Parallel(n_jobs=-1)]: Done  55 out of  57 | elapsed:    6.6s remaining:    0.1s
[Parallel(n_jobs=-1)]: Done  57 out of  57 | elapsed:    6.6s finished

[2024-10-30 15:55:57] Features: 218/274 -- score: 0.9468858969393794[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.8s
[Parallel(n_jobs=-1)]: Done  54 out of  56 | elapsed:    6.5s remaining:    0.1s
[Parallel(n_jobs=-1)]: Done  56 out of  56 | elapsed:    6.5s finished

[2024-10-30 15:56:04] Features: 219/274 -- score: 0.9468858969393843[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.8s
[Parallel(n_jobs=-1)]: Done  52 out of  55 | elapsed:    6.3s remaining:    0.3s
[Parallel(n_jobs=-1)]: Done  55 out of  55 | elapsed:    6.4s finished

[2024-10-30 15:56:10] Features: 220/274 -- score: 0.9468966252072881[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.8s
[Parallel(n_jobs=-1)]: Done  51 out of  54 | elapsed:    6.0s remaining:    0.3s
[Parallel(n_jobs=-1)]: Done  54 out of  54 | elapsed:    6.1s finished

[2024-10-30 15:56:16] Features: 221/274 -- score: 0.9468963452865928[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.7s
[Parallel(n_jobs=-1)]: Done  49 out of  53 | elapsed:    5.9s remaining:    0.4s
[Parallel(n_jobs=-1)]: Done  53 out of  53 | elapsed:    6.1s finished

[2024-10-30 15:56:22] Features: 222/274 -- score: 0.9468913460980689[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.8s
[Parallel(n_jobs=-1)]: Done  48 out of  52 | elapsed:    5.7s remaining:    0.4s
[Parallel(n_jobs=-1)]: Done  52 out of  52 | elapsed:    5.9s finished

[2024-10-30 15:56:28] Features: 223/274 -- score: 0.9469002261643519[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.7s
[Parallel(n_jobs=-1)]: Done  46 out of  51 | elapsed:    5.6s remaining:    0.5s
[Parallel(n_jobs=-1)]: Done  51 out of  51 | elapsed:    5.7s finished

[2024-10-30 15:56:34] Features: 224/274 -- score: 0.9468903226124956[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.8s
[Parallel(n_jobs=-1)]: Done  45 out of  50 | elapsed:    5.5s remaining:    0.5s
[Parallel(n_jobs=-1)]: Done  50 out of  50 | elapsed:    5.7s finished

[2024-10-30 15:56:40] Features: 225/274 -- score: 0.9468797661389831[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.7s
[Parallel(n_jobs=-1)]: Done  43 out of  49 | elapsed:    5.4s remaining:    0.7s
[Parallel(n_jobs=-1)]: Done  49 out of  49 | elapsed:    5.6s finished

[2024-10-30 15:56:46] Features: 226/274 -- score: 0.9468713588961108[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.9s
[Parallel(n_jobs=-1)]: Done  42 out of  48 | elapsed:    5.3s remaining:    0.7s
[Parallel(n_jobs=-1)]: Done  48 out of  48 | elapsed:    5.5s finished

[2024-10-30 15:56:51] Features: 227/274 -- score: 0.9468383593915652[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.8s
[Parallel(n_jobs=-1)]: Done  40 out of  47 | elapsed:    5.0s remaining:    0.8s
[Parallel(n_jobs=-1)]: Done  47 out of  47 | elapsed:    5.3s finished

[2024-10-30 15:56:57] Features: 228/274 -- score: 0.9467858747397688[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.9s
[Parallel(n_jobs=-1)]: Done  39 out of  46 | elapsed:    5.0s remaining:    0.8s
[Parallel(n_jobs=-1)]: Done  46 out of  46 | elapsed:    5.3s finished

[2024-10-30 15:57:02] Features: 229/274 -- score: 0.9467710676386449[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.9s
[Parallel(n_jobs=-1)]: Done  37 out of  45 | elapsed:    5.0s remaining:    1.0s
[Parallel(n_jobs=-1)]: Done  45 out of  45 | elapsed:    5.3s finished

[2024-10-30 15:57:08] Features: 230/274 -- score: 0.9467550149626927[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.9s
[Parallel(n_jobs=-1)]: Done  36 out of  44 | elapsed:    5.0s remaining:    1.0s
[Parallel(n_jobs=-1)]: Done  44 out of  44 | elapsed:    5.4s finished

[2024-10-30 15:57:13] Features: 231/274 -- score: 0.9467383676403364[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.9s
[Parallel(n_jobs=-1)]: Done  34 out of  43 | elapsed:    4.6s remaining:    1.1s
[Parallel(n_jobs=-1)]: Done  43 out of  43 | elapsed:    5.0s finished

[2024-10-30 15:57:18] Features: 232/274 -- score: 0.9467131783637202[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.8s
[Parallel(n_jobs=-1)]: Done  33 out of  42 | elapsed:    4.6s remaining:    1.2s
[Parallel(n_jobs=-1)]: Done  42 out of  42 | elapsed:    5.0s finished

[2024-10-30 15:57:23] Features: 233/274 -- score: 0.9466859943937971[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.9s
[Parallel(n_jobs=-1)]: Done  31 out of  41 | elapsed:    4.3s remaining:    1.3s
[Parallel(n_jobs=-1)]: Done  41 out of  41 | elapsed:    4.9s finished

[2024-10-30 15:57:28] Features: 234/274 -- score: 0.9466508831445383[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.8s
[Parallel(n_jobs=-1)]: Done  30 out of  40 | elapsed:    4.4s remaining:    1.4s
[Parallel(n_jobs=-1)]: Done  40 out of  40 | elapsed:    4.9s finished

[2024-10-30 15:57:34] Features: 235/274 -- score: 0.9466138704337347[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done  28 out of  39 | elapsed:    4.2s remaining:    1.6s
[Parallel(n_jobs=-1)]: Done  39 out of  39 | elapsed:    4.7s finished

[2024-10-30 15:57:38] Features: 236/274 -- score: 0.9465750648065324[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done  27 out of  38 | elapsed:    4.0s remaining:    1.6s
[Parallel(n_jobs=-1)]: Done  38 out of  38 | elapsed:    4.6s finished

[2024-10-30 15:57:43] Features: 237/274 -- score: 0.946533477078739[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 out of  37 | elapsed:    3.7s remaining:    1.7s
[Parallel(n_jobs=-1)]: Done  37 out of  37 | elapsed:    4.6s finished

[2024-10-30 15:57:48] Features: 238/274 -- score: 0.946486463996916[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done  24 out of  36 | elapsed:    3.7s remaining:    1.8s
[Parallel(n_jobs=-1)]: Done  36 out of  36 | elapsed:    4.5s finished

[2024-10-30 15:57:52] Features: 239/274 -- score: 0.9464394609940602[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done  22 out of  35 | elapsed:    3.5s remaining:    2.0s
[Parallel(n_jobs=-1)]: Done  35 out of  35 | elapsed:    4.3s finished

[2024-10-30 15:57:57] Features: 240/274 -- score: 0.9467273803937399[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done  21 out of  34 | elapsed:    3.5s remaining:    2.1s
[Parallel(n_jobs=-1)]: Done  34 out of  34 | elapsed:    4.2s finished

[2024-10-30 15:58:01] Features: 241/274 -- score: 0.9467599072842546[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done  19 out of  33 | elapsed:    3.3s remaining:    2.4s
[Parallel(n_jobs=-1)]: Done  33 out of  33 | elapsed:    4.2s finished

[2024-10-30 15:58:05] Features: 242/274 -- score: 0.9467735822418181[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done  18 out of  32 | elapsed:    3.0s remaining:    2.3s
[Parallel(n_jobs=-1)]: Done  32 out of  32 | elapsed:    4.0s finished

[2024-10-30 15:58:09] Features: 243/274 -- score: 0.94672377591958[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done  16 out of  31 | elapsed:    2.7s remaining:    2.5s
[Parallel(n_jobs=-1)]: Done  31 out of  31 | elapsed:    3.8s finished

[2024-10-30 15:58:13] Features: 244/274 -- score: 0.9466509249761966[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done  15 out of  30 | elapsed:    2.8s remaining:    2.8s
[Parallel(n_jobs=-1)]: Done  30 out of  30 | elapsed:    3.9s finished

[2024-10-30 15:58:17] Features: 245/274 -- score: 0.9466867379399406[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done  13 out of  29 | elapsed:    2.6s remaining:    3.2s
[Parallel(n_jobs=-1)]: Done  29 out of  29 | elapsed:    3.8s finished

[2024-10-30 15:58:21] Features: 246/274 -- score: 0.9466071795787135[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done  12 out of  28 | elapsed:    2.4s remaining:    3.3s
[Parallel(n_jobs=-1)]: Done  28 out of  28 | elapsed:    3.6s finished

[2024-10-30 15:58:25] Features: 247/274 -- score: 0.9465238896311116[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done  10 out of  27 | elapsed:    2.2s remaining:    3.7s
[Parallel(n_jobs=-1)]: Done  24 out of  27 | elapsed:    3.3s remaining:    0.3s
[Parallel(n_jobs=-1)]: Done  27 out of  27 | elapsed:    3.4s finished

[2024-10-30 15:58:28] Features: 248/274 -- score: 0.946521579716731[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 out of  26 | elapsed:    2.1s remaining:    4.0s
[Parallel(n_jobs=-1)]: Done  23 out of  26 | elapsed:    3.3s remaining:    0.3s
[Parallel(n_jobs=-1)]: Done  26 out of  26 | elapsed:    3.4s finished

[2024-10-30 15:58:32] Features: 249/274 -- score: 0.9464362915278152[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   7 out of  25 | elapsed:    1.8s remaining:    4.7s
[Parallel(n_jobs=-1)]: Done  20 out of  25 | elapsed:    3.1s remaining:    0.7s
[Parallel(n_jobs=-1)]: Done  25 out of  25 | elapsed:    3.3s finished

[2024-10-30 15:58:35] Features: 250/274 -- score: 0.9463261799186548[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   6 out of  24 | elapsed:    1.7s remaining:    5.4s
[Parallel(n_jobs=-1)]: Done  19 out of  24 | elapsed:    3.0s remaining:    0.7s
[Parallel(n_jobs=-1)]: Done  24 out of  24 | elapsed:    3.1s finished

[2024-10-30 15:58:39] Features: 251/274 -- score: 0.9462690802707947[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   4 out of  23 | elapsed:    1.3s remaining:    6.5s
[Parallel(n_jobs=-1)]: Done  16 out of  23 | elapsed:    2.7s remaining:    1.1s
[Parallel(n_jobs=-1)]: Done  23 out of  23 | elapsed:    2.9s finished

[2024-10-30 15:58:42] Features: 252/274 -- score: 0.9461447525929906[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   3 out of  22 | elapsed:    1.3s remaining:    8.9s
[Parallel(n_jobs=-1)]: Done  15 out of  22 | elapsed:    2.6s remaining:    1.1s
[Parallel(n_jobs=-1)]: Done  22 out of  22 | elapsed:    2.9s finished

[2024-10-30 15:58:45] Features: 253/274 -- score: 0.9459954927439093[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done  12 out of  21 | elapsed:    2.3s remaining:    1.7s
[Parallel(n_jobs=-1)]: Done  21 out of  21 | elapsed:    2.8s finished

[2024-10-30 15:58:47] Features: 254/274 -- score: 0.9458383532301985[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done  11 out of  20 | elapsed:    2.2s remaining:    1.8s
[Parallel(n_jobs=-1)]: Done  20 out of  20 | elapsed:    2.7s finished

[2024-10-30 15:58:50] Features: 255/274 -- score: 0.945679156290041[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   8 out of  19 | elapsed:    1.9s remaining:    2.6s
[Parallel(n_jobs=-1)]: Done  19 out of  19 | elapsed:    2.6s finished

[2024-10-30 15:58:53] Features: 256/274 -- score: 0.9455899760485096[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   7 out of  18 | elapsed:    1.8s remaining:    2.9s
[Parallel(n_jobs=-1)]: Done  18 out of  18 | elapsed:    2.5s finished

[2024-10-30 15:58:56] Features: 257/274 -- score: 0.9454303045137135[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   4 out of  17 | elapsed:    1.4s remaining:    4.7s
[Parallel(n_jobs=-1)]: Done  13 out of  17 | elapsed:    2.1s remaining:    0.6s
[Parallel(n_jobs=-1)]: Done  17 out of  17 | elapsed:    2.3s finished

[2024-10-30 15:58:58] Features: 258/274 -- score: 0.9458661657907452[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   3 out of  16 | elapsed:    1.2s remaining:    5.5s
[Parallel(n_jobs=-1)]: Done  12 out of  16 | elapsed:    2.0s remaining:    0.6s
[Parallel(n_jobs=-1)]: Done  16 out of  16 | elapsed:    2.1s finished

[2024-10-30 15:59:00] Features: 259/274 -- score: 0.9461832380497425[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   8 out of  15 | elapsed:    1.8s remaining:    1.5s
[Parallel(n_jobs=-1)]: Done  15 out of  15 | elapsed:    2.2s finished

[2024-10-30 15:59:03] Features: 260/274 -- score: 0.9460372836244307[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   7 out of  14 | elapsed:    1.6s remaining:    1.6s
[Parallel(n_jobs=-1)]: Done  14 out of  14 | elapsed:    1.9s finished

[2024-10-30 15:59:04] Features: 261/274 -- score: 0.9458763464886486[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   3 out of  13 | elapsed:    1.2s remaining:    4.2s
[Parallel(n_jobs=-1)]: Done  10 out of  13 | elapsed:    1.7s remaining:    0.4s
[Parallel(n_jobs=-1)]: Done  13 out of  13 | elapsed:    1.8s finished

[2024-10-30 15:59:06] Features: 262/274 -- score: 0.9458467262666927[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of  12 | elapsed:    1.1s remaining:    6.0s
[Parallel(n_jobs=-1)]: Done   9 out of  12 | elapsed:    1.6s remaining:    0.5s
[Parallel(n_jobs=-1)]: Done  12 out of  12 | elapsed:    1.7s finished

[2024-10-30 15:59:08] Features: 263/274 -- score: 0.9456388669986378[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   4 out of  11 | elapsed:    1.1s remaining:    2.1s
[Parallel(n_jobs=-1)]: Done  11 out of  11 | elapsed:    1.5s finished

[2024-10-30 15:59:10] Features: 264/274 -- score: 0.9452713138924475[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   3 out of  10 | elapsed:    1.1s remaining:    2.6s
[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:    1.4s finished

[2024-10-30 15:59:11] Features: 265/274 -- score: 0.945161446961162[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   3 out of   9 | elapsed:    1.1s remaining:    2.3s
[Parallel(n_jobs=-1)]: Done   9 out of   9 | elapsed:    1.3s finished

[2024-10-30 15:59:13] Features: 266/274 -- score: 0.944754144553993[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   8 | elapsed:    0.9s remaining:    2.8s
[Parallel(n_jobs=-1)]: Done   8 out of   8 | elapsed:    1.1s finished

[2024-10-30 15:59:14] Features: 267/274 -- score: 0.9446487424045044[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   4 out of   7 | elapsed:    0.9s remaining:    0.6s
[Parallel(n_jobs=-1)]: Done   7 out of   7 | elapsed:    1.0s finished

[2024-10-30 15:59:15] Features: 268/274 -- score: 0.9441455030203605[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   3 out of   6 | elapsed:    0.8s remaining:    0.8s
[Parallel(n_jobs=-1)]: Done   6 out of   6 | elapsed:    0.9s finished

[2024-10-30 15:59:16] Features: 269/274 -- score: 0.9441455030203807[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   5 out of   5 | elapsed:    0.8s finished

[2024-10-30 15:59:17] Features: 270/274 -- score: 0.9434739244637808[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   4 out of   4 | elapsed:    0.6s finished

[2024-10-30 15:59:18] Features: 271/274 -- score: 0.9434233272407218[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   3 out of   3 | elapsed:    0.5s finished

[2024-10-30 15:59:18] Features: 272/274 -- score: 0.9424326394992495[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 out of   2 | elapsed:    0.4s finished

[2024-10-30 15:59:19] Features: 273/274 -- score: -30180713684991.74[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.

[2024-10-30 15:59:19] Features: 274/274 -- score: -2977506930194.4707
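The last two subsets (273 and 274 features) produce huge negative scores. One possible cause, not verified in this notebook, is that the full set of Brand_ and Model_ dummy columns is linearly dependent, or nearly so. For example, a brand represented by a single model yields a Model_ column identical to its Brand_ column. The sketch below builds a small hypothetical dataset (the Brand, Model and Age values are made up). It then compares the number of columns with the matrix rank, which exposes such a dependency.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: every Model belongs to one Brand, and Mini has a single model
toy = pd.DataFrame({
    "Brand": ["Mini", "Mini", "Tata", "Tata", "Tata", "Tata", "Audi", "Audi", "Audi", "Audi"],
    "Model": ["Cooper", "Cooper", "Nano", "Nano", "Indica", "Indica", "A4", "A4", "Q7", "Q7"],
    "Age": [5, 3, 8, 6, 9, 7, 4, 2, 1, 6],
})

# One-hot encode (dropping the first category of each) and add an intercept column
X_toy = pd.get_dummies(toy, columns=["Brand", "Model"], drop_first=True).astype(float)
X_toy.insert(0, "const", 1.0)

n_cols = X_toy.shape[1]
rank = np.linalg.matrix_rank(X_toy.to_numpy())
print(f"{n_cols} columns, rank {rank}")  # rank < number of columns => linear dependency

# The two exact dependencies built into this toy design
print((X_toy["Brand_Mini"] == X_toy["Model_Cooper"]).all())
print((X_toy["Brand_Tata"] == X_toy["Model_Indica"] + X_toy["Model_Nano"]).all())
```

On the notebook's own data, the analogous check would compare np.linalg.matrix_rank(x_train.to_numpy(dtype=float)) with x_train.shape[1]. A nearly (rather than exactly) singular matrix would instead show up as a very large np.linalg.cond value.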

Next, we plot the average cross-validated R^2 score against the number of selected features.

In [134]:

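# Average cross-validated R^2 for each number of selected features
sfs_dict = sfs.get_metric_dict()
n_features = list(sfs_dict.keys())
scores = [sfs_dict[k]['avg_score'] for k in n_features]
# Drop the last 2 points (273 and 274 features), whose scores are extreme outliers
ax = sns.lineplot(x=n_features[:-2], y=scores[:-2])
ax.set(xlabel="Number of features", ylabel="Average CV R^2");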

With about 50 features the score is already around 0.93, and it does not improve significantly with additional features. It actually starts to decrease beyond about 265 features.
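Each point on this curve comes from a greedy loop. Starting from an empty set, forward selection tries every remaining column, keeps the one that yields the best cross-validated R^2, and repeats. The sketch below reimplements that loop with plain NumPy on synthetic data to make the mechanics visible. It is a simplified illustration, not mlxtend's implementation. cv_r2 and forward_select are hypothetical names, the folds are contiguous and unshuffled, and the model is ordinary least squares with an intercept.

```python
import numpy as np


def cv_r2(X, y, k=5):
    """Mean out-of-fold R^2 of OLS (with intercept) over k contiguous folds."""
    n = len(y)
    scores = []
    for test_idx in np.array_split(np.arange(n), k):
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Design matrices with a leading column of ones for the intercept
        A_train = np.column_stack([np.ones(train.sum()), X[train]])
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        beta, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        resid = y[test_idx] - A_test @ beta
        ss_tot = np.sum((y[test_idx] - y[test_idx].mean()) ** 2)
        scores.append(1 - resid @ resid / ss_tot)
    return float(np.mean(scores))


def forward_select(X, y, k_features, cv=5):
    """Greedy forward selection: repeatedly add the column that maximises CV R^2."""
    selected, remaining, history = [], list(range(X.shape[1])), {}
    while len(selected) < k_features:
        # Score every candidate subset "selected + one more column"
        trial = {j: cv_r2(X[:, selected + [j]], y, cv) for j in remaining}
        best_j = max(trial, key=trial.get)
        selected.append(best_j)
        remaining.remove(best_j)
        history[len(selected)] = trial[best_j]
    return selected, history


# Hypothetical synthetic data: only columns 1 and 3 drive the target
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 6))
y_demo = 3 * X_demo[:, 1] - 2 * X_demo[:, 3] + 0.1 * rng.normal(size=200)

selected, history = forward_select(X_demo, y_demo, k_features=2)
print(selected, history)
```

Each step refits the model once per remaining column. Selecting all columns therefore requires a total number of fits that grows roughly quadratically with the number of columns.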

In [135]:

reg = LinearRegression()

# Build step forward feature selection with 20 features
sfs = SFS(
    reg,
    k_features=20,
    forward=True,
    floating=False,
    scoring="r2",
    n_jobs=-1,
    verbose=2,
    cv=5,
)

# Perform SFS (forward selection without floating steps)
sfs = sfs.fit(x_train, y_train)

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    4.5s
[Parallel(n_jobs=-1)]: Done 130 tasks      | elapsed:    8.1s
[Parallel(n_jobs=-1)]: Done 274 out of 274 | elapsed:   12.2s finished

[2024-10-30 16:18:10] Features: 1/20 -- score: 0.6008605735778773[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.2s
[Parallel(n_jobs=-1)]: Done 228 tasks      | elapsed:    4.0s
[Parallel(n_jobs=-1)]: Done 242 out of 273 | elapsed:    4.2s remaining:    0.5s
[Parallel(n_jobs=-1)]: Done 273 out of 273 | elapsed:    4.9s finished

[2024-10-30 16:18:15] Features: 2/20 -- score: 0.8258189338417044[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done 272 out of 272 | elapsed:    4.8s finished

[2024-10-30 16:18:20] Features: 3/20 -- score: 0.843163289844291[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done 240 out of 271 | elapsed:    4.3s remaining:    0.5s
[Parallel(n_jobs=-1)]: Done 271 out of 271 | elapsed:    4.9s finished

[2024-10-30 16:18:25] Features: 4/20 -- score: 0.8607242290453196[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done 270 out of 270 | elapsed:    4.8s finished

[2024-10-30 16:18:30] Features: 5/20 -- score: 0.8673527352506463[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done 238 out of 269 | elapsed:    4.2s remaining:    0.5s
[Parallel(n_jobs=-1)]: Done 269 out of 269 | elapsed:    4.8s finished

[2024-10-30 16:18:35] Features: 6/20 -- score: 0.8733406729210295[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done 268 out of 268 | elapsed:    4.8s finished

[2024-10-30 16:18:39] Features: 7/20 -- score: 0.8779019854352988[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done 236 out of 267 | elapsed:    4.4s remaining:    0.5s
[Parallel(n_jobs=-1)]: Done 267 out of 267 | elapsed:    5.1s finished

[2024-10-30 16:18:45] Features: 8/20 -- score: 0.8821076936222507[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done 266 out of 266 | elapsed:    5.1s finished

[2024-10-30 16:18:50] Features: 9/20 -- score: 0.8865803385052942[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done 234 out of 265 | elapsed:    4.5s remaining:    0.5s
[Parallel(n_jobs=-1)]: Done 265 out of 265 | elapsed:    5.1s finished

[2024-10-30 16:18:55] Features: 10/20 -- score: 0.892136862625631[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done 264 out of 264 | elapsed:    5.1s finished

[2024-10-30 16:19:00] Features: 11/20 -- score: 0.8997588358866588[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done 232 out of 263 | elapsed:    4.5s remaining:    0.5s
[Parallel(n_jobs=-1)]: Done 263 out of 263 | elapsed:    5.1s finished

[2024-10-30 16:19:06] Features: 12/20 -- score: 0.9045279013258785[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done 262 out of 262 | elapsed:    5.1s finished

[2024-10-30 16:19:11] Features: 13/20 -- score: 0.9077032400393492[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done 230 out of 261 | elapsed:    4.5s remaining:    0.5s
[Parallel(n_jobs=-1)]: Done 261 out of 261 | elapsed:    5.1s finished

[2024-10-30 16:19:16] Features: 14/20 -- score: 0.9113184768449554[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done 260 out of 260 | elapsed:    5.1s finished

[2024-10-30 16:19:21] Features: 15/20 -- score: 0.9144300582982374[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done 228 out of 259 | elapsed:    4.6s remaining:    0.5s
[Parallel(n_jobs=-1)]: Done 259 out of 259 | elapsed:    5.2s finished

[2024-10-30 16:19:26] Features: 16/20 -- score: 0.9165568997201585[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done 258 out of 258 | elapsed:    5.2s finished

[2024-10-30 16:19:32] Features: 17/20 -- score: 0.9187094547000717[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done 226 out of 257 | elapsed:    5.0s remaining:    0.6s
[Parallel(n_jobs=-1)]: Done 257 out of 257 | elapsed:    5.5s finished

[2024-10-30 16:19:37] Features: 18/20 -- score: 0.9210414399636491[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done 256 out of 256 | elapsed:    5.4s finished

[2024-10-30 16:19:43] Features: 19/20 -- score: 0.9231785068400782[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done 255 out of 255 | elapsed:    5.7s finished

[2024-10-30 16:19:49] Features: 20/20 -- score: 0.9244145086624963
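Instead of reading the elbow off the plot, the number of features can be chosen programmatically. One option is the smallest subset whose average CV score is within a tolerance of the best one. smallest_k_within_tolerance below is a hypothetical helper, not part of mlxtend. It only assumes the {k: {"avg_score": ...}} layout returned by sfs.get_metric_dict(), and the tolerance values are arbitrary. The example scores are the avg_score values logged above for the 20-feature run, rounded to four decimals.

```python
def smallest_k_within_tolerance(metric_dict, tol=0.01, min_score=0.0):
    """Smallest subset size whose avg_score is within `tol` of the best score.

    Assumes the layout of mlxtend's SFS.get_metric_dict():
    {k: {"avg_score": float, ...}, ...}. Scores below `min_score`
    (e.g. degenerate values such as those at 273/274 features) are ignored.
    """
    scores = {k: v["avg_score"] for k, v in metric_dict.items()
              if v["avg_score"] >= min_score}
    best = max(scores.values())
    return min(k for k, s in scores.items() if s >= best - tol)


# avg_score values from the 20-feature run above, rounded to 4 decimals
logged = [0.6009, 0.8258, 0.8432, 0.8607, 0.8674, 0.8733, 0.8779, 0.8821,
          0.8866, 0.8921, 0.8998, 0.9045, 0.9077, 0.9113, 0.9144, 0.9166,
          0.9187, 0.9210, 0.9232, 0.9244]
metric_dict = {k: {"avg_score": s} for k, s in enumerate(logged, start=1)}

print(smallest_k_within_tolerance(metric_dict, tol=0.02))
print(smallest_k_within_tolerance(metric_dict, tol=0.005))
```

With tol=0.02 the helper picks 12 features, and with tol=0.005 it picks 18. The tolerance, not the data alone, therefore drives the final choice.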

In [136]:

# Column indices of the selected features
feat_cols = list(sfs.k_feature_idx_)
print(feat_cols)

[0, 2, 4, 5, 6, 8, 10, 13, 19, 20, 24, 25, 27, 37, 40, 43, 44, 52, 53, 266]

In [137]:

x_train.columns[feat_cols]

Out[137]:

Index(['Year', 'Engine', 'Kilometers_Driven_log', 'Power_log',
       'Location_Bangalore', 'Location_Coimbatore', 'Location_Hyderabad',
       'Location_Kolkata', 'Fuel_Type_Petrol', 'Transmission_Manual',
       'Brand_Audi', 'Brand_BMW', 'Brand_Chevrolet', 'Brand_Jaguar',
       'Brand_Land', 'Brand_Mercedes-Benz', 'Brand_Mini', 'Brand_Tata',
       'Brand_Toyota', 'Model_Xylo'],
      dtype='object')

11.2 Retraining the model

We build new train and test sets containing only the 20 features selected by sequential forward selection.

In [138]:

x_train_final = x_train[x_train.columns[feat_cols]]
x_test_final = x_test[x_train.columns[feat_cols]]

In [139]:

#check shape
x_train_final.shape

Out[139]:

(4213, 20)

In [140]:

#check shape
x_test_final.shape

Out[140]:

(1806, 20)

In [141]:

# Fitting linear model
lin_reg_model2 = LinearRegression()
lin_reg_model2.fit(x_train_final, y_train)

# let us check the coefficients and intercept of the model

coef_df = pd.DataFrame(
    np.append(lin_reg_model2.coef_.flatten(), lin_reg_model2.intercept_),
    index=x_train_final.columns.tolist() + ["Intercept"],
    columns=["Coefficients"],
)
coef_df

Out[141]:

	Coefficients
Year	0.115198
Engine	0.000226
Kilometers_Driven_log	-0.074748
Power_log	0.778158
Location_Bangalore	0.183211
Location_Coimbatore	0.156587
Location_Hyderabad	0.180627
Location_Kolkata	-0.192565
Fuel_Type_Petrol	-0.198387
Transmission_Manual	-0.107517
Brand_Audi	0.571186
Brand_BMW	0.519360
Brand_Chevrolet	-0.278779
Brand_Jaguar	0.607397
Brand_Land	0.887929
Brand_Mercedes-Benz	0.580511
Brand_Mini	0.913336
Brand_Tata	-0.406731
Brand_Toyota	0.251947
Model_Xylo	-0.500064
Intercept	-233.238329

11.2.1 Model Performance

In [142]:

# R^2 train set
lin_reg_model2.score(x_train_final, y_train)

Out[142]:

0.9261017957003274

In [143]:

# R^2 test set
lin_reg_model2.score(x_test_final, y_test)

Out[143]:

0.9318611297475975

In [144]:

# Model performance on train set
model_perf(lin_reg_model2, x_train_final, y_train)

Out[144]:

{'RMSE': 0.2371177477741038,
 'MAE': 0.17574233711503315,
 'R^2': 0.9261017957003274,
 'Adjusted R^2': 0.9257492279317221}

In [145]:

# Model performance on test set
model_perf(lin_reg_model2, x_test_final, y_test)

Out[145]:

{'RMSE': 0.22906136480762884,
 'MAE': 0.1709946146974173,
 'R^2': 0.9318611297475975,
 'Adjusted R^2': 0.9310976690164782}

Observations

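The new regression model has 20 features, about 7% of the 274 columns in the full training set used by the original regression model
The performance of the new model is very close to that of the original model: test R^2 is 0.932 (adjusted R^2 0.931, RMSE 0.229). Test R^2 is slightly higher than train R^2 (0.926), so there is no sign of overfitting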
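The adjusted R^2 values reported by model_perf are consistent with the usual formula adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), with n rows and p = 20 predictors. The sketch below computes the same four metrics with NumPy. It is an assumption about what the notebook's model_perf helper (not shown in this section) does, and regression_metrics and adjusted_r2 are hypothetical names.

```python
import numpy as np


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def regression_metrics(y_true, y_pred, p):
    """RMSE, MAE, R^2 and adjusted R^2, using the same keys as the outputs above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    r2 = 1 - (resid @ resid) / np.sum((y_true - y_true.mean()) ** 2)
    return {
        "RMSE": float(np.sqrt(np.mean(resid ** 2))),
        "MAE": float(np.mean(np.abs(resid))),
        "R^2": float(r2),
        "Adjusted R^2": float(adjusted_r2(r2, len(y_true), p)),
    }


# Reproduce the adjusted R^2 values reported above from R^2, n and p = 20
print(adjusted_r2(0.9261017957003274, n=4213, p=20))  # train set, ~0.925749
print(adjusted_r2(0.9318611297475975, n=1806, p=20))  # test set, ~0.931098
```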

11.3 Coefficient Interpretation

11.3.1 Positive impact

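This is the list of coefficients with a positive impact on price: Year, Engine and Power_log, the locations Bangalore, Coimbatore and Hyderabad, and the brands Audi, BMW, Jaguar, Land, Mercedes-Benz, Mini and Toyota. Holding the other features constant, an increase in a numeric feature (or, for a dummy variable, having that attribute) leads to a higher predicted price.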

In [146]:

coef_df[coef_df['Coefficients']>0]

Out[146]:

	Coefficients
Year	0.115198
Engine	0.000226
Power_log	0.778158
Location_Bangalore	0.183211
Location_Coimbatore	0.156587
Location_Hyderabad	0.180627
Brand_Audi	0.571186
Brand_BMW	0.519360
Brand_Jaguar	0.607397
Brand_Land	0.887929
Brand_Mercedes-Benz	0.580511
Brand_Mini	0.913336
Brand_Toyota	0.251947
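The following is a worked reading of the coefficients. It assumes the target y is the natural log of Price; check how y was defined earlier in the notebook. Under that assumption, a coefficient b means a one-unit increase in a feature multiplies the predicted price by exp(b), a change of 100 * (exp(b) - 1)%. For a dummy such as Brand_Mini, the one-unit increase is the switch from 0 to 1 with the other features fixed. For a feature that is itself log-transformed, such as Power_log (natural log assumed), raising the feature by x% multiplies the price by (1 + x/100)^b. The helper names below are hypothetical.

```python
import math


def pct_change_per_unit(coef):
    """% change in price for a one-unit increase in a feature.

    Assumes the model target is the natural log of Price.
    """
    return 100 * (math.exp(coef) - 1)


def pct_change_for_log_feature(coef, feature_pct_increase):
    """% change in price when a log-transformed feature rises by the given %.

    Assumes both the target and the feature use the natural log.
    """
    return 100 * ((1 + feature_pct_increase / 100) ** coef - 1)


# Coefficients copied from the positive-impact table above
year_effect = pct_change_per_unit(0.115198)              # one year newer
mini_effect = pct_change_per_unit(0.913336)              # Brand_Mini = 1 vs 0
power_effect = pct_change_for_log_feature(0.778158, 10)  # 10% more power
print(f"Year: {year_effect:+.1f}%  Mini: {mini_effect:+.1f}%  Power +10%: {power_effect:+.1f}%")
```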

11.3.2 Negative impact

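This is the list of coefficients with a negative impact on price: Kilometers_Driven_log, Location_Kolkata, Fuel_Type_Petrol, Transmission_Manual, Brand_Chevrolet, Brand_Tata and Model_Xylo. Holding the other features constant, an increase in these (or, for a dummy variable, having that attribute) leads to a lower predicted price.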

In [147]:

coef_df[coef_df['Coefficients']<0]

Out[147]:

	Coefficients
Kilometers_Driven_log	-0.074748
Location_Kolkata	-0.192565
Fuel_Type_Petrol	-0.198387
Transmission_Manual	-0.107517
Brand_Chevrolet	-0.278779
Brand_Tata	-0.406731
Model_Xylo	-0.500064
Intercept	-233.238329
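The Intercept row appears in this list only because its value is negative. It is not a feature effect but the prediction when every feature, including Year, equals zero, far outside the data. The pandas sketch below drops it and ranks the remaining negative coefficients, using the values copied from the coefficient table. coef_demo is a hypothetical stand-in for the notebook's coef_df. Because the features are on different scales (Engine in CC, Year in years, 0/1 dummies), raw magnitudes are only directly comparable among the dummy variables.

```python
import pandas as pd

# Coefficients copied from the Out[141] table (stand-in for the notebook's coef_df)
coef_demo = pd.DataFrame(
    {"Coefficients": [0.115198, 0.000226, -0.074748, 0.778158, 0.183211,
                      0.156587, 0.180627, -0.192565, -0.198387, -0.107517,
                      0.571186, 0.519360, -0.278779, 0.607397, 0.887929,
                      0.580511, 0.913336, -0.406731, 0.251947, -0.500064,
                      -233.238329]},
    index=["Year", "Engine", "Kilometers_Driven_log", "Power_log",
           "Location_Bangalore", "Location_Coimbatore", "Location_Hyderabad",
           "Location_Kolkata", "Fuel_Type_Petrol", "Transmission_Manual",
           "Brand_Audi", "Brand_BMW", "Brand_Chevrolet", "Brand_Jaguar",
           "Brand_Land", "Brand_Mercedes-Benz", "Brand_Mini", "Brand_Tata",
           "Brand_Toyota", "Model_Xylo", "Intercept"],
)

# Exclude the intercept, keep negative effects, most negative first
negative = (
    coef_demo.drop(index="Intercept")
    .query("Coefficients < 0")
    .sort_values("Coefficients")
)
print(negative)
```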

11.3.3 Observations

The impact of the different features on Price is similar to that in the original regression model.

12 Actionable Insights & Recommendations

Cars4U should focus on trading:

Newer cars (recent manufacturing year)
Cars with high engine power, which command higher prices
Diesel and Electric cars, which are valued more than other fuel types
Cars in specific locations: Bangalore, Chennai, Coimbatore and Hyderabad
If possible, luxury brands and models (Lamborghini, Jaguar, Porsche, etc.)

Cars4U should avoid:

Cars with a large number of kilometers driven
Trading in Delhi, Kochi, Kolkata and Mumbai
LPG and Petrol cars
Manual transmission cars
Cars with a second or later owner
Economy brands and models (Datsun, Renault, Honda, Mahindra)
