Supervised Learning : Regression¶
1 Cars4U Project¶
1.1 Objective¶
- Explore and visualize the dataset.
- Build a linear regression model to predict the prices of used cars.
- Generate a set of insights and recommendations that will help the business.
1.2 Data:¶
- S.No. : Serial Number
- Name : Name of the car which includes Brand name and Model name
- Location : The city in which the car is being sold or is available for purchase
- Year : Manufacturing year of the car
- Kilometers_Driven : The total kilometers driven in the car by the previous owner(s), in km.
- Fuel_Type : The type of fuel used by the car. (Petrol, Diesel, Electric, CNG, LPG)
- Transmission : The type of transmission used by the car. (Automatic / Manual)
- Owner_Type : Type of ownership (First, Second, Third, ...)
- Mileage : The standard mileage offered by the car company in kmpl or km/kg
- Engine : The displacement volume of the engine in CC.
- Power : The maximum power of the engine in bhp.
- Seats : The number of seats in the car.
- New_Price : The price of a new car of the same model in INR Lakhs (1 Lakh = 100,000)
- Price : The price of the used car in INR Lakhs (1 Lakh = 100,000)
1.3 Problem definition and questions to be answered¶
- Does the brand and model of the car impact the price?
- Do luxury brands increase the price of the car?
- How much do the total kilometers driven impact the price?
- Is it more profitable to trade cars in some locations than in others?
- What is the impact of the Fuel Type, Transmission, Mileage, Engine, Power and Seats on the price?
- Is there a relationship between the price of new cars and used cars?
2 Import packages and turn off warnings¶
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# import pandas_profiling
sns.set(color_codes=True)
%matplotlib inline
3 Import dataset and quality of data¶
# read data from csv file
data = pd.read_csv(r"C:\Users\AndresDelgadillo\Downloads\used_cars_data.csv")
# get columns
data.columns
Index(['S.No.', 'Name', 'Location', 'Year', 'Kilometers_Driven', 'Fuel_Type', 'Transmission', 'Owner_Type', 'Mileage', 'Engine', 'Power', 'Seats', 'New_Price', 'Price'], dtype='object')
# get size of dataset
data.shape
(7253, 14)
# check dataset information
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7253 entries, 0 to 7252
Data columns (total 14 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   S.No.              7253 non-null   int64
 1   Name               7253 non-null   object
 2   Location           7253 non-null   object
 3   Year               7253 non-null   int64
 4   Kilometers_Driven  7253 non-null   int64
 5   Fuel_Type          7253 non-null   object
 6   Transmission       7253 non-null   object
 7   Owner_Type         7253 non-null   object
 8   Mileage            7251 non-null   object
 9   Engine             7207 non-null   object
 10  Power              7207 non-null   object
 11  Seats              7200 non-null   float64
 12  New_Price          1006 non-null   object
 13  Price              6019 non-null   float64
dtypes: float64(2), int64(3), object(9)
memory usage: 793.4+ KB
# check dataset missing values
total = data.isnull().sum().sort_values(ascending=False) # total number of null values
print(total)
New_Price            6247
Price                1234
Seats                  53
Engine                 46
Power                  46
Mileage                 2
S.No.                   0
Name                    0
Location                0
Year                    0
Kilometers_Driven       0
Fuel_Type               0
Transmission            0
Owner_Type              0
dtype: int64
- There are 7253 rows and 14 columns.
- The 'New_Price' and 'Price' columns have a large number of missing values (6247 and 1234 rows respectively), which could affect the results of the analysis.
A deeper study is necessary to deal with all missing values.
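To judge how serious each gap is, it can help to see the missing counts as percentages of the row count. The following is a minimal sketch on a toy frame; the helper name `missing_summary` and the toy values are assumptions for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column, worst first."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(1),
    })
    return summary.sort_values("missing", ascending=False)

# Toy frame standing in for the real dataset (illustrative values only)
toy = pd.DataFrame({
    "Price": [1.75, np.nan, 4.5, np.nan],
    "Seats": [5.0, 5.0, np.nan, 7.0],
    "Year": [2010, 2015, 2011, 2012],
})
summary = missing_summary(toy)
print(summary)
```

Applied to the real frame, the same call would be `missing_summary(data)`.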
4 Characteristics of the data¶
# check first rows of data
data.head()
| | S.No. | Name | Location | Year | Kilometers_Driven | Fuel_Type | Transmission | Owner_Type | Mileage | Engine | Power | Seats | New_Price | Price |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | Maruti Wagon R LXI CNG | Mumbai | 2010 | 72000 | CNG | Manual | First | 26.6 km/kg | 998 CC | 58.16 bhp | 5.0 | NaN | 1.75 |
1 | 1 | Hyundai Creta 1.6 CRDi SX Option | Pune | 2015 | 41000 | Diesel | Manual | First | 19.67 kmpl | 1582 CC | 126.2 bhp | 5.0 | NaN | 12.50 |
2 | 2 | Honda Jazz V | Chennai | 2011 | 46000 | Petrol | Manual | First | 18.2 kmpl | 1199 CC | 88.7 bhp | 5.0 | 8.61 Lakh | 4.50 |
3 | 3 | Maruti Ertiga VDI | Chennai | 2012 | 87000 | Diesel | Manual | First | 20.77 kmpl | 1248 CC | 88.76 bhp | 7.0 | NaN | 6.00 |
4 | 4 | Audi A4 New 2.0 TDI Multitronic | Coimbatore | 2013 | 40670 | Diesel | Automatic | Second | 15.2 kmpl | 1968 CC | 140.8 bhp | 5.0 | NaN | 17.74 |
data.tail()
| | S.No. | Name | Location | Year | Kilometers_Driven | Fuel_Type | Transmission | Owner_Type | Mileage | Engine | Power | Seats | New_Price | Price |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
7248 | 7248 | Volkswagen Vento Diesel Trendline | Hyderabad | 2011 | 89411 | Diesel | Manual | First | 20.54 kmpl | 1598 CC | 103.6 bhp | 5.0 | NaN | NaN |
7249 | 7249 | Volkswagen Polo GT TSI | Mumbai | 2015 | 59000 | Petrol | Automatic | First | 17.21 kmpl | 1197 CC | 103.6 bhp | 5.0 | NaN | NaN |
7250 | 7250 | Nissan Micra Diesel XV | Kolkata | 2012 | 28000 | Diesel | Manual | First | 23.08 kmpl | 1461 CC | 63.1 bhp | 5.0 | NaN | NaN |
7251 | 7251 | Volkswagen Polo GT TSI | Pune | 2013 | 52262 | Petrol | Automatic | Third | 17.2 kmpl | 1197 CC | 103.6 bhp | 5.0 | NaN | NaN |
7252 | 7252 | Mercedes-Benz E-Class 2009-2013 E 220 CDI Avan... | Kochi | 2014 | 72443 | Diesel | Automatic | First | 10.0 kmpl | 2148 CC | 170 bhp | 5.0 | NaN | NaN |
# get a random sample of data
np.random.seed(1) #setting the random seed via np.random.seed to get the same random results every time
data.sample(n=5)
| | S.No. | Name | Location | Year | Kilometers_Driven | Fuel_Type | Transmission | Owner_Type | Mileage | Engine | Power | Seats | New_Price | Price |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2397 | 2397 | Ford EcoSport 1.5 Petrol Trend | Kolkata | 2016 | 21460 | Petrol | Manual | First | 17.0 kmpl | 1497 CC | 121.36 bhp | 5.0 | 9.47 Lakh | 6.00 |
3777 | 3777 | Maruti Wagon R VXI 1.2 | Kochi | 2015 | 49818 | Petrol | Manual | First | 21.5 kmpl | 1197 CC | 81.80 bhp | 5.0 | 5.44 Lakh | 4.11 |
4425 | 4425 | Ford Endeavour 4x2 XLT | Hyderabad | 2007 | 130000 | Diesel | Manual | First | 13.1 kmpl | 2499 CC | 141 bhp | 7.0 | NaN | 6.00 |
3661 | 3661 | Mercedes-Benz E-Class E250 CDI Avantgrade | Coimbatore | 2016 | 39753 | Diesel | Automatic | First | 13.0 kmpl | 2143 CC | 201.1 bhp | 5.0 | NaN | 35.28 |
4514 | 4514 | Hyundai Xcent 1.2 Kappa AT SX Option | Kochi | 2016 | 45560 | Petrol | Automatic | First | 16.9 kmpl | 1197 CC | 82 bhp | 5.0 | NaN | 6.34 |
- 'Name' column could be split in 2 columns. The first column would be the brand and the second column the model of the car
- 'Location', 'Fuel_Type', 'Transmission', and 'Owner_Type' columns could be transformed to 'category'
- 'Mileage', 'Engine', 'Power', and 'New_Price' columns should be numerical values but they appear as 'object'.
Processing columns is necessary to convert them to numerical
- 'S.No.' duplicates the index of the dataset, so we can drop the column
# Split Mileage column to extract units
data[['Mileage','Unit']] = data['Mileage'].str.split(' ',n=1,expand=True) # n=1: at most one split, so exactly two columns
# Get unique pairs of Fuel_Type and Unit
data.groupby(['Fuel_Type','Unit']).size()
Fuel_Type  Unit
CNG        km/kg      62
Diesel     kmpl     3852
LPG        km/kg      12
Petrol     kmpl     3325
dtype: int64
There is a clear relation between 'Fuel_Type' and 'Unit'.
- Mileage for CNG and LPG is in km/kg
- Mileage for Diesel and Petrol is in kmpl

It is not necessary to convert the units because the Fuel_Type column already identifies them.
Now, we can convert Mileage to numeric and drop the Unit column
# drop Unit column
data.drop(['Unit'], axis=1, inplace=True)
# Convert Mileage to Number
data['Mileage']=data['Mileage'].astype('float64')
# check Mileage is number
data['Mileage'].head()
0    26.60
1    19.67
2    18.20
3    20.77
4    15.20
Name: Mileage, dtype: float64
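The split-then-cast steps above can also be expressed as a single regular-expression extraction, which yields the number and the unit in one pass and leaves missing entries as NaN. This is a sketch on a toy Series, not the notebook's own method; the regular expression and the variable names are assumptions.

```python
import numpy as np
import pandas as pd

# Toy values in the same "<number> <unit>" shape as the Mileage column
raw = pd.Series(["26.6 km/kg", "19.67 kmpl", np.nan, "18.2 kmpl"], name="Mileage")

# Named groups become the column names of the extracted DataFrame
parts = raw.str.extract(r"^\s*(?P<value>\d+(?:\.\d+)?)\s+(?P<unit>\S+)\s*$")
mileage = pd.to_numeric(parts["value"], errors="coerce")
unit = parts["unit"]

print(mileage.tolist())
print(unit.tolist())
```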
5.2 Engine¶
'CC' string is going to be deleted
def engine_to_num(engine):
    """This function takes in a string representing the engine and converts it to a number.
    This function returns the same engine value if the input is already numeric."""
    if isinstance(engine, str): # checks if engine is a string
        engine_val = float(engine.replace('CC', '').strip())
    else: # this happens when the engine is already number or nan
        engine_val = engine
    # return engine as number
    return engine_val
# apply engine_to_num function to column 'Engine'
data['Engine'] = data['Engine'].apply(engine_to_num)
# check Engine is number
data['Engine'].head()
0     998.0
1    1582.0
2    1199.0
3    1248.0
4    1968.0
Name: Engine, dtype: float64
5.3 Power¶
'bhp' string is going to be deleted
def power_to_num(power):
    """This function takes in a string representing the power and converts it to a number.
    This function returns the same power value if the input is already numeric."""
    if isinstance(power, str): # checks if power is a string
        power_val = power.replace('bhp', '').strip()
        if power_val != 'null': # check that there is a value
            power_val = float(power_val)
        else:
            power_val = np.nan # returns nan
    else: # this happens when the power is already number or nan
        power_val = power
    # return power as number
    return power_val
# apply power_to_num function to column 'Power'
data['Power'] = data['Power'].apply(power_to_num)
# check Power is number
data['Power'].head()
0     58.16
1    126.20
2     88.70
3     88.76
4    140.80
Name: Power, dtype: float64
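Both `engine_to_num` and `power_to_num` strip a unit suffix and cast to float, with Power additionally treating the literal text 'null' as missing. A vectorized alternative is sketched below; the helper name `strip_unit` is hypothetical, and it relies on `pd.to_numeric(..., errors="coerce")` turning any non-numeric leftover (such as 'null') into NaN.

```python
import numpy as np
import pandas as pd

def strip_unit(column: pd.Series, unit: str) -> pd.Series:
    """Remove a literal unit suffix and convert the remainder to float.

    Anything that is not a valid number after stripping becomes NaN.
    """
    cleaned = column.str.replace(unit, "", regex=False).str.strip()
    return pd.to_numeric(cleaned, errors="coerce")

# Toy values in the same shape as the Power and Engine columns
power = strip_unit(pd.Series(["58.16 bhp", "null bhp", np.nan, "170 bhp"]), "bhp")
engine = strip_unit(pd.Series(["998 CC", "1582 CC", np.nan]), "CC")
print(power.tolist())
print(engine.tolist())
```

One trade-off of coercion is that unexpected formats are silently mapped to NaN, so it is worth comparing the missing counts before and after.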
def price_to_num(price):
    """This function takes in a string representing the price and converts it to a number (in Lakhs).
    This function returns the same price value if the input is already numeric."""
    if isinstance(price, str): # checks if price is a string
        # handles Cr and Lakh units (1 Cr = 100 Lakh)
        if price.endswith('Lakh'):
            multiplier = 1
        elif price.endswith('Cr'):
            multiplier = 100
        else: # no recognised unit: assume the value is already in Lakhs
            multiplier = 1
        price_val = float(price.replace('Lakh', '').replace('Cr', '').strip()) * multiplier
    else: # this happens when the price is already number or nan
        price_val = price
    # return price as number
    return price_val
# apply price_to_num function to column 'New_Price'
data['New_Price'] = data['New_Price'].apply(price_to_num)
# check New_Price is number
data['New_Price'].head()
0     NaN
1     NaN
2    8.61
3     NaN
4     NaN
Name: New_Price, dtype: float64
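The conversion rests on the fact that 1 Cr (crore) equals 100 Lakh, so a value such as '1.28 Cr' becomes 128 Lakh. A stricter variant of the idea is sketched below: it keeps the units in a lookup table and raises on an unknown unit instead of guessing. The names `UNIT_TO_LAKH` and `price_to_lakh` are hypothetical and not part of the original notebook.

```python
import numpy as np

# Conversion factors to Lakh (1 Cr = 100 Lakh)
UNIT_TO_LAKH = {"Lakh": 1.0, "Cr": 100.0}

def price_to_lakh(price):
    """Convert strings like '8.61 Lakh' or '1.28 Cr' to a float in Lakhs.

    Non-string inputs (numbers, NaN) are returned unchanged.
    """
    if not isinstance(price, str):
        return price
    amount, unit = price.split()  # expects exactly "<number> <unit>"
    if unit not in UNIT_TO_LAKH:
        raise ValueError(f"Unknown price unit: {unit!r}")
    return float(amount) * UNIT_TO_LAKH[unit]

lakh_value = price_to_lakh("8.61 Lakh")
crore_value = price_to_lakh("1.28 Cr")
missing_value = price_to_lakh(np.nan)
print(lakh_value, crore_value, missing_value)
```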
data[['Brand','Model','Specs']] = data['Name'].str.split(' ',n=2,expand=True)
data[['Name','Brand','Model','Specs']].head()
| | Name | Brand | Model | Specs |
---|---|---|---|---|
0 | Maruti Wagon R LXI CNG | Maruti | Wagon | R LXI CNG |
1 | Hyundai Creta 1.6 CRDi SX Option | Hyundai | Creta | 1.6 CRDi SX Option |
2 | Honda Jazz V | Honda | Jazz | V |
3 | Maruti Ertiga VDI | Maruti | Ertiga | VDI |
4 | Audi A4 New 2.0 TDI Multitronic | Audi | A4 | New 2.0 TDI Multitronic |
Now, we can drop 'Name' column and use 'Brand', 'Model' and 'Specs' columns
# drop Name column
data.drop(['Name'], axis=1, inplace=True)
5.6 Category columns¶
'Brand', 'Model', 'Specs', 'Location', 'Fuel_Type', 'Transmission', and 'Owner_Type' columns are transformed to category
data['Brand']=data['Brand'].astype('category')
data['Model']=data['Model'].astype('category')
data['Specs']=data['Specs'].astype('category')
data['Location']=data['Location'].astype('category')
data['Fuel_Type']=data['Fuel_Type'].astype('category')
data['Transmission']=data['Transmission'].astype('category')
data['Owner_Type']=data['Owner_Type'].astype('category')
5.7 Drop 'S.No.' column¶
data.drop(['S.No.'], axis=1, inplace=True)
5.8 Duplicate rows¶
# show all rows with duplicates
data[data.duplicated(keep=False)]
| | Location | Year | Kilometers_Driven | Fuel_Type | Transmission | Owner_Type | Mileage | Engine | Power | Seats | New_Price | Price | Brand | Model | Specs |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
6498 | Mumbai | 2010 | 52000 | Petrol | Manual | First | 17.0 | 1497.0 | 118.0 | 5.0 | NaN | NaN | Honda | City | 1.5 E MT |
6582 | Mumbai | 2010 | 52000 | Petrol | Manual | First | 17.0 | 1497.0 | 118.0 | 5.0 | NaN | NaN | Honda | City | 1.5 E MT |
# drop duplicate rows
data.drop(data[data.duplicated()].index, axis=0, inplace=True)
# Check there are no duplicates
data.duplicated().sum()
0
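pandas also offers `DataFrame.drop_duplicates`, which performs the same removal in one call and keeps the first occurrence by default. A small sketch on toy rows (the values are illustrative, not taken from the dataset):

```python
import pandas as pd

# Toy frame with one exact duplicate row
toy = pd.DataFrame({
    "Brand": ["Honda", "Honda", "Audi"],
    "Year": [2010, 2010, 2013],
})

# keep="first" retains the first occurrence of each duplicated row
deduped = toy.drop_duplicates(keep="first")
print(deduped)
print(deduped.duplicated().sum())
```

As with the `drop` call above, the original index labels are preserved, so the index is no longer contiguous afterwards.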
5.9 Check characteristics of data after processing¶
data.info()
<class 'pandas.core.frame.DataFrame'>
Index: 7252 entries, 0 to 7252
Data columns (total 15 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   Location           7252 non-null   category
 1   Year               7252 non-null   int64
 2   Kilometers_Driven  7252 non-null   int64
 3   Fuel_Type          7252 non-null   category
 4   Transmission       7252 non-null   category
 5   Owner_Type         7252 non-null   category
 6   Mileage            7250 non-null   float64
 7   Engine             7206 non-null   float64
 8   Power              7077 non-null   float64
 9   Seats              7199 non-null   float64
 10  New_Price          1006 non-null   float64
 11  Price              6019 non-null   float64
 12  Brand              7252 non-null   category
 13  Model              7252 non-null   category
 14  Specs              7251 non-null   category
dtypes: category(7), float64(6), int64(2)
memory usage: 665.0 KB
# check first rows of data
data.head()
| | Location | Year | Kilometers_Driven | Fuel_Type | Transmission | Owner_Type | Mileage | Engine | Power | Seats | New_Price | Price | Brand | Model | Specs |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Mumbai | 2010 | 72000 | CNG | Manual | First | 26.60 | 998.0 | 58.16 | 5.0 | NaN | 1.75 | Maruti | Wagon | R LXI CNG |
1 | Pune | 2015 | 41000 | Diesel | Manual | First | 19.67 | 1582.0 | 126.20 | 5.0 | NaN | 12.50 | Hyundai | Creta | 1.6 CRDi SX Option |
2 | Chennai | 2011 | 46000 | Petrol | Manual | First | 18.20 | 1199.0 | 88.70 | 5.0 | 8.61 | 4.50 | Honda | Jazz | V |
3 | Chennai | 2012 | 87000 | Diesel | Manual | First | 20.77 | 1248.0 | 88.76 | 7.0 | NaN | 6.00 | Maruti | Ertiga | VDI |
4 | Coimbatore | 2013 | 40670 | Diesel | Automatic | Second | 15.20 | 1968.0 | 140.80 | 5.0 | NaN | 17.74 | Audi | A4 | New 2.0 TDI Multitronic |
All columns now have the correct data type.
# get pandas profiling report
#pandas_profiling.ProfileReport(data)
6.2 Pairplot¶
We are going to perform univariate and bivariate analysis to understand the relationship between the columns
#sns.pairplot(data, diag_kind='kde');
6.3 Univariate analysis¶
6.3.1 Numerical columns¶
# Get stats for numerical columns
data.describe()
| | Year | Kilometers_Driven | Mileage | Engine | Power | Seats | New_Price | Price |
---|---|---|---|---|---|---|---|---|
count | 7252.000000 | 7.252000e+03 | 7250.000000 | 7206.000000 | 7077.000000 | 7199.000000 | 1006.000000 | 6019.000000 |
mean | 2013.365830 | 5.869999e+04 | 18.141738 | 1616.590064 | 112.764474 | 5.279761 | 22.779692 | 9.479468 |
std | 3.254405 | 8.443351e+04 | 4.562492 | 595.324779 | 53.497297 | 0.811709 | 27.759344 | 11.187917 |
min | 1996.000000 | 1.710000e+02 | 0.000000 | 72.000000 | 34.200000 | 0.000000 | 3.910000 | 0.440000 |
25% | 2011.000000 | 3.400000e+04 | 15.170000 | 1198.000000 | 75.000000 | 5.000000 | 7.885000 | 3.500000 |
50% | 2014.000000 | 5.342900e+04 | 18.160000 | 1493.000000 | 94.000000 | 5.000000 | 11.570000 | 5.640000 |
75% | 2016.000000 | 7.300000e+04 | 21.100000 | 1968.000000 | 138.100000 | 5.000000 | 26.042500 | 9.950000 |
max | 2019.000000 | 6.500000e+06 | 33.540000 | 5998.000000 | 616.000000 | 10.000000 | 375.000000 | 160.000000 |
# Get the skewness of numerical columns
data.select_dtypes(include=np.number).skew()
Year                 -0.840219
Kilometers_Driven    61.578378
Mileage              -0.438397
Engine                1.412244
Power                 1.961084
Seats                 1.902039
New_Price             4.128300
Price                 3.335232
dtype: float64
6.3.1.1 Year¶
The Year distribution is slightly skewed to the left (skewness of -0.84). The mean is 2013.37 and the median 2014, and there are no outliers.
6.3.1.2 Kilometers_Driven¶
The Kilometers_Driven distribution is highly skewed to the right (skewness of about 61.6). The mean is about 58,700 km, the median 53,429 km, and there are several outliers, including a maximum of 6,500,000 km, as we can see in the chart below.
# creating the 2 subplots
f2, (ax_box2, ax_hist2) = plt.subplots(nrows = 2, # Number of rows of the subplot grid= 2
sharex = True, # x-axis will be shared among all subplots
gridspec_kw = {"height_ratios": (.25, .75)});
sns.boxplot(data['Kilometers_Driven'], ax=ax_box2, showmeans=True, color='violet'); # boxplot
sns.distplot(data['Kilometers_Driven'], kde=True, ax=ax_hist2); # histogram
ax_hist2.axvline(np.mean(data['Kilometers_Driven']), color='green', linestyle='--'); # Add mean to the histogram
ax_hist2.axvline(np.median(data['Kilometers_Driven']), color='black', linestyle='-'); # Add median to the histogram
6.3.1.3 Mileage¶
The Mileage distribution is fairly symmetrical. The mean is 18.14 and the median 18.16. However, there are 81 rows with a value equal to 0.
# Number of rows with mileage equals to 0
sum(data['Mileage']==0)
81
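A mileage of exactly 0 is not physically plausible, so these rows most likely encode missing information. One possible treatment, shown here only as a sketch on toy values and not as the notebook's own choice, is to convert the zeros to NaN so that they can be handled together with the other missing values.

```python
import numpy as np
import pandas as pd

# Toy mileage values: two zeros and one value that is already missing
mileage = pd.Series([26.6, 0.0, 18.2, 0.0, np.nan])

# Series.mask replaces entries where the condition is True with NaN
cleaned = mileage.mask(mileage == 0)

print(cleaned.tolist())
print(cleaned.isna().sum())
```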
6.3.1.4 Engine¶
The Engine distribution is skewed to the right. The mean is about 1,617 CC and the median 1,493 CC.
# creating the 2 subplots
f2, (ax_box2, ax_hist2) = plt.subplots(nrows = 2, # Number of rows of the subplot grid= 2
sharex = True, # x-axis will be shared among all subplots
gridspec_kw = {"height_ratios": (.25, .75)});
sns.boxplot(data['Engine'], ax=ax_box2, showmeans=True, color='violet'); # boxplot
sns.distplot(data['Engine'], kde=True, ax=ax_hist2); # histogram
ax_hist2.axvline(data['Engine'].mean(), color='green', linestyle='--'); # Add mean to the histogram
ax_hist2.axvline(data['Engine'].median(), color='black', linestyle='-'); # Add median to the histogram (Series.median skips NaN; np.median would return nan here)
Engine has several values that are flagged as suspicious by the boxplot. However, those values are consistent with some powerful car models, so we cannot consider them outliers.
# cars with Engine>3000
data[data['Engine']>3000]
| | Location | Year | Kilometers_Driven | Fuel_Type | Transmission | Owner_Type | Mileage | Engine | Power | Seats | New_Price | Price | Brand | Model | Specs |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
70 | Mumbai | 2008 | 73000 | Petrol | Automatic | First | 8.50 | 4806.0 | 500.0 | 5.0 | NaN | 14.50 | Porsche | Cayenne | 2009-2014 Turbo |
152 | Kolkata | 2010 | 35277 | Petrol | Automatic | First | 7.81 | 5461.0 | 362.9 | 5.0 | NaN | 30.00 | Mercedes-Benz | S | Class 2005 2013 S 500 |
459 | Coimbatore | 2016 | 51002 | Diesel | Automatic | First | 11.33 | 4134.0 | 335.2 | 7.0 | NaN | 48.91 | Audi | Q7 | 4.2 TDI Quattro Technology |
586 | Kochi | 2014 | 79926 | Diesel | Automatic | First | 11.33 | 4134.0 | 335.2 | 7.0 | NaN | 29.77 | Audi | Q7 | 4.2 TDI Quattro Technology |
589 | Bangalore | 2006 | 47088 | Petrol | Automatic | Second | 10.13 | 3498.0 | 364.9 | 5.0 | NaN | 19.00 | Mercedes-Benz | S | Class 2005 2013 S 350 L |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
6011 | Hyderabad | 2009 | 53000 | Petrol | Automatic | First | 0.00 | 3597.0 | 262.6 | 5.0 | NaN | 4.75 | Skoda | Superb | 3.6 V6 FSI |
6186 | Mumbai | 2008 | 65000 | Petrol | Automatic | Third | 10.13 | 3498.0 | 364.9 | 5.0 | NaN | NaN | Mercedes-Benz | S | Class 2005 2013 S 350 L |
6354 | Bangalore | 2008 | 31200 | Petrol | Automatic | Second | 10.20 | 5998.0 | 616.0 | 5.0 | 375.0 | NaN | Bentley | Flying | Spur W12 |
6842 | Kolkata | 2012 | 14850 | Petrol | Automatic | First | 10.00 | 3696.0 | 328.5 | 2.0 | NaN | NaN | Nissan | 370Z | AT |
7057 | Delhi | 2009 | 64000 | Petrol | Automatic | First | 7.94 | 4395.0 | 450.0 | 4.0 | NaN | NaN | BMW | 6 | Series 650i Coupe |
65 rows × 15 columns
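The points a boxplot draws beyond its whiskers are, with the default whisker setting, those lying more than 1.5 times the interquartile range (IQR) outside the quartiles. Using the Engine quartiles reported by `data.describe()` above (Q1 = 1198, Q3 = 1968), the IQR is 770 and the upper fence is 1968 + 1.5 × 770 = 3123 CC, which is close to the 3000 CC cut-off used in the filter above. The sketch below computes such fences for any numeric Series; the helper name `iqr_fences` and the toy values are assumptions.

```python
import pandas as pd

def iqr_fences(values: pd.Series, k: float = 1.5):
    """Return the (lower, upper) fences q1 - k*IQR and q3 + k*IQR, ignoring NaN."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Toy values with one obvious extreme point
values = pd.Series([1, 2, 3, 4, 100])
lower, upper = iqr_fences(values)
flagged = values[(values < lower) | (values > upper)]

print(lower, upper)
print(flagged.tolist())
```

Being flagged by this rule only means a value is far from the bulk of the data; as the text notes, such values can still be legitimate.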
6.3.1.5 Power¶
The Power distribution is skewed to the right. The mean is about 112.8 bhp and the median 94 bhp.
# creating the 2 subplots
f2, (ax_box2, ax_hist2) = plt.subplots(nrows = 2, # Number of rows of the subplot grid= 2
sharex = True, # x-axis will be shared among all subplots
gridspec_kw = {"height_ratios": (.25, .75)});
sns.boxplot(data['Power'], ax=ax_box2, showmeans=True, color='violet'); # boxplot
sns.distplot(data['Power'], kde=True, ax=ax_hist2); # histogram
ax_hist2.axvline(data['Power'].mean(), color='green', linestyle='--'); # Add mean to the histogram
ax_hist2.axvline(data['Power'].median(), color='black', linestyle='-'); # Add median to the histogram (Series.median skips NaN; np.median would return nan here)
As with Engine, Power has several values that are flagged as suspicious by the boxplot. However, those values are consistent with some powerful car models, so we cannot consider them outliers.
# creating the 2 subplots
f2, (ax_box2, ax_hist2) = plt.subplots(nrows = 2, # Number of rows of the subplot grid= 2
sharex = True, # x-axis will be shared among all subplots
gridspec_kw = {"height_ratios": (.25, .75)});
sns.boxplot(data['New_Price'], ax=ax_box2, showmeans=True, color='violet'); # boxplot
sns.distplot(data['New_Price'], kde=True, ax=ax_hist2); # histogram
ax_hist2.axvline(data['New_Price'].mean(), color='green', linestyle='--'); # Add mean to the histogram
ax_hist2.axvline(data['New_Price'].median(), color='black', linestyle='-'); # Add median to the histogram (Series.median skips NaN; np.median would return nan here)
There are several values flagged as suspicious by the boxplot, but they could correspond to luxury cars, so we cannot consider them outliers.
# cars with New_Price>100
data[data['New_Price']>100]
| | Location | Year | Kilometers_Driven | Fuel_Type | Transmission | Owner_Type | Mileage | Engine | Power | Seats | New_Price | Price | Brand | Model | Specs |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
148 | Mumbai | 2013 | 23000 | Petrol | Automatic | First | 11.05 | 2894.0 | 444.00 | 4.0 | 128.0 | 37.00 | Audi | RS5 | Coupe |
327 | Coimbatore | 2017 | 97430 | Diesel | Automatic | First | 14.75 | 2967.0 | 245.00 | 7.0 | 104.0 | 62.67 | Audi | Q7 | 45 TDI Quattro Technology |
1336 | Mumbai | 2016 | 20002 | Diesel | Automatic | First | 14.75 | 2967.0 | 245.00 | 7.0 | 104.0 | 67.00 | Audi | Q7 | 45 TDI Quattro Technology |
1505 | Kochi | 2019 | 26013 | Diesel | Automatic | First | 12.65 | 2993.0 | 255.00 | 5.0 | 139.0 | 97.07 | Land | Rover | Range Rover Sport SE |
1885 | Delhi | 2018 | 6000 | Diesel | Automatic | First | 11.00 | 2987.0 | 258.00 | 7.0 | 102.0 | 79.00 | Mercedes-Benz | GLS | 350d Grand Edition |
2056 | Kochi | 2015 | 29966 | Diesel | Automatic | Second | 16.77 | 2993.0 | 261.49 | 5.0 | 140.0 | 43.60 | BMW | 7 | Series 730Ld Eminence |
2095 | Coimbatore | 2019 | 2526 | Petrol | Automatic | First | 19.00 | 2996.0 | 362.07 | 2.0 | 106.0 | 83.96 | Mercedes-Benz | SLC | 43 AMG |
2178 | Mumbai | 2017 | 35000 | Diesel | Automatic | First | 18.00 | 2993.0 | 255.00 | 7.0 | 127.0 | 41.60 | Land | Rover | Discovery HSE Luxury 3.0 TD6 |
2528 | Delhi | 2016 | 59000 | Diesel | Automatic | First | 18.00 | 2993.0 | 255.00 | 7.0 | 113.0 | 36.75 | Land | Rover | Discovery SE 3.0 TD6 |
3132 | Kochi | 2019 | 14298 | Petrol | Automatic | First | 13.33 | 2995.0 | 340.00 | 5.0 | 136.0 | 2.02 | Porsche | Cayenne | Base |
3199 | Kolkata | 2012 | 41100 | Diesel | Automatic | First | 16.77 | 2993.0 | 261.49 | 5.0 | 166.0 | 26.50 | BMW | 7 | Series 730Ld Design Pure Excellence CBU |
3752 | Kochi | 2015 | 38467 | Diesel | Automatic | First | 12.65 | 2993.0 | 255.00 | 5.0 | 160.0 | 70.66 | Land | Rover | Range Rover Sport HSE |
4061 | Mumbai | 2013 | 23312 | Petrol | Automatic | First | 11.05 | 2894.0 | 444.00 | 4.0 | 128.0 | 40.50 | Audi | RS5 | Coupe |
4079 | Hyderabad | 2017 | 25000 | Diesel | Automatic | First | 13.33 | 2993.0 | 255.00 | 5.0 | 230.0 | 160.00 | Land | Rover | Range Rover 3.0 Diesel LWB Vogue |
4778 | Bangalore | 2011 | 47140 | Diesel | Automatic | Second | 13.50 | 2925.0 | 281.61 | 5.0 | 171.0 | 30.00 | Mercedes-Benz | S-Class | S 350 d |
5545 | Delhi | 2014 | 47000 | Diesel | Automatic | Second | 12.65 | 2993.0 | 255.00 | 5.0 | 139.0 | 64.75 | Land | Rover | Range Rover Sport SE |
6212 | Chennai | 2017 | 16000 | Diesel | Automatic | First | 16.77 | 2993.0 | 261.49 | 5.0 | 158.0 | NaN | BMW | 7 | Series 730Ld DPE Signature |
6354 | Bangalore | 2008 | 31200 | Petrol | Automatic | Second | 10.20 | 5998.0 | 616.00 | 5.0 | 375.0 | NaN | Bentley | Flying | Spur W12 |
6960 | Coimbatore | 2018 | 18338 | Petrol | Automatic | First | 19.00 | 2996.0 | 362.07 | 2.0 | 106.0 | NaN | Mercedes-Benz | SLC | 43 AMG |
6.3.1.8 Price¶
The Price distribution is skewed to the right. The mean is 9.48 Lakh and the median 5.64 Lakh.
# creating the 2 subplots
f2, (ax_box2, ax_hist2) = plt.subplots(nrows = 2, # Number of rows of the subplot grid= 2
sharex = True, # x-axis will be shared among all subplots
gridspec_kw = {"height_ratios": (.25, .75)});
sns.boxplot(data['Price'], ax=ax_box2, showmeans=True, color='violet'); # boxplot
sns.distplot(data['Price'], kde=True, ax=ax_hist2); # histogram
ax_hist2.axvline(data['Price'].mean(), color='green', linestyle='--'); # Add mean to the histogram
ax_hist2.axvline(data['Price'].median(), color='black', linestyle='-'); # Add median to the histogram (Series.median skips NaN; np.median would return nan here)
As with New_Price, there are several values flagged as suspicious by the boxplot, but they could correspond to luxury cars, so we cannot consider them outliers.
# cars with Price>25
data[data['Price']>25]
| | Location | Year | Kilometers_Driven | Fuel_Type | Transmission | Owner_Type | Mileage | Engine | Power | Seats | New_Price | Price | Brand | Model | Specs |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
13 | Delhi | 2014 | 72000 | Diesel | Automatic | First | 12.70 | 2179.0 | 187.70 | 5.0 | NaN | 27.00 | Land | Rover | Range Rover 2.2L Pure |
19 | Bangalore | 2014 | 78500 | Diesel | Automatic | First | 14.84 | 2143.0 | 167.62 | 5.0 | NaN | 28.00 | Mercedes-Benz | New | C-Class C 220 CDI BE Avantgare |
38 | Pune | 2013 | 85000 | Diesel | Automatic | First | 11.74 | 2987.0 | 254.80 | 5.0 | NaN | 28.00 | Mercedes-Benz | M-Class | ML 350 CDI |
62 | Delhi | 2015 | 58000 | Petrol | Automatic | First | 11.74 | 1796.0 | 186.00 | 5.0 | NaN | 26.70 | Mercedes-Benz | New | C-Class C 200 CGI Avantgarde |
67 | Coimbatore | 2019 | 15369 | Diesel | Automatic | First | 0.00 | 1950.0 | 194.00 | 5.0 | 49.14 | 35.67 | Mercedes-Benz | C-Class | Progressive C 220d |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5927 | Coimbatore | 2018 | 29091 | Diesel | Automatic | First | 13.22 | 2967.0 | 241.40 | 5.0 | NaN | 45.52 | Audi | Q5 | 3.0 TDI Quattro Technology |
5946 | Bangalore | 2016 | 16000 | Diesel | Automatic | First | 14.69 | 2993.0 | 258.00 | 5.0 | NaN | 48.00 | BMW | 5 | Series 2013-2017 530d M Sport |
5970 | Kochi | 2018 | 17773 | Petrol | Automatic | First | 13.70 | 1991.0 | 183.00 | 5.0 | 39.22 | 26.76 | Mercedes-Benz | GLA | Class 200 Sport |
5996 | Kochi | 2016 | 31150 | Diesel | Automatic | First | 16.36 | 2179.0 | 187.70 | 5.0 | NaN | 30.54 | Jaguar | XF | 2.2 Litre Luxury |
6008 | Hyderabad | 2013 | 40000 | Diesel | Automatic | Second | 17.85 | 2967.0 | 300.00 | 4.0 | NaN | 45.00 | Porsche | Panamera | Diesel |
499 rows × 15 columns
6.3.2 Categorical columns¶
data.describe(include=["category"])
| | Location | Fuel_Type | Transmission | Owner_Type | Brand | Model | Specs |
---|---|---|---|---|---|---|---|
count | 7252 | 7252 | 7252 | 7252 | 7252 | 7252 | 7251 |
unique | 11 | 5 | 2 | 4 | 33 | 219 | 1893 |
top | Mumbai | Diesel | Manual | First | Maruti | Swift | VDI |
freq | 948 | 3852 | 5203 | 5951 | 1444 | 418 | 88 |
6.3.2.1 Location¶
p = sns.countplot(data['Location'], order=data['Location'].value_counts().index);
plt.xticks(rotation=45);
There are 11 distinct locations. Mumbai is the most frequent location, and Ahmedabad the least frequent
6.3.2.2 Transmission¶
p = sns.countplot(data['Transmission'], order=data['Transmission'].value_counts().index);
plt.xticks(rotation=45);
There are 2 distinct Transmission values, Manual and Automatic. Manual accounts for about 72% of the cars (5,203 of 7,252).
6.3.2.3 Owner Type¶
p = sns.countplot(data['Owner_Type'], order=data['Owner_Type'].value_counts().index);
plt.xticks(rotation=45);
There are 4 distinct categories for owner type. First owner corresponds to 82% of the rows
6.3.2.4 Fuel Type¶
p = sns.countplot(data['Fuel_Type'], order=data['Fuel_Type'].value_counts().index);
plt.xticks(rotation=45);
There are 5 distinct Fuel Types. Diesel is the most frequent fuel type, and there are only 2 electric cars.
6.3.2.5 Brand¶
p = sns.countplot(data['Brand'], order=data['Brand'].value_counts().index);
plt.xticks(rotation=90);
There are 33 distinct Brands. Maruti, Hyundai, Honda and Toyota are the most common ones.
6.4 Bivariate analysis¶
# Get correlation matrix for numeric variables
data.select_dtypes(include=np.number).corr()
| | Year | Kilometers_Driven | Mileage | Engine | Power | Seats | New_Price | Price |
---|---|---|---|---|---|---|---|---|
Year | 1.000000 | -0.187884 | 0.322452 | -0.054726 | 0.013448 | 0.008166 | -0.058798 | 0.305327 |
Kilometers_Driven | -0.187884 | 1.000000 | -0.069125 | 0.094816 | 0.030165 | 0.090218 | -0.008221 | -0.011493 |
Mileage | 0.322452 | -0.069125 | 1.000000 | -0.593581 | -0.531770 | -0.310649 | -0.378327 | -0.306593 |
Engine | -0.054726 | 0.094816 | -0.593581 | 1.000000 | 0.859777 | 0.399256 | 0.735981 | 0.658354 |
Power | 0.013448 | 0.030165 | -0.531770 | 0.859777 | 1.000000 | 0.095910 | 0.877708 | 0.772566 |
Seats | 0.008166 | 0.090218 | -0.310649 | 0.399256 | 0.095910 | 1.000000 | -0.019459 | 0.052225 |
New_Price | -0.058798 | -0.008221 | -0.378327 | 0.735981 | 0.877708 | -0.019459 | 1.000000 | 0.871847 |
Price | 0.305327 | -0.011493 | -0.306593 | 0.658354 | 0.772566 | 0.052225 | 0.871847 | 1.000000 |
# Display correlation matrix in a heatmap
sns.heatmap(data.select_dtypes(include=np.number).corr(), annot=True);
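- Engine has a strong correlation with Power (0.86) and New_Price (0.74), and a moderate-to-strong one with Price (0.66)
- Power has a strong correlation with Engine (0.86), New_Price (0.88) and Price (0.77)
- New_Price has a strong correlation with Engine (0.74), Power (0.88) and Price (0.87)
- Price has a strong correlation with New_Price (0.87) and Power (0.77), and a moderate-to-strong one with Engine (0.66)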
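Reading strong pairs off a full matrix by eye is error-prone, so it can be convenient to list them programmatically. The sketch below keeps only the upper triangle of the correlation matrix (each pair once, no diagonal) and filters by absolute value; the helper name `strong_pairs`, the 0.7 threshold, and the toy columns are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def strong_pairs(df: pd.DataFrame, threshold: float = 0.7) -> pd.Series:
    """Return column pairs whose absolute Pearson correlation is >= threshold."""
    corr = df.corr()
    # Boolean mask selecting the strict upper triangle (k=1 excludes the diagonal)
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack()
    # The comparison is False for NaN, so masked-out cells never pass the filter
    return pairs[pairs.abs() >= threshold].sort_values(ascending=False)

# Toy numeric frame: x and y move together, z does not
toy = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 11],
    "z": [5, 1, 4, 2, 3],
})
pairs = strong_pairs(toy)
print(pairs)
```

On the real data the call would be `strong_pairs(data.select_dtypes(include=np.number))`.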
6.4.1 Engine, Power and Price relationship¶
sns.scatterplot(data=data, x='Power', y='Engine', hue='Price');
There is a strong correlation between Power and Engine. The chart is also showing that more expensive cars tend to have high values for Power and Engine
6.4.2 Power, Seats and Price relationship¶
sns.scatterplot(data=data, x='Power', y='Seats', hue='Price');
There is no clear relationship between Power and Seats. However, cars with 2 seats can have high power and higher prices.
6.4.3 Price and Brand¶
order_by_brand = data.groupby(by=["Brand"])["Price"].median().sort_values().iloc[::-1].index
plt.figure(figsize=(10,6));
plt.xticks(rotation=90);
sns.boxplot(x=data['Brand'], y=data['Price'], order=order_by_brand);
This chart shows there are:
- Luxury brands that have high prices: BMW, Audi, Mercedes-Benz, Mini, Jaguar, Land (Land Rover, truncated by the name split), Porsche, Bentley, Lamborghini, Isuzu
- Brands with medium prices: Ford, Renault, Skoda, Mahindra, Force, Mitsubishi, Toyota, ISUZU (the same brand as Isuzu, recorded with different capitalization), Volvo, Jeep
- Brands with low prices: Ambassador, Chevrolet, Fiat, Tata, Smart, Datsun, Maruti, Nissan, Hyundai, Volkswagen, Honda
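The brand lists above show two labelling quirks: 'Land' stands for Land Rover because the name was split on the first space, and Isuzu appears both as 'Isuzu' and 'ISUZU'. One way to tidy this before modelling is an explicit mapping, sketched here on a toy Series; the mapping `BRAND_FIXES` is an assumption covering only the two cases visible in this section.

```python
import pandas as pd

# Hypothetical corrections for the two quirks visible in the brand lists
BRAND_FIXES = {"Land": "Land Rover", "ISUZU": "Isuzu"}

# Toy categorical column standing in for data['Brand']
brands = pd.Series(["Land", "ISUZU", "Isuzu", "Maruti"], dtype="category")

# Convert to plain strings, apply the mapping, then restore the category dtype
cleaned = brands.astype(str).replace(BRAND_FIXES).astype("category")

print(cleaned.tolist())
print(cleaned.nunique())
```

Merging the two Isuzu labels would also reduce the number of distinct brands, which matters once the column is one-hot encoded.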
6.4.4 Price, Location and Fuel Type¶
order_by_fuel = data.groupby(by=["Fuel_Type"])["Price"].median().sort_values().iloc[::-1].index # fuel types ordered by median price, highest first
plt.figure(figsize=(15,6));
plt.xticks(rotation=90);
sns.boxplot(x=data['Fuel_Type'], y=data['Price'], hue=data['Location'], order=order_by_fuel);
- Electric and Diesel cars have higher Price than Petrol, CNG and LPG.
- Cars in Bangalore, Coimbatore, Kochi and Mumbai tend to have higher prices than other locations
7 Missing Value Treatment¶
First, we drop the New_Price column, since 6,246 of the 7,252 remaining rows (86.1%) have no value for it.
data.drop(['New_Price'], axis=1, inplace=True)
There are 1,233 rows with a missing Price (7,252 rows minus 6,019 non-null values). We drop all of them because Price is the variable we want to predict, and imputing the target would introduce artificial information into the model.
data.drop(data[data['Price'].isna()].index, axis=0, inplace=True)
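The same row removal can be written with `DataFrame.dropna` and its `subset` argument, which reads a little more directly. A sketch on toy rows (illustrative values only); resetting the index afterwards is optional and is an assumption here, not something the notebook does.

```python
import numpy as np
import pandas as pd

# Toy frame with one row that has no target value
toy = pd.DataFrame({
    "Price": [1.75, np.nan, 4.5],
    "Year": [2010, 2015, 2011],
})

# Drop rows where the target is missing; other columns may still contain NaN
labelled = toy.dropna(subset=["Price"]).reset_index(drop=True)

print(labelled)
```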
Let's check the new dataset
data.info()
<class 'pandas.core.frame.DataFrame'>
Index: 6019 entries, 0 to 6018
Data columns (total 14 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   Location           6019 non-null   category
 1   Year               6019 non-null   int64
 2   Kilometers_Driven  6019 non-null   int64
 3   Fuel_Type          6019 non-null   category
 4   Transmission       6019 non-null   category
 5   Owner_Type         6019 non-null   category
 6   Mileage            6017 non-null   float64
 7   Engine             5983 non-null   float64
 8   Power              5876 non-null   float64
 9   Seats              5977 non-null   float64
 10  Price              6019 non-null   float64
 11  Brand              6019 non-null   category
 12  Model              6019 non-null   category
 13  Specs              6019 non-null   category
dtypes: category(7), float64(5), int64(2)
memory usage: 520.4 KB
# counting the number of missing values per row
num_missing = data.isnull().sum(axis=1)
num_missing.value_counts()
0    5872
1     107
3      36
2       4
Name: count, dtype: int64
We are going to analyze if there is a pattern for the 36 rows with 3 missing values.
data[num_missing == 3]
| | Location | Year | Kilometers_Driven | Fuel_Type | Transmission | Owner_Type | Mileage | Engine | Power | Seats | Price | Brand | Model | Specs |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
194 | Ahmedabad | 2007 | 60006 | Petrol | Manual | First | 0.00 | NaN | NaN | NaN | 2.95 | Honda | City | 1.5 GXI |
208 | Kolkata | 2010 | 42001 | Petrol | Manual | First | 16.10 | NaN | NaN | NaN | 2.11 | Maruti | Swift | 1.3 VXi |
733 | Chennai | 2006 | 97800 | Petrol | Manual | Third | 16.10 | NaN | NaN | NaN | 1.75 | Maruti | Swift | 1.3 VXi |
749 | Mumbai | 2008 | 55001 | Diesel | Automatic | Second | 0.00 | NaN | NaN | NaN | 26.50 | Land | Rover | Range Rover 3.0 D |
1294 | Delhi | 2009 | 55005 | Petrol | Manual | First | 12.80 | NaN | NaN | NaN | 3.20 | Honda | City | 1.3 DX |
1327 | Hyderabad | 2015 | 50295 | Petrol | Manual | First | 16.10 | NaN | NaN | NaN | 5.80 | Maruti | Swift | 1.3 ZXI |
1385 | Pune | 2004 | 115000 | Petrol | Manual | Second | 0.00 | NaN | NaN | NaN | 1.50 | Honda | City | 1.5 GXI |
1460 | Coimbatore | 2008 | 69078 | Petrol | Manual | First | 0.00 | NaN | NaN | NaN | 40.88 | Land | Rover | Range Rover Sport 2005 2012 Sport |
2074 | Pune | 2011 | 24255 | Petrol | Manual | First | 16.10 | NaN | NaN | NaN | 3.15 | Maruti | Swift | 1.3 LXI |
2096 | Coimbatore | 2004 | 52146 | Petrol | Manual | First | 0.00 | NaN | NaN | NaN | 1.93 | Hyundai | Santro | LP zipPlus |
2264 | Pune | 2012 | 24500 | Petrol | Manual | Second | 18.30 | NaN | NaN | NaN | 2.95 | Toyota | Etios | Liva V |
2325 | Pune | 2015 | 67000 | Petrol | Manual | First | 16.10 | NaN | NaN | NaN | 4.70 | Maruti | Swift | 1.3 VXI ABS |
2335 | Mumbai | 2007 | 55000 | Petrol | Manual | Second | 16.10 | NaN | NaN | NaN | 1.75 | Maruti | Swift | 1.3 VXi |
2530 | Kochi | 2014 | 64158 | Diesel | Automatic | First | 18.48 | NaN | NaN | NaN | 17.89 | BMW | 5 | Series 520d Sedan |
2542 | Bangalore | 2011 | 65000 | Petrol | Manual | Second | 0.00 | NaN | NaN | NaN | 3.15 | Hyundai | Santro | GLS II - Euro II |
2623 | Pune | 2012 | 95000 | Diesel | Automatic | Second | 18.48 | NaN | NaN | NaN | 18.00 | BMW | 5 | Series 520d Sedan |
2668 | Kolkata | 2014 | 32986 | Petrol | Manual | First | 16.10 | NaN | NaN | NaN | 4.24 | Maruti | Swift | 1.3 VXi |
2737 | Jaipur | 2001 | 200000 | Petrol | Manual | First | 12.00 | NaN | NaN | NaN | 0.70 | Maruti | Wagon | R Vx |
2780 | Pune | 2009 | 100000 | Petrol | Manual | First | 0.00 | NaN | NaN | NaN | 1.60 | Hyundai | Santro | GLS II - Euro II |
2842 | Bangalore | 2012 | 43000 | Petrol | Manual | First | 0.00 | NaN | NaN | NaN | 3.25 | Hyundai | Santro | GLS II - Euro II |
3272 | Mumbai | 2008 | 81000 | Diesel | Automatic | Second | 18.48 | NaN | NaN | NaN | 10.50 | BMW | 5 | Series 520d Sedan |
3404 | Jaipur | 2006 | 125000 | Petrol | Manual | Fourth & Above | 16.10 | NaN | NaN | NaN | 2.35 | Maruti | Swift | 1.3 VXi |
3520 | Delhi | 2012 | 90000 | Diesel | Automatic | First | 18.48 | NaN | NaN | NaN | 14.50 | BMW | 5 | Series 520d Sedan |
3522 | Kochi | 2012 | 66400 | Petrol | Manual | First | 0.00 | NaN | NaN | NaN | 2.66 | Hyundai | Santro | GLS II - Euro II |
3810 | Kolkata | 2013 | 27000 | Petrol | Automatic | First | 14.00 | NaN | NaN | NaN | 11.99 | Honda | CR-V | AT With Sun Roof |
4011 | Pune | 2011 | 45271 | Diesel | Manual | First | 20.30 | NaN | NaN | NaN | 2.60 | Fiat | Punto | 1.3 Emotion |
4152 | Mumbai | 2003 | 75000 | Diesel | Automatic | Second | 0.00 | NaN | NaN | NaN | 16.11 | Land | Rover | Range Rover 3.0 D |
4229 | Bangalore | 2005 | 79000 | Petrol | Manual | Second | 17.00 | NaN | NaN | NaN | 1.65 | Hyundai | Santro | Xing XG |
4577 | Delhi | 2012 | 72000 | Diesel | Automatic | Third | 18.48 | NaN | NaN | NaN | 13.85 | BMW | 5 | Series 520d Sedan |
4604 | Pune | 2011 | 98000 | Petrol | Manual | First | 16.70 | NaN | NaN | NaN | 3.15 | Honda | Jazz | Select Edition |
4697 | Kochi | 2017 | 17941 | Petrol | Manual | First | 15.70 | NaN | NaN | NaN | 3.93 | Fiat | Punto | 1.2 Dynamic |
4712 | Pune | 2003 | 80000 | Petrol | Manual | Second | 17.00 | NaN | NaN | NaN | 0.90 | Hyundai | Santro | Xing XG |
4952 | Kolkata | 2010 | 47000 | Petrol | Manual | First | 14.60 | NaN | NaN | NaN | 1.49 | Fiat | Punto | 1.4 Emotion |
5015 | Delhi | 2006 | 63000 | Petrol | Manual | First | 16.10 | NaN | NaN | NaN | 1.60 | Maruti | Swift | 1.3 VXi |
5185 | Delhi | 2012 | 52000 | Petrol | Manual | First | 16.10 | NaN | NaN | NaN | 3.65 | Maruti | Swift | 1.3 LXI |
5270 | Bangalore | 2002 | 53000 | Petrol | Manual | Second | 0.00 | NaN | NaN | NaN | 1.85 | Honda | City | 1.5 GXI |
Now, we are going to get the columns with missing values
for n in num_missing.value_counts().sort_index().index:
    if n > 0:
        print(f'Rows with exactly {n} missing values, NAs are found in:')
        # count missing values per column, restricted to rows with exactly n NAs
        n_miss_per_col = data[num_missing == n].isnull().sum()
        print(n_miss_per_col[n_miss_per_col > 0])
        print('\n')
Rows with exactly 1 missing values, NAs are found in:
Mileage      2
Power      103
Seats        2
dtype: int64

Rows with exactly 2 missing values, NAs are found in:
Power    4
Seats    4
dtype: int64

Rows with exactly 3 missing values, NAs are found in:
Engine    36
Power     36
Seats     36
dtype: int64
Now, let's count the missing values per column
# number of missing values per column
data.isnull().sum(axis=0)
Location             0
Year                 0
Kilometers_Driven    0
Fuel_Type            0
Transmission         0
Owner_Type           0
Mileage              2
Engine              36
Power              143
Seats               42
Price                0
Brand                0
Model                0
Specs                0
dtype: int64
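The counts above can be turned into percentages by dividing by the number of rows (6019 after dropping the rows without Price). A minimal sketch, with the counts copied by hand from the output above into a small Series (an illustration, not a recomputation from the data); on the real frame, `data.isnull().mean() * 100` should give the same figures:

```python
import pandas as pd

# Missing-value counts copied from the output above (not recomputed from the data)
missing_counts = pd.Series({"Mileage": 2, "Engine": 36, "Power": 143, "Seats": 42})
n_rows = 6019  # rows remaining after dropping rows with missing Price

# Share of rows with a missing value in each column, in percent
missing_pct = (missing_counts / n_rows * 100).round(2)
print(missing_pct)
```

Every column stays below 3%, with Power the highest at roughly 2.4%.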
- Engine, Power, Seats and Mileage columns have missing values. The Power column has the most: 143 rows (2.4% of rows) with missing values.
- Since the percentage of missing values is lower than 3% for all columns, we are going to impute missing values with k-Nearest Neighbors using KNNImputer.
- We select k-Nearest Neighbors instead of the mean to avoid the influence of outliers in those columns
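To illustrate the difference between the two strategies, here is a small sketch on a made-up frame (the values and the choice of `n_neighbors=2` are assumptions for illustration only). The missing Power belongs to a small-engine car, so its nearest neighbours are the other small-engine cars, while the column mean is pulled up by the one powerful car:

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Toy data: three small cars and one powerful outlier; the last Power is missing
toy = pd.DataFrame({
    "Engine": [1000.0, 1200.0, 3000.0, 1100.0],
    "Power": [60.0, 80.0, 250.0, np.nan],
})

# Mean imputation would use the column mean, which the outlier inflates
mean_fill = toy["Power"].mean()

# KNN imputation averages Power over the 2 rows closest in Engine
knn = KNNImputer(n_neighbors=2)
filled = pd.DataFrame(knn.fit_transform(toy), columns=toy.columns, index=toy.index)
knn_fill = filled.loc[3, "Power"]

print("mean fill:", mean_fill)
print("KNN fill: ", knn_fill)
```

Passing `index=toy.index` keeps the imputed frame aligned with the original rows, which matters whenever the original index is not a plain 0..n-1 range.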
# load KNNImputer
from sklearn.impute import KNNImputer
imputer = KNNImputer()
# create data set with only numeric columns
data_n = data.select_dtypes(include=np.number)
data_n_cols = data_n.columns.tolist()
data_n.info()
<class 'pandas.core.frame.DataFrame'>
Index: 6019 entries, 0 to 6018
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   Year               6019 non-null   int64
 1   Kilometers_Driven  6019 non-null   int64
 2   Mileage            6017 non-null   float64
 3   Engine             5983 non-null   float64
 4   Power              5876 non-null   float64
 5   Seats              5977 non-null   float64
 6   Price              6019 non-null   float64
dtypes: float64(5), int64(2)
memory usage: 376.2 KB
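# impute missing values with KNNImputer
# (the new frame gets a fresh 0..n-1 index, which matches data's index here: 6019 entries, 0 to 6018)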
data_n = pd.DataFrame(imputer.fit_transform(data_n), columns=data_n_cols)
data_n.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6019 entries, 0 to 6018
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   Year               6019 non-null   float64
 1   Kilometers_Driven  6019 non-null   float64
 2   Mileage            6019 non-null   float64
 3   Engine             6019 non-null   float64
 4   Power              6019 non-null   float64
 5   Seats              6019 non-null   float64
 6   Price              6019 non-null   float64
dtypes: float64(7)
memory usage: 329.3 KB
# replace columns with new imputed columns
data['Power'] = data_n['Power']
data['Mileage'] = data_n['Mileage']
data['Engine'] = data_n['Engine']
data['Seats'] = data_n['Seats']
# check that there are no missing values
data.info()
<class 'pandas.core.frame.DataFrame'>
Index: 6019 entries, 0 to 6018
Data columns (total 14 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   Location           6019 non-null   category
 1   Year               6019 non-null   int64
 2   Kilometers_Driven  6019 non-null   int64
 3   Fuel_Type          6019 non-null   category
 4   Transmission       6019 non-null   category
 5   Owner_Type         6019 non-null   category
 6   Mileage            6019 non-null   float64
 7   Engine             6019 non-null   float64
 8   Power              6019 non-null   float64
 9   Seats              6019 non-null   float64
 10  Price              6019 non-null   float64
 11  Brand              6019 non-null   category
 12  Model              6019 non-null   category
 13  Specs              6019 non-null   category
dtypes: category(7), float64(5), int64(2)
memory usage: 520.4 KB
There are no missing values left, so we can continue with the analysis
sns.histplot(data['Kilometers_Driven']);
# distribution of the log transformation
sns.histplot(np.log(data['Kilometers_Driven']));
We can see a very good improvement in the distribution. Now, we are going to create a new column with the log of Kilometers_Driven and drop the Kilometers_Driven column
data['Kilometers_Driven_log'] = np.log(data['Kilometers_Driven'])
data.drop('Kilometers_Driven', axis=1, inplace=True)
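Why the log helps with a right-skewed column can be illustrated on synthetic data. The sketch below draws values from a log-normal distribution (an assumption chosen for illustration, not the actual Kilometers_Driven column) and compares the skewness before and after the transformation:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic right-skewed "kilometers" values (log-normal by construction)
km = pd.Series(rng.lognormal(mean=10.8, sigma=0.7, size=5000))

skew_raw = km.skew()          # strongly positive: long right tail
skew_log = np.log(km).skew()  # close to zero: roughly symmetric

print(f"skew before log: {skew_raw:.2f}")
print(f"skew after log:  {skew_log:.2f}")
```

Real data is rarely exactly log-normal, so the transformed column can still show some skew, but the long right tail is usually compressed.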
# stats for new Kilometers_Driven_log column
data['Kilometers_Driven_log'].describe()
count    6019.000000
mean       10.758780
std         0.715788
min         5.141664
25%        10.434116
50%        10.878047
75%        11.198215
max        15.687313
Name: Kilometers_Driven_log, dtype: float64
data['Kilometers_Driven_log'].skew()
-1.29076524053299
# creating the 2 subplots
f2, (ax_box2, ax_hist2) = plt.subplots(
    nrows=2,  # two rows: boxplot on top, histogram below
    sharex=True,  # x-axis will be shared among all subplots
    gridspec_kw={"height_ratios": (.25, .75)},
);
sns.boxplot(x=data['Kilometers_Driven_log'], ax=ax_box2, showmeans=True, color='violet');  # horizontal boxplot
sns.histplot(data['Kilometers_Driven_log'], kde=True, ax=ax_hist2);  # histogram (sns.distplot is deprecated)
ax_hist2.axvline(np.mean(data['Kilometers_Driven_log']), color='green', linestyle='--');  # add mean to the histogram
ax_hist2.axvline(np.median(data['Kilometers_Driven_log']), color='black', linestyle='-');  # add median to the histogram
There are several values flagged as suspicious by the boxplot for the Kilometers_Driven_log column. There are some outliers above 14, but the rest of the points aren't inconsistent with the overall distribution of the data.
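The points a boxplot flags are those beyond the whiskers, which seaborn places by default at 1.5 times the interquartile range (IQR) from the quartiles. A minimal sketch of that rule on made-up values (the helper name `iqr_outliers` and the numbers are hypothetical, for illustration only):

```python
import pandas as pd

def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)

# Toy log-kilometer values with one extreme reading at the end
values = pd.Series([10.2, 10.5, 10.8, 10.9, 11.0, 11.2, 15.7])
mask = iqr_outliers(values)
print(values[mask])
```

Being flagged by this rule only means a value is far from the bulk of the data; whether it is an error still needs a judgment call, as in the paragraph above.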
8.2 Power¶
Power column is skewed. We are going to use the log transformation to improve the distribution
sns.histplot(data['Power']);
# distribution of the log transformation
sns.histplot(np.log(data['Power']));
We can see an improvement in the distribution. Now, we are going to create a new column with the log of Power and drop the Power column
data['Power_log'] = np.log(data['Power'])
data.drop('Power', axis=1, inplace=True)
# stats for new Power_log column
data['Power_log'].describe()
count    6019.000000
mean        4.635187
std         0.414201
min         3.532226
25%         4.317488
50%         4.543295
75%         4.927978
max         6.327937
Name: Power_log, dtype: float64
data['Power_log'].skew()
0.46088996911606844
# creating the 2 subplots
f2, (ax_box2, ax_hist2) = plt.subplots(
    nrows=2,  # two rows: boxplot on top, histogram below
    sharex=True,  # x-axis will be shared among all subplots
    gridspec_kw={"height_ratios": (.25, .75)},
);
sns.boxplot(x=data['Power_log'], ax=ax_box2, showmeans=True, color='violet');  # horizontal boxplot
sns.histplot(data['Power_log'], kde=True, ax=ax_hist2);  # histogram (sns.distplot is deprecated)
ax_hist2.axvline(np.mean(data['Power_log']), color='green', linestyle='--');  # add mean to the histogram
ax_hist2.axvline(np.median(data['Power_log']), color='black', linestyle='-');  # add median to the histogram
There are several values flagged as suspicious by the boxplot for the Power_log column. However, those points aren't inconsistent with the overall distribution of the data.
8.3 Engine¶
Engine column is skewed. However, the log transformation does not improve the distribution
sns.histplot(data['Engine']);
# distribution of the log transformation
sns.histplot(np.log(data['Engine']));
We do not see an improvement in the distribution, and we are going to keep the original column
8.4 Price¶
Price column is skewed. We are going to use the log transformation to improve the distribution
sns.histplot(data['Price']);
# distribution of the log transformation
sns.histplot(np.log(data['Price']));
We can see an improvement in the distribution. Now, we are going to create a new column with the log of Price and drop the Price column
data['Price_log'] = np.log(data['Price'])
data.drop('Price', axis=1, inplace=True)
# stats for new Price_log column
data['Price_log'].describe()
count    6019.000000
mean        1.825095
std         0.874059
min        -0.820981
25%         1.252763
50%         1.729884
75%         2.297573
max         5.075174
Name: Price_log, dtype: float64
data['Price_log'].skew()
0.4173906918413524
# creating the 2 subplots
f2, (ax_box2, ax_hist2) = plt.subplots(
    nrows=2,  # two rows: boxplot on top, histogram below
    sharex=True,  # x-axis will be shared among all subplots
    gridspec_kw={"height_ratios": (.25, .75)},
);
sns.boxplot(x=data['Price_log'], ax=ax_box2, showmeans=True, color='violet');  # horizontal boxplot
sns.histplot(data['Price_log'], kde=True, ax=ax_hist2);  # histogram (sns.distplot is deprecated)
ax_hist2.axvline(np.mean(data['Price_log']), color='green', linestyle='--');  # add mean to the histogram
ax_hist2.axvline(np.median(data['Price_log']), color='black', linestyle='-');  # add median to the histogram
9 Outliers Treatment¶
9.1 Kilometers_Driven¶
Kilometers_Driven_log has some outliers above 14. We are going to replace those values with the mean
# replacing values above 14 with the mean
data.loc[data['Kilometers_Driven_log']>14,'Kilometers_Driven_log'] = data['Kilometers_Driven_log'].mean()
9.2 Mileage¶
The Mileage column has several rows with a value of zero. We are going to replace those values with the mean
# replacing zeros with the mean
data.loc[data['Mileage']==0,'Mileage'] = data['Mileage'].mean()
# check new distribution
sns.histplot(data['Mileage']);
data['Mileage'].describe()
count    6019.000000
mean       18.340122
std         4.151511
min         6.400000
25%        15.400000
50%        18.150000
75%        21.100000
max        33.540000
Name: Mileage, dtype: float64
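One detail worth noting: `data['Mileage'].mean()` is evaluated while the zeros are still in the column, so the replacement value is pulled down slightly by the very entries being replaced. A hedged sketch of an alternative on a toy Series (not the approach used in this notebook, and the values are made up): mark the zeros as missing first, then fill with the mean of the remaining values.

```python
import numpy as np
import pandas as pd

# Toy mileage column with one invalid zero
mileage = pd.Series([0.0, 15.0, 20.0, 25.0])

# Mean computed with the zero still present
mean_with_zeros = mileage.mean()

# Treat zeros as missing, then use the mean of the valid values only
masked = mileage.replace(0, np.nan)
mean_without_zeros = masked.mean()  # NaN is skipped by default
filled = masked.fillna(mean_without_zeros)

print(mean_with_zeros, mean_without_zeros)
print(filled.tolist())
```

With only a handful of zeros in a column of 6019 rows the difference is small, so either choice is defensible here.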
9.3 Seats¶
There is one car with 0 seats. We are going to replace this value with the mean
data[data['Seats']==0]
| | Location | Year | Fuel_Type | Transmission | Owner_Type | Mileage | Engine | Seats | Brand | Model | Specs | Kilometers_Driven_log | Power_log | Price_log |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3999 | Hyderabad | 2012 | Petrol | Automatic | First | 10.5 | 3197.0 | 0.0 | Audi | A4 | 3.2 FSI Tiptronic Quattro | 11.736069 | 5.084134 | 2.890372 |
# replacing zeros with mean
data.loc[data['Seats']==0,'Seats'] = data['Seats'].mean()
10 Model Building¶
First, we are going to drop column Specs because it has high cardinality (1893 distinct values)
data.drop(['Specs'], axis=1, inplace=True)
# check that there are no missing values and that columns have the correct type
data.info()
<class 'pandas.core.frame.DataFrame'>
Index: 6019 entries, 0 to 6018
Data columns (total 13 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   Location               6019 non-null   category
 1   Year                   6019 non-null   int64
 2   Fuel_Type              6019 non-null   category
 3   Transmission           6019 non-null   category
 4   Owner_Type             6019 non-null   category
 5   Mileage                6019 non-null   float64
 6   Engine                 6019 non-null   float64
 7   Seats                  6019 non-null   float64
 8   Brand                  6019 non-null   category
 9   Model                  6019 non-null   category
 10  Kilometers_Driven_log  6019 non-null   float64
 11  Power_log              6019 non-null   float64
 12  Price_log              6019 non-null   float64
dtypes: category(6), float64(6), int64(1)
memory usage: 429.4 KB
10.1 Define independent and dependent variables¶
ind_vars = data.drop(["Price_log"], axis=1)
dep_var = data[["Price_log"]]
10.2 Creating dummy variables¶
def encode_cat_vars(x):
    # one-hot encode every object/category column, dropping the first level of each
    x = pd.get_dummies(
        x,
        columns=x.select_dtypes(include=["object", "category"]).columns.tolist(),
        drop_first=True,
    )
    return x
ind_vars_num = encode_cat_vars(ind_vars)
ind_vars_num.head()
| | Year | Mileage | Engine | Seats | Kilometers_Driven_log | Power_log | Location_Bangalore | Location_Chennai | Location_Coimbatore | Location_Delhi | ... | Model_Xcent | Model_Xenon | Model_Xylo | Model_Yeti | Model_Z4 | Model_Zen | Model_Zest | Model_i10 | Model_i20 | Model_redi-GO |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2010 | 26.60 | 998.0 | 5.0 | 11.184421 | 4.063198 | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
1 | 2015 | 19.67 | 1582.0 | 5.0 | 10.621327 | 4.837868 | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
2 | 2011 | 18.20 | 1199.0 | 5.0 | 10.736397 | 4.485260 | False | True | False | False | ... | False | False | False | False | False | False | False | False | False | False |
3 | 2012 | 20.77 | 1248.0 | 7.0 | 11.373663 | 4.485936 | False | True | False | False | ... | False | False | False | False | False | False | False | False | False | False |
4 | 2013 | 15.20 | 1968.0 | 5.0 | 10.613246 | 4.947340 | False | False | True | False | ... | False | False | False | False | False | False | False | False | False | False |
5 rows × 274 columns
ind_vars_num.shape
(6019, 274)
The independent set has 6019 rows and 274 columns
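The effect of `drop_first=True` is easier to see on a tiny frame. In this sketch (toy data, for illustration only) a categorical column with three levels becomes two dummy columns; the dropped level, here CNG, acts as the baseline that the other coefficients are measured against:

```python
import pandas as pd

# Toy frame with one numeric and one categorical column
toy = pd.DataFrame({
    "Year": [2010, 2015, 2012],
    "Fuel_Type": pd.Series(["Petrol", "Diesel", "CNG"], dtype="category"),
})

# One-hot encode the categorical column, dropping its first level (CNG)
encoded = pd.get_dummies(toy, columns=["Fuel_Type"], drop_first=True)
print(encoded.columns.tolist())
```

A row with both dummy columns set to False is therefore a CNG car. Dropping one level per variable avoids perfectly collinear columns in the linear model.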
10.3 Split the data into train and test¶
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Create train and test data sets
x_train, x_test, y_train, y_test = train_test_split(
ind_vars_num, dep_var, test_size=0.3, random_state=1
)
# Create a second train/test split (80/20, different random seed)
x_train3, x_test3, y_train3, y_test3 = train_test_split(
ind_vars_num, dep_var, test_size=0.2, random_state=10
)
print("Number of rows in train data =", x_train.shape[0])
print("Number of rows in test data =", x_test.shape[0])
Number of rows in train data = 4213
Number of rows in test data = 1806
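The two row counts follow from the 70/30 split of 6019 rows: scikit-learn rounds the test share up (30% of 6019 is 1805.7, so 1806 rows) and the remainder goes to training. A small sketch that reproduces the sizes on a plain array of row numbers instead of the real data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in for the 6019 rows of the data set
rows = np.arange(6019)

# Same split proportions and seed as above
train_rows, test_rows = train_test_split(rows, test_size=0.3, random_state=1)
print(len(train_rows), len(test_rows))
```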
10.4 Fitting a linear model¶
Now, we are going to run the linear regression using the train data set
# Run Linear Regression
lin_reg_model = LinearRegression()
lin_reg_model.fit(x_train, y_train)
LinearRegression()
10.5 Performance of the model¶
First, we are going to calculate the $R^2$ for the train and test sets
# R^2 train set
lin_reg_model.score(x_train, y_train)
0.9587840089626443
# R^2 test set
lin_reg_model.score(x_test, y_test)
0.959104503710836
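def r2(y,y_predict):
    e = y-y_predict      # prediction errors
    ym = np.mean(y)      # mean of the observed values
    v = y-ym             # deviations from the mean
    e2 = np.sum(e*e)     # sum of squared errors
    v2 = np.sum(v*v)     # total sum of squares
    return 1-(e2/v2)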
r2(y_train,lin_reg_model.predict(x_train))
Price_log    0.958784
dtype: float64
r2(y_test,lin_reg_model.predict(x_test))
Price_log    0.959105
dtype: float64
r2(np.exp(y_train),np.exp(lin_reg_model.predict(x_train)))
Price_log    0.922877
dtype: float64
r2(np.exp(y_test),np.exp(lin_reg_model.predict(x_test)))
Price_log    0.908336
dtype: float64
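The $R^2$ is 0.959 for both the train set (0.9588) and the test set (0.9591). Since the two values are nearly identical, the model does not appear to be overfitting and the fit is very good. On the original price scale (after applying np.exp to the targets and predictions), $R^2$ is 0.923 on the train set and 0.908 on the test set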
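The hand-written `r2` function computes one minus the ratio of the sum of squared errors to the total sum of squares. A small sanity check on made-up numbers (toy values, not model output) shows the same calculation step by step and compares it with scikit-learn's `r2_score`:

```python
import numpy as np
from sklearn.metrics import r2_score

# Toy observed values and predictions
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

sse = np.sum((y_true - y_pred) ** 2)         # sum of squared errors
sst = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2_manual = 1 - sse / sst

print(r2_manual, r2_score(y_true, y_pred))
```

Here the squared errors sum to 0.10 and the total sum of squares is 5, so $R^2 = 1 - 0.10/5 = 0.98$.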
10.5.1 Performance metrics¶
User functions to calculate performance metrics
# To check model performance
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
# Adjusted R^2
def adj_r2(ind_vars, targets, predictions):
    r2 = r2_score(targets, predictions)
    n = ind_vars.shape[0]  # number of observations
    k = ind_vars.shape[1]  # number of predictors
    return 1 - ((1 - r2) * (n - 1) / (n - k - 1))
# Model performance check
def model_perf(model, inp, out):
    y_pred = model.predict(inp)
    y_act = out.values
    # Dictionary with metrics
    metrics = {"RMSE": np.sqrt(mean_squared_error(y_act, y_pred)),
               "MAE": mean_absolute_error(y_act, y_pred),
               "R^2": r2_score(y_act, y_pred),
               "Adjusted R^2": adj_r2(inp, y_act, y_pred)}
    return metrics
# Model performance on train set
model_perf(lin_reg_model, x_train, y_train)
{'RMSE': 0.1770842726656583, 'MAE': 0.12405596763815441, 'R^2': 0.9587840089626443, 'Adjusted R^2': 0.9559162635222594}
# Model performance on test set
model_perf(lin_reg_model, x_test, y_test)
{'RMSE': 0.17745658563106884, 'MAE': 0.12829130869663247, 'R^2': 0.959104503710836, 'Adjusted R^2': 0.9517855187446499}
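The adjusted $R^2$ values can be checked by hand from the numbers already reported: the $R^2$ of each set, its number of rows, and the 274 predictors. A short sketch that plugs those figures into the same formula (the function name `adjusted_r2` is just a local helper for this check):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# R^2 values, row counts and predictor count taken from the outputs above
adj_train = adjusted_r2(0.9587840089626443, n=4213, k=274)
adj_test = adjusted_r2(0.959104503710836, n=1806, k=274)

print(adj_train, adj_test)
```

The penalty is larger on the test set because the same 274 predictors are spread over fewer rows (1806 instead of 4213).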
We can conclude that the model is not overfitting since all metrics are comparable on the train and test sets. On the test set the mean absolute error is 0.128; because the target is Price_log, this error is expressed in log-price units
10.5.2 Residuals distribution¶
Train set residuals¶
Now, we are going to analyze the distribution of the residuals
# train set residuals distribution
residuals_train = lin_reg_model.predict(x_train) - y_train
hplot = sns.histplot(residuals_train, kde=True);
hplot.set_xlim(-1,1);
# scatterplot between residuals and predicted values
# (reuse x_train's index so predictions and residuals are paired row by row)
y_train_predict = pd.DataFrame(lin_reg_model.predict(x_train), columns=['y_predict'], index=x_train.index)
sns.scatterplot(x=y_train_predict['y_predict'], y=residuals_train['Price_log']);
The residuals show no clear pattern against the predicted values, so the model does not appear to violate the assumption of homoscedasticity
Test set residuals¶
residuals_test = lin_reg_model.predict(x_test) - y_test
hplot = sns.histplot(residuals_test, kde=True);
hplot.set_xlim(-1,1);
# scatterplot between residuals and predicted values
# (reuse x_test's index so predictions and residuals are paired row by row)
y_test_predict = pd.DataFrame(lin_reg_model.predict(x_test), columns=['y_predict'], index=x_test.index)
sns.scatterplot(x=y_test_predict['y_predict'], y=residuals_test['Price_log']);
As with the train set, the residuals show no clear pattern against the predicted values, so the model does not appear to violate the assumption of homoscedasticity
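A visual check can be complemented with a crude numeric one: if the spread of the residuals grows with the predicted value, the absolute residuals will correlate with the predictions. The sketch below uses synthetic data only (both residual series are simulated, and the helper `spread_trend` is hypothetical); it is not a formal test such as Breusch-Pagan.

```python
import numpy as np

rng = np.random.default_rng(0)
fitted = rng.uniform(0, 5, size=2000)  # synthetic "predicted" values

# Case 1: constant error variance (homoscedastic)
resid_const = rng.normal(0, 0.2, size=2000)
# Case 2: error spread grows with the fitted value (heteroscedastic)
resid_growing = rng.normal(0, 1, size=2000) * 0.1 * (1 + fitted)

def spread_trend(fitted_values, residuals):
    """Pearson correlation between |residual| and the fitted value."""
    return np.corrcoef(fitted_values, np.abs(residuals))[0, 1]

corr_const = spread_trend(fitted, resid_const)
corr_growing = spread_trend(fitted, resid_growing)
print(f"constant spread: {corr_const:.2f}")
print(f"growing spread:  {corr_growing:.2f}")
```

A value near zero is consistent with constant variance, while a clearly positive value suggests a funnel shape in the scatter plot.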
10.6 Coefficients and Intercept of the model¶
# Create data frame with coefficients
coef_df = pd.DataFrame(
np.append(lin_reg_model.coef_.flatten(), lin_reg_model.intercept_),
index=x_train.columns.tolist() + ["Intercept"],
columns=["Coefficients"],
)
# Display all coefficients
pd.set_option('display.max_rows', coef_df.shape[0]+1)
coef_df
| | Coefficients |
---|---|
Year | 1.061775e-01 |
Mileage | 1.316338e-03 |
Engine | -4.557528e-05 |
Seats | -1.807394e-04 |
Kilometers_Driven_log | -7.715894e-02 |
Power_log | 3.782705e-01 |
Location_Bangalore | 1.767534e-01 |
Location_Chennai | 5.678747e-02 |
Location_Coimbatore | 1.474313e-01 |
Location_Delhi | -8.008404e-02 |
Location_Hyderabad | 1.564025e-01 |
Location_Jaipur | -1.610073e-02 |
Location_Kochi | -1.353527e-02 |
Location_Kolkata | -2.176047e-01 |
Location_Mumbai | -5.679099e-02 |
Location_Pune | -2.156988e-02 |
Fuel_Type_Diesel | 4.966604e-02 |
Fuel_Type_Electric | 3.062401e-01 |
Fuel_Type_LPG | -1.187979e-02 |
Fuel_Type_Petrol | -6.045009e-02 |
Transmission_Manual | -1.121924e-01 |
Owner_Type_Fourth & Above | -1.194041e-01 |
Owner_Type_Second | -5.739302e-02 |
Owner_Type_Third | -1.684767e-01 |
Brand_Audi | 5.774404e-01 |
Brand_BMW | 1.069879e-01 |
Brand_Bentley | 9.889603e-01 |
Brand_Chevrolet | -7.607063e-01 |
Brand_Datsun | -1.033328e+00 |
Brand_Fiat | -9.095028e-01 |
Brand_Force | -4.784993e-02 |
Brand_Ford | -7.306453e-01 |
Brand_Hindustan | 3.052961e-12 |
Brand_Honda | -5.494955e-01 |
Brand_Hyundai | -1.140057e+00 |
Brand_ISUZU | -3.463250e-01 |
Brand_Isuzu | 8.891776e-13 |
Brand_Jaguar | 8.123010e-01 |
Brand_Jeep | -1.848095e-02 |
Brand_Lamborghini | 1.184292e+00 |
Brand_Land | 4.192147e-01 |
Brand_Mahindra | -5.890545e-01 |
Brand_Maruti | -6.364073e-01 |
Brand_Mercedes-Benz | 6.110158e-01 |
Brand_Mini | 3.724754e-01 |
Brand_Mitsubishi | -6.988398e-02 |
Brand_Nissan | -4.932404e-01 |
Brand_OpelCorsa | -6.200596e-13 |
Brand_Porsche | 7.997953e-01 |
Brand_Renault | -6.041900e-01 |
Brand_Skoda | -3.930683e-01 |
Brand_Smart | -2.371821e-01 |
Brand_Tata | -5.277481e-01 |
Brand_Toyota | 1.267729e-02 |
Brand_Volkswagen | -4.056313e-01 |
Brand_Volvo | 2.399334e-01 |
Model_1.4Gsi | -1.327410e-14 |
Model_1000 | 2.902123e-13 |
Model_3 | 1.064970e-01 |
Model_370Z | -1.439404e-13 |
Model_5 | 4.035985e-01 |
Model_6 | 1.121772e+00 |
Model_7 | 8.506502e-01 |
Model_800 | -6.486312e-01 |
Model_A | -4.333749e-01 |
Model_A-Star | -2.464424e-01 |
Model_A3 | -3.709362e-01 |
Model_A4 | -2.706989e-01 |
Model_A6 | -1.468467e-01 |
Model_A7 | 4.505197e-01 |
Model_A8 | 2.214500e-01 |
Model_Abarth | -5.806466e-14 |
Model_Accent | 7.664687e-02 |
Model_Accord | 1.786603e-01 |
Model_Alto | -4.737990e-01 |
Model_Amaze | -2.578199e-01 |
Model_Ameo | -3.923683e-01 |
Model_Aspire | 1.004053e-01 |
Model_Aveo | -3.457447e-01 |
Model_Avventura | 1.180748e-01 |
Model_B | -4.471035e-01 |
Model_BR-V | -6.853318e-02 |
Model_BRV | 2.825778e-02 |
Model_Baleno | -1.021670e-01 |
Model_Beat | -3.622543e-01 |
Model_Beetle | -2.145506e-14 |
Model_Bolero | 1.954131e-01 |
Model_Bolt | -6.321259e-01 |
Model_Boxster | 4.052314e-15 |
Model_Brio | -3.555369e-01 |
Model_C-Class | -3.104868e-01 |
Model_CLA | -2.939341e-01 |
Model_CLS-Class | 3.338558e-01 |
Model_CR-V | 4.936109e-01 |
Model_Camry | 2.003381e-01 |
Model_Captiva | 1.795402e-01 |
Model_Captur | 1.181568e-01 |
Model_Cayenne | -4.493680e-01 |
Model_Cayman | 5.211655e-01 |
Model_Cedia | -4.966320e-01 |
Model_Celerio | -3.286449e-01 |
Model_Ciaz | 1.128385e-01 |
Model_City | 1.452770e-02 |
Model_Civic | -8.820436e-02 |
Model_Classic | -3.442637e-01 |
Model_Clubman | 2.810297e-01 |
Model_Compass | -1.848095e-02 |
Model_Continental | 9.889603e-01 |
Model_Cooper | 1.988535e-01 |
Model_Corolla | -3.308798e-01 |
Model_Countryman | -1.074077e-01 |
Model_Creta | 8.796960e-01 |
Model_CrossPolo | -3.654663e-01 |
Model_Cruze | 8.068924e-02 |
Model_D-MAX | -3.463250e-01 |
Model_Duster | 1.221770e-01 |
Model_Dzire | -4.513748e-02 |
Model_E | 6.661338e-16 |
Model_E-Class | -1.518833e-01 |
Model_EON | 3.452707e-02 |
Model_EcoSport | 1.588931e-01 |
Model_Ecosport | 2.573158e-01 |
Model_Eeco | -4.662428e-01 |
Model_Elantra | 8.812794e-01 |
Model_Elite | 5.396578e-01 |
Model_Endeavour | 7.964040e-01 |
Model_Enjoy | -1.127670e-01 |
Model_Ertiga | 2.135297e-01 |
Model_Esteem | -6.439154e-01 |
Model_Estilo | -3.688273e-01 |
Model_Etios | -7.384430e-01 |
Model_Evalia | -4.024599e-01 |
Model_F | 5.949628e-01 |
Model_Fabia | -5.627410e-01 |
Model_Fiesta | -7.866301e-02 |
Model_Figo | -1.827555e-01 |
Model_Fluence | -6.255634e-02 |
Model_Flying | -1.110223e-16 |
Model_Fortuner | 3.763759e-01 |
Model_Fortwo | -2.371821e-01 |
Model_Freestyle | 4.682399e-01 |
Model_Fusion | 1.819050e-01 |
Model_GL-Class | 4.952753e-01 |
Model_GLA | -2.212246e-01 |
Model_GLC | 1.174502e-01 |
Model_GLE | 3.154723e-01 |
Model_GLS | 4.387667e-01 |
Model_GO | -2.105149e-01 |
Model_Gallardo | 1.184292e+00 |
Model_Getz | 1.771284e-01 |
Model_Grand | 2.571492e-01 |
Model_Grande | -1.834601e-01 |
Model_Hexa | 3.285831e-01 |
Model_Ignis | -3.791245e-01 |
Model_Ikon | -2.971519e-01 |
Model_Indica | -8.287199e-01 |
Model_Indigo | -7.456837e-01 |
Model_Innova | 8.701657e-02 |
Model_Jazz | -1.622787e-01 |
Model_Jeep | 1.181196e-01 |
Model_Jetta | 1.227673e-01 |
Model_KUV | -4.220742e-01 |
Model_KWID | -6.321051e-01 |
Model_Koleos | 3.891056e-01 |
Model_Lancer | -1.172340e-01 |
Model_Land | -2.775558e-16 |
Model_Laura | -1.106812e-01 |
Model_Linea | 6.946507e-03 |
Model_Lodgy | 1.736719e-03 |
Model_Logan | -5.156677e-01 |
Model_M-Class | 1.294610e-01 |
Model_MU | 4.163336e-17 |
Model_MUX | 2.706169e-16 |
Model_Manza | -6.278130e-01 |
Model_Micra | -3.797234e-01 |
Model_Mobilio | -1.055585e-01 |
Model_Montero | 3.367374e-01 |
Model_Motors | 9.714451e-17 |
Model_Mustang | 1.576728e+00 |
Model_Nano | -1.066021e+00 |
Model_New | -2.606214e-01 |
Model_Nexon | -3.983023e-02 |
Model_NuvoSport | -3.193562e-01 |
Model_Octavia | 1.391406e-01 |
Model_Omni | -4.721723e-01 |
Model_One | -4.784993e-02 |
Model_Optra | -1.671785e-01 |
Model_Outlander | -6.370752e-02 |
Model_Pajero | 2.709521e-01 |
Model_Panamera | 7.279978e-01 |
Model_Passat | 5.778024e-02 |
Model_Petra | -2.967823e-01 |
Model_Platinum | 8.326673e-17 |
Model_Polo | -3.349437e-01 |
Model_Prius | 3.062401e-01 |
Model_Pulse | -2.651567e-01 |
Model_Punto | -2.652687e-01 |
Model_Q3 | -2.577470e-01 |
Model_Q5 | 4.141793e-02 |
Model_Q7 | 2.518171e-01 |
Model_Qualis | 1.120295e-01 |
Model_Quanto | -3.687409e-01 |
Model_R-Class | 6.386756e-02 |
Model_RS5 | 3.304247e-01 |
Model_Rapid | -2.682940e-01 |
Model_Redi | -4.869040e-01 |
Model_Renault | -2.098534e-01 |
Model_Ritz | -2.182836e-01 |
Model_Rover | 4.192147e-01 |
Model_S | 1.629848e-01 |
Model_S-Class | 2.594994e-01 |
Model_S-Cross | 1.441084e-01 |
Model_S60 | -1.927748e-02 |
Model_S80 | -2.712951e-01 |
Model_SL-Class | 7.085461e-01 |
Model_SLC | 2.621350e-01 |
Model_SLK-Class | 4.748246e-01 |
Model_SX4 | -9.882907e-02 |
Model_Safari | 2.992587e-02 |
Model_Sail | -1.856540e-01 |
Model_Santa | 1.029070e+00 |
Model_Santro | 1.564984e-01 |
Model_Scala | -2.755480e-01 |
Model_Scorpio | 2.628888e-01 |
Model_Siena | -2.890131e-01 |
Model_Sonata | 8.564400e-01 |
Model_Spark | -4.065682e-01 |
Model_Ssangyong | 4.359764e-01 |
Model_Sumo | -9.638676e-02 |
Model_Sunny | -2.104897e-01 |
Model_Superb | 2.042283e-01 |
Model_Swift | -4.020150e-02 |
Model_TT | 3.280398e-01 |
Model_TUV | -8.666424e-02 |
Model_Tavera | 5.592310e-01 |
Model_Teana | 6.672175e-02 |
Model_Terrano | -1.826751e-02 |
Model_Thar | 6.232812e-02 |
Model_Tiago | -6.192760e-01 |
Model_Tigor | -4.091198e-01 |
Model_Tiguan | 7.408846e-01 |
Model_Tucson | 8.083781e-01 |
Model_V40 | 6.752698e-02 |
Model_Vento | -2.342850e-01 |
Model_Venture | -4.374467e-01 |
Model_Verito | -2.849063e-01 |
Model_Verna | 5.399973e-01 |
Model_Versa | -1.978553e-02 |
Model_Vitara | 1.270174e-01 |
Model_WR-V | -1.783798e-01 |
Model_WRV | -4.824091e-02 |
Model_Wagon | -2.844928e-01 |
Model_X-Trail | 4.509783e-01 |
Model_X1 | 1.142903e-01 |
Model_X3 | 4.470527e-01 |
Model_X5 | 7.394769e-01 |
Model_X6 | 9.909875e-01 |
Model_XC60 | 5.142907e-02 |
Model_XC90 | 4.115498e-01 |
Model_XE | 0.000000e+00 |
Model_XF | -2.061667e-01 |
Model_XJ | 4.235049e-01 |
Model_XUV300 | 3.627486e-01 |
Model_XUV500 | 3.739087e-01 |
Model_Xcent | 2.875080e-01 |
Model_Xenon | -4.504703e-01 |
Model_Xylo | -1.931750e-01 |
Model_Yeti | 2.052791e-01 |
Model_Z4 | 9.327412e-01 |
Model_Zen | -3.685581e-01 |
Model_Zest | -4.196436e-01 |
Model_i10 | 2.949555e-01 |
Model_i20 | 4.661499e-01 |
Model_redi-GO | -3.359091e-01 |
Intercept | -2.122822e+02 |
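Because the target is Price_log, each coefficient acts on the log of the price. For a dummy variable or a unit change in a numeric predictor, a coefficient b corresponds to multiplying the predicted price by exp(b), holding the other variables fixed and relative to the dropped baseline category. A minimal sketch of that conversion, using three coefficients copied from the table above:

```python
import numpy as np

# Coefficients copied from the table above (log-price scale)
coefs = {
    "Transmission_Manual": -1.121924e-01,
    "Year": 1.061775e-01,
    "Location_Kolkata": -2.176047e-01,
}

# Approximate percentage change in predicted price per unit change
pct_change = {name: (np.exp(b) - 1) * 100 for name, b in coefs.items()}

for name, pct in pct_change.items():
    print(f"{name}: {pct:+.1f}%")
```

Under this reading, a manual transmission is associated with a predicted price about 10.6% lower than an automatic, and each additional model year with a price about 11.2% higher. For predictors that are themselves logged (Kilometers_Driven_log, Power_log) the coefficient is instead read as an elasticity: a 1% increase in kilometers driven goes with roughly a 0.08% lower predicted price. The coefficients of magnitude 1e-12 or smaller are numerically zero.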
coef_df[coef_df['Coefficients']>0].sort_values(by='Coefficients', ascending=False)
| | Coefficients |
---|---|
Model_Mustang | 1.576728e+00 |
Brand_Lamborghini | 1.184292e+00 |
Model_Gallardo | 1.184292e+00 |
Model_6 | 1.121772e+00 |
Model_Santa | 1.029070e+00 |
Model_X6 | 9.909875e-01 |
Brand_Bentley | 9.889603e-01 |
Model_Continental | 9.889603e-01 |
Model_Z4 | 9.327412e-01 |
Model_Elantra | 8.812794e-01 |
Model_Creta | 8.796960e-01 |
Model_Sonata | 8.564400e-01 |
Model_7 | 8.506502e-01 |
Brand_Jaguar | 8.123010e-01 |
Model_Tucson | 8.083781e-01 |
Brand_Porsche | 7.997953e-01 |
Model_Endeavour | 7.964040e-01 |
Model_Tiguan | 7.408846e-01 |
Model_X5 | 7.394769e-01 |
Model_Panamera | 7.279978e-01 |
Model_SL-Class | 7.085461e-01 |
Brand_Mercedes-Benz | 6.110158e-01 |
Model_F | 5.949628e-01 |
Brand_Audi | 5.774404e-01 |
Model_Tavera | 5.592310e-01 |
Model_Verna | 5.399973e-01 |
Model_Elite | 5.396578e-01 |
Model_Cayman | 5.211655e-01 |
Model_GL-Class | 4.952753e-01 |
Model_CR-V | 4.936109e-01 |
Model_SLK-Class | 4.748246e-01 |
Model_Freestyle | 4.682399e-01 |
Model_i20 | 4.661499e-01 |
Model_X-Trail | 4.509783e-01 |
Model_A7 | 4.505197e-01 |
Model_X3 | 4.470527e-01 |
Model_GLS | 4.387667e-01 |
Model_Ssangyong | 4.359764e-01 |
Model_XJ | 4.235049e-01 |
Brand_Land | 4.192147e-01 |
Model_Rover | 4.192147e-01 |
Model_XC90 | 4.115498e-01 |
Model_5 | 4.035985e-01 |
Model_Koleos | 3.891056e-01 |
Power_log | 3.782705e-01 |
Model_Fortuner | 3.763759e-01 |
Model_XUV500 | 3.739087e-01 |
Brand_Mini | 3.724754e-01 |
Model_XUV300 | 3.627486e-01 |
Model_Montero | 3.367374e-01 |
Model_CLS-Class | 3.338558e-01 |
Model_RS5 | 3.304247e-01 |
Model_Hexa | 3.285831e-01 |
Model_TT | 3.280398e-01 |
Model_GLE | 3.154723e-01 |
Fuel_Type_Electric | 3.062401e-01 |
Model_Prius | 3.062401e-01 |
Model_i10 | 2.949555e-01 |
Model_Xcent | 2.875080e-01 |
Model_Clubman | 2.810297e-01 |
Model_Pajero | 2.709521e-01 |
Model_Scorpio | 2.628888e-01 |
Model_SLC | 2.621350e-01 |
Model_S-Class | 2.594994e-01 |
Model_Ecosport | 2.573158e-01 |
Model_Grand | 2.571492e-01 |
Model_Q7 | 2.518171e-01 |
Brand_Volvo | 2.399334e-01 |
Model_A8 | 2.214500e-01 |
Model_Ertiga | 2.135297e-01 |
Model_Yeti | 2.052791e-01 |
Model_Superb | 2.042283e-01 |
Model_Camry | 2.003381e-01 |
Model_Cooper | 1.988535e-01 |
Model_Bolero | 1.954131e-01 |
Model_Fusion | 1.819050e-01 |
Model_Captiva | 1.795402e-01 |
Model_Accord | 1.786603e-01 |
Model_Getz | 1.771284e-01 |
Location_Bangalore | 1.767534e-01 |
Model_S | 1.629848e-01 |
Model_EcoSport | 1.588931e-01 |
Model_Santro | 1.564984e-01 |
Location_Hyderabad | 1.564025e-01 |
Location_Coimbatore | 1.474313e-01 |
Model_S-Cross | 1.441084e-01 |
Model_Octavia | 1.391406e-01 |
Model_M-Class | 1.294610e-01 |
Model_Vitara | 1.270174e-01 |
Model_Jetta | 1.227673e-01 |
Model_Duster | 1.221770e-01 |
Model_Captur | 1.181568e-01 |
Model_Jeep | 1.181196e-01 |
Model_Avventura | 1.180748e-01 |
Model_GLC | 1.174502e-01 |
Model_X1 | 1.142903e-01 |
Model_Ciaz | 1.128385e-01 |
Model_Qualis | 1.120295e-01 |
Brand_BMW | 1.069879e-01 |
Model_3 | 1.064970e-01 |
Year | 1.061775e-01 |
Model_Aspire | 1.004053e-01 |
Model_Innova | 8.701657e-02 |
Model_Cruze | 8.068924e-02 |
Model_Accent | 7.664687e-02 |
Model_V40 | 6.752698e-02 |
Model_Teana | 6.672175e-02 |
Model_R-Class | 6.386756e-02 |
Model_Thar | 6.232812e-02 |
Model_Passat | 5.778024e-02 |
Location_Chennai | 5.678747e-02 |
Model_XC60 | 5.142907e-02 |
Fuel_Type_Diesel | 4.966604e-02 |
Model_Q5 | 4.141793e-02 |
Model_EON | 3.452707e-02 |
Model_Safari | 2.992587e-02 |
Model_BRV | 2.825778e-02 |
Model_City | 1.452770e-02 |
Brand_Toyota | 1.267729e-02 |
Model_Linea | 6.946507e-03 |
Model_Lodgy | 1.736719e-03 |
Mileage | 1.316338e-03 |
Brand_Hindustan | 3.052961e-12 |
Brand_Isuzu | 8.891776e-13 |
Model_1000 | 2.902123e-13 |
Model_Boxster | 4.052314e-15 |
Model_E | 6.661338e-16 |
Model_MUX | 2.706169e-16 |
Model_Motors | 9.714451e-17 |
Model_Platinum | 8.326673e-17 |
Model_MU | 4.163336e-17 |
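Two patterns in the table above are worth flagging before reading too much into individual rows. The last entries have coefficients of order 1e-12 or smaller, which most likely means they are numerically zero (for example, dummy columns with no information in the training split; this is an assumption, not something verified here). Some pairs, such as Brand_Land / Model_Rover and Fuel_Type_Electric / Model_Prius, have identical coefficients, which suggests the two columns may take the same values in the training data. The sketch below uses a handful of rows copied from the table; the tolerance `TOLERANCE` and the helper names are hypothetical choices made for illustration.

```python
from collections import defaultdict

# A few rows copied from the coefficient table above (not the full model).
coefficients = {
    "Brand_Land": 4.192147e-01,
    "Model_Rover": 4.192147e-01,
    "Fuel_Type_Electric": 3.062401e-01,
    "Model_Prius": 3.062401e-01,
    "Power_log": 3.782705e-01,
    "Year": 1.061775e-01,
    "Brand_Hindustan": 3.052961e-12,
    "Model_Boxster": 4.052314e-15,
}

# Assumed cut-off: anything smaller in absolute value is treated as zero.
TOLERANCE = 1e-8


def split_negligible(coefs, tol=TOLERANCE):
    """Split coefficients into (meaningful, negligible) by absolute size."""
    meaningful = {k: v for k, v in coefs.items() if abs(v) >= tol}
    negligible = {k: v for k, v in coefs.items() if abs(v) < tol}
    return meaningful, negligible


def identical_groups(coefs, decimals=7):
    """Group features whose coefficients agree to `decimals` places."""
    groups = defaultdict(list)
    for name, value in coefs.items():
        groups[round(value, decimals)].append(name)
    return [names for names in groups.values() if len(names) > 1]


meaningful, negligible = split_negligible(coefficients)
duplicates = identical_groups(meaningful)

print("Numerically zero:", sorted(negligible))
print("Identical coefficients:", duplicates)
```

On the full `coef_df`, the same zero check could be written as a boolean mask on `coef_df['Coefficients'].abs()`; whether the flagged columns should be dropped depends on how the dummies were built earlier in the notebook.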
Negative impact¶
This is the list of coefficients with a negative impact on price. Among them are Kilometers_Driven_log, Engine and Seats: an increase in any of these is associated with a lower predicted price.
coef_df[coef_df['Coefficients']<0].sort_values(by='Coefficients')
Feature | Coefficients |
---|---|
Intercept | -2.122822e+02 |
Brand_Hyundai | -1.140057e+00 |
Model_Nano | -1.066021e+00 |
Brand_Datsun | -1.033328e+00 |
Brand_Fiat | -9.095028e-01 |
Model_Indica | -8.287199e-01 |
Brand_Chevrolet | -7.607063e-01 |
Model_Indigo | -7.456837e-01 |
Model_Etios | -7.384430e-01 |
Brand_Ford | -7.306453e-01 |
Model_800 | -6.486312e-01 |
Model_Esteem | -6.439154e-01 |
Brand_Maruti | -6.364073e-01 |
Model_Bolt | -6.321259e-01 |
Model_KWID | -6.321051e-01 |
Model_Manza | -6.278130e-01 |
Model_Tiago | -6.192760e-01 |
Brand_Renault | -6.041900e-01 |
Brand_Mahindra | -5.890545e-01 |
Model_Fabia | -5.627410e-01 |
Brand_Honda | -5.494955e-01 |
Brand_Tata | -5.277481e-01 |
Model_Logan | -5.156677e-01 |
Model_Cedia | -4.966320e-01 |
Brand_Nissan | -4.932404e-01 |
Model_Redi | -4.869040e-01 |
Model_Alto | -4.737990e-01 |
Model_Omni | -4.721723e-01 |
Model_Eeco | -4.662428e-01 |
Model_Xenon | -4.504703e-01 |
Model_Cayenne | -4.493680e-01 |
Model_B | -4.471035e-01 |
Model_Venture | -4.374467e-01 |
Model_A | -4.333749e-01 |
Model_KUV | -4.220742e-01 |
Model_Zest | -4.196436e-01 |
Model_Tigor | -4.091198e-01 |
Model_Spark | -4.065682e-01 |
Brand_Volkswagen | -4.056313e-01 |
Model_Evalia | -4.024599e-01 |
Brand_Skoda | -3.930683e-01 |
Model_Ameo | -3.923683e-01 |
Model_Micra | -3.797234e-01 |
Model_Ignis | -3.791245e-01 |
Model_A3 | -3.709362e-01 |
Model_Estilo | -3.688273e-01 |
Model_Quanto | -3.687409e-01 |
Model_Zen | -3.685581e-01 |
Model_CrossPolo | -3.654663e-01 |
Model_Beat | -3.622543e-01 |
Model_Brio | -3.555369e-01 |
Model_D-MAX | -3.463250e-01 |
Brand_ISUZU | -3.463250e-01 |
Model_Aveo | -3.457447e-01 |
Model_Classic | -3.442637e-01 |
Model_redi-GO | -3.359091e-01 |
Model_Polo | -3.349437e-01 |
Model_Corolla | -3.308798e-01 |
Model_Celerio | -3.286449e-01 |
Model_NuvoSport | -3.193562e-01 |
Model_C-Class | -3.104868e-01 |
Model_Ikon | -2.971519e-01 |
Model_Petra | -2.967823e-01 |
Model_CLA | -2.939341e-01 |
Model_Siena | -2.890131e-01 |
Model_Verito | -2.849063e-01 |
Model_Wagon | -2.844928e-01 |
Model_Scala | -2.755480e-01 |
Model_S80 | -2.712951e-01 |
Model_A4 | -2.706989e-01 |
Model_Rapid | -2.682940e-01 |
Model_Punto | -2.652687e-01 |
Model_Pulse | -2.651567e-01 |
Model_New | -2.606214e-01 |
Model_Amaze | -2.578199e-01 |
Model_Q3 | -2.577470e-01 |
Model_A-Star | -2.464424e-01 |
Brand_Smart | -2.371821e-01 |
Model_Fortwo | -2.371821e-01 |
Model_Vento | -2.342850e-01 |
Model_GLA | -2.212246e-01 |
Model_Ritz | -2.182836e-01 |
Location_Kolkata | -2.176047e-01 |
Model_GO | -2.105149e-01 |
Model_Sunny | -2.104897e-01 |
Model_Renault | -2.098534e-01 |
Model_XF | -2.061667e-01 |
Model_Xylo | -1.931750e-01 |
Model_Sail | -1.856540e-01 |
Model_Grande | -1.834601e-01 |
Model_Figo | -1.827555e-01 |
Model_WR-V | -1.783798e-01 |
Owner_Type_Third | -1.684767e-01 |
Model_Optra | -1.671785e-01 |
Model_Jazz | -1.622787e-01 |
Model_E-Class | -1.518833e-01 |
Model_A6 | -1.468467e-01 |
Owner_Type_Fourth & Above | -1.194041e-01 |
Model_Lancer | -1.172340e-01 |
Model_Enjoy | -1.127670e-01 |
Transmission_Manual | -1.121924e-01 |
Model_Laura | -1.106812e-01 |
Model_Countryman | -1.074077e-01 |
Model_Mobilio | -1.055585e-01 |
Model_Baleno | -1.021670e-01 |
Model_SX4 | -9.882907e-02 |
Model_Sumo | -9.638676e-02 |
Model_Civic | -8.820436e-02 |
Model_TUV | -8.666424e-02 |
Location_Delhi | -8.008404e-02 |
Model_Fiesta | -7.866301e-02 |
Kilometers_Driven_log | -7.715894e-02 |
Brand_Mitsubishi | -6.988398e-02 |
Model_BR-V | -6.853318e-02 |
Model_Outlander | -6.370752e-02 |
Model_Fluence | -6.255634e-02 |
Fuel_Type_Petrol | -6.045009e-02 |
Owner_Type_Second | -5.739302e-02 |
Location_Mumbai | -5.679099e-02 |
Model_WRV | -4.824091e-02 |
Brand_Force | -4.784993e-02 |
Model_One | -4.784993e-02 |
Model_Dzire | -4.513748e-02 |
Model_Swift | -4.020150e-02 |
Model_Nexon | -3.983023e-02 |
Location_Pune | -2.156988e-02 |
Model_Versa | -1.978553e-02 |
Model_S60 | -1.927748e-02 |
Model_Compass | -1.848095e-02 |
Brand_Jeep | -1.848095e-02 |
Model_Terrano | -1.826751e-02 |
Location_Jaipur | -1.610073e-02 |
Location_Kochi | -1.353527e-02 |
Fuel_Type_LPG | -1.187979e-02 |
Seats | -1.807394e-04 |
Engine | -4.557528e-05 |
Brand_OpelCorsa | -6.200596e-13 |
Model_370Z | -1.439404e-13 |
Model_Abarth | -5.806466e-14 |
Model_Beetle | -2.145506e-14 |
Model_1.4Gsi | -1.327410e-14 |
Model_Land | -2.775558e-16 |
Model_Flying | -1.110223e-16 |
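The raw coefficients in the two tables above are easier to read as percentage effects. The sketch below assumes that the model's target is the natural log of Price and that Power_log is the natural log of Power; neither is confirmed in this section, so treat the numbers as illustrative. Under that assumption, a dummy coefficient b corresponds to a relative price change of exp(b) - 1 against the omitted baseline category, and a log-log coefficient acts as an elasticity. The helper name `pct_change_for_unit_step` is hypothetical.

```python
import math

# Coefficients copied from the tables above.
dummy_coefs = {
    "Transmission_Manual": -1.121924e-01,
    "Location_Kolkata": -2.176047e-01,
    "Location_Bangalore": 1.767534e-01,
    "Owner_Type_Third": -1.684767e-01,
}
year_coef = 1.061775e-01
power_log_coef = 3.782705e-01


def pct_change_for_unit_step(coef):
    """Percent change in price for a one-unit increase in the feature.

    Assumes the regression target is log(Price); for a dummy variable the
    change is relative to the omitted baseline category, with all other
    features held fixed.
    """
    return (math.exp(coef) - 1.0) * 100.0


for name, coef in dummy_coefs.items():
    print(f"{name}: {pct_change_for_unit_step(coef):+.1f}%")

# One additional model year.
print(f"Year (+1): {pct_change_for_unit_step(year_coef):+.1f}%")

# Log-log term: effect of 10% more power, assuming Power_log = ln(Power).
power_effect = (1.10 ** power_log_coef - 1.0) * 100.0
print(f"Power (+10%): {power_effect:+.1f}%")
```

These effects are conditional on every other feature in the model, including the Brand and Model dummies, so they should not be read as standalone market averages.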
10.6.2 Analysis of coefficients¶
- As expected, Year has a positive impact on Price: newer cars sell for more.
- As expected, Kilometers_Driven_log has a negative impact on Price.
- Power_log has a positive impact on Price.
- Seats and Engine have a negative impact on Price, although both coefficients are small (about -1.8e-04 per seat and -4.6e-05 per CC).
- Some locations have a positive impact on Price: Bangalore, Chennai, Coimbatore and Hyderabad.
- Other locations have a negative impact on Price: Delhi, Jaipur, Kochi, Kolkata, Mumbai and Pune.
- Diesel and Electric fuel types have a positive impact on Price.
- LPG and Petrol fuel types have a negative impact on Price.
- Manual transmission has a negative impact on Price.
- Second, Third, and Fourth & Above owner types have a negative impact on Price.
- The Brand and Model of luxury cars (Lamborghini, Jaguar, Porsche, etc.) have a strong positive impact on Price.
- Economy brands and models (Datsun, Renault, Honda, Mahindra) have a negative impact on Price.
- There are 274 independent variables, which makes it difficult to identify every variable with a strong relationship with Price. Therefore, we will select a subset of important features with forward feature selection, using SequentialFeatureSelector.
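from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.linear_model import LinearRegression

reg = LinearRegression()
# Build step forward feature selection
sfs = SFS(
    reg,
    k_features=x_train.shape[1],  # number of features to select (here: all of them)
    forward=True,  # add one feature at a time
    floating=False,  # plain SFS, no conditional removal step
    scoring="r2",
    n_jobs=-1,
    verbose=2,
    cv=5,
)
# Run forward selection (floating=False, so this is SFS rather than SFFS)
sfs = sfs.fit(x_train, y_train)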
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 3.8s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 7.5s [Parallel(n_jobs=-1)]: Done 274 out of 274 | elapsed: 12.2s finished [2024-10-30 15:22:54] Features: 1/274 -- score: 0.6008605735778773[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 228 tasks | elapsed: 4.8s [Parallel(n_jobs=-1)]: Done 242 out of 273 | elapsed: 5.0s remaining: 0.6s [Parallel(n_jobs=-1)]: Done 273 out of 273 | elapsed: 5.9s finished [2024-10-30 15:23:00] Features: 2/274 -- score: 0.8258189338417044[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 272 out of 272 | elapsed: 5.9s finished [2024-10-30 15:23:06] Features: 3/274 -- score: 0.843163289844291[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 240 out of 271 | elapsed: 5.0s remaining: 0.6s [Parallel(n_jobs=-1)]: Done 271 out of 271 | elapsed: 5.8s finished [2024-10-30 15:23:12] Features: 4/274 -- score: 0.8607242290453196[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 270 out of 270 | elapsed: 5.7s finished [2024-10-30 15:23:17] Features: 5/274 -- score: 0.8673527352506463[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 238 out of 269 | elapsed: 5.0s remaining: 0.6s [Parallel(n_jobs=-1)]: Done 269 out of 269 | elapsed: 5.8s finished [2024-10-30 15:23:23] Features: 6/274 -- score: 0.8733406729210295[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 268 out of 268 | elapsed: 5.8s finished [2024-10-30 15:23:29] Features: 7/274 -- score: 0.8779019854352988[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 236 out of 267 | elapsed: 5.3s remaining: 0.6s [Parallel(n_jobs=-1)]: Done 267 out of 267 | elapsed: 6.0s finished [2024-10-30 15:23:35] Features: 8/274 -- score: 0.8821076936222507[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 266 out of 266 | elapsed: 5.9s finished [2024-10-30 15:23:41] Features: 9/274 -- score: 0.8865803385052942[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.4s [Parallel(n_jobs=-1)]: Done 234 out of 265 | elapsed: 5.4s remaining: 0.6s [Parallel(n_jobs=-1)]: Done 265 out of 265 | elapsed: 6.1s finished [2024-10-30 15:23:48] Features: 10/274 -- score: 0.892136862625631[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 264 out of 264 | elapsed: 5.9s finished [2024-10-30 15:23:53] Features: 11/274 -- score: 0.8997588358866588[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 232 out of 263 | elapsed: 5.6s remaining: 0.7s [Parallel(n_jobs=-1)]: Done 263 out of 263 | elapsed: 6.3s finished [2024-10-30 15:24:00] Features: 12/274 -- score: 0.9045279013258785[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 262 out of 262 | elapsed: 5.8s finished [2024-10-30 15:24:06] Features: 13/274 -- score: 0.9077032400393492[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 230 out of 261 | elapsed: 5.4s remaining: 0.6s [Parallel(n_jobs=-1)]: Done 261 out of 261 | elapsed: 6.1s finished [2024-10-30 15:24:12] Features: 14/274 -- score: 0.9113184768449554[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.4s [Parallel(n_jobs=-1)]: Done 260 out of 260 | elapsed: 5.9s finished [2024-10-30 15:24:18] Features: 15/274 -- score: 0.9144300582982374[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.5s [Parallel(n_jobs=-1)]: Done 228 out of 259 | elapsed: 6.2s remaining: 0.8s [Parallel(n_jobs=-1)]: Done 259 out of 259 | elapsed: 6.9s finished [2024-10-30 15:24:25] Features: 16/274 -- score: 0.9165568997201585[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 178 tasks | elapsed: 6.6s [Parallel(n_jobs=-1)]: Done 258 out of 258 | elapsed: 10.4s finished [2024-10-30 15:24:36] Features: 17/274 -- score: 0.9187094547000717[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.4s [Parallel(n_jobs=-1)]: Done 226 out of 257 | elapsed: 5.5s remaining: 0.7s [Parallel(n_jobs=-1)]: Done 257 out of 257 | elapsed: 6.1s finished [2024-10-30 15:24:42] Features: 18/274 -- score: 0.9210414399636491[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.4s [Parallel(n_jobs=-1)]: Done 256 out of 256 | elapsed: 7.3s finished [2024-10-30 15:24:49] Features: 19/274 -- score: 0.9231785068400782[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.4s [Parallel(n_jobs=-1)]: Done 255 out of 255 | elapsed: 7.5s finished [2024-10-30 15:24:57] Features: 20/274 -- score: 0.9244145086624963[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.4s [Parallel(n_jobs=-1)]: Done 223 out of 254 | elapsed: 5.9s remaining: 0.7s [Parallel(n_jobs=-1)]: Done 254 out of 254 | elapsed: 7.0s finished [2024-10-30 15:25:04] Features: 21/274 -- score: 0.9254483170392269[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.4s [Parallel(n_jobs=-1)]: Done 222 out of 253 | elapsed: 5.8s remaining: 0.7s [Parallel(n_jobs=-1)]: Done 253 out of 253 | elapsed: 7.1s finished [2024-10-30 15:25:11] Features: 22/274 -- score: 0.9264841941808273[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.4s [Parallel(n_jobs=-1)]: Done 252 out of 252 | elapsed: 7.6s finished [2024-10-30 15:25:19] Features: 23/274 -- score: 0.9274758369192984[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.5s [Parallel(n_jobs=-1)]: Done 220 out of 251 | elapsed: 6.0s remaining: 0.8s [Parallel(n_jobs=-1)]: Done 251 out of 251 | elapsed: 7.1s finished [2024-10-30 15:25:26] Features: 24/274 -- score: 0.92844212796468[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.4s [Parallel(n_jobs=-1)]: Done 250 out of 250 | elapsed: 6.7s finished [2024-10-30 15:25:33] Features: 25/274 -- score: 0.9292203842611386[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.4s [Parallel(n_jobs=-1)]: Done 218 out of 249 | elapsed: 5.9s remaining: 0.8s [Parallel(n_jobs=-1)]: Done 249 out of 249 | elapsed: 7.0s finished [2024-10-30 15:25:40] Features: 26/274 -- score: 0.9298510047495391[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.4s [Parallel(n_jobs=-1)]: Done 248 out of 248 | elapsed: 7.3s finished [2024-10-30 15:25:47] Features: 27/274 -- score: 0.9304019840531833[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.5s [Parallel(n_jobs=-1)]: Done 216 out of 247 | elapsed: 6.1s remaining: 0.8s [Parallel(n_jobs=-1)]: Done 247 out of 247 | elapsed: 7.1s finished [2024-10-30 15:25:55] Features: 28/274 -- score: 0.9309305347004898[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.5s [Parallel(n_jobs=-1)]: Done 246 out of 246 | elapsed: 7.0s finished [2024-10-30 15:26:02] Features: 29/274 -- score: 0.931458673523718[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.4s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 5.6s [Parallel(n_jobs=-1)]: Done 245 out of 245 | elapsed: 10.5s finished [2024-10-30 15:26:12] Features: 30/274 -- score: 0.9319344324038337[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.4s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 6.7s [Parallel(n_jobs=-1)]: Done 244 out of 244 | elapsed: 11.4s finished [2024-10-30 15:26:24] Features: 31/274 -- score: 0.9324067034952849[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.5s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 5.6s [Parallel(n_jobs=-1)]: Done 243 out of 243 | elapsed: 10.8s finished [2024-10-30 15:26:35] Features: 32/274 -- score: 0.9328337130107986[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.5s [Parallel(n_jobs=-1)]: Done 242 out of 242 | elapsed: 7.5s finished [2024-10-30 15:26:42] Features: 33/274 -- score: 0.9333355895459066[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.4s [Parallel(n_jobs=-1)]: Done 210 out of 241 | elapsed: 6.4s remaining: 0.9s [Parallel(n_jobs=-1)]: Done 241 out of 241 | elapsed: 7.5s finished [2024-10-30 15:26:50] Features: 34/274 -- score: 0.9338466167427188[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.4s [Parallel(n_jobs=-1)]: Done 240 out of 240 | elapsed: 7.3s finished [2024-10-30 15:26:57] Features: 35/274 -- score: 0.93427063728439[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.5s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 5.8s [Parallel(n_jobs=-1)]: Done 239 out of 239 | elapsed: 10.7s finished [2024-10-30 15:27:08] Features: 36/274 -- score: 0.9346969157079771[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.5s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 6.4s [Parallel(n_jobs=-1)]: Done 238 out of 238 | elapsed: 11.5s finished [2024-10-30 15:27:20] Features: 37/274 -- score: 0.9351182568635021[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.5s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 6.1s [Parallel(n_jobs=-1)]: Done 237 out of 237 | elapsed: 10.7s finished [2024-10-30 15:27:31] Features: 38/274 -- score: 0.9355593302089608[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.5s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 6.3s [Parallel(n_jobs=-1)]: Done 236 out of 236 | elapsed: 11.5s finished [2024-10-30 15:27:42] Features: 39/274 -- score: 0.9359783078777568[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.5s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 5.5s [Parallel(n_jobs=-1)]: Done 235 out of 235 | elapsed: 9.7s finished [2024-10-30 15:27:52] Features: 40/274 -- score: 0.9363600521469563[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.4s [Parallel(n_jobs=-1)]: Done 234 out of 234 | elapsed: 6.7s finished [2024-10-30 15:27:59] Features: 41/274 -- score: 0.9367483734642255[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.4s [Parallel(n_jobs=-1)]: Done 202 out of 233 | elapsed: 5.9s remaining: 0.8s [Parallel(n_jobs=-1)]: Done 233 out of 233 | elapsed: 6.7s finished [2024-10-30 15:28:06] Features: 42/274 -- score: 0.9370884676559399[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.5s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 5.5s [Parallel(n_jobs=-1)]: Done 232 out of 232 | elapsed: 9.7s finished [2024-10-30 15:28:15] Features: 43/274 -- score: 0.9374125821578387[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.4s [Parallel(n_jobs=-1)]: Done 200 out of 231 | elapsed: 5.9s remaining: 0.8s [Parallel(n_jobs=-1)]: Done 231 out of 231 | elapsed: 6.7s finished [2024-10-30 15:28:22] Features: 44/274 -- score: 0.9377590488212355[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.5s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 5.6s [Parallel(n_jobs=-1)]: Done 230 out of 230 | elapsed: 9.7s finished [2024-10-30 15:28:32] Features: 45/274 -- score: 0.9380620809895246[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.5s [Parallel(n_jobs=-1)]: Done 198 out of 229 | elapsed: 6.1s remaining: 0.9s [Parallel(n_jobs=-1)]: Done 229 out of 229 | elapsed: 6.8s finished [2024-10-30 15:28:39] Features: 46/274 -- score: 0.9383606770154213[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.5s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 5.7s [Parallel(n_jobs=-1)]: Done 228 out of 228 | elapsed: 9.8s finished [2024-10-30 15:28:49] Features: 47/274 -- score: 0.9386434676769945[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.5s [Parallel(n_jobs=-1)]: Done 196 out of 227 | elapsed: 6.2s remaining: 0.9s [Parallel(n_jobs=-1)]: Done 227 out of 227 | elapsed: 6.8s finished [2024-10-30 15:28:56] Features: 48/274 -- score: 0.938917220224309[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.5s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 5.7s [Parallel(n_jobs=-1)]: Done 226 out of 226 | elapsed: 9.7s finished [2024-10-30 15:29:06] Features: 49/274 -- score: 0.9391992842156454[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.5s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 5.9s [Parallel(n_jobs=-1)]: Done 225 out of 225 | elapsed: 10.6s finished [2024-10-30 15:29:16] Features: 50/274 -- score: 0.9394813780691168[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.6s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 6.8s [Parallel(n_jobs=-1)]: Done 224 out of 224 | elapsed: 11.7s finished [2024-10-30 15:29:28] Features: 51/274 -- score: 0.9397591483128049[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.7s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 6.6s [Parallel(n_jobs=-1)]: Done 223 out of 223 | elapsed: 11.0s finished [2024-10-30 15:29:39] Features: 52/274 -- score: 0.9400251925243157[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.6s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 6.7s [Parallel(n_jobs=-1)]: Done 222 out of 222 | elapsed: 11.1s finished [2024-10-30 15:29:50] Features: 53/274 -- score: 0.9403015910400038[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.5s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 6.1s [Parallel(n_jobs=-1)]: Done 221 out of 221 | elapsed: 10.3s finished [2024-10-30 15:30:01] Features: 54/274 -- score: 0.9405794121248693[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.6s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 6.4s [Parallel(n_jobs=-1)]: Done 220 out of 220 | elapsed: 10.9s finished [2024-10-30 15:30:12] Features: 55/274 -- score: 0.9408523283296407[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.6s [Parallel(n_jobs=-1)]: Done 178 tasks | elapsed: 7.3s [Parallel(n_jobs=-1)]: Done 219 out of 219 | elapsed: 9.1s finished [2024-10-30 15:30:21] Features: 56/274 -- score: 0.9411047750951868[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.5s [Parallel(n_jobs=-1)]: Done 218 out of 218 | elapsed: 8.3s finished [2024-10-30 15:30:30] Features: 57/274 -- score: 0.9413510911204437[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.5s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 6.1s [Parallel(n_jobs=-1)]: Done 217 out of 217 | elapsed: 10.3s finished [2024-10-30 15:30:40] Features: 58/274 -- score: 0.9415890736708146[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.5s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 6.3s [Parallel(n_jobs=-1)]: Done 216 out of 216 | elapsed: 10.5s finished [2024-10-30 15:30:51] Features: 59/274 -- score: 0.9418027576675003[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.6s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 6.7s [Parallel(n_jobs=-1)]: Done 215 out of 215 | elapsed: 10.9s finished [2024-10-30 15:31:02] Features: 60/274 -- score: 0.9419907820263911[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.6s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 6.7s [Parallel(n_jobs=-1)]: Done 214 out of 214 | elapsed: 10.8s finished [2024-10-30 15:31:12] Features: 61/274 -- score: 0.942173275773842[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.6s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 6.7s [Parallel(n_jobs=-1)]: Done 213 out of 213 | elapsed: 10.9s finished [2024-10-30 15:31:23] Features: 62/274 -- score: 0.942356160410377[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.9s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 8.1s [Parallel(n_jobs=-1)]: Done 212 out of 212 | elapsed: 12.8s finished [2024-10-30 15:31:36] Features: 63/274 -- score: 0.9425128317572566[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.6s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 6.9s [Parallel(n_jobs=-1)]: Done 211 out of 211 | elapsed: 11.4s finished [2024-10-30 15:31:48] Features: 64/274 -- score: 0.94266687641947[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.6s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 6.8s [Parallel(n_jobs=-1)]: Done 210 out of 210 | elapsed: 11.0s finished [2024-10-30 15:31:59] Features: 65/274 -- score: 0.9428146211526883[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.7s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 7.5s [Parallel(n_jobs=-1)]: Done 209 out of 209 | elapsed: 11.6s finished [2024-10-30 15:32:11] Features: 66/274 -- score: 0.9429493707252343[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.7s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 7.1s [Parallel(n_jobs=-1)]: Done 208 out of 208 | elapsed: 11.0s finished [2024-10-30 15:32:22] Features: 67/274 -- score: 0.9430837362751194[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.6s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 6.9s [Parallel(n_jobs=-1)]: Done 207 out of 207 | elapsed: 10.6s finished [2024-10-30 15:32:33] Features: 68/274 -- score: 0.9432136432224116[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.6s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 6.8s [Parallel(n_jobs=-1)]: Done 206 out of 206 | elapsed: 10.5s finished [2024-10-30 15:32:43] Features: 69/274 -- score: 0.943344306484075[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.6s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 6.5s [Parallel(n_jobs=-1)]: Done 205 out of 205 | elapsed: 10.2s finished [2024-10-30 15:32:53] Features: 70/274 -- score: 0.9434634052333731[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.5s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 7.1s [Parallel(n_jobs=-1)]: Done 204 out of 204 | elapsed: 11.2s finished [2024-10-30 15:33:05] Features: 71/274 -- score: 0.943577067507883[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.7s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 7.3s [Parallel(n_jobs=-1)]: Done 203 out of 203 | elapsed: 11.1s finished [2024-10-30 15:33:16] Features: 72/274 -- score: 0.9436891888008686[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.6s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 6.7s [Parallel(n_jobs=-1)]: Done 202 out of 202 | elapsed: 10.3s finished [2024-10-30 15:33:26] Features: 73/274 -- score: 0.9437981539239481[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.6s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 6.6s [Parallel(n_jobs=-1)]: Done 201 out of 201 | elapsed: 10.1s finished [2024-10-30 15:33:37] Features: 74/274 -- score: 0.9439100954531178[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.6s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 6.8s [Parallel(n_jobs=-1)]: Done 200 out of 200 | elapsed: 10.2s finished [2024-10-30 15:33:47] Features: 75/274 -- score: 0.9440163805264301[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.7s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 6.7s [Parallel(n_jobs=-1)]: Done 199 out of 199 | elapsed: 10.6s finished [2024-10-30 15:33:58] Features: 76/274 -- score: 0.9441303003262262[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.6s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 6.9s [Parallel(n_jobs=-1)]: Done 198 out of 198 | elapsed: 10.3s finished [2024-10-30 15:34:08] Features: 77/274 -- score: 0.9442572101220648[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.6s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 6.9s [Parallel(n_jobs=-1)]: Done 197 out of 197 | elapsed: 10.3s finished [2024-10-30 15:34:18] Features: 78/274 -- score: 0.9443594920438331[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. 
[2024-10-30 15:34:29] Features: 79/274 -- score: 0.9444757561496895
[2024-10-30 15:34:39] Features: 80/274 -- score: 0.9445896136937307
[2024-10-30 15:34:50] Features: 81/274 -- score: 0.9447056871041397
[2024-10-30 15:35:00] Features: 82/274 -- score: 0.9448220672453754
[2024-10-30 15:35:11] Features: 83/274 -- score: 0.94492005052404
[2024-10-30 15:35:21] Features: 84/274 -- score: 0.9450094008269196
[2024-10-30 15:35:32] Features: 85/274 -- score: 0.9450977522742992
[2024-10-30 15:35:43] Features: 86/274 -- score: 0.945183112223553
[2024-10-30 15:35:53] Features: 87/274 -- score: 0.9452683784764198
[2024-10-30 15:36:03] Features: 88/274 -- score: 0.9453481246814794
[2024-10-30 15:36:14] Features: 89/274 -- score: 0.9454213386716412
[2024-10-30 15:36:24] Features: 90/274 -- score: 0.9454975389541538
[2024-10-30 15:36:35] Features: 91/274 -- score: 0.9455714413525443
[2024-10-30 15:36:45] Features: 92/274 -- score: 0.9456308638597681
[2024-10-30 15:36:56] Features: 93/274 -- score: 0.9456890609297093
[2024-10-30 15:37:07] Features: 94/274 -- score: 0.9457530047073327
[2024-10-30 15:37:17] Features: 95/274 -- score: 0.9458156688654359
[2024-10-30 15:37:28] Features: 96/274 -- score: 0.9458783878178758
[2024-10-30 15:37:38] Features: 97/274 -- score: 0.9459372840659993
[2024-10-30 15:37:49] Features: 98/274 -- score: 0.9459940673854778
[2024-10-30 15:37:59] Features: 99/274 -- score: 0.9460492720394387
[2024-10-30 15:38:09] Features: 100/274 -- score: 0.9461033266642728
[2024-10-30 15:38:19] Features: 101/274 -- score: 0.9461515907857343
[2024-10-30 15:38:29] Features: 102/274 -- score: 0.9462131431251473
[2024-10-30 15:38:39] Features: 103/274 -- score: 0.9462549214668187
[2024-10-30 15:38:49] Features: 104/274 -- score: 0.9462972708555017
[2024-10-30 15:38:59] Features: 105/274 -- score: 0.9463400739021506
[2024-10-30 15:39:09] Features: 106/274 -- score: 0.9463834060610402
[2024-10-30 15:39:19] Features: 107/274 -- score: 0.9464307667944067
[2024-10-30 15:39:29] Features: 108/274 -- score: 0.9464850186244597
[2024-10-30 15:39:39] Features: 109/274 -- score: 0.9465277954018362
[2024-10-30 15:39:49] Features: 110/274 -- score: 0.9465667535032496
[2024-10-30 15:39:59] Features: 111/274 -- score: 0.9466030402172381
[2024-10-30 15:40:09] Features: 112/274 -- score: 0.9466365074034908
[2024-10-30 15:40:19] Features: 113/274 -- score: 0.9466699567782252
[2024-10-30 15:40:29] Features: 114/274 -- score: 0.9467020565119888
[2024-10-30 15:40:39] Features: 115/274 -- score: 0.9467339018748081
[2024-10-30 15:40:49] Features: 116/274 -- score: 0.9467653933027028
[2024-10-30 15:40:59] Features: 117/274 -- score: 0.9467921828397865
[2024-10-30 15:41:09] Features: 118/274 -- score: 0.9468260457095262
[2024-10-30 15:41:19] Features: 119/274 -- score: 0.94685019984123
[2024-10-30 15:41:29] Features: 120/274 -- score: 0.9468726221227485
[2024-10-30 15:41:39] Features: 121/274 -- score: 0.9468937940854705
[2024-10-30 15:41:49] Features: 122/274 -- score: 0.946912792717584
[2024-10-30 15:41:59] Features: 123/274 -- score: 0.9469287907503101
[2024-10-30 15:42:09] Features: 124/274 -- score: 0.9469436039366472
[2024-10-30 15:42:20] Features: 125/274 -- score: 0.9469570319894081
[2024-10-30 15:42:31] Features: 126/274 -- score: 0.9469679475395798
[2024-10-30 15:42:42] Features: 127/274 -- score: 0.946978064208601
[2024-10-30 15:42:53] Features: 128/274 -- score: 0.9469880479693635
[2024-10-30 15:43:04] Features: 129/274 -- score: 0.9469982149421234
[2024-10-30 15:43:15] Features: 130/274 -- score: 0.9470063220552042
[2024-10-30 15:43:25] Features: 131/274 -- score: 0.9470185501560527
[2024-10-30 15:43:36] Features: 132/274 -- score: 0.9470565490628251
[2024-10-30 15:43:46] Features: 133/274 -- score: 0.9470944712657312
[2024-10-30 15:43:57] Features: 134/274 -- score: 0.947173384136591
[2024-10-30 15:44:07] Features: 135/274 -- score: 0.9472745279064239
[2024-10-30 15:44:17] Features: 136/274 -- score: 0.9473021761095183
[2024-10-30 15:44:27] Features: 137/274 -- score: 0.947330640994581
[2024-10-30 15:44:36] Features: 138/274 -- score: 0.9473590626880469
[2024-10-30 15:44:46] Features: 139/274 -- score: 0.94737424598568
[2024-10-30 15:44:56] Features: 140/274 -- score: 0.9473837932381318
[2024-10-30 15:45:06] Features: 141/274 -- score: 0.9473929396082703
[2024-10-30 15:45:16] Features: 142/274 -- score: 0.9474012770826576
[2024-10-30 15:45:25] Features: 143/274 -- score: 0.9474061962769336
[2024-10-30 15:45:35] Features: 144/274 -- score: 0.9474113416791115
[2024-10-30 15:45:45] Features: 145/274 -- score: 0.9474160621882547
[2024-10-30 15:45:55] Features: 146/274 -- score: 0.9474199201898766
[2024-10-30 15:46:05] Features: 147/274 -- score: 0.9474233272155509
[2024-10-30 15:46:14] Features: 148/274 -- score: 0.9474296351823636
[2024-10-30 15:46:24] Features: 149/274 -- score: 0.9474395540206642
[2024-10-30 15:46:33] Features: 150/274 -- score: 0.9474429828093704
[2024-10-30 15:46:43] Features: 151/274 -- score: 0.9474465016426885
[2024-10-30 15:46:52] Features: 152/274 -- score: 0.9474495032412944
[2024-10-30 15:47:03] Features: 153/274 -- score: 0.9474519156543872
[2024-10-30 15:47:13] Features: 154/274 -- score: 0.9474660378832593
[2024-10-30 15:47:23] Features: 155/274 -- score: 0.9474905332865735
[2024-10-30 15:47:32] Features: 156/274 -- score: 0.9475096523448945
[2024-10-30 15:47:41] Features: 157/274 -- score: 0.947511307748069
[2024-10-30 15:47:51] Features: 158/274 -- score: 0.9475128007899316
[2024-10-30 15:48:00] Features: 159/274 -- score: 0.947514045113268
[2024-10-30 15:48:10] Features: 160/274 -- score: 0.9475152239346908
[2024-10-30 15:48:19] Features: 161/274 -- score: 0.9475163807873658
[2024-10-30 15:48:29] Features: 162/274 -- score: 0.9475173605240507
[2024-10-30 15:48:39] Features: 163/274 -- score: 0.9475182819483899
[2024-10-30 15:48:48] Features: 164/274 -- score: 0.9475196186641346
[2024-10-30 15:48:57] Features: 165/274 -- score: 0.947559673168465
[2024-10-30 15:49:06] Features: 166/274 -- score: 0.947613687662578
[2024-10-30 15:49:15] Features: 167/274 -- score: 0.9476635282128644
[2024-10-30 15:49:24] Features: 168/274 -- score: 0.947702904101134
[2024-10-30 15:49:33] Features: 169/274 -- score: 0.9477199291545982
[2024-10-30 15:49:42] Features: 170/274 -- score: 0.9477278669523537
[2024-10-30 15:49:51] Features: 171/274 -- score: 0.9477322562290738
[2024-10-30 15:49:59] Features: 172/274 -- score: 0.9477346568857261
[2024-10-30 15:50:08] Features: 173/274 -- score: 0.9477348473305426
[2024-10-30 15:50:17] Features: 174/274 -- score: 0.9477348473305568
[2024-10-30 15:50:26] Features: 175/274 -- score: 0.9477348473305618
[2024-10-30 15:50:35] Features: 176/274 -- score: 0.9477348473305675
[2024-10-30 15:50:44] Features: 177/274 -- score: 0.9477348473305586
[2024-10-30 15:50:52] Features: 178/274 -- score: 0.9477348473305609
[2024-10-30 15:51:01] Features: 179/274 -- score: 0.9477348473305595
[2024-10-30 15:51:09] Features: 180/274 -- score: 0.9477348473305647
[2024-10-30 15:51:17] Features: 181/274 -- score: 0.9477348473305642
[2024-10-30 15:51:26] Features: 182/274 -- score: 0.9477348473305577
[2024-10-30 15:51:34] Features: 183/274 -- score: 0.9477348473305609
[2024-10-30 15:51:42] Features: 184/274 -- score: 0.9479327591304928
[2024-10-30 15:51:50] Features: 185/274 -- score: 0.9477348473305778
[2024-10-30 15:51:58] Features: 186/274 -- score: 0.9477348473305677
[2024-10-30 15:52:06] Features: 187/274 -- score: 0.9477348473305719
[2024-10-30 15:52:14] Features: 188/274 -- score: 0.9477348473305671
[2024-10-30 15:52:22] Features: 189/274 -- score: 0.94773484733056
[2024-10-30 15:52:31] Features: 190/274 -- score: 0.9477348473305632
[2024-10-30 15:52:39] Features: 191/274 -- score: 0.9477348473305647
[2024-10-30 15:52:46] Features: 192/274 -- score: 0.947734847330557
[2024-10-30 15:52:55] Features: 193/274 -- score: 0.9477348473305544
[2024-10-30 15:53:03] Features: 194/274 -- score: 0.9479431431644982
[2024-10-30 15:53:11] Features: 195/274 -- score: 0.9477174961107611
[2024-10-30 15:53:19] Features: 196/274 -- score: 0.9477220057713558
[2024-10-30 15:53:26] Features: 197/274 -- score: 0.9478021679359317
[2024-10-30 15:53:34] Features: 198/274 -- score: 0.9476871492756201
[Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.5s [Parallel(n_jobs=-1)]: Done 76 out of 76 | elapsed: 7.7s finished [2024-10-30 15:53:42] Features: 199/274 -- score: 0.9476871492756416[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.4s [Parallel(n_jobs=-1)]: Done 75 out of 75 | elapsed: 7.4s finished [2024-10-30 15:53:50] Features: 200/274 -- score: 0.947793040632469[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.4s [Parallel(n_jobs=-1)]: Done 74 out of 74 | elapsed: 7.7s finished [2024-10-30 15:53:58] Features: 201/274 -- score: 0.9477213973343659[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.6s [Parallel(n_jobs=-1)]: Done 73 out of 73 | elapsed: 7.5s finished [2024-10-30 15:54:05] Features: 202/274 -- score: 0.9476673271270835[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.4s [Parallel(n_jobs=-1)]: Done 72 out of 72 | elapsed: 7.4s finished [2024-10-30 15:54:13] Features: 203/274 -- score: 0.9476626655969114[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.5s [Parallel(n_jobs=-1)]: Done 71 out of 71 | elapsed: 7.1s finished [2024-10-30 15:54:20] Features: 204/274 -- score: 0.9476300273488697[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.6s [Parallel(n_jobs=-1)]: Done 70 out of 70 | elapsed: 7.2s finished [2024-10-30 15:54:27] Features: 205/274 -- score: 0.9474341605681742[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.5s [Parallel(n_jobs=-1)]: Done 69 out of 69 | elapsed: 7.1s finished [2024-10-30 15:54:34] Features: 206/274 -- score: 0.9474283068356218[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.5s [Parallel(n_jobs=-1)]: Done 68 out of 68 | elapsed: 7.1s finished [2024-10-30 15:54:42] Features: 207/274 -- score: 0.9474175553568045[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.5s [Parallel(n_jobs=-1)]: Done 67 out of 67 | elapsed: 7.0s finished [2024-10-30 15:54:49] Features: 208/274 -- score: 0.9474659427760701[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.6s [Parallel(n_jobs=-1)]: Done 66 out of 66 | elapsed: 7.0s finished [2024-10-30 15:54:56] Features: 209/274 -- score: 0.9474492362265832[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.5s [Parallel(n_jobs=-1)]: Done 65 out of 65 | elapsed: 6.9s finished [2024-10-30 15:55:03] Features: 210/274 -- score: 0.9472195020915131[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.5s [Parallel(n_jobs=-1)]: Done 64 out of 64 | elapsed: 6.9s finished [2024-10-30 15:55:10] Features: 211/274 -- score: 0.9472840750057999[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.4s [Parallel(n_jobs=-1)]: Done 63 out of 63 | elapsed: 6.8s finished [2024-10-30 15:55:17] Features: 212/274 -- score: 0.9470945712133542[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.7s [Parallel(n_jobs=-1)]: Done 62 out of 62 | elapsed: 6.8s finished [2024-10-30 15:55:24] Features: 213/274 -- score: 0.9471612135864695[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.6s [Parallel(n_jobs=-1)]: Done 61 out of 61 | elapsed: 6.6s finished [2024-10-30 15:55:30] Features: 214/274 -- score: 0.9469856521781331[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.6s [Parallel(n_jobs=-1)]: Done 60 out of 60 | elapsed: 6.6s finished [2024-10-30 15:55:37] Features: 215/274 -- score: 0.9469172110948165[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.6s [Parallel(n_jobs=-1)]: Done 59 out of 59 | elapsed: 6.5s finished [2024-10-30 15:55:44] Features: 216/274 -- score: 0.9468774297337254[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.6s [Parallel(n_jobs=-1)]: Done 58 out of 58 | elapsed: 6.5s finished [2024-10-30 15:55:50] Features: 217/274 -- score: 0.9468858969393554[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.9s [Parallel(n_jobs=-1)]: Done 55 out of 57 | elapsed: 6.6s remaining: 0.1s [Parallel(n_jobs=-1)]: Done 57 out of 57 | elapsed: 6.6s finished [2024-10-30 15:55:57] Features: 218/274 -- score: 0.9468858969393794[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.8s [Parallel(n_jobs=-1)]: Done 54 out of 56 | elapsed: 6.5s remaining: 0.1s [Parallel(n_jobs=-1)]: Done 56 out of 56 | elapsed: 6.5s finished [2024-10-30 15:56:04] Features: 219/274 -- score: 0.9468858969393843[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.8s [Parallel(n_jobs=-1)]: Done 52 out of 55 | elapsed: 6.3s remaining: 0.3s [Parallel(n_jobs=-1)]: Done 55 out of 55 | elapsed: 6.4s finished [2024-10-30 15:56:10] Features: 220/274 -- score: 0.9468966252072881[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.8s [Parallel(n_jobs=-1)]: Done 51 out of 54 | elapsed: 6.0s remaining: 0.3s [Parallel(n_jobs=-1)]: Done 54 out of 54 | elapsed: 6.1s finished [2024-10-30 15:56:16] Features: 221/274 -- score: 0.9468963452865928[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.7s [Parallel(n_jobs=-1)]: Done 49 out of 53 | elapsed: 5.9s remaining: 0.4s [Parallel(n_jobs=-1)]: Done 53 out of 53 | elapsed: 6.1s finished [2024-10-30 15:56:22] Features: 222/274 -- score: 0.9468913460980689[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.8s [Parallel(n_jobs=-1)]: Done 48 out of 52 | elapsed: 5.7s remaining: 0.4s [Parallel(n_jobs=-1)]: Done 52 out of 52 | elapsed: 5.9s finished [2024-10-30 15:56:28] Features: 223/274 -- score: 0.9469002261643519[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.7s [Parallel(n_jobs=-1)]: Done 46 out of 51 | elapsed: 5.6s remaining: 0.5s [Parallel(n_jobs=-1)]: Done 51 out of 51 | elapsed: 5.7s finished [2024-10-30 15:56:34] Features: 224/274 -- score: 0.9468903226124956[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.8s [Parallel(n_jobs=-1)]: Done 45 out of 50 | elapsed: 5.5s remaining: 0.5s [Parallel(n_jobs=-1)]: Done 50 out of 50 | elapsed: 5.7s finished [2024-10-30 15:56:40] Features: 225/274 -- score: 0.9468797661389831[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.7s [Parallel(n_jobs=-1)]: Done 43 out of 49 | elapsed: 5.4s remaining: 0.7s [Parallel(n_jobs=-1)]: Done 49 out of 49 | elapsed: 5.6s finished [2024-10-30 15:56:46] Features: 226/274 -- score: 0.9468713588961108[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.9s [Parallel(n_jobs=-1)]: Done 42 out of 48 | elapsed: 5.3s remaining: 0.7s [Parallel(n_jobs=-1)]: Done 48 out of 48 | elapsed: 5.5s finished [2024-10-30 15:56:51] Features: 227/274 -- score: 0.9468383593915652[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.8s [Parallel(n_jobs=-1)]: Done 40 out of 47 | elapsed: 5.0s remaining: 0.8s [Parallel(n_jobs=-1)]: Done 47 out of 47 | elapsed: 5.3s finished [2024-10-30 15:56:57] Features: 228/274 -- score: 0.9467858747397688[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.9s [Parallel(n_jobs=-1)]: Done 39 out of 46 | elapsed: 5.0s remaining: 0.8s [Parallel(n_jobs=-1)]: Done 46 out of 46 | elapsed: 5.3s finished [2024-10-30 15:57:02] Features: 229/274 -- score: 0.9467710676386449[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.9s [Parallel(n_jobs=-1)]: Done 37 out of 45 | elapsed: 5.0s remaining: 1.0s [Parallel(n_jobs=-1)]: Done 45 out of 45 | elapsed: 5.3s finished [2024-10-30 15:57:08] Features: 230/274 -- score: 0.9467550149626927[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.9s [Parallel(n_jobs=-1)]: Done 36 out of 44 | elapsed: 5.0s remaining: 1.0s [Parallel(n_jobs=-1)]: Done 44 out of 44 | elapsed: 5.4s finished [2024-10-30 15:57:13] Features: 231/274 -- score: 0.9467383676403364[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.9s [Parallel(n_jobs=-1)]: Done 34 out of 43 | elapsed: 4.6s remaining: 1.1s [Parallel(n_jobs=-1)]: Done 43 out of 43 | elapsed: 5.0s finished [2024-10-30 15:57:18] Features: 232/274 -- score: 0.9467131783637202[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.8s [Parallel(n_jobs=-1)]: Done 33 out of 42 | elapsed: 4.6s remaining: 1.2s [Parallel(n_jobs=-1)]: Done 42 out of 42 | elapsed: 5.0s finished [2024-10-30 15:57:23] Features: 233/274 -- score: 0.9466859943937971[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.9s [Parallel(n_jobs=-1)]: Done 31 out of 41 | elapsed: 4.3s remaining: 1.3s [Parallel(n_jobs=-1)]: Done 41 out of 41 | elapsed: 4.9s finished [2024-10-30 15:57:28] Features: 234/274 -- score: 0.9466508831445383[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.8s [Parallel(n_jobs=-1)]: Done 30 out of 40 | elapsed: 4.4s remaining: 1.4s [Parallel(n_jobs=-1)]: Done 40 out of 40 | elapsed: 4.9s finished [2024-10-30 15:57:34] Features: 235/274 -- score: 0.9466138704337347[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 28 out of 39 | elapsed: 4.2s remaining: 1.6s [Parallel(n_jobs=-1)]: Done 39 out of 39 | elapsed: 4.7s finished [2024-10-30 15:57:38] Features: 236/274 -- score: 0.9465750648065324[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 27 out of 38 | elapsed: 4.0s remaining: 1.6s [Parallel(n_jobs=-1)]: Done 38 out of 38 | elapsed: 4.6s finished [2024-10-30 15:57:43] Features: 237/274 -- score: 0.946533477078739[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 25 out of 37 | elapsed: 3.7s remaining: 1.7s [Parallel(n_jobs=-1)]: Done 37 out of 37 | elapsed: 4.6s finished [2024-10-30 15:57:48] Features: 238/274 -- score: 0.946486463996916[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 24 out of 36 | elapsed: 3.7s remaining: 1.8s [Parallel(n_jobs=-1)]: Done 36 out of 36 | elapsed: 4.5s finished [2024-10-30 15:57:52] Features: 239/274 -- score: 0.9464394609940602[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 22 out of 35 | elapsed: 3.5s remaining: 2.0s [Parallel(n_jobs=-1)]: Done 35 out of 35 | elapsed: 4.3s finished [2024-10-30 15:57:57] Features: 240/274 -- score: 0.9467273803937399[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 21 out of 34 | elapsed: 3.5s remaining: 2.1s [Parallel(n_jobs=-1)]: Done 34 out of 34 | elapsed: 4.2s finished [2024-10-30 15:58:01] Features: 241/274 -- score: 0.9467599072842546[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 19 out of 33 | elapsed: 3.3s remaining: 2.4s [Parallel(n_jobs=-1)]: Done 33 out of 33 | elapsed: 4.2s finished [2024-10-30 15:58:05] Features: 242/274 -- score: 0.9467735822418181[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 18 out of 32 | elapsed: 3.0s remaining: 2.3s [Parallel(n_jobs=-1)]: Done 32 out of 32 | elapsed: 4.0s finished [2024-10-30 15:58:09] Features: 243/274 -- score: 0.94672377591958[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 16 out of 31 | elapsed: 2.7s remaining: 2.5s [Parallel(n_jobs=-1)]: Done 31 out of 31 | elapsed: 3.8s finished [2024-10-30 15:58:13] Features: 244/274 -- score: 0.9466509249761966[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 15 out of 30 | elapsed: 2.8s remaining: 2.8s [Parallel(n_jobs=-1)]: Done 30 out of 30 | elapsed: 3.9s finished [2024-10-30 15:58:17] Features: 245/274 -- score: 0.9466867379399406[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 13 out of 29 | elapsed: 2.6s remaining: 3.2s [Parallel(n_jobs=-1)]: Done 29 out of 29 | elapsed: 3.8s finished [2024-10-30 15:58:21] Features: 246/274 -- score: 0.9466071795787135[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 12 out of 28 | elapsed: 2.4s remaining: 3.3s [Parallel(n_jobs=-1)]: Done 28 out of 28 | elapsed: 3.6s finished [2024-10-30 15:58:25] Features: 247/274 -- score: 0.9465238896311116[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 10 out of 27 | elapsed: 2.2s remaining: 3.7s [Parallel(n_jobs=-1)]: Done 24 out of 27 | elapsed: 3.3s remaining: 0.3s [Parallel(n_jobs=-1)]: Done 27 out of 27 | elapsed: 3.4s finished [2024-10-30 15:58:28] Features: 248/274 -- score: 0.946521579716731[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 out of 26 | elapsed: 2.1s remaining: 4.0s [Parallel(n_jobs=-1)]: Done 23 out of 26 | elapsed: 3.3s remaining: 0.3s [Parallel(n_jobs=-1)]: Done 26 out of 26 | elapsed: 3.4s finished [2024-10-30 15:58:32] Features: 249/274 -- score: 0.9464362915278152[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 7 out of 25 | elapsed: 1.8s remaining: 4.7s [Parallel(n_jobs=-1)]: Done 20 out of 25 | elapsed: 3.1s remaining: 0.7s [Parallel(n_jobs=-1)]: Done 25 out of 25 | elapsed: 3.3s finished [2024-10-30 15:58:35] Features: 250/274 -- score: 0.9463261799186548[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 6 out of 24 | elapsed: 1.7s remaining: 5.4s [Parallel(n_jobs=-1)]: Done 19 out of 24 | elapsed: 3.0s remaining: 0.7s [Parallel(n_jobs=-1)]: Done 24 out of 24 | elapsed: 3.1s finished [2024-10-30 15:58:39] Features: 251/274 -- score: 0.9462690802707947[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 4 out of 23 | elapsed: 1.3s remaining: 6.5s [Parallel(n_jobs=-1)]: Done 16 out of 23 | elapsed: 2.7s remaining: 1.1s [Parallel(n_jobs=-1)]: Done 23 out of 23 | elapsed: 2.9s finished [2024-10-30 15:58:42] Features: 252/274 -- score: 0.9461447525929906[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 3 out of 22 | elapsed: 1.3s remaining: 8.9s [Parallel(n_jobs=-1)]: Done 15 out of 22 | elapsed: 2.6s remaining: 1.1s [Parallel(n_jobs=-1)]: Done 22 out of 22 | elapsed: 2.9s finished [2024-10-30 15:58:45] Features: 253/274 -- score: 0.9459954927439093[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 12 out of 21 | elapsed: 2.3s remaining: 1.7s [Parallel(n_jobs=-1)]: Done 21 out of 21 | elapsed: 2.8s finished [2024-10-30 15:58:47] Features: 254/274 -- score: 0.9458383532301985[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 11 out of 20 | elapsed: 2.2s remaining: 1.8s [Parallel(n_jobs=-1)]: Done 20 out of 20 | elapsed: 2.7s finished [2024-10-30 15:58:50] Features: 255/274 -- score: 0.945679156290041[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 8 out of 19 | elapsed: 1.9s remaining: 2.6s [Parallel(n_jobs=-1)]: Done 19 out of 19 | elapsed: 2.6s finished [2024-10-30 15:58:53] Features: 256/274 -- score: 0.9455899760485096[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 7 out of 18 | elapsed: 1.8s remaining: 2.9s [Parallel(n_jobs=-1)]: Done 18 out of 18 | elapsed: 2.5s finished [2024-10-30 15:58:56] Features: 257/274 -- score: 0.9454303045137135[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 4 out of 17 | elapsed: 1.4s remaining: 4.7s [Parallel(n_jobs=-1)]: Done 13 out of 17 | elapsed: 2.1s remaining: 0.6s [Parallel(n_jobs=-1)]: Done 17 out of 17 | elapsed: 2.3s finished [2024-10-30 15:58:58] Features: 258/274 -- score: 0.9458661657907452[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 3 out of 16 | elapsed: 1.2s remaining: 5.5s [Parallel(n_jobs=-1)]: Done 12 out of 16 | elapsed: 2.0s remaining: 0.6s [Parallel(n_jobs=-1)]: Done 16 out of 16 | elapsed: 2.1s finished [2024-10-30 15:59:00] Features: 259/274 -- score: 0.9461832380497425[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 8 out of 15 | elapsed: 1.8s remaining: 1.5s [Parallel(n_jobs=-1)]: Done 15 out of 15 | elapsed: 2.2s finished [2024-10-30 15:59:03] Features: 260/274 -- score: 0.9460372836244307[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 7 out of 14 | elapsed: 1.6s remaining: 1.6s [Parallel(n_jobs=-1)]: Done 14 out of 14 | elapsed: 1.9s finished [2024-10-30 15:59:04] Features: 261/274 -- score: 0.9458763464886486[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 3 out of 13 | elapsed: 1.2s remaining: 4.2s [Parallel(n_jobs=-1)]: Done 10 out of 13 | elapsed: 1.7s remaining: 0.4s [Parallel(n_jobs=-1)]: Done 13 out of 13 | elapsed: 1.8s finished [2024-10-30 15:59:06] Features: 262/274 -- score: 0.9458467262666927[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 2 out of 12 | elapsed: 1.1s remaining: 6.0s [Parallel(n_jobs=-1)]: Done 9 out of 12 | elapsed: 1.6s remaining: 0.5s [Parallel(n_jobs=-1)]: Done 12 out of 12 | elapsed: 1.7s finished [2024-10-30 15:59:08] Features: 263/274 -- score: 0.9456388669986378[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 4 out of 11 | elapsed: 1.1s remaining: 2.1s [Parallel(n_jobs=-1)]: Done 11 out of 11 | elapsed: 1.5s finished [2024-10-30 15:59:10] Features: 264/274 -- score: 0.9452713138924475[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 3 out of 10 | elapsed: 1.1s remaining: 2.6s [Parallel(n_jobs=-1)]: Done 10 out of 10 | elapsed: 1.4s finished [2024-10-30 15:59:11] Features: 265/274 -- score: 0.945161446961162[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 3 out of 9 | elapsed: 1.1s remaining: 2.3s [Parallel(n_jobs=-1)]: Done 9 out of 9 | elapsed: 1.3s finished [2024-10-30 15:59:13] Features: 266/274 -- score: 0.944754144553993[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 2 out of 8 | elapsed: 0.9s remaining: 2.8s [Parallel(n_jobs=-1)]: Done 8 out of 8 | elapsed: 1.1s finished [2024-10-30 15:59:14] Features: 267/274 -- score: 0.9446487424045044[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 4 out of 7 | elapsed: 0.9s remaining: 0.6s [Parallel(n_jobs=-1)]: Done 7 out of 7 | elapsed: 1.0s finished [2024-10-30 15:59:15] Features: 268/274 -- score: 0.9441455030203605[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 3 out of 6 | elapsed: 0.8s remaining: 0.8s [Parallel(n_jobs=-1)]: Done 6 out of 6 | elapsed: 0.9s finished [2024-10-30 15:59:16] Features: 269/274 -- score: 0.9441455030203807[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 5 out of 5 | elapsed: 0.8s finished [2024-10-30 15:59:17] Features: 270/274 -- score: 0.9434739244637808[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 4 out of 4 | elapsed: 0.6s finished [2024-10-30 15:59:18] Features: 271/274 -- score: 0.9434233272407218[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 3 out of 3 | elapsed: 0.5s finished [2024-10-30 15:59:18] Features: 272/274 -- score: 0.9424326394992495[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 2 out of 2 | elapsed: 0.4s finished [2024-10-30 15:59:19] Features: 273/274 -- score: -30180713684991.74[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [2024-10-30 15:59:19] Features: 274/274 -- score: -2977506930194.4707
Next, we plot the score against the number of selected features.
# score results
sfs_dict = sfs.get_metric_dict()
x = [i for i in sfs_dict]
y = [sfs_dict[i]['avg_score'] for i in sfs_dict]
# slice list to avoid last 2 extreme scores
x2 = x[0:272]
y2 = y[0:272]
sns.lineplot(x=x2, y=y2);
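With 50 features the score is around 0.93, and it does not improve significantly with additional features. According to the logs, it peaks at about 0.948 with 194 features, declines slowly after that, and collapses to large negative values for the last two subsets (273 and 274 features).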
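One way to turn the plot into a concrete choice of subset size is to take the smallest number of features whose score lies within a chosen tolerance of the best score. The sketch below is only illustrative: `smallest_k_within` and the tolerance values are hypothetical, and `toy_metrics` merely mimics the `{k: {'avg_score': ...}}` shape returned by `sfs.get_metric_dict()`, filled with a handful of scores copied from the logs in this notebook.

```python
def smallest_k_within(metric_dict, tol):
    """Return the smallest number of features whose avg_score is within
    `tol` of the best avg_score (hypothetical helper, not part of mlxtend)."""
    scores = {k: v["avg_score"] for k, v in metric_dict.items()}
    best = max(scores.values())
    return min(k for k, s in scores.items() if s >= best - tol)


# A few R^2 scores copied from the selection logs in this notebook
toy_metrics = {
    1: {"avg_score": 0.6008605735778773},
    2: {"avg_score": 0.8258189338417044},
    10: {"avg_score": 0.892136862625631},
    20: {"avg_score": 0.9244145086624963},
    194: {"avg_score": 0.9479431431644982},
    273: {"avg_score": -30180713684991.74},
}

k_loose = smallest_k_within(toy_metrics, tol=0.03)   # tolerate a 0.03 drop in R^2
k_strict = smallest_k_within(toy_metrics, tol=0.01)  # tolerate only a 0.01 drop
print(k_loose, k_strict)
```

With the real dictionary, the same call could be made as `smallest_k_within(sfs_dict, tol=...)`; the tolerance is a modelling choice rather than something the data dictates.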
reg = LinearRegression()
# Build step forward feature selection with 20 features (k_features=20)
sfs = SFS(
reg,
k_features=20,
forward=True,
floating=False,
scoring="r2",
n_jobs=-1,
verbose=2,
cv=5,
)
# Perform SFS (forward selection; floating=False, so this is not SFFS)
sfs = sfs.fit(x_train, y_train)
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 4.5s [Parallel(n_jobs=-1)]: Done 130 tasks | elapsed: 8.1s [Parallel(n_jobs=-1)]: Done 274 out of 274 | elapsed: 12.2s finished [2024-10-30 16:18:10] Features: 1/20 -- score: 0.6008605735778773[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.2s [Parallel(n_jobs=-1)]: Done 228 tasks | elapsed: 4.0s [Parallel(n_jobs=-1)]: Done 242 out of 273 | elapsed: 4.2s remaining: 0.5s [Parallel(n_jobs=-1)]: Done 273 out of 273 | elapsed: 4.9s finished [2024-10-30 16:18:15] Features: 2/20 -- score: 0.8258189338417044[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 272 out of 272 | elapsed: 4.8s finished [2024-10-30 16:18:20] Features: 3/20 -- score: 0.843163289844291[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 240 out of 271 | elapsed: 4.3s remaining: 0.5s [Parallel(n_jobs=-1)]: Done 271 out of 271 | elapsed: 4.9s finished [2024-10-30 16:18:25] Features: 4/20 -- score: 0.8607242290453196[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 270 out of 270 | elapsed: 4.8s finished [2024-10-30 16:18:30] Features: 5/20 -- score: 0.8673527352506463[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 238 out of 269 | elapsed: 4.2s remaining: 0.5s [Parallel(n_jobs=-1)]: Done 269 out of 269 | elapsed: 4.8s finished [2024-10-30 16:18:35] Features: 6/20 -- score: 0.8733406729210295[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 268 out of 268 | elapsed: 4.8s finished [2024-10-30 16:18:39] Features: 7/20 -- score: 0.8779019854352988[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 236 out of 267 | elapsed: 4.4s remaining: 0.5s [Parallel(n_jobs=-1)]: Done 267 out of 267 | elapsed: 5.1s finished [2024-10-30 16:18:45] Features: 8/20 -- score: 0.8821076936222507[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 266 out of 266 | elapsed: 5.1s finished [2024-10-30 16:18:50] Features: 9/20 -- score: 0.8865803385052942[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 234 out of 265 | elapsed: 4.5s remaining: 0.5s [Parallel(n_jobs=-1)]: Done 265 out of 265 | elapsed: 5.1s finished [2024-10-30 16:18:55] Features: 10/20 -- score: 0.892136862625631[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 264 out of 264 | elapsed: 5.1s finished [2024-10-30 16:19:00] Features: 11/20 -- score: 0.8997588358866588[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 232 out of 263 | elapsed: 4.5s remaining: 0.5s [Parallel(n_jobs=-1)]: Done 263 out of 263 | elapsed: 5.1s finished [2024-10-30 16:19:06] Features: 12/20 -- score: 0.9045279013258785[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 262 out of 262 | elapsed: 5.1s finished [2024-10-30 16:19:11] Features: 13/20 -- score: 0.9077032400393492[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 230 out of 261 | elapsed: 4.5s remaining: 0.5s [Parallel(n_jobs=-1)]: Done 261 out of 261 | elapsed: 5.1s finished [2024-10-30 16:19:16] Features: 14/20 -- score: 0.9113184768449554[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 260 out of 260 | elapsed: 5.1s finished [2024-10-30 16:19:21] Features: 15/20 -- score: 0.9144300582982374[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 228 out of 259 | elapsed: 4.6s remaining: 0.5s [Parallel(n_jobs=-1)]: Done 259 out of 259 | elapsed: 5.2s finished [2024-10-30 16:19:26] Features: 16/20 -- score: 0.9165568997201585[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 258 out of 258 | elapsed: 5.2s finished [2024-10-30 16:19:32] Features: 17/20 -- score: 0.9187094547000717[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 226 out of 257 | elapsed: 5.0s remaining: 0.6s [Parallel(n_jobs=-1)]: Done 257 out of 257 | elapsed: 5.5s finished [2024-10-30 16:19:37] Features: 18/20 -- score: 0.9210414399636491[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. 
[Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 256 out of 256 | elapsed: 5.4s finished [2024-10-30 16:19:43] Features: 19/20 -- score: 0.9231785068400782[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers. [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 0.3s [Parallel(n_jobs=-1)]: Done 255 out of 255 | elapsed: 5.7s finished [2024-10-30 16:19:49] Features: 20/20 -- score: 0.9244145086624963
# Most important features
feat_cols = list(sfs.k_feature_idx_)
print(feat_cols)
[0, 2, 4, 5, 6, 8, 10, 13, 19, 20, 24, 25, 27, 37, 40, 43, 44, 52, 53, 266]
x_train.columns[feat_cols]
Index(['Year', 'Engine', 'Kilometers_Driven_log', 'Power_log', 'Location_Bangalore', 'Location_Coimbatore', 'Location_Hyderabad', 'Location_Kolkata', 'Fuel_Type_Petrol', 'Transmission_Manual', 'Brand_Audi', 'Brand_BMW', 'Brand_Chevrolet', 'Brand_Jaguar', 'Brand_Land', 'Brand_Mercedes-Benz', 'Brand_Mini', 'Brand_Tata', 'Brand_Toyota', 'Model_Xylo'], dtype='object')
11.2 Retraining the model¶
New independent train and test sets with the 20 variables selected in the sequential feature selection
x_train_final = x_train[x_train.columns[feat_cols]]
x_test_final = x_test[x_train.columns[feat_cols]]
#check shape
x_train_final.shape
(4213, 20)
#check shape
x_test_final.shape
(1806, 20)
# Fitting linear model
lin_reg_model2 = LinearRegression()
lin_reg_model2.fit(x_train_final, y_train)
# let us check the coefficients and intercept of the model
coef_df = pd.DataFrame(
np.append(lin_reg_model2.coef_.flatten(), lin_reg_model2.intercept_),
index=x_train_final.columns.tolist() + ["Intercept"],
columns=["Coefficients"],
)
coef_df
 | Coefficients |
---|---|
Year | 0.115198 |
Engine | 0.000226 |
Kilometers_Driven_log | -0.074748 |
Power_log | 0.778158 |
Location_Bangalore | 0.183211 |
Location_Coimbatore | 0.156587 |
Location_Hyderabad | 0.180627 |
Location_Kolkata | -0.192565 |
Fuel_Type_Petrol | -0.198387 |
Transmission_Manual | -0.107517 |
Brand_Audi | 0.571186 |
Brand_BMW | 0.519360 |
Brand_Chevrolet | -0.278779 |
Brand_Jaguar | 0.607397 |
Brand_Land | 0.887929 |
Brand_Mercedes-Benz | 0.580511 |
Brand_Mini | 0.913336 |
Brand_Tata | -0.406731 |
Brand_Toyota | 0.251947 |
Model_Xylo | -0.500064 |
Intercept | -233.238329 |
11.2.1 Model Performance¶
# R^2 train set
lin_reg_model2.score(x_train_final, y_train)
0.9261017957003274
# R^2 test set
lin_reg_model2.score(x_test_final, y_test)
0.9318611297475975
# Model performance on train set
model_perf(lin_reg_model2, x_train_final, y_train)
{'RMSE': 0.2371177477741038, 'MAE': 0.17574233711503315, 'R^2': 0.9261017957003274, 'Adjusted R^2': 0.9257492279317221}
# Model performance on test set
model_perf(lin_reg_model2, x_test_final, y_test)
{'RMSE': 0.22906136480762884, 'MAE': 0.1709946146974173, 'R^2': 0.9318611297475975, 'Adjusted R^2': 0.9310976690164782}
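The adjusted R^2 values above can be cross-checked by hand. The sketch below assumes the usual definition, 1 - (1 - R^2)(n - 1)/(n - p - 1), with n rows and p predictors. `model_perf` is defined earlier in the notebook and its implementation is not shown in this section, so `adjusted_r2` is a hypothetical stand-in rather than the notebook's own function.

```python
def adjusted_r2(r2, n_rows, n_features):
    """Adjusted R^2 under the standard definition (assumed, not taken
    from the notebook's model_perf implementation)."""
    return 1 - (1 - r2) * (n_rows - 1) / (n_rows - n_features - 1)


# R^2 values and shapes copied from the outputs above: (4213, 20) and (1806, 20)
train_adj = adjusted_r2(0.9261017957003274, n_rows=4213, n_features=20)
test_adj = adjusted_r2(0.9318611297475975, n_rows=1806, n_features=20)
print(round(train_adj, 6), round(test_adj, 6))
```

Under this definition the results agree with the reported adjusted R^2 figures to several decimal places. With only 20 predictors and thousands of rows, the adjustment is small.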
Observations¶
- The new regression model has 20 features, about 7% of the 274 columns used by the original regression model
- The performance of the new model is very close to that of the original model
11.3 Coefficient Interpretation¶
11.3.1 Positive impact¶
This is the list of coefficients with a positive impact on prices. Among them are Year, Engine and Power_log. An increase in any of these leads to an increase in the price.
coef_df[coef_df['Coefficients']>0]
 | Coefficients |
---|---|
Year | 0.115198 |
Engine | 0.000226 |
Power_log | 0.778158 |
Location_Bangalore | 0.183211 |
Location_Coimbatore | 0.156587 |
Location_Hyderabad | 0.180627 |
Brand_Audi | 0.571186 |
Brand_BMW | 0.519360 |
Brand_Jaguar | 0.607397 |
Brand_Land | 0.887929 |
Brand_Mercedes-Benz | 0.580511 |
Brand_Mini | 0.913336 |
Brand_Toyota | 0.251947 |
11.3.2 Negative impact¶
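This is the list of coefficients with a negative impact on prices. Among them are Kilometers_Driven_log, Fuel_Type_Petrol and Transmission_Manual. An increase in Kilometers_Driven_log, or the presence of any of the listed categories, leads to a decrease in the price. The Intercept also appears in the list because it is negative, but it is not a feature.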
coef_df[coef_df['Coefficients']<0]
 | Coefficients |
---|---|
Kilometers_Driven_log | -0.074748 |
Location_Kolkata | -0.192565 |
Fuel_Type_Petrol | -0.198387 |
Transmission_Manual | -0.107517 |
Brand_Chevrolet | -0.278779 |
Brand_Tata | -0.406731 |
Model_Xylo | -0.500064 |
Intercept | -233.238329 |
11.3.3 Observations¶
The impact of the different features on Price is similar to that in the original regression model
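The coefficients can be read more concretely if the target is the natural logarithm of Price. That is an assumption here: this section does not show how `y_train` was built, although the `_log` feature names and the magnitude of the errors hint at it. Under that assumption, a coefficient b on a dummy variable (or a one-unit step in a numeric feature such as Year) corresponds to a percentage change in price of (exp(b) - 1) * 100. The helper `pct_effect` below is hypothetical and is only a sketch of that conversion.

```python
import math


def pct_effect(coef):
    """Percentage change in price for a one-unit increase in a feature,
    ASSUMING the model predicts log(price). Hypothetical helper."""
    return (math.exp(coef) - 1) * 100


# Coefficients copied from the coefficient table above
coefs = {
    "Brand_Mini": 0.913336,
    "Transmission_Manual": -0.107517,
    "Year": 0.115198,
}

effects = {name: pct_effect(b) for name, b in coefs.items()}
for name, pct in effects.items():
    print(f"{name}: {pct:+.1f}%")
```

For features that are themselves log-transformed, such as Power_log, the coefficient would instead read as an approximate elasticity under the same assumption: a 1% increase in power goes with roughly a 0.78% increase in price.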
12 Actionable Insights & Recommendations¶
Cars4U should focus on:
- Recent cars (later manufacturing year)
- Cars with high engine power
- Diesel and Electric cars, which are valued more than other fuel types
- Trading in specific locations: Bangalore, Chennai, Coimbatore and Hyderabad
- Luxury brands and models where possible (Lamborghini, Jaguar, Porsche, etc.)
Cars4U should avoid:
- Cars with a large number of kilometers driven
- Trading in Delhi, Kochi, Kolkata and Mumbai
- LPG and Petrol cars
- Manual transmission cars
- Cars with two or more previous owners
- Economy brands and models (Datsun, Renault, Honda, Mahindra)