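Load data -> Preprocess data -> Feature engineering -> Split dataset -> Modeling -> Evaluation
Pipeline approach
Load data -> Preprocess data -> Split dataset -> Build pipeline (feature engineering, modeling) -> Evaluation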
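To make the pipeline flow concrete, here is a minimal sketch on synthetic data. The `make_regression` data, the choice of `StandardScaler` as the feature-engineering step, and `LinearRegression` as the model are all assumptions for illustration, not the setup used in this post. The point is only the ordering: the split happens first, and the transformer is fitted inside the pipeline on the training rows alone.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a loaded, cleaned dataset (assumption, not the bike-share data).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Split first, so every transformer in the pipeline is fitted on training rows only.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Feature engineering and modeling live together in one estimator.
pipeline = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("regressor", LinearRegression()),
])
pipeline.fit(X_train, y_train)

# Evaluation: predict() reuses the scaler statistics learned from X_train.
val_r2 = r2_score(y_val, pipeline.predict(X_val))
print(f"validation R2: {val_r2:.3f}")
```

Because the scaler never sees the validation rows during `fit`, the validation score is not contaminated by statistics computed on the held-out data.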
Loading the data
import pandas as pd
import numpy as np

data = pd.read_csv('https://raw.githubusercontent.com/MicrosoftDocs/ml-basics/master/data/daily-bike-share.csv')
data.info()
from sklearn.preprocessing import StandardScaler, OrdinalEncoder, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
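These imports are typically combined into a single `preprocessor` object. The sketch below shows one common way to do that. The toy DataFrame, the column names (`temp`, `hum`, `season`), and the imputation strategies are hypothetical choices for illustration, not necessarily the ones used for the bike-share data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy frame; the columns are illustrative only.
df = pd.DataFrame({
    "temp": [0.3, 0.5, np.nan, 0.7],
    "hum": [0.8, 0.6, 0.7, np.nan],
    "season": [1, 2, 2, 4],
})
numeric_features = ["temp", "hum"]
categorical_features = ["season"]

# Numeric columns: fill missing values with the median, then standardize.
numeric_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])

# Categorical columns: fill missing values with the mode, then one-hot encode.
categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

# Route each group of columns to its own transformer.
preprocessor = ColumnTransformer(transformers=[
    ("num", numeric_transformer, numeric_features),
    ("cat", categorical_transformer, categorical_features),
])

transformed = preprocessor.fit_transform(df)
print(transformed.shape)
```

A `preprocessor` built this way can be dropped in as the first step of a `Pipeline`, with any regressor as the second step.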
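from sklearn.metrics import r2_score

# `preprocessor`, `regressors`, and the X_train/X_val/y_train/y_val split
# are assumed to be defined earlier.
# regressors = [pipe_rf, pipe_dt]
for regressor in regressors:
    # Chain preprocessing and the model so both are fitted on the training data only.
    pipeline = Pipeline(steps=[
        ('preprocessor', preprocessor),
        ('regressor', regressor),
    ])
    model = pipeline.fit(X_train, y_train)
    predictions = model.predict(X_val)
    print(regressor)
    # r2_score takes (y_true, y_pred); the scores listed after this block were
    # computed with the two swapped, so a rerun will give different values.
    print(f'Model r2 score:{r2_score(y_val, predictions)}')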
RandomForestRegressor()
Model r2 score:0.7447806201844671
DecisionTreeRegressor()
Model r2 score:0.5885371412997458
LinearRegression()
Model r2 score:0.5703227526319388
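When comparing scores like these, it helps to remember what R2 measures and that it is not symmetric in its arguments. scikit-learn documents the signature as `r2_score(y_true, y_pred)`. The sketch below uses a hand-written helper `r2` (hypothetical, written only to mirror the textbook definition) and made-up numbers to show that swapping the arguments changes the result.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: how far predictions are from the targets.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the targets around their own mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [4.0, 5.0, 6.0, 7.0]

score_correct = r2(y_true, y_pred)  # targets first, predictions second
score_swapped = r2(y_pred, y_true)  # arguments reversed

print(score_correct, score_swapped)
```

The residual term is the same either way, but the denominator uses the variance of whichever array is passed first, so the two calls disagree whenever the predictions are more or less spread out than the targets.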