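Basic Requirements for Data Visualization
Choose a graph that fits the purpose
Line graphs, bar graphs, scatter plots, box plots, etc.
Choose a tool that fits the environment
Code-based: R, Python
Program-based: Excel, Power BI, Tableau, etc.
Use colors and shapes that fit the context (domain)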
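As a rough illustration of matching the graph to the purpose, the sketch below draws the four chart types listed above on one figure. The values are hypothetical, made up only for illustration.

- A line graph shows a trend.
- A bar graph compares categories.
- A scatter plot shows the relationship between two numeric variables.
- A box plot shows a distribution.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical example values, for illustration only
days = np.arange(1, 8)
visitors = [120, 135, 128, 160, 172, 190, 185]
categories = ["A", "B", "C"]
sales = [30, 45, 25]
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2 * x + rng.normal(scale=0.5, size=50)
groups = [rng.normal(loc=m, size=30) for m in (0, 1, 2)]

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 8))
axes[0, 0].plot(days, visitors)      # line graph: change along an ordered axis (e.g. time)
axes[0, 0].set_title("Line: trend")
axes[0, 1].bar(categories, sales)    # bar graph: compare amounts across categories
axes[0, 1].set_title("Bar: comparison")
axes[1, 0].scatter(x, y)             # scatter plot: relationship between two numeric variables
axes[1, 0].set_title("Scatter: relationship")
axes[1, 1].boxplot(groups)           # box plot: distribution (median, quartiles, outliers)
axes[1, 1].set_title("Box: distribution")
fig.tight_layout()
plt.show()
```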
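Python Visualization Library: Matplotlib
Works with both structured (tabular) data and image data
Pyplot API: build a plot by calling the individual functions of the pyplot module
Object-oriented API: work directly with Matplotlib's objects (Figure, Axes)
The code gets longer and more complex
Fine-grained plot options are easier to adjust
In practice, the two APIs are usually mixed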
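Here is a minimal sketch of mixing the two APIs, with placeholder values:

- `plt.subplots()` is a pyplot function, but it returns the object-oriented `Figure` and `Axes`.
- Pyplot calls such as `plt.plot` and `plt.title` act on the "current" Axes, which is that same object.
- `ax.plot` and `ax.set_xlabel` address the Axes explicitly.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))   # pyplot function that returns OO objects
print(plt.gca() is ax)                    # True: pyplot's current Axes is the same object

plt.plot([1, 2, 3], [2, 4, 3], label="pyplot call")   # pyplot: draws on the current Axes
ax.plot([1, 2, 3], [1, 3, 2], label="Axes method")    # OO API: draws on an explicit Axes
plt.title("Set via pyplot")                            # same effect as ax.set_title(...)
ax.set_xlabel("x (set via Axes)")
ax.legend()
plt.show()
```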
Seaborn
Built on top of Matplotlib (depends on it)
More concise code than Matplotlib
Makes statistical graphs easier to produce
Fine-grained options are still adjusted through Matplotlib
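To illustrate this division of labor, the sketch below draws a bar chart with one Seaborn call. It then adjusts the title, axis label and y-limit through the Matplotlib `Axes` object that Seaborn drew on. It assumes seaborn and pandas are installed, and the small DataFrame is hypothetical example data.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical example data
df = pd.DataFrame({"fruit": ["apple", "banana", "cherry"], "count": [12, 7, 15]})

fig, ax = plt.subplots(figsize=(6, 4))
sns.barplot(x="fruit", y="count", data=df, ax=ax)   # Seaborn: one concise call draws the chart

# Fine-grained options: plain Matplotlib Axes methods
ax.set_title("Fruit count")
ax.set_ylabel("Count")
ax.set_ylim(0, 20)
plt.show()
```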
Loading the libraries
import matplotlib
import seaborn as sns

print("matplotlib ver :", matplotlib.__version__)
print("seaborn ver :", sns.__version__)
matplotlib ver : 3.2.2
seaborn ver : 0.11.2
Visualization test
import matplotlib.pyplot as plt

dates = ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05',
         '2021-01-06', '2021-01-07', '2021-01-08', '2021-01-09', '2021-01-10']
min_temperature = [20.7, 17.9, 18.8, 14.6, 15.8, 15.8, 15.8, 17.4, 21.8, 20.0]
max_temperature = [34.7, 28.9, 31.8, 25.6, 28.8, 21.8, 22.8, 28.4, 30.8, 32.0]

fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(10, 6))
ax.plot(dates, min_temperature, label="Min Temp.")
ax.plot(dates, max_temperature, label="Max Temp.")
ax.legend()
plt.show()
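In the test above the dates are plain strings, so Matplotlib treats them as categorical labels placed at equal spacing. The variant below assumes pandas is available. It converts the strings with `pd.to_datetime` so the x-axis becomes a real date axis, then calls `fig.autofmt_xdate()` to rotate the date labels so they do not overlap.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same values as the test plot above, but with the dates parsed into datetimes
dates = pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05',
                        '2021-01-06', '2021-01-07', '2021-01-08', '2021-01-09', '2021-01-10'])
min_temperature = [20.7, 17.9, 18.8, 14.6, 15.8, 15.8, 15.8, 17.4, 21.8, 20.0]
max_temperature = [34.7, 28.9, 31.8, 25.6, 28.8, 21.8, 22.8, 28.4, 30.8, 32.0]

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(dates, min_temperature, label="Min Temp.")
ax.plot(dates, max_temperature, label="Max Temp.")
ax.set_ylabel("Temperature")
ax.legend()
fig.autofmt_xdate()   # rotate and right-align the date tick labels
plt.show()
```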
Stock data example
!pip install yfinance --upgrade --no-cache-dir
Requirement already satisfied: yfinance in /usr/local/lib/python3.7/dist-packages (0.1.70)
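import yfinance as yf

# Daily AAPL prices from 2019-08-01 up to (not including) 2022-03-23
data = yf.download("AAPL", start="2019-08-01", end="2022-03-23")
# Opening prices; with the yfinance version used here (0.1.70) this is a pandas Series indexed by date.
# Newer yfinance releases may return multi-level columns, in which case this would be a DataFrame.
ts = data['Open']
print(ts.head())
print(type(ts))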
[*********************100%***********************] 1 of 1 completed
Date
2019-08-01 53.474998
2019-08-02 51.382500
2019-08-05 49.497501
2019-08-06 49.077499
2019-08-07 48.852501
Name: Open, dtype: float64
<class 'pandas.core.series.Series'>
pyplot module (Pyplot API)
import matplotlib.pyplot as plt

plt.plot(ts)
plt.title("Stock Market of AAPL")
plt.xlabel("Date")
plt.ylabel("Open Price")
plt.show()
Object-oriented API
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(ts)
ax.set_title("Stock Market of AAPL")
ax.set_xlabel("Date")
ax.set_ylabel("Open Price")
plt.show()
Bar graph: Matplotlib
import calendar
import matplotlib.pyplot as plt

month_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
sold_list = [300, 400, 550, 900, 600, 960, 900, 910, 800, 700, 550, 450]

fig, ax = plt.subplots(figsize=(10, 6))
barplots = ax.bar(month_list, sold_list)   # BarContainer holding one Rectangle per bar
print("barplots :", barplots)

# Write each bar's height just above the bar, centered horizontally
for plot in barplots:
    print(plot)
    height = plot.get_height()
    ax.text(plot.get_x() + plot.get_width()/2, height, height, ha='center', va='bottom')

plt.xticks(month_list, calendar.month_name[1:13], rotation=30)   # month names as tick labels
plt.show()
barplots : <BarContainer object of 12 artists>
Rectangle(xy=(0.6, 0), width=0.8, height=300, angle=0)
Rectangle(xy=(1.6, 0), width=0.8, height=400, angle=0)
Rectangle(xy=(2.6, 0), width=0.8, height=550, angle=0)
Rectangle(xy=(3.6, 0), width=0.8, height=900, angle=0)
Rectangle(xy=(4.6, 0), width=0.8, height=600, angle=0)
Rectangle(xy=(5.6, 0), width=0.8, height=960, angle=0)
Rectangle(xy=(6.6, 0), width=0.8, height=900, angle=0)
Rectangle(xy=(7.6, 0), width=0.8, height=910, angle=0)
Rectangle(xy=(8.6, 0), width=0.8, height=800, angle=0)
Rectangle(xy=(9.6, 0), width=0.8, height=700, angle=0)
Rectangle(xy=(10.6, 0), width=0.8, height=550, angle=0)
Rectangle(xy=(11.6, 0), width=0.8, height=450, angle=0)
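Matplotlib 3.4 and later provide `Axes.bar_label`, which places the same value labels that the loop above computes by hand with `get_x`, `get_width` and `get_height`. The Matplotlib version shown in this post (3.2.2) predates it, so this sketch assumes an upgraded Matplotlib.

```python
import calendar
import matplotlib.pyplot as plt

month_list = list(range(1, 13))
sold_list = [300, 400, 550, 900, 600, 960, 900, 910, 800, 700, 550, 450]

fig, ax = plt.subplots(figsize=(10, 6))
barplots = ax.bar(month_list, sold_list)            # returns a BarContainer
labels = ax.bar_label(barplots, label_type="edge")  # one Text per bar, at the bar's top edge
ax.set_xticks(month_list)
ax.set_xticklabels(calendar.month_name[1:13], rotation=30)
plt.show()
```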
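Seaborn
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")   # example dataset that seaborn fetches online (and caches)

fig, ax = plt.subplots()
sns.countplot(x="day", data=tips, ax=ax)   # number of rows for each day
plt.show()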
print(tips['day'].value_counts().index)
print(tips['day'].value_counts().values)
print()
print(tips['day'].value_counts(ascending=True))
CategoricalIndex(['Sat', 'Sun', 'Thur', 'Fri'], categories=['Thur', 'Fri', 'Sat', 'Sun'], ordered=False, dtype='category')
[87 76 62 19]
Fri 19
Thur 62
Sun 76
Sat 87
Name: day, dtype: int64
fig, ax = plt.subplots()
# Order the bars by frequency (most frequent day first)
ax = sns.countplot(x="day", data=tips, order=tips['day'].value_counts().index, alpha=0.5, ax=ax)
for plot in ax.patches:
    print(plot)
    height = plot.get_height()
    ax.text(plot.get_x() + plot.get_width()/2, height, height, ha='center', va='bottom')
ax.set_ylim(-1, 100)
plt.show()
Rectangle(xy=(-0.4, 0), width=0.8, height=87, angle=0)
Rectangle(xy=(0.6, 0), width=0.8, height=76, angle=0)
Rectangle(xy=(1.6, 0), width=0.8, height=62, angle=0)
Rectangle(xy=(2.6, 0), width=0.8, height=19, angle=0)
Scatter plot: Matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")
print(tips.info())   # info() prints the summary itself and returns None, hence the trailing "None"

x = tips['total_bill']
y = tips['tip']

fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(x, y)
ax.set_title('Scatter of tips')
ax.set_xlabel('Total Bill')
ax.set_ylabel('Tip')
plt.show()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 total_bill 244 non-null float64
1 tip 244 non-null float64
2 sex 244 non-null category
3 smoker 244 non-null category
4 day 244 non-null category
5 time 244 non-null category
6 size 244 non-null int64
dtypes: category(4), float64(2), int64(1)
memory usage: 7.4 KB
None
# Map each sex to a color, then draw one scatter per group so each group gets a legend entry
tips['sex_color'] = tips['sex'].map({'Male': '#4663F5', 'Female': '#FF5F2E'})

fig, ax = plt.subplots(figsize=(10, 6))
for label, data in tips.groupby('sex'):
    ax.scatter(data['total_bill'], data['tip'], label=label, color=data['sex_color'], alpha=0.5)
ax.set_xlabel('Total Bill')
ax.set_ylabel('Tip')
ax.legend()
plt.show()
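The loop above calls `scatter` once per group, and that is what gives the legend one entry per group. A slightly simpler variant looks the color up once per group from a dict instead of storing it in a column. It is sketched below with a small hypothetical DataFrame rather than the tips dataset. Note that `groupby` on a plain string column sorts the group keys, so the legend here lists "Female" before "Male".

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical example data (not the tips dataset)
df = pd.DataFrame({
    "total_bill": [10.0, 20.5, 15.2, 30.1, 25.4, 12.3],
    "tip": [1.5, 3.0, 2.0, 5.0, 3.5, 2.2],
    "sex": ["Male", "Female", "Male", "Female", "Male", "Female"],
})
palette = {"Male": "#4663F5", "Female": "#FF5F2E"}  # same colors as the example above

fig, ax = plt.subplots(figsize=(10, 6))
for label, group in df.groupby("sex"):  # one scatter call, and one legend entry, per group
    ax.scatter(group["total_bill"], group["tip"], label=label, color=palette[label], alpha=0.5)
ax.set_xlabel("Total Bill")
ax.set_ylabel("Tip")
ax.legend()
plt.show()
```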
Seaborn
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")

fig, ax = plt.subplots(figsize=(10, 6))
# hue colors the points by sex and adds the legend automatically
sns.scatterplot(x='total_bill', y='tip', hue='sex', data=tips, ax=ax)
ax.legend()
plt.show()
Showing two graphs side by side
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15, 5))
sns.regplot(x='total_bill', y='tip', data=tips, ax=ax[0], fit_reg=True)    # scatter + fitted regression line
ax[0].set_title("Scatterplot with Regression Line")
sns.regplot(x='total_bill', y='tip', data=tips, ax=ax[1], fit_reg=False)   # scatter only
ax[1].set_title("Scatterplot without Regression Line")
plt.show()
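With `fit_reg=True`, `regplot` draws a linear least-squares fit plus a confidence band. The sketch below reproduces only the line itself, using `np.polyfit` with degree 1. It uses small hypothetical data that lies exactly on y = 0.5x + 1, so the recovered slope and intercept are easy to check. This illustrates the idea and is not Seaborn's internal code.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data lying exactly on y = 0.5x + 1
x = np.array([10.0, 15.0, 20.0, 25.0, 30.0])
y = 0.5 * x + 1

slope, intercept = np.polyfit(x, y, deg=1)   # least-squares fit of y = slope*x + intercept

fig, ax = plt.subplots()
ax.scatter(x, y)
xs = np.linspace(x.min(), x.max(), 100)
ax.plot(xs, slope * xs + intercept, color="C1")   # the fitted regression line
ax.set_title(f"y = {slope:.2f}x + {intercept:.2f}")
plt.show()
```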
Comprehensive example
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from matplotlib.ticker import (MultipleLocator, AutoMinorLocator, FuncFormatter)

# Format y tick values as dollar amounts with two decimals
def major_formatter(x, pos):
    return "$ %.2f" % x

formatter = FuncFormatter(major_formatter)

tips = sns.load_dataset("tips")
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(20, 6))

# Left: default bar graph with the mean total bill per day written above each bar
# (ci=None hides the error bars in seaborn 0.11; seaborn >= 0.12 uses errorbar=None instead)
ax0 = sns.barplot(x="day", y='total_bill', data=tips, ax=ax[0], ci=None, alpha=0.85)
for p in ax0.patches:
    height = np.round(p.get_height(), 2)
    ax0.text(p.get_x() + p.get_width()/2., height+1, height, ha='center', size=12)
ax0.set_ylim(-3, 30)
ax0.set_title("Basic Bar Graph")

# Right: gray bars, with only the day that has the highest mean highlighted
ax1 = sns.barplot(x="day", y="total_bill", data=tips, ax=ax[1], ci=None, color='lightgray', alpha=0.85, zorder=2)

group_mean = tips.groupby(['day'])['total_bill'].agg('mean')
h_day = group_mean.sort_values(ascending=False).index[0]
h_mean = group_mean.sort_values(ascending=False).values[0]

for plot in ax1.patches:
    height = np.round(plot.get_height(), 2)
    fontweight = "normal"
    color = "k"
    # Compare with a tolerance: the bar height and the groupby mean may differ in the last float digits
    if np.isclose(plot.get_height(), h_mean):
        fontweight = "bold"
        color = "darkred"
        plot.set_facecolor(color)
        plot.set_edgecolor("black")
    ax1.text(plot.get_x() + plot.get_width()/2., height+1, height, ha='center', size=12, fontweight=fontweight, color=color)

# Remove all four spines (the left spine is first moved outward by 20 points)
ax1.spines['top'].set_visible(False)
ax1.spines['left'].set_position(("outward", 20))
ax1.spines['left'].set_visible(False)
ax1.spines['right'].set_visible(False)
ax1.spines['bottom'].set_visible(False)

ax1.set_ylim(-1, 30)
ax1.set_title("Ideal Bar Graph", size=16)

# Major ticks every 10 (formatted as dollars), minor ticks every 5
ax1.yaxis.set_major_locator(MultipleLocator(10))
ax1.yaxis.set_major_formatter(formatter)
ax1.yaxis.set_minor_locator(MultipleLocator(5))

ax1.set_ylabel("Avg. Total Bill($)", fontsize=14)
ax1.set_xlabel("Weekday", fontsize=14)
ax1.grid(axis="y", which="major", color="lightgray")
ax1.grid(axis="y", which="minor", ls=":")

# Highlight the tick label of the top day as well
for xtick in ax1.get_xticklabels():
    if xtick.get_text() == h_day:
        xtick.set_color("darkred")
        xtick.set_fontweight("demibold")
ax1.set_xticklabels(['Thursday', 'Friday', 'Saturday', 'Sunday'], size=12)
plt.show()
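The tick customization in the comprehensive example relies on two `matplotlib.ticker` pieces:

- `MultipleLocator(base)` places ticks at multiples of `base`.
- `FuncFormatter` turns each tick value into a label by calling a function with `(x, pos)`.

The small check below shows what they produce on their own. It also shows that `Series.idxmax()` is a shorter way to find the day to highlight than sorting and taking the first index. The values in `group_mean` here are hypothetical, not the tips means.

```python
import pandas as pd
from matplotlib.ticker import FuncFormatter, MultipleLocator

def major_formatter(x, pos):
    # Same format string as in the comprehensive example
    return "$ %.2f" % x

formatter = FuncFormatter(major_formatter)
print(formatter(10, 0))   # $ 10.00

locator = MultipleLocator(10)
ticks = locator.tick_values(0, 30)   # multiples of 10 covering [0, 30]; may include extra ticks just outside
print(ticks)

# Hypothetical group means; idxmax returns the index label of the largest value
group_mean = pd.Series({"A": 10.0, "B": 25.5, "C": 18.2})
print(group_mean.idxmax())   # B
```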