TYC

‘A chemical engineer enthusiastic in data science’

Analysis of Housing Price in Taichung

Taichung is my hometown in Taiwan, and it’s a lovely city. Since buying a house is a milestone in most people’s lives, I aim to investigate the prices and trends of the current housing market in Taichung. Specifically, I use the reported trading records of houses in 2019 for the analysis.

Data pre-processing

Initialization of environment and data

First, we load the libraries used in this study and the original data of reported trading records. Then, we drop the incomplete rows from the data frame.

library(ggplot2)
library(plyr)   # load plyr before dplyr so that dplyr's verbs are not masked
library(dplyr)
library(gridExtra)
thp <- read.csv('./taichung_house_109_wo_address.csv')
thp <- na.omit(thp)
head(thp)
##   district        target land_size_meter_squared   category transaction_date
## 3  Central building_land                   10.30 commercial            10805
## 4  Central building_land                   12.75 commercial            10801
## 5  Central building_land                   45.00 commercial            10712
## 6  Central building_land                  841.60 commercial            10804
## 7  Central building_land                   90.00 commercial            10805
## 8  Central building_land                   17.90 commercial            10801
##       building_type  building_use property_age building_size_meter_squared
## 3             store         other           35                      129.51
## 4 condo_wo_elevator   residential           56                       62.16
## 5  condo_w_elevator   residential           40                      258.77
## 6  condo_w_elevator         other           29                     7601.57
## 7  condo_w_elevator res_com_mixed           40                      542.37
## 8  commerical_space    commercial           29                      207.36
##   price_total price_per_meter_squared parking_size_meter_squared parking_total
## 3     1500000                   11582                          0             0
## 4      777350                   12506                          0             0
## 5     4000000                   15458                          0             0
## 6   117800000                   15497                          0             0
## 7    10000000                   18438                          0             0
## 8     4000000                   19290                          0             0
##   latitude longtitude
## 3 24.14085   120.6838
## 4 24.13721   120.6828
## 5 24.14325   120.6767
## 6 24.13984   120.6818
## 7 24.13940   120.6783
## 8 24.14603   120.6774
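
The row labels in the output above start at 3, presumably because the first two rows contained missing values and were removed. The following minimal sketch, which uses a made-up toy data frame rather than the real records, shows that na.omit() drops every row with at least one NA and keeps the original row labels of the remaining rows.

```r
# Toy data frame (hypothetical values) with missing entries in rows 1 and 2.
toy <- data.frame(
  district     = c("Central", "Central", "Central", "North"),
  price_total  = c(NA, 2000000, 1500000, 777350),
  property_age = c(10, NA, 35, 56)
)

# na.omit() removes every row that contains at least one NA.
toy_complete <- na.omit(toy)

print(toy_complete)
# The surviving rows keep their original row labels ("3" and "4").
print(rownames(toy_complete))
```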

Data cleaning

From the summary, we find that some records have no specific category or usage. Besides, some trades include only land or a parking lot. Since we would like to focus on housing prices, we drop these records as well.

summary(thp)
##     district                      target      land_size_meter_squared
##  Beitun :3672   building             :   54   Min.   :   0.00        
##  Xitun  :3410   building_land        : 8604   1st Qu.:  13.47        
##  North  :2361   building_land_parking:11367   Median :  21.15        
##  Nantun :2183   land                 :    0   Mean   :  38.07        
##  Dali   :1841   parking_lot          :    0   3rd Qu.:  34.50        
##  South  :1596                                 Max.   :4948.00        
##  (Other):4962                                                        
##          category     transaction_date           building_type  
##              :  836   Min.   :10205    apartment        :11136  
##  agricultural:   65   1st Qu.:10712    single_house     : 3377  
##  commercial  : 4161   Median :10803    condo_w_elevator : 2139  
##  industrial  :   51   Mean   :10758    studio           : 1428  
##  other       :  380   3rd Qu.:10805    condo_wo_elevator: 1130  
##  residential :14532   Max.   :10809    store            :  467  
##                                        (Other)          :  348  
##         building_use    property_age   building_size_meter_squared
##  other        :11253   Min.   : 0.00   Min.   :   0.07            
##  residential  : 7622   1st Qu.: 1.00   1st Qu.:  99.02            
##  res_com_mixed:  631   Median :13.00   Median : 138.94            
##  commercial   :  460   Mean   :15.33   Mean   : 158.09            
##               :   23   3rd Qu.:26.00   3rd Qu.: 186.19            
##  industrial   :   14   Max.   :84.00   Max.   :7667.26            
##  (Other)      :   22                                              
##   price_total        price_per_meter_squared parking_size_meter_squared
##  Min.   :0.000e+00   Min.   :      0         Min.   :  0.00            
##  1st Qu.:5.450e+06   1st Qu.:  48504         1st Qu.:  0.00            
##  Median :8.220e+06   Median :  62357         Median :  0.00            
##  Mean   :1.140e+07   Mean   :  69000         Mean   : 15.21            
##  3rd Qu.:1.250e+07   3rd Qu.:  80623         3rd Qu.: 28.36            
##  Max.   :1.448e+09   Max.   :8203988         Max.   :470.70            
##                                                                        
##  parking_total         latitude       longtitude   
##  Min.   :       0   Min.   :24.05   Min.   :120.6  
##  1st Qu.:       0   1st Qu.:24.13   1st Qu.:120.6  
##  Median :       0   Median :24.16   Median :120.7  
##  Mean   :  436610   Mean   :24.15   Mean   :120.7  
##  3rd Qu.:  750000   3rd Qu.:24.17   3rd Qu.:120.7  
##  Max.   :18400000   Max.   :24.28   Max.   :120.8  
## 
thp_build <- subset(thp, (thp$target != 'land' & thp$target != 'parking_lot' 
& thp$category != "" & thp$building_use !=""))
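
One side effect of filtering factor columns is that levels with no remaining rows are still listed with a count of 0 in later summaries. The sketch below uses a hypothetical toy data frame to show this, and how droplevels() could be applied if those empty levels are not wanted. The original analysis does not do this; it is only an optional tidy-up.

```r
# Toy data (hypothetical values); stringsAsFactors = TRUE mimics the factor
# columns seen in the summary output.
toy <- data.frame(
  target   = c("building_land", "land", "parking_lot", "building_land_parking"),
  category = c("residential", "residential", "", "commercial"),
  stringsAsFactors = TRUE
)

# Same kind of filter as above: drop land-only and parking-only trades and
# rows with an empty category.
kept <- subset(toy, target != "land" & target != "parking_lot" & category != "")

# "land" and "parking_lot" are still listed, with a count of 0.
print(table(kept$target))

# droplevels() removes the factor levels that no longer occur.
kept <- droplevels(kept)
print(table(kept$target))
```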

Then, we convert the unit from m² to one more commonly used in Taiwan, the pyeong (ping, 坪), which equals about 3.3 m². In this work, all prices are in New Taiwan Dollars (NTD). Here, we see that the maximum NTD/pyeong is around 27 million, which is an absolutely unreasonable price in Taichung. Even in Taipei, the price is around 3 million NTD/pyeong for the most luxurious houses.

thp_build <- transform(thp_build, price_per_pyeong = 3.3*price_per_meter_squared)
summary(thp_build$price_per_pyeong)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##        0   161026   207205   229131   267283 27073160

Therefore, we drop the records that exceed 3 million NTD/pyeong. After that, the result seems more reasonable.

thp_build <- thp_build[which(thp_build$price_per_pyeong < 3000000),]
summary(thp_build)
##     district                      target      land_size_meter_squared
##  Beitun :3667   building             :    0   Min.   :   0.02        
##  Xitun  :3400   building_land        : 8049   1st Qu.:  13.23        
##  North  :2349   building_land_parking:11111   Median :  20.57        
##  Nantun :2179   land                 :    0   Mean   :  35.35        
##  South  :1593   parking_lot          :    0   3rd Qu.:  32.52        
##  Dali   :1405                                 Max.   :3826.00        
##  (Other):4567                                                        
##          category     transaction_date           building_type  
##              :    0   Min.   :10205    apartment        :10914  
##  agricultural:   63   1st Qu.:10712    single_house     : 2904  
##  commercial  : 4160   Median :10803    condo_w_elevator : 2056  
##  industrial  :   51   Mean   :10758    studio           : 1410  
##  other       :  378   3rd Qu.:10805    condo_wo_elevator: 1098  
##  residential :14508   Max.   :10809    store            :  458  
##                                        (Other)          :  320  
##         building_use    property_age   building_size_meter_squared
##  other        :10720   Min.   : 0.00   Min.   :   0.07            
##  residential  : 7387   1st Qu.: 1.00   1st Qu.:  99.14            
##  res_com_mixed:  565   Median :13.00   Median : 140.01            
##  commercial   :  455   Mean   :15.13   Mean   : 158.62            
##  industrial   :   14   3rd Qu.:25.00   3rd Qu.: 187.00            
##  res_ind_mixed:   14   Max.   :84.00   Max.   :7667.26            
##  (Other)      :    5                                              
##   price_total        price_per_meter_squared parking_size_meter_squared
##  Min.   :        0   Min.   :     0          Min.   :  0.00            
##  1st Qu.:  5480000   1st Qu.: 48793          1st Qu.:  0.00            
##  Median :  8320000   Median : 62778          Median :  0.00            
##  Mean   : 11363156   Mean   : 68582          Mean   : 15.43            
##  3rd Qu.: 12600000   3rd Qu.: 80980          3rd Qu.: 28.42            
##  Max.   :682469956   Max.   :887281          Max.   :470.70            
##                                                                        
##  parking_total         latitude       longtitude    price_per_pyeong 
##  Min.   :       0   Min.   :24.07   Min.   :120.6   Min.   :      0  
##  1st Qu.:       0   1st Qu.:24.14   1st Qu.:120.6   1st Qu.: 161016  
##  Median :       0   Median :24.16   Median :120.7   Median : 207166  
##  Mean   :  443806   Mean   :24.16   Mean   :120.7   Mean   : 226322  
##  3rd Qu.:  800000   3rd Qu.:24.17   3rd Qu.:120.7   3rd Qu.: 267234  
##  Max.   :18400000   Max.   :24.28   Max.   :120.8   Max.   :2928027  
## 

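Then, we convert transaction_date to a date format. The values are year-month codes in the Republic of China calendar, so 10805 means May of year 108, that is, May 2019. Before the conversion, since most of the data span 2018-10 to 2019-09, we drop the records before October 2018.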

thp_build <- thp_build[which(thp_build$transaction_date >= 10710),]

year <- as.character(thp_build$transaction_date %/% 100 + 1911)
month <- thp_build$transaction_date %% 100
month <- mapvalues(month,from = c(1,2,3,4,5,6,7,8,9,10,11,12),
to = c("01","02","03","04","05","06","07","08","09","10","11","12"))
new_date <- as.Date(paste(year,month,"01",sep="-"))

thp_build <- transform(thp_build, new_transaction_date = new_date)
summary(thp_build)
##     district                      target     land_size_meter_squared
##  Beitun :2950   building             :   0   Min.   :   0.02        
##  Xitun  :2659   building_land        :7747   1st Qu.:  13.36        
##  Nantun :1910   building_land_parking:8319   Median :  21.60        
##  North  :1866   land                 :   0   Mean   :  37.58        
##  South  :1446   parking_lot          :   0   3rd Qu.:  34.78        
##  Taiping:1185                                Max.   :3826.00        
##  (Other):4050                                                       
##          category     transaction_date           building_type 
##              :    0   Min.   :10710    apartment        :8199  
##  agricultural:   61   1st Qu.:10801    single_house     :2734  
##  commercial  : 3084   Median :10804    condo_w_elevator :2023  
##  industrial  :   51   Mean   :10791    studio           :1308  
##  other       :  344   3rd Qu.:10806    condo_wo_elevator:1090  
##  residential :12526   Max.   :10809    store            : 425  
##                                        (Other)          : 287  
##         building_use   property_age   building_size_meter_squared
##  other        :7657   Min.   : 0.00   Min.   :   0.07            
##  residential  :7362   1st Qu.: 4.00   1st Qu.:  96.83            
##  res_com_mixed: 560   Median :21.00   Median : 139.19            
##  commercial   : 454   Mean   :17.96   Mean   : 158.43            
##  industrial   :  14   3rd Qu.:26.00   3rd Qu.: 190.20            
##  res_ind_mixed:  14   Max.   :84.00   Max.   :7667.26            
##  (Other)      :   5                                              
##   price_total        price_per_meter_squared parking_size_meter_squared
##  Min.   :        0   Min.   :     0          Min.   :  0.00            
##  1st Qu.:  5000000   1st Qu.: 47307          1st Qu.:  0.00            
##  Median :  8100000   Median : 59752          Median :  0.00            
##  Mean   : 11131365   Mean   : 66236          Mean   : 12.26            
##  3rd Qu.: 12800000   3rd Qu.: 77191          3rd Qu.: 24.17            
##  Max.   :682469956   Max.   :887281          Max.   :197.16            
##                                                                        
##  parking_total         latitude       longtitude    price_per_pyeong 
##  Min.   :       0   Min.   :24.07   Min.   :120.6   Min.   :      0  
##  1st Qu.:       0   1st Qu.:24.14   1st Qu.:120.6   1st Qu.: 156114  
##  Median :       0   Median :24.16   Median :120.7   Median : 197182  
##  Mean   :  305324   Mean   :24.16   Mean   :120.7   Mean   : 218578  
##  3rd Qu.:       0   3rd Qu.:24.17   3rd Qu.:120.7   3rd Qu.: 254731  
##  Max.   :16200000   Max.   :24.28   Max.   :120.8   Max.   :2928027  
##                                                                      
##  new_transaction_date
##  Min.   :2018-10-01  
##  1st Qu.:2019-01-01  
##  Median :2019-04-01  
##  Mean   :2019-03-18  
##  3rd Qu.:2019-06-01  
##  Max.   :2019-09-01  
## 
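
The conversion can also be written as a small helper. The sketch below is one possible formulation, not the code used for the results above; the function name roc_ym_to_date is hypothetical. It relies on the same rule as the original code: the first digits are the Republic of China year, which is offset from the Gregorian year by 1911, and the last two digits are the month.

```r
# Hypothetical helper: convert a ROC year-month code (e.g. 10805) to a Date
# set to the first day of that month.
roc_ym_to_date <- function(code) {
  year  <- code %/% 100 + 1911   # ROC year 108 -> 2019
  month <- code %% 100           # last two digits are the month
  as.Date(sprintf("%04d-%02d-01", year, month))
}

print(roc_ym_to_date(c(10710, 10805, 10809)))
```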

House price in different area

First, we focus on single houses, apartments, condos, and studios, so we select them and investigate the house prices in each district.

unique(thp_build$building_type)
## [1] store             condo_wo_elevator condo_w_elevator  commerical_space 
## [5] studio            other             apartment         single_house     
## [9] factory          
## 11 Levels: apartment commerical_space condo_w_elevator ... studio
# Keep only the residential building types of interest
thp_res_build <- subset(thp_build, building_type %in% c('condo_wo_elevator',
'condo_w_elevator', 'single_house', 'apartment', 'studio'))

g1 <- ggplot(aes(x=district, y=price_total, fill=district), data=thp_res_build) +
geom_boxplot() +
scale_y_log10() +
xlab('District') +
ylab('Total Price [NTD]')
g1+theme(axis.text = element_text(size=12, color='black'),
axis.title = element_text(size=14),
legend.text = element_text(size=12, color='black'),
legend.title = element_text(size=14))

Since the median for most districts is around 10 million, we zoom in to the range around 10 million. We can see that, on average, the prices in the Central, West, Nantun, and Xitun districts are higher. Specifically, there are many luxury homes, with selling prices over 40 million, in Xitun district.

unique(thp_res_build$building_type)
## [1] condo_wo_elevator condo_w_elevator  studio            apartment        
## [5] single_house     
## 11 Levels: apartment commerical_space condo_w_elevator ... studio
g2 <- ggplot(aes(x=district, y=price_total, fill=district), data=thp_res_build) +
geom_boxplot() +
ylim(5e+06,5e+07) +
xlab('District') +
ylab('Total Price [NTD]')
g2+theme(axis.text = element_text(size=12, color='black'),
axis.title = element_text(size=14),
legend.text = element_text(size=12, color='black'),
legend.title = element_text(size=14))
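
One caveat when zooming with ylim(): ggplot2 removes the observations outside the limits before the boxplot statistics are computed, so the medians and quartiles in the zoomed plot describe only the sales inside the range. The sketch below illustrates the effect with made-up prices in base R. In ggplot2, coord_cartesian(ylim = c(5e6, 5e7)) is an alternative that only zooms the view and keeps all observations in the statistics.

```r
# Hypothetical total prices (NTD) for one district, from cheap to luxury sales.
price_total <- c(2e6, 3e6, 4e6, 4.5e6, 6e6, 8e6, 1e7, 2e7, 6e7)

# Median over all sales.
median_all <- median(price_total)

# Median over only the sales inside the plotted range, which is what the
# boxplot uses when the limits are set with ylim(5e6, 5e7).
in_range <- price_total >= 5e6 & price_total <= 5e7
median_in_range <- median(price_total[in_range])

print(median_all)
print(median_in_range)
```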

Moreover, the total price is likely affected by the floor area of the building. Therefore, we also draw the boxplot for NTD/pyeong. We can see that a few cases in the Beitun, Dali, Nantun, and Xitun districts are higher than 2 million, and these are likely luxury homes. Then, we zoom in on the data to see the differences.

options(repr.plot.width = 5, repr.plot.height = 3)
g3 <- ggplot(aes(x=district, y=price_per_pyeong, fill=district), data=thp_res_build) +
geom_boxplot() +
xlab('District') +
ylab('Price [NTD/pyeong]')
g3+theme(axis.text = element_text(size=12, color='black'),
axis.title = element_text(size=14),
legend.text = element_text(size=12, color='black'),
legend.title = element_text(size=14))

Clearly, the median prices per pyeong are similar across districts; the price in Nantun district is the highest, and that in Central district is the lowest. This can be attributed to Nantun being a newly developed area, whereas Central district is the old urban area.

g4 <- ggplot(aes(x=district, y=price_per_pyeong, fill=district), data=thp_res_build) +
geom_boxplot() +
ylim(0, 1e6) +
xlab('District') +
ylab('Price [NTD/pyeong]')
g4+theme(axis.text = element_text(size=12, color='black'),
axis.title = element_text(size=14),
legend.text = element_text(size=12, color='black'),
legend.title = element_text(size=14))
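
To back up the visual comparison with numbers, the per-district medians can be tabulated and sorted. The sketch below does this in base R on a made-up toy data frame; with the real data, thp_res_build would take the place of toy. The values are hypothetical and do not reproduce the results above.

```r
# Toy data (hypothetical prices per pyeong for two districts).
toy <- data.frame(
  district         = c("Nantun", "Nantun", "Nantun", "Central", "Central", "Central"),
  price_per_pyeong = c(250000, 300000, 280000, 150000, 170000, 160000)
)

# Median price per pyeong in each district.
district_median <- aggregate(price_per_pyeong ~ district, data = toy, FUN = median)

# Sort from the most to the least expensive district.
district_median <- district_median[order(-district_median$price_per_pyeong), ]
print(district_median)
```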

The effect of age of the buildings and type of the buildings

We know that the age of a building affects its price as well. Therefore, we separate the property ages into different groups and examine the influence. Interestingly, the price of homes more than 30 years old is higher in many districts. One possible reason is that the price per pyeong is higher for single houses, and the old homes are mostly single houses.

age_0_10_price_by_district <- thp_res_build %>%
filter(property_age <=10) %>%
dplyr::group_by(district) %>%
dplyr::summarise(count=n(),mean_price_per_pyeong=mean(price_per_pyeong),
median_price_per_pyeong=median(price_per_pyeong))

age_11_20_price_by_district <- thp_res_build %>%
filter(property_age <=20 & property_age > 10) %>%
dplyr::group_by(district) %>%
dplyr::summarise(count=n(),mean_price_per_pyeong=mean(price_per_pyeong),
median_price_per_pyeong=median(price_per_pyeong))

age_21_30_price_by_district <- thp_res_build %>%
filter(property_age <=30 & property_age > 20) %>%
dplyr::group_by(district) %>%
dplyr::summarise(count=n(),mean_price_per_pyeong=mean(price_per_pyeong),
median_price_per_pyeong=median(price_per_pyeong))

age_30_price_by_district <- thp_res_build %>%
filter(property_age > 30) %>%
dplyr::group_by(district) %>%
dplyr::summarise(count=n(),mean_price_per_pyeong=mean(price_per_pyeong),
median_price_per_pyeong=median(price_per_pyeong))

g5 <- ggplot() +
geom_line(aes(x=district, y=median_price_per_pyeong, group=1, color="< 10"),
data=age_0_10_price_by_district, size=1) +
geom_line(aes(x=district, y=median_price_per_pyeong, group=1, color="11 ~ 20"),
data=age_11_20_price_by_district, group=1, size=1) +
geom_line(aes(x=district, y=median_price_per_pyeong, group=1, color="21 ~ 30"),
data=age_21_30_price_by_district, group=1, size=1) +
geom_line(aes(x=district, y=median_price_per_pyeong, group=1, color="> 30"),
data=age_30_price_by_district, group=1, size=1) +
scale_color_manual(name="Age of Homes\n(year)", values=c("< 10"="#999999", "11 ~ 20"="#E69F00", "21 ~ 30"="#0072B2", "> 30"="#009E73")) +
xlab('District') +
ylab('Median Price [NTD/pyeong]')
g5+theme(axis.text = element_text(size=12, color='black'),
axis.title = element_text(size=14),
legend.text = element_text(size=12, color='black'),
legend.title = element_text(size=14))
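
The four age groups above are built with four separate filters. As an alternative sketch, the same bins can be assigned in one step with cut(), after which a single grouped summary covers all groups. The toy data and the age_group column name are hypothetical; the breaks are chosen to match the filters above (up to 10, 11 to 20, 21 to 30, and over 30 years).

```r
# Toy data (hypothetical ages and prices per pyeong).
toy <- data.frame(
  property_age     = c(0, 10, 11, 20, 21, 30, 31, 84),
  price_per_pyeong = c(200000, 220000, 180000, 190000, 170000, 175000, 260000, 300000)
)

# cut() uses right-closed intervals by default: (-Inf,10], (10,20], (20,30], (30,Inf)
toy$age_group <- cut(toy$property_age,
                     breaks = c(-Inf, 10, 20, 30, Inf),
                     labels = c("<= 10", "11 ~ 20", "21 ~ 30", "> 30"))

# One grouped summary instead of four filtered copies.
median_by_age <- aggregate(price_per_pyeong ~ age_group, data = toy, FUN = median)
print(median_by_age)
```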

By grouping the data by building type, we see that the price per pyeong is the highest for single houses across the different age groups, and the difference is largest for homes more than 30 years old. Moreover, for buildings over 30 years old, the number of single houses is significantly larger than that of the other building types. This follows our intuition that people usually built single houses in the past, whereas we tend to build apartments nowadays to house more people on the same land. This is consistent with the much higher number of apartments among homes less than 10 years old.

Based on this observation, the data support our hypothesis: the median price per pyeong is much higher for properties over 30 years old because there are more single houses among them, and the price per pyeong for single houses is much higher as well.

age_0_10_price_by_building_type <- thp_res_build %>%
filter(property_age <= 10) %>%
dplyr::group_by(building_type) %>%
dplyr::summarise(count=n(),mean_price_per_pyeong=mean(price_per_pyeong),
median_price_per_pyeong=median(price_per_pyeong))
age_11_20_price_by_building_type <- thp_res_build %>%
filter(property_age <=20 & property_age > 10) %>%
dplyr::group_by(building_type) %>%
dplyr::summarise(count=n(),mean_price_per_pyeong=mean(price_per_pyeong),
median_price_per_pyeong=median(price_per_pyeong))
age_21_30_price_by_building_type <- thp_res_build %>%
filter(property_age <=30 & property_age > 20) %>%
dplyr::group_by(building_type) %>%
dplyr::summarise(count=n(),mean_price_per_pyeong=mean(price_per_pyeong),
median_price_per_pyeong=median(price_per_pyeong))
age_30_price_by_building_type <- thp_res_build %>%
filter(property_age > 30) %>%
dplyr::group_by(building_type) %>%
dplyr::summarise(count=n(),mean_price_per_pyeong=mean(price_per_pyeong),
median_price_per_pyeong=median(price_per_pyeong))

g6 <- ggplot() +
geom_line(aes(x=building_type, y=median_price_per_pyeong, group=1, color="< 10"),
data=age_0_10_price_by_building_type, size=1) +
geom_line(aes(x=building_type, y=median_price_per_pyeong, group=1, color='11 ~ 20'),
data=age_11_20_price_by_building_type, size=1) +
geom_line(aes(x=building_type, y=median_price_per_pyeong, group=1, color='21 ~ 30'),
data=age_21_30_price_by_building_type, size=1) +
geom_line(aes(x=building_type, y=median_price_per_pyeong, group=1, color='> 30'),
data=age_30_price_by_building_type, size=1) +
scale_color_manual(name="Age of Homes (year)", values=c("< 10"="#999999", "11 ~ 20"="#E69F00", "21 ~ 30"="#0072B2", "> 30"="#009E73")) +
xlab('Building Type') +
ylab('Median Price [NTD/pyeong]') +
theme(axis.text = element_text(size=12, color='black'),
axis.title = element_text(size=14),
legend.text = element_text(size=12, color='black'),
legend.title = element_text(size=14),
legend.position = "top")

g7 <- ggplot() +
geom_line(aes(x=building_type, y=count, group=1, color="< 10"),
data=age_0_10_price_by_building_type, size=1) +
geom_line(aes(x=building_type, y=count, group=1, color='11 ~ 20'),
data=age_11_20_price_by_building_type, size=1) +
geom_line(aes(x=building_type, y=count, group=1, color='21 ~ 30'),
data=age_21_30_price_by_building_type, size=1) +
geom_line(aes(x=building_type, y=count, group=1, color='> 30'),
data=age_30_price_by_building_type, size=1) +
scale_color_manual(name="Age of Homes (year)", values=c("< 10"="#999999", "11 ~ 20"="#E69F00", "21 ~ 30"="#0072B2", "> 30"="#009E73")) +
xlab('Building Type') +
ylab('Number of Buildings') +
theme(axis.text = element_text(size=12, color='black'),
axis.title = element_text(size=14),
legend.position = 'none')

grid.arrange(g6, g7, ncol = 1)
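
The argument above is a composition effect: the overall median of an age group depends on the mix of building types in it. The sketch below illustrates the idea on made-up numbers, not the real records. In this toy data, old apartments are cheaper than new ones, yet the old group has the higher overall median because it consists mostly of single houses.

```r
# Toy data (hypothetical): the new group is mostly apartments,
# the old group is mostly single houses.
toy <- data.frame(
  age_group     = rep(c("<= 10", "> 30"), each = 5),
  building_type = c("apartment", "apartment", "apartment", "apartment", "single_house",
                    "apartment", "single_house", "single_house", "single_house", "single_house"),
  price_per_pyeong = c(200000, 210000, 220000, 230000, 300000,
                       150000, 290000, 300000, 310000, 320000)
)

# Mix of building types in each age group.
print(table(toy$age_group, toy$building_type))

# Overall median per age group: higher for the old group.
by_age <- aggregate(price_per_pyeong ~ age_group, data = toy, FUN = median)
print(by_age)

# Median per age group and building type: separates age from building type.
by_age_type <- aggregate(price_per_pyeong ~ age_group + building_type,
                         data = toy, FUN = median)
print(by_age_type)
```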

Furthermore, we take Nantun and Taiping districts as examples to further investigate our hypothesis. In both districts, the price per pyeong for single houses is much higher than for the other building types, and there are many more single houses among homes older than 30 years.

age_30_price_nantun <- thp_res_build %>%
filter(property_age > 30 & district == 'Nantun') %>%
dplyr::group_by(building_type)

age_30_price_nantun_summary <- age_30_price_nantun %>%
dplyr::summarise(mean_price_per_pyeong=mean(price_per_pyeong),
median_price_per_pyeong=median(price_per_pyeong))

age_30_price_taiping <- thp_res_build %>%
filter(property_age > 30 & district == 'Taiping') %>%
dplyr::group_by(building_type)

age_30_price_taiping_summary <- age_30_price_taiping %>%
dplyr::summarise(mean_price_per_pyeong=mean(price_per_pyeong),
median_price_per_pyeong=median(price_per_pyeong))

g8 <- ggplot() +
geom_bar(aes(x=building_type), data=age_30_price_nantun, fill = "#0072B2") +
geom_line(aes(x=building_type, y=median_price_per_pyeong/5000, group=1),
data=age_30_price_nantun_summary, color = "#E69F00", size=1) +
scale_y_continuous(sec.axis = sec_axis(~.*5000, name = "Median Price per Pyeong")) +
ylab('Number of Buildings') +
xlab('Building Type') +
ggtitle('Nantun District') +
theme(axis.text.y.left = element_text(size=12, color='#0072B2'),
axis.title.y.left = element_text(size=14, color='#0072B2'),
axis.ticks.y.left = element_line(color = "#0072B2"),
axis.text.y.right = element_text(size=12, color='#E69F00'),
axis.title.y.right = element_text(size=14, color='#E69F00'),
axis.ticks.y.right = element_line(color = "#E69F00"),
axis.text.x = element_text(size=12, color='black'),
axis.title.x = element_text(size=14, color='black'),
axis.ticks.x = element_line(color = "black"),
plot.title = element_text(size=18, color='black'))

g9 <- ggplot() +
geom_bar(aes(x=building_type), data=age_30_price_taiping, fill = "#0072B2") +
geom_line(aes(x=building_type, y=median_price_per_pyeong/3000, group=1),
data=age_30_price_taiping_summary, color = "#E69F00", size=1) +
scale_y_continuous(sec.axis = sec_axis(~.*3000, name = "Median Price per Pyeong")) +
ylab('Number of Buildings') +
xlab('Building Type') +
ggtitle('Taiping District') +
theme(axis.text.y.left = element_text(size=12, color='#0072B2'),
axis.title.y.left = element_text(size=14, color='#0072B2'),
axis.ticks.y.left = element_line(color = "#0072B2"),
axis.text.y.right = element_text(size=12, color='#E69F00'),
axis.title.y.right = element_text(size=14, color='#E69F00'),
axis.ticks.y.right = element_line(color = "#E69F00"),
axis.text.x = element_text(size=12, color='black'),
axis.title.x = element_text(size=14, color='black'),
axis.ticks.x = element_line(color = "black"),
plot.title = element_text(size=18, color='black'))

grid.arrange(g8, g9, nrow = 1)
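
In the two plots above, the median price is divided by a constant (5000 for Nantun, 3000 for Taiping) so that the line fits on the count axis, and sec_axis() multiplies by the same constant to label the right-hand axis in NTD/pyeong. The sketch below shows one possible way to derive such a constant from the data instead of hard-coding it. The counts and medians are made up, and this is not necessarily how the constants above were chosen.

```r
# Hypothetical counts and median prices per pyeong by building type.
counts  <- c(apartment = 12, condo_wo_elevator = 30, single_house = 90)
medians <- c(apartment = 150000, condo_wo_elevator = 180000, single_house = 450000)

# Scale factor that maps the largest median onto the largest count.
scale_factor <- max(medians) / max(counts)
print(scale_factor)

# Heights of the price line on the count axis; the secondary axis would
# multiply by scale_factor to recover NTD/pyeong.
line_heights <- medians / scale_factor
print(line_heights)
```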

The relationship among size, type, and price of the buildings

After probing the influence of the age, type, and location of the building, we also want to see whether there is a relationship between the size of the building and its price. Intuitively, luxury homes are usually much larger than typical homes, with better locations and recent renovations. We plot the building size vs. NTD/pyeong to investigate the correlation. Here, we find three outliers larger than 500 pyeong. These transactions may include a whole property rather than a single household. Therefore, we exclude them.

thp_res_build <- transform(thp_res_build, building_size_pyeong = building_size_meter_squared/3.3)

g10 <- ggplot() +
geom_point(aes(x = building_size_pyeong, y = price_per_pyeong, color=building_type),
data = thp_res_build, alpha=0.5) +
xlab('Building Size [Pyeong]') +
ylab('Price [NTD/pyeong]')
g10+theme(axis.text = element_text(size=12, color='black'),
axis.title = element_text(size=14),
legend.text = element_text(size=12, color='black'),
legend.title = element_text(size=14))

Here, we find that, aside from single houses, the other building types generally follow this trend. This can be attributed to the value of the land rather than the value of the building, as discussed earlier. Since the land of those single houses is valuable, buyers are willing to pay more for those smaller or older buildings. We also zoom in to the lower range of price per pyeong, and the results are also consistent with what we observed.

g10 <- ggplot() +
geom_point(aes(x = building_size_pyeong, y = price_per_pyeong, color=building_type),
data = thp_res_build, alpha=0.5) +
xlim(0,500) +
xlab('Building Size [Pyeong]') +
ylab('Price [NTD/pyeong]')
g10+theme(axis.text = element_text(size=12, color='black'),
axis.title = element_text(size=14),
legend.text = element_text(size=12, color='black'),
legend.title = element_text(size=14))
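
Note that xlim(0,500) only hides the oversized records in the plot; they remain in thp_res_build. If they should also be excluded from any later computation, they can be filtered out of the data. The sketch below shows this on a made-up toy data frame, using the same 3.3 m² per pyeong conversion and the 500 pyeong threshold from the text. The name toy_trimmed is hypothetical.

```r
# Toy data (hypothetical): the last record is an oversized transaction.
toy <- data.frame(
  building_size_meter_squared = c(99, 165, 330, 7601.57),
  price_per_pyeong            = c(200000, 250000, 300000, 51000)
)

# Same conversion as in the analysis: 1 pyeong is taken as 3.3 m2.
toy$building_size_pyeong <- toy$building_size_meter_squared / 3.3

# Keep only the records of at most 500 pyeong.
toy_trimmed <- subset(toy, building_size_pyeong <= 500)

# Number of records removed.
n_removed <- nrow(toy) - nrow(toy_trimmed)
print(n_removed)
```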

g11 <- ggplot() +
geom_point(aes(x = building_size_pyeong, y = price_per_pyeong, color=building_type),
data = thp_res_build, alpha=0.5) +
xlim(0,500) +
ylim(0,1e6) +
xlab('Building Size [Pyeong]') +
ylab('Price [NTD/pyeong]')
g11+theme(axis.text = element_text(size=12, color='black'),
axis.title = element_text(size=14),
legend.text = element_text(size=12, color='black'),
legend.title = element_text(size=14))
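
The scatter plots give a visual impression of the relationship; a rank correlation per building type is one way to put a number on it. The sketch below computes Spearman's correlation between size and price per pyeong for each building type on made-up data. The values are hypothetical and chosen only to show opposite trends; they are not results from the real records.

```r
# Toy data (hypothetical): price rises with size for apartments and
# falls with size for single houses.
toy <- data.frame(
  building_type        = rep(c("apartment", "single_house"), each = 4),
  building_size_pyeong = c(20, 30, 40, 50, 25, 35, 45, 55),
  price_per_pyeong     = c(180000, 200000, 230000, 260000,
                           400000, 350000, 320000, 300000)
)

# Spearman rank correlation between size and price, per building type.
size_price_cor <- sapply(split(toy, toy$building_type), function(d) {
  cor(d$building_size_pyeong, d$price_per_pyeong, method = "spearman")
})
print(size_price_cor)
```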