Taichung, my hometown in Taiwan, is a lovely city. Because buying a house is a milestone in most people’s lives, I aim to investigate the prices and trends of the current housing market in Taichung. Specifically, I analyze the housing transaction records reported in 2019.
Data pre-processing
Initialization of environment and data
First, we load the libraries used in this study and the original reported transaction records, and then drop incomplete rows from the data frame.
```r
library(ggplot2)
library(dplyr)   # provides %>%, filter(), group_by() and summarise() used below
```
## district target land_size_meter_squared category transaction_date
## 3 Central building_land 10.30 commercial 10805
## 4 Central building_land 12.75 commercial 10801
## 5 Central building_land 45.00 commercial 10712
## 6 Central building_land 841.60 commercial 10804
## 7 Central building_land 90.00 commercial 10805
## 8 Central building_land 17.90 commercial 10801
## building_type building_use property_age building_size_meter_squared
## 3 store other 35 129.51
## 4 condo_wo_elevator residential 56 62.16
## 5 condo_w_elevator residential 40 258.77
## 6 condo_w_elevator other 29 7601.57
## 7 condo_w_elevator res_com_mixed 40 542.37
## 8 commerical_space commercial 29 207.36
## price_total price_per_meter_squared parking_size_meter_squared parking_total
## 3 1500000 11582 0 0
## 4 777350 12506 0 0
## 5 4000000 15458 0 0
## 6 117800000 15497 0 0
## 7 10000000 18438 0 0
## 8 4000000 19290 0 0
## latitude longtitude
## 3 24.14085 120.6838
## 4 24.13721 120.6828
## 5 24.14325 120.6767
## 6 24.13984 120.6818
## 7 24.13940 120.6783
## 8 24.14603 120.6774
Data cleaning
The summary shows that some records have no specific category or usage, and that some trades include only land or a parking lot. Since we want to focus on housing prices, we drop these records as well.
```r
summary(thp)
```
## district target land_size_meter_squared
## Beitun :3672 building : 54 Min. : 0.00
## Xitun :3410 building_land : 8604 1st Qu.: 13.47
## North :2361 building_land_parking:11367 Median : 21.15
## Nantun :2183 land : 0 Mean : 38.07
## Dali :1841 parking_lot : 0 3rd Qu.: 34.50
## South :1596 Max. :4948.00
## (Other):4962
## category transaction_date building_type
## : 836 Min. :10205 apartment :11136
## agricultural: 65 1st Qu.:10712 single_house : 3377
## commercial : 4161 Median :10803 condo_w_elevator : 2139
## industrial : 51 Mean :10758 studio : 1428
## other : 380 3rd Qu.:10805 condo_wo_elevator: 1130
## residential :14532 Max. :10809 store : 467
## (Other) : 348
## building_use property_age building_size_meter_squared
## other :11253 Min. : 0.00 Min. : 0.07
## residential : 7622 1st Qu.: 1.00 1st Qu.: 99.02
## res_com_mixed: 631 Median :13.00 Median : 138.94
## commercial : 460 Mean :15.33 Mean : 158.09
## : 23 3rd Qu.:26.00 3rd Qu.: 186.19
## industrial : 14 Max. :84.00 Max. :7667.26
## (Other) : 22
## price_total price_per_meter_squared parking_size_meter_squared
## Min. :0.000e+00 Min. : 0 Min. : 0.00
## 1st Qu.:5.450e+06 1st Qu.: 48504 1st Qu.: 0.00
## Median :8.220e+06 Median : 62357 Median : 0.00
## Mean :1.140e+07 Mean : 69000 Mean : 15.21
## 3rd Qu.:1.250e+07 3rd Qu.: 80623 3rd Qu.: 28.36
## Max. :1.448e+09 Max. :8203988 Max. :470.70
##
## parking_total latitude longtitude
## Min. : 0 Min. :24.05 Min. :120.6
## 1st Qu.: 0 1st Qu.:24.13 1st Qu.:120.6
## Median : 0 Median :24.16 Median :120.7
## Mean : 436610 Mean :24.15 Mean :120.7
## 3rd Qu.: 750000 3rd Qu.:24.17 3rd Qu.:120.7
## Max. :18400000 Max. :24.28 Max. :120.8
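##
```r
# keep trades that include a building and have both a category and a usage
thp_build <- subset(thp, target != 'land' & target != 'parking_lot' & category != '' & building_use != '')
```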
Next, we convert the area unit from m² to pyeong, the unit more commonly used in Taiwan (1 pyeong ≈ 3.3 m²). All prices in this work are in New Taiwan Dollars (NTD). The maximum NTD/pyeong is around 27 million, which is clearly unreasonable for Taichung: even in Taipei, the most luxurious homes sell for around 3 million NTD/pyeong.
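For reference, the conversion can be written as two small helper functions. This is a minimal base-R sketch; the names `M2_PER_PYEONG`, `price_per_pyeong_from_m2` and `size_in_pyeong` are hypothetical. By default it assumes the legal definition of 1 pyeong as 400/121 m² (about 3.3058 m²), while this post uses the rounded factor 3.3, which can be passed explicitly.

```r
# 1 pyeong is defined as 400/121 m^2 (about 3.3058 m^2)
M2_PER_PYEONG <- 400 / 121

# NTD/m^2 times m^2/pyeong gives NTD/pyeong
price_per_pyeong_from_m2 <- function(price_per_m2, m2_per_pyeong = M2_PER_PYEONG) {
  price_per_m2 * m2_per_pyeong
}

# m^2 divided by m^2/pyeong gives pyeong
size_in_pyeong <- function(size_m2, m2_per_pyeong = M2_PER_PYEONG) {
  size_m2 / m2_per_pyeong
}

# With the rounded factor 3.3 used in this post, the largest unit price in the
# summary (8,203,988 NTD/m^2) becomes about 27.07 million NTD/pyeong
price_per_pyeong_from_m2(8203988, m2_per_pyeong = 3.3)
price_per_pyeong_from_m2(8203988)  # exact factor, about 0.2% higher
```

The choice of factor does not change any ranking in the analysis; it only scales every unit price by the same constant.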
```r
thp_build <- transform(thp_build, price_per_pyeong = 3.3*price_per_meter_squared)
summary(thp_build$price_per_pyeong)
```
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 161026 207205 229131 267283 27073160
Therefore, we drop the records priced at 3 million NTD/pyeong or more; after this, the summary looks more reasonable.
```r
thp_build <- thp_build[which(thp_build$price_per_pyeong < 3000000),]
summary(thp_build)
```
## district target land_size_meter_squared
## Beitun :3667 building : 0 Min. : 0.02
## Xitun :3400 building_land : 8049 1st Qu.: 13.23
## North :2349 building_land_parking:11111 Median : 20.57
## Nantun :2179 land : 0 Mean : 35.35
## South :1593 parking_lot : 0 3rd Qu.: 32.52
## Dali :1405 Max. :3826.00
## (Other):4567
## category transaction_date building_type
## : 0 Min. :10205 apartment :10914
## agricultural: 63 1st Qu.:10712 single_house : 2904
## commercial : 4160 Median :10803 condo_w_elevator : 2056
## industrial : 51 Mean :10758 studio : 1410
## other : 378 3rd Qu.:10805 condo_wo_elevator: 1098
## residential :14508 Max. :10809 store : 458
## (Other) : 320
## building_use property_age building_size_meter_squared
## other :10720 Min. : 0.00 Min. : 0.07
## residential : 7387 1st Qu.: 1.00 1st Qu.: 99.14
## res_com_mixed: 565 Median :13.00 Median : 140.01
## commercial : 455 Mean :15.13 Mean : 158.62
## industrial : 14 3rd Qu.:25.00 3rd Qu.: 187.00
## res_ind_mixed: 14 Max. :84.00 Max. :7667.26
## (Other) : 5
## price_total price_per_meter_squared parking_size_meter_squared
## Min. : 0 Min. : 0 Min. : 0.00
## 1st Qu.: 5480000 1st Qu.: 48793 1st Qu.: 0.00
## Median : 8320000 Median : 62778 Median : 0.00
## Mean : 11363156 Mean : 68582 Mean : 15.43
## 3rd Qu.: 12600000 3rd Qu.: 80980 3rd Qu.: 28.42
## Max. :682469956 Max. :887281 Max. :470.70
##
## parking_total latitude longtitude price_per_pyeong
## Min. : 0 Min. :24.07 Min. :120.6 Min. : 0
## 1st Qu.: 0 1st Qu.:24.14 1st Qu.:120.6 1st Qu.: 161016
## Median : 0 Median :24.16 Median :120.7 Median : 207166
## Mean : 443806 Mean :24.16 Mean :120.7 Mean : 226322
## 3rd Qu.: 800000 3rd Qu.:24.17 3rd Qu.:120.7 3rd Qu.: 267234
## Max. :18400000 Max. :24.28 Max. :120.8 Max. :2928027
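##
Next, we convert transaction_date to a date format. The field stores the transaction month as a Republic of China (ROC) calendar year followed by a two-digit month; for example, 10710 is October 2018, since ROC year 107 + 1911 = 2018. Because we focus on the period from October 2018 to September 2019, we first drop the records before October 2018.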
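The ROC year-month codes in transaction_date (e.g. 10710 for October 2018) can be turned into dates with a small helper. The sketch below is in base R; the function name `roc_yyymm_to_date` is hypothetical, and it assumes transaction_date is numeric, as the summary suggests. Each month is mapped to its first day, which matches the new_transaction_date values in the summary below.

```r
# Convert ROC year-month codes (e.g. 10710 = ROC year 107, month 10) to the
# first day of that month. Gregorian year = ROC year + 1911.
roc_yyymm_to_date <- function(code) {
  roc_year <- code %/% 100  # all digits except the last two
  month    <- code %% 100   # last two digits
  stopifnot(all(month >= 1 & month <= 12, na.rm = TRUE))
  # an explicit format makes unparsable strings become NA instead of an error
  as.Date(sprintf("%04d-%02d-01", roc_year + 1911, month), format = "%Y-%m-%d")
}

roc_yyymm_to_date(c(10710, 10801, 10809))  # 2018-10-01, 2019-01-01, 2019-09-01

# Hypothetical usage on the filtered data:
# thp_build$new_transaction_date <- roc_yyymm_to_date(thp_build$transaction_date)
```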
```r
# 10710 is October 2018 in the ROC year-month encoding
thp_build <- thp_build[which(thp_build$transaction_date >= 10710),]
```
## district target land_size_meter_squared
## Beitun :2950 building : 0 Min. : 0.02
## Xitun :2659 building_land :7747 1st Qu.: 13.36
## Nantun :1910 building_land_parking:8319 Median : 21.60
## North :1866 land : 0 Mean : 37.58
## South :1446 parking_lot : 0 3rd Qu.: 34.78
## Taiping:1185 Max. :3826.00
## (Other):4050
## category transaction_date building_type
## : 0 Min. :10710 apartment :8199
## agricultural: 61 1st Qu.:10801 single_house :2734
## commercial : 3084 Median :10804 condo_w_elevator :2023
## industrial : 51 Mean :10791 studio :1308
## other : 344 3rd Qu.:10806 condo_wo_elevator:1090
## residential :12526 Max. :10809 store : 425
## (Other) : 287
## building_use property_age building_size_meter_squared
## other :7657 Min. : 0.00 Min. : 0.07
## residential :7362 1st Qu.: 4.00 1st Qu.: 96.83
## res_com_mixed: 560 Median :21.00 Median : 139.19
## commercial : 454 Mean :17.96 Mean : 158.43
## industrial : 14 3rd Qu.:26.00 3rd Qu.: 190.20
## res_ind_mixed: 14 Max. :84.00 Max. :7667.26
## (Other) : 5
## price_total price_per_meter_squared parking_size_meter_squared
## Min. : 0 Min. : 0 Min. : 0.00
## 1st Qu.: 5000000 1st Qu.: 47307 1st Qu.: 0.00
## Median : 8100000 Median : 59752 Median : 0.00
## Mean : 11131365 Mean : 66236 Mean : 12.26
## 3rd Qu.: 12800000 3rd Qu.: 77191 3rd Qu.: 24.17
## Max. :682469956 Max. :887281 Max. :197.16
##
## parking_total latitude longtitude price_per_pyeong
## Min. : 0 Min. :24.07 Min. :120.6 Min. : 0
## 1st Qu.: 0 1st Qu.:24.14 1st Qu.:120.6 1st Qu.: 156114
## Median : 0 Median :24.16 Median :120.7 Median : 197182
## Mean : 305324 Mean :24.16 Mean :120.7 Mean : 218578
## 3rd Qu.: 0 3rd Qu.:24.17 3rd Qu.:120.7 3rd Qu.: 254731
## Max. :16200000 Max. :24.28 Max. :120.8 Max. :2928027
##
## new_transaction_date
## Min. :2018-10-01
## 1st Qu.:2019-01-01
## Median :2019-04-01
## Mean :2019-03-18
## 3rd Qu.:2019-06-01
## Max. :2019-09-01
##
House prices in different areas
First, we focus on the residential building types (single houses, apartments, condos, and studios): we select them and investigate the house prices in every district.
```r
unique(thp_build$building_type)
```
## [1] store condo_wo_elevator condo_w_elevator commerical_space
## [5] studio other apartment single_house
## [9] factory
## 11 Levels: apartment commerical_space condo_w_elevator ... studio
```r
# keep the residential building types only
thp_res_build <- subset(thp_build, building_type %in% c('condo_wo_elevator', 'condo_w_elevator', 'single_house', 'apartment', 'studio'))
```

Since the median total price in most districts is around 10 million NTD, we zoom in on the range around 10 million. On average, prices in the Central, West, Nantun, and Xitun districts are higher. In particular, Xitun district has many luxury homes selling for over 40 million NTD.
```r
unique(thp_res_build$building_type)
```
## [1] condo_wo_elevator condo_w_elevator studio apartment
## [5] single_house
## 11 Levels: apartment commerical_space condo_w_elevator ... studio
```r
g2 <- ggplot(aes(x=district, y=price_total, fill=district), data=thp_res_build) +
  geom_boxplot()
```

The total price also depends on the floor area of the building, so we also draw a boxplot of NTD/pyeong. A few transactions in the Beitun, Dali, Nantun, and Xitun districts exceed 2 million NTD/pyeong; these are likely luxury homes. We then zoom in to compare the districts.
```r
options(repr.plot.width = 5, repr.plot.height = 3)
```

Although the median NTD/pyeong is similar across districts, Nantun district has the highest and Central district the lowest. This can be attributed to Nantun being a newly developed area, whereas Central is the old urban core.
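The district comparison can also be checked numerically rather than read off the boxplot. Below is a minimal base-R sketch on a made-up data frame `sales` that uses the same column names as thp_res_build; it ranks districts by median NTD/pyeong and counts the transactions above 2 million NTD/pyeong.

```r
# Toy data (made up) with the same column names as thp_res_build
sales <- data.frame(
  district = c("Nantun", "Nantun", "Nantun", "Central", "Central",
               "Xitun", "Xitun", "Xitun"),
  price_per_pyeong = c(260000, 240000, 300000, 150000, 170000,
                       230000, 250000, 2100000)
)

# Median NTD/pyeong per district, ranked from highest to lowest
median_by_district <- aggregate(price_per_pyeong ~ district, data = sales, FUN = median)
median_by_district <- median_by_district[order(median_by_district$price_per_pyeong,
                                               decreasing = TRUE), ]
median_by_district

# Number of transactions above 2 million NTD/pyeong in each district
luxury_counts <- table(sales$district[sales$price_per_pyeong > 2e6])
luxury_counts
```

In the toy Xitun group, the single 2.1-million sale does not move the median, which is why medians remain comparable across districts even when a few luxury homes are present.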
```r
g4 <- ggplot(aes(x=district, y=price_per_pyeong, fill=district), data=thp_res_build) +
  geom_boxplot()
```

The effect of building age and building type
Building age may also affect prices, so we split property age into groups and examine its influence. Interestingly, homes more than 30 years old have a higher price per pyeong in many districts. One possible reason is that single houses have a higher price per pyeong, and most old homes are single houses.
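A minimal base-R sketch of the age grouping, using a made-up data frame `homes`. The break points at 10 and 30 years are an assumption based on the age groups discussed in this post; `cut()` assigns each age to a right-closed interval, and `tapply()` fills a district-by-age-group table of medians, leaving NA where a combination has no sales.

```r
# Toy data (made up) with the same column names as thp_res_build
homes <- data.frame(
  district = c("Nantun", "Nantun", "Nantun", "Taiping", "Taiping", "Taiping"),
  property_age = c(2, 15, 40, 5, 35, 45),
  price_per_pyeong = c(250000, 200000, 320000, 150000, 210000, 230000)
)

# Intervals: (-Inf, 10] -> "0-10", (10, 30] -> "10-30", (30, Inf) -> ">30"
homes$age_group <- cut(homes$property_age,
                       breaks = c(-Inf, 10, 30, Inf),
                       labels = c("0-10", "10-30", ">30"))

# Rows: district, columns: age group, cells: median price per pyeong
age_by_district <- tapply(homes$price_per_pyeong,
                          list(homes$district, homes$age_group), median)
age_by_district
```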
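```r
age_0_10_price_by_district <- thp_res_build %>%
  filter(property_age <= 10) %>% group_by(district) %>% summarise(median_price_per_pyeong = median(price_per_pyeong))
```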

Grouping the data by building type, we see that single houses have the highest price per pyeong in every age group, and the difference is largest for homes more than 30 years old. Moreover, among buildings over 30 years old, single houses far outnumber the other building types. This matches our intuition: people usually built single houses in the past, whereas apartments are built nowadays for higher land utilization. It is also consistent with the much larger number of apartments among homes less than 10 years old.
These observations support our hypothesis: the median price per pyeong is much higher for properties over 30 years old because most of them are single houses, which have a much higher price per pyeong.
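The argument above is a composition effect: the median of a group can rise because its mix of building types changes, even when no single type gets more expensive. The following base-R sketch illustrates this on made-up numbers; the data frame `homes` is hypothetical and not taken from the data set.

```r
# Toy data (made up): two age groups with different building-type mixes
homes <- data.frame(
  age_group = c("0-10", "0-10", "0-10", "0-10", ">30", ">30", ">30", ">30"),
  building_type = c("apartment", "apartment", "apartment", "single_house",
                    "single_house", "single_house", "single_house", "apartment"),
  price_per_pyeong = c(200000, 210000, 220000, 300000,
                       280000, 290000, 300000, 150000)
)

# Share of each building type within each age group (each row sums to 1)
type_share <- prop.table(table(homes$age_group, homes$building_type), margin = 1)

# Median price per pyeong per age group, ignoring building type
median_by_age <- tapply(homes$price_per_pyeong, homes$age_group, median)

# Median price per pyeong per age group and building type
median_by_both <- tapply(homes$price_per_pyeong,
                         list(homes$age_group, homes$building_type), median)

type_share
median_by_age
median_by_both
```

In this toy example the >30 group has the higher overall median (285,000 vs 215,000) even though both apartments and single houses are cheaper within it, because single houses make up 75% of that group.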
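```r
age_0_10_price_by_building_type <- thp_res_build %>%
  filter(property_age <= 10) %>% group_by(building_type) %>% summarise(median_price_per_pyeong = median(price_per_pyeong), n = n())
```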

Furthermore, we take the Nantun and Taiping districts as examples to further test our hypothesis. In both districts, the price per pyeong for single houses is much higher than for the other building types, and single houses far outnumber the other types among homes older than 30 years.
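```r
age_30_price_nantun <- thp_res_build %>%
  filter(district == 'Nantun', property_age > 30) %>% group_by(building_type) %>% summarise(median_price_per_pyeong = median(price_per_pyeong), n = n())
```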

The relationship among size, type, and price of the buildings
After probing the influence of the age, type, and location of buildings, we also want to see whether building size is related to price. Intuitively, luxury homes are usually much larger than typical homes, with better locations and newer renovations. We plot building size against NTD/pyeong to investigate the correlation. Here, we find three outliers larger than 500 pyeong. These transactions may cover a whole property rather than a single household, so we exclude them.
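To put a number on the size/price relationship, one option is a rank correlation computed separately for each building type after dropping the very large transactions. This is a sketch under stated assumptions: Spearman's correlation is our choice here (not necessarily what underlies the plots), because it is less sensitive to extreme prices than Pearson's, and `homes` is a made-up data frame with the same column names as thp_res_build.

```r
# Toy data (made up) with the same column names as thp_res_build
homes <- data.frame(
  building_type = rep(c("apartment", "single_house"), each = 4),
  building_size_pyeong = c(20, 30, 45, 60, 25, 40, 80, 700),
  price_per_pyeong = c(180000, 200000, 230000, 260000,
                       400000, 350000, 300000, 90000)
)

# Keep transactions of at most 500 pyeong, as in the text
homes <- homes[homes$building_size_pyeong <= 500, ]

# Spearman correlation between size and price per pyeong within each type
size_price_cor <- sapply(split(homes, homes$building_type), function(d)
  cor(d$building_size_pyeong, d$price_per_pyeong, method = "spearman"))
size_price_cor
```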
```r
thp_res_build <- transform(thp_res_build, building_size_pyeong = building_size_meter_squared/3.3)
thp_res_build <- thp_res_build[which(thp_res_build$building_size_pyeong <= 500),]
```

Apart from single houses, the other building types generally follow this trend, with larger buildings selling at a higher price per pyeong. The exception can be attributed to the value of the land rather than of the building, as discussed earlier: since the land under those single houses is valuable, buyers are willing to pay more for smaller or older buildings. We also zoom in on the lower price-per-pyeong range, and the results are consistent with what we observed.
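```r
g10 <- ggplot() +
  geom_point(aes(x=building_size_pyeong, y=price_per_pyeong, colour=building_type), data=thp_res_build)
```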

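```r
g11 <- ggplot() +
  geom_point(aes(x=building_size_pyeong, y=price_per_pyeong, colour=building_type), data=thp_res_build)
```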
