This analysis looks at average home prices in US metro areas over the last 10 years, with a specific focus on the housing downturn and the more recent recovery. The main conclusion is that a number of cities, especially in the South, benefited from the downturn: home prices in these areas mostly increased while other cities experienced a sharp decline. During the recovery the pattern reversed: those cities saw prices decline while other cities recovered.
In [1]:
%pylab inline
import pandas as pd
metro_df = pd.read_csv("Metro_Zhvi_3bedroom.csv")
metro_df[1:10]
Out[1]:
In [2]:
import matplotlib.pyplot as plt
res = pd.isnull(metro_df).sum()
plt.plot(res)
# Null data reduces over time
Out[2]:
In [3]:
metro_df = metro_df.transpose()
In [4]:
# set column names to proper city names. Remove the 1st row since it contains the city names
metro_df.columns = metro_df.iloc[0,:]
metro_df = metro_df.iloc[1:,:]
metro_df[0:10]
Out[4]:
In [5]:
# let's plot how home prices in the US have changed over time
Avg_US_Home = metro_df["United States"]
Avg_US_Home[1:].plot()
# home prices dropped in 2009 and started to increase in late 2011
Out[5]:
In [6]:
MultipleHomes = metro_df.iloc[1:,1:10]
plot_obj = MultipleHomes.plot(figsize = (20,20),legend=False)
plot_obj.legend()
# plot across multiple cities
Out[6]:
In [21]:
# would be interesting to do a correlation between cities. Derive a correlation matrix
MultipleHomes = metro_df[:]
MultipleHomes = MultipleHomes.astype(float) # This is to make the pandas correlation function work....
corr_matrix = MultipleHomes.corr()
corr_matrix.iloc[0:8,0:8]
Out[21]:
In [115]:
# print out the list of cities from least to most correlated with the US average
LeastCorrelated = corr_matrix.sort_values("United States",ascending=True).iloc[0:,0]
LeastCorrelated
Out[115]:
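To illustrate why sorting by correlation with the national series surfaces the cities that moved against the market, here is a minimal sketch on synthetic data. The city names and price values are invented for illustration and are not taken from the Zillow file.

```python
import pandas as pd

# Synthetic monthly prices; every name and number here is made up for illustration.
dates = pd.date_range("2006-01-01", periods=8, freq="MS")
prices = pd.DataFrame(
    {
        "United States": [200, 195, 185, 170, 160, 165, 175, 185],
        "Cyclical City": [300, 290, 270, 245, 230, 240, 255, 270],
        "Countercyclical City": [100, 103, 108, 114, 118, 116, 112, 109],
    },
    index=dates,
    dtype=float,
)

# Correlation of every column with the national series, least correlated first.
corr_with_us = prices.corr()["United States"].sort_values(ascending=True)
least_correlated = corr_with_us.index[0]
print(corr_with_us)
```

A city whose prices rise while the national average falls ends up with a negative correlation, so it appears at the top of the ascending list.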
In [121]:
# Let's plot a few of the least correlated cities alongside the US average.
ax = metro_df[LeastCorrelated.index[0:5]].plot(figsize=(20,20))
metro_df["United States"].plot(ax=ax)
ax.legend()
Out[121]:
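One way to put numbers on the conclusion stated at the top is to compare the percent change in price over a downturn window and a recovery window. The sketch below uses synthetic yearly data. The city name, the prices, and the window dates (2008 to 2011 and 2011 to 2013) are all assumptions chosen for illustration, and `pct_change_between` is a hypothetical helper, not part of pandas.

```python
import pandas as pd

# Synthetic yearly prices; values, dates, and the city name are invented.
index = pd.to_datetime(
    ["2008-01-01", "2009-01-01", "2010-01-01",
     "2011-01-01", "2012-01-01", "2013-01-01"]
)
prices = pd.DataFrame(
    {
        "United States": [200.0, 180.0, 165.0, 160.0, 170.0, 185.0],
        "Hypothetical Southern City": [100.0, 104.0, 109.0, 112.0, 110.0, 107.0],
    },
    index=index,
)

def pct_change_between(df, start, end):
    """Percent change of each column between the rows dated start and end."""
    first = df.loc[pd.Timestamp(start)]
    last = df.loc[pd.Timestamp(end)]
    return (last / first - 1.0) * 100.0

# Assumed windows: downturn from 2008 to the 2011 trough, recovery afterwards.
summary = pd.DataFrame(
    {
        "downturn_pct": pct_change_between(prices, "2008-01-01", "2011-01-01"),
        "recovery_pct": pct_change_between(prices, "2011-01-01", "2013-01-01"),
    }
)
print(summary)
```

In this toy data the national series falls and then rises, while the hypothetical city does the opposite, which is the pattern the least correlated cities are claimed to show.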