Solar Radiation Prediction

07-21-2017

scikit-learn is a fantastic set of tools for machine learning in Python. It is built on NumPy, SciPy, and matplotlib, which were introduced in the first py-guy post, and it makes data analysis and visualization simple and intuitive. scikit-learn provides algorithms for classification, regression, clustering, dimensionality reduction, model selection, and preprocessing, making data analysis in Python accessible to everyone. In this week’s post we will cover an example of linear regression, exploring solar radiation data from a NASA hackathon.

First, after importing packages, let’s read in the SolarPrediction.csv data set. The link to the data set is commented in the code block.
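
As a minimal sketch of the loading step, something like the following should work with pandas. The helper name load_solar_data is hypothetical, and the two rows are made-up values used only so the snippet runs without the real file; the column names are the ones used elsewhere in this post.

```python
import io

import pandas as pd


def load_solar_data(source):
    """Read the solar data from a file path or file-like object into a DataFrame."""
    return pd.read_csv(source)


# Two synthetic rows (invented values, NOT real measurements) standing in for the file.
sample_csv = io.StringIO(
    "UNIXTime,Data,Time,Radiation,Temperature,Pressure,Humidity,TimeSunRise,TimeSunSet\n"
    "1475200000,9/29/2016 12:00:00 AM,15:46:40,600.5,55,30.45,60,06:13:00,18:13:00\n"
    "1475200300,9/29/2016 12:00:00 AM,15:51:40,580.2,54,30.45,62,06:13:00,18:13:00\n"
)

df = load_solar_data(sample_csv)
print(df.shape)   # (rows, columns)
print(df.head())  # first few rows
```

With the real file in the working directory, the call would be load_solar_data("SolarPrediction.csv") instead.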


 

Taking a first look at the data set, note that the UNIXTime and Date columns are not yet parsed into a date-time type, so we will come back to them later.
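
One quick way to check how the time columns are stored is to look at the column types. This is a small sketch on a made-up two-row frame (the values are invented, and the Data column name is taken from the drop call later in the post); it shows that UNIXTime is a plain integer until it is converted.

```python
import pandas as pd

# Synthetic stand-in for the first rows of the data set (invented values).
df = pd.DataFrame({
    "UNIXTime": [1475200000, 1475200300],
    "Data": ["9/29/2016 12:00:00 AM", "9/29/2016 12:00:00 AM"],
    "Time": ["15:46:40", "15:51:40"],
})

# Integers and text, nothing is a datetime yet.
print(df.dtypes)

# Interpreting UNIXTime as seconds since the epoch gives real timestamps (in UTC).
timestamps = pd.to_datetime(df["UNIXTime"], unit="s")
print(timestamps)
```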

headshape.png

 

df.shape
df.describe()

Calling the describe method on the data frame returns descriptive statistics for each numeric column, which hint that there might be a relationship between radiation, humidity, and/or temperature.

descr

So let’s look at a correlation plot to get a better feel for any possible relationships.

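# numeric_only=True skips text columns such as Time (pandas 2.0+ raises an error on them otherwise)
truthmat = df.corr(numeric_only=True)
sns.heatmap(truthmat, vmax=.8, square=True)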

matrix
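
If the heatmap is hard to read, the same correlation matrix can be ranked as plain numbers. The sketch below uses synthetic data (randomly generated, not the hackathon measurements) in which Radiation is deliberately built from Temperature, just to show the mechanics.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
temperature = rng.normal(50, 6, size=200)

# Synthetic frame: Radiation depends on Temperature, Humidity is independent noise.
synthetic = pd.DataFrame({
    "Temperature": temperature,
    "Radiation": 20 * temperature + rng.normal(0, 40, size=200),
    "Humidity": rng.uniform(10, 100, size=200),
    "Label": ["a"] * 200,  # a text column, which corr() cannot use
})

# Keep only numeric columns, then compute pairwise Pearson correlations.
corr = synthetic.select_dtypes(include="number").corr()

# Rank the other columns by the strength of their correlation with Radiation.
ranked = corr["Radiation"].drop("Radiation").abs().sort_values(ascending=False)
print(ranked)
```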

There is a strong relationship between radiation and temperature, which is not very surprising, so let’s choose two features whose relationship is less obvious. Pressure and Temperature will do fine. We will use seaborn, a statistical visualization library based on matplotlib, to explore the relationship between the two.

p = sns.jointplot(x="Pressure", y="Temperature", data=df)
pp.subplots_adjust(top=.9)
p.fig.suptitle('Temperature vs. Pressure')

 

temp_press.png

There is a clear positive trend, albeit a noisy one because the pressure readings span such a narrow range. Let’s do some quick feature engineering to get a better look at the trend.
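
To put a number on a noisy trend like this, one option is a straight-line fit plus the correlation coefficient. This is only a sketch on randomly generated data; the pressure range and noise level are assumptions chosen to mimic a narrow spread, not values from the real data set.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic readings: pressure varies over a narrow range (assumed values).
pressure = rng.uniform(30.2, 30.6, size=300)
temperature = 25 * (pressure - 30.4) + 51 + rng.normal(0, 5, size=300)

# Least-squares line: temperature is approximately slope * pressure + intercept.
slope, intercept = np.polyfit(pressure, temperature, deg=1)

# Pearson correlation: the sign gives the direction, the size gives the tightness.
r = np.corrcoef(pressure, temperature)[0, 1]

print(slope, intercept, r)
```

A positive slope with a correlation well below 1 is what "positive but noisy" looks like numerically.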

 

#Convert time to_datetime
df['Time_conv'] = pd.to_datetime(df['Time'], format='%H:%M:%S')

#Add column 'hour'
df['hour'] = pd.to_datetime(df['Time_conv'], format='%H:%M:%S').dt.hour

#Add column 'month'
df['month'] = pd.to_datetime(df['UNIXTime'].astype(int), unit='s').dt.month

#Add column 'year'
df['year'] = pd.to_datetime(df['UNIXTime'].astype(int), unit='s').dt.year

#Duration of Day
df['total_time'] = pd.to_datetime(df['TimeSunSet'], format='%H:%M:%S').dt.hour - pd.to_datetime(df['TimeSunRise'], format='%H:%M:%S').dt.hour
df.head()

First we convert the time columns to datetime so we can manipulate them later, then add hour, month, and year columns for a more granular view. Much better!
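
One thing to keep in mind is that total_time subtracts only the hour parts, so the minutes are ignored. If a finer day length is useful, a variant like the one below keeps them. The day_length_hours column name is my own, and the sunrise and sunset times are made up for illustration.

```python
import pandas as pd

# Invented sunrise/sunset times in the same HH:MM:SS format as the data set.
df = pd.DataFrame({
    "TimeSunRise": ["06:13:00", "06:55:00"],
    "TimeSunSet": ["18:13:00", "18:05:00"],
})

sunrise = pd.to_datetime(df["TimeSunRise"], format="%H:%M:%S")
sunset = pd.to_datetime(df["TimeSunSet"], format="%H:%M:%S")

# Hour-only difference, as computed in the post.
df["total_time"] = sunset.dt.hour - sunrise.dt.hour

# Full difference, including minutes and seconds, expressed in hours.
df["day_length_hours"] = (sunset - sunrise).dt.total_seconds() / 3600

print(df)
```

In the second row the hour-only version reports 12 while the actual day length is 11 hours and 10 minutes.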

screen-shot-2017-07-21-at-8-05-13-pm.png

With sklearn’s LinearRegression we can fit a model on one part of the data and then test its accuracy on the rest. We drop the Temperature column from the independent variables (the features) because it is the dependent variable we want to predict.

 

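# Target: the value we want to predict
y = df['Temperature']
# Features: everything except the target and the raw date/time text columns
X = df.drop(['Temperature', 'Data', 'Time', 'TimeSunRise', 'TimeSunSet', 'Time_conv'], axis=1)

from sklearn.model_selection import train_test_split
# Hold out 30% of the rows for testing; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)
from sklearn.linear_model import LinearRegression
lm = LinearRegression()
lm.fit(X_train, y_train)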
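
Once the model is fit, it can help to look at what it actually learned. The sketch below repeats the split and fit on synthetic data (generated from a known formula, not the solar data) and then reads the coefficients back with their feature names.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)

# Synthetic features and a target built from a known linear formula plus noise.
X = pd.DataFrame({
    "Radiation": rng.uniform(0, 1000, size=500),
    "Humidity": rng.uniform(10, 100, size=500),
})
y = 45 + 0.02 * X["Radiation"] - 0.05 * X["Humidity"] + rng.normal(0, 1, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)
lm = LinearRegression().fit(X_train, y_train)

# One coefficient per feature: the predicted change in y per unit change in that feature.
coefficients = pd.Series(lm.coef_, index=X.columns)
print(coefficients)
print(lm.intercept_)

# R^2 on the held-out rows (1.0 would be a perfect fit).
print(lm.score(X_test, y_test))
```

Because the target here was generated from known weights, the fitted coefficients should land close to 0.02 and -0.05, which is a handy sanity check before trusting the same steps on real data.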

Now let’s predict the temperature given the features.

 

X.head()
predictions = lm.predict( X_test)
pp.scatter(y_test,predictions)
pp.xlabel('Temperature Test')
pp.ylabel('Predicted Temperature')

linreg.png

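The MSE and RMSE measure how far the predictions fall from the true test temperatures on average (the RMSE is in the same units as Temperature), so lower values mean a better fit. As you can see in the scatter plot, the predicted values rise with the actual values, which suggests the model captured the overall trend.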

from sklearn import metrics
# Mean squared error, then its square root (RMSE)
print(metrics.mean_squared_error(y_test, predictions))
print(np.sqrt(metrics.mean_squared_error(y_test, predictions)))

Screen Shot 2017-07-21 at 8.16.00 PM
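
An RMSE is easier to judge next to a baseline. As a rough sketch, the snippet below compares a model’s RMSE with the RMSE of always predicting the mean temperature, and derives R² from the two. The numbers are invented for illustration and are not the results from this post.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Made-up test temperatures and predictions.
y_test = np.array([48.0, 52.0, 55.0, 60.0, 45.0])
predictions = np.array([50.0, 51.0, 56.0, 57.0, 47.0])

model_rmse = rmse(y_test, predictions)

# Baseline: predict the mean of the test values for every row.
baseline_rmse = rmse(y_test, np.full_like(y_test, y_test.mean()))

# R^2 is the fraction of the baseline's squared error that the model removes.
r_squared = 1 - (model_rmse / baseline_rmse) ** 2

print(model_rmse, baseline_rmse, r_squared)
```

If the model’s RMSE is not clearly below the baseline’s, the regression is adding little beyond the average.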

If you like these blog posts, or want to comment and/or share something, do so below and follow py-guy!

Note: I referenced Kaggler Sarah VCH’s notebook in making today’s blog post, specifically the feature engineering code in the fifth code block. If you want to see her notebook, I’ve listed the link below.

https://www.kaggle.com/sarahvch/investigating-solar-radiation
