Pythonic Lambda Queries


There is not much literature out there praising the usefulness of lambda functions, so I thought I would make this post about pythonic lambda functions.

Lambda functions originate in lambda calculus, which expresses computation as the abstraction and application of functions using variable binding and substitution. In code, such functions can be nested inline to encapsulate small pieces of non-specific logic without cluttering the surrounding code. Because a lambda function does not have to be bound to an identifier, it can also leave fewer stray names around to misuse.

Suppose you have a large data set with the following structure, where dashes represent a data entry such as a string, int or datetime value. Lambda functions are often useful for building complex queries against data of this kind.


Screenshot (69)

The code below defines lambda functions that each select the rows for one level from the sequential Level column.

# one filter per level of a subtask
tutorial= lambda x: x.loc[x['Level']=='1 Tutorial']
training= lambda x: x.loc[x['Level']=='2 Training']
distractors= lambda x: x.loc[x['Level']=='3 Distractors']

Each lambda takes a dataframe and returns a filtered dataframe, so the result is still a queryable data structure and the same lambdas can be reused across a variety of skills.

snack=df.loc[df['Skill'] == "Snack"]
peeling_bannanas= snack.loc[snack['Subtask']=='Peeling Bannanas']
# apply the reusable level filter to the narrowed-down frame
peeling_bannanas_tutorial= tutorial(peeling_bannanas)

The statements above select “Snack” as the skill, narrow the result to the “peeling bananas” subtask and then return all of its tutorial-level rows. As such, these little functions can be efficient components in complex, dependency-heavy programs.
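To make the pattern concrete, here is a minimal runnable sketch. The column names follow the post, but the sample rows, the Score column and the variable names are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data; column names follow the post, values are invented.
df = pd.DataFrame({
    'Skill':   ['Snack', 'Snack', 'Snack', 'Hygiene'],
    'Subtask': ['Peeling Bananas', 'Peeling Bananas', 'Pouring Juice', 'Washing Hands'],
    'Level':   ['1 Tutorial', '2 Training', '1 Tutorial', '1 Tutorial'],
    'Score':   [3, 5, 4, 2],
})

# Each lambda takes a DataFrame and returns only the rows at one level.
tutorial = lambda x: x.loc[x['Level'] == '1 Tutorial']
training = lambda x: x.loc[x['Level'] == '2 Training']

# Narrow the frame step by step, then reuse the same lambda on any subset.
snack = df.loc[df['Skill'] == 'Snack']
peeling = snack.loc[snack['Subtask'] == 'Peeling Bananas']

peeling_tutorial = tutorial(peeling)
snack_tutorial = tutorial(snack)

print(peeling_tutorial)
print(len(snack_tutorial), "tutorial rows for the Snack skill")
```

Because each filter returns a dataframe, the filters can be chained in any order, for example training(snack) or tutorial(df).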


If you like these blog posts or want to comment and or share something do so below and follow py-guy!

First blog post ~ python packages


Welcome to py-guy! py-guy blog explores science, culture and technology with simple examples and thoughtful discussions. For the first post I will talk about why python is a useful programming language and some nifty things python can do while exploring the MOMA data set. The Museum of Modern Art collection is an excellent data set containing title, artist, date, medium etc. of every artwork in the Museum of Modern Art and is perfect for the scope of this post. To download the data set and run your own analysis I’ve listed the link below.

Python supports every stage of data manipulation, and the matplotlib, numpy and pandas packages streamline data analysis. At first I felt cheated that I could just import a package to run all the calculations without knowing what is going on under the covers, but after my first few modules I can say these packages are powerful components in the py-guy toolbox.

import math, json, collections, itertools
from collections import Counter
import numpy as np
import pandas as pd
import matplotlib.pyplot as pp

df=pd.read_csv("artworks.csv",names=['id','title','artist-id','name','date','medium','dimensions','aquisition-date','credit','catalogue','department','classification','object-number','diameter','circumference','height', 'length', 'width', 'depth', 'weight', 'duration'],dtype='str')

Pandas data frames have a sort_values method that sorts in ascending or descending order. Pandas enhances numpy with labeled data and descriptive indices, robust handling of common data formats and missing data, and relational database operations.

df['date']=pd.to_numeric(df['date'], errors='coerce')

romanticism= df[(df['date']>=1790) & (df['date']<=1880)]
modern= df[(df['date']>=1860) & (df['date']<=1945)]
contemporary= df[(df['date']>=1946) & (df['date']<=2017)]

df[-5:] # check if successful

Then use matplotlib to plot a histogram of the dates, with bins spanning the range of the art periods.


# list comprehension to pull only valid (non-NaN) dates from df
dat=[d for d in df['date'] if not np.isnan(d)]

# set plot
# assumption: one bin per year from 1790 to 2017, the span of the periods above
pp.hist(dat, bins=range(1790, 2018))
pp.ylabel('Number of Artworks')
pp.title('Artworks per Year')
pp.show()


Python language is expressive in its readability and simplicity.  In only a few lines of code you can read, manipulate and plot data.


# according to wikipedia art periods are defined by the
# development of the work of an artist, groups of artists or art movement
# Romanticism -1790 - 1880
# Modern art - 1860 - 1945
# Contemporary art - 1946–present

periods = ('Romanticism','Modern','Contemporary')
y_pos = np.arange(3)
# len() counts rows; DataFrame.size would count rows times columns
arts = [len(romanticism),len(modern),len(contemporary)]
pp.bar(y_pos, arts, align='center', alpha=0.5, color=['coral','yellow','teal'])
pp.xticks(y_pos, periods)
pp.title('Pieces per Movement')
pp.show()


Using collections and list comprehensions is just another powerful component python has to offer. I will make another blog post on python collections and list comprehensions but for now here is a quick example illustrating their utility.

# make a list comprehension
nam=[n for n in df['name']]

# using the from collections import Counter
ctr=Counter(nam)
# above line is equivalent to collections.Counter(nam)

# sort the collection by most artworks
mc=ctr.most_common(10)

artists=[artist[0] for artist in mc]
common_arts=[arts[1] for arts in mc]
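As a small standalone illustration of the counting step, the sketch below runs Counter over a short, invented list of names standing in for df['name'].

```python
from collections import Counter

# Hypothetical artist names standing in for df['name'].
names = ['Picasso', 'Atget', 'Picasso', 'Bourgeois', 'Atget', 'Picasso']

counts = Counter(names)           # maps each name to its number of occurrences
top_two = counts.most_common(2)   # list of (name, count), highest count first

artists = [name for name, _ in top_two]
totals = [count for _, count in top_two]

print(top_two)   # [('Picasso', 3), ('Atget', 2)]
```

The two derived lists line up index by index, which is what the bar chart needs for its labels and bar lengths.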

Let’s try a horizontal bar chart with ‘barh.’

y_pos = np.arange(len(common_arts))
pp.figure(figsize=(10, 3))
pp.barh(y_pos, common_arts, align='center', alpha=0.5)
pp.yticks(y_pos, artists)
pp.xlabel('Number of Artworks')
pp.title('Top 10 Artists with most pieces in Moma')



Similarly this process can be repeated for different variables and scopes returning some interesting results.

df=pd.read_csv("artworks.csv",names=['id','title','artist-id','name','date','medium','dimensions','aquisition-date','credit','catalogue','department','classification','object-number','diameter','circumference','height', 'length', 'width', 'depth', 'weight', 'duration'],dtype='str')

cls=[c for c in df['classification']]
# count artworks per classification, most common first
clsCol=Counter(cls).most_common()

clsArr= [c[0] for c in clsCol]
numCls=[c[1] for c in clsCol]
y_pos = np.arange(len(clsArr))

pp.figure(figsize=(10, 20))
pp.barh(y_pos, numCls, align='center', alpha=0.5)
pp.yticks(y_pos, clsArr)
pp.xlabel('Number of Artworks')
pp.title('Classification of Artworks')


Hello Docker!


This post is about working with containers and python web applications. But first, a little bit on containers and their use in a development ecosystem. Docker is a technology that provides abstraction and automation at the operating system level, offering reusability, automation control, version control, peer review, and testing capabilities.
This virtualization of bare-bones operating systems into containers enables a micro-services model in which the work is divided into separate, independent units, facilitating scalability, reliability and testing. In essence, docker containers allow developers to be accountable for programming features without having to worry about machine dependencies.

The following code blocks give a walk-through on building a dockerized web app, the hello world of docker.

First create a new directory called “hello_docker” to contain our webserver. Then inside the hello_docker directory create another directory creatively named “app.” Inside the app directory, create a file with the following contents.

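from flask import Flask
app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello Docker!\n'

if __name__ == '__main__':
    # the host value is missing in the source; inside a container the server
    # usually has to listen on all interfaces (0.0.0.0) to be reachable
    app.run(host='')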


This web server will run in a docker container, so let's create a Dockerfile that will be used to build our docker image. In the hello_docker directory create a file called Dockerfile with the following contents.

FROM python:3.4
RUN pip install Flask==0.10.1
COPY app /app
CMD ["python", ""]


Dockerfiles contain a set of instructions for docker to create an image to specifications. The first line pulls the Python 3.4 image as a base, the second installs Flask, and the remaining lines copy the code from our directory into the image at build time and run the server.

Next to build the sample app run in the terminal,

cd hello_docker
docker build -t hello_docker .
docker run -d -p 5000:5000 hello_docker


Running docker run with the flags -d and -p starts the app in the background and forwards port 5000 in the container to port 5000 on the host. The command should output a hash, the ID of the new container, confirming a successful start.

Screen Shot 2018-02-11 at 6.41.17 PM

Our docker image is now built and running! To test the application run the following command verifying the message “Hello Docker!”:

curl $(docker-machine ip default):5000
Hello Docker!


This walkthrough barely scratched the surface of docker's capabilities, so if you are interested in experimenting with docker I’ve listed a few links to get started with.

Here are a few more helpful commands.

# to see all your docker containers
docker ps -a

#to see your running docker containers
docker ps

#to see your docker images
docker images


Python Operating System Calls


Python’s lightweight dynamic interface has proven excellent for networking, data scraping and GUI tasks. Python’s powerful and possibly overlooked os module enables you to take a dynamic approach to operating system programming. With python you can read and write files across different areas of your hard drive and interface with the command line in just a few lines of code. This becomes useful in managing dependencies and project states.

First import sys and os

import sys, os

Then we will create a new function, let's call it lister, which takes an argument root, the top of our directory tree. os.walk() generates the directories under root, either top-down or bottom-up, and the for loop iterates through each of them. This prints each directory to the command line console enclosed in square brackets.

def lister(root):
    for (thisdir, subshere, fileshere) in os.walk(root):
        print('[' + thisdir + ']')

For each directory in the tree, os.walk() yields a tuple containing three values: dirpath, dirnames, filenames. These values make up a tree-like data structure (where * represents many).

                                                        Screen Shot 2017-12-13 at 2.06.33 PM.png

The nested for loop iterates through all the files contained in the directory.

        for fname in fileshere:
            path = os.path.join(thisdir, fname)
            print(path)

The directory path is joined with the filename and then printed to the console.

if __name__ == '__main__':
    lister(sys.argv[1])

When the script is run, the root directory must be passed as a command-line argument so that sys.argv tells the script where to begin the os walk.
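Put together, the pieces can be sketched as one runnable script. To keep it self-contained, this version builds a small temporary directory tree instead of reading sys.argv; the file names and the returned list are additions for illustration only.

```python
import os
import tempfile


def lister(root):
    """Print every directory under root in brackets, followed by its file paths."""
    found = []
    for thisdir, subshere, fileshere in os.walk(root):
        print('[' + thisdir + ']')
        for fname in fileshere:
            path = os.path.join(thisdir, fname)
            print(path)
            found.append(path)
    return found  # returned only so the result is easy to inspect


# Build a small throwaway tree so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'docs'))
    for rel in ('a.txt', os.path.join('docs', 'b.txt')):
        with open(os.path.join(root, rel), 'w') as handle:
            handle.write('sample\n')
    paths = lister(root)

names = sorted(os.path.basename(p) for p in paths)
print(names)
```

In the real script the call would be lister(sys.argv[1]), so the root comes from the command line rather than from a temporary directory.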

Screen Shot 2017-12-13 at 2.13.26 PM

The resulting output might be similar to the stream below.

Screen Shot 2017-12-13 at 2.12.33 PM.png


Python Web services, JSON, and ISS Oh My!


In this post I will talk about how to handle JSON data from an external API utilizing python. Making calls to web services is simple with python: with just a few lines of code you can track the International Space Station’s (ISS) position and pass times in real time with a sleek graphical user interface. The following is a link to the project files download,

The turtle module is an object-oriented graphics tool that draws to a canvas or screen. Turtle’s methods include forward(), backward(), left() and right(), like telling a turtle in what direction to draw. Turtle will draw over a NASA curated 2D map of Earth, so you should place the ‘map.jpg’ file in your project directory.

So one of the first things we need to do is instantiate a turtle screen with the following command.

# turtle provides a simple graphical interface to display data
# we need a screen to plot our space station position
import turtle
screen= turtle.Screen()

The image size is 720w by 360h so our turtle screen size should fit the image size.

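# the image size is 720w x 360h
screen.setup(720, 360)
# set coordinates to map longitude and latitude
screen.setworldcoordinates(-180, -90, 180, 90)
# set background picture to NASA world map, centered at 0 (Tk may need it as GIF or PNG)
screen.bgpic('map.jpg')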



To represent the ISS on the 2D map let’s choose an image. It doesn’t have to be the following icon, but it’s a nice icon, so Houston, we have liftoff!

# adds turtle object with name iss to list of objects
iss= turtle.Turtle()


Our location object will tell turtle to write the ISS png file to the screen at a specific position given the latitude and longitude of the ISS. Instantiate a Turtle() to create an object with the following code.


# location object for turtle to plot
location= turtle.Turtle()

# used later to write text

Now, before we can tell our turtle to draw the ISS overhead time, we need the actual latitude and longitude coordinates of the locations it will pass over. A quick google search gives us the coordinates to store in a dictionary.

# Cape Canaveral ---> 28.392218, -80.607713
# Central Park, NYC ---> 40.782865, -73.965355
# create python dictionary to iterate and plot time of overhead location
coords={}
coords['nasa_fl']=(28.523397, -80.681874)
coords['centralp']=(40.782865, -73.965355)

To call the API we first need its URL, which tells the API to give us the data we need to extract the ISS pass times.

import urllib.request
import json

Then use urllib.request to open the URL, querying for each given location. The response is loaded from JSON format and stored as a result. JSON stands for JavaScript Object Notation and is used to conveniently organize data.

Screenshot (76)

The lines above are the contents of the JSON data. The data is accessed much like a python dictionary, utilizing keys and indices.
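The dictionary-style access can be tried without a network call. The sketch below parses a made-up response body; only the iss_position, latitude and longitude keys appear in this post, while the other fields and all values are assumptions for illustration.

```python
import json

# Hypothetical response body. Only the iss_position, latitude and longitude
# keys come from the post; the other fields and all values are invented.
raw = ('{"message": "success", "timestamp": 1500000000, '
       '"iss_position": {"latitude": "28.5234", "longitude": "-80.6819"}}')

result = json.loads(raw)            # JSON text -> nested Python dicts

position = result['iss_position']   # keyed access, like any dictionary
lat = float(position['latitude'])   # values arrive as strings here
lon = float(position['longitude'])

print(lat, lon)
```

json.loads accepts the bytes returned by urlopen(...).read() as well as a string, so the same two steps apply to a live response.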

import time

# set up a loop to plot when the iss will be over each stored location
for k,v in coords.items():
    pass_url= ''
    pass_url= pass_url+'?lat='+str(v[0])+'&lon='+str(v[1])
    pass_response= urllib.request.urlopen(pass_url)
    pass_result= json.loads(pass_response.read())
    # NOTE: the line that extracts the pass time `over` (a Unix timestamp)
    # from pass_result is missing from the source and is left as a gap here
    # write the pass time at the location's coordinates
    location.write(time.ctime(over), font=style)

The above code block makes a call to the api, loads the json data, parses the overhead pass time (when the iss will be over the specified position) and then plots the time at the given location.

Screenshot (77)

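# init current location from the iss coordinates
# make call to api
loc_url= ''
loc_response= urllib.request.urlopen(loc_url)
loc_result= json.loads(loc_response.read())
# the coords are packed into json under the iss_position key
location= loc_result['iss_position']
lat= float(location['latitude'])
lon= float(location['longitude'])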
<pre># set up while loop to plot moving iss
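# iss loc updates approx every 3 sec
while True:
    # update call to webservice to get new coords
    loc_url= ''
    loc_response= urllib.request.urlopen(loc_url)
    loc_result= json.loads(loc_response.read())
    location= loc_result['iss_position']
    lat= float(location['latitude'])
    lon= float(location['longitude'])
    # write turtle at new location coords
    iss.goto(lon, lat)
    time.sleep(3)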


The above code block makes a call to the api, loads the json data, parses the overhead position at the current geographic coordinates and plots the iss icon. The while loop is infinite to constantly track the iss.

Screenshot (75)


Pypack – compact packaging and reusable configuration


In this post I will talk about how to use pypack to write clean and reusable python code. In larger, complex python applications, import statements tend to clutter the code and are not practical to reuse across projects. Say you want to code a data science app: you have your “go-to” packages like numpy, matplotlib, math etc., or for a web crawler, selenium, beautiful soup, and requests. With compact packaging and reusable configuration, programming is streamlined.

First specify the packages used in your program in a configuration file named ‘config,’ defining imports and statements as key-value declarations separated by one blank line.

# config file
imports: 'math','json','collections','itertools','numpy','pandas','matplotlib.pyplot',''

statements: '','','','','np','pd','pp'

This will specify a list of imports pypack will pull into the dev environment necessary for your project.

# packages from config file

import math
import json
import collections
import itertools
import numpy as np
import pandas as pd
import matplotlib.pyplot as pp


The above code snippet is the result of the configuration file contents listed at the beginning of the post.  pypack is a simple program written in python with less than 37 lines of code that reads the specified packages from the config file and writes those packages to a new python file for specialized coding projects.

import sys
# config file should be in same folder as pypack
# if not, specify

First pypack opens the config file and reads the contents to memory.

# parse config file
arr= s4.split(',')


Python syntax is such that building a list is as simple as enclosing a loop in brackets. The first four lines of this snippet split the config file on commas and assign the imports and statements elements to separate arrays.

# list comprehension of imports and statements
arr=[a for a in arr[:7]]
arr1= s4.split(',')[7:]
arr1.insert(0,' ')

Next imports and statements lines are split, concatenated and then double space delimited to an array for list comprehension.

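# the output file's variable name is not recoverable from the source; py_file is a stand-in
py_file=open(sys.argv[1],'w')
for i in range(len(arr)):
   if arr1[i]==' ':
       py_file.write('import '+arr[i]+'\n')
   else:
       py_file.write('import '+arr[i]+' as '+arr1[i]+'\n')
py_file.close()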

Finally pypack opens a new writable python file and effectively iterates through the two arrays, writing imports and statements to the new python file.
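Since parts of the pypack source are missing above, here is a hedged end-to-end sketch of the same idea. It is not the original program: the function names parse_line and build_imports are made up, and the config text is a shortened version of the one shown earlier.

```python
def parse_line(line):
    """Return the quoted, comma-separated values that follow 'key:'."""
    _, _, values = line.partition(':')
    return [v.strip().strip("'") for v in values.split(',')]


def build_imports(config_text):
    """Turn the imports/statements pairs into Python import lines."""
    modules, aliases = [], []
    for line in config_text.splitlines():
        if line.startswith('imports:'):
            modules = parse_line(line)
        elif line.startswith('statements:'):
            aliases = parse_line(line)
    lines = []
    for index, module in enumerate(modules):
        if not module:
            continue  # skip empty entries such as a trailing ''
        alias = aliases[index] if index < len(aliases) else ''
        lines.append('import ' + module + (' as ' + alias if alias else ''))
    return lines


config_text = """imports: 'math','json','numpy','pandas'

statements: '','','np','pd'
"""

generated = build_imports(config_text)
print('\n'.join(generated))
```

Writing the generated lines to a new file is then a single open(..., 'w') and write call, as in the loop above.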



Object Oriented Python Programming


In python, object oriented programming is a simple way to build powerful applications. Consider a real-world object like a pair of shorts. This pair of shorts has a set of attributes and properties that make it unique. For example, it might have pockets, buttons and zippers to put on and take off the shorts. Essentially, we have a blueprint to make any pair of shorts (give or take a few unique properties). This blueprint is known as a class and is the fundamental concept of object oriented programming and design. Each class defines attributes and methods, and objects are instances of that class. Let’s take a look at some example code of our shorts class.

class shorts:
    def __init__(self,waist,length,color):
        self.waist = waist
        self.length = length
        self.color = color
        self.wearing = False

In python each class can define an __init__ constructor that sets the initial attributes of each object. self is a reference to the object being created or used, and attributes set on it are unique to that object. Above, the constructor takes the parameters self, waist, length and color and stores those values as attributes used later in the program. The put_on and take_off methods read those attributes through self and update self.wearing, so take_off sets it to False to let us know the shorts are off.

    def put_on(self):
        print("Putting on {}x{} {} shorts".format(self.waist,self.length,self.color))
        self.wearing = True

    def take_off(self):
        print("Taking off {}x{} {} shorts".format(self.waist,self.length,self.color))
        self.wearing = False


The code above defines methods that use the attributes of the shorts object, i.e. self.waist, self.length, self.color and self.wearing. When executed, the stored attributes are printed to the console. The code below shows the class instantiated as an object that calls the defined methods.

new_shorts= shorts(32,33,"blue")
new_shorts.put_on()
new_shorts.take_off()
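For reference, here is the whole class in one runnable piece, written as a sketch rather than the post's exact code: the class is capitalized as Shorts, and the describe helper and the second khaki object are additions for illustration.

```python
class Shorts:
    """A pair of shorts described by waist, length and color."""

    def __init__(self, waist, length, color):
        self.waist = waist
        self.length = length
        self.color = color
        self.wearing = False  # nothing is worn until put_on() is called

    def describe(self):
        return "{}x{} {} shorts".format(self.waist, self.length, self.color)

    def put_on(self):
        self.wearing = True
        print("Putting on " + self.describe())

    def take_off(self):
        self.wearing = False
        print("Taking off " + self.describe())


# Two objects built from the same class keep independent state.
blue = Shorts(32, 33, "blue")
khaki = Shorts(34, 30, "khaki")
blue.put_on()
print(blue.wearing, khaki.wearing)   # True False
```

The last line shows the point of self: putting on the blue pair changes only that object's wearing attribute.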


Screenshot (6)






VR Development – BriteLites


I wrote the last post about VR technology with no real preface as to why I was writing about VR, other than that I wanted to, so this post serves (I should hope) as that preface. VR technology is not groundbreaking; it's been around for years, along with buckets of sci-fi tropes giving VR center stage. So what makes VR exciting?


The Oculus Rift, HTC Vive and PlayStation VR headsets have positional tracking and specs to boast about, and each comes with its own set of accessories. These, however, require an expensive high-end host machine. But these days everyone with a supercomputer in their pocket has the option of buying one of the mobile headsets to begin their own VR experience.

With mobile headsets like the Google Daydream, Samsung Gear, and flavors of Google Cardboard, VR is an affordable option for anyone and everyone to develop and/or consume VR content. Unity and Unreal Engine, to name a few, support application development by integrating the hardware SDK libraries and a great wealth of developer tools to quickly develop a VR app.



The next question, then, is what makes a great VR experience. I have the Samsung Gear without any of the peripheral accessories, so I brainstormed around simplicity. How can I make a VR experience enjoyable using only the interaction supported by the headset, i.e. motion and the touchpad? I reflected on my early childhood playing Lite-Brite with my friends and how much fun that was, thought it would port to a great VR experience, and have started prototyping.


I decided that a simpler experience would be more immersive and ultimately give the user an intuitive sense of presence. The controls use the user's head movements to explore a world of spherical lites and the touchpad to select different colored lites and clear lites. BriteLites will be available to download through the oculus store for free in the not too distant future.





360 Image Viewer VR


This week I’m deviating from posting about data science and python modules to explore virtual reality with my new Samsung Gear VR and create a 360 image viewer application. This takes very little code using Unity3D and the OVR SDK, with satisfying results. The rest of this post is an abstract walk-through of the steps to create your own 360 image viewer application.

For this tutorial you will need the latest version of Unity 5, and the OVR sdk.

First, make a new Unity Project file with the name “360Viewer”. Choose where you want to save the project on your computer, make sure 3D is selected and click Create Project.
Select OVR from the downloaded folder, drag and drop it into your Assets folder.
Before we jump into our 360Viewer application make sure you’ve created a Plugins folder for your oculus signature file with the structure: Plugins > Android > Assets and then your oculus signature file.


Screenshot (21).png

This file is necessary for the development to access the low-level VR functionality of your device. You can download your oculus signature file when you sign up as an Oculus developer at

Next we want to configure the Unity3D environment to develop for mobile VR. Navigate to File > Build Settings and select Android.

Screenshot (23)

Leave the Development Build field unchecked. This is how you build your application for testing. Click Player Settings and navigate to Player Settings in the Inspector.
Make sure Virtual Reality Supported is checked and select the Oculus SDK. Navigate to identification and set your package name.

Screenshot (16).png

Choose an Android API level that Unity3D and the Oculus SDK support and that your device runs. I’ve selected Android 7.0 ‘Nougat’ (API level 24).
Select your minimum and target API. Finally, navigate to Edit > Preferences > External Tools and set the Android SDK and Java SDK paths.

Screenshot (28)


That’s it, you’ve configured your mobile VR development environment!

Now we will create a sphere 3D gameobject by navigating to the Hierarchy, and selecting your scene. Right click and select 3D Object > Sphere.

Screenshot (22)

You will see the Sphere object appear in the scene. Let’s make sure it is centered at the origin of our scene at position (0, 0, 0) and set the scale to (100, 100, 100).
Make sure Blend Probes are selected in your sphere’s Mesh Renderer.

Then create a folder for your 360 images or panorama photos and add your images to the folder. These image files will be applied to our sphere’s texture later, but first we need to create a shader that will map the images to the sphere. Create the following shader file and name it DoubleSided from which you will cycle textures.

Shader "DoubleSided" {
    Properties {
        _Color ("Main Color", Color) = (1,1,1,1)
        _MainTex ("Base (RGB)", 2D) = "white" {}
        //_BumpMap ("Bump (RGB) Illumin (A)", 2D) = "bump" {}
    }
    SubShader {
        //UsePass "Self-Illumin/VertexLit/BASE"
        //UsePass "Bumped Diffuse/PPL"
        // Ambient pass
        Pass {
            Name "BASE"
            Tags {"LightMode" = "Always" /* Upgrade NOTE: changed from PixelOrNone to Always */}
            Color [_PPLAmbient]
            SetTexture [_BumpMap] {
                constantColor (.5,.5,.5)
                combine constant lerp (texture) previous
            }
            SetTexture [_MainTex] {
                constantColor [_Color]
                Combine texture * previous DOUBLE, texture*constant
            }
        }
        // Vertex lights
        Pass {
            Name "BASE"
            Tags {"LightMode" = "Vertex"}
            Material {
                Diffuse [_Color]
                Emission [_PPLAmbient]
                Shininess [_Shininess]
                Specular [_SpecColor]
            }
            SeparateSpecular On
            Lighting On
            Cull Off
            SetTexture [_BumpMap] {
                constantColor (.5,.5,.5)
                combine constant lerp (texture) previous
            }
            SetTexture [_MainTex] {
                Combine texture * previous DOUBLE, texture*primary
            }
        }
    }
    FallBack "Diffuse", 1
}

Create a C# script with the name “textureCycle” (Unity expects the file name to match the class name), and copy and paste the code below. This will enable us to browse our images utilizing the oculus touchpad. Attach the script to your sphere, and select the size and images for your viewer.

using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class textureCycle : MonoBehaviour {
    public Texture[] myTextures = new Texture[4];
    int maxTextures;
    int arrayPos = 0;

    // Use this for initialization
    void Start () {
        maxTextures = myTextures.Length;
    }

    // Update is called once per frame
    void Update () {
        if (OVRPlayerController.touchRight == true) {
            GetComponent<Renderer>().material.mainTexture = myTextures[arrayPos++];
        }
        if (OVRPlayerController.touchLeft == true) {
            GetComponent<Renderer>().material.mainTexture = myTextures[arrayPos--];
        }
        // wrap around at either end of the array
        if (arrayPos == maxTextures) {
            arrayPos = 0;
        }
        if (arrayPos < 0) {
            arrayPos = maxTextures - 1;
        }
    }
}


Next delete the main camera, navigate to Assets > OVR > Prefabs and drag and drop OVRPlayerController onto your sphere. In the Hierarchy select the OVRPlayerController; if you expand the contents you will see LeftEyeAnchor and RightEyeAnchor child objects. These are utilized to calibrate the virtual environment to the hardware’s optics. We want the OVRPlayerController object to see the inside of the sphere. The OVRPlayerController sends a raycast in the direction the Samsung Gear is facing and returns the first object it hits.

That object is the sphere. We want to see the side of the sphere facing the camera, which carries the texture, so set clear flags to skybox, culling mask to default and field of view to 60. Do the same for the left and right eye anchors.

Screenshot (29).png

Select your sphere object and make sure you’ve checked static and apply to all children objects. Add some in game lighting so you can view the scene!







Topic Discovery in python!


So I still haven’t figured out whether I want to make one blog post a week or more, but I will try to post at least once a week on topics in computer science. We’ll see where it goes; it will be very exciting and most certainly worth the click.

This week I plan on exploring a data set of over 5,000 film entries scraped from IMDb in an effort to briefly discuss machine learning, particularly Latent Dirichlet Allocation. I will not go into any of the theory because that is beyond the scope of this blog; these aren’t the droids you’re looking for.

However, nltk and gensim provide extensive APIs for processing human language. Anything from stemming words down to their roots to tokenizing a document for further analysis is made easy with these modules.


import pandas as pd
from nltk.tokenize import RegexpTokenizer
from stop_words import get_stop_words
from nltk.stem.porter import PorterStemmer
from gensim import corpora, models
import gensim
import numpy as np
import matplotlib.pyplot as pp
import re


Let’s start by reading in the csv file, movie_metadata.csv. A link to the kaggle download is commented in the code below.

movie= pd.read_csv("movie_metadata.csv")


Screen Shot 2017-07-23 at 4.27.21 PM


Latent Dirichlet Allocation is used to estimate word topic assignments and the frequency of those assignments for a collection of texts called documents. We assume each document exhibits multiple topics, so we will be looking at the plot_keywords and genres columns.




Next let’s remove the pipe with some list comprehension and check if successful.


keyword_strings=[str(d).replace("|"," ") for d in movie['plot_keywords']]

Screen Shot 2017-07-23 at 4.27.29 PM



Stemming reduces words down to their root word and is particularly useful in developing insightful NLP models.


docs=[d for d in keyword_strings if d.count(' ')==5]

#create english stop words list
en_stop= get_stop_words('en')

# create p_stemmer of class PorterStemmer
# stemmer reduces words in a topic to its root word
p_stemmer= PorterStemmer()

# init regex tokenizer
tokenizer= RegexpTokenizer(r'\w+')

# for each document clean and tokenize document string,
# remove stop words from tokens, stem tokens and add to list
texts=[]
for i in docs:
  tokens= tokenizer.tokenize(i.lower())
  stopped_tokens=[t for t in tokens if not t in en_stop]
  stemmed_tokens = [p_stemmer.stem(t) for t in stopped_tokens]
  texts.append(stemmed_tokens)
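To see what the tokenizing and stop-word steps do on their own, here is a dependency-free sketch using only the standard library. The stop-word set and the sample string are invented, and stemming is left out because it needs nltk.

```python
import re

# Hypothetical stop-word list; the post uses get_stop_words('en') instead.
stop_words = {'the', 'of', 'a', 'in'}

doc = "Revenge of the Alien in Space"

# Same \w+ pattern as the post's RegexpTokenizer, applied with re.findall.
tokens = re.findall(r'\w+', doc.lower())
kept = [t for t in tokens if t not in stop_words]   # drop stop words

print(tokens)  # ['revenge', 'of', 'the', 'alien', 'in', 'space']
print(kept)    # ['revenge', 'alien', 'space']
```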


The next block of code transforms the granular data into sets of identifiable tokens to manipulate later. To do so, let’s create a dictionary that maps each term to an id and a matrix that records, for each document, how often each term occurs.


# turn our tokenized docs into a key value dict
dictionary= corpora.Dictionary(texts)
# convert tokenized docs into a doc matrix
corpus=[dictionary.doc2bow(text) for text in texts]
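The bag-of-words idea behind the dictionary and the document matrix can be sketched in plain Python. This only imitates the shape of the result, a list of (token id, count) pairs per document; the ids gensim assigns may differ, and the sample tokens are invented.

```python
from collections import Counter

# Hypothetical, already tokenized and stemmed documents.
texts = [['alien', 'space', 'alien'], ['space', 'love']]

# Assign each distinct token an integer id in order of first appearance.
token_to_id = {}
for text in texts:
    for token in text:
        token_to_id.setdefault(token, len(token_to_id))

# Each document becomes a sorted list of (token_id, count) pairs.
corpus = [sorted(Counter(token_to_id[t] for t in text).items()) for text in texts]

print(token_to_id)  # {'alien': 0, 'space': 1, 'love': 2}
print(corpus)       # [[(0, 2), (1, 1)], [(1, 1), (2, 1)]]
```

The model never sees the words themselves, only these id and count pairs, which is why the dictionary is passed along to translate ids back into words.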


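The next lines of code generate the Latent Dirichlet Allocation model from the corpus, the number of topics and the number of training passes. Printing the model's topics shows an estimate of the words assigned to each topic, effectively (or ineffectively) predicted.

# two topics with five words each match the parsing code further down;
# passes=20 is an assumed value, since the original call is missing
ldamodel= gensim.models.ldamodel.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=20)
top= ldamodel.print_topics(num_topics=2, num_words=5)
print(top)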



Let’s parse this data into something we can handle. We will also combine both topics into one array to get a nice plot and then plot the data.



topic_str=[]
weights=[]
for a in top:
  topic_str.append(" ".join(re.findall(r'"([^"]*)"',a[1])))
  w0,w1,w2,w3,w4=map(float, re.findall(r'[+-]?[0-9.]+', a[1]))
  weights.append([w0,w1,w2,w3,w4])

words0=topic_str[0].split(" ")
words1=topic_str[1].split(" ")

# pair each topic's words with their weights (reconstructed step)
worddict0=dict(zip(words0, weights[0]))
worddict1=dict(zip(words1, weights[1]))

sorted_list0 = [(k,v) for v,k in sorted([(v,k) for k,v in worddict0.items()])]
sorted_list1 = [(k,v) for v,k in sorted([(v,k) for k,v in worddict1.items()])]
y_pos = np.arange(5)

freqs=[a[1] for a in sorted_list0]
ws=[a[0] for a in sorted_list0]
freqs1=[a[1] for a in sorted_list1]
ws1=[a[0] for a in sorted_list1]

pp.bar(y_pos, freqs, align='center', alpha=0.5, color=['coral'])
pp.xticks(y_pos, ws)
pp.ylabel('word contributions')
pp.title('Predicted Topic 0 from IMDB Plot Keywords')
pp.show()

pp.bar(y_pos, freqs1, align='center', alpha=0.5, color=['coral'])
pp.xticks(y_pos, ws1)
pp.ylabel('word contributions')
pp.title('Predicted Topic 1 from IMDB Plot Keywords')</pre>



This process then can be repeated for any genre of film in the imdb data set.


Solar Radiation Prediction


scikit-learn is a fantastic set of tools for machine learning in python. It is built on numpy, scipy, and matplotlib, introduced in the first py-guy post, and makes data analysis and visualization simple and intuitive. scikit-learn provides classification, regression, clustering, dimensionality reduction, model selection, and preprocessing algorithms, making data analysis in python accessible to everyone. We will cover an example of linear regression in this week’s post, exploring Solar Radiation data from a NASA hackathon.

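First after importing packages let’s read in the SolarPrediction.csv data set. The link to the data set is commented in the code block.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as pp

df= pd.read_csv("SolarPrediction.csv")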

Taking a first look at the data set, specifically, UNIXTime and Date, note it is not formatted to a particular type so we will look at this later.




Calling the describe method on the data frame returns some descriptive statistics on the data set and tells us there might be a relationship between radiation, humidity and or temperature.


So let’s look at a correlation plot to get a better feel for any possible relationships.

truthmat= df.corr()
sns.heatmap(truthmat, vmax=.8, square=True)


There is a strong relationship between radiation and temperature (unsurprisingly or surprisingly), so let’s choose two features with some ambiguity. Pressure and Temperature will do fine. We will use seaborn, a statistical visualization library based on matplotlib, to explore the relationship between the two features.

p = sns.jointplot(x="Pressure", y="Temperature", data=df)
p.fig.suptitle('Temperature vs. Pressure')



There is a clear positive trend, albeit noisy because of the low pressure gradient. Let’s do some quick feature engineering to get a better look at the trend.


#Convert time to_datetime
df['Time_conv'] = pd.to_datetime(df['Time'], format='%H:%M:%S')

#Add column 'hour'
df['hour'] = pd.to_datetime(df['Time_conv'], format='%H:%M:%S').dt.hour

#Add column 'month'
df['month'] = pd.to_datetime(df['UNIXTime'].astype(int), unit='s').dt.month

#Add column 'year'
df['year'] = pd.to_datetime(df['UNIXTime'].astype(int), unit='s').dt.year

#Duration of Day
df['total_time'] = pd.to_datetime(df['TimeSunSet'], format='%H:%M:%S').dt.hour - pd.to_datetime(df['TimeSunRise'], format='%H:%M:%S').dt.hour

First we convert the time columns to datetime to manipulate later, then add hour, month and year columns for a more granular scope. Much better!
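The same feature engineering can be followed on a single row with the standard library. This is a sketch with invented sample values; the timestamp is interpreted as UTC, which matches how pandas treats unit='s'.

```python
from datetime import datetime, timezone

# Hypothetical row: a Unix timestamp plus sunrise and sunset strings.
unix_time = 1475229326
sunrise, sunset = '06:13:00', '18:13:00'

# Month and year come straight from the timestamp.
stamp = datetime.fromtimestamp(unix_time, tz=timezone.utc)
month, year = stamp.month, stamp.year

# Duration of day: difference between the hour parts only, as in the post.
rise = datetime.strptime(sunrise, '%H:%M:%S')
fall = datetime.strptime(sunset, '%H:%M:%S')
total_time = fall.hour - rise.hour

print(month, year, total_time)
```

Note that subtracting the hour parts ignores minutes, so total_time is a coarse, whole-hour measure of day length.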


With sklearn linear regression we can train a model on the data and then test its accuracy. We drop the Temperature column from the features (the independent variables) because it is the target we want to predict.


y = df['Temperature']
X = df.drop(['Temperature', 'Data', 'Time', 'TimeSunRise', 'TimeSunSet','Time_conv',], axis=1)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)
from sklearn.linear_model import LinearRegression
lm = LinearRegression()
lm.fit(X_train,y_train)

Now let’s predict the temperature given the features.


predictions = lm.predict(X_test)
pp.scatter(y_test, predictions)
pp.xlabel('Temperature Test')
pp.ylabel('Predicted Temperature')


The MSE and RMSE values summarize how far the predictions fall from the test values (lower is better), and as you can see in the plot there is a positive upward trend centered around the mean.

from sklearn import metrics
print(metrics.mean_squared_error(y_test, predictions))
print(np.sqrt(metrics.mean_squared_error(y_test, predictions)))
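To show what those two numbers measure, here is the same calculation done by hand on a few invented temperature values.

```python
import math

# Hypothetical observed and predicted temperatures.
y_true = [50.0, 52.0, 48.0, 55.0]
y_pred = [51.0, 50.0, 48.0, 58.0]

squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
mse = sum(squared_errors) / len(squared_errors)   # mean squared error
rmse = math.sqrt(mse)                             # same units as the target

print(mse)             # 3.5
print(round(rmse, 3))  # 1.871
```

Because RMSE is in the same units as the target, it reads as a typical prediction error in degrees, which makes it easier to judge than the raw MSE.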

Screen Shot 2017-07-21 at 8.16.00 PM


Note: I referenced kaggler Sarah VCH’s notebook in making today’s blog post, specifically the feature engineering code in the fifth code block. If you want to see her notebook I’ve listed the link below.