# How I built basic reports š of the US Stock Market Big Data using Python?

# Introduction

Hello, my friends š

My name is Batyr and I am a graduate student, Computer Science major, at **Pace University**. And when I heard that our university offers a class (CS660) related to one of the hottest topic in IT world nowadays, Big Data Analytics, I immediately decided to register for this class and here I am writing my first post in Medium and sharing my thoughts about building some basic reports based on the big data from US Stock Market for 2016 year (taken from https://github.com/pydata/pandas-datareader). I will be using the most widely used language Python in Big Data Analytics and some of the very powerful open-source libraries, such as Pandas, Numpy, MatPlotLib and Seaborn. This is the first assignment, Project #1, that we have had from our professor. And if you have any improvements or thoughts about my work, please, feel free to share them in comments section, since we are all here for gaining some valuable knowledge š

# Contents

- Prerequisites
- #1. List all publicly traded corporations
- #2. Find relative returns for them
- #3. Best and worst performed corporations. Plot their total relative returns + using box and whiskers graphs
- #4. Total revenue if $10k is invested
- #5. Total return for Apple (40%), Amazon (30%) and Google (30%)

## Prerequisites

There is a short list of tools and applications that should be installed on your machine for successful proceeding all the steps below:

- Install
*Python (Anaconda)*on your machine and make sure that your versions are compatible. You can use this link to install Conda and this one to install Python using*conda*in the terminal. - Anaconda Navigator, from where you will be able to launch Jupyter Notebook local server.
- Thatās it. Now you can check versions of installed tools in the terminal window. I have following versions installed on my machine: python v. = 2.7.13, anaconda v. = 4.2.0, conda v. = 4.3.25

## #1. List all publicly traded corporations

So letās start! Our first task is to print a list of all publicly traded corporations found by calling the **get_nasdaq_symbols** module that is the part of the **pandas_datareader**. This list has to include a corporationās *Symbol*, *Security Name* and *Listing exchange* fields. If you would like to read more about Pandas DataReader then this link might be useful.

Before we proceed to printing a list of corporations, we have to import some libraries and check if they are set up correctly.

Now we can start talking about how to get a list with all the fields mentioned before. I am going to useĀ **.loc[]** function of Pandaās DataFrame that allow us to get labels of our table. Also we need to transform *List Exchange* symbols to more human-readable strings. You can get all symbols definitions here. And then we print first 5 rows (that is default value forĀ **.head()** function) of our table. And this is how my code looks like:

## #2. Find relative returns for them

Of course, if we start calculating relative returns for each corporation listed in the task #1, we will spend a lot of time. Accordingly, I have decided to limit number of corporations and picked following corporations: *Apple* (**APPL**), *Amazon* (**AMZN**), *Google* (**GOOGL**) and *IBM* (**IBM**), besides we are going to use three of them in the task #5. Also we are going to use *Yahoo* as data source.

Before moving on to the coding part, letās check what definition *Wikipedia* gives us ā**Relative return** is a measure of the return of an investment portfolio relative to a theoretical passive reference portfolio or benchmarkā. So basically, it shows a return achieved by an asset over a specific time period. We need to calculate relative returns for 2016 year based on the formula below:

As we have all of the *Close Prices* for 2016 year for each corporations we are working with, we can start calculating relative returns. Before we start implementing this, I would like to share our professorās note related to relative returns:

There are several ways one can go about when a trading strategy is to be developed. One approach is to use time-series which correspond not to actual values but changes in the monetary value of the asset. These time-series can and do assume negative values and also, their statistical properties are usually more stable than the ones of price time-series. The most frequently used forms used are relative returns defined as:

r-relative, r(t)=(p(t)āp(tā1)/p(tā1))

log-returns defined as r(t)=log(p(t)/p(t-1)), where p(t) is the price of the asset at time t.

For example, if p(t)=101 and p(tā1)=100 then r-relative (t)= (101ā100)/100 = 1%

So now we are ready to code this:

## #3. Best and worst performed corporations. Plot their total relative returns + using box and whiskers graphs

If we have all information about *Close Prices* for each corporations, we can calculate the best and worst performed of them, by finding difference between maximum and minimum prices of *Close Price* during 2016 year, and we just need to pick minimum and maximum number of them to know which one is best or worst performed. It seems like pretty easy task:

So as we can see on the picture, the best performed corporation is *Amazon* and the worst one is *Apple*. **.describe()** function of Pandaās DataFrame generates descriptive statistics that summarize the central tendency, dispersion and shape of a datasetās distribution, excluding NaN values (more info). And we just added extra label **min-max diff** to the data frame.

Here is a code for plotting their relative returns. For more information on how to use *MatPlotLib.PyPlot*, please go through this link or just google it š

And now we can plot whisker graphs for both of the corporations:

## #4. Total revenue if $10k is invested

The exact question for this task is āYou have invested $10,000 in 2016 by splitting the capital into the 2 securities from step #3. What would have been your total (relative) return?ā. So as we donāt have any limitations about which method of calculation to use, I have chosen to the way when you invest on **1.JAN.2016** and sell on **31.DEC.2016**. So here is a code:

And the output is:

1. Relative return for Apple = 12.11%

2. Relative return for Amazon = 4.98%

So as we can see, if use this method, we would earn more if we invest to *Apple* than to *Amazon*, but actually if we guess the dates of maximum and minimum values for *Close Price*, we could earn much more!

## #5. Total return for Apple (40%), Amazon (30%) and Google (30%)

And the last task of our assignment is to calculate total (relative) return for 2016, if we invested 40% on *Apple*, 30% on *Amazon* and 30% on *Google*, so thatās why I have started with these corporations in step #2. I am going to use the same method of calculating relative return as in the previous step:

Output:

1. Relative return for Apple = 11.54%

2. Relative return for Amazon = -0.02%

3. Relative return for Google = -20.75%

Our capital on the beginning of 2016 ā $10.000. Our income for the end of 2016 = 9838.30

If we change our method of calculation, because I used the simplest one, we can achieve more total revenue in the end.

So here I am going to stop because we successfully achieved what we were asked for in the beginning of this post. And again, please, if you have any improvements or thoughts about my work, feel free to share them in comments section, since we are all here for gaining some valuable knowledge š And here is a link to Github repository.

*Like this story? Please share and recommend to others. Together, we can achieve more šŖ*