Project: Building and Evaluating a Stock Index

market_cap_series.csv is a time series of market capitalizations for various companies.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load market capitalization data from CSV, parsing dates and setting 'Date' as the index
market_cap_series = pd.read_csv('market_cap_series.csv', parse_dates=['Date'], index_col='Date')
print(market_cap_series.head())

To visualize the change in market capitalization for each company, I took their earliest and latest values and plotted them on a bar graph.

# Select the market capitalization of the first and last trading days
first_market_cap = market_cap_series.iloc[0]
last_market_cap = market_cap_series.iloc[-1]

# Concatenate and plot the market capitalizations of the first and last trading days
pd.concat([first_market_cap, last_market_cap], axis=1).plot(kind='barh')
plt.show()

To build the index, I summed the market capitalizations of all companies for each trading day, then normalized the resulting series so that the first trading day equals 100.

# Aggregate the total market capitalization for each trading day
raw_index = market_cap_series.sum(axis=1)

# Normalize the aggregated market capitalization to the first trading day and scale to 100
index = raw_index.div(raw_index.iloc[0]).mul(100)
print(index)
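
As a sanity check on the normalization step, here is a minimal sketch on made-up numbers. The tickers (AAA, BBB, CCC), dates, and values are hypothetical and are not taken from market_cap_series.csv; the `toy_` names exist only for this illustration.

```python
import pandas as pd

# Hypothetical market caps for three made-up tickers over three trading days
dates = pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03'])
toy_caps = pd.DataFrame(
    {'AAA': [100.0, 110.0, 120.0],
     'BBB': [50.0, 50.0, 60.0],
     'CCC': [50.0, 40.0, 60.0]},
    index=dates,
)

# Sum across companies for each day: 200, 200, 240
toy_raw = toy_caps.sum(axis=1)

# Divide by the first day's total and scale to 100
toy_index = toy_raw.div(toy_raw.iloc[0]).mul(100)
print(toy_index)
```

The first value is 100 by construction, and every later value can be read as the aggregate market capitalization relative to the first day, in percent.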

To evaluate the index, I first calculated its total percentage change from the first to the last trading day.

# Calculate the overall return of the index from the first to the last trading day
index_return = ((index.iloc[-1] - index.iloc[0]) / index.iloc[0]) * 100
print(index_return)

Next, I estimated how much each company contributed to the index return. I loaded each company's market capitalization from company_info.csv, summed those values to get the total market capitalization, and divided each company's value by the total to get its weight. Multiplying each weight by the index's percentage change gives that company's share of the index return.

# Load company information data from CSV
company_info = pd.read_csv('company_info.csv', index_col='Stock Symbol')

# Extract the 'Market Capitalization' column from the company information data
market_cap = company_info['Market Capitalization']

# Calculate the total market capitalization of all companies
total_market_cap = market_cap.sum()

# Calculate the weight of each company's market capitalization relative to the total market capitalization
weights = market_cap.div(total_market_cap)

# Calculate and plot the contribution of each company to the overall index return, sorted by contribution
index_contribution = weights.mul(index_return).sort_values()
index_contribution.plot(kind='barh')
plt.show()
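
The weighting step can be checked on a small made-up example. The tickers, market capitalizations, and the 25% index return below are hypothetical. Because the weights sum to 1, the contributions should sum back to the index return.

```python
import pandas as pd

# Hypothetical market capitalizations for three made-up tickers
toy_market_cap = pd.Series({'AAA': 120.0, 'BBB': 60.0, 'CCC': 20.0})

# Hypothetical index return, in percent
toy_index_return = 25.0

# Each company's share of the total market capitalization (0.6, 0.3, 0.1)
toy_weights = toy_market_cap.div(toy_market_cap.sum())

# Split the index return in proportion to the weights, smallest first
toy_contribution = toy_weights.mul(toy_index_return).sort_values()
print(toy_contribution)
print(toy_contribution.sum())
```

Note that this splits the return in proportion to a single snapshot of market capitalization. It does not use each company's own return over the period, so it is best read as an approximation of the attribution.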

Another way to evaluate the index is by comparing it against a benchmark.

# Convert the normalized index series to a DataFrame for further analysis
data = index.to_frame('Index')

# Load Dow Jones Industrial Average (DJIA) data from CSV, parsing dates and setting 'DATE' as the index
# squeeze('columns') turns the single value column into a Series (assumes the file has exactly one value column)
djia = pd.read_csv('djia.csv', parse_dates=['DATE'], index_col='DATE').squeeze('columns')

# Normalize the DJIA series to the first trading day and scale to 100, then add as a new column to the data DataFrame
data['DJIA'] = djia.div(djia.iloc[0]).mul(100)

# Calculate and print the total return for both the custom index and DJIA
print((data.iloc[-1] / data.iloc[0] - 1) * 100)

# Plot the normalized values of both the custom index and DJIA
data.plot()
plt.show()
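
One thing to watch in a benchmark comparison is whether both series cover the same dates. If they do not, the first or last row can contain a missing value and the total return comes out as NaN. The following is a minimal sketch of one way to handle that, using made-up values and hypothetical names (`toy_index`, `toy_bench`); it is not necessarily needed for the files used here.

```python
import pandas as pd

# Hypothetical series: the benchmark starts one day later than the index
idx_a = pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-06'])
idx_b = pd.to_datetime(['2020-01-02', '2020-01-03', '2020-01-06'])
toy_index = pd.Series([100.0, 102.0, 101.0, 110.0], index=idx_a, name='Index')
toy_bench = pd.Series([2000.0, 2100.0, 2200.0], index=idx_b, name='Benchmark')

# Keep only the dates on which both series have a value
aligned = pd.concat([toy_index, toy_bench], axis=1).dropna()

# Rebase both series to 100 on the first common date
rebased = aligned.div(aligned.iloc[0]).mul(100)

# Total return over the common window, in percent
total_return = (rebased.iloc[-1] / rebased.iloc[0] - 1) * 100
print(total_return)
```

Rebasing after alignment means both lines start at 100 on the same day, so the plot and the total returns compare the same period.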

A final way to evaluate the index is to analyze the correlations between the stocks within it. stock_prices.csv is a time series of closing prices for each company.

# Load stock price data from CSV, parsing dates and setting 'Date' as the index
stock_prices = pd.read_csv('stock_prices.csv', parse_dates=['Date'], index_col='Date')

# Calculate the daily returns of the stocks
returns = stock_prices.pct_change()

# Calculate the pairwise correlations of daily returns between stocks
correlations = returns.corr()

# Plot a heatmap of the daily return correlations between stocks
sns.heatmap(correlations, annot=True)
plt.title('Daily Return Correlations')
plt.show()
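
To get a feel for how to read the correlation matrix, here is a small sketch on synthetic returns. The tickers and the random data are made up: AAA and BBB share a common driver, while CCC is independent, so AAA and BBB should show a much higher correlation with each other than either does with CCC.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns with a fixed seed so the result is reproducible
rng = np.random.default_rng(0)
common = rng.normal(0, 0.01, 250)  # shared driver for AAA and BBB

toy_returns = pd.DataFrame({
    'AAA': common + rng.normal(0, 0.005, 250),
    'BBB': common + rng.normal(0, 0.005, 250),
    'CCC': rng.normal(0, 0.01, 250),  # independent of the other two
})

# Pairwise Pearson correlations, same call as used on the real returns
toy_corr = toy_returns.corr()
print(toy_corr.round(2))

# Average of the off-diagonal entries as a one-number summary
off_diagonal = ~np.eye(len(toy_corr), dtype=bool)
avg_corr = toy_corr.values[off_diagonal].mean()
print(avg_corr)
```

The diagonal is always 1 because each stock is perfectly correlated with itself, and the matrix is symmetric, so only the entries on one side of the diagonal carry information.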