Post Image

S&P500 analysis using Python, MySQL and Tableau

| Felipe Lima

See Python code in: https://github.com/felipeOL10/Credit-Card-Customer-MySQL/blob/main/analysis-of-the-stock-market%20(1).ipynb

See MySQL code: https://github.com/felipeOL10/SP500-analysis/blob/main/SP500%20analysis.sql

See Tableau vizualization: https://public.tableau.com/app/profile/felipe.lima6878/viz/SP500analysis/Painel1

This dataset comprises the SP500 companies in 2020, obtained in the kaggle datasets (https://www.kaggle.com/datasets/paytonfisher/sp-500-companies-with-financial-information).

The objective of this project is to analyse both companies and sectors in the SP500. Moreover, I´m going to perform a multiple regression analysis in order to undertand the impacts of fundamentals and ratios in the price of the stocks.

#Get name of columns in the dataset list(data.columns)

['Name', 'Sector', 'Price', 'Price/Earnings', 'Dividend Yield', 'Earnings/Share', '52 Week Low', '52 Week High', 'Market Cap', 'EBITDA', 'Price/Sales', 'Price/Book', 'SEC Filings']

#Missing values

data[pd.isnull(data)].count()

Name 0 Sector 0 Price 0 Price/Earnings 0 Dividend Yield 0 Earnings/Share 0 52 Week Low 0 52 Week High 0 Market Cap 0 EBITDA 0 Price/Sales 0 Price/Book 0 SEC Filings 0 dtype: int64

#Industries in the dataset data.Sector.unique()

array(['Industrials', 'Health Care', 'Information Technology', 'Consumer Discretionary', 'Utilities', 'Financials', 'Materials', 'Real Estate', 'Consumer Staples', 'Energy', 'Telecommunication Services'], dtype=object)

#Number of companies per industrie data.Sector.value_counts()

Consumer Discretionary 84 Information Technology 70 Financials 68 Industrials 67 Health Care 61 Consumer Staples 34 Real Estate 33 Energy 32 Utilities 28 Materials 25 Telecommunication Services 3 Name: Sector, dtype: int64

Descriptive statistics

data.describe()

  • We can see that the mean P/E was 24.8;

  • The lowest market cap for a company in the sp500 was around 2 billion;

  • In average, the dividend yield of a sp500 company was around 1.9%.

  • In average, companies in the sp500 are profitable.

Max and min p/e `

column = data["Price/Earnings"] max_value= column.idxmax() print(max_value) print(column.max())

CRM 520.15

min_value= column.idxmin() min_value print(min_value) print(column.min())

APA -251.53

P/E, P/S, Dividend yield and EBITDA per sector

  • The most profitable sector is the telecommunication services, providing also the highest dividend yield;

  • According to the P/E ratio, the energy sector is the most expensive and the telecommunication services is the least expensive;

  • According to the P/S, the real estate sector is the most expensive and health care the least.

Pearson correlation

  • EBITDA is higly correlated to the market cap of the companies.

  • P/S, P/E and P/B are almost non correlated to EBITDA;

  • Price is positivelly correlated with all multiples, especially with the P/E;

  • EPS is positivelly correlated with the price.

  • EPS is positivelly correlated with EBITDA.

Multiple linear regression

  • The independent variables in the model explain 42,4% of the price.

  • All variables in the model are significant at 1%.

  • EPS has a positive effect on price. In fact, an increase of one unit in the EPS will increase the price in 13.8, ceteris paribus.

  • The price to sales has also a positive effect. Moreover, an increase of one unit in the P/S will increase the price by 7,3.

  • The dividend yield has a negative effect on price. In fact, an increase of one unit in the EPS will decrease the price in 16.2, ceteris paribus.