Visual Analysis of the Inc. 5000 Most Successful Companies of 2019
- Installations
- Pandas for data manipulation and analysis
- NumPy for statistical functions
- Matplotlib.pyplot for charts
- Seaborn for data visualization based on matplotlib
- %matplotlib inline for displaying plots in the notebook
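As a minimal sketch, the libraries listed above can be imported at the top of the notebook under their conventional aliases. The aliases (`pd`, `np`, `plt`, `sns`) are common community conventions and an assumption here, not something this README specifies. If any package is missing, `pip install pandas numpy matplotlib seaborn` should install them.

```python
# Libraries listed above, imported under their conventional aliases.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# In a Jupyter notebook, the magic below renders plots inline.
# It is IPython syntax, not plain Python, so it is left commented out here:
# %matplotlib inline
```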
- Project Motivation
This project started to take shape when I was looking for datasets for my first analysis project at Udacity DSND and received an email from Data.World with some datasets. One of them was the Inc. 5000 list of most successful companies. I visited the website and really liked the idea of analyzing that dataset, mainly to answer the three questions this project is based on:
- Which was the hottest industry in 2019 (according to the dataset)?
- Which city/state showed the most industry growth?
- Which type of company grew the most?
- File Descriptions
- ipynb notebook with the data loading, data wrangling and visual analysis
- csv file containing the list of the 5012 most successful companies in the USA
- How to Interact with the Project
To go through the notebook, start with the data loading setup. Then run the exploratory data analysis cells, followed by the transformations that clean the data. Finally, go to the visual section to see how the data is distributed.
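The four steps above (load, explore, clean, visualize) could be sketched roughly as follows. This is an illustrative outline, not the notebook's actual code: the file name `inc5000_2019.csv` and the column names `industry` and `growth` are hypothetical placeholders and need to be adjusted to match the real CSV.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical file and column names; adjust them to match the actual CSV.
CSV_PATH = "inc5000_2019.csv"
GROUP_COL = "industry"
VALUE_COL = "growth"


def load_data(path=CSV_PATH):
    """Step 1: load the CSV into a DataFrame."""
    return pd.read_csv(path)


def explore(df):
    """Step 2: summarize each column's dtype and number of missing values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
    })


def clean(df, value_col=VALUE_COL):
    """Step 3: coerce the value column to numeric, drop unusable rows."""
    out = df.copy()
    out[value_col] = pd.to_numeric(out[value_col], errors="coerce")
    return out.dropna(subset=[value_col]).drop_duplicates()


def plot_median_by_group(df, group_col=GROUP_COL, value_col=VALUE_COL, top=10):
    """Step 4: horizontal bar chart of the median value per group."""
    medians = (
        df.groupby(group_col)[value_col]
        .median()
        .sort_values(ascending=False)
        .head(top)
    )
    ax = medians.plot(kind="barh")
    ax.invert_yaxis()  # put the largest median at the top
    ax.set_xlabel(f"Median {value_col}")
    return ax
```

With the placeholders adjusted, calling `plot_median_by_group(clean(load_data()))` would run the whole sequence. The median is used here instead of the mean as one possible choice, since growth figures tend to be skewed by a few extreme values.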
I'm happy to receive feedback and suggestions for better readability, cleaner code, further analysis ideas, or ML models.
- Authors
I'm the only contributor to this repository so far, but I want to credit the sources that gave me the ideas for it: first, Inc. Magazine, which provided the basis for the project; second, the data.world team, for giving us access to this dataset; and third, the many Towards Data Science posts on Medium that brought insight to this project, in particular to the data wrangling section.
You can also read my Medium analysis of this project at this link: