DataVista is a data analysis tool for cleaning, analyzing, and visualizing large datasets. It offers two modes, general analysis and custom analysis, and covers both numerical and categorical data.
- General Analysis:
  - Displays the dataframe shape, column info, missing values, duplicate rows, and summary statistics.
  - Generates correlation matrices for numerical columns.
  - Shows the first and last five rows for a quick data preview.
- Custom Analysis:
  - Analyzes the relationship between two numerical columns.
  - Explores the relationship between a numerical and a categorical column.
- Visualization (Optional):
  - Generates histograms, bar charts, and pie charts, plus word clouds for string columns.
  - Displays a heatmap for correlation analysis.
- File I/O Support:
  - Users can save reports locally for future reference.
- Exception Handling:
  - Ensures a smooth user experience with robust error handling.
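
The general-analysis checks listed above map closely onto standard pandas calls. The following is a minimal sketch of how such a report could be assembled, not DataVista's actual implementation; the function name `general_report` and the sample data are hypothetical.

```python
import pandas as pd


def general_report(df: pd.DataFrame) -> dict:
    """Collect the general-analysis facts into a plain dict (hypothetical helper)."""
    numeric = df.select_dtypes(include="number")  # correlation only makes sense for numbers
    return {
        "shape": df.shape,                              # (rows, columns)
        "dtypes": df.dtypes.astype(str).to_dict(),      # column info
        "missing": df.isna().sum().to_dict(),           # missing values per column
        "duplicate_rows": int(df.duplicated().sum()),   # fully duplicated rows
        "summary": df.describe(include="all"),          # summary statistics
        "correlation": numeric.corr(),                  # Pearson correlation matrix
        "head": df.head(),                              # first five rows
        "tail": df.tail(),                              # last five rows
    }


# Made-up sample data, for illustration only.
df = pd.DataFrame({
    "age": [25, 32, 47, 32, None],
    "income": [40000, 52000, 88000, 52000, 61000],
    "city": ["Pune", "Delhi", "Delhi", "Delhi", "Mumbai"],
})

report = general_report(df)
print("Shape:", report["shape"])
print("Missing values:", report["missing"])
print("Duplicate rows:", report["duplicate_rows"])
print(report["correlation"])
```

Returning a dict rather than printing directly keeps the analysis separate from presentation, so the same result could be printed, formatted as a table, or written to a report file.
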
- Python (Core Programming)
- Pandas (Data Manipulation)
- NumPy (Numerical Computations)
- Matplotlib (Visualization)
- Seaborn (Advanced Plotting)
- Tabulate (Formatted Data Representation)
- WordCloud (Text Analysis)
- Clone the repository:

      git clone https://github.com/yourusername/DataVista.git
      cd DataVista

- Install dependencies:

      pip install pandas numpy matplotlib seaborn tabulate wordcloud
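
After installing, it can be useful to confirm that every dependency is importable. The short check below uses only the standard library. It is an optional sketch and not part of DataVista itself, and the `missing_packages` helper is a hypothetical name.

```python
from importlib.util import find_spec

# Import names of the packages installed by the pip command above.
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "tabulate", "wordcloud"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All dependencies are importable.")
```
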