Data Whisperer is an AI-driven tool that automates exploratory data analysis (EDA), generates actionable insights, and lets you query datasets in natural language. Built for the Deepnote x Streamlit Hackathon, it combines AI (Google Gemini) with interactive visualizations and professional reporting.
Traditional EDA tools require technical expertise and hours of manual work. Data Whisperer solves this by:
- Democratizing Data Analysis: Non-technical users can explore data with zero coding.
- Speed: Get insights and visualizations in seconds, not hours.
- Actionability: AI-generated recommendations and exportable reports.
- Focus: Query subsets of data for targeted analysis.
- Instantly generates statistical summaries, histograms, correlations, and outlier detection.
- Handles missing values, duplicates, and data types automatically.
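The automated EDA steps above can be illustrated with a minimal, dependency-free sketch. It assumes rows arrive as a list of dicts (one per CSV row) and uses the common 1.5 * IQR rule for outlier detection. The function names are hypothetical, and this is not Data Whisperer's actual implementation, which relies on Pandas and NumPy.

```python
# Hypothetical helpers illustrating automated EDA; not Data Whisperer's code.
import statistics
from typing import Any


def drop_duplicates(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Remove exact duplicate rows, keeping the first occurrence."""
    seen: set[tuple] = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # keys are unique, so values are never compared
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def missing_counts(rows: list[dict[str, Any]], columns: list[str]) -> dict[str, int]:
    """Count empty or None cells per column."""
    return {col: sum(1 for r in rows if r.get(col) in (None, "")) for col in columns}


def iqr_outliers(values: list[float]) -> list[float]:
    """Return values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]
```

For example, `iqr_outliers([1, 2, 3, 4, 5, 100])` flags only `100`. The IQR rule is used here because it does not assume normally distributed data.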
- Precomputed Insights: AI analyzes your dataset and highlights key patterns.
- Conversational Chat: Ask follow-up questions and get instant answers.
- Dynamic Recommendations: AI suggests next steps based on your data.
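One plausible way to hand precomputed statistics to the model is to serialize the EDA summary into the prompt. The sketch below only builds the prompt string. The function name, prompt wording, and summary layout are hypothetical, and the actual Google Gemini call is deliberately left out rather than guessed.

```python
# Hypothetical prompt builder; the real app's prompt and Gemini call may differ.
import json
from typing import Any, Optional


def build_insight_prompt(eda_summary: dict[str, Any], question: Optional[str] = None) -> str:
    """Assemble a plain-text prompt from precomputed EDA statistics."""
    lines = [
        "You are a data analyst. Using only the statistics below,",
        "list the key patterns, data-quality issues, and recommended next steps.",
        "",
        "EDA summary (JSON):",
        json.dumps(eda_summary, indent=2, sort_keys=True),
    ]
    if question:
        # Follow-up chat questions are appended after the statistics.
        lines += ["", f"User question: {question}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_insight_prompt({"rows": 5000, "columns": 23}, "Which factors relate to final_score?"))
```

Keeping the statistics in the prompt lets follow-up questions be answered from the same context without re-running the analysis.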
- Ask questions like "Show students with grades above 90" or "Find customers from California with purchases > $500".
- The AI converts your query into a SQL-like filter, extracts the matching subset, and runs EDA on it.
- Each subset gets its own AI insights and visualizations.
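The subset step can be pictured as applying a filter condition to the rows. The sketch assumes the model has already turned a question such as "Find students with stress_level > 7 and sleep_hours < 6" into a simple condition string. The mini-grammar (comparisons joined by AND) and the function names are hypothetical, and a production version would need stricter validation.

```python
# Hypothetical filter for SQL-like conditions such as "final_score > 90 AND department == 'CS'".
import operator
import re
from typing import Any, Callable

OPS: dict[str, Callable[[Any, Any], bool]] = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}
# column, operator, value; two-character operators come first so ">=" is not read as ">"
CONDITION = re.compile(r"\s*(\w+)\s*(>=|<=|==|!=|>|<)\s*(.+?)\s*")


def parse_value(raw: str) -> Any:
    """Interpret quoted text as a string, anything else as a number."""
    if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "'\"":
        return raw[1:-1]
    return float(raw)


def filter_rows(rows: list[dict[str, Any]], condition: str) -> list[dict[str, Any]]:
    """Keep rows that satisfy every clause of 'col op value AND col op value ...'."""
    clauses = []
    for part in re.split(r"\s+AND\s+", condition, flags=re.IGNORECASE):
        match = CONDITION.fullmatch(part)
        if match is None:
            raise ValueError(f"Unsupported clause: {part!r}")
        column, op, raw = match.groups()
        clauses.append((column, OPS[op], parse_value(raw)))
    return [
        row for row in rows
        if all(col in row and fn(row[col], value) for col, fn, value in clauses)
    ]
```

Mapping operators through a fixed table, rather than evaluating model output as code, keeps the generated query from executing anything unexpected.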
- Generate professional reports with:
  - AI-generated insights.
  - Visualizations (histograms, box plots, heatmaps).
  - Summary statistics and recommendations.
- Perfect for sharing with stakeholders.
- Dark theme with gradient accents.
- Interactive Plotly visualizations.
- Collapsible sections and expandable insights.
- Upload Data: a CSV or Excel file, plus an optional EDA JSON file with precomputed statistics.
- Explore Visualizations: Navigate tabs for numerical, categorical, and correlation analysis.
- Ask Questions: Use the AI chat or DataPeek to analyze subsets.
- Export Results: Generate PowerPoint reports with one click.
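Before the numerical and categorical tabs can be filled, each column has to be classified. The stdlib sketch below assumes CSV input (Excel would need a separate reader) and treats a column as numerical when every non-empty cell parses as a number. The function names are hypothetical, and the optional EDA JSON is simply parsed into a dict because its exact layout is not documented here.

```python
# Hypothetical upload handling; not the app's actual loader.
import csv
import io
import json
from typing import Any, Optional


def load_upload(csv_text: str, eda_json_text: Optional[str] = None) -> tuple[list[dict[str, str]], dict[str, Any]]:
    """Parse the uploaded CSV and, if present, the optional EDA JSON."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    precomputed = json.loads(eda_json_text) if eda_json_text else {}
    return rows, precomputed


def is_number(text: str) -> bool:
    try:
        float(text)
        return True
    except ValueError:
        return False


def split_columns(rows: list[dict[str, str]]) -> tuple[list[str], list[str]]:
    """Return (numerical, categorical) column names, ignoring empty cells."""
    if not rows:
        return [], []
    numerical, categorical = [], []
    for col in rows[0]:
        cells = [r[col] for r in rows if r[col] not in ("", None)]
        if cells and all(is_number(c) for c in cells):
            numerical.append(col)
        else:
            categorical.append(col)
    return numerical, categorical
```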
- Frontend: Streamlit (interactive UI/UX).
- AI Engine: Google Gemini (insights generation).
- Visualizations: Plotly Express.
- Reporting: python-pptx.
- Data Processing: Pandas, NumPy.
```bash
git clone https://github.com/satti-hari-krishna-reddy/Data-Whisperer
cd Data-Whisperer
pip install -r requirements.txt
```
Get your API key from Google AI Studio and export it as an environment variable:
```bash
export GEMINI_API_KEY="your_api_key_here"
```
Then launch the app:
```bash
streamlit run dataviz.py
```
Open your browser and go to http://localhost:8501.
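Inside the app, the exported key has to be read back from the environment. A minimal sketch that fails with a readable message when the variable is missing; `get_gemini_api_key` is a hypothetical helper and not necessarily how `dataviz.py` does it.

```python
# Hypothetical helper for reading the Gemini API key.
import os


def get_gemini_api_key() -> str:
    """Return GEMINI_API_KEY from the environment or raise a helpful error."""
    key = os.environ.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; export it before running `streamlit run dataviz.py`."
        )
    return key
```

Failing early with an explicit message is friendlier than letting the first AI request fail with an authentication error.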
Try the included Students_Grading_Dataset.csv or upload your own!
The EDA JSON file reveals a dataset of 5,000 students with 23 columns, including:
- Performance Metrics: Grades, study hours, attendance.
- Demographics: Age, gender, department.
- Behavioral Data: Stress levels, sleep hours, extracurricular activities.
- 59.8% of students scored "A" grades (potential grade inflation).
- Weak pairwise correlations between variables (linear relationships are weak, which may point to non-linear relationships).
- 10.3% of students lack internet access at home.
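Figures such as the share of "A" grades or of students without home internet access are category proportions. A sketch using `collections.Counter`; the function name is hypothetical, and any column names you pass in are assumptions about the dataset's schema.

```python
# Hypothetical helper: percentage share of each category in one column.
from collections import Counter
from typing import Any


def category_shares(rows: list[dict[str, Any]], column: str) -> dict[Any, float]:
    """Return each category's share of non-missing values, in percent (1 decimal)."""
    values = [r[column] for r in rows if r.get(column) not in (None, "")]
    total = len(values)
    if not total:
        return {}
    return {k: round(100 * v / total, 1) for k, v in Counter(values).most_common()}
```

On toy data with three "A" rows, one "B" row, and one blank cell, this returns `{"A": 75.0, "B": 25.0}`, because missing cells are excluded from the denominator.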
"Show students in CS department with final_score > 90"
"Find students with stress_level > 7 and sleep_hours < 6"
"Analyze students without internet access"
- Generates a SQL-like query.
- Runs EDA on the subset.
- Provides insights + visualizations.
Generate an AI-powered PowerPoint report with insights and charts.
This project is licensed under the MIT License. See LICENSE for details.