Skills
Languages:
Python (Pandas, Seaborn, Matplotlib, Plotly, Panel Panes, OpenLM, BeautifulSoup)
Data Analysis:
MS Excel (Pivot Tables, VLOOKUP, XLOOKUP, Functions, Conditional Formatting, Dashboards)
Databases:
InfluxDB, SQL (MySQL)
Visualization & Reporting:
Microsoft Power BI, Tableau, Amazon QuickSight, Google Looker
Testing & Data Validation:
Data Normalization, Data Cleaning
Project Management:
SCRUM, Agile, Atlassian Confluence, JIRA
Cloud:
Amazon (S3, Athena, Glue, QuickSight), Microsoft Azure (Data Studio, Data Factory, Synapse)
Pipeline Tools:
CI/CD, Jenkins
Version Control:
Git, GitHub, GitLab, SVN
Other Skills & Tools:
Parquet, JSON, CSV, Scala
Projects
Wireless Equipment Monitoring System - Engineering Capstone
- Designed and implemented a real-time wireless monitoring system to track industrial manufacturing parameters, including magnetic field, humidity, temperature, and pressure.
- Developed a node network for sensor communication via Bluetooth, transmitting live data to a centralized database.
- Built a responsive web application using HTML, CSS/Bootstrap, PHP, and JavaScript for data visualization and user interaction.
- Integrated live data analytics using Google Data Studio (now Looker Studio) and embedded dashboards on a custom-built website.
- Designed and managed MySQL databases to store, retrieve, and integrate sensor data for real-time monitoring.
- Engineered back-end solutions using Python and the Django framework to enable seamless data flow between sensors, databases, and visualization tools (illustrative sketch below).
- Documented technical work, ensuring clear and detailed records of system design and implementation.
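A minimal Django sketch of the back-end described above, assuming a configured Django project wired to the MySQL database; the model name, field list, and view are illustrative rather than the capstone's actual code.
    # Illustrative only: assumes a configured Django app and MySQL database settings.
    from django.db import models
    from django.http import JsonResponse

    class SensorReading(models.Model):
        node_id = models.CharField(max_length=32)               # Bluetooth node identifier
        temperature = models.FloatField()                       # degrees Celsius
        humidity = models.FloatField()                          # relative humidity (%)
        pressure = models.FloatField()                          # kPa
        magnetic_field = models.FloatField()                    # microtesla
        recorded_at = models.DateTimeField(auto_now_add=True)   # ingestion timestamp

    def latest_readings(request):
        """Return the 50 most recent readings as JSON for the dashboard to poll."""
        rows = SensorReading.objects.order_by("-recorded_at")[:50]
        payload = [
            {
                "node": r.node_id,
                "temperature": r.temperature,
                "humidity": r.humidity,
                "pressure": r.pressure,
                "magnetic_field": r.magnetic_field,
                "recorded_at": r.recorded_at.isoformat(),
            }
            for r in rows
        ]
        return JsonResponse({"readings": payload})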
Airbnb Market Analysis in a Tableau Dashboard (Tableau Public) - Mississauga, ON
- Combined 3 CSV files (Listings, Reviews, Calendar) into a single dataset, optimizing data structure for analysis.
- Conducted joins and transformations in Tableau to align listing IDs and ensure data integrity across tables (equivalent join and filter logic is sketched in pandas below).
- Filtered and cleaned 23+ million records, reducing dataset size to comply with Tableau Public’s 15M row limit.
- Created zip code-based price analysis, identifying Seattle’s highest-grossing areas for Airbnb rentals.
- Built a time-series revenue visualization, highlighting seasonal demand fluctuations and optimal listing periods.
- Developed a geo-spatial heatmap to showcase price variations across neighborhoods, aiding investment decisions.
- Analyzed Airbnb pricing trends by bedroom count, revealing higher revenue potential for 5+ bedroom properties.
- Generated a supply analysis, calculating the total number of listings per bedroom type to assess market competition.
- Implemented interactive filters, enabling dynamic comparisons of pricing, location, and seasonal trends.
- Designed a fully interactive Tableau dashboard, integrating 5 visualizations for comprehensive Airbnb insights.
- Standardized color schemes and tooltips to improve data storytelling and user experience.
- Published the project on Tableau Public, making it accessible for stakeholders and portfolio presentation.
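The joins and filters for this project were performed inside Tableau; the pandas sketch below only illustrates equivalent preprocessing logic, and the file names, column names, price format, and the booked-nights filter are assumptions.
    import pandas as pd

    listings = pd.read_csv("listings.csv")    # one row per listing (id, zipcode, bedrooms, price, ...)
    calendar = pd.read_csv("calendar.csv")    # one row per listing-night (listing_id, date, price, available)

    # Assumes calendar prices arrive as "$1,234.00"-style strings.
    calendar["night_price"] = calendar["price"].str.replace(r"[$,]", "", regex=True).astype(float)

    # Mirror the Tableau relationship: each calendar row joined to its listing.
    merged = calendar.merge(listings, left_on="listing_id", right_on="id", how="left")

    # Keep only booked nights ("available" == "f") to bring ~23M rows under the 15M limit.
    booked = merged[merged["available"] == "f"]

    # Revenue by ZIP code, the basis of the highest-grossing-area view.
    revenue_by_zip = booked.groupby("zipcode")["night_price"].sum().sort_values(ascending=False)
    print(revenue_by_zip.head(10))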
Bike Sales Excel Dashboard - Mississauga, ON
- Designed an interactive Excel dashboard to analyze bike sales trends using cleaned demographic data.
- Performed data cleaning, including duplicate removal, formatting categorical variables (e.g., marital status, gender), and creating calculated fields like age brackets for improved analysis.
- Built pivot tables to explore relationships between variables, such as income, commuting distance, and bike purchase decisions.
- Developed interactive visualizations, including bar, line, and pie charts, to present key metrics such as customer demographics, purchasing behavior, and income distribution.
- Integrated slicers for dynamic filtering by marital status, region, and education level, enabling detailed exploration of customer trends.
- Delivered a visually appealing and user-friendly dashboard with a consistent layout and clear visual hierarchy to support data-driven insights.
- Demonstrated proficiency in Excel for data cleaning, analysis, and dashboard creation to support decision-making.
Cleaning & Standardizing Customer Data with Pandas - Mississauga, ON
- Utilized Pandas to clean and standardize a dataset with 1,020 customer records, ensuring consistency and usability (condensed code sketch below).
- Removed duplicate rows using drop_duplicates() to eliminate redundant data entries.
- Dropped irrelevant columns, such as the "Not Useful" column, to focus on actionable insights.
- Standardized inconsistent Last Name entries by removing unwanted characters (slashes, dots, underscores) using .str.replace().
- Formatted and cleaned Phone Number column by removing non-numeric characters with regex and applying a consistent 123-456-7890 format.
- Split Address column into Street Address, State, and Zip Code for improved data clarity and usability.
- Standardized categorical columns like Paying Customer and Do Not Contact to uniform values ("Yes"/"No").
- Removed rows with Do Not Contact = "Yes" or blank Phone Number values to ensure data relevance.
- Replaced all missing values with blank strings using fillna() for consistent handling of null entries.
- Reset the DataFrame’s index using reset_index() to maintain clean row references after transformations.
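A condensed sketch of the cleaning steps above; the file name and column labels ("Not Useful", "Phone_Number", "Address", etc.) are assumptions based on the bullet descriptions.
    import pandas as pd

    df = pd.read_csv("customer_call_list.csv")     # assumed export of the 1,020-record list

    df = df.drop_duplicates()                      # remove redundant rows
    df = df.drop(columns=["Not Useful"])           # drop the irrelevant column

    # Strip stray characters from last names, e.g. "/Smith_" -> "Smith".
    df["Last_Name"] = df["Last_Name"].str.strip("._/")

    # Keep digits only, then reformat to 123-456-7890.
    digits = df["Phone_Number"].astype(str).str.replace(r"\D", "", regex=True)
    df["Phone_Number"] = digits.apply(lambda x: f"{x[:3]}-{x[3:6]}-{x[6:10]}" if len(x) == 10 else "")

    # Split the single address field into its components.
    df[["Street_Address", "State", "Zip_Code"]] = df["Address"].str.split(",", n=2, expand=True)

    # Standardize Yes/No style columns.
    df["Paying_Customer"] = df["Paying_Customer"].replace({"Y": "Yes", "N": "No"})
    df["Do_Not_Contact"] = df["Do_Not_Contact"].replace({"Y": "Yes", "N": "No"})

    df = df.fillna("")                             # uniform handling of nulls

    # Drop rows that cannot be contacted, then rebuild a clean index.
    df = df[(df["Do_Not_Contact"] != "Yes") & (df["Phone_Number"] != "")]
    df = df.reset_index(drop=True)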
Exploratory Data Analysis (EDA) with MySQL - Mississauga, ON
- Conducted EDA on global layoff data (2020-2023) to identify trends across industries, companies, and countries.
- Used SQL to analyze data by grouping dimensions (year, company, industry) and calculating rolling totals with window functions.
- Performed ranking using DENSE_RANK() and CTEs to highlight the companies with the highest layoffs each year (representative query below).
- Extracted and transformed date components for time-series analysis to track layoff progression monthly and yearly.
- Uncovered insights such as COVID-19's impact on industries like retail and transportation, and large-scale layoffs by Amazon, Google, and Meta.
- Supported downstream visualization by implementing advanced queries that summarize and rank the data.
- Delivered actionable insights for workforce planning and industry analysis.
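A representative ranking query from this analysis, wrapped in Python with SQLAlchemy and pandas; the connection string and the table/column names (layoffs_staging2, total_laid_off) are assumptions.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("mysql+pymysql://user:password@localhost/world_layoffs")   # hypothetical credentials

    top_layoffs_per_year = """
        WITH company_year AS (
            SELECT company, YEAR(`date`) AS yr, SUM(total_laid_off) AS total_laid_off
            FROM layoffs_staging2
            GROUP BY company, YEAR(`date`)
        ),
        ranked AS (
            SELECT company, yr, total_laid_off,
                   DENSE_RANK() OVER (PARTITION BY yr ORDER BY total_laid_off DESC) AS ranking
            FROM company_year
        )
        SELECT * FROM ranked
        WHERE ranking <= 5            -- five hardest-hit companies each year
        ORDER BY yr, ranking
    """

    print(pd.read_sql(top_layoffs_per_year, engine))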
Highest Earning Companies in the USA - Web Scraping - Mississauga, ON
- Developed a web scraping project to extract and structure data from a Wikipedia page using BeautifulSoup, Requests, and Pandas libraries (condensed sketch below).
- Targeted the "List of largest companies in the United States by revenue" table, containing data such as company name, rank, industry, revenue, and more.
- Utilized BeautifulSoup to parse the HTML structure of the web page and identify the desired table through class-based filtering and indexing, resolving issues caused by multiple tables.
- Extracted table headers (rank, name, industry, etc.) using the th tags and cleaned the data with Python's .strip() method for consistent formatting.
- Parsed rows of data (tr tags) and their respective columns (td tags) to create structured lists, ensuring accurate alignment of data elements.
- Constructed a Pandas DataFrame to organize the extracted data efficiently, facilitating further analysis or manipulation.
- Exported the structured data to a CSV file using pandas.to_csv() for easy integration into external tools or dashboards.
- Implemented error handling and debugged issues related to inconsistent tags and empty rows, ensuring a smooth and reliable data extraction process.
- Demonstrated how to automate repetitive data collection tasks, showcasing proficiency in web scraping and data engineering principles.
- Encouraged scalability by leaving the solution adaptable for other web tables or similar datasets.
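A condensed sketch of the scraping flow described above; the Wikipedia URL is the real page, but the table-selection index, row guard, and output file name are illustrative.
    import requests
    import pandas as pd
    from bs4 import BeautifulSoup

    url = "https://en.wikipedia.org/wiki/List_of_largest_companies_in_the_United_States_by_revenue"
    soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")

    # The page holds several tables; the revenue table is picked via its "wikitable" class.
    table = soup.find_all("table", class_="wikitable")[0]

    # Header row: <th> cells give the column names.
    headers = [th.text.strip() for th in table.find("tr").find_all("th")]
    df = pd.DataFrame(columns=headers)

    # Body rows: skip the header row, then read each <td> cell.
    for row in table.find_all("tr")[1:]:
        cells = [td.text.strip() for td in row.find_all("td")]
        if len(cells) == len(headers):      # guard against empty or malformed rows
            df.loc[len(df)] = cells

    df.to_csv("largest_us_companies.csv", index=False)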
Survey Analysis and Dashboard in Power BI - Mississauga, ON
- Designed an interactive Power BI dashboard using real survey data from 630 data professionals to analyze industry trends and demographics.
- Cleaned and transformed raw survey data in Power Query, including handling text inconsistencies, splitting columns, standardizing fields like job titles and programming languages, and calculating average salaries from provided ranges.
- Developed multiple visualizations to highlight key insights, including:
  - Clustered bar chart: Displaying average salary by job title, revealing data scientists as the highest earners with $93,000 on average.
  - Tree map: Visualizing country-wise survey participation, with breakdowns for regions such as the United States and India.
  - Gauge charts: Displaying average satisfaction scores for work-life balance (5.74) and salary satisfaction (4.23) on a scale of 0–10.
  - Donut chart: Comparing average salaries by gender, showing similar earnings for males and females.
  - Column chart: Highlighting favorite programming languages, with Python leading by a significant margin.
  - Stacked bar chart: Breaking down job titles and average salaries by programming language preferences.
- Implemented interactive filters for exploring data by demographics such as country, gender, and programming language, allowing deeper insights.
- Customized the dashboard layout with themes, color schemes, and formatting to improve usability and aesthetics.
- Demonstrated proficiency in Power BI for data transformation, visualization, and storytelling, delivering actionable insights for decision-making.
Web Scraping on Amazon (Tim Hortons Coffee) - Mississauga, ON
- Built a Python-based web scraper to extract Amazon product details, including titles, prices, and timestamps.
- Used Beautiful Soup and Requests libraries to fetch and parse HTML content from static product pages.
- Identified specific HTML elements (id="productTitle" and id="priceblock_ourprice") to extract key product data (simplified tracker loop below).
- Cleaned and formatted scraped data by removing whitespace and special characters for usability.
- Created a CSV file to store product details, including headers for Title, Price, and Date.
- Automated data collection with a while loop and time.sleep() to append new data entries at regular intervals.
- Included an optional email alert feature using smtplib to notify users of price drops below a set threshold.
- Validated and structured the data for downstream analysis, enabling time-series tracking of price changes.
- Designed the script to run continuously in the background for long-term data collection.
- Highlighted the project as an introduction to web scraping, with potential for scaling to scrape multiple pages or complex datasets.
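A simplified version of the tracker loop; the product URL, alert threshold, CSV name, and polling interval are placeholders, and the element IDs reflect Amazon's older markup, which may change.
    import csv
    import time
    from datetime import date

    import requests
    from bs4 import BeautifulSoup

    URL = "https://www.amazon.ca/dp/EXAMPLE"        # placeholder product URL
    HEADERS = {"User-Agent": "Mozilla/5.0"}         # a browser-like agent is usually required

    def check_price():
        page = requests.get(URL, headers=HEADERS, timeout=30)
        soup = BeautifulSoup(page.content, "html.parser")
        title = soup.find(id="productTitle").get_text(strip=True)
        price_text = soup.find(id="priceblock_ourprice").get_text(strip=True)
        price = float(price_text.replace("$", "").replace(",", ""))
        with open("amazon_prices.csv", "a", newline="", encoding="utf-8") as f:
            csv.writer(f).writerow([title, price, date.today()])
        return price

    # Append a fresh reading once a day; the threshold email (smtplib) is omitted here.
    while True:
        if check_price() < 15.00:                   # arbitrary alert threshold
            print("Price below threshold - send the smtplib email alert here.")
        time.sleep(86400)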
World Layoffs Data Cleaning - Mississauga, ON
- Imported and structured a raw dataset of 2,361 records, addressing issues such as duplicates, inconsistent data formats, and null values.
- Identified and removed duplicates using advanced SQL techniques, ensuring data integrity by partitioning and filtering based on unique attributes (query sketch below).
- Standardized inconsistent data, including correcting misspellings, trimming white spaces, and normalizing industry categories (e.g., merging "Crypto" and "Cryptocurrency").
- Addressed null and blank values through self-joins, updating incomplete records with relevant data from existing rows.
- Converted textual date formats to proper date data types, enabling seamless time-series analysis.
- Established a raw and staging table workflow to preserve original data integrity while allowing iterative transformations on staging tables.
- Removed irrelevant rows and redundant columns to optimize storage and performance for downstream analysis.
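A sketch of the duplicate-detection step, with the deletion itself performed on a second staging table as described above; the connection string and the layoffs_staging table/column names are assumptions.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("mysql+pymysql://user:password@localhost/world_layoffs")   # hypothetical credentials

    duplicates = pd.read_sql(
        """
        SELECT *
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY company, location, industry,
                                    total_laid_off, percentage_laid_off, `date`
                   ) AS row_num
            FROM layoffs_staging
        ) AS numbered
        WHERE row_num > 1       -- every copy beyond the first occurrence
        """,
        engine,
    )
    print(f"{len(duplicates)} duplicate rows flagged for removal")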
World Population Almanac EDA - Mississauga, ON
- Conducted an Exploratory Data Analysis on a world population dataset with over 230 rows and multiple columns, identifying patterns, relationships, and outliers (abbreviated code below).
- Reviewed dataset structure using info() and describe() to gain high-level insights, such as column types, null values, and basic statistical summaries (mean, standard deviation, percentiles).
- Identified missing values using isnull().sum() and quantified the extent of data gaps for better cleaning decisions.
- Determined unique values in categorical columns like Continent and Country using nunique() to validate dataset consistency.
- Sorted data based on key columns (e.g., population) using sort_values() to rank countries by specific metrics like highest population or growth.
- Computed correlations between numeric columns using corr() and visualized the results via heatmaps with Seaborn's heatmap() to uncover relationships between variables.
- Grouped data by continents using groupby() and calculated average population, growth rates, and densities for comparative analysis.
- Transposed datasets to reorganize columns and rows for better visualization of trends across decades using .transpose().
- Created box plots with boxplot() to detect outliers and visualize the distribution of population values and other metrics.
- Filtered columns by data type (e.g., numeric, object) using select_dtypes() to streamline targeted analyses on specific column types.
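An abbreviated version of the EDA steps listed above; the CSV name and column labels (e.g. "2022 Population", "Continent") are assumptions.
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    df = pd.read_csv("world_population.csv")

    df.info()                                        # column types and non-null counts
    print(df.describe())                             # mean, std, percentiles
    print(df.isnull().sum())                         # extent of missing data
    print(df["Continent"].nunique(), "continents")   # consistency check on categoricals

    # Rank countries by the latest population figures.
    print(df.sort_values(by="2022 Population", ascending=False).head(10))

    # Correlations between numeric columns, shown as a heatmap.
    numeric = df.select_dtypes(include="number")
    sns.heatmap(numeric.corr(), annot=True)
    plt.show()

    # Average metrics per continent, transposed so decades become rows for trend reading.
    print(df.groupby("Continent").mean(numeric_only=True).transpose())

    # Box plots to surface outliers in the population columns.
    df.boxplot(figsize=(16, 8))
    plt.show()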
Popular Baby Names (AWS Glue & Glue DataBrew) - Mississauga, ON
- Created a sample project and dataset in Glue DataBrew to demonstrate data filtering, grouping, and sorting.
- Developed reusable recipes in DataBrew to apply consistent transformations across similar datasets.
- Automated ETL workflows by scheduling jobs to process full datasets and save outputs in Amazon S3 (minimal boto3 sketch below).
- Discussed integration of IAM roles to manage Glue permissions for accessing S3 buckets and performing ETL operations.
- Configured AWS Glue Crawlers to automate data cataloging by extracting schema information from S3 datasets.
- Created and executed visual ETL jobs in AWS Glue Studio to transform and merge datasets using operations like union, aggregation, and schema modification.
- Explored options for output file formats, including CSV and Parquet, and optimized compression for large-scale ETL operations.
- Troubleshot errors and adjusted permissions on Glue IAM roles to keep ETL jobs executing smoothly.
- Emphasized real-world use cases of Glue and DataBrew for production environments, enabling scalable, automated data ingestion and transformation pipelines.
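A minimal boto3 sketch of triggering the cataloging and ETL steps described above; the crawler and job names, region, and S3 output path are assumptions.
    import boto3
    import pandas as pd

    glue = boto3.client("glue", region_name="us-east-1")

    # Re-crawl the raw baby-names files so the Data Catalog picks up schema changes.
    glue.start_crawler(Name="baby-names-crawler")

    # Run the visual ETL job authored in Glue Studio (union, aggregation, schema edits).
    run = glue.start_job_run(JobName="baby-names-etl")
    print("Started job run:", run["JobRunId"])

    # After the job finishes, spot-check the Parquet output written to S3 (needs s3fs + pyarrow).
    print(pd.read_parquet("s3://example-bucket/baby-names/output/").head())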