Skills
Languages:
Python (Pandas, Seaborn, Matplotlib, Plotly, Panel Panes, OpenLM, BeautifulSoup)
Data Analysis:
MS Excel (Pivot Tables, VLOOKUP, XLOOKUP, Functions, Conditional Formatting, Dashboards)
Databases:
InfluxDB, SQL (MySQL)
Visualization & Reporting:
Microsoft Power BI, Tableau, Amazon QuickSight, Google Looker
Testing & Data Validation:
Data Normalization, Data Cleaning
Project Management:
SCRUM, Agile, Atlassian Confluence, JIRA
Cloud:
Amazon (S3, Athena, Glue, QuickSight), Microsoft Azure (Data Studio, Data Factory, Synapse)
Pipeline Tools:
CI/CD, Jenkins
Version Control:
Git, GitHub, GitLab, SVN
Other Skills & Tools:
Parquet, JSON, CSV, Scala
Projects
Wireless Equipment Monitoring System - Engineering Capstone
- Designed and implemented a real-time wireless monitoring system to track industrial manufacturing parameters, including magnetic field, humidity, temperature, and pressure.
- Developed a node network for sensor communication via Bluetooth, transmitting live data to a centralized database.
- Built a responsive web application using HTML, CSS/Bootstrap, PHP, and JavaScript for data visualization and user interaction.
- Integrated live data analytics using Google Data Studio (now Looker Studio) and embedded dashboards on a custom-built website.
- Designed and managed MySQL databases to store, retrieve, and integrate sensor data for real-time monitoring.
- Engineered back-end solutions using Python and the Django framework to enable seamless data flow between sensors, databases, and visualization tools (illustrative sketch below).
- Documented technical work, ensuring clear and detailed records of system design and implementation.
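A minimal Django sketch of the back-end described above, assuming a configured Django project wired to the MySQL database; the model name, field list, and view are illustrative rather than the capstone's actual code.
    # Illustrative only: assumes a configured Django app and MySQL database settings.
    from django.db import models
    from django.http import JsonResponse

    class SensorReading(models.Model):
        node_id = models.CharField(max_length=32)               # Bluetooth node identifier
        temperature = models.FloatField()                       # degrees Celsius
        humidity = models.FloatField()                          # relative humidity (%)
        pressure = models.FloatField()                          # kPa
        magnetic_field = models.FloatField()                    # microtesla
        recorded_at = models.DateTimeField(auto_now_add=True)   # ingestion timestamp

    def latest_readings(request):
        """Return the 50 most recent readings as JSON for the dashboard to poll."""
        rows = SensorReading.objects.order_by("-recorded_at")[:50]
        payload = [
            {
                "node": r.node_id,
                "temperature": r.temperature,
                "humidity": r.humidity,
                "pressure": r.pressure,
                "magnetic_field": r.magnetic_field,
                "recorded_at": r.recorded_at.isoformat(),
            }
            for r in rows
        ]
        return JsonResponse({"readings": payload})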
Airbnb Market Analysis in a Tableau Dashboard (Tableau Public) - Mississauga, ON
- Combined 3 CSV files (Listings, Reviews, Calendar) into a single dataset, optimizing data structure for analysis.
- Conducted joins and transformations in Tableau to align listing IDs and ensure data integrity across tables (equivalent join and filter logic is sketched in pandas below).
- Filtered and cleaned 23+ million records, reducing dataset size to comply with Tableau Public’s 15M row limit.
- Created zip code-based price analysis, identifying Seattle’s highest-grossing areas for Airbnb rentals.
- Built a time-series revenue visualization, highlighting seasonal demand fluctuations and optimal listing periods.
- Developed a geo-spatial heatmap to showcase price variations across neighborhoods, aiding investment decisions.
- Analyzed Airbnb pricing trends by bedroom count, revealing higher revenue potential for 5+ bedroom properties.
- Generated a supply analysis, calculating the total number of listings per bedroom type to assess market competition.
- Implemented interactive filters, enabling dynamic comparisons of pricing, location, and seasonal trends.
- Designed a fully interactive Tableau dashboard, integrating 5 visualizations for comprehensive Airbnb insights.
- Standardized color schemes and tooltips to improve data storytelling and user experience.
- Published the project on Tableau Public, making it accessible for stakeholders and portfolio presentation.
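The joins and filters for this project were performed inside Tableau; the pandas sketch below only illustrates equivalent preprocessing logic, and the file names, column names, price format, and the booked-nights filter are assumptions.
    import pandas as pd

    listings = pd.read_csv("listings.csv")    # one row per listing (id, zipcode, bedrooms, price, ...)
    calendar = pd.read_csv("calendar.csv")    # one row per listing-night (listing_id, date, price, available)

    # Assumes calendar prices arrive as "$1,234.00"-style strings.
    calendar["night_price"] = calendar["price"].str.replace(r"[$,]", "", regex=True).astype(float)

    # Mirror the Tableau relationship: each calendar row joined to its listing.
    merged = calendar.merge(listings, left_on="listing_id", right_on="id", how="left")

    # Keep only booked nights ("available" == "f") to bring ~23M rows under the 15M limit.
    booked = merged[merged["available"] == "f"]

    # Revenue by ZIP code, the basis of the highest-grossing-area view.
    revenue_by_zip = booked.groupby("zipcode")["night_price"].sum().sort_values(ascending=False)
    print(revenue_by_zip.head(10))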
Bike Sales Excel Dashboard - Mississauga, ON
- Designed an interactive Excel dashboard to analyze bike sales trends using cleaned demographic data.
- Performed data cleaning, including duplicate removal, formatting categorical variables (e.g., marital status, gender), and creating calculated fields like age brackets for improved analysis.
- Built pivot tables to explore relationships between variables, such as income, commuting distance, and bike purchase decisions.
- Developed interactive visualizations, including bar, line, and pie charts, to present key metrics such as customer demographics, purchasing behavior, and income distribution.
- Integrated slicers for dynamic filtering by marital status, region, and education level, enabling detailed exploration of customer trends.
- Delivered a visually appealing and user-friendly dashboard with a consistent layout and clear visual hierarchy to support data-driven insights.
- Demonstrated proficiency in Excel for data cleaning, analysis, and dashboard creation to support decision-making.
Cleaning & Standardizing Customer Data with Pandas - Mississauga, ON
- Utilized Pandas to clean and standardize a dataset with 1,020 customer records, ensuring consistency and usability (condensed code sketch below).
- Removed duplicate rows using drop_duplicates() to eliminate redundant data entries.
- Dropped irrelevant columns, such as the "Not Useful" column, to focus on actionable insights.
- Standardized inconsistent Last Name entries by removing unwanted characters (slashes, dots, underscores) using .str.replace().
- Formatted and cleaned Phone Number column by removing non-numeric characters with regex and applying a consistent 123-456-7890 format.
- Split Address column into Street Address, State, and Zip Code for improved data clarity and usability.
- Standardized categorical columns like Paying Customer and Do Not Contact to uniform values ("Yes"/"No").
- Removed rows with Do Not Contact = "Yes" or blank Phone Number values to ensure data relevance.
- Replaced all missing values with blank strings using fillna() for consistent handling of null entries.
- Reset the DataFrame’s index using reset_index() to maintain clean row references after transformations.
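A condensed sketch of the cleaning steps above; the file name and column labels ("Not Useful", "Phone_Number", "Address", etc.) are assumptions based on the bullet descriptions.
    import pandas as pd

    df = pd.read_csv("customer_call_list.csv")     # assumed export of the 1,020-record list

    df = df.drop_duplicates()                      # remove redundant rows
    df = df.drop(columns=["Not Useful"])           # drop the irrelevant column

    # Strip stray characters from last names, e.g. "/Smith_" -> "Smith".
    df["Last_Name"] = df["Last_Name"].str.strip("._/")

    # Keep digits only, then reformat to 123-456-7890.
    digits = df["Phone_Number"].astype(str).str.replace(r"\D", "", regex=True)
    df["Phone_Number"] = digits.apply(lambda x: f"{x[:3]}-{x[3:6]}-{x[6:10]}" if len(x) == 10 else "")

    # Split the single address field into its components.
    df[["Street_Address", "State", "Zip_Code"]] = df["Address"].str.split(",", n=2, expand=True)

    # Standardize Yes/No style columns.
    df["Paying_Customer"] = df["Paying_Customer"].replace({"Y": "Yes", "N": "No"})
    df["Do_Not_Contact"] = df["Do_Not_Contact"].replace({"Y": "Yes", "N": "No"})

    df = df.fillna("")                             # uniform handling of nulls

    # Drop rows that cannot be contacted, then rebuild a clean index.
    df = df[(df["Do_Not_Contact"] != "Yes") & (df["Phone_Number"] != "")]
    df = df.reset_index(drop=True)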
Exploratory Data Analysis (EDA) with MySQL - Mississauga, ON
- Conducted EDA on global layoff data (2020-2023) to identify trends across industries, companies, and countries.
- Used SQL to analyze data by grouping dimensions (year, company, industry) and calculating rolling totals with window functions.
- Performed ranking using DENSE_RANK() and CTEs to highlight the companies with the highest layoffs each year (representative query below).
- Extracted and transformed date components for time-series analysis to track layoff progression monthly and yearly.
- Uncovered insights such as COVID-19's impact on industries like retail and transportation, and large-scale layoffs by Amazon, Google, and Meta.
- Supported downstream visualization by implementing advanced queries that summarize and rank the data.
- Delivered actionable insights for workforce planning and industry analysis.
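A representative ranking query from this analysis, wrapped in Python with SQLAlchemy and pandas; the connection string and the table/column names (layoffs_staging2, total_laid_off) are assumptions.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("mysql+pymysql://user:password@localhost/world_layoffs")   # hypothetical credentials

    top_layoffs_per_year = """
        WITH company_year AS (
            SELECT company, YEAR(`date`) AS yr, SUM(total_laid_off) AS total_laid_off
            FROM layoffs_staging2
            GROUP BY company, YEAR(`date`)
        ),
        ranked AS (
            SELECT company, yr, total_laid_off,
                   DENSE_RANK() OVER (PARTITION BY yr ORDER BY total_laid_off DESC) AS ranking
            FROM company_year
        )
        SELECT * FROM ranked
        WHERE ranking <= 5            -- five hardest-hit companies each year
        ORDER BY yr, ranking
    """

    print(pd.read_sql(top_layoffs_per_year, engine))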
Highest Earning Companies in the USA - Web Scraping - Mississauga, ON
- Developed a web scraping project to extract and structure data from a Wikipedia page using BeautifulSoup, Requests, and Pandas libraries (condensed sketch below).
- Targeted the "List of largest companies in the United States by revenue" table, containing data such as company name, rank, industry, revenue, and more.
- Utilized BeautifulSoup to parse the HTML structure of the web page and identify the desired table through class-based filtering and indexing, resolving issues caused by multiple tables.
- Extracted table headers (rank, name, industry, etc.) using the th tags and cleaned the data with Python's .strip() method for consistent formatting.
- Parsed rows of data (tr tags) and their respective columns (td tags) to create structured lists, ensuring accurate alignment of data elements.
- Constructed a Pandas DataFrame to organize the extracted data efficiently, facilitating further analysis or manipulation.
- Exported the structured data to a CSV file using pandas.to_csv() for easy integration into external tools or dashboards.
- Implemented error handling and debugged issues related to inconsistent tags and empty rows, ensuring a smooth and reliable data extraction process.
- Demonstrated how to automate repetitive data collection tasks, showcasing proficiency in web scraping and data engineering principles.
- Encouraged scalability by leaving the solution adaptable for other web tables or similar datasets.
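A condensed sketch of the scraping flow described above; the Wikipedia URL is the real page, but the table-selection index, row guard, and output file name are illustrative.
    import requests
    import pandas as pd
    from bs4 import BeautifulSoup

    url = "https://en.wikipedia.org/wiki/List_of_largest_companies_in_the_United_States_by_revenue"
    soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")

    # The page holds several tables; the revenue table is picked via its "wikitable" class.
    table = soup.find_all("table", class_="wikitable")[0]

    # Header row: <th> cells give the column names.
    headers = [th.text.strip() for th in table.find("tr").find_all("th")]
    df = pd.DataFrame(columns=headers)

    # Body rows: skip the header row, then read each <td> cell.
    for row in table.find_all("tr")[1:]:
        cells = [td.text.strip() for td in row.find_all("td")]
        if len(cells) == len(headers):      # guard against empty or malformed rows
            df.loc[len(df)] = cells

    df.to_csv("largest_us_companies.csv", index=False)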
Survey Analysis and Dashboard in Power BI - Mississauga, ON
- Designed an interactive Power BI dashboard using real survey data from 630 data professionals to analyze industry trends and demographics.
- Cleaned and transformed raw survey data in Power Query, including handling text inconsistencies, splitting columns, standardizing fields like job titles and programming languages, and calculating average salaries from provided ranges.
- Developed multiple visualizations to highlight key insights, including:
  - Clustered bar chart: Displaying average salary by job title, revealing data scientists as the highest earners with $93,000 on average.
  - Tree map: Visualizing country-wise survey participation, with breakdowns for regions such as the United States and India.
  - Gauge charts: Displaying average satisfaction scores for work-life balance (5.74) and salary satisfaction (4.23) on a scale of 0–10.
  - Donut chart: Comparing average salaries by gender, showing similar earnings for males and females.
  - Column chart: Highlighting favorite programming languages, with Python leading by a significant margin.
  - Stacked bar chart: Breaking down job titles and average salaries by programming language preferences.
- Implemented interactive filters for exploring data by demographics such as country, gender, and programming language, allowing deeper insights.
- Customized the dashboard layout with themes, color schemes, and formatting to improve usability and aesthetics.
- Demonstrated proficiency in Power BI for data transformation, visualization, and storytelling, delivering actionable insights for decision-making.
Web Scraping on Amazon (Tim Hortons Coffee) - Mississauga, ON
- Built a Python-based web scraper to extract Amazon product details, including titles, prices, and timestamps.
- Used Beautiful Soup and Requests libraries to fetch and parse HTML content from static product pages.
- Identified specific HTML elements (id="productTitle" and id="priceblock_ourprice") to extract key product data (simplified tracker loop below).
- Cleaned and formatted scraped data by removing whitespace and special characters for usability.
- Created a CSV file to store product details, including headers for Title, Price, and Date.
- Automated data collection with a while loop and time.sleep() to append new data entries at regular intervals.
- Included an optional email alert feature using smtplib to notify users of price drops below a set threshold.
- Validated and structured the data for downstream analysis, enabling time-series tracking of price changes.
- Designed the script to run continuously in the background for long-term data collection.
- Highlighted the project as an introduction to web scraping, with potential for scaling to scrape multiple pages or complex datasets.
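A simplified version of the tracker loop; the product URL, alert threshold, CSV name, and polling interval are placeholders, and the element IDs reflect Amazon's older markup, which may change.
    import csv
    import time
    from datetime import date

    import requests
    from bs4 import BeautifulSoup

    URL = "https://www.amazon.ca/dp/EXAMPLE"        # placeholder product URL
    HEADERS = {"User-Agent": "Mozilla/5.0"}         # a browser-like agent is usually required

    def check_price():
        page = requests.get(URL, headers=HEADERS, timeout=30)
        soup = BeautifulSoup(page.content, "html.parser")
        title = soup.find(id="productTitle").get_text(strip=True)
        price_text = soup.find(id="priceblock_ourprice").get_text(strip=True)
        price = float(price_text.replace("$", "").replace(",", ""))
        with open("amazon_prices.csv", "a", newline="", encoding="utf-8") as f:
            csv.writer(f).writerow([title, price, date.today()])
        return price

    # Append a fresh reading once a day; the threshold email (smtplib) is omitted here.
    while True:
        if check_price() < 15.00:                   # arbitrary alert threshold
            print("Price below threshold - send the smtplib email alert here.")
        time.sleep(86400)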
World Layoffs Data Cleaning - Mississauga, ON
- Imported and structured a raw dataset of 2,361 records, addressing issues such as duplicates, inconsistent data formats, and null values.
- Identified and removed duplicates using advanced SQL techniques, ensuring data integrity by partitioning and filtering based on unique attributes (query sketch below).
- Standardized inconsistent data, including correcting misspellings, trimming white spaces, and normalizing industry categories (e.g., merging "Crypto" and "Cryptocurrency").
- Addressed null and blank values through self-joins, updating incomplete records with relevant data from existing rows.
- Converted textual date formats to proper date data types, enabling seamless time-series analysis.
- Established a raw and staging table workflow to preserve original data integrity while allowing iterative transformations on staging tables.
- Removed irrelevant rows and redundant columns to optimize storage and performance for downstream analysis.
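A sketch of the duplicate-detection step, with the deletion itself performed on a second staging table as described above; the connection string and the layoffs_staging table/column names are assumptions.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("mysql+pymysql://user:password@localhost/world_layoffs")   # hypothetical credentials

    duplicates = pd.read_sql(
        """
        SELECT *
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY company, location, industry,
                                    total_laid_off, percentage_laid_off, `date`
                   ) AS row_num
            FROM layoffs_staging
        ) AS numbered
        WHERE row_num > 1       -- every copy beyond the first occurrence
        """,
        engine,
    )
    print(f"{len(duplicates)} duplicate rows flagged for removal")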
World Population Almanac EDA - Mississauga, ON
- Conducted an Exploratory Data Analysis on a world population dataset with over 230 rows and multiple columns, identifying patterns, relationships, and outliers (abbreviated code below).
- Reviewed dataset structure using info() and describe() to gain high-level insights, such as column types, null values, and basic statistical summaries (mean, standard deviation, percentiles).
- Identified missing values using isnull().sum() and quantified the extent of data gaps for better cleaning decisions.
- Determined unique values in categorical columns like Continent and Country using nunique() to validate dataset consistency.
- Sorted data based on key columns (e.g., population) using sort_values() to rank countries by specific metrics like highest population or growth.
- Computed correlations between numeric columns using corr() and visualized the results via heatmaps with Seaborn's heatmap() to uncover relationships between variables.
- Grouped data by continents using groupby() and calculated average population, growth rates, and densities for comparative analysis.
- Transposed datasets to reorganize columns and rows for better visualization of trends across decades using .transpose().
- Created box plots with boxplot() to detect outliers and visualize the distribution of population values and other metrics.
- Filtered columns by data type (e.g., numeric, object) using select_dtypes() to streamline targeted analyses on specific column types.
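An abbreviated version of the EDA steps listed above; the CSV name and column labels (e.g. "2022 Population", "Continent") are assumptions.
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    df = pd.read_csv("world_population.csv")

    df.info()                                        # column types and non-null counts
    print(df.describe())                             # mean, std, percentiles
    print(df.isnull().sum())                         # extent of missing data
    print(df["Continent"].nunique(), "continents")   # consistency check on categoricals

    # Rank countries by the latest population figures.
    print(df.sort_values(by="2022 Population", ascending=False).head(10))

    # Correlations between numeric columns, shown as a heatmap.
    numeric = df.select_dtypes(include="number")
    sns.heatmap(numeric.corr(), annot=True)
    plt.show()

    # Average metrics per continent, transposed so decades become rows for trend reading.
    print(df.groupby("Continent").mean(numeric_only=True).transpose())

    # Box plots to surface outliers in the population columns.
    df.boxplot(figsize=(16, 8))
    plt.show()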
Popular Baby Names (AWS Glue & Glue DataBrew) - Mississauga, ON
- Created a sample project and dataset in Glue DataBrew to demonstrate data filtering, grouping, and sorting.
- Developed reusable recipes in DataBrew to apply consistent transformations across similar datasets.
- Automated ETL workflows by scheduling jobs to process full datasets and save outputs in Amazon S3 (minimal boto3 sketch below).
- Discussed integration of IAM roles to manage Glue permissions for accessing S3 buckets and performing ETL operations.
- Configured AWS Glue Crawlers to automate data cataloging by extracting schema information from S3 datasets.
- Created and executed visual ETL jobs in AWS Glue Studio to transform and merge datasets using operations like union, aggregation, and schema modification.
- Explored options for output file formats, including CSV and Parquet, and optimized compression for large-scale ETL operations.
- Troubleshot errors and adjusted permissions on Glue IAM roles to keep ETL jobs executing smoothly.
- Emphasized real-world use cases of Glue and DataBrew for production environments, enabling scalable, automated data ingestion and transformation pipelines.
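A minimal boto3 sketch of triggering the cataloging and ETL steps described above; the crawler and job names, region, and S3 output path are assumptions.
    import boto3
    import pandas as pd

    glue = boto3.client("glue", region_name="us-east-1")

    # Re-crawl the raw baby-names files so the Data Catalog picks up schema changes.
    glue.start_crawler(Name="baby-names-crawler")

    # Run the visual ETL job authored in Glue Studio (union, aggregation, schema edits).
    run = glue.start_job_run(JobName="baby-names-etl")
    print("Started job run:", run["JobRunId"])

    # After the job finishes, spot-check the Parquet output written to S3 (needs s3fs + pyarrow).
    print(pd.read_parquet("s3://example-bucket/baby-names/output/").head())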