R Project
Laptop Performance vs Price R Project
Introduction
This summer I bought a laptop with specifications powerful enough to run the engineering software needed for my program. As a college student, I was on a tight budget and wanted the most powerful laptop for the least amount of money, so I used AI to help with the search. I have since learned R and used it to answer the question myself instead of taking AI’s word for it.
The purpose of this R project was to compare laptops by price and performance for a productivity audience, with the goal of finding the laptops that offer the best performance per dollar. I predicted that the best laptops would be entry-level gaming laptops, because of their emphasis on the CPU and their lesser-known brand names. I used one data set listing laptop specifications and a separate data set of CPU benchmark scores.
I chose to analyze only laptops with Intel Core i() CPUs (i3, i5, i7, or i9) from the 4th through 9th generations.
Approach
Data sets
Laptop Specs List:
- https://www.kaggle.com/datasets/pradeepjangirml007/laptop-data-set/data. This data was scraped from Smartprix.com. After analyzing the price column rigorously, I concluded that prices listed in “currency units” were in fact in Indian rupees. The data set lists ~4,000 laptops with their specifications: RAM (GB), SSD vs HDD storage, CPU type, etc.
CPU Benchmark Data:
- https://www.cpubenchmark.net/cpu_list.php. This data set lists a wide range of CPUs and the scores (CPU Mark) they achieved in PassMark's benchmarking software.
Tools
RStudio
Google Gemini: I used it to learn concepts and functions not covered in my R workshop and to help with debugging. I did not copy and paste Gemini's output.
Approach
The first step was cleaning and standardizing both data sets so that they would merge correctly into one data set. This meant extracting the CPU names from each data set and pasting them into a matching format before merging. For ease of merging, I averaged the benchmarks of all Intel Core processors with the same i() number and generation. As a result, H, U, G, etc. series models of the same generation and i() number are combined into a single benchmark value for that generation and i() number, so differences between series models are not accounted for in this project.
To define performance, I needed a Performance Grade variable, so I built several formulas that combine into one Performance Grade for each laptop. I ignored GPUs and HDD storage. The method, including the point system, normalization, and weighting, is as follows:
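A minimal base-R sketch of the key-building and averaging step, run on a few made-up rows. The marks are placeholders rather than real benchmark values, and the laptop-side format "Intel Core i5 (8th Gen)" is an assumption based on the pattern the cleaning code searches for. The helpers `bench_key()` and `laptop_key()` are hypothetical names used only for illustration.

```r
# Hypothetical benchmark rows: names follow the "Intel Core i7-8750H" style,
# marks are placeholders, not real CPU Mark values
bench <- data.frame(
  cpu_name = c("Intel Core i7-8750H", "Intel Core i7-8550U", "Intel Core i5-8250U",
               "Intel Core i7-10750H", "Intel Core i7-920"),
  cpu_mark = c(1000, 800, 600, 1200, 500)
)

# Benchmark side: keep "Intel Core iX-G" only when a four-digit model number
# starting with a 4-9 generation digit follows; everything else becomes NA
bench_key <- function(x) {
  m <- regexpr("Intel Core i[0-9]-[4-9](?=[0-9]{3})", x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  out[m > 0] <- regmatches(x, m)
  out
}

# Laptop side (assumed format "Intel Core i5 (8th Gen)"): rebuild the same key
laptop_key <- function(x) {
  g <- regmatches(x, regexec("(Intel Core i[0-9]).*\\(([4-9])th Gen\\)", x))
  vapply(g, function(p) if (length(p) == 3) paste0(p[2], "-", p[3]) else NA_character_,
         character(1))
}

bench$cleaned_cpu <- bench_key(bench$cpu_name)

# Chips sharing a key (here the H and U i7-8 parts) are averaged into one value;
# the formula interface of aggregate() drops rows whose key is NA
avg_marks <- aggregate(cpu_mark ~ cleaned_cpu, data = bench, FUN = mean)
print(avg_marks)
print(laptop_key("Intel Core i7 (8th Gen)"))
```

With these placeholder rows, the two i7-8 chips average to 900, while the 10th Gen and three-digit names drop out instead of being counted as 1st or 9th Gen.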
Scoring for each variable
CPU = averaged CPU Mark score (per i() number and generation)
RAM = 4 GB = 100, 8 GB = 200, 16 GB = 300, 32 GB = 400
Hard Drive Storage (SSD) = 8 GB = 1, 16 GB = 5, 64 GB = 10, 128 GB = 15, 256 GB = 40, 512 GB = 50, 1024 GB = 60
There is a major jump from 128 GB to 256 GB because any laptop with 128 GB of SSD storage or less made major use of HDD storage, which this project ignores.
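The same point system can be stored as named lookup vectors instead of a chain of conditions. This base-R sketch assumes the RAM and SSD columns hold labels such as "16 GB" and "512 GB SSD Storage" (the strings the case_when() code in the Results section matches). `score_lookup()` is a hypothetical helper, and, as in the original code, any label missing from a table scores 0.

```r
# Point tables from the scoring list above
ram_points <- c("4 GB" = 100, "8 GB" = 200, "16 GB" = 300, "32 GB" = 400)
ssd_points <- c("8 GB SSD Storage" = 1, "16 GB SSD Storage" = 5,
                "64 GB SSD Storage" = 10, "128 GB SSD Storage" = 15,
                "256 GB SSD Storage" = 40, "512 GB SSD Storage" = 50,
                "1024 GB SSD Storage" = 60)

# Look up each label; labels not in the table (including NA) get 0 points
score_lookup <- function(labels, table) {
  pts <- unname(table[labels])
  pts[is.na(pts)] <- 0
  pts
}

score_lookup(c("16 GB", "12 GB"), ram_points)
score_lookup("512 GB SSD Storage", ssd_points)
```

Keeping the points in one table makes it easier to try a different scale without touching the scoring logic.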
Maximum Component Score Calculations
Average CPU Mark = 21029.467
RAM = 400
Hard Drive = 60
Normalizing Scores
Normalized Score = Component Score/Maximum Component Score x 100
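As a worked example of the normalization, using the maximum scores above and a hypothetical laptop with 16 GB of RAM (300 points) and a 512 GB SSD (50 points):

```latex
\text{Norm}_{\text{RAM}} = \frac{300}{400} \times 100 = 75,
\qquad
\text{Norm}_{\text{SSD}} = \frac{50}{60} \times 100 \approx 83.3
```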
Weights for the performance grade algorithm (productivity/workstation audience)
- CPU Norm.Score x 70%
- RAM Norm.Score x 30%
- Hard Drive Norm.Score x 18%
Note: the 18% hard drive weight is an error; together the weights sum to 118% rather than 100%.
Final Algorithm
CPU norm.weight.score + RAM norm.weight.score + Hard drive norm.weight.score = Performance Grade
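A minimal base-R sketch of the final algorithm for a single laptop, using the maximum scores and weights listed above (including the 18% hard drive weight). `performance_grade()` is a hypothetical helper that takes already-scored components, and the two example laptops are made up to show the arithmetic.

```r
# Maximum component scores and weights from the lists above
max_scores <- c(cpu = 21029.467, ram = 400, hard_drive = 60)
weights    <- c(cpu = 0.70, ram = 0.30, hard_drive = 0.18)  # sums to 1.18

# Performance Grade for one laptop: normalize each component against its
# maximum, weight it, and add the weighted scores together
performance_grade <- function(cpu_mark, ram_score, hard_drive_score) {
  scores <- c(cpu = cpu_mark, ram = ram_score, hard_drive = hard_drive_score)
  norm <- scores / max_scores * 100
  sum(norm * weights)
}

# A laptop that maxes out every component scores 118, not 100
performance_grade(21029.467, 400, 60)

# Half the maximum CPU mark, 16 GB RAM (300), 512 GB SSD (50):
# 0.7 * 50 + 0.3 * 75 + 0.18 * 83.3 = 35 + 22.5 + 15 = 72.5
performance_grade(21029.467 / 2, 300, 50)
```

The first example shows why the 118% weight total matters: a grade can exceed 100.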
I also converted prices from Indian rupees to USD, using a conversion rate of 87.61 rupees per dollar. Finally, I created a concluding variable for performance units per dollar (Performance Grade / price in USD). To present the data, I made three separate graphs:
1. All laptops listed on Price vs Performance Grade scatterplot (with top nine highlighted)
2. All laptops listed on Price vs Performance Grade scatterplot (color coded per CPU type)
3. Top ten laptops per performance unit per dollar
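The currency conversion and the per-dollar variable reduce to two divisions. A base-R sketch, assuming the 87.61 rupees-per-dollar rate above; the helper names `to_usd()` and `perf_per_usd()` and the three laptops (A, B, C) are hypothetical, with prices chosen so the USD values come out round.

```r
inr_per_usd <- 87.61  # conversion rate used in this project

to_usd <- function(price_inr) price_inr / inr_per_usd
perf_per_usd <- function(grade, price_inr) grade / to_usd(price_inr)

# Hypothetical laptops: prices convert to $500, $300 and $1000
laptops <- data.frame(
  name      = c("A", "B", "C"),
  grade     = c(72.5, 60, 90),
  price_inr = c(43805, 26283, 87610)
)
laptops$price_usd     <- to_usd(laptops$price_inr)
laptops$punit_per_usd <- perf_per_usd(laptops$grade, laptops$price_inr)

# Rank by performance units per dollar, highest first
ranked <- laptops[order(laptops$punit_per_usd, decreasing = TRUE), ]
print(ranked)
```

Here B ranks first even though C has the highest grade, because C costs more than three times as much.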
Results
Laptop List Cleaning
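library(tidyverse)  # dplyr, stringr, tidyr and ggplot2
# `laptop` and `cpu` are the two data sets, already loaded

laptop_clean <- laptop |>
  mutate(
    # keep 4th-9th Gen Intel Core i() CPUs; build a key like "Intel Core i5-8"
    cleaned_cpu = case_when(
      str_detect(Processor_Name, "Intel Core i[0-9].*\\(([4-9])th Gen\\)") ~
        paste0(str_extract(Processor_Name, "Intel Core i[0-9]"), "-",
               str_extract(Processor_Name, "(?<=\\()[4-9](?=th Gen\\))")),
      TRUE ~ NA_character_
    )
  )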
CPU Benchmark Cleaning
cpu_clean_data <- cpu |>
  rename(cpu_name = `CPU Name`) |>
  mutate(
    # "Intel Core i7-8750H" -> "Intel Core i7-8"; the lookahead requires a four-digit
    # model number so a three-digit 1st Gen name (e.g. i7-920) is not read as 9th Gen
    cleaned_cpu = str_extract(cpu_name, "Intel Core i[0-9]-[4-9](?=\\d{3})")
  )
Averaging CPU Benchmarks per Intel Core i() and Gen
average_cpu_scores <- cpu_clean_data |>
  filter(!is.na(cleaned_cpu)) |>  # drop CPUs outside the 4th-9th Gen Core i() range
  group_by(cleaned_cpu) |>
  summarise(avg_cpu_mark = mean(`CPU Mark `, na.rm = TRUE))
Merging the Data Sets
merged_clean_data <- laptop_clean |>
  left_join(average_cpu_scores, by = "cleaned_cpu")
Removing N/As from Data Set
final_data <- merged_clean_data |>
  drop_na(cleaned_cpu, avg_cpu_mark)  # also drop CPUs with no benchmark match
Weighing, Normalizing, and Performance Grade Formulas
max_scores <- list(cpu = 21029.467, ram = 400, hard_drive = 60)
weights <- list(cpu = .7, ram = .3, hard_drive = .18)  # sums to 1.18 (the error noted above)

fd_pf <- final_data |>
  mutate(
    # point system for RAM and SSD; unlisted values score 0
    ram_score = case_when(
      RAM == "4 GB" ~ 100,
      RAM == "8 GB" ~ 200,
      RAM == "16 GB" ~ 300,
      RAM == "32 GB" ~ 400,
      TRUE ~ 0
    ),
    hard_drive_score = case_when(
      SSD == "1024 GB SSD Storage" ~ 60,
      SSD == "512 GB SSD Storage" ~ 50,
      SSD == "256 GB SSD Storage" ~ 40,
      SSD == "128 GB SSD Storage" ~ 15,
      SSD == "64 GB SSD Storage" ~ 10,
      SSD == "16 GB SSD Storage" ~ 5,
      SSD == "8 GB SSD Storage" ~ 1,
      TRUE ~ 0
    ),
    # normalize against the maximum component scores
    cpu_norm = (avg_cpu_mark / max_scores$cpu) * 100,
    ram_norm = (ram_score / max_scores$ram) * 100,
    hard_drive_norm = (hard_drive_score / max_scores$hard_drive) * 100,
    # apply the weights and add them up
    cpu_weighted = cpu_norm * weights$cpu,
    ram_weighted = ram_norm * weights$ram,
    hard_drive_weighted = hard_drive_norm * weights$hard_drive,
    Performance_Grade = cpu_weighted + ram_weighted + hard_drive_weighted
  )
INR to USD
fd_pf_usd <- fd_pf |>
  mutate(price_usd = Price / 87.61)
Performance Unit per Dollar
omega_data <- fd_pf_usd |>
  mutate(punit_per_usd = Performance_Grade / price_usd)
All laptops listed on Price vs Performance Grade scatterplot (with top nine highlighted)
# t10_punit_per_usd is created in the top-ten step below, so run that step first
ggplot() +
  geom_point(data = omega_data, mapping = aes(x = price_usd, y = Performance_Grade),
             position = "jitter") +
  geom_point(data = t10_punit_per_usd,
             mapping = aes(x = price_usd, y = Performance_Grade, color = name_shrt),
             position = "jitter") +
  labs(x = "Price ($)", y = "Performance Grade", title = "Price vs Performance Grade",
       subtitle = "Top Nine Highlighted", color = "Top Nine Laptop Names")
All laptops listed on Price vs Performance Grade scatterplot (color coded per CPU type)
ggplot(data = omega_data,
       mapping = aes(x = price_usd, y = Performance_Grade, color = cleaned_cpu)) +
  geom_point(position = "jitter") +  # position is a geom argument, not an aesthetic
  labs(x = "Price (USD)", y = "Performance Grade", title = "Price vs Performance Grade",
       subtitle = "Color Coded by CPU", color = "CPU Name")
Top ten laptops per performance unit per dollar
t10_punit_per_usd <- omega_data |>
  arrange(desc(punit_per_usd)) |>
  slice_head(n = 10) |>
  mutate(name_shrt = str_trunc(Name, width = 20))

ggplot(data = t10_punit_per_usd, aes(x = name_shrt, y = punit_per_usd)) +
  geom_point() +
  labs(x = "Laptops", y = "Performance per USD", title = "Top Ten Laptops",
       subtitle = "Per Performance Grade / USD Price") +
  coord_flip()
Graphs
Graphs 1-3 listed in order.

Discussion
The data show that my hypothesis was incorrect. The laptops with the highest performance units per dollar were consumer-grade laptops, not entry-level gaming laptops. This makes sense, since consumer laptops are designed to be well balanced and affordable.