R Project

Laptop Performance vs Price R Project

Introduction

This summer I bought a laptop with specifications powerful enough to run the engineering software needed for my program. As a college student, I was on a tight budget and wanted the most powerful laptop for the least money. I used AI to help with the search. I have since learned R and used it to answer the question myself instead of taking the AI's word for it.

The purpose of this R project was to compare laptops on price and performance for a productivity audience. I hoped to find the laptops with the best performance per dollar. I predicted that the best laptops would be entry-level gaming laptops because of their emphasis on the CPU and their lesser-known brand names. I used one data set of laptop specifications and a separate data set of CPU benchmarks.

I chose to analyze only laptops with Intel Core i() CPUs from the 4th through 9th generations, where i() stands for the model tier (for example i5 or i7).

Approach

Data sets

Laptop Specs List:

  • https://www.kaggle.com/datasets/pradeepjangirml007/laptop-data-set/data. This data was scraped from Smartprix.com. After analyzing the price column closely, I concluded that the prices, labeled only as “currency units”, were in Indian rupees. The data set lists about 4000 laptops with their specifications: RAM (GB), SSD vs HDD, CPU type, etc.

CPU Benchmark Data:

Tools

  • RStudio

  • Google Gemini. I used it to learn concepts and functions not covered in my R workshop and to help with debugging. I did not copy and paste code from Gemini.

Approach

The first step was cleaning and standardizing both data sets so that they could be merged into one. This meant extracting the CPU names from each data set and rebuilding them in a common format before merging. To simplify the merge, I averaged the benchmarks of all Intel Core processors that shared the same i() number and generation. As a result, the H, U, G, and other series variants of a given generation and i() number are collapsed into a single benchmark value, so differences between series are not accounted for in this project.
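The merge key described above can be sketched on a few made-up rows. The CPU names and benchmark values here are hypothetical placeholders, not values from either data set, and the sketch uses base R rather than the project's tidyverse code so that it runs without extra packages.

```r
# Hypothetical benchmark rows: two 8th Gen i5 variants and one 8th Gen i7
cpu_toy <- data.frame(
  cpu_name = c("Intel Core i5-8250U", "Intel Core i5-8300H", "Intel Core i7-8550U"),
  cpu_mark = c(6000, 8000, 7000)  # made-up scores for illustration only
)

# Keep "Intel Core i<n>-<first digit of the model number>" and drop the rest,
# which removes the series suffix (U, H, ...)
pattern <- "Intel Core i[0-9]-[4-9]"
cpu_toy$cleaned_cpu <- regmatches(cpu_toy$cpu_name, regexpr(pattern, cpu_toy$cpu_name))

# Average the benchmark within each i()/generation key
avg_toy <- aggregate(cpu_mark ~ cleaned_cpu, data = cpu_toy, FUN = mean)
print(avg_toy)
```

Both i5 rows collapse to the key "Intel Core i5-8", so the U and H variants share one averaged score.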

To define performance, I needed a performance grade variable. I built it from several formulas that score each component, normalize the scores, and weight them into a single performance grade for each laptop. I ignored GPUs and HDD storage. The method is as follows:

Scoring for each variable

  • CPU = average CPU Mark for the laptop's i() number and generation

  • RAM: 4 GB = 100, 8 GB = 200, 16 GB = 300, 32 GB = 400 (any other value = 0)

  • Hard Drive Storage (SSD): 8 GB = 1, 16 GB = 5, 64 GB = 10, 128 GB = 15, 256 GB = 40, 512 GB = 50, 1024 GB = 60 (any other value = 0)

There is a major jump from 128 GB to 256 GB because laptops with 128 GB of SSD or less relied mostly on HDD storage.
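The point tables above can also be written as lookup vectors. The following is only a sketch of that idea in base R, not the code used in the project; the helper name score_lookup is hypothetical, and unlisted sizes fall back to 0.

```r
# Point tables copied from the scoring list above
ram_points <- c("4 GB" = 100, "8 GB" = 200, "16 GB" = 300, "32 GB" = 400)
ssd_points <- c("8 GB" = 1, "16 GB" = 5, "64 GB" = 10, "128 GB" = 15,
                "256 GB" = 40, "512 GB" = 50, "1024 GB" = 60)

# Hypothetical helper: look up each label, scoring anything unlisted as 0
score_lookup <- function(labels, points) {
  scores <- unname(points[labels])
  scores[is.na(scores)] <- 0
  scores
}

ram_scores <- score_lookup(c("8 GB", "16 GB", "12 GB"), ram_points)
ssd_scores <- score_lookup(c("128 GB", "256 GB"), ssd_points)
print(ram_scores)
print(ssd_scores)
```

The "12 GB" entry is not in the RAM table, so it scores 0.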

Maximum Component Score Calculations

  • Average CPU Mark = 21029.467

  • RAM = 400

  • Hard Drive = 60

Normalizing Scores

Normalized Score = (Component Score / Maximum Component Score) x 100
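As a small worked example of this formula, the sketch below normalizes two component scores from the scoring list. The function name normalize_score is a hypothetical helper introduced for illustration.

```r
# Scale a component score to a 0-100 range relative to its maximum
normalize_score <- function(component_score, max_score) {
  (component_score / max_score) * 100
}

ram_norm <- normalize_score(300, 400)  # 16 GB RAM against the 32 GB maximum
ssd_norm <- normalize_score(50, 60)    # 512 GB SSD against the 1024 GB maximum

print(ram_norm)
print(round(ssd_norm, 2))
```

A 16 GB laptop therefore gets a normalized RAM score of 75.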

Weights for performance grade algorithm for productivity/workstation audience

  1. CPU Norm.Score x 70%
  2. RAM Norm.Score x 30%
  3. Hard Drive Norm.Score x 18%

Error: with the hard drive weighted at 18%, the weights sum to 118% rather than 100%, so the maximum possible Performance Grade is 118.

Final Algorithm

CPU norm.weight.score + RAM norm.weight.score + Hard drive norm.weight.score = Performance Grade
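To make the formula concrete, here is a minimal base R sketch that applies the maximum scores and weights listed above to one made-up laptop. The function name performance_grade and the CPU mark of 10000 are assumptions for illustration and do not come from the data.

```r
# Maximum scores and weights as listed in the text
max_scores <- c(cpu = 21029.467, ram = 400, hard_drive = 60)
weights <- c(cpu = 0.70, ram = 0.30, hard_drive = 0.18)

# Hypothetical helper: normalize each component, weight it, and sum
performance_grade <- function(cpu_mark, ram_score, hard_drive_score) {
  scores <- c(cpu = cpu_mark, ram = ram_score, hard_drive = hard_drive_score)
  normalized <- scores / max_scores * 100
  sum(normalized * weights)
}

# Made-up laptop: CPU mark 10000, 8 GB RAM (200 points), 256 GB SSD (40 points)
example_grade <- performance_grade(10000, 200, 40)

# A laptop with the maximum score in every component
max_grade <- performance_grade(21029.467, 400, 60)

print(round(example_grade, 2))
print(max_grade)
print(sum(weights))
```

The made-up laptop scores about 60.3, and the best possible grade is 118 because the weights sum to 1.18. Dividing each weight by 1.18 would rescale the grade to a 0-100 range without changing how the laptops rank.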

I also had to convert the price from Indian rupees to USD, for which I used an exchange rate of 87.61 rupees per dollar. Finally, I created a variable for performance units per dollar (performance grade / price in USD). To present the data, I made three separate graphs:

1. All laptops listed on Price vs Performance Grade scatterplot (with top nine highlighted)

2. All laptops listed on Price vs Performance Grade scatterplot (color coded per CPU type)

3. Top ten laptops per performance unit per dollar
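The conversion and the per-dollar metric can be sketched on a few made-up laptops. The names, prices, and grades below are hypothetical; only the 87.61 exchange rate comes from the text.

```r
inr_per_usd <- 87.61  # exchange rate quoted in the text

# Hypothetical laptops: prices in INR and performance grades are made up
laptops_toy <- data.frame(
  name = c("Laptop A", "Laptop B", "Laptop C"),
  price_inr = c(35000, 52000, 90000),
  performance_grade = c(55, 70, 95)
)

# Convert to USD, then divide the grade by the USD price
laptops_toy$price_usd <- laptops_toy$price_inr / inr_per_usd
laptops_toy$punit_per_usd <- laptops_toy$performance_grade / laptops_toy$price_usd

# Rank from best to worst value
ranked <- laptops_toy[order(-laptops_toy$punit_per_usd), ]
print(ranked[, c("name", "price_usd", "punit_per_usd")])
```

In this made-up example the cheapest laptop ranks first despite having the lowest grade, which shows how strongly price drives this metric.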

Results

Laptop List Cleaning

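library(tidyverse)  # dplyr, stringr, tidyr, and ggplot2 are used throughout

laptop_clean <- laptop |>
  mutate(
    # Build a key such as "Intel Core i5-8" from names containing "(8th Gen)"
    cleaned_cpu = case_when(
      str_detect(Processor_Name, "Intel Core i[0-9].*\\(([4-9]{1})th Gen\\)") ~
        str_extract(Processor_Name, "Intel Core i[0-9]") |>
          paste0("-", str_extract(Processor_Name, "(?<=\\()([4-9]{1})")),
      TRUE ~ NA_character_
    )
  )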

CPU Benchmark Cleaning

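cpu_clean_data <- cpu |>
  rename(cpu_name = 'CPU Name') |>
  mutate(
    # Build the same key, using the first digit of the model number as the generation.
    # Caveat: a three-digit model number starting with 4-9 would also match this pattern.
    cleaned_cpu = case_when(
      str_detect(cpu_name, "Intel Core i[0-9]-([4-9]{1})") ~
        str_extract(cpu_name, "Intel Core i[0-9]-([4-9]{1})"),
      TRUE ~ NA_character_
    )
  )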

Averaging CPU Benchmarks per Intel Core i() and Gen

# The benchmark column name is kept exactly as written, including its trailing spaces
average_cpu_scores <- cpu_clean_data |>
  group_by(cleaned_cpu) |>
  summarise(
    avg_cpu_mark = mean(`CPU Mark                          `, na.rm = TRUE)
  )

Merging the Data Sets

merged_clean_data <- laptop_clean |>
  left_join(average_cpu_scores, by = "cleaned_cpu")

Removing N/As from Data Set

final_data <- merged_clean_data |>
  drop_na(cleaned_cpu)

Weighing, Normalizing, and Performance Grade Formulas

# Maximum score for each component, used for normalization
max_scores <- list(
  cpu = 21029.467,
  ram = 400,
  hard_drive = 60
)

# Component weights (these sum to 1.18, not 1)
weights <- list(
  cpu = .7,
  ram = .3,
  hard_drive = .18
)

fd_pf <- final_data |>
  mutate(
    # Point scores; any value not listed scores 0
    ram_score = case_when(
      RAM == "4 GB" ~ 100,
      RAM == "8 GB" ~ 200,
      RAM == "16 GB" ~ 300,
      RAM == "32 GB" ~ 400,
      TRUE ~ 0
    ),
    hard_drive_score = case_when(
      SSD == "1024 GB SSD Storage" ~ 60,
      SSD == "512 GB SSD Storage" ~ 50,
      SSD == "256 GB SSD Storage" ~ 40,
      SSD == "128 GB SSD Storage" ~ 15,
      SSD == "64 GB SSD Storage" ~ 10,
      SSD == "16 GB SSD Storage" ~ 5,
      SSD == "8 GB SSD Storage" ~ 1,
      TRUE ~ 0
    ),
    # Normalize each component to a 0-100 scale
    cpu_norm = (avg_cpu_mark / max_scores$cpu) * 100,
    ram_norm = (ram_score / max_scores$ram) * 100,
    hard_drive_norm = (hard_drive_score / max_scores$hard_drive) * 100,
    # Apply the weights and sum into the final grade
    cpu_weighted = cpu_norm * weights$cpu,
    ram_weighted = ram_norm * weights$ram,
    hard_drive_weighted = hard_drive_norm * weights$hard_drive,
    Performance_Grade = cpu_weighted + ram_weighted + hard_drive_weighted
  )

INR to USD

fd_pf_usd <- fd_pf |>
  mutate(price_usd = Price / 87.61)

Performance Unit per Dollar

omega_data <- fd_pf_usd |>
  mutate(punit_per_usd = Performance_Grade / price_usd)

All laptops listed on Price vs Performance Grade scatterplot (with top nine highlighted)

# Uses t10_punit_per_usd, which is created in the top-ten step further down; run that step first.
ggplot() +
  geom_point(
    data = omega_data,
    mapping = aes(x = price_usd, y = Performance_Grade),
    position = "jitter"  # position is a layer argument, not an aesthetic
  ) +
  geom_point(
    data = t10_punit_per_usd,
    mapping = aes(x = price_usd, y = Performance_Grade, color = name_shrt),
    position = "jitter"
  ) +
  labs(
    x = "Price ($)",
    y = "Performance Grade",
    title = "Price vs Performance Grade",
    subtitle = "Top Nine Highlighted",
    color = "Top Nine Laptop Names"
  )

All laptops listed on Price vs Performance Grade scatterplot (color coded per CPU type)

ggplot(
  data = omega_data,
  mapping = aes(x = price_usd, y = Performance_Grade, color = cleaned_cpu)
) +
  geom_point(position = "jitter") +  # position is a layer argument, not an aesthetic
  labs(
    x = "Price (USD)",
    y = "Performance Grade",
    title = "Price vs Performance Grade",
    subtitle = "Color Coded by CPU",
    color = "CPU Name"
  )

Top ten laptops per performance unit per dollar

# Ten laptops with the highest performance per USD, with names shortened for plotting
t10_punit_per_usd <- omega_data |>
  arrange(desc(punit_per_usd)) |>
  slice_head(n = 10) |>
  mutate(name_shrt = str_trunc(`Name`, width = 20))

ggplot(data = t10_punit_per_usd, aes(x = name_shrt, y = punit_per_usd)) +
  geom_point() +
  labs(
    x = "Laptops",
    y = "Performance per USD",
    title = "Top Ten Laptops",
    subtitle = "Per Performance Grade / USD Price"
  ) +
  coord_flip()

Graphs

Graphs 1-3 listed in order.

Discussion

The data shows that my hypothesis was incorrect. The laptops with the highest performance per dollar were consumer-grade laptops rather than entry-level gaming laptops. This makes sense, as consumer laptops are designed to be well balanced and affordable.