Project 1 Review

I recently completed Project 1 as part of the curriculum for ST 558: Data Science for Statisticians.

Purpose of the project

The project’s main objective was to familiarize participants with the R programming language and the R environment for data science. It involved writing functions to read in CSV files, process and transform the data sets, and combine them so that they could be plotted to derive insights. It also involved writing a custom function that would automatically plot the returned data. The data sets used in the project contained information from the Census Bureau.
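The read, process, combine, and plot workflow described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the column names (`area_name`, `year`, `value`), the inline CSV text standing in for the census files, the helper names `read_census()` and `combine_census()`, and the class name `"census"` are all hypothetical. The sketch also swaps ggplot2 for base graphics so that it runs without extra packages.

```r
# Hypothetical stand-ins for two census CSV files; read.csv(text = ...)
# parses a string so the sketch runs without any files on disk.
csv_a <- "area_name,year,value
Alabama,1990,100
Alabama,2000,110"
csv_b <- "area_name,year,value
Alabama,2010,125
Alabama,2020,130"

# Read one data set and tag it with a custom class so that plot()
# can dispatch to a method written for that class.
read_census <- function(text) {
  df <- read.csv(text = text, stringsAsFactors = FALSE)
  class(df) <- c("census", "data.frame")
  df
}

# Stack two data sets into one object, then restore the custom class,
# since we do not rely on rbind() to preserve it.
combine_census <- function(x, y) {
  combined <- rbind(as.data.frame(x), as.data.frame(y))
  class(combined) <- c("census", "data.frame")
  combined
}

# S3 method: calling plot() on a "census" object draws a line graph of
# value against year (base graphics here, not ggplot2).
plot.census <- function(x, ...) {
  df <- as.data.frame(x)
  plot(df$year, df$value, type = "l", xlab = "Year", ylab = "Value", ...)
  invisible(x)
}

census <- combine_census(read_census(csv_a), read_census(csv_b))
# plot(census)  # dispatches to plot.census()
```

Because `plot()` is an S3 generic, tagging the combined object with the `"census"` class is what lets a plain `plot(census)` call reach the custom method; the `as.data.frame()` calls strip that class again wherever ordinary data frame behavior is wanted.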

Learnings

Given that I am new to R, this was an ideal project for practicing with data frames and tibbles, which appear to be crucial components of data science in R. As a novice, I learned a lot about R, and I’ve tried to list the key lessons I took away from this project below:

Dplyr package: Although my homework gave me some practice with this package, the use cases in this project were more challenging, giving me more practical experience in subsetting, manipulating, and arranging the rows and columns of a tibble. This also included concatenating two tibbles to combine them into a single object.
Regex: I had to modify a tibble using regular expressions for a simple use case, which let me revisit the underlying concepts.
Categorical variables: Creating categorical variables in R using the %in% operator.
Functions: Creating functions in R and working with optional arguments. This was one of the major learning points of the project.
Object-oriented programming: Creating a custom plot function that is dispatched based on the class of a tibble.
Plotting: Got an introduction to making line graphs in R with the ggplot2 package.
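Several of the points above (regex, %in%, and optional arguments) can be combined in one small base R sketch. It assumes a hypothetical `area_name` column written in a "County, ST" or "State" style; the function `tag_areas()` and its arguments are illustrative names, and a plain data frame stands in for a tibble so that no packages are needed. The New England state list is only an example grouping, not necessarily one used in the project.

```r
# Hypothetical area names: counties as "County, ST", states as plain names.
areas <- c("Autauga, AL", "Alabama", "Maine", "York, ME")

# Example grouping used to build a categorical variable with %in%.
new_england <- c("Connecticut", "Maine", "Massachusetts",
                 "New Hampshire", "Rhode Island", "Vermont")

# `region` and `region_label` are optional: callers may omit them and
# the defaults below are used instead.
tag_areas <- function(area_name, region = new_england,
                      region_label = "New England") {
  # Regex: a name ending in ", XX" (two capital letters) is a county.
  is_county <- grepl(", [A-Z]{2}$", area_name)
  data.frame(
    area_name = area_name,
    level = ifelse(is_county, "county", "state"),
    # %in% gives TRUE/FALSE per element; ifelse() turns that into labels.
    region = factor(ifelse(area_name %in% region, region_label, "Other")),
    stringsAsFactors = FALSE
  )
}

tagged <- tag_areas(areas)  # defaults used
tagged_custom <- tag_areas(areas, region = "Alabama", region_label = "South")
```

Leaving `region` and `region_label` out of the call falls back to their defaults, which is the simplest way to make an argument optional in R; checking `missing()` inside the function body is an alternative when no sensible default exists.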

What I would do differently

As part of this project, we had to write a narrative outlining the steps involved in creating each function. Although this may be good programming practice, I feel that in real-life applications you rarely get to present a polished R Markdown document explaining each step you followed. Often the only way to explain your code to colleagues or clients is through inline code comments, which, in my opinion, are easier to write than a narrative and also clearer in their own way. So I would definitely put more focus on writing good inline comments.

Data Scientists versus Statisticians

Project 2 Review