The Journey So Far

Grant Hicks
3 min read · Feb 3, 2021

Since starting the Data Science Immersive course at General Assembly in December, I have learned a lot, with much more still ahead. The course is about halfway through, and it has been quite the experience. I have not done any formal classroom learning since college, only taking time here and there to study areas that interest me on my own. I’m glad I picked that up more when the pandemic hit, as it definitely helped me get ready for this course. I came into the course feeling comfortable working with Python, and I’m glad I did. The pace of the course is very fast: we cover many topics each week and bring them together through projects.

The first few weeks of the course were spent getting up to speed with Python and going over statistics topics such as probability and distributions. After that we moved on to regression and classification, and how to refine our models to improve their performance. We have also learned some HTML, web scraping, and working with APIs, and we have covered natural language processing, which I have found to be a tricky area and something I want to keep working on. We have since moved on to some advanced supervised learning topics, touched on Amazon Web Services and Tableau, and are currently learning to work with SQL.

Our first project had us looking at SAT and ACT data. I compared scores and participation rates across three groups of states: those that offered one of these tests for free, those that required one of the tests, and those that neither required a test nor offered one for free.

The second project involved working with a dataset from Kaggle about housing in Ames, Iowa. This project had us predicting the sale price of a house based on a large variety of features provided in the dataset. It was a good chance to apply some of the techniques we had covered for creating new features to help our predictions, and for narrowing the features down to the ones that best improved the performance of our models.

The most recent project gave us more freedom in the data that we used. It involved comparing two different subreddits on Reddit.com and building a model that could predict which of the two a post originated from. This was good practice with natural language processing, and I hope to reuse the script I developed for gathering subreddit posts later on to do some comparisons of my own and get more practice in this area.

I have found the most difficult aspect of the projects to be coming up with problem statements. I am feeling comfortable with the technical side of things, but this is something I need to get a better grasp on to make my projects better.

I am looking forward to the rest of the course, as we still have much to cover. There is more SQL to cover, more models to learn, and we will be working on developing portfolios. I am really looking forward to learning about neural networks, as well as getting to work on my capstone project! Once the course has ended, I plan to dive deeper into some of these areas through online courses or other means, and to keep practicing these skills through my own projects in areas that I find interesting.
