Google Summer of Code 2024


Information about GSoC

Overview

Google Summer of Code (GSoC) is a program that gives students the chance to contribute to over 150 of the world’s most important open source organizations. Since the program’s inception, more than 20,000 students from 116 countries have contributed to open source software projects.

I first discovered the Google Summer of Code program in October 2023 and, after some months of research, decided to try my hand at a first contribution to “data.table”. After a few contributions, I wrote a proposal and submitted it via Google’s portal. Some time later, I was told that my proposal had been selected, and here we are. Follow my progress on this blog, or see the code at: https://github.com/joshhwuu/gsoc-2024

More About “data.table”

“data.table” is an R package that extends base R’s “data.frame”, offering a high-performance alternative for handling large datasets. On top of this, “data.table” provides many syntax and feature enhancements, making it a vital package for many professionals in the data science community.

The package’s main advantages compared to base R’s “data.frame” are listed below:

  • Concise syntax: fast to type, fast to read
  • Fast speed
  • Memory efficient
  • Careful API lifecycle management
  • Active community
  • Feature rich and always improving

See more here:

GitHub: https://github.com/Rdatatable/data.table

CRAN: https://cran.r-project.org/web/packages/data.table/index.html

About My Project

This year I will be contributing to “data.table” by closing some of its hundreds of outstanding issues. My work will mostly consist of squashing bugs, writing documentation, improving existing features, and implementing new ones. My first goal is to resolve 10 or more small issues that improve the user experience, such as consistency fixes and clearer documentation. Then I will start working on larger issues that I expect to take much longer, around 60-80 hours. These consist of new features and large refactorings, which may spiral in difficulty.

Overall, my goal is to close all the issues listed in my proposal to the best of my ability, while working with the “data.table” community and learning more about R, C, open source software development, and professional communication.