Open-source software development is central to reproducible open science. As a Data Science Fellow at the UW eScience Institute, I contribute to many open-source software projects on GitHub. Here are a few recent ones:
Xarray is an open-source project and Python package that adds labels, in the form of dimensions, coordinates, and attributes, on top of raw NumPy-like arrays. This makes for a more intuitive, more concise, and less error-prone user experience. It is a fundamental piece of software for working with multi-dimensional gridded geoscience data!
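To illustrate what labeled dimensions, coordinates, and attributes buy you, here is a minimal sketch built on Xarray's `DataArray`. The dimension names (`year`, `lat`), coordinate values, and temperature numbers are made up for illustration, and the snippet assumes `numpy` and `xarray` are installed.

```python
import numpy as np
import xarray as xr

# Made-up example data: 2 years x 3 latitudes of temperatures
data = np.array([[10.0, 12.0, 14.0],
                 [11.0, 13.0, 15.0]])

# Wrap the raw array with named dimensions, coordinate labels, and metadata
temperature = xr.DataArray(
    data,
    dims=("year", "lat"),
    coords={"year": [2020, 2021], "lat": [-30.0, 0.0, 30.0]},
    attrs={"units": "degC"},
)

# Select by coordinate label instead of by integer position
equator = temperature.sel(lat=0.0)

# Reduce over a named dimension instead of a numeric axis
year_mean = temperature.mean(dim="year")

print(equator.values)    # temperatures at the equator for each year
print(year_mean.values)  # mean temperature per latitude
```

With plain NumPy the same operations would be `data[:, 1]` and `data.mean(axis=0)`, which only work if you remember which axis is which and which index corresponds to the equator.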
I'm a member of the Xarray Core team, helping to bring in grants to fund software developers, run workshops, and develop the Xarray Tutorial Website.
SlideRule is a cloud-processing framework for interactive analysis of NASA's ICESat-2 dataset (https://slideruleearth.io/web/). It makes working with a very large and complicated geospatial vector dataset intuitive and interactive.
Reproducible scientific computations require sophisticated dependency management. This project maintains default environments for geospatial workflows in Python, including machine-learning libraries like TensorFlow and PyTorch: https://github.com/pangeo-data/pangeo-docker-images
My first foray into cloud computing was setting up software to generate thousands of InSAR interferograms (a very computationally intensive task) on AWS Batch: https://github.com/scottyhq/dinosar