Making MATLAB projects reproducible and shareable
♻️

Dr David Wilby (he/him)
RSE Team, The University of Sheffield
rse.shef.ac.uk | davidwilby.dev

OpenFest Thurs 15th September 2022

Who am I?

And why should I be talking about this?

What is reproducibility?

How the Turing Way defines reproducible research

Image: The Turing Way

Why is reproducibility important for research?


  • Verification ✔️
  • Share learning 🧑‍🎓
  • Re-use ♻️
  • Longevity
  • Efficiency
  • Make code accessible 📖

What is software?

The Challenge of Software Reproducibility


  • Document required software, data and where to get them
  • Document steps to reproduce analysis
  • Create a well-defined computational “environment”
  • Make it easy!

(some) Principles of Reproducible Code

  • Organise your project 📑

Instead of

first_try.m
pretty.fig
stuff/
how%20torun.docx
data.csv
data_(01).csv
data.txt

Try ✔️

data/
docs/
figs/
output/
src/
    step01_download_data.m
    step02_clean_data.m
    step03_fit_model.m
    step04_plot_figs.m
README.md
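
As a rough sketch, the layout above could be created in one go from MATLAB. The project name `my_project` and the README text are placeholders, not part of the original talk.

```matlab
% Create the folder skeleton for a new project (folder names follow the layout above).
projectRoot = fullfile(pwd, 'my_project');   % hypothetical project name
folders = {'data', 'docs', 'figs', 'output', 'src'};

for k = 1:numel(folders)
    target = fullfile(projectRoot, folders{k});
    if ~isfolder(target)        % isfolder needs R2017b or later
        mkdir(target);          % also creates projectRoot if it is missing
    end
end

% Add a placeholder README so the project always has a starting point for docs.
readmePath = fullfile(projectRoot, 'README.md');
if ~isfile(readmePath)
    fid = fopen(readmePath, 'w');
    fprintf(fid, '# my_project\n\nDescribe what the code does and how to run it.\n');
    fclose(fid);
end
```

Keeping raw data, source and generated output in separate folders makes it obvious which files can be deleted and regenerated.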

(some) Principles of Reproducible Code

  • Share the code 🔬 (use a license!)
  • Share the data 📊 (license here too!)
  • Define dependencies & environment 🤓
  • Use version control 🐙
  • Document how to use the code 📋
  • Get colleagues to try it out! 🧑‍🤝‍🧑
  • Just try your best! 👍
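
One possible way to "define the environment" is to record the MATLAB release and installed products alongside the results. This is a minimal sketch; the output file name `environment.txt` is an assumption.

```matlab
% Record the MATLAB release, platform and installed products in a text file.
outFile = fullfile(pwd, 'environment.txt');   % hypothetical file name
products = ver;                               % struct array with Name, Version, Release, Date

fid = fopen(outFile, 'w');
fprintf(fid, 'MATLAB version: %s\n', version);
fprintf(fid, 'Platform: %s\n', computer);
fprintf(fid, 'Recorded: %s\n', char(datetime('today', 'Format', 'yyyy-MM-dd')));
fprintf(fid, '\nInstalled products:\n');
for k = 1:numel(products)
    fprintf(fid, '  %s %s %s\n', ...
        products(k).Name, products(k).Version, products(k).Release);
end
fclose(fid);
```

Committing this file with the code gives collaborators something concrete to compare their own installation against.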

MATLAB can be … challenging

  • 📦 No package manager (à la Python, R, Node etc.)
  • 🖥️ No virtual environments (sort of)
  • 🤓 Limited dependency discovery/management
  • Proprietary source code
  • 💸 MathWorks gets most of its income from industry, so has little incentive to develop open source tools and guidance
  • 🙃 Lots of weird binary file formats
  • 🧑‍💻 Doesn’t get a lot of attention from (research) software engineers
  • 😣 sigh…

So why bother?

Lots of researchers use MATLAB.

12,000 active MATLAB licenses at The University of Sheffield

Data from: Web of Science

What’s the solution?

Case study

Environment: MATLAB Projects

Aside: Jargon Blast 💥

MATLAB Projects

A tool for defining files, paths and dependencies within a project, helping to improve portability.
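
Projects are usually set up through the MATLAB desktop, but the same thing can be sketched with the project API (R2019a or later). The folder and project name below are placeholders.

```matlab
% Sketch: create a MATLAB Project around a folder and register its source code.
projectRoot = fullfile(pwd, 'my_project');      % hypothetical folder
srcDir = fullfile(projectRoot, 'src');
if ~isfolder(srcDir)
    mkdir(srcDir);
end

proj = matlab.project.createProject(projectRoot);
proj.Name = 'my_project';                       % hypothetical project name

addFolderIncludingChildFiles(proj, srcDir);     % track src/ and the files inside it
addPath(proj, srcDir);                          % src/ goes on the path while the project is open

% A collaborator would later reopen the project with:
%   proj = openProject(projectRoot);
```

Because the path is managed by the project, nobody needs to hard-code `addpath` calls that only work on one machine.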

MATLAB Toolboxes

A way of packaging up code as tools which will be used across your MATLAB installation. Kind of like ‘packages’ in other languages.
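
For illustration, toolboxes shared as `.mltbx` files can be listed and installed from the command line. The file name `MyTools.mltbx` is hypothetical, so the install call is left as a comment.

```matlab
% List toolboxes that were installed from .mltbx files in this MATLAB installation.
toolboxes = matlab.addons.toolbox.installedToolboxes;   % struct array: Name, Version, Guid
if isempty(toolboxes)
    disp('No .mltbx toolboxes installed.');
end
for k = 1:numel(toolboxes)
    fprintf('%s (version %s)\n', toolboxes(k).Name, toolboxes(k).Version);
end

% Installing a shared toolbox file would look like this
% ('MyTools.mltbx' is a hypothetical file name):
%   matlab.addons.toolbox.installToolbox('MyTools.mltbx');
```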

MATLAB Packages

Not really like packages in any other language.

A way of protecting namespaces, e.g. import MyPackage.MyClass.

In folders starting with + e.g. +MyPackage/
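
A small self-contained sketch of how a package folder behaves. The package `mypkg` and function `double_it` are invented for this example and written to a temporary folder.

```matlab
% Build a throwaway package in a temporary folder (all names are hypothetical).
root = tempname;                        % unique temporary folder path
pkgDir = fullfile(root, '+mypkg');      % the leading + marks a package folder
mkdir(pkgDir);

% Write a one-function file into the package: +mypkg/double_it.m
fid = fopen(fullfile(pkgDir, 'double_it.m'), 'w');
fprintf(fid, 'function y = double_it(x)\n    y = 2 * x;\nend\n');
fclose(fid);

addpath(root);   % add the PARENT of +mypkg to the path, not +mypkg itself

a = mypkg.double_it(21);    % call using the fully qualified name

import mypkg.double_it      % or import the name into the current scope
b = double_it(21);
```

The qualified name means a function called `double_it` elsewhere on the path cannot silently shadow this one.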

Dependency management

Dependency analyzer
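
Alongside the graphical Dependency Analyzer, a similar report can be produced in code with `matlab.codetools.requiredFilesAndProducts`. In this sketch the analysed file `demo_entry.m` is a throwaway written just for the example.

```matlab
% Write a tiny throwaway function so the example is self-contained.
workDir = tempname;
mkdir(workDir);
entryFile = fullfile(workDir, 'demo_entry.m');   % hypothetical entry point
fid = fopen(entryFile, 'w');
fprintf(fid, 'function s = demo_entry(x)\n    s = sum(x(:));\nend\n');
fclose(fid);

% Ask MATLAB which user files and which products the entry point needs.
[fileList, productList] = matlab.codetools.requiredFilesAndProducts(entryFile);

fprintf('Required files:\n');
for k = 1:numel(fileList)
    fprintf('  %s\n', fileList{k});
end

fprintf('Required products:\n');
for k = 1:numel(productList)
    fprintf('  %s (version %s)\n', productList(k).Name, productList(k).Version);
end
```

The analysis is static, so code reached only through `eval` or function names built at run time may be missed.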

Now what? 🤷

Either:

  • document how to obtain and install the dependencies

or

  • provide dependencies with the code, if small and if permitted
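
Either option can be backed up by a short check at the top of the main script. This is a sketch under assumptions: the `required` list, the error identifier and the `vendor` folder name are all placeholders.

```matlab
% Fail early, with a clear message, if a required product is not installed.
required = {'MATLAB'};   % hypothetical list, e.g. add 'Statistics and Machine Learning Toolbox'
installed = ver;                         % struct array of installed products
installedNames = {installed.Name};
missing = setdiff(required, installedNames);

if ~isempty(missing)
    error('myproject:missingProducts', ...
        'Missing required products: %s', strjoin(missing, ', '));
end

% If small, permitted dependencies are shipped with the code, put them on the path.
vendorDir = fullfile(pwd, 'vendor');     % hypothetical folder for bundled code
if isfolder(vendorDir)
    addpath(genpath(vendorDir));
end
```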

Documentation - readme files

🥡 MATLAB projects can be reproducible

Reproducibility Checklist

  • ☑️ organise project
  • ☑️ share your code and data
  • ☑️ document/capture dependencies
  • ☑️ write examples (documentation!)
  • ☑️ tutorials (also documentation!!)
  • ☑️ explain how to use the code! (documentation!!!)
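
Documentation can start inside the code itself: the comment block directly under a function signature is what `help` prints. The function below is an invented example, saved as `rescale_columns.m`.

```matlab
function Y = rescale_columns(X)
%RESCALE_COLUMNS Rescale each column of X to the range [0, 1].
%   Y = RESCALE_COLUMNS(X) subtracts each column's minimum and divides by
%   the column's range. Constant columns are returned as zeros.
%
%   Example:
%       Y = rescale_columns([1 10; 2 20; 3 30]);
%       % Y is [0 0; 0.5 0.5; 1 1]
%
%   See also MIN, MAX.

    lo = min(X, [], 1);            % column minima (1-by-n)
    span = max(X, [], 1) - lo;     % column ranges (1-by-n)
    span(span == 0) = 1;           % avoid dividing by zero for constant columns
    Y = (X - lo) ./ span;          % implicit expansion needs R2016b or later
end
```

Typing `help rescale_columns` then shows the description and the example, so users do not have to read the source to get started.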

Acknowledgements

Resources