Good practice:
How to suck less at
(research)
software engineering

Dr David Wilby (he/him)
RSE Team, The University of Sheffield
rse.shef.ac.uk | davidwilby.dev

Mon 13th February 2023

Firstly

Who am I?

And what am I doing here?

Previously: Scientist

Currently: Research Software Engineer

  • Developing research code
  • Educating researchers
  • Translating research into production software
  • Working with government
  • Advocating for quality research practice

Secondly

Who are you?


🚨 Audience participation 🚨

Go to menti.com and enter the code 6762 9150

Outline


  • Why do we need good practice?

  • Testing

  • Environment & Dependency Management

  • Git

  • GitHub

https://github.com/davidwilby/ResearchSoftwareMethods

https://bit.ly/RSEmethods

Why do we need good practice?

Can’t I just write some code?

Wikipedia: Replication Crisis

Credit: Anna Krystalli

Credit: Private Eye

Testing

Levels of testing

Smoke Testing: Very brief initial checks that ensure the basic requirements needed to run the project hold. If these fail, there is no point proceeding to further levels of testing until they are fixed.

Unit Testing: Individual units of a codebase are tested, e.g. functions or methods. The purpose is to validate that each unit of the software performs as designed.

Integration Testing: Individual units are combined and tested as a group. The purpose of this level of testing is to expose faults in the interaction between integrated units.

System Testing: A complete, integrated system is tested. The purpose of this test is to evaluate whether the system as a whole gives the correct outputs for given inputs.

Acceptance Testing: Evaluate the system’s compliance with the project requirements and assess whether it is acceptable for the purpose.

From: The Turing Way
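
As a rough illustration of the difference between the unit and integration levels, here is a minimal Python sketch. The functions `celsius_to_kelvin`, `mean` and `mean_kelvin` are hypothetical examples invented for this slide, not part of any real project, and the tests are written in the plain-`assert` style that pytest can collect.

```python
def celsius_to_kelvin(celsius):
    """Convert a temperature from degrees Celsius to kelvin."""
    return celsius + 273.15


def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def mean_kelvin(celsius_readings):
    """Combine the two units: convert every reading, then average."""
    return mean([celsius_to_kelvin(c) for c in celsius_readings])


# Unit tests: each function is checked on its own.
def test_celsius_to_kelvin():
    assert celsius_to_kelvin(0) == 273.15


def test_mean():
    assert mean([1, 2, 3]) == 2


# Integration test: the units are checked working together.
# A tolerance is used because floating-point sums are rarely exact.
def test_mean_kelvin():
    assert abs(mean_kelvin([0, 10]) - 278.15) < 1e-9
```

If the file were saved as, say, `test_temperature.py`, running `pytest` in that directory should discover and run all three tests.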

Test-driven development


  1. Write a test that fails


  2. Write code to make the test pass


  3. Refactor

flowchart TB
    classDef writecode fill:#7D7ABC,stroke-width:0px;
    classDef testspass fill:#23F0C7,stroke-width:0px;
    classDef testsfail fill:#EF767A,stroke-width:0px;
    subgraph id1 [<b>Code-driven development</b>]
        A((1. write test)) --> B((2. test<br>passes/fails))
        B-->|test passes|A
        B-->|test fails|C((3. Write only<br>enough code))
        C-->|test fails|C
    end
    subgraph id2 [<b>Refactoring</b>]
        C-->|test passes|D((4. Check all tests))
        D-->|tests pass|E((5. Refactor))
        E-->D
        D-->|Some tests fail|F((6. Update failing tests<br>Correct regressions))
        F-->D
    end
    class A,C,E,F writecode;
    class B testsfail;
    class D testspass;
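
The cycle above might look something like this in Python. This is only a sketch: `normalise` is a made-up function chosen for illustration, and it deliberately ignores edge cases (such as a largest value of zero) that a later test would drive out.

```python
# Step 1: write the test first. If it is run before step 2 exists,
# it fails with a NameError, which is the expected failing state.
def test_normalise():
    assert normalise([2, 4]) == [0.5, 1.0]
    assert normalise([]) == []


# Step 2: write only enough code to make the test pass.
def normalise(values):
    """Scale values so that the largest becomes 1.0."""
    if not values:
        return []
    largest = max(values)
    return [v / largest for v in values]


# Step 3: refactor, re-running the test after every change
# so that it acts as a safety net.
test_normalise()
```

The next behaviour to support (for example, an all-zero list) would start with a new failing test, and the loop repeats.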

Environment & Dependency Management

Python 🐍


flowchart TB
    subgraph Project 3
        A[fab:fa-python Python 3.10]
        A-->B[numpy 1.24.0]
        A-->C[Django 4.0]
        A-->D[TensorFlow 2.11]
    end
    subgraph Project 2
        E[fab:fa-python Python 3.8]
        E-->F[numpy 1.1.0]
        E-->G[matplotlib 3.1]
    end
    subgraph Project 1
        H[fab:fa-python Python 3.10]
        H-->I[numpy 1.1.0]
        H-->J[matplotlib 3.1]
    end
    K[fab:fa-python System Python]


pip freeze

numpy==1.1.0
matplotlib>3

requirements.txt

Tools: venv / pipenv / conda / poetry
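
To give a feel for what a line like `numpy==1.1.0` or `matplotlib>3` means to a tool, here is a toy Python sketch that parses the simplest kind of requirement and checks a version against it. The names `parse_requirement` and `is_satisfied` are hypothetical, and the version handling is a big simplification: real installers follow much richer rules (pre-releases, extras, environment markers and so on).

```python
import operator
import re

# Handles only the simplest lines, e.g. "numpy==1.1.0" or "matplotlib>3".
_REQUIREMENT = re.compile(
    r"^\s*([A-Za-z0-9_.-]+)\s*(==|>=|<=|>|<)\s*(\d+(?:\.\d+)*)\s*$"
)
_COMPARE = {
    "==": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_requirement(line):
    """Split 'numpy==1.1.0' into ('numpy', '==', (1, 1, 0))."""
    match = _REQUIREMENT.match(line)
    if match is None:
        raise ValueError(f"unsupported requirement line: {line!r}")
    name, op, version = match.groups()
    return name, op, tuple(int(part) for part in version.split("."))


def is_satisfied(installed, op, required):
    """Compare version tuples, padding with zeros so that 3 equals 3.0."""
    width = max(len(installed), len(required))
    installed = installed + (0,) * (width - len(installed))
    required = required + (0,) * (width - len(required))
    return _COMPARE[op](installed, required)


name, op, required = parse_requirement("matplotlib>3")
print(name, is_satisfied((3, 1), op, required))
```

In practice you would let `pip` (or conda, poetry, etc.) do this work inside an isolated environment; the point is only that a pinned file makes the environment reproducible.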

Control yourself

Why you need version control

+

getting your head round it.

What makes a version control system?


  • 📸 Snapshot current version
  • 🏷️ Name specific versions
  • ↩️ Revert to a particular version


Perhaps

  • 📚 Compare and merge versions
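
The core operations listed above (snapshot, name, revert) can be sketched in a few lines of Python. `TinyVersionStore` is an invented toy that keeps everything in memory; it is not how Git or any real system stores data.

```python
import copy


class TinyVersionStore:
    """Toy in-memory version store: snapshot, tag, revert."""

    def __init__(self):
        self.snapshots = []  # every saved state, oldest first
        self.tags = {}       # human-readable name -> snapshot index

    def snapshot(self, files):
        """Save a copy of the current files and return its index."""
        self.snapshots.append(copy.deepcopy(files))
        return len(self.snapshots) - 1

    def tag(self, name, index):
        """Give a specific snapshot a memorable name."""
        self.tags[name] = index

    def revert(self, name_or_index):
        """Return a copy of the files as they were at that snapshot."""
        index = self.tags.get(name_or_index, name_or_index)
        return copy.deepcopy(self.snapshots[index])


store = TinyVersionStore()
files = {"analysis.py": "print('v1')"}
first = store.snapshot(files)
store.tag("submitted-paper", first)

files["analysis.py"] = "print('v2, broken')"
store.snapshot(files)

# Go back to the version that was tagged earlier.
files = store.revert("submitted-paper")
```

Comparing and merging versions is the hard part, which is one reason to use an existing tool rather than rolling your own.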

Benefits of version control

Git (local) 💻

  • Protect against breaking everything
  • Keep at least one working version of the code
  • Snapshot your progress

GitHub/GitLab (remote) 🌐

  • Work collaboratively
  • Share code easily
  • Remote backup

Without version control


😕 Make changes by making a copy of the entire codebase.


😐 Merging is a manual process.


😨 Lose track of which version contains what functionality.


😭 Collaborating is just emailing zip files and crying.


Version Control == Git

More often than not

How does Git work?

%%{init: { 'logLevel': 'debug', 'theme': 'base', 'gitGraph': {'showCommitLabel': true, 'rotateCommitLabel': true}} }%%
gitGraph
    commit id: "commit 1"
    commit id: "commit 2"
    commit id: "commit 3"
    commit id: "commit 4"
    commit id: "commit 5"
    commit id: "commit 6"
    commit id: "commit 7"

The most important concept in Git is the commit: a recorded unit of changes. The word is also used as a verb, for the act of recording one.


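Commits are shown as changes

Each commit is presented as the changes made since the previous one.

Under the hood, Git stores a snapshot of the whole project with each commit, so any earlier state can be recreated.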

Demo

https://onlywei.github.io/explain-git-with-d3

or

https://bit.ly/git-sandbox

or

Making a commit

    flowchart TB
        A(fa:fa-pen fa:fa-file-code Edit file) --> B
        B(fa:fa-download Save) --> C
        C(fa:fa-plus <strong>Stage</strong> changes) --> D(fa:fa-check Commit)

The commit hash


Git generates a hash string, uniquely identifying each commit.

Git uses a “Merkle tree” under the hood. (Don’t ask me how it works, I have no idea 🤷)


Hashes look like:

d3dd03f493707256c8528bc83ad280a460f05a56


But are most often seen as the first 7 characters, as this is easier to read/type and is normally enough to identify the commit.

d3dd03f
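
For the curious, the hash is derived from the content being stored. The sketch below reproduces the scheme Git uses for its objects in repositories that use SHA-1 (the long-standing default): a short header naming the object type and size, a zero byte, then the content. The function name `git_object_id` is made up for this example. A commit is hashed the same way, with content that includes the project tree, the parent commit hash(es), the author and the message, which is why changing any of those changes the hash.

```python
import hashlib


def git_object_id(kind, content):
    """Return the SHA-1 ID for a Git object of the given kind.

    kind is e.g. "blob" (file contents) or "commit"; content is bytes.
    """
    header = f"{kind} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The ID of a file containing the text "hello world" and a newline.
blob_id = git_object_id("blob", b"hello world\n")

# The abbreviated form usually shown in logs.
short_id = blob_id[:7]
print(blob_id, short_id)
```

The result should match what `git hash-object` reports for a file with the same contents.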

The commit message

Each commit has a message associated with it.


Summary/Title: <50-72 characters

Displayed most frequently.


Detailed description: no character limit.

Can be used to capture more detail. Not used that often.


This commit will…

  • ❌ some stuff
  • ❌ code
  • ❌ updates
  • βœ”οΈ add new module β€œrenderers”
  • βœ”οΈ update README with new install instructions
  • βœ”οΈ fix bug #17 with package update
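
These guidelines are simple enough to check automatically. Below is a small, hypothetical checker in Python: `check_commit_message` and its list of vague summaries are invented for illustration, and the blank-line rule reflects the common Git convention of separating the summary from the description.

```python
# The vague example summaries from the list above.
VAGUE_SUMMARIES = {"some stuff", "code", "updates"}


def check_commit_message(message, max_summary_length=72):
    """Return a list of problems found in a commit message (empty if none)."""
    problems = []
    lines = message.splitlines()
    summary = lines[0].strip() if lines else ""

    if not summary:
        problems.append("summary line is empty")
    if len(summary) > max_summary_length:
        problems.append(f"summary is longer than {max_summary_length} characters")
    if summary.lower() in VAGUE_SUMMARIES:
        problems.append("summary does not say what changed")
    if len(lines) > 1 and lines[1].strip():
        problems.append("leave a blank line between the summary and the description")
    return problems


print(check_commit_message("updates"))
print(check_commit_message("fix bug #17 with package update"))
```

A check like this could be wired into a Git hook, though agreeing the convention with your collaborators matters more than enforcing it.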

Branches

Used to work on new features/changes/additions to the code.

%%{init: { 'logLevel': 'debug', 'theme': 'base'} }%%
gitGraph
    commit id:"8bc2520"
    commit id:"2a70480"
    branch experiment
    commit id:"089e06b"
    commit id:"bec84f4"
    commit id:"2420edd"

git branch experiment
git checkout experiment
git commit
git commit
git commit

Checkout: switching to a different branch.
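
One way to think about branches is as names that point at a commit and move forward as you commit. The toy model below mirrors the commands above; `TinyRepo` and its commit IDs (`c1`, `c2`, ...) are invented for illustration and leave out almost everything real Git does.

```python
class TinyRepo:
    """Toy model: commits know their parent, branches point at commits."""

    def __init__(self):
        self.parents = {}               # commit id -> parent id (None for the first)
        self.branches = {"main": None}  # branch name -> id of its latest commit
        self.current = "main"           # the branch that is checked out
        self._count = 0

    def commit(self):
        """Add a commit on the current branch and move that branch forward."""
        self._count += 1
        commit_id = f"c{self._count}"
        self.parents[commit_id] = self.branches[self.current]
        self.branches[self.current] = commit_id
        return commit_id

    def branch(self, name):
        """Create a branch at the current commit (without switching to it)."""
        self.branches[name] = self.branches[self.current]

    def checkout(self, name):
        """Switch to a different branch."""
        if name not in self.branches:
            raise KeyError(f"no such branch: {name}")
        self.current = name


repo = TinyRepo()
repo.commit()                  # c1 on main
repo.commit()                  # c2 on main
repo.branch("experiment")      # like: git branch experiment
repo.checkout("experiment")    # like: git checkout experiment
repo.commit()                  # c3, c4 and c5 land on experiment only
repo.commit()
repo.commit()
print(repo.branches)
```

Note that `main` stays where it was while `experiment` moves ahead, which is what makes a branch a safe place to try things out.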

Merging


Combine changes from two branches.


%%{init: { 'logLevel': 'debug', 'theme': 'base'} }%%
gitGraph
    commit id:"8bc2520"
    commit id:"2a70480"
    branch experiment
    commit id:"089e06b"
    commit id:"bec84f4"
    commit id:"2420edd"
    checkout main
    merge experiment
    commit id:"60489ec"

git checkout main
git merge experiment
git commit
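
A merge like the one in the diagram can be pictured as a commit with two parents. The Python sketch below records the diagram's history as a dictionary and walks it; the `ancestors` helper and the placeholder ID `"merge"` are assumptions made for this example.

```python
def ancestors(parents, commit_id):
    """Return every commit reachable from commit_id, including itself."""
    seen = set()
    stack = [commit_id]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


# Commit id -> list of parent ids, using the ids from the diagram.
parents = {
    "8bc2520": [],
    "2a70480": ["8bc2520"],
    "089e06b": ["2a70480"],
    "bec84f4": ["089e06b"],
    "2420edd": ["bec84f4"],
}

# The merge commit has two parents: the tip of main and the tip of
# experiment. "merge" is a placeholder id, not a real hash.
parents["merge"] = ["2a70480", "2420edd"]
parents["60489ec"] = ["merge"]

# After the merge, main's history includes the experiment commits.
main_history = ancestors(parents, "60489ec")
print(sorted(main_history))
```

In real Git, if `main` has not moved since the branch was created, `git merge` will by default simply fast-forward `main` to the branch tip rather than creating a merge commit; `git merge --no-ff` asks for a merge commit regardless.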

Repositories (Repos)


Once a directory/folder is initialised with git it becomes a repository.



directory

.
├── src/
├── LICENSE.md
└── README.md

git init -->

repository

.
├── .git/
├── src/
├── LICENSE.md
└── README.md
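
In other words, the presence of `.git` is what marks a directory as a repository. The Python sketch below looks for it by walking up from a starting directory, roughly how tools work out which repository a file belongs to. `find_repository_root` is a hypothetical helper, and the demo creates an empty `.git` directory as a stand-in; a real `git init` fills it with Git's own files.

```python
import tempfile
from pathlib import Path


def find_repository_root(start):
    """Return the nearest directory at or above start that contains .git.

    Returns None if no such directory is found.
    """
    start = Path(start).resolve()
    for directory in [start, *start.parents]:
        if (directory / ".git").exists():
            return directory
    return None


# Demo in a throwaway directory, so nothing on disk is left behind.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp).resolve()
    (project / "src").mkdir()
    (project / ".git").mkdir()  # stand-in for what `git init` creates

    found = find_repository_root(project / "src")
    print(found == project)
```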

How to interact with Git


command line git

via a Unix shell (or Git Bash/WSL on Windows)

$ git add README.md
$ git commit -m 'initial commit'
$ git status


Git learning resources


Remember

Learning Git is a process.

Everyone makes mistakes.

Git vs GitHub or GitLab


Git

  • Local client for source code management
  • Interacting with remote git servers

GitHub/GitLab

  • Code hosting
  • Collaboration
  • OSS contribution
  • Project management
  • Automated workflows/continuous delivery

Repositories

Issues

Projects/Kanban Board

Continuous integration/Automated testing

Great resources