Good practice:
How to suck less at
software engineering

Dr David Wilby (he/him)
RSE Team, The University of Sheffield

Mon 13th February 2023


Who am I?

And what am I doing here?

Previously: Scientist

Currently: Research Software Engineer

  • Developing research code
  • Educating researchers
  • Translating research into production software
  • Working with government
  • Advocating for quality research practice


Who are you?

🚨 Audience participation 🚨

Go to and enter the code 6762 9150


  • Why do we need good practice?

  • Testing

  • Environment & Dependency Management

  • Git

  • GitHub

Why do we need good practice?

Can’t I just write some code?

Wikipedia: Replication Crisis

Credit: Anna Krystalli

Credit: Private Eye


Levels of testing

Smoke Testing: Very brief initial checks that ensure the basic requirements needed to run the project hold. If these fail, there is no point proceeding to further levels of testing until they are fixed.

Unit Testing: Individual units of a codebase are tested, e.g. functions or methods. The purpose is to validate that each unit of the software performs as designed.

Integration Testing: Individual units are combined and tested as a group. The purpose of this level of testing is to expose faults in the interaction between integrated units.

System Testing: A complete, integrated system is tested. The purpose of this test is to evaluate whether the system as a whole gives the correct outputs for given inputs.

Acceptance Testing: Evaluate the system’s compliance with the project requirements and assess whether it is acceptable for the purpose.

From: The Turing Way
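As a minimal sketch of the unit-testing level: the function `mean` and its tests below are hypothetical examples, not taken from the talk or from The Turing Way. The tests are written as plain functions containing assertions, which is the style a runner such as pytest collects.

```python
# Hypothetical unit under test: a small, pure function.
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


# Unit tests: each one checks a single behaviour of the unit in isolation.
def test_mean_of_known_values():
    assert mean([1, 2, 3, 4]) == 2.5


def test_mean_rejects_empty_input():
    try:
        mean([])
    except ValueError:
        pass  # expected: empty input is an error
    else:
        raise AssertionError("mean([]) should raise ValueError")


if __name__ == "__main__":
    test_mean_of_known_values()
    test_mean_rejects_empty_input()
    print("all tests passed")
```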

Test-driven development

  1. Write a test that fails

  2. Write code to make the test pass

  3. Refactor

flowchart TB
    classDef writecode fill:#7D7ABC,stroke-width:0px;
    classDef testspass fill:#23F0C7,stroke-width:0px;
    classDef testsfail fill:#EF767A,stroke-width:0px;
    subgraph id1 [<b>Code-driven development</b>]
        A((1. Write test)) --> B((2. Test<br>passes/fails))
        B-->|test passes|A
        B-->|test fails|C((3. Write only<br>enough code))
        C-->|test fails|C
    end
    subgraph id2 [<b>Refactoring</b>]
        C-->|test passes|D((4. Check all tests))
        D-->|tests pass|E((5. Refactor))
        D-->|Some tests fail|F((6. Update failing tests<br>Correct regressions))
    end
    class A,C,E,F writecode;
    class B testsfail;
    class D testspass;
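The test-driven cycle can be sketched in Python. The function `slugify` and its test are hypothetical examples, not from the talk; the comments mark which step of the cycle each piece belongs to.

```python
import re


# Step 1 (red): the test is written first. Run before slugify() exists,
# it fails with a NameError.
def test_slugify():
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Good   practice!  ") == "good-practice"


# Step 2 (green): write only enough code to make the test pass.
# Step 3 (refactor): tidy the implementation (here, one regular expression
# instead of chained string replacements) and re-run the test.
def slugify(text):
    """Lower-case text and join its alphanumeric words with hyphens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)


test_slugify()
```

Because the test already exists when the refactoring happens, any regression introduced while tidying up shows up immediately.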

Environment & Dependency Management

Python 🐍

flowchart TB
    subgraph Project 3
        A[fab:fa-python Python 3.10]
        A-->B[numpy 1.24.0]
        A-->C[Django 4.0]
        A-->D[TensorFlow 2.11]
    end
    subgraph Project 2
        E[fab:fa-python Python 3.8]
        E-->F[numpy 1.1.0]
        E-->G[matplotlib 3.1]
    end
    subgraph Project 1
        H[fab:fa-python Python 3.10]
        H-->I[numpy 1.1.0]
        H-->J[matplotlib 3.1]
    end
    K[fab:fa-python System Python]

pip freeze



Tools: venv / pipenv / conda / poetry
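A rough sketch of what these tools do, using only the Python standard library. `create_environment` wraps the same machinery as `python -m venv`, and `freeze` produces `name==version` lines similar to (but not identical to) the output of `pip freeze`. Both function names are made up for this example.

```python
import venv
from importlib import metadata
from pathlib import Path


def create_environment(env_dir):
    """Create an isolated environment, much like `python -m venv env_dir`."""
    venv.create(env_dir, with_pip=True)
    return Path(env_dir)


def freeze():
    """Return sorted 'name==version' pins for the packages installed here."""
    pins = set()
    for dist in metadata.distributions():
        try:
            name = dist.metadata.get("Name")
        except Exception:
            continue  # skip distributions whose metadata cannot be read
        if name:
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Pins like these are what a requirements.txt file usually contains.
    print("\n".join(freeze()))
```

Recording exact versions per project is what lets each project in the diagram keep its own numpy without disturbing the others.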

Control yourself

Why you need version control


getting your head round it.

What makes a version control system?

  • 📸 Snapshot current version
  • 🏷️ Name specific versions
  • ↩️ Revert back to a particular version
  • 📚 Compare and merge versions

Benefits of version control

Git (local) 💻

  • Protect against breaking everything
  • Keep at least one working version of the code
  • Snapshot your progress

GitHub/GitLab (remote) 🌐

  • Work collaboratively
  • Share code easily
  • Remote backup

Without version control

😕 Make changes by making a copy of the entire codebase.

😐 Merging is a manual process.

😨 Lose track of which version contains what functionality.

😭 Collaborating is just emailing zip files and crying.

Version Control == Git

More often than not

How does Git work?

%%{init: { 'logLevel': 'debug', 'theme': 'base', 'gitGraph': {'showCommitLabel': true, 'rotateCommitLabel': true}} }%%
gitGraph
    commit id: "commit 1"
    commit id: "commit 2"
    commit id: "commit 3"
    commit id: "commit 4"
    commit id: "commit 5"
    commit id: "commit 6"
    commit id: "commit 7"

The most important concept in Git is the commit: the name given both to a recorded unit of changes and to the act of recording it.

Commits are viewed as changes

Each commit is usually shown as the difference from its parent.

Under the hood, Git stores a snapshot of the project for each commit, so any earlier state can be recreated.




Making a commit

    flowchart TB
        A(fa:fa-pen fa:fa-file-code Edit file) --> B
        B(fa:fa-download Save) --> C
        C(fa:fa-plus <strong>Stage</strong> changes) --> D(fa:fa-check Commit)

The commit hash

Git generates a hash string, uniquely identifying each commit.

Git uses a “Merkle tree” under the hood. (Don’t ask me how it works, I have no idea 🤷)

Hashes look like:


But are most often seen as the first 7 characters, as this is easier to read/type and is normally enough to identify the commit.
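To make the idea of a hash concrete, the sketch below reproduces the identifier Git assigns to a file's contents (a "blob" object), assuming Git's default SHA-1 object format. Commit hashes are computed in the same way over the commit's own text (tree, parents, author, message), so this is an illustration of the mechanism and not a commit hash itself. The function name `git_blob_hash` is made up for this example.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 object ID Git gives a file with these contents."""
    # Git hashes a short header ("blob <size>" plus a NUL byte) followed
    # by the raw file contents.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


full_hash = git_blob_hash(b"")  # an empty file
print(full_hash)      # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(full_hash[:7])  # e69de29, the abbreviated form
```

The same input always gives the same 40-character hash, and any change to the input gives a different one.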


The commit message

Each commit has a message associated with it.

Summary/Title: <50-72 characters

Displayed most frequently.

Detailed description: no character limit.

Can be used to capture more detail. Not used that often.

This commit will…

  • ❌ some stuff
  • ❌ code
  • ❌ updates
  • ✔️ add new module “renderers”
  • ✔️ update README with new install instructions
  • ✔️ fix bug #17 with package update
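The guidance on commit messages could be checked automatically. This is a minimal sketch, assuming a hypothetical helper `check_commit_message` and an illustrative set of vague summaries taken from the ❌ examples; the 72-character limit is the upper end of the range quoted for the summary line.

```python
MAX_SUMMARY_LENGTH = 72  # upper end of the 50-72 character guideline
VAGUE_SUMMARIES = {"some stuff", "code", "updates"}  # illustrative only


def check_commit_message(message):
    """Return a list of problems found in a commit message (empty if none)."""
    lines = message.splitlines()
    summary = lines[0].strip() if lines else ""
    problems = []
    if not summary:
        problems.append("summary line is empty")
    if len(summary) > MAX_SUMMARY_LENGTH:
        problems.append(f"summary is longer than {MAX_SUMMARY_LENGTH} characters")
    if summary.lower() in VAGUE_SUMMARIES:
        problems.append("summary does not say what the commit does")
    return problems


print(check_commit_message("updates"))
print(check_commit_message("fix bug #17 with package update"))
```

A check like this only catches the obvious cases; whether a summary is actually informative still needs a human reader.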


Branches

Used to work on new features/changes/additions to the code.

%%{init: { 'logLevel': 'debug', 'theme': 'base'} }%%
gitGraph
    commit id:"8bc2520"
    commit id:"2a70480"
    branch experiment
    commit id:"089e06b"
    commit id:"bec84f4"
    commit id:"2420edd"

git branch experiment
git checkout experiment
git commit
git commit
git commit

Checkout: switching to a different branch.


Merging

Combine changes from two branches.

%%{init: { 'logLevel': 'debug', 'theme': 'base'} }%%
gitGraph
    commit id:"8bc2520"
    commit id:"2a70480"
    branch experiment
    commit id:"089e06b"
    commit id:"bec84f4"
    commit id:"2420edd"
    checkout main
    merge experiment
    commit id:"60489ec"

git checkout main
git merge experiment
git commit
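A toy model can make branching and merging more concrete. It is not how Git is implemented and it ignores file contents entirely; it only captures two ideas: a branch is a name pointing at a commit, and a merge commit is a commit with two parents. The class and method names are invented for illustration, and the commit IDs are the ones from the diagrams.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Toy commit graph: commit id -> parent ids, plus branch pointers."""
    parents: dict = field(default_factory=dict)
    branches: dict = field(default_factory=lambda: {"main": None})
    head: str = "main"  # name of the checked-out branch

    def commit(self, commit_id):
        tip = self.branches[self.head]
        self.parents[commit_id] = [tip] if tip else []
        self.branches[self.head] = commit_id  # move the branch forward

    def branch(self, name):
        # A new branch is a new pointer to the current commit.
        self.branches[name] = self.branches[self.head]

    def checkout(self, name):
        self.head = name

    def merge(self, other, commit_id):
        # A merge commit has two parents: the tips of both branches.
        self.parents[commit_id] = [self.branches[self.head], self.branches[other]]
        self.branches[self.head] = commit_id


repo = ToyRepo()
repo.commit("8bc2520")
repo.commit("2a70480")
repo.branch("experiment")
repo.checkout("experiment")
for commit_id in ("089e06b", "bec84f4", "2420edd"):
    repo.commit(commit_id)
repo.checkout("main")
repo.merge("experiment", "merge")
print(repo.parents["merge"])  # ['2a70480', '2420edd']
```

In this particular history real Git may simply fast-forward `main` to the tip of `experiment` instead of creating a merge commit, because `main` has not moved since the branch was made.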

Repositories (Repos)

Once a directory/folder is initialised with Git, it becomes a repository.


├── src/

git init --->


├── .git/
├── src/
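The only visible difference after `git init` is the new `.git/` entry, so a simple way to tell whether a directory is the top level of a repository is to look for it. A small sketch, with a hypothetical helper name and an empty `.git` directory used as a stand-in for what `git init` would create:

```python
import tempfile
from pathlib import Path


def is_git_repository(directory):
    """Return True if the directory has a .git entry at its top level."""
    return (Path(directory) / ".git").exists()


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "src").mkdir()
    before = is_git_repository(project)  # plain directory
    (project / ".git").mkdir()  # stand-in for running `git init`
    after = is_git_repository(project)
    print(before, after)
```

Everything Git knows about the project's history lives inside `.git/`; the working files such as `src/` are left untouched.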

How to interact with Git

Command-line Git

via a Unix shell (or Git Bash/WSL on Windows)

$ git add .
$ git commit -m 'initial commit'
$ git status

Git learning resources


Learning Git is a process.

Everyone makes mistakes.

Git vs GitHub or GitLab


Git

  • Local client for source code management
  • Interacting with remote Git servers

GitHub/GitLab

  • Code hosting
  • Collaboration
  • OSS contribution
  • Project management
  • Automated workflows/continuous delivery



Projects/Kanban Board

Continuous integration/Automated testing

Great resources