Introduction

In this post I will share some ideas for fixing Python code style in repositories that have not used code formatters so far. Without formatters, file contents can be difficult to read, especially when many people have worked on them. I will show how to find the files modified in the current working branch with git ls-files and git diff. Then I will explain how to run several tools, such as the formatter black and the linter flake8, on those files with xargs. Although xargs comes from Unix-like systems, Windows users can use it as well.

TLDR: If you just want to see the solution without reading the whole article, scroll down to the Summary section.

Why it’s not that easy to clean all the code in a repository

When you join a new project, chances are that the style of the existing code leaves a lot to be desired. In Python, there may be issues such as inconsistent string quoting (single and double quotes used interchangeably throughout the code), or imports that do not follow the ordering recommended by PEP 8.

If you care about the small things that make code more readable, you have probably heard of tools like black and isort. You can use them to format code and sort imports. They can be run on a single file or recursively on all files in a directory.

So what can you do if you want to fix only the files you changed while developing a new feature? Well, there are a few options:

  1. Run a tool such as black on every file you have touched. This is quite tedious, and it gets worse when you want to use more tools this way, such as flake8.
  2. Run the tools in batch mode to process all source files at once. If you work on a large repository, your pull request may become very large and, as a result, unclear. For example, if you ran isort and it changed 200 files, how would a reviewer tell the files you actually modified for your task from those that only had their imports sorted?
  3. Open a dedicated pull request, e.g. “Black and isort”, to fix all the code. That way, formatting changes in a few hundred files will not get mixed up with your actual task, and after the merge you work with clean code. There are some downsides, though. You may cause merge conflicts: the bigger the repository, the higher the chance that your cleanups touch code your colleagues are working on. Additionally, flake8 will report issues for every file, and fixing them all may take too long. The pull request may also become so large that nobody wants to approve it. A full cleanup does resolve all code style problems at once, but if you have doubts, proceed to the next point.
  4. Ask git to list the files you modified, then run the code style tools on that list so that you clean those files and only those. Use the feedback from flake8 to make improvements before you open a pull request. This can be done quickly thanks to xargs.

Before I explain how to use xargs, let’s see how to make git output the modified files. That output will be the input for xargs.

Find your modified code with git ls-files

You may already know the git status command. It is one of the most basic commands and describes what has happened in your repository since your last commit. For example:

$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

    modified:   other/formatters.py
    modified:   other/helpers.py

no changes added to commit (use "git add" and/or "git commit -a")

It’s very descriptive and tells us much more than just which files we changed. As a result, it is not easy for scripts to process. To get the file paths, and only the file paths, let’s use git ls-files:

$ git ls-files -m
other/formatters.py
other/helpers.py

Looks better, doesn’t it? The -m flag stands for modified files.

The downside is that this command shows only tracked files whose changes have not been added to the git index yet. It means you won’t see new (untracked) files, staged files, or files changed in earlier commits. If you want to clean up everything related to the branch you started for your task, you need a way to find all changes in the current branch.
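If you also want new and staged files in that list, one possible approach is to combine two git commands. This is a minimal sketch, assuming the repository already has at least one commit and that you run it from the repository root. git diff --name-only HEAD lists tracked files that differ from the last commit (staged or not), --diff-filter=d leaves out deleted files, and git ls-files -o --exclude-standard lists untracked files that are not ignored. The function name worktree_changes is made up for this example.

```shell
# Hypothetical helper: every file changed since the last commit,
# including staged and brand-new (untracked) ones.
worktree_changes() {
    {
        # Tracked files that differ from HEAD, staged or not; skip deleted files.
        git diff --name-only --diff-filter=d HEAD
        # Untracked files that are not covered by .gitignore.
        git ls-files -o --exclude-standard
    } | sort -u   # one path per line, no duplicates
}
```

Its output can be piped to xargs in the same way as the output of git ls-files -m.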

Find all files you modified in the current branch with git diff

Suppose you want to fix and analyze the code after you have finished all your changes, which were spread over many commits. If you want to run all the code analysis just before opening the pull request, you will find git diff more helpful:

$ git diff --name-only master...test-branch
other/formatters.py
other/helpers.py

Use the notation source-branch...actual-branch to find all changes made on actual-branch since the last common commit of source-branch and actual-branch. In other words, it finds all changes made since you started actual-branch. Take a look at the image below to better see the difference:

[Image: git diff with three dots (...)]

Don’t forget about the order of the branches: if you swap them, you will see the changes made on source-branch (e.g. staging or master) since actual-branch was created.
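To see the difference on a concrete history, here is a small sketch that builds a throwaway repository and runs both forms. The file names (base.py, feature.py, hotfix.py) and the commit layout are made up for the demonstration; only the branch names master and test-branch come from the examples above. It also spells out the three-dot form with git merge-base, which finds the last common commit of the two branches.

```shell
# Builds a temporary repository and compares two-dot and three-dot diffs.
# The function body runs in a subshell, so your current directory is untouched.
demo_three_dots() (
    set -e
    cd "$(mktemp -d)"
    git init -q
    git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
    git config user.email demo@example.com
    git config user.name Demo

    echo 1 > base.py;    git add base.py;    git commit -q -m "base"
    git checkout -q -b test-branch
    echo 2 > feature.py; git add feature.py; git commit -q -m "feature"
    git checkout -q master
    echo 3 > hotfix.py;  git add hotfix.py;  git commit -q -m "hotfix on master"

    echo "three dots:"
    git diff --name-only master...test-branch
    echo "merge-base spelled out:"
    git diff --name-only "$(git merge-base master test-branch)" test-branch
    echo "two dots:"
    git diff --name-only master..test-branch
)

demo_three_dots
```

The three-dot form and the merge-base form both list only feature.py. The two-dot form also lists hotfix.py, because that file exists on master but not on test-branch.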

Clean the code with xargs

Now we know how to find modified files. We can use the xargs command to pass them to flake8 (by default, xargs appends all the paths to a single flake8 call):

$ git diff --name-only master...test-branch | xargs flake8

That is enough if you want to use only one tool. If you want more than one, chain them into a single command line and execute it with sh -c:

$ git diff --name-only master...test-branch | xargs -t -I % sh -c 'black %; isort %; flake8 %'

The -t flag prints each command with substituted values (e.g. sh -c black other/foo.py; isort other/foo.py; flake8 other/foo.py) before running it, once for every input path passed to xargs. You can omit that flag. The -I option introduces a placeholder and makes xargs run the command once per input line. In this example, we use the % sign as the placeholder for paths.
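One caveat: with sh -c 'black %', the path is pasted directly into the script text, so a path containing a space is split into two arguments. A possible variation, shown here only as a sketch, hands the path to sh as the positional parameter $1 and quotes it. In the example, echo stands in for the real tools so it runs without them installed, the sample paths replace the git output, and the function name run_tools is made up.

```shell
# The path arrives as $1 instead of being substituted into the script text.
# The underscore fills $0, so the path becomes $1.
run_tools() {
    xargs -I % sh -c 'echo "black $1"; echo "isort $1"; echo "flake8 $1"' _ %
}

# Sample input in place of git diff --name-only; the second path contains a space.
printf '%s\n' 'other/helpers.py' 'other/my module.py' | run_tools
```

To use it for real, drop the echo calls so the script reads 'black "$1"; isort "$1"; flake8 "$1"'.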

If you find this command useful, consider creating an alias so that you do not have to type it again and again.
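Since the command takes a branch name, a shell function may be more convenient than a plain alias. Here is a minimal sketch you could put in ~/.bashrc. The name pyclean and the default base branch master are my assumptions, and it compares against HEAD instead of a named branch. It also skips deleted files (--diff-filter=d), keeps only *.py paths, and passes each path to sh as a quoted "$1" so that paths with spaces survive.

```shell
# Hypothetical helper. Usage: pyclean [base-branch]   (defaults to master)
# The body runs in a subshell, so the variable base does not leak into your session.
pyclean() (
    base="${1:-master}"
    git diff --name-only --diff-filter=d "$base"...HEAD |
        grep '\.py$' |
        xargs -t -I % sh -c 'black "$1"; isort "$1"; flake8 "$1"' _ %
)
```

After reloading the shell, run pyclean on your feature branch, or pyclean staging if you branched from staging.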

For Windows users

If you code on Windows, there is no xargs by default. But with Git for Windows installed, you probably have a MINGW64 terminal (Git Bash). It provides not only git commands but also several tools shared with Linux systems, including xargs. There is one inconvenience, though. Executing sh -c seems to ignore venv settings, so you need to do one of two things:

  • install code linters and formatters globally (not recommended)
  • add one extra command that activates your venv before the formatters run (adjust the path to match the location of your venv):

$ git diff --name-only master...test-branch | xargs -t -I % sh -c '. venv/Scripts/activate; black %; isort %; flake8 %'

If you know the reason why MINGW64 doesn’t keep virtual environment information when executing xargs sh -c, please let me know in a comment.

TLDR / Summary

For Linux users. Fix currently modified files:

$ git ls-files -m | xargs -t -I % sh -c 'black %; isort %; flake8 %'

Fix all files changed in a working branch you created from master:

$ git diff --name-only master...test-branch | xargs -t -I % sh -c 'black %; isort %; flake8 %'

For Windows users (use this in MINGW64 and adjust the venv path to your project). Fix currently modified files:

$ git ls-files -m | xargs -t -I % sh -c '. ../venv/Scripts/activate; black %; isort %; flake8 %'

Fix all files changed in a working branch you created from master:

$ git diff --name-only master...test-branch | xargs -t -I % sh -c '. ../venv/Scripts/activate; black %; isort %; flake8 %'

Bonus

If you changed not only Python files but also other ones, you may get warnings or errors when their paths are passed to Python tools. To make sure you process only *.py files, filter the xargs input with grep:

$ git diff --name-only master...test-branch | grep '\.py$' | xargs -t -I % sh -c 'black %; isort %; flake8 %'
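Here is a small sketch of the filter on its own, with sample paths in place of the git output and echo in place of flake8, so it runs anywhere. The function name only_python is made up. The last line shows a related detail for the single-tool form (xargs flake8 without -I): GNU xargs runs the command once even when the input is empty, and the -r flag, a GNU extension, prevents that.

```shell
# Keep only paths that end in .py (README.md and notes.pyc are dropped).
only_python() {
    grep '\.py$'
}

# Prints other/helpers.py and setup.py, one per line.
printf '%s\n' other/helpers.py README.md notes.pyc setup.py | only_python

# Empty input: with -r the command is skipped, so this prints nothing.
printf '' | xargs -r echo flake8
```

Git can also do the filtering itself with a pathspec, e.g. git diff --name-only master...test-branch -- '*.py', which removes the need for grep.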