Run Python Script in GitLab CI
This blog post shows how to automate running python scripts with GitLab CI. In the following, I will go through each element of the “.gitlab-ci.yml” configuration, managing environments with Anaconda and “environment.yml”, and the settings on GitLab.
Requirements
Note that I used CI variables provided by GitLab. While they make the scripts really simple to run, they come at the cost of more complicated testing, since you need to set up some environment variables before running the scripts locally. If you prefer a more explicit approach, just pass those variables as arguments to your script.
First of all, I would like to specify my working environment.
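The two approaches can be combined. The following is a minimal sketch, not taken from the original scripts: command-line arguments take precedence, and GitLab's predefined variables CI_COMMIT_REF_NAME and CI_COMMIT_SHA are used as defaults when the script runs in CI. The function name parse_args and the option names --ref-name and --commit-sha are assumptions made for illustration.

```python
import argparse
import os
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Read settings from the command line, falling back to GitLab CI variables."""
    parser = argparse.ArgumentParser(description="Example script run by GitLab CI")
    # The defaults are read when this function is called, so the script picks up
    # whatever the CI job (or your local shell) has exported at that moment.
    parser.add_argument(
        "--ref-name",
        default=os.environ.get("CI_COMMIT_REF_NAME"),
        help="branch or tag name; defaults to the CI_COMMIT_REF_NAME variable",
    )
    parser.add_argument(
        "--commit-sha",
        default=os.environ.get("CI_COMMIT_SHA"),
        help="commit hash; defaults to the CI_COMMIT_SHA variable",
    )
    return parser.parse_args(argv)
```

Locally, the script could then be run with explicit values such as --ref-name master, while in CI no arguments are needed because the variables are already set.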
- OS: Debian 8
- GitLab Community Edition 10.8.0 (GitLab Installation)
- GitLab Runner (GitLab Runner Installation)
- Python 3.6.4
Project structure
Using GitLab CI to configure your jobs
.gitlab-ci.yml is used by GitLab CI to manage your project’s jobs. It is placed in the root of your repository and contains definitions of how your project should be built. On any push to your repository, GitLab will look for the .gitlab-ci.yml file and start jobs on CI according to the contents of the file, for that commit.
Following is one example:
In this example, there are several parts: stages, image, stage, script, only, and tags. I’ll introduce them one by one.
stages
stages is used to define stages that can be used by jobs and is defined globally.
The specification of stages allows for flexible multi-stage pipelines. The ordering of elements in stages defines the ordering of jobs’ execution:
- Jobs of the same stage are run in parallel.
- Jobs of the next stage are run after the jobs from the previous stage complete successfully.
In the example, we have 2 stages, ‘test’ and ‘report’. The jobs of stage ‘test’ will run before the jobs of stage ‘report’.
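The ordering rule can be pictured with a small Python sketch. It only models the rule described above and is not how GitLab schedules jobs internally; the function name plan_pipeline is hypothetical, while the stage and job names are the ones used in this post.

```python
from typing import Dict, List


def plan_pipeline(stages: List[str], jobs: Dict[str, str]) -> List[List[str]]:
    """Group job names by stage, following the order in which stages are declared.

    Each inner list is one batch: its jobs may run in parallel, and the next
    batch starts only after the previous one has completed successfully.
    """
    plan = []
    for stage in stages:
        batch = sorted(name for name, job_stage in jobs.items() if job_stage == stage)
        if batch:
            plan.append(batch)
    return plan


# Stage and job names taken from the example in this post.
stages = ["test", "report"]
jobs = {"tests": "test", "weekly_job": "report"}
print(plan_pipeline(stages, jobs))
```

Reordering the entries in stages changes the order of the batches, which mirrors how the global stages list controls the pipeline.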
image
This allows you to specify a custom Docker image to be used for the duration of the job. In our case, we need to connect to Teradata, which officially supports only the CentOS and RHEL Linux distributions, so we built upon Jeremy Marshall’s docker-teradata-client container and moble’s miniconda-centos container.
If the configuration interests you, more information here.
stage
stage is defined per job and relies on stages, which is defined globally. It allows jobs to be grouped into different stages, and jobs of the same stage are executed in parallel.
In the example, the job ‘tests’ belongs to the stage ‘test’, and the job ‘weekly_job’ belongs to the stage ‘report’.
script
script is the only required keyword that a job needs. It’s a shell script which is executed by the CI.
The scripts in ‘tests’ update pip, install the packages in requirements.txt with pip, and discover unit tests in files that end with ‘_test.py’.
We quietly create a conda environment from the file jobs/environment.yml (I’ll talk about this file in the next part), then switch to the new environment defined by that file, and run the python script jobs/weekly_job.py.
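To illustrate the discovery step, here is a minimal sketch that assumes the tests are collected with the standard library's unittest discovery and the file pattern *_test.py. The post does not show the exact command, so treat this as one possible way to do it. The file names example_test.py and helpers.py are made up for the demonstration.

```python
import tempfile
import unittest
from pathlib import Path

# A tiny test module, written to disk only for this demonstration.
TEST_SOURCE = '''
import unittest


class AdditionTest(unittest.TestCase):
    def test_addition(self):
        self.assertEqual(1 + 1, 2)
'''


def run_discovery(start_dir: str) -> unittest.TestResult:
    """Collect and run every test found in files matching *_test.py."""
    suite = unittest.defaultTestLoader.discover(start_dir, pattern="*_test.py")
    result = unittest.TestResult()
    suite.run(result)
    return result


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "example_test.py").write_text(TEST_SOURCE)
    # This file does not match the pattern, so discovery ignores it.
    Path(tmp, "helpers.py").write_text("# plain module, not a test file\n")
    result = run_discovery(tmp)
    tests_run, all_passed = result.testsRun, result.wasSuccessful()
    print(tests_run, all_passed)
```

On the command line, the equivalent would be python -m unittest discover -p "*_test.py", which is a natural fit for a line in the script section of a job.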
only
only defines the names of branches and tags for which the job will run.
There are a few rules that apply to the usage of job policy:
- only is inclusive. If only is defined in a job specification, the ref is filtered by only.
- only allows the use of regular expressions.
- only allows you to specify a repository path to filter jobs for forks.
In addition, only allows the use of special keywords. For example, the variables keyword is used to define variable expressions: you can use predefined variables, or project, group, or environment-scoped variables, to define an expression that GitLab evaluates in order to decide whether a job should be created or not. The refs keyword is used to specify refs such as a pushed branch or scheduled pipelines.
In the example, the job only runs when it satisfies the following conditions:
- the value of the variable $weekly_job is “yes” (use %variable% in Windows batch and $env:variable in PowerShell)
- the pipeline is a scheduled pipeline
- the script is on the master branch
tags
tags is used to select specific Runners from the list of all Runners that are allowed to run this project.
Managing environments
With conda, you can create, export, list, remove and update environments that have different versions of Python and/or packages installed in them. Switching or moving between environments is called activating the environment. You can also share an environment file.
Creating an environment from an environment.yml file
You can create an environment file manually to share with others.
An environment.yml file should specify the environment’s name with name, and the packages it depends on with dependencies.
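To make the layout of such a file concrete, the sketch below builds its text from a name and a list of packages. The environment name weekly_job and the pandas dependency are placeholders and not the contents of the original jobs/environment.yml. The helper render_environment_yml is likewise hypothetical.

```python
from typing import List


def render_environment_yml(name: str, dependencies: List[str]) -> str:
    """Return the text of a minimal environment.yml with the two keys described above."""
    lines = [f"name: {name}", "dependencies:"]
    # Each package becomes one list item; a version can be pinned with "=".
    lines += [f"  - {package}" for package in dependencies]
    return "\n".join(lines) + "\n"


# Placeholder values, chosen only to show the shape of the file.
environment_text = render_environment_yml("weekly_job", ["python=3.6.4", "pandas"])
print(environment_text)
```

Once saved as environment.yml, a file like this can be passed to conda env create -f environment.yml to build the environment.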
GitLab settings
Configuring GitLab CI
A Runner can be specific to a certain project or serve any project in GitLab CI. A Runner that serves all projects is called a shared Runner. You can find more information about configuration here.
Before running, don’t forget to go to Settings > CI/CD > Runners settings to activate your runner.
Pipeline Schedules
Pipeline schedules can be used to run a pipeline at specific intervals, for example every Monday at 7:00 for a certain branch.
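In cron syntax, which the schedule form accepts for custom intervals, "every Monday at 7:00" is written as 0 7 * * 1. As a worked example of what that interval means, the sketch below computes the next matching time from a given moment. The function next_monday_7am is a hypothetical helper and not part of GitLab. It also ignores time zones.

```python
from datetime import datetime, timedelta


def next_monday_7am(now: datetime) -> datetime:
    """Return the first Monday 07:00 that lies strictly after `now`."""
    candidate = now.replace(hour=7, minute=0, second=0, microsecond=0)
    # datetime.weekday() numbers Monday as 0, so this is the number of days
    # from today until the coming Monday (0 if today is already Monday).
    days_ahead = (0 - now.weekday()) % 7
    candidate += timedelta(days=days_ahead)
    if candidate <= now:
        # Today is Monday but 07:00 has already passed: wait a full week.
        candidate += timedelta(days=7)
    return candidate


print(next_monday_7am(datetime(2018, 6, 6, 12, 0)))
```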
In order to schedule a pipeline:
- Navigate to your project’s CI / CD > Schedules and click the New Schedule button.
- Fill in the form
- Hit Save pipeline schedule for the changes to take effect.
Moreover, you can pass any number of arbitrary variables, and they will be available in GitLab CI so that they can be used in your .gitlab-ci.yml file.
Introduction
This blog post describes how to configure a Continuous Integration (CI) process on GitLab for a python application. It uses one of my python applications (bild) to show how to set up the CI process.
In this blog post, I’ll show how I set up a GitLab CI process to run the following jobs on a python application:
- Unit and functional testing using pytest
- Linting using flake8
- Static analysis using pylint
- Type checking using mypy
What is CI?
To me, Continuous Integration (CI) means frequently testing your application in an integrated state. However, the term ‘testing’ should be interpreted loosely as this can mean:
- Integration testing
- Unit testing
- Functional testing
- Static analysis
- Style checking (linting)
- Dynamic analysis
To facilitate running these tests, it’s best to have these tests run automatically as part of your configuration management (git) process. This is where GitLab CI is awesome!
In my experience, I’ve found it really beneficial to develop a test script locally and then add it to the CI process that gets automatically run on GitLab CI.
Getting Started with GitLab CI
Before jumping into GitLab CI, here are a few definitions:
– pipeline: a set of tests to run against a single git commit.
– runner: GitLab uses runners on different servers to actually execute the tests in a pipeline; GitLab provides runners to use, but you can also spin up your own servers as runners.
– job: a single test being run in a pipeline.
– stage: a group of related tests being run in a pipeline.
Here’s a screenshot from GitLab CI that helps illustrate these terms:
GitLab utilizes the ‘.gitlab-ci.yml’ file to run the CI pipeline for each project. The ‘.gitlab-ci.yml’ file should be found in the top-level directory of your project.
While there are different methods of running a test in GitLab CI, I prefer to utilize a Docker container to run each test. I’ve found the overhead in spinning up a Docker container to be trivial (in terms of execution time) when doing CI testing.
Creating a Single Job in GitLab CI
The first job that I want to add to GitLab CI for my project is to run a linter (flake8). In my local development environment, I would run this command:
This command can be transformed into a job on GitLab CI in the ‘.gitlab-ci.yml’ file:
This YAML file tells GitLab CI what to run on each commit pushed up to the repository. Let’s break down each section…
The first line (image: “python:3.7”) instructs GitLab CI to utilize Docker for performing ALL of the tests for this project, specifically to use the ‘python:3.7’ image that is found on DockerHub.
The second section (before_script) is the set of commands to run in the Docker container before starting each job. This is really beneficial for getting the Docker container in the correct state by installing all the python packages needed by the application.
The third section (stages) defines the different stages in the pipeline. There is only a single stage (Static Analysis) at this point, but later a second stage (Test) will be added. I like to think of stages as a way to group together related jobs.
The fourth section (flake8) defines the job; it specifies the stage (Static Analysis) that the job should be part of and the commands to run in the Docker container for this job. For this job, the flake8 linter is run against the python files in the application.
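At this point, the updates to the ‘.gitlab-ci.yml’ file should be committed to git and then pushed up to GitLab (git add .gitlab-ci.yml, then git commit -m 'Updated .gitlab-ci.yml', then git push origin master).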
GitLab CI will see that there is a CI configuration file (.gitlab-ci.yml) and use this to run the pipeline:
This is the start of a CI process for a python project! GitLab CI will run a linter (flake8) on every commit that is pushed up to GitLab for this project.
Running Tests with pytest on GitLab CI
When I run my unit and functional tests with pytest in my development environment, I run the following command in my top-level directory:
My initial attempt at creating a new job to run pytest in the ‘.gitlab-ci.yml’ file was:
However, this did not work, as pytest was unable to find the ‘bild’ module (i.e. the source code) to test:
The problem encountered here is that the test_*.py files could not find the ‘bild’ module, because the top-level directory of the project was not on the system path:
The solution that I came up with was to add the top-level directory to the system path within the Docker container for this job:
With the updated system path, this job was able to run successfully.
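The same effect can also be achieved from within python rather than in the CI job. The following is a minimal sketch of that alternative and is not the configuration used above. It assumes a conftest.py file placed in the top-level directory of the project. The helper name add_project_root_to_path is hypothetical.

```python
import sys
from pathlib import Path


def add_project_root_to_path(project_root: Path) -> str:
    """Put project_root at the front of sys.path (only once) and return it as a string."""
    root = str(project_root.resolve())
    if root not in sys.path:
        # Inserting at position 0 makes the project's own packages win over
        # any installed package that happens to have the same name.
        sys.path.insert(0, root)
    return root
```

In conftest.py, the call would be add_project_root_to_path(Path(__file__).parent), so that the test_*.py files can import the application's modules no matter which directory pytest is started from.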
Final GitLab CI Configuration
Here is the final .gitlab-ci.yml file that runs the static analysis jobs (flake8, mypy, pylint) and the tests (pytest):
Here is the resulting output from GitLab CI:
One item that I’d like to point out is that pylint is reporting some warnings, but I find this to be acceptable. I still want to have pylint running in my CI process, but I don’t care if it has failures; I’m more concerned with trends over time (are new warnings being created?). Therefore, I set the pylint job to be allowed to fail via the ‘allow_failure’ setting.
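Since I like to develop checks locally before adding them to CI, the same idea can be mimicked in a small python runner. This is only a sketch of the behaviour, not GitLab's implementation: the Check type and run_checks function are hypothetical, and the commands are stand-ins that simply exit with a fixed status so the example runs anywhere. In a real project they would be the flake8, pylint and pytest command lines.

```python
import subprocess
import sys
from typing import List, NamedTuple


class Check(NamedTuple):
    name: str
    command: List[str]
    allow_failure: bool = False


def run_checks(checks: List[Check]) -> int:
    """Run every check; return 0 unless a check that is not allowed to fail fails."""
    exit_code = 0
    for check in checks:
        result = subprocess.run(check.command)
        if result.returncode == 0:
            status = "passed"
        elif check.allow_failure:
            # Reported, but does not affect the overall result.
            status = "failed (allowed)"
        else:
            status = "failed"
            exit_code = 1
        print(f"{check.name}: {status}")
    return exit_code


# Stand-in commands: each one just exits with the given status code.
CHECKS = [
    Check("lint stand-in", [sys.executable, "-c", "raise SystemExit(0)"]),
    Check("pylint stand-in", [sys.executable, "-c", "raise SystemExit(1)"], allow_failure=True),
]
overall = run_checks(CHECKS)
print("overall exit code:", overall)
```

A real script would finish with sys.exit(run_checks(CHECKS)), so that the shell sees a failure only when a check that is not allowed to fail has failed.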