Create a pipeline with Outflow, part 1: Pipeline and plugins

In this tutorial, we will create a pipeline from scratch with Outflow, and walk through the main features of the framework.

First, install Outflow with pip install outflow to get the latest release from PyPI. Then check that Outflow is correctly installed with:

$ python -m outflow --version

If Outflow is installed, you should see the version of your installation. If it isn’t, you’ll get an error saying “No module named outflow”.
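The same check can be done from Python, which is handy inside a setup script. The sketch below is only an illustration using the standard library; the helper name is_installed is hypothetical and not part of Outflow.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    # find_spec returns None when no importable package has this name.
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    print("outflow installed:", is_installed("outflow"))
```

If this prints False, the pip install step above most likely ran in a different Python environment than the one you are using now.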

Note

If you have any questions or suggestions about the tutorial or Outflow in general, please come over to our Discord server.

Creating a pipeline

To run properly, a pipeline built with Outflow needs some configuration files. You can use the following command to generate the pipeline directory structure:

$ python -m outflow management create pipeline tuto_pipeline

This creates a directory called tuto_pipeline/ in the current directory, with the following contents:

tuto_pipeline
├── config.yml
├── plugins
├── manage.py
├── requirements.txt
└── settings.py

  • config.yml : Contains the configuration of your pipeline. The configuration can vary from one pipeline execution to another: you can keep several configuration files and choose one on the command line.

  • plugins : This is where we will create our plugins. Plugins in this directory are automatically added to the Python path by manage.py. However, plugins can live anywhere; see plugins for how to create plugins outside this directory.

  • settings.py : This file is specific to your pipeline and should be versioned. It contains, among other things, the list of plugins used by your pipeline. See settings for the full specification.

  • requirements.txt : Contains the list of Python dependencies.

  • manage.py : The entry point of the pipeline.
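To confirm that the generated directory matches the layout described above, a small check such as the following can help. This is a sketch using only the standard library; the helper missing_entries is hypothetical and not something Outflow provides.

```python
from pathlib import Path

# Entries listed in the directory tree above.
EXPECTED_ENTRIES = (
    "config.yml",
    "plugins",
    "manage.py",
    "requirements.txt",
    "settings.py",
)


def missing_entries(pipeline_dir):
    """Return the expected files or directories that are absent from pipeline_dir."""
    root = Path(pipeline_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    # An empty list means every expected entry was found.
    print(missing_entries("tuto_pipeline"))
```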

Create a plugin

A pipeline is not much without some tasks to execute. In this tutorial, we will use the example of a data reduction pipeline, so our tasks will be computations on some data.

With Outflow, tasks are defined in plugins. A plugin is a dedicated Python package containing commands, tasks and models.

  • Tasks are the building blocks of the pipeline: they have inputs, outputs, and a function to execute.

  • Commands describe a graph of task dependencies, as well as a CLI entry point and its arguments.

  • A model is a Python class describing a database table. Models are optional.

We will see what these are in detail in the next chapters of the tutorial.
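Until then, the relationship between tasks and a command can be pictured in plain Python. The sketch below deliberately does not use Outflow's API: every name in it (load_data, reduce_data, TASKS, run_command) is hypothetical, and it only illustrates the idea of functions with inputs and outputs executed in dependency order. It assumes Python 3.9 or later for graphlib.

```python
from graphlib import TopologicalSorter

# Plain-Python illustration only: none of these names come from Outflow.


def load_data():
    """A task with no input that outputs some raw values."""
    return [1, 2, 3]


def reduce_data(values):
    """A task that takes the output of load_data and reduces it."""
    return sum(values)


# Each entry maps a task name to (function, name of the upstream task or None).
TASKS = {
    "load_data": (load_data, None),
    "reduce_data": (reduce_data, "load_data"),
}


def run_command(tasks):
    """Run each task after its upstream task and return all outputs by name."""
    # TopologicalSorter expects a mapping from node to its predecessors.
    graph = {
        name: ({upstream} if upstream else set())
        for name, (_, upstream) in tasks.items()
    }
    outputs = {}
    for name in TopologicalSorter(graph).static_order():
        function, upstream = tasks[name]
        outputs[name] = function(outputs[upstream]) if upstream else function()
    return outputs


if __name__ == "__main__":
    print(run_command(TASKS))
```

In this picture, the functions play the role of tasks and run_command plays the role of a command that wires them into a graph; how Outflow actually declares both is covered in the following chapters.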

For now, Outflow plugins must use PEP 420 namespace packages. This allows multiple plugins to share the same namespace.
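To see what PEP 420 makes possible, the following standalone sketch builds two separate directories that both contribute a sub-package to the same namespace and then imports both. It uses only the standard library; the namespace demo_ns and the packages plugin_a and plugin_b are made-up names for illustration, not real Outflow plugins.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_namespace_demo(base_dir):
    """Create two roots that each add one sub-package to the 'demo_ns' namespace."""
    roots = []
    for plugin in ("plugin_a", "plugin_b"):
        root = Path(base_dir) / f"{plugin}_root"
        package_dir = root / "demo_ns" / plugin
        package_dir.mkdir(parents=True)
        # Only the leaf package has an __init__.py. The 'demo_ns' directory
        # has none, which is what makes it a PEP 420 namespace package.
        (package_dir / "__init__.py").write_text(f"NAME = '{plugin}'\n")
        roots.append(str(root))
    return roots


def import_demo_plugins():
    """Import both sub-packages even though they live in different directories."""
    with tempfile.TemporaryDirectory() as base_dir:
        roots = build_namespace_demo(base_dir)
        sys.path[:0] = roots
        importlib.invalidate_caches()
        try:
            module_a = importlib.import_module("demo_ns.plugin_a")
            module_b = importlib.import_module("demo_ns.plugin_b")
            return module_a.NAME, module_b.NAME
        finally:
            # Restore sys.path so the demo leaves no trace on the import path.
            for root in roots:
                sys.path.remove(root)


if __name__ == "__main__":
    print(import_demo_plugins())
```

If demo_ns contained an __init__.py in either root, it would become a regular package and only the first root found on the path would be used.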

To create a new plugin, run the following commands:

$ cd tuto_pipeline
$ python -m outflow management create plugin tuto.data_reduction --plugin_dir plugins/data_reduction

This creates all the needed files, including an example of a basic command.

Then, in the tuto_pipeline/settings.py file, add your new plugin to the plugin list:

PLUGINS = [
    'outflow.management',
    'tuto.data_reduction',
]

You can test your newly created plugin by calling the command generated in commands.py:

$ python manage.py data_reduction

You should see the following output on the command line:

* tuto.data_reduction.commands - commands.py:49 - INFO - Hello from data_reduction

If you do, congratulations! You now have everything you need to get started with Outflow.

In the next chapter, we will add new tasks and commands to this pipeline template.