Requirements to submit an algorithm - Help
Requirements to submit an algorithm
All integrated algorithms must follow an organization specific to MAAP. The best practice is to split the algorithm processing in modules. A module is a processing step which includes a scientific algorithm or calculation and which can be reused if possible for other processing.
1. Create a Project
A module must first be created in Gitlab. In Processing, you have access to Gitlab when you click on "Go to Gitlab".
To create your module project, you must click on the "New project" button and complete the empty fields:
- Project Name: the name of your project without any special characters
- Project slug: identical to the name of the project but without any capital letters
- Project Description: This description is very important because it will be shown in the algorithm store inside the platform.
- Visibility Level: with private, only you (the creator of the repository) can access to the algorithm, with public, all developers in the platform can see your algorithm, and work on it without asking for particular permissions
Your empty project is created when you click on the button "Create project" and you can after add it on your Eclipse Che workspace with the command git clone.
2. Module Organisation
Each module shall include the following folders:
- Configuration: this folder shall contain configuration files (e.g.: properties file with all variables used within the source code) and an Excel file providing information about code language and libraries used for the source code.
- Documentation: this folder shall contain files with information to understand how the algorithm works
- Input: this folder shall contain all input test data used by the code source
- Output: this folder shall contain all output test data generated by the code source
- Src: this folder shall contain source code files (e.g : .py, .c, .java files)
This hierarchy will allow the users to organize algorithm source code. The required hierarchy is presented below:
It is possible to add this hierarchy in an empty project from the Eclipse Che or Jupyter tools with the initTemplate.sh command.
To do this, you must go from your terminal in the project with the cd your_empty_project command then run the initTemplate.sh command
For more information about how to create a workspace and how to get the good project structure, a help page is available at the next link : How to create a new MAAP workspace
The management of inputs, outputs and parameters of a module is done in the configuration.properties file. The file found in the conf subfolder is the file that will contain both the path for the inputs and output as well as any module parameters. On COPA, this file is accessible and editable to allow, for example, to change a parameter or modify the input data. The next image is an example of a configuration file.
Project documentation is important for future users. It gives all the information on the inputs, outputs and parameters but also on the main functions used in the module. With initTemplate, the documentation.md and syntax.md files are created.
The syntax.md file gives all the syntaxes to use in the documentation to manage the paragraphs or the character styles
Documentation.md is the file that should contain the information. A template is already available as shown in the following image.
This documentation will be accessible later in the store of algorithms on the MAAP. For more example, look at the already existing algorithms.
5. Output management
The outputs follow a specific organization. Indeed, all data genereted by one module will be in the path /project/data/name_project. But the directory name_project doesn't exist and will be created with the os librairy and its function makedirs.
In parallel, if a module needs an output from another module as input, it will have to retrieve the data in the directory of this module. For example, if the moduleA needs for input1 the output1 of the moduleB, the path will be: input1 =/projects/data/moduleB/output1
6. Source code
The source code is the directory where all the scripts for the module are located. We have 2 types of script: the main script and the secondary scripts.
The main script will be the script to launch to run the module. This is where the configuration file information will be retrieved.
Secondary scripts are imported into the main script. They can contain constants or specific functions for example.
The main code is divided into 2 parts. The first contains the different functions of the module. The second part, the main one, contains the loading of the data, the execution of the functions and the saving of the outputs. The example below shows the different elements of the main code
The Dockefile file contains 3 parameters to fill in: WORKDIR, ADD and ENTRYPOINT. The first two provide information on the path of the module project (in the example below the slope_calculation project). The last one concerns the main script to be executed (the slopecalculation.py script in python3).
Dockerfile is also the file where it is possible to add libraries if there are missing in the stack. To do this, add a RUN pip install name_library between FROM et WORKDIR