ecFlow -- Installation & Usage

By micromamba (Recommend)

1
2
3
4
5
6
7
8
9
10
11
12
$ micromamba create -n ecFlow
Empty environment created at prefix: /home/wpsze/micromamba/envs/ecFlow
$ micromamba activate ecFlow
|ecFlow|wpsze:/home/wpsze $ micromamba install ecflow -c conda-forge
|ecFlow|wpsze:/home/wpsze $ ecflow_client --version
Ecflow version(5.13.0) boost(1.84.0) compiler(gcc 12.3.0) protocol(JSON cereal 1.3.0) openssl(enabled) Compiled on Jun 20 2024 08:28:01

$ ecf*
ecflow_client ecflow_http ecflow_logsvr.pl ecflow_standalone ecflow_stop.sh
ecflow_udp_client ecflow_ui.x
ecflow_fuse.py ecflow_logserver.sh ecflow_server
ecflow_start.sh ecflow_udp ecflow_ui
SH

and it installs,

1
2
3
+ libboost-python                  1.84.0  py312h389efb2_3      conda-forge      124kB
+ libboost-python-devel 1.84.0 py312h9cebb41_3 conda-forge 20kB
+ ecflow 5.13.0 py312hc3ad225_0 conda-forge 17MB
SH

where boost is 1.84.0

ecFlow Quickstart

ecFlow Quickstart

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
 $ ll ./ecflow_quickstart  
total 160K
drwxrwxr-x 3 wpsze wpsze 33K Jun 7 22:06 ./
drwxrwxr-x 3 wpsze wpsze 33K Jun 7 22:06 ../
-rw-rw-r-- 1 wpsze wpsze 2.4K Jun 7 17:51 ecflow_quickstart.tar.gz
-rw-r--r-- 1 wpsze wpsze 1.8K Nov 2 2022 head.h
-rw-r--r-- 1 wpsze wpsze 760 Nov 2 2022 instructions.txt
-rw-r--r-- 1 wpsze wpsze 209 Nov 1 2022 tail.h
drwxr-xr-x 2 wpsze wpsze 33K Nov 2 2022 test/
-rw-r--r-- 1 wpsze wpsze 222 Nov 2 2022 test.def

$ ll test
total 165K
drwxr-xr-x 2 wpsze wpsze 33K Nov 2 2022 ./
drwxrwxr-x 3 wpsze wpsze 33K Jun 7 22:06 ../
-rw-r--r-- 1 wpsze wpsze 627 Nov 2 2022 t1.1
-rw-r--r-- 1 wpsze wpsze 877 Nov 2 2022 t1.2
-rw-r--r-- 1 wpsze wpsze 119 Nov 2 2022 t1.ecf
-rwxr-xr-x 1 wpsze wpsze 2.1K Nov 2 2022 t1.job1*
-rwxr-xr-x 1 wpsze wpsze 2.1K Nov 2 2022 t1.job2*
-rw-r--r-- 1 wpsze wpsze 627 Nov 2 2022 t2.1
-rw-r--r-- 1 wpsze wpsze 118 Nov 2 2022 t2.ecf
-rwxr-xr-x 1 wpsze wpsze 2.1K Nov 2 2022 t2.job1*

$ cat test/t1.ecf
%include "../head.h"

echo "I, %ECF_NAME%, am part of a suite that lives in %ECF_HOME%"
sleep 2

%include "../tail.h"

$ cat test/t2.ecf
%include "../head.h"

echo "I, %ECF_NAME%, am part of a suite that lives in %ECF_HOME%"
sleep 2

%include "../tail.h"
SH

Inspect and update the suite definition

The file test.def defines a suite called "test". Open this file in a text editor and note the structure.

1
2
3
4
5
6
suite test
edit ECF_HOME "/replace/with/own/home/ecflow/quickstart"
task t1
task t2
trigger t1 eq complete
endsuite
SH

We define a suite called "test"; it defines a variable called ECF_HOME, then it defines two tasks ("tasks" are things that actually run) and specifies that task t2 will be run as soon as task t1 has completed. The scripts corresponding to the tasks are specified in ".ecf" files in the test directory. Have a look - they simply print some information about themselves, then sleep for 2 seconds.

The first thing you must do is to complete the path to your working directory in the ECF_HOME definition in test.def and save the file.

Start an ecFlow server

We will start a new running instance of an ecFlow server using the default port (# start ecFlow with default port 3141). It is possible to use a different port by adding --port=3500 (for example) to every ecFlow command-line action. Note also that we start it as a background task - it will run until the server is stopped. It can be run in the foreground, but in that case you will need to use a new terminal for any subsequent commands!

To start the ecflow server:

1
2
3
4
$ ecflow_server &
or
$ export ECF_PORT=3141
$ ecflow_start.sh -d /home/wpsze/xxx -p $ECF_PORT
SH

make sure this port can be used {:.info}

check what port numbers are being used

netstat -lnptu

You can check what port numbers are being used, with netstat: To list all open network ports on your machine, run netstat -lnptu. Here is a breakdown of the parameters:

l - List all listening ports n - Display the numeric IP addresses (i.e., don't do reverse DNS lookups p - List the process name that is attached to that port t - List all TCP connections u - List all UDP connections

1
2
$ netstat -lnptu
tcp 0 0 0.0.0.0:3141 0.0.0.0:* LISTEN 169655/ecflow_serve
SH

Check that it is running:

1
2
$ ecflow_client --ping --port=3141
ping server(localhost:3141) succeeded in 00:00:00.034207 ~34 milliseconds
SH

or check

1
2
 $ ps -u wpsze -f | grep ecflow | grep -v grep
wpsze 3248171 3700943 0 14:26 pts/6 00:00:00 ecflow_server
SH

Please note “Host” and “Port Number” here. Also note that each user must use a unique port number (we recommend using a random number between 2500 and 9999)

To stop the ecflow server:

1
ecflow_stop.sh -p $ECF_PORT
SH

e.g.

1
2
3
4
5
6
7
8
9
10
 $ ecflow_stop.sh -p 3141
Fri Jun 28 08:57:33 UTC 2024

User "10004" attempting to stop ecf server on ip:3141

Checking if the server is running on ip:3141
ping server(ip:3141) succeeded in 00:00:00.011797 ~11 milliseconds

Halting, check pointing and terminating the server
[1]+ Done ecflow_server
SH

and then,

1
$ ps -u wpsze -f | grep ecflow | grep -v grep
SH

has shown nothing now.

Load your suite definition into the server **

1
$ ecflow_client --load=test.def --port=3141
SH

More,

1
2
3
4
5
6
7
8
ecflow_client --help
ecflow_client --load=/my/home/fred.def
ecflow_client --load=/my/home/exotic.def # will error if suites of same name exists
ecflow_client --load=/my/home/exotic.def force # overwrite suite's of same name in the server
ecflow_client --load=/my/home/exotic.def check_only # Just check, don't send to server
ecflow_client --load=/my/home/exotic.def stats # Show defs statistics, don't send to server
ecflow_client --load=host1.3141.check # Load checkpoint file to the server
ecflow_client --load=host1.3141.check print # print definition to standard out in defs format
SH

Check that it is loaded by asking the server to give you back the suite definition:

1
2
3
4
5
6
7
8
9
$ ecflow_client --get --port=3141
#5.11.4
suite test
edit ECF_HOME '/home/wpsze/ecflow/ecflow_quickstart/'
task t1
task t2
trigger t1 eq complete
endsuite
# enddef
SH

Monitor and interact via the GUI

Start ecFlowUI: (Local PC)

When opening the ecflow GUI flow for the first time you will need to add your server to the GUI. In the GUI click on “Servers” and then “Manage servers”. A new window will appear. Click on “Add server”. Here you need to add the Name, Host, and Port of your server. For “Host” and “Port” please refer to the last section of output from the previous step.

1
$ ecflow_ui &
SH

ssh tunnel

SSH tunneling, or SSH port forwarding, is a method of transporting arbitrary data over an encrypted SSH connection. SSH tunnels allow connections made to a local port (that is, to a port on your own desktop) to be forwarded to a remote machine via a secure channel.

on local PC terminal,

1
2
$ ssh -L 9090:localhost:3141 server.ip
$ ssh -NL 9090:localhost:3141 server.ip
SH
where local PC's port 9090 to server.ip's port 3141

and you can set ecflow_ui config, - host=localhost - port=local PC's port 9090

ecFlowUI has started

Once ecFlowUI has started, you must tell it how to reach your server. Go to the

Servers → Manage Servers menu, click "Add server", then enter the details of your server. Name can be anything you want - it's for you to identify the server to your self; something like "localtest" would be fine here. Host should in this case be "localhost", and Port should be XXX. The other fields can be left blank, but keep the "Add server to current view" box ticked.

You should now see your suite loaded into the GUI! To make the server active, right-click on the top-level node representing the server ("localtest in our case) and choose "Restart". Now right-click on the "test" node and choose "Begin" to make the suite active. The default behaviour is to only refresh its view of the suite every 60 seconds, so you will need to click the green refresh button at the top every so often to see the progress of the tasks.

ecflow-ui_Task

The diagram shows the typical status changes for a Task.

Start

At this time, the service is stopped and needs to be started manually.

1
$ecflow_client --begin --host=login05 --port=3141
SH

Since no triggering mechanism is added, t1 will be executed immediately after the service starts.

Replace suite

Replace the entire suite,

1
ecflow_client --replace=/test test.def --port=3141
SH

check the status of the server

To check the status of the server, enter the following unix command

1
2
3
4
5
6
7
8
ecflow_client --host=login05 --port=3141 --stats
Server statistics
Version Ecflow version(5.13.0) boost(1.84.0) compiler(gcc 12.3.0) protocol(JSON cereal 1.3.0) openssl(enabled) Compiled on Jun 20 2024 08:28:01
Status RUNNING
Host xxx.com
Port 3141
Up since 2024-Jul-02 23:44:29
......
SH
If ecflow_server is in halted state, the server needs to be restarted.

ecFlow variables

ecFlow variables

ECF_HOME ECF_INCLUDE ECF_FILES

Execute, rerun, requeue

In ecflow_ui, it is important to understand the distinction between execute, rerun and re-queue.

These options are available with the right mouse button over a task.

Execute: This means run the task immediately. (Hence ignores any dependency that holds the task).

This option preserves the previous output, from the job. You should see output files t1.1, t1.2, t1.3, each time the task is run.

Rerun: This places the task, back into the queued state.

The task will now honour any dependencies that would hold the job. i.e. time dependencies, limits, triggers, suspend, etc. (You will be introduced to these terms later on in the tutorial) If the task does run, the previous output is preserved.

Re-queue: This resets the task back to the queued state.

If the task has a default status, this is applied. The task output number is reset, such that the next output will be written to t1.1. This will overwrite any existing output with that extension when the task runs. Any subsequent calls to execute or rerun will now overwrite the output files, t1.2,t1.3 etc.

Add description

The manual page makes the documentation in the ecf script visible in ecflow_ui.

The description page is the combination of all the text before the directives %manual and %end. (or create f1.man)

e.g. edit t2.ecf

1
2
3
4
5
6
7
8
9
10
11
12
13
14
%manual
Manual for task t2
Operations: if this task fails, set it to complete and report next working day
Analyst: Check something ?
%end

%include <head.h>
echo "I am part of a suite that lives in %ECF_HOME%"
%include <tail.h>

%manual
There can be multiple manual pages in the same file.
When viewed they are simply concatenated.
%end
SH

familys

Tasks can be organized into a tree structure similar to a Unix file system, where task is like a file and family is like a folder.

A suite is a family with additional properties (see Dates). family can contain other families. Similar to directories, tasks in different families can have the same name.

Unless a file location is specified, ecFlow assumes that the structure of the suite corresponds to the location of the task file in the file system.

1
2
3
4
5
6
7
8
9
# Definition of the suite test.
suite test
edit ECF_INCLUDE "$ECF_HOME/course" # replace '$ECF_HOME' with the path to your ECF_HOME directory
edit ECF_HOME "$ECF_HOME/course"
family f1
task t1
task t2
endfamily
endsuite
SH

Time Dependencies

Time Dependencies

Time Triggers

Time Triggers

Add a Cron

Add a Cron

Repeat *** Repeat

It is sometimes useful to repeat the same task or family several times, looping on a specific value. You can do that by defining a repeat attribute. You can iterate over sequences of:

  • strings
  • integers
  • dates

A sequence of integers or dates is created by specifying the first and the last element, with an optional increment (the default is one).

From ecflow5 we can also loop over an arbitrary list of dates.

1
2
repeat datelist YMD 20130101 20130102 20130103 20200101 20190101
repeat date YMD 20090331 20121212 1 # <start> <end> <increment>
SH

The following generated variables, are accessible for trigger expressions YMD, YMD_YYYY, YMD_MM, YMD_DD, YMD_DOW,YMD_JULIAN

e.g.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17

# Definition of the suite test.
suite test
edit ECF_INCLUDE "$HOME/course"
edit ECF_HOME "$HOME/course"
family f4
edit SLEEP 2
repeat string NAME a b c d e f
family f5
repeat integer VALUE 1 10
task t1
repeat date DATE 20101230 20110105
label info ""
label date ""
endfamily
endfamily
endsuite
SH

repeat & trigger

When we use relative time attributes under a Repeat. They are automatically reset when the repeat loops. Take for example:

1
2
3
4
5
6
7
8
9
10
11
suite test
edit ECF_HOME "$HOME/course"
edit ECF_INCLUDE "$HOME/course"
edit ECF_FILES "$HOME/course"
family hc00
repeat integer HYEAR 1993 2017
task t1
task t2
trigger t1 == complete
endfamily
endsuite
SH

Repeat increment

A repeat under a family node will only increment when all the child nodes are complete.

In the example above, once task t1, and t2 are complete, the repeat integer rep will increment to the next value.

with SBATCH

sbatch.h

1
2
3
4
5
6
#SBATCH --qos=%QUEUE%
#SBATCH --job-name=%TASK%_%FAMILY1:NOT_DEF%
#SBATCH --account=%ACCOUNT%
#SBATCH --output=%ECF_JOBOUT%
#SBATCH --error=%ECF_JOBOUT%
#SBATCH --time=%RQ_TIME%
C++

References

  1. ECFLOW教程 2018
  2. CEMC ecFlow 实战教程 本教程用于 CEMC 2023 培训的 ecFlow 上机实习环节
  3. The latest ecflow is available for Linux and macOS from conda forge
  4. https://git.ecmwf.int/projects/USS
  5. HPC2020: Using ecFlow
  6. 基于ECFLOW实现华中区域快速更新-同化预报业务流程
  7. ecFlow学习笔记:编译V5.X版本

ecFlow -- Installation & Usage
https://waipangsze.github.io/2024/06/07/ecflow/
Author
wpsze
Posted on
June 7, 2024
Updated on
January 3, 2025
Licensed under