Cromtit as a Raku alternative to Apache Airflow


Cromtit started as a simple wrapper around the Sparky job scheduler, but it turned out it could be more than that:

| host A | <--- [ job ] ---> | host B |
  ^                             ^
   \ [job]                    / [job]
    \                       /
     +-->| host C |--<---+

Hosts are VMs with Sparky agents installed:

Host(i) - Sparky agent: HTTP API (port 4000) + job queue

So, in a nutshell, Cromtit lets you run jobs in distributed environments, similar to Apache Airflow. However, instead of Python, Cromtit uses Bash and Raku as the underlying scenario languages.

> Three simple steps

1. Keep your project somewhere in Git:

tom --init
mkdir -p tasks/hello/
nano tasks/hello/task.bash # with the following content:
echo "Hello worker $(config worker)!"
export EDITOR=nano
tom --edit hello # opens .tom/hello.raku; add:
#!raku
task-run "tasks/hello", %(
  worker => %*ENV<WORKER>
);
git init
git add tasks/ .tom/hello.raku
git commit -a -m "my first job"
git push

2. Create a Cromtit flow:

nano jobs.yaml
projects:
  job1:
    path: https://github.com/melezhik/myjobs.git
    action: hello
    hosts:
      -
        url: https://hostA:4000
        vars:
          WORKER: A
      -
        url: https://hostB:4000
        vars:
          WORKER: B
      -
        url: https://hostC:4000
        vars:
          WORKER: C  

Apply the configuration:

cromt --conf jobs.yaml

3. Kick off a job:

Go to your local Sparky instance at http://127.0.0.1:4000, find the project named job1, and start it.

> Result

The “Hello world” job runs in parallel on 3 hosts, and each host receives its own WORKER environment variable. Here is the report from host B:

18:34:30 :: [repository] - index updated from http://sparrowhub.io/repo/api/v1/index
[task run: task.bash - tasks/hello]
[task stdout]
18:34:31 :: Hello worker B!
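To make the fan-out concrete, here is a local simulation in Bash, the language the task scenarios already use. It assumes nothing about Sparky's internals: each "host" is just a background subshell with its own WORKER value, standing in for the per-host `vars` in jobs.yaml, and `hello_task` and `run_on_hosts` are hypothetical helper names mirroring tasks/hello/task.bash.

```shell
#!/usr/bin/env bash
# Local simulation only: in a real Cromtit run each task executes on a
# remote Sparky agent. Here every "host" is a background subshell.

# Hypothetical helper mirroring tasks/hello/task.bash
hello_task() {
  echo "Hello worker ${WORKER}!"
}

# Run hello_task once per "host", in parallel, each with its own WORKER
run_on_hosts() {
  local worker
  for worker in A B C; do
    ( export WORKER="$worker"; hello_task ) &
  done
  wait  # block until every background "host" has finished
}
```

Calling `run_on_hosts` prints one greeting per worker; the line order can change between runs because the three jobs execute concurrently.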

> Advanced topics

Job dependencies

In this example, jobA depends on jobB, which in turn depends on jobC:

projects:
  jobA:
    path: https://github.com/melezhik/application.git
    action: deploy
    hosts:
      -
        url: https://app:4000
    before:
       -
         name: jobB # app needs a database
  jobB:
    path: https://github.com/melezhik/database.git
    action: deploy
    hosts:
      -
        url: https://database:4000
    before:
       -
         name: jobC # database needs a network
  jobC:
    path: https://github.com/melezhik/network.git
    action: deploy
    hosts:
      -
        url: https://infra:4000

Cromtit allows distributed job topologies of arbitrary complexity. As in Airflow, the job dependency graph should be a DAG (directed acyclic graph).
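The run order implied by the `before` lists can be checked with the standard `tsort(1)` utility. This is only an illustration of the DAG requirement, not how Cromtit schedules jobs internally; the edge list is written by hand from the YAML above, and `job_edges` and `run_order` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Illustration only: not Cromtit's scheduler.
# Each line "X Y" means "X must finish before Y starts",
# i.e. X is listed under Y's `before:` section.
job_edges() {
  printf '%s\n' \
    "jobC jobB" \
    "jobB jobA"
}

# Print the jobs in a dependency-respecting order. GNU tsort reports
# "input contains a loop" and exits non-zero if the graph is not a DAG.
run_order() {
  job_edges | tsort
}
```

`run_order` prints jobC, jobB, jobA, one per line. Adding a back edge such as `jobA jobC` creates a cycle, and GNU tsort then reports a loop and exits with a non-zero status, which is exactly the situation the DAG requirement rules out.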


Job artifacts

Jobs can share files with each other:

JobB scenario (produces the file):

nano tasks/create/task.bash
echo "Hello worker $(config worker)!"
mkdir -p out
touch out/patch.sql


JobA scenario (consumes the file):

nano tasks/deploy/task.bash
echo "Hello $(config worker)"
cat .artifacts/patch.sql # incoming artifacts are read from .artifacts/


Cromtit job configuration:

projects:
  jobA:
    path: https://github.com/melezhik/database
    action: deploy
    hosts:
      -
        url: https://database:4000
    artifacts:
       in:  
       - patch.sql
    before:
       -
         name: jobB # database needs a migration file
  jobB:
    path: https://github.com/melezhik/db-tools.git
    action: create
    hosts:
      -
        url: https://host1:4000
    artifacts:
       out:  
       -
         file: patch.sql
         path: out/patch.sql
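The artifact hand-off can be traced locally with the sketch below. It is a simulation under stated assumptions, not Cromtit's transfer mechanism: the two jobs normally run on different hosts, whereas here each job is just a directory in a temporary workspace, `cp` stands in for whatever Cromtit does between `artifacts.out` and `artifacts.in`, the file content is a placeholder (the original scenario creates an empty file with `touch`), and `simulate_handoff` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Local simulation only: each job is a directory in a temp workspace,
# and `cp` stands in for Cromtit's artifact transfer between hosts.
simulate_handoff() {
  local ws
  ws="$(mktemp -d)"

  # jobB (action: create) produces out/patch.sql with placeholder content
  mkdir -p "$ws/jobB/out"
  printf '%s\n' "-- placeholder migration" > "$ws/jobB/out/patch.sql"

  # artifacts.out on jobB (file: patch.sql, path: out/patch.sql)
  # -> artifacts.in on jobA (patch.sql), read from .artifacts/
  mkdir -p "$ws/jobA/.artifacts"
  cp "$ws/jobB/out/patch.sql" "$ws/jobA/.artifacts/patch.sql"

  # jobA (action: deploy) consumes the artifact, as in its task.bash
  cat "$ws/jobA/.artifacts/patch.sql"

  rm -rf "$ws"
}
```

Calling `simulate_handoff` prints the placeholder line, showing that jobA reads exactly the file jobB produced. The ordering matters: jobB must run first, which is why jobA lists it under `before`.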
  

Run jobs in Docker containers:

projects:
  job1:
    path: https://github.com/melezhik/fastspec.git
    action: hello
    sparrowdo:
      docker: alpine
      no_sudo: true
    hosts:
      -
        url: https://sparrowhub.io:4000
 

> TODO List

  • Aggregate reports from distributed hosts into a single entry point (right now reports are only accessible through each host's own UI)
  • Distribute load across a host pool (auto scaling) instead of specifying hosts explicitly
  • Test Cromtit on real-life examples (so far there are only a few projects: FastSpec, Raku-Alpine-Repo, RakustaWIP)
  • Run jobs sequentially vs. in parallel (default mode) – DONE
  • Job error handling (retries and/or ignoring errors)

> END

Thank you for reading. Please send your comments here or to the Raku IRC channel.
