Cromtit was born as a simple wrapper around the Sparky job scheduler, but it turned out it could be more than that:
| host A | <--- [ job ] ---> | host B |
     ^                           ^
      \ [job]                   / [job]
       \                       /
        +--->| host C |<------+
Hosts are VMs with Sparky agents installed:
Host(i) - Sparky agent: HTTP API (port 4000) + job queue
So, in a nutshell, Cromtit allows one to run jobs in distributed environments, similar to Apache Airflow. However, instead of Python, Cromtit uses Bash and Raku as the underlying scenario languages.
> Three simple steps
1. Keep your project somewhere in Git:
tom --init
mkdir -p tasks/hello/
nano tasks/hello/task.bash

tasks/hello/task.bash content:

echo "Hello worker" $(config worker)!
export EDITOR=nano
tom --edit hello

.tom/hello.raku content:

#!raku
task-run "tasks/hello", %(
  worker => %*ENV<WORKER>
);
git init
git add tasks/ .tom/hello.raku
git commit -a -m "my first job"
git push
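After these steps the repository you just pushed should look roughly like this (tasks/ holds the Bash task, .tom/ holds the tom scenario):

myjobs/
  .tom/
    hello.raku
  tasks/
    hello/
      task.bash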
2. Create a Cromtit flow:
nano jobs.yaml
projects:
  job1:
    path: https://github.com/melezhik/myjobs.git
    action: hello
    hosts:
      -
        url: https://hostA:4000
        vars:
          WORKER: A
      -
        url: https://hostB:4000
        vars:
          WORKER: B
      -
        url: https://hostC:4000
        vars:
          WORKER: C
Apply the configuration:
cromt --conf jobs.yaml
3. Kick off a job:
Go to your local Sparky instance at http://127.0.0.1:4000, find the project named job1, and start it.
> Result
The “hello” job will run in parallel on the three hosts, each run receiving the WORKER
environment variable specific to its host:
18:34:30 :: [repository] - index updated from http://sparrowhub.io/repo/api/v1/index
[task run: task.bash - tasks/hello]
[task stdout]
18:34:31 :: Hello worker B!
> Advanced topics
Job dependencies
Say jobA depends on jobB, which in turn depends on jobC:
projects:
  jobA:
    path: https://github.com/melezhik/application.git
    action: deploy
    hosts:
      -
        url: https://app:4000
    before:
      -
        name: jobB # app needs a database
  jobB:
    path: https://github.com/melezhik/database.git
    action: deploy
    hosts:
      -
        url: https://database:4000
    before:
      -
        name: jobC # database needs a network
  jobC:
    path: https://github.com/melezhik/network.git
    action: deploy
    hosts:
      -
        url: https://infra:4000
Cromtit allows one to build distributed job topologies of arbitrary complexity; similarly to Airflow, a job's dependency graph should be a DAG – a directed acyclic graph.
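To make the ordering concrete, here is a minimal Raku sketch (illustration only, not Cromtit's actual code) that resolves the jobA -> jobB -> jobC chain above into a run order and dies if the graph contains a cycle; the %deps hash simply mirrors the before: declarations:

my %deps =
    jobA => ['jobB'],   # mirrors "before: jobB" under jobA
    jobB => ['jobC'],   # mirrors "before: jobC" under jobB
    jobC => [],
;

sub run-order (%deps) {
    my @order;
    my %done;
    sub visit ($job, %path) {
        die "cycle detected at $job" if %path{$job};   # not a DAG
        return if %done{$job};
        visit($_, %( |%path, $job => True )) for %deps{$job}.list;
        %done{$job} = True;
        @order.push: $job;   # dependencies land first
    }
    visit($_, %()) for %deps.keys.sort;
    return @order;
}

say run-order(%deps);   # [jobC jobB jobA]

In Cromtit itself you only declare the before: edges in the YAML configuration and the scheduler takes care of the ordering.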
Job artifacts
Jobs can share files with each other:
jobB scenario (it creates the migration file):
nano tasks/create/task.bash
echo "Hello worker " $(config worker)!
mkdir -p out
touch out/patch.sql
jobA scenario (it consumes the migration file):
nano tasks/deploy/task.bash
echo "Hello " $(config worker)
cat .artifacts/patch.sql
The Cromtit jobs configuration ties the two together: jobB publishes out/patch.sql as an artifact named patch.sql, and jobA, which depends on jobB, picks it up under .artifacts/ in its working directory:
projects:
  jobA:
    path: https://github.com/melezhik/database
    action: deploy
    hosts:
      -
        url: https://database:4000
    artifacts:
      in:
        - patch.sql
    before:
      -
        name: jobB # database needs a migration file
  jobB:
    path: https://github.com/melezhik/db-tools.git
    action: create
    hosts:
      -
        url: https://host1:4000
    artifacts:
      out:
        -
          file: patch.sql
          path: out/patch.sql
Run jobs in Docker containers
The sparrowdo section below makes the job run inside a Docker container (alpine here) without using sudo:
projects:
  job1:
    path: https://github.com/melezhik/fastspec.git
    action: hello
    sparrowdo:
      docker: alpine
      no_sudo: true
    hosts:
      -
        url: https://sparrowhub.io:4000
> TODO List
- Aggregate reports across distributed hosts via a single entry point (right now reports are only accessible through the individual hosts' UIs)
- Allow distributing load across a pool of hosts (auto-scaling) instead of specifying hosts explicitly
- Test Cromtit on real-life examples (so far only FastSpec, Raku-Alpine-Repo, and the work-in-progress Rakusta)
- Run jobs sequentially vs. in parallel (the default mode) – DONE
- Job error handling (retries and/or ignoring errors)
> END
Thank you for reading. Please send your comments here or to the Raku IRC channel.