Cromtit was born as a simple wrapper around the Sparky job scheduler, but it turned out it could be more than that:
```
| host A | <--- [ job ] ---> | host B |
     ^                          ^
      \ [job]                  / [job]
       \                      /
        +-->| host C |--<---+
```
Hosts are VMs with Sparky agents installed:
Host(i) - Sparky agent: HTTP API (port 4000) + job queue
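Since an agent is just an HTTP service, a quick way to check that a host is reachable is to hit its port with curl (hostA below stands for any agent host; the root URL simply serves Sparky's web UI, so this is a plain reachability check, not a dedicated health endpoint):

```bash
# expect an HTTP 200 from the Sparky agent's web service (port 4000 by default)
curl -s -o /dev/null -w "%{http_code}\n" http://hostA:4000
```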
So, in a nutshell, Cromtit allows one to run jobs in distributed environments, similar to Apache Airflow. However, instead of Python, Cromtit uses Bash and Raku as the underlying scenario languages.
> Three simple steps
1. Keep your project somewhere in Git:
```bash
tom --init
mkdir -p tasks/hello/
nano tasks/hello/task.bash
```
echo "Hello worker " $(config worker)!
```bash
export EDITOR=nano
tom --edit hello
```
```raku
#!raku

task-run "tasks/hello", %(
  worker => %*ENV<WORKER>
);
```
```bash
git init
git add tasks/ .tom/hello.raku
git commit -a -m "my first job"
git push
```
2. Create a Cromtit flow (jobs.yaml):
```yaml
projects:
  job1:
    path: https://github.com/melezhik/myjobs.git
    action: hello
    hosts:
      - url: https://hostA:4000
        vars:
          WORKER: A
      - url: https://hostB:4000
        vars:
          WORKER: B
      - url: https://hostC:4000
        vars:
          WORKER: C
```
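Because the whole flow lives in a single YAML file, a misplaced indent is the most common failure mode. A generic YAML linter (yamllint here; a stock tool, not part of Cromtit) can catch that before cromt reads the file:

```bash
# optional sanity check: any YAML validator will do
yamllint jobs.yaml
```

Then feed the config to Cromtit: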
```bash
cromt --conf jobs.yaml
```
3. Kick off a job:
Go to your local Sparky instance at http://127.0.0.1:4000, find the project named job1, and start it.
The “Hello world” job will run in parallel on the 3 hosts, passing a WORKER environment variable specific to each host:
```
18:34:30 :: [repository] - index updated from http://sparrowhub.io/repo/api/v1/index
[task run: task.bash - tasks/hello]
[task stdout]
18:34:31 :: Hello worker B!
```
> Advanced topics
Jobs can depend on other jobs. Say, jobA depends on jobB, which in turn depends on jobC:
```yaml
projects:

  jobA:
    path: https://github.com/melezhik/application.git
    action: deploy
    hosts:
      - url: https://app:4000
    before:
      - name: jobB # app needs a database

  jobB:
    path: https://github.com/melezhik/database.git
    action: deploy
    hosts:
      - url: https://database:4000
    before:
      - name: jobC # database needs a network

  jobC:
    path: https://github.com/melezhik/network.git
    action: deploy
    hosts:
      - url: https://infra:4000
```
Cromtit allows one to achieve arbitrary complexity in a distributed job topology. As with Airflow, a job's dependency graph should be a DAG (directed acyclic graph).
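As an aside, the before links are exactly the edge list a topological sort consumes. A quick way to see the resulting execution order is the stock Unix tsort utility (just an illustration of the DAG property, not how Cromtit itself schedules):

```bash
# one edge per line, "X Y" meaning X must run before Y
printf '%s\n' "jobC jobB" "jobB jobA" | tsort
# prints jobC, jobB, jobA (one per line) - a valid execution order;
# a cyclic dependency would make tsort report "input contains a loop"
```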
Jobs can share files with each other:
echo "Hello worker " $(config worker)! mkdir -p out touch out/patch.sql
echo "Hello " $(config worker) cat .artifacts/patch.sql
The Cromtit jobs configuration wires the two together: jobB publishes out/patch.sql under the artifact name patch.sql, and jobA declares it as an input:
```yaml
projects:

  jobA:
    path: https://github.com/melezhik/database
    action: deploy
    hosts:
      - url: https://database:4000
    artifacts:
      in:
        - patch.sql
    before:
      - name: jobB # database needs a migration file

  jobB:
    path: https://github.com/melezhik/db-tools.git
    action: create
    hosts:
      - url: https://host1:4000
    artifacts:
      out:
        - file: patch.sql
          path: out/patch.sql
```
Run jobs in Docker containers:
```yaml
projects:
  job1:
    path: https://github.com/melezhik/fastspec.git
    action: hello
    sparrowdo:
      docker: alpine
      no_sudo: true
    hosts:
      - url: https://sparrowhub.io:4000
```
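The sparrowdo section is handed down to the underlying Sparrowdo run on the target host. Judging by the keys, docker: alpine and no_sudo: true should correspond to Sparrowdo options of the same names (run the scenario inside an alpine container, without sudo); a rough standalone equivalent, with flag names inferred from the YAML keys rather than taken from the Sparrowdo docs, would be:

```bash
# assumption: the YAML keys map 1:1 to sparrowdo CLI flags
sparrowdo --docker=alpine --no_sudo
```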
> TODO List
- Aggregate reports across distributed hosts via a single entry point (right now reports are only accessible through each host's own UI)
- Allow distributing load across a pool of hosts (auto scaling) instead of specifying hosts explicitly
- Test Cromtit on real-life examples (so far there are only two projects, FastSpec and Raku-Alpine-Repo, plus Rakusta, which is a WIP)
- Run jobs sequentially vs in parallel (the default mode) - DONE
- Job error handling (retries and/or ignoring errors)
Thank you for reading. Please send your comments here or to the Raku IRC channel.