cd ~/apps/marathon-0.6.0 && MESOS_NATIVE_LIBRARY='/usr/local/lib/libmesos.dylib' ./bin/start --master localhost:5050 --zk zk://localhost:2181/marathon
Check that the services are OK:
The Marathon console is available at http://localhost:8080/
There should be no active Marathon service yet.
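The same check can be scripted. The sketch below is only an illustration: it assumes curl is installed, reuses the console addresses from the start command above, and the `http_status` helper is a hypothetical name. The `/v2/apps` call in the comment assumes this Marathon version exposes the v2 REST API.

```shell
MARATHON_URL="${MARATHON_URL:-http://localhost:8080}"
MESOS_URL="${MESOS_URL:-http://localhost:5050}"

# Print the HTTP status code for a URL, or 000 when curl is missing
# or the URL cannot be reached.
http_status() {
  if ! command -v curl >/dev/null 2>&1; then
    echo 000
    return 0
  fi
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
  return 0
}

echo "Marathon console: $(http_status "$MARATHON_URL/")"
echo "Mesos console:    $(http_status "$MESOS_URL/")"

# Assumption: with the v2 REST API, this lists the deployed applications,
# which should be empty at this point:
#   curl -s "$MARATHON_URL/v2/apps"
```

A status of 200 means the console answered; 000 means nothing is listening at that address.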
Repository
An easy way to deploy the same application on multiple nodes is to use a shared repository.
Each version of the app and its dependencies must be available in the Deployment Repository.
Here is a simple structure for a Simple Web Service called SWS:
“null%”, so no error. Everything looks good, right? Hmm. No.
Why is the task not started in the Marathon console?
If we look at the Marathon console logs, there are no details:
[2014-11-04 11:43:47,195] INFO Assigned some ports for sws3: [0] -> [15603] (mesosphere.marathon.MarathonSchedulerService:78)
[2014-11-04 11:43:47,365] INFO Starting app sws3 (mesosphere.marathon.MarathonScheduler:223)
[2014-11-04 11:43:47,371] INFO Need to scale sws3 from 0 up to 1 instances (mesosphere.marathon.MarathonScheduler:338)
[2014-11-04 11:43:47,371] INFO Queueing 1 new tasks for sws3 (0 queued) (mesosphere.marathon.MarathonScheduler:344)
If we look at the Mesos console, we can see that the framework's task is staging and repeatedly failing:
But if we look at the failed task's error log, it’s clearer:
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1104 16:52:49.402339 37630720 fetcher.cpp:76] Fetching URI 'http://127.0.0.1:8000/SWS/v1-first_revision/spray-test-assembly-0.1.jar'
I1104 16:52:49.403532 37630720 fetcher.cpp:126] Downloading 'http://127.0.0.1:8000/SWS/v1-first_revision/spray-test-assembly-0.1.jar' to '/tmp/mesos/slaves/20141104-103858-16777343-5050-9421-0/frameworks/20140902-013203-16777343-5050-4471-0000/executors/sws3.a09d4717-643a-11e4-bf70-e0f84706f8b6/runs/4209c508-40b7-4660-8c15-6b8a1a5dbe23/spray-test-assembly-0.1.jar'
E1104 16:52:49.404134 37630720 fetcher.cpp:129] Error downloading resource: Couldn't connect to server
Failed to fetch: http://127.0.0.1:8000/SWS/v1-first_revision/spray-test-assembly-0.1.jar
Failed to synchronize with slave (it's probably exited)
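The fetcher cannot connect because nothing is listening on port 8000. As a rough sketch, the repository can be served by any static HTTP server. The `<app>/<version>/<file>` layout below is inferred from the URL in the log; the `artifact_url` helper, the `~/repository` directory and the use of Python's built-in HTTP server are assumptions for illustration.

```shell
# Build the URL of an artifact stored as <base>/<app>/<version>/<file>.
artifact_url() {
  printf '%s/%s/%s/%s\n' "$1" "$2" "$3" "$4"
}

REPO_URL="${REPO_URL:-http://127.0.0.1:8000}"
JAR_URL=$(artifact_url "$REPO_URL" SWS v1-first_revision spray-test-assembly-0.1.jar)
echo "$JAR_URL"

# Serve the repository directory (path is an assumption) on port 8000:
#   cd ~/repository && python3 -m http.server 8000
# Then confirm from every node that the jar can be fetched:
#   curl -fsSI "$JAR_URL"
```

Note that 127.0.0.1 only works while everything runs on one machine; other nodes need an address they can actually reach.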
So we enable the repository and finally get a running task:
Registered executor on 172.16.0.17
Starting task sws2.c368adc6-643c-11e4-bf70-e0f84706f8b6
sh -c 'java -jar spray-test-assembly-0.1.jar $PORT'
Forked command at 17152
Command exited with status 1 (pid: 17152)
No file is downloaded or installed. The task starts fine, but the java process exits with status 1 because the jar file was not downloaded.
I1107 10:40:25.843163 2021741312 exec.cpp:132] Version: 0.20.0
I1107 10:40:25.844281 217133056 exec.cpp:206] Executor registered on slave 20141107-103704-16777343-5050-4138-0
sh: blah: command not found
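Output like the lines above ends up in the task sandbox on the slave. As a minimal sketch, assuming the slave work directory is /tmp/mesos (as in the download path logged earlier) and that each run directory holds a stderr file, the error logs can be located like this. The `list_task_stderr` helper is a hypothetical name.

```shell
# Print every executor stderr file found under a Mesos slave work directory.
list_task_stderr() {
  work_dir="${1:-/tmp/mesos}"
  find "$work_dir/slaves" -type f -path '*/runs/*' -name stderr 2>/dev/null
}

# Show the last lines of each error log, prefixed by its path.
list_task_stderr /tmp/mesos | while IFS= read -r log_file; do
  echo "== $log_file"
  tail -n 20 "$log_file"
done
```

The same files are also reachable from the Mesos console, which is often quicker when the slave is a remote machine.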
There is another variant when the task runs on another node: Marathon tries to start the task as the same user that started the Marathon process. If that user does not exist on the slave node, we get something like:
E0904 00:25:34.097573 147271680 slave.cpp:2484] Container 'c6d1a64f-1fd9-48e6-9fea-32bc794b508c' for executor 'webtest2.38a08eb7-33b9-11e4-b27e-e0f84706f8b6' of framework '20140902-013203-16777343-5050-4471-0000' failed to start: Failed to redirect stdout: Failed to chown: Failed to get user information for 'jlcanela': Undefined error: 0
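This can be checked before deploying. The sketch below tests whether an account exists on the machine where it runs; the `user_exists` helper, the `MARATHON_USER` variable and the host name in the ssh comment are hypothetical names used for illustration.

```shell
# Return success if the given account exists on this machine.
user_exists() {
  id -u "$1" >/dev/null 2>&1
}

# The account that started Marathon; tasks run under the same name on the slaves.
MARATHON_USER="${MARATHON_USER:-$(id -un)}"

if user_exists "$MARATHON_USER"; then
  echo "user '$MARATHON_USER' exists on $(uname -n)"
else
  echo "user '$MARATHON_USER' is missing on $(uname -n)" >&2
fi

# To check a remote slave (hypothetical host name), run the same test over ssh:
#   ssh slave1.example.com "id -u $MARATHON_USER"
```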
Important:
Be sure you know how to access the Mesos logs: without them, you’re lost.
Make sure your repository is accessible from all the nodes.
Always use the "uris": ["url1"] syntax.
Always ensure the *nix accounts are created on all the slave nodes.
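To illustrate the "uris" rule, here is a sketch of an application definition in which "uris" is a JSON array even though it holds a single URL. The app id, command, port request and jar URL are taken from the logs above; the cpus, mem and instances values, the sws.json file name and the `/v2/apps` endpoint in the comment are assumptions and may need adjusting for your Marathon version.

```shell
# Write a hypothetical app definition; "uris" is a JSON array, even for one URL.
APP_FILE="${APP_FILE:-sws.json}"   # file name is an assumption

# The quoted 'EOF' keeps $PORT literal so Marathon can substitute it at launch.
cat > "$APP_FILE" <<'EOF'
{
  "id": "sws3",
  "cmd": "java -jar spray-test-assembly-0.1.jar $PORT",
  "cpus": 0.5,
  "mem": 256,
  "instances": 1,
  "ports": [0],
  "uris": ["http://127.0.0.1:8000/SWS/v1-first_revision/spray-test-assembly-0.1.jar"]
}
EOF

# Submit it to Marathon (assumes the v2 REST API on the console port):
#   curl -X POST -H 'Content-Type: application/json' \
#        http://localhost:8080/v2/apps -d @"$APP_FILE"
```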