Jean-Luc’s Blog

An IT Blog.

Starting Marathon [En]

Let’s start our first Marathon service and then present some common pitfalls in the troubleshooting section.

Basic setup

One can download Marathon directly from the website https://mesosphere.github.io/marathon/.

Let’s deploy Marathon in ~/apps/marathon-0.6.0

Starting the services

Start the Marathon server:

cd ~/apps/marathon-0.6.0 && MESOS_NATIVE_LIBRARY='/usr/local/lib/libmesos.dylib' ./bin/start --master localhost:5050 --zk zk://localhost:2181/marathon

Check that the services are OK:
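One way to do this check from a script, as a rough sketch: try a TCP connection to each port. Ports 5050 and 2181 come from the start command above and 8080 is the Marathon port used later in this post; the function names are made up for this example. An open port only tells us that something is listening, not that the service is healthy.

```python
import socket

# Ports taken from this post: ZooKeeper 2181, Mesos master 5050, Marathon 8080.
SERVICES = {
    "zookeeper": ("localhost", 2181),
    "mesos-master": ("localhost", 5050),
    "marathon": ("localhost", 8080),
}


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services=SERVICES):
    """Map each service name to True/False depending on whether its port answers."""
    return {name: is_listening(host, port) for name, (host, port) in services.items()}
```

Calling check_services() returns a dictionary such as {"zookeeper": True, ...}; any False entry is the first thing to fix before going further.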

Repository

To deploy the same application on multiple nodes, an easy way is to use a shared repository. Each version of the app and its dependencies must be available in the deployment repository.

Here is a simple structure for a Simple Web Service called SWS:

└── SWS
    ├── v1-first_revision
    │   └── spray-test-assembly-0.1.jar
    ├── v2-bug_included
    │   └── my_app_v1.1.jar
    └── v3-bug_fixed
        └── my_app_v1.2.jar

The easiest way to serve this repository:

git clone (**TODO : create a git repository **)
cd dm_app_repository
python -m SimpleHTTPServer   # Python 2; with Python 3 use: python3 -m http.server

Check the repository is available: 

$ curl http://127.0.0.1:8000/SWS/
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"><html>
<title>Directory listing for /SWS/</title>
[...]
<li><a href="v1-first_revision/">v1-first_revision/</a>
<li><a href="v2-bug_included/">v2-bug_included/</a>
<li><a href="v3-bug_fixed/">v3-bug_fixed/</a>
[...]

Test the application locally:

curl -o spray-test-assembly-0.1.jar http://127.0.0.1:8000/SWS/v1-first_revision/spray-test-assembly-0.1.jar
java -jar spray-test-assembly-0.1.jar 8091

Open a browser at http://127.0.0.1:8091/; you should see a hello message.

Deploy with Marathon

We are ready to start playing with Marathon. Let’s deploy our first app server with Marathon:

curl -X POST -H "Content-Type: application/json" -d '{ "id": "sws1", "cmd": "java -jar spray-test-assembly-0.1.jar $PORT", "mem": 256.0, "instances": 1, "uris": [ "http://127.0.0.1:8000/SWS/v1-first_revision/spray-test-assembly-0.1.jar" ]}' http://127.0.0.1:8080/v2/apps

We can access the service directly from the Marathon console.
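The one-line JSON in the curl command is hard to read and easy to get wrong, so here is a sketch of the same request built in Python. The field names and the URL are the ones from the curl command; build_app and build_request are hypothetical helper names, and deriving the jar name from the URL is a convenience of this example.

```python
import json
import urllib.request

MARATHON_APPS_URL = "http://127.0.0.1:8080/v2/apps"


def build_app(app_id, jar_url, mem=256.0, instances=1):
    """Build the same app definition as the curl command in this post."""
    jar_name = jar_url.rsplit("/", 1)[-1]  # file name as it will appear in the sandbox
    return {
        "id": app_id,
        "cmd": f"java -jar {jar_name} $PORT",
        "mem": mem,
        "instances": instances,
        "uris": [jar_url],  # always a list, even for a single URL
    }


def build_request(app, url=MARATHON_APPS_URL):
    """Wrap the app definition in a POST request with a JSON content type."""
    body = json.dumps(app).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )
```

To actually send it, pass the result of build_request(build_app("sws1", jar_url)) to urllib.request.urlopen.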

If we need to deploy a second instance:

  • go to the Marathon console at http://127.0.0.1:8080/
  • click the scale button; the scale window is displayed
  • select 2 instances and press OK; the Marathon console is updated with 2 instances
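The same scaling can probably be scripted instead of clicked. The sketch below assumes that the v2 REST API of your Marathon version accepts a PUT on /v2/apps/{id} with a new "instances" value; check this against the API documentation for your version. build_scale_request is a hypothetical helper name.

```python
import json
import urllib.request


def build_scale_request(app_id, instances, base_url="http://127.0.0.1:8080"):
    """Build a PUT request asking Marathon to run `instances` copies of app_id.

    Assumption: the v2 API accepts PUT /v2/apps/{id} with {"instances": n}.
    """
    if instances < 0:
        raise ValueError("instances must be >= 0")
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v2/apps/{app_id}", data=body, method="PUT",
        headers={"Content-Type": "application/json"},
    )
```

Sending build_scale_request("sws1", 2) with urllib.request.urlopen should have the same effect as the three steps above.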

Troubleshooting

Task identifier does not support uppercase

Let’s use an UPPERCASE identifier (SWS1), starting the task with:

$ curl -X POST -H "Content-Type: application/json" -d '{ "id": "SWS1", "cmd": "java -jar spray-test-assembly-0.1.jar $PORT", "mem": 256.0, "instances": 1, "uris": [ "http://127.0.0.1:8000/SWS/v1-first_revision/spray-test-assembly-0.1.jar" ]}' http://127.0.0.1:8080/v2/apps
{"errors":[{"attribute":"id","error":"must match \"^(([a-z0-9]|[a-z0-9][a-z0-9\\-]*[a-z0-9])\\.)*([a-z0-9]|[a-z0-9][a-z0-9\\-]*[a-z0-9])$\""}]}%

OK, we should have used sws1 as the task identifier.
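To catch this before calling Marathon, we can test the identifier locally against the pattern from the error message. This is a small sketch: the pattern is copied from the message with the JSON escaping removed, and is_valid_app_id is a made-up name. Other Marathon versions may use a different rule.

```python
import re

# Pattern copied from the Marathon error message (JSON escaping removed).
APP_ID_PATTERN = re.compile(
    r"^(([a-z0-9]|[a-z0-9][a-z0-9\-]*[a-z0-9])\.)*"
    r"([a-z0-9]|[a-z0-9][a-z0-9\-]*[a-z0-9])$"
)


def is_valid_app_id(app_id):
    """Return True if app_id matches the pattern Marathon reported."""
    return APP_ID_PATTERN.fullmatch(app_id) is not None


print(is_valid_app_id("sws1"))  # True
print(is_valid_app_id("SWS1"))  # False
```

In short: lowercase letters, digits and hyphens, in dot-separated parts, and no part may start or end with a hyphen.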

Repository is down when starting a task

(We have killed the repository, that is, the Python server process in this case.)

Start the task with:

curl -X POST -H "Content-Type: application/json" -d '{ "id": "sws3", "cmd": "java -jar spray-test-assembly-0.1.jar $PORT", "mem": 256.0, "instances": 1, "uris": ["http://127.0.0.1:8000/SWS/v1-first_revision/spray-test-assembly-0.1.jar"] }' http://127.0.0.1:8080/v2/apps
null%

“null%”, so no error. Everything looks good, right? Hmm, no. Why is the task not started in the Marathon console?

If we look at the Marathon console logs, there are no details:

[2014-11-04 11:43:47,195] INFO Assigned some ports for sws3: [0] -> [15603] (mesosphere.marathon.MarathonSchedulerService:78)
[2014-11-04 11:43:47,365] INFO Starting app sws3 (mesosphere.marathon.MarathonScheduler:223)
[2014-11-04 11:43:47,371] INFO Need to scale sws3 from 0 up to 1 instances (mesosphere.marathon.MarathonScheduler:338)
[2014-11-04 11:43:47,371] INFO Queueing 1 new tasks for sws3 (0 queued) (mesosphere.marathon.MarathonScheduler:344)

If we look at the Mesos console, we can see that the framework is staging and repeatedly failing.

But if we have a look at the Failed task error log, it’s clearer:

WARNING: Logging before InitGoogleLogging() is written to STDERR
I1104 16:52:49.402339 37630720 fetcher.cpp:76] Fetching URI 'http://127.0.0.1:8000/SWS/v1-first_revision/spray-test-assembly-0.1.jar'
I1104 16:52:49.403532 37630720 fetcher.cpp:126] Downloading 'http://127.0.0.1:8000/SWS/v1-first_revision/spray-test-assembly-0.1.jar' to '/tmp/mesos/slaves/20141104-103858-16777343-5050-9421-0/frameworks/20140902-013203-16777343-5050-4471-0000/executors/sws3.a09d4717-643a-11e4-bf70-e0f84706f8b6/runs/4209c508-40b7-4660-8c15-6b8a1a5dbe23/spray-test-assembly-0.1.jar'
E1104 16:52:49.404134 37630720 fetcher.cpp:129] Error downloading resource: Couldn't connect to server
Failed to fetch: http://127.0.0.1:8000/SWS/v1-first_revision/spray-test-assembly-0.1.jar
Failed to synchronize with slave (it's probably exited)

So restart the repository, and we finally get a running task.

Not providing the uris correctly

Marathon expects the parameter "uris": [ "http://127.0.0.1:8000/SWS/v1-first_revision/spray-test-assembly-0.1.jar" ]. What happens if we use the following command instead, with a single "uri" string?

curl -X POST -H "Content-Type: application/json" -d '{ "id": "sws2", "cmd": "java -jar spray-test-assembly-0.1.jar $PORT", "mem": 256.0, "instances": 1, "uri": "http://127.0.0.1:8000/SWS/v1-first_revision/spray-test-assembly-0.1.jar" }' http://127.0.0.1:8080/v2/apps

The Marathon console shows the task switching between the STAGING and FAILED states.

When we check the framework error log:

I1104 17:08:07.138018 125543168 exec.cpp:132] Version: 0.20.0
I1104 17:08:07.139219 184311808 exec.cpp:206] Executor registered on slave 20141104-103858-16777343-5050-9421-0
Error: Unable to access jarfile spray-test-assembly-0.1.jar

When we check the Framework STDOUT:

Registered executor on 172.16.0.17
Starting task sws2.c368adc6-643c-11e4-bf70-e0f84706f8b6
sh -c 'java -jar spray-test-assembly-0.1.jar $PORT'
Forked command at 17152
Command exited with status 1 (pid: 17152)

No file is downloaded or installed. The task starts fine, but the java process exits with status 1 because the jar file was never downloaded.
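Because Marathon did not complain about the "uri" key here, a small client-side check of the app definition can help. The sketch below is not Marathon's own validation, just two rules drawn from this section; find_uri_problems is a made-up name.

```python
def find_uri_problems(app):
    """Return a list of human-readable problems with the URI fields of an app definition."""
    problems = []
    if "uri" in app:
        problems.append('unexpected key "uri": use "uris" with a list of URLs')
    if "uris" in app:
        uris = app["uris"]
        if not isinstance(uris, list) or not all(isinstance(u, str) for u in uris):
            problems.append('"uris" must be a list of URL strings')
    return problems


jar = "http://127.0.0.1:8000/SWS/v1-first_revision/spray-test-assembly-0.1.jar"
print(find_uri_problems({"id": "sws2", "uri": jar}))     # one problem reported
print(find_uri_problems({"id": "sws1", "uris": [jar]}))  # []
```

An app without any "uris" key is left alone, since a command that needs no download is perfectly valid.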

An invalid command

Let’s use the (in)famous blah command:

curl -X POST -H "Content-Type: application/json" -d '{ "id": "err", "cmd": "blah", "mem": 256.0, "instances": 1 }' http://127.0.0.1:8080/v2/apps

In the Mesos console we get:

I1107 10:40:25.843163 2021741312 exec.cpp:132] Version: 0.20.0
I1107 10:40:25.844281 217133056 exec.cpp:206] Executor registered on slave 20141107-103704-16777343-5050-4138-0
sh: blah: command not found

There is another variant when the task runs on another node: Marathon tries to start the task as the same user that started the Marathon process. If that user does not exist on the slave node, we get something like:

E0904 00:25:34.097573 147271680 slave.cpp:2484] Container 'c6d1a64f-1fd9-48e6-9fea-32bc794b508c' for executor 'webtest2.38a08eb7-33b9-11e4-b27e-e0f84706f8b6' of framework '20140902-013203-16777343-5050-4471-0000' failed to start: Failed to redirect stdout: Failed to chown: Failed to get user information for 'jlcanela': Undefined error: 0
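To check for this ahead of time, we can ask each slave node whether the account exists. Here is a minimal sketch using Python's standard pwd module (Unix only); user_exists is a hypothetical helper, and you would run it on every slave node with the name of the user that starts Marathon.

```python
import pwd


def user_exists(username):
    """Return True if the local account database knows this user name."""
    try:
        pwd.getpwnam(username)
        return True
    except KeyError:
        # getpwnam raises KeyError when there is no such entry
        return False
```

For the log above, user_exists("jlcanela") would have to return True on every slave node.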

Important:

  • Be sure you know how to access the Mesos logs: without them, you’re lost
  • Make sure your repository is accessible from all the nodes
  • Always use the "uris": [ "url1" ] syntax
  • Always ensure the *nix accounts are created on all the slave nodes
