11) Message boards : Cafe : User of the year? (Message 610)
Posted 1404 days ago by whynot
Fabio, another one.
12) Message boards : Cafe : User of the year? (Message 607)
Posted 1467 days ago by whynot

Was that user "pseudo"?


No, *it* was 7681.
13) Message boards : Number crunching : All Linux tasks error with "process got signal 11" (Message 601)
Posted 1481 days ago by whynot
(This is me speculating.) As you can see, I've been living with this for years. Having observed it all that time, I think what's happening here is:



  • science code requests a shared memory segment;

  • for whatever reason that request fails (i.e. returns NULL);

  • in contrast to any other (i.e. respectful) science code, uc passes it along (respectful science code exits "with zero status but no 'finished' file");

  • NULL is passed to libc;

  • libc croaks on uc with 'process got signal 11' (11 is misleading here, it's not a real SEGFAULT);



If uc were 'respectful science code', it would be restarted by the client some time later.
uc isn't.
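
A minimal sketch of the difference speculated about above, written in Python rather than whatever the app itself uses. All three function names are hypothetical, `None` models the NULL result, and Python's `TypeError` only stands in for the crash inside libc.

```python
def request_segment(available):
    """Hypothetical stand-in for a shared-memory request; None models NULL."""
    return bytearray(16) if available else None


def respectful_step(segment):
    """Check the result first and back off quietly if the request failed."""
    if segment is None:
        # Assumed behaviour: exit with zero status and no 'finished' file,
        # so the client restarts the task some time later.
        return "retry later"
    segment[0] = 1
    return "ok"


def careless_step(segment):
    """Pass the result along unchecked, as uc is suspected of doing."""
    segment[0] = 1  # raises TypeError when segment is None
    return "ok"
```

With a failed request, `respectful_step(None)` returns "retry later", while `careless_step(None)` blows up at the first use of the segment.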

p.s. (based on my previous success with crap on home page) Fabio, where are my fscking sources?

14) Message boards : Number crunching : App source code (Message 587)
Posted 1544 days ago by whynot


is the source code available for download ?

Not yet, but I'll put it in the next days online.
Fabio


That would be appreciated a lot. Since my run times are approaching three hours, those segfaults are starting to bother me.
15) Message boards : Number crunching : Always sending 50 WUs regardless of requested amount (Message 575)
Posted 1600 days ago by whynot
[didn't know we have message limits; the forum cut off the pathetic part of last week's post; luckily, I had network problems last time]


23-May-2013 01:33:33 [---] [wfd]: work fetch start
23-May-2013 01:33:33 [primaboinca] chosen: major shortfall CPU: 0.00 inst, 103518.66 sec
23-May-2013 01:33:33 [---] [wfd] ------- start work fetch state -------
23-May-2013 01:33:33 [---] [wfd] target work buffer: 28797.12 + 14402.88 sec
23-May-2013 01:33:33 [---] [wfd] CPU: shortfall 103518.66 nidle 0.00 saturated 28430.55 busy 0.00 RS fetchable 100.00 runnable 300.00
23-May-2013 01:33:33 [ABC@home] [wfd] CPU: fetch share 0.00 LTD 0.00 backoff dt 14122.52 int 86400.00
23-May-2013 01:33:33 [SZTAKI Desktop Grid] [wfd] CPU: fetch share 0.00 LTD -52535.69 backoff dt 0.00 int 0.00 (comm deferred)
23-May-2013 01:33:33 [PrimeGrid] [wfd] CPU: fetch share 0.00 LTD 0.00 backoff dt 0.00 int 0.00 (comm deferred)
23-May-2013 01:33:33 [primaboinca] [wfd] CPU: fetch share 1.00 LTD -151432.65 backoff dt 0.00 int 0.00 (overworked)
23-May-2013 01:33:33 [ABC@home] [wfd] overall LTD 0.00
23-May-2013 01:33:33 [SZTAKI Desktop Grid] [wfd] overall LTD -61975.66
23-May-2013 01:33:33 [PrimeGrid] [wfd] overall LTD -7903.51
23-May-2013 01:33:33 [primaboinca] [wfd] overall LTD -162476.08
23-May-2013 01:33:33 [---] [wfd] ------- end work fetch state -------
23-May-2013 01:33:33 [primaboinca] [wfd] request: 103518.66 sec CPU (103518.66 sec, 0.00)
23-May-2013 01:33:33 [primaboinca] Sending scheduler request: To fetch work.
23-May-2013 01:33:33 [primaboinca] Reporting 15 completed tasks, requesting new tasks
23-May-2013 01:33:52 [primaboinca] Scheduler request completed: got 80 new tasks
23-May-2013 01:33:52 [---] [wfd] Request work fetch: RPC complete


What can we do about it? We can either get out of this building or, as they say, "We can be patient". For a long time I've been bothered by the client stuffing in a day's worth of work per core. Now I understand what happens. After a couple of tries (about a week) the client gives up on stabilizing the running workload and falls back to a daily basis (as you can see, I have the buffer set to 8+4 hours).
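
The 8+4 hour buffer can be read straight off the `target work buffer` line in the log. A small sketch, assuming the line format shown in the excerpt above; `buffer_hours` is a hypothetical helper, not part of the BOINC client.

```python
import re

# Line copied from the log excerpt above.
LINE = "23-May-2013 01:33:33 [---] [wfd] target work buffer: 28797.12 + 14402.88 sec"


def buffer_hours(line):
    """Return the two buffer components of a 'target work buffer' line, in hours."""
    match = re.search(r"target work buffer: ([\d.]+) \+ ([\d.]+) sec", line)
    if match is None:
        raise ValueError("not a 'target work buffer' line")
    first, second = (float(value) / 3600.0 for value in match.groups())
    return first, second


first, second = buffer_hours(LINE)
print(f"buffer: {first:.2f} h + {second:.2f} h = {first + second:.2f} h")
```

The two parts come out at roughly 8 and 4 hours, 12 hours in total, while the request in the same excerpt is 103518.66 sec, well over twice that.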

I have only one question. Fabio, tell me: is it possible to do something about user-of-the-day? That litter has been hanging on the front page for a darn month! If you don't care, shut the thing down; what's the big deal?
16) Message boards : Number crunching : Always sending 50 WUs regardless of requested amount (Message 573)
Posted 1607 days ago by whynot
Recently I've made some observations that I'm not so glad to present (noise from parallel projects deleted).


20-May-2013 05:33:39 [---] [wfd]: work fetch start
20-May-2013 05:33:39 [primaboinca] chosen: minor shortfall CPU: 0.00 inst, 1393.80 sec
20-May-2013 05:33:39 [---] [wfd] ------- start work fetch state -------
20-May-2013 05:33:39 [---] [wfd] target work buffer: 28797.12 + 14402.88 sec
20-May-2013 05:33:39 [---] [wfd] CPU: shortfall 1393.80 nidle 0.00 saturated 41806.20 busy 0.00 RS fetchable 100.00 runnable 300.00
20-May-2013 05:33:39 [ABC@home] [wfd] CPU: fetch share 0.00 LTD 0.00 backoff dt 0.00 int 61440.00 (comm deferred)
20-May-2013 05:33:39 [SZTAKI Desktop Grid] [wfd] CPU: fetch share 0.00 LTD -18454.46 backoff dt 0.00 int 0.00 (comm deferred)
20-May-2013 05:33:39 [PrimeGrid] [wfd] CPU: fetch share 0.00 LTD -225.14 backoff dt 0.00 int 0.00 (comm deferred)
20-May-2013 05:33:39 [primaboinca] [wfd] CPU: fetch share 1.00 LTD -59228.18 backoff dt 0.00 int 0.00
20-May-2013 05:33:39 [ABC@home] [wfd] overall LTD 0.00
20-May-2013 05:33:39 [SZTAKI Desktop Grid] [wfd] overall LTD -28786.98
20-May-2013 05:33:39 [PrimeGrid] [wfd] overall LTD -6654.77
20-May-2013 05:33:39 [primaboinca] [wfd] overall LTD -75407.47
20-May-2013 05:33:39 [---] [wfd] ------- end work fetch state -------
20-May-2013 05:33:39 [primaboinca] [wfd] request: 1393.80 sec CPU (1393.80 sec, 0.00)
20-May-2013 05:33:39 [primaboinca] Sending scheduler request: To fetch work.
20-May-2013 05:33:39 [primaboinca] Reporting 46 completed tasks, requesting new tasks
20-May-2013 05:33:53 [primaboinca] Scheduler request completed: got 2 new tasks


At the time of observation the real estimated run time is ~7000 sec. As you can see, the client asks for 1393.80 sec, which would be 0.199 WU. Instead it gets 2.

Another one, just a couple hours later.


20-May-2013 09:33:19 [---] [wfd]: work fetch start
20-May-2013 09:33:19 [primaboinca] chosen: major shortfall CPU: 0.00 inst, 83650.42 sec
20-May-2013 09:33:19 [---] [wfd] ------- start work fetch state -------
20-May-2013 09:33:19 [---] [wfd] target work buffer: 28797.12 + 14402.88 sec
20-May-2013 09:33:19 [---] [wfd] CPU: shortfall 83650.42 nidle 0.00 saturated 27744.42 busy 0.00 RS fetchable 100.00 runnable 300.00
20-May-2013 09:33:19 [ABC@home] [wfd] CPU: fetch share 0.00 LTD 0.00 backoff dt 46862.72 int 86400.00
20-May-2013 09:33:19 [SZTAKI Desktop Grid] [wfd] CPU: fetch share 0.00 LTD -24033.97 backoff dt 0.00 int 0.00 (comm deferred)
20-May-2013 09:33:19 [PrimeGrid] [wfd] CPU: fetch share 0.00 LTD 0.00 backoff dt 0.00 int 0.00 (comm deferred)
20-May-2013 09:33:19 [primaboinca] [wfd] CPU: fetch share 1.00 LTD -65457.82 backoff dt 0.00 int 0.00
20-May-2013 09:33:19 [ABC@home] [wfd] overall LTD 0.00
20-May-2013 09:33:19 [SZTAKI Desktop Grid] [wfd] overall LTD -35727.97
20-May-2013 09:33:19 [PrimeGrid] [wfd] overall LTD -4461.58
20-May-2013 09:33:19 [primaboinca] [wfd] overall LTD -74931.11
20-May-2013 09:33:19 [---] [wfd] ------- end work fetch state -------
20-May-2013 09:33:19 [primaboinca] [wfd] request: 83650.42 sec CPU (83650.42 sec, 0.00)
20-May-2013 09:33:19 [primaboinca] Sending scheduler request: To fetch work.
20-May-2013 09:33:19 [primaboinca] Reporting 9 completed tasks, requesting new tasks
20-May-2013 09:33:30 [primaboinca] Scheduler request completed: got 50 new tasks


The client requests 83650.42 sec, which would be 11.950 WU. Instead it gets 50.
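
The arithmetic for both requests can be checked with a few lines. This is only a sketch using the numbers quoted from the logs above; `oversupply` is a hypothetical helper name.

```python
CLIENT_ESTIMATE_SEC = 7000.0  # approximate run time per WU at the time of observation

# (requested seconds, tasks received), taken from the two log excerpts
OBSERVATIONS = [(1393.80, 2), (83650.42, 50)]


def oversupply(requested_sec, received, estimate_sec=CLIENT_ESTIMATE_SEC):
    """Return (WU the request is worth, factor by which the reply exceeds it)."""
    worth = requested_sec / estimate_sec
    return worth, received / worth


for requested, received in OBSERVATIONS:
    worth, factor = oversupply(requested, received)
    print(f"{requested:.2f} sec = {worth:.3f} WU, got {received} ({factor:.1f}x)")
```

By this measure the first reply is about ten times what was asked for, and the second about four times.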

I have a theory about what happens. Regulars should remember that from the server's POV, the estimated run time is ~760 sec. Now, 1393.80 divided by 760 is 1.834 WU, which is pretty close to the 2 WU from the first example. 83650.42 sec divided by 760 is a whopping 110 WU. Why is it 50 instead? We all know: because those are the 50 WU that are always ready.
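
The theory can be put into a toy model. This is a sketch of the guess above, not the scheduler's actual code: rounding up to whole WU and the name `tasks_sent` are assumptions, and the cap of 50 is the "always ready" figure from this post.

```python
import math

SERVER_ESTIMATE_SEC = 760.0  # run time per WU from the server's point of view
READY_CAP = 50               # assumed: only 50 WU are ready to send at any time


def tasks_sent(requested_sec, estimate_sec=SERVER_ESTIMATE_SEC, cap=READY_CAP):
    """Round the request up to whole WU at the server's estimate, then cap it."""
    wanted = math.ceil(requested_sec / estimate_sec)
    return min(wanted, cap)


for requested in (1393.80, 83650.42):
    print(f"request {requested:.2f} sec -> {tasks_sent(requested)} WU")
```

Under these assumptions the model reproduces both replies from the logs: 2 WU for the small request and 50 WU for the large one.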

Now, what if some newcomer finds, after a couple of hours, that he got 10 times more work than expected, and then desperately tries to get back to a reasonable amount by deleting everything (or aborting, which doesn't matter for the purpose of this theory)? In no time the server will get loads of resends (just like right now, at the time of posting: ~2.5 kWU). Then there's a hard-coded limit:

[code]
23-May-2013 01:33:33 [---] [wfd]: work fetch start
23-May-2013 01:33:33 [primaboinca] chosen: major shortfall CPU: 0.00 inst, 103518.66 sec
23-May-2013 01:33:33 [---] [wfd] ------- start work fetch state -------
23-May-2013 01:33:33 [---] [wfd] target work buffer: 28797.12 + 14402.88 sec
23-May-2013 01:33:33 [---] [wfd] CPU: shortfall 103518.66 nidle 0.00 saturated 28430.55 busy 0.00 RS fetchable 100.00 runnable 300.00
23-May-2013 01:33:33 [ABC@home] [wfd] CPU: fetch share 0.00 LTD 0.00 backoff dt 14122.52 int 86400.00
23-May-2013 01:33:33 [SZTAKI Desktop Grid] [wfd] CPU: fetch share 0.00 LTD -52535.69 backoff dt 0.00 int 0.00 (comm deferred)
23-May-2013 01:33:33 [PrimeGrid] [wfd] CPU: fetch share 0.00 LTD 0.00 backoff dt 0.0
17) Message boards : Number crunching : Tasks flagged as 'Timed out - no response' but had been returned (Message 571)
Posted 1628 days ago by whynot
Got 15 timeouts too: example

EDIT: One thing. I've never seen them. Four minutes later I found two "Resent lost task"s (uc_1361286634_606641956839_0 uc_1361286634_606642016841_0). That's all that's left.
18) Message boards : Number crunching : Running, high priority overrules other projects (Message 535)
Posted 1880 days ago by whynot
Enough beating that dead horse.
19) Message boards : Number crunching : Any linux work coming soon? (Message 525)
Posted 1922 days ago by whynot

Operating System Linux 3.2.0-24-generic-pae
BOINC version 7.0.24

Every work unit that downloads to this system errors out. Does "primaboinca" support this o/s?


Bubuntu thingy. It's up to you to fix it.
20) Message boards : Science : server down (Message 447)
Posted 2138 days ago by whynot
May I suggest putting a warning up here? Something like: "ATTENTION! Moving target! Stay out of my way!"




Copyright © 2017 primaboinca.com