Time |
Nick |
Message |
01:43 |
|
RBecker joined #evergreen |
06:07 |
|
temur joined #evergreen |
06:28 |
|
Temur joined #evergreen |
06:32 |
|
Temur joined #evergreen |
07:00 |
|
jboyer-isl joined #evergreen |
07:32 |
|
graced joined #evergreen |
07:49 |
|
BigRig joined #evergreen |
07:52 |
|
jeff joined #evergreen |
07:52 |
|
jeff joined #evergreen |
07:53 |
|
mtate joined #evergreen |
07:55 |
|
julialima_ joined #evergreen |
08:00 |
|
pmurray joined #evergreen |
08:02 |
|
rjackson-isl joined #evergreen |
08:07 |
|
collum joined #evergreen |
08:23 |
|
akilsdonk joined #evergreen |
08:30 |
|
mdriscoll1 joined #evergreen |
08:40 |
|
Shae joined #evergreen |
08:40 |
|
StomproJ joined #evergreen |
08:40 |
|
Dyrcona joined #evergreen |
08:42 |
|
Stompro joined #evergreen |
08:43 |
|
mmorgan joined #evergreen |
08:43 |
|
abowling joined #evergreen |
08:44 |
Dyrcona |
Good morning, #evergreen! |
08:46 |
mmorgan |
Good Morning! |
08:59 |
|
yboston joined #evergreen |
09:03 |
|
mrpeters joined #evergreen |
09:13 |
Dyrcona |
Blizzard warning, and for some reason I'm stoked. |
09:18 |
Dyrcona |
I'm not usually a fan of snow, but I'm kind of looking forward to it. |
09:18 |
Dyrcona |
The snow we got this weekend was so pretty. |
09:19 |
* mmorgan |
hears that the coming snow will be a lot *prettier* ;-) |
09:19 |
dbs |
moin |
09:20 |
Dyrcona |
mmorgan++ |
09:20 |
Dyrcona |
dbs is no stranger to snow, eh? |
09:20 |
* Dyrcona |
works on a scrip to shutdown webrick. |
09:20 |
Dyrcona |
a script even. |
09:22 |
dbs |
Dyrcona: true true! but haven't had a 2-3 foot dump for quite some time. been skating on the lake and nordic skiing for the past month though; loving it |
09:29 |
Dyrcona |
Sounds fun! I've never been skiing, and the rivers don't freeze here like they used to. |
09:30 |
|
RoganH joined #evergreen |
09:30 |
Stompro |
Anyone used pg barman with your EG database? http://www.pgbarman.org/ |
09:30 |
Dyrcona |
And a script to stop webrick is only 7 lines long, including two blank lines and a sanity check that ps -C ruby -o pid= actually found a running process. |
09:36 |
Dyrcona |
As tsbere points out to me in a private chat, `ps -C ruby -o pid,args | grep edi_webrick | cut -f1 -d\ ` is safer, in case we run more than 1 ruby process. |
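A minimal sketch of what such a stop script might look like, building on the pipeline quoted above. The "edi_webrick" match, the use of awk instead of cut (so the right-aligned PID column parses cleanly), and the plain default signal are illustrative assumptions, not the actual script:

    #!/bin/sh
    # Hedged sketch of an edi_webrick stop script, per the pipeline quoted above.
    # Assumes the EDI translator's ruby process args contain "edi_webrick".
    pids=$(ps -C ruby -o pid=,args= | grep edi_webrick | awk '{print $1}')

    # Sanity check: only send a signal if a matching process was actually found.
    if [ -n "$pids" ]; then
        kill $pids    # intentionally unquoted: may contain more than one PID
    else
        echo "edi_webrick does not appear to be running" >&2
        exit 1
    fi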
09:38 |
Dyrcona |
No, it's not for an init script. |
09:47 |
|
rjackson-isl joined #evergreen |
10:26 |
Dyrcona |
This is the theme song for our Evergreen server this morning: https://www.youtube.com/watch?v=a1IyrVeX0W8 |
10:27 |
Dyrcona |
It keeps hitting 200 cstores and then backing down. |
10:31 |
dbs |
Argh. That's so frustrating :( |
10:33 |
berick |
Dyrcona: have you applied any patches? |
10:33 |
Dyrcona |
berick: Yes. |
10:34 |
Dyrcona |
When we go down again, we're going to apply all of the latest O/S updates and force a reinstall/update of all CPAN packages as well. |
10:34 |
berick |
yowza |
10:34 |
Dyrcona |
Our current thinking is the problem lies with the packages. |
10:35 |
Dyrcona |
Something from CPAN could be out of sync with a C library that was updated. |
10:35 |
Dyrcona |
I actually want it to go down, today. |
10:36 |
Dyrcona |
But, now we're back down to 83 cstore drones total across two servers. |
10:36 |
Dyrcona |
We're going to get new hardware soon, and we'll configure things very differently. |
10:38 |
Dyrcona |
Thing is, when the load hits 200, you'd expect to see 16 or so processes in top, each at 100% or more CPU. |
10:38 |
Dyrcona |
That doesn't happen. |
10:38 |
berick |
Dyrcona: would it be possible to share your log files? |
10:38 |
berick |
well, the drones are probably sitting there waiting for the processes that connected to them to do something |
10:39 |
berick |
but said process has probably forgotten about them |
10:39 |
Dyrcona |
berick: I could possibly share them. |
10:39 |
berick |
so they are just hanging out waiting to time out |
10:39 |
Dyrcona |
Well, I don't think CStoreEditor is the problem, though we should still probably clean up the code. |
10:40 |
Dyrcona |
I mentioned last week that I added a warn to the destructor and did some tests on my dev machine; nearly every other line in the logs was the destructor being called. |
10:41 |
berick |
that's great, but doesn't help when the destructor isn't called, like when refs are still in scope |
10:42 |
Dyrcona |
And now, we're down to 41 cstore drones. |
10:42 |
Dyrcona |
Thing is, I haven't found anywhere the refs look like they would still be in scope. |
10:42 |
Dyrcona |
If I do, I'll put that line back in and see if I can trigger that code. |
10:42 |
berick |
Dyrcona: if you could share the osrfsys and activity logs collected during one of the explosions, that would be great |
10:59 |
|
akilsdonk joined #evergreen |
11:01 |
|
mrpeters joined #evergreen |
11:12 |
bshum |
Ugh, more reingesting. |
11:12 |
bshum |
gmcharlt++ # patch for LP 1414112 |
11:12 |
pinesol_green |
Launchpad bug 1414112 in Evergreen 2.7 "Audience filter no longer searching blank spaces" (affected: 2, heat: 12) [Medium,Confirmed] https://launchpad.net/bugs/1414112 |
11:13 |
goood |
bshum: we can add audience without a reingest, don't worry. I'll whip up an update statement |
11:13 |
goood |
and also, |
11:13 |
goood |
gmcharlt++ |
11:25 |
goood |
ah, we'll need one more change for that... I'll push a collab branch |
11:38 |
|
RoganH joined #evergreen |
11:47 |
|
vlewis joined #evergreen |
11:48 |
|
sandbergja joined #evergreen |
11:49 |
|
vlewis_ joined #evergreen |
11:54 |
|
dreuther joined #evergreen |
12:33 |
|
buzzy joined #evergreen |
12:49 |
|
jihpringle joined #evergreen |
13:27 |
|
gdunbar joined #evergreen |
13:34 |
Bmagic_ |
Has anyone heard of OpenSRF performing 4 lookups per hold during item checkin? My logs show exactly 4 lookups for every potential hold on an item during a checkin. It happens via SIP and via the staff client (provided that the checkin_lib != call_number.owning_lib) |
13:36 |
jeff |
i have seen similar, yes. |
13:37 |
jeff |
it can lead to timeouts in scenarios where there are many (unsuitable due to age protection/etc) holds on the record for a copy. |
13:38 |
bshum |
Indeed. |
13:38 |
jeff |
i thought that in some very preliminary digging, signs had pointed to it being exacerbated by "clear shelf expired holds", but of course that wouldn't apply for SIP (or maybe it could be made to in some environments...) |
13:39 |
bshum |
For us I think it was dealing with large numbers of holds to check through. Opportunistic capture, I think. |
13:40 |
bshum |
So the more holds there were, the longer it took in general to check in. |
13:41 |
Bmagic_ |
bshum: jeff: yes, however, even if there is only one hold, it still looks 4 times but you don't notice it because it doesn't take very long |
13:41 |
* jeff |
nods |
13:41 |
bshum |
Bmagic_: That doesn't surprise me. The checking code is weird.... |
13:42 |
Bmagic_ |
interestingly enough, it only checks once if you are checking the item in at its home library |
13:44 |
Bmagic_ |
I have done the work to isolate the cases, and I have log files for each scenario. Is this a bug? |
13:45 |
jeff |
i recommend you open a bug (after doing your own check to see if there's one already, but I don't think there is) |
13:46 |
jeff |
i'll try to give it some eyes also, since we've observed similar. |
13:47 |
Bmagic_ |
jeff: ok, I will check that out. Not sure how to phrase my search terms.... |
13:47 |
* jeff |
nods |
13:49 |
Bmagic_ |
jeff: I'm creating a new bug |
13:51 |
tsbere |
jeff: I will point out that "clear shelf expired holds" should only add calls for copies that are flagged as on the hold shelf, so any other situation shouldn't do much more there. |
13:51 |
berick |
Bmagic_: can you clarify what you mean by "it looks 4 times" |
13:52 |
jeff |
tsbere: pretty sure what i saw agrees with your statement, yes. |
13:52 |
Bmagic_ |
berick: grep "checking if hold" osrfsys.log yields [INFO:16784:Holds.pm:3144:1422189177127601] circulator: checking if hold 216012 is permitted for copy PPL60121 x4 |
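A quick, hedged way to tally those repeats across a whole log file (the osrfsys.log name follows the line quoted above; the path varies by install):

    # Count how many times each hold id appears in the "checking if hold" lines;
    # every hold showing the same count (here, 4) suggests a systematic cause
    # rather than a per-hold one.
    grep -o 'checking if hold [0-9]*' osrfsys.log | sort | uniq -c | sort -rn | head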
13:53 |
tsbere |
Bmagic_: Some of those may actually end up being dupes (different pieces logging the same call) though I can't be certain without looking closer. |
13:54 |
Bmagic_ |
tsbere: weird that it only looks once when checkin occurs at circ_lib |
13:54 |
Bmagic_ |
s/looks/logs/ |
13:54 |
* Dyrcona |
hasn't been this excited about a snow day since he was in primary school. |
13:55 |
berick |
Bmagic_: and it's checking the same hold ID with every call? |
13:55 |
Bmagic_ |
berick: indeed |
13:56 |
Bmagic_ |
4 times |
13:56 |
Bmagic_ |
always exactly 4 |
13:56 |
Dyrcona |
Bmagic_: It's being thorough, like checking fines twice on a renewal. |
13:56 |
Bmagic_ |
Dyrcona: lol no doubt |
13:57 |
Bmagic_ |
It's an issue when there are many potential holds. It can take so long, in fact, that the SIP client will time out. I have logs from SIP that log the response time, clocked at >15 seconds |
13:58 |
berick |
many potential holds doesn't explain why it would check the same hold more than once |
13:58 |
berick |
Bmagic_: can you tell if open-ils.storage.action.hold_request.nearest_hold.atomic is called once or multiple times? |
13:59 |
Bmagic_ |
berick: I am not saying that the cause is due to many holds. I think the cause is due to item checkin_lib != item owning_lib |
13:59 |
Bmagic_ |
berick: because it only checks the potential hold once if you are checking the item in at its circ_lib/owning_lib |
14:00 |
tsbere |
Bmagic_: Are you using the "if things haven't gone home for interval whatever send them home" features? |
14:00 |
Bmagic_ |
http://pastebin.com/snAW2WkF |
14:02 |
Bmagic_ |
tsbere: "Max foreign-circulation time" ? |
14:04 |
Bmagic_ |
berick: open-ils.storage.action.hold_request.nearest_hold.atomic is called once |
14:06 |
berick |
yeah... |
14:06 |
berick |
that suggests that open-ils.storage.action.hold_request.nearest_hold needs to be unique-ifying its return list |
14:06 |
Bmagic_ |
afk |
14:06 |
bshum |
Interesting... |
14:08 |
RoganH |
un_helpful advice: wipe your server, start over |
14:12 |
berick |
i wonder if "SELECT DISTINCT(h.id)" in action:nearest_hold would do it |
14:17 |
tsbere |
berick: Just adding DISTINCT before the existing h.id should work (DISTINCT applies to the entire column list... which here is one column) |
14:20 |
berick |
tsbere: yeah, that's what I meant. just insert DISTINCT |
14:21 |
* berick |
wonders if Bmagic_ is up for an experiment |
14:22 |
tsbere |
Bmagic_: Out of curiosity, does this query result in anything for you? SELECT count(id), from_org, to_org FROM actor.org_unit_proximity GROUP BY from_org, to_org HAVING count(id) > 1; |
14:23 |
Bmagic_ |
I'll check when I get back. Bout 45 minutes |
14:24 |
* dbs |
gets zero rows for that |
14:24 |
tsbere |
dbs: As far as I know you should get zero rows. Uniqueness of from/to org combinations is not, however, guaranteed at the DB level... |
14:25 |
* dbs |
wipes brow in relief |
14:25 |
tsbere |
Which could result in extra rows when looking up holds, which would result in the need for said DISTINCT addition and the extra lookups. |
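A simplified stand-in for the mechanism being described here; the real nearest-hold query lives in the open-ils.storage Perl code and is considerably more involved, and the org unit id and WHERE clause below are illustrative only:

    -- With duplicate (from_org, to_org) rows in actor.org_unit_proximity, the
    -- join hands back each candidate hold once per duplicate proximity row:
    SELECT h.id, p.prox
      FROM action.hold_request h
      JOIN actor.org_unit_proximity p
        ON p.from_org = 4               -- hypothetical checkin org unit
       AND p.to_org = h.pickup_lib
     WHERE h.fulfillment_time IS NULL
       AND h.cancel_time IS NULL
     ORDER BY p.prox, h.id;

    -- The DISTINCT being discussed collapses those duplicates back down to
    -- one row per candidate hold:
    SELECT DISTINCT h.id, p.prox
      FROM action.hold_request h
      JOIN actor.org_unit_proximity p
        ON p.from_org = 4
       AND p.to_org = h.pickup_lib
     WHERE h.fulfillment_time IS NULL
       AND h.cancel_time IS NULL
     ORDER BY p.prox, h.id;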
15:00 |
|
kbutler joined #evergreen |
15:15 |
Bmagic_ |
tsbere: OMG, that query returns 5929 rows, with all counts=4! |
15:16 |
berick |
tsbere++ |
15:17 |
tsbere |
Bmagic_: You need to fix that. ;) autogen.sh -u may work? |
15:18 |
Bmagic_ |
tsbere: ok, that did fix that |
15:19 |
tsbere |
Bmagic_: In that case your "4 times" issue is likely fixed |
15:20 |
Bmagic_ |
tsbere: testing |
15:25 |
Bmagic_ |
tsbere: yep, solved |
15:25 |
Bmagic_ |
tsbere++ |
15:26 |
Bmagic_ |
tsbere: can you help me understand why the DB returned those rows in that function? |
15:26 |
Bmagic_ |
(I don't think we run autogen with -u switch) |
15:26 |
tsbere |
Bmagic_: Your org unit proximities were doubled up for some reason. You should only have one row for each from/to combination; you had 4, but likely only one for each org unit to itself. |
15:27 |
tsbere |
Thus, when checking in at the "home" library you got one back. When checking in anywhere else you got four. Thus, "extra" holds. The autogen.sh script, when run with -u, *empties* the table and re-builds the proximities. |
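For reference, a hedged sketch of the cleanup step being described: /openils/bin is the stock install path, the evergreen database and user names are assumptions, and older releases may also expect the opensrf_core.xml path passed to autogen.sh.

    # As the opensrf user: rebuild actor.org_unit_proximity, then re-run the
    # duplicate check from earlier in the conversation (it should now be empty).
    /openils/bin/autogen.sh -u

    psql -U evergreen -d evergreen -c "
      SELECT count(id), from_org, to_org
        FROM actor.org_unit_proximity
       GROUP BY from_org, to_org
      HAVING count(id) > 1;"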
15:28 |
Bmagic_ |
tsbere: if we pass -u to autogen, does it do everything it does without the -u switch? |
15:29 |
* berick |
wonders how they get doubled up |
15:29 |
tsbere |
Bmagic_: Generally you don't need to pass -u to autogen. Something went wrong for you in the past, creating the extra entries. In this case passing -u to tell it to do that extra bit was a "fix the broken table contents" step. |
15:31 |
dbwells |
tsbere++ |
15:38 |
Bmagic_ |
LOL it turns out the extra lookups didn't contribute much to the total amount of time!! 13.860 down to 10.929 seconds |
15:38 |
Bmagic_ |
DB caching maybe? |
15:42 |
|
hopkinsju joined #evergreen |
15:42 |
|
Bmagic joined #evergreen |
15:43 |
Bmagic |
jeff: so I guess this is not a bug? |
15:44 |
jeff |
i'm with berick in wondering how they were doubled up. |
15:44 |
Bmagic |
that would be nice to know. Perhaps through an upgrade? |
15:44 |
jeff |
i don't know how worth it it would be to do either/both of 1) preventing the possibility of them being doubled up and/or 2) preventing their doubling-up from impacting that process |
15:44 |
Bmagic |
Like I said, I don't remember the last time we ran autogen -u - probably 2 years |
15:46 |
jeff |
bringing settings-tester (or something like it) back in vogue and teaching it things like "hey, your org unit proximities are doubled -- you should run autogen.sh -u" :-) |
15:48 |
|
doug_evanced joined #evergreen |
15:49 |
* jeff |
senses a SIP question |
15:58 |
jeff |
...or possibly not. |
15:58 |
jeff |
greetings, doug! |
16:00 |
doug_evanced |
hi, just browsing |
16:02 |
|
julialima_ left #evergreen |
16:11 |
* dbs |
thought "-u" was turned into a no-op years ago |
16:12 |
dbs |
guess not! |
16:34 |
|
eby joined #evergreen |
16:45 |
|
mdriscoll1 left #evergreen |
17:03 |
pinesol_green |
Incoming from qatests: Test Success - http://testing.evergreen-ils.org/~live/test.html |
17:07 |
|
mmorgan left #evergreen |
18:16 |
|
mrpeters left #evergreen |
20:09 |
|
maryj joined #evergreen |
20:34 |
|
dcook joined #evergreen |
21:21 |
|
finnx joined #evergreen |
22:23 |
|
finnx left #evergreen |