Time |
Nick |
Message |
01:43 |
|
RBecker joined #evergreen |
06:07 |
|
temur joined #evergreen |
06:28 |
|
Temur joined #evergreen |
06:32 |
|
Temur joined #evergreen |
07:00 |
|
jboyer-isl joined #evergreen |
07:32 |
|
graced joined #evergreen |
07:49 |
|
BigRig joined #evergreen |
07:52 |
|
jeff joined #evergreen |
07:52 |
|
jeff joined #evergreen |
07:53 |
|
mtate joined #evergreen |
07:55 |
|
julialima_ joined #evergreen |
08:00 |
|
pmurray joined #evergreen |
08:02 |
|
rjackson-isl joined #evergreen |
08:07 |
|
collum joined #evergreen |
08:23 |
|
akilsdonk joined #evergreen |
08:30 |
|
mdriscoll1 joined #evergreen |
08:40 |
|
Shae joined #evergreen |
08:40 |
|
StomproJ joined #evergreen |
08:40 |
|
Dyrcona joined #evergreen |
08:42 |
|
Stompro joined #evergreen |
08:43 |
|
mmorgan joined #evergreen |
08:43 |
|
abowling joined #evergreen |
08:44 |
Dyrcona |
Good morning, #evergreen! |
08:46 |
mmorgan |
Good Morning! |
08:59 |
|
yboston joined #evergreen |
09:03 |
|
mrpeters joined #evergreen |
09:13 |
Dyrcona |
Blizzard warning, and for some reason I'm stoked. |
09:18 |
Dyrcona |
I'm not usually a fan of snow, but I'm kind of looking forward to it. |
09:18 |
Dyrcona |
The snow we got this weekend was so pretty. |
09:19 |
* mmorgan |
hears that the coming snow will be a lot *prettier* ;-) |
09:19 |
dbs |
moin |
09:20 |
Dyrcona |
mmorgan++ |
09:20 |
Dyrcona |
dbs is no stranger to snow, eh? |
09:20 |
* Dyrcona |
works on a scrip to shutdown webrick. |
09:20 |
Dyrcona |
a script even. |
09:22 |
dbs |
Dyrcona: true true! but haven't had a 2-3 foot dump for quite some time. been skating on the lake and nordic skiing for the past month though; loving it |
09:29 |
Dyrcona |
Sounds fun! I've never been skiing, and the rivers don't freeze here like they used to. |
09:30 |
|
RoganH joined #evergreen |
09:30 |
Stompro |
Anyone used pg barman with your EG database? http://www.pgbarman.org/ |
09:30 |
Dyrcona |
And a script to stop webrick is only 7 lines long, including two blank lines and a sanity check that ps -C ruby -o pid= actually found a running process. |
09:36 |
Dyrcona |
As tsbere points out to me in a private chat, `ps -C ruby -o pid,args | grep edi_webrick | cut -f1 -d\ ` is safer, in case we run more than 1 ruby process. |
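A minimal sketch of what such a stop script might look like, building on the pipeline quoted above. The "edi_webrick" match, the use of awk instead of cut (so the right-aligned PID column parses cleanly), and the plain default signal are illustrative assumptions, not the actual script:

    #!/bin/sh
    # Hedged sketch of an edi_webrick stop script, per the pipeline quoted above.
    # Assumes the EDI translator's ruby process args contain "edi_webrick".
    pids=$(ps -C ruby -o pid=,args= | grep edi_webrick | awk '{print $1}')

    # Sanity check: only send a signal if a matching process was actually found.
    if [ -n "$pids" ]; then
        kill $pids    # intentionally unquoted: may contain more than one PID
    else
        echo "edi_webrick does not appear to be running" >&2
        exit 1
    fi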
09:38 |
Dyrcona |
No, it's not for an init script. |
09:47 |
|
rjackson-isl joined #evergreen |
10:26 |
Dyrcona |
This is the theme song for our Evergreen server this morning: https://www.youtube.com/watch?v=a1IyrVeX0W8 |
10:27 |
Dyrcona |
It keeps hitting 200 cstores and then backing down. |
10:31 |
dbs |
Argh. That's so frustrating :( |
10:33 |
berick |
Dyrcona: have you applied any patches? |
10:33 |
Dyrcona |
berick: Yes. |
10:34 |
Dyrcona |
When we go down again, we're going to apply all of the latest O/S updates and force a reinstall/update of all CPAN packages as well. |
10:34 |
berick |
yowza |
10:34 |
Dyrcona |
Our current thinking is the problem lies with the packages. |
10:35 |
Dyrcona |
Something from CPAN could be out of sync with a C library that was updated. |
10:35 |
Dyrcona |
I actually want it to go down, today. |
10:36 |
Dyrcona |
But, now we're back down to 83 cstore drones total across two servers. |
10:36 |
Dyrcona |
We're going to get new hardware soon, and we'll configure things very differently. |
10:38 |
Dyrcona |
Thing is, when the load hits 200, you'd expect to see 16 or so processes in top, each at 100% or more CPU. |
10:38 |
Dyrcona |
That doesn't happen. |
10:38 |
berick |
Dyrcona: would it be possible to share your log files? |
10:38 |
berick |
well, the drones are probably sitting there waiting for the processes that connected to them to do something |
10:39 |
berick |
but said process has probably forgotten about them |
10:39 |
Dyrcona |
berick: I could possibly share them. |
10:39 |
berick |
so they are just hanging out waiting to time out |
10:39 |
Dyrcona |
Well, I don't think CStoreEditor is the problem, though we should still probably clean up the code. |
10:40 |
Dyrcona |
I mentioned last week that I added a warn to the destructor and did some tests on my dev machine; nearly every other line in the logs was the destructor being called. |
10:41 |
berick |
that's great, but doesn't help when the destructor isn't called, like when refs are still in scope |
10:42 |
Dyrcona |
And now, we're down to 41 cstore drones. |
10:42 |
Dyrcona |
Thing is, I haven't found anywhere the refs look like they would still be in scope. |
10:42 |
Dyrcona |
If I do, I'll put that line back in and see if I can trigger that code. |
10:42 |
berick |
Dyrcona: if you could share the osrfsys and activity logs collected during one of the explosions, that would be great |
10:59 |
|
akilsdonk joined #evergreen |
11:01 |
|
mrpeters joined #evergreen |
11:12 |
bshum |
Ugh, more reingesting. |
11:12 |
bshum |
gmcharlt++ # patch for LP 1414112 |
11:12 |
pinesol_green |
Launchpad bug 1414112 in Evergreen 2.7 "Audience filter no longer searching blank spaces" (affected: 2, heat: 12) [Medium,Confirmed] https://launchpad.net/bugs/1414112 |
11:13 |
goood |
bshum: we can add audience without a reingest, don't worry. I'll whip up an update statement |
11:13 |
goood |
and also, |
11:13 |
goood |
gmcharlt++ |
11:25 |
goood |
ah, we'll need one more change for that... I'll push a collab branch |
11:38 |
|
RoganH joined #evergreen |
11:47 |
|
vlewis joined #evergreen |
11:48 |
|
sandbergja joined #evergreen |
11:49 |
|
vlewis_ joined #evergreen |
11:54 |
|
dreuther joined #evergreen |
12:33 |
|
buzzy joined #evergreen |
12:49 |
|
jihpringle joined #evergreen |
13:27 |
|
gdunbar joined #evergreen |
13:34 |
Bmagic_ |
Has anyone heard of OpenSRF performing 4 lookups per hold during item checkin? My logs show exactly 4 lookups for every potential hold on an item during a checkin. It happens via SIP and via the staff client (provided that the checkin_lib != call_number.owning_lib) |
13:36 |
jeff |
i have seen similar, yes. |
13:37 |
jeff |
it can lead to timeouts in scenarios where there are many (unsuitable due to age protection/etc) holds on the record for a copy. |
13:38 |
bshum |
Indeed. |
13:38 |
jeff |
i thought that in some very preliminary digging, signs had pointed to it being exacerbated by "clear shelf expired holds", but of course that wouldn't apply for SIP (or maybe it could be made to in some environments...) |
13:39 |
bshum |
For us I think it was dealing with large numbers of holds to check through. Opportunistic capture, I think. |
13:40 |
bshum |
So the more holds there were, the longer it took in general to check in. |
13:41 |
Bmagic_ |
bshum: jeff: yes, however, even if there is only one hold, it still looks 4 times but you don't notice it because it doesn't take very long |
13:41 |
* jeff |
nods |
13:41 |
bshum |
Bmagic_: That doesn't surprise me. The checking code is weird.... |
13:42 |
Bmagic_ |
interestingly enough, it only checks once if you are checking the item in at its home library |
13:44 |
Bmagic_ |
I have done the work to isolate the cases, and I have log files for each scenario. Is this a bug? |
13:45 |
jeff |
i recommend you open a bug (after doing your own check to see if there's one already, but I don't think there is) |
13:46 |
jeff |
i'll try to give it some eyes also, since we've observed similar. |
13:47 |
Bmagic_ |
jeff: ok, I will check that out. Not sure how to phrase my search terms.... |
13:47 |
* jeff |
nods |
13:49 |
Bmagic_ |
jeff: I'm creating a new bug |
13:51 |
tsbere |
jeff: I will point out that "clear shelf expired holds" should only add calls for copies that are flagged as on the hold shelf, so any other situation shouldn't do much more there. |
13:51 |
berick |
Bmagic_: can you clarify what you mean by "it looks 4 times" |
13:52 |
jeff |
tsbere: pretty sure what i saw agrees with your statement, yes. |
13:52 |
Bmagic_ |
berick: grep "checking if hold" osrfsys.log yields [INFO:16784:Holds.pm:3144:1422189177127601] circulator: checking if hold 216012 is permitted for copy PPL60121 x4 |
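A quick, hedged way to tally those repeats across a whole log file (the osrfsys.log name follows the line quoted above; the path varies by install):

    # Count how many times each hold id appears in the "checking if hold" lines;
    # every hold showing the same count (here, 4) suggests a systematic cause
    # rather than a per-hold one.
    grep -o 'checking if hold [0-9]*' osrfsys.log | sort | uniq -c | sort -rn | head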
13:53 |
tsbere |
Bmagic_: Some of those may actually end up being dupes (different pieces logging the same call) though I can't be certain without looking closer. |
13:54 |
Bmagic_ |
tsbere: weird that it only looks once when checkin occurs at circ_lib |
13:54 |
Bmagic_ |
s/looks/logs/ |
13:54 |
* Dyrcona |
hasn't been this excited about a snow day since he was in primary school. |
13:55 |
berick |
Bmagic_: and it's checking the same hold ID with every call? |
13:55 |
Bmagic_ |
berick: indeed |
13:56 |
Bmagic_ |
4 times |
13:56 |
Bmagic_ |
always exactly 4 |
13:56 |
Dyrcona |
Bmagic_: It's being thorough, like checking fines twice on a renewal. |
13:56 |
Bmagic_ |
Dyrcona: lol no doubt |
13:57 |
Bmagic_ |
It's an issue when there are many potential holds. It can take so long, in fact, that the SIP client will time out. I have logs from SIP that log the response time, clocked at >15 seconds |
13:58 |
berick |
many potential holds doesn't explain why it would check the same hold more than once |
13:58 |
berick |
Bmagic_: can you tell if open-ils.storage.action.hold_request.nearest_hold.atomic is called once or multiple times? |
13:59 |
Bmagic_ |
berick: I am not saying that the cause is due to many holds. I think the cause is due to item checkin_lib != item owning_lib |
13:59 |
Bmagic_ |
berick: because it only checks the potential hold once if you are checking the item in at its circ_lib/owning_lib |
14:00 |
tsbere |
Bmagic_: Are you using the "if things haven't gone home for interval whatever send them home" features? |
14:00 |
Bmagic_ |
http://pastebin.com/snAW2WkF |
14:02 |
Bmagic_ |
tsbere: "Max foreign-circulation time" ? |
14:04 |
Bmagic_ |
berick: open-ils.storage.action.hold_request.nearest_hold.atomic is called once |
14:06 |
berick |
yeah... |
14:06 |
berick |
that suggests that open-ils.storage.action.hold_request.nearest_hold needs to be unique-ifying its return list |
14:06 |
Bmagic_ |
afk |
14:06 |
bshum |
Interesting... |
14:08 |
RoganH |
un_helpful advice: wipe your server, start over |
14:12 |
berick |
i wonder if "SELECT DISTINCT(h.id)" in action:nearest_hold would do it |
14:17 |
tsbere |
berick: Just adding DISTINCT before the existing h.id should work (DISTINCT applies to the entire column list... which here is one column) |
14:20 |
berick |
tsbere: yeah, that's what I meant. just insert DISTINCT |
14:21 |
* berick |
wonders if Bmagic_ is up for an experiment |
14:22 |
tsbere |
Bmagic_: Out of curiosity, does this query result in anything for you? SELECT count(id), from_org, to_org FROM actor.org_unit_proximity GROUP BY from_org, to_org HAVING count(id) > 1; |
14:23 |
Bmagic_ |
I'll check when I get back. Bout 45 minutes |
14:24 |
* dbs |
gets zero rows for that |
14:24 |
tsbere |
dbs: As far as I know you should get zero rows. Uniqueness of from/to org combinations is not, however, guaranteed at the DB level... |
14:25 |
* dbs |
wipes brow in relief |
14:25 |
tsbere |
Which could result in extra rows when looking up holds, which would result in the need for said DISTINCT addition and the extra lookups. |
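A simplified stand-in for the mechanism being described here; the real nearest-hold query lives in the open-ils.storage Perl code and is considerably more involved, and the org unit id and WHERE clause below are illustrative only:

    -- With duplicate (from_org, to_org) rows in actor.org_unit_proximity, the
    -- join hands back each candidate hold once per duplicate proximity row:
    SELECT h.id, p.prox
      FROM action.hold_request h
      JOIN actor.org_unit_proximity p
        ON p.from_org = 4               -- hypothetical checkin org unit
       AND p.to_org = h.pickup_lib
     WHERE h.fulfillment_time IS NULL
       AND h.cancel_time IS NULL
     ORDER BY p.prox, h.id;

    -- The DISTINCT being discussed collapses those duplicates back down to
    -- one row per candidate hold:
    SELECT DISTINCT h.id, p.prox
      FROM action.hold_request h
      JOIN actor.org_unit_proximity p
        ON p.from_org = 4
       AND p.to_org = h.pickup_lib
     WHERE h.fulfillment_time IS NULL
       AND h.cancel_time IS NULL
     ORDER BY p.prox, h.id;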
15:00 |
|
kbutler joined #evergreen |
15:15 |
Bmagic_ |
tsbere: OMG, that query returns 5929 rows, with all counts=4! |
15:16 |
berick |
tsbere++ |
15:17 |
tsbere |
Bmagic_: You need to fix that. ;) autogen.sh -u may work? |
15:18 |
Bmagic_ |
tsbere: ok, that did fix that |
15:19 |
tsbere |
Bmagic_: In that case your "4 times" issue is likely fixed |
15:20 |
Bmagic_ |
tsbere: testing |
15:25 |
Bmagic_ |
tsbere: yep, solved |
15:25 |
Bmagic_ |
tsbere++ |
15:26 |
Bmagic_ |
tsbere: can you help me understand why the DB returned those rows in that function? |
15:26 |
Bmagic_ |
(I don't think we run autogen with -u switch) |
15:26 |
tsbere |
Bmagic_: Your org unit proximities were doubled up for some reason. You should only have one row for each from/to combination; you had 4, but likely only one for each org unit to itself. |
15:27 |
tsbere |
Thus, when checking in at the "home" library you got one back. When checking in anywhere else you got four. Thus, "extra" holds. The autogen.sh script, when run with -u, *empties* the table and re-builds the proximities. |
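For reference, a hedged sketch of the cleanup step being described: /openils/bin is the stock install path, the evergreen database and user names are assumptions, and older releases may also expect the opensrf_core.xml path passed to autogen.sh.

    # As the opensrf user: rebuild actor.org_unit_proximity, then re-run the
    # duplicate check from earlier in the conversation (it should now be empty).
    /openils/bin/autogen.sh -u

    psql -U evergreen -d evergreen -c "
      SELECT count(id), from_org, to_org
        FROM actor.org_unit_proximity
       GROUP BY from_org, to_org
      HAVING count(id) > 1;"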
15:28 |
Bmagic_ |
tsbere: if we pass -u to autogen, does it do everything it does without the -u switch? |
15:29 |
* berick |
wonders how they get doubled up |
15:29 |
tsbere |
Bmagic_: Generally you don't need to pass -u to autogen. Something went wrong for you in the past, creating the extra entries. In this case passing -u to tell it to do that extra bit was a "fix the broken table contents" step. |
15:31 |
dbwells |
tsbere++ |
15:38 |
Bmagic_ |
LOL it turns out the extra lookups didn't contribute much to the total amount of time!! 13.860 down to 10.929 seconds |
15:38 |
Bmagic_ |
DB caching maybe? |
15:42 |
|
hopkinsju joined #evergreen |
15:42 |
|
Bmagic joined #evergreen |
15:43 |
Bmagic |
jeff: so I guess this is not a bug? |
15:44 |
jeff |
i'm with berick in wondering how they were doubled up. |
15:44 |
Bmagic |
that would be nice to know. Perhaps through an upgrade? |
15:44 |
jeff |
i don't know how worth it it would be to do either/both of 1) preventing the possibility of them being doubled up and/or 2) preventing their doubling-up from impacting that process |
15:44 |
Bmagic |
Like I said, I don't remember the last time we ran autogen -u - probably 2 years |
15:46 |
jeff |
bringing settings-tester (or something like it) back in vogue and teaching it things like "hey, your org unit proximities are doubled -- you should run autogen.sh -u" :-) |
15:48 |
|
doug_evanced joined #evergreen |
15:49 |
* jeff |
senses a SIP question |
15:58 |
jeff |
...or possibly not. |
15:58 |
jeff |
greetings, doug! |
16:00 |
doug_evanced |
hi, just browsing |
16:02 |
|
julialima_ left #evergreen |
16:11 |
* dbs |
thought "-u" was turned into a no-op years ago |
16:12 |
dbs |
guess not! |
16:34 |
|
eby joined #evergreen |
16:45 |
|
mdriscoll1 left #evergreen |
17:03 |
pinesol_green |
Incoming from qatests: Test Success - http://testing.evergreen-ils.org/~live/test.html |
17:07 |
|
mmorgan left #evergreen |
18:16 |
|
mrpeters left #evergreen |
20:09 |
|
maryj joined #evergreen |
20:34 |
|
dcook joined #evergreen |
21:21 |
|
finnx joined #evergreen |
22:23 |
|
finnx left #evergreen |