Evergreen ILS Website

IRC log for #evergreen, 2016-02-16


All times shown according to the server's local time.

Time Nick Message
06:40 rlefaive joined #evergreen
07:44 mrpeters joined #evergreen
07:44 JBoyer joined #evergreen
08:02 * csharp wishes he had a dime for every time he says "God, I f*-ing hate reports" under his breath
08:24 tsbere csharp: Why are you wishing small? Wish for $100 for every time. :P
08:28 jeff if i had a nickel for every time someone had intentionally run an Evergreen reporter report in the last year, I'd have... hrm. $0.05.
08:28 collum joined #evergreen
08:29 * tsbere comments on bug 1486592 but doesn't examine it much more closely
08:29 pinesol_green Launchpad bug 1486592 in Evergreen "Copies in concerto data should have prices" [Wishlist,Confirmed] https://launchpad.net/bugs/1486592
08:31 Dyrcona joined #evergreen
08:36 jeff based on the longer-than-usual delay between "job started" and "job's done" emails hitting my inbox, i think our statewide resource sharing catalog bibs are loading properly again.
08:42 Dyrcona jeff: That's good. I usually have those jobs spit out what is going on, including any errors.
08:42 Dyrcona Some jobs I only have it send me the errors.
08:43 rjackson_isl joined #evergreen
08:45 jeff yeah, this is on a system we don't control. we just deliver the files, and at the usual time each day i get a series of three emails for each file.
08:45 jeff begin, success (with a detailed log), end
08:45 jeff or sometimes: begin, no files to load, end
08:47 jeff and every so often, begin, failure, end.
08:47 jeff good to have checks for things like "are we sending them files" and "are they loading files"
08:47 jeff but what i was not checking for was "is the file we are sending them identical to the last file we sent them?"
08:48 mmorgan joined #evergreen
08:55 rlefaive joined #evergreen
09:00 Dyrcona I've got one that checks for zero byte files, 'cause we may not have added new records on a weekend.
09:01 Dyrcona But, I don't worry if the files are identical, because they shouldn't be. That would typically mean going a month without us adding or deleting records and that's inconceivable. :)
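The two sanity checks discussed above (zero-byte files, and jeff's "identical to the last file we sent" case) can be sketched in a short shell script. File names and paths here are hypothetical, not from the log:

```shell
#!/bin/sh
# Sketch of pre-delivery sanity checks on an export file.
# Demo data lives in a temp dir so the script is self-contained.
set -e
dir=$(mktemp -d)
printf 'record data\n' > "$dir/updates.mrc"       # today's export
printf 'record data\n' > "$dir/updates.prev.mrc"  # yesterday's export
: > "$dir/empty.mrc"                              # a zero-byte export

# check FILE [PREVIOUS]: warn if FILE is empty, or identical to PREVIOUS.
check() {
    f=$1; prev=$2
    [ -s "$f" ] || echo "WARN: $f is empty (zero bytes)"
    if [ -n "$prev" ] && [ -f "$prev" ] && cmp -s "$f" "$prev"; then
        echo "WARN: $f is identical to the previous export"
    fi
}

check "$dir/updates.mrc" "$dir/updates.prev.mrc"
check "$dir/empty.mrc"
```

A real job would run `check` before transmitting the file and keep a rolling copy of the last file sent for the comparison.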
09:03 jeff yeah, in this case it was a sequence conflict on a state table that's used to generate incrementals.
09:03 jeff meaning no new entries in the state table, so for the past several mornings the update file had been identical.
09:04 jeff so, the better thing to check going forward will be "are there new entries in the state table?" :-)
09:04 Dyrcona Sounds like it, but I don't know exactly what your process is.
09:05 jeff Dyrcona: it's one of those processes that i wouldn't know exactly what the process is if i didn't have it written down.
09:05 Dyrcona :)
09:06 Dyrcona If it has to do with ILL, I thought I knew the process, but now I'm not so sure.
09:06 jeff shifting gears a bit, for those on Apache 2.4 did you find that you need to further limit the number of requests each Apache child processes before being recycled?
09:06 Dyrcona Let me check.
09:07 jeff as in, use a higher value for MaxConnectionsPerChild / MaxRequestsPerChild than before?
09:07 Dyrcona I think we had increased things dramatically, but when we split so public and staff were on different bricks, we had to let Apache drop to defaults on public to avoid crashes.
09:07 jeff default is 0, meaning no limit to "the amount of memory that process can consume by (accidental) memory leakage" :-)
09:09 jeff on Debian Jessie with Apache 2.4 and recent master, I'm able to run a 4 GB machine out of memory rather quickly. I'm upping the RAM for the VM, but adjusting MaxConnectionsPerChild does seem to quite effectively resolve the symptoms.
09:10 Dyrcona jeff: Yeah, on public I recently set MaxConnectionsPerChild to 1000, MaxSpareServers 20, MaxRequestWorkers 120, the rest are at defaults.
09:11 Dyrcona By recently, I mean the file was last changed on Dec. 1.
09:11 * jeff nods
09:11 jeff thanks for looking!
09:11 Dyrcona IIRC, it took a few weeks of tinkering.
09:11 Dyrcona I'll check our staff brick, too. Its configuration is different.
09:11 jeff do you recall if "hey, we're running out of memory" was the primary motivation behind the change?
09:12 Dyrcona Yes, it was.
09:12 Dyrcona Sometimes it was running out of memory. Other times, the box looked mostly OK, but load was stratospheric.
09:13 Dyrcona This VM has 24GB of RAM configured, too. :)
09:13 Dyrcona Presently, it is using 6.6GB, or 3.0GB +/- buffers and cache.
09:14 Dyrcona It also has 24GB of swap, and yes we were seeing all of it consumed.
09:15 Dyrcona We could probably drop the RAM down to 16GB or maybe as low as 8GB at this point.
09:15 jwoodard joined #evergreen
09:16 Dyrcona The Apache vm on the staff brick has the same memory configuration, but is using 12GB (11GB +/- buffers and cache).
09:16 Dyrcona They're both using about 10MB of swap, btw.
09:17 jeff which mostly means that something pressured things to swap at some time probably long ago. :-)
09:17 Dyrcona Yes, I don't worry about swap usage much. It doesn't seem to affect performance until you hit about 75%.
09:18 Dyrcona It's probably just some data that was needed at startup and *might* be needed later.
09:18 Dyrcona So, for the staff brick things are a bit different.
09:19 Dyrcona We use StartServers 100, MinSpareServers 25, MaxSpareServers 150, MaxRequestWorkers 500, and MaxConnectionsPerChild 1000.
09:20 Dyrcona As I recall, the default for MaxRequestWorkers is 256, and I was seeing us starting to run out of RAM on the public side with around 150 Apache processes going.
09:20 Dyrcona Thus, I chose 120 for public.
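For reference, the public-brick tuning Dyrcona describes would look roughly like this in an Apache 2.4 mpm_prefork config. Only MaxSpareServers, MaxRequestWorkers, and MaxConnectionsPerChild come from the log; the other values are illustrative defaults:

```apache
# Sketch of the public-brick prefork settings from this discussion.
<IfModule mpm_prefork_module>
    StartServers              5     # default (illustrative)
    MinSpareServers           5     # default (illustrative)
    MaxSpareServers          20     # from the log
    MaxRequestWorkers       120     # from the log; stock default is 256
    # Recycle children periodically so slow memory leaks can't exhaust
    # RAM; the stock default of 0 means "never recycle".
    MaxConnectionsPerChild 1000     # from the log
</IfModule>
```

The key point from the discussion is MaxConnectionsPerChild: with the default of 0, a child that leaks memory lives forever, which is how a 4 GB box can be run out of memory under sustained load.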
09:21 Dyrcona We've found that the use patterns for the OPAC and the staff client are very different.
09:21 Dyrcona So, splitting the bricks and configuring Apache differently for each has really helped.
09:22 Dyrcona The other thing is, we had a lot of idle Apache processes on the public side with our original settings.
09:22 Dyrcona We have fewer idlers now.
09:24 jeff since in my experience "bricks" often gets applied to different things, can you clarify your usage? :-)
09:26 tsbere jeff: We have two 3 vm "bricks", one for public and one for staff.
09:26 tsbere (we also have single utility and sip vms)
09:27 tsbere No load balancing or anything
09:28 jeff with no load balancing, what components run on each of the three VMs in a given brick?
09:28 jeff is public apache all on one of the three VMs in the public brick?
09:29 jeff or am i failing to understand your answer?
09:29 tsbere For both 3 vm bricks Apache is one vm, ejabberd/router the second (settings as well), all other drones on the third
09:29 jeff okay, got it.
09:31 Dyrcona Yeah, not really what people think of as the traditional brick setup.
09:32 RoganH joined #evergreen
09:32 jeff 'swhy i asked. :-)
09:32 Dyrcona It seems to work for us that way, and certainly better than when we ran it all on 1 machine.
09:32 * jeff nods
09:33 jeff how many physical machines are you spread across now?
09:33 Dyrcona 'Course we later found out that the RAID on that 1 machine was switching between spares because one of the main drives had died.
09:33 Dyrcona Counting the database server, 3.
09:34 Dyrcona There's one physical machine for each "brick."
09:35 jeff and the sip and utility VMs are shoved in there somewhere also?
09:35 Dyrcona Yes. The utility runs on the staff brick hardware.
09:36 Dyrcona And sip runs on the public side.
09:36 * Dyrcona had to check to make sure.
09:36 jeff heh
09:36 jvwoolf joined #evergreen
09:36 Dyrcona tsbere is away from his desk. He would just know the answer.
09:37 Dyrcona Four vms on each physical machine.
09:37 tsbere Note that the sip/utility vms are standalone bricks themselves
09:37 tsbere Also, if you want to count NFS/Logging there are 4 machines
09:37 jeff heh
09:38 jeff nfs/logging on a fourth physical machine? did you stick memcached there also, or somewhere else?
09:38 tsbere I think I stuck memcached on the DB server, actually
09:38 Dyrcona Is memcached still running on the db server?
09:38 Dyrcona heh
09:38 Dyrcona I forgot that sip and utility run their own ejabberd. I remembered that they do run drones and listeners.
09:39 * tsbere wanted the "public" side to work even when the NFS box was down, so it has copies of all the configs instead of symlinks
09:39 tsbere That includes sip, the utility vm and the staff brick all use NFS-hosted files directly
09:39 Dyrcona The logging box would be an OK place for memcached, though.
09:39 jeff with a single staff VM for all staff apache processes, what kinds of things do you end up using NFS for?
09:40 jeff usual things like reporter output don't seem to apply, though maybe if you're running clark on the utility VM...
09:40 tsbere jeff: Well, for starters, reports and utility stuff end up moving from the utility VM to the staff apache VM
09:41 Dyrcona So, yeah, we do run clark on the utility vm.
09:41 tsbere jeff: Also, I think there is at least one part of the system that apache writes files but drones read them, thus those VMs need to talk to NFS
09:47 maryj joined #evergreen
09:51 jeff tsbere: ah, probably something in vandelay?
09:51 * jeff tries to think
09:52 tsbere jeff: Don't recall if it was vandelay or offline circ
09:52 tsbere That being a programmer's or, meaning it could be both. :P
09:53 jeff heh
10:01 yboston joined #evergreen
10:01 rlefaive_ joined #evergreen
10:20 Christineb joined #evergreen
10:33 Guest16800 left #evergreen
12:01 rlefaive joined #evergreen
12:02 bmills joined #evergreen
12:09 jihpringle joined #evergreen
12:49 pinesol_green [evergreen|Jason Stephenson] LP 1499123: Add release notes. - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=eabd816>
12:49 pinesol_green [evergreen|Jason Stephenson] LP 1499123: Modify Perl code for csp.ignore_proximity field. - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=831a808>
12:49 pinesol_green [evergreen|Jason Stephenson] LP 1499123: Add ignore_proximity to config.standing_penalty. - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=63205ed>
13:23 bmills1 joined #evergreen
13:32 bmills joined #evergreen
13:53 berick Dyrcona: FYI @ http://git.evergreen-ils.org/?p=working/random.git;a=shortlog;h=refs/heads/collab/berick/pingest -- adding some options to your record ingest script.  let me know if you ever put it on github or similar, i'll send a pull request.
13:55 Dyrcona berick: Cool! I'll have to give your changes a try some time, soon.
14:09 Stompro joined #evergreen
14:09 kmlussier @hate scheduling meetings
14:09 pinesol_green kmlussier: But kmlussier already hates scheduling meetings!
14:10 berick @reallyhate
14:10 pinesol_green berick: Try restarting apache.
14:10 kmlussier @loathe scheduling meetings
14:10 pinesol_green kmlussier: http://www.firstpersontetris.com/
14:15 Dyrcona git apply is not being my friend.
14:17 Dyrcona I downloaded bericks change above as a patch but git refuses to apply it.
14:17 Dyrcona At first it complains about a whitespace change.
14:17 Dyrcona After I fix that, it says nothing but the file is unpatched.
14:18 berick maybe a -p  or --directory param?
14:18 berick the path is different
14:18 Dyrcona Yeah, I did it in the directory where the file lives. I'll try -p.
14:21 Dyrcona -p 1 didn't help, but doing it from my root and specifying --directory=./perl did.
14:22 berick cool
14:26 Dyrcona Patch applied in a test branch. I'll have to test it later this week.
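The `--directory` trick above can be demonstrated end to end. `git apply` strips one leading path component by default (`-p1`, the `a/`/`b/` prefix) and `--directory` then prepends the given root, so a patch made against a top-level `pingest.pl` can be applied to a tree where the file lives under `perl/`. The file contents here are placeholders:

```shell
#!/bin/sh
# Demonstrate applying a patch whose paths differ from the local tree.
set -e
tmp=$(mktemp -d); cd "$tmp"

# Local tree keeps the script under perl/, but the patch was generated
# against a tree where it sat at the top level.
mkdir perl
printf 'one\n' > perl/pingest.pl

cat > change.patch <<'EOF'
--- a/pingest.pl
+++ b/pingest.pl
@@ -1 +1 @@
-one
+two
EOF

# -p1 (the default) strips "a/"; --directory re-roots under perl/.
git apply --directory=perl change.patch
```

After this, `perl/pingest.pl` contains the patched content even though the patch never mentions the `perl/` directory.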
14:27 jeff oh hey, i got this to happen again:
14:28 jeff OpenSRF Drone [open-ils.search]
14:28 jeff \_ OpenSRF Drone [open-ils.search]
14:28 jeff \_ OpenSRF Drone [open-ils.search]
14:32 jeff alas:
14:32 jeff Your search - "opensrf listener" "gets confused" "thinks it's a drone" - did not match any documents.
14:33 gmcharlt jeff: does it seem to actually be confused, or is it just that the process name is wrong?
14:33 jeff ejabberd sends it a message, it ignores it.
14:33 * Dyrcona wonders if it didn't die and the oldest child somehow got promoted, not sure if each becomes its own process group leader.
14:34 * Dyrcona thinks they do, so that shouldn't happen, but....
14:34 jeff pid and process start time indicate that it is the Listener process, but it has changed name and behavior (but not pid or process start time)
14:34 Dyrcona jeff: OK. That is strange. I've never seen that happen.
14:35 jeff unusual circumstances: i've been intentionally running this machine out of memory by hammering it with tpac search requests, and OOM killer has killed apache processes.
14:35 Dyrcona Right. I thought that might be part of it.
14:36 jeff debian jessie, recent master of both opensrf and evergreen.
14:39 Dyrcona Oh, beauty... git log -p reveals I have a byte order marker in a SQL query.
14:39 jeff i can't reliably reproduce outside of the "i've done it twice in the past two days"
14:39 Dyrcona Well, you're deliberately overstressing the system. I'd expect the listener to just die, though, and not start acting like a drone.
14:40 jeff i'm wondering if the behavior could result from a failed attempt to fork.
14:40 jeff that might be way off base, though.
14:41 Dyrcona Query still works, though.
14:42 Dyrcona I'm not sure what would happen in that case.
14:42 Dyrcona Typically, if fork fails, you exit your program.
14:43 jeff oh, that's exactly what's happening.
14:43 jeff $child->{pid} = fork();
14:43 jeff perl fork() returns undef if the fork failed.
14:43 jeff if($child->{pid}) { # parent process
14:43 jeff ...
14:43 Dyrcona Yes.
14:43 jeff } else { # child process
14:43 Dyrcona That looks like the culprit. Good catch!
14:44 jeff and in that else $child->{pid} gets set to $$ and we eval $child->init, which is where $0 gets set to OpenSRF Drone [$service]
14:46 jeff we should probably treat 0 and undef differently.
14:46 Dyrcona Yes, definitely.
14:47 Dyrcona It would probably be safe to log the failure to fork a new child and then do nothing if the result of fork is undef.
14:47 * jeff nods
14:48 jeff of course, that could lead to a situation where we chew CPU because there's no memory, so perhaps a sleep or some other backoff would be suitable also.
14:48 jeff but at that point, you're probably already in deep trouble.
14:50 Dyrcona Yeah, when you're out of resources and can't fork a process, it might be a good idea to hang it up and go home. :)
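The fix jeff and Dyrcona converge on can be sketched in Perl. This is a hedged illustration of the idea (distinguish `undef` from `0`, log, and back off), not the actual OpenSRF patch; variable names follow the snippet quoted above:

```perl
use strict;
use warnings;

my $child = {};
my $pid = fork();

if (!defined $pid) {
    # fork() failed -- likely out of memory or processes. Without this
    # branch, undef is treated as "false" and the listener falls into
    # the child path, renaming itself "OpenSRF Drone [...]".
    warn "fork failed: $!; backing off before retrying\n";
    sleep 5;    # crude backoff so we don't spin the CPU while starved
} elsif ($pid) {
    # Parent (the listener): record the new child's pid and carry on.
    $child->{pid} = $pid;
} else {
    # Child (a real drone): only here is it safe to set $pid to $$,
    # call $child->init, and let $0 become "OpenSRF Drone [$service]".
    $child->{pid} = $$;
}
```

As jeff notes, by the time fork() fails the box is already in deep trouble, so the backoff mainly keeps the listener from making things worse.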
14:55 kmlussier Calling 0951
15:01 * Dyrcona decided to fire off a pingest.pl test on his dev vm anyway.
15:02 krvmga joined #evergreen
15:03 pinesol_green [evergreen|Kathy Lussier] LP 1499123: Stamping upgrade script for standing-penalty-ignore-proximity - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=c1b64bf>
15:06 mmorgan1 joined #evergreen
16:01 jlitrell joined #evergreen
16:05 Dyrcona berick++ I added and pushed your patch for pingest.pl.
16:05 pinesol_green [evergreen|Chris Sharp] LP#1486592 - Generate prices for concerto dataset. - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=05e9a08>
16:06 berick Dyrcona++
16:07 mmorgan joined #evergreen
16:08 vlewis joined #evergreen
16:09 rlefaive joined #evergreen
16:23 JBoyer gmcharlt++ # Helping Clark overcome amnesia
16:50 vlewis_ joined #evergreen
16:53 vlewis joined #evergreen
16:57 vlewis_ joined #evergreen
17:07 ddale joined #evergreen
17:07 mmorgan left #evergreen
17:07 kmlussier gmcharlt: I was just looking at bug 1067823 again. Do you know why MARC tag 659 was added to the definition for the genre field? I would have thought we would be using the mods definition there, and I don't see any documentation that points to 659 being a genre field.
17:07 pinesol_green Launchpad bug 1067823 in Evergreen "tpac: genre links in record details page launch subject search" [Medium,Confirmed] https://launchpad.net/bugs/1067823 - Assigned to Kathy Lussier (klussier)
17:10 gmcharlt kmlussier: as near as I can tell, it's a legacy of the NLM using 659 for genre back in the days when dinosaurs roamed the earth
17:11 gmcharlt and my including it in that patch was basically just cargo-culting from Open-ILS/src/templates/opac/parts/record/subjects.tt2
17:11 gmcharlt all of that said: I agree it's non-standard, and I certainly have no objection to just sticking with 655
17:11 gmcharlt @coffee me
17:11 * pinesol_green brews and pours a cup of Bonsai Blend Espresso, and sends it sliding down the bar to me
17:14 kmlussier gmcharlt: OK, I'll play with that branch a bit.
17:14 kmlussier @coffee gmcharlt
17:14 * pinesol_green brews and pours a cup of Ethiopia Yirgacheffe, and sends it sliding down the bar to gmcharlt
17:14 kmlussier Good luck sleeping on that
17:14 * gmcharlt buzzes
17:40 kmlussier I actually see two records in a production system with a 659. But, in one case, I'm pretty sure it's a mistake since I've never heard of a genre called "shark attacks"
17:41 gmcharlt kmlussier: you must not watch the Syfy channel ;)
17:41 gmcharlt but yeah, 2 records is just noise
18:28 jihpringle_ joined #evergreen
18:30 dluch_ joined #evergreen
18:32 dbs_ joined #evergreen
18:32 berick_ joined #evergreen
18:37 jwoodard @decide haiku or no
18:37 pinesol_green jwoodard: go with haiku
18:44 jwoodard Warm wind blowing free, February moves onward, birds chirp happily.
18:44 Bmagic joined #evergreen
18:44 hopkinsju joined #evergreen
18:44 dluch joined #evergreen
18:45 _bott_ joined #evergreen
19:53 book` joined #evergreen
20:15 jlitrell Ahh, February / Snow, snow, snow, rain, rain, snow, rain / I can't feel my feet.
20:17 jeff heh
20:17 jeff jwoodard++ jlitrell++
20:27 bmills joined #evergreen
21:55 scrawler joined #evergreen
21:56 scrawler anybody here this evening?
21:57 scrawler see you later...
21:57 scrawler left #evergreen
21:57 jeff patience...
22:09 phil___ joined #evergreen
