IRC log for #evergreen, 2017-01-31

All times shown according to the server's local time.

Time	Nick	Message
00:28		dcook joined #evergreen
00:33		dcook__ joined #evergreen
05:00	pinesol_green	News from qatests: Test Success <http://testing.evergreen-ils.org/~live>
06:46		genpaku joined #evergreen
07:14		rjackson_isl joined #evergreen
07:17		dteston joined #evergreen
07:32		agoben joined #evergreen
08:09		kmlussier joined #evergreen
08:23		collum joined #evergreen
08:24	collum	For a completed SendEmail triggered event, if the status is 'invalid' is that an indication that there was no email address in the patron record at the time of the event?
08:25	collum	Sorry. Not Status. Event State rather.
08:26	csharp	collum: no - if there's no email address, the SendEmail reactor runs anyway, but fails when it tries to send the email - it doesn't go back and update the event
08:26	csharp	that's something I've been looking at this week
08:27	csharp	all A/T all the time over here
08:27	collum	Thanks csharp
08:28	csharp	collum: sure thing - also, you can see all the failed emails in your osrfsys logs as WARN messages
08:28	csharp	big ugly perl errors
08:29		remingtron joined #evergreen
08:29	JBoyer	collum, re: invalid, is there a Validator set for that event definition? They may have been things that were scheduled (active events) but no longer applied when it came time to run them.
08:35	collum	JBoyer - a hold notification for an item that's still on the hold shelf. Not checked-out. I check the logs.
08:36	collum	blah - I will check the logs. It must be morning.
08:36	kmlussier	@coffee collum
08:36	* pinesol_green	brews and pours a cup of Mocha Java, and sends it sliding down the bar to collum
08:37	collum	thanks kmlussier, that might help.
08:38	kmlussier	I always find pinesol_green's coffee woefully inadequate.
08:38	csharp	I would love for there to be things like "is there an email address/sms number?" in the validators (on top of the "is it still checked out/on the hold shelf?"-type validators)
08:39	csharp	maybe a cascading chain of validation steps would work rather than a single validator per event def
08:40	tsbere	csharp: I seem to recall adding in flags to have hold validators check for "is this option enabled on the hold" - Though I don't recall if the email one checks for an email address or not.
08:40	csharp	I know that it doesn't - nor do the SMS ones check for sms_notify fields
08:41		mmorgan joined #evergreen
08:41	csharp	it's creating a huge headache for me right now since our SMS "holds are ready" notices are still busted :-/
08:41	csharp	and the presence/absence of data in action.hold_request.sms_notify is key to the problem
08:42	miker	stacked collectors, validators, reactors, and cleanup-ers is high on my want-tuits-to-implement list
08:42	miker	fwiw, which ain't much, lacking tuits
08:43	tsbere	csharp: Well, taking a quick look at the code, the "check_sms_notify" flag for the hold validators looks for $hold->sms_notify to return a true value from a perl POV
08:43	csharp	the current behavior is that it kills off the trigger drone if there are enough events that don't have sms_notify undef, but adding the check I added in bug 1660059 is creating massive cstore proliferation
08:43	pinesol_green	Launchpad bug 1660059 in Evergreen "Action trigger mechanism not handling null/undef values for grouping field" [Undecided,New] https://launchpad.net/bugs/1660059
08:43	csharp	hmm
08:44	csharp	miker: good to know it's on your wish list :-)
08:45	csharp	I went from not knowing it was an issue at all to really needing a fix so I can get back to my regular job :-)
08:46	csharp	"what do you do?" "Well, I mainly babysit action_trigger notices..."
08:47	collum	JBoyer: validator = HoldIsAvailable. Guess I will check the logs to see if staff was doing something wacky, as well.
08:48		bos20k joined #evergreen
08:48	JBoyer	csharp, It looks to me like Validator functions are among the easier things to add, if you're interested. ;)
08:49	JBoyer	(I've had one in mind myself, to make sure we don't send people damaged item notices for items they've already paid for...)
08:49	csharp	JBoyer: I'm looking into that right now
08:49	csharp	(adding a validator that is)
08:50	JBoyer	csharp++
08:51	tsbere	csharp: Do your event defs have the "check_sms_notify" or similar parameters set to a true value?
08:51	mmorgan	csharp: Could you use an a_t_filter to filter out the holds with no sms_notify?
08:51	csharp	tsbere: looking now
08:52	* tsbere	keeps getting pulled away from his desk by other things, and is about to be pulled away again
08:52	csharp	mmorgan: I considered that, but the hold notifications are not passive, so no filters (that I'm aware of)
08:52	csharp	tsbere++ # thanks for a push in the right direction
08:52	mmorgan	Ah. gotcha.
08:53	csharp	ok - looks like we're missing those checks for notify settings
08:54	tsbere	csharp: check_email_notify looks for email_notify set to true, check_sms_notify looks for sms_notify set (but ignores carrier), and check_phone_notify looks for phone_notify set
08:54	csharp	awesome - looks like that will solve our problem
08:54	csharp	I'll experiment
08:54	tsbere	check_email_notify could probably be made to look to see if the user has an email address set, possibly, but right now they all only pay attention to the holds themselves
08:55		Dyrcona joined #evergreen
08:59	csharp	034fb7bc
08:59	pinesol_green	csharp: [evergreen|Thomas Berezansky] Enable notify checking for holds in A/T Validators - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=034fb7b>
09:09	csharp	okay - I'm a bit deflated because check_sms_notify is set in our event params :-(
09:09	jeff	i do enjoy how some of these titles end up in logs:
09:09	jeff	2017-01-30 16:01:01,534 INFO: SNOWDEN is ready for pickup at the Kingsley Branch Library
09:09	csharp	I think it's not processing though until after the problem I'm seeing manifests itself
09:10	jeff	2017-01-30 12:52:02,715 INFO: Multiple items including THE QUEEN OF DISTRACTION are ready for pickup at the Woodmere Branch Library
09:10	jeff	don't make royalty wait! go get her now!
09:10	csharp	jeff++
09:12		jvwoolf joined #evergreen
09:13	Dyrcona	:)
09:14	Dyrcona	csharp: I was going to say a few minutes ago that if the change you made in e6f20b3 causes cstore proliferation, then the problem is probably not where you think it is.
09:15	Dyrcona	Unless there is another commit on that branch that I didn't look at. :)
09:15		yboston joined #evergreen
09:16	Dyrcona	To follow up on something kenstir and I were talking about on Sunday: You can send output from rsyslog to a program, but the syntax is weird (in my opinion).
09:23		maryj joined #evergreen
09:23	csharp	Dyrcona: nope - that's the only commit
09:24	csharp	other area of possibility is that our utility server is running on very new, blazing fast hardware with a 10Gb connection to the DB - something we didn't have in place pre-upgrade
09:25	csharp	I'm wondering if OpenSRF is too fast for the DB
09:25	csharp	and also wondering if opensrf.xml offers any levers for mitigating that if so
09:25	csharp	the behavior is this:
09:26	csharp	1) A/T runner gathers all the events (putting them in 'collected' state)
09:27	csharp	2) in the same second that the collection finishes, 72 cstore drones are created (72 is our max_children) and we get "no children available" in the WARN log
09:27	csharp	which, of course, halts everything else going on on the utility server
09:28	csharp	I need to re-confirm if this is true, but the SMS hold notification post-patch was the only event_def causing this behavior
09:29	Dyrcona	csharp: We had something similar happen over Christmas weekend, but I was not able to determine the exact cause.
09:31	csharp	the faster hardware also caused an issue where our init script starts apache too quickly after starting opensrf, and that broke lots 'o stuff
09:32	csharp	adding a brief sleep between the two solved the problem
09:32	Dyrcona	I've seen that, too, by hand.
09:33	Dyrcona	Takes a little bit for all of the drones to get going, apparently.
09:43	jeff	if osrf_control --diagnostic (or new option like --status and --status-all) returned different exit status when drones were running vs not-running, you could probably incorporate that into your startup scripts.
09:44	jeff	as it is, you could probably just grep the output of --diagnostic
09:44	jeff	in systemd, probably in ExecStartPost for opensrf.
09:54	csharp	good idea
09:56	jeff	alternately, a --wait option for osrf_control, which might help in some situations where you don't have the ability to do something like ExecStartPost
10:09	Dyrcona	Yeah. But, if the children are daemonizing themselves, what do you wait on?
10:14	jeff	you wait on a sub that does --diagnostic until it sees everything has running drones.
10:16	jeff	(does what --diagnostic does, essentially)
10:26	* Dyrcona	has been having fun with the PID files in the new VMs...Seems it doesn't matter what you put in opensrf*.xml, opensrf-perl.pl (aka osrf_control) does what it wants. :)
10:26	Dyrcona	I think I mentioned that a few months (weeks) ago.
10:27	* Dyrcona	is just about to sed the path in opensrf-perl.pl once again.
10:28	berick	osrf_control doesn't read opensrf.xml, it queries opensrf.settings for all of that
10:28	berick	but it needs a pid dir to start opensrf.settings
10:28	berick	so it has its own pid dir
10:29	berick	don't remember if anything still reads the opensrf.xml pid dir
10:30	jeff	Dyrcona: osrf_control accepts a --pid-dir argument -- possibly better than your suggested sed workaround, unless there's a bug and it doesn't honor that argument?
10:31	Dyrcona	Yep. I'm aware of all that. sed is easy enough, since we still do things manually.
10:35	Dyrcona	And, done. :)
10:38	Dyrcona	I may just put this in a git branch in the future. We can't use the default location because of NFS, though I'm not sure we really need to share /openils across the brick heads and drones.
10:38	Dyrcona	Actually, I'm pretty sure that we don't.
10:41	Dyrcona	I've had enough fun so far that I think I'll leave that alone for now.
10:41	Dyrcona	When we "upgrade" the O/S on the VMs, then I'll drop the NFS share between brick head and drones.
10:42	* Dyrcona	waits for the all clear to finish setting up brick 5.
10:45		Christineb joined #evergreen
10:57	Dyrcona	And done! :)
10:59	csharp	berick: miker: any thoughts on my "hardware is too fast for cstore" theory?
11:03	berick	uh, what?
11:04	csharp	berick: sorry http://irc.evergreen-ils.org/evergreen/2017-01-31#i_286395
11:05		mmorgan joined #evergreen
11:08	berick	csharp: no, it's a problem with the commit
11:10	berick	$e->update_start starts a new transaction in standalone mode (which is the case here) and it knows to commit, but it's leaving the cstore connection open
11:10	csharp	ooooooh
11:10	berick	you just exposed a bug is all
11:11	berick	csharp: this should fix it.. https://gist.github.com/berick/c70c08580a5f4ff9a8b1b3997ac6bee0
11:12	berick	the ->commit will force a disconnect from cstore, unlike xact_commit, which just commits the transaction
11:12	csharp	berick++ # thank you!
11:13	Dyrcona	Ah, I missed that.
11:13	berick	er, update_state, not update_start
11:13	berick	you know what i mean
11:13	Dyrcona	And, I should have known ... :)
11:15	Dyrcona	Actually, the problem wasn't directly in csharp's commit... :)
11:15	Dyrcona	Anyway, good catch, berick++
11:15	Dyrcona	That should be filed on Lp.
11:16		sandbergja joined #evergreen
11:18	Dyrcona	I've occasionally wondered (well, at least twice) if xact_commit has any legitimate uses?
11:19	berick	it does
11:19	berick	it's great for long-lived connection w/ multiple transactions. we could live without it, but the code would be less efficient.
11:20	Dyrcona	OK
11:21		remingtron joined #evergreen
11:24	csharp	berick: I probably owe you more than one already, but I'm totally buying you a beer at the conference :-)
11:24	csharp	@beer berick
11:24	pinesol_green	csharp: Thank you csharp! But our princess is in another castle!
11:24	csharp	@bartender berick
11:24	* pinesol_green	fills a pint glass with Samuel Adams Black Lager, and sends it sliding down the bar to berick (http://beeradvocate.com/beer/profile/35/21300)
11:25	berick	csharp: but did you test it yet? :)
11:25	csharp	doing so now
11:25	csharp	:-)
11:25	Dyrcona	I don't think you have to test that. It will fix the problem. ;)
11:26	Dyrcona	I was ready to commit it after just eyeballing it. :)
11:27	* Dyrcona	wonders what's for lunch.
11:30	csharp	no errors - now going to wait a little while for more events to accumulate and run a bigger batch at once
11:31	csharp	spoiler alert for Terran's and my presentation on A/T: it's basically impossible to mock up - you don't know it really works until you're live
11:33	mmorgan	csharp: That's true of, well, just about everything ;-)
11:33		khuckins__ joined #evergreen
11:34	Dyrcona	:)
11:39	miker	berick: I don't think we want to disconnect after every single update_state ... we're intentionally trying to reuse the connection. events get updated a lot, and quickly. csharp's change created a new code path that doesn't do the right thing with some events.
11:42	miker	csharp: I don't agree that a grouping field should be nullable ... but, even accepting that, you should be "next"-ing after you set the event to invalid. See the "unless" block immediately above your change
11:43	miker	also, "get off my lawn!" ... (I know I sound grumpy, sorry)
11:44	Dyrcona	heh
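
miker's point about "next"-ing after marking an event invalid can be illustrated with a small grouping loop. This is only a sketch in Python, not the Evergreen Perl code: the event dictionaries, the `group_events` name, and the field layout are assumptions made for illustration. The one behaviour it is meant to show is that an event with a null grouping value is marked invalid and then skipped, so it never ends up in any group.

```python
from collections import defaultdict


def group_events(events, group_field):
    """Group events by the value of `group_field` on each event's target.

    Events whose grouping value is None are marked 'invalid' and skipped.
    The dict-based event layout is a hypothetical stand-in for real
    action/trigger event objects.
    """
    groups = defaultdict(list)
    for event in events:
        value = event["target"].get(group_field)
        if value is None:
            event["state"] = "invalid"
            continue  # the equivalent of Perl's "next": do not group this event
        groups[value].append(event)
    return dict(groups)
```

Without the `continue`, the invalidated event would fall through and be appended under a `None` key, which is the "kept around after being invalidated" situation raised later in the discussion.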
11:48	berick	miker: is it normal for events to be in standalone mode for the main batch processing calls? i thought that was atypical. this is a situation where we have more events, each with their own cstore connections, than a typical server can support at one time. potentially many times more.
11:48	berick	either they need a shared editor or the editors have to disconnect
11:50	miker	berick: right, either would be fine. I'm objecting to the blanket disconnecting after every single update, forcing churn (and a good bit of overhead) on every /other/ update_state
11:50	berick	but again that's only in standalone mode
11:51	berick	where every event has its own cstore.
12:02	dteston	I wrote a script that accepts username and plaintext password, crypts the password the same way EG does, then checks them against the DB if anyone is interested.
12:03	miker	I'm not following why that's important. it's simply a side effect of event creation. if the caller passes an editor, the caller is in charge of transaction management. otherwise, it makes its own editor and should last for the lifetime of the event. all individual events are standalone, afaict
12:08	miker	If we're going to codify that the group field can be null (btw, what does that mean? are null-grouped events collected together? in csharp's patch they are kept around after being invalidated), but that they should be auto-marked invalid and discarded, then IMO we should invent a different api to invalidate them, rather than pushing the problem around. ... I'll offer a patch in a minute
12:09		brahmina joined #evergreen
12:21		dteston joined #evergreen
12:21		jihpringle joined #evergreen
12:23	berick	it looks like the only time we don't use standalone mode is when processing grouped events. (EventGroup::new_impl). so faulty assumption on my part. and presumably for non-grouped events, we are only doing stuff to one event at a time
12:24	berick	which avoids the cstore exhaustion problem
12:25	berick	so, setting aside whether patch makes sense, the solution to the problem would be a shared editor at the top of Trigger:grouped_events() that is inserted into every event (or at least every event where update_state has to be called) -- and force it to standalone=false
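
The difference between per-event editors and one shared editor can be shown with a toy model. The sketch below is not Evergreen or OpenSRF code; `ConnectionPool`, `Editor`, and both helper functions are hypothetical names, and the pool simply counts open connections against a `max_children` limit (72 in csharp's description). It shows why many standalone editors that never disconnect run into "no children available", while a single shared editor holds one connection for the whole batch.

```python
class PoolExhausted(Exception):
    """Raised when every backend child is already in use."""


class ConnectionPool:
    """Toy stand-in for a service with a fixed number of drones."""

    def __init__(self, max_children):
        self.max_children = max_children
        self.in_use = 0
        self.peak = 0

    def connect(self):
        if self.in_use >= self.max_children:
            raise PoolExhausted("no children available")
        self.in_use += 1
        self.peak = max(self.peak, self.in_use)

    def disconnect(self):
        self.in_use -= 1


class Editor:
    """Toy editor that connects lazily and keeps its connection open."""

    def __init__(self, pool):
        self.pool = pool
        self.connected = False

    def update_state(self, event, state):
        if not self.connected:
            self.pool.connect()
            self.connected = True
        event["state"] = state

    def disconnect(self):
        if self.connected:
            self.pool.disconnect()
            self.connected = False


def invalidate_standalone(events, pool):
    """One editor per event, none of which disconnects (the leaking case)."""
    editors = []
    for event in events:
        editor = Editor(pool)
        editor.update_state(event, "invalid")
        editors.append(editor)  # editor stays alive, so its connection stays open
    return editors


def invalidate_shared(events, pool):
    """One editor shared by every event, disconnected once at the end."""
    editor = Editor(pool)
    try:
        for event in events:
            editor.update_state(event, "invalid")
    finally:
        editor.disconnect()
```

With 100 events and a limit of 72, the standalone version raises `PoolExhausted` on the 73rd event, while the shared version never holds more than one connection.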
12:26	pastebot	"miker" at 64.57.241.14 pasted "external (batch) invalidation api" (73 lines) at http://paste.evergreen-ils.org/43
12:27	berick	heh, that
12:27	berick	miker++
12:27	miker	that'll work as either an instance method and provide the same immediate effect as your patch, or as a batch method to quickly close all them... :)
12:31	miker	happy to branchify that if "null group field value means event is invalid" is useful in the wild and we can't see drawbacks. (I can't think of breakage or objections, other than my historical "group field is non-null" memory)
12:32	miker	csharp: is there an LP bug to attach this to?
12:33	miker	nm, found it
14:55		mmorgan1 joined #evergreen
15:38	csharp	miker: oops - I left out the "next;" in my git branch - it was added to my local file :-/
15:41	csharp	miker: I'll test your branch in a bit and let you know how it goes
15:45		agoben joined #evergreen
15:55		khuckins_ joined #evergreen
15:59	csharp	miker: so putting aside the functionality of the patch(es), you're saying that it's "wrong" to group on a field that is nullable (in this case, sms_notify on action.hold_request)? I see that all other notices group on usr, but I'm assuming we group on sms_notify since that's a per-hold setting...
16:00	csharp	that is to say, if there's a better way to do this in the first place, I'm game :-)
16:01	jeff	can you articulate why you were grouping by sms_notify and not usr? how do the stock phone / pbx A/T event defs group?
16:02	berick	i'm fairly certain it is because sms_notify is per-hold and not per-user
16:02	berick	(a topic I know jeff loves)
16:03	jeff	heh
16:04	jeff	i guess i half-answered my question partway through my line, which is why i asked about the A/T pbx defs.
16:04	jeff	information that I have available to me, so i'll go look.
16:05		sykeslewis joined #evergreen
16:06	csharp	miker++ # awesome - it works greate
16:06	csharp	great even
16:06	csharp	greate is even better because, hey, extra "e"!
16:09	Dyrcona	:)
16:09	jeff	ah, the only stock AstCall A/T event def is for overdues, therefore no hold-level phone_notify to even consider.
16:09	jeffdavis	csharp: that must be one of those extra e's miker freed up when he switched back from using "eeevil" as a nick
16:10	jeff	csharp: the way you're implementing, is there anything that limits the number of SMS messages you send a given user+phone+pickup_lib in a given day?
16:11	csharp	jeffdavis: yes!
16:11		stephengwills joined #evergreen
16:11	csharp	jeff: nope - the cron for that granularity runs every half-hour - theoretically a person could get dinged multiple times per day
16:12	csharp	but given the limits of SMS, that seems ok to us
16:13	stephengwills	what would cause money.materialized_billable_xact_summary.xact_finish to take on a date before a bill was paid? I'm not seeing bill summaries in the staff client.
16:15	jeff	csharp: which limits of sms?
16:16	jeff	(not that there aren't many, just that i don't follow which ones relate to this subject)
16:16	csharp	jeff: cutting off text at a character limit was what I was thinking about
16:16	jeff	ah.
16:16	csharp	"you have 15 items on hold: Harry Potter and the..."
16:17	csharp	stephengwills: we used to have that happen all the time, but not over the past few releases
16:17	csharp	also...
16:17	* csharp	waves at stephengwills
16:18	jeff	yeah, we have records that come in from the state ILL system that don't have subfields in the 245, so their titles are long, usually truncated like:
16:18	jeff	2017-01-31 15:16:01,980 INFO: LETTERS FROM SINNERS & STRANGERS SOUND RECORDING EILEN JEWEL... is ready for pickup at the Woodmere Branch Library
16:18	* stephengwills	waves back.
16:19	stephengwills	it's driving our libraries crazy. I wrote a script to null it out when there is a date in there and the pmt amount is zero but I have to remember to run it periodically. would love to get it fixed.
16:19	jeff	and users with multiple items coming available in one day get a max of two messages (per pickup_lib+usr+number), with the second just looking like:
16:19	jeff	2017-01-31 15:00:02,194 INFO: Multiple items including LIZ AT MARIGOLD LAKE are ready for pickup at the Woodmere Branch Library
16:19	csharp	stephengwills: what release are you on?
16:19	stephengwills	2.8.3
16:20	csharp	hmm - those should've been fixed by then if I recall correctly
16:20	csharp	jeff: so how do you limit the number of messages?
16:21	jeff	currently there are 13 patrons that we're suppressing further notifications of for today, because they have >2 intervals where something came available, but we've already sent them their two messages for today.
16:21	jeff	csharp: we're cheating.
16:21	stephengwills	maybe I should grep for procs on that table? make sure they are up to date?
16:22	jeff	csharp: every 5 minutes we pull the state of the holdshelf from the db and dump it into an external db. we do sms and phone notifications from that.
16:22	csharp	ah - interesting approach
16:22	jeff	csharp: so it's a matter of saying "hey, have i sent notifications to this patron already today and they haven't come in to pick up their things? okay, don't send them more -- they're already on their way in, and we've already said the "Multiple items, including ITEM TWO..."
16:23	csharp	stephengwills: I would use the timestamp on one of the wrongly-closed bills and dig into the opensrf logs (may need to ratchet up the loglevel)
16:23	jeff	and i think i was wrong, and we're limiting to phone+pickup_lib, not phone+usr+pickup_lib.
16:23	csharp	cool
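
The per-day cap jeff describes (at most two messages per phone number and pickup library) can be sketched as a small counter keyed on those values. This is an assumption-laden illustration, not jeff's actual external-db implementation: the class name, method name, and in-memory storage are hypothetical, and the "patron has not yet picked up their items" condition he mentions is left out.

```python
from collections import defaultdict
import datetime


class DailyNotificationLimiter:
    """Allow at most `max_per_day` notifications per (phone, pickup_lib, day).

    Counts are kept in memory for illustration only; a real system would
    persist them somewhere durable.
    """

    def __init__(self, max_per_day=2):
        self.max_per_day = max_per_day
        self._sent = defaultdict(int)

    def should_send(self, phone, pickup_lib, day):
        """Return True and record the send if the daily cap is not yet reached."""
        key = (phone, pickup_lib, day)
        if self._sent[key] >= self.max_per_day:
            return False
        self._sent[key] += 1
        return True
```

Keying on the date means the counter resets naturally each day, and keying on phone plus pickup library (rather than user) matches the correction jeff makes above.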
16:23	stephengwills	hmm... ok... will start there.
16:23	stephengwills	tx
16:24	stephengwills	btw... planning a trip to Atlanta in April. up for a beer if y'all around?
16:25	csharp	stephengwills: perhaps a soda & lime, but sure :-)
16:25	csharp	stephengwills: also, my pres on EG logs from 2015 might be a good reference: https://docs.google.com/document/d/1BJ7kSr5LfPkxXRhcrYjNyPlbT9PlNFHg6zL-tyetnKA/edit
16:25	stephengwills	sounds good.
16:26	csharp	esp. the part about threadtraces
16:26	stephengwills	ok thanks will check it out
16:27	* csharp	wanders off
16:32	jeffdavis	stephengwills: to be clear, is the concern that there is a value in xact_finish when the balance is non-zero, or that the timestamp on the last billing is later than the xact_finish date?
16:33	stephengwills	the former is the issue. libraries cannot see that there are any bills at all until they click into the bills detail view.
16:33	stephengwills	balance_owed reads as $0.00
16:33	jeffdavis	ok
16:35	jeffdavis	I see we have 29 materialized billable xact entries so far this year where xact_finish is non-null and balance_owed is non-zero (according to the db table; haven't looked in the client)
16:35	jeffdavis	we're on 2.10
16:36	jeffdavis	in most of those cases the last payment was a manual account adjustment, plus a few forgive payments and cash payments
16:36	jeffdavis	I'm not sure of the cause, but I wonder if the total balance_owed on all xacts for any of those patrons adds up to $0
16:38	jeffdavis	...I think I need more caffeine before I think too hard about this
16:38	stephengwills	in my case it appears as if the xact_finish date == the date the item was checked in. the bills are all unpaid. not even partial payments on them
16:51		mmorgan joined #evergreen
16:56		khuckins__ joined #evergreen
17:00	pinesol_green	News from qatests: Test Success <http://testing.evergreen-ils.org/~live>
17:02		mmorgan left #evergreen
17:15	jeffdavis	Are folks around these parts experiencing memory leak issues with the staff client?
17:17	jeffdavis	there are some reports of this kind of thing in bug 1110817 (and in bug 1086458 which was "fixed" circa 2.5) but I'm not sure of prevalence of that kind of issue
17:17	pinesol_green	Launchpad bug 1110817 in Evergreen "staff client patron search results continuously eats memory" [Medium,Incomplete] https://launchpad.net/bugs/1110817
17:17	pinesol_green	Launchpad bug 1086458 in Evergreen 2.4 "Staff client memory leaks in 2.3 and later" [High,Fix released] https://launchpad.net/bugs/1086458
17:18	csharp	jeffdavis: we hear occasional reports of high RAM usage that the end users call "memory leaks" but are often due to underpowered hardware
17:18	csharp	as most of our libraries have recently upgraded PCs, we hear fewer and fewer reports of issues
17:19	csharp	in fact, 2016 was the first summer since 2008 when I joined GPLS where we weren't bombarded with reports of workstation issues
17:20	csharp	faster processors appear to be the factor (we just had a ticket from an end user with 8GB of RAM complaining of freezes/crashes - found to have an older i3)
17:21	jeffdavis	oho, interesting
17:23	jeffdavis	We have had persistent reports of memory leak symptoms/unusable slowness/freeze/crash from multiple sites, most recently from a multibranch whose workstations have 4GB RAM. I dunno what kind of processors they have though.
17:28	csharp	we also found that the end user was running a lot of extra programs (Outlook, multiple browsers, MS Office, etc.)
17:41		dcook joined #evergreen
17:52	berick	we have some branches that regularly restart the client because it bogs down over time. doesn't affect everyone, though.
18:07	jeffdavis	yeah, we advise regular restarts as well; unfortunately some of our libs are still finding it unmanageable
18:07	jeffdavis	has the experience with the web client been better so far?
23:47		gsams_ joined #evergreen