IRC log for #evergreen, 2017-01-31

All times shown according to the server's local time.

Time Nick Message
00:28 dcook joined #evergreen
00:33 dcook__ joined #evergreen
05:00 pinesol_green News from qatests: Test Success <http://testing.evergreen-ils.org/~live>
06:46 genpaku joined #evergreen
07:14 rjackson_isl joined #evergreen
07:17 dteston joined #evergreen
07:32 agoben joined #evergreen
08:09 kmlussier joined #evergreen
08:23 collum joined #evergreen
08:24 collum For a completed SendEmail triggered event, if the status is 'invalid' is that an indication that there was no email address in the patron record at the time of the event?
08:25 collum Sorry.  Not Status.  Event State rather.
08:26 csharp collum: no - if there's no email address, the SendEmail reactor runs anyway, but fails when it tries to send the email - it doesn't go back and update the event
08:26 csharp that's something I've been looking at this week
08:27 csharp all A/T all the time over here
08:27 collum Thanks csharp
08:28 csharp collum: sure thing - also, you can see all the failed emails in your osrfsys logs as WARN messages
08:28 csharp big ugly perl errors
08:29 remingtron joined #evergreen
08:29 JBoyer collum, re: invalid, is there a Validator set for that event definition? They may have been things that were scheduled (active events) but no longer applied when it came time to run them.
08:35 collum JBoyer - a hold notification for an item that's still on the hold shelf.  Not checked-out.  I check the logs.
08:36 collum blah - I will check the logs.  It must be morning.
08:36 kmlussier @coffee collum
08:36 * pinesol_green brews and pours a cup of Mocha Java, and sends it sliding down the bar to collum
08:37 collum thanks kmlussier, that might help.
08:38 kmlussier I always find pinesol_green's coffee woefully inadequate.
08:38 csharp I would love for there to be things like "is there an email address/sms number?" in the validators (on top of the "is it still checked out/on the hold shelf?"-type validators)
08:39 csharp maybe a cascading chain of validation steps would work rather than a single validator per event def
08:40 tsbere csharp: I seem to recall adding in flags to have hold validators check for "is this option enabled on the hold" - Though I don't recall if the email one checks for an email address or not.
08:40 csharp I know that it doesn't - nor do the SMS ones check for sms_notify fields
08:41 mmorgan joined #evergreen
08:41 csharp it's creating a huge headache for me right now since our SMS "holds are ready" notices are still busted :-/
08:41 csharp and the presence/absence of data in action.hold_request.sms_notify is key to the problem
08:42 miker stacked collectors, validators, reactors, and cleanup-ers is high on my want-tuits-to-implement list
08:42 miker fwiw, which ain't much, lacking tuits
08:43 tsbere csharp: Well, taking a quick look at the code, the "check_sms_notify" flag for the hold validators looks for $hold->sms_notify to return a true value from a perl POV
08:43 csharp the current behavior is that it kills off the trigger drone if there are enough events that don't have sms_notify undef, but adding the check I added in bug 1660059 is creating massive cstore proliferation
08:43 pinesol_green Launchpad bug 1660059 in Evergreen "Action trigger mechanism not handling null/undef values for grouping field" [Undecided,New] https://launchpad.net/bugs/1660059
08:43 csharp hmm
08:44 csharp miker: good to know it's on your wish list :-)
08:45 csharp I went from not knowing it was an issue at all to really needing a fix so I can get back to my regular job :-)
08:46 csharp "what do you do?" "Well, I mainly babysit action_trigger notices..."
08:47 collum JBoyer: validator = HoldIsAvailable.  Guess I will check the logs to see if staff was doing something wacky, as well.
08:48 bos20k joined #evergreen
08:48 JBoyer csharp, It looks to me like Validator functions are among the easier things to add, if you're interested. ;)
08:49 JBoyer (I've had one in mind myself, to make sure we don't send people damaged item notices for items they've already paid for...)
08:49 csharp JBoyer: I'm looking into that right now
08:49 csharp (adding a validator that is)
08:50 JBoyer csharp++
08:51 tsbere csharp: Do your event defs have the "check_sms_notify" or similar parameters set to a true value?
08:51 mmorgan csharp: Could you use an a_t_filter to filter out the holds with no sms_notify?
08:51 csharp tsbere: looking now
08:52 * tsbere keeps getting pulled away from his desk by other things, and is about to be pulled away again
08:52 csharp mmorgan: I considered that, but the hold notifications are not passive, so no filters (that I'm aware of)
08:52 csharp tsbere++ # thanks for a push in the right direction
08:52 mmorgan Ah. gotcha.
08:53 csharp ok - looks like we're missing those checks for notify settings
08:54 tsbere csharp: check_email_notify looks for email_notify set to true, check_sms_notify looks for sms_notify set (but ignores carrier), and check_phone_notify looks for phone_notify set
08:54 csharp awesome - looks like that will solve our problem
08:54 csharp I'll experiment
08:54 tsbere check_email_notify could probably be made to look to see if the user has an email address set, possibly, but right now they all only pay attention to the holds themselves
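The notify checks tsbere describes can be sketched roughly as below. This is a minimal Python illustration, not the Evergreen validator code (which is Perl): the `Hold` class and `notify_checks_pass` function are hypothetical names, and only the `check_*_notify` parameter names and the hold fields come from the discussion above. As in the description, the carrier is ignored and the patron's own email address is never consulted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hold:
    """Hypothetical stand-in for the notify fields on a hold request."""
    email_notify: bool = False
    sms_notify: Optional[str] = None
    sms_carrier: Optional[int] = None  # deliberately ignored by the checks
    phone_notify: Optional[str] = None


def notify_checks_pass(hold: Hold, params: dict) -> bool:
    """Return False if an enabled check finds its hold field unset."""
    if params.get("check_email_notify") and not hold.email_notify:
        return False
    if params.get("check_sms_notify") and not hold.sms_notify:
        return False
    if params.get("check_phone_notify") and not hold.phone_notify:
        return False
    return True
```

With no `check_*` parameters set, every hold passes, which matches the situation csharp first suspected: the event definition has to opt in to each check.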
08:55 Dyrcona joined #evergreen
08:59 csharp 034fb7bc
08:59 pinesol_green csharp: [evergreen|Thomas Berezansky] Enable notify checking for holds in A/T Validators - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=034fb7b>
09:09 csharp okay - I'm a bit deflated because check_sms_notify is set in our event params :-(
09:09 jeff i do enjoy how some of these titles end up in logs:
09:09 jeff 2017-01-30 16:01:01,534 INFO: SNOWDEN is ready for pickup at the Kingsley Branch Library
09:09 csharp I think it's not processing though until after the problem I'm seeing manifests itself
09:10 jeff 2017-01-30 12:52:02,715 INFO: Multiple items including THE QUEEN OF DISTRACTION are ready for pickup at the Woodmere Branch Library
09:10 jeff don't make royalty wait! go get her now!
09:10 csharp jeff++
09:12 jvwoolf joined #evergreen
09:13 Dyrcona :)
09:14 Dyrcona csharp: I was going to say a few minutes ago that if the change you made in e6f20b3 causes cstore proliferation, then the problem is probably not where you think it is.
09:15 Dyrcona Unless there is another commit on that branch that I didn't look at. :)
09:15 yboston joined #evergreen
09:16 Dyrcona To follow up on something kenstir and I were talking about on Sunday: You can send output from rsyslog to a program, but the syntax is weird (in my opinion).
09:23 maryj joined #evergreen
09:23 csharp Dyrcona: nope - that's the only commit
09:24 csharp other area of possibility is that our utility server is running on very new, blazing fast hardware with a 10Gb connection to the DB - something we didn't have in place pre-upgrade
09:25 csharp I'm wondering if OpenSRF is too fast for the DB
09:25 csharp and also wondering if opensrf.xml offers any levers for mitigating that if so
09:25 csharp the behavior is this:
09:26 csharp 1) A/T runner gathers all the events (putting them in 'collected' state)
09:27 csharp 2) in the same second that the collection finishes, 72 cstore drones are created (72 is our max_children) and we get "no children available" in the WARN log
09:27 csharp which, of course, halts everything else going on on the utility server
09:28 csharp I need to re-confirm if this is true, but the SMS hold notification post-patch was the only event_def causing this behavior
09:29 Dyrcona csharp: We had something similar happen over Christmas weekend, but I was not able to determine the exact cause.
09:31 csharp the faster hardware also caused an issue where our init script starts apache too quickly after starting opensrf, and that broke lots 'o stuff
09:32 csharp adding a brief sleep between the two solved the problem
09:32 Dyrcona I've seen that, too, by hand.
09:33 Dyrcona Takes a little bit for all of the drones to get going, apparently.
09:43 jeff if osrf_control --diagnostic (or new option like --status and --status-all) returned different exit status when drones were running vs not-running, you could probably incorporate that into your startup scripts.
09:44 jeff as it is, you could probably just grep the output of --diagnostic
09:44 jeff in systemd, probably in ExecStartPost for opensrf.
09:54 csharp good idea
09:56 jeff alternately, a --wait option for osrf_control, which might help in some situations where you don't have the ability to do something like ExecStartPost
10:09 Dyrcona Yeah. But, if the children are daemonizing themselves, what do you wait on?
10:14 jeff you wait on a sub that does --diagnostic until it sees everything has running drones.
10:16 jeff (does what --diagnostic does, essentially)
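The "wait on a sub that checks until everything has running drones" idea could look something like the sketch below. It is an assumption-laden illustration: `wait_until_ready` is a hypothetical helper, and the `all_running` callable is left to the caller because the exact output of `osrf_control --diagnostic` is not shown in this log. A startup script would supply a check of its own and only start Apache once the function returns True.

```python
import time
from typing import Callable


def wait_until_ready(
    all_running: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll all_running() until it returns True or the timeout expires.

    Returns True once the check passes, False if the deadline is
    reached first. sleep and clock are injectable for testing.
    """
    deadline = clock() + timeout
    while True:
        if all_running():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Compared with a fixed sleep between starting OpenSRF and Apache, a polling loop waits only as long as needed and fails loudly when services never come up.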
10:26 * Dyrcona has been having fun with the PID files in the new VMs...Seems it doesn't matter what you put in opensrf*.xml, opensrf-perl.pl (aka osrf_control) does what it wants. :)
10:26 Dyrcona I think I mentioned that a few months (weeks) ago.
10:27 * Dyrcona is just about to sed the path in opensrf-perl.pl once again.
10:28 berick osrf_control doesn't read opensrf.xml, it queries opensrf.settings for all of that
10:28 berick but it needs a pid dir to start opensrf.settings
10:28 berick so it has its own pid dir
10:29 berick don't remember if anything still reads the opensrf.xml pid dir
10:30 jeff Dyrcona: osrf_control accepts a --pid-dir argument -- possibly better than your suggested sed workaround, unless there's a bug and it doesn't honor that argument?
10:31 Dyrcona Yep. I'm aware of all that. sed is easy enough, since we still do things manually.
10:35 Dyrcona And, done. :)
10:38 Dyrcona I may just put this in a git branch in the future. We can't use the default location because of NFS, though I'm not sure we really need to share /openils across the brick heads and drones.
10:38 Dyrcona Actually, I'm pretty sure that we don't.
10:41 Dyrcona I've had enough fun so far that I think I'll leave that alone for now.
10:41 Dyrcona When we "upgrade" the O/S on the VMs, then I'll drop the NFS share between brick head and drones.
10:42 * Dyrcona waits for the all clear to finish setting up brick 5.
10:45 Christineb joined #evergreen
10:57 Dyrcona And done! :)
10:59 csharp berick: miker: any thoughts on my "hardware is too fast for cstore" theory?
11:03 berick uh, what?
11:04 csharp berick: sorry http://irc.evergreen-ils.org/evergreen/2017-01-31#i_286395
11:05 mmorgan joined #evergreen
11:08 berick csharp: no, it's a problem with the commit
11:10 berick $e->update_start starts a new transaction in standalone mode (which is the case here) and it knows to commit, but it's leaving the cstore connection open
11:10 csharp ooooooh
11:10 berick you just exposed a bug is all
11:11 berick csharp: this should fix it..  https://gist.github.com/berick/c70c08580a5f4ff9a8b1b3997ac6bee0
11:12 berick the ->commit will force a disconnect from cstore, unlike xact_commit, which just commits the transaction
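The difference berick describes can be shown with a toy model. Nothing below is the real editor API: `DronePool`, `ToyEditor` and `update_states` are hypothetical, and only the method names `xact_commit` and `commit` and the limit of 72 (csharp's max_children) are taken from the channel. The assumption being modelled is that each event holds its own editor for the life of the batch.

```python
class DronePool:
    """Toy model of a service with a fixed number of drones."""

    def __init__(self, max_children: int):
        self.max_children = max_children
        self.in_use = 0
        self.peak = 0

    def connect(self) -> None:
        if self.in_use >= self.max_children:
            raise RuntimeError("no children available")
        self.in_use += 1
        self.peak = max(self.peak, self.in_use)

    def disconnect(self) -> None:
        self.in_use -= 1


class ToyEditor:
    """Toy editor: one connection per editor, opened on first transaction."""

    def __init__(self, pool: DronePool):
        self.pool = pool
        self.connected = False

    def xact_begin(self) -> None:
        if not self.connected:
            self.pool.connect()
            self.connected = True

    def xact_commit(self) -> None:
        # Commits the transaction but keeps the connection (and drone) held.
        pass

    def commit(self) -> None:
        # Commits and then releases the connection.
        self.xact_commit()
        if self.connected:
            self.pool.disconnect()
            self.connected = False


def update_states(pool: DronePool, n_events: int, disconnect: bool) -> int:
    """Update n_events events, each with its own editor; return peak drones used."""
    editors = []
    for _ in range(n_events):
        editor = ToyEditor(pool)
        editor.xact_begin()
        if disconnect:
            editor.commit()
        else:
            editor.xact_commit()
        editors.append(editor)  # events stay alive for the whole batch
    return pool.peak
```

In this model, 200 events with `disconnect=True` never hold more than one drone at a time, while `disconnect=False` raises "no children available" on the 73rd event.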
11:12 csharp berick++ # thank you!
11:13 Dyrcona Ah, I missed that.
11:13 berick er, update_state, not update_start
11:13 berick you know what i mean
11:13 Dyrcona And, I should have known ... :)
11:15 Dyrcona Actually, the problem wasn't directly in csharp's commit... :)
11:15 Dyrcona Anyway, good catch, berick++
11:15 Dyrcona That should be filed on Lp.
11:16 sandbergja joined #evergreen
11:18 Dyrcona I've occasionally wondered (well, at least twice) if xact_commit has any legitimate uses?
11:19 berick it does
11:19 berick it's great for long-lived connection w/ multiple transactions.  we could live without it, but the code would be less efficient.
11:20 Dyrcona OK
11:21 remingtron joined #evergreen
11:24 csharp berick: I probably owe you more than one already, but I'm totally buying you a beer at the conference :-)
11:24 csharp @beer berick
11:24 pinesol_green csharp: Thank you csharp! But our princess is in another castle!
11:24 csharp @bartender berick
11:24 * pinesol_green fills a pint glass with Samuel Adams Black Lager, and sends it sliding down the bar to berick (http://beeradvocate.com/beer/profile/35/21300)
11:25 berick csharp: but did you test it yet? :)
11:25 csharp doing so now
11:25 csharp :-)
11:25 Dyrcona I don't think you have to test that. It will fix the problem. ;)
11:26 Dyrcona I was ready to commit it after just eyeballing it. :)
11:27 * Dyrcona wonders what's for lunch.
11:30 csharp no errors - now going to wait a little while for more events to accumulate and run a bigger batch at once
11:31 csharp spoiler alert for Terran's and my presentation on A/T: it's basically impossible to mock up - you don't know it really works until you're live
11:33 mmorgan csharp: That's true of, well, just about everything ;-)
11:33 khuckins__ joined #evergreen
11:34 Dyrcona :)
11:39 miker berick: I don't think we want to disconnect after every single update_state ... we're intentionally trying to reuse the connection.  events get updated a lot, and quickly. csharp's change created a new code path that doesn't do the right thing with some events.
11:42 miker csharp: I don't agree that a grouping field should be nullable ... but, even accepting that, you should be "next"-ing after you set the event to invalid.  See the "unless" block immediately above your change
11:43 miker also, "get off my lawn!" ... (I know I sound grumpy, sorry)
11:44 Dyrcona heh
11:48 berick miker: is it normal for events to be in standalone mode for the main batch processing calls?  i thought that was atypical.  this is a situation where we have more events, each with their own cstore connections, than a typical server can support at one time.  potentially many times more.
11:48 berick either they need a shared editor or the editors have to disconnect
11:50 miker berick: right, either would be fine. I'm objecting to the blanket disconnecting after every single update, forcing churn (and a good bit of overhead) on every /other/ update_state
11:50 berick but again that's only in standalone mode
11:51 berick where every event has its own cstore.
12:02 dteston I wrote a script that accepts username and plaintext password, crypts the password the same way EG does, then checks them against the DB if anyone is interested.
12:03 miker I'm not following why that's important. it's simply a side effect of event creation. if the caller passes an editor, the caller is in charge of transaction management. otherwise, it makes its own editor and should last for the lifetime of the event. all individual events are standalone, afaict
12:08 miker If we're going to codify that the group field can be null (btw, what does that mean? are null-grouped events collected together? in csharp's patch they are kept around after being invalidated), but that they should be auto-marked invalid and discarded, then IMO we should invent a different api to invalidate them, rather than pushing the problem around. ... I'll offer a patch in a minute
12:09 brahmina joined #evergreen
12:21 dteston joined #evergreen
12:21 jihpringle joined #evergreen
12:23 berick it looks like the only time we don't use standalone mode is when processing grouped events.  (EventGroup::new_impl).  so faulty assumption on my part.  and presumably for non-grouped events, we are only doing stuff to one event at a time
12:24 berick which avoids the cstore exhaustion problem
12:25 berick so, setting aside whether patch makes sense, the solution to the problem would be a shared editor at the top of Trigger:grouped_events() that is inserted into every event (or at least every event where update_state has to be called) -- and force it to standalone=false
12:26 pastebot "miker" at 64.57.241.14 pasted "external (batch) invalidation api" (73 lines) at http://paste.evergreen-ils.org/43
12:27 berick heh, that
12:27 berick miker++
12:27 miker that'll work as either an instance method and provide the same immediate effect as your patch, or as a batch method to quickly close all them... :)
12:31 miker happy to branchify that if "null group field value means event is invalid" is useful in the wild and we can't see drawbacks. (I can't think of breakage or objections, other than my historical "group field is non-null" memory)
12:32 miker csharp: is there an LP bug to attach this to?
12:33 miker nm, found it
14:55 mmorgan1 joined #evergreen
15:38 csharp miker: oops - I left out the "next;" in my git branch - it was added to my local file :-/
15:41 csharp miker: I'll test your branch in a bit and let you know how it goes
15:45 agoben joined #evergreen
15:55 khuckins_ joined #evergreen
15:59 csharp miker: so putting aside the functionality of the patch(es), you're saying that it's "wrong" to group on a field that is nullable (in this case, sms_notify on action.hold_request)?  I see that all other notices group on usr, but I'm assuming we group on sms_notify since that's a per-hold setting...
16:00 csharp that is to say, if there's a better way to do this in the first place, I'm game :-)
16:01 jeff can you articulate why you were grouping by sms_notify and not usr? how do the stock phone / pbx A/T event defs group?
16:02 berick i'm fairly certain it is because sms_notify is per-hold and not per-user
16:02 berick (a topic I know jeff loves)
16:03 jeff heh
16:04 jeff i guess i half-answered my question partway through my line, which is why i asked about the A/T pbx defs.
16:04 jeff information that I have available to me, so i'll go look.
16:05 sykeslewis joined #evergreen
16:06 csharp miker++ # awesome - it works greate
16:06 csharp great even
16:06 csharp greate is even better because, hey, extra "e"!
16:09 Dyrcona :)
16:09 jeff ah, the only stock AstCall A/T event def is for overdues, therefore no hold-level phone_notify to even consider.
16:09 jeffdavis csharp: that must be one of those extra e's miker freed up when he switched back from using "eeevil" as a nick
16:10 jeff csharp: the way you're implementing, is there anything that limits the number of SMS messages you send a given user+phone+pickup_lib in a given day?
16:11 csharp jeffdavis: yes!
16:11 stephengwills joined #evergreen
16:11 csharp jeff: nope - the cron for that granularity runs every half-hour - theoretically a person could get dinged multiple times per day
16:12 csharp but given the limits of SMS, that seems ok to us
16:13 stephengwills what would cause money.materialized_billable_xact_summary.xact_finish to take on a date before a bill was paid?  I'm not seeing bill summaries in the staff client.
16:15 jeff csharp: which limits of sms?
16:16 jeff (not that there aren't many, just that i don't follow which ones relate to this subject)
16:16 csharp jeff: cutting off text at a character limit was what I was thinking about
16:16 jeff ah.
16:16 csharp "you have 15 items on hold: Harry Potter and the..."
16:17 csharp stephengwills: we used to have that happen all the time, but not over the past few releases
16:17 csharp also...
16:17 * csharp waves at stephengwills
16:18 jeff yeah, we have records that come in from the state ILL system that don't have subfields in the 245, so their titles are long, usually truncated like:
16:18 jeff 2017-01-31 15:16:01,980 INFO: LETTERS FROM SINNERS & STRANGERS SOUND RECORDING EILEN JEWEL... is ready for pickup at the Woodmere Branch Library
16:18 * stephengwills waves back.
16:19 stephengwills it's driving our libraries crazy.  I wrote a script to null it out when there is a date in there and the pmt amount is zero but I have to remember to run it periodically.  would love to get it fixed.
16:19 jeff and users with multiple items coming available in one day get a max of two messages (per pickup_lib+usr+number), with the second just looking like:
16:19 jeff 2017-01-31 15:00:02,194 INFO: Multiple items including LIZ AT MARIGOLD LAKE are ready for pickup at the Woodmere Branch Library
16:19 csharp stephengwills: what release are you on?
16:19 stephengwills 2.8.3
16:20 csharp hmm - those should've been fixed by then if I recall correctly
16:20 csharp jeff: so how do you limit the number of messages?
16:21 jeff currently there are 13 patrons that we're suppressing further notifications of for today, because they have >2 intervals where something came available, but we've already sent them their two messages for today.
16:21 jeff csharp: we're cheating.
16:21 stephengwills maybe I should grep for procs on that table?  make sure they are up to date?
16:22 jeff csharp: every 5 minutes we pull the state of the holdshelf from the db and dump it into an external db. we do sms and phone notifications from that.
16:22 csharp ah - interesting approach
16:22 jeff csharp: so it's a matter of saying "hey, have i sent notifications to this patron already today and they haven't come in to pick up their things? okay, don't send them more -- they're already on their way in, and we've already said the "Multiple items, including ITEM TWO..."
16:23 csharp stephengwills: I would use the timestamp on one of the wrongly-closed bills and dig into the opensrf logs (may need to ratchet up the loglevel)
16:23 jeff and i think i was wrong, and we're limiting to phone+pickup_lib, not phone+usr+pickup_lib.
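The daily cap jeff describes (at most two messages per phone number and pickup library per day) might be tracked along these lines. This is only a sketch: `DailyNotifyLimiter` and its method are hypothetical names, and jeff's actual external database and notification scripts are not shown in this log.

```python
from collections import defaultdict
from datetime import date


class DailyNotifyLimiter:
    """Allow at most max_per_day messages per (phone, pickup_lib) per day."""

    def __init__(self, max_per_day: int = 2):
        self.max_per_day = max_per_day
        self._sent = defaultdict(int)

    def should_send(self, phone: str, pickup_lib: str, day: date) -> bool:
        """Record and allow a message unless the daily cap is already reached."""
        key = (phone, pickup_lib, day)
        if self._sent[key] >= self.max_per_day:
            return False
        self._sent[key] += 1
        return True
```

Keying on the phone number rather than the user means two patrons sharing one number at the same library also share the cap, which matches the correction jeff makes above.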
16:23 csharp cool
16:23 stephengwills hmm... ok... will start there.
16:23 stephengwills tx
16:24 stephengwills btw... planning a trip to Atlanta in April.  up for a beer if y'all around?
16:25 csharp stephengwills: perhaps a soda & lime, but sure :-)
16:25 csharp stephengwills: also, my pres on EG logs from 2015 might be a good reference: https://docs.google.com/document/d/1BJ7kSr5LfPkxXRhcrYjNyPlbT9PlNFHg6zL-tyetnKA/edit
16:25 stephengwills sounds good.
16:26 csharp esp. the part about threadtraces
16:26 stephengwills ok thanks will check it out
16:27 * csharp wanders off
16:32 jeffdavis stephengwills: to be clear, is the concern that there is a value in xact_finish when the balance is non-zero, or that the timestamp on the last billing is later than the xact_finish date?
16:33 stephengwills the former is the issue.  libraries cannot see that there are any bills at all until they click into the bills detail view.
16:33 stephengwills balance_owed reads as $0.00
16:33 jeffdavis ok
16:35 jeffdavis I see we have 29 materialized billable xact entries so far this year where xact_finish is non-null and balance_owed is non-zero (according to the db table; haven't looked in the client)
16:35 jeffdavis we're on 2.10
16:36 jeffdavis in most of those cases the last payment was a manual account adjustment, plus a few forgive payments and cash payments
16:36 jeffdavis I'm not sure of the cause, but I wonder if the total balance_owed on all xacts for any of those patrons adds up to $0
16:38 jeffdavis ...I think I need more caffeine before I think too hard about this
16:38 stephengwills in my case it appears as if the xact_finish date == the date the item was checked in.  the bills are all unpaid.  not even partial payments on them
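The condition jeffdavis and stephengwills are looking for (a transaction marked finished while money is still owed) reduces to a simple filter. The sketch below assumes the rows have already been fetched from money.materialized_billable_xact_summary as dictionaries; the function name and the row shape are hypothetical, and only the `xact_finish` and `balance_owed` column names come from the discussion.

```python
from datetime import datetime
from decimal import Decimal


def closed_with_balance(rows):
    """Return rows whose xact_finish is set although balance_owed is non-zero."""
    return [
        row
        for row in rows
        if row["xact_finish"] is not None and row["balance_owed"] != Decimal("0")
    ]
```

A negative balance also counts as non-zero here, so a transaction closed with a credit would be reported alongside those closed with an unpaid bill.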
16:51 mmorgan joined #evergreen
16:56 khuckins__ joined #evergreen
17:00 pinesol_green News from qatests: Test Success <http://testing.evergreen-ils.org/~live>
17:02 mmorgan left #evergreen
17:15 jeffdavis Are folks around these parts experiencing memory leak issues with the staff client?
17:17 jeffdavis there are some reports of this kind of thing in bug 1110817 (and in bug 1086458 which was "fixed" circa 2.5) but I'm not sure of prevalence of that kind of issue
17:17 pinesol_green Launchpad bug 1110817 in Evergreen "staff client patron search results continuously eats memory" [Medium,Incomplete] https://launchpad.net/bugs/1110817
17:17 pinesol_green Launchpad bug 1086458 in Evergreen 2.4 "Staff client memory leaks in 2.3 and later" [High,Fix released] https://launchpad.net/bugs/1086458
17:18 csharp jeffdavis: we hear occasional reports of high RAM usage that the end users call "memory leaks" but are often due to underpowered hardware
17:18 csharp as most of our libraries have recently upgraded PCs, we hear fewer and fewer reports of issues
17:19 csharp in fact, 2016 was the first summer since 2008 when I joined GPLS where we weren't bombarded with reports of workstation issues
17:20 csharp faster processors appear to be the factor (we just had a ticket from an end user with 8GB of RAM complaining of freezes/crashes - found to have an older i3)
17:21 jeffdavis oho, interesting
17:23 jeffdavis We have had persistent reports of memory leak symptoms/unusable slowness/freeze/crash from multiple sites, most recently from a multibranch whose workstations have 4GB RAM. I dunno what kind of processors they have though.
17:28 csharp we also found that the end user was running a lot of extra programs (Outlook, multiple browsers, MS Office, etc.)
17:41 dcook joined #evergreen
17:52 berick we have some branches that regularly restart the client because it bogs down over time.  doesn't affect everyone, though.
18:07 jeffdavis yeah, we advise regular restarts as well; unfortunately some of our libs are still finding it unmanageable
18:07 jeffdavis has the experience with the web client been better so far?
23:47 gsams_ joined #evergreen