Evergreen ILS Website

IRC log for #evergreen, 2017-06-05

All times shown according to the server's local time.

Time Nick Message
02:50 genpaku_ joined #evergreen
04:30 pinesol_green News from qatests: Test Success <http://testing.evergreen-ils.org/~live>
07:07 rjackson_isl joined #evergreen
07:12 JBoyer joined #evergreen
07:31 agoben joined #evergreen
08:04 littlet joined #evergreen
08:06 rlefaive joined #evergreen
08:42 mmorgan joined #evergreen
08:48 bos20k joined #evergreen
08:53 kmlussier joined #evergreen
08:55 umarzuki joined #evergreen
08:56 umarzuki i got "Received no data from server" when testing srfsh request opensrf.math add 2,2
08:57 umarzuki shell output > https://pastebin.com/id9VWfrf
09:00 umarzuki srfsh.log https://pastebin.com/ppnbtLfR
09:23 Bmagic umarzuki: Trying to get the Evergreen server setup for the first time?
09:25 Bmagic umarzuki: You might be interested in the "one command to install it all" solution https://hub.docker.com/r/mobiusoffice/evergreen-ils/
09:28 Dyrcona joined #evergreen
09:35 yboston joined #evergreen
09:43 jvwoolf joined #evergreen
09:45 maryj joined #evergreen
09:45 bshum umarzuki: Could be an ejabberd authentication issue.  What Linux distribution are you installing on?
09:46 bshum Also I know that special characters in ejabberd passwords can break things, so depending on what you decided to set up the ejabberd user passwords with, that could cause issues
09:49 kmlussier Bmagic: Have you considered adding that solution to https://wiki.evergreen-ils.org/doku.php?id=server_installation:semi_automated?
09:49 Bmagic kmlussier: Yeah, I knew it belonged somewhere
09:52 umarzuki Bmagic: yes
09:52 umarzuki bshum: ubuntu xenial 16.04
09:53 umarzuki I'm installing this on vmware workstation
09:56 bshum With Xenial, another issue that's cropped up lately has been problems where the firewall blocks ejabberd from communicating
09:57 bshum I thought berick found a workaround for the ansible installer, not sure if that'll help
09:58 umarzuki bshum: firewalld service not enabled and iptables -L shows no rules
09:58 umarzuki I'm just using zaq12wsx for password, testing purposes only
10:00 bshum That doesn't seem too overly complex and should have worked then
10:00 bshum Hmm
10:01 Dyrcona umarzuki: Did you change the auth_password_format to plain in ejabberd.yml?
10:01 umarzuki Dyrcona: yes
10:02 Dyrcona umarzuki: Looks like Ejabberd is only listening on tcp6 and not 4.
10:02 Dyrcona From your netstat output, that is.
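The tcp6-only observation can be cross-checked from the client side by attempting a TCP connection over each address family. This is a minimal sketch, assuming ejabberd's client port is 5222 (the port that appears in the netstat line further down the log); `can_connect` is a hypothetical helper name, not part of OpenSRF or ejabberd.

```python
import socket


def can_connect(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds, else False."""
    try:
        # create_connection resolves the host and connects with the given timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unavailable address families.
        return False


# Example usage (assumed port 5222):
#   can_connect("127.0.0.1", 5222)  # IPv4 loopback
#   can_connect("::1", 5222)        # IPv6 loopback
```

If the IPv4 check fails while the IPv6 check succeeds, that would be consistent with the listener being bound only to an IPv6 address.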
10:05 berick bshum: the issue I had to fix was registering ejabberd users as root won't work in 16.04.  you have to sudo to 'ejabberd'.  or modify apparmor
10:06 Dyrcona berick: I've seen that, too, but I think umarzuki's issue is the ejabberd configuration.
10:09 umarzuki Dyrcona: ip: "::"
10:09 umarzuki that one in ejabberd.yml?
10:09 Dyrcona Yes, you probably want to change that one.
10:10 Dyrcona I'd also make sure that the entries in the hosts section are correct and all there.
10:12 Dyrcona http://evergreen-ils.org/documentation/install/OpenSRF/README_2_5_0.html#_configure_the_ejabberd_server
10:12 Dyrcona In case you missed it.
10:12 mmorgan1 joined #evergreen
10:13 Dyrcona I think we should make the link to OpenSRF more obvious on the installation page.
10:17 umarzuki Dyrcona: ejabberd now on ipv4 but I'm still seeing the same error
10:17 umarzuki tcp        0      0 0.0.0.0:5222            0.0.0.0:*
10:18 umarzuki srfsh 2017-06-05 22:14:43 [WARN:33200:osrf_stack.c:144:1496672082332000]  * Jabber Error is for top level remote  id [router@private.localhost/opensrf.math], no one to send my message to!  Cutting request short...
10:18 Dyrcona umarzuki: You restarted osrf services after restarting ejabberd?
10:19 Dyrcona You might also need the --force-clean-process option to osrf_control.
10:21 Dyrcona umarzuki: You should also have logs in /var/log/ejabberd/. I'd look at those to see if they give any clues.
10:22 jeff umarzuki: what version of opensrf, and did you follow all of the recommended changes for the ejabberd config, including (as applicable) max stanza sizes and max sessions, etc?
10:23 umarzuki yes, 2.5.0
10:24 jeff are you able to post your ejabberd config file somewhere that we can examine it?
10:27 umarzuki jeff: ejabberd log https://pastebin.com/r4she46e
10:28 umarzuki ejabberd.yml https://pastebin.com/jUrEwDBL
10:31 jeff umarzuki: you should check your setting for max_user_sessions -- it appears to be at 10, and not the recommended 10000
10:34 bshum jeff++ # good eye on that
10:35 umarzuki jeff: thanks
10:36 umarzuki I mistook that part where I only change all to 10000
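A quick way to catch the setting jeff spotted is to scan the config text for the `max_user_sessions` value. This is a rough sketch only: the YAML fragment below is a hypothetical example of how the setting may be laid out, not umarzuki's actual file, and `max_user_sessions` here is a hypothetical helper that uses a regular expression rather than a real YAML parser.

```python
import re


def max_user_sessions(config_text):
    """Return the 'all' value under max_user_sessions, or None if not found."""
    match = re.search(r"max_user_sessions:\s*\n\s*all:\s*(\d+)", config_text)
    return int(match.group(1)) if match else None


# Hypothetical fragment for illustration; the real file layout may differ.
SAMPLE = """\
access:
  max_user_sessions:
    all: 10
"""

value = max_user_sessions(SAMPLE)
if value is not None and value < 10000:
    print(f"max_user_sessions is {value}; the log recommends 10000")
```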
10:46 mmorgan joined #evergreen
10:49 sandbergja joined #evergreen
10:52 collum joined #evergreen
10:52 bshum JBoyer: cesardv: I'll be curious to see what you guys find with those settings carrying over between retrieved users in the last comments on https://bugs.launchpad.net/evergreen/+bug/1642035 ; I think I saw a similar issue occurring with the patron stat cats where the last entry made on one user was also showing up when retrieving the next user (who didn't have an entry yet)
10:52 pinesol_green Launchpad bug 1642035 in Evergreen "Web Staff Client - Problems saving notification preferences" [Undecided,Confirmed]
10:59 cesardv bshum: yeah I noticed that while doing some testing... I'm still pretty new to Evergreen, but yea seems like (according to a comment on regctl.js) there's caching of data unless you explicitly click out of the patron tabs
11:02 kmlussier JBoyer / agoben: Can you remind me what the hack-a-way date is?
11:06 agoben The EOB day is November 6; the Hack-away is November 7-9.  (I've run into an issue with our location from last year, so am working on alternate options for the venue.  Will announce as soon as possible.)
11:07 berick thanks agoben , was wondering myself
11:08 agoben Yup, that was the week that everyone seemed to be available, so should be a good showing :)
11:11 kmlussier agoben: Thanks! Do you want me to add it to the dev calendar or should I wait until you have a venue lined up?
11:14 agoben kmlussier: give me this week to try to lock in the venue.  I don't anticipate changing the date, but would feel better if everyone could go ahead and get started with lodging arrangements asap to coincide.
11:14 kmlussier agoben: Sounds good
11:48 _adb joined #evergreen
12:10 jihpringle joined #evergreen
12:35 Dyrcona If a script that starts a cstore transaction times out, that transaction will never commit, will it?
12:35 berick Dyrcona: correct.
12:36 Dyrcona That's what I thought. Guess I'll kill the script and cancel the pg backend.
12:38 Dyrcona I guess a follow up question is does the transaction matter if the cstore calls a stored procedure directly.
12:40 Dyrcona It probably does....
12:42 * Dyrcona runs the stored procedure via psql command line.
12:48 littlet joined #evergreen
13:03 jvwoolf joined #evergreen
13:03 rlefaive_ joined #evergreen
13:17 berick Dyrcona: indeed, the transaction still matters.
13:18 maryj_ joined #evergreen
13:18 Dyrcona So, what happens if the transaction commits while the stored procedure is still running? Because that is what appears to happen.
13:18 * Dyrcona is looking at purge_circulations.srfsh
13:19 berick probably get an error about having no open transaction to commit
13:20 berick after the json_query times out, the next cstore request goes to a different cstore drone
13:21 berick well, not 100% on that.. i'd have to look at the srfsh code
13:22 kmatthiesen joined #evergreen
13:23 berick in any event, psql++
13:24 Dyrcona OK. I did Ctrl-c on the running srfsh, so I don't see "no transaction to commit" in the logs.
13:24 Dyrcona When you haven't purged circulation for five years, it takes a while. :)
13:24 berick yeah
13:25 berick more so now that circ history doesn't live in the circ tables.  all that old stuff gets aged.
13:34 Dyrcona Well, we had some incidental aging when purging users.
13:40 maryj joined #evergreen
13:45 jeffdavis The open-ils.search service died on one of our servers again: "server: died with error Can't use an undefined value as a symbol reference at /usr/local/share/perl/5.18.2/OpenSRF/Server.pm line 307."
13:46 jeff another OOM kill?
13:50 jeffdavis Doesn't look like it.
14:00 jeffdavis I presume it's the $child var that's undefined there. So OpenSRF::Server->run() is either passing the request to an idle child that doesn't actually exist, or else failing to spawn a child for some reason.
14:04 jeffdavis As before, I don't see any helpful log messages prior to that: no "error creating data socketpair" or "child process died" for example.
14:06 jeffdavis Either of which would indicate an error while spawning a child.
14:13 berick jeffdavis: or pipe_to_child is undef, because the child died and was reaped (which removes all of $child->{*} fields) while the parent was in the middle of the write_child() function.
14:18 berick so some time after write_child() is called but before syswrite() is called, SIGCHLD occurs, $child is reaped and becomes a dead husk.
14:18 berick then syswrite($child->{pipe_to_child}, ..) fails because pipe_to_child is undef
14:19 berick if the child exited gracefully, there would be no log entry (unless loglevel is highest)
14:22 berick a few log lines in Server.pm would probably shed a lot of light on it
14:22 jeffdavis Yeah, I'll add some logging to warn if that scenario arises.
14:22 jeffdavis Is that likely to occur naturally, do you think? This is the second time in two weeks that we've seen this problem.
14:25 berick if my theory is correct, it's atypical, but not impossible for a child to finish what it's doing while another request is en route to the child.
14:26 Dyrcona joined #evergreen
14:27 berick jeffdavis: does the timing of these coincide with any updates?  osrf or eg?
14:29 jeffdavis First incident was just after we upgraded to OSRF 2.5 / EG 2.12.1 from 2.4/2.10.2
14:30 berick ah, ok
14:35 JBoyer berick, jeffdavis, I'll add that we first saw this same thing after that same upgrade (though from Eg 2.11.x rather than 2.10)
14:35 berick JBoyer: what osrf are you on?
14:35 JBoyer 2.5
14:36 berick as in, you upgraded osrf at same time?
14:36 JBoyer yes,
14:36 berick k
14:38 berick for logging, knowing if $child is undef, if $child->{pipe_to_child} is undef, and logging $child->{pid} would be good starts.  with the pid it should be possible to backtrack what the child was doing prior to the problem
14:38 berick e.g. find the api call, whether it processed stuff already, etc.
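The checks berick lists can be illustrated with a small simulation. This is a sketch in Python, not the Perl code in OpenSRF's Server.pm: the `reap` and `write_child` functions below are hypothetical stand-ins that borrow names from the discussion, and the child is modelled as a plain dictionary.

```python
import logging

log = logging.getLogger("server_sketch")


def reap(child):
    """Simulate reaping a dead child by removing all of its fields."""
    child.clear()


def write_child(child, data):
    """Write data to the child's pipe, logging instead of crashing if it is gone."""
    if child is None:
        log.warning("write_child: child is undefined")
        return False
    pipe = child.get("pipe_to_child")
    if pipe is None:
        # The pid may also be missing here if the child was already reaped.
        log.warning("write_child: pipe_to_child is undefined, pid=%s", child.get("pid"))
        return False
    pipe.write(data)
    return True
```

In this model, calling `reap(child)` between choosing the child and calling `write_child(child, ...)` reproduces the scenario under discussion: the write is skipped and a warning is logged rather than the service dying.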
14:51 JBoyer Well this should help: I graph the # of every opensrf service by the minute and there's no way any open-ils.search children should have been reaped at any time all day.
14:51 JBoyer UNLESS, I have completely misunderstood the min_spare_children value for Net::Server.
14:52 berick JBoyer: the drones (gracefully) die and re-spawn throughout the day.
14:52 berick each time they hit max-requests
14:52 JBoyer Oh, yes, yes.
14:53 JBoyer (I was momentarily worried that min_spare... had to be larger than min_children)
14:54 jeffdavis JBoyer: to confirm, you're seeing the same thing as me in your logs? open-ils.search dies with that error message and no other errors leading up to it?
14:55 JBoyer jeffdavis, yup.
15:03 JBoyer My assumption is that it's probably a race condition between the write_child call on lines 719 or 193 and actually calling syswrite on line 307 (the call on 185 seems fairly safe since there aren't any errors with the message from line 449). Am I correct in thinking that's your opinion too, berick ?
15:03 JBoyer (To be fair, I'm making that assumption based on berick's suggestions and questions but don't want to speak for him necessarily.)
15:09 berick now i'm questioning my logic.  write_child is only called to start a new conversation.  i can't think of any reason a child would gracefully die after becoming idle (or getting spawned), but before it received its first request.
15:10 babel_ joined #evergreen
15:10 berick unless it was killed by an external process
15:12 berick jeffdavis' comments about an idle child not existing or spawn failing make more sense to me now
15:19 berick notably spawn_child is not checking for a fork() failure (undef response).
15:19 berick though i would only expect that to be a problem if memory was exhausted, in which case, chaos everywhere
15:20 berick i think I already asked this, but jeffdavis JBoyer you're not seeing anything in /openils/var/log/open-ils.search_stderr.log ?
15:22 jeff there is the scenario where the listener becomes a drone when it fails to fork, but again -- that's generally due to memory exhaustion and is often coupled with the oom-killer killing... something.
15:23 jeff bug 1546683
15:23 pinesol_green Launchpad bug 1546683 in OpenSRF "fork() failure results in Perl service Listeners becoming Drones" [Undecided,New] https://launchpad.net/bugs/1546683
15:23 JBoyer I have (plenty...) of open-ils.storage.biblio.multiclass.staged.search_fts.atomic failures for ISBN searches that end with a '*'
15:23 JBoyer But that's been going on for ages and open-ils.search dying is rather new.
15:24 JBoyer (lack of date/timestamps in those files makes it difficult to say)
15:25 berick JBoyer: oh, right
15:46 collum joined #evergreen
15:55 Jillianne joined #evergreen
16:30 pinesol_green News from qatests: Test Success <http://testing.evergreen-ils.org/~live>
17:14 mmorgan left #evergreen
19:33 jihpringle_ joined #evergreen
22:20 genpaku joined #evergreen
