Evergreen ILS Website

IRC log for #evergreen, 2016-12-08

| Channels | #evergreen index | Today | | Search | Google Search | Plain-Text | summary | Join Webchat

All times shown according to the server's local time.

Time Nick Message
01:09 StomproJ joined #evergreen
02:19 Stompro joined #evergreen
02:29 StomproJ joined #evergreen
03:00 Stompro joined #evergreen
03:21 StomproJ joined #evergreen
04:11 Stompro joined #evergreen
05:00 StomproJ joined #evergreen
05:00 pinesol_green News from qatests: Test Success <http://testing.evergreen-ils.org/~live>
06:48 Stompro joined #evergreen
07:13 TARA joined #evergreen
07:13 rjackson_isl joined #evergreen
07:30 agoben joined #evergreen
07:43 Dyrcona joined #evergreen
07:57 Dyrcona Updating a wheezy vm this morning gives a long warning that starts with: 9.1.24 marks the end of life of the PostgreSQL 9.1 branch.
07:58 Dyrcona I knew that was coming, but seeing the warning is nice.
07:58 Dyrcona Even tells you to try the apt.postgresql.org repository to upgrade.
08:15 collum joined #evergreen
08:16 jeff sounds like the claim on the postgresql download page is a little inaccurate, then. :-)
08:16 jeff ``the stable versions of Debians "snapshot" a specific version of PostgreSQL that is then supported throughout the lifetime of that Debian version''
08:16 jeff though I haven't read the warning you refer to.
08:16 jeff so they could both be accurate.
08:17 Dyrcona jeff: They are both accurate. I'll quote in just a second. :)
08:18 Dyrcona Guess I won't quote. I thought the warning was emailed to root, but apparently not.
08:19 Dyrcona Anyway, it says that they'll maintain a 9.1 with security updates on a best effort basis.
08:20 Dyrcona Hah. It was sent to root and alias sent it to opensrf. ;)
08:21 Dyrcona In Debian, an LTS version of 9.1 is in planning that will cover the lifetime of wheezy-lts. Updates will made on a best-effort basis. Users can take advantage of this, but should still consider upgrading to newer PostgreSQL versions over the next months.
08:26 * Dyrcona reminisces at the sound of the hiss in an ogg he made from a mp4 that was digitized from VHS. Sounds like listening to a cassette tape.
08:34 Dyrcona "Mnesia(ejabberd@rb): ** WARNING ** Mnesia is overloaded: {dump_log," can't be good.
08:36 Dyrcona It has happened a few times on this server on the 6th and 7th.
08:37 mmorgan joined #evergreen
08:40 Dyrcona That herring is most definitely red. It appears to be "mostly harmless," but I wonder if points to something else going on.
08:45 Dyrcona Nothing going on in the apache logs or osrfsys.log around that time, either....
08:52 Dyrcona Cute: [Wed Dec 07 10:21:14 2016] [error] [client 103.19.132.10] Invalid method in request \x16\x03\x01
08:55 jeff heh.
08:56 Dyrcona Seventeen of those from different IP addresses since Sunday, but they are not the problem.
08:56 Dyrcona Our Syrup server went nuts and had really high load yesterday, and I'm not finding anything in the logs that may have triggered it.
08:57 Dyrcona By the time I finally got a command prompt, 20 minutes after getting the login banner 5 minutes after typing the ssh command, the load was over 87 and dropping.
08:59 jeff If you didn't already have it that way, that's usually a client trying to initiate an HTTPS connection to Apache on a listening port that Apache is treating as HTTP.
09:00 jeff (the apache error log output above)
09:01 Dyrcona OK. Those connection attempts are from all over the world and not anyone who should be using this machine.
09:01 Dyrcona Mabye some firewall rules are in order....
09:02 rlefaive joined #evergreen
09:04 Dyrcona open-ils.cstore 2016-12-07 13:33:15 [INFO:14768:osrf_applicatio​n.c:1059:14810317791371523] CALL: open-ils.cstore open-ils.cstore.direct.confi​g.copy_status.search.atomic {"id":{"!=":null}}
09:05 Dyrcona That's the last message before the server started to come back. Well there are two more messages related to that request.
09:05 Dyrcona And that request was repeated a couple of times before.
09:06 Dyrcona It took 2 minutes to process that request. Doesn't seem right to me.
09:08 jeff is that a request for "all copies"?
09:08 Dyrcona Looks the gateway request came through 1 minute earlier.
09:08 Dyrcona Nope, for all copy statuses.
09:09 jeff ah.
09:09 Dyrcona osrf_json_gw 2016-12-07 13:32:10 [INFO:13715:./osrf_json_gate​way.c:435:14810317791371523] Completed processing service=open-ils.search, method=open-ils.search.conf​ig.copy_status.retrieve.all
09:09 jeff right.
09:09 Dyrcona So, load was already high at 13:30.
09:09 * Dyrcona didn't hear about it until about 15:30.
09:10 csharp berick: https://bugs.launchpad.net/ever​green/+bug/1596595/comments/13
09:10 pinesol_green Launchpad bug 1596595 in Evergreen "Hold targeter features and refactoring" [Wishlist,New]
09:16 Dyrcona Only 3,739 requests for yesterday.
09:18 Dyrcona Hi, Baidu! I see you making the request that triggered that copy status retrieve.
09:19 Dyrcona But, only two requests from Baidu and 2 from Bing in the hours before I got the call...
09:19 Dyrcona Something else went nuts.
09:21 Dyrcona Oh, nice! The server started running out of memory on the 6th, just before noon.
09:21 Dyrcona No idea why.
09:25 Dyrcona Dec  6 09:09:00 rb kernel: [37598332.141523] Pid: 13708, comm: osrf_control Not tainted 2.6.32-5-686 #1
09:25 yboston joined #evergreen
09:26 Dyrcona Dec  6 09:09:00 rb kernel: [37598332.141546]  [<c1089cb4>] ? oom_kill_process+0x60/0x201
09:26 Dyrcona It ran out of memory, and someone tried to restart/stop services?
09:28 JBoyer Dyrcona, sounds like it might be time to look into nagios/icinga/zabbix/etc.
09:29 Dyrcona JBoyer: We use nagios, but not sure it monitors this server.
09:29 * Dyrcona is not in charge of that.
09:29 JBoyer Ah, I see.
09:43 tarac joined #evergreen
09:48 maryj joined #evergreen
09:55 berick csharp++ # yay
09:58 JBoyer I do like those numbers.
09:58 JBoyer csharp++
09:58 JBoyer berick++
10:03 mmorgan1 joined #evergreen
10:12 krvmga joined #evergreen
10:18 Christineb joined #evergreen
10:52 Dyrcona Dec  6 09:08:41 rb kernel: [37598332.141503] osrf_control invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
10:53 Dyrcona Explain this: How could osrf_control invoke oom-killer if no one was logged in to run it and there are no cron jobs to run it automatically?
10:57 Dyrcona I don't think it stays resident in RAM after it finishes does it?
11:04 berick maybe that wording should be changed to "osrf_control caused the OS to invoke oom-killer"
11:05 Dyrcona berick: That is what the log line means.
11:06 Dyrcona What I'm asking is, how could osrf_control cause this when no one was loged in and I can find no cron jobs or other scheduled task to run it?
11:07 berick IIUC, if osrf_control gobbles up too much RAM, the OS automatically invokes oom-killer on it (or any such offending process).  no humans or jobs required.
11:08 Dyrcona How does osrf_control gobble too much RAM if it isn't running?
11:08 Dyrcona Is it a drone or something else that got killed and oom-killer is reporting the name of the process that started it?
11:08 jeff Dyrcona is noting that osrf_control is not usually a process that you see running in the output of "ps" unless you are actively in the middle of starting/stopping/getting diagnostics/etc.
11:08 berick aha
11:09 berick Dyrcona: i imagine that's what's happening.. it's a drone
11:09 berick or a listener
11:09 * berick is slow this morning
11:09 Dyrcona We have nagios, but the server isn't monitoring this machine, but I wonder if maybe there's a leftover config here....
11:10 jeff Two likely possibilities: something like a monitoring script or daemon is using osrf_control to query running opensrf processes or 2) the OpenSRF::Utils::safe_fork call within do_daemon in osrf_control leaves enough of a trace of the original process name / command line that it can appear in oom-killer messages under certain circumstances.
11:10 Dyrcona berick: That's OK. :)
11:10 Dyrcona jeff: Yeah, I'm about to check for any residual nagios monitoring.
11:10 jeff Either way, the PID in your earlier pasted log output might be a better clue, if you have logs enough to identify that PID with a reasonable degree of certainty during that timeframe.
11:12 brahmina joined #evergreen
11:12 jeff I did a quick check of /proc/*/comm on an app server here (running Jessie and evergren 2.10), and none currently matched osrf_control. You might check on your system, as kernel or other variances might mean that you actually DO have long-running processes with /proc/*/comm contents containing osrf_control
11:13 Dyrcona jeff: thanks for the suggestions, I'll check those, also.
11:13 Dyrcona No signs of nagios on this machine.... I'll look for munin, too, but no obvious daemons.
11:14 Dyrcona munin is not installed either.
11:16 Dyrcona no /proc/*/comm, I probably want /proc/*/cmdline
11:18 jeff you'd lack "comm" on an older kernel.
11:19 jeff hrm. "since Linux 2.6.33" according to proc(5).
11:19 Dyrcona 2.6.32. :)
11:19 jeff hah! that would explain it.
11:20 Dyrcona Several environs match though, so I'm gonig to assume berick is correct.
11:20 Dyrcona Only /proc/self/cmdline matched and I think /proc/self would be the grep command.
11:21 Dyrcona berick++ jeff++
11:25 jeff It would be interesting to see if your logs have reference to any process with a PID of 13708 around 9:09:00
11:25 rlefaive joined #evergreen
11:25 Dyrcona jeff: Yeah, I'm still looking into it.
11:27 Dyrcona Nope. Just it being killed. I'll check osrf logs.
11:27 jeff but the invocation happening exactly at 9:09:00 itself is a bit interesting. i'd still wonder if something (daemon or unnoticed cron job, whatever) is calling osrf_control routinely. almost enough to do something like use inotify to watch for something to open the script file, or modify the file to log execution, etc. :-)
11:29 Dyrcona Could be. I'm still waiting on the osrf log grep. They haven't been cleaned up in a while.
11:29 jeff (of course, the OOM invocation at 9:09:00 might just as easily be due to some other regular event, and not an indication that osrf_control itself was called at that exact time.)
11:35 mmorgan joined #evergreen
11:37 Dyrcona I haven't been able to find any references to that PID except when it was killed.
11:37 Stompro Hello, I'm trying to find out if having a status of "Damaged and Paid" has ever been discussed?  Should there be a different status when a damaged item has been fully paid for?  I'm trying to think of reasons why it isn't the same as lost and paid.  Any ideas?
11:39 * berick never quite understoond the 'lost & paid' status
11:40 berick the item is lost, the transaction is paid
11:40 Stompro berick, It seems like it is just a shortcut for that.
11:40 berick yeah, guess so
11:40 Dyrcona Staff want to know at a glance, and the copy status is an easy place to look.
11:41 mmorgan berick: Stompro: The big distinction I see is that Lost and paid means the patron is no longer responsible in any way for the item.
11:41 Dyrcona The line between transactions and copies is often blurred, particularly around status.
11:42 * mmorgan hates that blurriness.
11:43 Stompro Since we support lost and paid as a shortcut, should we also support "Damaged and paid" as a shortcut?
11:43 Dyrcona Well, it's a blurry model that you can't blame on the software.
11:44 Dyrcona Stompro: As long as it's optional like lost and paid, I don't have a problem with it.
11:45 Dyrcona But that's just me... Fly up the Lanchpad and see who salutes. :)
11:46 * Dyrcona can't type.... After all these years, you'd think I could.
12:03 Stompro If the goal is to give staff a easy shortcut... how about adding a status of "Damaged and Charged" also.  Item marked damaged but no one charged = "Damaged", Item marked damaged and charged to a customer = "Damaged and Charged", Item marked damaged, charged to a patron and paid for = "Damaged and Paid". ?
12:04 Dyrcona Then they'll ask for "and Charged" versions of other statuses, and that's just crazy talk. :)
12:06 Dyrcona Also, is it a distinction worth noting if a library does or not does not charge for damaged items? (There's a setting for that, right?)
12:06 Stompro Dyrcona, which other status would it make sense for?  Lost should always be charged... wait, I guess that is optional, isn't it.
12:07 Dyrcona Yeah, it's all optional, but is that a distinction worth making?
12:07 Dyrcona That's a serious question.
12:08 * mmorgan thinks it's helpful for staff to be able to look at the status of an item, and from that, know whether or not there's an open transaction on it.
12:08 Dyrcona I also don't recall if charging for lost things is really optional or not. I know you can set the price to zero or the percent to charge to 0.
12:09 Dyrcona mmorgan: Yeah, that's what staff really want, and using "and Paid" should make it clear.
12:10 * jeff pulls on various threads in his head
12:11 mmorgan Long overdue is also a status that is associated with a patron charge.
12:13 bmills joined #evergreen
12:14 * Dyrcona wonders if "Long overdue and Paid" will fit on the screen. :)
12:14 JBoyer I did not think that "cut film to vent" was a job that can be performed so poorly as to make any real difference. The violent spray of chili on the roof of the microwave has recently forced me to reevaluate my opinion on the matter.
12:14 Dyrcona Oops.
12:17 mmorgan If the item's been paid for, does there need to be a distinction between lost/longoverdue/damaged and paid?
12:17 * Dyrcona was careful not to overheat the mozzarella sticks so they remained sticks.
12:17 Stompro mmorgan, use a a generic "Charged and Paid" instead?
12:17 Dyrcona Well, I never completely agreed with the distinction between long overdue and lost. :)
12:18 Dyrcona How 'but just "Paid?"
12:18 Dyrcona bleh... 'bout....
12:18 jeff Isn't that essentially action.circulation.xact_finish?
12:19 jeff Would leaving statuses as they are, but surfacing more clues based on things like xact_finish be helpful?
12:19 jeff I know some of the things that are annoying to me, and I know some of the workflows here that are affected by this, but I'd be interested in hearing more.
12:21 Dyrcona "Charged and Paid" is probably better than just "Paid." 'Cause then the question is what was paid, overdues, or was it lost or damaged and paid.....
12:22 Dyrcona I'm not sure surfacing more clues will really help. In my experience, most staff are looking for one thing to tell them what happened, not collating information from 3 or 4 things.
12:24 jeff I think I chose my words poorly.
12:25 Dyrcona Visual cues like color might work, say use a background or highlight color if xact_finish is set and the status was lost or whatevs?
12:26 Dyrcona But anyway, I'm supposed to be eating lunch, and it's hard not to eat at your desk when your desk is the dining room table. :)
12:28 jihpringle joined #evergreen
12:40 jeffdavis So, I asked about this a while back but most folks were en route to the hackaway...
12:41 jeffdavis For ebook API integration (bug 1541559), I'd like to have a test module so that we can test the UI and services without requiring connections to actual third-partry APIs like Overdrive's.
12:41 pinesol_green Launchpad bug 1541559 in Evergreen "OneClickdigital API integration" [Wishlist,New] https://launchpad.net/bugs/1541559 - Assigned to Jeff Davis (jdavis-sitka)
12:44 jeffdavis The design is kind of like added content, where there's a main module and different handler submodules for each vendor.  The test module would be just another handler module.
12:44 collum_ joined #evergreen
12:45 Dyrcona jeffdavis: Sound like what SIPServer does.
12:45 kmlussier joined #evergreen
12:45 Dyrcona There's a dummy "ILS" implementation in the SIPServer code to use for testing.
12:46 jeffdavis That's encouraging. :)
12:47 jeffdavis I'm reluctant to hard-code test data into the module itself, though, but I'm not sure how else to go about it (since I don't want to create an entire HTTP service just for testing purposes).
12:47 Dyrcona jeffdavis: The test data for SIPServer is hard coded, so I don't think that's a problem, really.
12:48 Dyrcona You'd want to test against known values anyway, right?
12:48 Dyrcona I assume you're testing: I make this API call, I get this result.
12:48 jeffdavis Yeah.
12:49 Dyrcona Well, that sounds fine to me.
12:49 Dyrcona Do the vendors have ways to test their APIs?
12:50 Dyrcona But I imagine you might need an account in order to test properly.
12:51 jeffdavis That's my assumption, but I'm not sure yet - trying to find a contact to broach the question at Overdrive, and I've had problems getting responses from Oneclick lately.
12:51 Dyrcona Well, any tests are better than none.
12:51 Dyrcona tests++
12:51 Dyrcona jeffdavis++
12:52 Dyrcona Though, I suppose tests that just return PASS aren't very useful. :)
12:53 jeffdavis Well, I'd want the tests to include some requests that are expected to fail in various ways.
12:53 Dyrcona Yeah, we have a few in the Perl tests that pass by failing.
12:54 jeffdavis Having a test module that works with stock test data would also let folks try the UI.
12:59 Dyrcona jeffdavis: I honestly don't think anyone would object.
12:59 sandbergja joined #evergreen
13:01 jeffdavis yeah, I'll just proceed with the hard-coding approach - thanks!
13:09 kmlussier I know what the difference is between a soft and hard holds boundary. What's the difference between a soft and hard stalling interval?
13:10 tsbere kmlussier: I think the difference is "one works, the other doesn't" right now. I could be wrong.
13:11 kmlussier tsbere: Well, that would explain why I have a somewhat clear understanding of what soft stalling does, but not hard stalling.
13:11 tsbere kmlussier: Specifically, I don't think anything currently checks the hard one. At all.
13:12 kmlussier If that's true, then maybe the hard stalling setting is something that can be removed so that future confused users don't ask this same question.
13:15 bshum @dunno get 24
13:15 pinesol_green bshum: Dunno #24: "Sorry, that command is only available to Evergreen Premium™ Subscribers. Please upgrade your subscription ASAP!" (added by csharp at 10:32 AM, December 06, 2013)
13:18 collum joined #evergreen
13:44 sandbergja joined #evergreen
14:08 Dyrcona Syrup sure does a lot of this: GET /osrf-gateway-v1?locale=en-US&method​=open-ils.search.config.copy_status.​retrieve.all&service=open-ils.search
14:37 DPearl joined #evergreen
14:48 jihpringle joined #evergreen
14:51 Bmagic berick: the crons ran last night with the correct time! The change needed to take place here: /etc/init/cron.conf - I added a line TZ="America/Chicago"
14:52 Bmagic Dyrcona: I will probably have to include this in my Google presentation, lol
14:52 Dyrcona Yeah! I'll bet your servers are either set to UTC or in a different timezone from you.
14:53 Bmagic postgres.conf has timezone = 'localtime'  which seems to be right. when I query now() I get the right thing
14:53 Dyrcona Bmagic: You definitely need to do this presentation. I will be there, for sure.
14:53 Bmagic all of the users on the shell have the right timezone configured... pulling from /etc/timezone
14:54 Bmagic but, for some weird reason, cron doesn't use that.. it lives in it's own world.... even though the crons fired on the correct timezone AKA 0 0 * * * - once the script begins to execute, it's in UTC... lol. Therefore, DB inserts will be off 6 hours
14:56 Bmagic action triggers insert rows into action_trigger.event with a run_time 6 hours in the future and therefore are not reacted to right away. It was a brain buster for sure
14:56 Dyrcona ouch!
14:57 Bmagic billing time stamps for lost items, etc, were affected
14:58 berick Bmagic: fascinating.  glad it's working now.
15:02 Bmagic me too!
15:02 Dyrcona :)
15:02 Dyrcona Bmagic++
15:17 bmills joined #evergreen
15:20 jeff Bmagic: which cron are you using?
15:20 jeff Bmagic: also, what linux distro?
15:20 Bmagic xenail
15:20 Bmagic xenial*
15:21 jeff Bmagic: and where were the cron jobs defined? user-level jobs under /var/spool/cron?
15:22 Bmagic cron for opensrf
15:23 jeff and were you setting TZ in the crontab?
15:23 Bmagic not in crontab, but presumably . /etc/profile would load...
15:24 Bmagic I confirmed that the shell environment shows the right timezone for all of the users including opensrf
15:24 Bmagic TZ was setup in /etc/init/cron.conf which I assume affects all cron
15:30 Dyrcona Bmagic: When I was installing xenial via vmbuilder, it would always set the system TZ to UTC.
15:30 jeff ah, and the issue was that while you can specify the TZ for your scripts, the actual TZ used by cron would be different.
15:31 Dyrcona I wonder if there's not a similar bug with whatever Google is using.
15:31 Bmagic my ansible script addresses this upon docker build. It sets the timezone to our timezone all over the place, but apparently there is one more spot
15:31 jeff thus, the 0 6 * * * crontab entry would run at what you think of as the "wrong" time, because you're expecting it to run at 6 AM "your" time and you're not specifying your cron job schedules in UTC?
15:33 jeff Bmagic: was there an existing env TZ declaration in /etc/init/cron.conf?
15:33 Bmagic jeff: the funny thing is that the scripts kick off at the right time! The results of query inserts after they are kicked off, insert time UTC But without the timezone specified, therefore postgres has them at CST
15:33 Bmagic jeff: there was not an existing config in cron.conf - I added the line in the middle before the line "exec cron"
15:34 jeff can you share where "all over the place" is? i'd be interested in trying to reproduce.
15:35 Bmagic jeff: we have our crontab configured to run with Central Time zone in mind. Therefore, midnight should mean midnight. And the script does in fact run at midnight my time.... But because of the environment and the perl script making the inserts... it inserted stuff in postgres with a different time
15:37 Bmagic in conclusion, the linux process that is in charge of triggering the scripts is respecting the /etc/timezone time. The environment for the script running is using UTC despite my best efforts.
15:37 jeff feel free to point me at the dockerfile or ansible playbooks in question if you've already shared them, but if not i'd appreciate knowing just the timezone related parts that you referred to with "It sets the timezone to our timezone all over the place" above.
15:38 Bmagic https://github.com/mcoia/eg-docker/blob/mas​ter/docker_builds/ksl/install_evergreen.yml
15:38 jeff thanks! Bmagic++
15:39 jeff I may ask more questions later if I'm unable to reproduce. :-)
15:41 Bmagic jeff++
15:41 JBoyer I'm going to REALLY, REALLY strongly recommend using the user and group modules to Ensure User Conformity rather than lineinfile. you can specify the uid with it if you like.
15:42 JBoyer (Also, you can use with_items to make it a single task, but that's less important.)
15:52 Bmagic JBoyer - you're right
15:53 Bmagic The lineinfile works because the container is presicely the same 100% of the time
16:17 Bmagic joined #evergreen
17:00 pinesol_green News from qatests: Test Success <http://testing.evergreen-ils.org/~live>
17:06 mmorgan left #evergreen
20:10 cbush06 joined #evergreen
20:37 finnx1 joined #evergreen
22:13 StomproJ joined #evergreen
22:31 Stompro joined #evergreen

| Channels | #evergreen index | Today | | Search | Google Search | Plain-Text | summary | Join Webchat