Evergreen ILS Website

IRC log for #evergreen, 2024-11-21

| Channels | #evergreen index | Today | | Search | Google Search | Plain-Text | summary | Join Webchat

All times shown according to the server's local time.

Time Nick Message
06:33 collum joined #evergreen
07:07 kworstell-isl joined #evergreen
07:08 redavis joined #evergreen
08:08 cbrown joined #evergreen
08:10 BDorsey joined #evergreen
08:26 mantis joined #evergreen
08:33 dguarrac joined #evergreen
08:41 mmorgan joined #evergreen
08:45 BDorsey joined #evergreen
10:25 kworstell-isl joined #evergreen
10:27 kworstell-isl joined #evergreen
10:50 kmlussier joined #evergreen
10:50 kworstell_isl joined #evergreen
10:50 kmlussier Somebody with more Launchpad powers than mine may want to change bug 1810854 back to Confirmed.
10:50 pinesol Launchpad bug 1810854 in Evergreen "Trying to Merge a User in Collections Fails Silently" [Medium,Fix released] https://launchpad.net/bugs/1810854
10:52 kmlussier Also, good morning #evergreen!
10:52 redavis I'll get at it.  Hold on.
10:53 redavis Also, good morning
10:54 redavis kmlussier, fixed
10:54 Bmagic I think I might have found the smoking gun on this circulation problem:  circulator: HASH(0x55eb1b789438) : circ due data / close date overlap found : due_date=2024-12-19T09:51:46-0600 start=2024-11-27T22:59:00-06:00, end=2025-12-01T23:00:59-06:00
11:01 kmlussier redavis++
11:02 kmlussier heh, I probably still have the credentials for that bugmaster account stored in my pw manager. I probably could've fixed it myself.
11:04 redavis no worries :-). I don't like to get into it unless I have a specific reason, and then log out pretty immediately and then back in as myself.
11:21 abowling joined #evergreen
11:27 csharp_ @decide sudo make me a sandwich or run0 make me a sandwich
11:27 pinesol csharp_: go with run0 make me a sandwich
11:34 jeff Bmagic: that looks likely. are you finding the closed date now, where you didn't find it before, or is it still not present in the table and you're possibly dealing with cached in-process data?
11:35 Bmagic I think I trusted the staff when they told me that the closed dates were sound
11:35 Bmagic :)
11:35 jeff you had also indicated that it was not happening with all circs -- did you find a reason for that also?
11:36 Christineb joined #evergreen
11:36 Bmagic yeah, the closed dates were either fixed over time (but not completely), but the way the closed dates look today, there are about 7 branches affected by the Thanksgiving holiday, closed for 368 days instead of 2 days
11:37 Bmagic so simple
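The overlap check reported in that circulator log line can be sketched as a simple interval test, plus a sanity check on how long each closing lasts. This is a minimal Python sketch, not Evergreen's circulator code: the helper name `closing_report`, the inclusive comparison, and the 14-day `max_days` threshold are assumptions for illustration. The timestamps are the ones from the log line.

```python
from datetime import datetime, timedelta

# %z accepts both "-0600" and "-06:00" offsets (Python 3.7+), which covers
# both styles that appear in the circulator log line.
TS_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def parse_ts(value: str) -> datetime:
    return datetime.strptime(value, TS_FORMAT)


def closing_report(due: str, start: str, end: str, max_days: int = 14) -> dict:
    """Hypothetical helper: does the due date fall inside the closing, and is
    the closing itself suspiciously long?"""
    due_dt, start_dt, end_dt = parse_ts(due), parse_ts(start), parse_ts(end)
    duration = end_dt - start_dt
    return {
        # Assumes both ends of the closed range are inclusive.
        "overlaps": start_dt <= due_dt <= end_dt,
        "duration": duration,
        # A holiday closing longer than max_days is probably a data-entry error.
        "suspicious": duration > timedelta(days=max_days),
    }


if __name__ == "__main__":
    report = closing_report(
        due="2024-12-19T09:51:46-0600",
        start="2024-11-27T22:59:00-06:00",
        end="2025-12-01T23:00:59-06:00",
    )
    print(report["overlaps"], report["duration"].days, report["suspicious"])
    # prints: True 369 True
```

Run against the logged values, the closing spans just over a year, so any due date in the following twelve months overlaps it. A threshold check like this is one way to spot the other mis-entered holiday closings.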
11:38 mmorgan hindsight++
11:38 Bmagic Sorry to bother everyone about it! Yall are so great!
11:39 Bmagic but this segfault issue.....
11:45 Bmagic Last night at 23:08, 8 minutes after the print action trigger fired, the OpenSRF router had a segfault and died. The nice thing is that the system freezes at that point and I get to see all of the logs and current processes. Here's what I've compiled so far: https://pastebin.com/YVizyW7Q
11:46 Bmagic I'm thinking I'm going to set up a new machine with ejabberd and let it run for a few days to see if I have the same issue.
11:49 * berick looks
11:59 berick Bmagic: mind restarting and running both routers in gdb and trying to reproduce?  i can help w/ the command
12:00 Bmagic sure! Wanna continue with the redis machine?
12:00 Bmagic It's still powered on, you caught me just before I deleted it
12:00 berick yes, this is almost certainly an error introduced in the redis C code
12:02 Bmagic ok, I'm game
12:03 berick oh right, we can attach to running processes...
12:03 berick so just restart everything like normal
12:03 Bmagic I've isolated the issue to this machine by trial and error. The "main" util server isn't having issues, now that I've divided the crons up. And this one cron seems to be the one that consistently causes the segfault. In other words: we can use this as a test bed.
12:04 Bmagic ok, I'm leaving the machine alive, and restarting everything like normal
12:06 berick once it's up, get the PIDs of the 2 router processes, open 2 terminals and run this for each pid: gdb /openils/bin/opensrf_router <pid>
12:06 berick i /think/ that will do it
12:06 berick you also have to enter 'continue' once gdb loads the router
12:06 berick well, each router
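Finding the two router PIDs and building the attach commands can be scripted. This is a minimal Python sketch under a few assumptions: it matches processes by their `/proc/<pid>/exe` link rather than by process name, it uses the `/openils/bin/opensrf_router` path from this discussion, and the helper names are hypothetical. gdb's `-ex continue` option stands in for typing `continue` by hand once gdb has attached.

```python
import os
import shlex
from pathlib import Path

ROUTER_BINARY = "/openils/bin/opensrf_router"


def find_pids_by_exe(binary: str, proc_root: str = "/proc") -> list[int]:
    """Return PIDs whose /proc/<pid>/exe link points at `binary`."""
    pids = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/self or /proc/sys
        try:
            target = os.readlink(entry / "exe")
        except OSError:
            continue  # the process exited, or we lack permission to inspect it
        # The kernel appends " (deleted)" when the binary was replaced on disk.
        if target.removesuffix(" (deleted)") == binary:
            pids.append(int(entry.name))
    return sorted(pids)


def gdb_attach_argv(binary: str, pid: int) -> list[str]:
    """Attach gdb to a running process and resume it immediately."""
    return ["gdb", "-ex", "continue", "-p", str(pid), binary]


if __name__ == "__main__":
    for pid in find_pids_by_exe(ROUTER_BINARY):
        # Run each printed command in its own terminal or screen window.
        print(shlex.join(gdb_attach_argv(ROUTER_BINARY, pid)))
```

When a router crashes, gdb stops at the faulting instruction, and `bt` prints the backtrace.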
12:07 berick i'll try it here too..
12:07 Bmagic this file doesn't exist:  /openils/bin/opensrf_router
12:08 jihpringle joined #evergreen
12:08 Bmagic I found it: /openils/bin/opensrf_router
12:08 Bmagic must have been a typo or something
12:09 berick you may have to sudo to run gdb
12:09 berick ok seems to work here, and you do have to enter 'continue' once gdb loads
12:09 berick once they're both loaded and continued, start trying to break things again
12:10 Bmagic Could not attach to process.  If your uid matches the uid of the target
12:10 berick try sudo
12:10 Bmagic maybe no-go in docker environment?
12:10 Bmagic I am root
12:10 berick huh
12:10 berick dang
12:11 Bmagic ptrace: Operation not permitted
12:11 Bmagic cat /proc/sys/kernel/yama/ptrace_scope
12:11 Bmagic 1
12:12 Bmagic I tried it on non-docker, works fine
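The `ptrace: Operation not permitted` failure fits how Yama's `ptrace_scope` interacts with capabilities. At scope 1, a process may attach only to its own descendants unless it holds CAP_SYS_PTRACE, and Docker's default capability set does not include CAP_SYS_PTRACE, even for root inside the container. Granting it (for example with `docker run --cap-add=SYS_PTRACE`, or a `docker exec --privileged` session) is the usual fix. A minimal Python sketch of that check follows; it ignores seccomp, AppArmor/SELinux profiles, and PR_SET_PTRACER exceptions, and the function names are hypothetical.

```python
from pathlib import Path

CAP_SYS_PTRACE = 19  # bit number from <linux/capability.h>


def has_cap_sys_ptrace(status_text: str) -> bool:
    """Check the effective capability mask in /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            mask = int(line.split()[1], 16)
            return bool(mask & (1 << CAP_SYS_PTRACE))
    return False


def attach_verdict(scope: int, cap_ptrace: bool, is_descendant: bool = False) -> str:
    """Simplified Yama decision for PTRACE_ATTACH, assuming same-uid processes."""
    if scope >= 3:
        return "blocked: scope 3 disables attaching entirely"
    if scope == 2:
        return "ok" if cap_ptrace else "blocked: scope 2 requires CAP_SYS_PTRACE"
    if scope == 1:
        if cap_ptrace or is_descendant:
            return "ok"
        return "blocked: scope 1 allows only descendants without CAP_SYS_PTRACE"
    return "ok"  # scope 0: classic same-uid checks only


if __name__ == "__main__":
    scope_file = Path("/proc/sys/kernel/yama/ptrace_scope")
    # If Yama is not built into the kernel, the file is absent: treat as scope 0.
    scope = int(scope_file.read_text()) if scope_file.exists() else 0
    cap = has_cap_sys_ptrace(Path("/proc/self/status").read_text())
    print(f"ptrace_scope={scope} CAP_SYS_PTRACE={cap}: {attach_verdict(scope, cap)}")
```

Run as root inside a default container, this would be expected to report scope 1 with CAP_SYS_PTRACE=False, matching the failure above; on the host VM, root normally holds CAP_SYS_PTRACE.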
12:13 Bmagic I can get this rig set up outside of the container and do the gdb, and what? keep it running in a screen so we can get more output when it crashes next time?
12:14 berick right, once the router crashes, you can type 'bt' or 'backtrace' and it will show the full error output with line numbers.
12:15 Bmagic I'm all over it, no problem. I'm afraid that it won't break on the new machine though :) we shall see
12:15 berick yeah...
12:15 berick Bmagic++
12:16 Bmagic berick++
12:22 berick Bmagic: in the meantime, something like this might work?  i've used it, but it's easy to try:  addr2line -e /openils/bin/opensrf_router 7a496e327000+20000
12:22 berick *I've never used it
12:23 berick oops, that would be /openils/lib/libopensrf.so.2.2.0
12:23 berick instead of /openils/bin/opensrf_router
12:24 Bmagic !! let me see
12:24 Bmagic how do I discover what hex to append?
12:25 berick i think it's the stuff in libopensrf.so.2.2.0[7a496e327000+20000] from your log lines, but I'm not 100% sure
12:26 Bmagic addr2line /openils/lib/libopensrf.so.2.2.0 7a496e327000+20000
12:26 Bmagic addr2line: 'a.out': No such file
12:26 Bmagic I've recycled the processes since the log was outputted, will it be the same?
12:26 berick put -e before the /openils.. part
12:27 Bmagic oops, right
12:27 Bmagic addr2line -e /openils/lib/libopensrf.so.2.2.0 7a496e327000+20000
12:27 Bmagic ??:0
12:27 berick yeah.. figured
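The `??:0` result is consistent with how the kernel formats segfault lines: in `libopensrf.so.2.2.0[7a496e327000+20000]`, the first number is the start address of the mapping and `+20000` is the mapping's size, not an offset. addr2line wants the faulting instruction pointer (the `ip` field on the same kernel line) minus that start address. Because that offset is relative to the mapping, it stays valid after the processes are restarted, as long as the same library build is installed. A minimal Python sketch, assuming the usual `segfault at ... ip ... in lib[start+size]` format; the `ip` value in the example is made up, and `segment_offset` is an assumption covering libraries whose executable segment does not start at file offset 0 (check the LOAD headers from `readelf -lW`). addr2line also only prints file and line numbers if the library was built with debug info.

```python
import re

# Matches e.g. "... ip 00007a496e33a1b2 sp ... error 4 in libfoo.so[7a496e327000+20000]"
SEGFAULT_RE = re.compile(
    r"ip ([0-9a-f]+) .* in ([^\[\s]+)\[([0-9a-f]+)\+([0-9a-f]+)\]"
)


def addr2line_offset(kernel_line: str, segment_offset: int = 0) -> tuple[str, int]:
    """Return (library name, address to pass to addr2line)."""
    match = SEGFAULT_RE.search(kernel_line)
    if match is None:
        raise ValueError("not a recognizable kernel segfault line")
    ip, start, size = (int(match.group(i), 16) for i in (1, 3, 4))
    if not start <= ip < start + size:
        raise ValueError("ip lies outside the reported mapping")
    # The offset inside the mapping does not depend on where ASLR loaded the library.
    return match.group(2), ip - start + segment_offset


if __name__ == "__main__":
    # Hypothetical kernel line: the ip value is illustrative, not from the real log.
    line = ("opensrf_router[4242]: segfault at 0 ip 00007a496e33a1b2 "
            "sp 00007ffc0e1d2e40 error 4 in libopensrf.so.2.2.0[7a496e327000+20000]")
    lib, addr = addr2line_offset(line)
    print(f"addr2line -f -e /openils/lib/{lib} {addr:#x}")
    # prints: addr2line -f -e /openils/lib/libopensrf.so.2.2.0 0x131b2
```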
12:29 Bmagic oh, you know, maybe I can gdb the process from the VM above the container
12:30 Bmagic if I feed it the exact same opensrf_router (it can't be the same one that's inside the container, but an identical copy of it) will that work?
12:30 Bmagic it just needs a reference to the code that's running the program in memory so it can track the lines back?
12:30 berick no gdb has to attach to the running processes
12:31 Bmagic a docker container exposes its processes to the VM above. I can use its PID and run gdb on the VM
12:31 berick oh, you mean the binary..
12:31 berick maybe?
12:31 Bmagic I'll try
12:32 Bmagic that would be nice, so I'm testing the same situation where I've had this segfault a few times
12:34 berick Bmagic: also https://stackoverflow.com/questions/21395106/how-can-i-gdb-attach-to-a-process-running-in-a-docker-container
12:35 berick hm, don't see how that's really different from what you already tried
12:35 Bmagic lxc-attach is the magic
12:35 Bmagic berick++
12:37 Bmagic lol lxc-attach: command not found
12:39 berick Bmagic: oh, maybe attach with: docker exec --privileged -it <container> bash
12:39 Bmagic ok, thanks, I was getting to that actually
12:39 Bmagic yes, that seems to have done it
12:40 Bmagic getting into the machine via docker with the --privileged switch, is making gdb happy now
12:40 berick yay
12:40 Bmagic Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
12:40 Bmagic 0x000071a1fe8277e2 in __GI___libc_read (fd=3, buf=0x7fffec4b6630, nbytes=16384) at ../sysdeps/unix/sysv/linux/read.c:26
12:40 Bmagic 26      ../sysdeps/unix/sysv/linux/read.c: No such file or directory.
12:40 Bmagic not sure if it's completely happy
12:40 berick i think that's ok.  just do 'continue' once it's stopped loading
12:41 Bmagic (gdb) continue
12:41 Bmagic Continuing.
12:41 berick cool
12:41 Bmagic it's chillin now
12:41 berick so get both routers attached
12:41 berick and start the attack
12:41 Bmagic ok, well, I wasn't prepared for it to work. I need to back out and get some screens going
12:41 berick heh
12:41 Dyrcona joined #evergreen
12:47 kworstell-isl joined #evergreen
12:49 Dyrcona vm snapshots++
12:51 Bmagic berick: no segfault yet, Imma try resetting more triggers, going back 2 days worth
12:53 berick cool.  i have to disappear for about an hour
13:02 Bmagic no segfault! 2 days worth of stuff just finished (~5k events). well, I guess I'll just leave it running in debug until next natural cron execution. It seems to happen at night (probably because the pressure is higher during those hours).
13:05 jihpringle joined #evergreen
13:09 Dyrcona Y'know what? It installs. That's good enough for me at this point.
13:20 redavis Dyrcona++
13:39 mmorgan Dyrcona++
13:56 csharp_ thinking_you_have_a_vm_snapshot_when_you_don't--
13:57 csharp_ in better news, the spaghetti carbonara I made the fam for lunch was delicious
13:58 csharp_ @band add Waiting For EJabberD
13:58 pinesol csharp_: Band 'Waiting For EJabberD' added to list
14:15 Dyrcona csharp_++ carbonara++
14:18 csharp_ @band add Require All Grandad
14:18 pinesol csharp_: Band 'Require All Grandad' added to list
14:19 csharp_ @band add Automatic for your PeePaw
14:19 pinesol csharp_: Band 'Automatic for your PeePaw' added to list
14:19 mantis joined #evergreen
