06:33 *** collum joined #evergreen
07:07 *** kworstell-isl joined #evergreen
07:08 *** redavis joined #evergreen
08:08 *** cbrown joined #evergreen
08:10 *** BDorsey joined #evergreen
08:26 *** mantis joined #evergreen
08:33 *** dguarrac joined #evergreen
08:41 *** mmorgan joined #evergreen
08:45 *** BDorsey joined #evergreen
10:25 *** kworstell-isl joined #evergreen
10:27 *** kworstell-isl joined #evergreen
10:50 *** kmlussier joined #evergreen
10:50 *** kworstell_isl joined #evergreen
10:50 <kmlussier> Somebody with more Launchpad powers than mine may want to change bug 1810854 back to Confirmed.
10:50 <pinesol> Launchpad bug 1810854 in Evergreen "Trying to Merge a User in Collections Fails Silently" [Medium,Fix released] https://launchpad.net/bugs/1810854
10:52 <kmlussier> Also, good morning #evergreen!
10:52 <redavis> I'll get at it. Hold on.
10:53 <redavis> Also, good morning
10:54 <redavis> kmlussier, fixed
10:54 <Bmagic> I think I might have found the smoking gun on this circulation problem: circulator: HASH(0x55eb1b789438) : circ due data / close date overlap found : due_date=2024-12-19T09:51:46-0600 start=2024-11-27T22:59:00-06:00, end=2025-12-01T23:00:59-06:00
11:01 <kmlussier> redavis++
11:02 <kmlussier> heh, I probably still have the credentials for that bugmaster account stored in my pw manager. I probably could've fixed it myself.
11:04 <redavis> no worries :-). I don't like to get into it unless I have a specific reason, and then I log out pretty much immediately and log back in as myself.
11:21 *** abowling joined #evergreen
11:27 <csharp_> @decide sudo make me a sandwich or run0 make me a sandwich
11:27 <pinesol> csharp_: go with run0 make me a sandwich
11:34 <jeff> Bmagic: that looks likely. are you finding the closed date now, where you didn't find it before, or is it still not present in the table and you're possibly dealing with cached in-process data?
11:35 <Bmagic> I think I trusted the staff when they told me that the closed dates were sound
11:35 <Bmagic> :)
11:35 <jeff> you had also indicated that it was not happening with all circs -- did you find a reason for that also?
11:36 *** Christineb joined #evergreen
11:36 <Bmagic> yeah, the closed dates were fixed over time (but not completely); the way the closed dates look today, there are about 7 branches affected by the Thanksgiving holiday, closed for 368 days instead of 2 days
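A 368-day closing is consistent with a close-date end entered a year late. A minimal sketch of the arithmetic, assuming GNU `date` and assuming the intended end was the same date in the right year (2024-12-01; the log's own "368 vs 2" figures count the span slightly differently, but the year-off pattern is the point):

```shell
# Closed-date span from the circulator log: start=2024-11-27, end=2025-12-01.
start=$(date -ud '2024-11-27' +%s)
intended=$(date -ud '2024-12-01' +%s)   # assumed intended end (right year)
entered=$(date -ud '2025-12-01' +%s)    # end actually in the table
echo "intended closing: $(( (intended - start) / 86400 )) days"
echo "entered closing:  $(( (entered - start) / 86400 )) days"
```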
11:37 <Bmagic> so simple
11:38 <mmorgan> hindsight++
11:38 <Bmagic> Sorry to bother everyone about it! Y'all are so great!
11:39 <Bmagic> but this segfault issue.....
11:45 <Bmagic> Last night at 23:08, 8 minutes after the print action trigger fired, the OpenSRF router had a segfault and died. The nice thing is that the system freezes at that point and I get to see all of the logs and current processes. Here's what I've compiled so far: https://pastebin.com/YVizyW7Q
11:46 <Bmagic> I'm thinking I'm going to set up a new machine with ejabberd and let it run for a few days to see if I have the same issue.
11:49 * berick looks
11:59 <berick> Bmagic: mind restarting and running both routers in gdb and trying to reproduce? i can help w/ the command
12:00 <Bmagic> sure! Wanna continue with the redis machine?
12:00 <Bmagic> It's still powered on, you caught me just before I deleted it
12:00 <berick> yes, this is almost certainly an error introduced in the redis C code
12:02 <Bmagic> ok, I'm game
12:03 <berick> oh right, we can attach to running processes...
12:03 <berick> so just restart everything like normal
12:03 <Bmagic> I've isolated the issue to this machine by trial and error. The "main" util server isn't having issues, now that I've divided the crons up. And this one cron seems to be the one that consistently causes the segfault. In other words: we can use this as a test bed.
12:04 <Bmagic> ok, I'm leaving the machine alive, and restarting everything like normal
12:06 <berick> once it's up, get the PIDs of the 2 router processes, open 2 terminals and run this for each pid: gdb /openils/bin/opensrf_router <pid>
12:06 <berick> i /think/ that will do it
12:06 <berick> you also have to enter 'continue' once gdb loads the router
12:06 <berick> well, each router
12:07 <berick> i'll try it here too..
12:07 <Bmagic> this file doesn't exist: /openils/bin/opensrf_router
12:08 *** jihpringle joined #evergreen
12:08 <Bmagic> I found it: /openils/bin/opensrf_router
12:08 <Bmagic> must have been a typo or something
12:09 <berick> you may have to sudo to run gdb
12:09 <berick> ok seems to work here, and you do have to enter 'continue' once gdb loads
12:09 <berick> once they're both loaded and continued, start trying to break things again
12:10 <Bmagic> Could not attach to process. If your uid matches the uid of the target
12:10 <berick> try sudo
12:10 <Bmagic> maybe no-go in docker environment?
12:10 <Bmagic> I am root
12:10 <berick> huh
12:10 <berick> dang
12:11 <Bmagic> ptrace: Operation not permitted
12:11 <Bmagic> cat /proc/sys/kernel/yama/ptrace_scope
12:11 <Bmagic> 1
12:12 <Bmagic> I tried it on non-docker, works fine
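For reference on the value Bmagic checked: Yama's `ptrace_scope` of 1 restricts attaching to direct descendants unless the caller holds `CAP_SYS_PTRACE`, and Docker's default capability set does not include `CAP_SYS_PTRACE` even for in-container root, which fits "I am root" plus "Operation not permitted" (this diagnosis is my reading, not stated in the conversation). A sketch; the sysctl write is shown commented out because it is host-wide:

```shell
# 0 = classic ptrace, 1 = restricted to descendants (common default),
# 2 = admin-only, 3 = disabled. Read the current value if Yama is present:
scope=$(cat /proc/sys/kernel/yama/ptrace_scope 2>/dev/null || echo "not present")
echo "ptrace_scope: $scope"
# Temporary, host-wide relaxation (revert when done):
#   echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
```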
12:13 <Bmagic> I can get this rig set up outside of the container and do the gdb, and what? keep it running in a screen so we can get more output when it crashes next time?
12:14 <berick> right, once the router crashes, you can type 'bt' or 'backtrace' and it will show the full error output with line numbers.
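berick's recipe, collected in one place as a sketch. The `pgrep` pattern is my assumption; the binary path is from the conversation. The block only prints the commands so nothing attaches by accident:

```shell
# One gdb per router process; run each in its own terminal or screen.
for pid in $(pgrep -x opensrf_router || echo '<pid1>' '<pid2>'); do
  echo "sudo gdb /openils/bin/opensrf_router $pid"
done
# Inside each gdb session:
echo "(gdb) continue    # let the router keep running"
echo "(gdb) bt          # after a crash: full backtrace with line numbers"
```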
12:15 <Bmagic> I'm all over it, no problem. I'm afraid that it won't break on the new machine though :) we shall see
12:15 <berick> yeah...
12:15 <berick> Bmagic++
12:16 <Bmagic> berick++
12:22 <berick> Bmagic: in the meantime, something like this might work? i've never used it, but it's easy to try: addr2line -e /openils/bin/opensrf_router 7a496e327000+20000
12:23 <berick> oops, that would be /openils/lib/libopensrf.so.2.2.0
12:23 <berick> instead of /openils/bin/opensrf_router
12:24 <Bmagic> !! let me see
12:24 <Bmagic> how do I discover what hex to append?
12:25 <berick> i think it's the stuff in libopensrf.so.2.2.0[7a496e327000+20000] from your log lines, but I'm not 100% sure
12:26 <Bmagic> addr2line /openils/lib/libopensrf.so.2.2.0 7a496e327000+20000
12:26 <Bmagic> addr2line: 'a.out': No such file
12:26 <Bmagic> I've recycled the processes since the log was output, will it be the same?
12:26 <berick> put -e before the /openils.. part
12:27 <Bmagic> oops, right
12:27 <Bmagic> addr2line -e /openils/lib/libopensrf.so.2.2.0 7a496e327000+20000
12:27 <Bmagic> ??:0
12:27 <berick> yeah.. figured
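Two likely problems here, reading between the lines. `??:0` is what addr2line prints when the .so was built without debug info (`-g`). Separately, in a kernel segfault line the bracketed `7a496e327000+20000` is mapping base + mapping size, not an address; the value addr2line wants is the faulting `ip` from the same log line minus that base. A sketch with a hypothetical `ip` value (only the arithmetic is demonstrated; the real ip must come from the log):

```shell
# Kernel segfault lines look like:
#   ... ip <IP> sp <SP> error 4 in libopensrf.so.2.2.0[<base>+<size>]
# addr2line wants IP - base as an offset into the library.
ip=7a496e32a1b0        # hypothetical; take the real one from the log line
base=7a496e327000      # the value before '+' inside the brackets
printf 'addr2line -e /openils/lib/libopensrf.so.2.2.0 -f -C 0x%x\n' \
  $(( 16#$ip - 16#$base ))
```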
12:29 <Bmagic> oh, you know, maybe I can gdb the process from the VM above the container
12:30 <Bmagic> if I feed it the exact same opensrf_router (it can't be the same one that's inside the container, but an identical copy of it) will that work?
12:30 <Bmagic> it just needs a reference to the code that's running the program in memory so it can track the lines back?
12:30 <berick> no, gdb has to attach to the running processes
12:31 <Bmagic> a docker container exposes its processes to the VM above. I can use its PID and run gdb on the VM
12:31 <berick> oh, you mean the binary..
12:31 <berick> maybe?
12:31 <Bmagic> I'll try
12:32 <Bmagic> that would be nice, so I'm testing the same situation where I've had this segfault a few times
12:34 <berick> Bmagic: also https://stackoverflow.com/questions/21395106/how-can-i-gdb-attach-to-a-process-running-in-a-docker-container
12:35 <berick> hm, don't see how that's really different from what you already tried
12:35 <Bmagic> lxc-attach is the magic
12:35 <Bmagic> berick++
12:37 <Bmagic> lol lxc-attach: command not found
12:39 <berick> Bmagic: oh, maybe attach with: docker exec --privileged -it <container> bash
12:39 <Bmagic> ok, thanks, I was getting to that actually
12:39 <Bmagic> yes, that seems to have done it
12:40 <Bmagic> getting into the machine via docker with the --privileged switch is making gdb happy now
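For the record, a sketch of the options that let ptrace work inside a container (container name is a placeholder, and the narrower-flags alternative is my addition, not something tried in the conversation). `--privileged` grants every capability; `--cap-add=SYS_PTRACE` plus an unconfined seccomp profile is usually the minimum needed. The block only prints the commands:

```shell
# What worked here: a privileged shell in the already-running container.
echo 'docker exec --privileged -it <container> bash'
# Narrower alternative when (re)creating the container: grant only the
# ptrace capability and drop the default seccomp filter.
echo 'docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined ...'
```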
12:40 <berick> yay
12:40 <Bmagic> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
12:40 <Bmagic> 0x000071a1fe8277e2 in __GI___libc_read (fd=3, buf=0x7fffec4b6630, nbytes=16384) at ../sysdeps/unix/sysv/linux/read.c:26
12:40 <Bmagic> 26 ../sysdeps/unix/sysv/linux/read.c: No such file or directory.
12:40 <Bmagic> not sure if it's completely happy
12:40 <berick> i think that's ok. just do 'continue' once it's stopped loading
12:41 <Bmagic> (gdb) continue
12:41 <Bmagic> Continuing.
12:41 <berick> cool
12:41 <Bmagic> it's chillin now
12:41 <berick> so get both routers attached
12:41 <berick> and start the attack
12:41 <Bmagic> ok, well, I wasn't prepared for it to work. I need to back out and get some screens going
12:41 <berick> heh
12:41 *** Dyrcona joined #evergreen
12:47 *** kworstell-isl joined #evergreen
12:49 <Dyrcona> vm snapshots++
12:51 <Bmagic> berick: no segfault yet, Imma try resetting more triggers, going back 2 days' worth
12:53 <berick> cool. i have to disappear for about an hour
13:02 <Bmagic> no segfault! 2 days' worth of stuff just finished (~5k events). well, I guess I'll just leave it running in debug until the next natural cron execution. It seems to happen at night (probably because the pressure is higher during those hours).
13:05 *** jihpringle joined #evergreen
13:09 <Dyrcona> Y'know what? It installs. That's good enough for me at this point.
13:20 <redavis> Dyrcona++
13:39 <mmorgan> Dyrcona++
13:56 <csharp_> thinking_you_have_a_vm_snapshot_when_you_don't--
13:57 <csharp_> in better news, the spaghetti carbonara I made the fam for lunch was delicious
13:58 <csharp_> @band add Waiting For EJabberD
13:58 <pinesol> csharp_: Band 'Waiting For EJabberD' added to list
14:15 <Dyrcona> csharp_++ carbonara++
14:18 <csharp_> @band add Require All Grandad
14:18 <pinesol> csharp_: Band 'Require All Grandad' added to list
14:19 <csharp_> @band add Automatic for your PeePaw
14:19 <pinesol> csharp_: Band 'Automatic for your PeePaw' added to list
14:19 *** mantis joined #evergreen