Time |
Nick |
Message |
07:18 |
|
collum joined #evergreen |
08:45 |
|
Dyrcona joined #evergreen |
08:59 |
Dyrcona |
I've got two machines running practically the same code on the same Ubuntu release (20.04). One of them gives this error with reports "XMLENT XML Parse Error: mismatched tag." The other works just fine. |
09:07 |
Dyrcona |
No differences in eg_vhost.conf as far as reports go. |
09:17 |
csharp_ |
ansible++ |
09:19 |
csharp_ |
Dyrcona: by "reports" you mean you're seeing that in report outputs? may be too in my own head to grok that |
09:21 |
Dyrcona |
Yeah, when I open a report output, I get a 500 error and that's in the log. |
09:21 |
Dyrcona |
I mentioned it late yesterday. It's parsing HTML as XML, and <meta charset="utf-8"> is breaking it. |
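A minimal illustration of the failure Dyrcona describes. It uses Python rather than the Perl that EGWeb runs on, and the sample documents are made up for the example. In HTML, `<meta charset="utf-8">` is a void element. To a strict XML parser it is an unclosed tag, so when the parser reaches `</head>` while `<meta>` is still open, it reports a mismatched tag. The self-closing XHTML form parses cleanly.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# HTML-style void element: valid HTML, but an unclosed tag to an XML parser.
HTML_STYLE = '<html><head><meta charset="utf-8"></head><body/></html>'
# XHTML-style self-closing element: well-formed XML.
XHTML_STYLE = '<html><head><meta charset="utf-8"/></head><body/></html>'


def xml_parse_error(doc: str) -> Optional[str]:
    """Return the XML parser's error message, or None if doc is well-formed."""
    try:
        ET.fromstring(doc)
    except ET.ParseError as err:
        return str(err)
    return None


print(xml_parse_error(HTML_STYLE))   # message begins with "mismatched tag"
print(xml_parse_error(XHTML_STYLE))  # None
```

This does not explain why only one server feeds the output to an XML parser. It only shows why that parser rejects the tag when it does.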
09:21 |
Dyrcona |
But not in production, just on this test VM. |
09:22 |
Dyrcona |
I also diffed the branches, and the reports code is the same. |
09:26 |
Dyrcona |
Only differences in eg.conf concern a ping file and SSL cipher settings. |
09:27 |
csharp_ |
huh |
09:28 |
Dyrcona |
I suspect that there must be a patch applied to production that I don't have on the test VM or vice versa. |
09:30 |
Dyrcona |
Only code difference is array_agg vs. array_accum in Booking.pm because one has a patch applied that the other doesn't. (I'll need that patch, but not for this problem.) |
09:36 |
Dyrcona |
I thought this had come up before, but I can't find it in my IRC logs. |
09:38 |
csharp_ |
it sounds familiar |
09:43 |
Dyrcona |
I'm taking care of the array_accum patch before I forget. |
09:44 |
|
sandbergja joined #evergreen |
09:50 |
Dyrcona |
I tried searching Lp with fixed committed bugs enabled, and nothing came up there either. |
09:51 |
Dyrcona |
Production could have a patch applied from elsewhere... |
09:51 |
Dyrcona |
I suppose that I can test that locally. I do have the list of commits applied. |
09:54 |
sandbergja |
Has anybody run into a situation where logging in to the staff client starts to fail most (but not all) of the time? The browser console complains about open-ils.circ.offline.data.retrieve failing because it can't connect to the server to get stat cats. This is with redis, recent main with some bonus commits that don't seem related, and ubuntu |
09:54 |
sandbergja |
jammy. osrf_control --diagnostic reported that everything was happy. osrf_control --restart-all fixed the issue (at least for now...) |
09:54 |
sandbergja |
The error: https://gist.githubusercontent.com/sandbergja/c938c9a63286bededec1bb5a9e18a162/raw/bfed9be845978f0a21b518c53fe896d056585394/log%2520entries%2520-%2520login%2520issue |
09:54 |
sandbergja |
(it also is present in the server logs, which didn't have any additional details) |
09:55 |
berick |
sandbergja: drone/worker lost connection to postgres |
09:56 |
sandbergja |
berick++ |
09:56 |
berick |
i've seen this happen w/ automatic os updates, but could be other causes |
09:56 |
sandbergja |
gotcha, I could totally see that |
09:57 |
sandbergja |
Is there a good way to monitor for drones that have lost their db connection? |
09:57 |
berick |
sudo apt remove unattended-upgrades # to disable auto updates on critical servers |
09:58 |
berick |
(for ubuntu, anyway) |
09:58 |
berick |
sandbergja: hm, just the logs, i think |
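A rough sketch of "just the logs" as a periodic check. It counts log lines that look like a lost Postgres connection. The patterns are standard libpq/PostgreSQL error strings, but which of them actually appear in a given OpenSRF log is an assumption, and so is the script name in the usage comment.

```python
import re
import sys
from typing import Dict, Iterable

# Assumed patterns: real libpq/PostgreSQL messages, but not confirmed to be
# the exact text a disconnected drone writes to the OpenSRF logs.
LOST_DB_PATTERNS = [
    r"server closed the connection unexpectedly",
    r"could not connect to server",
    r"terminating connection due to administrator command",
]
LOST_DB_RE = re.compile("|".join(LOST_DB_PATTERNS), re.IGNORECASE)


def count_lost_db(lines: Iterable[str]) -> Dict[str, int]:
    """Count log lines matching each lost-connection pattern (case-insensitive)."""
    counts: Dict[str, int] = {}
    for line in lines:
        m = LOST_DB_RE.search(line)
        if m:
            key = m.group(0).lower()
            counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python check_db_drops.py /path/to/osrfsys.log ...
    total: Dict[str, int] = {}
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            for key, n in count_lost_db(fh).items():
                total[key] = total.get(key, 0) + n
    for key, n in sorted(total.items()):
        print(f"{n:6d}  {key}")
```

Run from cron and alert on any non-empty output. A restart with `osrf_control --restart-all` was what cleared the problem above.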
09:58 |
sandbergja |
berick++ |
09:59 |
Dyrcona |
csharp_: I applied the production patches and no unexpected differences. I'm stumped as to why one server is trying to parse the reports html as xml and the other isn't. |
09:59 |
Dyrcona |
Of course, the server code could actually be different than what's in my local branch.... |
10:02 |
csharp_ |
maybe something where the perl libs are out of sync? |
10:03 |
csharp_ |
on upgraded servers the /usr/local/share/perl/{version} directories are preserved and I've seen upgraded servers try to use the older versions |
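A sketch of the check csharp_ suggests. It lists the version directories under `/usr/local/share/perl` (the path from the chat) and flags any that don't match the running Perl. The helper names are hypothetical. The version is read from Perl's `Config` module, where `$Config{version}` holds a string such as `5.30.0`.

```python
import os
import subprocess
from typing import List

SITE_PERL = "/usr/local/share/perl"  # path mentioned in the conversation


def stale_perl_dirs(dir_names: List[str], running_version: str) -> List[str]:
    """Return version directories that do not match the running Perl version."""
    return sorted(d for d in dir_names if d != running_version)


def running_perl_version() -> str:
    # $Config{version} is the full Perl version string, e.g. "5.30.0".
    out = subprocess.run(
        ["perl", "-MConfig", "-e", "print $Config{version}"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


if __name__ == "__main__":
    if os.path.isdir(SITE_PERL):
        leftovers = stale_perl_dirs(os.listdir(SITE_PERL), running_perl_version())
        print("stale:", leftovers or "none")
```

On Ubuntu 20.04 a fresh install should report nothing stale, which matches what Dyrcona found.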
10:03 |
csharp_ |
OS upgrades, I mean |
10:04 |
csharp_ |
also seen OSes hang onto older APT perl packages |
10:09 |
Dyrcona |
csharp_: Thanks. I'll check that, but I think both of these are mostly fresh, not release-upgraded. |
10:09 |
Dyrcona |
Yeah, both only have /usr/local/share/perl/5.30.0 |
10:14 |
csharp_ |
Dyrcona: same version of clark-kent.pl? |
10:15 |
Dyrcona |
csharp_: I'll copy the files and diff them but I'm pretty sure they are the same. |
10:15 |
Dyrcona |
Both servers produce output with the <meta charset='utf-8'> tag, only the one complains about it. I think it has to be something in EGWeb. |
10:17 |
csharp_ |
and output from the "good" server fails on the bad and vice versa? |
10:17 |
Dyrcona |
csharp_: I haven't tried that, but will. |
10:17 |
Dyrcona |
I suspect output from the good server will fail on the bad one, and not vice versa. |
10:18 |
* csharp_ |
likes ignoring his own problems to help others with theirs :-) |
10:18 |
csharp_ |
yeah, I think the same thing |
10:22 |
Dyrcona |
Yeahp. reports from the good server error on the bad one. |
10:22 |
csharp_ |
Dyrcona++ |
10:22 |
Dyrcona |
I hesitate to go the other direction because it's a production machine. |
10:23 |
csharp_ |
SORRY I BROAK UR REPORTZ |
10:23 |
Dyrcona |
Looks like it could replace existing output. |
10:27 |
Dyrcona |
There has to be some kind of difference. |
10:30 |
Dyrcona |
I'm going to try a git clean -xfd and rebuild/install Evergreen. |
10:34 |
Dyrcona |
I doubt that different Postgres versions would affect this. I don't think just looking at the report output connects to Pg in any way.
10:38 |
Dyrcona |
Well, that didn't help.... |
10:38 |
Dyrcona |
*confused unga bunga* |
10:44 |
Dyrcona |
hm. I think I just might need another db patch. |
10:50 |
Dyrcona |
git supposedly makes this easier, but the patch workflow doesn't really help. I can't just do git branch --contains <commithash> and find all of the branches with the same commit.
10:50 |
Dyrcona |
I have to resort to grepping the logs for Lp numbers, etc. |
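Cherry-picked or re-applied patches get new commit hashes, so `git branch --contains` can't match them across branches. Comparing Launchpad bug numbers in commit subjects is one workaround. The following is a minimal sketch, assuming subjects mention bugs as `LP#1234567`, `LP 1234567`, or `lp1234567`. The regex and helper names are hypothetical, and the branch and base names passed to `branch_log` are placeholders.

```python
import re
import subprocess
from typing import Set

# Hypothetical pattern: matches "LP#1234567", "LP 1234567", "lp1234567", "Lp#1234567".
LP_RE = re.compile(r"\blp\s*#?\s*(\d{5,})", re.IGNORECASE)


def lp_numbers(log_text: str) -> Set[str]:
    """Collect Launchpad bug numbers mentioned in git log output."""
    return set(LP_RE.findall(log_text))


def branch_log(branch: str, base: str) -> str:
    """Subjects of commits reachable from `branch` but not from `base`."""
    return subprocess.run(
        ["git", "log", "--format=%s", f"{base}..{branch}"],
        capture_output=True, text=True, check=True,
    ).stdout


def missing_bugs(prod_log: str, test_log: str) -> Set[str]:
    """Bug numbers referenced on the production branch but not on the test branch."""
    return lp_numbers(prod_log) - lp_numbers(test_log)
```

Usage would be along the lines of `missing_bugs(branch_log("production", "BASE"), branch_log("test", "BASE"))`. When the cherry-picked diffs are unchanged, `git cherry` (which compares patch IDs rather than hashes) may also help.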
10:50 |
Dyrcona |
I'd like to switch to merge. |
10:57 |
Dyrcona |
Looks like I need a bunch more patches, but finding them all is proving to be a PITA. |
11:02 |
Dyrcona |
I have to log --grep for keywords like postgresql, etc. |
11:20 |
|
Christineb joined #evergreen |
11:21 |
jeffdavis |
We've run into the lost db connection thing before too. I've opened bug 2098507 as a feature request. |
11:21 |
pinesol |
Launchpad bug 2098507 in Evergreen "Respond gracefully when database connection is lost" [Undecided,New] https://launchpad.net/bugs/2098507 |
12:52 |
Dyrcona |
Well, applying all of the Pg commits that I could find did not help my issue with the XML parser error on the test vm. |
13:08 |
Dyrcona |
I am tempted to delete the VM and rebuild it. |
13:29 |
Dyrcona |
And that's what I'm doing. |
13:57 |
* Dyrcona |
is about to find out if OpenSRF 3.3.2 works with Evergreen 3.7.4. |
14:06 |
jeffdavis |
Looking at bug 2073561 ... |
14:06 |
pinesol |
Launchpad bug 2073561 in Evergreen "Incorrect content in the config.coded_value_map after applying the upgrade script from 3.12.3 to 3.13.0" [High,Confirmed] https://launchpad.net/bugs/2073561 |
14:06 |
jeffdavis |
For sites that haven't upgraded to 3.13.0+ yet, it seems like the best approach is simply to skip upgrade 1416 altogether? |
14:07 |
Dyrcona |
jeffdavis: I think so, yes. |
14:07 |
Dyrcona |
Can we recall it somehow? I suppose with another Lp bug... |
14:10 |
jeffdavis |
In that case my inclination would be to delete 1416 from the 3.13.0 version upgrade script in the next round of 3.13+ releases. |
14:10 |
Dyrcona |
Open a Lp bug and see what others think. |
14:11 |
csharp_ |
I didn't have the bandwidth to read through all the functions - is that something we can skip? |
14:11 |
jeffdavis |
I haven't checked thoroughly and won't be able to for a while (vacation next week)... |
14:11 |
csharp_ |
ah |
14:12 |
csharp_ |
have a nice time! |
14:12 |
jeffdavis |
... but if I understand Llewellyn's input correctly, fresh installs are unaffected by this issue because they don't install 961.data.marc21-tag-tables.sql at all. |
14:12 |
jeffdavis |
And 1416 is identical to 961.data.marc21-tag-tables.sql. |
14:12 |
csharp_ |
we are upgrading from 3.12 to 3.14 on Saturday |
14:12 |
csharp_ |
oh |
14:12 |
csharp_ |
hmm - maybe we can save ourselves the pain |
14:13 |
jeffdavis |
So simply omitting 1416 would in theory mean that an upgraded system is a match for a clean install. |
14:13 |
jeffdavis |
I haven't tested this yet *at all* so please don't rely on me being right about this :) |
14:13 |
jeffdavis |
(and good luck with the upgrade either way!) |
14:13 |
csharp_ |
thanks! |
14:14 |
Dyrcona |
I think you can skip it without danger. |
14:17 |
csharp_ |
Dyrcona++ # gonna skip it! |
14:25 |
Dyrcona |
Ugh. Half of a script just blew up because i did one of the steps early. |
14:27 |
Dyrcona |
Well, not quite half. |
14:28 |
Dyrcona |
I think I should be able to test OpenSRF now. |
14:30 |
Bmagic |
does the SIP 98/99 keepalive refresh the memcached authtoken? |
14:30 |
Dyrcona |
Bmagic: Not sure. |
14:50 |
jeffdavis |
Bmagic: Are you using SIPServer or SIP2Mediator? |
14:51 |
Bmagic |
SIPServer |
14:53 |
jeffdavis |
It looks to me like SIPServer doesn't check any authtoken when responding to 98 (looking at sub send_acs_status in Sip/MsgType.pm) |
14:54 |
jeffdavis |
or rather, it doesn't talk to EG about anything so there's no point where an authtoken is verified for 98/99 |
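Some context for jeffdavis's point: a SIP2 SC Status (99) request carries only a one-character status code, a three-character max print width, and a four-character protocol version. It has no login or session field, so there is nothing in the message for the server to tie to an authtoken. The following is a minimal parsing sketch of that fixed-width layout from the SIP2 specification. The function and type names are hypothetical, and trailing AY/AZ (sequence/checksum) fields are ignored.

```python
from typing import NamedTuple


class SCStatus(NamedTuple):
    status_code: str       # "0" = OK, "1" = printer out of paper, "2" = about to shut down
    max_print_width: str   # three digits
    protocol_version: str  # e.g. "2.00"


def parse_sc_status(msg: str) -> SCStatus:
    """Parse the fixed fields of a SIP2 SC Status (99) message."""
    msg = msg.rstrip("\r\n")
    if not msg.startswith("99") or len(msg) < 10:
        raise ValueError(f"not an SC Status message: {msg!r}")
    return SCStatus(msg[2], msg[3:6], msg[6:10])


print(parse_sc_status("9900802.00\r"))
# SCStatus(status_code='0', max_print_width='080', protocol_version='2.00')
```

Whether a keepalive extends the memcached authtoken then depends entirely on what the server does when it answers. Per the reading of `send_acs_status` above, SIPServer does not contact Evergreen at that point.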
15:05 |
|
bshum joined #evergreen |
15:05 |
Dyrcona |
Welcome back, bshum! |
15:05 |
bshum |
Huzzah! |
15:06 |
bshum |
Dyrcona++ # moral support |
15:07 |
Dyrcona |
backups++ |
15:22 |
csharp_ |
bshum++ # supportive morals |
15:29 |
Bmagic |
jeffdavis++ |
15:29 |
Bmagic |
bshum++ # wb |
15:30 |
Dyrcona |
Was gonna say that my installation worked on the first try, but I have to remove the legacy JSON gateway from the Apache config. |
15:31 |
Dyrcona |
Guess that's a side effect of using OpenSRF 3.2.3. |
15:33 |
Dyrcona |
Huh. I thought this branch had all of our customizations in it. |
15:34 |
* Dyrcona |
is more confuzzled than before. |