Evergreen ILS Website

IRC log for #evergreen, 2022-02-15

All times shown according to the server's local time.

Time Nick Message
06:00 pinesol News from qatests: Failed Installing Evergreen database pre-requisites <http://testing.evergreen-ils.org/~live//archive/2022-02/2022-02-15_04:00:02/test.39.html>
08:00 collum joined #evergreen
08:36 mantis1 joined #evergreen
08:38 mmorgan joined #evergreen
09:12 Dyrcona joined #evergreen
09:19 Dyrcona If I get this message while loading MARC records "no mapping found for [0x80]": a) is that a warning or an error (i.e. does the record load anyway) and b) anyone had to fix these before and got any tips that might save me an hour or so of fumbling around?
09:20 Dyrcona The records in the file claim to be UTF-8, but we all know how that works out in reality.
09:26 Dyrcona Well, I can say that those messages are not making it to MARC::Record->warnings. Because I would log those to my log file.
09:28 Dyrcona Grr. Because I botched the shortname in 856$9, they're not showing up in my custom view....
09:29 Dyrcona That means I can't get a quick count.
09:30 Dyrcona I can also say that the messages didn't trigger my error handler in the eval because there's no error log.
09:31 Dyrcona So, I guess they were inserted, even with the junk characters.
09:31 Dyrcona @blame Vendor records
09:31 pinesol Dyrcona: Vendor records is probably integrated with systemd
09:35 jvwoolf joined #evergreen
09:39 * Dyrcona reloads the database to give it another go.
09:49 Dyrcona Think I'll redirect stderr to a file. I'm not sure where these messages are coming from, either when I create the MARC::Record from the raw marc data, or when doing the insert.
09:49 Dyrcona Most likely the former.
09:58 JBoyer Bmagic_, I'm told the cert for the MOBIUS bugsquash server could use a refresh sometime.
10:01 Bmagic_ oh! will do
10:41 csharp_ jvwoolf: re: slow reports, have you done any query analysis to see what it's doing? (e.g. EXPLAIN/EXPLAIN ANALYZE)
10:41 csharp_ note that EXPLAIN ANALYZE actually runs the query, so if it's timing out, that may not work
10:55 rfrasur joined #evergreen
10:59 Dyrcona So, back to my MARC::Charset thing from earlier. I basically copied the example from the MARC::Lint manpage and ran that on my MARC input file, and it doesn't report any character set issues, though it does complain about punctuation and the subfield 9 in the 856.
11:06 Dyrcona Oh, interesting. It looks like MARC::File::USMARC->next returns raw MARC and doesn't create a MARC::Record object.
11:09 Dyrcona Oh, never mind. It does. I was looking at _next().
11:09 csharp_ Dyrcona: I've seen that kind of thing when processing records with garbage characters too
11:10 csharp_ pretty sure we were able to find them in the text and replace/remove them
11:10 Dyrcona csharp_: Yeah. It happens a lot. I may just jam the records in anyway.
11:10 csharp_ abowling had to do some of that during our most recent library migration
11:11 Dyrcona I'm still waiting on a database reload before I do another run.
11:11 Dyrcona If it looks like single characters, I may just use sed or something like that on the file.
11:12 csharp_ yeah - pretty sure we had a cataloger load the file in MarcEdit at some point and do a simple find/replace
11:12 csharp_ that was back before I was more comfortable with regexes
11:13 Dyrcona So, I don't see the bad character message from MARC::File::USMARC because it builds the MARC::Record differently. It adds the fields as it finds them, does some checking of its own. My program reads the raw MARC from the file and does MARC::Record->new_from_usmarc().
11:14 Dyrcona Some of these look like "smart" quotes. Probably Windows-1252, again. :(
11:14 csharp_ yeppers
11:14 Dyrcona I wish people would learn that you can't just copy and paste into a MARC record.
11:14 csharp_ edit in Word, paste into MARC
11:15 csharp_ WYSIWY(don't actually)G
11:15 Dyrcona :)
11:15 Dyrcona Everyone should just use Unicode. It's 2022 already.
11:16 * Dyrcona uses en_US.UTF-8 on all his accounts.
11:18 csharp_ ǝpoɔıu∩ ƃuısn ǝq pןnoɥs ǝuoʎɹǝʌƎ
11:18 Dyrcona :)
11:19 berick heh
11:25 mmorgan csharp_++
11:26 mmorgan though my head is now spinning a bit :)
11:37 Dyrcona csharp_: Part of the point of what I'm testing is to keep cataloging staff from having to use MarcEdit to edit the 856s.... They say it's the most time-consuming step.
11:37 Dyrcona Since I've got a Perl program to do that part, I may just add code to clean up certain junk characters.
11:38 csharp_ good plan
11:39 Dyrcona Yes, I've assigned myself to look at Bmagic's record loader, but don't think I'll use it for this process, just yet: Bug1947898
11:39 Bmagic ty!
11:39 Bmagic Dyrcona++
11:39 Dyrcona Missed a space: Lp 1947898
11:39 pinesol Launchpad bug 1947898 in Evergreen "Enhanced MARC importer script electronic_marc_import.pl" [Wishlist,Confirmed] https://launchpad.net/bugs/1947898 - Assigned to Jason Stephenson (jstephenson)
11:54 Dyrcona The reload finished in time for me to start another test run before lunch, so here goes.
11:55 jihpringle joined #evergreen
12:33 Dyrcona Yeah, after looking up the Windows-1252 character set, I'm pretty sure that what I'm seeing are the Euro symbol (0x80), double dagger (0x87), and right single quotation mark (0x92).
12:34 Dyrcona There may be others, but I remember those three codes from the output.
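
The three byte values quoted above can be checked against the Windows-1252 table. A minimal Python sketch, not the Perl program under discussion; the only input is the three codes Dyrcona lists, and the variable names are illustrative:

```python
# Decode the three byte values quoted in the log as Windows-1252 (cp1252).
suspect_bytes = bytes([0x80, 0x87, 0x92])
decoded = suspect_bytes.decode("cp1252")

# Show which character each byte maps to.
for byte, char in zip(suspect_bytes, decoded):
    print(f"0x{byte:02X} -> U+{ord(char):04X} {char}")

# Re-encoding the decoded text produces valid UTF-8 bytes, which is
# roughly what a cp1252-to-UTF-8 conversion step would do to this data.
as_utf8 = decoded.encode("utf-8")
```

This only helps if the stray bytes really are Windows-1252; if the file mixes encodings, a blanket conversion could damage bytes that were already valid UTF-8.
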
12:36 Dyrcona Hmm. Must be an iconv module for Perl...
12:38 jvwoolf csharp_: That (or something close to that) was going to be my next step until the conversation here yesterday.
12:39 jvwoolf The thing is that these are templates that I built and nothing about them has changed, they just all of a sudden started timing out.
12:40 Dyrcona jvwoolf: Done a reindex or a vacuum on the database lately?
12:40 jvwoolf Yep, this weekend
12:40 Bmagic JBoyer: bugsquash is back in business. Had to replace the machine
12:40 Dyrcona Well, that's not it, then. :)
12:41 jvwoolf Dyrcona: Now, that you mention it, it was just a vacuum
12:41 Dyrcona And, on my character set issue, looks like Encode has a from_to function. Think I'll give that a whirl.
12:41 jvwoolf It's been a while since we reindexed
12:42 Dyrcona Well, reindex can take a long time, though IIRC you can reindex just 1 table. It should almost never be necessary.
12:43 Dyrcona Except when Pg or O/S upgrades break things, like character collation.... :)
12:43 mmorgan jvwoolf: All of a sudden after a system change or upgrade? Or just all of a sudden with no changes.
12:45 collum joined #evergreen
12:46 jvwoolf mmorgan: I'm trying to piece that together myself. We had call number reports time out sometimes prior to upgrading, but prior to our last major upgrade they started to time out every time.
12:46 csharp_ jvwoolf: I would definitely take a problem query, append it to EXPLAIN and see what it's trying to do - feel free to share via pastebin or https://explain.depesz.com/
12:47 jvwoolf We had both a minor upgrade and an OS upgrade within that time, so I'm trying to pin down if this started happening before or after either one of those.
12:47 jvwoolf Minor Evergreen upgrade, that is
12:47 Dyrcona jvwoolf: Did you upgrade to Ubuntu 20.04 perchance? That would definitely require a reindex in Pg.
12:47 csharp_ jvwoolf: tables can grow past a point where the indexes vs. sequential scan decision is made correctly by the planner
12:48 csharp_ but the planner doesn't "know" that so it works on bad statistics
12:48 Dyrcona And, that's also likely it. jvwoolf said she had 30 million? deleted call number rows.
12:48 jvwoolf Dyrcona: No, 18.04
12:48 csharp_ which should be fixable via ANALYZE but isn't always in my experience
12:49 csharp_ jvwoolf: also, which version of PG?
12:49 jvwoolf csharp_: 9.6
12:49 csharp_ ok, same here
12:49 Dyrcona For Ubuntu 20.04, there's this: https://elephanttamer.net/?p=61#brokenindexes
12:50 Dyrcona Also, upgrading to Pg 14, it's best to do a dump and restore, rather than in place upgrade, IIRC.
12:50 csharp_ was about to say that a dump/restore isn't a bad idea occasionally anyway, though we've done in-place for the last several
12:51 Dyrcona We're using Pg 10. Haven't noticed any real performance differences with Pg 9.6.
12:51 Dyrcona At Pg 11, most things start getting faster, but others seem to get hit.
12:52 Dyrcona We do a dump and restore at least whenever we get new hardware.
12:52 Dyrcona I do them weekly to keep some test databases up to date. Anyway, getting off topic.
12:54 Dyrcona Going back to my record load/character issues, the message isn't coming from MARC::Record, either. I've got a program to dump the 856s that uses the same code to read the records and it doesn't peep.
12:56 Dyrcona I wonder if I can just convert the raw MARC data or if I need to do it field by field....
13:04 Dyrcona Now, it's just looking like some garbage and not necessarily Windows 1252... Probably a mix.... :(
13:07 Dyrcona I think I know the source of the messages though. It looks like they're possibly coming from NFD() when the MARC is cleaned to go in the database. I can check that hypothesis real quick.
13:09 Dyrcona And, no. That's not it, either.
13:11 Dyrcona g0=ASCII_DEFAULT g1=EXTENDED_LATIN at /usr/local/share/perl/5.26.1/MARC/Charset.pm line 308, <GEN1> chunk 1752.
13:18 Dyrcona Still don't know where that's coming from. I'm not using MARC::Charset, and doesn't look like the modules that I use do either. It must be the database throwing that at me....
13:57 JBoyer Dyrcona, MARC::XML and friends are used in the ingest trigger functions, so depending on where you're actually seeing that message output the database is a likely source.
13:59 Dyrcona JBoyer: Yeah. It doesn't seem to be coming from anything running in my Perl code outside of the database, but I didn't think warnings from the database would just show up in my output.
14:00 Dyrcona Oh, never mind. They will because of the DBI options.
14:05 Dyrcona Hrm.... Doesn't show up in the postgresql logs, either..... Also, I'm using: {PrintError => 0, RaiseError => 1, AutoCommit => 1}
14:05 Dyrcona Puzzling evidence.
14:18 JBoyer I'm not sure what "stderr" translates to in a pg loaded process like those, which is where I assume MARC::Charset sends that.
14:23 Dyrcona Yeah...
14:26 * Dyrcona sings along with Sting: "Mais nous pouvons faire ce que nous voulons..."
14:30 Keith_isl joined #evergreen
14:57 collum joined #evergreen
15:00 Keith__isl joined #evergreen
15:34 Dyrcona Looking at the output in a file, some appear to be a multibyte sequence, but not something that is valid in UTF-8: 0x0080 0x009D.
15:40 Dyrcona Aight, so it is a mangled UTF-8 "smart quote." The sequence should probably be 3 bytes: \xe2\x80\x9d.
15:41 Dyrcona Ah, this character appears before it: â
15:43 Dyrcona Which is \xe2.....
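
The symptom described here, an "â" followed by 0x0080 0x009D, is what the three UTF-8 bytes of a right double quotation mark look like when each byte is read as its own character. A minimal Python sketch, assuming a Latin-1 style one-byte-per-character misreading; the log does not establish that this is what actually happened in MARC::Charset or the database:

```python
# The three bytes from the log, as they should appear in a UTF-8 file.
utf8_bytes = b"\xe2\x80\x9d"

# Read correctly, they are one character: U+201D RIGHT DOUBLE QUOTATION MARK.
correct = utf8_bytes.decode("utf-8")

# Read one byte per character (Latin-1), they become 'â' (U+00E2)
# followed by the C1 control characters U+0080 and U+009D.
misread = utf8_bytes.decode("latin-1")

# If text was misread this way, the damage is reversible: turn the
# characters back into their original bytes and decode as UTF-8.
repaired = misread.encode("latin-1").decode("utf-8")

print([f"U+{ord(c):04X}" for c in misread])
```
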
15:43 Dyrcona MARC::Charset is apparently not dealing with it correctly, or I need to update my Unicode support....
15:54 Dyrcona Ugh.... I somehow killed the program.... Not sure what I did.
16:00 Dyrcona gmcharlt: It looks like MARC::Charset doesn't handle \xe2\x80\x9d correctly.
16:02 Dyrcona I wonder what happens if I replace those in Perl with a "?
16:11 Dyrcona Also, if the file is UTF-8, and that is a valid UTF-8 sequence, what's MARC::Charset got to do with it?
16:12 Dyrcona I'ma just leave it alone and reload the file again tomorrow after a db reload.
16:13 mmorgan Going home and coming back again is always a good approach :)
16:14 Dyrcona Unfortunately, I am at home.
16:14 Dyrcona And, I'll be tomorrow, too. :)
16:15 mmorgan There's always turning it off and on again!
16:15 Dyrcona I don't think this should be my problem. Things actually look good for a change with the data. I was blaming the vendor, but it looks like MARC::Charset and/or Evergreen's use of it is at fault. Unless this one record isn't set to UTF-8 or something when it should be.
16:16 Dyrcona Also, I have little context for the warnings. I'd have to search the MARC for the offending text/codes.
16:16 Dyrcona No record number, which tells me this isn't coming from my code because all errors and warnings are logged with the record number from the file.
16:23 abowling joined #evergreen
16:24 abowling after a 3.7 update, links in 856 are no longer appearing in the opac. the library has custom templates, but i diffed the relevant ones and it seems nothing has changed. any ideas on what i might be misssing?
16:25 Dyrcona Is it the bootstrap OPAC? I think there's a bug for that.
16:26 abowling *besides including an s too many?
16:26 abowling Dyrcona: yes
16:26 abowling checking launchpad now...
16:26 Dyrcona abowling: Lp 1950394
16:26 pinesol Launchpad bug 1950394 in Evergreen 3.7 "Electronic resource links can fail to display in Bootstrap OPAC" [Medium,New] https://launchpad.net/bugs/1950394
16:27 abowling Dyrcona++
16:28 Dyrcona I'm not sure if Elaine's comment regards the patch (it sounds like it), or if she's saying that she doesn't see the bug.
16:29 Dyrcona I guess since I'm having fun loading resources in a test database, I could check that one out, too...
16:38 Dyrcona Alright, so, some of the records were not set to UTF-8 in the LDR, and I suspect that at least one of them was giving me this warning/error.
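
The leader check mentioned here can be done on raw MARC without parsing the records: the leader is the first 24 bytes of each record, and position 09 holds the character coding scheme ("a" for UCS/Unicode, blank for MARC-8). A minimal Python sketch, assuming ISO 2709 records separated by the 0x1D record terminator; the function names are hypothetical and not taken from any program in the log:

```python
RECORD_TERMINATOR = b"\x1d"  # ends every ISO 2709 / MARC 21 record


def split_records(data: bytes) -> list:
    """Split a raw MARC file into records, restoring each terminator."""
    return [
        chunk + RECORD_TERMINATOR
        for chunk in data.split(RECORD_TERMINATOR)
        if chunk.strip()  # skip the empty tail after the last terminator
    ]


def leader_declares_unicode(record: bytes) -> bool:
    """True when leader position 09 is 'a' (UCS/Unicode)."""
    return record[9:10] == b"a"


def non_unicode_record_numbers(data: bytes) -> list:
    """1-based positions of records whose leader does not declare Unicode."""
    return [
        number
        for number, record in enumerate(split_records(data), start=1)
        if not leader_declares_unicode(record)
    ]
```

Calling `non_unicode_record_numbers(open(path, "rb").read())` on the input file would list the records worth inspecting first. It reports only what the leader claims, not whether the bytes in the record are actually valid for that encoding.
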
16:39 abowling Dyrcona: thanks again! that fixed it.
16:40 Dyrcona abowling: Cool. If you could update the Lp bug, that would be great.
16:41 abowling update re: ehardy's comment, or just confirm that it works?
16:41 abowling the latter of which i will do, gladly
16:44 Dyrcona Yeah, just confirm it works. I'll take a look at it tomorrow.
16:47 jvwoolf left #evergreen
16:47 abowling Dyrcona: will do
17:04 mmorgan left #evergreen
18:00 pinesol News from qatests: Failed Installing Angular web client <http://testing.evergreen-ils.org/~live//archive/2022-02/2022-02-15_16:00:02/test.29.html>
18:23 jeff joined #evergreen
18:59 jonadab joined #evergreen