Evergreen ILS Website

IRC log for #evergreen, 2022-08-08


All times shown according to the server's local time.

Time Nick Message
07:14 rjackson_isl_hom joined #evergreen
07:20 collum joined #evergreen
07:54 BDorsey joined #evergreen
08:38 mmorgan joined #evergreen
08:47 mantis1 joined #evergreen
09:10 Dyrcona joined #evergreen
09:40 * Dyrcona wishes grep had an option to change the "newline" character.
09:41 Dyrcona Or the line/record separator. I guess that's partly why awk exists.
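[The record-separator idea above can be sketched with awk's RS variable, which is the knob grep lacks. This is an illustrative example, not code from the log; it assumes a POSIX awk, and \035 (octal for 0x1D) is the MARC record terminator.]

```shell
# Sketch: awk's RS as the "record separator option grep doesn't have".
# \035 = 0x1D, the MARC record terminator (an assumption about the data).
printf 'first rec\035second rec\035third rec\035' |
  awk 'BEGIN { RS = "\035" } /second/ { print }'
```

[With RS set, each /pattern/ match prints a whole record instead of a newline-delimited line.]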
09:48 Dyrcona Turns out that they aren't copyright characters causing my issues. Looks like they're supposed to be n with tilde and e with acute accent; at least the majority seem to be those two.
09:50 Dyrcona However, they show up in my output as a combination of two other characters.
09:50 Dyrcona At least when I grep the rejected records converted to xml.
09:55 Dyrcona Yeahp.... \xC3\xA9 is rendered as A with circumflex, copyright symbol, when it should be e with acute.
09:56 Dyrcona This is not a problem with the input as far as I can tell, and may not be an issue until it hits the database.
10:00 Dyrcona I'm not finding any actually invalid Unicode sequences in the files.
10:03 Dyrcona Lowercase n with tilde is showing up in my console as the two single byte characters represented by its component bytes. Something isn't handling multibyte UTF8 properly somewhere.
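[The two-characters-per-letter symptom described above is classic mojibake: each byte of a multibyte UTF-8 sequence gets decoded as its own single-byte character. A minimal Python sketch of the effect (illustration only; the code under discussion in the log is Perl):]

```python
# A valid two-byte UTF-8 sequence, misread one byte at a time as
# Latin-1, turns into two characters -- the "component bytes" symptom.
raw = "é".encode("utf-8")          # b'\xc3\xa9'
mojibake = raw.decode("latin-1")   # each byte becomes its own character
print(mojibake)                    # Ã©
```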
10:04 Dyrcona So, it's probably the load program that I had so much trouble getting Unicode to work with in the past few months.
10:09 jvwoolf joined #evergreen
10:15 Dyrcona Definitely coming from MARC::Record->new_from_usmarc() in the load program, regardless of whether I set the file handle to UTF-8 or not.
10:22 Dyrcona MARC::Charset is up to date (1.35).
10:28 Dyrcona MARC::Charset shouldn't be involved and doesn't look like it is. The error is not coming directly from MARC::File::USMARC::decode() either.
10:29 Dyrcona I tried installing the latest Encode.pm and no difference.
10:30 Dyrcona Oh, I may stand corrected: https://metacpan.org/module/MARC::File::USMARC/source#L172
10:31 Dyrcona Bingo! Source of my error, and it is Encode.pm or the edition of Unicode available to Perl on my system.
10:32 Dyrcona https://metacpan.org/module/MARC::File::Encode/source#L35
10:39 Dyrcona Or, maybe it's just "The Unicode Bug..." :(
10:45 Dyrcona Looks like I might be able to avoid this by converting the records to MARCXML first.
10:53 Dyrcona Outside of a MARC context, I can't make decode crash on those characters.
11:16 BDorsey joined #evergreen
11:20 Dyrcona Well, it blows up elsewhere using MARC::File::XML: 2 :129: parser error : Input is not proper UTF-8, indicate encoding !
11:20 Dyrcona Bytes: 0xA9 0x22 0x20 0x69
11:21 jihpringle joined #evergreen
11:25 Dyrcona @monologue
11:25 pinesol Dyrcona: Your current monologue is at least 23 lines long.
11:39 Dyrcona Ha! I want a goto for a valid reason. I want a label outside my main loop that I can branch to when there is an error. Otherwise, I don't want to interfere with the flow.
11:42 Dyrcona Maybe I just need to change my loop to a do while.
11:46 Dyrcona Well, MARC::File::XML just makes it worse. More records get spit out that way.
11:52 Dyrcona I think the increase in errors comes from the records with invalid lengths and indicators getting mangled when converted to XML.
12:31 csharp_ Dyrcona: fwiw, I usually learn from your monologues :-)
12:31 csharp_ Dyrcona++
12:36 mmorgan Dyrcona++
12:37 Dyrcona Thanks!
12:41 Dyrcona I used yaz-marcdump to convert the binary MARC to XML, and that was after I had preprocessed the file from the vendor. So it could be that my preprocessor program is writing junk, but I can't find bad UTF-8 in it.
12:42 Dyrcona It just looks like when going through the MARC modules, Encode suddenly doesn't like otherwise valid \xC2 and \xC3 sequences.
12:56 Dyrcona Very interesting: If I use a program to split the binary file into records using \x1E\x1D as the input record separator and then run decode('UTF-8', $raw_record), I get no errors, so the issue is definitely coming from the MARC modules somehow.
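[The per-record check described above can be sketched in Python (a hypothetical helper for illustration, not the actual program from the log): split the raw file on the field/record terminator pair and decode each chunk independently.]

```python
TERMINATORS = b"\x1e\x1d"   # MARC field terminator + record terminator

def records_decode_cleanly(blob: bytes) -> bool:
    """Decode each raw record independently; True if none raise."""
    for chunk in blob.split(TERMINATORS):
        try:
            chunk.decode("utf-8")
        except UnicodeDecodeError:
            return False
    return True

# One well-formed record's worth of UTF-8 bytes (made-up sample data).
sample = "00024nam  título".encode("utf-8") + b"\x1e\x1d"
print(records_decode_cleanly(sample))   # True
```

[If this passes but a MARC-module decode of the same bytes fails, the problem is in the module layer, not the data, which is exactly the diagnosis above.]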
12:57 Dyrcona That's using the preprocessed file. I'll see what happens with the files directly from the vendor.
12:58 Dyrcona Ditto... Zero errors.
13:06 jihpringle joined #evergreen
13:13 csharp_ Dyrcona: I know you've been working on this for days and have probably ruled this out, but I've seen stupid stuff where the \xC2 literal characters were themselves mis-encoded somehow
13:14 csharp_ as in literally "\xC2" where one of those was some unicode character that escaped notice
13:16 Dyrcona csharp_: I originally thought that is what the problem was, or rather a MARC-8 \xC2 that got into the UTF-8 data. \xC2 in MARC-8 is the P-in-a-circle sound recording copyright symbol, and \xC3 is the regular copyright symbol.
13:17 Dyrcona However, the input actually has valid UTF-8 sequences and using Encode::decode on the raw data on a record by record basis does not output any errors. The errors come when MARC::File::USMARC::decode() is run on a record.
13:18 csharp_ ah
13:19 Dyrcona Hmm... I have another idea.....
13:23 Dyrcona I tried MARC::File::USMARC::decode() on the files from the vendor and there are no errors. When I run it on the preprocessed file, I get errors. So, my preprocessor must be doing something wrong, even though decode('UTF-8') likes its output just like the raw input....
13:25 Dyrcona If I have to set the output stream to UTF-8, I'll be upset. I spent several days fiddling with that before and I swear that I got it right.....
13:27 Dyrcona So, I'm already setting binmode on the output to :utf8. Maybe I should do :bytes or :raw?
13:32 Dyrcona I guess :raw isn't a thing....
13:33 Dyrcona We may have a winner. Using :bytes produced a smaller output file.
13:33 Dyrcona I get the same number of records.
13:35 Dyrcona Huzzah! USMARC::decode and Encode::decode both like all of the records, now!
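[The smaller file from :bytes fits the double-encoding arithmetic: bytes that are already UTF-8, pushed through another encode step, grow and turn into mojibake. A Python sketch of the same trap (illustration only; the log's program is Perl):]

```python
already_utf8 = "é".encode("utf-8")    # 2 bytes: C3 A9
# Treating those bytes as if they were characters and encoding
# again doubles them up -- the effect of a :utf8 output layer on
# data that was already encoded.
double_encoded = already_utf8.decode("latin-1").encode("utf-8")
print(len(already_utf8), len(double_encoded))   # 2 4
print(double_encoded.decode("utf-8"))           # Ã© -- mojibake, not é
```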
13:35 Dyrcona csharp_++ mmorgan++ #evergreen++ For putting up with me.
13:38 Dyrcona csharp_++ again for suggesting an encoding issue. Looks like my preprocessor was double encoding some characters.
14:03 csharp_ Dyrcona: oh wow
14:11 Dyrcona This line in the perlunicode documentation is misleading: Use the ":encoding(...)" layer  to read from and write to filehandles using the specified encoding.
14:11 Dyrcona I suspect it only applies if you're not manually decoding the data, which the MARC code does.
14:32 jeffdavis bug 1979345 adds a new permission to govern the hold pull list; is that an OK change to include in a point release, assuming there's a release note?
14:32 pinesol Launchpad bug 1979345 in Evergreen "Angular Holds Pull List Doesn't Scope" [Medium,Confirmed] https://launchpad.net/bugs/1979345
14:34 csharp_ I would probably wait until the next release
14:34 csharp_ unless the Ang list is so broken that we need it more urgently
14:34 Dyrcona I was typing something along the lines of what csharp_ said.
14:46 jeffdavis I wouldn't say it's broken, it just has no access controls. 3.8 added the ability to view any library's pull list (previously you could only view it for your workstation location).
14:46 jeffdavis We're going live with the new perm when we upgrade to 3.9 this weekend, I'll reconcile myself to having to renumber our permissions at some point. :)
14:48 rfrasur joined #evergreen
14:59 Dyrcona We've had to do things like that after upgrades before because we've backported things from future releases.
15:00 Dyrcona On the subject of MARC and encoding, just to complicate things: if you're pulling records from Evergreen via DBI as MARCXML and then converting them to USMARC to write to a file, you have to set the output stream to UTF-8 encoding, or you get errors reading the output file.
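[The two observations are consistent: an explicit encoding on the output handle is needed exactly when the in-memory data is decoded text (as after parsing MARCXML), and harmful when it is already-encoded bytes. A Python sketch of the text case (temp file and string are made up for illustration):]

```python
import os
import tempfile

text = "café"                     # decoded text, e.g. parsed out of MARCXML
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "w", encoding="utf-8") as fh:   # encoding layer required here
    fh.write(text)
with open(path, "rb") as fh:
    data = fh.read()
os.remove(path)
print(data == text.encode("utf-8"))             # True
```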
15:02 Dyrcona It's always fun to relearn things like this every 3 or so years. :)
15:10 Dyrcona jeffdavis: One thing that I often do is wait for the code to make it into master, then I cherry-pick the commits into my local branch so I have the correct id numbers and db upgrade codes. This makes generating the db upgrade script for future upgrades easier.
15:11 jeffdavis hm, and I guess there's nothing preventing that commit from going into master even if it doesn't get backported to 3.8/3.9
15:12 Dyrcona If I add it before that, I'll go back and cherry-pick the commits that changed the upgrade script name and assigned the upgrade number for the same reason.
15:13 Dyrcona Yes, that's true about going into master. Also, keep in mind you've only heard from two of us here. I'm not totally opposed to backporting permissions, particularly if the thing is broken without it or if it was an oversight in a new feature.
15:13 Dyrcona If the permission was something that someone thought would be nice to have later, then it's more of a feature than a bug fix to my mind.
15:49 * mmorgan reads up
15:51 mmorgan jeffdavis: Not an answer to your question, but the Holds Shelf has always allowed viewing the Shelves of other org units.
16:03 jeffdavis I think we ultimately want a perm to control that too, I'll have to check.
16:07 jvwoolf left #evergreen
16:14 jihpringle yes, if we haven't already we're planning on submitting the hold shelf scope as a bug too
16:14 jihpringle it causes patron privacy issues for us
16:23 mmorgan jihpringle: Understood. Patron privacy is a priority for us also, but our libraries share patrons.
16:25 mmorgan Do you limit access to patrons in other parts of Evergreen with permissions? Like in Patron search?
16:27 jihpringle mmorgan: we do limit in other places, but a lot of it is based on opt in boundaries through the library settings
16:28 mmorgan Ah. Ok.
16:30 * mmorgan was wondering about something like the depth of the VIEW_USER permission
16:38 jihpringle we have that perm set pretty high up our org tree, I think so that staff are able to see the patrons who've opted in from other libraries
16:39 jihpringle we want Library A to be able to see patron X from Library B that's opted into Library A, but we don't want Library A to be able to see the entire hold shelf or pull list for Library B
16:45 jeffdavis we do also use VIEW_USER depth to restrict access in some cases - for example, public library staff can't view users at post-secondary libraries
16:46 mmorgan jihpringle: Makes sense. A little too "In your face". We strongly discourage showing patron names on the pull list at all. We have default columns set for the pull list that exclude patron information.
16:46 mmorgan Can't prevent people from adding it though.
16:47 jihpringle ya, we try and discourage people from including patron info on the pull list but as you said we can't prevent people from adding it
16:52 mmorgan1 joined #evergreen
17:15 mmorgan1 left #evergreen
17:50 degraafk_ joined #evergreen
17:51 troy_ joined #evergreen
22:55 jeffdavis_ joined #evergreen
22:56 jmurray_isl joined #evergreen
23:04 akilsdonk joined #evergreen
