Time | Nick | Message
07:14 | | rjackson_isl_hom joined #evergreen
07:20 | | collum joined #evergreen
07:54 | | BDorsey joined #evergreen
08:38 | | mmorgan joined #evergreen
08:47 | | mantis1 joined #evergreen
09:10 | | Dyrcona joined #evergreen
09:40 | * Dyrcona | wishes grep had an option to change the "newline" character.
09:41 | Dyrcona | Or the line/record separator. I guess that's partly why awk exists.
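Perl's $/ variable (the input record separator) does what Dyrcona is wishing for here; a minimal sketch of grepping binary MARC records, assuming a file where each record ends with \x1E\x1D (the pattern and filename are hypothetical):

    # Treat each binary MARC record as one "line" and grep inside it;
    # $. then counts records instead of lines.
    perl -ne 'BEGIN { $/ = "\x1e\x1d" } print "record $. matches\n" if /\xc3\xb1/' records.mrc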
09:48 | Dyrcona | Turns out that they aren't copyright characters causing my issues. Looks like they're supposed to be n with tilde and e with acute accent; at least the majority seem to be those two.
09:50 | Dyrcona | However, they show up in my output as a combination of two other characters.
09:50 | Dyrcona | At least when I grep the rejected records converted to XML.
09:55 | Dyrcona | Yeahp.... \xC3\xA9 is rendered as A with circumflex plus the copyright symbol, when it should be e with acute.
09:56 | Dyrcona | This is not a problem with the input as far as I can tell, and may not be an issue until it hits the database.
10:00 | Dyrcona | I'm not finding any actually invalid Unicode sequences in the files.
10:03 | Dyrcona | Lowercase n with tilde is showing up in my console as the two single-byte characters represented by its component bytes. Something isn't handling multibyte UTF-8 properly somewhere.
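What Dyrcona describes is classic mojibake: each byte of a multibyte UTF-8 sequence interpreted as a standalone Latin-1 character. A minimal sketch of the effect in Perl:

    use strict;
    use warnings;
    use Encode qw(decode);

    my $bytes = "\xC3\xB1";                    # UTF-8 encoding of U+00F1, n with tilde
    my $right = decode('UTF-8', $bytes);       # one character
    my $wrong = decode('ISO-8859-1', $bytes);  # two characters: the mojibake pair
    printf "decoded as UTF-8: %d char(s); as Latin-1: %d char(s)\n",
        length($right), length($wrong);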
10:04 | Dyrcona | So, it's probably the load program that I had so much trouble getting Unicode to work with in the past few months.
10:09 | | jvwoolf joined #evergreen
10:15 | Dyrcona | Definitely coming from MARC::Record->new_from_usmarc() in the load program, regardless of whether or not I set the file handle to UTF-8.
10:22 | Dyrcona | MARC::Charset is up to date (1.35).
10:28 | Dyrcona | MARC::Charset shouldn't be involved, and it doesn't look like it is. The error is not coming directly from MARC::File::USMARC::decode() either.
10:29 | Dyrcona | I tried installing the latest Encode.pm, and it made no difference.
10:30 | Dyrcona | Oh, I may stand corrected: https://metacpan.org/module/MARC::File::USMARC/source#L172
10:31 | Dyrcona | Bingo! Source of my error, and it is Encode.pm or the version of Unicode available to Perl on my system.
10:32 | Dyrcona | https://metacpan.org/module/MARC::File::Encode/source#L35
10:39 | Dyrcona | Or, maybe it's just "The Unicode Bug..." :(
10:45 | Dyrcona | Looks like I might be able to avoid this by converting the records to MARCXML first.
10:53 | Dyrcona | Outside of a MARC context, I can't make decode crash on those characters.
11:16 | | BDorsey joined #evergreen
11:20 | Dyrcona | Well, it blows up elsewhere using MARC::File::XML: 2 :129: parser error : Input is not proper UTF-8, indicate encoding !
11:20 | Dyrcona | Bytes: 0xA9 0x22 0x20 0x69
11:21 | | jihpringle joined #evergreen
11:25 | Dyrcona | @monologue
11:25 | pinesol | Dyrcona: Your current monologue is at least 23 lines long.
11:39 | Dyrcona | Ha! I want a goto for a valid reason. I want a label outside my main loop that I can branch to when there is an error. Otherwise, I don't want to interfere with the flow.
11:42 | Dyrcona | Maybe I just need to change my loop to a do-while.
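Perl's loop labels already cover the branch-on-error case without goto; a minimal sketch, where $batch, process(), and load() are hypothetical stand-ins:

    RECORD: while ( my $record = $batch->next() ) {
        my $ok = eval { process($record); 1 };
        if ( !$ok ) {
            warn "skipping bad record: $@";
            next RECORD;    # jump straight to the next iteration, no goto needed
        }
        load($record);
    }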
11:46 | Dyrcona | Well, MARC::File::XML just makes it worse. More records get spit out that way.
11:52 | Dyrcona | I think the increase in errors comes from the records with invalid lengths and indicators getting mangled when converted to XML.
12:31 | csharp_ | Dyrcona: fwiw, I usually learn from your monologues :-)
12:31 | csharp_ | Dyrcona++
12:36 | mmorgan | Dyrcona++
12:37 | Dyrcona | Thanks!
12:41 | Dyrcona | I used yaz-marcdump to convert the binary MARC to XML, and that was after I had preprocessed the file from the vendor. So it could be that my preprocessor program is writing junk, but I can't find bad UTF-8 in it.
12:42 | Dyrcona | It just looks like when going through the MARC modules, Encode suddenly doesn't like otherwise valid \xC2 and \xC3 sequences.
12:56 | Dyrcona | Very interesting: If I use a program to split the binary file into records using \x1E\x1D as the input record separator and then run decode('UTF-8', $raw_record), I get no errors, so the issue is definitely coming from the MARC modules somehow.
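A minimal sketch of that record-splitting check, with a hypothetical filename; Encode::FB_CROAK makes decode() die loudly on the first bad sequence:

    use strict;
    use warnings;
    use Encode ();

    local $/ = "\x1E\x1D";    # MARC end-of-record: field terminator + record terminator
    open my $fh, '<:raw', 'preprocessed.mrc' or die $!;
    while ( my $raw_record = <$fh> ) {
        eval { Encode::decode('UTF-8', $raw_record, Encode::FB_CROAK); 1 }
            or warn "record $. failed to decode: $@";
    }
    close $fh;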
12:57 | Dyrcona | That's using the preprocessed file. I'll see what happens with the files directly from the vendor.
12:58 | Dyrcona | Ditto... Zero errors.
13:06 | | jihpringle joined #evergreen
13:13 | csharp_ | Dyrcona: I know you've been working on this for days and have probably ruled this out, but I've seen stupid stuff where the \XC2 literal characters were themselves mis-encoded somehow
13:14 | csharp_ | as in literally "\XC2" where one of those was some Unicode character that escaped notice
13:16 | Dyrcona | csharp_: I originally thought that was the problem, or rather a MARC-8 \xC2 that got into the UTF-8 data. \xC2 in MARC-8 is the circled-P (sound recording copyright) symbol, and \xC3 is the regular copyright symbol.
13:17 | Dyrcona | However, the input actually has valid UTF-8 sequences, and using Encode::decode on the raw data on a record-by-record basis does not output any errors. The errors come when MARC::File::USMARC::decode() is run on a record.
13:18 | csharp_ | ah
13:19 | Dyrcona | Hmm... I have another idea.....
13:23 | Dyrcona | I tried MARC::File::USMARC::decode() on the files from the vendor and there are no errors. When I run it on the preprocessed file, I get errors. So, my preprocessor must be doing something wrong, even though its output decodes as UTF-8 just like the raw input....
13:25 | Dyrcona | If I have to set the output stream to UTF-8, I'll be upset. I spent several days fiddling with that before, and I swear that I got it right.....
13:27 | Dyrcona | So, I'm already setting binmode on the output to :utf8. Maybe I should do :bytes or :raw?
13:32 | Dyrcona | I guess :raw isn't a thing....
13:33 | Dyrcona | We may have a winner. Using :bytes produced a smaller output file.
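So the apparent culprit: the preprocessor's output handle carried a :utf8 layer, which re-encodes data that is already UTF-8 bytes, hence the double encoding and the larger file. A minimal sketch of the distinction (filename hypothetical); note that :raw actually is a valid PerlIO layer and would also work here:

    # Wrong for already-encoded data: $raw holds UTF-8 *bytes*, and a :utf8
    # layer encodes each byte again, double-encoding everything above ASCII.
    open my $bad, '>:utf8', 'out.mrc' or die $!;

    # Right: pass the encoded bytes through untouched.
    open my $good, '>:bytes', 'out.mrc' or die $!;

    # Use :utf8/:encoding(UTF-8) only when printing *decoded* character strings.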
13:33 | Dyrcona | I get the same number of records.
13:35 | Dyrcona | Huzzah! USMARC::decode and Encode::decode both like all of the records now!
13:35 | Dyrcona | csharp_++ mmorgan++ #evergreen++ For putting up with me.
13:38 | Dyrcona | csharp_++ again for suggesting an encoding issue. Looks like my preprocessor was double-encoding some characters.
14:03 | csharp_ | Dyrcona: oh wow
14:11 | Dyrcona | This line in the perlunicode documentation is misleading: Use the ":encoding(...)" layer to read from and write to filehandles using the specified encoding.
14:11 | Dyrcona | I suspect it only applies if you're not manually decoding the data, which the MARC code does.
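A minimal sketch of the interaction suspected here: if the handle carries an :encoding(...) layer and the code then decodes again by hand, as the MARC modules do, the second decode() is fed character data instead of bytes and can mangle or die on anything non-ASCII (filename hypothetical):

    use Encode qw(decode);

    open my $fh, '<:encoding(UTF-8)', 'records.xml' or die $!;  # layer decodes once
    my $line = <$fh>;                    # already a character string, not bytes
    my $oops = decode('UTF-8', $line);   # decoding a second time garbles non-ASCII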
14:32 | jeffdavis | bug 1979345 adds a new permission to govern the hold pull list; is that an OK change to include in a point release, assuming there's a release note?
14:32 | pinesol | Launchpad bug 1979345 in Evergreen "Angular Holds Pull List Doesn't Scope" [Medium,Confirmed] https://launchpad.net/bugs/1979345
14:34 | csharp_ | I would probably wait until the next release
14:34 | csharp_ | unless the Ang list is so broken that we need it more urgently
14:34 | Dyrcona | I was typing something along the lines of what csharp_ said.
14:46 | jeffdavis | I wouldn't say it's broken, it just has no access controls. 3.8 added the ability to view any library's pull list (previously you could only view it for your workstation location).
14:46 | jeffdavis | We're going live with the new perm when we upgrade to 3.9 this weekend, I'll reconcile myself to having to renumber our permissions at some point. :)
14:48 | | rfrasur joined #evergreen
14:59 | Dyrcona | We've had to do things like that after upgrades before because we've backported things from future releases.
15:00 | Dyrcona | On the subject of MARC and encoding, just to complicate things: if you're pulling records from Evergreen via DBI as MARCXML and then converting them to USMARC to write to a file, you have to set the output stream to utf8 encoding, or you get errors reading the output file.
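A minimal sketch of that round trip, assuming MARC::File::XML is in use and $marcxml arrived from DBI as an already-decoded character string (filename and variable are hypothetical):

    use MARC::Record;
    use MARC::File::XML ( BinaryEncoding => 'utf8' );

    my $record = MARC::Record->new_from_xml($marcxml, 'UTF-8');
    # as_usmarc() here yields a character string, so the handle must encode:
    open my $out, '>:utf8', 'export.mrc' or die $!;
    print {$out} $record->as_usmarc();
    close $out;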
15:02 | Dyrcona | It's always fun to relearn things like this every 3 or so years. :)
15:10 | Dyrcona | jeffdavis: One thing that I often do is wait for the code to make it into master, then I cherry-pick the commits into my local branch so I have the correct id numbers and db upgrade codes. This makes generating the db upgrade script for future upgrades easier.
15:11 | jeffdavis | hm, and I guess there's nothing preventing that commit from going into master even if it doesn't get backported to 3.8/3.9
15:12 | Dyrcona | If I add it before that, I'll go back and cherry-pick the commits that changed the upgrade script name and assigned the upgrade number for the same reason.
15:13 | Dyrcona | Yes, that's true about going into master. Also, keep in mind you've only heard from two of us here. I'm not totally opposed to backporting permissions, particularly if the thing is broken without it or if it was an oversight in a new feature.
15:13 | Dyrcona | If the permission was something that someone thought would be nice to have later, then it's more of a feature than a bug fix to my mind.
15:49 | * mmorgan | reads up
15:51 | mmorgan | jeffdavis: Not an answer to your question, but the Holds Shelf has always allowed viewing the Shelves of other org units.
16:03 | jeffdavis | I think we ultimately want a perm to control that too, I'll have to check.
16:07 | | jvwoolf left #evergreen
16:14 | jihpringle | yes, if we haven't already, we're planning on submitting the hold shelf scope as a bug too
16:14 | jihpringle | it causes patron privacy issues for us
16:23 | mmorgan | jihpringle: Understood. Patron privacy is a priority for us also, but our libraries share patrons.
16:25 | mmorgan | Do you limit access to patrons in other parts of Evergreen with permissions? Like in Patron search?
16:27 | jihpringle | mmorgan: we do limit in other places, but a lot of it is based on opt-in boundaries through the library settings
16:28 | mmorgan | Ah. Ok.
16:30 | * mmorgan | was wondering about something like the depth of the VIEW_USER permission
16:38 | jihpringle | we have that perm set pretty high up our org tree, I think so that staff are able to see the patrons who've opted in from other libraries
16:39 | jihpringle | we want Library A to be able to see patron X from Library B that's opted into Library A, but we don't want Library A to be able to see the entire hold shelf or pull list for Library B
16:45 | jeffdavis | we do also use VIEW_USER depth to restrict access in some cases - for example, public library staff can't view users at post-secondary libraries
16:46 | mmorgan | jihpringle: Makes sense. A little too "in your face". We strongly discourage showing patron names on the pull list at all. We have default columns set for the pull list that exclude patron information.
16:46 | mmorgan | Can't prevent people from adding it though.
16:47 | jihpringle | ya, we try and discourage people from including patron info on the pull list, but as you said, we can't prevent people from adding it
16:52 | | mmorgan1 joined #evergreen
17:15 | | mmorgan1 left #evergreen
17:50 | | degraafk_ joined #evergreen
17:51 | | troy_ joined #evergreen
22:55 | | jeffdavis_ joined #evergreen
22:56 | | jmurray_isl joined #evergreen
23:04 | | akilsdonk joined #evergreen