IRC log for #evergreen, 2022-03-01

All times shown according to the server's local time.

Time	Nick	Message
06:00	pinesol	News from qatests: Failed Running pgTAP tests <http://testing.evergreen-ils.org/~live//archive/2022-03/2022-03-01_04:00:02/test.42.html>
07:13		rjackson_isl_hom joined #evergreen
07:16		collum joined #evergreen
08:00		mantis joined #evergreen
08:01	JBoyer	^ A test about spaces around values was also testing whether or not URIs were deleted when records are changed. We don't do that anymore, and now the test is updated.
08:23	pinesol	News from commits: Update Test for LP1722827 After LP1482757 <https://git.evergreen-ils.org/?p=Evergreen.git;a=commitdiff;h=e9c6750312c9c1ba001119e918ac84f4bcd017a6>
08:33		mmorgan joined #evergreen
08:40	csharp_	JBoyer++
08:40		jvwoolf joined #evergreen
08:41	mmorgan	JBoyer++
08:43		Dyrcona joined #evergreen
08:47		JBoyer joined #evergreen
09:00		Keith-isl joined #evergreen
09:01		rfrasur joined #evergreen
09:09		mmorgan1 joined #evergreen
09:24	* Dyrcona	mumbles "smart quotes...."
09:27	Dyrcona	Also, just plain junk in these records.
09:33	Dyrcona	I wonder if we're using an outdated Unicode standard?--That "we" is meant to be vague, i.e. not necessarily Evergreen.
09:39	Dyrcona	So, it looks like the preprocessing script does something to the records. Maybe I need to tell MARC::Record not to mangle the characters, somehow?
09:56	Dyrcona	Or, maybe it doesn't.... Comparing dumps of the processed versus raw records, the relevant bits of the busted records look the same.
10:08	Dyrcona	I wonder if converting the records to marcxml will make a difference?
10:10	Dyrcona	So, do I convert them with yaz-marcdump or with MARC::File::XML?
10:15	Dyrcona	Right, so when I use yaz-marcdump to convert the input records to marcxml, my editor shows the characters correctly. For some reason, I think my editor is treating the dumps as latin-1, even when I tell it they are UTF-8. What I suspect is MARC::Record and friends are mangling the characters because I'm working with binary MARC.
10:16	Dyrcona	I have little proof, other than what I see in the files through my editor, and that the records get mangled by my preprocessor Perl program.
10:17	Dyrcona	I will adapt my preprocessor and loader to work with marcxml and see what happens.
10:18	Dyrcona	BTW, the input records say they are UTF-8 in the leader.
10:18	* Dyrcona	quacks.
10:19	Dyrcona	Hmm... Should I use MARC::File::XML on these records, or should I use LibXML? I can do what I want with either.....
10:23	* Dyrcona	should write a MARC mode for Emacs. It couldn't be that hard... :)
10:26	Dyrcona	@monologue
10:26	pinesol	Dyrcona: Your current monologue is at least 15 lines long.
10:28	Dyrcona	So, yeah, the MARC::Record code that I'm using is mangling the characters.
10:30	Dyrcona	When I tell Emacs to use UTF-8 with one of the files, I get this: "...encountered characters it couldn’t encode..." followed by a list of characters that won't paste into my IRC client.
10:30	Dyrcona	The error message is much more detailed.
10:31	Dyrcona	I open the original MARC file and the mangled characters show up correctly in Emacs.
10:31	Dyrcona	Proof!
10:55		rjackson_isl_hom joined #evergreen
11:01	Dyrcona	I wonder if the problem is how I'm reading the binary files. I open it via IO::File with the record separator set to \x1e\x1d because records with smart quotes would break with MARC::Batch or MARC::File::USMARC. After I get the raw MARC, I feed that to MARC::Record. I suspect that is where the breakage occurs.
11:01	Dyrcona	Could be that I need to decode the data before passing it to MARC::Record?
11:01		rjackson_isl_hom joined #evergreen
11:01	Dyrcona	Or would that be encode?
11:03	Dyrcona	Well, I can try it and see.
11:03	Dyrcona	That would be quicker than switching to MARCXML.
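
A rough sketch of the reading approach described at 11:01, written in Python rather than the Perl under discussion: split the binary file on the field terminator plus record terminator (\x1e\x1d) and hand each record on as undecoded bytes. The function name `read_raw_records` and the chunk size are hypothetical, not taken from the original script.

```python
import io

# Field terminator (0x1E) followed by record terminator (0x1D),
# the separator mentioned in the log.
RECORD_END = b"\x1e\x1d"


def read_raw_records(stream, chunk_size=4096):
    """Yield each raw record, terminator included, as undecoded bytes."""
    buffer = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Emit every complete record currently held in the buffer.
        while True:
            pos = buffer.find(RECORD_END)
            if pos < 0:
                break
            end = pos + len(RECORD_END)
            yield buffer[:end]
            buffer = buffer[end:]
    if buffer:
        # Trailing bytes without a terminator; passed along unchanged.
        yield buffer


# The stream must be opened in binary mode so no decoding happens on read.
sample = io.BytesIO(b"record one\x1e\x1drecord two\x1e\x1d")
records = list(read_raw_records(sample))
```

Keeping the data as bytes until a MARC parser sees it avoids making any encoding decision at read time, which is the step the 11:01 messages suspect.
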
11:03	Dyrcona	@monologue
11:03	pinesol	Dyrcona: Your current monologue is at least 26 lines long.
11:07	Dyrcona	Another clue: using decode on the raw input gives me several "wide character at" messages.
11:07	Dyrcona	I may need to set the encoding to UTF-8 with IO::File....
11:08	Dyrcona	So, decode on the input doesn't work, at least not without encode on the output.
11:09	Dyrcona	So, I'm already setting binmode on the input and output.
11:09	Dyrcona	Maybe that's my problem?
11:11	Dyrcona	Right. Removing the binmodes and doing the manual decode seems to work. I'm going to remove the decode to see what happens.
11:12	Dyrcona	Ok, so the decode is not necessary, either.
11:14	Dyrcona	Let's see what happens if I set the input file to utf-8 binmode. I suspect it was the output file that was the problem.
11:16	Dyrcona	Yeah, that still produces the correct output.
11:17	Dyrcona	Ok. Setting the output to utf-8, but not the input, also works. I think it's doing both that causes the problem, and if not, I'm being gaslighted.....
11:18	Dyrcona	No! Why does it work, now?
11:18	Dyrcona	What else changed?
11:19	Dyrcona	Oh, wait... was it just using the Encode module?
11:22	Dyrcona	I'm stumped.....
11:23	Dyrcona	facepalm
11:28	Dyrcona	All right, I take that facepalm back. It wasn't that obvious.... Encoding in Perl can be tricky....
11:31	Dyrcona	So, it is double encoding. Looks like I can specify encoding on either the input or output, but not both.
11:33	Dyrcona	Or, even not at all, since Perl defaults to UTF-8.
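
A small illustration of the double encoding named at 11:31, in Python rather than Perl and assuming the usual mechanism: bytes that are already UTF-8 get treated as one character per byte and are then encoded to UTF-8 a second time. The helper names `encode_twice` and `repair` are hypothetical.

```python
# A smart quote (U+2019), the kind of character mentioned earlier in the log.
original = "It\u2019s"
raw_bytes = original.encode("utf-8")  # what a UTF-8 MARC file holds on disk


def encode_twice(data: bytes) -> bytes:
    """Simulate double encoding of bytes that are already UTF-8."""
    # Step 1: misread each byte as its own character (Latin-1 is 1:1).
    as_chars = data.decode("latin-1")
    # Step 2: encode those characters as UTF-8 again.
    return as_chars.encode("utf-8")


def repair(data: bytes) -> str:
    """Undo one layer of double encoding, then decode normally."""
    return data.decode("utf-8").encode("latin-1").decode("utf-8")


mangled = encode_twice(raw_bytes)
restored = repair(mangled)
```

The three bytes of the smart quote become six in `mangled`, which is why an editor shows junk in place of the quote; applying the encoding step only once leaves `raw_bytes` intact.
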
11:33	Dyrcona	@monologue
11:33	pinesol	Dyrcona: Your current monologue is at least 45 lines long.
11:34	Dyrcona	Thanks, ducky!
11:34	Dyrcona	I should check the load script. Not sure it does any encoding.
11:38	Dyrcona	Also, TRAMP++ EMACS++ # I edited the file on the remote test server, committed it to the repo on that test server, then pushed it to the main repo, all using Emacs running on my laptop.... BTW, I know you can do that in vim, but most vim users don't know that. :)
11:41	Dyrcona	It would be nice if there was an easy way in git to apply a commit to a different, almost identical file. There probably is, and I just don't have the magic sauce.
11:41	Dyrcona	@monologue
11:41	pinesol	Dyrcona: Your current monologue is at least 50 lines long.
12:10		jihpringle joined #evergreen
12:45		collum joined #evergreen
17:04	mmorgan1	qexit
17:04	mmorgan1	Oops. Wrong window. Broke Dyrcona's monologue. :-(
17:04		mmorgan1 left #evergreen
17:18		jvwoolf left #evergreen
17:48		Keith_isl joined #evergreen
18:00	pinesol	News from qatests: Testing Success <http://testing.evergreen-ils.org/~live>
18:25		jihpringle joined #evergreen
20:25		JBoyer joined #evergreen
21:11		Keith-isl joined #evergreen