16:29 |
jeff |
if the normalization is causing problems only with your change attempt, and the problem is that the normalization makes it difficult to find the relevant records, could you change your approach to use mrfr (metabib.real_full_rec) to find *possibly* relevant records, then parse their marcxml to determine the non-normalized value in 347$b?
16:29 |
jeff |
(you probably already have that idea or better) |
16:30 |
Dyrcona |
jeff: It's pretty simple. I'm searching mrfr using the value string converted to a tsvector to find the records that I want to update. |
16:31 |
Dyrcona |
I'm using that string, pre-conversion, to look for a matching 347$b in the MARC::Record. It's the $subfield eq $str comparison that is failing, basically.
16:32 |
Dyrcona |
If I normalize $subfield, then it will match, but then I'll be stuffing a different, normalized value into the MARC, and our catalogers might not like that.
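A hedged sketch in Perl DBI of the lookup being described (the search value is hypothetical; stock metabib.real_full_rec columns are assumed): the full-text index finds candidate records despite normalization, and the raw 347$b can then be re-checked from each record's marcxml.

    # Hypothetical 347$b value being hunted; plainto_tsquery normalizes it
    # the same way the index_vector was normalized.
    my $str = 'audio file';
    my $ids = $dbh->selectcol_arrayref(q{
        SELECT DISTINCT record
          FROM metabib.real_full_rec
         WHERE tag = '347' AND subfield = 'b'
           AND index_vector @@ plainto_tsquery(?)
    }, undef, $str);
    # Then parse each record's marcxml and compare the un-normalized
    # subfield with $field->subfield('b') eq $str.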
16:33 |
Dyrcona |
Turns out, too, that it's only 49 records, so I may just ask the cataloging center to fix them. |
16:34 |
Dyrcona |
Now, I want to make t-shirts and wrist bands with "WWJSD" on them: What would Jon Skeet do? |
16:36 |
Dyrcona |
I'm tempted to stuff this code into my private scripts repo anyway. |
11:03 |
rhamby |
right |
11:11 |
Dyrcona |
Well, I like automating things because automated mistakes usually lend themselves to automated fixes. |
11:12 |
Dyrcona |
csharp_: I work with a project that includes a submodule. If you have questions, let me know. I might be able to help. |
11:13 |
Dyrcona |
So, I'm looking at MARC export, and I think it's a bug that it exports holdings in 852. I think it's supposed to be 952. |
11:31 |
Dyrcona |
Eh, maybe that isn't a bug. I should have looked at the full description again. ;) |
11:33 |
Dyrcona |
Hmm.. The query that I'm working on is going to be more complicated than I first thought... |
11:34 |
Dyrcona |
Or, maybe not. I probably don't have to include locations for deleted copies. |
11:35 |
Dyrcona |
Thanks, rubber ducky! |
11:44 |
mmorgan |
__(')< |
12:02 |
Dyrcona |
Wonder if I messed up, or if there are really that many copies: Record length of 2216371 is larger than the MARC spec allows (99999 bytes). at /usr/share/perl5/MARC/File/USMARC.pm line 314. |
12:03 |
Dyrcona |
mmorgan++ |
12:03 |
Dyrcona |
Guess I'll convert to XML and have a look after it finishes. |
12:05 |
Dyrcona |
Heh. I messed up my query.... |
12:50 |
Dyrcona |
Always fun when you get the submodule out of sync with the main code. :) |
12:52 |
Dyrcona |
Here's the project I'm talking about: https://github.com/Dyrcona/openfortigui |
12:54 |
Dyrcona |
Ugh. Looks like this code might be too slow to be useful. |
12:57 |
Dyrcona |
When it was just straight up dumping MARC, it took about 10 to 15 minutes to dump 48,000+ records. It has been running for about 45 minutes now and has only dumped 1,296 records with holdings. I should probably modify the main query to return the MARC and copy info.
13:05 |
Dyrcona |
Wonder if I can array_agg over an array_agg? |
13:22 |
Bmagic |
Dyrcona: I've got some perl that dumps records in parallel, I've seen it hit 300 records/second |
13:24 |
Bmagic |
IIRC, 8 threads. Mind you, it's not using the perl "threads" module because of Encode.pm. Instead it launches a system command and monitors a shared file on the fs
13:48 |
Dyrcona |
I don't bother with threads in Perl 5. I use fork. |
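For illustration, a minimal fork-based sketch using Parallel::ForkManager (one common way to manage forked workers; the batch list and worker sub are hypothetical):

    use Parallel::ForkManager;

    my $pm = Parallel::ForkManager->new(8);   # at most 8 children at once
    for my $batch (@batches) {
        $pm->start and next;     # parent: move on to the next batch
        export_records($batch);  # child: hypothetical per-batch worker
        $pm->finish;
    }
    $pm->wait_all_children;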
13:49 |
Dyrcona |
Anyway, I think I've got a solution. Rather than run this time-consuming query once for each record, I'll do it once for all records and make a hash table of the information per record id.
13:49 |
Dyrcona |
If I get the options right, I can probably have selectall_arrayref make the data structure for me. |
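A sketch of that idea (the query is a minimal stand-in for the real copy location/org unit query): DBI's Slice option returns each row as a hashref, and one pass builds the per-record hash table.

    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=evergreen', 'evergreen', '',
                           { RaiseError => 1 });

    # Slice => {} makes each row a hashref keyed by column name.
    my $rows = $dbh->selectall_arrayref(q{
        SELECT acn.record, acpl.name AS location, aou.name AS org_unit
          FROM asset.copy acp
          JOIN asset.call_number acn ON acn.id = acp.call_number
          JOIN asset.copy_location acpl ON acpl.id = acp.location
          JOIN actor.org_unit aou ON aou.id = acp.circ_lib
         WHERE NOT acp.deleted AND NOT acn.deleted
    }, { Slice => {} });

    # record id => arrayref of copy rows
    my %copies_by_record;
    push @{ $copies_by_record{ $_->{record} } }, $_ for @$rows;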
13:53 |
Dyrcona |
My program basically works like this: Get a list of bre.id using one of 3 queries. After that, loop through the array of ids and grab the MARC for each one. If this is a batch of deletes, set leader position 05 to 'd' and write to the binary output file. Otherwise, delete the 852 tags in the MARC, look up the copy location and org unit name for each copy, and add an 852 to the MARC for each.
13:53 |
Dyrcona |
Then, write it to the output file. |
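A sketch of that per-record step with MARC::Record (the 852 subfield assignments are illustrative, not the exact production mapping):

    use MARC::Record;
    use MARC::Field;

    sub process_record {
        my ($marc, $is_delete, $copies) = @_;
        if ($is_delete) {
            my $leader = $marc->leader();
            substr($leader, 5, 1) = 'd';    # leader 05 = record status
            $marc->leader($leader);
        } else {
            $marc->delete_fields($marc->field('852'));
            for my $copy (@$copies) {
                $marc->append_fields(MARC::Field->new('852', ' ', ' ',
                    a => $copy->{org_unit}, c => $copy->{location}));
            }
        }
        return $marc->as_usmarc();  # binary MARC for the output file
    }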
13:54 |
Dyrcona |
It got really slow when I added the copy location/org_unit query. |
14:02 |
Bmagic |
I had some of the same challenges |
13:26 |
Bmagic |
small ones tend to load completely. But this interface is still Dojo, so I don't imagine anyone is interested until it's Angular |
13:41 |
pinesol |
[opensrf|kenstir] Fix LP#1883169 by using growing_buffer - <http://git.evergreen-ils.org/?p=OpenSRF.git;a=commit;h=a3368f9> |
13:42 |
jvwoolf |
Interestingly, we had one with 6 items fail to load |
13:44 |
Dyrcona |
jvwoolf: How big is the MARC associated with those? |
13:45 |
Dyrcona |
I'm pretty sure there is some MARC pulled over as well, though I might be thinking of something else. |
13:45 |
Dyrcona |
Oops! |
13:45 |
|
jvwoolf joined #evergreen |
13:46 |
Dyrcona |
jvwoolf: Check the public irc log. I think you might have missed something that I said. |
13:47 |
jvwoolf |
Dyrcona: Will do, my client keeps crashing today :( |
13:48 |
jvwoolf |
Cataloger says the MARC is "nothing out of the ordinary" |
13:52 |
Dyrcona |
Default max stanza size on Ubuntu 18.04 is 64K, IIRC, so if the MARC records are around 10K each, or even just a few of them are, there you go.
13:58 |
jvwoolf |
Dyrcona: Fair enough |
14:00 |
|
jvwoolf joined #evergreen |
12:54 |
berick |
Dyrcona: it should pull from settings, but you can also override with --tempdir |
12:54 |
|
collum joined #evergreen |
13:01 |
Dyrcona |
berick: I see that, and I'm not sure that's my problem. The file in /tmp may or may not be related. It could be some Net::Server thing. |
13:04 |
Dyrcona |
Yeah, the 0 byte file in /tmp seems to get dropped when I kill the marc stream importer and then shows up when I start it. |
13:05 |
Dyrcona |
I think I remember reading in the Net::Server docs somewhere that something required Socket::Linux, so I've installed that module via apt packages. We'll see if that makes a difference.
13:09 |
Dyrcona |
ah ha. Maybe this is the problem: opensrf.settings.host_config.get training.cwmars.org, |
13:14 |
|
mixo joined #evergreen |
13:16 |
mixo |
thank you for the help
13:18 |
Dyrcona |
So, that opensrf.settings.host_config.get call seems to be coming from SettingsClient, but the subsequent opensrf.settings.default_config.get call does not appear to be logged, so I don't think getting the settings is the problem.
13:18 |
Dyrcona |
mixo: On behalf of those who helped you, "You're most welcome!" |
13:22 |
Dyrcona |
berick: Can I just throw a binary marc file at the port to test it? (It looks like it, but thought I'd ask.) |
13:33 |
Dyrcona |
Well, I tried lobbing a MARC record at it using netcat and nothing happened.
13:36 |
Dyrcona |
Interesting, it actually got something in my case: Sep 14 13:27:23 training /openils/bin/marc_stream_importer.pl: [INFO:5664:marc_stream_importer.pl:449:163163899556641] stream parser read 1603 bytes |
13:37 |
Dyrcona |
Whatever happens after that, I don't see authentication or vandelay calls in the logs, and yes, I'm using syslog for everything.
13:48 |
Dyrcona |
So, it is failing to create the temp file, but that failure is not logged. That's what it looks like right now. |
08:52 |
Dyrcona |
I'll run my output through yaz-marcdump and see what it says. Most of the length errors seem to involve records with multibyte characters, and we say the record is two to four bytes longer than the vendor does. I suspect that they might not be counting lengths correctly.
08:53 |
Dyrcona |
I know we do have a bunch of garbage records with bad indicators and other junk. |
08:55 |
|
mmorgan joined #evergreen |
08:58 |
Dyrcona |
I've also asked this vendor if they can accept records as UTF-8 and not MARC-8. |
09:04 |
Bmagic |
JBoyer++ |
09:07 |
Dyrcona |
yaz-marcdump's messages are next to useless. |
09:08 |
Dyrcona |
"Separator but not at end of field...." |
13:01 |
Dyrcona |
jeffdavis: What Eg version? |
13:01 |
jeffdavis |
3.7 beta-ish |
13:02 |
Dyrcona |
Ok. I've not seen that on 3.5.3, but we may also have very different Z39.50 use patterns. |
13:09 |
Dyrcona |
I wonder if I should open this bug on Evergreen or on marc-perl on github? I've got a bib that consistently produces a record with a bad length when exported in MARC8. |
13:10 |
jeffdavis |
bug 1940698 for the Z39.50 thing |
13:10 |
pinesol |
Launchpad bug 1940698 in Evergreen "Duplicate open-ils.search.z3950.search_class calls lead to drone exhaustion" [Undecided,New] https://launchpad.net/bugs/1940698 |
13:17 |
Dyrcona |
FYI: I just used the --pipe option on marc_export in production and it did exactly what I expected: Lp 1940662. |
13:00 |
tlittle |
terranm++ |
14:28 |
Bmagic |
Does Evergreen have a way to include the "call numbers" that are in the MARC into the "expert search" -> Call number? |
14:29 |
Bmagic |
090a and 090b and 099a |
14:29 |
Dyrcona |
Bmagic: You can search by MARC field in advanced search, though I know that's not what you're asking. |
14:30 |
Bmagic |
sure, I'm aware of the tag searching. This question is specifically about the "Call Number" search
14:30 |
Dyrcona |
Yeahp. You'd have to write some custom code to do what you're asking. |
14:31 |
Bmagic |
That search is searching asset.call_number right? |
14:36 |
Bmagic |
That's what I'm coming up with too. Dyrcona++ |
14:37 |
Bmagic |
I thought I understood this cataloger to say that this used* to work. But I don't think it ever could have |
14:37 |
JBoyer |
Heads up, Dev meeting is scheduled to be in ~30 minutes. There's nothing but placeholders and LP updates currently; if there's nothing else anyone wants to discuss I'd recommend we give this month a pass.
14:38 |
Dyrcona |
Bmagic: Yeah, I don't think that ever worked. The only way to search those fields is the MARC expert search.
14:38 |
Bmagic |
TY, thanks for confirming |
14:38 |
Dyrcona |
You could add an index for those fields and add it as an advanced search option, but it would still only search the MARC, not asset.call_number and MARC.
14:39 |
Dyrcona |
You'd have to modify the backend somewhere to search both. |
14:39 |
Dyrcona |
JBoyer: I'm cool with having the meeting or skipping it. |
14:39 |
Bmagic |
JBoyer: skipping is fine with me as well |
12:36 |
rhamby |
I usually scan for that and have it print them so I can visually eyeball them and spot problems pretty fast
12:37 |
Dyrcona |
Bmagic: I did a quick perusal of my scripts and I don't see anything like what you mentioned. |
12:37 |
|
collum joined #evergreen |
12:38 |
Dyrcona |
The real problem is when 1 MARC file contains text in different encodings, or worse Windows-1252 with smart quotes. |
12:38 |
rhamby |
yeah, that's why my solution does a decent job of finding the issues |
12:39 |
rhamby |
note that breaking strings into arrays and scanning with ord can be really, really slow on big files, but I cheat and usually do just titles and authors, and that is usually a good indicator of whether I need to dig deeper
12:39 |
rhamby |
and yeah, the declaration of whether the file is marc8 vs unicode is usually more a hopeful statement than anything factual
12:39 |
Dyrcona |
I've used python chardet with non-marc data. It could be used on a field by field basis with pymarc. |
12:40 |
Dyrcona |
Unicode is often spelt ISO8859-X (where X is a number). :) |
12:42 |
Dyrcona |
That should be misspelt, not spelt. :) |
12:48 |
Dyrcona |
I do have a script that spits out the record warnings and the encoding as understood by MARC::Record. |
12:49 |
|
collum joined #evergreen |
12:50 |
Dyrcona |
Bmagic: https://pastebin.com/Pef0KLeL |
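A guess at the shape of such a script (the pastebin above may differ): MARC::Batch with strict mode off, printing each record's warnings and its declared encoding.

    use MARC::Batch;

    my $batch = MARC::Batch->new('USMARC', $ARGV[0]);
    $batch->strict_off();   # keep going past bad records
    while (my $record = $batch->next()) {
        my @warnings = $record->warnings();
        # encoding() reports 'MARC-8' or 'UTF-8' based on leader/09
        print join(' | ', $record->encoding(),
                   @warnings ? @warnings : 'no warnings'), "\n";
    }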
13:00 |
|
sandbergja joined #evergreen |
15:47 |
Bmagic |
correction "$file = MARC::File::USMARC->in($filename)" |
15:48 |
Dyrcona |
Are you having any particular problems other than encoding? |
15:49 |
Bmagic |
I'm thinking of potential issues with the way this script reads the records to make it more "compatible" for the masses |
15:50 |
Dyrcona |
My script reads the records that way because a) that is how MARC::File::USMARC->in() does it, and b) when you have "smart quotes" pasted into a field, you actually need to split records on \x1E\x1D because \x1D is in the smart quote sequence.
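A sketch of that splitting approach ($filename is hypothetical): every record ends with the field terminator plus the record terminator, so splitting on the pair keeps a stray \x1D inside mangled smart quote bytes from ending a record early.

    local $/ = "\x1E\x1D";   # Perl input record separator
    open my $fh, '<:raw', $filename or die "open $filename: $!";
    while (my $raw = <$fh>) {
        next unless length($raw) > 2;
        my $record = eval { MARC::Record->new_from_usmarc($raw) };
        if ($@) { warn "skipping bad record: $@"; next; }
        # ... work with $record ...
    }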
15:51 |
Dyrcona |
If you're having encoding issues with some records, I'd suggest trying pymarc and chardet to go over each field. You can then convert the data field by field if necessary. |
15:51 |
Bmagic |
wow, I guess the question is: should I write this to support "smart quotes" |
15:52 |
Bmagic |
maybe the contribution needs to land in MARC::File instead of my script? |
15:52 |
Dyrcona |
MARC::File may already work with smart quotes. I know that tsbere opened a ticket on CPAN about it. I don't know if his patch ever went in.
15:53 |
Dyrcona |
gmcharlt should know, as I think he is one of the maintainers of MARC::File.
15:55 |
gmcharlt |
I should check the patch queue, but yeah, at the moment smart quotes would break MARC::File::USMARC's expectations |
15:56 |
Dyrcona |
I was just looking at rt.cpan.org. |
15:56 |
Dyrcona |
I couldn't find the bug report. |
16:01 |
Dyrcona |
Well, it's not reported by tsbere, but here it is: https://rt.cpan.org/Ticket/Display.html?id=70169 |
16:01 |
Dyrcona |
Looks like rt.cpan.org is being shut down.
16:01 |
gmcharlt |
I'll take (another) look |
16:03 |
Dyrcona |
gmcharlt: If you want a patch, I could probably provide one. I recall tsbere writing one for this. Maybe it was for MARC::Batch?
16:03 |
Bmagic |
gmcharlt's comments from 2011 on that ticket are great: "not that encouraging such sloppy MARC records is a good idea. :)" |
16:03 |
gmcharlt |
Dyrcona: sure, happy to take a patch from you |
16:05 |
Bmagic |
just to be clear: I don't need to do anything special when I pass a UTF8 or a MARC8 or a MARC21 file into MARC::File::USMARC ? |
16:05 |
Dyrcona |
Ha! I thought rt.cpan.org would be closed by now, but the latest bug on MARC::File is 20 minutes old, and it's spam.
16:06 |
Bmagic |
MARC::File::USMARC does all the work for me? Figuring out which character set to use and whatnot? |
16:06 |
Dyrcona |
Bmagic: Usually, yes. |
16:06 |
Dyrcona |
If the encoding is set correctly in the file. |
16:07 |
Dyrcona |
What are you actually trying to do? Load records from a migration/new library, a vendor? |
16:15 |
Dyrcona |
gmcharlt: Is there a git repository for MARC::Record & company?
16:16 |
gmcharlt |
Dyrcona: yeah: https://github.com/perl4lib/marc-perl |
16:17 |
Dyrcona |
Cool. I'll make an issue and pull request there.
16:18 |
Bmagic |
Dyrcona: This sprang from the electronic_bib_import.pl work I'm doing |
16:19 |
Bmagic |
answer: probably not migration, but yes on the vendor |
16:21 |
Dyrcona |
Bmagic: USMARC records should be in MARC8 unless the leader (position 09) says UTF-8. Trouble is, I've seen just about anything in actual MARC records, and it is difficult to tell at run time.
16:24 |
Dyrcona |
For the logs and anyone else following along at home: It turns out that tsbere made a pull request on github for the issue, but his code breaks the tests. I'll take that up and see if I can fix it so it doesn't break the tests. |
16:27 |
Dyrcona |
Also, for Bmagic, and those following along, generally, the only way to detect the encoding of a MARC record that says it is MARC8 is to assume it is MARC8, convert it to UTF-8 for Evergreen, and catch any errors. That's another reason why I often read the records individually and convert them to MARC::Record inside an eval, so that one or two bad records don't spoil the whole batch.
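A sketch of the convert-and-catch step (whether a bad byte dies or returns a mangled/undef result depends on MARC::Charset's error settings, so check both):

    use MARC::Charset qw(marc8_to_utf8);

    my $utf8 = eval { marc8_to_utf8($marc8_string) };
    if ($@ or !defined $utf8) {
        warn "record is probably not really MARC8";
    }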
16:30 |
|
jihpringle joined #evergreen |
16:32 |
jeff |
indication of marc8 vs utf8 is at the record level, not the file level, right? |
16:33 |
Dyrcona |
jeff: Yes. |
09:50 |
bshum |
For same reasons |
09:50 |
Dyrcona |
Bmagic: How big are these invoices, i.e. # of line items? |
09:51 |
Bmagic |
60 or 70 lines |
09:51 |
Dyrcona |
Oh, and I guess that includes a chunk of MARC for each one, so if the records are detailed..... |
09:51 |
bshum |
Yep |
09:52 |
Dyrcona |
Assuming 15K per entry, you're looking at about 1MB to retrieve all that in one go.
09:53 |
Bmagic |
I solved the stanza size issue with a config tweak to ejabberd. But this other thing about incompletely loading the UI is a different thing. An issue I've seen for many months, maybe years, but shrugged off because the interface is still Dojo
12:36 |
Dyrcona |
Time is hard. |
12:43 |
Dyrcona |
When I add dates to milestones on Launchpad, I sometimes have to change the day because they can be off by a day. |
12:46 |
* Dyrcona |
wonders how difficult it would be for "Use Now" on workstation registration to just work without having to log in again. |
12:50 |
Dyrcona |
So, I just tried using MARC batch import with master, and I get upload progress 100%, but enqueue progress is 0%. There's also nothing in the /tmp directory AFAICT. |
12:51 |
jeff |
have to teach open-ils.auth how to assign a workstation to a workstationless login and change type. Might also violate some other assumptions. |
12:51 |
jeff |
Dyrcona: is your web server running with private temp, and you're using /tmp as the queue location? You'll need to either change to a different dir or stop using private tmp for the apache service. |
12:52 |
jeff |
(many public services like apache are launched by systemd with private /tmp by default on a lot of systems now) |
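For reference, one way to opt a unit out of private /tmp on a systemd host (a sketch; apache2 is the Debian/Ubuntu unit name):

    # /etc/systemd/system/apache2.service.d/override.conf
    [Service]
    PrivateTmp=false

    # then: systemctl daemon-reload && systemctl restart apache2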
13:03 |
jeff |
apache's /tmp is not your /tmp :-) |
13:04 |
jeffdavis |
I believe private /tmp is a change between Ubuntu 16.04 and 18.04 fwiw |
13:07 |
Dyrcona |
Well, I hadn't noticed because I don't use it much on development, and we've been using /openils/var/tmp mounted via NFS in production for years. |
13:08 |
Dyrcona |
So, I'm actually testing a security bug and trying to import a MARC record to trigger it, but the record won't import. |
13:08 |
Dyrcona |
When I select it and hit Import Selected Records, the screen goes back to the main Import view, and the record does NOT end up in biblio.record_entry. |
13:09 |
Dyrcona |
FWIW, I have no idea what I'm doing in the staff client, particularly the Angular interface. |
13:28 |
Dyrcona |
So, maybe Vandelay is broken in master on Ubuntu 20.04? |
09:42 |
Dyrcona |
Either space or -. |
09:44 |
Bmagic |
alright - thems the breaks I guess |
09:44 |
Dyrcona |
We have a Genre entry. |
09:46 |
Dyrcona |
Put this in xpath: //marc:datafield[@tag='600']
09:47 |
Dyrcona |
Put this in browse_xpath and display_xpath: //*[local-name()='subfield' and contains('abcdfgklnpstuvxyz',@code)] |
09:48 |
Dyrcona |
Then whatever you want for the joiner, I'd recommend '-- '. NOTE the space after --. |
09:48 |
Dyrcona |
Try that and see if it's close.
10:25 |
* miker |
looks up |
10:25 |
Dyrcona |
Oh no, never mind. xpath syntax sucks. |
10:25 |
Bmagic |
xpath_syntax-- |
10:26 |
Dyrcona |
I think you should remove this from the xpath field: and marc:subfield[@code="a"] and marc:subfield[@code="d"]
10:26 |
Bmagic |
I threw that in there later. It was* identical to your suggestion above |
10:27 |
miker |
if you have a special-purpose cmf there's no reason you couldn't create an xslt to do something special |
10:28 |
Bmagic |
It sounds like we can't get subfield a+b and the rest of them joined with dashes - which is probably ok. At this point, I am trying to figure out why the index STILL has dashes with the joiner set to null
11:06 |
berick |
yeah, Wilmington got a nice blast. |
11:07 |
berick |
i watched some Outer Banks webcams this morning and it's just a grey windy mess |
11:07 |
Dyrcona |
Yeah.... |
11:08 |
Dyrcona |
2019-09-06 10:32:53 bd1-bh4 open-ils.vandelay: [ERR :3810:Vandelay.pm:272:1567780218267323] unable to read MARC file |
11:08 |
berick |
Dyrcona: that's likely the client_max_body_size issue |
11:08 |
Dyrcona |
Was typing the question! |
11:08 |
Dyrcona |
berick++ |
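client_max_body_size is an nginx directive; the usual fix is something like this in the nginx config that proxies Evergreen (the size is illustrative):

    # in the http, server, or location block of the proxy config
    client_max_body_size 50m;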
15:30 |
csharp |
the problem didn't exist in the new ng staff catalog when I tested that on one of the "problem" records |
15:30 |
csharp |
but does in the AngJS version on current-ish master |
15:33 |
Dyrcona |
csharp: Maybe that's what they were trying to explain to me yesterday. I was told it happens with tags that have a second indicator. |
15:34 |
Dyrcona |
I looked at the MARC edit view and not that one; let me look at the bibs they sent me again.
15:34 |
Dyrcona |
I also was not sent screen shots. |
15:35 |
Dyrcona |
Oh, yeah. They also added this as a comment on a totally unrelated ticket, though I guess to the staff it could seem related when you don't know how it all works. |
15:38 |
Dyrcona |
Yeahp. I see it with one of the two sample records but not the other. |
10:13 |
|
collum joined #evergreen |
10:59 |
|
khuckins joined #evergreen |
11:10 |
|
khaun joined #evergreen |
11:27 |
Dyrcona |
Seems like after each Evergreen release messing with MARC records in the database gets slower. |
11:28 |
Dyrcona |
It's gotten to the point where it would be more efficient for staff to load records via Vandelay than using my Perl DBI scripts. |
11:40 |
csharp |
our match sets make MARC loading through vandelay ugly and super slow too
11:43 |
|
sandbergja joined #evergreen |
11:54 |
dbs |
Also, I thought most of our vendors still use FTP or the like, rather than encrypted transfer methods... |
11:57 |
|
sandbergja_ joined #evergreen |
12:01 |
dbs |
Dyrcona: hrm, how do we spin that as a feature in the release notes? "Indexing now 5% slower!" |
12:02 |
Dyrcona |
dbs: Pretty much. Last time I tried timing things, it took 2 seconds to update a MARC record. |
12:03 |
dbs |
There was a lot of wisdom to the slim MODS approach for indexing, but I guess the demands for incredibly fine-grained search pushed us in a different direction |
12:03 |
|
jihpringle joined #evergreen |
12:03 |
dbs |
Also maybe complex Perl inside the database... |
12:08 |
Dyrcona |
"FTP was fine in 1999!" (No, it wasn't, but stil....) |
12:08 |
dbs |
Folio went with a MARC-centric approach to indexing and display and are now thinking about how to integrate BIBFRAME in parallel; their path will likely lead to a common intermediate format as well |
12:08 |
dbs |
Dyrcona++ |
12:09 |
* Dyrcona |
begins to suspect that MARC is part of the problem. |
12:09 |
csharp |
@quote add * Dyrcona begins to suspect that MARC is part of the problem. |
12:09 |
pinesol |
csharp: The operation succeeded. Quote #198 added. |
12:09 |
dbs |
(I believe slim MODS was supposed to be an intermediate format for Evergreen too but we've hard-coded MARC into frontend, middle layers, and backend all over now) |
08:44 |
|
aabbee joined #evergreen |
09:05 |
|
sandbergja joined #evergreen |
09:22 |
|
_bott_ joined #evergreen |
09:24 |
Dyrcona |
So, I'm being asked to make an Opac Icon Format and Search Format from an RDA field value. Can that even be done? The only examples I've seen so far come from MARC fixed fields.
09:28 |
Dyrcona |
I guess that is possible.... |
09:28 |
JBoyer |
You can do it, I've added a bunch here based on the 753 (for gaming systems) |
09:29 |
Dyrcona |
JBoyer: Yeah, I just found it in the documentation, which tells me how to do it in the client. I would rather have something that I can do in the database, but I can always set it up on a test server and extract the new table entries.
09:31 |
JBoyer |
Getting them to actually show up will pretty much require tearing down and rebooting all of your memcache servers though, so that's a lot of fun. |
09:31 |
nfBurton |
Yup mine have sometimes taken 2 weeks to show |
09:32 |
nfBurton |
And no one seems to know why
09:32 |
Dyrcona |
JBoyer: I've set them up by hand before, but I don't recall if I used arbitrary MARC fields in them that way.
09:32 |
Dyrcona |
JBoyer: If I can set this up today, I can do it during the upgrade when I'm rebooting memcached anyway. :) |
09:33 |
JBoyer |
Dyrcona++ |
09:33 |
Dyrcona |
Is something supposed to happen when you click "Save" in the Coded Value Maps interface? |
15:31 |
makohund |
A bit better at handling large batches? |
15:35 |
Dyrcona |
Yes. You can run an arbitrary number of records through it without having to do smaller batches through Vandelay. |
15:36 |
Dyrcona |
Things may have improved in Vandelay recently, but we used to not be able to load tens of thousands of records at once in Vandelay, at least not in a manner reasonable for the cataloging staff.
15:38 |
Dyrcona |
I'd put the MARC file on a utility vm, and then schedule the program to run sometime at night. |
15:40 |
makohund |
Cool, thank you. Will have to take a bit of time to suss it out (I don't know the MARC functions very well), but bypassing chunking it up for Vandelay's sake sounds great to me. |
15:41 |
Dyrcona |
You'll notice it doesn't actually use the MARC modules to parse the file. That's because "smart quotes" will break the MARC parsers in Perl.
15:42 |
Dyrcona |
And, yes, you will eventually find MARC records with smart quotes in them. Copy and paste.... |
15:45 |
makohund |
Ack. Sounds like they need an "eat smart quotes, spit out regular quotes" filter of sorts. |
15:46 |
jonadab |
That would in principle not be a difficult regular expression. |
15:46 |
Dyrcona |
Well, the record terminator character in MARC, 0x1D, is apparently part of a Windows smart quote byte sequence.
15:46 |
jonadab |
Oh. |
15:46 |
jonadab |
Eww. |
15:46 |
Dyrcona |
bleh.... reading and typing at the same time.... :) |
15:47 |
Dyrcona |
That's why I set the Perl record separator to the combination of the stop and start characters for the record. I think tsbere opened a bug on this for MARC::Record and/or MARC::Batch on CPAN some years ago.
15:47 |
makohund |
Wow... what an ugly coincidence. |
15:51 |
makohund |
I'd wondered previously if Perl might not be a better route for messing with the incoming 856s & $9s than MarcEdit. Any thoughts on that before I investigate either option?
15:53 |
Dyrcona |
Well, it could be, but you'd have to lay out what you want to add somehow.
10:51 |
Dyrcona |
harmless wrong tab. :) |
10:52 |
Dyrcona |
And, since I just restarted apache on my test vm I have a question: |
10:53 |
Dyrcona |
Has anyone noticed that you have to restart/reload apache after restarting the opensrf.settings service lately? |
10:54 |
Dyrcona |
Since 3.0 or possibly the web client, if I restart opensrf.settings and open-ils.cat to pick up new MARC templates, for instance, neither the web client nor XUL will let me login until I restart apache2.
10:54 |
Dyrcona |
That didn't used to happen, IIRC. |
10:57 |
kmlussier |
jeff++ # Continued efforts to make #evergreen usable while keeping the spammers out. |
11:08 |
|
Christineb joined #evergreen |
15:17 |
berick |
er, translate-toolkit |
15:18 |
berick |
the xliff files are a lot more expressive, hopefully we can use them directly in the near future
15:19 |
Dyrcona |
Well, po files were designed to work with compiled software in the last century, before XML was a thing. |
15:20 |
Dyrcona |
Kind of like using MARC for bibliographic record data.... |
15:20 |
berick |
*zing* |
15:22 |
|
nfburton joined #evergreen |
15:32 |
|
yboston joined #evergreen |
14:19 |
rjackson_isl |
probably just wait and see if it happens again/regularly going forward - nothing spotted in logs
14:25 |
Bmagic |
I think it's the ->as_usmarc that isn't giving me the utf character conversions |
14:26 |
Bmagic |
maybe I need to encode_utf8($marc->as_usmarc()) ? |
14:31 |
Dyrcona |
Bmagic: There's a way to do it, but I usually don't have to. man MARC::Charset.
14:32 |
Bmagic |
ah, maybe this is it $record->encoding( 'UTF-8' ); |
14:32 |
Dyrcona |
Bmagic: I don't think so. |
14:33 |
Dyrcona |
It isn't that simple. |
14:37 |
Bmagic |
I don't think these are weird records. I am just not handling them correctly. |
14:38 |
Dyrcona |
Bmagic: Are you ignoring me? |
14:38 |
Bmagic |
no? lol |
14:38 |
Dyrcona |
Bmagic: man MARC::Charset or use yaz-iconv |
14:39 |
Dyrcona |
Your best bet is probably converting the file, first, with yaz-iconv. |
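A hedged sketch of the whole-file conversion. yaz-marcdump (from the same YAZ toolkit) is often used here because it also recomputes the record lengths and, with -l 9=97, sets leader/09 to 'a' so the output declares UTF-8:

    yaz-marcdump -f marc8 -t utf8 -o marc -l 9=97 in.mrc > out-utf8.mrc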
14:40 |
Bmagic |
Dyrcona++ # I'll try some stuff |
14:41 |
dbwells |
Bmagic: If you can share your existing records and code, I don't mind helping with sanity checks. We've all been thrown for loops figuring out encoding issues at one time or another. |
14:41 |
Bmagic |
ty, if I get stuck using some of these other avenues, I will take you up on that! |
14:42 |
Dyrcona |
My favorites are the records that contain Windows-1252 with "smart" quotes. One of them looks like an end of record marker to MARC. |
14:42 |
Bmagic |
I remember you telling me about those! |
14:42 |
Bmagic |
yuk |
14:42 |
dbwells |
My all time favorite case was when a "zero-width space" somehow made its way into the record header. Try to find that!
12:38 |
jeff |
or what Dyrcona suggested, which will give you a JSON payload containing the biblio.record_entry object |
12:39 |
jeff |
(including a "marc" field with the marcxml of the record. |
12:39 |
jeff |
) |
12:39 |
Dyrcona |
And the marc is in the marc field. |
12:39 |
Dyrcona |
:) |
12:39 |
ejk |
Thanks! I'll try the pcrud call. |
12:40 |
Dyrcona |
I think there are other ways, but pcrud came to mind first. |
12:44 |
ejk |
Thanks so much! Dyrcona++ jeff++ |
12:45 |
Dyrcona |
ejk: Is this written in Perl? |
12:45 |
ejk |
*cough* PHP *cough* |
12:45 |
Dyrcona |
OK. Never mind. I don't know any MARC frameworks in PHP. ;) |
12:45 |
Dyrcona |
If you want to switch to Perl, Python, or Java, though..... |
12:46 |
ejk |
jeff: I think I actually started this library based on your Opensrf PHP library from way back when; but it's been expanded quite a bit from there. |
12:47 |
jeff |
good. mine was incomplete garbage. ;-) |
12:49 |
Dyrcona |
Speaking of MARC.....We have a record that shows nothing in the View MARC window, but shows up OK if you click the display MARC link in the OPAC. |
12:50 |
* Dyrcona |
wonders what is wrong... To the Batlogs! |
12:50 |
Dyrcona |
Oh. Now it works.... |
12:58 |
|
khuckins joined #evergreen |
14:14 |
ejk |
Material Type code and Additional Authors were the two that I could only find in the MARC record |
14:14 |
Dyrcona |
It works fine on my Ubuntu 16.04 test vm, but I can't get it to work on the training server with Debian 7.
14:15 |
|
jvwoolf left #evergreen |
14:15 |
Dyrcona |
ejk: There are tables with mappings to get values from MARC, material type is one of them. |
14:17 |
Dyrcona |
You want to look at config.marc21_ff_pos_map. |
14:17 |
Dyrcona |
Here's an example in Perl of how it might be used: https://github.com/Dyrcona/evergreen_utilities/blob/master/perl/loaderecords.pl#L359 |
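A hedged sketch of that lookup (assuming the stock config.marc21_ff_pos_map columns; 'Type' as the fixed-field name is illustrative):

    my ($tag, $start, $len) = $dbh->selectrow_array(q{
        SELECT tag, start_pos, length
          FROM config.marc21_ff_pos_map
         WHERE fixed_field = ? AND rec_type = ?
         LIMIT 1
    }, undef, 'Type', $rec_type);

    # Read the value from the leader or the matching control field.
    my $data = lc($tag) eq 'ldr' ? $record->leader()
                                 : $record->field($tag)->data();
    my $value = substr($data, $start, $len);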
14:18 |
Dyrcona |
Added authors, you'll have to pull from the appropriate fields. |
14:18 |
Dyrcona |
@marc 700 |
14:18 |
pinesol_green |
Dyrcona: An added entry in which the entry element is a personal name. (Repeatable) [a,b,c,d,e,f,g,h,j,k,l,m,n,o,p,q,r,s,t,u,x,3,4,5,6,8] |
14:18 |
Dyrcona |
And so on... |
14:21 |
ejk |
Any way I can get that table through an OpenSRF call? |
14:21 |
Dyrcona |
I usually see that referred to as item type. :) |
14:22 |
Dyrcona |
ejk: What you could do is write some code to build a static lookup table from the database and pop that into your code.
14:23 |
jeff |
ejk: for some purposes, I extract that kind of thing by transforming the MARCXML to MODS -- the MODS XSLT from LoC is what many parts of the Evergreen ingest/index process use -- with some modifications in a few places.
14:23 |
Dyrcona |
Those entries are based on the LoC MARC docs and almost never change. |
14:23 |
Dyrcona |
MODS would be handy for the authors, for instance. |
14:24 |
jeff |
ejk: some messy python is here that might give you a sense of how that works -- see the various XPATH bits in the indexes{} hash: https://github.com/tadl/marc-indexing-for-es |
14:24 |
jeff |
specifically, https://github.com/tadl/marc-indexing-for-es/blob/master/index.py#L70 |