Evergreen ILS Website

IRC log for #evergreen, 2014-02-19


All times shown according to the server's local time.

Time Nick Message
01:48 davidcandlestick joined #evergreen
03:08 v101 joined #evergreen
03:08 v101 left #evergreen
03:11 mrpeters joined #evergreen
07:04 timlaptop joined #evergreen
07:55 akilsdonk joined #evergreen
08:14 collum joined #evergreen
08:21 kbeswick joined #evergreen
08:38 Shae joined #evergreen
08:56 berick joined #evergreen
08:59 mmorgan joined #evergreen
09:02 Dyrcona joined #evergreen
09:06 rfrasur joined #evergreen
09:09 rfrasur rjackson-isl, is there something up with reports?
09:11 rjackson-isl reports are not back - it was decided to wait another day due to the response issues
09:11 rfrasur Okay, gotcha.  Then I'll stop querying the db ;)
09:40 yboston joined #evergreen
09:57 jl- I'm having some difficulties here, following this import example: http://docs.evergreen-ils.org/dev/_migrating_your_bibliographic_records.html -- 1) my marc records already are marcxml but not one record per line
09:57 jl- and I can't use bash or anything to parse/format xml
09:58 jl- what tool can I use to make it line format for a staging table?
09:59 paxed there is that piece of python code there...
10:00 jeff ...which could probably be easily modified to consume marcxml rather than marc
10:00 Dyrcona jl-: There is actually a Perl routine in the Evergreen utility code that you can use: OpenILS::Utils::Normalize::clean_marc.
10:00 jeff hrm. pymarc has MARCReader and JSONReader, but not a marcxml variant.
10:01 Dyrcona clean_marc is used by most of Evergreen to clean marcxml before insert into the database.
10:03 Dyrcona This little bit of Perl will make the marcxml be all on one line: $xml =~ s/\n//sog;
10:03 paxed or perhaps something (untested) like sed -i -e 's/\n//g' records.xml  && sed -i -e 's#</record><#</record>\n<#g' records.xml
10:03 Dyrcona TIMTOWTDI!
10:03 jl- jeff correct
10:04 dluch joined #evergreen
10:04 Dyrcona blarg. python. :)
10:04 jl- paxed I don't think bash should be used to parse xml?
10:04 jl- Dyrcona please explain that routine
10:04 * Dyrcona doesn't think xml should be used, but that's another debate. :)
10:04 dbs yaz-marcdump -i marcxml -o marc filename.xml > filename.mrc :)
10:05 * dbs suspects Dyrcona's regex would mash all of the records onto 1 line, instead of one per line
10:05 * Dyrcona assumes you're processing 1 record at a time at the point you use the regex.
10:05 dbs TIMTOWTDI and unfortunately It All Depends
10:06 dbs Dyrcona: oh, but jl- doesn't have anything parsing the marcxml qua marcxml records yet. just a marcxml file
10:06 Dyrcona jl-: The code is in Open-ILS/src/perlmod/lib/OpenILS/Utils/Normalize.pm if you want to look at it.
10:07 dbs ergo easiest method might be to roll back to MARC binary via yaz-marcdump and just use that little bit of python
10:07 jl- hmm
10:07 jeff a well reasoned and tested "what to do if your data is already in marcxml format, but not one-record-per-line" (and that might be use yaz-marcdump to go marcxml to MARC) would probably be a useful addition to http://docs.evergreen-ils.org/dev/_migrating_your_bibliographic_records.html
10:07 Dyrcona No. Write a little something with MARC::Batch.
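
A minimal sketch of the MARC::Batch route Dyrcona suggests, assuming the Evergreen perlmods are on @INC (clean_marc lives in OpenILS::Utils::Normalize, as noted above); untested against this thread's data:

    #!/usr/bin/perl
    # Read binary MARC; print one cleaned MARCXML record per line,
    # ready to load into a staging table.
    use strict;
    use warnings;
    use MARC::Batch;
    use MARC::File::XML (BinaryEncoding => 'UTF-8');
    use OpenILS::Utils::Normalize qw/clean_marc/;

    my $batch = MARC::Batch->new('USMARC', $ARGV[0]);
    $batch->strict_off;    # skip bad records instead of dying
    while (my $record = $batch->next) {
        # clean_marc() normalizes the XML and strips embedded newlines
        print clean_marc($record->as_xml_record), "\n";
    }
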
10:08 jl- dbs so roll back to .mrc
10:08 yboston jeff++
10:08 jeff but updating the documentation doesn't help jl-'s immediate question :-)
10:08 yboston jeff: I am about to deal with that scenario,  "what to do if your data is already in marcxml format, but not one-record-per-line", any tips?
10:09 jeff none better than those already suggested here by Dyrcona and dbs and paxed
10:10 jl- rolling back
10:11 jeff paxed: your suggestion of sed is probably fast and could work for most scenarios, but i fear it may summon a hybrid marcthulu/xmlthulu, similar to http://stackoverflow.com/a/1732454/157515
10:12 dbs and there's always the possibility of xml namespaces ruining the generic sed fun
10:12 jl- I noticed that the .mrc file is about 1/3 the size of the xml
10:12 dbs jl-: sounds about right
10:13 jl- now running the python script to parse xml
10:13 paxed jeff: possibly. perhaps i should've put a slightly stronger disclaimer in my suggestion... "may summon any of the Elder Gods"
10:14 jl- "may make you cry in the bathroom and reconsider your career"
10:14 dbs "What have I gotten myself into?"
10:15 Wyuli joined #evergreen
10:15 dbs "What have libraries done to themselves?"
10:15 jl- ^
10:19 ericar joined #evergreen
10:19 jeff every phone call for a patron password reset that i handle is another reminder to get moving on bug 1264746
10:19 pinesol_green jeff: Error: Could not gather data from Launchpad for bug #1264746 (https://launchpad.net/bugs/1264746). The error has been logged
10:20 jeff huh. well, that's ``Add "email password reset" button to user editor''
10:25 berick if anyone wants to help me quickly fix this : http://testing.evergreen-ils.org/~live/test.20.html
10:25 berick see working/user/berick/pgtap-db-debs-check-sort-repair
10:27 jeff but after that is "phone rings, web browser brings up list of matching patrons in web staff client, options for usual things including triggering a suggested username and triggering an e-mailed password reset" :-)
10:27 * jeff looks at berick's url
10:28 jl- ergreen=# SELECT staging_importer();
10:28 jl- ERROR:  could not extend file "base/16385/371863.7": wrote only 4096 of 8192 bytes at block 969998
10:28 jl- dbs
10:28 jeff jl-: you may have run out of disk space.
10:28 paxed out of free space
10:29 jl- ergreen=# SELECT staging_importer();
10:29 jl- ERROR:  could not extend file "base/16385/371863.7": wrote only 4096 of 8192 bytes at block 969998
10:29 jl- HINT:  Check free disk space.
10:29 jl- CONTEXT:  SQL statement "UPDATE staging_records_import SET dest = currval('biblio.record_entry_id_seq')"
10:29 jl- PL/pgSQL function "staging_importer" line 6 at SQL statement
10:29 jl- yes that's what it said
10:29 jl- sorry for the spam
10:30 jeff berick: quick fix looks good (i have not run it through pgtap). any opinion on whether evergreen.upgrade_list_applied_deprecates should be updated to return sorted values itself?
10:32 berick jeff: thought about that, but couldn't think of a case where it mattered outside of the test
10:33 eeevil jeff / berick: and order may be important (but, of course, that's available from config.upgrade_log, I suppose)
10:40 _bott_ joined #evergreen
10:43 berick eeevil: as in, the order of entries in supersedes[] or deprecates[] may be important, so evergreen.upgrade_list_applied_deprecates (etc) should not apply asciibetical sorting of patches it returns?
10:45 eeevil berick: it's not important today, but may be in the future. but yes, that's what I meant
10:45 berick right
10:45 berick thanks
10:45 berick ok, so i stand by my quick fix, then
10:46 jeff have a launchpad bug yet?
10:46 jeff or would this just be on the original bug?
10:46 berick no, seemed like a waste of pixels
10:46 berick original bug
10:47 berick https://bugs.launchpad.net/evergreen/+bug/1262722
10:47 pinesol_green Launchpad bug 1262722 in Evergreen "DB patch supersede / deprecate func repairs; unit test" (affected: 1, heat: 6) [Undecided,Fix committed]
10:47 * jeff waits for new two-factor token
10:49 senator paxed: congratulations to your national hockey team (if you're into that sort of thing)
10:49 paxed senator: i'm not really into hockey, but it's always nice to beat russia... :P
10:49 mrpeters osrfsys.10.log:2014-02-19 10:39:08 brick01-head osrf_http_translator: [ERR :2223:CStoreEditor.pm:114:1392803222222329] editor[0|0] request error open-ils.cstore.json_query.atomic : {"from":["metabib.suggest_browse_entries","\"","title","StartSel=\"<span class='oils_AS'>\", StopSel=</span>","-1","10","14"]} : Exception: OpenSRF::DomainObject::oilsMethodException 2014-02-19T10:39:08 OpenILS::Utils::CStoreEditor /usr/local/share/pe
10:49 mrpeters osrfsys.10.log:2014-02-19 10:39:08 brick01-head osrf_http_translator: [ERR :2223:EX.pm:66:1392803222222329] Exception: OpenSRF::DomainObject::oilsMethodException 2014-02-19T10:39:08 OpenILS::Utils::CStoreEditor /usr/local/share/perl/5.14.2/OpenILS/Utils/CStoreEditor.pm:453 <500>  Severe query error -- see error log for more details
10:49 mrpeters osrfwarn.log:2014-02-19 10:39:08 brick01-head open-ils.cstore: [ERR :2228:oils_sql.c:5585:1392803222222329] open-ils.cstore: Error with query [SELECT * FROM metabib.suggest_browse_entries( '"', 'title', 'StartSel="<span class=''oils_AS''>", StopSel=</span>', '-1', '10', '14' ) AS "metabib.suggest_browse_entries" ;]: 0 ERROR:  syntax error in tsquery: ":*"#012CONTEXT:  PL/pgSQL function "suggest_browse_entries" line 12 at assignm
10:49 mrpeters those ring any bells for anyone?  newly upgraded 2.5.2 system...
10:49 senator paxed: :-)
10:50 mrpeters basically just stacking the system with queries from that and blocking normal activity from functioning
10:50 * jeff hands mrpeters a paste.evergreen-ils.org or gist.github.com or something
10:50 mrpeters argh yeah sorry, that was longer than i thought
10:51 jeff mrpeters: "severe query error" is a good clue to consult your postgresql logs to see what the error was from postgresql's POV.
10:51 senator mrpeters: that's an autosuggest query for a quotation mark
10:51 senator short term, you may consider disabling autosuggest until you figure why there are so many of those or why they take a long time/error out/whatever
10:51 berick @isitdown docs.evergreen-ils.org
10:51 pinesol_green berick: http://cat.evergreen-ils.org.meowbify.com/
10:51 mrpeters senator: aha, interesting
10:53 jeff of course, if the "severe query error" is that you killed the query because it was long running, your postgresql logs may not be helpful. i don't know for certain that severe query error is shown in the opensrf logs when you kill a query from pg_cancel_backend or similar
10:54 mrpeters jeff:  yeah, apparently they have some process that runs and kills things after 3 minutes
10:54 mrpeters i want to knock that off for now
10:55 jeff i'd probably go with senator's suggestion of disabling autosuggest as a stop-gap.
10:55 mrpeters yeah, i have --- restarting bricks now
10:58 * dbs suspects that killing pg backends leads to a cascade of other errors in cstore/storage thereafter, based on recent experience
10:59 berick and anything above the cstore level (middle-layer, client) would just see a request timeout, since they usually give up after a minute or so
11:00 jl- I'm assuming I can resolve the space issue by expanding the rootfs ?
11:01 mrpeters dbs: yeah?  I'd definitely like to get that little "sniper" stopped
11:03 paxed jl-: or there's a way to tell postgres to save the database files on somewhere else. (but, iirc, that has to be done when you create the db)
11:04 jl- paxed that's why I was wondering if it will even help postgres if I clean up rootfs spce
11:05 jl- or if it has allocated space set somewhere
11:05 eeevil mrpeters: I'd suggest looking at the schema, make sure all the indexes supporting autosuggest (which also support browse) are in place. you can configure autosuggest to not kick in until you have more than a certain number of characters from the user, IIRC
11:05 mrpeters eeevil: thanks, great info.  i hope autosuggest really is the culprit!
11:05 paxed jl-: it just uses as much space as you've got available.
11:07 eeevil mrpeters: and, short of disabling autosuggest, you can just disable visibility testing for autosuggest. I'd point you to the docs for that, but the community docs site seems broken at the moment...
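
For reference, a hedged sketch of the stop-gap: in 2.5-era Evergreen, autosuggest is governed by a config.global_flag; the flag name and the 'opac_visible' value semantics here are from memory, so verify against your seed data before running anything like this:

    -- Turn autosuggest off entirely:
    UPDATE config.global_flag
       SET enabled = FALSE
     WHERE name = 'opac.use_autosuggest';

    -- Or, per eeevil, keep it but skip the (expensive) visibility test,
    -- assuming the flag's value field carries 'opac_visible':
    UPDATE config.global_flag
       SET value = ''
     WHERE name = 'opac.use_autosuggest';
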
11:08 mrpeters no i know how, thanks mike
11:10 mrpeters keyword searches are just bunk....title/author/etc. come back snappy....any particular index that might be corrupt/missing/etc.
11:10 jboyer-laptaupe joined #evergreen
11:16 dbs mrpeters: maybe your database just tipped over some point that triggers a different query optimizer path? That occasionally happens :/
11:18 mrpeters hmm
11:19 dbs Unfortunately hard to diagnose without EXPLAINs from prior to that tipping point
11:19 mrpeters sure
11:19 mrpeters im wondering if the autosuggest stuff, and the process sniping deal WAS the tipping point
11:24 mcooper joined #evergreen
11:27 mmorgan left #evergreen
11:28 jeff berick: tested your fix, pushed to master, commented in bug 1262722
11:28 pinesol_green Launchpad bug 1262722 in Evergreen "DB patch supersede / deprecate func repairs; unit test" (affected: 1, heat: 6) [Undecided,Fix committed] https://launchpad.net/bugs/1262722
11:28 jeff berick++
11:28 jeff tests++
11:29 berick jeff++
11:30 pinesol_green [evergreen|Bill Erickson] LP#1262722 Fix DB supersedes/deprecates pgtap test - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=6c16a44>
11:44 phasefx if we can get this one too, then we'll have a clean QA test run: https://bugs.launchpad.net/evergreen/+bug/1279420
11:44 pinesol_green Launchpad bug 1279420 in Evergreen "need Test::Output prerequisite" (affected: 1, heat: 6) [Medium,Triaged]
11:46 jwoodard joined #evergreen
11:46 jeff heh. i was just testing the waters in preparation for arguing for that. :-)
11:46 jeff phasefx: that system currently builds once nightly, around midnight?
11:46 phasefx correct, though I can fire it off practically at will over here
11:48 phasefx in #openils-evergreen, pinesol_green is following an RSS feed for it
11:48 * dbs fears what adding a thin plperl wrapper around xml::libxslt is going to do to ingest times
11:48 dbs all that I know is that solr is RAD
11:50 jl- wow I ran out of space again
11:50 jl- 200,000 records, -- 15 GB weren't enough
11:50 jl- any idea how much is necessary
11:52 jeff "how much can you give it?"
11:52 dbs seems like it would be a good research area for planning docs - rough memory / storage requirements
11:53 * jeff nods
11:53 dbs for various levels of records, etc
11:53 mrpeters jl-: yeah, that sounds low for that many records
11:53 mrpeters i have a customer about that size, let me tell you what their db size is
11:53 jeff jl-: for your immediate purposes, if you can give it 30-60 GB, try that. :-)
11:54 mrpeters jl-: is that items or bibs?
11:56 mrpeters i have a customer with 121,070 bibs and 147,094 items and their DB hovers around 8GB
11:56 mrpeters just to give you a little perspective
11:57 jl- mrpeters bibs
12:02 mrpeters how many patrons do you plan to load?
12:02 mrpeters is 80GB feasible for you?  i'm thinking about circulation, auditor, etc. buildup over time as well
12:02 jl- possibly
12:03 jl- testing on a VM right now
12:03 jl- gonna try 60
12:03 mrpeters ah, i see
12:03 mrpeters just don't run your production DB in a VM, right eeevil ;)
12:03 mrpeters i think we've all seen the pain that can cause
12:04 jl- everything is virtualized these days
12:05 mrpeters i hear you, but we tried running a BIG production database in vmware and it just didn't perform well
12:05 mrpeters but, maybe for a smaller size consortium or library it would perform better, honestly not sure
12:05 mrpeters things just got WAY better for us when we went bare metal
12:05 mrpeters (still virtualizing everything else)
12:06 jl- all our library services are virtualized
12:06 jl- so we'll have to go with that
12:06 jl- ;)
12:09 mrpeters soooo, back to my keyword searching issues --- i'm the only one with access to the system --- keyword searches running over 7 minutes  --- http://pastie.org/8749161
12:09 mrpeters ill ask again, is there a particular index that may need rebuilding?
12:10 mrpeters title/author/subject/etc. work to perfection
12:11 dbs mrpeters: 7 minutes for NASCAR? That seems extreme, unless "NASCAR" or a stemmed version thereof is in half of your records. What does EXPLAIN say for that query?
12:12 mrpeters dbs: is that the full query there, or is that truncated?
12:13 eeevil dbs: re plperl wrapper, we gain the benefit of being able to cache the xslt object ... that will likely pay for any perl parsing overhead, I'd bet
12:13 dbs appears to be truncated
12:13 mrpeters i think so too
12:13 mrpeters let me find it in the pg logs
12:13 mrpeters appreciate the input dan
12:14 eeevil mrpeters: that's not a complex query. it suggests either tuning or schema deficiencies ... I'm guessing this is the same instance as the autosuggest one above?
12:14 dbs mrpeters: as far as "tipping points" go, that's usually associated with a growth in the amount of data that pushes postgresql to suddenly choose a different plan (from reading pgsql-performance on and off over the years)
12:15 mrpeters eeevil: yes, it is
12:15 mrpeters no change in amount of data, just the upgrade from 2.2.2 to 2.5.2
12:15 dbs So as eeevil says, could be tuning (various cost estimates) or table / index bloat or (maybe?) corrupted / missing index
12:16 mrpeters thats what im fearing -- a missing index
12:16 eeevil mrpeters: heh ... that changes a LOT of data :)
12:16 eeevil both size and shape
12:16 dbs ah, I didn't realize the 2.2.2 -> 2.5.2 was in the mix
12:16 mrpeters yeah, it is
12:17 mrpeters eeevil: true....those ingests are pretty massive
12:17 mrpeters i meant as far as new libraries, etc.
12:17 mrpeters as far as tuning, it's using the same hardware as before, and the same pgtuned values that have been in place for many years on this cluster
12:18 dbs seems very likely to be upgrade schema related. possibly reingest bloat?
12:19 mrpeters we did run a VACUUM FULL ANALYZE VERBOSE after the reingest
12:25 dbs mrpeters: if you can get an EXPLAIN ANALYZE of a keyword query, that's really going to help
12:25 sunnysnowcat joined #evergreen
12:26 mrpeters yep, im on it
12:27 mrpeters thought this was the query, but it's not seeming to work -- http://pastie.org/8749204
12:31 dbs needs more $core_query_25686$ methinks
12:31 mrpeters okk maybe i cut it off somewhere
12:31 mrpeters its a big honkin select :P
12:32 eeevil mrpeters: syslog mixes up incoming messages. that's what the [XXXX|YY] block is for before the incoming message itself, on each line. the XXXX part is the syslog message id, and the YY part orders the lines of the message
12:34 eeevil find the XXXX for the line containing "-- bib search", then get all the lines for that XXXX, then stitch them together
12:34 eeevil you might find that the query you thought you were tracing was not the one that had the long duration...
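
A rough Perl sketch of the stitching eeevil describes, assuming the [XXXX|YY] marker layout above; adjust the regex to match your actual syslog format:

    # Reassemble multi-line syslog messages keyed on their message id.
    use strict;
    use warnings;
    my %parts;
    while (my $line = <>) {
        # capture message id, line order, and the payload after the marker
        next unless $line =~ /\[(\d+)\|(\d+)\]\s*(.*)$/;
        $parts{$1}[$2] = $3;
    }
    for my $id (sort { $a <=> $b } keys %parts) {
        print join('', grep { defined } @{ $parts{$id} }), "\n";
    }
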
12:35 mrpeters could be, im kind of curious to run it by hand now and see how it responds (and get you an explain)
12:44 mrpeters ok, query is churning, then ill get you the explain
12:46 mrpeters dbs:  damn, sorry!
12:46 mrpeters QUERY PLAN
12:46 mrpeters ----------------------------------------------------------------------------------------------------
12:46 mrpeters Function Scan on query_parser_fts  (cost=0.25..10.25 rows=1000 width=64) (actual time=34352.543..34352.622 rows=949 loops=1)
12:46 mrpeters Total runtime: 34352.730 ms
12:46 mrpeters (2 rows)
12:47 mrpeters 34 seconds is pretty damn reasonable
12:47 mrpeters but that seems to go up and up and up as load increases
12:48 mrpeters but, i did find out the indexes weren't primed into RAM (actually was able to fit the entire DB in ram this time) so maybe that's helped things
12:52 jihpringle joined #evergreen
12:55 kbeswick joined #evergreen
12:58 mrpeters and they've gone to crap again
12:58 mrpeters i can't even run that explain again now that the system had load again
12:59 stevenyvr joined #evergreen
12:59 mrpeters is there a particular table/index we should focus in on and run a vacuum on for keyword searches?
13:03 dbs metabib.keyword_field_entry would be the key table to focus on
13:04 * dbs slipped away for lunch
13:05 mrpeters no worries dan
13:05 dbs reingests going on? other things that might be trashing cache?
13:05 mrpeters no ingests
13:05 mrpeters my explain analyze verbose has been running for about 7 minutes now
13:06 mrpeters just getting nowhere... before we let people back in it was returning within 30 seconds
13:06 mrpeters i assume what i gave above isnt enough info (i probably needed verbose?)
13:07 dbs IIRC you want everything from within the query_parser_fts() function; the function itself masks all of the real work going on
13:08 mrpeters http://pastie.org/8749326 is what im trying to get you
13:12 dbs mrpeters: try http://pastie.org/8749337
13:12 eeevil mrpeters: explain just the core query
13:12 mrpeters can do
13:12 eeevil (what dbs said)
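
Illustrative only: the point of dbs's and eeevil's advice is to EXPLAIN the core full-text query itself, because an EXPLAIN over search.query_parser_fts() reports a single opaque function scan (as in the plan pasted above). Substitute the real core query stitched from your logs; the tsquery config name below is an assumption:

    EXPLAIN ANALYZE
    SELECT fe.id
      FROM metabib.keyword_field_entry fe
     WHERE fe.index_vector @@ to_tsquery('keyword', 'nascar');
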
13:12 * mrpeters realizes how little he knows about QP :(
13:13 mrpeters gonna be a few minutes on this...waiting until 1:20 to take the system down again from the public
13:14 mrpeters it won't budge right now
13:15 dbs :(
13:16 mrpeters thank you for bearing with me
13:16 mrpeters i sincerely appreciate the help....ive not run into anything like this before
13:17 mrpeters can we stop sip/reports/utility
13:17 mrpeters no reason for reports to be running
13:17 mrpeters utility should be fine too
13:19 dbs mrpeters: we had great pain over the last week or two due to very broad queries like "A" and "The" appearing to result in lock escalation
13:19 mrpeters interesting
13:19 dbs Pretty much a denial of service attack. Amusingly, due to our own IT dept running a 3rd party link checker service.
13:19 dbs But could affect anyone :(
13:20 mrpeters sounds like a great adventure haha
13:24 mrpeters dbs: http://pastie.org/8749393 -- ah, finally
13:25 * dbs has eyes pop at "Seq Scan on keyword_field_entry fe" - twice
13:25 mrpeters drop some knowledge on me dbs!
13:25 dbs That's visiting every row in metabib.keyword_field_entry
13:25 dbs It's ignoring the index (if there is one)
13:25 mrpeters interesting......
13:26 * dbs would expect to see "Index Scan using metabib_keyword_field_entry_index_vector_idx on keyword_field_entry fe"
13:27 mrpeters http://pastie.org/8749400
13:27 dbs eeevil: you concur?
13:27 mrpeters is that missing an index then?
13:27 dbs it looks a lot like it
13:28 dbs what does \d metabib.keyword_field_entry say?
13:28 mrpeters thats in the paste, dbs
13:28 mrpeters just above
13:29 dbs yeah
13:29 dbs you're missing metabib_keyword_field_entry_index_vector_idx
13:29 mrpeters ok
13:29 dbs CREATE INDEX metabib_identifier_field_entry_index_vector_idx ON metabib.identifier_field_entry USING GIST (index_vector);
13:30 mrpeters dbs++
13:30 mrpeters eeevil++
13:30 mrpeters hrm
13:30 dbs realllllly weird that's missing
13:30 mrpeters evergreen=# CREATE INDEX metabib_identifier_field_entry_index_vector_idx ON metabib.identifier_field_entry USING GIST (index_vector);
13:30 mrpeters ERROR:  relation "metabib_identifier_field_entry_index_vector_idx" already exists
13:30 jeffdavis s/identifier/keyword/g in dbs' command
13:30 dbs jeffdavis++ # good catch, sorry!
13:30 mrpeters ah, thanks jeffdavis
13:30 dbs both uses of identifier there
13:30 * eeevil reads up
13:31 dbs CREATE INDEX metabib_keyword_field_entry_index_vector_idx ON metabib.keyword_field_entry USING GIST (index_vector);
13:31 eeevil well, around 2.4/2.5, the indexes are dropped and recreated (IIRC) to speed up the upgrade
13:31 dbs -- to be sure :)
13:31 eeevil the recreate seems to have failed... ?
13:31 dbs sounds like
13:31 eeevil you might want to switch to GIN indexes while you're at it
13:32 eeevil you can do that with CONCURRENTLY and drop the GIST ones after hours
13:32 mrpeters you think?  csharp mentioned that to me too
13:32 * dbs thought about mentioning that but also thought about just keeping things as close to stock as possible until the firestorm dies
13:32 mrpeters it was about 4AM and i wasn't up for tinkering
13:32 jihpringle joined #evergreen
13:32 mrpeters ^^^ what dbs said :)
13:33 mrpeters i know csharp has been happy with GIN indexes
13:34 jeffdavis we've been using GIN for that index too quite happily, I believe it requires more disk space but is faster
13:34 mrpeters just swap GIST for GIN in that query?
13:35 jeffdavis I believe so
13:35 dbs yep
13:35 dbs we've been using GIN too, except when upgrades switch them back to GIST
13:35 mrpeters good to know, awesome.   is there a list anywhere of which indexes are best to convert?
13:36 mrpeters 2.4.3 > 2.5 upgrade script maybe?  where it drops/recreates them?
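
A sketch of eeevil's GIN suggestion; the new index name is made up for illustration:

    -- Build the GIN replacement without blocking writes
    -- (CONCURRENTLY must run outside a transaction block)...
    CREATE INDEX CONCURRENTLY metabib_keyword_field_entry_index_vector_gin_idx
        ON metabib.keyword_field_entry USING GIN (index_vector);
    -- ...then drop the GIST one after hours.
    DROP INDEX metabib.metabib_keyword_field_entry_index_vector_idx;
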
13:57 kbeswick_ joined #evergreen
14:12 Wyuli Is there a button I can press that ships beer & pizza to devs? Can we add that in the Evergreen interface?
14:13 bshum We should add that to the "about" page link :)
14:14 Wyuli I'm getting stressed just watching the chatter roll through here...
14:15 Wyuli I wouldn't be opposed to shipping refreshments out if I could. :)
14:17 jboyer-laptaupe dbs++
14:17 jboyer-laptaupe eeevil++
14:17 jboyer-laptaupe I suspect we've been missing that index for longer than we know.
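
One hedged way to check whether the upgrade dropped any other index_vector indexes, on the assumption that every metabib *_field_entry table should have one:

    SELECT t.schemaname, t.tablename
      FROM pg_tables t
     WHERE t.schemaname = 'metabib'
       AND t.tablename LIKE '%field_entry'
       AND NOT EXISTS (
             SELECT 1 FROM pg_indexes i
              WHERE i.schemaname = t.schemaname
                AND i.tablename  = t.tablename
                AND i.indexdef LIKE '%index_vector%');
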
14:17 bshum It's alive!  alive!
14:17 bshum :)
14:23 eeevil ha! "laptaupe" ... nice :)
14:30 rjackson-isl system is currently smoking! - seconded on dbs++ and eeevil++
14:31 bshum Calling 0861
14:31 mllewellyn joined #evergreen
14:32 mrpeters yeah, much love for the help guys
14:32 mrpeters fast as can be!
14:37 stevenyvr joined #evergreen
14:37 smyers_ joined #evergreen
14:39 pinesol_green [evergreen|Jason Stephenson] LP1223903 - Add from_bare_hash to Fieldmapper.pm. - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=b9edf30>
14:39 pinesol_green [evergreen|Jason Stephenson] LP1223903 - Add indexes to authority.record_entry. - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=1108b8b>
14:39 pinesol_green [evergreen|Jason Stephenson] LP1223903 - Rewrite marc_export.in in support-scripts. - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=2c28f9b>
14:39 pinesol_green [evergreen|Jason Stephenson] LP1223903 - Add release notes for rewrite of marc_export. - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=e495dbb>
14:39 pinesol_green [evergreen|Ben Shum] LP1223903 - Stamping upgrade script for new authority_record_entry indexes - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=c0a0fc6>
14:41 pinesol_green [evergreen|Ben Shum] LP1223903 - Also bump the schema config - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=c0c1061>
14:57 smyers_ joined #evergreen
14:59 eeevil Dyrcona: since you've been testing http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/dyrcona/lp1234845_ranked_volumes would you mind taking a look at the SQL version in http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/collab/miker/lp1234845_ranked_volumes ?
14:59 * eeevil updates the LP bug too
14:59 Dyrcona No problem. I'll load it in a few.
15:00 Dyrcona It has mostly been kmlussier looking at it and she's on vacation at the moment.
15:00 eeevil Dyrcona: might need to re-drop the intermediate one by hand on your test server, just a head's up
15:00 eeevil thanks!
15:00 Dyrcona ok
15:01 rangi Dyrcona: when you get a chance (non urgent) you might like to do a pull of the ncip stuff too
15:01 Dyrcona rangi: Cool.
15:12 ktomita For browse search, does anyone know if the sample data has any "see" references?  I have seen "See Also" but no "See".
15:12 bshum I don't remember off hand what's the difference...
15:13 mmorgan joined #evergreen
15:13 ktomita bshum: "See" is 4XX for things like alternate names, "See Also" is 5XX for related things.
15:14 ktomita bshum: that is my understanding
15:15 bshum What's a record in the sample data with "see also"?  Just curious.
15:15 ktomita bshum: I did a subject browse for 'organic foods'
15:15 * bshum wonders if this is some sort of authorities thing that he's not using yet
15:16 phasefx in my concerto load, I see authority entries with 400 and 410 tags, but no 5XX
15:16 ktomita bshum: yes, this is authorities related.
15:16 kbeswick joined #evergreen
15:16 bshum ktomita: Well, there isn't any organic foods in the concerto dataset so I'm assuming you must be using something else then.
15:17 ktomita bshum: yeah sorry, I had imported some records to try and see it.  I haven't been able to get that to work either
15:21 ktomita I am going to go back and do some more research and ask again later.
15:23 Dyrcona rangi; Good stuff. I need to do some research on what's been pushed where for the moment.
15:23 Dyrcona eeevil: I loaded the new code. I did have to remove the intermediate one by hand.
15:24 akilsdonk joined #evergreen
15:24 * Dyrcona wonder if MNPALS is doing anything on the NCIP front. I've not seen anything of theirs go into the working repo in some time.
15:30 eeevil ktomita: IIRC, from writing that with senator, in order to see "see" references you need to link your authority records together. see the 2.5 release notes, sections 2.2.3, 2,6.2 and 2.6.3 for more information
15:30 ktomita eeevil: thanks, I will look into those areas
15:32 Dyrcona authority_authority_linker.pl... If memory serves.
15:32 eeevil Dyrcona: aye (2.2.3 in the release notes)
15:33 eeevil though, as with authority_control_fields.pl, it is for priming, not perfect cataloging (because, as we know, MIGO (marc in garbage out))
15:33 ktomita I did run that script.
15:33 eeevil ktomita: if you don't have authority records that would link to one another, it won't do anything for you
15:34 ktomita eeevil: that brings me back to my original question: whether the concerto data has records that will link with "see" and "see also".
15:34 eeevil I doubt it
15:35 ktomita I have tried to setup my own records that would have "see" and "see also" linking but have only been able to see "see also"
15:35 ktomita I was hoping to get an example of "see" linking.  thanks for the responses.  more research for me.
15:46 ktomita Do you know how to setup the authority marc record so that it would have a "see" link?
15:53 dbs SELECT * FROM authority.full_rec WHERE tag LIKE '5%'; says there are no auth records in the --load-all sample set with 5xx fields
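
For orientation, a hedged illustration (not from the sample set): in MARC authority records a 4XX tracing produces a "see" reference for the heading in the same record, while a 5XX produces "see also" and generally wants a matching linked authority record, e.g.:

    100 1# $a Twain, Mark, $d 1835-1910
    400 1# $a Clemens, Samuel Langhorne, $d 1835-1910    <- yields "see"
    500 1# $a Snodgrass, Quintus Curtius                 <- yields "see also"
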
15:55 gsams joined #evergreen
15:55 graced ktomita: I'm curious - are you implementing the authorities work that has been done in the community or writing your own?  I heard a rumor at a library conference that you were writing one from scratch...
15:58 ktomita graced: I am working with the community code
15:58 graced ktomita: cool, thanks for dispelling my rumor   :-)
16:01 ktomita graced: have you dealt with see and see also much?
16:09 Wyuli joined #evergreen
16:10 jl- the staging table is still importing (been about 4 hours).. would it be wise to sneak in and see how many rows there are in bibs_records
16:11 Dyrcona jl-: Probably not.
16:11 Dyrcona Depends on locking.
16:11 jl- k
16:12 jl- the initially available 70 GB have shrunk to 25 GB
16:12 Dyrcona yep.
16:13 jl- I'll start worrying once I'm in the single digits
16:14 Dyrcona I don't know how much space you need to load via a staging table since I've never done it.
16:18 jl- ils curveballs
16:18 jl- wait, I don't even like baseball
16:18 jl- let's call it a scissor tackle
16:19 jl- http://i.minus.com/iL5juqoZUPgkW.gif
16:20 jeffdavis jl-: how many MARC records are you importing?
16:22 jl- jeff 230,000
16:23 jl- jeffdavis
16:26 jeffdavis hm
16:27 jboyer_laptaupe joined #evergreen
16:27 jeffdavis I've got a test dataset with about 278,000 bib records, total size of the database after ingest is 10 GB
16:27 smyers__ joined #evergreen
16:27 jl- something's fishy then
16:27 jl- well
16:27 jl- after ingest
16:27 jl- but during?
16:28 jeffdavis not sure
16:28 jl- btw these are not 'test datasets'
16:28 jeffdavis I tried the staging_importer method that was discussed here recently and it didn't work for me, I ended up killing it after about 4 days and using a slightly different approach
16:28 jl- this is real data
16:28 jl- from a real university
16:28 jl- what worked for you?
16:29 jeffdavis getting a list of all the id's from the staging table and converting it into an SQL script with 278,000 individual 'INSERT INTO biblio.record_entry' statements
16:29 jeffdavis very inelegant, but it worked for me
16:29 jl- interesting
16:30 jeffdavis it's the only time I've tried that staging_importer approach, so I can't say if it normally works better or if it just doesn't work for large numbers of bibs
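
A hedged reconstruction of that approach (jeffdavis posts his actual version later, at 16:48); table and column names follow the staging-table example in the migration docs:

    -- In psql: emit one INSERT per staged record, then run the
    -- generated file in chunks (or in parallel).
    \t on
    \o load_bibs.sql
    SELECT 'INSERT INTO biblio.record_entry (marc, last_xact_id) '
        || 'SELECT marc, ''import'' FROM staging_records_import '
        || 'WHERE id = ' || id || ';'
      FROM staging_records_import;
    \o
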
16:30 jl- what do you usually use?
16:30 jl- staff client?
16:30 jeffdavis fwiw these were real (public library) records too
16:30 jl- kk
16:30 jeffdavis I had been using the old method before (the perl scripts), wanted to try out the new approach
16:32 jl- jeffdavis I thought the perl scripts (like pg loader) take forever?
16:33 Dyrcona jl-: They're fast compared to the staff client.
16:33 jeffdavis that's one reason I wanted to try the other approach :)
16:33 Dyrcona If the staff client were even useful for that sort of thing.
16:34 jl- Dyrcona what's your take on this, 70GB almost used up, ran for 4 hours
16:34 jl- is it the method or the records
16:34 jl- or do I need more space
16:34 Dyrcona jl-: I'd say its the method more than the records.
16:35 Dyrcona Unless your records are huge or you're loading millions.
16:35 jl- the question is if it will work eventually or not..
16:35 jl- or if I should abort it
16:35 jl- I could try with giving it 100 GB
16:35 jl- but that seems bizarre
16:36 phasefx there are pg tuning things you can do temporarily to make imports faster, but that bit is voodoo to me
16:36 phasefx stuff with working mem and checkpoints
16:37 phasefx disable archive mode :)
16:37 bshum Well, tuning postgres is probably a good thing to do anyways.
16:38 phasefx just, differently tuned for normal production use as compared to huge batch imports
16:39 Dyrcona You are also probably having issues with work logs, 'cause your whole import is likely done as a single transaction.
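
Hedged examples of the knobs phasefx alludes to, for PostgreSQL of this era; the values are illustrative, and several are unsafe outside a rebuildable one-off load, so revert them afterwards:

    # postgresql.conf, temporarily, for a big batch import
    maintenance_work_mem = 1GB      # faster index builds
    checkpoint_segments  = 64       # fewer, larger checkpoints (pre-9.5 knob)
    wal_buffers          = 16MB
    synchronous_commit   = off
    archive_mode         = off      # phasefx's "disable archive mode"
    fsync                = off      # only if you can redo the load after a crash
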
16:39 Dyrcona Our entire database with indexes and that sort of thing takes up about 146GB of disk space.
16:40 * bshum keeps trying to shrink his
16:40 bshum :)
16:40 Dyrcona dumped, compressed and encrypted, our database is only 14GB. :)
16:42 Dyrcona jl-: Searching the IRC and dev mailing list logs from three or four years ago might turn up some discussions of what I was doing during our migration.
16:42 Dyrcona I got some tips on tuning postgres for large loads back then.
16:43 jl- Dyrcona I could batch delete all log files
16:43 jl- from the sys?
16:43 Dyrcona jl-: No. Postgresql keeps a work log around in its own data until a transaction is complete or a checkpoint is reached.
16:44 hbrennan joined #evergreen
16:44 Dyrcona this doesn't show up in /var/log/ or anywhere like that.
16:44 Dyrcona I recommend getting a good book on PostgreSQL or do some online research about how it works.
16:45 * dbs used basically what is in the docs, but modified it to import in batches - not as extreme as jeffdavis' 1 insert per, but 100 records or 1000 records per transaction
16:46 * dbs checks -- batches of 5000 records, actually
16:47 dbs this was from an import in the summer of 2012.
16:48 hbrennan We're upgrading and missed the Rose::URI perl module on the db server.... is it safe to rerun the whole 2.3-2.4.0-upgrade-db.sql a second time?
16:48 dbs http://pastie.org/8749954 - ah, cute, I started with 1000, then 5000, then batches of 10000, to get more feedback early...
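
A minimal sketch of the batch-per-transaction variant dbs describes, assuming the docs' staging table layout; the batch size and connection details are placeholders:

    use strict;
    use warnings;
    use DBI;
    # Commit every 5000 records so a failure only rolls back one batch.
    my $dbh = DBI->connect('dbi:Pg:dbname=evergreen', 'evergreen', '',
                           { AutoCommit => 0, RaiseError => 1 });
    my $ids = $dbh->selectcol_arrayref(
        'SELECT id FROM staging_records_import WHERE dest IS NULL');
    my $n = 0;
    for my $id (@$ids) {
        $dbh->do(q{
            INSERT INTO biblio.record_entry (marc, last_xact_id)
            SELECT marc, 'import' FROM staging_records_import WHERE id = ?
        }, undef, $id);
        $dbh->commit unless ++$n % 5000;   # batch boundary
    }
    $dbh->commit;
    $dbh->disconnect;
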
16:48 jeffdavis http://pastebin.com/VXgUiVEj <- this is what I did, allows parallelization without dealing with locks
16:49 Dyrcona http://git.mvlcstaff.org/?p=jason/hz2eg.git;a=summary <- the public version of what I did. :)
16:50 * dbs likes the simplicity of jeffdavis's approach, about 70 fewer files :)
16:51 hbrennan and apologies for being rude and cutting in
16:51 dbs hbrennan: not at all. I didn't answer you because I can't remember where Rose::URI fits in :/
16:51 * Dyrcona was moving a whole database from one ILS to another, not just bibs.
16:52 hbrennan we fixed that part
16:52 * dbs guesses that you probably don't want to rerun the whole upgrade script
16:52 hbrennan but would it hurt?
16:52 dbs Depends, actually; if the whole thing rolled back because Rose:URI was missing, then it wouldn't hurt.
16:52 hbrennan I'm pretending to be smart by typing what the real admin is saying across the room
16:53 dbs But without being there and seeing the logs...
16:54 * dbs used basically what is in the docs for imports because he wrote the docs based on what he did for his last import :)
16:54 hbrennan Ah. Just got a reply back from Equinox. They say it is a-okay to rerun
16:55 hbrennan Hooray! We've been twiddling thumbs for an hour waiting
16:55 hbrennan Best to be safe though
16:55 dbs hbrennan++
16:55 hbrennan Thanks, Dan! dbs++
16:57 dbs But yes, once you get past a small set of demo records, you're not going to avoid PostgreSQL tuning and customized approaches to loading, which is what we've probably said about 10 times now :/
16:58 Wyuli joined #evergreen
17:19 mmorgan left #evergreen
17:36 dcook joined #evergreen
17:37 smyers_ joined #evergreen
17:43 jeff hrm. search errors in the logs on one of my test systems. will investigate further tonight.
17:44 jeff pgtap being present shouldn't break searches, should it?
17:44 jeff seems unlikely.
17:47 jeff oh. i believe i have a disconnect between my database and my services. all is broken as expected. :-)
19:13 nap joined #evergreen
19:14 nap left #evergreen
19:47 nap joined #evergreen
22:26 rjackson_isl joined #evergreen
23:38 zerick joined #evergreen
