Fix OAI-PMH *from* race condition between server and FeederService
An OAI-PMH server may update its data only once per day during a cronjob.
FeederService.feed(..) and BundlesSourceService.updateAfterFeeding(..) currently assume, when they update the incremental harvesting "from" parameter, that the OAI-PMH server data is up to date, i.e. that the server serves out all OAI records between from and now. Here lies a race condition, exemplified by the following scenario.
todayAt22hrs: SchedulerConfiguration triggers -> FeederService harvests from lastDayAt22hrs until now (i.e. ~ todayAt22hrs). The OAI-PMH server does not report any new records in this timeframe, so FeederService gets no records returned. FeederService nevertheless updates the from value to todayAt22hrs.
todayAt23hrs: the daily OAI-PMH server cronjob kicks in and adds to its index all new records that have appeared between lastDayAt23hrs and now (i.e. ~ todayAt23hrs). Each new record gets its lastModifiedDate value set to the time at which the record was actually added, i.e. during the business hours of the OAI-PMH server's editors, and therefore typically before todayAt22hrs.
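This repeats day after day: the next harvest queries from todayAt22hrs onwards, but the newly indexed records carry an earlier lastModifiedDate, so they are never returned. No records are ever added; they are silently skipped.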
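The scenario can be reproduced in isolation with a small sketch. `HarvestWindow` and `countVisible` are hypothetical names used only for illustration; they are not part of FeederService or of any OAI-PMH library. The sketch assumes the server returns exactly those indexed records whose lastModifiedDate lies inside the requested window.

```java
import java.time.Instant;
import java.util.List;

/** Hypothetical helper that mimics a datestamp-based selective harvest. */
final class HarvestWindow {

    private HarvestWindow() {}

    /**
     * Counts the indexed records whose lastModifiedDate lies in [from, until],
     * with both bounds inclusive.
     */
    static long countVisible(List<Instant> indexedLastModifiedDates, Instant from, Instant until) {
        return indexedLastModifiedDates.stream()
                .filter(d -> !d.isBefore(from) && !d.isAfter(until))
                .count();
    }
}
```

As an example with made-up dates: a record edited on 2024-01-02 at 14:00 and indexed by the cronjob at 23:00 is invisible to the 22:00 harvest of that day, because it is not in the index yet. If from is then advanced to 2024-01-02T22:00, the harvest on 2024-01-03 queries a window that starts after the record's lastModifiedDate, so `countVisible` yields 0. With from left at 2024-01-01T22:00, the same harvest would see the record.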
The solution would be to not update from to the time of startOfTodaysIncrementalHarvesting, but instead to keep it at (or update it to) the latest lastModifiedDate encountered in a previous harvest. This may result in the same from interval start being queried continuously for several days, until at least one new record is found, whose lastModifiedDate will then be used.
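A minimal sketch of the proposed update rule follows. `FromCursor` and `nextFrom` are hypothetical names, not existing methods of FeederService or BundlesSourceService, and the sketch assumes the lastModifiedDate values of the harvested records are available as `java.time.Instant`.

```java
import java.time.Instant;
import java.util.List;

/** Hypothetical helper; not part of the existing FeederService API. */
final class FromCursor {

    private FromCursor() {}

    /**
     * Returns the "from" value for the next incremental harvest.
     *
     * @param previousFrom the "from" value used by the harvest that just finished
     * @param harvestedLastModifiedDates lastModifiedDate of every record that harvest returned
     * @return the latest lastModifiedDate seen, or previousFrom if nothing newer was returned
     */
    static Instant nextFrom(Instant previousFrom, List<Instant> harvestedLastModifiedDates) {
        Instant latest = previousFrom;
        for (Instant lastModified : harvestedLastModifiedDates) {
            // Only move forward; an empty harvest leaves "from" untouched.
            if (lastModified.isAfter(latest)) {
                latest = lastModified;
            }
        }
        return latest;
    }
}
```

With this rule an empty harvest never advances from, so records that the cronjob indexes later with an earlier lastModifiedDate still fall inside the next window. Since the OAI-PMH from bound is inclusive, the record carrying the latest lastModifiedDate is likely to be returned again by the next harvest, so feeding would need to tolerate re-harvested records.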