ContentResolver should return Set<InputStream> of content and Bundle should have Set<InputStream> of content

Sometimes a Bundle has several content PDF documents, and/or ContentResolver cannot ultimately decide which of the found content items is the correct one.

E.g., http://dx.doi.org/10.3278/6004559w points to a landing page. Its HTML source provides an a-href ending in ".pdf", but this is just a marketing brochure. The correct content is in an a-href whose URL contains the string "download".

Therefore:

  • expand ContentResolver to also check a-hrefs with "download" in their URL.
  • expand all of DDA to associate a Set<InputStream> of content with each Bundle.
  • when uploading PDFs, iterate through this Set and upload all of its elements.
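
A minimal sketch of the first and third points, not the actual DDA code. The class name ContentLinkSketch, the methods findContentCandidates and uploadAll, and the sample HTML are all hypothetical; the real ContentResolver and upload code may look quite different. The sketch uses a regular expression for brevity, so it will miss unquoted or otherwise unusual href attributes; a real HTML parser would be more robust.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class ContentLinkSketch {

    // Captures the quoted href value of an <a> tag (group 2).
    private static final Pattern A_HREF = Pattern.compile(
            "<a\\s+(?:[^>]*?\\s)?href\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    // Collects every a-href that ends in ".pdf" or contains "download",
    // instead of picking a single "correct" link.
    static Set<String> findContentCandidates(String html) {
        Set<String> candidates = new LinkedHashSet<>();
        Matcher m = A_HREF.matcher(html);
        while (m.find()) {
            String url = m.group(2);
            String lower = url.toLowerCase(Locale.ROOT);
            if (lower.endsWith(".pdf") || lower.contains("download")) {
                candidates.add(url);
            }
        }
        return candidates;
    }

    // Hands every element of the content Set to the given uploader
    // (assumed to wrap whatever upload call DDA uses) and returns the count.
    static int uploadAll(Set<InputStream> contents, Consumer<InputStream> uploader) {
        int uploaded = 0;
        for (InputStream content : contents) {
            uploader.accept(content);
            uploaded++;
        }
        return uploaded;
    }

    public static void main(String[] args) {
        // Made-up landing page markup: one brochure link, one download link.
        String html = "<a href=\"/media/brochure.pdf\">Flyer</a>"
                + "<a class='btn' href='/shop/download/12345'>Full text</a>"
                + "<a href=\"/about\">About</a>";
        System.out.println(findContentCandidates(html));

        Set<InputStream> contents = new LinkedHashSet<>();
        contents.add(new ByteArrayInputStream(new byte[] {1}));
        contents.add(new ByteArrayInputStream(new byte[] {2}));
        int n = uploadAll(contents, in -> { /* the real upload would go here */ });
        System.out.println("uploaded " + n + " streams");
    }
}
```

Returning all candidates keeps the brochure as well as the real content; this matches the proposal above to upload every element rather than guess which one is correct.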