ContentResolver should return Set<InputStream> of content and Bundle should have Set<InputStream> of content
Some Bundles have several content PDF documents, and in some cases ContentResolver cannot definitively decide which of the found documents is the correct one.
E.g., http://dx.doi.org/10.3278/6004559w points to a landing page. Its HTML source provides an a-href ending in ".pdf", but this is just a marketing brochure. The correct content is in an a-href whose URL contains the string "download".
Therefore:
- expand ContentResolver to also check a-hrefs with "download" in their URL.
- expand all of DDA to associate a Set<InputStream> with the content of a Bundle.
- when uploading PDFs, iterate through this Set and upload all of its elements.
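
A minimal sketch of the first point, assuming the landing page HTML is already available as a String. The class name `ContentLinkSketch`, the method `findCandidateLinks`, and the regex-based matching are illustrative assumptions, not the existing ContentResolver code; a real implementation would probably use a proper HTML parser. It collects every a-href that ends in ".pdf" or contains "download", so the caller gets all candidates instead of a single guess:

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of DDA: gathers all candidate content links.
class ContentLinkSketch {

    // Simplified match for href="..." or href='...' inside an <a> tag.
    // Assumption: quoted attribute values only; this is not a full HTML parser.
    private static final Pattern A_HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns every href that ends in ".pdf" or contains "download",
    // in document order and without duplicates.
    static Set<String> findCandidateLinks(String html) {
        Set<String> candidates = new LinkedHashSet<>();
        Matcher m = A_HREF.matcher(html);
        while (m.find()) {
            String href = m.group(2);
            String lower = href.toLowerCase(Locale.ROOT);
            if (lower.endsWith(".pdf") || lower.contains("download")) {
                candidates.add(href);
            }
        }
        return candidates;
    }
}
```

For a page that links to both a brochure PDF and a download URL, both hrefs end up in the returned Set. Each one could then be resolved to an InputStream and stored in the Bundle's Set<InputStream>.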