Rather than extracting just the text or just the images separately, the requirement is to display the contents of a PDF file as HTML that looks like the original file, with images and tables placed exactly where they were in the original.
Extracting data from a PDF file is fairly simple, and there are multiple libraries out there that do it correctly. Extracting data while preserving its layout, which is the workflow the OP describes, is on the other hand a very difficult process.
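To make the distinction concrete, here is a minimal sketch of the crudest way to "preserve layout": emit every text run as an absolutely positioned element. The `TextRun` class is hypothetical and stands in for whatever a given PDF library reports; the only PDF fact relied on is that default PDF user space has its origin at the bottom-left with y growing upward, while HTML/CSS positions from the top-left with y growing downward.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class TextRun:
    # Hypothetical shape of one text run as a PDF library might report it.
    # x, y: baseline origin in PDF user space (bottom-left origin, y grows upward).
    text: str
    x: float
    y: float
    font_size: float


def runs_to_positioned_html(runs, page_width, page_height):
    """Emit each run as an absolutely positioned <span> so the page *looks* like the PDF."""
    parts = [
        f'<div class="page" style="position:relative;'
        f'width:{page_width}pt;height:{page_height}pt">'
    ]
    for run in runs:
        # Flip the y axis for HTML; subtracting font_size roughly moves
        # from the baseline to the top of the glyph box.
        top = page_height - run.y - run.font_size
        parts.append(
            f'<span style="position:absolute;left:{run.x}pt;top:{top}pt;'
            f'font-size:{run.font_size}pt">{escape(run.text)}</span>'
        )
    parts.append("</div>")
    return "\n".join(parts)
```

For example, `runs_to_positioned_html([TextRun("Name", 72, 720, 12)], 612, 792)` places the span at `left:72pt;top:60pt`. The output looks right in a browser, but a table is still just a pile of unrelated spans, which is exactly the structure problem described below.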
When a PDF file displays a table, for example, it is very easy for a human to see it and understand that this is indeed a table with some data in it. In the PDF file itself, however, it is just a collection of vector lines with some text runs in between.
As a result, when this content is converted to HTML, the converter does not know that it should produce a table; it sees only vector art and positioned text.
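Recovering the table means guessing it back from geometry. The sketch below shows a naive version of that guess: treat horizontal and vertical line segments as ruling lines, build a grid from their coordinates, and drop each text run into the cell that contains it. It is only an illustration under strong assumptions (hypothetical segment and run formats, coordinates already in a top-left/y-down system, a complete grid, no merged cells), not how any particular tool does it.

```python
from bisect import bisect_right
from dataclasses import dataclass
from html import escape


@dataclass
class TextRun:
    # Hypothetical: text plus its position, top-left origin, y grows downward.
    text: str
    x: float
    y: float


def split_rulings(segments, tol=0.5):
    """Split (x0, y0, x1, y1) line segments into sorted horizontal y's and vertical x's."""
    h = sorted({round(y0, 1) for x0, y0, x1, y1 in segments if abs(y0 - y1) <= tol})
    v = sorted({round(x0, 1) for x0, y0, x1, y1 in segments if abs(x0 - x1) <= tol})
    return h, v


def grid_to_html_table(segments, runs):
    """Assign each run to the grid cell it falls in and emit a plain HTML table."""
    h_lines, v_lines = split_rulings(segments)
    n_rows, n_cols = len(h_lines) - 1, len(v_lines) - 1
    cells = [[[] for _ in range(n_cols)] for _ in range(n_rows)]
    for run in runs:
        r = bisect_right(h_lines, run.y) - 1
        c = bisect_right(v_lines, run.x) - 1
        if 0 <= r < n_rows and 0 <= c < n_cols:  # ignore text outside the grid
            cells[r][c].append(run.text)
    rows = []
    for row in cells:
        tds = "".join(f"<td>{escape(' '.join(cell))}</td>" for cell in row)
        rows.append(f"<tr>{tds}</tr>")
    return "<table>" + "".join(rows) + "</table>"
```

With three horizontal lines at y = 0, 20, 40 and three vertical lines at x = 0, 50, 100, the runs "Name", "Age", "Ann", "42" come out as a 2 x 2 table. Real documents break this quickly: borderless tables, merged cells, decorative lines and rotated pages all need far more sophisticated logic.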
This is just one example of why this is difficult. There are many others that can be used to illustrate this point.
On the other hand, there is such a thing as "Tagged PDF": a PDF in which structure elements are actually defined, so extraction is fairly easy. However, tagged PDF files are not as common as we would like, and in most cases you won't be guaranteed to work with one.
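When the structure is already in the file, conversion becomes mostly a mapping exercise. The sketch below assumes the structure tree has already been read into a hypothetical nested form, `(tag, [children])` with strings as text leaves; how you obtain that tree depends entirely on the PDF library you use. The tag names are standard PDF structure types, and the HTML element chosen for each is a simplification (for example, `L` always becomes `<ul>` here even if the list is numbered).

```python
from html import escape

# Some standard PDF structure types and a reasonable HTML counterpart for each.
TAG_MAP = {
    "Document": "div", "Sect": "section", "P": "p",
    "H1": "h1", "H2": "h2",
    "Table": "table", "TR": "tr", "TH": "th", "TD": "td",
    "L": "ul", "LI": "li", "Figure": "figure",
}


def struct_to_html(node):
    """Recursively convert a (tag, children) structure tree into HTML."""
    if isinstance(node, str):
        return escape(node)  # text leaf
    tag, children = node
    html_tag = TAG_MAP.get(tag, "span")  # unknown types fall back to a neutral inline element
    inner = "".join(struct_to_html(child) for child in children)
    return f"<{html_tag}>{inner}</{html_tag}>"


tree = ("Document", [
    ("H1", ["Report"]),
    ("Table", [
        ("TR", [("TH", ["Name"]), ("TH", ["Age"])]),
        ("TR", [("TD", ["Ann"]), ("TD", ["42"])]),
    ]),
])
print(struct_to_html(tree))
```

Because the tags already say "this is a table row", no geometry has to be interpreted at all, which is why tagged files are so much easier to convert.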
There are some tools on the market that use sophisticated logic to infer the structure of an untagged document. Some of them do a better job than others at this.
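As a small taste of what that inference involves, one of the simplest building blocks is reconstructing lines of text, since a PDF content stream does not have to draw text in reading order. The sketch below clusters hypothetical `TextRun` objects by baseline within a tolerance and orders each line left to right; the tolerance value is an arbitrary assumption, and real tools combine many such heuristics (columns, paragraphs, headings, tables).

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    # Hypothetical: text with its baseline position, top-left origin, y grows downward.
    text: str
    x: float
    y: float


def group_into_lines(runs, y_tol=2.0):
    """Cluster runs whose baselines are within y_tol of a line's first run."""
    lines = []
    for run in sorted(runs, key=lambda r: (r.y, r.x)):
        if lines and abs(lines[-1][0].y - run.y) <= y_tol:
            lines[-1].append(run)
        else:
            lines.append([run])
    # Within each line, restore left-to-right reading order.
    return [" ".join(r.text for r in sorted(line, key=lambda r: r.x)) for line in lines]
```

For instance, runs drawn in the order "world" (x=60, y=100.5), "Hello" (x=10, y=100), "Next" (x=10, y=115) come back as `["Hello world", "Next"]`.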
Both of them are commercial solutions. However, the library cannot create an HTML file by itself, so that part will have to be implemented outside of the library.
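The part "implemented outside of the library" can be as thin as wrapping whatever HTML fragments you generate (positioned text, reconstructed tables, images) in a complete document. A minimal sketch, assuming the fragments are already valid HTML strings; `write_html_document` is a hypothetical helper, not part of any PDF library.

```python
from html import escape
from pathlib import Path


def write_html_document(fragments, title, path=None):
    """Wrap HTML fragments in a minimal HTML5 page; optionally write it as UTF-8."""
    body = "\n".join(fragments)
    doc = (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n<head>\n<meta charset="utf-8">\n'
        f"<title>{escape(title)}</title>\n</head>\n"
        f"<body>\n{body}\n</body>\n</html>\n"
    )
    if path is not None:
        Path(path).write_text(doc, encoding="utf-8")
    return doc
```

Declaring `charset="utf-8"` and writing with the same encoding avoids mojibake for non-ASCII text pulled out of the PDF.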
Asked by Arunkumar S.
Answered by Vel Genov. Comment: "Thanks a lot."