---
title: PDF to txt
author:
content:
done_by:
created_date: 2021-12-02 10:16:34
updated_date: 2021-12-02 10:16:34
---

## pdfminer.six

- link: [https://github.com/pdfminer/pdfminer.six](https://github.com/pdfminer/pdfminer.six)
- languages: Python
- intro: Pdfminer.six is a community-maintained fork of the original PDFMiner. It is a tool for extracting information from PDF documents, with a focus on getting and analyzing text data. Pdfminer.six extracts the text of a page directly from the source code of the PDF. It can also be used to get the exact location, font, or color of the text.

## pdf-extract

- link: [https://github.com/CrossRef/pdfextract](https://github.com/CrossRef/pdfextract)
- languages: Ruby
- intro: A tool and library that can extract various areas of text from a PDF, especially a scholarly article PDF. It performs structural analysis to determine column bounds, headers, footers, sections, titles and so on. It can analyse and categorise sections into reference and non-reference sections, and can split reference sections into individual references.
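
For the Python tool in this list, pdfminer.six, a PDF-to-txt conversion might look like the minimal sketch below. It assumes pdfminer.six is installed (`pip install pdfminer.six`) and uses its `extract_text` helper from `pdfminer.high_level`. The function names `default_txt_path` and `pdf_to_txt` are hypothetical and exist only for this illustration.

```python
from pathlib import Path


def default_txt_path(pdf_path):
    """Return the PDF path with its final suffix swapped to .txt."""
    return Path(pdf_path).with_suffix(".txt")


def pdf_to_txt(pdf_path, txt_path=None):
    """Extract all text from a PDF with pdfminer.six and save it as UTF-8 text.

    Returns the path of the written .txt file.
    """
    # Imported lazily so this module still loads when pdfminer.six is absent.
    from pdfminer.high_level import extract_text

    out_path = Path(txt_path) if txt_path else default_txt_path(pdf_path)
    text = extract_text(str(pdf_path))  # text of all pages as one string
    out_path.write_text(text, encoding="utf-8")
    return out_path
```

Calling `pdf_to_txt("paper.pdf")` (with `paper.pdf` standing in for any local file) would write `paper.txt` next to the input. Scanned PDFs that contain only images will likely produce little or no text, since pdfminer.six reads text from the PDF itself rather than performing OCR.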