---
title: PDF to txt
author:
content:
done_by:
created_date: 2021-12-02 10:16:34
updated_date: 2021-12-02 10:16:34
---

## pdfminer.six

- link: [https://github.com/pdfminer/pdfminer.six](https://github.com/pdfminer/pdfminer.six)
- languages: Python
- intro: pdfminer.six is a community-maintained fork of the original PDFMiner, a tool for extracting information from PDF documents.
  It focuses on getting and analyzing text data, and extracts the text of a page directly from the source code of the PDF.
  It can also be used to get the exact location, font, or color of the text.
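
The extraction described above can be sketched in Python as follows. This is a minimal sketch, assuming pdfminer.six is installed (`pip install pdfminer.six`). The helper names `txt_path_for`, `pdf_to_txt`, and `iter_text_boxes` are made up for illustration; only `extract_text`, `extract_pages`, and `LTTextContainer` come from the library.

```python
from pathlib import Path


def txt_path_for(pdf_path):
    """Return the output path: same location and name as the PDF, with a .txt suffix."""
    return Path(pdf_path).with_suffix(".txt")


def pdf_to_txt(pdf_path):
    """Extract all text from a PDF and write it next to the source file."""
    # Imported lazily so txt_path_for() works without pdfminer.six installed.
    from pdfminer.high_level import extract_text

    out_path = txt_path_for(pdf_path)
    out_path.write_text(extract_text(str(pdf_path)), encoding="utf-8")
    return out_path


def iter_text_boxes(pdf_path):
    """Yield (page_number, bbox, text) for every text box found by layout analysis."""
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    for page_number, page_layout in enumerate(extract_pages(str(pdf_path)), start=1):
        for element in page_layout:
            if isinstance(element, LTTextContainer):
                # bbox is (x0, y0, x1, y1) in PDF points, origin at bottom-left.
                yield page_number, element.bbox, element.get_text()
```

Calling `pdf_to_txt("paper.pdf")` would write `paper.txt` alongside the input (the file name here is a placeholder). `iter_text_boxes` is the starting point when the position of each text block matters, not just its content.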

## pdf-extract

- link: [https://github.com/CrossRef/pdfextract](https://github.com/CrossRef/pdfextract)
- languages: Ruby
- intro: A tool and library that can extract various areas of text from a PDF, especially a scholarly article PDF. It performs structural
  analysis to determine column bounds, headers, footers, sections, titles and so on. It can analyse and categorise sections into
  reference and non-reference sections and can split reference sections into individual references.