| Field | Value |
|---|---|
| Crates.io | rieltor_parser |
| lib.rs | rieltor_parser |
| version | 0.1.4 |
| created_at | 2024-11-20 19:33:32.922443+00 |
| updated_at | 2025-03-28 22:12:39.099577+00 |
| description | A parser for extracting detailed apartment information from the rieltor.ua website's HTML. |
| homepage | https://github.com/denisinside/rieltor_parser |
| repository | https://github.com/denisinside/rieltor_parser |
| max_upload_size | |
| id | 1455187 |
| size | 1,386,844 |
Rieltor.ua Apartment Parser is a Rust tool for parsing apartment listings from the Rieltor.ua website. It extracts detailed information from each listing and converts it into structured JSON, suitable for further analysis or for integration with other applications.
This parser uses a custom grammar (grammar.pest, written for the pest parser generator) to tokenize Rieltor.ua HTML pages and extract apartment data from them.
Input:
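The parser processes either:
- a saved HTML file of a Rieltor.ua page, or
- a Rieltor.ua URL, whose HTML is fetched before parsing.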
Tokenization:
The grammar file defines the rules that split an apartment listing into tokens such as id, price, address, and description.
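To make the idea of splitting text into tokens concrete, here is a minimal hand-written sketch of one such step: turning a price string into a number token and a currency token. It does not use pest and is not the crate's code. The input format "34 000 грн" (number first, currency marker after) and the `Usd`/`Eur` variants with their `$`/`€` markers are assumptions; only the `Uah` value and the `price_number`/`currency` keys come from the JSON output shown in this README.

```rust
// Hypothetical sketch, NOT the crate's pest grammar: one tokenization step
// that splits an already-isolated price string such as "34 000 грн"
// into a number token and a currency token.

#[derive(Debug, PartialEq)]
enum Currency {
    Uah,
    Usd, // assumed variant
    Eur, // assumed variant
}

#[derive(Debug, PartialEq)]
struct Price {
    price_number: u64,
    currency: Currency,
}

/// Splits "<digits and spaces><currency marker>" into a Price.
fn tokenize_price(text: &str) -> Option<Price> {
    let text = text.trim();
    // Number token: the leading run of digits and whitespace
    // (char::is_whitespace also covers non-breaking spaces).
    let split_at = text
        .find(|c: char| !(c.is_ascii_digit() || c.is_whitespace()))
        .unwrap_or(text.len());
    let (number_part, currency_part) = text.split_at(split_at);
    let digits: String = number_part.chars().filter(|c| c.is_ascii_digit()).collect();
    let price_number = digits.parse::<u64>().ok()?;
    // Currency token: the marker strings are assumptions.
    let currency = match currency_part.trim() {
        s if s.starts_with("грн") => Currency::Uah,
        s if s.starts_with('$') => Currency::Usd,
        s if s.starts_with('€') => Currency::Eur,
        _ => return None,
    };
    Some(Price { price_number, currency })
}

fn main() {
    match tokenize_price("34 000 грн") {
        Some(price) => println!("{:?}", price),
        None => println!("not a price"),
    }
}
```

A grammar-based parser expresses the same split declaratively as rules (a number rule followed by a currency rule), which is easier to extend than hand-written string slicing.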
Example of token breakdown:
Output:
The parsed data is saved in a structured JSON format, which looks like this:
```json
{
  "id": "11639857",
  "price": {
    "price_number": 34000,
    "currency": "Uah"
  },
  "address": {
    "street": "Менделєєва вул.",
    "house_number": "1111",
    "city": "Київ",
    "district": "Печерський р-н"
  },
  "characteristics": {
    "room_count": 2,
    "area": {
      "total": 36.0,
      "living": 15.0,
      "kitchen": 17.0
    },
    "floor": 3,
    "max_floor": 6,
    "house_type": "Бетонно монолітний",
    "room_planning": "Роздільне",
    "state": "Дизайнерський ремонт",
    "statistics": {
      "renewed": "3 дні тому",
      "published": "3 міс. тому",
      "views": {
        "total": 94,
        "today": 1,
        "yesterday": 1
      }
    }
  },
  "description": {
    "advert_description": "Без комісії! Довгострокова оренда квартири...",
    "details_description": "Будинок - Бетонно монолітний, в квартирі 2 кімнати..."
  },
  "permits": {
    "premium_advert": false,
    "short_period": false,
    "commission": {
      "commission_rate": 50,
      "commission_price": null
    },
    "allow_children": true,
    "allow_pets": false,
    "bargain": false
  },
  "infrastructure": {
    "subway_station": [
      {
        "name": "Наукова",
        "line": "Green"
      }
    ],
    "landmarks": [],
    "residential_complex": null
  },
  "rieltor": {
    "rieltor_name": "Малишко Максим",
    "rieltor_phone_number": "0991232323",
    "rieltor_position": "Рієлтор",
    "rieltor_agency": "SoftyMeow"
  },
  "photo": [
    "https://img.lunstatic.net/rieltor-offer-1600x1200/offers/411/11/1/????.jpeg",
    "https://img.lunstatic.net/rieltor-offer-1600x1200/offers/422/22/2/????.jpeg",
    "https://img.lunstatic.net/rieltor-offer-1600x1200/offers/433/33/3/????.jpeg"
  ]
}
```
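Several keys in the output can be `null` (for example `commission_price` and `residential_complex`). The sketch below shows one way such a record could be modelled in Rust, using `Option<T>` for nullable fields, with a hand-written serializer standing in for a library such as serde so the example needs only the standard library. The type names and the assumption that Rust field names mirror the JSON keys are illustrative, not taken from the crate.

```rust
// Hypothetical sketch: the "permits" object from the sample output modelled
// as Rust types. Nullable JSON keys become Option<T>; serialization is
// hand-written (std only) instead of using a library such as serde.

struct Commission {
    commission_rate: Option<u32>,  // 50 in the sample; presumably a percentage
    commission_price: Option<u64>, // null in the sample
}

struct Permits {
    premium_advert: bool,
    short_period: bool,
    commission: Commission,
    allow_children: bool,
    allow_pets: bool,
    bargain: bool,
}

/// Renders an Option as a JSON value, using `null` for None.
fn opt_json<T: std::fmt::Display>(value: &Option<T>) -> String {
    match value {
        Some(v) => v.to_string(),
        None => "null".to_string(),
    }
}

impl Commission {
    fn to_json(&self) -> String {
        format!(
            "{{\"commission_rate\":{},\"commission_price\":{}}}",
            opt_json(&self.commission_rate),
            opt_json(&self.commission_price)
        )
    }
}

impl Permits {
    fn to_json(&self) -> String {
        format!(
            "{{\"premium_advert\":{},\"short_period\":{},\"commission\":{},\"allow_children\":{},\"allow_pets\":{},\"bargain\":{}}}",
            self.premium_advert,
            self.short_period,
            self.commission.to_json(),
            self.allow_children,
            self.allow_pets,
            self.bargain
        )
    }
}

fn main() {
    // Values copied from the "permits" object in the sample JSON.
    let permits = Permits {
        premium_advert: false,
        short_period: false,
        commission: Commission { commission_rate: Some(50), commission_price: None },
        allow_children: true,
        allow_pets: false,
        bargain: false,
    };
    println!("{}", permits.to_json());
}
```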
parse - Parses a single apartment page, given as an HTML file or a URL, and outputs JSON data.
Arguments:
- `<source>` - Path to the HTML file, or the URL, to parse.
- `<output>` - Path of the JSON file in which to save the parsed result. Optional: if omitted, a file name is generated automatically.

Examples:
```shell
cargo run parse https://rieltor.ua/flats-rent/view/12345678 apartment.json
cargo run parse fetched_apartment.html
```
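Since `<source>` may be either a URL or a local file and `<output>` may be omitted, a tool like this needs to tell the two kinds of source apart and pick a default file name. The following is a sketch under stated assumptions: the helper names `classify_source`, `listing_id`, and `default_output_name`, and the naming scheme `apartment_<id>.json`, are hypothetical and are not the crate's actual behaviour. Only the URL shape `https://rieltor.ua/flats-rent/view/<id>` comes from the example above.

```rust
// Hypothetical helpers (not the crate's API): decide whether <source> is a
// URL or a local file, and derive a default output file name from it.

use std::path::Path;

#[derive(Debug, PartialEq)]
enum Source<'a> {
    Url(&'a str),
    File(&'a str),
}

fn classify_source(source: &str) -> Source<'_> {
    if source.starts_with("http://") || source.starts_with("https://") {
        Source::Url(source)
    } else {
        Source::File(source)
    }
}

/// Returns the numeric listing id that follows "/view/" in a listing URL.
fn listing_id(url: &str) -> Option<&str> {
    let rest = url.split("/view/").nth(1)?;
    // Stop at the next path separator, query string, or fragment.
    let id = rest.split(|c: char| c == '/' || c == '?' || c == '#').next()?;
    if !id.is_empty() && id.chars().all(|c| c.is_ascii_digit()) {
        Some(id)
    } else {
        None
    }
}

/// Hypothetical naming scheme used when <output> is omitted.
fn default_output_name(source: &str) -> String {
    match classify_source(source) {
        Source::Url(url) => match listing_id(url) {
            Some(id) => format!("apartment_{id}.json"),
            None => "apartment.json".to_string(),
        },
        Source::File(path) => {
            // Reuse the input file's stem and swap the extension for .json.
            let stem = Path::new(path)
                .file_stem()
                .and_then(|s| s.to_str())
                .unwrap_or("apartment");
            format!("{stem}.json")
        }
    }
}

fn main() {
    for source in [
        "https://rieltor.ua/flats-rent/view/12345678",
        "fetched_apartment.html",
    ] {
        println!("{source} -> {}", default_output_name(source));
    }
}
```

Checking only the scheme prefix keeps the URL test cheap; anything else is treated as a file path and left for the file-opening step to reject if it does not exist.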
parse_list - Parses a page listing several apartments, given as an HTML file or as a URL whose HTML is fetched, and displays their contents.
Arguments:
- `<source>` - Path to the HTML file, or the URL, to parse.
- `<output>` - Path of the directory in which to save the parsed results. Optional: if omitted, a directory name is generated automatically inside the project's output directory.

Examples:
```shell
cargo run parse_list "https://rieltor.ua/poltava/flats-rent/?price_min=8750&price_max=15000"
cargo run parse_list fetched_apartment_list.html
```
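A list page mainly points to individual listings, so one plausible first step for `parse_list` is to collect the listing links it contains and then parse each one as with `parse`. The sketch below assumes that listing links on a list page use the same `/flats-rent/view/<id>` path as the single-listing example above; the HTML snippet and the `listing_ids` helper are invented for illustration, and the crate's grammar may locate listings differently.

```rust
// Hypothetical sketch: collect listing ids from a list page by scanning the
// HTML for links of the form /flats-rent/view/<digits>. The markup below is
// invented for illustration; it is not Rieltor.ua's real page structure.

/// Returns listing ids in order of first appearance, without duplicates.
fn listing_ids(html: &str) -> Vec<String> {
    const MARKER: &str = "/flats-rent/view/";
    let mut ids: Vec<String> = Vec::new();
    let mut rest = html;
    while let Some(pos) = rest.find(MARKER) {
        // MARKER is ASCII, so this slice always starts on a char boundary.
        rest = &rest[pos + MARKER.len()..];
        let id: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
        if !id.is_empty() && !ids.contains(&id) {
            ids.push(id);
        }
    }
    ids
}

fn main() {
    let html = r#"
        <a href="https://rieltor.ua/flats-rent/view/11111111">Flat A</a>
        <a href="https://rieltor.ua/flats-rent/view/22222222">Flat B</a>
        <a href="https://rieltor.ua/flats-rent/view/11111111">Flat A again</a>
    "#;
    for id in listing_ids(html) {
        println!("https://rieltor.ua/flats-rent/view/{id}");
    }
}
```

Deduplicating matters because a listing card often links to the same page more than once (photo and title, for example); the linear `contains` check is fine for the few dozen listings on one page.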
credits - Shows credits and authorship information.

help - Displays this help information.

The parsing logic relies on the grammar file (grammar.pest) to extract and organize the data. The grammar breaks the HTML down into tokens such as price, address, and characteristics for structured processing.
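As an illustration of how a characteristics token could be turned into structured values, the sketch below splits an area string into the `total`, `living`, and `kitchen` fields that appear under `"area"` in the JSON output. The input format "36 / 15 / 17 м²" is an assumption about how the page presents areas, and `parse_area` is a hypothetical helper, not part of the crate.

```rust
// Hypothetical sketch of one "characteristics" token: splitting an area
// string such as "36 / 15 / 17 м²" (an assumed format) into the total,
// living and kitchen values found under "area" in the JSON output.

#[derive(Debug, PartialEq)]
struct Area {
    total: f64,
    living: f64,
    kitchen: f64,
}

fn parse_area(text: &str) -> Option<Area> {
    // Drop the unit, then expect exactly three '/'-separated numbers.
    let numbers = text.trim().trim_end_matches("м²").trim();
    let parts: Vec<f64> = numbers
        .split('/')
        // Accept a decimal comma as well as a decimal point.
        .map(|p| p.trim().replace(',', ".").parse::<f64>())
        .collect::<Result<Vec<f64>, _>>()
        .ok()?;
    match parts.as_slice() {
        [total, living, kitchen] => Some(Area {
            total: *total,
            living: *living,
            kitchen: *kitchen,
        }),
        _ => None,
    }
}

fn main() {
    match parse_area("36 / 15 / 17 м²") {
        Some(area) => println!("{:?}", area),
        None => println!("unrecognized area format"),
    }
}
```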
Here’s a visual representation of the grammar:
Apartment:
This project is intended solely for personal and educational use. It must not be used for commercial purposes or in violation of Rieltor.ua's terms of service.
This tool is designed specifically for parsing the structure of Rieltor.ua's website. Changes to the website's structure may require updates to the parsing logic.