this post was submitted on 10 Nov 2024


In Belgium, real estate listings mostly omit addresses. This is extremely annoying for anyone looking to buy or rent, because you are forced to engage with the landlord or seller just to find out the address, which wastes a lot of time. You must register on a site and disclose your email address, then wait for someone to reply with the address (often they do not, or they want to speak on the phone and hear your voice, which can go badly if you don’t speak the local language)†.

The published listings tend to disclose only the approximate neighborhood of the dwelling (useless for my needs, because you cannot tell whether it is near a relevant tram stop). But there are exceptions: roughly 5-10% of listings have an address. I decided to ignore the majority of listings and consider only those with an address. To get a decent number of choices, I therefore had to scrape every real estate site that covers my city and harvest just the listings with addresses.
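The filtering step can be sketched roughly like this. It is only an illustration, not the tool I actually ran: the `Listing` type, the example URLs, and the regex are all made up for this sketch, and the regex assumes addresses are written as "street number, 4-digit postcode city", which real listings will not always follow.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed address shape: "<street> <house number>, <4-digit postcode> <city>".
# This is a heuristic for illustration; real listings vary a lot.
ADDRESS_RE = re.compile(r"\D+\s\d{1,4}[A-Za-z]?\s*,\s*\d{4}\s+\D+")


@dataclass
class Listing:
    url: str                  # hypothetical field: where the listing was scraped from
    location: Optional[str]   # hypothetical field: whatever location text the site shows


def has_street_address(location: Optional[str]) -> bool:
    """True if the location text looks like a full street address."""
    return bool(location) and ADDRESS_RE.search(location) is not None


def with_address(listings: list[Listing]) -> list[Listing]:
    """Keep only the listings that disclose a street address."""
    return [l for l in listings if has_street_address(l.location)]


scraped = [
    Listing("https://example.com/1", "Avenue Louise 120B, 1050 Ixelles"),
    Listing("https://example.com/2", "Ixelles (near Flagey)"),
    Listing("https://example.com/3", "1050 Ixelles"),
    Listing("https://example.com/4", None),
]
usable = with_address(scraped)
```

Only the first listing survives here; the neighborhood-only and postcode-only ones are dropped, which matches the "ignore the majority" decision above.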

Then I used a geocoding API to convert the addresses to GPS coords. From there, I scraped the public transport websites. For every listed address, the tool would grab all weekday public transport routes from that GPS fix, including trams and transfer times. Then it calculated the walk time at both ends (to and from the stops) on every route to derive the shortest door-to-door time.
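A minimal sketch of the door-to-door calculation, under stated assumptions: the `Route` type and its fields are hypothetical, the coordinates are taken as already geocoded, walking is approximated by straight-line (haversine) distance at an assumed 4.8 km/h, and `ride_min` is assumed to already include transfer waits as reported by the transport site. The real tool may well have differed on all of these.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000   # mean Earth radius
WALK_SPEED_KMH = 4.8         # assumed walking speed

Point = tuple[float, float]  # (latitude, longitude) in degrees


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in meters."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def walk_minutes(a: Point, b: Point) -> float:
    """Straight-line walking time estimate, in minutes."""
    return haversine_m(a, b) / 1000 / WALK_SPEED_KMH * 60


@dataclass
class Route:
    board: Point     # stop where the trip starts
    alight: Point    # stop where the trip ends
    ride_min: float  # minutes on transit, transfers included


def door_to_door(home: Point, work: Point, routes: list[Route]) -> float:
    """Shortest walk + ride + walk time over all candidate routes."""
    return min(
        (walk_minutes(home, r.board) + r.ride_min + walk_minutes(r.alight, work)
         for r in routes),
        default=math.inf,  # no route at all: treat as unreachable
    )


home, work = (50.8300, 4.3600), (50.8500, 4.3500)
routes = [
    Route(board=home, alight=work, ride_min=25.0),               # stops at both doors
    Route(board=(50.8400, 4.3600), alight=work, ride_min=20.0),  # faster ride, long walk
]
best = door_to_door(home, work, routes)
```

In this made-up example the second route has the shorter ride but loses once the walk of roughly 1.1 km to its first stop is added, which is exactly why the walk on both sides has to be counted.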

I also wanted to be within a certain cycling time from the center of the city, to ensure I don’t get too far from the center. That was calculated using an API.

The tool also accounted for the usual filters, like budget. I ended up selecting the dwelling with the shortest commute that still satisfied the proximity-to-center constraint.
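The final selection boils down to "filter by the hard constraints, then take the minimum commute". A rough sketch, where the `Candidate` fields, the function name, and the numbers are all invented for illustration and the commute and cycling times are assumed to have been computed already:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    url: str                    # hypothetical listing URL
    rent_eur: int               # monthly cost
    commute_min: float          # shortest door-to-door transit time
    cycle_to_center_min: float  # cycling time to the city center


def pick_best(candidates: list[Candidate],
              max_rent_eur: int,
              max_cycle_min: float) -> Optional[Candidate]:
    """Shortest commute among candidates within budget and cycling range."""
    eligible = [
        c for c in candidates
        if c.rent_eur <= max_rent_eur and c.cycle_to_center_min <= max_cycle_min
    ]
    # default=None covers the case where nothing satisfies the constraints.
    return min(eligible, key=lambda c: c.commute_min, default=None)


candidates = [
    Candidate("https://example.com/a", 900, commute_min=18, cycle_to_center_min=35),
    Candidate("https://example.com/b", 950, commute_min=24, cycle_to_center_min=15),
    Candidate("https://example.com/c", 1400, commute_min=12, cycle_to_center_min=10),
]
choice = pick_best(candidates, max_rent_eur=1000, max_cycle_min=20)
```

Here listing `a` is too far to cycle and `c` is over budget, so `b` wins despite having the longest commute of the three.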

The only problem with my approach was that my tool trusted the addresses, and one listing used a fake one. Some jackass published bogus info that led me to a place that was occupied and unavailable. When I called to ask “where are you?”, he said “down the street.. I gave an address that was close but incorrect”. WTF. It was far enough off to screw up the public transport option.

Anyway, this would have been impossible without scraping all those websites. I had freedom and power that’s denied to all other consumers, who are trapped in the UIs of the real estate sites. But the next time I need a dwelling, the tool will almost certainly be broken, given how rapidly websites change and how increasingly anti-bot they have become. I think I built that tool during the last moment when the web was relatively open access.

Everyone is generally forced to look for a place close to work. But close in terms of straight-line distance does not translate into a short tram commute, because the routes are chaotic. You could be fairly close and still need 2 or 3 transfers. One interesting thing I noticed was that a dwelling on the complete opposite side of the city was reasonable, because it was close to a train station with no need for transfers. Trains are the fastest, with far fewer stops. There are also express buses (fewer stops) alongside normal buses. So intuition is too inaccurate.

† The point of contact is often a real estate agent or property manager who has many listings. So if you call or write to ask for the addresses of many listings, the same person sees all your requests and ignores all of them, because they assume you are not serious. They think: what kind of person looks all over the place.. surely they only want to see one or two neighborhoods. So this bullshit blocks consumers from searching for a place to live in a way that accounts for public transport schedules. They want to force you to choose where to live based on everything other than the address.
