OK, well, Tumblr’s API structure bothers the hell out of me because they do not let you pass the username in as a parameter. You actually have to use it as a subdomain.
Which is really stupid, because it means that if you are trying to crawl through a large list of users you need to resolve every one of those subdomains, which puts extra load on both ends. An even bigger issue is that Tumblr lets usernames begin and end with a dash. So if my username were “mystupidusername-”, my URL would be mystupidusername-.tumblr.com, which is not RFC compliant (a hostname label may not start or end with a hyphen), and that means it’s an invalid hostname. If you try to ping it from a server, you will likely have issues resolving the domain.
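To make the hostname rule concrete, here is a minimal sketch of a validity check in Python. The function name `is_valid_hostname` is my own, and the rule it encodes is the usual RFC 952 / RFC 1123 reading (letters, digits, and hyphens, 1 to 63 characters per label, no hyphen at either end); real resolvers differ in how strictly they enforce it.

```python
import re

# One DNS label under the RFC 952 / RFC 1123 hostname rules:
# letters, digits and hyphens, 1-63 characters, no hyphen at either end.
LABEL_RE = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_hostname(hostname: str) -> bool:
    """Return True if every dot-separated label is a valid hostname label."""
    if hostname.endswith("."):
        hostname = hostname[:-1]  # tolerate a single trailing root dot
    if not hostname or len(hostname) > 253:
        return False
    return all(LABEL_RE.fullmatch(label) is not None
               for label in hostname.split("."))
```

With this check, `is_valid_hostname("mystupidusername-.tumblr.com")` comes back `False`, while the same name without the trailing dash passes.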
I spent an hour trying to figure out a way around this issue. It turns out Tumblr had a Twitter wrapper, which works like this
It’s decent but doesn’t support any of the page, max_id, or since_id params. Basically all they do is grab every Tumblr user’s last 200 updates and cache them in their little Twitter API wrapper. So it’s pretty much useless.
Then it dawned on me… Such a simple idea… Every time you request a URL, the hostname first needs to get resolved (you need to obtain the IP), and then you make a request to that IP, sending the required headers.
So all you actually need to do is resolve the tumblr.com domain and forge the Host header, and you can use any invalid domain you want.
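Here is a rough sketch of that idea in Python, using a raw socket so the two steps stay visible. The function names `build_request` and `fetch_with_host` are hypothetical, and the sketch assumes the server answers plain HTTP on port 80 and routes purely on the Host header; treat it as an illustration, not a finished client.

```python
import socket


def build_request(host_header: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request with an arbitrary Host header."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host_header}",   # the forged name; it is never resolved
        "Connection: close",      # ask the server to close when it is done
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_with_host(resolvable_name: str, host_header: str,
                    path: str = "/", port: int = 80,
                    timeout: float = 10.0) -> bytes:
    """Resolve a valid name, then send a request carrying a different Host."""
    ip = socket.gethostbyname(resolvable_name)  # step 1: get the IP
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        sock.sendall(build_request(host_header, path))  # step 2: send headers
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:          # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)       # raw response: status line, headers, body
```

A call would look something like `fetch_with_host("tumblr.com", "mystupidusername-.tumblr.com")`. Only tumblr.com ever goes through DNS, so the invalid subdomain never has to resolve.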