ArabyBot???


WebSavvy
23-06-2007, 17:23/05:23PM
I have no idea what this "bot" is, but it's not a compliant bot. It never asked for robots.txt and hit 3,000+ files across my entire site at a rate of 3-4 files per second.

Not only that, it's lying about who it is! It's sending a user-agent string that claims it's both FAST and Googlebot.
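
Both complaints (no robots.txt request, several hits per second) can be checked mechanically from an access log. A minimal Python sketch, assuming the log has already been parsed into (ip, unix_timestamp, path) tuples; the function name and the sample entries are made up for illustration and are not taken from anyone's real log:

```python
from collections import defaultdict

def summarize_clients(entries):
    """entries: iterable of (ip, unix_timestamp, path) tuples.

    Returns {ip: (fetched_robots, peak_requests_per_second)}.
    """
    fetched_robots = defaultdict(bool)
    per_second = defaultdict(lambda: defaultdict(int))
    for ip, ts, path in entries:
        # Ignore any query string when looking for the robots.txt request.
        if path.split("?", 1)[0] == "/robots.txt":
            fetched_robots[ip] = True
        # Bucket requests by whole second to get a crude rate.
        per_second[ip][int(ts)] += 1
    return {
        ip: (fetched_robots[ip], max(buckets.values()))
        for ip, buckets in per_second.items()
    }

# Hypothetical sample data, only to show the shape of the input.
sample = [
    ("209.85.31.217", 1000.1, "/a.html"),
    ("209.85.31.217", 1000.4, "/b.html"),
    ("209.85.31.217", 1000.9, "/c.html"),
    ("192.0.2.10", 1000.2, "/robots.txt"),
    ("192.0.2.10", 1005.0, "/index.html"),
]
report = summarize_clients(sample)
```

A client that shows `False` for robots.txt together with a peak of several requests per second matches the behaviour described here.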

From my logs today:

Host: 209.85.31.217
ArabyBot (compatible; Mozilla/5.0; GoogleBot; FAST Crawler 6.4; http://www.araby.com;)
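
A user-agent string that names several unrelated crawlers at once, like the one above, is easy to flag automatically. A rough Python sketch; the list of crawler names is an assumption chosen for this example, and a match only means "suspicious", not "proven fake":

```python
# Crawler names to look for; this list is an assumption for the example.
KNOWN_CRAWLERS = {
    "googlebot": "Googlebot",
    "fast crawler": "FAST Crawler",
    "arabybot": "ArabyBot",
}

def claimed_crawlers(user_agent):
    """Return the known crawler names mentioned in a user-agent string."""
    ua = user_agent.lower()
    return [name for key, name in KNOWN_CRAWLERS.items() if key in ua]

def looks_spoofed(user_agent):
    """A string naming more than one crawler is at least suspicious."""
    return len(claimed_crawlers(user_agent)) > 1

# The user-agent string quoted from the log above.
ua = ("ArabyBot (compatible; Mozilla/5.0; GoogleBot; "
      "FAST Crawler 6.4; http://www.araby.com;)")
suspicious = looks_spoofed(ua)
```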

g1smd
23-06-2007, 18:12/06:12PM
Not a lot of information on this one, but a few useful snippets can be found:

http://www.google.com/search?num=100&q=ArabyBot



Seems to be fairly new.

Quadrille
23-06-2007, 18:29/06:29PM
Ouch!

38 true results; 131 with 'similar' included, 40+ of which seem to refer to one forum thread at WMW, and about 25+ to one other page at another site.

WebSavvy
23-06-2007, 18:48/06:48PM
Considering it's sending false user-agent strings claiming to be Googlebot and FAST, and those programs are owned by Google and Inktomi, respectively, wouldn't this fall under electronic fraud?

Seems both Google and Ink could do something about it, no?

IncrediBILL
23-06-2007, 19:26/07:26PM
I've never seen it use an ArabyBot user agent, just the "Mozilla/5.0 (compatible; FAST Crawler 6.3)"

However, the reverse DNS is makbot1.araby.com
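
The reverse DNS lookup is what settles whether a "Googlebot" claim is genuine: Google documents that real Googlebot hosts resolve to names under googlebot.com or google.com, and that the name should resolve back to the same IP. A sketch of that forward-confirmed reverse DNS check in Python; the function name is made up, and the `reverse`/`forward` parameters exist only so the check can be tried with fake resolvers instead of live DNS:

```python
import socket

def verify_crawler_ip(ip, allowed_suffixes,
                      reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a claimed crawler IP."""
    try:
        # Step 1: IP -> hostname.
        host = reverse(ip)[0].lower().rstrip(".")
    except OSError:
        return False
    # Step 2: hostname must belong to one of the expected domains.
    if not any(host == s or host.endswith("." + s) for s in allowed_suffixes):
        return False
    try:
        # Step 3: hostname -> IPs, which must include the original IP.
        addresses = forward(host)[2]
    except OSError:
        return False
    return ip in addresses

GOOGLE_SUFFIXES = ("googlebot.com", "google.com")

# Fake resolver reproducing the lookup result reported in this thread.
def fake_reverse(ip):
    return ("makbot1.araby.com", [], [ip])

is_google = verify_crawler_ip("209.85.31.217", GOOGLE_SUFFIXES,
                              reverse=fake_reverse)
```

With the default arguments the function does real DNS lookups, so results depend on the network it runs on.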

Their IP range is:

Maktoob.com Search EVRY-388 (NET-209-85-31-192-1)
209.85.31.192 - 209.85.31.255

Maktoob.com doesn't seem to work, but you can find out a bit more about them on http://www.maktoobgroup.com/

Our vision is to lead the Arab Internet World in providing innovative and leading edge Arabic and English community, communication, content and e-commerce services and solutions.

From there I found a description of Araby in English:

The first search engine in the world that offers advanced Arabic-language capabilities to users worldwide. Araby.com has been programmed to crawl and search through tens of millions of Arabic documents on the internet and index each result as it comes up. When given a search command, the most relevant search results is displayed through very advanced algorithms and relevancy criteria, unlike other directories in the Arab world that search through a limited database of sites.

I tried the search page and it generated a SERVER 500 ERROR, so it's good stuff.

FWIW, it's hosted at Everyones Internet, which I block by default just because of all the junk they host.

WebSavvy
23-06-2007, 19:42/07:42PM
If it's limited to Arabic documents (supposedly) why's it crawling my ENGLISH-only website?

Thanks for the info, Bill. I'll just block their entire IP subnet right from server level.
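
For anyone who wants the range from Bill's post in subnet form before blocking it, Python's standard `ipaddress` module can do the conversion and the membership test. A small sketch; the helper name `is_blocked` is made up, and how the resulting network gets fed into a firewall or server config is left out:

```python
import ipaddress

# The range quoted above: 209.85.31.192 - 209.85.31.255
first = ipaddress.ip_address("209.85.31.192")
last = ipaddress.ip_address("209.85.31.255")

# Collapse the range into the smallest list of CIDR networks.
blocked = list(ipaddress.summarize_address_range(first, last))

def is_blocked(ip):
    """True if the address falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in blocked)

hit = is_blocked("209.85.31.217")  # the host from the original log entry
```

The range collapses to the single network 209.85.31.192/26, which is the form most server-level block lists expect.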

I've already added it to my blocked-bots list, and when I checked my logs five minutes ago it was still trying to access tons of pages even though it's getting served a 403 response.

g1smd
24-06-2007, 13:22/01:22PM
>> If it's limited to Arabic documents (supposedly) why's it crawling my ENGLISH-only website? <<

It has to crawl everything in order to then be able to select those that it actually needs to index.

IncrediBILL
24-06-2007, 15:45/03:45PM
Originally posted by g1smd
>> If it's limited to Arabic documents (supposedly) why's it crawling my ENGLISH-only website? <<

It has to crawl everything in order to then be able to select those that it actually needs to index.

While that may be true, it doesn't explain why it just keeps coming back even though it found nothing in Arabic the first 10 times it visited my site.

WebSavvy
24-06-2007, 16:53/04:53PM
That bot doesn't ask for robots.txt, which means it's:

1. Not compliant.

That bot identifies itself as two search engine robots it is NOT, Googlebot and FAST, which means the only thing it could be up to is:

2. Content theft.