View Full Version : psycheclone
rooandsue
03-07-2006, 19:53/07:53PM
[Split from thread (http://www.ihelpyou.com/forums/showthread.php?s=&threadid=22608)]
Originally posted by savvy1
Many of the ones listed below are complete site rippers, downloaders, scrapers, malware bots, and so on.
I am new to this... I was wondering what these "bad bots" do that warrants blocking them (i.e., what do "site rippers, downloaders, scrapers, and malware bots" do?). I just saw that someone from Digital Infinity Ltd. had accessed my site, so I googled the name and came across this posting. My site is primarily for posting family pics and video clips. Do I need to be worried about these "bad bots"?
Thanks,
Sharon
Connie
03-07-2006, 21:24/09:24PM
Hi Sharon and welcome. :hi:
I moved your post to this thread. The one you asked questions in is primarily for the listing of bad bots.
To answer a few of your questions:
Bad bots eat up your bandwidth, which can slow your site down or cost you more money in hosting fees.
They will crawl files that you do not want crawled.
They search your site for email addresses, which is one reason any site that displays an email address gets so many spam emails.
A bot that scrapes your site will include your content on another site.
WebSavvy
03-07-2006, 23:19/11:19PM
Hi Sharon, and welcome to IHY :hi:
As Connie said, that's exactly what "bad bots" do, and yes, it's something you need to be concerned about when running a website (especially if you don't want your site ripped off).
Scrapers will copy the content and put it up on their site without any mention of you or credit to you for your work. They rebrand it as their own and usually slap AdSense ads all the way around it to make money off of your work.
Downloaders will download your entire site and use parts (or all) of it in their own site, or for some distribution, etc.
Malware bots look for email addresses on your site to harvest them, or they look for exploitable files that would let them hack into your server.
"Bad bots" means specifically automated web robots that traverse the files on your website and do not follow the directives you've set out in your robots.txt file.
All bots are supposed to obey the robots.txt protocol. Those that do not are "rogue" bots (i.e., bad bots), and those are the ones you may wish to deny access.
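To make the robots.txt point concrete, here is a minimal Python sketch of the check a well-behaved bot is expected to run before fetching a page. The robots.txt rules and the `is_allowed` helper are hypothetical, made up for illustration; only the standard-library `urllib.robotparser` module is real. A bad bot simply skips this check, which is why robots.txt alone will not keep it out.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for illustration; not taken from any real site.
ROBOTS_TXT = """\
User-agent: psycheclone
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, path: str) -> bool:
    """Return True if the rules above permit user_agent to fetch path."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent, path in [
        ("psycheclone", "/index.htm"),
        ("Googlebot", "/index.htm"),
        ("Googlebot", "/private/notes.htm"),
    ]:
        print(agent, path, is_allowed(agent, path))
```

With these assumed rules, psycheclone is told to stay out of the whole site, while other bots are only asked to avoid /private/.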
rooandsue
04-07-2006, 00:52/12:52AM
Thank you for the explanation! I feel like there is so much to keep up with when maintaining a site. Every little tidbit of information helps!
Sharon
WebSavvy
04-07-2006, 01:03/01:03AM
I just found this on Digital Infinity:
http://blog.cihar.com/archives/2006/06/13/i_must_be_popular_in_digital_infinity/
Seems they've been hitting a lot of "image" sites lately. With this many downloads being reported across so many sites, it has to be a bad bot.
See if you can locate the referer info in your log files. If you're able to find it there, post it to this thread.
I'll see what else I can find out, and then add the lines to block it also.
Blue
04-07-2006, 14:42/02:42PM
Bad bot! BAD bad bot!!!
rooandsue
04-07-2006, 20:54/08:54PM
This is all there is in my log about this particular bot. I apologize for posting it all, but I didn't know exactly what you were looking for. These pages really don't compose that much of my site. Most of my pages are not linked to the homepage so that I don't have too many random people looking at my family site. Maybe you could tell me what you are looking for.
208.66.195.11 - - [01/Jul/2006:16:51:23 -0700] "GET /robots.txt HTTP/1.1" 404 510 "-" "psycheclone"
208.66.195.11 - - [01/Jul/2006:16:51:25 -0700] "GET / HTTP/1.1" 200 7383 "-" "psycheclone"
208.66.195.11 - - [01/Jul/2006:16:51:27 -0700] "GET /contactus-06.htm HTTP/1.1" 200 8738 "-" "psycheclone"
208.66.195.11 - - [01/Jul/2006:16:51:28 -0700] "GET /index.htm HTTP/1.1" 200 7383 "-" "psycheclone"
208.66.195.11 - - [01/Jul/2006:16:51:29 -0700] "GET /dearfriends-06.htm HTTP/1.1" 200 11558 "-" "psycheclone"
208.66.195.11 - - [01/Jul/2006:16:51:30 -0700] "GET /robby-06.htm HTTP/1.1" 200 10468 "-" "psycheclone"
208.66.195.11 - - [01/Jul/2006:16:51:31 -0700] "GET /sharon-06.htm HTTP/1.1" 200 10686 "-" "psycheclone"
208.66.195.11 - - [01/Jul/2006:16:51:32 -0700] "GET /littlerobby-06.htm HTTP/1.1" 200 10200 "-" "psycheclone"
208.66.195.11 - - [01/Jul/2006:16:51:33 -0700] "GET /jenna-06.htm HTTP/1.1" 200 10032 "-" "psycheclone"
208.66.195.11 - - [01/Jul/2006:16:51:34 -0700] "GET /adoption-story.htm HTTP/1.1" 200 11445 "-" "psycheclone"
208.66.195.11 - - [01/Jul/2006:16:51:34 -0700] "GET /photo1-06.htm HTTP/1.1" 200 10266 "-" "psycheclone"
208.66.195.11 - - [01/Jul/2006:16:51:35 -0700] "GET /ourfamilies-06.htm HTTP/1.1" 200 8811 "-" "psycheclone"
208.66.195.11 - - [01/Jul/2006:16:51:36 -0700] "GET /familyquiz-06.htm HTTP/1.1" 200 7333 "-" "psycheclone"
208.66.195.11 - - [01/Jul/2006:16:51:36 -0700] "GET /photo2-06.htm HTTP/1.1" 200 10564 "-" "psycheclone"
208.66.195.11 - - [01/Jul/2006:16:51:37 -0700] "GET /photo3-06.htm HTTP/1.1" 200 10576 "-" "psycheclone"
208.66.195.11 - - [01/Jul/2006:16:51:38 -0700] "GET /photo4-06.htm HTTP/1.1" 200 10488 "-" "psycheclone"
Thanks,
Sharon
WebSavvy
05-07-2006, 00:24/12:24AM
Hi Sharon, thanks.
What I was hoping to find from your logs was the user-agent string identifier used by Digital Infinity.
Blocking a bad bot by name isn't possible unless you have the exact user-agent string the bot uses to identify itself.
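For anyone wondering where that string lives: in log lines like the ones Sharon posted, the user-agent is the last quoted field. Below is a rough Python sketch that pulls it out and counts requests per user-agent. It assumes the Apache "combined" log layout seen in the excerpt above; the `LOG_PATTERN` regex and the `count_user_agents` helper are illustrative names, not part of any tool mentioned in this thread.

```python
import re
from collections import Counter

# Assumed layout: Apache "combined" format, where the final quoted
# field is the user-agent and the one before it is the referer.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Three lines copied from the log excerpt posted earlier in this thread.
SAMPLE_LOG = [
    '208.66.195.11 - - [01/Jul/2006:16:51:23 -0700] "GET /robots.txt HTTP/1.1" 404 510 "-" "psycheclone"',
    '208.66.195.11 - - [01/Jul/2006:16:51:25 -0700] "GET / HTTP/1.1" 200 7383 "-" "psycheclone"',
    '208.66.195.11 - - [01/Jul/2006:16:51:27 -0700] "GET /contactus-06.htm HTTP/1.1" 200 8738 "-" "psycheclone"',
]


def count_user_agents(lines):
    """Count requests per user-agent, skipping lines that do not match."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match:
            counts[match.group("agent")] += 1
    return counts


if __name__ == "__main__":
    for agent, hits in count_user_agents(SAMPLE_LOG).most_common():
        print(f"{hits:5d}  {agent}")
```

Run against a full log file, a tally like this makes it easy to spot an unfamiliar user-agent that requested many pages in a short time.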
I'll do some research and see what I can locate. Once I find the user-agent string for this particular bot, I'll add that to our blocked bad bots list in the other thread.
In the meantime, you have psycheclone at your site and it's a site ripper. You might want to block that from your site using the .htaccess codes (in the other thread).
If you're not sure how to do this, ask here and we'll help you with it.
If you're not comfortable with adding the codes, I can do it for you if you'd like.
I've done this for a few members here, as well as logged in and fixed their mod_rewrite, or their 301 in .htaccess.
All I'd need is the login details for your cPanel so I can edit the .htaccess file. These could be sent via PM.
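The .htaccess codes themselves are in the other thread and are not repeated here. Purely as an illustration of the same idea (refuse any request whose user-agent matches a blocked name), here is a small Python sketch written as a WSGI wrapper. It assumes a site served by a Python WSGI application, which is not Sharon's setup; `BLOCKED_AGENTS`, `block_bad_bots`, `demo_app`, and `status_for` are all hypothetical names.

```python
# Hypothetical block list; matching is a case-insensitive substring test.
BLOCKED_AGENTS = ("psycheclone",)


def block_bad_bots(app):
    """Wrap a WSGI app so requests from blocked user-agents get a 403."""
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(bad in agent for bad in BLOCKED_AGENTS):
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Anything not on the block list is passed through unchanged.
        return app(environ, start_response)
    return middleware


def demo_app(environ, start_response):
    """Stand-in application that answers every request with 200."""
    body = b"Hello"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def status_for(agent):
    """Return the status line the wrapped app sends for a given user-agent."""
    seen = []

    def start_response(status, headers):
        seen.append(status)

    block_bad_bots(demo_app)({"HTTP_USER_AGENT": agent}, start_response)
    return seen[0]


if __name__ == "__main__":
    print(status_for("psycheclone"))
    print(status_for("Mozilla/5.0"))
```

A block like this only works as long as the bot keeps announcing the same user-agent string, since the string is supplied by the bot itself.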
Hope this helps.
Connie
05-07-2006, 09:05/09:05AM
psycheclone was in my June logs too.
WebSavvy
07-07-2006, 09:54/09:54AM
Hey Connie, and Sharon ... don't know why this slipped my mind but, Digital Infinity is the psycheclone bot. See Gio's post here (http://www.ihelpyou.com/forums/showthread.php?s=&postid=234276#post234276).
Connie
07-07-2006, 10:26/10:26AM
I came across this discussion yesterday at WMW about psycheclone www.webmasterworld.com/forum11/3269.htm