Mystery clicks
harv
15-09-2001, 16:05/04:05PM
I've been puzzled for some time by the
exceptionally high number of hits to
a particular web page at my site,
which cannot be accounted for by the
searches made for it at the pay per click
search engines - the only place
where it can be accessed (it is not linked
from my site)
I've looked at the log files and a pattern
has emerged. Whenever the hit does not come
from a keyword search, the source of the hit
has an address beginning with 63.104.
eg.
63.104.196.106
63.104.194.138
In fact these addresses only ever appear with
this page.
Any ideas please ?
Thanks
Harvey
NB. If any technical expert wants to investigate
further I'll supply the URL and the log files.
ihelpyou
16-09-2001, 00:11/12:11AM
That is a backbone provider on the net called UUNet Technologies. They will ping different domains and servers of everyone on their network from time to time. Every backbone provider will do this. It is also possible they send out a spider.
harv
16-09-2001, 05:11/05:11AM
Hi Doug,
But why are they consistently picking on just the
one page out of several hundred - and one which
is not even reachable from my home page
(it only appears in the pay per click engines)?
Harvey
Spider Man
16-09-2001, 06:25/06:25AM
Does sound like an uptime monitor. Do the log files say it's a HEAD or GET request? And is there any referrer?
harv
16-09-2001, 07:18/07:18AM
Chris,
Here is a typical log entry
63.104.196.106 - - [02/Sep/2001:10:33:19 -0400] "GET /fw/368.htm HTTP/1.0" 304 - "-" "Mozillia/4.75 [en] (Windows 5.0; U)"
Harvey
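For anyone wanting to pull the fields out of an entry like the one above, here is a minimal Python sketch. It assumes the log is in the common "combined" format that the entry appears to follow; the names LOG_PATTERN and parse_log_line are made up for illustration.

```python
import re
from datetime import datetime

# Assumed layout (combined log format):
# host ident user [time] "request" status size "referrer" "user agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_log_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # Month names are parsed using the current locale (English assumed).
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry

# The entry quoted in the post above.
line = ('63.104.196.106 - - [02/Sep/2001:10:33:19 -0400] '
        '"GET /fw/368.htm HTTP/1.0" 304 - "-" '
        '"Mozillia/4.75 [en] (Windows 5.0; U)"')
entry = parse_log_line(line)
print(entry["method"], entry["path"], entry["status"], entry["referrer"])
```

So it is a GET with no referrer. The 304 status is "Not Modified", which a server normally sends in reply to a conditional request, so the client probably already holds a copy of the page from an earlier visit.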
ihelpyou
16-09-2001, 09:42/09:42AM
That IP resolves back to UUNet, which is a backbone provider. I do think that, for whatever reason, they are pinging that page. Hard to speculate as to why.
Spider Man
16-09-2001, 11:13/11:13AM
Well..looky at that. range 63.64.0.0 - 63.127.255.255 is indeed UUNet. Within that 63.104.192.0 - 63.104.199.255 seems to belong to Novell.
I've never seen the user agent before (even if Mozillia is Mozilla) and at that version it should probably be using HTTP/1.1 and not HTTP/1.0.
So it almost certainly is a program making the request, one that wants to appear to be a browser. Whilst it could be a backbone provider pinging, why would they do it with a fake user agent? The (experimental) spider theory looks more likely to me...but we can only guess.
Personally, if it was my site I'd set htaccess to deny 63.104.192, 63.104.193, ... 63.104.199. But then I'm a scaredy cat. Oh yeah, and their Certified Novell exams are way too hard and it would be vengeance. :-)
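As a rough check before blocking anything, the range 63.104.192.0 - 63.104.199.255 can be tested with Python's standard ipaddress module. This is only a sketch; the names SUSPECT_RANGE and in_suspect_range are made up for illustration.

```python
import ipaddress

# 63.104.192.0 - 63.104.199.255 written as a single CIDR block.
SUSPECT_RANGE = ipaddress.ip_network("63.104.192.0/21")

def in_suspect_range(address):
    """True if the dotted-quad address falls inside the range."""
    return ipaddress.ip_address(address) in SUSPECT_RANGE

# Confirm that the start/end addresses collapse to exactly one block.
blocks = list(ipaddress.summarize_address_range(
    ipaddress.ip_address("63.104.192.0"),
    ipaddress.ip_address("63.104.199.255"),
))
print(blocks)

# The two addresses from the first post, plus one just outside the range.
for address in ("63.104.196.106", "63.104.194.138", "63.104.200.1"):
    print(address, in_suspect_range(address))
```

Both addresses from the logs fall inside the block. If the server's access rules accept CIDR notation (an assumption worth checking for your setup), a single 63.104.192.0/21 entry would cover the same ground as the eight separate prefixes.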
harv
16-09-2001, 12:15/12:15PM
Two other items of information which may be of use
1) This 'pinging' occurs about 40 times a day
2) This page is only visible at 3 pay per click
search engines with one of the keywords
being "languages"
It is not linked from my site - this is typical
of over 100 other pages at my site.
One of these pay per click engines is Bay9.
The number of clicks I receive from Bay9 is woeful.
In one month it will typically be 0,1 or 2
hits at most for any keyword amongst all those
100 pages.
With one exception: Last month the number of
clicks on "languages" from Bay 9 was 44 !!
Coincidence ?
Spider Man
16-09-2001, 13:43/01:43PM
I don't know what click stats Bay9 give, so I do stress that I'm only speculating. This is theoretical and just one way things could have happened in this situation (think I've protected myself there...you get the point). However, in light of the jump I would be inclined to find out from Bay9 the IP addresses of whoever was clicking. I would also never pay per click with them myself, having found out the following:
# BEGIN SPECULATION
Assume that there is a rogue search engine. And assume that this search engine ignores robots.txt. Next assume that somebody has placed the link:
http://www.bay9.com/cgi-local/search.cgi?keyword=languages&link=&clicktrade=
somewhere on one of their pages and that the search engine has found it. Now suppose the search engine follows it. bay9's site does have a robots.txt:
User-agent: *
Disallow: /cgi-local/
Disallow: /bannerserv/
Which is why we assumed it was a rogue search engine: the rogue ignores this (robots.txt is not set in stone, it's just a request). It will get a collection of links to follow, similar to:
/cgi-local/click.cgi?owner=xxxxxx&keyword=Languages&link=&clicktrade=
I can telnet through to bay9's web server, type in a GET request manually and get the redirect page back (I owe somebody 1 cent by the way :-) ). Since I didn't send a user agent, it appears that bay9 don't do any user agent checking (so they don't check it's a browser); they simply rely on the robots.txt. Note that I can't confirm whether the account is charged or not, but there's no indication to say it isn't. Ouch.
So then the rogue robot takes the address from the redirect page and puts it in its submission queue. Et voilà, it has your page.
Once it has your page, of course it can revisit it, or the page that linked to it, as often as it wants. From observing and writing my own web crawler, revisits tend to be more frequent when the database is small.
# END SPECULATION
# BEGIN MAKING MENTAL NOTE NOT TO PAY PER CLICK WITH BAY9
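To show what a well-behaved crawler would do with those robots.txt rules, here is a small Python sketch using the standard urllib.robotparser module. The rules are typed in by hand in standard "Disallow:" syntax rather than fetched from the site, and "ExampleBot" is a made-up crawler name.

```python
from urllib.robotparser import RobotFileParser

# Rules as quoted in the post, entered by hand (nothing is fetched).
rules = [
    "User-agent: *",
    "Disallow: /cgi-local/",
    "Disallow: /bannerserv/",
]

parser = RobotFileParser()
parser.parse(rules)

search_url = ("http://www.bay9.com/cgi-local/search.cgi"
              "?keyword=languages&link=&clicktrade=")
home_url = "http://www.bay9.com/"

# A polite crawler asks before fetching; a rogue one simply skips this step.
search_allowed = parser.can_fetch("ExampleBot", search_url)
home_allowed = parser.can_fetch("ExampleBot", home_url)
print(search_allowed, home_allowed)
```

The check happens entirely on the crawler's side, which is the point of the speculation above: nothing on the server enforces it.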
harv
16-09-2001, 14:57/02:57PM
Hi Chris,
Thanks for your help in investigating this
- as a result I am going to let you off
that one cent you owe me.
I can grasp some - not all - of your
explanation. I will contact Bay9, but just
to clarify the position:
According to my logs
* The Bay9 entry was validly searched for
on the keyword "languages" 44 times last month
ie. about 1 or 2 times per DAY, and by
normal users (ie. not 63.104...)
And this is what I have been charged for on
my Bay9 account
* Access to that page was made by the Novell
agent some 40 times a DAY - and I have not
been charged for these hits.
So I am not concerned that these mystery clicks
are costing me. The key issues are:
* Why is this page so popular on Bay9?
* Why is it being accessed by the Novell agent?
And of course it seems that the two may be connected.
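A quick way to confirm the "40 times a DAY" figure from the raw logs is to tally hits per day for addresses starting with 63.104. The Python sketch below assumes one log entry per line in the format quoted earlier in the thread; the function name daily_hits is made up, and every sample line except the first is invented purely for illustration.

```python
from collections import Counter

def daily_hits(lines, prefix="63.104."):
    """Count hits per day for hosts whose address starts with prefix."""
    counts = Counter()
    for line in lines:
        host = line.split(" ", 1)[0]
        if not host.startswith(prefix):
            continue
        start = line.find("[")
        end = line.find(":", start)
        if start == -1 or end == -1:
            continue  # no timestamp found, skip the line
        counts[line[start + 1:end]] += 1  # e.g. "02/Sep/2001"
    return counts

AGENT = '"Mozillia/4.75 [en] (Windows 5.0; U)"'
sample = [
    # Real entry quoted earlier in the thread.
    '63.104.196.106 - - [02/Sep/2001:10:33:19 -0400] '
    '"GET /fw/368.htm HTTP/1.0" 304 - "-" ' + AGENT,
    # Everything below is hypothetical sample data.
    '63.104.194.138 - - [02/Sep/2001:11:02:41 -0400] '
    '"GET /fw/368.htm HTTP/1.0" 304 - "-" ' + AGENT,
    '10.0.0.7 - - [02/Sep/2001:11:05:00 -0400] '
    '"GET /index.htm HTTP/1.1" 200 5120 "-" "Mozilla/4.0"',
    '63.104.196.106 - - [03/Sep/2001:09:15:07 -0400] '
    '"GET /fw/368.htm HTTP/1.0" 304 - "-" ' + AGENT,
]

counts = daily_hits(sample)
for day, hits in counts.items():
    print(day, hits)
```

Run over a real log file (for example by passing an open file object as lines), this gives one count per day, which makes any sudden change in the visiting pattern easy to spot.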
Spider Man
16-09-2001, 16:26/04:26PM
So whatd'ya know - i pick a page at random and it happens to be yours! There goes the FREE bit :-)
No. I haven't been too clear. Sorry. What I'm saying is that, from my looking, it appears the Bay9 PPC system has no adequate protection against a search engine crawler following the links if that crawler chooses to ignore the robots.txt file, i.e. that is one potential way that this page could have been found. Obviously the same probably goes for the other two PPC engines you mentioned. In a worst case scenario the crawler could keep coming back and following the links (but you're clearly okay on that); in a better scenario it'll just come through once (i.e. a single address from that range in your logs, or in the logs from the other engines, tells you). No crawler should ever need to follow a link twice, but it might, hence I'd never use this PPC. If the address is there at all then that's how the page has been found.
There are numerous reasons why more people might have clicked on your PPC link. It is not necessarily, and in my opinion probably not, related to how they found your page.
As for being accessed by the Novell agent, it will be hard to tell why they are doing it: it could be e.g. a test search engine, an uptime monitor, a web page change monitor etc. I'd just block them in your htaccess and hope they give up.
If you give me the log files I'll take a look and see if anything can be got from them, but I'd have thought it unlikely.
JuniorHarris
17-09-2001, 11:50/11:50AM
Any possibility this could be related to a bid monitoring tool?