Topic: Updating large projects
Because many of our customers are having problems updating large projects, I decided to post this information here so that everyone can read it before asking a related question.
Google, Yahoo and other search engines have implemented a system to protect themselves from automated querying. Regardless of the tool you use, there is no way around this system other than playing nice. And we can't do anything about it, because it runs on their servers.
To comply with these restrictions, Advanced Web Ranking has a "human emulation" feature that adds a random delay between queries so that the search engines do not ban you. This does not mean our application is slow: the delays are intentional, and you can see them in the Performance tab of the application options.
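To make the idea concrete, here is a minimal sketch of a randomized delay between consecutive queries. It is only an illustration, not how Advanced Web Ranking implements human emulation: the function names, the 15 to 30 second range and the `send` callback are all hypothetical, and `sleep` is injectable so the example runs instantly.

```python
import random
import time

def next_query_delay(min_seconds: float, max_seconds: float, rng: random.Random) -> float:
    """Pick a random pause (in seconds) to wait before the next query."""
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("need 0 <= min_seconds <= max_seconds")
    return rng.uniform(min_seconds, max_seconds)

def run_queries(queries, send, min_seconds, max_seconds, rng=None, sleep=time.sleep):
    """Send each query, pausing a random delay between consecutive ones."""
    rng = rng or random.Random()
    results = []
    for i, query in enumerate(queries):
        if i > 0:  # no pause is needed before the very first query
            sleep(next_query_delay(min_seconds, max_seconds, rng))
        results.append(send(query))
    return results

# Hypothetical usage: a fake sender and a "sleep" that only records the pauses.
pauses = []
out = run_queries(["kw1", "kw2", "kw3"], send=str.upper,
                  min_seconds=15, max_seconds=30,
                  rng=random.Random(42), sleep=pauses.append)
print(out, [round(p, 1) for p in pauses])
```

The randomness matters as much as the length of the pause: a fixed interval is itself a recognizable machine pattern.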
You can disable or decrease these delays (although I strongly recommend against it) and see for yourself how fast the application can be, especially if you also increase the number of simultaneous connections. But you will soon notice that the search engines stop returning results to you, because your IP address has been banned.
My point is that these delays are there to help you. Without them, you can't query the search engines reliably.
Please allow me to show you some facts.
Let's assume the minimum delay between queries that Google accepts is 15 seconds. Dividing the number of seconds in a day (86,400) by 15 gives 5,760, so we can query Google at most 5,760 times a day. If the search depth is set to 5 pages, each keyword needs 5 queries, so we divide again by 5 and get a maximum of 1,152 keywords that we can check per day.
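The arithmetic above can be written as a small function. This sketch assumes, like the text, that the delay is the only limit and ignores the time each query itself takes, so real figures would be slightly lower. The function names are hypothetical; the same function also reproduces the 25-second Yahoo figure mentioned next.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def max_queries_per_day(delay_seconds: int) -> int:
    """Upper bound on queries per day when queries must be delay_seconds apart."""
    return SECONDS_PER_DAY // delay_seconds

def max_keywords_per_day(delay_seconds: int, pages_per_keyword: int) -> int:
    """Each keyword needs one query per result page checked."""
    return max_queries_per_day(delay_seconds) // pages_per_keyword

print(max_queries_per_day(15))      # 5760
print(max_keywords_per_day(15, 5))  # 1152
print(max_keywords_per_day(25, 5))  # 691
```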
Yahoo, for example, is more restrictive: the minimum delay between queries is set to 25 seconds by default. Doing the same calculation as above, we can check only 691 keywords per day.
Moreover, if your project contains two Yahoo search engines (Yahoo US and Yahoo UK, for example), the application will query only one of them at a time, which divides the number of keywords you can check in a day by 2.
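A short sketch of that effect, assuming the engines share a single query queue with the same 25-second delay, as the paragraph describes. The function name and parameters are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def keywords_per_engine_per_day(delay_seconds: int, pages_per_keyword: int,
                                engines_sharing_queue: int) -> int:
    """Keywords each engine can check per day when several engines are
    queried one at a time through the same delay-limited queue."""
    queries = SECONDS_PER_DAY // delay_seconds
    keywords = queries // pages_per_keyword
    return keywords // engines_sharing_queue

print(keywords_per_engine_per_day(25, 5, 1))  # 691
print(keywords_per_engine_per_day(25, 5, 2))  # 345 (Yahoo US + Yahoo UK)
```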
Let's take an example. Suppose we have a project that contains about 1,000 keywords, the Google and Yahoo search engines, and a search depth of 5 pages. Using the calculation above, Yahoo alone can check only 691 keywords per day, so it will take more than one day to update the whole project.
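The estimate for this example can be sketched as follows. It assumes the delays above (15 seconds for Google, 25 for Yahoo) and that the two engines are queried in parallel, so the slower one sets the total; if they were queried one after another, the totals would add up instead. Names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def days_to_update(keywords: int, pages: int, delay_seconds: int) -> float:
    """Days one engine needs: one query per page per keyword, delay_seconds apart."""
    return keywords * pages * delay_seconds / SECONDS_PER_DAY

engines = {"Google": 15, "Yahoo": 25}  # assumed minimum delays, in seconds
per_engine = {name: days_to_update(1000, 5, d) for name, d in engines.items()}

# Assumption: engines run in parallel, so the slowest one dominates.
# For sequential querying, use sum(per_engine.values()) instead.
project_days = max(per_engine.values())
print({k: round(v, 2) for k, v in per_engine.items()}, round(project_days, 2))
```

Under these assumptions Google finishes in about 0.87 days while Yahoo needs about 1.45 days; queried sequentially, the project would take about 2.31 days.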
Not very impressive, you might say! But is Advanced Web Ranking the bottleneck here? I believe not. All similar applications face the same problem, because of the way each search engine implements its protection system.
The good news is that Advanced Web Ranking has a solution to this problem: you can speed up your project updates by querying the search engines from multiple IP addresses.
There are two ways to do that:
1) use multiple proxy servers (each with its own IP address) to make the queries
2) split the project into smaller ones and update them from different computers running Advanced Web Ranking, all connected to a centralized database using Advanced Web Ranking Server.
For more information about Advanced Web Ranking Server please see:
http://www.advancedwebranking.com/server.html
This works because each computer has a different IP address, which tells the search engine that it is a different user. You can therefore divide the time it takes to update a set of keywords by the number of computers doing the updating.
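Here is a minimal sketch of that division, assuming each computer or proxy has its own IP address and therefore its own independent delay budget, and ignoring any coordination overhead. It deals keywords round-robin to the workers; this splitting scheme and all names are hypothetical and do not describe how Advanced Web Ranking Server distributes work.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def split_keywords(keywords: list[str], workers: int) -> list[list[str]]:
    """Deal keywords round-robin to `workers` computers or proxies."""
    if workers < 1:
        raise ValueError("need at least one worker")
    return [keywords[i::workers] for i in range(workers)]

def days_with_workers(keywords: int, pages: int, delay_seconds: int, workers: int) -> float:
    """Update time when each worker queries from its own IP with its own delay."""
    busiest = -(-keywords // workers)  # ceiling division: the largest share
    return busiest * pages * delay_seconds / SECONDS_PER_DAY

keywords = [f"keyword {n}" for n in range(1000)]
chunks = split_keywords(keywords, 3)
print([len(c) for c in chunks])                     # [334, 333, 333]
print(round(days_with_workers(1000, 5, 25, 1), 2))  # 1.45
print(round(days_with_workers(1000, 5, 25, 2), 2))  # 0.72
```

With two IP addresses, the 1,000-keyword Yahoo update from the example drops below one day under these assumptions.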
You can also use the APIs that Google and Yahoo provide, although they may limit the number of queries you can make per day.
I hope the above information helps you understand why these delays are needed and how you can overcome this issue. Any comments or suggestions related to this issue are welcome.
Philip