Spiders, bots and crawlers
Search engine spiders are software programs developed by the search engines that crawl the web for websites to add to their databases. Spiders “read” your meta tags, such as the robots meta tag with its index and follow directives, to see how far they are allowed to crawl into your website.
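As a rough illustration of how a spider might read those directives, here is a minimal sketch using only Python's standard library. The names `RobotsMetaParser` and `robots_policy` are made up for this example, and it assumes the usual convention that a page without a robots meta tag may be both indexed and followed. Real crawlers handle more directives and edge cases than this.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_policy(html):
    """Return (may_index, may_follow) for one page of HTML (hypothetical helper)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives
    # Assumption: no tag means "index, follow"; "none" means "noindex, nofollow".
    may_index = not ({"noindex", "none"} & found)
    may_follow = not ({"nofollow", "none"} & found)
    return may_index, may_follow


page = '<html><head><meta name="robots" content="index, nofollow"></head></html>'
may_index, may_follow = robots_policy(page)
```

For the sample page, the spider would be allowed to index the page but not to follow its links.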
A spider or web crawler is a program that visits Web sites and reads their pages and other information in order to create entries for a search engine index. The major search engines on the Web all have such a program, which is also known as a “crawler” or a “bot.” Spiders are typically programmed to visit sites that have been submitted by their owners as new or updated. Entire sites or specific pages can be selectively visited and indexed.
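The visit, read and index cycle can be sketched in a few lines of Python. This is a simplified illustration, not how any particular search engine works: `crawl`, `PageParser` and the in-memory `SITE` are hypothetical names, the pages are served from a dictionary instead of over HTTP, and the "index" is just a mapping from each word to the pages that contain it.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects the href targets of <a> tags and the words in a page's text."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Simplification: every text node counts, including script or style text.
        self.words.extend(data.lower().split())


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl from start_url; returns {word: set of URLs}.

    fetch(url) is supplied by the caller and returns the page's HTML,
    or None if the page cannot be read.
    """
    index = {}
    seen = {start_url}
    queue = deque([start_url])
    pages_read = 0
    while queue and pages_read < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages_read += 1
        parser = PageParser()
        parser.feed(html)
        for word in parser.words:
            index.setdefault(word, set()).add(url)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return index


# A made-up three-page site standing in for real HTTP requests.
SITE = {
    "http://example.test/": '<a href="/a">first</a> <a href="/b">second</a>',
    "http://example.test/a": 'alpha page <a href="/">home</a>',
    "http://example.test/b": "beta page",
}
index = crawl("http://example.test/", SITE.get)
```

The `seen` set keeps the spider from reading the same page twice, and `max_pages` is a simple safeguard against crawling without end.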
Spiders are called spiders because they usually visit many sites in parallel, their “legs” spanning a large area of the “web.”
Spiders can crawl through a site’s pages in several ways. One way is to follow all the hypertext links in each page until all the pages have been read.
Well known bots
- Googlebot. One of the most popular web crawlers on the internet today, as it is used to index content for Google’s search engine.
- Bingbot.
- Slurp Bot.
- DuckDuckBot.
- Baiduspider.
- Yandex Bot.
- Sogou Spider.
- Exabot.
The List
Other search engine spiders you can expect to find in your log file:
1.Acme.Spider
2.Ahoy! The Homepage Finder
3.Alkaline
4.Arachnophilia
5.ArchitextSpider
6.Aretha
7.ASpider (Associative Spider)
8.Atomz.com Search Robot
9.AURESYS
10.BackRub
11.Big Brother
12.Bjaaland
13.BlackWidow
14.Die Blinde Kuh
15.bright.net caching robot
16.BSpider
17.CACTVS Chemistry Spider
18.Calif
19.Cassandra
20.Digimarc Marcspider/CGI
21.Checkbot
22.churl
23.CMC/0.01
24.Combine System
25.Conceptbot
26.Web Core / Roots
27.CS-HKUST WISE:
28.Cusco
29.CyberSpyder Link Test
30.DeWeb(c) Katalog/Index
31.DienstSpider
32.Digital Integrity Robot
33.Direct Hit Grabber
34.DNAbot
35.DownLoad Express
36.DragonBot
37.DWCP (Dridus’ Web Cataloging Project)
38.EIT Link Verifier Robot
39.Emacs-w3 Search Engine
40.ananzi
41.Esther
42.nzexplorer
43.Felix IDE
44.Wild Ferret Web Hopper #1, #2, #3
45.FetchRover
46.fido
47.Hämähäkki
48.KIT-Fireball
49.Fish search
50.Fouineur
51.Robot Francoroute
52.Freecrawl
53.FunnelWeb
54.gazz
55.GCreep
56.GetBot
57.GetURL
58.Golem
59.Googlebot
60.Grapnel/0.01 Experiment
61.Gromit
62.Northern Light Gulliver
63.HamBot
64.Harvest
65.havIndex
66.HI (HTML Index) Search
67.Wired Digital
68.ht://Dig
69.HTMLgobble
70.Hyper-Decontextualizer
71.IBM_Planetwide
72.Popular Iconoclast
73.Ingrid
74.Imagelock
75.IncyWincy
76.Informant
77.InfoSeek Robot 1.0
78.Infoseek Sidewinder
79.InfoSpiders
80.Inspector Web
81.IntelliAgent
82.Iron33
83.Israeli-search
84.JCrawler
85.Jeeves
86.Jobot
87.JoeBot
88.The Jubii Indexing Robot
89.JumpStation
90.Katipo
91.KDD-Explorer
92.Kilroy
93.KO_Yappo_Robot
94.LabelGrabber
95.LinkScan
96.LinkWalker
97.Lockon
98.logo.gif Crawler
99.Lycos
100.Mac WWWWorm
101.Magpie
102.MediaFox
103.MerzScope
104.NEC-MeshExplorer
105.MOMspider
106.Monster
107.Metaspider
108.Muscat Ferret
109.Mwd.Search
110.NetCarta WebMap Engine
111.NetMechanic
112.NetScoop
113.newscan-online
114.NHSE Web Forager
115.Nomad
116.The NorthStar Robot
117.Occam
118.HKU WWW Octopus
119.Orb Search
120.Pack Rat
121.PageBoy
122.Patric
123.The Peregrinator
124.PerlCrawler 1.0
125.Phantom
126.PiltdownMan
127.Pioneer
128.html_analyzer
129.Portal Juice Spider
130.PGP Key Agent
131.PlumtreeWebAccessor
132.GetterroboPlus Puu
133.The Python Robot
134.RBSE Spider
135.Resume Robot
136.RoadHouse Crawling System
137.Road Runner
138.Robbie the Robot
139.ComputingSite Robi/1.0
140.Roverbot
141.SafetyNet Robot
142.Scooter
143.Search.Aus-AU.COM
144.SearchProcess
145.Senrigan
146.SG-Scout
147.Shai’Hulud
148.Sift
149.Simmany Robot Ver1.0
150.Open Text Index Robot
151.SiteTech-Rover
152.Inktomi Slurp
153.Smart Spider
154.Snooper
155.Solbot
156.Spanner
157.SpiderBot
158.SpiderMan
159.Spry Wizard Robot
160.Site Searcher
161.Suke
162.Sven
163.TACH Black Widow
164.Tarantula
165.tarspider
166.Tcl W3 Robot
167.TechBOT
168.Templeton
169.TitIn
170.TITAN
171.The TkWWW Robot
172.TLSpider
173.UCSD Crawl
174.UdmSearch
175.URL Check
176.URL Spider Pro
177.Valkyrie
178.Victoria
179.vision-search
180.Voyager
181.VWbot
182.The NWI Robot
183.W3M2
184.the World Wide Web Wanderer
185.WebBandit Web Spider
186.WebCatcher
187.WebCopy
188.webfetcher
189.The Webfoot Robot
190.weblayers
191.WebLinker
192.WebMirror
193.The Web Moose
194.WebQuest
195.Digimarc MarcSpider
196.WebReaper
197.webs
198.Websnarf
199.WebSpider
200.WebVac
201.webwalk
202.WebWalker
203.WebWatch
204.Wget
205.WhoWhere Robot
206.w3mir
207.WebStolperer
208.The Web Wombat
209.The World Wide Web Worm
210.WWWC Ver 0.2.5
211.WebZinger
212.XGET
213.Nederland.zoe
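To see which spiders actually turn up in a log file, a short script can count requests per bot. The sketch below is only an illustration: it assumes the common "combined" access log layout, in which the user agent is the last double-quoted field on each line, and the `BOT_TOKENS` substrings are assumptions that should be checked against each search engine's own documentation. The function name `count_bot_hits` and the sample log lines are invented for the example.

```python
import re
from collections import Counter

# Assumed user-agent substrings for a few well-known bots; extend as needed.
BOT_TOKENS = [
    "Googlebot", "bingbot", "Slurp", "DuckDuckBot",
    "Baiduspider", "YandexBot", "Sogou", "Exabot",
]

# In the combined log format the user agent is the last double-quoted field.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_bot_hits(lines, tokens=BOT_TOKENS):
    """Count log lines per bot token, matching case-insensitively."""
    hits = Counter()
    for line in lines:
        match = LAST_QUOTED_FIELD.search(line)
        if not match:
            continue  # line does not end with a quoted field
        agent = match.group(1).lower()
        for token in tokens:
            if token.lower() in agent:
                hits[token] += 1
                break  # count each request once
    return hits


# Made-up sample lines; in practice, pass an open log file instead.
sample_log = [
    '192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.2 - - [10/Oct/2023:13:56:01 +0000] "GET /about HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '192.0.2.3 - - [10/Oct/2023:13:57:12 +0000] "GET / HTTP/1.1" 200 2326 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/118.0"',
]
hits = count_bot_hits(sample_log)
```

Because a user agent string is easy to fake, a match here only shows what a client claims to be, not that the request really came from that search engine.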