
March 17






SEO: A Detailed Explanation of robots.txt, Its Uses and How to Use It - SEO Services


Search Engine Optimization

One: What is robots.txt?

To quote Baidu's answer here: robots.txt is a plain text file that must be placed in the root directory of a website, and its file name must be entirely lowercase, i.e. robots.txt. In this file you declare which parts of the site you do not want robots to visit. This way, part or all of the site's content can be kept out of search engine indexes, or you can direct search engines to index only the content you specify.
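A quick way to see how a crawler interprets such a file is Python's standard-library robots.txt parser. The rules and URLs below are made-up examples, not taken from any real site.

```python
# Python's standard library ships a robots.txt parser; the rules and
# example.com URLs below are placeholders for illustration only.
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: *",
    "Disallow: /private/",
]
rp = RobotFileParser()
rp.parse(rules)

# Pages outside the declared section stay crawlable; /private/ does not.
print(rp.can_fetch("Googlebot", "http://example.com/index.html"))      # True
print(rp.can_fetch("Googlebot", "http://example.com/private/a.html"))  # False
```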

Two: How exactly is robots.txt used?

Use 1: Guide search spiders to your website map, so that they index the site's pages more thoroughly.

Foreign search engines such as Google and Yahoo now support declaring the Sitemap link in the robots.txt file. The location of your website map is signalled to the spider when it visits robots.txt, which helps the spider index your site's pages more thoroughly. The syntax is Sitemap: http://www.##.com/sitemap.xml (Google) or Sitemap: http://www.##.com/sitemap.txt (Yahoo). You can generate the map file with website-map-generation software, or write a program to generate it yourself.
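As a sketch of the "write a program yourself" option: the Yahoo-style sitemap.txt mentioned above is simply one URL per line, so generating one is trivial. The example.com URLs below are placeholders, not a real site.

```python
# Sketch: build a Yahoo-style sitemap.txt (one URL per line) from a list
# of page URLs. The example.com URLs are placeholders.

def sitemap_txt(urls):
    """Return the contents of a plain-text sitemap: one URL per line."""
    return "\n".join(urls) + "\n"

pages = [
    "http://www.example.com/",
    "http://www.example.com/about.html",
]

# Write the file so it can sit in the site root next to robots.txt
with open("sitemap.txt", "w", encoding="utf-8") as f:
    f.write(sitemap_txt(pages))
```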

Use 2: Block all search spiders from crawling all of your site's content, or specified directories. In real site-building practice there are a few common scenarios:

The first scenario: blocking all search spiders from crawling any content on your site.

For example, your website has just been uploaded to the server or a virtual host for debugging, but the page titles and keywords have not yet been well optimized, and external links already point to the site; if you do not yet want search engines to index it, you can block all search engines from indexing any of your pages.

Let me give a counter-example here. In 2006 I built a website using the DedeCMS ("weaving dreams") content management system. The first time I hastily applied a template, added some content, and eagerly submitted the site to each search engine. It was indexed the next day, and within a few days several hundred pages had been indexed. Later, though, I found a prettier, fresher template, switched to it, and regenerated all the pages, and I made changes like this many times. As the joke goes, every search spider is female: frequent changes to a site's pages, especially to key attributes such as the Title, left her with no sense of security at all and produced serious distrust of the site. As a result it took a full nine months for my site's pages to recover. So before opening a site to search engines, every webmaster should settle its positioning; it is not too late to open it to search engines once it has been optimized.

For example, your website is a private space for you and your lover alone, just for your own amusement, and you do not want it crawled; or your website is for internal company use and is entirely confidential content that no spider should crawl at all; or any other situation in which you need to block crawling by every search engine.

The syntax to block all search engines from indexing any page of the website is:

User-agent: *
Disallow: /
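A minimal check that this pair of lines really refuses every crawler everywhere, again using Python's standard-library parser (the host name is a placeholder):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /"])

# Every agent is refused every path, even the home page
print(rp.can_fetch("Baiduspider", "http://example.com/"))             # False
print(rp.can_fetch("Googlebot", "http://example.com/any/page.html"))  # False
```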

The second scenario: you need to block all search engines from crawling a few specific directories of the website.
(1) Some of the site's directories are program directories that have no need at all to be crawled; to improve server performance and keep crawling from draining server resources, you can block all search engines from crawling these directories.
(2) Some of the site's directories hold internal information or genuinely sensitive, private content, which search engines must be blocked from crawling.
(3) The content under certain directories is entirely scraped and has not been modified in any way; it exists only to flesh out the site's content, and you do not want search engines to index it, so in this case you need to block search engine crawling. (For example, on a site I built earlier, one part was entirely original content, meant to be crawled by search engines; another part was entirely scraped, there only to enrich the site's content and improve the user experience, and I did not want search engines to index it, judge it spam, and lower the site's authority, so I had to screen the search spiders from that directory!) And other such situations.

A syntax example blocking all search engines from crawling specific directories or specific pages:

User-agent: *
Disallow: /Plus/count.php
Disallow: /Include
Disallow: /News/old

If anyone is interested, you can visit Dianzhu2.com, the newly launched site of "Bamboo Shadow, Cool Breeze," and look at my robots.txt there; it contains a few concrete, annotated examples.

Use 3: Block a particular spider from crawling all content of your site.

There are a few situations here. (1) You were once badly down-weighted by Baidu, slighted, or humiliated, or you are a member of the anti-Baidu alliance and want to break with it completely, so you want to block it from crawling any content on your site. (2) Your site has already become as formidable as Taobao, and you want to comprehensively block Baidu from indexing your pages. You can look at Taobao's robots.txt: for commercial reasons Taobao has already screened out Baidu, but Baiduspider, female and refusing to take no for an answer, has still thick-skinnedly indexed roughly 1,060 of Taobao's pages. You can verify this by entering site:taobao.com in Baidu's search box. (3) Any other situation in which you want to block a particular search engine from indexing all of your site's content.

The syntax to block a specified search engine from crawling any content of your website is:

User-agent: Baiduspider
Disallow: /

Use 4: Allow only specified search spiders to crawl your site's content.

Since the bulk of a website's traffic generally comes from a few major search engines, you may not want other search spiders at home or abroad, or rogue spiders, coming to crawl your site's content and draining server resources. This is the moment when this syntax shines.

The syntax to allow only specified search spiders to crawl your site's content is:

User-agent: Baiduspider
Disallow:

User-agent: *
Disallow: /
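Under the same assumptions as before (placeholder host, Python's standard-library parser), a quick check that this "allow one, refuse the rest" file behaves as described:

```python
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: Baiduspider",
    "Disallow:",              # empty Disallow = allow everything
    "",
    "User-agent: *",
    "Disallow: /",            # everyone else is refused everywhere
]
rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("Baiduspider", "http://example.com/page.html"))   # True
print(rp.can_fetch("SomeOtherBot", "http://example.com/page.html"))  # False
```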

Here, the pair "User-agent: Baiduspider / Disallow:" can be repeated to list each of the major search spiders you want to allow. A particular reminder is needed at this point: robots.txt must be written correctly, lest it bring needless harm to the website. The spider names are: Baidu's spider: Baiduspider; Google's spider: Googlebot; Tencent Soso's spider: Sosospider; Yahoo's spider: Yahoo Slurp; MSN's spider: msnbot.

Use 5: Block all search engines from crawling files of a specific type across the whole site or under a specific directory.

For example, allow all search engines to crawl web pages only, and block them from crawling any images. The syntax is:

User-agent: *
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.gif$
Disallow: /*.png$
Disallow: /*.bmp$
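One caveat worth knowing: the "*" and "$" wildcards are extensions honored by Google and some other engines, and Python's standard-library parser ignores them, so below is a small hypothetical matcher showing how patterns such as "/*.jpg$" are commonly interpreted ("*" matches any run of characters, "$" anchors the end of the URL).

```python
import re

def wildcard_rule_matches(pattern, path):
    """Match a robots.txt path pattern with Google-style wildcards:
    '*' matches any run of characters, '$' anchors the end of the URL.
    (A sketch only; real crawlers add rules such as longest-match-wins.)"""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.match("^" + regex + ("$" if anchored else ""), path) is not None

print(wildcard_rule_matches("/*.jpg$", "/images/photo.jpg"))      # True
print(wildcard_rule_matches("/*.jpg$", "/images/photo.jpg?x=1"))  # False
print(wildcard_rule_matches("/*.jpg$", "/page.html"))             # False
```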

If you only want to block one specific search engine, then, following the method introduced above, you just need to know that spider's name.

Use 6: Prevent a search engine from showing a webpage snapshot in its search results, while still letting it build an index for the page.

Baidu supports preventing a search engine from showing a site snapshot by setting a Meta tag on the page. The method is as follows:

To prevent all search engines from showing a snapshot of your site, place this meta tag in the <HEAD> section of the page: <meta name="robots" content="noarchive">. To allow other search engines to show snapshots but prevent only Baidu from showing one, use the following tag: <meta name="Baiduspider" content="noarchive">. Note: this tag only stops Baidu from showing a snapshot of the page; Baidu will continue to build an index for the page and will show a page summary in its search results. For Google, it is <META NAME="googlebot" CONTENT="index, follow, noarchive">.

A final note: some friends may have the webmaster log feature enabled to analyze spider crawling and user visits. When a spider comes looking for the robots.txt file and it is not found, the server will also record a 404 error in the log. So, to keep the log file small and free of clutter, I suggest you add a robots.txt under the website's root directory; even an empty robots file is fine.