JOURNAL

Thursday, 20 May 2010

Gigablast


The first version of Gigablast was launched in 2002, and it quickly became known as a very efficient search engine. Over the last two years its search functionality has been revamped, redesigned, and re-architected, and it was relaunched in 2008 with the goal of giving users a high-quality, highly relevant search engine. Here is a screenshot:

Gigablast's innovations include:

* Hyper Scalable - Scales to 200 billion full pages and 100,000 servers.
* Efficient - Uses very few computers to support a huge index and a large number of queries per second.
* Simple Interface - Get the search results via an XML feed.
* Web Directory - Allows searching of all the sites in a particular topic, not just the pages. All directory pages can be returned through the XML feed, too.
* Gigabot - Gigablast’s fast and feature-rich spider is highly configurable.
* Document Injection - Bypass the spider and inject your document directly into Gigablast using simple HTTP POSTs or GETs.
* Real-Time - URLs are indexed in real-time. Link analysis is done on the fly.
* Intelligent Update - Determines the update cycle of each document and tries to spider it at that frequency.
* Duplicate Detection - Spider can detect and discard duplicate web pages.
* Dual Mode - Uses idle cycles to spider and index documents, but will quickly yield resources to handle incoming queries.
* Maintainable - Comprehensive web-based GUI controls make it easy to administer.
* Spam Protection - Features a large array of anti-spam tools and algorithms used to keep spam out of the index.
* Document Cache - Has a cache to hold user-viewable copies of the pages it spiders and indexes. Obeys nocache meta tags as specified.
* Historic Cache - Spider can hold documents in the index long after they start returning 404.
* HTTPS support - Can spider and serve HTTPS pages.
* robots.txt - The Gigablast spiders support the robots.txt standard as well as certain related meta tags.
* Multiple Formats - Indexes PDF, Microsoft Word, PowerPoint, Excel, and PostScript documents. Supports user-definable filters.
* Dynamic Summaries - Search result summaries are generated so that they contain the query terms.
* Term Highlighting - Performs query term highlighting on the view of cached pages and on the dynamic summaries.
* Robust Query Syntax - Features many different field searches, + and - operators.
* Family Filter - Removes pages with undesirable content from the search results.
* Sort by date - Sorts search results by date, very fast and with high accuracy.
* Query Refinement - Search within a set of search results.
* Advanced Search - Allows users to perform power searches quickly and easily.
* Site Clustering - Can optionally cluster away results from the same web site, so the list of search results is not dominated by any one site.
* Fuzzy Deduping - Automatically removes search results that are X% similar to an above result, where X is adjustable.
* Query Weighting - Custom weight the query terms exactly how you want.
* Super Recall - Returns extra results which only have some of the query terms.
* Spell Checking - Performs spell checking based on a dictionary that is constructed from the index. Add your own words and phrases, too.
* Huge Document Support - Index documents that are hundreds of megabytes in size.
* Huge Result Sets - Receive hundreds of thousands of results per page.
* Related Topics - Dynamically generated on a per query basis. (aka GigaBits)
* Reference Pages - Generates sets of expert web sites which contain lists of links relevant to the query.
* Custom Topic Search - Constrain searches to a list of up to 500 sites.
* Default AND Capable - Can easily limit search results to only pages that have all the query terms.
* Boolean Queries - Supports complex nested Boolean queries using AND, OR and NOT operators.
* Turing Test - Uses a simple Turing test to prevent real-time addurl abuse.
* Redundancy - If one server goes down then its twins take over for it.
* Error Correction - Corrupted data is automatically detected and patched from a mirror host.
* Load Balancing - Gigablast intelligently distributes load evenly among all hosts in the network.
* Collections - Allows the administrator to partition the index into many sub indexes.
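
To make two of the items above more concrete, here is a minimal Python sketch of what "Fuzzy Deduping" and "Site Clustering" could look like as post-processing steps on a ranked result list. This is not Gigablast's actual implementation, which the list does not describe. The sketch assumes Jaccard similarity over three-word shingles as the similarity measure and treats the URL hostname as the "site". The function names, the `url` and `summary` fields, and the default thresholds are all hypothetical.

```python
from urllib.parse import urlparse


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity_percent(a, b):
    """Jaccard similarity of the two texts' shingle sets, as a percentage.

    Assumption: Jaccard over word shingles stands in for whatever
    similarity measure the real engine uses.
    """
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 100.0
    return 100.0 * len(sa & sb) / len(sa | sb)


def fuzzy_dedupe(results, threshold=80.0):
    """Drop any result that is at least `threshold` percent similar
    to a higher-ranked result that was already kept."""
    kept = []
    for result in results:
        if all(similarity_percent(result["summary"], prev["summary"]) < threshold
               for prev in kept):
            kept.append(result)
    return kept


def cluster_by_site(results, max_per_site=2):
    """Keep at most `max_per_site` results per hostname, preserving rank order."""
    seen = {}
    kept = []
    for result in results:
        host = urlparse(result["url"]).netloc.lower()
        seen[host] = seen.get(host, 0) + 1
        if seen[host] <= max_per_site:
            kept.append(result)
    return kept
```

Both functions take a list of dictionaries ordered by rank and return a filtered list in the same order, so they can be chained, for example `cluster_by_site(fuzzy_dedupe(results, threshold=90.0))`. Lowering `threshold` makes deduping more aggressive, which corresponds to the adjustable X in the feature description.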
