Mirago  
Technical Insight

In 1965 Gordon Moore predicted that, in essence, computing power would double every couple of years. He would appear to have been right. Moore's Law, it seems, doesn't just apply to hardware: the growth of the internet, and consequently the volume of data indexed by search engines such as Mirago, is another strong candidate.

This exponential growth strongly influenced the design of Mirago's software and hardware systems. Key elements of the design are scalability, robustness, performance and simplicity.

Scalability primarily affects Q3, the Mirago Query System. This is the system which handles searching of both web pages and information indexed through the Trusted Feed Program.

Scalability is achieved by dividing the index into multiple collections, each of which is sub-divided into multiple indexes. Individual collections serve as a means to implement differing update frequencies for different types of sites. Some may be updated daily, others weekly and so on.
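The collection scheme described above might be modelled along the following lines. This is only a sketch: the Collection structure, its field names and the collection_is_due function are hypothetical and are not taken from Mirago's software.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical description of one collection; all names are illustrative. */
typedef struct {
    const char *name;            /* e.g. "daily" or "weekly" */
    double      refresh_seconds; /* desired interval between updates */
    time_t      last_update;     /* when the collection was last rebuilt */
    size_t      n_indexes;       /* number of indexes (one per query server) */
} Collection;

/* Returns non-zero when the collection's refresh interval has elapsed. */
int collection_is_due(const Collection *c, time_t now)
{
    assert(c != NULL);
    return difftime(now, c->last_update) >= c->refresh_seconds;
}
```

A scheduler could call collection_is_due for each collection in turn, so that a daily collection and a weekly one are refreshed at their own pace.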

Individual indexes correspond one-to-one with the actual computers which search them. Each server handles roughly the same number of documents, and the results for any given query are drawn from the best returned by each query server. Query servers are PC-class machines, each equipped with a large amount of memory and a standard hard disk.

As the number of documents increases, further servers can be added to cope with the increased volume. Since each server operates independently of all the others, there is no reduction in the performance of the query system as a whole.
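Drawing the final results from the best returned by each query server can be pictured as a merge of several ranked lists. The sketch below assumes each server replies with hits already sorted by descending score. The Hit and ServerReply types and the merge_top_k function are hypothetical names chosen for illustration, not Mirago's actual interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* One scored hit from a single query server (hypothetical layout). */
typedef struct {
    unsigned doc_id;
    float    score;  /* higher is better */
} Hit;

/* One server's reply; hits are assumed sorted by descending score. */
typedef struct {
    const Hit *hits;
    size_t     count; /* 0 if the server had nothing to contribute */
} ServerReply;

/*
 * Merge the best hits from all servers into out[0..k-1].
 * cursors is caller-supplied scratch space with n_servers entries,
 * so the merge itself allocates nothing.
 * Returns the number of hits written, which may be less than k.
 */
size_t merge_top_k(const ServerReply *replies, size_t n_servers,
                   size_t *cursors, Hit *out, size_t k)
{
    size_t written = 0;
    assert(n_servers == 0 || (replies != NULL && cursors != NULL));

    for (size_t s = 0; s < n_servers; s++)
        cursors[s] = 0;

    while (written < k) {
        size_t best = n_servers; /* sentinel: no candidate found yet */
        for (size_t s = 0; s < n_servers; s++) {
            if (cursors[s] >= replies[s].count)
                continue; /* this server's list is exhausted */
            if (best == n_servers ||
                replies[s].hits[cursors[s]].score >
                replies[best].hits[cursors[best]].score)
                best = s;
        }
        if (best == n_servers)
            break; /* every list is exhausted */
        out[written++] = replies[best].hits[cursors[best]++];
    }
    return written;
}
```

Because each server contributes an independent list, adding a server only adds one more list to the merge. A server that returns nothing simply appears as a reply with a count of zero.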

Robustness of design takes account of the physical nature of systems. No matter how well specified a server might be, it can still fail. Dual redundant power supplies, RAID disk systems and so on are all still susceptible to failure one way or another. Memory modules and processors can still fail, albeit very occasionally.

The distributed nature of Mirago's architecture allows individual servers to fail without noticeably degrading the performance of the system as a whole. Strangely, it is the robots rather than the query system which impose the greater stress on equipment, and thus it is these which require more maintenance.

Mirago handles many millions of queries, so performance is essential. Experience shows that searchers expect to receive the most relevant information for any query just about immediately. Whilst there are some factors beyond Mirago's control, such as the speed of a searcher's connection to the internet, many other factors are entirely within the control of Mirago. The size of the results page is one and the length of time taken to complete a search is another.

The average query should execute on any given query server in less than 10 milliseconds. Many queries complete within 1 millisecond, which makes timing somewhat difficult, especially under Windows NT.
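One common way around a coarse timer is to time a batch of identical calls and divide by the batch size. The sketch below uses the standard C clock function purely as an assumption; a platform-specific high-resolution timer could be substituted. The names QueryFn, count_call and mean_seconds_per_call are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Signature of the operation being timed (hypothetical). */
typedef void (*QueryFn)(void *ctx);

/* Stand-in workload for demonstration: counts how often it was called. */
void count_call(void *ctx)
{
    unsigned long *calls = ctx;
    (*calls)++;
}

/*
 * Run fn(ctx) `repeats` times and return the mean time per call in seconds,
 * as measured by clock(). Returns 0.0 when repeats is 0.
 */
double mean_seconds_per_call(QueryFn fn, void *ctx, unsigned long repeats)
{
    assert(fn != NULL);
    if (repeats == 0)
        return 0.0;

    clock_t start = clock();
    for (unsigned long i = 0; i < repeats; i++)
        fn(ctx);
    clock_t end = clock();

    return ((double)(end - start) / CLOCKS_PER_SEC) / (double)repeats;
}
```

With a large enough batch, the total elapsed time comfortably exceeds the timer's resolution even when a single call takes well under a millisecond.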

Since Mirago started to develop the original query technology, the cost, capability and availability of computing power have changed dramatically. Consider, for example, the reduction in cost of a gigabyte of memory. Consequently, the design of software created and used by Mirago has seen radical change in that time. These days, who would consider storing data in any medium other than memory where high speed access is desired?

Those with some experience of software engineering may be interested in the following:

The software itself is written entirely in the C language for performance reasons. Whilst C++, object orientation and other more modern techniques are very valuable, they sometimes lack performance. MFC objects such as CString may be simple to use but are not suitable within the kernel of a search engine. It wouldn't be overstating things to say that even memory allocation with functions such as malloc is kept to the barest minimum. Similarly, the thread synchronisation necessary for dynamic index updating is also as simple as possible.
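One familiar way of keeping calls to malloc to a minimum is to carve small allocations out of a single buffer obtained up front. The following is a minimal sketch of such an arena, not a description of Mirago's code. The Arena type, the function names and the 16-byte alignment are all assumptions, and the buffer passed in is assumed to be suitably aligned itself.

```c
#include <assert.h>
#include <stddef.h>

#define ARENA_ALIGN 16u /* assumed alignment; must be a power of two */

/* A bump allocator over a caller-supplied buffer (hypothetical design). */
typedef struct {
    unsigned char *base;
    size_t         capacity;
    size_t         used;
} Arena;

void arena_init(Arena *a, void *buffer, size_t capacity)
{
    assert(a != NULL && buffer != NULL);
    a->base = buffer;
    a->capacity = capacity;
    a->used = 0;
}

/* Returns `size` bytes from the arena, or NULL if it does not fit. */
void *arena_alloc(Arena *a, size_t size)
{
    /* Round the current offset up to the next multiple of ARENA_ALIGN. */
    size_t start = (a->used + ARENA_ALIGN - 1) & ~(size_t)(ARENA_ALIGN - 1);

    if (start > a->capacity || size > a->capacity - start)
        return NULL; /* out of space: the caller must handle this */

    a->used = start + size;
    return a->base + start;
}

/* Releases everything at once, e.g. at the end of a query. */
void arena_reset(Arena *a)
{
    a->used = 0;
}
```

Allocation here is a couple of arithmetic operations, and releasing all memory used by a query is a single assignment. The trade-off is that individual allocations cannot be freed on their own.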

Mirago's software design is really a lesson in simplicity, which benefits both performance and reliability. An essential software engineering concept can be expressed as 'The fastest way to do something is not to do it at all'. This refers to the fact that no matter how optimized a piece of code might be, if it can be avoided completely, performance can only be improved. Finally, from the reliability perspective, rigorous handling of errors is essential. Physical systems can and do fail. Only by handling errors properly can graceful performance be maintained.

 
 
