
Perilog applications

Perilog is ideal for mining large databases of information, such as airline safety incident reports and related data. The software is a staple of the Aviation Safety Reporting System, administered by NASA.


 

Perilog

A contextual search method for improving search results

NASA’s Ames Research Center offers for license its patented Perilog software, a contextual search method that provides a simple-to-use means of finding and ranking text documents according to their relevance to particular words or phrases. Rather than simply finding documents that contain particular words or phrases, Perilog discovers contextual associations between words and phrases, overcoming the shortcomings of other search methods. Perilog’s ability to automatically identify contextual associations in a document set enables conceptual and semantic search without the need to maintain categorization for the documents. Users can quickly identify related topics, even if those topics do not co-occur in the same document and the user has no prior knowledge of the documents.

Perilog can be used for a wide range of conceptual search and semantic search applications—in knowledge management systems, as enhancements or add-ons to commercial search engines, and for contextual advertising solutions.

For more information about this licensing/development opportunity, please contact:

Technology Partnerships Office
NASA's Ames Research Center
http://www.nasa.gov/ames-partnerships

Read our blog article about how this technology is keeping our skies safe.

 


Benefits

  • Reliable results: Delivers more relevant search results, with fewer queries by the user, compared to other search engine technologies
  • Contextual relevance: Enables users to discover words, ideas, and situational details that are contextually associated with a specific query
  • Intelligent search: Allows users to discover key themes in large document sets, with no prior knowledge of the documents
  • Efficient classification: Eliminates the time and expense associated with maintaining document categorization (i.e., ontology), delivering semantic and conceptual search results

Applications

Perilog can enhance many search-related applications, including:

  • Large knowledge management and document retrieval systems, for legal research, market research, intellectual property asset management, claims management, etc.
  • Life sciences and medical research
  • Intelligence analyses
  • Commercial search engines
  • Contextual online advertising solutions 

Technology Details

Perilog's underlying algorithm is based on the theory of experiential iconicity, which states that patterns of relatedness among things in the world of experience systematically influence patterns of relatedness among words in written discourse. Perilog's ability to deliver semantic and conceptual search results through an automated algorithm, without the need to rely on natural language processing or manually (or semi-manually) maintained categorization, follows directly from the theory of iconicity.


How it works

Perilog delivers key phrases as highlighted sections from a collection of narratives.

Perilog measures the degree of contextual association of large numbers of term pairs in text to produce network models that capture the structure of the text and, by virtue of Perilog's validated theory of iconicity, the structure of the domain(s), situation(s), and concern(s) expressed by the author(s) of the text. In fact, given alphanumeric representations of any other sequences in which context is meaningful—such as music or genetic sequences—Perilog can derive their contextual structure.

Operating on a document set (i.e., corpus) or a single document, Perilog creates a network model of contextually related words and phrases. When a user enters a keyword or key phrase search, Perilog creates a query network of “topical hubs” based on the query words. A query may contain any number of phrases, each of any length. Each phrase is represented by a network, and these networks are combined into a single query network.

By matching the phrase query network with document networks, Perilog's phrase search provides flexible and thorough phrase matching that is unavailable with other methods. Instead of limiting a keyword search to the query words alone, Perilog uses the contextual relationships among the keywords to find documents in which those relationships are significant.
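The query-side steps can be sketched as follows. This is a minimal illustration under stated assumptions, not Perilog's patented method: a phrase network is taken to be every ordered word pair in the phrase, phrase networks are combined by adding pair weights, and matching is a simple overlap score. The function names `phrase_network`, `query_network`, and `match_score` are hypothetical.

```python
from collections import Counter

def phrase_network(phrase):
    """Hypothetical phrase model: each ordered pair of words in the phrase
    (earlier word first) gets weight 1 per occurrence."""
    words = phrase.lower().split()
    return Counter((a, b) for i, a in enumerate(words) for b in words[i + 1:])

def query_network(phrases):
    """Combine the networks of any number of phrases into one query network
    by adding their pair weights."""
    combined = Counter()
    for phrase in phrases:
        combined.update(phrase_network(phrase))
    return combined

def match_score(query, doc_network):
    """Overlap score: for each word pair shared by query and document,
    add the smaller of the two weights. Pairs absent from the document add 0."""
    return sum(min(weight, doc_network[pair])
               for pair, weight in query.items() if pair in doc_network)

query = query_network(["reduced rest", "crew scheduling"])
# Document networks are written by hand here; normally they come from the text.
doc_a = Counter({("reduced", "rest"): 2, ("crew", "rest"): 1})
doc_b = Counter({("rest", "reduced"): 1})  # same words, different relationship
print(match_score(query, doc_a), match_score(query, doc_b))  # 1 0
```

The point of the example is that `doc_b` contains both query words yet scores zero, because the relationship between them differs from the one in the query.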



Key features and methods

Perilog’s key features and methods encompass text analysis, modeling, relevance-ranking, keyword and phrase search, phrase generation, and phrase discovery:

Text Analysis: The process converts bodies of text to sequences of terms and measures the contextual associations among them. This determines the structure of text as a way of measuring the structure of the domains and situations represented by the text. Terms that are contextually related in the structure of text are considered contextually related in the world represented by the text.
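A minimal sketch of the text-analysis step follows, assuming a simple proximity weighting: an ordered term pair that occurs d positions apart (within a window) earns window - d + 1 points each time. This weighting is an illustrative stand-in; Perilog's actual association measure is defined in its patents and is not reproduced here. The names `tokenize` and `contextual_associations` are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms (a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contextual_associations(terms, window=3):
    """Score ordered term pairs by proximity: a pair occurring d positions apart
    (1 <= d <= window) gains window - d + 1 points per occurrence.
    This weighting is an assumption, not Perilog's patented measure."""
    scores = defaultdict(int)
    for i, first in enumerate(terms):
        for d in range(1, window + 1):
            if i + d >= len(terms):
                break
            scores[(first, terms[i + d])] += window - d + 1
    return dict(scores)

terms = tokenize("Reduced rest before the duty period; reduced rest again.")
network = contextual_associations(terms)
print(network[("reduced", "rest")])  # 6: two adjacent occurrences, 3 points each
```

Because pairs are ordered, ("reduced", "rest") and ("rest", "reduced") are separate entries, which keeps some of the structure of the text rather than treating it as a bag of words.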

Modeling: Each Perilog model consists of a network of contextually associated terms. A Perilog model can represent any body of text, from an entire database to a short phrase, and it can represent any domain, sub-domain, situation, situational detail, topic, or subtopic.

Relevance Ranking: Perilog quantifies the similarity of any two models by comparing their paired terms and contextual measurements. One model’s features are used as relevance-ranking criteria and compared to a collection of models, enabling the models in the collection to be ranked according to their relevance to the criteria. By ranking the collection against each of its own models in turn, Perilog can build an association matrix that serves as input to clustering methods.
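One way to sketch relevance ranking, assuming models are stored as mappings from word pairs to association strengths, is cosine similarity with one dimension per word pair. This follows the FAQ's description of comparing models "as if they were vectors," but the exact Perilog similarity measure may differ; the function names and data are hypothetical.

```python
import math

def similarity(model_a, model_b):
    """Cosine similarity, treating each word pair as one dimension."""
    dot = sum(weight * model_b[pair] for pair, weight in model_a.items() if pair in model_b)
    norm_a = math.sqrt(sum(w * w for w in model_a.values()))
    norm_b = math.sqrt(sum(w * w for w in model_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank(criteria, collection):
    """Order model names from most to least relevant to the criteria model."""
    return sorted(collection, key=lambda name: similarity(criteria, collection[name]), reverse=True)

def association_matrix(collection):
    """Score every model against every model; each row can feed a clustering method."""
    names = sorted(collection)
    return names, [[similarity(collection[a], collection[b]) for b in names] for a in names]

models = {
    "r1": {("reduced", "rest"): 3.0, ("duty", "period"): 2.0},
    "r2": {("duty", "period"): 1.0, ("crew", "scheduling"): 4.0},
    "r3": {("runway", "incursion"): 5.0},
}
criteria = {("reduced", "rest"): 1.0, ("duty", "period"): 1.0}
print(rank(criteria, models))  # ['r1', 'r2', 'r3']
```

Since every model can serve as the criteria, the association matrix is simply the ranking step repeated for each model in the collection.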

Keyword and Phrase Search: From a user-specified database, Perilog retrieves documents that contain one or more user-specified keywords or phrases in typical or selected contexts, and ranks the documents on their relevance to the keywords or phrases in context. The most relevant documents are automatically highlighted and displayed in a Web browser window, allowing the user to scroll through and review them.

Each of the documents is accompanied by a list of related words or phrases that contribute to the relevance of the document. Experienced users can refer to these relations to understand which features were interpreted as contributing to the relevance of the document. In some cases, this can lead the user to modify or fine-tune the search strategy.

Phrase Generation: To aid a search, Perilog can produce a list of phrases from the database that contain a user-specified word or phrase; these phrases can serve as suggested queries for phrase searches. To generate phrases, the user provides a word or phrase that is to be contained in each of the output phrases. Perilog builds phrases around this input, based on its phrase models. The resulting phrases are sorted by an estimate of their prominence in the document set and are displayed on the computer screen or redirected to a file.
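Phrase generation can be sketched as collecting word sequences that extend the user's word or phrase and sorting them by frequency. Using raw frequency as the "prominence" estimate, and word n-grams as the phrase model, are assumptions for illustration; `generate_phrases` and `contains` are hypothetical names.

```python
from collections import Counter

def contains(gram, seed_words):
    """True if seed_words appears as a contiguous run inside gram."""
    k = len(seed_words)
    return any(gram[i:i + k] == seed_words for i in range(len(gram) - k + 1))

def generate_phrases(documents, seed, max_len=3):
    """Build word n-grams that extend the seed (up to max_len words) and sort
    them by how often they occur; frequency stands in for prominence."""
    seed_words = seed.lower().split()
    counts = Counter()
    for doc in documents:
        words = doc.lower().split()
        for n in range(len(seed_words) + 1, max_len + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                if contains(gram, seed_words):
                    counts[" ".join(gram)] += 1
    return [phrase for phrase, _ in counts.most_common()]  # most frequent first

reports = [
    "crew reported reduced rest",
    "reduced rest period was short",
    "reduced rest period again",
]
print(generate_phrases(reports, "rest")[:3])  # ['reduced rest', 'rest period', 'reduced rest period']
```

Writing the result to a file instead of the screen is a matter of iterating over the returned list.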

Phrase Discovery: To further improve the relevance of search results, Perilog can find phrases that are related to topics of interest. For example, given a topic such as “fatigue,” Perilog can discover related phrases such as “rest period,” “reduced rest,” “duty period,” and “crew scheduling.” Phrase discovery can help users understand the variations and scope of topics in a document set. The discovered phrases also can be used selectively as input to a Perilog phrase search, enabling retrieval of documents that contain particular topical variations.

The first step in phrase discovery is to perform a keyword or phrase search. Next, phrases are automatically extracted from the most relevant documents. The phrases produced at this point may be useful, but further processing will improve the results. From these phrases, topical phrases are distilled by a combination of manual and automated methods. The refined set of topical phrases then can be used to query the database, using phrase search. The cycle of phrase extraction and search is repeated to produce a final set of documents. If documents relevant to the topic are available in the document set, this final collection of documents will be highly relevant to the topic.

The main product is a list of topical phrases that are extracted from the final collection of documents.
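The search, extract, and refine cycle described above can be sketched as a loop. Everything here is a simplified stand-in under stated assumptions: search counts how many query phrases a document contains (a plain substring test), extraction takes the most frequent two-word phrases from the top documents, and a caller-supplied `refine` function plays the role of the combined manual and automated distillation step. All names are hypothetical.

```python
from collections import Counter

def search(documents, phrases, top_k):
    """Stand-in phrase search: rank documents by how many query phrases they contain."""
    scored = [(sum(p in doc.lower() for p in phrases), i) for i, doc in enumerate(documents)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # most matches first, then document order
    return [i for score, i in scored[:top_k] if score > 0]

def extract_phrases(documents, ids, top_n):
    """Extract the most frequent two-word phrases from the selected documents."""
    counts = Counter()
    for i in ids:
        words = documents[i].lower().split()
        counts.update(" ".join(words[j:j + 2]) for j in range(len(words) - 1))
    return [p for p, _ in counts.most_common(top_n)]

def discover(documents, seed, refine, rounds=2, top_k=3, top_n=5):
    """Repeat search -> extract -> refine, then return topical phrases
    extracted from the final collection of documents, plus that collection."""
    phrases = [seed]
    for _ in range(rounds):
        ids = search(documents, phrases, top_k)
        phrases = refine(extract_phrases(documents, ids, top_n)) or phrases
    final_ids = search(documents, phrases, top_k)
    return refine(extract_phrases(documents, final_ids, top_n)), final_ids

STOP = {"a", "and", "after", "on"}
refine = lambda ps: [p for p in ps if not STOP & set(p.split())]  # drop function-word phrases
docs = [
    "crew fatigue after reduced rest",
    "reduced rest and a long duty period",
    "long duty period caused fatigue",
    "bird strike on departure",
]
topics, ids = discover(docs, "fatigue", refine)
print(topics, ids)  # ['reduced rest', 'long duty', 'duty period'] [1, 0, 2]
```

Starting from "fatigue", the loop surfaces related phrases such as "reduced rest" and "duty period" and leaves the unrelated bird-strike report out of the final collection.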



Why it is better

Perilog overcomes many of the fundamental flaws that hinder more basic search engines:

  • Reliance on the “bag of words” model, in which all words are treated equally and relevance is determined only by the frequency with which a keyword or phrase appears in the text
  • Word ambiguity, or the inability to distinguish between different meanings of the same word (e.g., “bark”), producing ambiguous results for such queries
  • Term mismatch, in which the search engine returns only a fraction of the relevant results because different users choose the same term to describe an object less than 20 percent of the time
  • Query drift, when automatic expansion of queries results in unexpected and incorrect results

Perilog addresses each of these flaws by determining the contextual associations of words and word pairs in documents and representing them as a network model.

This network model enables Perilog to answer questions such as:

  • What are the most prominent topics in this document (even if the user has never seen the document before)?
  • What are the most relevant sections of the document regarding a particular topic?
  • What other topics are related most closely to this topic?



Extensive manual intervention would be required to answer these questions using traditional search engines, as well as possible repeat searching and manual review of potentially relevant documents. In contrast, Perilog is extremely useful for research-intensive, open-ended queries because it makes contextual associations across a network model of relevant text and requires no prior knowledge of the document contents. Likewise, Perilog is very useful when searching domains where terminology changes frequently, such as the life sciences, medicine, and intelligence. The ability to identify contextual associations automatically enables search engines to deliver on the promise of conceptual search and semantic search.

Perilog’s keyword and phrase search capabilities are augmented by its keyword and phrase discovery features, which can be used to discover and extract keywords and key phrases from an existing document or document set. This enables a source document to be compared with other contextually relevant target documents in a document set. This feature is similar, on the surface, to the “Find More Documents Like This” feature in other search engines, but without Perilog’s phrase discovery method, other search engines are still subject to the flaws described above.

In short, Perilog facilitates a move from traditional keyword retrieval search to full conceptual and semantic searching, yielding more complete and relevant results in less time and with less manual intervention.



System requirements

Software and Operating System Requirements

  • Perilog is packaged to run on any Apple computer running Mac OS X 10.2.x (ideally 10.2.6+). OS X 10.3 is preferred.
  • Although this version of Perilog must run on an Apple computer with Mac OS X 10.2 or higher, it can be accessed over a network from any computer with a Web browser.
  • Mac OS X Developer Tools (also called Xcode Tools on Mac OS X 10.3) are also required.
  • Netscape is the recommended Web browser.

Hardware Requirements

  • Hardware demands depend on the size of the document collection.
  • Fewer than 100,000 pages of text can be processed easily on any recent Mac laptop or desktop computer.
  • For fewer than 100,000 pages, a G4 1 GHz processor with 500 MB RAM is recommended.
  • Perilog supports RAID disk striping.
  • For more information, refer to the Perilog User Guide, Section 5.



Frequently Asked Questions


Installing and Configuring

How long does the installation take?

Installation in a test environment typically can be accomplished in less than 1 hour by a person with modest computer skills. Accomplished system administrators can complete the installation in less than 30 minutes.


Does Perilog come with a starter set of data?

Yes. The installation includes a set of 100 aviation safety incident reports and example searches.


Does Perilog support web-based access?

Yes. Perilog supports remote access via a Web browser. For more information, refer to Appendix 4 of the user guide.


Configuring Document Collections

What document formats does Perilog support?

Input to Perilog must be in plain text format, using the following structure:

NONWORD item_1
This is the text of item 1.
NONWORD item_2
This is the text of item 2. And more text here.
This is additional text for item 2.

where item_1 and item_2 are identifiers for the associated text. The identifiers can be file names, document titles, or even section titles or page references. For example, the identifier v01_ch06_pg121-item0001 could refer to volume 1, chapter 6, page 121. For more information, refer to Appendix 1 of the user guide. 
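A short parser for this input format is sketched below, assuming (as the example suggests) that NONWORD is a literal marker word at the start of a line, followed by the identifier, and that every following line up to the next marker belongs to that item. Appendix 1 of the user guide is the authority on the exact rules; `parse_collection` is a hypothetical name.

```python
def parse_collection(text):
    """Split Perilog-style input into {identifier: text}."""
    items, current = {}, None
    for line in text.splitlines():
        parts = line.split(None, 1)
        if parts and parts[0] == "NONWORD":
            current = parts[1].strip() if len(parts) > 1 else ""
            items[current] = []          # a repeated identifier would restart its text
        elif current is not None:
            items[current].append(line)  # text before the first marker is ignored
    return {ident: "\n".join(lines).strip() for ident, lines in items.items()}

sample = """NONWORD item_1
This is the text of item 1.
NONWORD item_2
This is the text of item 2. And more text here.
This is additional text for item 2.
"""
for ident, body in parse_collection(sample).items():
    print(ident, "->", body.replace("\n", " / "))
```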


Does Perilog support configurable stopwords?

Yes. Perilog supports an unlimited number of configurable stopwords for each document collection. For more information, refer to Appendix 2 of the user guide.


Does Perilog support word substitutions?

Yes. Perilog supports an unlimited number of configurable word and phrase substitutions for each document collection. For more information, refer to Appendix 3 of the user guide.
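Stopwords and substitutions can be pictured as a preprocessing pass over each document. The sketch below keeps both in memory as a set and a dictionary; how Perilog actually stores and applies them (Appendices 2 and 3 of the user guide) is not reproduced, and `preprocess` is a hypothetical name. Substitutions are applied longest key first so that a multi-word phrase is replaced before any of its parts.

```python
import re

def preprocess(text, stopwords, substitutions):
    """Lowercase the text, apply word/phrase substitutions (longest key first),
    then drop stopwords. Keys and stopwords are assumed to be lowercase."""
    text = text.lower()
    for old in sorted(substitutions, key=len, reverse=True):
        pattern = r"\b" + re.escape(old) + r"\b"
        # A function replacement avoids backslash handling in the new text.
        text = re.sub(pattern, lambda _m, new=substitutions[old]: new, text)
    return [w for w in re.findall(r"[a-z0-9_]+", text) if w not in stopwords]

# Hypothetical configuration: expand report abbreviations, join a phrase into one term.
print(preprocess("Aircraft on the RWY near TXWY B", {"the", "on"},
                 {"rwy": "runway", "txwy": "taxiway"}))
print(preprocess("Reduced rest before the flight", {"the"},
                 {"reduced rest": "reduced_rest"}))
```

Joining a phrase with an underscore is one way to make a multi-word expression behave as a single term in later analysis.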


Understanding the Search Algorithm and Conducting Searches

What is a "speed search" versus a "depth search"?

The speed and depth options provide control over search detail and speed of processing. Speed search uses the low-resolution database that contains all of the documents but represents each document with only 10 QUORUM Perilog relations. (Relations are contextually associated word pairs.) Depth search uses all of the available relations, of which there can be hundreds per document, to do a high-resolution search. Speed search is significantly faster than depth search.
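The trade-off between the two modes can be sketched by building a low-resolution index that keeps only each document's 10 strongest relations. Selecting the 10 by raw strength, and scoring by summing matched strengths, are assumptions for illustration; the criteria Perilog actually uses are not stated here. All names and data are hypothetical.

```python
def low_resolution(relations, k=10):
    """Keep only a document's k strongest relations (speed-search index)."""
    strongest = sorted(relations.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return dict(strongest)

def score(query_pairs, relations):
    """Sum the strengths of the query's word pairs present in one document."""
    return sum(relations.get(pair, 0.0) for pair in query_pairs)

def search(index, query_pairs):
    """Rank document ids from most to least relevant."""
    return sorted(index, key=lambda doc: score(query_pairs, index[doc]), reverse=True)

# Hypothetical data: r1 has 12 relations; the queried pair is its 12th strongest.
depth_index = {
    "r1": {(f"w{i}", "x"): float(100 - i) for i in range(12)},
    "r2": {("w11", "x"): 5.0},
}
speed_index = {doc: low_resolution(rels) for doc, rels in depth_index.items()}
query = [("w11", "x")]
print(search(depth_index, query))  # ['r1', 'r2']
print(search(speed_index, query))  # ['r2', 'r1']
```

The speed index is smaller and faster to scan, but it can miss a relation that falls outside a document's top 10, which is exactly where the high-resolution depth search changes the ranking.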


Can you do search within a search?

Yes. The search parameter “relevant reports from previous search” limits the current query to the documents identified in the previous search.
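A search within a search amounts to restricting the candidate set to the ids returned by the previous query. In the sketch below, a plain substring match stands in for Perilog's relevance search, and `keyword_search` and its `within` argument are hypothetical.

```python
def keyword_search(documents, term, within=None):
    """Return ids of documents containing `term` (simple substring test).
    If `within` holds the ids from a previous search, only those are searched."""
    candidates = documents if within is None else {i: documents[i] for i in within}
    return [i for i, text in candidates.items() if term.lower() in text.lower()]

reports = {
    1: "Runway incursion at night",
    2: "Fatigue after reduced rest",
    3: "Night flight with crew fatigue",
}
first = keyword_search(reports, "night")                   # [1, 3]
second = keyword_search(reports, "fatigue", within=first)  # [3]
print(first, second)
```

Report 2 mentions fatigue but is excluded from the second result because it was not returned by the first search.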


The Web site refers to phrase discovery capabilities, but the demo does not show how to use those features. How does phrase discovery work?

Phrase discovery can be accomplished through several automated and manual steps. It is described in detail on page 24 and in Appendix 5 of the inventor’s publication, Searching the ASRS Database Using QUORUM Keyword Search, Phrase Search, Phrase Generation, and Phrase Discovery.


What is the table of values that appears after each relevant document in a search?

The table shows the words that are strongly contextually related to the query words. For example:

word1     word2        A    B        C
RWY       35R        100  100  21.2993
TXWY      RWY        146   48  19.4219
RWY       35L         59   59  16.7637
APCH      RWY        232   19  16.3299
NOT       RWY         80   40  16.3191

In the example, word1 and word2 are the related words. Column A shows a measure of the strength of the contextual association of the word pair in the entire document database. Column B shows a measure of the strength of the relationship in this particular document. Column C indicates the overall strength of the relationship; the values shown equal the product of the natural logarithms of A + 1 and B + 1. Word pairs with the highest value for C have the strongest contextual relationships and are presented first.
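The C values in the example table can be reproduced from columns A and B. Computing them shows that they match ln(A + 1) × ln(B + 1); the "+ 1" is inferred from the table's numbers rather than stated in Perilog's documentation. The function name `overall_strength` is hypothetical.

```python
import math

rows = [  # (word1, word2, A, B) from the example table
    ("RWY", "35R", 100, 100),
    ("TXWY", "RWY", 146, 48),
    ("RWY", "35L", 59, 59),
    ("APCH", "RWY", 232, 19),
    ("NOT", "RWY", 80, 40),
]

def overall_strength(a, b):
    """Column C: product of the natural logs of (A + 1) and (B + 1)."""
    return math.log(a + 1) * math.log(b + 1)

# Strongest relationships first, as in the table.
ranked = sorted(rows, key=lambda r: overall_strength(r[2], r[3]), reverse=True)
for w1, w2, a, b in ranked:
    print(f"{w1:<6}{w2:<6}{a:>5}{b:>5}{overall_strength(a, b):>10.4f}")
```

Taking logarithms damps very large counts, so a pair needs to be strong both database-wide (A) and in the document at hand (B) to rank highly.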


How is "Search by Example" different from "Phrase Search"?

These two search options operate in the same way, building a query model from the query string and comparing it to the document model. Search by Example enables entering a larger quantity of multi-line text, as opposed to a single line of text. For example, it is possible to copy and paste text from a document found elsewhere into the Search by Example box.


What is RMV?

RMV stands for Relational Metric Value. RMV is a Perilog-specific measure of the contextual association between two words or phrases. It is described in more detail in the inventor’s April 2001 publication, Searching the ASRS Database Using QUORUM Keyword Search, Phrase Search, Phrase Generation, and Phrase Discovery, beginning on page 32.


How is the Perilog search algorithm different from Latent Semantic Indexing?

Perilog uses the degree of contextual co-occurrence of word pairs as the data in its word-by-word matrix, while Latent Semantic Indexing (LSI) uses the frequency of occurrence of words within a body of text as the data in its word-by-document matrix. In LSI, co-occurrence of two words is inferred from the fact that both appear in the same document, regardless of their proximity. Perilog uses a separate word-by-word matrix to represent each body of text in a database (and another to represent each query), not one large matrix to represent all words and all bodies of text. In Perilog, a relevance value is found by comparing the query matrix to the document matrix as if they were vectors, where each dimension in the space represents the contextual association of one pair of co-occurring words.
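The difference in input data can be illustrated with two small functions. The first builds an LSI-style word-by-document count matrix for a whole collection; the second builds a simplified word-by-word pair table for a single document, counting only pairs within a fixed window. The window size and raw counts are assumptions for illustration, not Perilog's actual association measure, and both function names are hypothetical.

```python
from collections import Counter

def word_by_document(docs):
    """LSI-style input: one matrix for the whole collection, with a row per word
    and a column per document, holding raw term counts."""
    vocab = sorted({w for doc in docs for w in doc.split()})
    return vocab, [[doc.split().count(w) for doc in docs] for w in vocab]

def word_by_word(doc, window=2):
    """Perilog-style input (simplified): a separate word-pair table per document,
    stored sparsely, counting only ordered pairs no more than `window` words apart."""
    words = doc.split()
    return Counter((words[i], words[j])
                   for i in range(len(words))
                   for j in range(i + 1, min(i + 1 + window, len(words))))

report = "crew rest was short and a fuel leak was found"
vocab, matrix = word_by_document([report, "bird strike on departure"])
# In the LSI matrix, "crew" and "fuel" share a document column, so they look related.
print(matrix[vocab.index("crew")], matrix[vocab.index("fuel")])  # [1, 0] [1, 0]
pairs = word_by_word(report)
print(("crew", "rest") in pairs, ("crew", "fuel") in pairs)      # True False
```

In the proximity-based table, "crew" is associated with "rest" but not with "fuel", which appears six words later; the document-level matrix cannot make that distinction.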



Patents

NASA has secured four patents covering the Perilog software: U.S. Patent Nos. 6,823,333; 6,741,981; 6,697,793; and 6,721,728.


Publications and Awards



Published papers

*Note: QUORUM is an early version of Perilog


Video


"Perilog: A Patented Contextual Search Method" —Overview and demonstration video (approx. 19 minutes)


Awards

  • NASA Space Act Board Award, 2004




Commercial Opportunity

This technology is part of NASA's Innovative Partnerships Program, which seeks to transfer technology into and out of NASA to benefit the space program and U.S. industry. NASA invites companies to inquire about the licensing possibilities for the Perilog technology (ARC-14512-1, ARC-14513-1, ARC-14514-1, and ARC-14515-1) for commercial applications.






This technology is owned by NASA's Ames Research Center
ARC-14512 (AR-0016)