Natural Language Processing

Graduate School in Computer Science 2001/2002

Artificial Intelligence Laboratory

The Swiss Federal Institute of Technology

Large Scale Semantic Resources
A Tool for browsing EDR


Report

Supervisor: Dr. Martin Rajman

By David Portabella

 

Index

  1. List of tables and figures
  2. Introduction
  3. EDR Dictionary
    1. Word Dictionary
    2. Concept Dictionary
    3. Co-occurrences Dictionary
    4. Corpus Dictionary
  4. Statistics about EDR
    1. # Paths to reach the top concept
    2. Concepts per Level classification
    3. The Corpus Database
  5. Previous work on the project
  6. Retrieving the information from the database
    1. Using the Word Dictionary
    2. Using Concept Description Dictionary
    3. Using Corpus Dictionary
  7. The extended graphical interface
  8. Conclusion
Note: To get the full report, please send an email to either Martin Rajman or David Portabella

 

List of tables and figures

List of tables

Table 1. Word dictionary. Example 1
Table 2. Word dictionary. Example 2
Table 3. The 8/26 relational labels used in the Concept Description Dictionary
Table 4. Example of 4 of the 26 concepts related to the concept "to write down something"
Table 5. 4 of the 26 concepts related to the concept "to write down something"


List of figures

Figure 1. Concept Classification Dictionary. Example
Figure 2. # occurrences grouped by number of meanings per headword
Figure 3. # Paths to reach the top concept
Figure 4. Path #1 and #3 of the concept <#8361>Major
Figure 5. Concepts per Level classification
Figure 6. Relational Labels for representing facts or occurrences
Figure 7. And the pseudorelational labels
Figure 8. Relations Labels (From the Corpus Dictionary, Not found in the Concepts Dictionary)
Figure 9. Dictionary Interface. Showing word "calm"
Figure 10. Dictionary Interface. Showing word "tranquilize"
Figure 11. Graphical description of the syntactic information of one phrase of the Corpus Dictionary
Figure 12. Graphical description of the semantic information of one phrase of the Corpus Dictionary
Figure 13. Screenshot of the Concept Navigation interface

 

Introduction

Natural-language processing aims to let people interact with computers without needing any specialized knowledge. You could simply walk up to a computer and talk to it. Unfortunately, programming computers to understand natural languages has proved to be more difficult than originally thought. Some rudimentary systems that translate from one human language to another exist, but they are not nearly as good as human translators. There are also voice recognition systems that can convert spoken sounds into written words, but they do not understand what they are writing; they simply take dictation. Even these systems are quite limited: you must speak slowly and distinctly.

Current systems work well at the morpho-lexical and syntactic levels, but are much less successful at the semantic and pragmatic levels. For example, given the sentence “He eats a fish with a fork”, a system has trouble deciding whether he eats a fish using a fork, or eats both a fish and a fork. To resolve such ambiguities, a knowledge base must be created to help the system choose between the possible interpretations.

This project aims to advance the use of Large Scale Semantic Resources in Natural Language Processing. First, the EDR Dictionary is presented; it combines an electronic dictionary, a thesaurus and a corpus. Second, some statistics about this dictionary are shown. Next, the previous work done on this project is reviewed, and some examples of how to retrieve information from the database are given.
Finally, the new extended interface for browsing the dictionary is presented.

Related work is being carried out in the UNL Project and the American CYC Project, both of which use a knowledge base.

 

EDR Dictionary

...

 

Statistics about EDR

...

 

Previous work on the project

The EDR Dictionary is written in a plain text format, so its information is not easily accessible to a standard application. The data in the dictionary therefore had to be loaded into a database. This was done last year by Herbei Dacian, a student from the graduate school.

He used the MySQL database software, so applications can now retrieve the information efficiently. The conversion was also motivated by the wish to reduce the redundancy present in the EDR Dictionary.

However, reducing the redundancy also made the information more complex to retrieve. In addition, the amount of communication between the client application and the MySQL database server grew considerably, even when fetching a simple piece of information. This motivated the creation of Client/Server API software to access the information in the database, so that the client needs much less communication.

The Client/Server API software runs in the following way:

1. The server is running and listening, waiting for a client to connect.

2. The client connects to the server.

3. The client asks for a specific task.

4. The server accepts if that task exists.

5. The client sends a query specific to that task.

6. The server replies with the information.

7. Steps 5 and 6 are repeated until the connection is closed.
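The steps above can be sketched as a small session object. This is only an illustration under stated assumptions: it leaves out the network layer entirely, and the names TaskSession, register, selectTask and query are hypothetical, not taken from the actual Client/Server API. The handler registered for "dictionary" returns a placeholder string rather than real EDR data.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical in-memory model of one client session (no sockets involved).
class TaskSession {
    // Task name -> handler that answers one query for that task.
    private final Map<String, Function<String, String>> tasks = new HashMap<>();
    // Handler of the task accepted for this session; null until step 4 succeeds.
    private Function<String, String> current;

    // Makes a task available on the (simulated) server.
    void register(String name, Function<String, String> handler) {
        tasks.put(name, handler);
    }

    // Steps 3 and 4: the client names a task; the server accepts only if it exists.
    boolean selectTask(String name) {
        current = tasks.get(name);
        return current != null;
    }

    // Steps 5 and 6: the client sends a task-specific query; the server replies.
    String query(String q) {
        if (current == null) {
            throw new IllegalStateException("no task has been accepted yet");
        }
        return current.apply(q);
    }

    public static void main(String[] args) {
        TaskSession session = new TaskSession();
        // Placeholder handler: a real dictionary task would query the database.
        session.register("dictionary", word -> "meanings of " + word);

        System.out.println(session.selectTask("dictionary"));
        // Step 7: queries can be repeated for as long as the session lasts.
        System.out.println(session.query("calm"));
        System.out.println(session.query("tranquilize"));
    }
}
```

Keeping the task selection separate from the queries mirrors the description above: the client pays for the task negotiation once and then sends only short, task-specific queries.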


Only one task was implemented, the dictionary task. It is not a general-purpose task: the client cannot ask for information about a meaning or a concept. It can only ask about a word, and the server returns all the meanings of that word, with their grammatical information and the headwords and explanations of the corresponding concepts.

The implemented tasks are listed in the server configuration file “Servers.properties”. Each task has one line giving the task name and the Java class name. The file therefore contained only this line:
dictionary=server.DictionaryServer

It states that when a client asks for the task “dictionary”, the server initializes the class “server.DictionaryServer”, and the client can start making queries specific to this task.
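A minimal sketch of how such a configuration line could be turned into a running task is given below. It assumes the file uses the standard Java properties format and that the task class has a public no-argument constructor; neither is confirmed by this report. The class TaskRegistry and its methods are hypothetical, and java.util.ArrayList stands in for server.DictionaryServer, which is not available outside the project.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: maps task names to Java classes via a properties file.
class TaskRegistry {
    // Parses configuration text with one "taskName=className" entry per line.
    static Properties load(String configText) {
        Properties config = new Properties();
        try {
            config.load(new StringReader(configText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return config;
    }

    // Returns a new instance of the class configured for the task,
    // or null if the task is not listed (the server would then refuse it).
    static Object instantiate(Properties config, String task) {
        String className = config.getProperty(task);
        if (className == null) {
            return null;
        }
        try {
            // Assumption: the task class has a public no-argument constructor.
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot create task " + task, e);
        }
    }

    public static void main(String[] args) {
        // Stand-in class name; the real file names server.DictionaryServer.
        Properties config = load("dictionary=java.util.ArrayList\n");
        System.out.println(instantiate(config, "dictionary").getClass().getName());
        System.out.println(instantiate(config, "corpus")); // task not listed
    }
}
```

With this kind of lookup, adding a task only requires a new class and one more line in the configuration file; the server code itself does not change.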

To test the Client/Server API software, a graphical dictionary client application was also written. This application was later improved by Stefan Schmidlin in his diploma project.

The two examples below show requests for the words “calm” and “tranquilize”:



Figure 9. Dictionary Interface. Showing word "calm"


Figure 10. Dictionary Interface. Showing word "tranquilize"

 

Retrieving the information from the database

...

 

The extended graphical interface

With the dictionary graphical interface, it was possible to look up a word and browse through its meanings and concepts. Once a concept was selected, however, a new graphical interface was needed to navigate through its hierarchy of super and sub concepts and through its related concepts.

First, the Client/Server API had to be extended to support navigation through concepts.
A new server task was implemented and added to the server configuration file “Servers.properties”:
conceptNavigation=server.ConceptNavigationServer
Then the Client API and the new interface were implemented.
To display the concept classification, the interface uses NetEditor, an open source Java graph library.


Figure 13. Screenshot of the Concept Navigation interface


When the application starts, it shows the Dictionary interface, where a word can be looked up. All the meanings of the word and their corresponding concepts are shown. A concept can then be selected and displayed in the Concept Navigation interface.

In the Concept Navigation interface, the right panel shows the related concepts, super concepts and sub concepts of the selected concept. Any of these concepts can be displayed in the left navigation panel.

The right panel also shows the meanings associated with the selected concept. Any of these meanings can be displayed in the Dictionary interface.

In the left navigation panel, the super or sub concepts of a concept can be displayed on request. It is also possible to display all its super concepts up to the root concept.
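Displaying all super concepts up to the root amounts to enumerating every upward path in the concept hierarchy, since a concept may have more than one super concept. The sketch below illustrates the idea on an in-memory map. It is not the code of the interface: the class SuperConceptPaths is hypothetical, the concept names are invented toy data rather than EDR entries, and the hierarchy is assumed to contain no cycles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: enumerate all paths from a concept up to a root.
class SuperConceptPaths {
    // supers maps each concept to its direct super concepts.
    // A concept with no entry (or an empty list) is treated as a root.
    // Assumption: the hierarchy is acyclic, otherwise the recursion never ends.
    static List<List<String>> pathsToRoot(Map<String, List<String>> supers, String concept) {
        List<List<String>> result = new ArrayList<>();
        List<String> parents = supers.getOrDefault(concept, Collections.<String>emptyList());
        if (parents.isEmpty()) {
            // Root reached: the path consists of this concept alone.
            result.add(new ArrayList<>(Collections.singletonList(concept)));
            return result;
        }
        for (String parent : parents) {
            // Prepend the current concept to every path found from the parent.
            for (List<String> tail : pathsToRoot(supers, parent)) {
                List<String> path = new ArrayList<>();
                path.add(concept);
                path.addAll(tail);
                result.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented toy hierarchy, not actual EDR data.
        Map<String, List<String>> supers = new HashMap<>();
        supers.put("major", Arrays.asList("officer", "rank"));
        supers.put("officer", Arrays.asList("person"));
        supers.put("person", Arrays.asList("concept"));
        supers.put("rank", Arrays.asList("concept"));

        for (List<String> path : pathsToRoot(supers, "major")) {
            System.out.println(String.join(" -> ", path));
        }
    }
}
```

In the toy hierarchy, "major" has two direct super concepts, so two separate paths to the root are produced.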


Conclusion

Several programs were written to compute statistics about the dictionary, and the results are presented in this report.

The extension of the Client/Server API for the concept navigation has been successfully implemented.

The API still needs to be extended to give access to the Corpus information as well.

The graphical interface has been extended, and it is now possible to look up words in the dictionary interface and to navigate through the concepts as well. The new interface has proved to be a good resource for exploring the EDR dictionary and thus understanding it better. As the EDR dictionary was made for English and Japanese, we think that its complexity can be reduced if only the English part is used. The GUI should be a useful tool for working this out.

This tool will also be used by a third-party group to validate the correctness of the dictionary.

Now that the tool is ready, we can continue the research on using Large Scale Semantic Resources.