Dick de Jongh, The Complexity of Finite Identification
Abstract:
To finitely identify a language means to be able to recognize it with certainty after receiving some (specific) finite sample of the language. Such a finite sample that suffices for finite identification is called definite finite telltale set. One can interpret such a DFTT as the collection of the most characteristic (from a certain point of view) elements of the set. It has also a different connotation that is based on the eliminative power of its elements. We can think of the information that is carried by a particular sample of the language in a negative way, as showing which of the hypotheses are inconsistent with the information that has arrived, and thereby eliminating them. A set S is finitely identifiable if its finite subset has the power of eliminating all possibilities under consideration which are different from S.
From the characterization of finite identifiability, we know that if a class of languages is finitely identifiable, then the identification can be done on the basis of corresponding DFTTs, i.e., finite subsets of the original languages that contain a sample sufficient for finite identifiability. A number of issues emerge when analyzing computational properties of the definite finite telltales used in identification. Since DFTTs are by no means unique, it can obviously be useful to obtain small definite telltales. In this context we distinguish two notions of minimality for DFTTs. A minimal DFTT is a DFTT that cannot be further reduced without losing its eliminative power with respect to a class of languages. A minimalsize DFTT of a set S, is a DFTT that is one of those which are smallest among all possible DFTTs of S. In order to investigate the computational complexity of finding such minimal DFTTs, we will have to restrict ourselves to finite classes of languages. Even though it is a very heavy restriction, it creates the possibility of grasping important aspects of the complexity of finite identification. We will next move back to more general cases and investigate how the use of the class of all (minimal) DFTTs can influence the speed of finite identification.
We introduce the notion of preset learner that performs identification on the basis of some DFTTs, and characterize the notion using the concept of subsetdriven learning. Then we ask the question of how difficult it is to find DFTTs of various kinds. We present the refined notions of minimal DFTT, and minimalsize DFTT. We show that the problem of finding a minimalsize DFTT is NPcomplete, while the problem of finding any minimal DFTT is PTIME computable. Therefore, it can be argued that it is harder for a teacher to provide a minimalsize optimal sample, than just any minimal sufficient information. Then we analyze the possibility of a recursive function that explicitly provides all minimal DFTTs of a finite language. We call the type of finite identification that requires the existence of such a function strict preset finite identification. For the case of finite classes of finite languages we apply a computational complexity analysis  here finding the set of all minimal DFTTs turns out to be NPhard. In the more general case of infinite classes of finite languages, we also show that there are recursively finitely identifiable classes which are not recursively strict preset finitely identifiable. In the end we compare finite identification with the concept of fastest finite identification and show classes for which recursive finite identifiers exist, but which cannot be recursively finitely identified in the fastest way. That is so because for those classes no recursive function exists that gives access to all minimal DFTTs for each language in the class.
This is a joint work with Nina Gierasimczuk.
