
Sunday, December 30, 2007

Speech recognition

The 1990s saw the first commercialization of spoken language understanding systems. Computers can now understand and react to humans speaking in a natural manner in ordinary language, within a limited domain. Basic and applied research in signal processing, computational linguistics and artificial intelligence has been combined to open up new possibilities in human-computer interfaces.

"Voice recognition and speech synthesis technologies may not have developed to the degree some science fiction writers hoped, but have nevertheless seen some startling successes. ... Voice synthesis has been around for a long time. Bell Labs demonstrated a computer-based speech synthesis system running on an IBM 704 in 1961, a demonstration seen by the author Arthur C. Clarke, giving him the inspiration for the talking computer HAL 9000 in his book and film '2001: A Space Odyssey'. Forty-five years later, voice synthesis technology can be found in products as diverse as talking dolls, car information systems and various text-to-speech conversion services such as the one recently launched by BT. Many of these modern systems can convert text into a computer-synthesised voice of quite respectable quality. ... Voice recognition has turned out to be a much harder task than researchers realised when work began on the problem over forty years ago. However, limited voice recognition applications are starting to creep into everyday use: voice-input telephone menu systems are now commonplace, speech-to-text dictaphones are increasingly used for note-taking by doctors and lawyers, and voice input has started to appear in computer games systems. The success of some of these limited-application voice recognition systems has recently prompted the big software heavyweights, Microsoft and IBM, to make further investments. ... However, there are still a lot of technological hurdles to overcome; to understand what these are, we need to delve further into the technology. ... Speech recognition, on the other hand, is a much harder task, and commercial off-the-shelf systems have only been available since the 1990s. Because every person's voice is different, and words can be spoken with a range of different nuances, tones and emotions, the computational task of successfully recognising spoken words is considerable, and has been the subject of many years of continuing research work around the world. A variety of different approaches are used, including dynamic algorithms, neural networks and knowledge bases, with the most widely used underlying technology being the Hidden Markov Model. These techniques all attempt to search for the most likely word sequence given the fact that the acoustic signal will also contain a lot of background noise."
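
The search for the most likely word sequence mentioned above is usually formulated with Hidden Markov Models. As a rough, hypothetical illustration (the states, observations and probabilities below are invented, not taken from any system quoted here), this Python sketch runs Viterbi decoding over a toy model:

# Minimal Viterbi decoding over a toy Hidden Markov Model.
# States, observations and probabilities are invented for illustration only.

states = ["silence", "speech"]
observations = ["low", "high", "high"]          # toy acoustic features

start_p = {"silence": 0.6, "speech": 0.4}
trans_p = {"silence": {"silence": 0.7, "speech": 0.3},
           "speech":  {"silence": 0.4, "speech": 0.6}}
emit_p  = {"silence": {"low": 0.9, "high": 0.1},
           "speech":  {"low": 0.2, "high": 0.8}}

def viterbi(obs):
    # best[t][s] = (probability of the best path ending in state s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        best.append({})
        for s in states:
            prob, prev = max(
                (best[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states)
            best[t][s] = (prob, prev)
    # trace back the most probable state sequence
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, best[t][path[0]][1])
    return path

print(viterbi(observations))   # -> ['silence', 'speech', 'speech']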

Are you talking to me? Speech recognition: Technology that understands human speech could be about to enter the mainstream. The Economist Technology Quarterly (June 7, 2007). "Speech recognition has taken a long time to move from the laboratory to the marketplace. Researchers at Bell Labs first developed a system that recognised numbers spoken over a telephone in 1952, but in the ensuing decades the technology has generally offered more promise than product, more science fiction than function. ... Optimistic forecasts from market-research firms also suggest that the technology is on the rise. ... An area of great interest at the moment is in that of voice-driven 'mobile search' technology, in which search terms are spoken into a mobile device rather than typed in using a tiny keyboard. ... The resulting lower cost and greater reliability mean that speech-based systems can even save companies money. Last August, for example, Lloyds TSB, a British bank, switched all of its 70m annual incoming calls over to a speech-recognition system based on technology from Nuance and Nortel, a Canadian telecoms-equipment firm. ... Another promising area is in-car use. ... There are military uses, too. ..."

Computer Vision and Speech. "If you are interested in computers with human capabilities, vision and speech open an entirely new world of computers that can see and talk like we do. Computer vision is the moody input cousin of computer graphics: in graphics, you have all the time you can afford to program the rendering, but visual input is an unpredictable and messy reality. Computer speech is both input and output, like in systems capable of spoken dialogue. Viewed as enabling technologies, computer speech arguably holds the lead over computer vision. Even though a speech signal is enormously rich in information and we are still far from mastering important aspects of it like online recognition and generation of speech prosody, it is still much easier to shut up the people in a room in order to get a clear speech signal than it is to control the room's lighting conditions and to identify and track all of its 3-D contents independently of the viewing angle. Given the state of the art, it makes good sense that the papers in this issue of Crossroads are about speech or vision. Two articles address different stages of the process of making computers understand what is commonly called the speaker's communicative intention, i.e., what the speaker really wishes to say by uttering a sequence of words. Deepti Singh and Frank Boland [Voice Activity Detection] discuss approaches to the important pre-(speech)-recognition problem of detecting if and when the acoustic signal includes speech in the first place. ... Nitin Madnani's introduction [Getting Started on Natural Language Processing with Python] to natural language processing, or NLP, is likely to tempt computer scientists to try out NLP for themselves ..."

"Speech recognition software matches strings of phonemes -- the sounds that make up words -- to words in a vocabulary database. The software finds close matches and presents the best one. The software does not understand word meaning, however. This makes it difficult to distinguish among words that sound the same or similar. The Open Mind Common Sense Project database contains more than 700,000 facts that MIT Media Lab researchers have been collecting from the public since the fall of 2000. These are based on common sense like the knowledge that a dog is a type of pet rather than the knowledge that a dog is a type of mammal. The researchers used the phrase database to reorder the close matches returned by speech recognition software. ... 'One surprising thing about testing interfaces like this is that sometimes, even if they don't get the absolutely correct answer, users like them a lot better,' said [Henry] Lieberman. 'This is because they make plausible mistakes, for example 'tennis clay court' for 'tennis player', rather than completely arbitrary mistakes that a statistical recognizer might make, for example 'tennis slayer','

Sunday, December 16, 2007

Genetic programming

Genetic programming is a branch of genetic algorithms. The main difference between genetic programming and genetic algorithms is the representation of the solution: genetic programming creates computer programs in the Lisp or Scheme computer languages as the solution, while genetic algorithms create a string of numbers that represents the solution. Genetic programming uses four steps to solve problems (a minimal code sketch of this loop follows the flowchart below):
1) Generate an initial population of random compositions of the functions and terminals of the problem (computer programs).

2) Execute each program in the population and assign it a fitness value according to how well it solves the problem.

3) Create a new population of computer programs.
i) Copy the best existing programs
ii) Create new computer programs by mutation.
iii) Create new computer programs by crossover (sexual reproduction).

4) The best computer program that appeared in any generation, the best-so-far solution, is designated as the result of genetic programming.


Figure 1: Genetic programming flowchart.
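
A minimal sketch of this generate/evaluate/reproduce loop in Python, evolving small arithmetic expression trees toward the target function x*x + x; the representation, operators and parameters are all invented for illustration and are far simpler than a real genetic programming system:

# Minimal genetic-programming loop for symbolic regression of f(x) = x*x + x.
# Representation, operators and parameters are simplified for illustration.
import random, operator

FUNCS = [(operator.add, "+"), (operator.sub, "-"), (operator.mul, "*")]
TERMS = ["x", 1.0, 2.0]

def random_tree(depth=3):
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMS)
    f = random.choice(FUNCS)
    return (f, random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    if tree == "x":
        return x
    if isinstance(tree, float):
        return tree
    f, left, right = tree
    return f[0](evaluate(left, x), evaluate(right, x))

def fitness(tree):
    # lower is better: total error against the target function on sample points
    return sum(abs(evaluate(tree, x) - (x * x + x)) for x in range(-5, 6))

def mutate(tree):
    # crude mutation: throw the program away and grow a fresh random tree
    return random_tree(2)

def crossover(a, b):
    # crude crossover: graft a random top-level subtree of b into a
    if not isinstance(a, tuple) or not isinstance(b, tuple):
        return b
    f, left, right = a
    donor = random.choice([b[1], b[2]])
    return (f, donor, right) if random.random() < 0.5 else (f, left, donor)

population = [random_tree() for _ in range(200)]      # step 1: random programs
for generation in range(30):
    population.sort(key=fitness)                      # step 2: evaluate fitness
    survivors = population[:20]                       # step 3i: copy the best programs
    children = []
    while len(children) < 180:
        p1, p2 = random.sample(survivors, 2)
        child = crossover(p1, p2)                     # step 3iii: crossover
        if random.random() < 0.1:
            child = mutate(child)                     # step 3ii: occasional mutation
        children.append(child)
    population = survivors + children

population.sort(key=fitness)
print("best-so-far error:", fitness(population[0]))   # step 4: best-so-far program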



Fitness Function
The most difficult and most important concept of genetic programming is the fitness function. The fitness function determines how well a program is able to solve the problem. It varies greatly from one type of program to the next. For example, if one were to create a genetic program to set the time of a clock, the fitness function would simply be the amount of time that the clock is wrong. Unfortunately, few problems have such an easy fitness function; most cases require a slight modification of the problem in order to find the fitness.
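
As a tiny hypothetical illustration of how direct such a fitness function can be, here is the clock example as code (the time units and values are invented):

# Fitness for the clock-setting example: how far off the clock is, in seconds.
# Lower values are better; a perfect program scores 0.

def clock_fitness(clock_seconds, true_seconds):
    return abs(clock_seconds - true_seconds)

print(clock_fitness(clock_seconds=3600, true_seconds=3725))   # 125 seconds wrong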


Gun Firing Program
A more complicated example consists of training a genetic program to fire a gun to hit a moving target. The fitness function is the distance by which the bullet misses the target. The program has to learn to take into account a number of variables, such as wind velocity, the type of gun used, the distance to the target, the height of the target, and the velocity and acceleration of the target. This problem represents the type of problem for which genetic programs are best suited: a simple fitness function with a large number of variables.

Water Sprinkler System
Consider a program to control the flow of water through a system of water sprinklers. The desired outcome is the correct amount of water evenly distributed over the surface. Unfortunately, there is no single variable encompassing this measurement, so the problem must be modified to find a numerical fitness. One possible solution is placing water-collecting measuring devices at certain intervals on the surface. The fitness could then be the standard deviation of the water levels across all the measuring devices. Another possible fitness measure could be the difference between the lowest measured water level and the ideal amount of water; however, this number would not account in any way for the water levels at the other measuring devices, which may not be at the ideal mark.
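
A minimal sketch of the standard-deviation measure described above (the collector readings are invented; in practice they would come from the simulated or real sprinkler run):

# Sprinkler fitness: standard deviation of water collected at measuring points.
# A perfectly even distribution gives 0; lower is therefore better.
from statistics import pstdev

def sprinkler_fitness(water_levels_mm):
    return pstdev(water_levels_mm)

readings = [12.0, 11.5, 13.2, 9.8, 12.4]    # hypothetical collector readings, in mm
print(sprinkler_fitness(readings))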

Maze Solving Program
If one were to create a program to find the solution to a maze, the program would first have to be trained with several known mazes. The ideal solution from the start to the finish of the maze would be described by a path of dots. The fitness in this case would be the number of dots the program is able to find. In order to prevent the program from wandering around the maze for too long, a time limit is implemented along with the fitness function.
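
One possible way to express that fitness in code; the maze encoding, the dot representation and the exact handling of the time limit are assumptions of this sketch, not a prescribed implementation:

# Maze fitness: how many dots along the known ideal path the candidate program
# visited before its time limit ran out. Higher is better here.

def maze_fitness(visited_cells, ideal_path, steps_taken, step_limit=500):
    if steps_taken > step_limit:
        return 0                              # ran too long: worst possible score
    return len(set(visited_cells) & set(ideal_path))

ideal_path = [(0, 0), (0, 1), (1, 1), (2, 1)]             # dots marking the solution
visited    = [(0, 0), (0, 1), (1, 1), (1, 2)]             # where the program went
print(maze_fitness(visited, ideal_path, steps_taken=42))  # 3 dots found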


Functions and Terminals
The terminal and function sets are also important components of genetic programming. The terminal and function sets are the alphabet of the programs to be made. The terminal set consists of the variables and constants of the programs. In the maze example, the terminal set would contain three commands: forward, right and left. The function set consists of the functions of the program. In the maze example, the function set would contain: If "dot" then do x else do y. In the gun firing program, the terminal set would be composed of the different variables of the problem; some of these variables could be the velocities and accelerations of the gun, the bullet and the target. The function set would consist of several mathematical functions, such as addition, subtraction, division, multiplication and other more complex functions.
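
A possible Python encoding of the two sets for the maze example; the primitives are the ones listed above, while the way they would be wired into a program interpreter is left out:

# Terminal and function sets for the maze example. The terminals are the
# primitive moves; the single function branches on whether a dot is visible.

TERMINALS = ["forward", "left", "right"]

def if_dot(dot_visible, then_branch, else_branch):
    # "If dot then do x else do y"
    return then_branch if dot_visible else else_branch

FUNCTIONS = [if_dot]

print(if_dot(True, "forward", "left"))    # -> "forward"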

Crossover Operation
Two primary operations exist for modifying structures in genetic programming. The most important one is the crossover operation. In the crossover operation, two solutions are sexually combined to form two new solutions or offspring. The parents are chosen from the population by a function of the fitness of the solutions. Three methods exist for selecting the solutions for the crossover operation.

The first method uses a probability based on the fitness of the solution. If f(Si) is the fitness of the solution Si and

F = f(S1) + f(S2) + ... + f(SN)

is the total sum of the fitnesses of all the members of the population, then the probability that the solution Si will be copied to the next generation is [Koza 1992]:

P(Si) = f(Si) / F

Another method for selecting the solution to be copied is tournament selection. Typically the genetic program chooses two solutions at random, and the solution with the higher fitness wins. This method simulates biological mating patterns in which two members of the same sex compete to mate with a third one of a different sex. Finally, the third method is selection by rank. In rank selection, selection is based on the rank (not the numerical value) of the fitness values of the solutions in the population.
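
Hedged sketches of the first two selection schemes, assuming a population stored as (solution, fitness) pairs and a fitness that is being maximised (both assumptions of this example rather than requirements of the methods):

# Fitness-proportionate ("roulette wheel") and tournament selection sketches.
# Both assume higher fitness is better and a population of (solution, fitness) pairs.
import random

def roulette_select(population):
    total = sum(fit for _, fit in population)
    pick = random.uniform(0, total)
    running = 0.0
    for solution, fit in population:
        running += fit
        if running >= pick:                 # realises P(Si) = f(Si) / F from above
            return solution
    return population[-1][0]

def tournament_select(population, k=2):
    contenders = random.sample(population, k)
    return max(contenders, key=lambda pair: pair[1])[0]

pop = [("A", 1.0), ("B", 3.0), ("C", 6.0)]
print(roulette_select(pop), tournament_select(pop))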

The creation of the offspring from the crossover operation is accomplished by deleting the crossover fragment of the first parent and then inserting the crossover fragment of the second parent in its place. The second offspring is produced in a symmetric manner.
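
A sketch of that subtree swap on programs represented as nested tuples; the representation is an assumption of this example (Koza's work uses Lisp S-expressions), but the cut-and-graft logic is the operation described above:

# Subtree crossover on programs stored as nested tuples, e.g. ("+", "x", ("*", "x", "x")).
# A random fragment of each parent is cut out and replaced by a fragment of the other.
import random

def subtrees(tree, path=()):
    # yield (path, subtree) for every node in the tree
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace(tree, path, fragment):
    if not path:
        return fragment
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], fragment),) + tree[i + 1:]

def crossover(parent_a, parent_b):
    path_a, frag_a = random.choice(list(subtrees(parent_a)))
    path_b, frag_b = random.choice(list(subtrees(parent_b)))
    child_a = replace(parent_a, path_a, frag_b)   # fragment of B grafted into A
    child_b = replace(parent_b, path_b, frag_a)   # fragment of A grafted into B
    return child_a, child_b

a = ("+", "x", ("*", "x", "x"))
b = ("-", ("*", "x", 2), 1)
print(crossover(a, b))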

Wednesday, December 12, 2007

Branches of AI

logical AI
What a program knows about the world in general, the facts of the specific situation in which it must act, and its goals are all represented by sentences of some mathematical logical language. The program decides what to do by inferring that certain actions are appropriate for achieving its goals.

search
AI programs often examine large numbers of possibilities, e.g. moves in a chess game or inferences by a theorem proving program. Discoveries are continually made about how to do this more efficiently in various domains.

pattern recognition
When a program makes observations of some kind, it is often programmed to compare what it sees with a pattern. For example, a vision program may try to match a pattern of eyes and a nose in a scene in order to find a face. More complex patterns, e.g. in a natural language text, in a chess position, or in the history of some event are also studied. These more complex patterns require quite different methods than do the simple patterns that have been studied the most.

representation
Facts about the world have to be represented in some way. Usually languages of mathematical logic are used.

inference
From some facts, others can be inferred. Mathematical logical deduction is adequate for some purposes, but new methods of non-monotonic inference have been added to logic since the 1970s. The simplest kind of non-monotonic reasoning is default reasoning in which a conclusion is to be inferred by default, but the conclusion can be withdrawn if there is evidence to the contrary. For example, when we hear of a bird, we may infer that it can fly, but this conclusion can be reversed when we hear that it is a penguin. It is the possibility that a conclusion may have to be withdrawn that constitutes the non-monotonic character of the reasoning. Ordinary logical reasoning is monotonic in that the set of conclusions that can be drawn from a set of premises is a monotonic increasing function of the premises. Circumscription is another form of non-monotonic reasoning.
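
A toy illustration of that default pattern (the encoding of facts is invented; real non-monotonic formalisms such as circumscription are far more general):

# Default reasoning sketch: conclude "can fly" by default for a bird,
# withdraw the conclusion when contrary evidence (it is a penguin) is added.

def can_fly(facts):
    if "bird" not in facts:
        return None                      # no opinion
    return "penguin" not in facts        # default: birds fly, unless known penguin

print(can_fly({"bird"}))                 # True  (default conclusion)
print(can_fly({"bird", "penguin"}))      # False (conclusion withdrawn)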

common sense knowledge and reasoning
This is the area in which AI is farthest from human-level, in spite of the fact that it has been an active research area since the 1950s. While there has been considerable progress, e.g. in developing systems of non-monotonic reasoning and theories of action, yet more new ideas are needed. The Cyc system contains a large but spotty collection of common sense facts.

learning from experience
Programs do that. The approaches to AI based on connectionism and neural nets specialize in that. There is also learning of laws expressed in logic. [Mit97] is a comprehensive undergraduate text on machine learning. Programs can only learn what facts or behaviors their formalisms can represent, and unfortunately learning systems are almost all based on very limited abilities to represent information.

planning
Planning programs start with general facts about the world (especially facts about the effects of actions), facts about the particular situation and a statement of a goal. From these, they generate a strategy for achieving the goal. In the most common cases, the strategy is just a sequence of actions.
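
A minimal forward-search planner in that spirit; the door-and-key actions, the state encoding as sets of facts, and breadth-first search are all assumptions of this sketch:

# Tiny forward-search planner: breadth-first search from the initial state,
# applying action effects, until a state satisfying the goal is reached.
from collections import deque

# Each action: name -> (preconditions, additions, deletions), all sets of facts.
ACTIONS = {
    "pick_up_key":  ({"at_door", "key_on_floor"}, {"has_key"}, {"key_on_floor"}),
    "unlock_door":  ({"at_door", "has_key"},      {"door_open"}, set()),
    "walk_to_door": ({"at_start"},                {"at_door"}, {"at_start"}),
}

def plan(initial, goal):
    frontier = deque([(frozenset(initial), [])])
    seen = {frozenset(initial)}
    while frontier:
        state, actions_so_far = frontier.popleft()
        if goal <= state:
            return actions_so_far                      # the strategy: a sequence of actions
        for name, (pre, add, delete) in ACTIONS.items():
            if pre <= state:
                nxt = frozenset((state - delete) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, actions_so_far + [name]))
    return None

print(plan({"at_start", "key_on_floor"}, {"door_open"}))
# -> ['walk_to_door', 'pick_up_key', 'unlock_door']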

epistemology
This is a study of the kinds of knowledge that are required for solving problems in the world.

ontology
Ontology is the study of the kinds of things that exist. In AI, the programs and sentences deal with various kinds of objects, and we study what these kinds are and what their basic properties are. Emphasis on ontology begins in the 1990s.

heuristics
A heuristic is a way of trying to discover something or an idea embedded in a program. The term is used variously in AI. Heuristic functions are used in some approaches to search to measure how far a node in a search tree seems to be from a goal. Heuristic predicates that compare two nodes in a search tree to see if one is better than the other, i.e. constitutes an advance toward the goal, may be more useful. [My opinion].
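
A small concrete example of a heuristic function in that sense, estimating how far a node on a grid is from the goal (the grid setting is this sketch's assumption):

# Heuristic function for grid search: Manhattan distance from a node to the goal.
# It never overestimates the true cost on a 4-connected grid, so A* stays optimal.

def manhattan(node, goal):
    (x1, y1), (x2, y2) = node, goal
    return abs(x1 - x2) + abs(y1 - y2)

# A heuristic "predicate" comparing two nodes, as described above:
def closer_to_goal(node_a, node_b, goal):
    return manhattan(node_a, goal) < manhattan(node_b, goal)

print(manhattan((0, 0), (3, 4)), closer_to_goal((1, 1), (0, 0), (3, 4)))   # 7 True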

genetic programming
Genetic programming is a technique for getting programs to solve a task by mating random Lisp programs and selecting the fittest over millions of generations. It is being developed by John Koza's group.

Applications of AI

game playing
You can buy machines that can play master level chess for a few hundred dollars. There is some AI in them, but they play well against people mainly through brute force computation--looking at hundreds of thousands of positions. To beat a world champion by brute force and known reliable heuristics requires being able to look at 200 million positions per second.

speech recognition
In the 1990s, computer speech recognition reached a practical level for limited purposes. Thus United Airlines has replaced its keyboard tree for flight information by a system using speech recognition of flight numbers and city names. It is quite convenient. On the other hand, while it is possible to instruct some computers using speech, most users have gone back to the keyboard and the mouse as still more convenient.

understanding natural language
Just getting a sequence of words into a computer is not enough. Parsing sentences is not enough either. The computer has to be provided with an understanding of the domain the text is about, and this is presently possible only for very limited domains.

computer vision
The world is composed of three-dimensional objects, but the inputs to the human eye and computers' TV cameras are two dimensional. Some useful programs can work solely in two dimensions, but full computer vision requires partial three-dimensional information that is not just a set of two-dimensional views. At present there are only limited ways of representing three-dimensional information directly, and they are not as good as what humans evidently use.

expert systems
A "knowledge engineer" interviews experts in a certain domain and tries to embody their knowledge in a computer program for carrying out some task. How well this works depends on whether the intellectual mechanisms required for the task are within the present state of AI. When this turned out not to be so, there were many disappointing results. One of the first expert systems was MYCIN in 1974, which diagnosed bacterial infections of the blood and suggested treatments. It did better than medical students or practicing doctors, provided its limitations were observed. Namely, its ontology included bacteria, symptoms, and treatments and did not include patients, doctors, hospitals, death, recovery, and events occurring in time. Its interactions depended on a single patient being considered. Since the experts consulted by the knowledge engineers knew about patients, doctors, death, recovery, etc., it is clear that the knowledge engineers forced what the experts told them into a predetermined framework. In the present state of AI, this has to be true. The usefulness of current expert systems depends on their users having common sense.

heuristic classification
One of the most feasible kinds of expert system given the present knowledge of AI is to put some information in one of a fixed set of categories using several sources of information. An example is advising whether to accept a proposed credit card purchase. Information is available about the owner of the credit card, his record of payment and also about the item he is buying and about the establishment from which he is buying it (e.g., about whether there have been previous credit card frauds at this establishment).
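
A toy version of such a classifier; the fields, thresholds and categories are all invented for illustration:

# Heuristic classification sketch: combine several sources of information
# to put a proposed credit-card purchase into one of a fixed set of categories.

def classify_purchase(card_holder, purchase, establishment):
    if establishment["previous_frauds"] > 0:
        return "refer to human reviewer"
    if card_holder["missed_payments"] > 2:
        return "decline"
    if purchase["amount"] > card_holder["credit_limit"]:
        return "decline"
    return "accept"

decision = classify_purchase(
    card_holder={"missed_payments": 0, "credit_limit": 2000},
    purchase={"amount": 150, "item": "headphones"},
    establishment={"previous_frauds": 0},
)
print(decision)    # "accept"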

Introduction

The modern definition of artificial intelligence (or AI) is "the study and design of intelligent agents", where an intelligent agent is a system that perceives its environment and takes actions which maximize its chances of success. John McCarthy, who coined the term in 1956, defines it as "the science and engineering of making intelligent machines." Other names for the field have been proposed, such as computational intelligence, synthetic intelligence or computational rationality.

The term artificial intelligence is also used to describe a property of machines or programs: the intelligence that the system demonstrates. Among the traits that researchers hope machines will exhibit are reasoning, knowledge, planning, learning, communication, perception and the ability to move and manipulate objects.
It is the science and engineering of making intelligent machines, especially intelligent computer programs. It is related to the similar task of using computers to understand human intelligence, but AI does not have to confine itself to methods that are biologically observable. Intelligence is the computational part of the ability to achieve goals in the world. Varying kinds and degrees of intelligence occur in people, many animals and some machines.