Yes, but what *is* Artificial Intelligence?
12 October 2020
In most cases, not actually much beyond a function fit.
A simple example
Let us take a look at one of the simplest examples - an example so simple that you might feel fooled. Imagine you have a list of twenty apartments, containing the price and size of each apartment. You decide to create a two-dimensional diagram with axes labeled "price" and "size", respectively. Within this diagram, you mark one point for every apartment on the list, with coordinates given by its price and size. You might observe that the price generally increases with size. Now you draw a line through your diagram in an optimal way. Optimal might mean here that the distances between the data points and the line are as small as possible on average. Your newly drawn line gives you a prediction for the price of any apartment based on its size. You can tell a computer to find that optimal line based on your list of apartments. A suitable algorithm takes some initial line and evolves it step by step towards the optimal line. This process of convergence constitutes the essence of machine learning, and machine learning is how AI is approached these days.
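As a minimal sketch of that step-by-step process - with made-up apartment numbers, not real data - fitting a straight line by gradient descent on the average squared distance could look like this:

```python
# Minimal sketch: fit a straight line  price = a * size + b  to made-up
# apartment data by gradient descent on the mean squared error.
sizes  = [30, 45, 60, 75, 90, 110]        # m², hypothetical
prices = [115, 170, 225, 270, 330, 405]   # in thousands, hypothetical

a, b = 0.0, 0.0          # the initial line
learning_rate = 1e-4
n = len(sizes)

for step in range(200_000):
    # gradients of the average squared distance with respect to a and b
    grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(sizes, prices)) / n
    grad_b = sum(2 * (a * x + b - y)     for x, y in zip(sizes, prices)) / n
    a -= learning_rate * grad_a          # evolve the line step by step
    b -= learning_rate * grad_b

print(f"learned line: price = {a:.2f} * size + {b:.2f}")
print(f"predicted price for 100 m²: {a * 100 + b:.0f}")
```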
A more realistic fit
A straight line is a strong restriction on the function fit. A strong restriction makes learning easier, but a straight line can only capture linear dependencies, and in most cases that is just too restrictive: it underfits the more interesting dependencies. What the computer draws does not have to be a line; it could also be a curve. The meaning of optimal then needs to be reassessed. With a curve, it is possible to connect all the points directly, rendering the average distance between data and curve zero. Although the average distance is zero, this curve is not necessarily optimal. For example, your list of apartments might contain one highly unusual sample, leading to correspondingly unusual predictions for the prices of other apartments of the same size. This is called overfitting. Note that clustering similar apartments with what is called an unsupervised algorithm also fits within this picture. Striking the right balance between underfitting and overfitting is an engineering question, as is the choice between supervised and unsupervised learning.
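A hedged sketch of that contrast, again with made-up numbers and one deliberately unusual apartment: a straight line may underfit, while a polynomial that passes through every point overfits the outlier.

```python
# Sketch of underfitting vs. overfitting; the data is made up, only the
# qualitative behaviour matters.
import numpy as np
from numpy.polynomial import Polynomial

sizes  = np.array([30.0, 45.0, 60.0, 75.0, 90.0, 110.0])
prices = np.array([115.0, 170.0, 225.0, 270.0, 500.0, 405.0])  # unusual sample at 90 m²

line   = Polynomial.fit(sizes, prices, deg=1)               # straight line
wiggly = Polynomial.fit(sizes, prices, deg=len(sizes) - 1)  # passes through every point

for size in (85.0, 95.0):
    # the line changes smoothly near the outlier, the wiggly curve swings with it
    print(size, round(line(size)), round(wiggly(size)))
```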
More realistic scenarios
Expecting the apartment price to depend only on its size is way too naive. What about its location, architecture, facilities, nearby infrastructure? Each apartment has a long list of properties, and many of them act as features for the price prediction. If we repeat the above process, every one of these features requires its own axis. Needless to say, we are unable to faithfully draw anything beyond two dimensions, or to imagine anything beyond three. But we might need a hundred dimensions. Predicting the price from size and quality level requires us to draw a surface instead of a curve. Taking into account more than two features would require us to draw the higher-dimensional analog of a surface, if we could; we can't, and these analogs are called hypersurfaces. The trouble humans have with imagining a high number of dimensions does not exist for computers - a computer is pragmatic and treats a further dimension merely as one further element in a list of numbers. There is one big problem, however.
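To a computer this really is unspectacular: each apartment becomes a list of numbers, each feature gets a weight, and a (linear) prediction is a single dot product. The feature names, values and weights below are purely hypothetical.

```python
# Sketch: with many features, the fitted "line" becomes a weight vector and a
# prediction is a dot product -- one more dimension is one more list entry.
import numpy as np

features = np.array([78.0,   # size in m²
                     4.0,    # quality level on some made-up scale
                     2.5,    # distance to the city centre in km
                     1.0])   # 1.0 if there is a balcony, else 0.0
weights  = np.array([3.6, 15.0, -8.0, 12.0])  # illustrative learned coefficients
bias = 5.0

predicted_price = features @ weights + bias
print(predicted_price)
```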
What is the algorithm for optimizing on a high-dimensional feature space?
Computation time generally grows quickly - often exponentially - with the number of dimensions. Each pixel of an image in a computer-vision problem could constitute its own dimension in feature space. The number of words in the vocabulary of a natural-language-processing problem would map to the number of dimensions in feature space. Hence, efficient algorithms are needed. In an effort to reduce computational complexity, physicists create simplified models that need to be just complex enough to capture the relevant effects. Should it later turn out that not all relevant effects are captured, a new and more complex model is needed. (Note that a more complex model can *look* simpler by the style of its formulation, reflecting a sense of logical completeness up to a certain degree.) Ranging from linear and logistic regression to very deep neural networks, the space of machine-learning models is vast. The properties of all these different models are discussed elsewhere. In essence though, the idea is always the same. Give the model just enough structure to be able to learn what it should. Give it eyes if you want it to see, and give it ears if you want it to hear. But don't give it more structure than necessary, otherwise you will stray from the fine line between conceptual progress and present-day computational feasibility.
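As one hedged illustration of "just enough structure", here is a deliberately tiny neural network in plain numpy, with arbitrarily chosen sizes: a single hidden layer already turns the flat hypersurface of a linear model into a curved one.

```python
# Hedged sketch: the structure of a small neural network is little more than a
# couple of weight matrices and a nonlinearity; all sizes here are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden = 100, 16                      # a 100-dimensional feature space
W1 = 0.1 * rng.normal(size=(n_hidden, n_features))
b1 = np.zeros(n_hidden)
W2 = 0.1 * rng.normal(size=(1, n_hidden))
b2 = np.zeros(1)

def predict(x):
    """One hidden layer: a curved hypersurface over the feature space."""
    hidden = np.tanh(W1 @ x + b1)   # the nonlinearity bends the fit
    return W2 @ hidden + b2         # single output, e.g. a predicted price

x = rng.normal(size=n_features)     # one made-up sample as a list of 100 numbers
print(predict(x))
```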