There is a vocabulary of keywords (eg begin, end, if ...), together with rules for forming valid identifiers of the programmer's choice (eg identifiers must begin with a letter...).
There is a syntax which states the order in which elements of the vocabulary can combine. For instance this is correct Pascal:
repeat Count := Count + 1 until Count = 10;
but the following, which uses the same vocabulary, is not syntactically correct:
repeat until Count = 10 Count := Count + 1;
There is a semantics. That is to say, certain combinations of the vocabulary in a certain order have a recognized meaning:
if Count > 10 then
if Count < 20 then write('Count is less than 20')
else write('Count is less than 10')
This is a bit of an unfair example, because the meaning is not immediately apparent to the reader, but a correctly written Pascal compiler or interpreter will have no difficulty in assigning the "standard" meaning to this statement, and proving it has done so by performing the correct actions.
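To make the "standard" meaning concrete, here is a minimal sketch of the same statement in Python, where the indentation has to spell out what Pascal decides by rule: the else pairs with the nearest if that does not yet have one. The function name nested_if_message is invented for this illustration, and the function returns the message rather than writing it so that the result is easy to inspect.

```python
def nested_if_message(count):
    """Mirror the Pascal fragment: the else pairs with the nearest if."""
    if count > 10:
        if count < 20:
            return "Count is less than 20"
        else:
            # Reached only when count is 20 or more, whatever the message says.
            return "Count is less than 10"
    # count is 10 or less: the Pascal statement writes nothing at all.
    return None


for value in (5, 15, 25):
    print(value, nested_if_message(value))
```

For a Count of 25 the second message is produced, and for a Count of 5 nothing is produced. That is probably not what a casual reader of the original layout would expect, which is why the example is unfair to the reader but not to the compiler.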
This doesn't directly answer the question of what natural language is, but it gives us something to contrast it against. Natural language is the language we write and speak in everyday social interaction. There are of course many varieties of natural language (Welsh, Cornish, Celtic, Gaelic, Manx and English are all used in Great Britain and the North of Ireland). It is quite possible to argue that the spoken and the written forms of a language are different and may be largely independent. (If you don't believe this last point, try transcribing a conversation between several people and seeing if it is similar to written language. Radio plays are notoriously dissimilar to real-life drama.)
The claim of many who study natural language is that there are systems of vocabulary, syntax and semantics which can be observed (or similarly discovered) and recorded. Those working in NLP would also claim (or at least hope) that it is possible to "automate" these descriptions to produce useful systems based on them.
If you consider your natural language for a short while, you will be able to think of some elementary rules to describe it: for instance that its words are made up of upper and lower case letters and are bounded by space and punctuation (except in speech, where there seems to be no gap between words). You could go on to make rules for syntax, such as "the" always precedes the name of an object or concept (and you may be able to define "object" and "concept"), and you may well go on to suggest some rules of semantics, such as "her" refers to the female human last mentioned in the text (unless of course the text is about non-human animals and/or boats). Well, you should be convinced by now that to create a good description of your language will take more than a few moments and will be less than easy.
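A few lines of code are enough to try out two of these elementary rules and watch them strain. The sketch below is only an illustration: the function names tokenize and words_after_the are invented here, and the rule "a word is an unbroken run of letters" is the naive one described above, not a recommended tokenizer.

```python
import re


def tokenize(text):
    """Naive rule: a word is a run of letters bounded by anything else."""
    return re.findall(r"[A-Za-z]+", text)


def words_after_the(text):
    """Collect the word that directly follows each occurrence of 'the'."""
    words = tokenize(text)
    return [words[i + 1]
            for i, word in enumerate(words[:-1])
            if word.lower() == "the"]


print(tokenize("The cat didn't sit."))
print(words_after_the("The old man saw the boat."))
```

The first call splits "didn't" into "didn" and "t", so the word rule already needs an exception for apostrophes. The second call returns "old" as well as "boat", so "the" does not always directly precede the name of an object or concept.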
You may consider the whole thing too difficult to be feasible. Don't worry, because there are many NLP systems that aim to use only parts of a natural language, for instance only the kinds of text found in aircraft maintenance manuals. The person who designs the system is creating an artificial language. This new language might be a lot more complicated than Pascal, but it is far less complicated than a full natural language.
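As a rough sketch of what restricting the language can mean in practice, the fragment below checks a sentence against a small fixed vocabulary. The word list is hypothetical and invented for this example (it is not taken from any real maintenance manual), as is the function name out_of_vocabulary. A real restricted language would also constrain syntax and meaning, not just the words.

```python
import re

# Hypothetical vocabulary, invented for illustration only.
ALLOWED_WORDS = {"remove", "the", "panel", "and", "inspect", "replace", "bolt"}


def out_of_vocabulary(sentence, allowed=ALLOWED_WORDS):
    """Return the words of the sentence that fall outside the allowed set."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [word for word in words if word not in allowed]


print(out_of_vocabulary("Remove the panel and inspect the bolt."))
print(out_of_vocabulary("Gently remove the panel."))
```

The first sentence stays inside the vocabulary, so the list is empty. The second is flagged because of "gently". A system built on such a restricted language can reject or query anything it was not designed to handle, instead of having to understand everything.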
The commonly held view among linguists is that a person should speak a version of the language suitable to their background and social position. Therefore no one version of a language can be said to be superior to another. There are several kinds of linguistics, and these are briefly outlined below.
the human condition
la condition humaine