Short review: the most beautiful book I have read in the past six months. The various models of computation and the equivalence of certain universal models, the chapters on semantic analysis and syntactic analysis, the type-checking system in the final chapter (which feels like writing a small compiler or interpreter), and the concrete treatment of the halting problem are all excellent. Best of all, every argument is backed by Ruby code, and in principle most of the examples can actually be run. Ruby's flexibility (for example, treating code as data, which is awkward in some languages) makes several of the chapters' implementations much easier; in Chapter 8 the author even, with a completely straight face, uses an `evaluate` function to write a halting-problem checker and then tries to settle the Goldbach conjecture with it.
Notes:
Notes on Understanding Computation
============================
## Just Enough Ruby
# Programs and Machines
## The Meaning of Programs
- semantics
- expressions vs. statements
- expressions: return a result, no side effects
- statements: may change the environment, i.e. have side effects
- operational semantics vs. denotational semantics
- operational semantics: define the meaning of a program by defining rules of how the program executes
- small-step (reduce the syntax in small steps, making the program closer to the final result step by step) vs. big-step (specify how to get an expression or statement straight to its result)
- denotational semantics: the meaning of a program is defined by translating it into another language (in the book the author translates everything into Ruby code, in the form of Ruby `proc`s, and executes it directly in Ruby; it can be considered "compiled to Ruby")
- example: `while` in SIMPLE language
- small-step semantics
```Ruby
class While < Struct.new(:condition, :body)
  def to_s
    "while (#{condition}) { #{body} }"
  end

  def inspect
    "«#{self}»"
  end

  def reducible?
    true
  end

  def reduce(environment)
    [If.new(condition, Sequence.new(body, self), DoNothing.new), environment]
  end
end
```
- big-step semantics
```Ruby
class While
  def evaluate(environment)
    case condition.evaluate(environment)
    when Boolean.new(true)
      evaluate(body.evaluate(environment))
    when Boolean.new(false)
      environment
    end
  end
end
```
- denotational semantics
```Ruby
class While
  def to_ruby
    "-> e {" +
      " while (#{condition.to_ruby}).call(e); e = (#{body.to_ruby}).call(e); end;" +
      " e" +
    " }" # where e is the environment
  end
end
```
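- to see the denotational pieces compose end to end, here is a self-contained sketch with minimal stand-in classes (`Number`, `Variable`, `Add`, `LessThan`, `Assign` are simplified stand-ins for SIMPLE's AST, not the book's exact code):
```Ruby
class Number < Struct.new(:value)
  def to_ruby
    "-> e { #{value.inspect} }"
  end
end

class Variable < Struct.new(:name)
  def to_ruby
    "-> e { e[#{name.inspect}] }"
  end
end

class Add < Struct.new(:left, :right)
  def to_ruby
    "-> e { (#{left.to_ruby}).call(e) + (#{right.to_ruby}).call(e) }"
  end
end

class LessThan < Struct.new(:left, :right)
  def to_ruby
    "-> e { (#{left.to_ruby}).call(e) < (#{right.to_ruby}).call(e) }"
  end
end

class Assign < Struct.new(:name, :expression)
  def to_ruby
    "-> e { e.merge(#{name.inspect} => (#{expression.to_ruby}).call(e)) }"
  end
end

class While < Struct.new(:condition, :body)
  def to_ruby
    "-> e { while (#{condition.to_ruby}).call(e); e = (#{body.to_ruby}).call(e); end; e }"
  end
end

# while (x < 5) { x = x + 3 }, starting from x = 1
PROGRAM = While.new(
  LessThan.new(Variable.new(:x), Number.new(5)),
  Assign.new(:x, Add.new(Variable.new(:x), Number.new(3)))
)
eval(PROGRAM.to_ruby).call({ x: 1 }) # => { x: 7 }
```
note how each `to_ruby` produces a string of Ruby source, so the "compiled" program is just one big nested lambda that we hand to `eval`.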
## The Simplest Computers
- DFA, NFA, regular expressions
- see the notes for "Compilers". One thing to note: adding nondeterminism does not make finite automata any more powerful (every NFA can be converted to an equivalent DFA), so DFAs/NFAs have relatively limited power: they can only accept or reject certain kinds of character sequences, and we will need more powerful machines in the following chapters
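- as a concrete reminder of how little machinery a DFA needs, here is a hand-rolled sketch (deliberately simpler than the book's rulebook classes): a three-state DFA accepting strings over a/b that contain the substring "ab"
```Ruby
# transition table: [state, character] => next state
# state 1: haven't seen a useful 'a' yet; state 2: just saw 'a'; state 3: saw "ab"
DFA_RULES = {
  [1, 'a'] => 2, [1, 'b'] => 1,
  [2, 'a'] => 2, [2, 'b'] => 3,
  [3, 'a'] => 3, [3, 'b'] => 3
}

def dfa_accepts?(string, rules, start_state, accept_states)
  # deterministically follow exactly one transition per input character
  state = string.chars.reduce(start_state) { |s, c| rules.fetch([s, c]) }
  accept_states.include?(state)
end

dfa_accepts?('aab', DFA_RULES, 1, [3]) # => true
dfa_accepts?('ba',  DFA_RULES, 1, [3]) # => false
```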
## Just Add Power
- pushdown automata
- previously we saw that DFAs/NFAs have limited power; for instance, they cannot determine whether an arbitrarily long sequence of brackets is balanced. This problem can be solved simply by **using a stack as the memory of the machine**, and it leads to **pushdown automata (PDA)**.
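- the core idea, a stack as memory, can be sketched in plain Ruby before building any PDA machinery (this shows the mechanism only, not a PDA implementation):
```Ruby
# push on '(' and pop on ')'; the string is balanced iff no pop ever
# underflows and the stack ends up empty
def balanced?(string)
  stack = []
  string.each_char do |c|
    case c
    when '(' then stack.push(c)
    when ')' then return false if stack.pop.nil?
    end
  end
  stack.empty?
end

balanced?('(()(()))') # => true
balanced?('(()')      # => false
```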
- palindromes and nondeterministic PDA
- **recognizing palindromes** cannot be done by a deterministic PDA (unless we use a marker to indicate the halfway point), but it can easily be done with an NPDA, which takes a free move at the halfway point.
![](figures/1.png)
- nonequivalence
- unlike NFAs/DFAs, where we can combine multiple NFA states into one DFA state and convert an NFA into an equivalent DFA, there is no way to combine the multiple stacks explored by an NPDA into the single stack of a DPDA. So not every NPDA can be converted to a DPDA
- syntactic analysis with PDA
- description of syntax using context-free grammar
- conversion from CFG to PDA
- basic idea:
> the symbol rules repeatedly expand the symbol on the top of the stack until it gets replaced by a token, then the token rules consume the stack (and the input) until they hit a symbol
> How does the PDA know which rule to choose at each step of execution? Well, that’s the power of nondeterminism: our NPDA simulation tries all possible rules, so as long as there’s some way of getting to an empty stack, we’ll find it.
- limitations
- the way the stack is used limits the power of a PDA
- we can only read the symbol at the top of the stack, and once it is popped it is lost forever, so a PDA cannot recognize strings with equal numbers of three different characters (e.g. 'aabbcc')
- data in the stack can only be accessed in reverse (last-in, first-out) order
- a PDA has no good output mechanism
## The Ultimate Machine
- deterministic Turing machine
- definition: a deterministic Turing machine is a finite state machine with access to an infinitely long tape
- examples: https://en.wikipedia.org/wiki/Turing_machine_examples
- nondeterministic Turing machine
- previously we saw that DFAs are as powerful as NFAs, but NPDAs are more powerful than DPDAs, since multiple stacks cannot be simulated with a single stack. What about DTMs vs. NTMs?
- it turns out that a DTM can simulate any NTM by storing multiple configurations (each configuration = current state + tape contents) on a single tape. One implementation works as follows (note: we can use internal states for reading configurations, applying rules to configurations, and erasing configurations, to distinguish the different phases):
> Each step of the simulated computation is performed by reading the configuration at the front of the queue, finding each rule that applies to it, and using that rule to generate a new configuration that is written onto the tape at the back of the queue. Once this has been done for every applicable rule, the frontmost configuration is erased, and the process starts again with the next configuration in the queue. The simulated machine step is repeated until the configuration at the front of the queue represents a machine that has reached an accept state. This technique allows a deterministic Turing machine to explore all possible configurations of a simulated machine in breadth-first order.
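- the quoted breadth-first technique can be sketched in Ruby if we cheat and keep the queue of configurations as a Ruby array instead of encoding it on a tape (the names and rule format here are assumptions for illustration, not the book's classes):
```Ruby
# configurations are [state, tape, head]; rules map [state, symbol] to a list
# of possible [next_state, character_to_write, direction] moves
def ntm_accepts?(rules, accept_states, start_config)
  queue = [start_config] # FIFO queue gives breadth-first exploration
  until queue.empty?
    state, tape, head = queue.shift
    return true if accept_states.include?(state)
    moves = rules.fetch([state, tape[head] || '_'], [])
    moves.each do |next_state, write, direction|
      new_tape = tape.dup
      new_tape[head] = write
      queue.push([next_state, new_tape, head + (direction == :right ? 1 : -1)])
    end
  end
  false # every branch got stuck without reaching an accept state
end
# caveats of this sketch: the tape only grows to the right, and the search
# loops forever if no accepting configuration is reachable but some branch
# runs forever (just as the tape-encoded simulation would)

# toy NTM: in state 1, reading 'a' nondeterministically either stays in
# state 1 or jumps to the accepting state 2, moving right either way
RULES = { [1, 'a'] => [[1, 'a', :right], [2, 'a', :right]] }
ntm_accepts?(RULES, [2], [1, ['a', 'a'], 0]) # => true
```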
- maximum power
- DTM has maximum power, any attempt to upgrade the specification of Turing machines fails, following are some examples of extending DTM
- internal storage
- we may want a few internal registers to store temporary information, instead of storing everything on the tape and moving back and forth to read and write it. This can be simulated with extra "current states": for each possible piece of information, we associate a specific internal state. For instance, to copy the first character of a string over "a", "b", and "c" to the end, we can use three different states to remember whether the first character was "a", "b", or "c":
![](figures/2.png)
- subroutines
- this can be simulated by appending multiple DTMs
- multiple tapes
- interleave the information of multiple tapes on one tape,
- e.g. `ab(c), (d)ef, g(h)i` => `a_dXg_b_e_hXcXf_i_` ("X" marks the positions of three tape heads)
- multidimensional tape
- each multidimensional tape can be simulated using multiple tapes, so it could be further simulated using one tape
- general-purpose machines
- all the machines mentioned above are hard-coded, i.e. each machine can only do one specific task. A universal Turing machine (UTM) is a **programmable** Turing machine, whose tape is used to store **both instructions and data**. A UTM can be used to simulate any DTM.
# Computation and Computability
## Programming with Nothing
- Impersonating the Lambda Calculus
- in this part we are going to use minimal features in Ruby to simulate lambda calculus, the features we use are
- referring to variables
- creating procs (without loss of generality, we use only single-argument procs, since every proc that takes multiple arguments can be converted to single-argument procs via currying)
- calling procs
- in this chapter, the author's goal is to write a FizzBuzz program using lambda calculus. To do that, we need to implement several features.
- implementation of features
- numbers
```Ruby
ZERO = -> p { -> x { x } }
ONE = -> p { -> x { p[x] } }
TWO = -> p { -> x { p[p[x]] } }
THREE = -> p { -> x { p[p[p[x]]] } }
```
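- a quick way to check these encodings is a `to_integer` helper (a Ruby-side helper, not part of the lambda-calculus world) that feeds a numeral a real increment function and a real zero:
```Ruby
ZERO  = -> p { -> x { x } }
ONE   = -> p { -> x { p[x] } }
TWO   = -> p { -> x { p[p[x]] } }
THREE = -> p { -> x { p[p[p[x]]] } }

# a numeral applies "p" to "x" n times, so counting with a real increment
# recovers the ordinary integer
def to_integer(n)
  n[-> i { i + 1 }][0]
end

to_integer(ZERO)  # => 0
to_integer(THREE) # => 3
```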
- Booleans and if
```Ruby
# defining the booleans as "pick the first/second argument" makes IF trivial
TRUE  = -> x { -> y { x } }
FALSE = -> x { -> y { y } }
IF    = -> b { -> x { -> y { b[x][y] } } }
# since the booleans already do the picking themselves, IF simplifies to
IF    = -> b { b }
```
- predicate: is_zero?
```Ruby
# ZERO is the only numeral that never calls "p", so pass a "p" that always
# returns FALSE, with TRUE as the initial "x"
IS_ZERO = -> n { n[-> x { FALSE }][TRUE] }
```
- pairs
```Ruby
PAIR = -> x { -> y { -> f { f[x][y] } } }
LEFT = -> p { p[-> x { -> y { x } } ] }
RIGHT = -> p { p[-> x { -> y { y } } ] }
```
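- a quick sanity check of the pair operations, using Ruby symbols as opaque payloads (something pure lambda calculus would not allow, but convenient for testing):
```Ruby
PAIR  = -> x { -> y { -> f { f[x][y] } } }
LEFT  = -> p { p[-> x { -> y { x } }] }
RIGHT = -> p { p[-> x { -> y { y } }] }

# a pair holds both values and hands them to whatever selector it is given
LEFT[PAIR[:hello][:world]]  # => :hello
RIGHT[PAIR[:hello][:world]] # => :world
```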
- Numeric Operations
```Ruby
INCREMENT = -> n { -> p { -> x { p[n[p][x]] } } }
# DECREMENT is trickier, since we cannot directly remove one application of "p".
# Instead we slide a pair (x, y) to (y, y + 1), so the first element always
# lags one increment behind the second.
SLIDE = -> p { PAIR[RIGHT[p]][INCREMENT[RIGHT[p]]] }
DECREMENT = -> n { LEFT[n[SLIDE][PAIR[ZERO][ZERO]]] } # take the lagging element
# the following operations fall straight out of the definition of the numerals
ADD = -> m { -> n { n[INCREMENT][m] } }
SUBTRACT = -> m { -> n { n[DECREMENT][m] } }
MULTIPLY = -> m { -> n { n[ADD[m]][ZERO] } }
POWER = -> m { -> n { n[MULTIPLY[m]][ONE] } }
# since ZERO is the smallest numeral, SUBTRACT bottoms out at ZERO,
# so m <= n exactly when SUBTRACT[m][n] is ZERO
IS_LESS_OR_EQUAL = -> m { -> n {
  IS_ZERO[SUBTRACT[m][n]]
} }
# incorrect version of MOD
MOD = -> m { -> n {
  IF[IS_LESS_OR_EQUAL[n][m]][
    MOD[SUBTRACT[m][n]][n]
  ][
    m
  ]
} }
# this version recurses forever: Ruby evaluates both branches before IF can
# choose between them, so MOD[SUBTRACT[m][n]][n] is always evaluated, even
# when IS_LESS_OR_EQUAL[n][m] is false.
# We need to delay the evaluation of MOD[SUBTRACT[m][n]][n]:
MOD = -> m { -> n {
  IF[IS_LESS_OR_EQUAL[n][m]][
    -> x { MOD[SUBTRACT[m][n]][n][x] }
  ][
    m
  ]
} }
# Why is MOD[SUBTRACT[m][n]][n] equivalent to -> x { MOD[SUBTRACT[m][n]][n][x] }?
# Everything in this system is a single-argument proc, so the only way to
# distinguish two expressions is to call them, and clearly
# MOD[SUBTRACT[m][n]][n].call(y) == (-> x { MOD[SUBTRACT[m][n]][n][x] }).call(y)
# There is still a problem: pure lambda calculus has no named procs, so MOD
# cannot refer to itself by name. Can recursion be defined without naming?
# Yes, by using the Y combinator
# (more information about the Y combinator: http://www.swarmagents.cn/swarma/detail.php?id=11966,
# and http://www.jianshu.com/p/64786c4396a7 for a good example).
# The basic idea is that for any lambda function g we have Y g = g(Y g),
# and this property is enough to implement anonymous recursive functions.
Y = -> f { -> x { f[x[x]] }[-> x { f[x[x]] }] }
# under Ruby's strict evaluation Y itself recurses forever, so we again delay
# evaluation and use the Z combinator instead:
Z = -> f { -> x { f[-> y { x[x][y] }] }[-> x { f[-> y { x[x][y] }] }] }
# now we can rewrite MOD as
MOD = Z[-> f { -> m { -> n {
  IF[IS_LESS_OR_EQUAL[n][m]][
    -> x { f[SUBTRACT[m][n]][n][x] }
  ][
    m
  ]
} } }]
```
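- collecting the definitions above into one self-contained snippet shows that the Z-combinator version of MOD really computes remainders (the `to_integer` helper is Ruby-side scaffolding for reading the result, not part of the lambda-calculus world):
```Ruby
ZERO  = -> p { -> x { x } }
TWO   = -> p { -> x { p[p[x]] } }
THREE = -> p { -> x { p[p[p[x]]] } }
TRUE  = -> x { -> y { x } }
FALSE = -> x { -> y { y } }
IF    = -> b { b }
IS_ZERO   = -> n { n[-> x { FALSE }][TRUE] }
PAIR      = -> x { -> y { -> f { f[x][y] } } }
LEFT      = -> p { p[-> x { -> y { x } }] }
RIGHT     = -> p { p[-> x { -> y { y } }] }
INCREMENT = -> n { -> p { -> x { p[n[p][x]] } } }
SLIDE     = -> p { PAIR[RIGHT[p]][INCREMENT[RIGHT[p]]] }
DECREMENT = -> n { LEFT[n[SLIDE][PAIR[ZERO][ZERO]]] }
SUBTRACT  = -> m { -> n { n[DECREMENT][m] } }
IS_LESS_OR_EQUAL = -> m { -> n { IS_ZERO[SUBTRACT[m][n]] } }
Z = -> f { -> x { f[-> y { x[x][y] }] }[-> x { f[-> y { x[x][y] }] }] }
MOD = Z[-> f { -> m { -> n {
  IF[IS_LESS_OR_EQUAL[n][m]][-> x { f[SUBTRACT[m][n]][n][x] }][m]
} } }]

# Ruby-side helper to read a numeral back as an integer
def to_integer(n)
  n[-> i { i + 1 }][0]
end

to_integer(MOD[THREE][TWO]) # => 1
```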
- lists
```Ruby
# lists are built out of pairs as follows
EMPTY = PAIR[TRUE][TRUE]
UNSHIFT = -> l { -> x {
  PAIR[FALSE][PAIR[x][l]]
} } # insert x at the front of list l
IS_EMPTY = LEFT
FIRST = -> l { LEFT[RIGHT[l]] }
REST = -> l { RIGHT[RIGHT[l]] }
# we may also define related operations, e.g. FOLD, MAP, REDUCE
```
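- these list operations can be checked with Ruby-side helpers that decode booleans and lists back into Ruby values (the payloads here are plain Ruby integers for convenience):
```Ruby
TRUE  = -> x { -> y { x } }
FALSE = -> x { -> y { y } }
PAIR  = -> x { -> y { -> f { f[x][y] } } }
LEFT  = -> p { p[-> x { -> y { x } }] }
RIGHT = -> p { p[-> x { -> y { y } }] }
EMPTY    = PAIR[TRUE][TRUE]
UNSHIFT  = -> l { -> x { PAIR[FALSE][PAIR[x][l]] } }
IS_EMPTY = LEFT
FIRST    = -> l { LEFT[RIGHT[l]] }
REST     = -> l { RIGHT[RIGHT[l]] }

# Ruby-side decoders, not part of the lambda-calculus world
def to_boolean(b)
  b[true][false]
end

def to_array(list)
  array = []
  until to_boolean(IS_EMPTY[list])
    array.push(FIRST[list])
    list = REST[list]
  end
  array
end

# build the list [1, 2, 3] by unshifting in reverse order
LIST = UNSHIFT[UNSHIFT[UNSHIFT[EMPTY][3]][2]][1]
to_array(LIST) # => [1, 2, 3]
```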
- strings
- a string is implemented as a list of characters
- using all the features implemented above, we can write a solution to the FizzBuzz problem in pure lambda calculus.
- advanced programming techniques
- previously we saw that it is possible to write programs in lambda calculus, now we introduce a few advanced features
- infinite streams
```Ruby
ZEROS = Z[-> f { UNSHIFT[f][ZERO] }]
UPWARDS_OF = Z[-> f { -> n { UNSHIFT[-> x { f[INCREMENT[n]][x] }][n] } }]
```
> Since the contents of a stream can be generated by any computation, there’s nothing to stop us creating an infinite list of the Fibonacci series, or the prime numbers, or all possible strings in alphabetical order, or anything else computable. This abstraction is a powerful one and doesn’t require any clever features on top of what we already have.
- avoiding arbitrary recursion
- implementing the lambda calculus
- in this part the author defines three classes, `LCVariable`, `LCFunction`, and `LCCall`, together with their reduction rules, to **actually implement** the lambda calculus instead of borrowing Ruby's procs directly
## Universality is Everywhere
- previously we showed that the universal Turing machine is powerful and, more importantly, actually **programmable** with instructions. In this chapter we study several other systems that are capable of simulating a Turing machine.
- lambda calculus
- simulating a Turing machine with lambda calculus
```Ruby
# first we simulate tape
TAPE = -> l { -> m { -> r { -> b { PAIR[PAIR[l][m]][PAIR[r][b]] } } } }
TAPE_LEFT = -> t { LEFT[LEFT[t]] }
TAPE_MIDDLE = -> t { RIGHT[LEFT[t]] }
TAPE_RIGHT = -> t { LEFT[RIGHT[t]] }
TAPE_BLANK = -> t { RIGHT[RIGHT[t]] }
# then we can define functions to move the tape head left/right
# and to read/write character with tape head
# by doing this, we successfully simulate a Turing machine with lambda calculus.
```
> It turns out that the reverse is also possible: a Turing machine can act as an interpreter for the lambda calculus by storing a representation of a lambda calculus expression on the tape and repeatedly updating it according to a set of reduction rules, just like the operational semantics from “Semantics” on page 199.
- partial recursive functions
- a system that simulates a Turing machine using the building blocks `#zero`, `#increment`, `#recurse`, and `#minimize`; see the book for details
- SKI Combinator Calculus
- the SKI calculus is an even simpler system than the lambda calculus: it has only two kinds of expression, calls and alphabetic symbols, and very simple reduction rules (notice that there is **no lambda-calculus-style variable replacement** going on here, just symbols being reordered, duplicated, and discarded according to the reduction rules):
- reduce `S[a][b][c]` to `a[c][b[c]]`, where a, b, and c can be any SKI calculus expressions
- reduce `K[a][b]` to `a`
- reduce `I[a]` to `a`
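- the reduction rules themselves can be tried out by cheating and encoding S, K, and I directly as Ruby procs (this sidesteps the symbolic rewriting that the book actually implements, but the behavior matches the rules):
```Ruby
S = -> a { -> b { -> c { a[c][b[c]] } } }
K = -> a { -> b { a } }
I = -> a { a }

# S[K][K] behaves like I: S[K][K][x] reduces to K[x][K[x]], i.e. x
S[K][K][42]     # => 42
# K keeps its first argument and discards the second
K[:kept][:gone] # => :kept
```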
- converting lambda calculus expression to SKI expression
- there are three components in lambda calculus
- variable: `SKISymbol.new(name)` (directly converted to SKI symbol)
- call: `SKICall.new(left.to_ski, right.to_ski)` (convert left/right parts and then create a SKI call)
- function: `body.to_ski.as_a_function_of(parameter)`
- this step requires `as_a_function_of()`, which is defined as
```Ruby
class SKISymbol
  def as_a_function_of(name)
    if self.name == name
      I
    else
      SKICall.new(K, self)
    end
  end
end

class SKICombinator
  def as_a_function_of(name)
    SKICall.new(K, self)
  end
end

class SKICall
  def as_a_function_of(name)
    left_function  = left.as_a_function_of(name)
    right_function = right.as_a_function_of(name)
    SKICall.new(SKICall.new(S, left_function), right_function)
  end
end
# as_a_function_of converts an SKI expression into a new one that turns back
# into the original when called with an argument.
# For example, S[K][I] is converted into S[S[K[S]][K[K]]][K[I]];
# calling S[S[K[S]][K[K]]][K[I]][x] reduces back to S[K][I].
# More interestingly, if the expression contains the parameter ("name"),
# the converted expression, when called with an argument, reduces to the
# original expression with that argument substituted for the symbol.
# For example, converting S[x][I] as a function of x gives S[S[K[S]][I]][K[I]];
# calling it with y reduces to S[y][I], i.e. x has been replaced by y.
```
- by doing the conversion, we show that SKI calculus is also universal
- Iota
- an even simpler calculus system, can be converted to SKI, therefore universal
- tag systems
- tag system
> A tag system is a model of computation that works like a simplified Turing machine: instead of moving a head back and forth over a tape, a tag system operates on a string by repeatedly adding new characters to the end of the string and removing them from the beginning.
- two parts of a tag system
- rules that specify some characters to append to the string when a particular character appears at the beginning
- a **deletion number**, which specifies how many characters to delete from the beginning each time
- to simulate a Turing machine with a tag system, we need some building blocks
- representation of numbers
- we represent the number n as `aa` followed by 2n `b`s
- why 2n `b`s instead of n? it makes the increment/double/halve/parity rules below, which all use deletion number 2, work out conveniently
- increment
- rules: a -> ccdd, b -> dd
- deletion number: 2
- double
- rules: a -> cc, b -> dddd
- deletion number: 2
- halve
- rules: a -> cc, b -> d
- deletion number: 2
- odd/even test
- rules: a -> cc, b -> d, c -> eo, d -> "", e -> e
- deletion number: 2
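- these rule sets are easy to check with a tiny simulator (a hedged sketch; `run_tag_system` is a name assumed here for illustration, not the book's class-based implementation). A tag system halts when no rule applies to the first character:
```Ruby
def run_tag_system(string, rules, deletion_number)
  while rules.key?(string[0])
    # append the characters from the first character's rule,
    # then delete deletion_number characters from the front
    string = (string[deletion_number..] || '') + rules.fetch(string[0])
  end
  string
end

# increment: 'aabb' encodes 1, and the result 'ccdddd' encodes 2 (cc + 4 ds)
run_tag_system('aabb', { 'a' => 'ccdd', 'b' => 'dd' }, 2) # => "ccdddd"
```
running the doubling rules `{ 'a' => 'cc', 'b' => 'dddd' }` on `'aabb'` likewise yields `'ccdddd'`, i.e. 2, as expected.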
- simulating a Turing machine
- use only 0 and 1 in the tape
- treat the left part of the tape (including the current position) as a number and the right part as a number written backward, and use two numbers in the tag system to represent them as "aabb...bbccdd...dd"
- simulate tape operations as
- read current character: odd/even test of left part
- move head right: double left, halve right, increment left if current character = 1
- write character: increment/decrement (decrement = halve + double) left, together with odd/even test
- therefore a tag system is also universal
- cyclic tag systems
- definition
- A cyclic tag system is a tag system that has been made even simpler by imposing some extra constraints:
- a string can contain only two characters, 0 and 1.
- a rule can only apply when the current string begins with 1, never 0
- the deletion number is always 1
- the current rule cycles repeatedly through the rulebook
- we simulate original tag systems in this way
- encode characters in 0/1: assume we use four characters a,b,c,d, we encode a as 1000, b as 0100, c as 0010, d as 0001
- convert the rules into 0/1 representations and order them alphabetically (a/b/c/d), so that the 1 in a character's four-digit encoding appears exactly in the position of that character's rule in the rulebook (for instance, a's encoding has its 1 in the first position, so a's rule must be the first rule in the rulebook)
- pad the cyclic tag system's rulebook with empty rules to simulate the original tag system's deletion number: for instance, if the deletion number is 2 and the encoding length is 4, we append 4 additional empty rules (8 rules in total)
- therefore a cyclic tag system is also universal
- Conway's game of life
- Rule 110
- Wolfram's 2,3 Turing machine
## Impossible Programs
- previously we saw that many attempts to improve the functionality of Turing machines failed, and we may wonder:
> How much further can we push this progression of increasingly powerful systems? Perhaps not indefinitely: our attempts to make Turing machines more powerful by adding features didn’t get us anywhere, which suggests there may be a hard limit on computational power. So what are computers and programming languages fundamentally capable of, and is there anything that they can’t do? Are there any impossible programs?
- universal systems can loop forever
> any system that’s powerful enough to be universal will inevitably allow us to construct computations that loop forever without halting.
- following is one way to construct such a program
```Ruby
# first we define a function that takes a program's source code and an input
# string, and returns the program's output
require 'stringio'

def evaluate(program, input)
  old_stdin, old_stdout = $stdin, $stdout
  $stdin, $stdout = StringIO.new(input), (output = StringIO.new)
  begin
    eval program # Ruby's eval makes this very easy
  rescue Exception => e
    output.puts(e)
  ensure
    $stdin, $stdout = old_stdin, old_stdout
  end
  output.string
end

# then we feed a program its own source code as input
def evaluate_on_itself(program)
  evaluate(program, program)
end

# now put the following in a file named "does_it_say_no.rb"
program = $stdin.read
if evaluate_on_itself(program) == 'no'
  print 'yes'
else
  print 'no'
end
# running "ruby does_it_say_no.rb < does_it_say_no.rb" can never halt with an
# answer: to decide what to print, the program has to evaluate itself, which
# in turn evaluates itself, and so on forever
```
- the example above relies on the assumption that a program can read its own source code, but this assumption is not essential; see the book for more information
- decidability and halting problem
- def: a decision problem is any question with a yes-or-no answer. A decision problem is **decidable (or computable)** if there is an algorithm that is guaranteed to solve it in a finite amount of time for any possible input
- the halting problem is one of the most famous undecidable problems
- what if the halting problem were decidable? Then a number of hard problems could be attacked with it; for instance, we could construct the following program to search for a counterexample to the Goldbach conjecture:
```Ruby
require 'prime'

def primes_less_than(n)
  Prime.each(n - 1).entries
end

def sum_of_two_primes?(n)
  primes = primes_less_than(n)
  primes.any? { |a| primes.any? { |b| a + b == n } }
end

n = 4
while sum_of_two_primes?(n)
  n = n + 2
end
print n
# this program halts if and only if the Goldbach conjecture is false, so a
# working halting checker would settle the conjecture for us
```
- proof of undecidability of the halting problem
```Ruby
# suppose a method "halts?(program, input)" exists that takes a program and
# its input, and returns whether the program eventually stops.
# Then, using the same self-referential trick:
def halts_on_itself?(program)
  halts?(program, program)
end

program = $stdin.read
if halts_on_itself?(program)
  while true
    # do nothing
  end
end
# feed this program its own source: it loops forever exactly when halts?
# predicts that it halts, and halts exactly when halts? predicts that it
# loops. Either way halts? gives the wrong answer, so halts? cannot exist
# and the halting problem is undecidable
```
- other undecidable problems
- the "hello world" problem: decide whether a program will print "hello world"
- it turns out this is also undecidable; otherwise we could construct a function that solves the halting problem like this:
```Ruby
def halts?(program, input)
  hello_world_program = %Q{
    program = #{program.inspect}
    input = $stdin.read
    evaluate(program, input) # run the program on the input, ignoring its output
    print 'hello world'
  }
  prints_hello_world?(hello_world_program, input)
end
```
- Rice's theorem
> any nontrivial property of program behavior is undecidable, because the halting problem can always be reduced to the problem of deciding whether that property is true
> The implications of Rice’s theorem are depressing too: not only is the question “does this program halt?” undecidable, but so is “does this program do what I want it to do?” We live in a universe where there’s no way to build a machine that can accurately predict whether a program will print hello world, calculate a particular mathematical function, or make a particular operating system call, and that’s just the way it is.
- coping with uncomputability
## Programming in Toyland
- this chapter is mainly about the power of abstraction, illustrated by constructing a type checker. See the compilers notes for more information.