일  월  화  수  목  금  토 

1  2  3  4  5  6  
7  8  9  10  11  12  13 
14  15  16  17  18  19  20 
21  22  23  24  25  26  27 
28  29  30  31 
 시스템
 Multi Task
 문서 유사도
 JMT Model
 Logistic Regression
 vector space model
 luckyzzang
 sis
 하드콘
 CLASS
 linear regression
 Webpack
 POS tagging
 Relatedness
 꼬리 재귀
 Textual Entailment
 Softmax Regression
 BOF
 system hacking
 Dependency parsing
 Mechanize 로그인
 Tail Recurion
 ReactJS
 gevent 사용
 React
 mprotect
 phpfpm
 Least Squared Method
 deep learning
 hdcon
 Today
 23
 Total
 108,238
Yumere
Data Structure, 자료구조 정리 본문
Data Structure
이번에 졸업하면서 배웠던 과목들에 대해 정리중이다.
Linked Lists
Singly Linked List
연속적인 노드들로 이루어진 실체가 있는 Data Structure 각 노드들은 데이터와 다음 노트를 가리키는 링크를 저장
Insert at the Head
 Allocate a new Node
 Insert new element
 Have new node point to old head
 Update head to point to new node
Removing at the Head
 Update head to point to next node in the list
 Allow garbage collector to reclaim the former first node
Inserting at the Tail
 Allocate a new Node
 Insert new element
 Have new node point to null
 Have old node point to new node
 Update tail point to new node
Removing at the Tail
Singly List 에서 Tail 부터 삭제하는 것은 효과적이지 못하다 *Tail이 이전 node를 Constant Time에 가리키게 하는 방법은 존재하지 않는다*
Source Code(Linked List)
myList.hpp
main.cpp
Analysis of Algorithms
Running Time
Most Algorithms transform input objects to ouput objects The running time of algorithm typically grows with the input size Average case time is difficult to determine Should focus on Worst Case Running Time
n size를 가진 A 배열 중 최대 값을 찾는 알고리즘
Algorithm arrayMax(A, n)
Input array A of n integers
Output maximum element of A
currentMax A[0]
for i 1 to n 1 do
if A[i] currentMax then
currentMax A[i]
return currentMax
최대 값을 가진 element가 배열 중 임의의 위치에 있더라도, 그 값이 최대 값인지 확인하기 위해 배열의 마지막까지 비교를 해야 한다. 고로 무조건 n번 반복을 하게 된다.
BigOh Notation
\(f(n)\)과 \(g(n)\)이 존재하고 양의 상수 \(c\)와 \(n_0\)이 존재하면, \(f(n)\)은 \(O(g(n))\)이다. (\(f(n) \le cg(n), n \ge n_0\))
Example : \(2n+10\) is \(O(n)\),
 \(2n+10 \le cn\)
 \((c2)n \ge 10\)
 \(n \ge \dfrac{10}{(c2)}\)
Example : \(n^2\) is not \(O(n)\),
 \(n^2 \le cn\)
 \(n \le c\)
Asymptotic Algorithm Analysis
알고리즘의 Asymptotic Analysis는 bigoh notation에 의한 running time을 결정한다.
Asymptotic Analysis를 하기 위해,
 알고리즘의 input size에 따른 Primitive Operation^{1}의 Worst Case 값을 찾는다.
 값을 BigOh Notation으로 표현한다.
Example :
 위의
arrayMax(A,n)
알고리즘의 Primitive Operation 값은 \(8n2\)이다  이를 BigOh Notation으로 표현하면,
arrayMax(A,n)
은 "Runs in \(O(n)\) time"
Stacks
ADTs(Abstract Data Types)
An ADT specifies Data stored, Operation on data and Error conditions associated with operations
Stack ADT
 Stack ADT stores arbitary objects
 Insertion and Deletions follow the lastinput, firstout scheme
 Main Stack Operations
push(object)
 inserts an elementpop()
 removes the last inserted element
Array based Stack
Stack을 구현하는 가장 쉬운 방법은 Array를 이용하는 것 element들을 배열의 왼쪽 부터 오른쪽으로 향하면서 채워 넣는다
Performance of array based stack
 Stack에 있는 element들의 개수를 n이라 한다
 memory usage는 \(O(n)\)
 각각의 Operation은 \(O(1)\) 시간에 가능
Limitation of array based stack
 Stack의 크기가 사전에 정의되어야 하고, 이후 변경이 불가능
 그러므로 Stack이 가득 차면 예외가 발생
Source Code(Array based Stack)
Source code(List based Stack)
Queues
Queue ADT
 The Queue ADT stores arbitary objects
 Insertions and Deletions follow the Firstinput, Firstout scheme
 Insertions are at rear of the queue, Removals are at front of the queue
 Main Queue Operations
enqueue(object)
 inserts an object at the end of queuedequeue()
 removes the element at the front of queue
Array based Queue
 \(N\) 크기의 배열을 원형 방식으로 이용
 Three variables keep track of the front and rear
 \(f\)  index of the front element
 \(r\)  index immediately past the rear element
 \(n\)  number of items in the queue
Vector based Lists
Vector ADT
 The Vector or Array List extend the notation of array by storing sequence of objects
 An element is accessed, inserted or removed by specifying its index
 Main Operations
at(integer i)
 returns element at index i without removing itset(integer i, object o)
 replace the element at index i with oinsert(integer i, object o)
 insert a new element o to have index ierase(integer i)
 removes element at index i
Array based Implementation
 Use an array A of size N
 A variable \(n\) keeps track of the size of the vector(number of elements stored)
 Operation
at(i)
is implemented in \(O(1)\) time by returning A[i]  Operation
set(i,o)
is implemented in \(O(1)\) time by performing A[i] = o  Operation
insert(i,o)
is implemented i \(O(n)\) time need to make room for the new element by shifting forward
 In the Worst case \((i = 0), this takes \)O(n)$ time
 Operation
erase(i)
is implemented in \(O(n)\) time need to fill the hole left by the removed element by shifting backward
 In the Worst case \((i = 0)\), this takes \(O(n)\) time
Growable Array based vector
배열이 가득 찼을 때 insert(o)
를 수행하면 배열의 크기를 늘리고, 이를 복사해야 한다. 이 때 Incremental Strategy, Doubling Strategy를 이용할 수 있다
Lists
Doubly Linked List
 A doubly linked list provides a natural implementation of the List ADT
 This has two link that each point to previous or next node
Sequence ADT
 The Sequence ADT is the union of the Vector and List ADTs
 Elements accessed by Rank, Position
 Vector based methods
elemAtRank(r)
replaceAtRank(r,o)
insertAtRank(r,o)
removeAtRank(r)
 List based methods

first()
,last()
,before(p)
,after(p)

replaceElement(p, o)

swapElements(p, q)

insertBefore(p, o)

insertAfter(p, o)

insertFirst(o)

insertLast(o)

remove(p)

Sequence Implementations
Operation  Array  List 

size, isEmpty  1  n 
atRank, rankOf, elemAtRank  1  n 
first, last, before, after  1  1 
replaceElement, swapElements  1  1 
replaceAtRank  1  n 
insertAtRank, removeAtRank  n  n 
insertFirst, insertLast  1  1 
insertAfter, insertBefore  n  1 
remove  n  1 
Trees
 A tree is an abstract model of a hierarchical structure
 Consists of nodes with a parentchild relation
Terminology
Root: node without parent
Internal Node: node with at least one child
External Node: node without children
Ancestors of Node: parent, grandparent, grandgrandparent, etc.
Depth of a node: number of ancestors
Height of a tree: maximum depth of any node
Descendant of a node: child, grandchild, grandgrandchild, etc.
Tree ADT
 Generic Methods
size()
empty()
 Accessor Methods
root()
positions()
 Positionbased Methods
p.parent()
p.children()
 Query Methods
p.isroot()
p.isExtrenal()
Preorder Traversal
 A traversal visits the nodes of a tree in a systematic manner
 In a preorder traversal, a node is visited before its descendants
Postorder Traversal
 A node is visited after its descendants
Binary Trees
 Binary Tree has following properties
 Each internal node has at most two children
 The children of a node are an ordered pair
 Call Left child, Right child
Properties of Proper Binary Trees
Notation
n: number of nodes
e: number of external nodes
i: number of internal nodes
h: height
Properties
 \(e = i +1\)
 \(n = 2e  1\)
 \(h \le i\)
 \(h \le (n1)/2\)
 \(e \le 2^h\)
 \(h \ge \log_2 e\)
 \(h \ge \log_2 (n+1)  1\)
Inorder Traversal
 A node is visited after its left subtree and before its right subtree
Algorithm inOrder(v)
if is not v.isExternal()
isOrder(v.left())
visit(v)
if is not v.isExternal()
inOrder(v.right())
Euler Tour Traversal
 Generic traversal of a binary tree
 Includes a special cases the preorder, postorder and inorder traversals
 Walk around the tree and visit each node three times
 on the left(preorder)
 from below(inorder)
 on the right(postorder)
벽을 따라 간다고 생각
Priority Queues
Priority Queue ADT
 An entry in a pair(key, value), where the key indicates the priority
 Main Operations
insert(e)
 inserts an entry eremoveMin()
 removes the entry with smallest key
Priority Queue Sorting
Can use a priority queue to sort a set of comparable elements
 Insert the elements one by one with a series of insert(e)
operations
 Remove the elements in sorted order with a series of removeMin()
operations
Algorithm PQSort(S, C)
Input sequence S, comparator C for the elements of S
Output sequence S sorted in increasing order according to C
P < priority queue with Comparator C
while S.empty() is not
e < S.front();
S.eraseFront()
P.insert(e, null)
while P.empty() is not
e < P.removeMin()
S.insertBack(e)
Selection Sort
Selection sort is the variation of PQSort, is implemented with an unsorted sequence
insert(e)
 Insert the elements into the priority queue with n operations, takes \(O(n)\) timeremoveMin()
 Remove the elements in sorted order from the priority queue with n operations, takes \(O(n^2)\) time So, Selection Sort runs in \(O(n^2)\) time \[1 + 2 + 3 + ... + n = \dfrac{n(n+1)}{2}\]
Insertion Sort
Insertion sort is the variation of PQSort, is implemented with an sorted sequence
insert(e)
 Insert the elements into the priority queue with n operations, takes \(O(n^2)\) time \[1 + 2 + 3 + ... + n = \dfrac{n(n+1)}{2} = O(n^2)\]removeMin()
 Remove elements in sorted order from the priority queue with n operations, takes \(O(n)\) time So, Insertions Sort runs in \(O(n^2)\) time
Sequence based Priority Queue
Unsorted List
insert(e)
 insert the item at the beginning or end of the sequence, takes \(O(1)\) timeremoveMin()
 Have to traverse the entire sequence to find the smallest key, takes \(O(n)\) time
Sorted List
insert(e)
 Have to find the place where to insert the item, takes \(O(n)\)removeMin()
 Smallest key is at the beginning, takes \(O(1)\)
Heaps
 A heap is a binary tree storing keys at its nodes and satisfying the following properties
 HeapOrder  for every internal node v other than the root, \(key(v) \ge key(parent(v))\)
 Complete Binary Tree^{2}  let \(h\) be the height of the heap
 for \(i = 0, ..., h1\), there are \(2^i\) nodes of depth \(i\)
 at depth \(h1\), the internal nodes are to the left of the external nodes
 The last node of a heap is the rightmost node of maximum depth
Height of a Heap
A heap storing n keys has height \(O(logn)\)  Let \(h\) be the height of a heap storing n keys  Since there are \(2^i\) keys at depth \(i = 0, ..., h1\) and at least one key at depth \(h\), we have \(n \ge 1+ 2 + 4 + 2^{h1} + 1\)  Thus, \(n \ge 2^h, i.e, h \le logn\)
Heaps and Priority Queues
 Can use a heap to implement a priority queue
 Store a (key, element) item at each internal node
 Keep track of the position of the last node
Insertion into a Heap
The insertion algorithm consists of three steps 1. Find the insertions node \(z\)(the new last node) 2. Store \(k\) at \(z\) 3. Restore the heaporder property
Upheap
 After the insertion of a new key \(z\), the heaporder property may bo violated
 Algorithm upheap restores the heaporder property by swapping \(z\) along an upward path from the insertion node
 Upheap terminates when the key \(k\) reaches the root or a node whose parent has a key smaller than or equal to \(z\)
 Sice a hea has height \(O(logn)\), upheap runs in \(O(logn)\) time
Removal from a Heap
The removal algorithm consists of three steps 1. Replace the root key with the key of the last node \(w\) 2. Remove \(w\) 3. Restore the heaporder property
Downheap
 After replacing the root key with the key \(k\) of the last node, the heaporder property may be violated
 Algorithm downheap restores the heaporder property by swapping key \(k\) along a downward path from the root
 Upheap termiates when key \(k\) reaches a leaf or a node whose children have keys greater than or equal to \(k\)
 Since a heap has height \(O(logn)\), downheap runs in \(O(logn)\) time
Updating the Last Node
The insertion node can be found by traversing a path of \(O(logn)\) nodes  Go up until a left child or the root is reached  If a left child is reached, go to the right child  Go down left until a leaf is reached Similar algorithm for updating the last node after a removal
Heap Sort
 The space used in \(O(n)\)
 methods
insert
andremoveMin
take \(O(logn)\) time  methods
size
,empty
, andmin
take time \(O(1)\) time  Using a heap based priority queue, Can sort sequence of \(n\) element in \(O(nlogn)\) time
Vector based Heap Implementation
 For the node at rank i
 the left child is at rank \(2i\)
 the right child is at rank \(2i+1\)
 Links between nodes are note explicitly sotred
 Operation
insert
corresponds to inserting at rank \(n+1\)  Operation
removeMin
corresponds to removing at rank n  Yields inplace heap sort
Merging Two Heaps
 Given two heaps and a key \(k\)
 Create a new heap with the root node storing \(k\) and with the two heaps as subtrees
 Perform downheap to restore the heaporder property
Bottom up Heap Construction
 Can construct a heap storing \(n\) given keys in using a bottomup construction with \(logn\) phases
 In phase \(i\), pairs of heaps with \(2^i1\) keys are merged into heaps with \(2^{i+1}1\) keys
Hash Tables
 A hash function \(h\) maps keys of a given type to integers in a fixed interval [0, \(N1\)]
 The integer \(h(x)\) is called the hash value of key \(x\)
 A hash table for a given key type consist of
 Hash function \(h\)
 Array(called table) of size \(N\)
Hash Functions
A hash function is specified as the composition of two functions
Hash code: \(h_1\) : keys > integers
Compression functions:
\(h_2\) : integers > [0, \(N1\)]
\[h(x) = h_2(h_1(x))\]
Hash codes
Memory address
 Reinterpret the memory address of the key object as an integer
 except for numeric and string keys
Integer cast
 Reinterpret the bits of the key as an integer
 Suitable for keys of length less than or equal to the number of bits of the integer type
Compression functions
Division
 \(h_2(y) = y\) mod \(N\)
 The size \(N\) of the hash table is usually chosen to be a prime
 The reason has to do with number theory and is beyond the scope of this course
Multiply, Add and Divide(MAD)
 \(h_2(y) = (ay+b)\) mod \(N\)
 \(a\) and \(b\) are nonnegative integers such that \(a\) mod \(N \ne 0\)
Collision Handling
 Collisions occur when different elements are mapped to the same cell
 Separate Chaining : let each cell in the table point to linked list of entires that map there
 Separate chaining is simple, but requires additional memory outside the table
Linear Probing
 Open addressing
 The colliding item is placed in a different cell of the table
 Linear probing
 Handles collisions by placing the colliding item in the next available table cell
 Each table cell inspected is referred to as a probe
 Colliding items lump together, causing future collisions to cause a longer sequence of probes
Double Hashing
 Double hashing uses a secondary hash function \(d(k)\) and handles collisions by placing an item in the first available cell of the series \[(i+jd(k)) \mod N (j=0, 1, ..., N1)\]
 \(d(k)\) can not have zero value
 Table size \(N\) must be a prime to allow probing of all the cells
 Compression function for the secondary hash function \[ d_2(k) = qk \mod q(q \lt N, q\textrm{ is prime})\]
Dictionaries
Dictionary ADT
 The dictionary ADT models a searchable collection of keyelement entires
 The main operations of a dictionary are
searching
,inserting
, anddeleting
items  Multiple items with the same key are allowed
Methods
find(k)
 If there is an entry with key \(k\), returns an iterator to it, ele returns the special iterator endput(k, o)
 insert and returns an iterator to iterase(k)
 remove an entry with key \(k\)size()
,empty()
List based Dictionary
put
takes \(O(1)\) time since we can insert the new item at the beginning or at the end of the sequencefind
anderase
take \(O(n)\) time since in the worst case we traverse the entire sequence to look for an item with the given key
Binary Search
 Binary search performs operation
find(k)
on a dictionary implemented by means of an array based sequence, sorted by key
Binary Search Trees
 binary search tree is a binary tree storing keys at its internal nodes and satisfying the following property
 Let \(u\), \(v\) and \(w\) be three nodes such that \(u\) is in the left subtree of \(v and \)w\( is in the **right subtree** of \)v$
 \(key(u) \le key(v) \le key(w)\)
Performance
 The space used is \(O(n)\)
 Methods
get
,put
anderase
take \(O(h)\) time  The height \(h\) is \(O(n)\) in the worst case and \(O(logn)\) in the best case
AVL Trees
 AVL trees are balanced
 An AVL Tree is a binary search tree such that for every internal node \(v\) of \(T\), the heights of the children of v can differ by at most 1
Insertion
 Insertion is as in a binary search tree
 Always done by expanding an external node
Trinode Restructuring
 let \((a,b,c)\) be an inorder listing of \(x,y,z\)
 perform the rotations needed to make \(b\) the topmost node of the tree
Removal
 Removal begins as in a binary search tree, which means the node removed will become an empty external node. Its parent, w, may cause an imbalance
blah, blah, blah... next time
AVL Tree Performance
 A single restructure takes $O(1)time
find
takes \(O(logn)\) time height of tree is \(O(logn)\), no restructures needed
put
takes \(O(logn)\) time initial find is \(O(logn)\)
 Restructuring up the tree, maintaining heights is \(O(logn)\)
erase
takes \(O(logn)\) time initial find is \(O(logn)\)
 Restructuring up the tree, maintaining heights is \(O(logn)\)
Graphs
 A graph is a pair
(V, E)
where \(V\) is a set of nodes, called vertices
 \(E\) is a collection of pairs of vertices, called edges
 Vertices and edges are positions and store elements
Edge Types
Directed edge
 ordered pair of vertices \((u,v)\)
 first vertex \(u\) is the origin
 second vertex \(v\) is the destination
Undirected edge
 unordered pair vertices \((u,v)\)
Directed graph
 All the edges are directed
Undirected graph
 All the edges are undirected
Terminology
End vertices(or endpoints) of an edge: \(U\) and \(V\) are the endpoints of a
Edges incident on a vertex: \(a,d,\) and \(b\) are incident on \(V\)
Adjacent vertices: \(U\) and \(V\) are adjacent
Degree of a vertex: \(X\) has degree 5
Parallel edges: \(h\) and \(i\) are parallel edges
Selfloop: \(j\) is a selfloop
Path
 sequence of alternating vertices and edges
 begins with a vertex
 ends with a vertex
 each edge is preceded and followed by its endpoints
Simple path: path such that all its vertices and edges are distinct
Cycle: circular sequenc of alternating vertices and edges each edge is preceded and followed by its endpoints
Simple Cycle: cycle such that all its vertices and edges are distinct
구현 방법
Edge List Structure
Adjacency List Structure
Adjacency Matrix Structure
Performance
\(n\) vertices, \(m\) edges \(\\\)no parallel edges \(\\\) no selfloop  Edge List  Adjacency List  Adjacency Matrix 

Space  \(n+m\)  \(n+m\)  \(n^2\) 
v.incidentEdges() 
\(m\)  \(deg(v)\)  \(n\) 
u.isAdjacentTo(v) 
\(m\)  \(min(deg(v), deg(w))\)  1 
insertVertex(o) 
1  1  \(n^2\) 
insertEdge(v,w,o) 
1  1  1 
eraseVertex(v) 
\(m\)  \(deg(v)\)  \(n^2\) 
eraseEdge(e) 
1  1  1 
DepthFirst Search
Subgraphs
 A subgraph \(S\) of a graph \(G\) is a graph such that
 The vertices of \(S\) are a subset of the vertices of \(G\)
 The edges of \(S\) are a subset of the edges of \(G\)
 A Spanning subgraph of \(G\) is a subgraph that contains all the vertices of \(G\)
Connectivity
 A graph is connected if there is a path between ever pair of vertices
 A connected component of a graph \(G\) is a maximal connected subgraph of \(G\)
Trees and Forests
 A (free) tree is an undirected graph \(T\) such that
 \(T\) is connected
 \(T\) has no cycles
 A forest is and undirected graph without cycles
 The connected components of a forest are trees
Spanning Trees and Forests
 A spanning tree of a connected graph is a spanning subgraph that is a tree
 A spanning tree is not unique unless the graph is a tree
 A spanning forest of a graph is a spanning subgraph that is a forest
DepthFirst Search
 DFS is a general technique for traversing a graph
A DFS traversal of graph \(G\)
 Visits all the vertices and edges of \(G\)
 Determines whether \(G\) is connected
 Computes the connected components of \(G\)
 Computes a spanning forest of \(G\)
DFS on graph with \(n\) vertices and \(m\) edges takes \(O(n+m)\) time
Properties of DFS
 \(DFS(G, v)\) visits all the vertices and edges in the connected component of \(v\)
 The discovery edges labeled by \(DFS(G,v)\) form a spanning tree of the connected component of \(v\)
Breadth First Search
 BFS is a general technique for treversing a graph
A BFS traversal of a graph \(G\)
 Visits all the vertices and edges of \(G\)
 Determines whether \(G\) is connected
 Computes the connected components of \(G\)
 Computes a spanning forest of \(G\)
BFS on graph with \(n\) vertices and \(m\) edges takes \(O(n+m)\) time
Step 1
Step 2
Step 3
Properties
\(G_s\) : connected component of \(s\)
 \(BFS(G, s)\) visits all the vertices and edges of \(G_s\)
 The discovery edges labeled by \(BFS(G, s)\) form a spanning tree \(T_s\), of \(G_s\)
 For each vertex \(v\) in \(L_i\)
 The path of \(T_s\) from \(s\) to \(v\) has \(i\) edges
 Every path from \(s\) to \(v\) in \(G_s\) has at least \(i\) edges
'Class' 카테고리의 다른 글
Algorithm 정리 (0)  2015.05.08 

computer network 정리 (0)  2015.05.08 
OS(Operating System) 정리 (2)  2015.04.29 
Data Structure, 자료구조 정리 (0)  2015.04.18 