Data Structures and Algorithms in Go

Overview

Algorithms

Simple Sorts

Selection Sort

Sort a list of []Type by repeatedly searching for the smallest (or largest) element of the list, removing the found element from the list, and appending it to a new list.

Time Complexity: $O(n^2)$

Example
func selectionSort(list []int) []int {
	newList := []int{}
	for range list {
		smallest, smallestIndex := findSmallest(list)
		// remove the found element from the original list
		list = append(list[:smallestIndex], list[smallestIndex+1:]...)
		newList = append(newList, smallest)
	}
	return newList
}

func findSmallest(list []int) (int, int) {
	smallest, smallestIndex := list[0], 0
	for i, number := range list {
		if number < smallest {
			smallest = number
			smallestIndex = i
		}
	}
	return smallest, smallestIndex
}

Bubble Sort


Start with the last item in the array and compare it to the item before it. If the item is smaller than the one before it, swap the two. Repeating this pass over the array, the smallest values are "bubbled up" to the front.

Time Complexity: $O(n^2)$

Example
func bubbleSort(list []int) []int {
	for start := 0; start < len(list)-1; start++ {
		for end := len(list) - 1; end > start; end-- {
			if list[end] < list[end-1] {
				list[end-1], list[end] = list[end], list[end-1]
			}
		}
	}
	return list
}

Insertion Sort

Scan each element in the array, inserting it into its proper place with respect to the already-sorted elements before it.

Time Complexity: $O(n^2)$

Example
func insertionSort(list []int) []int {
	for i := 1; i < len(list); i++ {
		for j := i; j > 0 && list[j-1] > list[j]; j-- {
			list[j], list[j-1] = list[j-1], list[j]
		}
	}
	return list
}

$O(n\log n)$ Sorts

Quick Sort

A divide-and-conquer sorting algorithm that uses recursion. To perform a quick sort:

  1. Select a pivot element in the array
  2. Iterate through the array, moving every element greater than the pivot to one side and every element less than the pivot to the other side
  3. Recursively perform the pivot selection and partitioning until the resulting sub-arrays are either empty or contain 1 element

The choice of pivot can affect the time complexity; the best case is when both sub-arrays are of equal size.

Time Complexity:
average: $O(n\log n)$
worst: $O(n^2)$

Example
func quicksort(list []int) []int {
	if len(list) < 2 {
		return list
	}
	pivot := list[len(list)/2]
	left, right, middle := []int{}, []int{}, []int{}
	for _, value := range list {
		if value < pivot {
			left = append(left, value)
		} else if value > pivot {
			right = append(right, value)
		} else {
			middle = append(middle, value)
		}
	}
	return append(append(quicksort(left), middle...), quicksort(right)...)
}
Example using in-place quickSort

The in-place variant sorts values within the original slice while scanning for elements less than the pivot, avoiding the extra slice allocations of the version above. The partition below uses the Lomuto scheme, which tends to require more swaps and comparisons than Hoare's partition scheme but is simpler to implement.

func main() {
	list := []int{5, 1, 2, 18, 3}
	quickSort(list, 0, len(list)-1)
	fmt.Println(list)
}

// quickSort sorts list in place between the low and high indexes.
func quickSort(list []int, low, high int) {
	if low < high {
		pivot := partition(list, low, high)
		quickSort(list, low, pivot-1)
		quickSort(list, pivot+1, high)
	}
}

// partition picks the last element as the pivot, moves smaller
// elements before it, and returns the pivot's final index.
func partition(list []int, low, high int) int {
	pivot := list[high]
	i := low
	for j := low; j < high; j++ {
		if list[j] < pivot {
			list[i], list[j] = list[j], list[i]
			i++
		}
	}
	list[i], list[high] = list[high], list[i]
	return i
}

Merge Sort

Merge sort is another divide-and-conquer sorting method, like quicksort. To sort, cut the array into 2 sub-arrays and recursively call mergeSort on each until the sub-arrays have a length less than 2, then merge the sorted results back together.

Time Complexity: $O(n\log n)$

Example
func mergeSort(list []int) []int {
	if len(list) < 2 {
		return list
	}

	mid := len(list) / 2
	left := mergeSort(list[:mid])
	right := mergeSort(list[mid:])
	return merge(left, right)
}

func merge(leftArray []int, rightArray []int) []int {
	tempArray := []int{}
	i, j := 0, 0
	for i < len(leftArray) && j < len(rightArray) {
		if leftArray[i] < rightArray[j] {
			tempArray = append(tempArray, leftArray[i])
			i++
		} else {
			tempArray = append(tempArray, rightArray[j])
			j++
		}
	}

	// if one array was longer than the other, copy the remainder
	for ; i < len(leftArray); i++ {
		tempArray = append(tempArray, leftArray[i])
	}
	for ; j < len(rightArray); j++ {
		tempArray = append(tempArray, rightArray[j])
	}
	return tempArray
}

Searching

Binary search searches a sorted list by bisecting it after every lookup. Each comparison halves the remaining search space, so its advantage over a linear scan grows with the size of the list (it runs in logarithmic time).

Time Complexity: $O(\log n)$

Example

func binarySearch(list []int, item int) int {
	low := 0
	high := len(list) - 1
	for low <= high {
		// integer division truncates, so mid rounds down
		mid := (low + high) / 2
		if item == list[mid] {
			return mid
		} else if item < list[mid] {
			high = mid - 1
		} else {
			low = mid + 1
		}
	}
	return -1
}

Data Structures

Arrays, Slices, and LinkedLists

Arrays are a collection of values stored in sequential memory. In Go, arrays are of a fixed length and cannot be resized. More often, Go developers interact with the slice type, with arrays acting as the underlying building block for slices.

testArray := [4]int{2, 4, 6, 8}

for i := range testArray {
	fmt.Printf("testArray[%d] memory address: %v\n", i, &testArray[i])
}

// each element sits 8 bytes (one int) after the previous:
// testArray[0] memory address: 0xc00011a040
// testArray[1] memory address: 0xc00011a048
// testArray[2] memory address: 0xc00011a050
// testArray[3] memory address: 0xc00011a058

Linked lists are a collection of values AND a pointer to the next value contained within the list. They are distinct from arrays in that they are not stored in contiguous memory. As a result, appending a new item to the list does not require copying the entire list to a new data structure.

The trade-offs between these two structures should become apparent quickly:

  • Arrays are better for reading items, as they allow random access, whereas linked lists require sequential access
  • Linked lists more easily allow insertions and deletions in the middle of the list, as only one pointer has to be updated
  • Arrays require more memory in reserve, as they cannot be resized after initialization, whereas linked lists only require enough additional memory to store each new value
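The trade-offs above can be sketched with a minimal singly linked list (the `node` and `list` types here are illustrative, not from the standard library; Go's `container/list` package provides a production doubly linked list):

```go
package main

import "fmt"

// node holds a value and a pointer to the next node in the list.
type node struct {
	value int
	next  *node
}

// list tracks the head and tail so appends are O(1): no copying required.
type list struct {
	head, tail *node
}

// append adds a value to the end of the list by updating a single pointer.
func (l *list) append(value int) {
	n := &node{value: value}
	if l.head == nil {
		l.head = n
	} else {
		l.tail.next = n
	}
	l.tail = n
}

// get walks the list sequentially; unlike an array, there is no random access.
func (l *list) get(i int) int {
	n := l.head
	for ; i > 0; i-- {
		n = n.next
	}
	return n.value
}

func main() {
	l := &list{}
	for _, v := range []int{2, 4, 6, 8} {
		l.append(v)
	}
	fmt.Println(l.get(2)) // 6
}
```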

Hash Tables

Hash tables (sometimes called hash maps, maps, dictionaries, or associative arrays) are arrays of key-value pairs that use a hash function to map each key to an array index in a consistent, deterministic way. Hash functions accept byte data and return an index. Hash tables allow data lookups in $O(1)$ average time complexity, making them one of the most powerful data structures in computer science.

The quality of the hash function determines the effectiveness of the underlying data structure. A good hash function:

  • consistently maps the same data to the same index
  • maps different data to different indexes
  • is aware of the underlying array's size and only returns valid array indexes
  • minimizes the number of collisions (returning the same index for multiple values)

Because of these capabilities, hash tables are extremely effective at:

  • performing lookups
  • modeling relationships from one thing to another
  • checking for duplicate entries in a given set of data
  • acting as a cache

Collisions

As mentioned above, hash tables should avoid collisions as much as possible. But what if the data set is larger than the underlying array, or the hash function must by its nature return duplicate indexes? One such example: sorting names in an employee directory, using a hash table where the first letter of the last name is the hash. Of course there would be multiple entries for some letters.

To accommodate this, while still maintaining some of the benefits of hash tables, a linked list can be used to store multiple entries at a single hash index. This is not the only solution, but it is a common pattern.
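This chaining pattern can be sketched as follows (the `hashTable` type and its character-sum `hash` function are illustrative assumptions, not a production design; each bucket holds a linked chain of entries that collided on the same index):

```go
package main

import "fmt"

// entry is one key/value pair in a bucket's collision chain.
type entry struct {
	key   string
	value int
	next  *entry
}

// hashTable resolves collisions by chaining entries at each bucket index.
type hashTable struct {
	buckets []*entry
}

func newHashTable(size int) *hashTable {
	return &hashTable{buckets: make([]*entry, size)}
}

// hash sums the key's bytes and wraps the result so it is always
// a valid bucket index (a toy hash function, not a production one).
func (h *hashTable) hash(key string) int {
	sum := 0
	for _, c := range key {
		sum += int(c)
	}
	return sum % len(h.buckets)
}

// set inserts or updates a key, walking the chain on collision.
func (h *hashTable) set(key string, value int) {
	i := h.hash(key)
	for e := h.buckets[i]; e != nil; e = e.next {
		if e.key == key {
			e.value = value
			return
		}
	}
	h.buckets[i] = &entry{key: key, value: value, next: h.buckets[i]}
}

// get returns the value for a key and whether it was present.
func (h *hashTable) get(key string) (int, bool) {
	for e := h.buckets[h.hash(key)]; e != nil; e = e.next {
		if e.key == key {
			return e.value, true
		}
	}
	return 0, false
}

func main() {
	h := newHashTable(4) // a tiny table makes collisions likely
	h.set("alice", 1)
	h.set("bob", 2)
	h.set("carol", 3)
	if v, ok := h.get("bob"); ok {
		fmt.Println(v) // 2
	}
}
```

Go's built-in `map` type already does all of this; the sketch only shows how chaining keeps lookups working when indexes collide.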

A comparison of the above data structures and their time complexity:

|        | Hash Maps (Worst) | Hash Maps (Average) | Arrays | Linked Lists |
|--------|-------------------|---------------------|--------|--------------|
| Search | $O(n)$            | $O(1)$              | $O(1)$ | $O(n)$       |
| Insert | $O(n)$            | $O(1)$              | $O(n)$ | $O(1)$       |
| Delete | $O(n)$            | $O(1)$              | $O(n)$ | $O(1)$       |

Tree data structures

A tree is a nonlinear structure (unlike a list or queue) in which each node is capable of having many successor nodes, called children.

  • Trees are recursive structures. Every node must have a unique parent, with the exception of the root node.
    • Another way of thinking of this: there must be a unique path to each and every node.

Binary trees

A binary tree is a tree where each node is capable of having up to 2 children.

If a binary tree node has no children, it is called a leaf.

Binary search trees

A unique feature of binary search trees is how data is added to the tree.

When adding data to a binary search tree:

  • nodes with smaller data than the root node are inserted in the left subtree
  • nodes with larger data than the root node are inserted in the right subtree

There are 3 traversal patterns for a binary search tree:

  • preorder traversal - visit the root, then the left subtree, then the right subtree
  • inorder traversal - visit the left subtree, then the root, then the right subtree
    • visits nodes in order from smallest to largest
  • postorder traversal - visit the left subtree, then the right subtree, then the root
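The insertion rule and the inorder traversal can be sketched together (the `treeNode` type and helper functions are illustrative assumptions; note that inorder traversal yields the values in sorted order):

```go
package main

import "fmt"

// treeNode is a binary search tree node: smaller values go left, larger right.
type treeNode struct {
	value       int
	left, right *treeNode
}

// insert places a value in the correct subtree and returns the (possibly new) root.
func insert(root *treeNode, value int) *treeNode {
	if root == nil {
		return &treeNode{value: value}
	}
	if value < root.value {
		root.left = insert(root.left, value)
	} else {
		root.right = insert(root.right, value)
	}
	return root
}

// inorder visits the left subtree, then the root, then the right subtree,
// appending values from smallest to largest.
func inorder(root *treeNode, out *[]int) {
	if root == nil {
		return
	}
	inorder(root.left, out)
	*out = append(*out, root.value)
	inorder(root.right, out)
}

func main() {
	var root *treeNode
	for _, v := range []int{5, 1, 18, 2, 3} {
		root = insert(root, v)
	}
	sorted := []int{}
	inorder(root, &sorted)
	fmt.Println(sorted) // [1 2 3 5 18]
}
```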