Tuesday, January 29, 2013

Calculating Standard Deviation

Whenever a value is calculated from observed data using some relation, the uncertainties in the measurements carry through into the result. This propagated uncertainty is usually quoted as a standard deviation, and exactly how large it is depends on both the relation used and the uncertainties of the data that went into it.


The standard deviation is usually represented by the Greek letter sigma, \( \sigma_A \), where \( A \) is the quantity whose error we are finding. For example, if we calculated some value for gravity (let's call it \( g \)), the standard deviation of the calculated gravity would be written \( \sigma_g \). Pretty straightforward, huh?

Equations

Let's first talk about some general equations. For simplicity, let's call our calculated value \( Z \).

If \( Z \) is represented as a sum or difference of terms, \( Z = A_1 + A_2 + ... + A_n\), then \( \sigma_Z = \sqrt{ \sigma_{A_1}^2 + \sigma_{A_2}^2 + ... + \sigma_{A_n}^2 } \).

For example:
\[
  Z = A + B - C \Rightarrow \sigma_Z = \sqrt{ \sigma_A^2 + \sigma_B^2 + \sigma_C^2 }
\]

If \( Z \) is represented as a product or quotient of terms, \( Z = \frac{ A_1 \times A_2 \times ... \times A_n }{ B_1 \times B_2 \times ... \times B_n } \), then its relative standard deviation is given by \( \frac{\sigma_Z}{ Z } = \sqrt{ \left( \frac{\sigma_{A_1}}{A_1} \right)^2 + ... + \left( \frac{\sigma_{A_n}}{A_n} \right)^2 + \left( \frac{\sigma_{B_1}}{B_1} \right)^2 + ... + \left( \frac{\sigma_{B_n}}{B_n} \right)^2} \). The relative standard deviation is just the standard deviation divided by the calculated value itself, so to recover \( \sigma_Z \) we multiply the whole expression by \( Z \).

For example:
\[
  Z = \frac{ A B }{ C } \Rightarrow \sigma_Z = Z \sqrt{ \left( \frac{\sigma_{A}}{A} \right)^2 + \left( \frac{\sigma_{B}}{B} \right)^2 + \left( \frac{\sigma_{C}}{C} \right)^2 }
\]

If \( Z \) is a variable raised to a power, \( Z = A^n \), then its relative standard deviation is given by \( \frac{ \sigma_Z }{ Z } = n \frac{ \sigma_A }{ A } \). As before, to get \( \sigma_Z \), we just multiply by our calculated \( Z \).

For example:
\[
  Z = A^7 \Rightarrow \sigma_Z = 7 Z \frac{ \sigma_A }{ A }
\]
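
These rules are easy to turn into code. Here's a minimal Python sketch of the three formulas (the helper names are my own, purely illustrative -- pass floats):

import math

def sigma_sum(*sigmas):
    """ sigma_Z for Z = A_1 + A_2 + ... (sums or differences): add in quadrature """
    return math.sqrt(sum(s ** 2 for s in sigmas))

def sigma_product(z, *value_sigma_pairs):
    """ sigma_Z for a product/quotient Z; each argument is a (value, sigma) pair """
    rel = sum((s / a) ** 2 for a, s in value_sigma_pairs)
    return abs(z) * math.sqrt(rel)

def sigma_power(z, n, a, sigma_a):
    """ sigma_Z for Z = A**n """
    return abs(z) * abs(n) * sigma_a / abs(a)

# e.g. for Z = A*B/C:  sigma_Z = sigma_product(Z, (A, sA), (B, sB), (C, sC))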


Those are the basic equations that you need to find the standard deviation. For more on techniques, see the following example.

Example

Suppose we want to find the standard deviation of a calculated value, let's call it \( Z\), where:
\[
  Z = A + \frac{ B^2 C}{D}
\]
To make things easier, let \( U = \frac{ B^2 C }{ D } \) and also let \( V = B^2 \). So,
\[
  Z = A + U \Rightarrow \sigma_Z = \sqrt{ \sigma_A^2 + \sigma_U^2 }
\]
So, we need to find \( \sigma_U \). Since \( V = B^2 \), we can write \( U \) as a plain product and quotient,
\[
  U =  \frac{ V C }{ D } \Rightarrow \sigma_U = U \sqrt{ \left( \frac{ \sigma_V }{ V } \right)^2 + \left( \frac{ \sigma_C }{ C } \right)^2 + \left( \frac{ \sigma_D }{ D } \right)^2}
\]
Lastly, we need to find \( \frac{ \sigma_V }{ V } \),
\[
  V = B^2 \Rightarrow \frac{ \sigma_V }{ V } = 2 \frac{ \sigma_B }{ B }
\]
Substituting \( 2 \frac{ \sigma_B }{ B } \) for \( \frac{ \sigma_V }{ V } \) and \( \frac{B^2 C}{ D } \) for \( U \), we get,
\[
 U =  \frac{ V C }{ D } \Rightarrow \sigma_U = \frac{B^2 C}{ D } \sqrt{ \left( 2 \frac{ \sigma_B }{ B } \right)^2 + \left( \frac{ \sigma_C }{ C } \right)^2 + \left( \frac{ \sigma_D }{ D } \right)^2}
\]
Finally, subbing this new \( \sigma_U \) into the equation for \( \sigma_Z \), we get,
\[
   \sigma_Z = \sqrt{ \sigma_A^2 + \left( \frac{B^2 C}{ D } \sqrt{ \left( 2 \frac{ \sigma_B }{ B } \right)^2 + \left( \frac{ \sigma_C }{ C } \right)^2 + \left( \frac{ \sigma_D }{ D } \right)^2}  \right)^2 }
\]
Which can be more simply written as,
\[
    \sigma_Z = \sqrt{ \sigma_A^2 + \left( \frac{B^2 C}{ D } \right)^2 \left[ \left( 2 \frac{ \sigma_B }{ B } \right)^2 + \left( \frac{ \sigma_C }{ C } \right)^2 + \left( \frac{ \sigma_D }{ D } \right)^2 \right] }
\]
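
As a quick numerical check of this final formula, here's a short Python snippet; the values and uncertainties below are made up purely for illustration:

import math

A, sA = 10.0, 0.3   # made-up measurements and their standard deviations
B, sB = 2.0, 0.1
C, sC = 5.0, 0.2
D, sD = 4.0, 0.4

U = B ** 2 * C / D
sigma_U = U * math.sqrt((2 * sB / B) ** 2 + (sC / C) ** 2 + (sD / D) ** 2)
sigma_Z = math.sqrt(sA ** 2 + sigma_U ** 2)
print(sigma_Z)  # the propagated standard deviation of Z = A + (B^2 C)/D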

Monday, January 21, 2013

OpenSUSE 12.2 "Mantis" Review


I really like OpenSUSE. It's truly its own thing in a world of Red Hat and Debian derivatives. It's important to note that when I tried it out I used GNOME 3; I will try not to let that taint my review of this distro.

What I Liked

  • Many options for the windowing system (meaning anything but GNOME 3).
  • Installation was easy.
  • Came with the LibreOffice suite (per opt-in) and some other useful programs.
  • The default partitioning scheme makes a separate partition for /home. This can be great if the OpenSUSE system partition breaks and has to be re-installed, since that can be done without wiping personal files.
  • Unlike the Debian derivatives I am accustomed to, it actually had a recently updated copy of VIM!
  •  I think that having root access is a plus (feel free to disagree).

What I Didn't Like

  • DISCLAIMER: This really could just be a GNOME 3 complaint: although there are 6 workspaces, they are arranged linearly. Unless there is some hotkey I don't know about, that means you may have to flip through as many as 5 workspaces to reach the one you want. And because they sit in a linear stack, unless your WOI (workspace of interest) is at the top or bottom, it loses some association points (it's hard to tell the PB from the J sometimes).
  • Doesn't come with the YUM package manager. Without it, getting some programs can be much more difficult.

Surprises

  • By default, it includes the Mono C# compiler! I really was not expecting this.

Conclusion

Score: 8/10

As I said earlier, I dig me some OpenSUSE. It's unique, it's helpful -- I have no major complaints. If I were to make one suggestion, it would be to use something other than GNOME as your desktop environment.

Information

Distro Name: OpenSUSE 12.2 "Mantis"
Kernel: 3.4.6-2.10-default
Size After Install: 4.0 GB
Desktop Flavors Available: GNOME, KDE, Xfce, LXDE, Minimal X Window, Textual
Package Managers: Aptitude, YaST, RPM, Zypper
Languages/Compilers: Perl, C#, Python, BASH
Web Browser(s): Firefox
Root Access: True
Has sudo: True

Thursday, January 17, 2013

Latest Project: Email Miner

Just to add to my collection of "data miners" built in Python, I made a (very bare-bones) email miner. Basically, all it does is scrape email addresses from a given URL and save them to an output file. Here's what it looks like (again, forgive my strange windowing system):


So, as you can see, it's pretty straightforward. I used a bit of polymorphism to handle the output formats. To add a new format, all that's needed is to define a derived class, implement its writer method, and add the new class's name to a dictionary of writer classes -- and it's good to go! I love this Python thing...
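
Here's a rough sketch of that writer-registry idea; the class and method names are made up purely for illustration and don't match the real project:

# A rough, illustrative sketch of the writer-registry pattern; these names
# are made up and don't match the actual project.
class Writer(object):
    def write(self, emails, path):
        raise NotImplementedError

class TextWriter(Writer):
    def write(self, emails, path):
        with open(path, 'w') as out:
            out.write('\n'.join(emails))

class CsvWriter(Writer):
    def write(self, emails, path):
        with open(path, 'w') as out:
            out.write(','.join(emails))

# To support a new output format: subclass Writer, implement write(),
# and register the class here.
WRITERS = {'txt': TextWriter, 'csv': CsvWriter}

def save(emails, path, fmt='txt'):
    WRITERS[fmt]().write(emails, path)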

All code is on my github.

Tuesday, January 15, 2013

Haskell DSA: Quick Sort

Haskell continues to astound me! I have never implemented the quicksort as simply and, frankly, elegantly as this:

quickSort :: (Ord a) => [a] -> [a]

quickSort [] = []
quickSort (x:xs) = quickSort lesser ++ [x] ++ quickSort greater
 where lesser = [e | e <- xs, e < x]
       greater = [e | e <- xs, e >= x]


Here's a quick, line-by-line for anyone who's interested.
  • quickSort :: (Ord a) => [a] -> [a]
    The optional (gotta love that word in programming) type signature. It says that quickSort is a function that takes a list of values, [a], and returns another list of the same type. The (Ord a) constraint means that values of type a can be ordered (compared), so, as far as we're concerned, the list is sortable.
  • quickSort [] = []
    The first pattern we're trying to match -- if we're given an empty list, return an empty list!
  • quickSort (x:xs) = quickSort lesser ++ [x] ++ quickSort greater
    This line does a few cool things: (1) it splits the passed list into a first element x (the pivot) and the remainder xs, then (2) it recursively sorts the elements lower than the pivot and those greater than or equal to it, and returns them on either side of the pivot!
  • where lesser = [e | e <- xs, e < x]
          greater = [e | e <- xs, e >= x]

    This where clause uses list comprehensions to build the lists of lower- and higher-ordered values. Note that values equal to the head of the list (also known as the pivot) end up in the greater list.

Friday, January 4, 2013

Heap (Max) in Python

Lately I've been practicing my data structures and algorithms and I came up with this interesting implementation for a max heap:

class Heap(object):
    """ A max heap, parents have two or less equal or smaller children """
    def __init__(self):
        """ Set the count to 0 and it has no root node """
        self.count = 0
        self.root = None

    def heapify(self, iterable):
        """ loops elements in the iterable and adds them to the heap """
        for elem in iterable:
            self.add_item(elem)

    def add_item(self, ordinal):
        """ Adds an ordinal item, keeping the tree complete and heap-ordered """
        self.count += 1
        if self.count == 1:
            self.root = Node(ordinal)
            return

        # The binary digits of count, after the leading 1, spell out the
        # path from the root to the new node's slot: 0 -> left, 1 -> right.
        path = bin(self.count)[3:]  # e.g. count == 5 -> '0b101' -> '01'
        current_node = self.root
        value = ordinal

        for bit in path:
            # Keep larger values higher up: if the value being carried down
            # beats the current node's value, swap and carry the smaller one.
            if current_node.data < value:
                current_node.data, value = value, current_node.data
            if bit == '0':
                if current_node.left is None:
                    current_node.left = Node(value)
                    return
                current_node = current_node.left
            else:
                if current_node.right is None:
                    current_node.right = Node(value)
                    return
                current_node = current_node.right

class Node(object):
    """ Used to traverse the heap """
    def __init__(self, ordinal):
        """ Require data, make the pointers empty (to None) """
        self.data = ordinal
        self.left = None
        self.right = None 


This design of a heap is pretty interesting because it uses the binary representation of Heap.count (the number of elements in the heap) to determine where the next node should go: the digits of count after the leading 1 spell out the left/right path from the root. The algorithm is pretty straightforward:

Increment `count` by 1                   // We're adding a new item
`path` is binary(`count`) with its leading 1 removed
`value` is the new item's value
`current node` is `root node`

For each `bit` of `path`, most significant bit first:

    Check 1: if `value` > `current node's value`,
                 then swap(`value`, `current node's value`)

    Check 2: if `bit` is 0, `next node` is `current node's left`,
                 otherwise `next node` is `current node's right`

    Check 3: if `next node` is null,
                 then place a new node holding `value` there and stop,
                 otherwise `current node` is `next node`
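
To make the path trick concrete, here's a tiny illustrative snippet (not part of the class) showing how a count maps to a path:

count = 6
path = bin(count)[3:]  # bin(6) == '0b110'; dropping '0b1' leaves '10': go right, then left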

The end result is a complete, balanced heap! I also added a __str__ method to the Node class for testing:

    def __str__(self):
        return "[%s <- %s -> %s]" % (
            str(self.left) if self.left is not None else 'END',
            str(self.data),
            str(self.right) if self.right is not None else 'END')
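
As a quick sanity check, here's a small, made-up usage example of the classes above; the largest value should surface at the root:

heap = Heap()
heap.heapify([4, 9, 1, 7, 3])
print(heap.root)  # the maximum (9) sits at the root; children are shown via __str__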

Thursday, January 3, 2013

One Line Fizz Buzz Test in Python

Here it is (in all its ugliness):

print '\n'.join("Fizz"*(i%3==0)+"Buzz"*(i%5==0) or str(i) for i in range(1,101))

Python is pretty astounding. This little bugger was easy to code, it is somewhat readable, and it is even understandable. Pretty cool, eh?

Short explanation:
  • '\n'.join(...)
    Every item produced by the generator expression is joined with a newline, so each result appears on its own line
  • for i in range(1,101)
    Loop i from 1 to 100 (range's upper bound of 101 is exclusive)
  • "Fizz"*(i%3==0)+"Buzz"*(i%5==0) or str(i)
    Multiplying a string by a boolean keeps it when the boolean is True, so "Fizz" survives when i is divisible by 3 and "Buzz" when i is divisible by 5, and the two are concatenated. If neither survives, the result is an empty string, which is falsy, so the or falls back to str(i)
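
For anyone who finds the one-liner too dense, here's a minimal multi-line equivalent of the same logic (written for Python 2, like the original):

lines = []
for i in range(1, 101):
    word = "Fizz" * (i % 3 == 0) + "Buzz" * (i % 5 == 0)
    lines.append(word or str(i))  # '' is falsy, so fall back to the number itself
print '\n'.join(lines)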