Showing posts with label python. Show all posts
Showing posts with label python. Show all posts

Saturday, January 31, 2015

Hash Table (Map) Built In Python

Lately, I've been studying up on my data structures and algorithms. I decided it would be fun to implement a HashMap (the map abstract data type). I decided to rely on Python's idiomatic __hash__ function, and use linear probing as the collision resolution mechanism. In the process of building this, I learned a few things and I would like to share them.

Deletion is the Hardest Part


Overall the data structure was not very difficult to build. Well, except for deletion. It's not as simple as shifting all elements back (overwriting the deleted). Instead, I had to flag the item as being "deleted". This wasn't that difficult, however, maintaining quick-access sizing on the data structure was complected by it.

In the end, I had to maintain two states: one that programmers who called __len__ would see and the actual count of stored elements (being actual plus deleted).

Linear Probing is Not as Hard


My "find index" method, boiled down to just
    def __locate_index(self, key):
        i = hash(key) % self.capacity()

        while self.__values[i] and self.__values[i][0] != key:
            i = (i + 1) % self.capacity()

        return i
Where each value is just a tuple where elements 0 and 1 are the key and its value, respectively.

Everything Else is Just Bookkeeping


In the end, the code wasn't too difficult. In the end, the class had a lot more state than I would've liked, however, "getting it right" wasn't that difficult. As always, I hope I'm not missing any edge cases with my tests.

As usual, the code is on github and I'm open to any suggestions or thoughts.

Wednesday, March 6, 2013

Communication between Python and Ruby Using Sockets

Sockets are a neat and (relatively) simple means of communication between different programs in a client-server relationship. They can also be used when it is necessary to communicate data between languages and speed is not a large issue.

The following code examples are a Python lyric server that I built and a Ruby client. Their full code is available on my github account.

simple_lyric_server.py (some implementation details hidden, see github for more):

import socket

 class LyricServer(object):
    def __init__(self):
        # Details hidden... See github

    def try_get_lyrics(self):
       
# Details hidden... See github

    def define_socket(self, port, n_requests=5):
        """
            Used to define the socket object from inputs
        """
        host = socket.gethostname()
        print "Lyric Thief Server: Hostname is %s" % str(host)
        self._socket.bind((host, port))
        self._n_requests = n_requests

    def handle_lyric_requests(self):
        self._socket.listen(self._n_requests)

        while True:
            client, address = self._socket.accept()
            print "Lyric Thief Server: Connected to %s" % str(address)

            self.lyrics = ''
            data = json.loads(client.recv(1024))
            try:
                self.artist, self.song = data['artist'], data['song']
            except KeyError:
                client.send("Invalid JSON passed")
                print "Lyric Thief Server: Invalid JSON Passed"
                client.close()
                return

            self.try_get_lyrics()

            # Try sending the lyrics line-by line

            for line in self.lyrics.split('\n'):
                client.send(line+'\n')

            print "Lyric Thief Server: Sent Lyrics, Closing Connection"
            client.close()

 

if __name__ == '__main__':
    server = LyricServer()
    server.define_socket(915, 5)
    server.handle_lyric_requests() 



So, quickly walking through what the methods do:
  • define_socket(self, port, n_requests=5) uses the properties attached to a socket object to construct the server's socket, where port is the port that the server will run on and n_requests specifies the maximum number of requests that the server can handle at once.
  • handle_lyric_request(self) tries to lookup (online) the lyrics of the song/artist pair JSON data that is passed in from a client and then send back an appropriate response.

Now looking at ruby_client.rb: 
require 'socket'

host = Socket.gethostname
port = 915

s = TCPSocket.new host, port
s.puts('{ "artist" : "Counting Crows", "song" : "Omaha"}')

lyrics = ""

begin
  while line = s.gets
    lyrics += line

  end
rescue
end 
s.close 

So, this Ruby client is much simpler. Essentially all it's doing is (1) building a new TCPSocket using the host machine and the same port as specified in the Python server, (2) writing a JSON object to the port (a request for the "Counting Crows" song "Omaha"), then it (3) reads in the lyrics and closes the connection.

Saturday, February 9, 2013

Deploying Aspen Projects with Heroku

Aspen is a very cool and intuitive web framework and server for building Python-powered sites. It is one of those genuine, non-MVC, "what you see is what you get" kinda things.

Heroku is a free (for basic usage) web hosting site.

For my school's math club, we needed something simple and free to post a fund-raising app and I decided that putting up an Aspen site on Heroku would be a good idea. Now that the site's up and I have ironed out those gory details, I am very happy with the results. Here are some of those details:

  • To use Heroku and Python in conjunction you need python-virtualenv. So make sure you get it online or run a pip install virtualenv, or a apt-get install python-virtualenv.
  • After you create a site using Heroku, you will have to link up git and login to Heroku (this is described on Heroku's site). Next be sure to start up your Python virtualenv
  • When you're in the virtualenv, run pip install aspen.
  • Next, run pip freeze > requirements.txt, which will tell Heroku what Python dependencies are needed server-side.
  • Alright, this is the tricky part. You now need to create a file called ProcFile in your Heroku project's root directory. This file's contents should be:

         web: aspen -p . -w www -a :$PORT
    That line tells Heroku to make the ProcFile's directory the project's root directory and to run Aspen over whatever port Heroku chooses. You now need to make a directory called "www" which will contain all your code files. Of course, you can call this directory whatever you want, you just need to change the contents of the ProcFile (the above line) to match.
    Given a successful linking of git and Heroku, and that you are logged into Heroku, you can run git push heroku master which sends your Aspen pages to the web, and foreman start which lets you test your app locally.
     

    Friday, February 1, 2013

    Latest Project: Ultimate Dev. Machine Script

    The latest thing I have been working on is pretty cool. It's simply a Python script called udm.py that you toss onto a virtual machine (Debian based) and run using:

    sudo ./udm.py

    Then, it works its magic and installs compilers, interpreters, IDEs, editors, documentation generators, source control software, and much more! So far, it has been successfully tested on Linux Mint 14.1. Here's a brief view of it:


    As it stands, it is very configurable. You can add and remove packages, and even include special installation instructions all with plug-and-play-like ease. Packages are organized by topic, in simple, struct-like objects I call containers. In the udm_script/desired_packages.py script, entire containers can be added or removed from the installation. In the containers, specific packages can be added or removed. If you know of any interesting development packages that I should add, let me know!

    All code is available on my github account.

    Thursday, January 17, 2013

    Latest Project: Email Miner

    Just to add to my collection of "data miners" built in Python, I made a (very bare-bones) email miner. Basically all it does is strip emails out of a given URL and save them in some output file. Here's what it looks like (again, forgive my strange windowing system):


    So, as you can see it's pretty straightforward. I used a little bit of polymorphism-type classes to make the output formats. Basically, all that needs to happen to add more is the definition of a derived class, the implementation of a writer method and then the new class's name needs to be added to a dictionary of other writer class-names and it's good to go! I love this Python thing...

    All code is on my github.

    Friday, January 4, 2013

    Heap (Max) in Python

    Lately I've been practicing my data structures and algorithms and I came up with this interesting implementation for a max heap:

    class Heap(object):
        """ A max heap, parents have two or less equal or smaller children """
        def __init__(self):
            """ Set the count to 0 and it has no root node """
            self.count = 0
            self.root = None
    
        def heapify(self, iterable):
            """ loops elements in the iterable and adds them to the heap """
            for elem in iterable:
                self.add_item(elem)
    
        def add_item(self, ordinal):
            """ Adds an ordinal item using priority """
            self.count += 1
            if self.count == 1:
                self.root = Node(ordinal)
                return
    
            i = self.count
            current_node = self.root
            new_node = Node(ordinal)
    
            while i != 1:
                if current_node.data < new_node.data:
                    current_node.data, new_node.data = new_node.data, current_node.data
                if i%2 == 0:
                    if current_node.left:
                        current_node = current_node.left
                    else:
                        current_node.left = new_node
                else:
                    if current_node.right:
                        current_node = current_node.right
                    else:
                        current_node.right = new_node
                i /= 2
    
    class Node(object):
        """ Used to traverse the heap """
        def __init__(self, ordinal):
            """ Require data, make the pointers empty (to None) """
            self.data = ordinal
            self.left = None
            self.right = None 
    


    This design of a heap is pretty interesting because it uses the binary representation of Heap.count (the number of elements in the heap) to determine where the next node should go. The algorithm for this is pretty straight forward:

    Increment the `count` by 1         // We're adding a new item 
    `i` is `count`
    `current node` is `root node`
    
    Check 1: if `new node's value` > `current node's value`,
                 then swap(`new node's value`, `current node's value`)
    
    Check 2: if binary(`i`) ends in 0, // AKA is even, (i%2 == 0),
                 then `current node` is `current node's left`
             otherwise `current node` is `current node's right`
    
    Check 3: if `current node` is null
                 then `current node` is `new node`
             otherwise `i` /= 2 and goto Check 1
    

    The end result is a balanced heap! I also added a __str__ method to the Node class for testing:

        def __str__(self):
            return "[%s <- %s -> %s]" % \
                     (str(self.left) if self.left != None \
                                     else 'END', str(self.data), \
                      str(self.right) if self.right != None \
                                     else 'END')
    

    Thursday, January 3, 2013

    One Line Fizz Buzz Test in Python

    Here it is (in all its ugliness):

    print '\n'.join("Fizz"*(i%3==0)+"Buzz"*(i%5==0) or str(i) for i in range(1,101))

    Python is pretty astounding. This little bugger was easy to code, it is somewhat readable, and it is even understandable. Pretty cool, eh?

    Short explanation:
    • '\n'.join(...)
      Every item generated in the join function will be on its own line
    • for i in range(1,101)
      Loop from 1 to 101, giving i the value of each
    • "Fizz"*(i%3==0)+"Buzz"*(i%5==0) or str(i)
      If i is divisible by 3 then keep "Fizz", do the same for "Buzz" with 5. Add the kept values together. Or it will simply output i as a string if it fails to get "Fizz" or "Buzz"

    Tuesday, December 18, 2012

    Latest Project: Lyric Miner

        Towards the end of this past semester, I decided to build a "data miner" in Python, that would contain tools for gathering information from the web. Though I don't have a set direction for where I would like it to go, I did build the first tool: a lyric miner called "LyricThief." My only goal for LT was to simplify the process of Googling lyrics and all that that entails. So, here's what it looks like (please forgive my Linux desktop's matrix-esque windowing):



        As you can see, all it needs is the artist and the song title then viola! It will search a list of sources and bring up the lyrics in this screen:


        From here you can read, edit or save lyrics to a text file.

        As it stands right now, adding sources (or removing them) is easy. All they require is a method to generate what the URL looks like for the site (most lyric sites have a predictable URL format) and a method to parse the URL (I usually do this in 2-4 lines of obfuscated Python code, I know, it's bad). Then, once the source is complete, they are added to the builder and it takes care of the rest.

        Although I haven't tested it in Windows, it should work if Python 2.7 and  PyGTK are installed. In Linux, however, these things are included with most modern distros by default.

    All code is on my github.