Thursday, December 19, 2013

Changes 12/19/13

NSF:

  • (Incomplete) List of fields and their xml tags:
  • Application Number:
    <application-number> 
    <doc-number>10044899</doc-number>
  • Cross Reference to Related Application:
    <cross-reference-to-related-applications>
     ...
  • Inventors:
    <inventors>
    <first-named-inventor>
    <inventor>
    ... 
  • Abstract:
    <subdoc-abstract>
  • Title:
    <title-of-invention>
  • Document Date:
    <document-date>20021003</document-date>  
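
A minimal sketch of pulling the fields listed above out of one record with Python's built-in ElementTree. The tag names come from the list above; the <record> wrapper, the <paragraph> tag, and the title and abstract text are made up for illustration, and the real file may nest things differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record: only the tag names from the list above are real.
SAMPLE = """<record>
  <document-date>20021003</document-date>
  <application-number><doc-number>10044899</doc-number></application-number>
  <title-of-invention>Example title</title-of-invention>
  <subdoc-abstract><paragraph>Example abstract text.</paragraph></subdoc-abstract>
</record>"""

# Map a friendly field name to the path of the tag that holds it.
FIELDS = {
    "application_number": ".//application-number/doc-number",
    "document_date": ".//document-date",
    "title": ".//title-of-invention",
    "abstract": ".//subdoc-abstract",
}


def extract_fields(root):
    """Return a dict of field name -> text, or None when the tag is missing."""
    result = {}
    for name, path in FIELDS.items():
        element = root.find(path)
        if element is None:
            result[name] = None
        else:
            # Join all nested text and collapse runs of whitespace.
            result[name] = " ".join("".join(element.itertext()).split())
    return result


record = extract_fields(ET.fromstring(SAMPLE))
print(record)
```

Keeping the tag paths in one dictionary should make it easy to add the inventor and cross-reference fields once their structure is clearer.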

Tuesday, December 17, 2013

Changes 12/17/13

NSF:

  • Installed lxml (html and xml parser) for use with BeautifulSoup
  • Read google doc on the assignment
  • XML and dealing with large amounts of data aren't something I am super familiar with so there is definitely a learning curve.
  • I understand the basics of xml (as it is not that different from html), but the way the nsf patents xml file is structured is hard to understand.
  • The tags containing data are formatted as <BXXX> and iterate up from <B100>.  I need to figure out which values correspond to the specific data fields we are looking for.
  • What I have found so far:
    • <B540> is the title
  • The main problem I am having is that I have no idea what some of the text fields are supposed to be, for example:
    <B511><PDAT>E21B 2510</PDAT></B511>
  • This line may not be important to what we are trying to find, but the problem is that I don't know how to tell which fields are which
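
One way to work out which <BXXX> tag holds which field might be to list every such tag once with a sample of its text and then compare against a known patent. This is only a sketch: the <doc> wrapper and the title text are invented, and the real file may need to be split into individual records first.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: B511 is copied from above, the B540 text is made up.
SAMPLE = """<doc>
  <B511><PDAT>E21B 2510</PDAT></B511>
  <B540><PDAT>Hypothetical title</PDAT></B540>
</doc>"""

B_TAG = re.compile(r"B\d{3}")  # matches tags such as B100, B511, B540


def survey_b_tags(root):
    """Return {tag: first sample text} for every BXXX tag under root."""
    samples = {}
    for element in root.iter():
        if B_TAG.fullmatch(element.tag) and element.tag not in samples:
            text = " ".join("".join(element.itertext()).split())
            samples[element.tag] = text
    return samples


samples = survey_b_tags(ET.fromstring(SAMPLE))
for tag in sorted(samples):
    print(tag, "->", samples[tag])
```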

Monday, December 16, 2013

Changes 12/16/13

NSF:

  • I forgot my laptop today, so I experimented with BeautifulSoup's syntax on another computer.
  • The code I've written takes a downloaded version of a database search for "GOVT/"NATIONAL SCIENCE FOUNDATION"" and loads it into BeautifulSoup.  It then iterates through the <tbody> element with the statement for child in soup.table.tbody.children:
  • I still can't quite figure it out. The main problem I'm having seems to be related to python's duck typing: I cannot figure out how to properly check if an object is of a certain type.
  • The code in question is: isinstance(child, NavigableString), and the error I get is something like: "Type NavigableString is not valid" or something like that. I could convert the type to a string and check if it equals "bs4.elements.NavigableString", but that doesn't seem like a very elegant solution, and I want to learn how to do things "the right way"
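
A possible explanation, which is only a guess since the exact error message wasn't recorded: isinstance() needs the class itself to be in scope, so NavigableString has to be imported from bs4 before it can be used. A minimal sketch, with a made-up table standing in for the downloaded search page:

```python
from bs4 import BeautifulSoup, NavigableString

# Made-up stand-in for the downloaded search results page.
html = (
    "<table><tbody>\n"
    "<tr><td>20130333037</td></tr>\n"
    "<tr><td>20130332859</td></tr>\n"
    "</tbody></table>"
)
soup = BeautifulSoup(html, "html.parser")

rows = []
for child in soup.table.tbody.children:
    # The whitespace between rows shows up as NavigableString children; skip it.
    if isinstance(child, NavigableString):
        continue
    rows.append(child.get_text(strip=True))

print(rows)
```

With the import in place there should be no need to compare type names as strings.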

Friday, December 13, 2013

Changes 12/13/13

NSF:

  • Spent the whole day trying to get BeautifulSoup4 to work with python3 but it just won't work
  • It says it will automatically convert but it doesn't even install to the python3 folder, only the python2.7/dist-packages/ folder.  I tried manually copying the bs4 folder to the python3/dist-packages/ folder and running 2to3, but it still doesn't work.  It just comes up with syntax errors.
  • Well, I got it to work after over an hour.  The python 2to3 docs word the functionality in a way that makes it seem like it will recursively iterate through directories, but apparently it doesn't.  I fixed the problem by manually running it for each directory in the bs4 folder.
  • I think by recursive it meant recursive for the files in the specific directory it was run in, not recursive in the directory tree.

Thursday, December 12, 2013

Changes 12/12/13

NSF:

  • Improved the python file to parse the actual html of the database search page
  • Running the program (user input in italics and my comments in bold):
  • Insert your search terms: National Science Foundation
    Insert your tag: GOVT 
    After entering the search term and tag, the program outputs a useable url
    http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.html&r=0&p=1&f=S&l=50&Query=GOVT%2F%22National+Science+Foundation%22&d=PG01 
    It then parses the HTML from any <a> element in the <table> that holds all the data into a list 
    20130333037
    METHODS, SYSTEMS, AND MEDIA FOR DETECTING COVERT MALWARE\n
    20130332859
    METHOD AND USER INTERFACE FOR CREATING AN ANIMATED COMMUNICATION\n
    ...
    20130314948
    Multi-Phase Grid Interface\n
    20130314717
    METHODS AND APPARATUS FOR LASER SCANNING STRUCTURED ILLUMINATION\n     MICROSCOPY AND TOMOGRAPHY\n
     
  • The program makes use of python's html.parser library
  • So far, it just outputs the name and number of each item in the dataset as a separate list item. Although the program doesn't do much now, it demonstrates successfully parsing and displaying data gathered from the html page 
  • I'm not sure why it has odd formatting such as line breaks and white space, but I guess that is just part of the page parsing from the database.
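
A minimal sketch of the approach described above, using html.parser to collect the text of every <a> element inside a <table>. The class and function names are hypothetical and the sample HTML is made up. It also collapses whitespace, on the assumption that the odd line breaks come from line breaks inside the link text in the page source.

```python
from html.parser import HTMLParser


class TableLinkParser(HTMLParser):
    """Collect the text of each <a> element that appears inside a <table>."""

    def __init__(self):
        super().__init__()
        self.in_table = False
        self.in_link = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.in_table = True
        elif tag == "a" and self.in_table:
            self.in_link = True
            self.links.append("")

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_table = False
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        # Text can arrive in several chunks, so append to the current link.
        if self.in_link:
            self.links[-1] += data


def extract_table_links(html):
    parser = TableLinkParser()
    parser.feed(html)
    # Collapse line breaks and repeated spaces into single spaces.
    return [" ".join(text.split()) for text in parser.links]


sample = """<table>
<tr><td><a href="#">20130314717</a></td>
<td><a href="#">METHODS AND APPARATUS FOR LASER SCANNING STRUCTURED ILLUMINATION
     MICROSCOPY AND TOMOGRAPHY
</a></td></tr>
</table>"""

links = extract_table_links(sample)
print(links)
```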

Wednesday, December 11, 2013

Changes 12/11/13

NSF:

  • http://appft.uspto.gov/netahtml/PTO/search-adv.html has all the codes for searching specific fields of the applications 
  • Created a basic python program to generate a search url and get the html.
  • import urllib.request
    
    search = input("Insert your search terms: ")
    tag = input("Insert your tag: ")
    urlstring = "http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.html&r=0&p={{page}}&f=S&l=50&Query={{tag}}%2F%22{{query}}%22&d=PG01" 
    # Replace placeholder strings with actual input 
    urlstring = urlstring.replace('{{page}}', str(1)) 
    urlstring = urlstring.replace('{{tag}}', tag)
    urlstring = urlstring.replace('{{query}}', search.strip().replace(' ', '+'))
    print(urlstring)
     
    # Get raw html from the query url 
    response = urllib.request.urlopen(urlstring)
    html = response.read()
    print(html)
  • So entering "National Science Foundation" for search and "GOVT" for tag generates a query identical to the example in the email
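
An alternative sketch for building the same url with urllib.parse.urlencode instead of string replacement, so that characters other than spaces in the search terms are escaped too. The function name build_search_url is hypothetical; the parameter values are taken from the url string above.

```python
from urllib.parse import urlencode

BASE_URL = "http://appft.uspto.gov/netacgi/nph-Parser"


def build_search_url(search, tag, page=1):
    """Build a search url for one field tag and a quoted phrase."""
    params = {
        "Sect1": "PTO2",
        "Sect2": "HITOFF",
        "u": "/netahtml/PTO/search-adv.html",
        "r": 0,
        "p": page,
        "f": "S",
        "l": 50,
        # The query has the form TAG/"search terms"
        "Query": '{}/"{}"'.format(tag.strip(), search.strip()),
        "d": "PG01",
    }
    return BASE_URL + "?" + urlencode(params)


url = build_search_url("National Science Foundation", "GOVT")
print(url)
```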

Friday, December 6, 2013

Changes 12/6/13

Math Drill:

  • Got all students functionality (adding, removing, and saving) to work with SQL
  • Fixed an error with the add students regex causing spaces to be deleted.  It was '[\W-]+' when it should have been '[\W-]+ '
  • Added getStudents() convenience method for getting students from database. It executes the command SELECT * FROM students and returns all results
  • Added html page for /admin/ with links to admin functions, namely logging out and editing students
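
A rough sketch of how the students functionality could look with sqlite3. The table layout (a single name column), the function names, and the regex are assumptions for illustration, not the actual Math Drill code. The pattern here keeps letters, digits, underscores, and spaces, and strips everything else.

```python
import re
import sqlite3


def sanitize_name(raw):
    """Remove everything except word characters and spaces."""
    return re.sub(r"[^\w ]+", "", raw).strip()


def add_student(conn, raw_name):
    name = sanitize_name(raw_name)
    if name:
        # Placeholders keep user input out of the SQL text itself.
        conn.execute("INSERT OR IGNORE INTO students (name) VALUES (?)", (name,))
        conn.commit()


def remove_student(conn, name):
    conn.execute("DELETE FROM students WHERE name = ?", (name,))
    conn.commit()


def get_students(conn):
    """Return every student name currently in the table."""
    return [row[0] for row in conn.execute("SELECT * FROM students")]


# Hypothetical schema: one row per student.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT PRIMARY KEY)")

add_student(conn, "Mary Ann!")
add_student(conn, "Bob <b>")
remove_student(conn, "Bob b")
print(get_students(conn))
```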

Thursday, December 5, 2013

Changes 12/5/13

Math Drill:

  • Worked on migrating the add and remove students functionality over to SQL
  • Adding students works, but removing students does not

Wednesday, December 4, 2013

Changes 12/4/13

Math Drill:

  • Added new tables to database: Students and Questions
  • These will replace the .txt files which held the student names and question/image pairs
  • Disallowed access to /admin/ pages without signing in first

NSF:

  • Explored databases
  • The Export Awards excel spreadsheet looks like it shows various information about patents awarded due to NSF funding.  I assume that these are the reports in which people properly cited NSF as a grant giver, because I understood one of the main problems the NSF is facing is finding patents that weren't correctly filled out with proper NSF credit on them.  All the patents in the spreadsheet, however, have the same Funding Agency and Awarding Agency Code, 4900, which I'm guessing is the NSF's id.
  • If there is some way to view businesses attached to the patents, then compiling earnings based on these patents will be possible.
  • http://patents.reedtech.com/pgrbft.php#2012 looks like it's a repository of xml patent data
  • I downloaded one xml file and looked at it.  It was hard to figure out what it was showing because it had no styling, but it looked like a list of a bunch of different patents all in one file.  I couldn't tell what the number after ipg meant (the file I downloaded was named ipg131203.xml)
  • The StateObligations.xml and InstitutionObligations.xml files look like they are listing the amount of different types of "obligations" by state or institution.  I assume obligations are promises of grant money.
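
If the downloaded file really is many patents in one file, a standard XML parser will probably reject it as a whole. A minimal sketch for splitting it, assuming (not verified here) that every patent starts with its own <?xml ...?> declaration at the beginning of a line; the function name is hypothetical and the sample content is made up:

```python
import io


def split_xml_documents(lines):
    """Yield each embedded XML document as a single string."""
    current = []
    for line in lines:
        # A new declaration marks the start of the next document.
        if line.startswith("<?xml") and current:
            yield "".join(current)
            current = []
        current.append(line)
    if current:
        yield "".join(current)


# Made-up stand-in for a file such as ipg131203.xml.
sample_file = io.StringIO(
    '<?xml version="1.0"?>\n<patent>first</patent>\n'
    '<?xml version="1.0"?>\n<patent>second</patent>\n'
)
documents = list(split_xml_documents(sample_file))
print(len(documents))
```

Because it reads line by line, the same function should work on an open file object without loading the whole download into memory.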

Changes for 12/2/13

Math Drill:

  • Changed path attribute on cookie to allow bottle to find it (before it was only available on /login/submit/, now it is available throughout the site)
  • Added a list to store pairs of usernames and their subsequent random number for the secret attribute of the cookie
  • Began troubleshooting rare cases of duplicate accounts in the list

Tuesday, December 3, 2013

Changes for 12/3/13

Math Drill:

  • Added checkAuth() method for checking the legitimacy of the cookie
  • When a user is already present in the accounts list and they sign in again, the old list item is removed
  • Added code to show the logged in user's username on the admin/students/ page
  • Added ability for the user to log out
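
A simplified sketch of the session bookkeeping described above, not the actual Math Drill code. It assumes a dictionary keyed by username instead of a list, which makes a duplicate entry for the same user impossible because signing in again overwrites the old token. All names here are hypothetical.

```python
import hmac
import secrets

# username -> random token stored in the user's cookie
sessions = {}


def sign_in(username):
    """Create a fresh token for the user, replacing any older one."""
    token = secrets.token_hex(16)
    sessions[username] = token
    return token


def check_auth(username, token):
    """Return True only if the token matches the one issued at sign-in."""
    expected = sessions.get(username)
    if expected is None:
        return False
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected.encode(), token.encode())


def sign_out(username):
    sessions.pop(username, None)


first = sign_in("testuser")
second = sign_in("testuser")  # signing in again invalidates the first token
print(check_auth("testuser", second), check_auth("testuser", first))
```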

Changes over Thanksgiving Break 11/27/13 - 12/2/13

Math Drill:

  • Added database for storing users and passwords
  • Database stores sha256 salted hashes of passwords
  • Tried some simple SQL injection attacks on the username field.  The python code interfacing with SQL was: cur.execute('SELECT password FROM users WHERE id = "' + username + '"').fetchone()
  • First attack: a"; INSERT INTO users VALUES ("injection", "ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb");" was typed into the username field
  • This results in the following line being sent to sqlite, where bold text is the injected SQL:
    SELECT password FROM users WHERE id = "a"; INSERT INTO users VALUES ("injection", "ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb");""
  • This gave an error: sqlite3.Warning: You can only execute one statement at a time.
  • The second attack doesn't create a new statement, but instead adds SQL logic to the end of the executed statement
  • The SQL code executed looks like: SELECT password FROM users WHERE id = "a" OR ""=""
  • Normally, to sign in, the user would have to enter username(testuser), password(password). However, by using the combination username(a" OR ""="), password(password), one can sign in without knowing a valid username. This username string with injected SQL logic is equivalent to typing the username of the first user in the database, in this case the user "testuser".
  • Fixed injection by changing the execute statement to cur.execute('SELECT password FROM users WHERE id = ?', (username, )).fetchone()
    This protects against injection attacks
  • Sanitized input on the "add students" page by removing non-alphanumeric characters entered by the user
  • Started working on a session cookie
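
A small self-contained sketch that reproduces the second attack and the fix against an in-memory database. The table contents and function names are made up. It uses single quotes around the value, the standard SQL string delimiter, so the payload is a' OR ''=' rather than the double-quoted version above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, password TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("testuser", "placeholder-hash"))


def lookup_unsafe(username):
    # Vulnerable: the username becomes part of the SQL statement.
    query = "SELECT password FROM users WHERE id = '" + username + "'"
    return conn.execute(query).fetchone()


def lookup_safe(username):
    # Fixed: the ? placeholder passes the username purely as data.
    return conn.execute(
        "SELECT password FROM users WHERE id = ?", (username,)
    ).fetchone()


payload = "a' OR ''='"
unsafe_row = lookup_unsafe(payload)  # matches every row, returns the first user
safe_row = lookup_safe(payload)      # no user has this literal id

print(unsafe_row, safe_row)
```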

Website:

  • Changed CSS a bit:
  • Made use of border-bottom and border-top properties instead of using custom <hr> elements
  • Added hover properties to links in the header

Tuesday, November 26, 2013

Changes 11/25/13

Math Drill:

  • Wrote a login.tpl page with user and password fields
  • Researched hashing algorithms and how to safely store passwords
  • Began creating a database to hold username and password and eventually other stuff
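
A minimal sketch of salted SHA-256 password storage with hashlib, the scheme mentioned in the Thanksgiving entry. The function names are hypothetical. A purpose-built function such as hashlib.pbkdf2_hmac is generally considered a stronger choice than a single SHA-256 pass, so treat this as an illustration of salting rather than a recommendation.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None):
    """Return (salt_hex, digest_hex) for the given password."""
    if salt is None:
        salt = os.urandom(16)  # a new random salt for every user
    digest = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return salt.hex(), digest


def verify_password(password, salt_hex, expected_digest):
    """Re-hash the attempt with the stored salt and compare the digests."""
    _, digest = hash_password(password, bytes.fromhex(salt_hex))
    return hmac.compare_digest(digest, expected_digest)


salt_hex, stored = hash_password("password")
print(verify_password("password", salt_hex, stored))
print(verify_password("wrong", salt_hex, stored))
```

Both the salt and the digest would be stored in the database row for the user.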

Friday, November 22, 2013

Changes 11/22/13

Math Drill:

  • Fixed 500 Internal Server error: I typed student.trim() rather than student.strip(), and it didn't print out the stack trace for some reason, making it very hard to troubleshoot
  • This was fixed in two ways:
    • Code was added to make it able to run with both python3 run.py and google_appengine/dev_appserver.py projects/math_drill/:
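    • import os
      import sys

      import bottle
      from bottle import Bottle

      dev = False
      # If the -l (or -local) flag is passed, run with plain python3 instead of Google App Engine
      if len(sys.argv) > 1 and sys.argv[1] in ('-l', '-local'):
          dev = True
          print('RUNNING IN LOCAL MODE (Non Google App Engine)')

      # Locally, use the bottle module itself so @app.route registers on the default app;
      # on GAE, use an explicit Bottle instance
      if dev:
          app = bottle
      else:
          app = Bottle()

      # Locally, resolve file paths relative to this script
      curdir = ''
      if dev:
          curdir = os.path.dirname(os.path.abspath(__file__)) + '/'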


      ...
      # Other Code

      ...

      if dev:
          bottle.run()
      else:
          bottle.run(app=app, server="gae", debug=True)

    • The second, MUCH easier way of fixing the problem is simply to add debug=True to the bottle.run() statement (making it bottle.run(app=app, server="gae", debug=True)). This causes the stack trace to print on the 500 Internal Server Error page itself (although the stack trace still doesn't print in the console)
  • The add/remove students functionality of the /admin/students/ page finally seems to be working as intended

Thursday, November 21, 2013

Changes 11/21/13 Part 2

Math Drill:

  • Made a .vimrc file to set tab to four spaces, which makes python programming much easier
  • Added comments to postStudents() method explaining what certain code does
  • Saved the <textarea> input to a variable so that it can be parsed and added to students[]
  • Learned that you can write mergedList = list1 + list2, which is good to know for future code 
  • Keep running into weird problems where the server will give a 500 page but not output any python errors.  This is frustrating and it is very hard to tell what is going wrong.
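
A plain-Python sketch of the list handling described above, separate from bottle so it can be tested on its own. The checkbox naming (student name plus _delete, with the value Delete) follows the template from the 11/20 entry; the form field name new_students and the function names are made up.

```python
def parse_new_students(textarea_text):
    """Split the textarea into one name per line, dropping blank lines."""
    return [line.strip() for line in textarea_text.splitlines() if line.strip()]


def update_students(students, form):
    """Remove students whose delete box was checked, then add the new names."""
    kept = [s for s in students if form.get(s + "_delete") != "Delete"]
    added = parse_new_students(form.get("new_students", ""))
    return kept + added  # lists can be merged with +


form = {"Bob_delete": "Delete", "new_students": "Cy\n Dee \n\n"}
students = update_students(["Ann", "Bob"], form)

# '<br>'.join gives a quick way to show the list on a debug page.
print("<br>".join(students))
```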

Changes 11/21/13 Part 1

Math Drill:

  • Fixed the list of deletions not checking against the right string
  • Used '<br>'.join(debuglist) to easily print the list of students in html form, this is a pretty useful python function
  • Checking the delete box now removes that student from the students[] list, but not the text file (yet)
  • Got a lot of "internal server error" without a stack trace or anything, so that took a while to fix.  I still don't know what was causing that
  • Added a textarea for entering additional students

Wednesday, November 20, 2013

Changes 11/20/13

Post-Note:

Math-Drill:

  • Added page /admin/students, which dynamically generates a list of all the students' names with checkboxes next to them, using embedded code in the template file
  • % for student in students:
        {{student}}: <input type="checkbox" name="{{student}}_delete" value="Delete"><br>
    % end
  • Added method to POST /admin/students/submit
  • Submit button returns a page listing which students are being kept and deleted (although nothing actually changes yet)

Tuesday, November 19, 2013

Changes 11/19/13

Post Note:

  • Fixed the "_blank" attribute on the form, which was causing a new tab to open
  • Updated css to style the page
  • Added bottle.redirect('/') so that the page refreshes when submit is pressed
  • Updated form to have two text boxes, one for the submitter name and one for the actual text
  • Added checks to make sure the data has been submitted properly without blank fields

Friday, November 15, 2013

Changes 11/15/13

Post-note (Bottle app):

  • Simple bottle app to look at how POST works for bottle and web forms in general
  • Text box allows for text entry, and when submit is clicked, the text from the text box is put in a variable, which is then displayed in an <h2> element
  • Code was largely copied from Math Drill and adapted
  • Learned about the app.post() method and how it interfaces with the webpage
  • Learned about the python global keyword and how it is used
  • Interestingly, the global message variable seems to clear itself after a while; I assume this is due to Google's hosting servers wanting to save resources
  • App was uploaded to Google App Engine (after learning the process with Math Drill it was really easy, too)
  • App is live at www.post-note.appspot.com
  • Styling and other stuff is eventually going to be added
  • The code will probably be cleaned up, since this serves as a good example for how POST works in bottle
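
A stripped-down sketch of the global keyword as used here, without bottle. The function names are hypothetical. The value lives only in the memory of the running process, so one plausible reason for it clearing is that the hosting service stops and restarts idle instances; that is an assumption and was not verified.

```python
import html

message = ""  # module-level variable shared by all requests in this process


def post_message(text):
    """Stand-in for the POST handler: store the submitted text."""
    global message  # without this line, the assignment would create a local variable
    message = text.strip()


def render():
    """Stand-in for the page: show the stored text in an <h2> element."""
    return "<h2>{}</h2>".format(html.escape(message))


post_message("  hello <b>  ")
page = render()
print(page)
```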

Thursday, November 14, 2013

Changes 11/14/13

Math-Drill:

  • Updated to Version 1-1:
  • Added new images that better show how the app will be used
  • Apparently in app.yaml you need threadsafe: set to something
  • Also I found this resource for building the app.yaml file
  • Interestingly, after updating the app, you have to manually set the new version as default by going to the versions tab

App Engine:

Wednesday, November 13, 2013

Changes 11/13/13

Web Design:

  • Added Math Drill link to homepage banner
  • Added styling for an <hr> element with class=solid_dark, which lets a border be added only on the top and bottom of an element, not its sides (used in the banner and navigation sections so far)
  • Couldn't figure out why the <hr> element has a 1px gap under the <nav> element (any ideas?)

Tuesday, November 12, 2013

Changes 11/12/13

Web Design:

  • Switched <link> tag loading stylesheet to style="@import(general.css);"
  • <link> tag documentation for html5 is very confusing, says if you want to use it in body you should use the itemprop attribute instead of rel, but I couldn't figure out what to set itemprop to. w3 <link> docs
  • Tried to figure out why jQuery code wasn't picking a random color on page load, but couldn't figure out the syntax

Friday, November 8, 2013

Changes 11/8/13

Web Design:

  • Made padding on the header links table look better
  • Changed what the header said
  • Added links to blog and another page to header
  • Added a custom <hr> element to create a border only on the bottom of the header
  • For some reason the page won't validate, and I couldn't figure out why. The error was:
    Line 56, Column 61: Element link is missing required attribute property. <link href="css/header.css" rel="stylesheet" type="text/css">

Thursday, November 7, 2013

Changes 11/7/13

Django:

  • Read some of the Django documentation

Web Design:

  • Set body margin to 0 on page
  • Started working on a header for the page
  • Learned about the <link> element for linking to css stylesheets; apparently it's better to use than @import url()

Wednesday, November 6, 2013

Changes 11/6/13

The post below will not be updated, please refer to this page instead

Converting a Bottle program to be Google App Engine compatible:

  1. Downloaded Google App Engine Python SDK and used google_appengine/dev_appserver.py math_drill/ rather than python3 run.py to build while testing
  2. Added from bottle import * in the import statements (I think that fixed something)
  3. Added app = Bottle() in order to:
    • Change all of the default @bottle.route() statements to @app.route(), because the defaults were causing errors
    • Change the bottle.run() statement to bottle.run(app=app, server="gae")
  4. Created a file app.yaml:
    application: math-drill
    version: 1
    runtime: python27 #Python3 not supported, so 2.7 was used
    api_version: 1
    threadsafe: no #One tutorial said to do this, but since threads are not used it probably doesn't matter
    
    handlers:
    - url: /static
      static_dir: static
    
    - url: /.*
      script: run.app #Using run.py causes strange debug output to show up on the webpage
    
  5. With that all set up, it was uploaded to GAE using google_appengine/appcfg.py update math_drill/

Resources Used

Monday, November 4, 2013

Changes 11/4/13

Bottle:

  • Did a ton of troubleshooting and fixing and got GAE to accept the app
  • Added code app = Bottle(), and changed bottle.run() to app.run, etc
  • Almost uploaded it but didn't have time

Friday, November 1, 2013

Changes 11/1/13

Bottle:

  • Fixed refresh button, making it use location.reload(true) instead of history.go(0)
  • Uploaded math_drill to students.gctaa.net so it can be added to bottle zoo

SO PROUD

  • Installed google app engine and set up yaml file
  • I think it isn't finding the template directory though because the page loads with no lines of html or anything

Thursday, October 31, 2013

Changes 10/31/13

Bottle:

  • Updated drill.css and experimented with styling
  • Tried to add a button that would refresh the page but it doesn't change the image

Wednesday, October 30, 2013

Changes 10/30/13

Bottle:

  • Updated web app to load students names from students.txt
  • Added option to load css in the base.tpl
  • Added file drill.css to start making the drill page more visually appealing

SQL:

  • Read Manga

Tuesday, October 29, 2013

Changes 10/29/13

Bottle:

  • Updated web app to load local images correctly
  • Loads from /static/img
  • Added iteration to scan /static/img/ for image files, then choose a random one
  • Added crappily drawn images to represent real problems
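
A minimal sketch of scanning a directory for image files and picking one at random. The function name and the list of extensions are assumptions; the real app reads from /static/img/.

```python
import os
import random

# Assumed set of image extensions; adjust to whatever the app actually serves.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif")


def random_image(directory):
    """Return the filename of a random image in directory, or None if there are none."""
    images = [
        name
        for name in sorted(os.listdir(directory))
        if name.lower().endswith(IMAGE_EXTENSIONS)
    ]
    if not images:
        return None
    return random.choice(images)
```

For example, random_image("static/img") would return one of the image filenames in that folder, which the template can then use as the src of the image.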

Monday, October 28, 2013

Changes 10/28/13

Web Design:

  • Looked at jQuery changes on index.html page

Bottle:

  • Updated older blog post to link to git repo
  • Updated run.py:
    • Added template "base.tpl" that is essentially a minimal page
    • Added template "drill.tpl" that contains variables for an image (with src and alt variables) and a text element
    • Added code to populate the drill.tpl file with proper variables
    • Experimented with ways to load a local image on the webpage, but was unsuccessful in getting it to work (although I think I got close)

Friday, October 25, 2013

Changes 10/25/13

Web Design:

  • Moved the background change script to a separate file called main.js
  • Played with the onload attribute of the <body> element, got it so that it would auto run the init() function, which changes the background
  • Updated the button on the main page so that it now calls init(), rather than the old function from when the script was embedded in index.html
  • Added the <script> header and <body> onload attribute, making the background change on all pages

Thursday, October 24, 2013

Changes 10/24/13

Bottle:

  • Added extremely basic functionality to run.py, such as static file loading
  • Pushed the changes to the git repo

Wednesday, October 23, 2013

Changes 10/23/13

Web Design:

  • Tried to separate index.html and the color change script, but it seemed too complicated to do with limited javascript knowledge.
  • Changed color change javascript to be in a function and added a button to call it
  • Added an onload attribute to the <body> element to call the function

Bottle:

  • Created math_drill repository

Tuesday, October 22, 2013

Changes 10/22/13

Web Design:
  • Added custom fonts for <body> and <nav> elements

SQL:
  • Discussed how the User Interface for the check it out database should function
  • Downloaded bottle zoo from launchpad.net

Monday, October 21, 2013

Changes 10/21/13

Web Design:

  • Added link to blog at the top of the page, and used relative position to bring it to the very top
  • Moved the <nav> element listing links to the various pages out of the <div> element so that it takes up the whole width of the <body>
  • Used relative position to fix a double outline on the <nav> element
  • Moved some styling elements from general.css (the css sheet loaded on all pages) to specific pages that needed them
  • Added pattern I made in photoshop to serve as the background image
  • Added javascript to alternate main page's background between 10 color palettes

SQL: 

  • Looked at the localhost server and noticed that adding items only worked on the loans table, not people or items

Tuesday, October 15, 2013

Web Design up until 10/15/13

So far this year I:
  1. Learned about all basic html elements
  2. Created a basic webpage using the various elements (<p>, <a>, <h1>, etc)
  3. Researched CSS style syntax and read a bit of the Getting Down with CSS online book
  4. Updated homepage to have more text and images
  5. Added CSS to some of the pages, including the homepage.  This added table formatting, background images, font changes, centered images, etc.
The Website so far: