Thursday, December 12, 2013

Changes 12/12/13

NSF:

  • Improved the Python script so that it parses the actual HTML of the database search page
  • Running the program (my input follows each prompt, and my comments are interleaved with the output):
  • Insert your search terms: National Science Foundation
    Insert your tag: GOVT 
    After I enter the search terms and tag, the program outputs a usable URL:
    http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.html&r=0&p=1&f=S&l=50&Query=GOVT%2F%22National+Science+Foundation%22&d=PG01 
    It then parses the text of each <a> element in the <table> that holds the results into a list:
    20130333037
    METHODS, SYSTEMS, AND MEDIA FOR DETECTING COVERT MALWARE\n
    20130332859
    METHOD AND USER INTERFACE FOR CREATING AN ANIMATED COMMUNICATION\n
    ...
    20130314948
    Multi-Phase Grid Interface\n
    20130314717
    METHODS AND APPARATUS FOR LASER SCANNING STRUCTURED ILLUMINATION\n     MICROSCOPY AND TOMOGRAPHY\n
     
  • The program uses Python's html.parser module.
  • So far, it only outputs the number and title of each item in the result set as separate list items. Although the program doesn't do much yet, it shows that data can be parsed from the HTML page and displayed.
  • I'm not sure where the odd formatting (line breaks and extra white space) comes from, but it is probably part of the page's HTML itself: html.parser passes text through unchanged, including any newlines and indentation inside the <a> elements.
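A minimal sketch of how the search URL could be built from the two inputs, using the standard library's urllib.parse.urlencode. The helper name build_search_url is hypothetical, and every parameter name and value except Query is copied verbatim from the example URL above; their meanings are not documented here, so they are kept as-is rather than interpreted.

```python
from urllib.parse import urlencode

# Base endpoint copied from the example URL in this post.
BASE_URL = "http://appft.uspto.gov/netacgi/nph-Parser"


def build_search_url(search_terms: str, tag: str) -> str:
    """Build a search URL for the query tag/"search_terms" (hypothetical helper)."""
    # Parameters copied from the example URL; only Query depends on the input.
    # A list of pairs keeps the parameters in the same order as the original.
    params = [
        ("Sect1", "PTO2"),
        ("Sect2", "HITOFF"),
        ("u", "/netahtml/PTO/search-adv.html"),
        ("r", "0"),
        ("p", "1"),
        ("f", "S"),
        ("l", "50"),
        ("Query", f'{tag.strip()}/"{search_terms.strip()}"'),
        ("d", "PG01"),
    ]
    # urlencode quotes with quote_plus: spaces -> '+', '/' -> %2F, '"' -> %22.
    return BASE_URL + "?" + urlencode(params)


print(build_search_url("National Science Foundation", "GOVT "))
# prints the same URL shown in the transcript above
```

Using urlencode instead of concatenating strings by hand means search terms containing spaces, quotes, or slashes are escaped consistently.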
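One way to collect link text inside a table with html.parser is to subclass HTMLParser and track whether the parser is currently inside a <table> and an <a>. This is a sketch under assumptions: the class name TableLinkParser and the sample markup are made up for illustration and do not reproduce the real results page, and unlike the program described above, this version collects links from every table rather than singling out the one that holds the results.

```python
from html.parser import HTMLParser


class TableLinkParser(HTMLParser):
    """Collect the text of every <a> element nested inside a <table> (hypothetical)."""

    def __init__(self):
        super().__init__()
        self.table_depth = 0  # greater than 0 while inside at least one <table>
        self.in_link = False  # True while inside an <a> that sits in a table
        self.items = []       # collected link texts, in document order
        self._buffer = []     # text chunks of the current link

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif tag == "a" and self.table_depth > 0:
            self.in_link = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1
        elif tag == "a" and self.in_link:
            self.in_link = False
            self.items.append("".join(self._buffer))

    def handle_data(self, data):
        # Text is passed through unchanged, so newlines and indentation
        # from the HTML source end up in the collected strings.
        if self.in_link:
            self._buffer.append(data)


# Made-up markup for illustration only.
sample = """
<table>
<tr><td><a href="/r1">20130333037</a></td>
<td><a href="/r1">METHODS, SYSTEMS, AND MEDIA FOR DETECTING COVERT MALWARE
</a></td></tr>
</table>
<p><a href="/help">Help</a></p>
"""

parser = TableLinkParser()
parser.feed(sample)
parser.close()
print(parser.items)
# ['20130333037', 'METHODS, SYSTEMS, AND MEDIA FOR DETECTING COVERT MALWARE\n']
```

The "Help" link is skipped because it sits outside the table, and the trailing "\n" in the second item comes straight from the line break in the markup, which matches the kind of stray whitespace seen in the output above.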
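If the extra line breaks and indentation do come from the page source, they can be collapsed after parsing. The sketch below assumes the parsed list alternates number, title, number, title, as in the output shown above; the helper names clean_title and pair_items are hypothetical.

```python
def clean_title(raw: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    # str.split() with no argument splits on any whitespace run and
    # discards leading and trailing whitespace.
    return " ".join(raw.split())


def pair_items(items):
    """Turn a flat [number, title, number, title, ...] list into (number, title) tuples."""
    cleaned = [clean_title(item) for item in items]
    # Even positions hold numbers, odd positions hold titles (assumed layout).
    return list(zip(cleaned[0::2], cleaned[1::2]))


raw = [
    "20130314948",
    "Multi-Phase Grid Interface\n",
    "20130314717",
    "METHODS AND APPARATUS FOR LASER SCANNING STRUCTURED ILLUMINATION\n     MICROSCOPY AND TOMOGRAPHY\n",
]
for number, title in pair_items(raw):
    print(number, title)
# 20130314948 Multi-Phase Grid Interface
# 20130314717 METHODS AND APPARATUS FOR LASER SCANNING STRUCTURED ILLUMINATION MICROSCOPY AND TOMOGRAPHY
```

Pairing the entries also makes it easier to treat each patent application as one record instead of two unrelated list items.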
