Programming Stuff: Changes 12/12/13

Thursday, December 12, 2013

NSF:

Improved the Python file so that it parses the actual HTML of the database search page.
Running the program (the text after each prompt is user input; the lines in between are my comments):

Insert your search terms: National Science Foundation
Insert your tag: GOVT

After the search term and tag are entered, the program outputs a usable URL:
http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.html&r=0&p=1&f=S&l=50&Query=GOVT%2F%22National+Science+Foundation%22&d=PG01
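A URL like the one above can be assembled with the standard library. This is only a rough sketch of how it could be done, not necessarily how my program does it: the function name `build_search_url` and the constant `BASE_URL` are made up for the example, and the fixed parameters are simply copied from the URL shown.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL printed by the program.
BASE_URL = "http://appft.uspto.gov/netacgi/nph-Parser"


def build_search_url(terms, tag):
    """Build a search URL for the given terms and field tag (hypothetical helper)."""
    # The query has the form TAG/"search terms", e.g. GOVT/"National Science Foundation".
    query = '{}/"{}"'.format(tag, terms)
    # A list of pairs keeps the parameters in the same order as the URL above.
    params = [
        ("Sect1", "PTO2"),
        ("Sect2", "HITOFF"),
        ("u", "/netahtml/PTO/search-adv.html"),
        ("r", "0"),
        ("p", "1"),
        ("f", "S"),
        ("l", "50"),
        ("Query", query),
        ("d", "PG01"),
    ]
    # urlencode percent-encodes '/' and '"' and turns spaces into '+'.
    return BASE_URL + "?" + urlencode(params)


url = build_search_url("National Science Foundation", "GOVT")
print(url)
```

Using `urlencode` means the quotes, slashes, and spaces in the query do not have to be escaped by hand.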

It then parses the HTML and collects the text of every <a> element inside the <table> that holds the results into a list:

20130333037
METHODS, SYSTEMS, AND MEDIA FOR DETECTING COVERT MALWARE\n
20130332859
METHOD AND USER INTERFACE FOR CREATING AN ANIMATED COMMUNICATION\n

...

20130314948
Multi-Phase Grid Interface\n
20130314717
METHODS AND APPARATUS FOR LASER SCANNING STRUCTURED ILLUMINATION\n     MICROSCOPY AND TOMOGRAPHY\n

The program makes use of Python's html.parser module.
So far, it just outputs the number and name of each item in the dataset as a separate list item. Although the program doesn't do much yet, it demonstrates that the data on the HTML page can be parsed and displayed successfully.
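For anyone curious, the parsing step can be sketched roughly like this with `html.parser`. The class name `TableLinkParser` and the sample HTML are invented for the illustration (the real page is much bigger and its markup may differ), so treat it as a minimal version of the idea rather than the exact program.

```python
from html.parser import HTMLParser


class TableLinkParser(HTMLParser):
    """Collect the text of every <a> element that appears inside a <table>."""

    def __init__(self):
        super().__init__()
        self.table_depth = 0   # how many <table> elements we are currently inside
        self.in_link = False   # True while inside an <a> within a table
        self.items = []        # collected link texts, one entry per <a>

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif tag == "a" and self.table_depth > 0:
            self.in_link = True
            self.items.append("")  # start a new entry for this link

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        # Text may arrive in more than one piece, so append to the current entry.
        if self.in_link:
            self.items[-1] += data


# Hypothetical stand-in for the results page.
SAMPLE_HTML = (
    '<a href="/help">Help</a>'
    "<table>"
    '<tr><td><a href="#">20130314948</a></td>'
    '<td><a href="#">Multi-Phase Grid Interface\n</a></td></tr>'
    "</table>"
)

parser = TableLinkParser()
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.items)
```

The link outside the table is skipped, and the text inside each link is stored exactly as it appears in the page, including any trailing line break.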
I'm not sure why the titles have odd formatting such as line breaks and extra white space, but I guess it is already in the page's HTML and simply gets carried over when the page is parsed.
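If the extra white space really does come from the page source, one possible cleanup is to collapse every run of whitespace into a single space. The helper name `clean_title` is just something made up for this sketch; it is not part of the program yet.

```python
def clean_title(raw):
    """Collapse line breaks and repeated spaces into single spaces (hypothetical helper)."""
    # str.split() with no arguments splits on any run of whitespace
    # and drops leading and trailing whitespace.
    return " ".join(raw.split())


raw_title = (
    "METHODS AND APPARATUS FOR LASER SCANNING STRUCTURED ILLUMINATION\n"
    "     MICROSCOPY AND TOMOGRAPHY\n"
)
print(clean_title(raw_title))
```

This leaves the application numbers untouched, since they contain no whitespace to begin with.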
