Changes 2/27/14
NSF:
- Verified that reading .breakpoint behaves correctly
- Realized that I was probably getting low download speeds due to being on Wi-Fi rather than Ethernet
- Added a check for the <patent-application-publication> tag; the program will now force-stop if the tag is not found anywhere.
- This is temporary behavior that will let me pinpoint the date on which the tag changes.
- Added a sort to the list of URLs. It now sorts them from oldest to newest, whereas before the order followed the page layout. It does this by using a regex to remove every non-digit character, then sorting the resulting digit strings alphabetically. Because they are compared as strings, leading zeros are kept (010 stays 010 rather than becoming 10).
- I still haven't gotten anything logged to the CSV file, though.
- I changed the NSF check slightly: when searching for the string 'nsf', it no longer removes spaces, so that a word that merely contains 'nsf', such as 'transfer', does not trigger a false positive. When searching for 'nationalsciencefoundation', it still ignores spaces.
- Added output for when the government interest statement is found, but nothing is ever printed, so I may have gotten the tag wrong. I will look into this tomorrow.
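The stop-if-missing tag check mentioned above could be sketched like this. It assumes the downloaded file's contents are already in a string; the function name `require_tag` and the sample text are hypothetical, and the check is a plain substring search rather than a full XML parse.

```python
import sys

# Hypothetical constant for the tag named in the notes. Leaving off the closing '>'
# also matches an opening tag that carries attributes.
REQUIRED_TAG = "<patent-application-publication"


def require_tag(xml_text: str, tag: str = REQUIRED_TAG) -> None:
    """Stop the program if the required tag never appears in the text."""
    if tag not in xml_text:
        # sys.exit raises SystemExit, which halts the script unless something catches it
        sys.exit(f"Required tag {tag!r} not found; stopping.")


if __name__ == "__main__":
    sample = "<patent-application-publication></patent-application-publication>"
    require_tag(sample)  # tag is present, so this returns normally
    print("tag found, continuing")
```

Because it is meant to be temporary, a hard stop keeps it simple: the first file that lacks the tag halts the run, and that file's date shows where the format changed.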
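The oldest-to-newest URL sort described above might look roughly like this. It is a sketch: the function names and sample URLs are hypothetical. The digits are kept as strings so leading zeros survive. Alphabetical order matches numeric order only when the digit strings are the same length (as strings, '9' sorts after '10'), and every digit in a URL, including any in the directory path, becomes part of the sort key.

```python
import re

# Matches any character that is not a digit 0-9
NON_DIGITS = re.compile(r"[^0-9]")


def digits_only(url: str) -> str:
    """Strip every non-digit character, keeping leading zeros."""
    return NON_DIGITS.sub("", url)


def sort_urls_oldest_first(urls: list[str]) -> list[str]:
    """Sort URLs by their digit strings, compared alphabetically as text."""
    return sorted(urls, key=digits_only)


if __name__ == "__main__":
    urls = [
        "https://example.com/ipa140227.zip",
        "https://example.com/ipa050106.zip",
        "https://example.com/ipa130103.zip",
    ]
    # prints the ipa050106, ipa130103, ipa140227 URLs in that order
    for url in sort_urls_oldest_first(urls):
        print(url)
```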
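A plain substring search still finds 'nsf' inside 'transfer' whether or not spaces are stripped, so this sketch of the two-part NSF check swaps in a regex word boundary (`\b`) for the acronym. The full name is matched only after all whitespace is removed. The function name and the lowercasing step are assumptions, not the program's actual code.

```python
import re

# 'nsf' as a whole word only: \b prevents matches inside words such as 'transfer'
NSF_WORD = re.compile(r"\bnsf\b")


def mentions_nsf(text: str) -> bool:
    """Return True if the text mentions the NSF by acronym or by full name."""
    lowered = text.lower()
    if NSF_WORD.search(lowered):
        return True
    # For the full name, drop all whitespace so line breaks or extra spaces don't matter
    squashed = re.sub(r"\s+", "", lowered)
    return "nationalsciencefoundation" in squashed


if __name__ == "__main__":
    print(mentions_nsf("This work was supported by NSF grant 123."))
    print(mentions_nsf("A transfer of rights."))
```

One trade-off of the word boundary: an acronym glued directly to letters or digits (for example "NSF0812345") is not matched, because there is no boundary between 'f' and '0'.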
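To check whether a tag name is wrong, as suspected for the government interest statement, one option is to list every tag that actually appears in a document and look for the expected name. This sketch assumes a single well-formed XML document held in a string. The helper name `tag_names` is hypothetical, and the real tag for the statement is not assumed here.

```python
import xml.etree.ElementTree as ET


def tag_names(xml_text: str) -> set[str]:
    """Return every distinct element tag name in one XML document."""
    root = ET.fromstring(xml_text)
    # Element.iter() with no argument walks the root element and all of its descendants
    return {element.tag for element in root.iter()}


if __name__ == "__main__":
    doc = "<root><section><para/></section><para/></root>"
    for name in sorted(tag_names(doc)):
        print(name)
```

Printing the sorted set for one real document makes a misspelled or renamed tag easy to spot by eye.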