Monday, March 31, 2014

Changes 3/31/14

Ruby:

  • Worked more on classes
  • Learned about the @@var class variable, which is shared across all instances of a class rather than belonging to any one of them
  • Saw that there was a global variable $var, but couldn't figure out how to use it
  • Learned about attr_accessor, attr_reader, and self.var
  • Learned about custom setter methods with def var=(arg), which let you modify a value before setting it. In Java this was always annoying to deal with.
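
A minimal sketch of the features listed above, for reference. The Account class, its attributes, and the greet helper are hypothetical names made up for illustration; only @@var, $var, attr_reader, attr_accessor, self.var and def var=(arg) come from the notes.

```ruby
# Hypothetical class for illustration; none of these names come from the original notes.
class Account
  @@count = 0              # class variable: one copy shared by every instance

  attr_reader :name        # generates a getter only
  attr_accessor :note      # generates both a getter and a setter

  def initialize(name)
    self.name = name       # self. is needed, otherwise Ruby would create a local variable
    @@count += 1
  end

  # Custom setter: normalizes the value before storing it.
  def name=(value)
    @name = value.strip.capitalize
  end

  def self.count
    @@count
  end
end

$greeting = "Hello"        # global variable: visible from any scope

def greet(account)
  "#{$greeting}, #{account.name}"
end

a = Account.new("  alice ")
b = Account.new("BOB")
puts a.name          # Alice
puts b.name          # Bob
puts Account.count   # 2
puts greet(a)        # Hello, Alice
```

Routing the assignment in initialize through self.name = means the same cleanup runs whether the value is set at construction time or later.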

Friday, March 28, 2014

Changes 3/28/14

Ruby:

  • Wrote a program to explore block functionality in Ruby
  • It is a bit more complicated than I anticipated and I might split the second nested block into its own variable
  • class BlockStorage
      def initialize(&block)
        @block = block
      end
    
      def execBlock(string)
        if @block.respond_to? "call"
          @block.call(string)
        end
      end
    end
    
    bs = BlockStorage.new() do |str|
      arr = str.split(" ").reverse
      arr2 = arr.map do |e|
        puts "#{e}, #{arr.first}, #{arr.last}"
        if e == arr.first
          e.gsub(/\W$/, '').capitalize
        elsif e == arr.last
          e.downcase
        else
          e
        end
      end
      arr2.join(" ")
    end
    
    puts bs.execBlock("Block coding is pretty darn cool.")
    # Outputs  'Cool darn pretty is coding block'
    puts bs.execBlock("It really streamlines the process.") 
    # Outputs 'Process the streamlines really it'
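
One possible way to split the nested block into its own variable, as considered above, is to store it in a lambda. This is only a sketch: fix_case and reverse_sentence are hypothetical names, and the debug puts from the original block is left out.

```ruby
# Hypothetical refactor: the inner map block becomes a named lambda.
fix_case = lambda do |word, arr|
  if word == arr.first
    word.gsub(/\W$/, '').capitalize   # drop trailing punctuation, capitalize
  elsif word == arr.last
    word.downcase
  else
    word
  end
end

# The outer block, also stored in a variable, now delegates to fix_case.
reverse_sentence = lambda do |str|
  arr = str.split(" ").reverse
  arr.map { |e| fix_case.call(e, arr) }.join(" ")
end

puts reverse_sentence.call("Block coding is pretty darn cool.")
# Cool darn pretty is coding block
```

A lambda stored this way could presumably be handed to a block-taking constructor with the & prefix, for example BlockStorage.new(&reverse_sentence).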
    
    

Thursday, March 27, 2014

Changes 3/27/14

Ruby:

  • I just realized that variables defined inside an if statement stay in scope for the rest of the enclosing code block. I don't know why I thought they didn't.
  • Learned about begin, rescue, and ensure for error handling
  • I'm still not sure why you write "Exception => e" after rescue in order to get the exception object. I will look into this tomorrow
  • Learned about undef to remove a function
  • Learned about differences between Integer("123abc") and "123abc".to_i.
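  • The first raises an ArgumentError, and the second returns 123
  • Learned that calling 'method()' is equivalent to calling 'method' unless a local variable with the same name has been assigned (method = somevalue). After that, 'method' refers to the variable, while 'method()' still calls the method.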
  • For example:
    def method() 
      "Abcdefg" 
    end 
    puts method # outputs Abcdefg 
    puts method() # outputs Abcdefg 
     
    method = "Other string" 
    puts method # outputs Other string
    puts method() # outputs Abcdefg
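
The error-handling and conversion notes from this entry can be combined into one small sketch. parse_strict and scope_demo are hypothetical helper names used only here.

```ruby
# parse_strict is a hypothetical helper name used only for this sketch.
def parse_strict(text)
  log = []
  begin
    value = Integer(text)          # strict: raises ArgumentError on "123abc"
    log << "parsed #{value}"
  rescue ArgumentError => e        # "=> e" binds the raised exception object to e
    log << "rescued: #{e.class}"
    value = text.to_i              # lenient: reads the leading digits only
  ensure
    log << "ensure always runs"    # runs whether or not an exception was raised
  end
  [value, log]
end

# A variable assigned inside an if is still visible after the if ends.
def scope_demo
  if true
    inside_if = "still visible"
  end
  inside_if
end

p parse_strict("42")      # [42, ["parsed 42", "ensure always runs"]]
p parse_strict("123abc")  # [123, ["rescued: ArgumentError", "ensure always runs"]]
puts scope_demo           # still visible
```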
    
    

Wednesday, March 26, 2014

Changes 3/26/14

Ruby:

  • Learned more Ruby
  • Learned about ["string", "string2"].each {|a| print a} and other interesting Ruby features
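
For reference, a short sketch of the each call above next to two closely related iterators. The variable names are made up for illustration.

```ruby
words = ["string", "string2"]

# each runs the block once per element; print adds no newline.
words.each { |a| print a }   # stringstring2
puts

# map collects the block's return values into a new array.
shouted = words.map { |a| a.upcase }
p shouted                    # ["STRING", "STRING2"]

# each_with_index also passes the position of each element.
words.each_with_index do |a, i|
  puts "#{i}: #{a}"
end
```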

Monday, March 24, 2014

Changes 3/24/14

NSF:

Changes 3/22/14 - 3/23/14

NSF:

  • Added support for patent grants, successfully tested them against both 2007 and 2013 grants.
  • Added secondary parse option if <?RELAPP> is not present in patent grants
  • Added filename support, but it is quite sloppy and basic
  • Added -maxnsf flag to stop after a certain number of NSF documents have been found

Friday, March 21, 2014

Changes 3/21/14

NSF:

  • Moved cmd_args dict into patutil so it can be accessed everywhere
  • Added -dump flag to dump split xml into files
  • Added inc_ prefix to csvs that were parsed with -max [num] flag
  • Added basic support for patent grants

Thursday, March 20, 2014

Changes 3/20/14 Part 2

NSF:

  • Command line arguments are now saved to a dict rather than all being passed to main.
  • Added more flags, such as no_nsf to parse all regardless of govt interest, and -single to only look at one file
  • Changed -r to -max. 
  • Various bugfixes
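
The idea of collecting command-line arguments into a single dict can be sketched as follows. This is written in Ruby to match the other snippets in this log, not taken from the project: parse_args, the hash keys, and the default mode are all assumptions, and only the -g, -a, -max and -single flags come from the notes.

```ruby
# parse_args is a hypothetical helper; the real project's handling may differ.
def parse_args(argv)
  # Assumed defaults, chosen only for illustration.
  args = { mode: :application, max: nil, single: false, filename: nil }
  rest = argv.dup
  until rest.empty?
    token = rest.shift
    case token
    when "-g"      then args[:mode] = :grant
    when "-a"      then args[:mode] = :application
    when "-max"    then args[:max] = Integer(rest.shift)   # next token is the number
    when "-single" then args[:single] = true
    else                args[:filename] = token            # anything else is the filename
    end
  end
  args
end

p parse_args(["patents.xml", "-g", "-max", "5"])
```

Keeping everything in one hash means new flags can be added without changing the signature of every function that needs them.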

Changes 3/20/14 Part 1

NSF:

  • Added support for command-line arguments. The arguments are currently:
  • [filename] - set filename to parse (omitting it will parse from the webpage)
  • [-g/-a] - sets mode to either grant or application
  • [-r [number]] - sets max number of splits (for debugging)

Wednesday, March 19, 2014

Changes 3/19/14

NSF:

  • Cleaned up console output more, removed several debug print statements.  
  • Added patparser.Tags().getAppHeadings() method to format csv tag headings a bit better.
  • Removed some scrape tags which didn't match up with what NSF is looking for.
  • Emailed Michelle with questions about CSV formatting
  • Added github link on students.gctaa.net

Tuesday, March 18, 2014

Changes 3/18/14

NSF:

  • Ran scrape on 2013 application, began enhancing compatibility
  • Fixed bug: I didn't take into account that <us-patent-application> may have attributes, which caused the string find not to recognize it.
  • Fixed bug: split_xml would split at the end tags, but it is supposed to only look at the data within the start and end tags.  For example, it was looking at
    <random_tag>
    <us-patent-application>
    <data-we-want>
    </us-patent-application>
    when the <random_tag> line should have been ignored.
  • Added dump_xml() method to patutil, which takes an xml document string and a filename, and writes the data to a file with that filename. This is useful for debugging specific patent applications, and also led me to find the above two bugs.
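
Both bugs above come down to how the start tag is matched. A rough sketch of a matcher that tolerates attributes and keeps only the text from the start tag through the end tag follows. It is written in Ruby to match the other snippets in this log; split_documents is a hypothetical stand-in, not the project's split_xml.

```ruby
# split_documents is a hypothetical stand-in for the splitting step.
def split_documents(xml, tag)
  name = Regexp.escape(tag)
  # Start tag with optional attributes, then the shortest run up to the end tag.
  # The /m flag lets "." match newlines.
  pattern = /<#{name}(?:\s[^>]*)?>.*?<\/#{name}>/m
  xml.scan(pattern)
end

xml = <<~XML
  <random_tag>
  <us-patent-application lang="EN">
  <data-we-want>
  </us-patent-application>
  <us-patent-application>
  <more-data>
  </us-patent-application>
XML

docs = split_documents(xml, "us-patent-application")
puts docs.length                          # 2
puts docs.first.include?("random_tag")    # false
```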

Monday, March 17, 2014

Changes 3/17/14

NSF:

  • Changed ignore url functionality so that it looks at written csvs rather than downloaded zip files

Math Drill:

  • Fixed logic errors
  • Updated accountcreation.py to have more functionality
  • Should now be fully operational

Thursday, March 13, 2014

Changes 3/13/14

NSF:

  • Continued checking tags for correctness
  • Added ipa_ prefix to all patent app variables in the Tags class
  • Added iterative returning of all the variables with the ipa_ prefix in the Tags class using self.__dict__. This is a very cool Python feature that I don't think has a direct Java equivalent
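
For comparison with the Ruby notes elsewhere in this log, the same prefix-filtering idea can be sketched with Ruby's instance_variables, which plays a role similar to self.__dict__. The attribute names and values below are made up for illustration and are not the project's real tags.

```ruby
# Hypothetical Tags class; attribute names and values are invented for this sketch.
class Tags
  def initialize
    @ipa_title = "invention-title"
    @ipa_applicants = "applicants"
    @other = "not a patent app tag"
  end

  # Returns a hash of every instance variable whose name starts with @ipa_.
  def app_tags
    instance_variables
      .select { |name| name.to_s.start_with?("@ipa_") }
      .map { |name| [name.to_s.sub("@ipa_", ""), instance_variable_get(name)] }
      .to_h
  end
end

p Tags.new.app_tags
# {"title"=>"invention-title", "applicants"=>"applicants"}
```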

Wednesday, March 12, 2014

Changes 3/12/14

NSF:

  • Added Tags class to contain tags being scraped
  • Streamlined process of replacing tags as DTD standard changes
  • Checked over tags for Patent-Application.  Now the only one I am not sure about is Related Patent documents

Tuesday, March 11, 2014

Changes 3/11/14

NSF:

  • Fixed applicants tag not scraping correctly
  • Added a patutil.py to contain utility methods
  • Removed distinction in code between tags that had a tree structure (such as 'tag1>tag2>tag3') and ones that did not (such as 'tag1').  Now the same logic applies to both

Monday, March 10, 2014

Changes 3/10/14

NSF:

  • Added support for the special tags within iteration
  • Added placeholder value for commas that are within the text
  • Fixed a weird bug where regex would not replace a line return if it was preceded by \r (\r\n would not be replaced but \n would).
  • Began investigating the large number of missing values in the result (it looks like some of Michelle's values may be from a different DTD standard or something)
  • Pushed to github
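
The comma placeholder and the line-return fix can be sketched together. This is in Ruby to match the other snippets in this log; clean_field and the placeholder token are assumptions, not the project's actual code.

```ruby
COMMA_PLACEHOLDER = "<comma>"   # hypothetical placeholder token

# clean_field is a hypothetical helper name.
def clean_field(text)
  text
    .gsub(/\r?\n/, " ")              # \r? makes the pattern cover \r\n as well as \n
    .gsub(",", COMMA_PLACEHOLDER)    # keep literal commas out of the CSV field
end

puts clean_field("Smith, John\r\nAcme\nCorp")
# Smith<comma> John Acme Corp
```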

Sunday, March 9, 2014

Changes 3/9/14

NSF:

  • Added ability to scrape text between tags without using lxml.  I basically reworked the govt interest code I had already written, but made it take arguments for the tags, instead of being hardcoded to look for federal research statements
  • Tomorrow I will look into all the null values I am getting (I'm assuming the tags aren't correct)

Friday, March 7, 2014

Changes 3/7/14

NSF:

  • Added header to csv that prints out all the tags
  • Switched write codec to unicode for file, as there were some unicode characters causing errors (such as the 1/2 character)
  • Looked at the ruby parser and converted some of the xpath tags to my format
  • Michelle is scraping different fields than what was originally in the Google doc, so I will have to ask her about that
  • I'm not sure what the proper way to handle a comma within the scrape is.  If I write it to the file, it messes up the csv format
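
One common answer to the comma question is to wrap the field in double quotes, which CSV readers treat as a single value. A small sketch using Ruby's standard csv library follows. It is in Ruby to match the other snippets in this log, and it is not what the project does; the sample row is invented.

```ruby
require "csv"

row = ["Smith, John", "Acme"]       # made-up sample data

# generate_line quotes any field that contains a comma.
line = CSV.generate_line(row)
puts line                           # "Smith, John",Acme

# Parsing the line back gives the original two fields.
p CSV.parse_line(line)              # ["Smith, John", "Acme"]
```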

Wednesday, March 5, 2014

Changes 3/5/14

NSF:

  • Realized that NSF only wants the first and last names of inventors, which makes formatting easier
  • Worked on formatting Inventors
  • Worked on fixing various bugs related to me converting some things into different classes
  • Created an updated list of the tags for 2013 (I'm missing some though)