Programming Stuff: March 2014

Monday, March 31, 2014

Changes 3/31/14

Ruby:

Worked more on classes
Learned about @@var class variables, which are shared by all instances of a class (a change made through one instance is visible to all of them)
Saw that there was a global variable $var, but couldn't figure out how to use it
Learned about attr_accessor, attr_reader, and self.var
Learned about custom setter methods with def var=(arg), which let you modify a value before setting it. In Java this was always annoying to deal with.
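Here is a minimal sketch that puts the features above into one class. The `Counter` class, its attributes, and `$greeting` are made up for illustration. This is one way to exercise each feature, not the only way.

```ruby
# Hypothetical class showing a class variable, attr_* helpers, and a custom setter.
class Counter
  @@instances = 0            # class variable: one copy shared by every instance

  attr_reader :name          # generates the reader name
  attr_accessor :step        # generates step and step=

  def initialize(name, step = 1)
    self.name = name         # self. calls the custom setter below
    self.step = step         # without self., Ruby would just create a local variable
    @@instances += 1
  end

  # Custom setter: clean up the value before storing it
  def name=(value)
    @name = value.to_s.strip.capitalize
  end

  def self.instances
    @@instances
  end
end

$greeting = "hi"             # global variable: readable from any scope, even inside methods

def show_greeting
  $greeting
end

a = Counter.new("first")
b = Counter.new("second", 5)
a.name = "  renamed "
puts a.name              # => Renamed
puts Counter.instances   # => 2
puts b.step              # => 5
puts show_greeting       # => hi
```

Note the `self.step = step` line. Without `self.`, the assignment would create a local variable named `step` instead of calling the setter.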

Friday, March 28, 2014

Changes 3/28/14

Ruby:

Wrote a program to explore block functionality in Ruby
It is a bit more complicated than I anticipated, and I might split the second nested block into its own variable

class BlockStorage
  def initialize(&block)
    @block = block
  end

  # Calls the stored block with the string, if a block was given
  def exec_block(string)
    @block.call(string) if @block.respond_to?(:call)
  end
end

bs = BlockStorage.new do |str|
  words = str.split(" ").reverse
  last_index = words.length - 1
  # Compare by position, not by value, so a repeated word isn't mistaken for the first or last one
  formatted = words.map.with_index do |word, i|
    if i == 0
      word.gsub(/\W$/, '').capitalize   # first word: drop trailing punctuation, capitalize
    elsif i == last_index
      word.downcase                     # last word: lowercase
    else
      word
    end
  end
  formatted.join(" ")
end

puts bs.exec_block("Block coding is pretty darn cool.")

# Outputs 'Cool darn pretty is coding block'

puts bs.exec_block("It really streamlines the process.")

# Outputs 'Process the streamlines really it'
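One way to split the inner block into its own variable is to store it in a lambda. This is a sketch and not a rewrite of the program above. `format_word` and `reverse_sentence` are hypothetical names, and the sketch compares words by their position rather than by their value.

```ruby
# Hypothetical refactor: the per-word logic lives in its own lambda.
format_word = lambda do |word, i, last_index|
  if i.zero?
    word.gsub(/\W$/, '').capitalize   # first word: drop trailing punctuation, capitalize
  elsif i == last_index
    word.downcase                     # last word: lowercase
  else
    word
  end
end

# The outer lambda only reverses the words and applies format_word to each one
reverse_sentence = lambda do |str|
  words = str.split(" ").reverse
  words.map.with_index { |w, i| format_word.call(w, i, words.length - 1) }.join(" ")
end

puts reverse_sentence.call("Block coding is pretty darn cool.")
# => Cool darn pretty is coding block

# A lambda can also be passed where a block is expected by prefixing it with &
puts ["Hello there."].map(&reverse_sentence).first
# => There hello
```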

Thursday, March 27, 2014

Changes 3/27/14

Ruby:

I just realized that variables assigned inside an if statement are still in scope after it, because if doesn't create a new scope. I don't know why I thought otherwise.
Learned about begin, rescue, and ensure for error handling
I'm still not sure why you write "rescue Exception => e" to get the exception object; I will look into this tomorrow
Learned about undef to remove a method definition
Learned about the differences between Integer("123abc") and "123abc".to_i.
The first raises an ArgumentError, and the second returns 123
Learned that calling 'method()' is equivalent to calling 'method' unless a local variable with the same name has been assigned (method = somevalue). After that, the bare name refers to the variable, and you need the parentheses to call the method.

For example:

def method()
  "Abcdefg"
end
puts method # outputs Abcdefg
puts method() # outputs Abcdefg

method = "Other string"
puts method # outputs Other string
puts method() # outputs Abcdefg
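Here is a short sketch that combines several of these points. `strict_parse` is a made-up helper. On the `=> e` question: as far as I can tell, it only binds the raised exception to a local variable. A bare `rescue => e` with no class listed catches `StandardError`. Rescuing `Exception` also catches things like `Interrupt`, so naming a narrower class such as `ArgumentError` is usually preferred.

```ruby
# if does not open a new scope: the assignment is visible afterwards
if true
  inside = "set in if"
end
puts inside            # => set in if

# The parser creates the local even when the branch never runs; it is just nil
if false
  never = 1
end
p never                # => nil

# Integer() is strict and raises; String#to_i is lenient
def strict_parse(text)
  Integer(text)
rescue ArgumentError => e          # "=> e" binds the exception object to e
  puts "rescued: #{e.class}"
  nil
ensure
  puts "ensure always runs"        # runs whether or not an exception occurred
end

p strict_parse("123")     # prints "ensure always runs", then 123
p strict_parse("123abc")  # prints "rescued: ArgumentError", "ensure always runs", then nil
p "123abc".to_i           # => 123
```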

Wednesday, March 26, 2014

Changes 3/26/14

Ruby:

Learned more Ruby
Learned about ["string", "string2"].each {|a| print a} and other interesting Ruby features
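Here is a quick sketch that compares `each` with a couple of related iterators. It reuses the word list from the example above.

```ruby
words = ["string", "string2"]

words.each { |a| print a }            # prints stringstring2 (no separator)
puts

words.each_with_index do |w, i|       # block gets the element and its index
  puts "#{i}: #{w}"
end

lengths = words.map { |w| w.length }  # map collects the block's return values
p lengths                             # => [6, 7]

result = words.each { |w| w.upcase }  # each ignores the block's value
p result                              # => ["string", "string2"] (each returns the receiver)
```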

Monday, March 24, 2014

Changes 3/24/14

NSF:

Began learning Ruby using http://repl.it/, https://www.ruby-lang.org/en/about/, and https://www.ruby-lang.org/en/documentation/quickstart/

Changes 3/22/14 - 3/23/14

NSF:

Added support for patent grants and successfully tested it against both 2007 and 2013 grants.
Added secondary parse option if <?RELAPP> is not present in patent grants
Added filename support, but it is quite sloppy and basic
Added -maxnsf flag to stop after a certain number of NSF documents have been found

Friday, March 21, 2014

Changes 3/21/14

NSF:

Moved cmd_args dict into patutil so it can be accessed everywhere
Added -dump flag to dump split xml into files
Added inc_ prefix to csvs that were parsed with -max [num] flag
Added basic support for patent grants

Thursday, March 20, 2014

Changes 3/20/14 Part 2

NSF:

Command line arguments are now saved to a dict rather than all being passed to main.
Added more flags, such as no_nsf to parse every document regardless of the government interest statement, and -single to only look at one file
Changed -r to -max.
Various bugfixes

Changes 3/20/14 Part 1

NSF:

Added support for command-line arguments. Arguments are currently:
[filename] - set filename to parse (omitting it will parse from the webpage)
[-g/-a] - sets mode to either grant or application
[-r [number]] - sets max number of splits (for debugging)
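The NSF scraper is written in Python, and its argument code isn't shown in this log. This is only a Ruby sketch of the same idea: collect every flag into one hash instead of passing each one to main. The flag names come from the entries above. The hash keys, the default mode, and `parse_args` itself are assumptions.

```ruby
# Hypothetical sketch: gather command-line options into one hash.
def parse_args(argv)
  # Defaults are assumptions, not taken from the real tool
  opts = { filename: nil, mode: :application, max: nil, single: false, dump: false }
  args = argv.dup
  until args.empty?
    arg = args.shift
    case arg
    when "-g" then opts[:mode] = :grant
    when "-a" then opts[:mode] = :application
    when "-max"
      value = args.shift
      raise ArgumentError, "-max needs a number" if value.nil?
      opts[:max] = Integer(value)   # strict: "-max abc" raises instead of becoming 0
    when "-single" then opts[:single] = true
    when "-dump"   then opts[:dump] = true
    else
      opts[:filename] = arg         # anything else is treated as the file to parse
    end
  end
  opts
end

opts = parse_args(%w[-g -max 50 grants.xml])
puts opts[:mode]       # => grant
puts opts[:max]        # => 50
puts opts[:filename]   # => grants.xml
```

Keeping everything in one hash means a new flag only touches the parser and the code that reads it, not every function signature in between.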

Wednesday, March 19, 2014

Changes 3/19/14

NSF:

Cleaned up console output more, removed several debug print statements.
Added patparser.Tags().getAppHeadings() method to format csv tag headings a bit better.
Removed some scrape tags which didn't match up with what NSF is looking for.
Emailed Michelle with questions about CSV formatting
Added github link on students.gctaa.net

Tuesday, March 18, 2014

Changes 3/18/14

NSF:

Ran the scrape on the 2013 application data and began improving compatibility
Fixed bug: I didn't take into account that <us-patent-application> may have attributes, so a plain string find for it didn't recognize the tag.
Fixed bug: split_xml would split at the end tags, but it is supposed to only look at the data within the start and end tags. For example, it was capturing
    <random_tag>
    <us-patent-application>
    <data-we-want>
    </us-patent-application>
when <random_tag> should have been ignored.
Added dump_xml() method to patutil, which takes an xml document string and a filename and writes the data to a file with that filename. This is useful for debugging specific patent applications, and it also led me to find the above two bugs.
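This is a Ruby sketch of the fixed splitting logic. The real `split_xml` and `dump_xml` live in the Python `patutil` and may differ. The sketch assumes each document's root element is `us-patent-application`. It allows attributes on the start tag and keeps only what lies between each start tag and its matching end tag, so `<random_tag>` is dropped. It would not handle a root element nested inside itself.

```ruby
# Hypothetical Ruby analog of split_xml / dump_xml.
ROOT = "us-patent-application"

def split_xml(data, root = ROOT)
  name = Regexp.escape(root)
  # <root followed by whitespace (attributes) or >, then lazily up to </root>
  pattern = %r{<#{name}(?:\s[^>]*)?>.*?</#{name}>}m
  data.scan(pattern)
end

def dump_xml(doc, filename)
  File.write(filename, doc)   # write one split document to disk for inspection
end

sample = <<~XML
  <?xml version="1.0"?>
  <random_tag>
  <us-patent-application lang="EN" file="x.XML">
  <data-we-want>1</data-we-want>
  </us-patent-application>
  <?xml version="1.0"?>
  <us-patent-application>
  <data-we-want>2</data-we-want>
  </us-patent-application>
XML

docs = split_xml(sample)
puts docs.length                 # => 2
puts docs.first.lines.first      # => <us-patent-application lang="EN" file="x.XML">
# dump_xml(docs.first, "debug_0.xml")   # hypothetical filename
```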

Monday, March 17, 2014

Changes 3/17/14

NSF:

Changed ignore url functionality so that it looks at written csvs rather than downloaded zip files

Math Drill:

Fixed logic errors
Updated accountcreation.py to have more functionality
Should now be fully operational

Thursday, March 13, 2014

Changes 3/13/14

NSF:

Continued checking tags for correctness
Added ipa_ prefix to all patent app variables in the Tags class
Added iterative returning of all the variables with the ipa_ prefix in the Tags class, using self.__dict__. This is a very cool Python feature; the closest thing in Java is reflection (Class.getDeclaredFields())
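In Ruby, the closest analog to walking `self.__dict__` is `instance_variables` combined with `instance_variable_get`. The attribute names and tag values below are hypothetical stand-ins, not the real `Tags` class.

```ruby
# Hypothetical Ruby analog of collecting every ipa_-prefixed attribute.
class Tags
  def initialize
    @ipa_title = "invention-title"                # made-up tag paths
    @ipa_applicants = "us-parties>us-applicants"
    @other_setting = "not a tag"                  # no prefix, so it is skipped
  end

  # Returns { "ipa_..." => value } for every instance variable starting with @ipa_
  def app_tags
    instance_variables
      .select { |name| name.to_s.start_with?("@ipa_") }
      .to_h { |name| [name.to_s.delete_prefix("@"), instance_variable_get(name)] }
  end
end

p Tags.new.app_tags.keys   # => ["ipa_title", "ipa_applicants"]
```

Adding a new tag then only means adding one more `@ipa_` variable. The list of headings picks it up automatically.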

Wednesday, March 12, 2014

Changes 3/12/14

NSF:

Added Tags class to contain tags being scraped
Streamlined process of replacing tags as DTD standard changes
Checked over tags for Patent-Application. Now the only one I am not sure about is Related Patent documents

Tuesday, March 11, 2014

Changes 3/11/14

NSF:

Fixed applicants tag not scraping correctly
Added a patutil.py to contain utility methods
Removed distinction in code between tags that had a tree structure (such as 'tag1>tag2>tag3') and ones that did not (such as 'tag1'). Now the same logic applies to both
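This is a Ruby sketch of treating both tag styles the same way without an XML library. It splits the path on `>` and narrows the text one element at a time, so `'tag1'` is just a path of length one. `scrape` is a hypothetical name. It only finds the first match at each level, and it doesn't handle an element nested inside another element with the same name.

```ruby
# Hypothetical sketch: "a>b>c" is split on ">" and each step narrows the text
# to the inside of the first matching element. A plain "tag1" is a one-step path.
def scrape(text, path)
  path.split(">").reduce(text) do |chunk, tag|
    # start tag, optionally with attributes
    open_match = chunk.match(/<#{Regexp.escape(tag)}(?:\s[^>]*)?>/)
    return nil unless open_match                     # tag missing: give up
    close_at = chunk.index("</#{tag}>", open_match.end(0))
    return nil unless close_at
    chunk[open_match.end(0)...close_at]              # text between start and end tag
  end
end

xml = "<doc><parties><applicant><last-name>Smith</last-name></applicant></parties>" \
      "<title>Widget</title></doc>"

p scrape(xml, "parties>applicant>last-name")  # => "Smith"
p scrape(xml, "title")                         # => "Widget"
p scrape(xml, "abstract")                      # => nil
```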

Monday, March 10, 2014

Changes 3/10/14

NSF:

Added support for the special tags within iteration
Added placeholder value for commas that are within the text
Fixed a weird bug where regex would not replace a line return if it was preceded by \r (\r\n would not be replaced but \n would).
Began investigating the large number of missing values in the result (it looks like some of Michelle's values may be from a different DTD standard or something)
Pushed to github
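The log doesn't show the regex that failed, so this is just one way to catch every line-ending style in a single pass. It lists `\r\n` first, so a Windows line ending is replaced as one break rather than two.

```ruby
# Replace every kind of line break (\r\n, \r, or \n) with a single space.
def flatten_newlines(text)
  text.gsub(/\r\n|\r|\n/, " ")
end

p flatten_newlines("line one\r\nline two\nline three")
# => "line one line two line three"
```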

Sunday, March 9, 2014

Changes 3/9/14

NSF:

Added ability to scrape text between tags without using lxml. I basically reworked the govt interest code I had already written, but made it take arguments for the tags, instead of being hardcoded to look for federal research statements
Tomorrow I will look into all the null values I am getting (I'm assuming the tags aren't correct)

Friday, March 7, 2014

Changes 3/7/14

NSF:

Added header to csv that prints out all the tags
Switched write codec to unicode for file, as there were some unicode characters causing errors (such as the 1/2 character)
Looked at the ruby parser and converted some of the xpath tags to my format
Michelle is scraping different fields than the ones originally listed in the Google Doc, so I will have to ask her about that
I'm not sure what the proper way to handle a comma within the scrape is. If I write it to the file, it messes up the csv format
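The usual answer to the comma question is to quote the field rather than substitute a placeholder. CSV writers wrap any field that contains a comma, quote, or newline in double quotes, and they double any embedded quotes. Python's `csv` module does this by default. Here is the same behavior with Ruby's standard `csv` library; the sample row is made up.

```ruby
require "csv"

# Fields containing a comma or quote get wrapped in double quotes,
# so they stay in one column instead of splitting the row.
row = ["Smith, John", 'He said "hi"', "plain"]
line = CSV.generate_line(row)
puts line
# => "Smith, John","He said ""hi""",plain

p CSV.parse_line(line) == row   # => true (round-trips cleanly)
```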

Wednesday, March 5, 2014

Changes 3/5/14

NSF:

Realized that NSF only wants the first and last names of inventors, which makes formatting easier
Worked on formatting Inventors
Worked on fixing various bugs introduced when I moved some code into different classes
Created an updated list of the tags for 2013 (I'm missing some though)