Programming Stuff: April 2014

Wednesday, April 30, 2014

Changes 4/30/14

Ruby:

DRYed out the code by replacing the repeated "app", "grant", and "both" strings used to check the range type with the constants FileRange.app_key, FileRange.grant_key, and FileRange.both_key.
Started working on better error reporting in the argument parser, as these error messages could be helpful to the end user.

Tuesday, April 29, 2014

Changes 4/29/14

Ruby:

Ran tests, fixed a nil error which was kind of confusing:
In ruby, adding ! to some methods causes the method to modify the object it is called on, rather than returning a modified copy. For example, the statements:
arr = ["a"]
arr.map! {|e| e.upcase}
would cause arr to equal ["A"].
I had a method returning the statement:
fileargs.compact!
compact removes all nil values from an array, and the ! causes it to modify fileargs in place. This normally worked fine, but since the method was returning the result of the operation as well as applying it, some strange things happened. If fileargs was [], the method would return nil, whereas normally it would return a compacted array. I fixed this by removing the !, since it is not necessary (and probably not correct).
At the time I had no idea why the method would only return nil in a very specific case. It comes down to a difference between .compact and .compact!: compact! returns nil whenever it makes no changes, which is what happens when the array contains no nil values (an empty array included), while compact always returns an array.
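The nil return can be reproduced in a few lines. This is a minimal sketch, not the project's code: clean_fileargs is a hypothetical stand-in for the method that returned fileargs.compact!. It relies on the documented behavior that compact! returns nil when it removes nothing.

```ruby
# compact returns a new array every time.
# compact! edits the receiver in place and returns nil if nothing was removed.

# Hypothetical stand-in for the fixed method: return the non-bang result.
def clean_fileargs(fileargs)
  fileargs.compact
end

p ["a", nil].compact!   # ["a"]  (a nil was removed, so the array comes back)
p ["a"].compact!        # nil    (nothing to remove)
p [].compact!           # nil    (nothing to remove)
p clean_fileargs([])    # []     (compact always returns an array)
```

Returning the result of a bang method is risky for this reason; several other in-place methods, such as uniq! and flatten!, follow the same convention.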
I also submitted a pull request for my code. Functionally, it's pretty cool now. You can pass something like grant 12 to parse all grants in the year 2012. You can even pass something like app 12-14 to parse all applications between January 1st, 2012 and December 31st, 2014.
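As a rough illustration of the year expansion described above, here is a sketch that turns "12" or "12-14" into a date range. The method name expand_year_range is made up for this example, it assumes two-digit years always mean 20xx, and it ignores the other date formats the real parser supports.

```ruby
require 'date'

# Hypothetical helper: "12" covers all of 2012, "12-14" covers 2012 through 2014.
# Assumes every two-digit year belongs to the 2000s.
def expand_year_range(spec)
  first, last = spec.split("-").map { |yy| 2000 + Integer(yy, 10) }
  last ||= first # a single year starts and ends in the same year
  Date.new(first, 1, 1)..Date.new(last, 12, 31)
end

range = expand_year_range("12-14")
puts range.first # 2012-01-01
puts range.last  # 2014-12-31
```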

Javascript:

Had a good discussion about proper data structure implementation in javascript applications, as one could store data in the html classes, a separate dataset or both.
The best way is to have an underlying data structure from which the html displayed to the user is generated.

Changes 4/25/14-4/28/14

Ruby:

Restructured arguments parsing to be more object oriented
Previously, all the parsing code had simply been in the module and run when the module was run.

Friday, April 25, 2014

Changes 4/25/14

Ruby:

Began moving command line parsing into a more object oriented setup, but got stuck on expanding single dates into date ranges (i.e., if processFile both 14 is passed, it should get all the patent applications and grants from the year 2014).
Even though I wrote the code, I was having trouble figuring out the best way to restructure it into classes.

Tuesday, April 22, 2014

Changes 4/22/14

Ruby:

I had worked over break on adding file range functionality. I continued doing this today after getting feedback from Kevin. I added support for additional date formats and split the parsing functionality for file arguments into a separate object.
I had to commit code that didn't work past a certain point because I ran out of time.
Tomorrow I will work on separating the patent type (grant/app) from the range by having it in a separate keyword.

Friday, April 11, 2014

Changes 4/11

Ruby/Github:

Merged a branch and dealt with conflicts; this resource was a big help.
Pushed new code which removes any newlines in the scraped data when writing to the csv.
I then removed all the newline substitutions that were no longer necessary.
I also started looking at our regular expression for parsing NSF award numbers from the government interest field. I came up with http://rubular.com/r/0f4YNgZFP1, or:
(?:(?:(?:\b[a-z]{3})|\b)(?:\d-?){6,8})\b
The ?: tells the regex engine not to turn the contents of the () into a capture group, which was the question I asked Alan, but he didn't know. If you look at the rubular link, you can see why the expression is the way it is.
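The effect of ?: is easiest to see side by side. The patterns below are deliberately simplified stand-ins (three letters followed by six digits), not the award number regex above, and captures_for is a hypothetical helper.

```ruby
# Simplified patterns for illustration only.
CAPTURING     = /([a-z]{3})(\d{6})/
NON_CAPTURING = /(?:[a-z]{3})(?:\d{6})/

# Returns the captured groups, or nil when the pattern does not match.
def captures_for(pattern, text)
  match = pattern.match(text)
  match && match.captures
end

p captures_for(CAPTURING, "abc123456")     # ["abc", "123456"]
p captures_for(NON_CAPTURING, "abc123456") # []

# scan returns whole matches only when the pattern has no capture groups.
p "abc123456 def654321".scan(CAPTURING)     # [["abc", "123456"], ["def", "654321"]]
p "abc123456 def654321".scan(NON_CAPTURING) # ["abc123456", "def654321"]
```

The scan behavior is one practical reason to prefer non-capturing groups when only the full match is wanted.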

Wednesday, April 9, 2014

Changes 4/9/14

Ruby:

Finished working on the download parameter and submitted a pull request.
I did a lot of testing to make sure the parameters worked correctly, which took a long time due to the size of the files and the time it took to download them.
Started trying to remove the various gsub (regex substitution) calls and add one catch-all gsub before writing to the CSV.
However, the code uses the << method (which is usually equivalent to an .append or .push) to add things to the file, and the csv's << expects the thing being added to be an enumerable, such as an array of fields. Therefore, the current approach does not work: a value must be a string to call gsub on it, but a row cannot be a plain string when it is added to the csv.
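One possible way around this, assuming each row is available as an array of fields, is to clean every field with map and then hand the whole array to <<. The helper name write_clean_row and the sample data are made up for this sketch.

```ruby
require 'csv'

# Hypothetical helper: removes newlines from every string field, then appends
# the fields to the csv as a single row. Non-string fields pass through as-is.
def write_clean_row(csv, fields)
  csv << fields.map { |field| field.respond_to?(:gsub) ? field.gsub(/\r?\n/, "") : field }
end

output = CSV.generate do |csv|
  write_clean_row(csv, ["1234567", "NSF-\n7654321"])
end
puts output # 1234567,NSF-7654321
```

This keeps a single gsub in one place while still giving << the array it expects.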

Tuesday, April 8, 2014

Changes 4/8/14

Github:

Learned about modifying/deleting already existing commits

Ruby/Regular Expressions:

Began working on adding a preference for either reedtech or google's servers
I wanted to do it as elegantly as possible, so I took a long time and didn't finish
Basically, the download flag functions as it did, but now one can do download-reed(tech) or download-goog(le).
This will then be passed to the url generator.

Code:

#should_download = actions.empty? || (actions.include? "download") # old line of code

# new lines of code:

should_download = false
server_preference = "goog" # Default to google
unless actions.empty?
  actions.each do |action| # Actions is an array of launch options, ex: ["download-goog", "unzip", "extract"]
    should_download = !action.match(/download/).nil? # is there a better way to evaluate false to nil and true to not nil?

    if should_download # break once the download option has been found
      # Sets server preference to "goog" or "reed" dependent on dl suffix.
      # A plain "download" has no suffix, so the default is kept.
      server_preference = action[/goog|reed/] || server_preference
      break
    end
  end
end
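On the question in the comment above: =~ returns the index of the match or nil, and ruby treats nil as false in conditions, so the result can be used directly without .nil?. A possible alternative, written as a hypothetical method (download_settings is not in processFile.rb), might look like this:

```ruby
# Hypothetical method: finds the first download action and reads its optional
# server suffix, falling back to the default when no suffix is given.
# Returns [should_download, server_preference].
def download_settings(actions, default_server = "goog")
  download_action = actions.find { |action| action =~ /download/ }
  return [false, default_server] if download_action.nil?

  [true, download_action[/goog|reed/] || default_server]
end

p download_settings(["download-reed", "unzip"]) # [true, "reed"]
p download_settings(["download", "extract"])    # [true, "goog"]
p download_settings(["unzip"])                  # [false, "goog"]
```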

Monday, April 7, 2014

Changes 4/7/14

Ruby:

Mainly worked on better understanding the processFile.rb code and regular expressions
Learned about the MatchData object in ruby
In [Code 1], the matches variable is a MatchData object. It stores the full matched string as well as the individual values matched by the parenthesized parts of the regex, which are called backreferences according to the ruby docs.
[Code 2] illustrates how this works. The parentheses capture all text within the xml tags, which is stored in md[1]. The full string matched is stored in md[0].
Learned about the m option of regex, which makes . (all characters) match new lines (\n) as well
For example, /.*/m would match all characters, including newlines.

Code:

# Code 1
processxref1 = '<?cross-reference-to-related-applications ... ?>'
processxref2 = '<?cross-reference-to-related-applications ... ?>'
matches = app.to_s.match(/#{Regexp.escape(processxref1)}(.*)#{Regexp.escape(processxref2)}/m)

if matches
  Nokogiri::XML.fragment(matches[1].strip).xpath("./p/text()").to_s.gsub(/\n/, "")
end

# Code 2
text = "<abc>text</abc>"
md = text.match %r{<abc>(.*)</abc>}m

puts md[0] #=> <abc>text</abc>
puts md[1] #=> text
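The difference the m option makes only shows up when the text between the tags spans more than one line. A small sketch with made-up sample text:

```ruby
# Sample text and patterns invented for this example.
XML_SNIPPET = "<p>one\ntwo</p>"
WITHOUT_M   = %r{<p>(.*)</p>}
WITH_M      = %r{<p>(.*)</p>}m

# String#[] with a regex and a group number returns that capture, or nil.
p XML_SNIPPET[WITHOUT_M, 1] # nil, because . stops at the newline
p XML_SNIPPET[WITH_M, 1]    # "one\ntwo"
```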

Friday, April 4, 2014

Changes 4/4/14

Ruby:

Kevin emailed me explanations of how classbuilder objects would function in ruby.
I looked over the processFile.rb code. It kept giving me an error about not finding a .xml after unzipping.
Eventually I realized it was because I was passing it the command line arguments "download" and "extract", but extract referred to extracting the NSF data, while I meant to pass "unzip" to unzip the file.
Switched the processFile.rb download server from reedtech to google and pushed it to my repo. However, I accidentally did two commits (one which changed the server and another where I removed test code I had left in). I don't think there is an easy way to merge them into one.

Thursday, April 3, 2014

Changes 4/3/14

Ruby:

Added more functionality to the block variables I was working with yesterday. The block I ended up with [See Code 1] worked well, had proper error handling, and was written with test-driven development in mind. For example,

block.call nil #=> error: ""
block.call "a" #=> error: "a"
puts block.call 2 #=> 4
block.call %w[cat dog banana] #=> [error: "cat", error: "dog", error: "banana"]
block.call (0..10) #=> [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

The block could also be passed as a variable, i.e.,

numbers = (0..5) # a range of numbers from 0 to 5
block_flex_better.call(numbers) #=> [0, 1, 4, 9, 16, 25]
numbers.map &block_flex_better #=> [0, 1, 4, 9, 16, 25]

The two statements are equivalent. In the first, the enumerable is traversed within the block. In the second, the elements in the enumerable are traversed by the map function and each element in the enumerable is passed to the block for operating on.
The & prefix before block_flex_better in the 2nd statement tells ruby to pass the lambda to map as its block.
If you are unfamiliar with ruby, the %w[cat dog banana] is shorthand for ["cat", "dog", "banana"].
However, Kevin recommended that as these blocks grow more complex, they be split into their own classes. He also said that it is bad practice in ruby to have a boolean flag as an argument, because a programmer then needs to go into the code to see what the flag does.

Code:

block_flex_better = lambda do |arg, join=false| # arguments passed to the block
  final_numbers = nil

  # Takes some variable and attempts to square it
  operation = lambda do |num|
    if num.respond_to? "**" # Exponent notation in ruby
      num**2
    else
      "error: \"#{num.to_s}\"" # If num cannot have exponents applied 
    end
  end

  if arg.respond_to? "map" # If the argument is some sort of enumerable
    final_numbers = arg.map &operation
  else # If the argument is a non-enumerable (such as a fixnum)
    final_numbers = operation.call(arg)
  end
  join ? final_numbers.join(", ") : final_numbers # For convenience, a join can be applied automatically 
                                                  # by setting the join boolean to true.
end
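Kevin's two suggestions (move growing blocks into classes, avoid boolean flag arguments) could be sketched roughly as follows. The class name Squarer and the method call_joined are invented for this example and are not what he actually wrote.

```ruby
# Hypothetical refactoring: the squaring logic lives in a small class, and the
# join behavior gets its own named method instead of a boolean argument.
class Squarer
  # Squares a single item, or every item of an enumerable.
  def call(arg)
    arg.respond_to?(:map) ? arg.map { |item| square(item) } : square(arg)
  end

  # Same as call, but returns the results joined into one string.
  def call_joined(arg, separator = ", ")
    Array(call(arg)).join(separator)
  end

  # Lets an instance be passed with &, e.g. (0..3).map(&Squarer.new)
  def to_proc
    method(:call).to_proc
  end

  private

  def square(num)
    num.respond_to?(:**) ? num**2 : "error: \"#{num}\""
  end
end

squarer = Squarer.new
p squarer.call(0..3)        # [0, 1, 4, 9]
p squarer.call_joined(0..3) # "0, 1, 4, 9"
p (0..3).map(&squarer)      # [0, 1, 4, 9]
```

A reader of squarer.call_joined(numbers) can tell what happens without opening the class, which is the point of dropping the flag.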

Wednesday, April 2, 2014

Changes 4/2/14

Ruby:

Worked with Enumerable functions such as map and select
Worked with storing blocks in variables
Created blocks that could operate on a variety of arguments, as well as differentiate between Enumerables and single items
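A small sketch of the kind of thing this involved, with names (SQUARE, EVEN, square_all) invented for illustration:

```ruby
# Lambdas stored in constants so they can be reused with map and select.
SQUARE = lambda { |n| n**2 }
EVEN   = lambda { |n| n.even? }

# Accepts either a single item or an Enumerable.
def square_all(arg)
  arg.is_a?(Enumerable) ? arg.map(&SQUARE) : SQUARE.call(arg)
end

evens = (1..6).select(&EVEN)
p evens              # [2, 4, 6]
p square_all(evens)  # [4, 16, 36]
p square_all(5)      # 25
```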

Tuesday, April 1, 2014

Changes 4/1/14

NSF:

I had emailed Kevin yesterday with some code I had written to teach myself block functionality and classes in ruby
Today I looked at his response and went through his suggestions for my code. He had made a lot of changes to my blocks.rb file, so I looked over the changes and then changed my code to match (from memory of what he had recommended, not just by copying). I think that helped me to understand what I could have done better.
Tomorrow I will look into more higher order functions for arrays, such as map (which I have dealt with already), each, select, and others.