The solution is object-oriented and is composed of three parts: Pattern, Analyzer, and Predictor.
A Pattern handles all of the business logic related to first_name and last_name.
Its sole responsibility is finding and predicting patterns based on first_name and last_name.
In our case, there are four patterns: first_name_dot_last_name, first_name_dot_last_initial, first_initial_dot_last_initial, and first_initial_dot_last_name (for John Ferguson: john.ferguson, john.f, j.f, and j.ferguson).
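The four cases differ only in whether each half of the local part (the text before the `@`) is a single character. As a minimal sketch of that rule, the hypothetical helper `classify_local_part` assumes a dot-separated local part with exactly two halves. It is an illustration, not part of the solution's API.

```ruby
# Hypothetical helper for illustration: classify an email local part such as
# "john.f" into one of the four pattern symbols. A half that is a single
# character counts as an initial; anything longer counts as a full name.
def classify_local_part(local_part)
  first, last = local_part.split('.')
  first_part = first.length == 1 ? 'first_initial' : 'first_name'
  last_part  = last.length == 1 ? 'last_initial' : 'last_name'
  "#{first_part}_dot_#{last_part}".to_sym
end

%w(john.ferguson john.f j.f j.ferguson).each do |local_part|
  puts "#{local_part} => #{classify_local_part(local_part).inspect}"
end
# Prints:
# john.ferguson => :first_name_dot_last_name
# john.f => :first_name_dot_last_initial
# j.f => :first_initial_dot_last_initial
# j.ferguson => :first_initial_dot_last_name
```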
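An Analyzer is created with a pattern, and its '#process' method takes a raw dataset (a hash of full names to email addresses).
Based on the rules defined in the pattern, the Analyzer builds a dataset that maps each company domain to the patterns observed for it; this dataset is then used for prediction.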
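To show the shape of the dataset the Analyzer step produces, here is a standalone sketch, assuming the raw input is a hash of full names to email addresses as in the run instructions. `build_dataset` is a hypothetical helper; it inlines the same single-character rule for initials rather than delegating to a Pattern object.

```ruby
# Hypothetical helper for illustration: build a hash of company domain =>
# distinct pattern symbols from raw data shaped like { full name => email }.
def build_dataset(raw_data)
  raw_data.each_value.with_object({}) do |email_address, dataset|
    local_part, domain = email_address.split('@')
    first, last = local_part.split('.')
    # A single character counts as an initial, anything longer as a full name.
    symbol = [
      first.length == 1 ? 'first_initial' : 'first_name',
      last.length == 1 ? 'last_initial' : 'last_name'
    ].join('_dot_').to_sym
    # Keep each pattern only once per domain.
    patterns = (dataset[domain] ||= [])
    patterns << symbol unless patterns.include?(symbol)
  end
end

raw = {
  'John Ferguson' => 'john.ferguson@alphasights.com',
  'Damon Aw'      => 'damon.aw@alphasights.com',
  'Linda Li'      => 'linda.li@alphasights.com',
  'Larry Page'    => 'larry.p@google.com',
  'Sergey Brin'   => 's.brin@google.com',
  'Steve Jobs'    => 's.j@apple.com'
}

p build_dataset(raw)
```

Each domain keeps only distinct patterns, so a company whose addresses follow several conventions (google.com in this data) ends up with more than one candidate pattern, while alphasights.com keeps a single one.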
A Predictor is created with a dataset and a pattern. Its ‘#formulate’ method takes a name and a company domain (passed as email) as input,
and, based on the existing patterns that were processed and stored in the dataset, predicts email addresses for the given name and company.
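The prediction step runs that mapping in reverse: each stored pattern symbol is turned back into a local part for the new name. In this sketch `predict_emails` is a hypothetical helper, and instead of scanning the symbol for each part name as Pattern#predict does, it splits the symbol on `_dot_` to find which form of each name to use.

```ruby
# Hypothetical helper for illustration: render one candidate address per
# pattern symbol for a full name such as "Craig Silverstein".
def predict_emails(full_name, domain, patterns)
  first, last = full_name.downcase.split(' ')
  # The four building blocks a pattern symbol can refer to.
  parts = {
    'first_name' => first,
    'first_initial' => first[0],
    'last_name' => last,
    'last_initial' => last[0]
  }
  patterns.map do |pattern|
    # :first_name_dot_last_initial => ["first_name", "last_initial"]
    first_key, last_key = pattern.to_s.split('_dot_')
    "#{parts.fetch(first_key)}.#{parts.fetch(last_key)}@#{domain}"
  end
end

p predict_emails('Craig Silverstein', 'google.com',
                 [:first_name_dot_last_initial, :first_initial_dot_last_name])
# Prints ["craig.s@google.com", "c.silverstein@google.com"]
```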
The design philosophy behind this is as follows:
- A Pattern can easily be replaced with another pattern, or modified to accept more business logic based on other attributes.
- An Analyzer should only care about its given pattern and its given raw data, from which it generates a dataset.
- A Predictor should only care about its given pattern and its given dataset.
- The Analyzer and the Predictor should not know about each other.
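Because the Analyzer and the Predictor only call `split`, `find`, and `predict` on their pattern, any object that answers those three methods can be swapped in. As a hedged illustration, the hypothetical `UnderscorePattern` targets addresses such as `john_ferguson`; the underscore convention is an assumption made for this example, not something the original dataset contains.

```ruby
# Hypothetical alternative pattern for illustration: it exposes the same
# split/find/predict interface as Pattern, but uses "_" as the separator in
# addresses (e.g. "john_ferguson").
class UnderscorePattern
  attr_reader :first_name, :last_name

  # Split on an underscore, a dot, or whitespace.
  def split(name)
    @first_name, @last_name = name.split(/[_.\s]/)
  end

  # Describe the current name, e.g. :first_name_underscore_last_initial.
  def find
    [part(@first_name, 'first'), part(@last_name, 'last')].join('_underscore_').to_sym
  end

  # Render the current name according to a pattern symbol, e.g. "sergey_b".
  def predict(pattern)
    first_key, last_key = pattern.to_s.split('_underscore_')
    [render(first_key, @first_name), render(last_key, @last_name)].join('_')
  end

  private

  # A single character counts as an initial, anything longer as a full name.
  def part(value, prefix)
    value.length == 1 ? "#{prefix}_initial" : "#{prefix}_name"
  end

  def render(key, value)
    key.end_with?('initial') ? value[0] : value
  end
end

pattern = UnderscorePattern.new
pattern.split('larry_p')
puts pattern.find                                          # prints first_name_underscore_last_initial
pattern.split('sergey brin')
puts pattern.predict(:first_name_underscore_last_initial)  # prints sergey_b
```

Passing `UnderscorePattern.new` as the `pattern:` argument to both Analyzer and Predictor should then yield underscore-separated predictions without changing either class, which is the benefit of keeping them decoupled.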
# email_predictor.rb
require 'pp'

class Pattern
  attr_reader :first_name, :last_name

  # Split a full name ("John Ferguson") or an email local part
  # ("john.ferguson") on a dot or whitespace into first and last parts.
  def split(name)
    @first_name, @last_name = name.split(/[.\s]/)
  end

  # Describe the current name as a pattern symbol, e.g. :first_name_dot_last_initial.
  def find
    [first_part, last_part].join('_dot_').to_sym
  end

  # Render the current name according to a pattern symbol, e.g. "john.f".
  def predict(pattern)
    %w(first_initial first_name last_initial last_name).each_with_object([]) do |part, memo|
      memo << send(part) if pattern.to_s.include?(part)
    end.join('.')
  end

  private

  def first_initial
    @first_name[0]
  end

  def last_initial
    @last_name[0]
  end

  def first_part
    @first_name.length == 1 ? 'first_initial' : 'first_name'
  end

  def last_part
    @last_name.length == 1 ? 'last_initial' : 'last_name'
  end
end

class Analyzer
  attr_reader :dataset, :pattern

  def initialize(attributes)
    @dataset = {}
    @pattern = attributes[:pattern]
  end

  # Record, for every company domain, each distinct pattern seen in raw_data
  # (a hash of full name => email address).
  def process(raw_data)
    raw_data.each_value do |email_address|
      name, domain = email_address.split('@')
      @pattern.split(name)
      found = @pattern.find
      @dataset[domain] ||= []
      @dataset[domain] << found unless @dataset[domain].include?(found)
    end
  end
end

class Predictor
  attr_reader :pattern, :dataset

  def initialize(attributes)
    @dataset = attributes[:dataset]
    @pattern = attributes[:pattern]
  end

  # Predict candidate addresses for attributes[:name] at the company domain
  # given in attributes[:email], one per pattern stored for that domain.
  def formulate(attributes)
    name = attributes[:name].downcase
    domain = attributes[:email]
    patterns = @dataset[domain]
    return "There is no matching in dataset for #{attributes[:name]} working for #{domain}" if patterns.nil?

    # The name only needs to be split once for all patterns.
    @pattern.split(name)
    patterns.map { |pattern| "#{@pattern.predict(pattern)}@#{domain}" }
  end
end

# Instructions to run: ruby email_predictor.rb
# The demo runs only when this file is executed directly, not when the spec requires it.
if __FILE__ == $PROGRAM_NAME
  raw = {
    'John Ferguson' => 'john.ferguson@alphasights.com',
    'Damon Aw'      => 'damon.aw@alphasights.com',
    'Linda Li'      => 'linda.li@alphasights.com',
    'Larry Page'    => 'larry.p@google.com',
    'Sergey Brin'   => 's.brin@google.com',
    'Steve Jobs'    => 's.j@apple.com'
  }

  # Create an analyzer object and give it a pattern
  analyzer = Analyzer.new(pattern: Pattern.new)
  # Analyze the given dataset
  analyzer.process(raw)

  # Create a predictor object and give it a dataset and a pattern
  predictor = Predictor.new(dataset: analyzer.dataset, pattern: Pattern.new)

  # Formulate potential email addresses for the given name and company domain
  pp predictor.formulate(name: 'Craig Silverstein', email: 'google.com')
  pp predictor.formulate(name: 'Peter Wong', email: 'alphasights.com')
  pp predictor.formulate(name: 'Steve Wozniak', email: 'apple.com')
  pp predictor.formulate(name: 'Barack Obama', email: 'whitehouse.gov')
end
# email_predictor_spec.rb
# Instructions to run the tests:
# rspec -fd email_predictor_spec.rb
require_relative 'email_predictor'

RSpec.describe Pattern do
  let(:pattern) { Pattern.new }

  it 'should respond to first name' do
    expect(pattern).to respond_to(:first_name)
  end

  it 'should respond to last name' do
    expect(pattern).to respond_to(:last_name)
  end

  describe '#split' do
    it 'should split name into first_name and last_name if name is separated by white space' do
      pattern.split('John Ferguson')
      expect(pattern.first_name).to eql('John')
      expect(pattern.last_name).to eql('Ferguson')
    end

    it 'should split name into first_name and last_name if name is separated by .' do
      pattern.split('steve.jobs')
      expect(pattern.first_name).to eql('steve')
      expect(pattern.last_name).to eql('jobs')
    end
  end

  describe '#find' do
    it 'should generate pattern if first and last are full' do
      pattern.split('john.ferguson')
      expect(pattern.find).to eql(:first_name_dot_last_name)
    end

    it 'should generate pattern if first is initial and last is full' do
      pattern.split('j.ferguson')
      expect(pattern.find).to eql(:first_initial_dot_last_name)
    end

    it 'should generate pattern if first is full and last is initial' do
      pattern.split('john.f')
      expect(pattern.find).to eql(:first_name_dot_last_initial)
    end

    it 'should generate pattern if both are initial' do
      pattern.split('j.f')
      expect(pattern.find).to eql(:first_initial_dot_last_initial)
    end
  end

  describe '#predict' do
    before { pattern.split('john ferguson') }

    it 'should be able to predict for first_name_dot_last_name' do
      expect(pattern.predict(:first_name_dot_last_name)).to eql('john.ferguson')
    end

    it 'should be able to predict for first_initial_dot_last_name' do
      expect(pattern.predict(:first_initial_dot_last_name)).to eql('j.ferguson')
    end

    it 'should be able to predict for first_name_dot_last_initial' do
      expect(pattern.predict(:first_name_dot_last_initial)).to eql('john.f')
    end

    it 'should be able to predict for first_initial_dot_last_initial' do
      expect(pattern.predict(:first_initial_dot_last_initial)).to eql('j.f')
    end
  end
end

RSpec.describe Analyzer do
  let(:pattern) { Pattern.new }
  let(:analyzer) { Analyzer.new(pattern: pattern) }

  it 'should respond to dataset' do
    expect(analyzer).to respond_to(:dataset)
  end

  it 'should use hash as dataset' do
    expect(analyzer.dataset).to be_a(Hash)
  end

  it 'should respond to pattern' do
    expect(analyzer).to respond_to(:pattern)
  end

  describe '#process' do
    let(:raw) { { 'John Ferguson' => 'john.ferguson@alphasights.com' } }

    let(:multiple) do
      {
        'John Ferguson' => 'john.ferguson@alphasights.com',
        'Damon Aw'      => 'damon.aw@alphasights.com',
        'Linda Li'      => 'linda.li@alphasights.com',
        'Larry Page'    => 'larry.p@google.com'
      }
    end

    it 'should use pattern to split name and find pattern' do
      expect(pattern).to receive(:split)
      expect(pattern).to receive(:find).and_return(:first_name_dot_last_name)
      analyzer.process(raw)
    end

    it 'should be able to process data based on given rule' do
      analyzer.process(raw)
      expect(analyzer.dataset.size).to eql(1)
      expect(analyzer.dataset).to eql('alphasights.com' => [:first_name_dot_last_name])
    end

    it 'should be able to process data for multiple companies' do
      analyzer.process(multiple)
      expect(analyzer.dataset.size).to eql(2)
      expect(analyzer.dataset).to eql(
        'alphasights.com' => [:first_name_dot_last_name],
        'google.com' => [:first_name_dot_last_initial]
      )
    end
  end
end

RSpec.describe Predictor do
  let(:pattern) { Pattern.new }
  let(:predictor) { Predictor.new(pattern: pattern) }

  it 'should respond to dataset' do
    expect(predictor).to respond_to(:dataset)
  end

  it 'should respond to pattern' do
    expect(predictor).to respond_to(:pattern)
  end

  describe '#formulate' do
    it 'should not formulate an email address if the company is not in the dataset' do
      predictor = Predictor.new(pattern: pattern, dataset: {})
      attributes = { name: 'Barack Obama', email: 'whitehouse.gov' }
      response = predictor.formulate(attributes)
      expect(response).to eql("There is no matching in dataset for #{attributes[:name]} working for #{attributes[:email]}")
    end

    it 'should use pattern to split name and predict email' do
      predictor = Predictor.new(pattern: pattern, dataset: { 'alphasights.com' => [:first_name_dot_last_name] })
      attributes = { name: 'Peter Wong', email: 'alphasights.com' }
      expect(pattern).to receive(:split).and_return(true)
      expect(pattern).to receive(:predict).and_return('peter.wong')
      predictor.formulate(attributes)
    end

    it 'should predict an email address if one pattern exists' do
      predictor = Predictor.new(pattern: pattern, dataset: { 'alphasights.com' => [:first_name_dot_last_name] })
      attributes = { name: 'Peter Wong', email: 'alphasights.com' }
      response = predictor.formulate(attributes)
      expect(response).to eql(['peter.wong@alphasights.com'])
    end

    it 'should predict email addresses if multiple patterns exist' do
      predictor = Predictor.new(pattern: pattern, dataset: { 'google.com' => [:first_name_dot_last_initial, :first_initial_dot_last_name] })
      attributes = { name: 'Craig Silverstein', email: 'google.com' }
      response = predictor.formulate(attributes)
      expect(response).to match_array(['c.silverstein@google.com', 'craig.s@google.com'])
    end
  end
end