add new algorithm for systematic sampling #34
base: master
Changes from 2 commits
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| # frozen_string_literal: true | ||
| module Split | ||
| module Algorithms | ||
| module SystematicSampling | ||
| def self.choose_alternative(experiment) | ||
| count = experiment.next_cohorting_block_index | ||
|
|
||
| block_length = experiment.cohorting_block_magnitude * experiment.alternatives.length | ||
| block_num, index = count.divmod block_length | ||
|
|
||
| r = Random.new(block_num + experiment.cohorting_block_seed) | ||
| block = (experiment.alternatives*experiment.cohorting_block_magnitude).shuffle(random: r) | ||
|
Rowan441 marked this conversation as resolved.
|
||
| block[index] | ||
|
Author
What we do here to create a block with an equal number of each alternative:
take the list of alternatives
repeat it
shuffle it (seeded by
|
||
| end | ||
| end | ||
| end | ||
| end | ||
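The steps described in the comment above can be tried outside of Split with a minimal standalone sketch. It assumes plain strings in place of `Split::Alternative` objects and a counter passed in as an argument rather than read from Redis. The method name `pick_alternative` and its parameters are hypothetical and are not part of this PR.

```ruby
# Standalone sketch of the block-randomization idea in this diff.
# `pick_alternative` and its arguments are illustrative names, not Split's API.
def pick_alternative(alternatives, magnitude, seed, count)
  # One block holds `magnitude` copies of every alternative.
  block_length = magnitude * alternatives.length
  # Which block the counter falls in, and the position inside that block.
  block_num, index = count.divmod(block_length)

  # Seeding with block_num + seed means every process rebuilds the same
  # shuffled block for the same counter value.
  rng = Random.new(block_num + seed)
  block = (alternatives * magnitude).shuffle(random: rng)
  block[index]
end

alternatives = %w[red blue green]

# Walk the counter through the first block (6 slots for magnitude 2).
first_block = (0...6).map { |count| pick_alternative(alternatives, 2, 1234, count) }

# Count how often each alternative was chosen within the block.
counts = alternatives.map { |name| [name, first_block.count(name)] }.to_h
puts counts.inspect
```

Because each block is a permutation of the repeated alternatives list, every alternative appears exactly `magnitude` times per block, whatever the seed.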
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -9,6 +9,8 @@ class Experiment | |
| attr_accessor :alternative_probabilities | ||
| attr_accessor :metadata | ||
| attr_accessor :friendly_name | ||
| attr_accessor :cohorting_block_seed | ||
| attr_accessor :cohorting_block_magnitude | ||
|
|
||
| attr_reader :alternatives | ||
| attr_reader :resettable | ||
|
|
@@ -17,6 +19,7 @@ class Experiment | |
| DEFAULT_OPTIONS = { | ||
| :resettable => true, | ||
| :retain_user_alternatives_after_reset => false, | ||
| :cohorting_block_magnitude => 1 | ||
|
Author
Instead of a block length, I chose to use a block magnitude (the `:cohorting_block_magnitude` option above). I did this because, with two (or more) alternatives, some block lengths would be invalid (e.g. 5), since the blocks have to be balanced between alternatives. |
||
| } | ||
|
|
||
| def initialize(name, options = {}) | ||
|
|
@@ -43,6 +46,11 @@ def set_alternatives_and_options(options) | |
| self.metadata = options_with_defaults[:metadata] | ||
| self.friendly_name = options_with_defaults[:friendly_name] || @name | ||
| self.retain_user_alternatives_after_reset = options_with_defaults[:retain_user_alternatives_after_reset] | ||
|
|
||
| if self.algorithm == Split::Algorithms::SystematicSampling | ||
| self.cohorting_block_seed = options_with_defaults[:cohorting_block_seed] || self.name.to_i(36) | ||
|
Author
The seed is used so that any instance of Rails can generate the blocks and they all come out the same.
The default seed, if none is specified in the YAML, is based on the name of the experiment.
I think something like:
Digest::MD5.hexdigest('exp893_solo').to_i(16)
=> 214402640715795185602270768195163573728
would be better, but I wasn't sure whether importing the Digest module would be overkill.

Just want to point out,
Author
That is perfect for our use case, thanks! |
||
| self.cohorting_block_magnitude = options_with_defaults[:cohorting_block_magnitude] | ||
| end | ||
| end | ||
|
|
||
| def extract_alternatives_from_options(options) | ||
|
|
@@ -64,6 +72,8 @@ def extract_alternatives_from_options(options) | |
| options[:algorithm] = exp_config[:algorithm] | ||
| options[:friendly_name] = exp_config[:friendly_name] | ||
| options[:retain_user_alternatives_after_reset] = exp_config[:retain_user_alternatives_after_reset] | ||
| options[:cohorting_block_seed] = exp_config[:cohorting_block_seed] | ||
| options[:cohorting_block_magnitude] = exp_config[:cohorting_block_magnitude] | ||
| end | ||
| end | ||
|
|
||
|
|
@@ -232,6 +242,14 @@ def friendly_name_key | |
| "#{name}:friendly_name" | ||
| end | ||
|
|
||
| def cohorting_block_seed_key | ||
| "#{name}:cohorting_block_seed" | ||
| end | ||
|
|
||
| def cohorting_block_magnitude_key | ||
| "#{name}:cohorting_block_magnitude" | ||
| end | ||
|
|
||
| def resettable? | ||
| resettable | ||
| end | ||
|
|
@@ -266,6 +284,8 @@ def load_from_redis | |
|
|
||
| options = { | ||
| retain_user_alternatives_after_reset: exp_config['retain_user_alternatives_after_reset'], | ||
| cohorting_block_seed: load_cohorting_block_seed_from_redis, | ||
| cohorting_block_magnitude: load_cohorting_block_magnitude_from_redis, | ||
| resettable: exp_config['resettable'], | ||
| algorithm: exp_config['algorithm'], | ||
| friendly_name: load_friendly_name_from_redis, | ||
|
|
@@ -423,6 +443,10 @@ def enable_cohorting | |
| redis.hset(experiment_config_key, :cohorting, false) | ||
| end | ||
|
|
||
| def next_cohorting_block_index | ||
| Split.redis.incr("#{name}:cohorting_block_index") - 1 | ||
| end | ||
|
|
||
| protected | ||
|
|
||
| def experiment_config_key | ||
|
|
@@ -446,6 +470,14 @@ def load_friendly_name_from_redis | |
| redis.get(friendly_name_key) | ||
| end | ||
|
|
||
| def load_cohorting_block_seed_from_redis | ||
| redis.get(cohorting_block_seed_key).to_i | ||
| end | ||
|
|
||
| def load_cohorting_block_magnitude_from_redis | ||
| redis.get(cohorting_block_magnitude_key).to_i | ||
| end | ||
|
|
||
| def load_alternatives_from_configuration | ||
| alts = Split.configuration.experiment_for(@name)[:alternatives] | ||
| raise ArgumentError, "Experiment configuration is missing :alternatives array" unless alts | ||
|
|
@@ -492,6 +524,8 @@ def persist_experiment_configuration | |
| goals_collection.save | ||
| redis.set(metadata_key, @metadata.to_json) unless @metadata.nil? | ||
| redis.set(friendly_name_key, self.friendly_name) | ||
| redis.set(cohorting_block_seed_key, self.cohorting_block_seed) | ||
| redis.set(cohorting_block_magnitude_key, self.cohorting_block_magnitude) | ||
| end | ||
|
|
||
| def remove_experiment_configuration | ||
|
|
||
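As a rough comparison of the two default-seed ideas raised in the review comment on `cohorting_block_seed`, the sketch below derives a seed from an experiment name both ways. The helper names `base36_seed` and `md5_seed` are hypothetical, and the hyphenated experiment names are an assumption chosen to show one difference between the approaches. It is not a statement of what the PR finally adopted.

```ruby
require "digest/md5"

# Hypothetical helpers comparing two ways to derive a default seed from an
# experiment name; neither is part of Split's API.
def base36_seed(name)
  # String#to_i stops at the first character that is not a valid base-36
  # digit, so anything after e.g. a hyphen is ignored.
  name.to_i(36)
end

def md5_seed(name)
  # Hash the full name, then read the hex digest as one large integer.
  Digest::MD5.hexdigest(name).to_i(16)
end

# Assumed example names that share the prefix "link" before a hyphen.
%w[link-color link-highlight].each do |name|
  puts "#{name}: base36=#{base36_seed(name)} md5=#{md5_seed(name)}"
end
```

In this sketch the two hyphenated names get the same base-36 seed because parsing stops at the hyphen, while the MD5-based seeds differ since the whole name is hashed.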
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,77 @@ | ||
| # frozen_string_literal: true | ||
| require "spec_helper" | ||
|
|
||
| describe Split::Algorithms::SystematicSampling do | ||
| let(:experiment) do | ||
| Split::Experiment.new( | ||
| 'link_color', | ||
| :alternatives => ['red', 'blue', 'green'], | ||
| :algorithm => Split::Algorithms::SystematicSampling, | ||
| :cohorting_block_magnitude => 2 | ||
| ) | ||
| end | ||
|
|
||
| it "should return an alternative" do | ||
| expect(Split::Algorithms::SystematicSampling.choose_alternative(experiment).class).to eq(Split::Alternative) | ||
| end | ||
|
|
||
| context "experiments with a random seed" do | ||
| it "cohorts the first block of users equally into each alternative" do | ||
| results = {'red' => 0, 'blue' => 0, 'green' => 0} | ||
| 6.times do | ||
| results[Split::Algorithms::SystematicSampling.choose_alternative(experiment).name] += 1 | ||
| end | ||
|
|
||
| expect(experiment.cohorting_block_magnitude * experiment.alternatives.length).to eq(6) | ||
|
Are you trying to make sure the |
||
| expect(results).to eq({'red' => 2, 'blue' => 2, 'green' => 2}) | ||
| end | ||
|
|
||
| it "cohorts the second block of users equally into each alternative" do | ||
| 6.times do | ||
| Split::Algorithms::SystematicSampling.choose_alternative(experiment).name | ||
| end | ||
|
|
||
| results = {'red' => 0, 'blue' => 0, 'green' => 0} | ||
| 6.times do | ||
| results[Split::Algorithms::SystematicSampling.choose_alternative(experiment).name] += 1 | ||
| end | ||
|
|
||
| expect(experiment.cohorting_block_magnitude * experiment.alternatives.length).to eq(6) | ||
| expect(results).to eq({'red' => 2, 'blue' => 2, 'green' => 2}) | ||
| end | ||
| end | ||
|
|
||
| context "experiments with set seed" do | ||
| let(:seeded_experiment1) do | ||
| Split::Experiment.new( | ||
| 'link_color', | ||
| :alternatives => ['red', 'blue', 'green'], | ||
| :algorithm => Split::Algorithms::SystematicSampling, | ||
| :cohorting_block_seed => 1234 | ||
| ) | ||
| end | ||
|
|
||
| let(:seeded_experiment2) do | ||
| Split::Experiment.new('link_highlight', | ||
|
Putting exp_name on a new line would make this easier to compare to the one above. |
||
| :alternatives => ['red', 'blue', 'green'], | ||
| :algorithm => Split::Algorithms::SystematicSampling, | ||
| :cohorting_block_seed => 1234) | ||
| end | ||
|
|
||
| it "cohorts users in a set order" do | ||
| results1 = [] | ||
| results2 = [] | ||
|
|
||
| 12.times do | ||
| results1 << Split::Algorithms::SystematicSampling.choose_alternative(seeded_experiment1).name | ||
| end | ||
|
|
||
| 12.times do | ||
| results2 << Split::Algorithms::SystematicSampling.choose_alternative(seeded_experiment2).name | ||
| end | ||
|
|
||
| expect(seeded_experiment1.cohorting_block_seed).to eq(seeded_experiment2.cohorting_block_seed) | ||
| expect(results1).to eq(results2) | ||
| end | ||
| end | ||
| end | ||
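The spec above asserts that `cohorting_block_magnitude * alternatives.length` equals 6, which reflects the choice to configure a magnitude rather than a raw block length. A small sketch of why that choice rules out invalid lengths follows. The helper `balanced_block_length?` is a hypothetical name used only for illustration.

```ruby
# Illustrative check (not part of the PR): a block can only be balanced when
# its length is a whole multiple of the number of alternatives.
def balanced_block_length?(block_length, alternative_count)
  block_length.positive? && (block_length % alternative_count).zero?
end

alternatives = %w[red blue]

# An arbitrary block length such as 5 cannot be split evenly between 2 arms.
puts balanced_block_length?(5, alternatives.length)

# Deriving the length from a magnitude always yields a valid length.
(1..4).each do |magnitude|
  length = magnitude * alternatives.length
  valid = balanced_block_length?(length, alternatives.length)
  puts "magnitude=#{magnitude} length=#{length} valid=#{valid}"
end
```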
This is the Redis call that increments a counter tracking the number of users that have been run through this algorithm so far.
It returns an integer from Redis that is the next value of the counter. Because this value is supplied by Redis, there shouldn't be any race conditions where multiple users get the same `count` variable.

Isn't `next_cohorting_block_index` more like a cohort count? The `index` in the name misled me: I read it as the index number within a block, which shouldn't be larger than the size of the block.
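Both points in these comments can be illustrated with a sketch that runs without a Redis server. The class `FakeCounter` is a hypothetical in-memory stand-in: its Mutex plays the role of the atomicity that Redis `INCR` provides, and `next_count` mirrors the `incr(...) - 1` in the diff. The block length of 6 is an assumed value.

```ruby
# In-memory stand-in for the Redis INCR call. `FakeCounter` is a hypothetical
# name; the Mutex stands in for Redis's atomic increment.
class FakeCounter
  def initialize
    @value = 0
    @lock = Mutex.new
  end

  # Mirrors `redis.incr(key) - 1`: hands out 0, 1, 2, ... exactly once each.
  def next_count
    @lock.synchronize do
      @value += 1
      @value - 1
    end
  end
end

counter = FakeCounter.new
results = Queue.new

# Eight concurrent "requests", each asking for 100 counts.
threads = 8.times.map do
  Thread.new { 100.times { results << counter.next_count } }
end
threads.each(&:join)

seen = Array.new(results.size) { results.pop }
puts seen.uniq.length == seen.length # true when no two callers share a count

# The counter grows without bound; only the divmod remainder is an index
# inside a block (assumed block length of 6 here).
block_length = 6
block_num, index = seen.max.divmod(block_length)
puts "count=#{seen.max} block_num=#{block_num} index=#{index}"
```

This supports the naming concern: the value returned by the counter is a running count of cohorted users, and the within-block index is only obtained after the `divmod` step.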