Add aliased_index contextmanager for reindexing
This is the final piece of the automatic reindexing puzzle: with it, we
should be able to reindex data near-atomically from a cronjob. A new
version of the data is indexed into a timestamped index named like
votizen_verifier_20151010010203, and once the import finishes, the index
alias `votizen_verifier` is moved to point to that index.
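A minimal sketch of the naming and swap steps, assuming the swap goes through Elasticsearch's `_aliases` endpoint, which applies all actions in a single request atomically. The default name and the `VERIFIER_NEW_INDEX_NAME` override mirror the old `tmp_index` line in the diff; `new_index_name` and `alias_swap_actions` are hypothetical helpers, not code from this commit.

```python
import os
import time


def new_index_name(alias, environ=None, now=None):
    """Hypothetical helper: timestamped name like votizen_verifier_20151010010203.

    Mirrors the old tmp_index default; VERIFIER_NEW_INDEX_NAME wins if set.
    """
    environ = os.environ if environ is None else environ
    now = time.localtime() if now is None else now
    return environ.get("VERIFIER_NEW_INDEX_NAME",
                       alias + "_" + time.strftime("%Y%m%d%H%M%S", now))


def alias_swap_actions(alias, new_index, old_indexes):
    """Hypothetical helper: body for a single POST /_aliases request.

    Removing the alias from every old index and adding it to the new one
    in one request lets Elasticsearch apply the whole swap atomically.
    """
    actions = [{"remove": {"index": old, "alias": alias}} for old in old_indexes]
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


# Example: fixed clock and empty environment for a reproducible name.
stamp = time.strptime("20151010010203", "%Y%m%d%H%M%S")
name = new_index_name("votizen_verifier", environ={}, now=stamp)
body = alias_swap_actions("votizen_verifier", name, ["votizen_verifier_20151001000000"])
```

Sending the whole body in one request is what keeps searches from ever seeing the alias missing between the remove and the add.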

This commit adds a context manager that handles creating and moving
this alias. If an unhandled exception occurs during the reindex, the
alias is not moved and keeps pointing at its current index.
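The "failed import leaves the alias alone" guarantee falls out of how `contextlib.contextmanager` works: an exception raised in the `with` body is re-raised at the `yield`, so code after it never runs. A sketch under that assumption; `aliased_index_sketch` is not the commit's `aliased_index`, and the `client` methods (`create_index`, `current_indexes`, `update_aliases`) belong to a hypothetical wrapper, not the elasticsearch-py API. `FakeClient` is an in-memory stand-in for trying it out.

```python
from contextlib import contextmanager


@contextmanager
def aliased_index_sketch(client, alias, new_index):
    """Yield new_index for loading; repoint alias only if the body succeeds.

    `client` is a hypothetical wrapper exposing create_index(),
    current_indexes() and update_aliases(); it is not elasticsearch-py.
    """
    client.create_index(new_index)
    yield new_index
    # Reached only when the with-block exits normally. If the body raised,
    # the exception is re-raised at the yield above and this never runs.
    old_indexes = client.current_indexes(alias)
    actions = [{"remove": {"index": old, "alias": alias}} for old in old_indexes]
    actions.append({"add": {"index": new_index, "alias": alias}})
    client.update_aliases({"actions": actions})


class FakeClient:
    """In-memory stand-in for the hypothetical client, for experimentation."""

    def __init__(self, aliases=None):
        self.indexes = set()
        # alias name -> set of index names it points to
        self.aliases = {a: set(ix) for a, ix in (aliases or {}).items()}

    def create_index(self, name):
        self.indexes.add(name)

    def current_indexes(self, alias):
        return sorted(self.aliases.get(alias, set()))

    def update_aliases(self, body):
        for action in body["actions"]:
            kind, spec = next(iter(action.items()))
            targets = self.aliases.setdefault(spec["alias"], set())
            if kind == "add":
                targets.add(spec["index"])
            elif kind == "remove":
                targets.discard(spec["index"])
```

For example, with `client = FakeClient({"votizen_verifier": {"votizen_verifier_old"}})`, a `with aliased_index_sketch(client, "votizen_verifier", "votizen_verifier_new") as index:` block that exits normally leaves the alias on `votizen_verifier_new` only, while one that raises leaves it on `votizen_verifier_old`. Either way the new index itself is left in place, which fits the manual-cleanup approach.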

Note that this does not clean up old indexes. IMHO that should stay a
manual step, at least until we are confident in this process.
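Because cleanup stays manual, a small dry-run helper could at least list candidates for someone to review. This hypothetical `stale_indexes` only filters names that follow the `votizen_verifier_20151010010203` naming pattern and deletes nothing; indexes named through `VERIFIER_NEW_INDEX_NAME` will not match it.

```python
import re


def stale_indexes(all_indexes, alias, live_indexes):
    """Hypothetical dry-run helper: timestamped indexes the alias no longer uses.

    Matches alias + "_" + 14 digits (e.g. votizen_verifier_20151010010203).
    Fixed-width timestamps sort chronologically, so the result is oldest first.
    Nothing is deleted; the list is only for manual review.
    """
    pattern = re.compile(re.escape(alias) + r"_\d{14}$")
    live = set(live_indexes)
    return sorted(name for name in all_indexes
                  if pattern.match(name) and name not in live)
```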

Change-Id: I31b1465b8d1a908a54c20e055f5876a0de432dad
Reviewed-on: https://code.brigade.com/6761
Tested-by: Leeroy Jenkins <[email protected]>
Reviewed-by: Shane da Silva <[email protected]>
tdooner committed Oct 22, 2015
1 parent 93009d9 commit 6f44ad7
Showing 1 changed file (index.py) with 15 additions and 18 deletions.
@@ -113,27 +113,24 @@ def index_records(index_name, voters):


 if __name__ == '__main__':
-    tmp_index = os.environ.get('VERIFIER_NEW_INDEX_NAME', INDEX + "_" + time.strftime("%Y%m%d%H%M%S"))
+    with aliased_index(es_client, INDEX) as index:
+        sys.stderr.write("Loading data into index {0}...\n".format(index))
+        sys.stderr.write("Set VERIFIER_NEW_INDEX_NAME=[...] to override default index name.\n")
+
-    ensure_mapping_exists(tmp_index, es_client, force_delete=True)
+        voters = []
+        for row in sys.stdin:
+            row = row.decode("utf-8-sig").split("\t")

-    sys.stderr.write("Loading data into index {0}...\n".format(tmp_index))
-    sys.stderr.write("Set VERIFIER_NEW_INDEX_NAME=[...] to override default index name.\n")
-
-    voters = []
-    for row in sys.stdin:
-        row = row.decode("utf-8-sig").split("\t")
-
-        if row[0] == 'voterbase_id':
-            sys.stderr.write("Found header row.\n")
-            headers = row
-            header_map = {header: i for i, header in enumerate(headers)}
-            continue
+            if row[0] == 'voterbase_id':
+                sys.stderr.write("Found header row.\n")
+                headers = row
+                header_map = {header: i for i, header in enumerate(headers)}
+                continue

 ***REMOVED***

-        if len(voters) >= 1000000:
-            index_records(tmp_index, voters)
-            voters = []
+            if len(voters) >= 1000000:
+                index_records(index, voters)
+                voters = []

-    index_records(tmp_index, voters)
+        index_records(index, voters)
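The loop inside the `with` block follows a header-then-batch pattern: remember column positions from the header row, buffer records, flush every 1,000,000, then flush the remainder. A Python 3 sketch of the same control flow; `load_rows` and `flush` are hypothetical names, and since the record-building step is redacted (`***REMOVED***`), the sketch assumes each record simply maps header names to fields.

```python
def load_rows(lines, flush, batch_size=1000000):
    """Hypothetical sketch of the loop in the diff, in Python 3.

    A row whose first field is 'voterbase_id' is treated as the header;
    records are buffered and passed to `flush` in batches of `batch_size`,
    with a final flush for the remainder.
    """
    header_map = None
    batch = []
    for line in lines:
        row = line.rstrip("\r\n").split("\t")
        if row[0] == "voterbase_id":
            header_map = {header: i for i, header in enumerate(row)}
            continue
        # Assumption: the redacted record-building step maps header -> field.
        batch.append({name: row[i] for name, i in header_map.items()})
        if len(batch) >= batch_size:
            flush(batch)
            batch = []
    if batch:  # unlike the original, skip an empty final flush
        flush(batch)
```

Inside the context manager, `flush` could be `lambda batch: index_records(index, batch)`, and in Python 3 the input could be read through `io.TextIOWrapper(sys.stdin.buffer, encoding="utf-8-sig")` to drop a leading BOM. The `if batch:` guard avoids the empty final `index_records` call the original makes when the row count is zero or an exact multiple of the batch size.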
