	Commit History
Merging back dataset statistics
		e8ac901
	
		
		
	
		meg-huggingface
		
	committed on
		
		
Be gone, you merge conflicting file
git rm data_measurements/dataset_statistics.py
		2981bb2
	
		
		
	
		meg-huggingface
		
	committed on
		
		
Update from rollback
		f9936fb
	
		
		
	
		meg-huggingface
		
	committed on
		
		
Adding dependencies for images
		deefca3
	
		
		
	
		meg-huggingface
		
	committed on
		
		
Change to npmi display ordering
		5546565
	
		
		
	
		meg-huggingface
		
	committed on
		
		
Loading per-widget.  Various changes to streamlit interactions for efficiency.
		d3c28ec
	
		
		
	
		meg-huggingface
		
	committed on
		
		
One more flag passing needed for setting live deployment
		e122a90
	
		
		
	
		meg-huggingface
		
	committed on
		
		
Adds flag for live deployment so that things will not be all recalculated when live.
		7c5239c
	
		
		
	
		meg-huggingface
		
	committed on
		
		
Standardizing filenaming a bit.
		0803ab3
	
		
		
	
		meg-huggingface
		
	committed on
		
		
More modularizing; npmi and labels
		a2ae370
	
		
		
	
		meg-huggingface
		
	committed on
		
		
Some additional modularizing and caching of the text lengths widget
		335424f
	
		
		
	
		meg-huggingface
		
	committed on
		
		
Modularization and caching of text length widget
		85cf91c
	
		
		
	
		meg-huggingface
		
	committed on
		
		
Removes extraneous debugging print statements
		6a9c993
	
		
		
	
		meg-huggingface
		
	committed on
		
		
Begins modularizing so that each widget can be independently loaded without having a requirement on the ordering of load_or_preparing in app.py.  This means that each function corresponding to a widget will check if the variables it depends on have been calculated yet.  If not, it will call back to calculate them.  Because of the messiness this causes with passing the use_cache variable around, I've now set use_cache as a global variable, set when the DatasetStatisticsCacheClass is initialized, and removed the use_cache arguments appearing in nearly every function.
		4b53042
	
		
		
	
		meg-huggingface
		
	committed on
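
The pattern described in commit 4b53042 could look roughly like the sketch below: each widget's loader checks whether the values it depends on exist yet and computes them if not, and use_cache is set once at initialization rather than passed to every function. The class name WidgetStatsSketch, its method names, and the in-memory dict standing in for cache files are all hypothetical; this is not the code of DatasetStatisticsCacheClass, and it reads "global" as an instance attribute, which is only one possible interpretation.

```python
class WidgetStatsSketch:
    """Hypothetical, simplified stand-in; not the repository's actual class."""

    def __init__(self, texts, use_cache=False):
        self.texts = texts
        # Set once here, so no method needs a use_cache argument.
        self.use_cache = use_cache
        self.disk = {}  # assumption: a dict standing in for cache files
        self.tokens = None
        self.lengths = None

    def load_or_prepare_tokens(self):
        if self.use_cache and "tokens" in self.disk:
            self.tokens = self.disk["tokens"]
            return
        self.tokens = [text.split() for text in self.texts]
        self.disk["tokens"] = self.tokens

    def load_or_prepare_lengths(self):
        if self.use_cache and "lengths" in self.disk:
            self.lengths = self.disk["lengths"]
            return
        # Dependency check: compute the tokens if no earlier call has.
        if self.tokens is None:
            self.load_or_prepare_tokens()
        self.lengths = [len(toks) for toks in self.tokens]
        self.disk["lengths"] = self.lengths


stats = WidgetStatsSketch(["a b c", "d e"])
# Works even though load_or_prepare_tokens() was never called first.
stats.load_or_prepare_lengths()
```

With this arrangement the calling code can load widgets in any order, at the cost of each loader knowing its own dependencies.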
		
		
Removing need to keep around base dset for the header widget; now just saving what is shown -- the first n lines of the base dataset -- as a json, and loading if it's cached.
		66693d5
	
		
		
	
		meg-huggingface
		
	committed on
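
As a rough illustration of the approach in commit 66693d5, the sketch below saves only the first n rows of a dataset as JSON and reads that file back when it already exists, so the full base dataset does not have to be kept around for the header widget. The function name load_or_prepare_head, its parameters, and the file name are hypothetical and are not taken from the repository.

```python
import json
import os
import tempfile


def load_or_prepare_head(rows, cache_path, n=5, use_cache=True):
    """Return the first n rows, reading them from a JSON cache if present."""
    if use_cache and os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)
    head = rows[:n]
    # Only the rows that will be displayed are written to disk.
    with open(cache_path, "w") as f:
        json.dump(head, f)
    return head


cache_dir = tempfile.mkdtemp()
cache_file = os.path.join(cache_dir, "dset_peek.json")  # hypothetical name
rows = [{"text": str(i)} for i in range(10)]
peek = load_or_prepare_head(rows, cache_file, n=3)
# A second call can be served from the JSON file without the dataset.
peek_again = load_or_prepare_head([], cache_file, n=3)
```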
		
		
Removing any need for a dataframe in expander_general_stats; instead making sure to cache and load the small amount of details needed for this widget.  Note I also moved around a couple functions -- same content, just moved -- so that it was easier for me to navigate through the code.  I also pulled out a couple of sub-functions from larger functions, again to make the code easier to work with/understand, as well as helping to further modularize so we can limit what needs to be cached.
		e1f2cc3
	
		
		
	
		meg-huggingface
		
	committed on
		
		
Splitting prepare_dataset into preparing the base dataset, and the tokenized dataset. This will help us to have further control over caching and loading data, eventually removing the storage of base dataset.
		6af9ef6
	
		
		
	
		meg-huggingface
		
	committed on
		
		
Continuing cache minimizing in new repository. Please see https://github.com/huggingface/DataMeasurements for full history
		d8ab532
	
		
		
	
		meg-huggingface
		
	committed on
		
		

