[FRIAM] Sorting Algorithm? AI? Identifying "types" within data

Frank Wimberly wimberly3 at gmail.com
Sat Jan 7 08:22:57 EST 2023


This answer seems reasonable to me.  I worked on Project Talent in 1967,
which had similar goals and data.  See

https://en.m.wikipedia.org/wiki/Project_Talent

Our data covered thousands of high school students, and our software was all
written in Fortran.

---
Frank C. Wimberly
140 Calle Ojo Feliz,
Santa Fe, NM 87505

505 670-9918
Santa Fe, NM

On Fri, Jan 6, 2023, 11:32 PM Pieter Steenekamp <pieters at randcontrols.co.za>
wrote:

> I asked https://chat.openai.com/chat and here is the conversation:
>
> *Pieter Steenekamp*
> can you suggest a solution for the following problem "I'm hoping someone
> here could help out. Let's imagine I had some data where each row was a
> person's career. We could list major events every year. For example: 2004
> they were hired, 2007 they get a promotion, 2010 they leave for a
> different company, 2012 they come back at a higher level, 2015 get a
> promotion, then no change until 2022.  Let's say I had data like this for
> roughly 2 million people, and that there are around 10 different types of
> changes that could happen during any time period (could be yearly,
> quarterly, monthly, I can make it how I want).  I was hoping we could ask a
> computer to tell us if there were "types of careers" that people had. We
> could say "put all these careers into 4 buckets" or "7 buckets" based on
> similarity. Then we could look at the piles the computer made and try to
> make sense of them. One type might be "company man" for people who tend to
> stay in place for 20 or more years, another type could be a "rotator", who
> leaves and returns every 3 years or so. Etc. The point is, I want a
> computer to make the piles for me, rather than trying to come up with
> potential piles a priori. Are there methods for doing this? I know it's a
> problem we've *talked* about a lot, but I don't know if there are
> solutions. "
> *https://chat.openai.com/chat*
> There are several approaches that you could use to group the careers of
> these 2 million people into clusters based on similarity. One approach is
> to use unsupervised machine learning techniques, specifically clustering
> algorithms, to automatically group the data into clusters. Some popular
> clustering algorithms that you could use for this problem include k-means,
> hierarchical clustering, and DBSCAN (Density-Based Spatial Clustering of
> Applications with Noise). You could also use dimensionality reduction
> techniques, such as PCA (Principal Component Analysis) or t-SNE
> (t-Distributed Stochastic Neighbor Embedding), to reduce the dimensionality
> of the data and then use a clustering algorithm to group the data into
> clusters. Another approach is to use supervised machine learning
> techniques, where you would need to manually label a subset of the data
> with the cluster labels, and then use this labeled data to train a
> classifier to predict the cluster labels for the remaining data. It's worth
> noting that these approaches will require some feature engineering, where
> you will need to extract relevant features from the data that can be used
> to represent the careers of the people in the dataset. Once you have
> identified and extracted these features, you can then use them as input to
> the machine learning algorithms to group the data into clusters.
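
A rough sketch of the "reduce dimensionality, then cluster" route described
above, assuming scikit-learn and NumPy are installed. The random matrix is
only a synthetic stand-in for real per-person features, and the 12 columns,
3 components, and 4 clusters are arbitrary choices for illustration, not
values from this thread.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a feature matrix: one row per person,
# one column per engineered feature (assumed, not real data).
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 12))

# Scale the features, project onto 3 principal components, then run k-means.
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=3),
    KMeans(n_clusters=4, n_init=10, random_state=0),
)
labels = pipeline.fit_predict(features)

# Number of people assigned to each of the 4 clusters.
print(np.bincount(labels, minlength=4))
```

With something like 2 million rows, scikit-learn's MiniBatchKMeans may be
worth trying in place of KMeans, since it fits on small batches at a time.
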
> *Pieter Steenekamp*
> would it then be possible to find out, for example, whether there were
> "types of careers" that people had?
> *https://chat.openai.com/chat*
> Yes, once you have grouped the careers of the 2 million people into
> clusters using one of the approaches I mentioned above, you can then
> analyze the characteristics of each cluster to identify common patterns and
> trends within each cluster. This will allow you to identify "types of
> careers" that people have, based on the similarity of their career events.
> For example, if one of the clusters is characterized by people who have had
> long tenures at a single company, you could label this cluster as "company
> man" careers. Similarly, if another cluster is characterized by people who
> frequently switch between companies, you could label this cluster as
> "rotator" careers. You can then use this information to gain insights into
> the types of careers that are represented in the dataset, and potentially
> draw conclusions about the factors that may influence career paths and
> success.
> *Pieter Steenekamp*
> can you propose python code to do something like this please
> *https://chat.openai.com/chat*
> Sure, here is an example of how you could use the scikit-learn library in
> Python to group the careers of the 2 million people into clusters using the
> k-means algorithm:
> import numpy as np
> from sklearn.cluster import KMeans
>
> # Load the data into a NumPy array (one row per person, numeric columns)
> data = np.loadtxt('career_data.txt')
> # Create the k-means model
> model = KMeans(n_clusters=4)
> # Fit the model to the data
> model.fit(data)
> # Predict the cluster labels for each data point
> labels = model.predict(data)
> # Print the cluster labels
> print(labels)
> This code will group the data into 4 clusters, as specified by the
> n_clusters parameter. You can adjust this parameter to change the number of
> clusters that the data is grouped into.
> To extract features from the data that can be used as input to the k-means
> model, you will need to preprocess the data and select relevant features.
> This will likely involve some feature engineering, which will depend on the
> specific characteristics of the data and the goals of your analysis.
> I hope this helps! Let me know if you have any questions.
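
The feature engineering step mentioned above could look something like the
sketch below. It uses only the Python standard library. The event names, the
function career_features, and the particular features (event counts plus
longest and mean spell between events) are assumptions made for illustration;
the thread only says there are around 10 event types and does not name them.

```python
from collections import Counter

# Hypothetical event vocabulary (assumed; the real data has about 10 types).
EVENT_TYPES = ["hired", "promoted", "left", "returned"]


def career_features(events, end_year):
    """Turn one career, a list of (year, event) pairs, into a fixed-length vector."""
    events = sorted(events)
    counts = Counter(kind for _, kind in events)
    years = [year for year, _ in events] + [end_year]
    # Gaps between consecutive events, including the final stretch to end_year.
    gaps = [later - earlier for earlier, later in zip(years, years[1:])]
    features = [counts.get(kind, 0) for kind in EVENT_TYPES]
    features.append(max(gaps) if gaps else 0)  # longest spell without a change
    features.append(sum(gaps) / len(gaps) if gaps else 0.0)  # mean spell length
    return features


# The example career from the original question.
example = [
    (2004, "hired"),
    (2007, "promoted"),
    (2010, "left"),
    (2012, "returned"),
    (2015, "promoted"),
]
row = career_features(example, end_year=2022)
print(row)
```

Stacking one such row per person gives the numeric table that a clustering
routine like k-means expects. A "company man" would tend to show a large
longest-spell value, while a "rotator" would show higher left/returned counts.
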
>
> On Fri, 6 Jan 2023 at 19:34, Eric Charles <eric.phillip.charles at gmail.com>
> wrote:
>
>> Greetings all,
>> I'm hoping someone here could help out. Let's imagine I had some data
>> where each row was a person's career. We could list major events every
>> year.
>>
>> For example: 2004 they were hired, 2007 they get a promotion, 2010
>> they leave for a different company, 2012 they come back at a higher level,
>> 2015 get a promotion, then no change until 2022.
>>
>> Let's say I had data like this for roughly 2 million people, and that
>> there are around 10 different types of changes that could happen during any
>> time period (could be yearly, quarterly, monthly, I can make it how I
>> want).
>>
>> I was hoping we could ask a computer to tell us if there were "types of
>> careers" that people had. We could say "put all these careers into 4
>> buckets" or "7 buckets" based on similarity. Then we could look at the
>> piles the computer made and try to make sense of them.
>>
>> One type might be "company man" for people who tend to stay in place for
>> 20 or more years, another type could be a "rotator", who leaves and returns
>> every 3 years or so. Etc. The point is, I want a computer to make the piles
>> for me, rather than trying to come up with potential piles a priori.
>>
>> Are there methods for doing this? I know it's a problem we've *talked*
>> about a lot, but I don't know if there are solutions.
>>
>> Any help would be appreciated.
>>
>> Best,
>> Eric
>>
>> <echarles at american.edu>
>> -. --- - / ...- .- .-.. .. -.. / -- --- .-. ... . / -.-. --- -.. .
>> FRIAM Applied Complexity Group listserv
>> Fridays 9a-12p Friday St. Johns Cafe   /   Thursdays 9a-12p Zoom
>> https://bit.ly/virtualfriam
>> to (un)subscribe http://redfish.com/mailman/listinfo/friam_redfish.com
>> FRIAM-COMIC http://friam-comic.blogspot.com/
>> archives:  5/2017 thru present
>> https://redfish.com/pipermail/friam_redfish.com/
>>   1/2003 thru 6/2021  http://friam.383.s1.nabble.com/
>>
>